ARKit or CARKit?

I watched the intro to ARKit session today and the whole time I’m thinking: That’s great for a car. That’s great for a car. It wasn’t just my gut telling me this. The talking points Apple brought up line up with the feature set you’d want out of a car.

Mike Buerli, one of the Apple engineers who ran the ARKit session, described the tech as “the illusion that virtual objects are placed in the physical world […] based on what your camera sees.” But why? So you can re-model your home? What about games, you say?

Maybe I’m wrong, but I’d also bet that ARKit-powered games, especially on iPad, won’t be great for long, immersive play. Quick encounters in Pokémon Go? Sure. Pokémon Blue in the real world? Nope. I’d get tired holding up an iPad for too long, for the same reason a touch-screen Mac has been maligned in the past: it’s tiring, and it’s just not a great interface for consuming large amounts of useful data. I think the key part of that quote is “based on what your camera sees.”

ARKit or CARKit?

If augmented reality is cool for people, it’s downright useful for machines. Let’s pretend we’re writing some autonomous car software.

We probably want to map out the terrain so we can move the car safely. We’re going to want to make sure other drivers on the road don’t collide with ours, we’re going to want to adapt to new environments based on movement and light, and we’re going to want to respond quickly.

ARKit provides a partial solution for this, with what Apple is calling “World Tracking.” Buerli describes World Tracking as one of the three components of ARKit:

“With World Tracking we provide you the ability to get your device’s relative position in the physical environment,” he said. He also stressed orientation as a huge benefit of using ARKit. ARKit is about where your device, and by extension you, are in the world. Your device can be your phone or your car.
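To make that concrete, here’s a minimal sketch of reading the device’s pose from World Tracking. The assumptions are mine: an ARKit-capable device, and a class (I’m calling it PoseReader) that owns the session.

    import ARKit

    // Minimal world-tracking sketch; the class name is illustrative.
    final class PoseReader: NSObject, ARSessionDelegate {
        let session = ARSession()

        func start() {
            session.delegate = self
            session.run(ARWorldTrackingConfiguration())
        }

        // Called every frame. The camera transform is the device's position
        // and orientation relative to where the session started tracking.
        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            let pose = frame.camera.transform     // 4x4 matrix: rotation + translation
            let position = pose.columns.3         // where the device is right now
            print("Device at \(position.x), \(position.y), \(position.z)")
        }
    }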

So this is already getting us part of the way there. We can read the terrain with ARKit, and we can tell where you are. What about movement and low light?

Having a Vision

The Vision framework isn’t just a fresh take on Apple’s previous software offerings for face detection. Fewer false positives, smaller faces, occlusion detection, and strong profiles (such as the side of a face) were all things Apple touted in its Vision framework session. (You know that game people play where they pretend the fronts of cars look like faces?)

I imagine each of these features directly corresponds to a challenge one could conceivably face while developing autonomous car software. Accuracy is important if you want to detect an approaching car in the distance, moving at highway speeds.
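For reference, the face-detection request itself is only a few lines of Vision code. A bare sketch, assuming you already have a CGImage frame from somewhere:

    import Vision
    import CoreGraphics

    // Bare face-detection sketch; the frame is assumed to come from a camera.
    func detectFaces(in frame: CGImage) {
        let request = VNDetectFaceRectanglesRequest { request, _ in
            let faces = request.results as? [VNFaceObservation] ?? []
            for face in faces {
                // boundingBox is normalized (0...1) in the image's coordinate space.
                print("Face at \(face.boundingBox)")
            }
        }
        let handler = VNImageRequestHandler(cgImage: frame, options: [:])
        try? handler.perform([request])
    }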

Apple wouldn’t just re-write something to make it better. There’s usually a deeper, long-term motivation involved. Metal enabled a lot of this year’s technology. This kind of wheel re-inventing means that something is going on, and I think that thing is putting the pieces together for the now-confirmed car.

Fewer false positives and smaller faces deal with the vanishing point. Occlusion detection and strong profiles deal with stop signs and intersections. (Cars often approach at 45-degree angles.) These pain points sound exactly like the kinds of problems I’d want to have solved.

Detour(s) Ahead

During the ARKit session, Apple outlined some of the scenarios where ARKit won’t work well, and they all sound like impediments to shipping a safe system: low light, temporarily blocked cameras, drift. I think that’s where machine learning can help at least a little bit.
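ARKit actually reports those conditions as they happen, so software at least knows when to distrust the cameras. A rough sketch of listening for them; the class name is mine, and a car would obviously do more than print:

    import ARKit

    // Sketch of watching for the tracking limitations Apple called out.
    final class TrackingWatcher: NSObject, ARSessionDelegate {
        func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
            switch camera.trackingState {
            case .normal:
                break                               // full-quality world tracking
            case .notAvailable:
                print("World tracking unavailable")
            case .limited(let reason):
                // Reasons include .insufficientFeatures (dark or featureless scenes)
                // and .excessiveMotion (moving too fast for the camera).
                print("World tracking limited: \(reason)")
            }
        }
    }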

Low light is a problem everywhere. Can Apple make car software work with an alternate band of light, such as infrared? Or is a design goal of the project to work with stock hardware, so adoption can spread easily across partner manufacturers around the world? That might be a reason not to use another kind of light and to make it work with stock components.

Temporarily blocked cameras are going to be a problem on any camera-based system. It’s the same thing as someone waving something in front of your face or blindfolding you. It happens, and that’s a reason not to rely on cameras 100% for driving. That said, I don’t think this is a problem worth solving.

Lastly, Apple said that objects in an ARKit scene can start to drift if they are moving. That would seemingly pose a problem for detecting other drivers. This is why I think a combination of ARKit and Vision is going to be critical to the success of a car project. ARKit for terrain and the world. Vision for tracking nonstationary obstacles.
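A sketch of what that division of labor might look like: ARKit hands you each camera frame along with the device’s pose, and Vision scans the same frame for obstacles. The class name and the stand-in detector are placeholders, not a real obstacle model.

    import ARKit
    import Vision

    // Sketch: ARKit for the world and pose, Vision for nonstationary obstacles.
    final class ObstacleScanner: NSObject, ARSessionDelegate {
        // Stand-in request; a real system would use a purpose-built detector.
        private let detector = VNDetectRectanglesRequest()

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            // frame.capturedImage is the raw pixel buffer ARKit is tracking with.
            // (In practice you'd hop off ARKit's delegate queue before doing Vision work.)
            let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
            try? handler.perform([detector])
            let hits = detector.results?.count ?? 0

            // The pose at the same instant lets you relate detections back to the world.
            let position = frame.camera.transform.columns.3
            print("\(hits) detections near device position \(position)")
        }
    }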

Thoughts on Hardware

Apple has made some interesting hardware advances recently, and one is notable in particular, given how accurate ARKit is with just a single camera. The dual lens on the iPhone 7 Plus isn’t just great for photography; it also more closely resembles how our brains process depth. That’s potentially a component of this whole picture as well.

The contrast to the one- or two-camera system is the panoramic 360-degree cameras we’ve seen on almost every self-driving car prototype to be secretly photographed outside someone’s backyard in Silicon Valley. If Apple has terrain mapping this accurate, capable of running in real time on an A9, what are those contraptions on cars doing?

What about CoreML?

I think that machine learning is going to be useful for complementing all of the vision-based technology. I wouldn’t rely on machine learning if the cameras go out, but it might be useful for learning traffic patterns or roadwork patterns. There are a number of interesting applications surrounding driving habits as well.
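If you wanted to experiment with that, Core ML plugs straight into the same Vision pipeline. A loose sketch, where TrafficPatternClassifier is a hypothetical model class I’m making up for illustration, not something Apple ships:

    import CoreML
    import CoreVideo
    import Vision

    // Loose sketch of running a Core ML model over a camera frame via Vision.
    // TrafficPatternClassifier is a hypothetical .mlmodel-generated class.
    func classifyTraffic(in pixelBuffer: CVPixelBuffer) throws {
        let model = try VNCoreMLModel(for: TrafficPatternClassifier().model)
        let request = VNCoreMLRequest(model: model) { request, _ in
            if let top = (request.results as? [VNClassificationObservation])?.first {
                print("\(top.identifier): \(top.confidence)")
            }
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try handler.perform([request])
    }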

I initially thought that machine learning might be useful for kicking in when a car’s cameras fail. That could be, but even familiarity with the terrain doesn’t guarantee that another crazy driver isn’t passing through on a perpendicular road at 80 miles an hour.

From Here To There: Apple Car

Augmented Reality. Vision. CoreML. Are you getting it yet? These aren’t three devices. It’s one device and we’re calling it Apple Car.