How to AR — Part 4: Technical concepts for a quick grasp

Amol Wankhede
5 min read · Apr 1, 2018

Want to talk a little about AR in cocktail conversation? You are reading the right article. This is the last in my series of four articles. In this article, we will get a little under the hood of AR technology and its terminology. Again, this is a very high-level overview; the goal is not to discuss any API in depth.

To augment the real world with an object, aka a virtual object, we basically need two things:
a) “Pin” that object to something that the camera (on a smartphone) is looking at.
b) Keep the object planted as the phone and its camera move freely in the environment.
If we can achieve this with the least amount of drift and a good frame rate, we get a decent AR experience.

So here are the three main pieces of an AR experience from a technical point of view. Any AR SDK provides APIs for these (and more), along with a way to integrate other important libraries to build a great experience.

1) Rendering
2) Scene understanding
3) Tracking

To recap: most AR apps have a simple workflow.

The app starts and opens the camera, then the scene is understood. We then place objects and track them for the length of a session. This might sound overwhelming now, but it should get easier as we look at each of these pieces in turn.

Understanding smartphone AR concepts

1. Rendering
If we start from the top of the stack, any AR app or solution needs to render the camera texture (a fancy word for what your camera sees) and virtual objects on top of it.
ARKit renders through SceneKit, Metal, and so on. For Android, you have the choice of Unreal and Unity. Of course, you can use plain old OpenGL, but it is not the preferred choice for good rendering.
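
As a rough sketch of the renderer's side of this, here is how an app might pull the camera's view and projection matrices from an ARCore frame in Kotlin; the function name and the near/far clip values are my own assumptions for illustration:

```kotlin
import com.google.ar.core.Frame

// Runs once per rendered frame, after session.update() has produced `frame`.
// The same two matrices drive both the camera background and the virtual
// objects, which is what keeps them visually locked together.
fun cameraMatrices(frame: Frame): Pair<FloatArray, FloatArray> {
    val view = FloatArray(16)
    val projection = FloatArray(16)
    frame.camera.getViewMatrix(view, 0)
    // 0.1f and 100f are arbitrary near/far clip planes for this sketch.
    frame.camera.getProjectionMatrix(projection, 0, 0.1f, 100f)
    return view to projection
}
```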

Session — Most AR SDKs and frameworks provide a session-based API; ARCore is one of them. Depending on the design of your app, you create a session and configure it.

Once the session is established, we get updates on the device's position through frames. A frame is the unit that ultimately gets rendered to the user.
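
To make this concrete, here is a minimal ARCore-flavored sketch in Kotlin. Availability checks, camera permission, and the rendering surface are all omitted; the function name is mine:

```kotlin
import android.app.Activity
import com.google.ar.core.Config
import com.google.ar.core.Session

// Create and configure an ARCore session once, then pull frames from it
// inside the render loop.
fun startSession(activity: Activity): Session {
    val session = Session(activity)     // takes over the device camera for AR
    session.configure(Config(session))  // default config; see plane detection below
    session.resume()                    // starts the camera feed and tracking
    return session
}

// Then, once per drawn frame in the render loop:
// val frame = session.update()  // the Frame that ultimately gets rendered
```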

2. Scene Understanding
To be able to place virtual objects and render them correctly, it is very important to understand the environment or scene in which objects will be placed, rendered and tracked.

Here are a few interesting pieces of scene understanding:

— Plane and plane detection
Most AR SDKs give you the ability to detect planes: horizontal, and in some cases vertical. Note that horizontal planes are horizontal with respect to gravity. A plane forms the reference against which your AR objects are placed. The same plane is spanned over multiple frames, and when overlapping planes are detected they are merged to give one smooth experience. As a side note, depending on the contrast of the scene and the features available to the camera, it might take a while for planes to be detected.
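
A small Kotlin sketch of what this looks like with ARCore: turn plane finding on in the config, then poll the session for the planes it has found so far. The helper names are my own:

```kotlin
import com.google.ar.core.Config
import com.google.ar.core.Plane
import com.google.ar.core.Session
import com.google.ar.core.TrackingState

// Ask ARCore to look for horizontal planes (VERTICAL and
// HORIZONTAL_AND_VERTICAL also exist in newer releases).
fun enablePlaneDetection(session: Session) {
    val config = Config(session)
    config.planeFindingMode = Config.PlaneFindingMode.HORIZONTAL
    session.configure(config)
}

// List the planes detected so far. A plane that has been merged into a
// larger one reports the survivor via subsumedBy, so we skip those.
fun trackedPlanes(session: Session): List<Plane> =
    session.getAllTrackables(Plane::class.java)
        .filter { it.trackingState == TrackingState.TRACKING && it.subsumedBy == null }
```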

— Anchor points
Once the planes in the scene are detected, you can create a reference point, called an anchor, with respect to that plane. As the name suggests, an anchor is a point on the plane to which you can anchor your virtual assets; it holds your objects in space for that session. ARKit and ARCore differ here: ARCore gives you anchor points that are fixed in space and not attached to the plane, while ARKit attaches anchors to the plane (at least this was the case with the dev preview). Most SDKs support multiple anchors in a session, and the number of anchors in a session has significant performance implications :)
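
For illustration, here are the two anchor styles side by side as a Kotlin sketch against ARCore's API, assuming the pose comes from a hit test or elsewhere; the function names are mine:

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.core.Plane
import com.google.ar.core.Pose
import com.google.ar.core.Session

// An anchor fixed at a pose in world space, independent of any plane.
fun anchorInSpace(session: Session, pose: Pose): Anchor =
    session.createAnchor(pose)

// An anchor attached to a detected plane; it follows the plane as the
// SDK refines its estimate of where that plane really is.
fun anchorOnPlane(plane: Plane, pose: Pose): Anchor =
    plane.createAnchor(pose)
```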

— Hit test
Great, so now we have a session, and there is a plane in the session. How do you know where to set the anchor? There is something called a hit test. Imagine a ray of light coming out of your phone: the point where that straight line hits the plane is your anchor point. The hit typically happens when the user taps the phone screen where they want to place an object. Most SDKs provide a HitResult-type object where you put your anchor, as in the sketch below.
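
A minimal Kotlin sketch of that flow with ARCore, assuming the tap event is forwarded from the UI; the function name is mine:

```kotlin
import android.view.MotionEvent
import com.google.ar.core.Anchor
import com.google.ar.core.Frame
import com.google.ar.core.Plane

// Turn a screen tap into an anchor: cast a ray from the tap point and keep
// the first hit that lands inside a detected plane's polygon.
fun anchorFromTap(frame: Frame, tap: MotionEvent): Anchor? {
    for (hit in frame.hitTest(tap)) {
        val trackable = hit.trackable
        if (trackable is Plane && trackable.isPoseInPolygon(hit.hitPose)) {
            return hit.createAnchor()
        }
    }
    return null
}
```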

— Light estimation
So now we have a session, we have a plane detected, and we have an object anchored to it. We want that object to gel well with the scene. For example, if you are in snow, you want your virtual objects to be well lit and bright, with appropriate shadows; if you are in the dark, you want your objects to be dimly lit. Most SDKs provide an ambient light estimate, which the rendering engine uses to render the objects.
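
In ARCore this is a per-frame value. A short Kotlin sketch, with the function name being my own:

```kotlin
import com.google.ar.core.Frame
import com.google.ar.core.LightEstimate

// Read the ambient intensity for the current frame so the renderer can
// scale object brightness to match the scene. Returns null while the
// estimate is not yet valid.
fun ambientIntensity(frame: Frame): Float? {
    val estimate = frame.lightEstimate
    return if (estimate.state == LightEstimate.State.VALID) estimate.pixelIntensity else null
}
```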

3. Tracking
Understanding tracking is critical; this is what makes AR, AR. For smartphone AR, the goal is to accurately and continuously find the position of the device with respect to its surroundings, and therefore in relation to the virtual objects placed in them.

The details of “world coordinate space” and pose are something I will keep outside the scope of this article. For those curious, here is a great explanation.

Most AR SDKs and frameworks use 6DOF (six degrees of freedom) tracking. Tracking algorithms have evolved over time and are a rather complicated topic. Some arguably call 6DOF tracking a sub-branch of SLAM.

Why is good tracking hard?
Once an object is anchored, every frame recalculates the position and orientation of your device and updates what the virtual object should look like.
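
Here is that per-frame update as a Kotlin sketch with ARCore types, assuming the renderer owns a 16-element model matrix for the object; the function name is mine:

```kotlin
import com.google.ar.core.Anchor
import com.google.ar.core.Frame
import com.google.ar.core.TrackingState

// Per-frame work for one anchored object: if tracking is healthy, refresh
// the object's model matrix from the anchor's latest pose so the renderer
// draws it where the device now believes that world point is.
fun updateAnchoredObject(frame: Frame, anchor: Anchor, modelMatrix: FloatArray) {
    if (frame.camera.trackingState != TrackingState.TRACKING) return  // tracking lost
    if (anchor.trackingState != TrackingState.TRACKING) return        // anchor not tracked
    anchor.pose.toMatrix(modelMatrix, 0)  // 4x4 column-major matrix, 16 floats
}
```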

There are many factors that come into play for tracking to be smooth and accurate:
a) The surface and lighting conditions, in which the tracking algorithm tries to find interesting points, called features.
b) The device: its CPU speed, operating system, camera focus system, and other hardware capabilities.
c) Most importantly, calibration. At a very high level, calibration is a relative map of the device's camera and IMUs (inertial measurement units). It is important for accurately figuring out the phone's position and orientation. In the case of ARCore, there are pre-built calibration files in the SDK, and these determine which devices are supported.

This post became a little longer than I originally planned. I hope it gives a 30,000-foot view of some of the technical concepts behind smartphone AR. I personally find this a big leap as an addition to the world of modalities in human-machine interaction.

I like to prioritize clarity and simplification. Sometimes, though, the effort of simplification strips away important concepts that are necessary to bring clarity.

Follow for more thoughts on this and other areas. Check out www.amolw.com
