AI-enhanced AR Experiment


An experiment in AI-enhanced AR – it automatically recognizes common objects via a neural net and attempts to clone them in situ from an online library of 3D models.

More details and examples on my Twitter feed: https://twitter.com/ben_ferns/status/961930799900540928

I’m using Unity3D’s early TensorFlow support (https://github.com/Unity-Technologies/ml-agents/), adapting the approach it uses to supply ‘observations’ (images from a game camera) so that it instead supplies images from the device’s real camera.

I then found a model pre-trained on the COCO (Common Objects in Context) dataset, which recognizes 80 object classes and returns rough bounding boxes. I used a MobileNet SSD network for performance. Finding exactly the right inputs for this was a bit tricky without documentation – a lot of the AI open-source world is still very academically focused and assumes experience with its tech stacks.
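For a sense of what that post-processing involves, here is a minimal Python sketch (not the project’s actual Unity/C# code – the function and variable names are mine). Exports like this typically return normalized `[ymin, xmin, ymax, xmax]` boxes plus per-detection scores and class ids, which need thresholding and scaling to pixel coordinates:

```python
import numpy as np

def filter_detections(boxes, scores, classes, img_w, img_h, min_score=0.5):
    """Keep detections above min_score and convert normalized
    [ymin, xmin, ymax, xmax] boxes to pixel coordinates."""
    results = []
    for box, score, cls in zip(boxes, scores, classes):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        results.append({
            "class_id": int(cls),
            "score": float(score),
            "box_px": (int(xmin * img_w), int(ymin * img_h),
                       int(xmax * img_w), int(ymax * img_h)),
        })
    return results

# Fake network output: one confident detection, one low-score noise box.
boxes = np.array([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 0.1, 0.1]])
scores = np.array([0.91, 0.12])
classes = np.array([1.0, 33.0])
detections = filter_detections(boxes, scores, classes, img_w=300, img_h=300)
```

The class id then maps through the COCO label list to a searchable name like “chair”.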

I’m running ARCore/ARKit for the AR tracking and plane detection. Most of the time spent getting this running actually went on placement and scaling (and on avoiding placing the same object multiple times, especially as the neural net occasionally switches labels). I could have skipped those problems just to make the demo video, but I wanted to see whether they were insurmountable, and how they might shape UX decisions in similar AR apps.
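Those two problems boil down to simple checks. The following is an illustrative Python sketch (names and thresholds are mine, not the project’s): suppress placement near an already-placed object, and only accept a label once it has won a majority vote over recent frames, so one-frame label flips don’t spawn duplicates:

```python
import math
from collections import Counter

PLACEMENT_RADIUS = 0.5  # metres; assumed value, tune per scene scale

def is_duplicate(candidate_pos, placed_positions, radius=PLACEMENT_RADIUS):
    """True if a candidate world position is too close to any placed object."""
    return any(math.dist(candidate_pos, p) < radius for p in placed_positions)

def stable_label(recent_labels, min_agreement=3):
    """Accept a label only after it appears min_agreement times in the
    recent window, filtering out the net's occasional label switches."""
    label, count = Counter(recent_labels).most_common(1)[0]
    return label if count >= min_agreement else None
```

In practice the window of recent labels would be refreshed every time the model returns, and placement would only fire when both checks pass.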

I made the same decision with performance – I wanted to get it close to realtime, because without the instant visual feedback the sense of magic is lost, even if it’s technically the same process. I ended up putting the TensorFlow interactions on a separate thread, running the model only once every two seconds, and parsing the Poly assets over a series of frames (that part can’t be threaded, since generating assets must happen on the Unity main thread). The TF model I use takes between 400 ms and 1000 ms to return, which I think can be improved.
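Spreading main-thread work over frames is essentially a time-budgeted generator: do steps until a per-frame budget is spent, then hand control back to the render loop. A rough Python sketch of the pattern (in Unity this would be a coroutine; the names and the 4 ms budget are my assumptions):

```python
import time

def chunked(work_items):
    """Yield once per unit of work, e.g. one mesh or material built
    from a downloaded asset."""
    for item in work_items:
        # ... build one piece of the asset here ...
        yield item

def pump(generator, budget_ms=4.0):
    """Run generator steps until budget_ms elapses; call once per frame.
    Returns (steps_done, finished)."""
    deadline = time.perf_counter() + budget_ms / 1000.0
    done = 0
    try:
        while time.perf_counter() < deadline:
            next(generator)
            done += 1
    except StopIteration:
        return done, True   # asset fully parsed this frame
    return done, False      # resume next frame
```

The budget keeps each frame under the ~16 ms needed for 60 fps even while a large asset is being assembled.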

Poly have a great API and docs: https://developers.google.com/poly/develop/ . I really love what Google are doing here, and their focus on stylized, almost ‘wireframe’ 3D rather than realistic 3D – at that quality it’s very achievable to build a huge, broadly useful library.
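For anyone curious, searching Poly by the recognized label is a single GET against the list-assets endpoint from the docs linked above. A minimal Python sketch (the API key is a placeholder, and the parameter choices here are my assumptions from the docs):

```python
from urllib.parse import urlencode

POLY_LIST = "https://poly.googleapis.com/v1/assets"

def poly_search_url(keyword, api_key="YOUR_API_KEY", fmt="GLTF2"):
    """Build a Poly list-assets request URL for a detected object label."""
    params = {"keywords": keyword, "format": fmt, "key": api_key}
    return POLY_LIST + "?" + urlencode(params)

url = poly_search_url("chair")
```

The response lists matching assets with download URLs for each format, which the app can then fetch and parse over frames as described above.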

If I have time, I’d very much like to port this whole dumb thing to WebXR (using Google Cloud AutoML as the AI component) and use the Poly web API.
