March 4, 2021
Detectron2, released by Facebook AI Research (FAIR) in 2019, gives developers an easy path to plugging custom modules into any object detection system. Today, the Mobile Vision team at Facebook Reality Labs (FRL) is expanding on Detectron2 with the introduction of Detectron2Go (D2Go), a new, state-of-the-art extension for training and deploying efficient deep learning object detection models on mobile devices and hardware. D2Go is built on top of Detectron2, PyTorch Mobile, and TorchVision. It’s the first tool of its kind, and it will allow developers to take their machine learning models from training all the way to deployment on mobile.
Use cases for object detection rely on two key factors: latency (speed) and accuracy. Consider the safety systems of an autonomous vehicle, an object recognition system identifying hazards in mining operations, or a seamless augmented reality (AR) experience for people on Instagram. In instances like these, a system needs not only to detect and identify objects accurately, but also to do so quickly and efficiently.
The challenge many vision systems face, however, is latency. It takes time for devices using server- or cloud-based models to gather data, send it to the cloud for processing, and then act on the result. But if the model can live on the edge, inside the device itself, this latency is greatly reduced.
On-device models also offer additional security and privacy benefits for end users. As with speech and natural language processing (NLP) tasks, object recognition raises privacy concerns: people worry about sensitive data (such as personal images) being sent to the cloud. But with mobile models like the ones developed with D2Go, all the processing is handled on-device.
Detectron2 is a PyTorch-based library designed for training machine learning models to perform image classification and object detection tasks. With the new D2Go extension, developers can take their Detectron2 development one step further and create FBNet models that are already optimized for mobile devices, with architectures that can efficiently perform detection and segmentation tasks. The models are also quantized, meaning they can perform the same tasks as much larger, server-based models, but more efficiently. In FAIR’s own testing, mobile models developed with D2Go showed reduced latency while maintaining accuracy comparable to their server-based counterparts. D2Go is built with interoperability with open source software in mind, giving developers the option to use PyTorch Lightning as their training framework and to leverage existing tools from the community.
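To give a sense of what quantization does, here is a minimal, self-contained sketch of the core idea: floating-point weights are mapped to 8-bit integers via a scale and a zero point, which shrinks the model and lets mobile CPUs use fast integer arithmetic. This is illustrative only; D2Go relies on PyTorch's quantization tooling rather than hand-rolled code like this.

```python
# Illustrative affine quantization: map float weights to int8 and back.
# (Sketch of the concept; not the actual PyTorch/D2Go implementation.)

def quantize(values, scale, zero_point):
    """Affine-quantize floats into the int8 range [-128, 127]."""
    q = []
    for x in values:
        q_val = round(x / scale) + zero_point
        q.append(max(-128, min(127, q_val)))  # clamp to int8 range
    return q

def dequantize(q_values, scale, zero_point):
    """Recover approximate floats from the int8 representation."""
    return [(q - zero_point) * scale for q in q_values]

weights = [0.1, -0.25, 0.8, 0.0]
scale, zero_point = 0.01, 0
q = quantize(weights, scale, zero_point)          # 8-bit integers
approx = dequantize(q, scale, zero_point)          # close to the originals
```

The small rounding error introduced by the int8 representation is why quantized models trade a little accuracy for a large reduction in size and latency.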
Combined with FBNetV3, D2Go provides efficient detection, instance segmentation, and keypoint estimation models that save compute in resource-abundant server settings and can run on-device in resource-constrained ones. D2Go is already being used in Facebook’s own development of computer vision models, specifically within FRL, where hardware-aware, real-time models are essential for providing a great user experience; Facebook’s 3D Photos feature is one such example.
As part of the open source rollout of D2Go, the FRL Mobile Vision team has released a demo app and a series of tutorials to help developers get started. The first tutorial is aimed at those newer to Detectron2 and provides a high-level overview of Detectron2 and D2Go, walking through the basics and how to create an object detector using a custom data set. The second tutorial focuses on how to write a training script to simplify your development flow and how to customize D2Go for your specific needs.
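Like Detectron2, D2Go follows a config-driven workflow: training runs are described by YAML files that extend a base architecture config. As a rough illustration only (the base config name, dataset names, and solver values below are hypothetical, not taken from the tutorials), a training config for a custom data set might look like this:

```yaml
# Hypothetical Detectron2/D2Go-style training config (illustrative values).
_BASE_: "base_model_config.yaml"   # assumed base config name
MODEL:
  ROI_HEADS:
    NUM_CLASSES: 3                 # number of classes in the custom data set
DATASETS:
  TRAIN: ("my_dataset_train",)     # hypothetical registered dataset names
  TEST: ("my_dataset_val",)
SOLVER:
  BASE_LR: 0.001
  MAX_ITER: 5000
  IMS_PER_BATCH: 8
```

The tutorials linked from the repository show the actual base configs and dataset registration steps to use.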
To get started with D2Go and to learn more about it, visit the D2Go GitHub repository.
Thanks to the following for their contributions to this project: Hang Zhang, Yanghan Wang, Xiaoliang Dai, Matthew Yu, Bichen Wu, Tao Xu, Sam Tsai, Peizhao Zhang, Francisco Massa, Jeff Tang, Yuxin Wu, Wan-Yen Lo, Ross Girshick, Kai Zhang, Luis Perez, Vasiliy Kuznetsov, Raghuraman Krishnamoorthi, Matt Uyttendaele, Christian Keller, Gaurav Aggarwal, Donny Greenberg, Vasilis Vryniotis, and Peter Vajda