Detectron2: A PyTorch-based modular object detection library

October 10, 2019

Written by Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick


Since its release in 2018, the Detectron object detection platform has become one of Facebook AI Research (FAIR)’s most widely adopted open source projects. To build on and advance this project, we are now sharing the second generation of the library, with important enhancements for both research and production use. It is available here.

Detectron2 is a ground-up rewrite of Detectron that started with maskrcnn-benchmark. The platform is now implemented in PyTorch. With a new, more modular design, Detectron2 is flexible and extensible, and able to provide fast training on single or multiple GPU servers. Detectron2 includes high-quality implementations of state-of-the-art object detection algorithms, including DensePose, panoptic feature pyramid networks, and numerous variants of the pioneering Mask R-CNN model family also developed by FAIR. Its extensible design makes it easy to implement cutting-edge research projects without having to fork the entire codebase.

We built Detectron2 to meet the research needs of Facebook AI and to provide the foundation for object detection in production use cases at Facebook. We are now using Detectron2 to rapidly design and train the next-generation pose detection models that power Smart Camera, the AI camera system in Facebook’s Portal video-calling devices. By relying on Detectron2 as the unified library for object detection across research and production use cases, we are able to rapidly move research ideas into production models that are deployed at scale.


This video shows different types of object detection tasks done with Detectron2.

We’re sharing Detectron2 because open source research platforms are critical to the rapid advances in AI made by the entire community, including researchers and practitioners in academia and industry. We hope that releasing Detectron2 will continue to accelerate progress in the area of object detection and segmentation.

Improvements in Detectron2

PyTorch: The original Detectron was implemented in Caffe2. PyTorch provides a more intuitive imperative programming model that allows researchers and practitioners to iterate more rapidly on model design and experiments. Because we’ve rewritten Detectron2 from scratch in PyTorch, users can now benefit from PyTorch’s approach to deep learning as well as the large and active community that continually improves PyTorch.

Modular, extensible design: In Detectron2, we’ve introduced a modular design that allows users to plug custom module implementations into almost any part of an object detection system. This means that many new research projects can be written in hundreds of lines of code with a clean separation between the core Detectron2 library and the novel research implementation. We continue to refine the modular, extensible design by implementing new models and discovering new ways in which we can make Detectron2 more flexible.


This graphic shows how Detectron2’s modular design allows users to take an image and easily switch to custom backbones, insert different prediction heads, and perform panoptic segmentation.
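One common way to let users plug custom modules into a library is a registry: the core code looks components up by name from a config, and a research project registers its own class without editing library code. Detectron2 has its own registries (for example, `BACKBONE_REGISTRY`), but the sketch below is a standalone, simplified illustration of the idea; `Registry`, `BACKBONES`, `build_backbone`, and both backbone classes are hypothetical names, not Detectron2’s API.

```python
from typing import Any, Callable, Dict, Type


class Registry:
    """Maps component names to classes so a config can select them by string."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._classes: Dict[str, Type[Any]] = {}

    def register(self) -> Callable[[Type[Any]], Type[Any]]:
        """Decorator that records a class under its own class name."""
        def decorator(cls: Type[Any]) -> Type[Any]:
            if cls.__name__ in self._classes:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._classes[cls.__name__] = cls
            return cls
        return decorator

    def get(self, key: str) -> Type[Any]:
        if key not in self._classes:
            raise KeyError(f"No component named {key!r} in {self.name}")
        return self._classes[key]


# Hypothetical registry owned by the core library.
BACKBONES = Registry("backbone")


@BACKBONES.register()
class StandardBackbone:
    """Stand-in for a built-in backbone; returns a named feature map."""
    def __call__(self, image: Any) -> Dict[str, Any]:
        return {"res4": image}


# A research project adds its own module without touching the core code.
@BACKBONES.register()
class MyResearchBackbone:
    def __call__(self, image: Any) -> Dict[str, Any]:
        return {"custom": image}


def build_backbone(config: Dict[str, str]) -> Any:
    """Instantiate whichever backbone the config names."""
    return BACKBONES.get(config["backbone"])()
```

For example, `build_backbone({"backbone": "MyResearchBackbone"})("img")` returns `{"custom": "img"}`. Switching models becomes a one-line config change, and the core library never has to import the research code directly.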

New models and features: Detectron2 includes all the models that were available in the original Detectron, such as Faster R-CNN, Mask R-CNN, RetinaNet, and DensePose. It also features several new models, including Cascade R-CNN, Panoptic FPN, and TensorMask, and we will continue to add more algorithms. We’ve also added features such as synchronous Batch Norm and support for new datasets like LVIS.
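Synchronous Batch Norm computes the normalization statistics over the whole batch spread across all GPUs, rather than over each GPU’s small slice, which matters when per-GPU batches are tiny (as is common in detection). The pure-Python sketch below simulates this for a single channel, with each inner list standing in for one GPU’s data and a plain sum standing in for the all-reduce; the learnable scale and shift are omitted, and the function names are hypothetical. In PyTorch itself, this kind of layer is available as `torch.nn.SyncBatchNorm`.

```python
from typing import List, Tuple


def local_stats(shard: List[float]) -> Tuple[float, float, int]:
    """Per-device sums that can be combined across devices."""
    return sum(shard), sum(x * x for x in shard), len(shard)


def sync_batch_norm(shards: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Normalize every shard with statistics pooled over the global batch.

    Each inner list plays the role of one GPU's values for a single channel.
    """
    # Stand-in for an all-reduce: add up the per-device statistics.
    sums, sq_sums, counts = zip(*(local_stats(s) for s in shards))
    count = sum(counts)
    mean = sum(sums) / count
    var = sum(sq_sums) / count - mean * mean  # biased variance: E[x^2] - E[x]^2
    scale = (var + eps) ** 0.5
    return [[(x - mean) / scale for x in shard] for shard in shards]


# Two simulated GPUs, each holding two values of one channel.
out = sync_batch_norm([[1.0, 2.0], [3.0, 4.0]])
```

Both devices use the global mean 2.5 and variance 1.25, so the first value maps to about -1.34. Normalizing each device separately would instead map both shards to roughly [-1, 1], discarding the difference between them.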

New tasks: Detectron2 supports a range of tasks related to object detection. Like the original Detectron, it supports object detection with boxes and instance segmentation masks, as well as human pose prediction. Beyond that, Detectron2 adds support for semantic segmentation and panoptic segmentation, a task that combines both semantic and instance segmentation.
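To make “combines both semantic and instance segmentation” concrete: a panoptic result gives every pixel a category and, for countable objects, an instance id. The sketch below merges the two outputs with a deliberately simple rule that is an assumption here, not necessarily Detectron2’s exact procedure: paste instance masks from most to least confident so that higher-scoring instances win overlaps, then fill the remaining pixels with “stuff” classes from the semantic map. `merge_panoptic` and its input layout are hypothetical.

```python
from typing import List, Set, Tuple

Mask = List[List[bool]]
# Each output pixel is (category_id, instance_id); instance_id 0 marks "stuff".
Panoptic = List[List[Tuple[int, int]]]
VOID = (-1, 0)


def merge_panoptic(
    semantic: List[List[int]],
    instances: List[Tuple[float, int, Mask]],  # (score, category_id, mask)
    stuff_ids: Set[int],
) -> Panoptic:
    """Combine a semantic map and scored instance masks into one labeling."""
    height, width = len(semantic), len(semantic[0])
    out: Panoptic = [[VOID] * width for _ in range(height)]

    # Paste instances from most to least confident; earlier ones win overlaps.
    ranked = sorted(instances, key=lambda inst: inst[0], reverse=True)
    for instance_id, (_score, category, mask) in enumerate(ranked, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x] and out[y][x] == VOID:
                    out[y][x] = (category, instance_id)

    # Fill the pixels no instance claimed with "stuff" classes.
    for y in range(height):
        for x in range(width):
            if out[y][x] == VOID and semantic[y][x] in stuff_ids:
                out[y][x] = (semantic[y][x], 0)
    return out
```

Pixels that belong to neither a confident instance nor a stuff class stay `VOID`, which mirrors the idea that each pixel receives exactly one label.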

Implementation quality: Rewriting Detectron2 from the ground up allowed us to revisit low-level design decisions and address several implementation issues in the original Detectron.

Speed and scalability: By moving the entire training pipeline to GPU, we were able to make Detectron2 faster than the original Detectron for a variety of standard models. Additionally, distributing training to multiple GPU servers is now easy, making it much simpler to scale training to very large data sets.
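The idea behind scaling training across GPUs and servers is synchronous data parallelism: each worker computes a gradient on its own shard of the batch, the gradients are averaged (the all-reduce step), and every worker applies the same update so the model copies stay identical. The sketch below simulates this in pure Python on a one-parameter least-squares fit; it illustrates the general technique only, not Detectron2’s training code, and `grad_on_shard` and `data_parallel_step` are hypothetical names.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def grad_on_shard(w: float, shard: Shard) -> float:
    """Gradient of the mean squared error of the model y ~ w * x on one worker's data."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w: float, shards: List[Shard], lr: float) -> float:
    """One synchronous step: local gradients, then an averaged (all-reduced) update."""
    local_grads = [grad_on_shard(w, s) for s in shards]  # run in parallel in practice
    mean_grad = sum(local_grads) / len(local_grads)  # the all-reduce (average) step
    return w - lr * mean_grad  # every worker applies this same update


shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
```

Because the shards are the same size, the averaged gradient equals the gradient over the full batch, so adding workers increases throughput without changing the optimization; here `w` converges to 2.0, the slope of the data.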

Detectron2go: Facebook AI’s computer vision engineers have implemented an additional software layer, Detectron2go, to make it easier to deploy advanced new models to production. Its features include standard training workflows with in-house data sets, network quantization, and model conversion to optimized formats for cloud and mobile deployment.
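Network quantization stores weights (and often activations) as low-precision integers instead of 32-bit floats, shrinking models and speeding up inference on mobile hardware. How Detectron2go quantizes models is not described here, so the sketch below shows only the generic idea, symmetric per-tensor int8 quantization with a single scale factor; `quantize_int8` and `dequantize` are hypothetical helpers.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # round() returns an int; clamp to guard against floating-point overshoot.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate float values."""
    return [qi * scale for qi in q]


weights = [0.0, 0.25, -1.0, 0.8]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each stored value now needs one byte instead of four, and each restored weight differs from the original by at most half a quantization step (`scale / 2`).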

Accelerating AI research and engineering for all

Progress in AI is a community effort that includes individuals, large and small labs, academia, and industry. The problems we aim to solve go far beyond what any individual or group can achieve in isolation. For this reason, we believe strongly in sharing code that enables reproducible research, rapid experimentation, and development of new ideas. By releasing Detectron2, we hope to further accelerate research in the areas of object detection, segmentation, and human pose understanding.

New research starts with understanding, reproducing, and verifying previous results in the literature. With Detectron2, we aim to provide high-quality reference implementations for many state-of-the-art algorithms in order to democratize this phase of the research process.

The library’s modular design also enables researchers to implement new projects with clean separation from standard detection library functionality. As an example, Mesh R-CNN, FAIR’s recent work on predicting per-object instance 3D meshes from 2D images, was developed in Detectron2. Detectron2’s modular design enabled the researchers to easily extend Mask R-CNN to work with complex data structures representing 3D meshes, integrate new data sets, and design novel evaluation metrics.

Detectron2 can be easily shared between research-first use cases and production-oriented use cases. Because the library is built in PyTorch, new models can be implemented rapidly and then transferred to production.

Building technology to enable the next CV breakthroughs

Our goal with Detectron2 is not only to support the wide range of cutting-edge object detection and segmentation models available today, but also to serve the ever-shifting landscape of cutting-edge research. Novel research by definition involves inventing new models that will likely break the design assumptions of existing models. This meant building software to support a nearly unspecifiable set of requirements while also making it as simple as possible to use. We expect to continually develop and refine Detectron2 in service of this objective. With the library now available to the wider ML community, we look forward to collaborating with and learning from others as we push the limits of what’s possible in computer vision systems.

We’d like to acknowledge the contributions of Xinlei Chen, Jing Huang, Vasil Khalidov, Yanghao Li, Jon Morton, Sam Pepose, Ria Verma, Yanghan Wang, Peizhao Zhang, and others who helped build Detectron2.

Written by

Yuxin Wu

Research Engineer

Alexander Kirillov

Research Scientist

Francisco Massa

Research Engineer

Wan-Yen Lo

Research Engineering Manager

Ross Girshick

Research Scientist