Reengineering Facebook AI’s deep learning platforms for interoperability

December 21, 2020

AI is used at Facebook today in scores of different ways, from providing intelligent shopping recommendations to detecting harmful content to translating text to generating automated captions. We’ve built several deep learning platforms so we can iterate quickly on new modeling ideas and then seamlessly deploy them at scale. These platforms, such as ClassyVision, Fairseq, and PyText, provide end-to-end solutions for configuring and training models without requiring code changes. Config file goes in, trained model comes out. But they also tend to rely on custom abstractions and tight coupling between components. This limits their flexibility and reusability in other projects. It can also impede the work of power users and researchers who want to find new ways to combine elements of different platforms and extend them to novel use cases.

To overcome these challenges, Facebook AI is working to reengineer our platforms to be more modular and interoperable. We will split out reusable components into standalone libraries and provide backward-compatible entry points for end-to-end training from configs. To further increase standardization across communities, we are leveraging Facebook’s open source Hydra framework to handle configs. And we will offer an integration with PyTorch Lightning, a lightweight, open source Python library, to organize components and manage the training loop. We believe this standardization will make it easier for the AI community using our libraries to extend them and to integrate them with other powerful training frameworks, such as Catalyst, fast.ai, and Ignite.

Moving closer to a single set of standardized, interchangeable building blocks across many different AI subfields will let us make faster progress, build on others’ research and engineering innovations, and find new ways to use AI.

The evolution of content understanding

Historically, people working on content understanding problems have been split across several communities, such as computer vision and natural language processing. These communities used to be quite separate, each developing its own tools tailored to its own use cases. Deep learning, however, has brought them closer together than ever before. Work that carries ideas from one field into another, like DETR, which brings the Transformer architecture from NLP to object detection, was very hard to do before deep learning.

That said, deep learning’s role in enabling closer collaboration between different communities will go only as far as our tools allow. Improving those tools to increase productivity and democratize access is a journey that we at Facebook AI have been on for years. We started building tools on top of Lua Torch and Caffe2, back when each new deep learning project required significant engineering investment. In that world, it made the most sense to build an API around a single config file that produced a trained model without requiring users to write any code. This saved a huge amount of time and also served as a democratization tool, making content modeling with deep learning much more accessible.

Things have changed considerably since then. Thanks to PyTorch, developing a machine learning project has become much faster (and much more pleasant!). Today’s tools, such as Fairseq, PyText, and ClassyVision, let you build and train advanced models that simply were not feasible a few years ago on older platforms such as DeepText, and they do so with much more readable and maintainable code!

This progress has really shown the limits of our config-first design.

First, the rapid pace of progress in AI means that more and more ideas are tried every day, and each one requires new options in the configs, sometimes to the point where the configs grow larger than the code they were meant to replace.

Other consequences of this development model are more subtle. Building a platform around a single entry point offers little incentive to make components modular and reusable. Onboarding to a new platform can be hard even for expert users, because the names and locations of common components vary widely. Different platforms also end up reinventing the same wheels, often in substantially different ways. And because each framework relies on its own config system, importing one another’s components becomes essentially impossible.

For these reasons, we think the time is right to build the next generation of our content understanding platforms, so that our tools encourage cross-pollination rather than discourage it. Note that we do not believe a megaplatform covering all use cases would be the right answer; rather, we will keep our platforms separate but have them share the same design philosophy and the same key interfaces, so that clients can mix and match components. We are starting with ClassyVision, Fairseq, and PyText, and we’ll explore additional opportunities to extend this work.

The future: Extensible libraries

Facebook AI is modularizing its platforms by separating each into two parts: a library, and multiple entry points that import and use that library. Clients who prefer to start from a configuration file can still do so. But a modular library also lets a new set of power users be much more productive, integrate our platforms with code from other libraries in their own projects, and develop interactively in notebooks far more naturally. Indeed, one source of inspiration for this effort was fast.ai, whose notebook experience has been first class from the start.
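To make the library/entry-point split concrete, here is a minimal sketch in plain Python. The names `LinearClassifier`, `build_classifier`, and `main_from_config` are hypothetical and do not come from ClassyVision, Fairseq, or PyText; the sketch only assumes that the config entry point is a thin wrapper over the same importable function that code and notebook users call directly.

```python
from dataclasses import dataclass
from typing import Any, Dict


# ---- Library: importable building blocks (hypothetical names) ----
@dataclass
class LinearClassifier:
    """Toy stand-in for a model component; a real library would wrap an nn.Module."""
    input_dim: int
    num_classes: int


def build_classifier(input_dim: int, num_classes: int) -> LinearClassifier:
    """Library-level factory: callable from scripts, notebooks, or other libraries."""
    if input_dim <= 0 or num_classes < 2:
        raise ValueError("need input_dim > 0 and num_classes >= 2")
    return LinearClassifier(input_dim=input_dim, num_classes=num_classes)


# ---- Entry point: a thin, config-driven wrapper over the same library ----
def main_from_config(cfg: Dict[str, Any]) -> LinearClassifier:
    """Config goes in, component comes out; no modeling logic lives here."""
    model_cfg = cfg["model"]
    return build_classifier(model_cfg["input_dim"], model_cfg["num_classes"])


if __name__ == "__main__":
    # Config-first users keep their workflow...
    from_config = main_from_config({"model": {"input_dim": 128, "num_classes": 10}})
    # ...while power users import the library and compose it with their own code.
    from_code = build_classifier(input_dim=128, num_classes=10)
    print(from_config == from_code)
```

Because the entry point holds no logic of its own, anything added to the library becomes available to both kinds of user at once.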

This is only half the solution. We must also make sure that components are interoperable across platforms: models originally built for one platform should be usable on others. This kind of cross-pollination between research communities requires aligning on a shared, general API and choosing to share as many components as possible, and it pays off. Ideas that have been successful in computer vision, such as pretraining, have carried over to NLP. Ideas from NLP, such as the Transformer layer, are increasingly being applied successfully to computer vision problems.
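One way to picture a shared, general API is as a small interface that components from different communities implement. The sketch below assumes a hypothetical `Encoder` protocol (not an API from any of the platforms named here) and two toy encoders, one for text and one for images; a single `LinearHead` written against the protocol works with either.

```python
from typing import Any, List, Protocol


class Encoder(Protocol):
    """Hypothetical shared interface: map one raw example to a fixed-size feature vector."""

    output_dim: int

    def encode(self, example: Any) -> List[float]:
        ...


class CharCountTextEncoder:
    """Toy NLP-side encoder: counts of the letters a-z (case-insensitive)."""

    output_dim = 26

    def encode(self, example: Any) -> List[float]:
        counts = [0.0] * self.output_dim
        for ch in str(example).lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        return counts


class RowMeanImageEncoder:
    """Toy vision-side encoder: mean intensity of each row of a grayscale image."""

    def __init__(self, height: int) -> None:
        self.output_dim = height

    def encode(self, example: Any) -> List[float]:
        rows = list(example)  # expects a list of rows, each a list of pixel values
        if len(rows) != self.output_dim:
            raise ValueError(f"expected {self.output_dim} rows, got {len(rows)}")
        return [sum(row) / len(row) for row in rows]


class LinearHead:
    """Written once against Encoder, so it works with encoders from either community."""

    def __init__(self, encoder: Encoder, weights: List[float]) -> None:
        if len(weights) != encoder.output_dim:
            raise ValueError("len(weights) must equal encoder.output_dim")
        self.encoder = encoder
        self.weights = weights

    def score(self, example: Any) -> float:
        features = self.encoder.encode(example)
        return sum(w * f for w, f in zip(self.weights, features))
```

For example, `LinearHead(CharCountTextEncoder(), [1.0] * 26).score("Abc!")` and `LinearHead(RowMeanImageEncoder(height=2), [1.0, 2.0]).score([[0, 2], [4, 4]])` reuse the same head class across modalities; only the encoder changes.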

For this reason, we have decided to factor out two components from our platforms: the training loop and the config system. We have chosen to offload these to PyTorch Lightning and Hydra, respectively. These tools work well together and combine ease of use with the ability to customize just about anything.

PyTorch Lightning

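PyTorch Lightning neatly separates the science from the engineering in a deep learning project: the research code (model, loss, and optimization logic) lives apart from the engineering code (the training loop and the machinery around it), which allows for increased standardization and automation. The recently released version 1.0 brings a stable API that the team has committed to supporting going forward.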
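The science/engineering split can be illustrated without Lightning itself. In the minimal sketch below, the hypothetical `LineFitModule` holds the model, loss, and update rule, while the hypothetical `MiniTrainer` only owns the loop. The method name `training_step` echoes Lightning's vocabulary, but this is plain Python, not the Lightning API: to stay dependency-free, the sketch folds the parameter update into `training_step`, whereas in Lightning `training_step` returns the loss and the `Trainer` steps the optimizer declared in `configure_optimizers`.

```python
from typing import List, Sequence, Tuple

Batch = Sequence[Tuple[float, float]]  # (x, y) pairs


class LineFitModule:
    """The 'science': a one-parameter model y_hat = w * x, its loss, and its update rule."""

    def __init__(self, lr: float = 0.05) -> None:
        self.w = 0.0
        self.lr = lr

    def training_step(self, batch: Batch) -> float:
        n = len(batch)
        loss = sum((self.w * x - y) ** 2 for x, y in batch) / n  # mean squared error
        grad = sum(2 * (self.w * x - y) * x for x, y in batch) / n  # d(loss)/dw
        self.w -= self.lr * grad  # simplified: plain gradient descent inside the module
        return loss


class MiniTrainer:
    """The 'engineering': epochs, batch iteration, and loss bookkeeping.

    It only calls training_step; a real trainer would also handle devices,
    logging, checkpointing, and distributed training.
    """

    def __init__(self, max_epochs: int) -> None:
        self.max_epochs = max_epochs

    def fit(self, module: LineFitModule, batches: Sequence[Batch]) -> List[float]:
        history: List[float] = []
        for _ in range(self.max_epochs):
            for batch in batches:
                history.append(module.training_step(batch))
        return history


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on y = 2x
    module = LineFitModule(lr=0.05)
    losses = MiniTrainer(max_epochs=50).fit(module, [data])
    print(round(module.w, 4), losses[0], losses[-1])
```

Swapping in a different model means writing a new module; the trainer, and everything it would eventually handle, stays untouched.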

In collaboration with the Lightning team, we are happy to announce that we will be adding PyTorch Lightning compatibility to the platforms mentioned above and adopting its abstractions, such as Callbacks, DataModules, and LightningModules. Not only will this reduce cognitive overhead when switching between platforms, but it will also encourage increased modularity. Furthermore, by building on a framework like PyTorch Lightning, our tools can more easily share common functionality, such as checkpointing, quantization, and scripting. Users will also retain the option of writing custom training loops for specific use cases.
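Callbacks are what let functionality such as checkpointing be written once and reused across platforms (Lightning itself ships a `ModelCheckpoint` callback for this). Below is a minimal, dependency-free sketch of the idea; `Callback`, `BestCheckpoint`, `ScriptedModule`, and `fit` are hypothetical names, and the hook signature is simplified rather than copied from Lightning's `Callback` class. The toy module replays a scripted loss curve so the checkpoint's behavior is easy to check.

```python
import copy
from typing import Any, Dict, List, Optional


class Callback:
    """Hypothetical hook interface; the real Lightning Callback has many more hooks."""

    def on_epoch_end(self, epoch: int, module: Any, metrics: Dict[str, float]) -> None:
        pass


class BestCheckpoint(Callback):
    """Keeps a copy of the module state from the epoch with the lowest monitored metric.

    It only assumes the module exposes `state()`, so any platform's model can use it.
    """

    def __init__(self, monitor: str = "loss") -> None:
        self.monitor = monitor
        self.best_value: Optional[float] = None
        self.best_epoch: Optional[int] = None
        self.best_state: Optional[Dict[str, Any]] = None

    def on_epoch_end(self, epoch: int, module: Any, metrics: Dict[str, float]) -> None:
        value = metrics[self.monitor]
        if self.best_value is None or value < self.best_value:
            self.best_value = value
            self.best_epoch = epoch
            # Deep copy so that further training cannot mutate the saved snapshot.
            self.best_state = copy.deepcopy(module.state())


class ScriptedModule:
    """Toy module that replays a fixed loss curve instead of really training."""

    def __init__(self, scripted_losses: List[float]) -> None:
        self.scripted_losses = scripted_losses
        self.epochs_trained = 0

    def train_one_epoch(self) -> float:
        loss = self.scripted_losses[self.epochs_trained]
        self.epochs_trained += 1
        return loss

    def state(self) -> Dict[str, Any]:
        return {"epochs_trained": self.epochs_trained}


def fit(module: ScriptedModule, max_epochs: int, callbacks: List[Callback]) -> None:
    """Minimal loop: train one epoch, then let every callback react to the metrics."""
    for epoch in range(max_epochs):
        loss = module.train_one_epoch()
        for callback in callbacks:
            callback.on_epoch_end(epoch, module, {"loss": loss})


if __name__ == "__main__":
    module = ScriptedModule([0.9, 0.5, 0.7, 0.6])
    checkpoint = BestCheckpoint(monitor="loss")
    fit(module, max_epochs=4, callbacks=[checkpoint])
    print(checkpoint.best_epoch, checkpoint.best_value, checkpoint.best_state)
```

Because `BestCheckpoint` depends only on a `state()` method and a metrics dict, the same callback could serve a vision model and a text model alike.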

Hydra

Facebook AI’s open source Hydra framework lets users compose and override configurations in a type-safe way (validated against user-provided schemas). Hydra also offers abstractions for launching to different clusters and running sweeps and hyperparameter optimization without changes to the application’s code. This greatly reduces boilerplate and lets researchers and engineers focus on the modeling work itself.
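To illustrate what schema-validated overrides and sweeps buy you, here is a stdlib-only sketch of the idea. It is not Hydra's API: Hydra implements this with structured configs (dataclasses registered in its `ConfigStore`) on top of OmegaConf, and accepts command-line overrides such as `optimizer.lr=0.01`. The functions `compose`, `apply_override`, and `expand_sweep` below are hypothetical stand-ins showing dotted-key overrides checked against a dataclass schema, plus the Cartesian-product expansion that a comma-separated sweep like `optimizer.lr=0.1,0.01` implies.

```python
import itertools
from dataclasses import dataclass, field, fields, is_dataclass, replace
from typing import Any, List


# Hypothetical schema: frozen dataclasses play the role of a typed config.
@dataclass(frozen=True)
class OptimizerConfig:
    name: str = "adam"
    lr: float = 1e-3


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 32
    max_epochs: int = 10
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)


def _coerce(raw: str, target_type: type) -> Any:
    """Convert an override string to the schema's declared type, or fail loudly."""
    if target_type is bool:
        if raw.lower() not in ("true", "false"):
            raise ValueError(f"expected a bool, got {raw!r}")
        return raw.lower() == "true"
    return target_type(raw)  # e.g. int("abc") raises ValueError


def apply_override(cfg: Any, dotted_key: str, raw_value: str) -> Any:
    """Return a copy of `cfg` with `dotted_key` set, validated against the schema."""
    head, _, rest = dotted_key.partition(".")
    schema = {f.name: f for f in fields(cfg)}
    if head not in schema:
        raise KeyError(f"unknown config key: {head!r}")
    current = getattr(cfg, head)
    if rest:
        if not is_dataclass(current):
            raise KeyError(f"{head!r} is not a nested config group")
        return replace(cfg, **{head: apply_override(current, rest, raw_value)})
    if is_dataclass(current):
        raise KeyError(f"{head!r} is a config group; override one of its fields")
    return replace(cfg, **{head: _coerce(raw_value, schema[head].type)})


def compose(overrides: List[str]) -> TrainConfig:
    """Start from schema defaults and apply 'key=value' overrides in order."""
    cfg = TrainConfig()
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value, got {item!r}")
        cfg = apply_override(cfg, key, value)
    return cfg


def expand_sweep(sweep: List[str]) -> List[List[str]]:
    """Expand 'key=a,b' entries into every combination of single-value overrides."""
    choices = []
    for item in sweep:
        key, _, values = item.partition("=")
        choices.append([f"{key}={v}" for v in values.split(",")])
    return [list(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    print(compose(["optimizer.lr=0.01", "batch_size=64"]))
    for overrides in expand_sweep(["optimizer.lr=0.1,0.01", "batch_size=32,64"]):
        print(compose(overrides))
```

A typo such as `optimzer.lr=0.1` or a value such as `batch_size=abc` fails before any training starts, which is the practical benefit of validating overrides against a schema.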

What’s next

Making our libraries work with these external components will do more than just help accelerate Facebook AI’s work. It will also act as a forcing function to spur development of more modular tools and convergence on a unified code style and organizational system for complex projects.

Other specialized software communities — web developers, for example — have spent years perfecting their frameworks and converging on a shared style and project organization across subfields. This has made it much easier for community members to use and reuse different components and tools across a large developer ecosystem.

It may take a little longer for the community to build what might be thought of as the React of machine learning: a go-to library for everyone who works in the field, making it trivially easy to carry one’s ability to build AI systems from one domain to others. But by bringing researchers and engineers from different AI subfields closer together, we can make knowledge sharing more accessible, both inside Facebook and across the field of AI.