Announcing MMF: A framework for multimodal AI models

June 11, 2020

Pythia, our open source, modular deep learning framework for vision and language multimodal research, is now called MMF, short for multimodal framework. As part of this change, we are rewriting major portions of the library to improve usability for the open source community and adding new state-of-the-art models and datasets in vision and language. MMF has starter code for several multimodal challenges, including the Hateful Memes, VQA, TextVQA, and TextCaps challenges. Learn more on the MMF website and on GitHub.

New features include performance and UX improvements, new state-of-the-art BERT-based multimodal models, new vision and language multimodal models, a pretrained model zoo, automatic downloads, and a revamped configuration system based on OmegaConf. Rewriting the library has allowed us to make it highly modular, so researchers can easily include individual MMF components in their own projects. MMF is intended to help researchers develop adaptive AI that synthesizes multiple kinds of understanding into a more context-based, multimodal understanding. This is extremely challenging for machines because, when text and an image appear together, analyzing each one separately is not enough. They must combine these different modalities and understand how the meaning changes when they are presented together.
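To give a rough sense of what a hierarchical configuration system like the one mentioned above does, here is a minimal sketch in plain Python. It does not use OmegaConf or MMF, and every key and value in it (`model`, `training.batch_size`, `fusion_model`, and so on) is hypothetical, chosen only for illustration. It shows the general pattern of layering defaults, an experiment config, and dotted command-line style overrides, which OmegaConf supports through its own merge and dotlist features.

```python
import copy


def merge_configs(base, override):
    """Recursively merge `override` into a copy of `base`; `override` wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def from_dotlist(pairs):
    """Turn ["a.b=1", "c=x"] into {"a": {"b": "1"}, "c": "x"}; values stay strings."""
    config = {}
    for pair in pairs:
        dotted_key, _, value = pair.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


# Hypothetical keys for illustration only; this is not MMF's real config schema.
defaults = {"model": "baseline", "training": {"batch_size": 32, "lr": "1e-4"}}
experiment = {"model": "fusion_model", "training": {"batch_size": 64}}
cli_overrides = from_dotlist(["training.lr=5e-5"])

# Later layers take precedence: defaults < experiment < command-line overrides.
config = merge_configs(merge_configs(defaults, experiment), cli_overrides)
print(config)
```

Each layer only states what it changes, so an experiment file stays short and the defaults remain untouched. A real configuration library adds features this sketch leaves out, such as typed values and variable interpolation.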

Earlier this month, we provided starter code and baselines through MMF for the recent Hateful Memes Challenge, a first-of-its-kind online competition hosted by DrivenData. As part of that challenge, we also shared a new dataset designed specifically to help AI researchers develop new systems to identify multimodal hate speech. In addition to this open source release, we plan to continue adding tools, tasks, datasets, and reference models. We look forward to seeing how the open source community uses and contributes to MMF.
