ML APPLICATIONS

Using AI to detect COVID-19 misinformation and exploitative content

May 12, 2020

The COVID-19 pandemic is an incredibly complex and rapidly evolving global public health emergency. Facebook is committed to preventing the spread of false and misleading information on our platforms. Misinformation about the disease can evolve as rapidly as the headlines in the news and can be hard to distinguish from legitimate reporting. The same piece of misinformation can appear in slightly different forms, such as an image modified with a few pixels cropped or augmented with a filter. These variations can be unintentional or the result of someone’s deliberate attempt to avoid detection. It is also important to avoid miscategorizing legitimate content as misinformation, because doing so could prevent people from being able to express themselves on our platforms.

AI is a crucial tool to address these challenges and prevent the spread of misinformation, because it allows us to leverage and scale the work of the independent fact-checkers who review content on our services. We work with over 60 fact-checking organizations around the world that review content in more than 50 languages. Since the pandemic began, we’ve used our current AI systems and deployed new ones to take COVID-19-related material our fact-checking partners have flagged as misinformation and then detect copies when someone tries to share them.

In addition to detecting misinformation, our AI systems are helping us with other challenges related to the pandemic. We have built new computer vision classifiers to help enforce our temporary ban of ads and commerce listings for medical face masks and other products. Because people sometimes modify their ads for these products to try to sneak them past our systems, we are also using local feature-based instance matching to find these instances of manipulated media at scale. In many cases, we can then take action proactively, before anyone has even flagged the content to us.

During the month of April, we put warning labels on about 50 million pieces of content related to COVID-19 on Facebook, based on around 7,500 articles by our independent fact-checking partners. Since March 1, we’ve removed more than 2.5 million pieces of content for the sale of masks, hand sanitizers, surface-disinfecting wipes, and COVID-19 test kits. But these are difficult challenges, and our tools are far from perfect. Furthermore, the adversarial nature of these challenges means the work will never be done. In this blog post, we are focusing on some of our work in computer vision, but addressing these problems requires an extensive toolkit of AI technologies, such as multimodal content understanding. We have much more work to do, but we are confident we can build on our efforts so far, further improve our systems, and do more to protect people from harmful content related to the pandemic.

Using AI to scale fact-checkers’ work against misinformation

These examples show near-exact copies of a piece of misinformation, the latter being a screenshot of the former.

Any person can easily tell these images are nearly identical. In fact, at a glance it might even be hard to see the differences. Computer vision systems, however, can struggle to detect these matches with certainty because while the content is identical, the pixels are not. It’s extremely important that these similarity systems be as accurate as possible, because a mistake can mean taking action on content that doesn’t actually violate our policies. The example below shows a very similar version that should not be classified as misinformation.

This image is very similar to the ones above, but its banner doesn’t contain misinformation about the virus.

When a piece of content is rated false by our independent fact-checking partners, we reduce its distribution and show warning labels with more context. (More details are available here.) As we've noted previously, these warning labels are an extremely effective tool to deal with misinformation. When people were shown labels warning that a piece of content contained misinformation, 95 percent of the time they did not go on to view that content.

SimSearchNet, a convolutional neural net–based model built specifically to detect near-exact duplicates, is now helping us do this work more effectively. Once independent fact-checkers have determined that an image contains misleading or false claims about coronavirus, SimSearchNet, as part of our end-to-end image indexing and matching system, is able to recognize near-duplicate matches so we can apply warning labels.
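To make the matching step concrete, here is a minimal sketch of threshold-based near-duplicate lookup. It is not SimSearchNet itself: it assumes each image has already been turned into an embedding vector by some model, and the names `find_near_duplicate` and `flagged_db`, the toy three-dimensional vectors, and the 0.95 threshold are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicate(
    query: Vector,
    flagged: Dict[str, Vector],
    threshold: float = 0.95,  # hypothetical value, not from the post
) -> Optional[Tuple[str, float]]:
    """Return (image_id, score) of the best match at or above threshold, else None."""
    best_id: Optional[str] = None
    best_score = -1.0
    for image_id, embedding in flagged.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = image_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None


# Toy embeddings standing in for the output of a real image model.
flagged_db = {
    "flagged_post_a": [0.9, 0.1, 0.4],
    "flagged_post_b": [0.1, 0.8, 0.2],
}
screenshot = [0.88, 0.12, 0.41]  # nearly the same content as flagged_post_a
unrelated = [0.2, 0.1, 0.9]

print(find_near_duplicate(screenshot, flagged_db))  # matches flagged_post_a
print(find_near_duplicate(unrelated, flagged_db))   # None: below the threshold
```

The threshold is where the accuracy trade-off described above lives: setting it too low risks labeling legitimate content, while setting it too high lets altered copies slip through.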

This is particularly important because for each piece of misinformation a fact-checker identifies, there may be thousands or millions of copies. Using AI to detect these matches also enables our fact-checking partners to focus on catching new instances of misinformation rather than near-identical variations of content they’ve already seen.

Life cycle of images as they get matched against a database of certified misinformation

SimSearchNet is based on a multiyear collaboration by Facebook AI researchers, engineers, and many others across the company. It builds on years of computer vision research at Facebook — in particular, on building compact representations that allow us to index and quickly search photos at scale.
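As a rough illustration of why compact representations help with search at scale, the sketch below turns an embedding into a short binary code and compares codes by Hamming distance. This is only one simple family of compact codes, chosen here for brevity; the post does not say which representation the production system uses, and `binarize`, `search`, and the 8-dimensional toy vectors are assumptions.

```python
from typing import Dict, List


def binarize(embedding: List[float]) -> int:
    """Pack the sign of each dimension into one bit of an integer code."""
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value > 0.0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def search(query_code: int, index: Dict[str, int], max_distance: int) -> List[str]:
    """Return ids whose codes are within max_distance bits, nearest first."""
    hits = [
        (hamming_distance(query_code, code), image_id)
        for image_id, code in index.items()
    ]
    return [image_id for distance, image_id in sorted(hits) if distance <= max_distance]


# Toy embeddings; a real system would use far more dimensions.
original = [0.7, -0.2, 0.5, -0.9, 0.1, 0.3, -0.4, 0.8]
slightly_altered = [0.6, -0.3, 0.5, -0.8, -0.05, 0.3, -0.4, 0.9]
different = [-0.7, 0.2, -0.5, 0.9, 0.1, -0.3, 0.4, -0.8]

index = {"original": binarize(original), "different": binarize(different)}
print(search(binarize(slightly_altered), index, max_distance=2))  # ['original']
```

Each image here costs one small integer to store, and a comparison is a single XOR plus a bit count, which is what makes checking very large collections feasible.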

It also leverages the same large-scale matching infrastructure that is used to detect other harmful content. This system runs on every image uploaded to Instagram and Facebook and checks against task-specific human-curated databases. This accounts for billions of images being checked per day, including against databases set up to detect COVID-19 misinformation.

Stopping the sale of COVID-19 products even when people try to avoid detection

Since the crisis began, we’ve worked to protect people from those trying to exploit this emergency for financial gain. To help us better detect and remove ads for products such as medical face masks, hand sanitizer, surface-disinfecting wipes, and COVID-19 testing kits, we’ve deployed a system that leverages image-level local features to find altered ads. This helps us proactively stop advertisers who try to circumvent our enforcement by slipping altered ads past our AI-based screening system.

These example images show how people adjust their images to try to avoid detection.

We maintain an object-level database extracted from COVID-19-related ads that violate our policies and then apply instance matching to check images in new ads. This local feature-based solution allows us to better detect manipulated ads with spliced objects and is more robust to common adversarial modification tactics like cropping, rotation, occlusion, and noise. A system based on these instance matching results now automatically rejects such ads.
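The idea behind local feature matching can be sketched in a few lines. The example below is a simplified illustration, not the production system: it assumes each image has already been reduced to a set of local descriptors, uses toy two-dimensional descriptors, and applies a nearest-neighbor ratio test (a widely used heuristic for filtering ambiguous matches). The function names, the 0.75 ratio, and the `min_matches` value are hypothetical.

```python
import math
from typing import List, Sequence

Descriptor = Sequence[float]


def l2_distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two local descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_matches(
    query: List[Descriptor], reference: List[Descriptor], ratio: float = 0.75
) -> int:
    """Count query descriptors with an unambiguous nearest neighbor in reference."""
    matches = 0
    for q in query:
        distances = sorted(l2_distance(q, r) for r in reference)
        # Ratio test: accept only if the best match is clearly better than the runner-up.
        if len(distances) >= 2 and distances[0] < ratio * distances[1]:
            matches += 1
    return matches


def is_instance_match(
    query: List[Descriptor], reference: List[Descriptor], min_matches: int = 3
) -> bool:
    """Flag the query if enough of its local features match the reference object."""
    return count_matches(query, reference) >= min_matches


# Descriptors of a known policy-violating object (toy values).
reference_object = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 2.0)]
# A cropped, noisy copy: three original features survive, plus one background feature.
cropped_ad = [(0.02, 0.01), (0.98, 0.03), (0.01, 0.97), (3.0, 3.0)]
unrelated_ad = [(3.0, 3.0), (4.0, 4.0), (5.0, 1.0)]

print(is_instance_match(cropped_ad, reference_object))    # True
print(is_instance_match(unrelated_ad, reference_object))  # False
```

Because the decision depends on how many individual features agree rather than on the image as a whole, losing some features to cropping or occlusion does not prevent a match. Real pipelines typically add a geometric consistency check on top of descriptor matching, which this sketch omits.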

We’ve also used instance matching for data augmentation in other downstream ads integrity systems for COVID-19 enforcement. For example, taking cropped images of face masks that we’ve detected in ads, we used instance matching to identify diverse samples of other images of medical face masks. This augmented data set was used to retrain our policy-enforcing ads-level classifier and make it more generalizable to alterations. By using examples detected by the ads-level classifier, we’re able to prevent the distribution of more than 10 times as many policy-violating ads for masks as the matching solution alone.

Quickly training vision models for Marketplace

When people sell things through Marketplace, they use images with very different backgrounds, camera angles, details, and overall quality. This can make it more difficult for vision models to recognize items than in, say, images from catalog photos taken by a professional photographer using plain backgrounds.

Over the years, we’ve leveraged various domain adaptation techniques to deploy hundreds of classification and object-detection models that perform well in these challenging real-world conditions. Lessons from these efforts led us to invest in building a platform on top of PyTorch that allows us to quickly train and deploy classifiers/detectors on demand for new classes in images and videos. This platform leverages Facebook AI’s groundbreaking work on training state-of-the-art backbones on billions of hashtagged photos. It also employs data augmentation techniques that allow us to bootstrap models with limited amounts of data while still catering to the diversity seen in Marketplace product photos.
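For readers unfamiliar with data augmentation, the following sketch shows the general idea using plain Python lists as grayscale images. The specific transforms (random crop, horizontal flip, brightness shift), their parameters, and the `augment` helper are assumptions chosen for illustration; the post does not list which augmentations the platform applies.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixels, one list of 0..255 ints per row


def horizontal_flip(image: Image) -> Image:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image: Image, height: int, width: int, rng: random.Random) -> Image:
    """Cut a height x width window at a random position."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]


def adjust_brightness(image: Image, delta: int) -> Image:
    """Shift every pixel by delta, clamped to the valid 0..255 range."""
    return [[max(0, min(255, pixel + delta)) for pixel in row] for row in image]


def augment(image: Image, copies: int, seed: int = 0) -> List[Image]:
    """Produce several randomized variants of one image (needs at least 2x2 pixels)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(copies):
        variant = random_crop(image, len(image) - 1, len(image[0]) - 1, rng)
        if rng.random() < 0.5:
            variant = horizontal_flip(variant)
        variant = adjust_brightness(variant, rng.randint(-20, 20))
        variants.append(variant)
    return variants


photo = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
for variant in augment(photo, copies=4, seed=1):
    print(variant)
```

Each labeled photo yields several plausible variations, which is how a small collection can stand in for the range of framing and lighting seen in real listings.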

After the coronavirus crisis began, we utilized this platform to train and deploy classifiers for medical face masks, hand sanitizers, and surface-disinfecting wipes. We first collected public photos of these products and then fine-tuned and augmented this data set. To increase precision, we also added thousands of “negative” images of items that a model might mistake for a face mask. After training and offline evaluation, we deployed the concept to our production inference platform and retroactively applied it to Marketplace images. These models are now running globally on new Marketplace listings.
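The effect of adding negative examples is easiest to see through the definition of precision: the share of items the model flags that truly belong to the class. The counts in the sketch below are invented purely to show the arithmetic and are not results from our systems.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged items that are actually the target class."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


# Hypothetical evaluation counts, for illustration only.
# Before adding negatives: the model often confuses look-alike items with face masks.
before = precision(true_positives=90, false_positives=30)
# After retraining with look-alike negatives: far fewer false alarms.
after = precision(true_positives=88, false_positives=8)

print(f"precision before: {before:.2f}")
print(f"precision after:  {after:.2f}")
```

Training on look-alike items teaches the model what not to flag, which shrinks the false-positive count in the denominator and so raises precision.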

We plan to continue investing in the platform and improving the above models, especially as the feedback loop presents us with more data. These signals will also be used by downstream multimodal classifiers, which aim to look holistically at the level of a Marketplace post.

Doing more to detect misinformation and harmful content

The problems of misinformation and attempts to sell prohibited items did not start with the COVID-19 pandemic. To address these and other challenges, Facebook has made long-term investments in researching visual reasoning systems and multimodal understanding, developing new self-supervised learning techniques, and building deep learning platforms that allow us to move quickly from research to production at scale.

We’ve seen how cutting-edge research from a few years ago is already helping us do better in production today. We are confident that we can take new research techniques and tools and use them to better protect people on our platforms.

Written By

Roshan Sumbaly

Engineering Manager

Mahalia Miller

Product Manager

Hardik Shah

Research Scientist

Yang Xie

Research Scientist

Sean Chang Culatana

Research Scientist

Tim Khatkevich

Software Engineer

Enming Luo

Research Scientist

Emanuel Strauss

Engineering Manager

Gergely Szilvasy

Software Engineering Manager

Manika Puri

Research Science Manager

Pratyusa Manadhata

Software Engineer

Benjamin Graham

Research Scientist

Matthijs Douze

Research Scientist

Zeki Yalniz

Research Scientist

Hervé Jegou

Director, Research Scientist