Hateful Memes Challenge and dataset for research on harmful multimodal content

May 12, 2020

  • We’ve built and are now sharing a dataset designed specifically to help AI researchers develop new systems to identify multimodal hate speech. This content combines different modalities, such as text and images, making it difficult for machines to understand.

  • The Hateful Memes dataset contains 10,000+ new multimodal examples created by Facebook AI. We licensed images from Getty Images so that researchers can use the dataset to support their work. We are also releasing the code for the trained baseline models.

  • We are also launching the Hateful Memes Challenge, a first-of-its-kind online competition hosted by DrivenData with a $100,000 total prize pool. The challenge has been accepted as part of the NeurIPS 2020 competition track.

In order for AI to become a more effective tool for detecting hate speech, it must be able to understand content the way people do: holistically. When viewing a meme, for example, we don’t think about the words and photo independently of each other; we understand the combined meaning together. This is extremely challenging for machines, however, because it means they can’t just analyze the text and the image separately. They must combine these different modalities and understand how the meaning changes when they are presented together. To catalyze research in this area, Facebook AI has created a dataset to help build systems that better understand multimodal hate speech. Today, we are releasing this Hateful Memes dataset to the broader research community and launching an associated competition, hosted by DrivenData with a $100,000 prize pool.

The challenges of harmful content affect the entire tech industry and society at large. As with our work on initiatives like the Deepfake Detection Challenge and the Reproducibility Challenge, Facebook AI believes the best solutions will come from open collaboration by experts across the AI community.

We continue to make progress in improving our AI systems to detect hate speech and other harmful content on our platforms, and we believe the Hateful Memes project will enable Facebook and others to do more to keep people safe.

Building a dataset for hateful multimodal content

The Hateful Memes dataset consists of more than 10,000 newly created examples of multimodal content. The memes were selected in such a way that strictly unimodal classifiers would struggle to classify them correctly (as illustrated in the examples below). We also designed the dataset specifically to overcome common challenges in AI research, such as the lack of examples to help machines learn to avoid false positives. It covers a wide variety of attack types as well as the groups and categories targeted. (More information on the dataset is available in this paper.)

The Hateful Memes dataset contains real hate speech, so rather than showing actual memes from the dataset, we are showing only merely mean examples here. In each of these sample memes, the text phrase and the image are innocuous when considered by themselves. The meme becomes mean only when the text phrase and image are considered together.

We started with real examples of hateful memes that were shared online. We then built the dataset to address the needs of the AI research community.

To provide researchers with a dataset with clear licensing terms, we licensed assets from Getty Images. We worked with trained third-party annotators to create new memes similar to existing ones that had been shared on social media sites. The annotators used Getty Images’ collection of stock images to replace the original visuals while still preserving the semantic content. For example, if the original meme had a photo of a desert, we picked a similar desert photo from Getty. If no suitable replacement image could be found, the meme was discarded. We’d like to thank Getty for its help and partnership.

Annotators reviewed each meme to make sure it required multimodal understanding. As with the merely mean examples shown in this blog post, considering just the text or just the image in isolation would miss the meme's true meaning.

This flowchart shows how the dataset was created.

In an effort to prevent potential misuse, we are restricting access to the dataset. Only researchers will be able to view or use the memes. Participants will need to agree to terms of use governing how they will use, store, and handle the data. There are also strict restrictions on sharing the data.

To make sure the classification decisions are actionable, we created the examples in the Hateful Memes dataset using a clear and specific definition of hate speech:

“A direct or indirect attack on people based on characteristics, including ethnicity, race, nationality, immigration status, religion, caste, sex, gender identity, sexual orientation, and disability or disease. We define attack as violent or dehumanizing (comparing people to non-human things, e.g., animals) speech, statements of inferiority, and calls for exclusion or segregation. Mocking hate crime is also considered hate speech.”

Specially trained annotators classified the meme examples using this definition, which mirrors Facebook’s Community Standards. This in turn will help ensure that systems trained with Hateful Memes data will work effectively in real-life production applications.

Our examples also cover a wide variety of protected categories (such as religion, gender, and sexual orientation) and types of attacks (such as inciting violence or portraying types of people as criminals or terrorists). The distribution in the dataset reflects the real-world distribution found in the original examples.

The dataset also contains multimodal memes that are similar to hateful examples but are actually harmless. These examples, known as benign confounders, will help researchers address potential biases in classification systems and build systems that avoid false positives.
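
To make the idea concrete, the toy records below show how a benign confounder relates to a "mean" example: swapping either the image or the text turns a harmful combination into a harmless one. The field names and contents are invented for illustration and are not taken from the actual dataset release.

```python
# Illustrative only: a "mean" meme and its two benign confounders.
# Field names mimic a simple jsonl-style record; the real dataset
# defines its own format and contains different content.
examples = [
    {"id": 1, "img": "img/empty_desert.png",
     "text": "look how many people love you", "label": 1},   # mean only in combination
    {"id": 2, "img": "img/cheering_crowd.png",
     "text": "look how many people love you", "label": 0},   # image confounder: now harmless
    {"id": 3, "img": "img/empty_desert.png",
     "text": "this hiking trail is so peaceful", "label": 0}, # text confounder: now harmless
]
# Here label 1 marks the harmful combination in this toy illustration;
# the confounders keep one modality fixed while the other is swapped.
```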

This graphic shows how different text and images can be swapped in to create benign confounder examples to train our systems.

The challenge of multimodal AI

Multimodal content, such as the memes in the Hateful Memes dataset, is difficult to classify with machine learning algorithms, as the decisions are often more subtle than in unimodal cases and require real-life context and common sense.

This graphic illustrates three types of tasks for understanding content. Only the middle category requires early fusion in order to understand the overall meaning.

To address this challenge, the research community is focused on building tools that take the different modalities present in a particular piece of content and then fuse them early in the classification process. This approach enables the system to analyze the different modalities together, like people do.
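
As a rough illustration of the difference (not the released baseline code), the PyTorch sketch below contrasts the two approaches; the module names, feature dimensions, and classifier sizes are our own assumptions, and the inputs are assumed to be pre-extracted text and image features.

```python
import torch
import torch.nn as nn


class EarlyFusionClassifier(nn.Module):
    """Fuse the text and image features before making any decision."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # hateful vs. not hateful
        )

    def forward(self, text_feats, image_feats):
        # The classifier sees both modalities at once, so it can model
        # interactions between the text and the image.
        fused = torch.cat([text_feats, image_feats], dim=-1)
        return self.classifier(fused)


class LateFusionClassifier(nn.Module):
    """Score each modality separately and only merge the scores."""

    def __init__(self, text_dim=768, image_dim=2048):
        super().__init__()
        self.text_head = nn.Linear(text_dim, 2)
        self.image_head = nn.Linear(image_dim, 2)

    def forward(self, text_feats, image_feats):
        # Each modality is classified on its own; meaning that only
        # emerges from the combination is easily missed.
        return (self.text_head(text_feats) + self.image_head(image_feats)) / 2
```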

Early-fusion systems combine the different modalities before attempting to classify the content. This enables them to better detect hateful content even if the image or text on its own is not hateful.

This approach contrasts with late-fusion systems, which are easier to build but less effective at understanding complex multimodal content.

Late-fusion multimodal systems classify content for each modality before attempting to fuse the results.

Using the Hateful Memes dataset, we established baselines for the community using several well-known model architectures. We tested two unimodal systems and several multimodal systems. For late fusion, we trained the unimodal models separately and averaged their two scores at inference time to get a prediction. For mid-fusion, we concatenated the BERT and ResNet-152 representations and fed them into a two-layer classifier (ConcatBERT). Finally, we used several BERT-derived architectures that fuse image and text understanding earlier in the process: a supervised multimodal bi-transformer model (MMBT) and state-of-the-art self-supervised models (ViLBERT and Visual BERT). In addition to sharing the benchmark data, we are releasing the code for these models here through MMF, a multimodal framework built on Facebook's PyTorch deep learning library.
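
As an illustration of what such a mid-fusion baseline looks like in practice, here is a minimal sketch that approximates the ConcatBERT idea with off-the-shelf Hugging Face Transformers and torchvision components rather than the released MMF code; the model names, feature dimensions, and classifier head below are assumptions for the sketch, not the exact configuration behind the reported baselines.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet152
from transformers import BertModel, BertTokenizer


class ConcatBertSketch(nn.Module):
    """Mid-fusion sketch: concatenate a pooled BERT text representation with
    ResNet-152 image features and classify with a small two-layer head."""

    def __init__(self, hidden_dim=512, num_classes=2):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        backbone = resnet152(weights="IMAGENET1K_V1")  # torchvision >= 0.13
        # Drop the final classification layer to expose 2048-d image features.
        self.image_encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.classifier = nn.Sequential(
            nn.Linear(768 + 2048, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, input_ids, attention_mask, images):
        text_feats = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output                                       # (batch, 768)
        image_feats = self.image_encoder(images).flatten(1)   # (batch, 2048)
        return self.classifier(torch.cat([text_feats, image_feats], dim=1))


# Minimal usage example with placeholder inputs.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = ConcatBertSketch()
tokens = tokenizer(["sample meme text"], return_tensors="pt", padding=True)
images = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed meme image
logits = model(tokens["input_ids"], tokens["attention_mask"], images)
```

The late-fusion baseline differs only in that each modality is scored separately and the scores are averaged, as in the earlier sketch; the earlier-fusion models (MMBT, ViLBERT, Visual BERT) instead feed image features into the transformer alongside the text tokens.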

Spurring progress through the Hateful Memes Challenge

We’re pleased to be able to work with DrivenData to launch the Hateful Memes Challenge, which invites participants to create models trained on the Hateful Memes dataset. Both the challenge and the dataset launch today. More information for researchers is available here, and the accompanying paper is available here. For researchers considering attending NeurIPS in December, we’re also pleased to announce that the Hateful Memes Challenge has been accepted for the NeurIPS 2020 competition track. Journalists can request access to sample content in order to review the dataset and its characteristics. (The competition is subject to official rules, available here. See the competition website for eligibility, entry dates, submission requirements, evaluation metrics, and prizing.)

Like spam and other adversarial challenges, the problem of hateful content will continue to evolve. By providing a dataset expressly made to help researchers tackle this problem, along with a common benchmark and a community competition, we are confident that the Hateful Memes Challenge will spur faster progress across the industry in dealing with hateful content and help advance multimodal machine learning more broadly.

Written By

Hamed Firooz

Applied Research Scientist

Aravind Mohan

Data Scientist