New progress in using AI to detect harmful content

November 13, 2019

Written by Mike Schroepfer

Today we issued our fourth Community Standards Enforcement Report, documenting how we are enforcing the policies that keep people safe on our platforms. We have been making consistent progress in increasing the effectiveness of our AI systems to detect harmful content. One of the big drivers of these improvements is recent advances in self-supervision, which allow us to train our systems on much larger data sets instead of being limited to hand-annotated ones. These techniques, combined with larger models, are allowing us to build systems that better understand subtle and complex nuances.

These problems are far from solved, and the adversarial nature of these challenges means the work will never be done. But these advances are helping us do more to protect our community now, and we believe they will help us continue to improve in the months and years to come.

Self-supervised language models

Recent advances in self-supervision have allowed us to break through the limits of small, hand-labeled data sets. These techniques have worked particularly well with language models, as seen in Google’s work on BERT and our recent refinements of these approaches.

This approach is especially helpful for difficult tasks like identifying hate speech because of the nuanced understanding of language that is required. The Community Standards Enforcement Report published today shows how we’ve improved our proactive detection of hate speech — increasing the amount of content we took action on from 4.1 million pieces in Q1 2019 to 7 million in Q3 2019.
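To make the idea of self-supervision concrete, here is a minimal sketch of how a masked-language-modeling example can be built from raw text, in the spirit of BERT-style pretraining. It is an illustration under simplifying assumptions, not our production pipeline: the function name, the whitespace tokenization, and the plain "[MASK]" substitution are all hypothetical choices made for this example (BERT itself also sometimes substitutes a random token or leaves the chosen token unchanged).

```python
import random

MASK = "[MASK]"  # placeholder token; the exact string is an assumption


def make_masked_example(tokens, mask_prob=0.15, seed=0):
    """Build one self-supervised training example from unlabeled tokens.

    Returns (inputs, labels). labels[i] holds the original token where a
    mask was applied and None elsewhere, so a model would be scored only
    on the positions it has to reconstruct.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hide the token from the model
            labels.append(tok)    # the hidden token becomes the target
        else:
            inputs.append(tok)
            labels.append(None)   # no loss at unmasked positions
    return inputs, labels


# No human annotation is needed: the text supplies its own labels.
sentence = "the labels come from the text itself".split()
inputs, labels = make_masked_example(sentence, mask_prob=0.3, seed=1)
print(inputs)
print(labels)
```

Because the targets are taken from the text itself, any amount of unlabeled text can be turned into training data this way.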

Self-supervision for low-resource languages

Self-supervision is useful for improving language models and for enabling us to build classifiers that understand concepts across multiple languages at once. Our XLM method uses a single shared encoder trained on a large amount of multilingual data, generating sentence embeddings that work across a range of languages. This enables us to better understand concepts like hate speech across languages and, most crucially, it means that training done in one language, such as English, can improve the quality of the classifier across other languages. We currently use AI to proactively detect hate speech in 40 languages, and we are exploring new methods to extend our automatic detection capabilities to more languages with greater accuracy.
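The cross-lingual transfer described above can be illustrated with a toy sketch. Everything in it is a stand-in: the `encode` function is a hypothetical lookup table of made-up two-dimensional vectors, not XLM, and the nearest-centroid classifier is simply the smallest classifier that shows the idea. The only assumption carried over from the text is that a shared encoder places sentences with similar meaning close together regardless of language.

```python
import math

# Made-up vectors standing in for a shared multilingual encoder's output.
# Sentences with similar meaning are placed close together by construction.
TOY_EMBEDDINGS = {
    "i hate you": (0.90, 0.10),               # English
    "have a nice day": (0.10, 0.90),          # English
    "te odio": (0.85, 0.20),                  # Spanish
    "que tengas un buen dia": (0.15, 0.80),   # Spanish
}


def encode(sentence):
    """Hypothetical shared encoder: one embedding space for all languages."""
    return TOY_EMBEDDINGS[sentence]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def train_centroids(labeled_sentences):
    """Average the embeddings of the training sentences for each label."""
    sums, counts = {}, {}
    for sentence, label in labeled_sentences:
        vec = encode(sentence)
        prev = sums.get(label, (0.0,) * len(vec))
        sums[label] = tuple(p + v for p, v in zip(prev, vec))
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}


def classify(sentence, centroids):
    """Pick the label whose centroid is most similar to the sentence."""
    vec = encode(sentence)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Train on English only, then classify Spanish with no Spanish labels.
english_training = [("i hate you", "hostile"), ("have a nice day", "benign")]
centroids = train_centroids(english_training)
print(classify("te odio", centroids))
print(classify("que tengas un buen dia", centroids))
```

The classifier never sees a labeled Spanish sentence; it works on Spanish only because the (here, hand-built) embedding space is shared across languages.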

A more holistic approach to content

When you look at a post on Facebook, you consider the picture, text, and comments as part of one unified thing: a post. Until recently, most of our classification systems looked at each part of a post separately on two dimensions: content type and violation type. One classifier would look at the photo for violations of our nudity policy, another would look for violence, and so on. A separate set of classifiers might look at the text of the post or the comments.

To get a more holistic understanding of the post, we created Whole Post Integrity Embeddings (WPIE), a pretrained universal representation of content for integrity problems. WPIE works by understanding content across modalities, violation types, and even time. Our latest version is trained on more violations, with greatly increased training data. The system improves performance across modalities by using focal loss, which prevents easy-to-classify examples from overwhelming the detector during training, along with gradient blending, which computes an optimal blend of modalities based on their overfitting behavior.
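For readers unfamiliar with focal loss, the sketch below shows the standard binary form from the research literature, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the correct class. This is a generic illustration, not the WPIE training code; the function names and the choice of gamma = 2 are assumptions for the example, and the optional class-balancing weight is omitted.

```python
import math


def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for a single example.

    p: predicted probability of the positive class (0 < p < 1)
    y: true label, 0 or 1
    gamma: focusing parameter; gamma = 0 gives ordinary cross-entropy
    """
    p_t = p if y == 1 else 1.0 - p      # probability of the true class
    modulating = (1.0 - p_t) ** gamma   # shrinks toward 0 as p_t approaches 1
    return -modulating * math.log(p_t)


def cross_entropy(p, y):
    """Ordinary binary cross-entropy, expressed as focal loss with gamma = 0."""
    return focal_loss(p, y, gamma=0.0)


# Compare an easy example (confidently correct) with a hard one.
for name, p in [("easy", 0.95), ("hard", 0.30)]:
    ce = cross_entropy(p, 1)
    fl = focal_loss(p, 1)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} kept={fl / ce:.2%}")
```

With gamma = 2, the well-classified example (p_t = 0.95) keeps only 0.05² = 0.25 percent of its cross-entropy loss, while the harder example (p_t = 0.30) keeps 0.7² = 49 percent, so hard examples account for a much larger share of the total loss.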

Just as cross-lingual pretraining can improve a classifier’s overall performance, WPIE learns across many dozens of violation types in order to develop a much deeper understanding of content.

These new tools have been deployed into production here at Facebook, where they’ve substantially improved the performance of our integrity tools. On Facebook, for example, we removed about 4.4 million pieces of drug sale content in Q3 2019, 97.6 percent of which we detected proactively. This is a substantial increase from Q1 2019, when we removed about 841,000 pieces of drug sale content, 84.4 percent of which we detected proactively. (More details are available in the full report.)
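The percentages above can be turned into approximate counts. The short calculation below uses only the figures quoted in this post; because those figures are rounded ("about 4.4 million"), the results are estimates rather than reported numbers, and the function name is illustrative.

```python
def proactive_count(removed, proactive_percent):
    """Estimate how many removed pieces were detected proactively."""
    return removed * proactive_percent / 100


# Figures quoted in the post (approximate).
q1_removed, q1_percent = 841_000, 84.4
q3_removed, q3_percent = 4_400_000, 97.6

q1_proactive = proactive_count(q1_removed, q1_percent)
q3_proactive = proactive_count(q3_removed, q3_percent)

print(f"Q1 2019: ~{q1_proactive:,.0f} pieces detected proactively")
print(f"Q3 2019: ~{q3_proactive:,.0f} pieces detected proactively")
print(f"Removals grew ~{q3_removed / q1_removed:.1f}x from Q1 to Q3")
```

On these rounded inputs, roughly 710,000 pieces were detected proactively in Q1 2019 and roughly 4.3 million in Q3 2019.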

These example posts show how our AI systems are better able to interpret multimodal content (in this case, the sale of illegal drugs).

AI isn’t our only answer to harmful content, but it is allowing us to adapt more quickly, broadly, and effectively to address these challenges as we work to provide the safest communication platform possible. With advances in self-supervision being deployed in systems such as WPIE and XLM, we’ve seen how pushing the state of the art in AI research allows us to build better production tools to detect harmful content.

As excited as I am about these advances, we are far from done.

Written by

Mike Schroepfer

Chief Technology Officer