Generating Biographies

Meta AI develops a novel dataset and model to help bring more representation to Wikipedia.

THE OPPORTUNITY

Reducing Gender Bias on Wikipedia With AI

Wikipedia is often the first website many people visit when looking for biographical information about important figures, but not everyone is represented equally on the site. Only 20 percent of biographies on Wikipedia are about women. This imbalance can have far-reaching consequences. Wikipedia has long been used as a source of data in natural language processing (NLP) tasks, and this gender bias can affect machine learning models trained using the site. On a more human level, this bias can impact young students who are looking through Wikipedia to learn about history and choose subjects for their class assignments.

With Generating Biographies, artificial intelligence can serve as a starting point for Wikipedia article editors who are working to reduce bias and bring more representation to the site. The model generates biographies for marginalized communities, focusing on women in science, women in Asia, and women in Africa.

Angela Fan

Research Scientist, Meta AI

"When I was in school, I wanted to write a biography about Eleanor Roosevelt, and I remember thinking, 'Okay, there's a lot of books, but there's mainly books about men.' That stayed with me throughout life."

Alex Sirac

Localization Editor, Meta

“Everyone can edit Wikipedia, everyone can bring their contribution. And that's the whole strength of the platform, as much as it is a way to have testimonials of how the world was, is, and will be.”

Why It Matters

Our researchers understand the importance of having accurate, high-quality information available online, whether for high school students writing class reports or for NLP models trained on Wikipedia articles. But when most Wikipedia biographies are about men, women and non-binary people are diminished despite their enormous impact throughout history. That’s why the Generating Biographies team has open-sourced an AI model that automatically creates biographical articles about important real-world public figures, along with a novel dataset for evaluating model performance on real biographies of women from historically marginalized groups. Our team hopes this will enable other researchers to push the model forward, so that AI-generated entries can serve as a starting point for human writers to publish more biographies of people from underrepresented groups.

How It Works

Generating Biographies is a model that searches websites for accurate information about a person and drafts a Wikipedia-style entry about them, complete with citations. The method starts with the biography's subject and that person's occupation, using web searches to find relevant evidence. A retrieval-augmented generation architecture, built on large-scale pre-training, then identifies the relevant information and generates the biography.
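The retrieval step can be illustrated with a minimal sketch. It is not the team's implementation: the real system uses a learned retrieval-augmented model, while the word-overlap score below is a deliberately simple stand-in. The function names, the example person, and the example URLs are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, passage):
    """Count how many query tokens also appear in the passage.

    Assumption: a crude stand-in for a learned relevance model.
    """
    query_counts = Counter(tokenize(query))
    passage_counts = Counter(tokenize(passage))
    return sum(min(n, passage_counts[tok]) for tok, n in query_counts.items())


def retrieve(subject, occupation, passages, k=2):
    """Return the k web passages that best match the subject and occupation."""
    query = f"{subject} {occupation}"
    ranked = sorted(
        passages,
        key=lambda p: overlap_score(query, p["text"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical web search results for a made-up subject.
passages = [
    {"url": "https://example.org/b", "text": "The weather in Lagos is warm."},
    {"url": "https://example.org/c", "text": "Ada Example grew up in Lagos."},
    {"url": "https://example.org/a",
     "text": "Ada Example is a chemist who studies catalysis."},
]

top = retrieve("Ada Example", "chemist", passages, k=2)
for passage in top:
    print(passage["url"], "|", passage["text"])
```

The passage that mentions both the name and the occupation ranks first, and the unrelated passage is dropped. In the described system, the selected evidence would then be passed to the generator rather than printed.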

After each generated sentence, a citation is appended based on which web search results were retrieved. This citation module builds the bibliography, which links back to the sources that were used. The process repeats section by section, with each section used to predict the next, until all of the elements that make up a biography are covered and the generated article looks like a real Wikipedia article. A novel dataset of Wikipedia biographies about women is then used to evaluate the quality of the generated text.
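The section-by-section loop and the citation step could be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the released code: `draft_with_citations`, `fake_retrieve`, and `fake_generate` are hypothetical names, the generator is a stub that simply echoes its evidence, and the numbered `[n]` markers are one possible citation format.

```python
def draft_with_citations(sections, retrieve_fn, generate_fn):
    """Draft sections in order and append a citation marker to each sentence.

    retrieve_fn(heading) returns a list of (text, source_url) evidence pairs.
    generate_fn(heading, previous, evidence) returns (sentence, source_url) pairs.
    """
    bibliography = []  # ordered source URLs; position + 1 is the citation number
    article = []       # list of (heading, cited_sentences) pairs
    previous = ""      # text of the previous section, passed to the next one
    for heading in sections:
        evidence = retrieve_fn(heading)
        cited = []
        for sentence, source_url in generate_fn(heading, previous, evidence):
            if source_url not in bibliography:
                bibliography.append(source_url)
            number = bibliography.index(source_url) + 1
            cited.append(f"{sentence} [{number}]")
        previous = " ".join(cited)
        article.append((heading, cited))
    return article, bibliography


# Hypothetical evidence for a made-up subject, keyed by section heading.
EVIDENCE = {
    "Early life": [("Ada Example grew up in Lagos.", "https://example.org/a")],
    "Career": [
        ("Ada Example studies catalysis.", "https://example.org/b"),
        ("She grew up in Lagos before moving abroad.", "https://example.org/a"),
    ],
}


def fake_retrieve(heading):
    """Stub retriever: look up canned evidence for the heading."""
    return EVIDENCE[heading]


def fake_generate(heading, previous, evidence):
    """Stub generator (assumption): echo each evidence sentence with its source."""
    return evidence


article, bibliography = draft_with_citations(
    ["Early life", "Career"], fake_retrieve, fake_generate
)
for heading, sentences in article:
    print(heading)
    for sentence in sentences:
        print("  " + sentence)
print("Sources:", bibliography)
```

A source that is cited more than once keeps its original number, so the bibliography lists each URL only once. Swapping the stubs for a real retriever and a real language model would leave the citation bookkeeping unchanged.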

DID YOU KNOW

In 2017, 41 percent of biographies nominated for deletion on Wikipedia were about women.

Generating Biographies Researchers

Angela Fan and Claire Gardent
