Towards measuring fairness in AI: the Casual Conversations dataset

April 8, 2021


This paper introduces a novel dataset to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of age, genders, apparent skin tones and ambient lighting conditions. Our dataset is composed of 3,011 subjects and contains over 45,000 videos, with an average of 15 videos per person. The videos were recorded in multiple U.S. states with a diverse set of adults in various age, gender and apparent skin tone groups. A key feature is that each subject agreed to participate for their likenesses to be used. Additionally, our age and gender annotations are provided by the subjects themselves. A group of trained annotators labeled the subjects’ apparent skin tone using the Fitzpatrick skin type scale. Moreover, annotations for videos recorded in low ambient lighting are also provided. As an application to measure robustness of predictions across certain attributes, we provide a comprehensive study on the top five winners of the DeepFake Detection Challenge (DFDC). Experimental evaluation shows that the winning models are less performant on some specific groups of people, such as subjects with darker skin tones and thus may not generalize to all people. In addition, we also evaluate the state-of-the-art apparent age and gender classification methods. Our experiments provides a through analysis on these models in terms of fair treatment of people from various backgrounds.

Download the Paper


Written by

Caner Hazirbas

Joanna Bitton

Brian Dolhansky

Jacqueline Pan

Albert GordoCristian Canton Ferrer

Related Publications

June 14, 2020

Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA | Facebook AI Research

Many visual scenes contain text that carries crucial information, and it is thus essential to understand text in images for downstream reasoning tasks. For example, a deep water label on a warning sign warns people about the danger in the…

Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

June 14, 2020

April 25, 2020

Decoupling Representation and Classifier for Long-Tailed Recognition | Facebook AI Research

The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem.…

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis

April 25, 2020

April 25, 2020

Permutation Equivariant Models for Compositional Generalization in Language | Facebook AI Research

Humans understand novel sentences by composing meanings and roles of core language components. In contrast, neural network models for natural language modeling fail when such compositional generalization is required. The main contribution of…

Jonathan Gordon, David Lopez-Paz, Marco Baroni, Diane Bouchacourt

April 25, 2020

September 15, 2019


Who Needs Words? Lexicon-Free Speech Recognition | Facebook AI Research

Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words. In this paper, we show that character-based language models (LM) can perform as well as word-based LMs for speech recognition, in word error…

Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

September 15, 2019

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.