Research

Meta AI Research explores new public fairness benchmarks for computer vision models

February 28, 2022

An overview of three fairness indicators explored by Meta AI Research for diagnosing fairness issues in computer vision systems.

To build socially responsible AI systems that work well for everyone, we need effective ways to diagnose potential fairness issues in both existing systems and the new ones that will power future technology. To move the AI research community towards more standardized fairness audits for computer vision systems, Meta AI Research is exploring new public fairness indicators to quantitatively assess three well-documented types of potential harms and biases in computer vision models. These fairness indicators complement existing approaches to responsible AI development, such as data and model documentation, and are specifically designed to adapt and evolve as research advances and new approaches are proposed.

The fairness indicators we are exploring are applicable to a broad range of computer vision systems, regardless of whether those systems are designed to predict labels or only to generate embeddings for a given image. Using these indicators, researchers across the AI community can measure and monitor key elements of fairness across their systems (and in particular, how they impact marginalized populations).

Swift progress in AI often requires tools that are available to everyone working in the field. That is why we are publishing details of this approach, along with experimental protocols, guidance, and code for automated fairness assessments using these indicators. By making our work public, we hope to foster collaboration across the entire AI community to develop a more comprehensive, ever-growing set of fairness assessments. As we and others work to advance the field, we believe this flexible approach, where data sets and metrics can be added as they are collected and developed, will move the industry closer to developing computer vision systems that work well for everyone. Researching new approaches to measuring fairness in AI will help inform our own work here at Meta so we can expand and improve the systems we have in place today.

A systematic and flexible way of quantitatively measuring fairness in CV

Computer vision technologies offer a wide range of benefits, from helping people with low vision to improving medical imaging. But important societal issues have surfaced as new computer vision systems are deployed at large scale. Motivated by these concerns, we developed three fairness indicators based on publicly available datasets for fairness evaluation. These indicators allow researchers to measure the frequency of concerning mistakes in a controlled setting and across various populations. They can be applied to any feature extractor, so other practitioners can not only test their systems but also compare their performance with others’ work.

As detailed in this paper, our indicators allow auditing for three types of fairness issues:

  • Harmful label associations, where images of people are mistakenly assigned a label that is offensive, derogatory, or reinforces stereotypes (a simple sketch of how this could be measured follows the list).

  • Differences in performance on images from different places and populations across the world.

  • Differences in a pretrained model's learned representations of specific features, in particular ones that can be used to predict social and demographic labels (for example, misgendering darker-skinned women).
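To make the first indicator concrete, here is a minimal sketch of how a harmful-label-association rate could be computed per demographic group. The harmful-label set, the group annotations, and the predict_labels callable are illustrative placeholders, not the released benchmark code.

```python
# Minimal sketch (not the released benchmark code) of the first indicator:
# how often images of people from each demographic group are assigned a
# harmful label. HARMFUL_LABELS and predict_labels are hypothetical.
from collections import defaultdict

HARMFUL_LABELS = {"ape", "gorilla", "criminal"}  # illustrative only

def harmful_association_rate(samples, predict_labels, top_k=5):
    """samples: iterable of (image, group) pairs.
    predict_labels: callable returning the model's ranked labels for an image."""
    errors, totals = defaultdict(int), defaultdict(int)
    for image, group in samples:
        predictions = set(predict_labels(image)[:top_k])
        totals[group] += 1
        if predictions & HARMFUL_LABELS:
            errors[group] += 1
    # Report the rate per group so disparities between groups are visible.
    return {group: errors[group] / totals[group] for group in totals}
```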

To illustrate how our indicators help in measuring progress towards fairer computer vision models, we performed an audit of several publicly available visual feature extractors. Our experiments study the effect of model size, data scale, and training paradigm (self-supervised vs. weakly supervised vs. supervised learning) on the fairness indicators. Our findings suggest that large self-supervised models trained on large amounts of diverse, real, and unfiltered internet data show the most promise for building fairer models, compared with systems trained on highly curated, object-centric datasets such as ImageNet.
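As one illustration of how such an audit could be run on any feature extractor, the sketch below compares pretrained backbones on the second indicator by fitting a simple nearest-neighbor classifier on embeddings and reporting accuracy per world region. The extract_features callable and the dataset splits are hypothetical placeholders; the actual evaluation protocol is described in the paper and released code.

```python
# Hedged sketch of the second indicator: compare how well a pretrained
# feature extractor recognizes everyday objects across world regions.
# extract_features and the dataset splits are placeholders, not the
# paper's actual protocol.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def region_accuracy(extract_features, train_set, test_sets_by_region):
    """train_set: (images, labels).
    test_sets_by_region: {region_name: (images, labels)}."""
    train_images, train_labels = train_set
    X_train = np.stack([extract_features(img) for img in train_images])
    clf = KNeighborsClassifier(n_neighbors=10).fit(X_train, train_labels)
    scores = {}
    for region, (images, labels) in test_sets_by_region.items():
        X_test = np.stack([extract_features(img) for img in images])
        scores[region] = clf.score(X_test, labels)
    # A large accuracy gap between regions, with the same classifier and
    # features, flags the geographic disparity this indicator targets.
    return scores
```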

Nevertheless, significant discrepancies remain for even the best-performing models today. We hope our results will help the industry move to systematically incorporate fairness considerations as a core component of computer vision development.

Collaborating to build AI systems responsibly

There is no universal definition of fairness, and whether a system is fair cannot be summarized by just a few numbers. Nevertheless, we believe that reproducible and widely applicable assessments are critical to motivate and measure progress towards building fairer models.

Our goal is to catalyze research to define clear guidelines – within standardized protocols – that can continue to be applied as new data sets and models are collected. With time, the set of available fairness assessments will become more comprehensive and standardized frameworks will facilitate systematic fairness assessment of new models.

We believe that collaboration across the entire community will enable faster progress: between researchers who develop new fairness protocols and computer vision researchers who systematically assess fairness as a foundational standard, much as they measure computational complexity or accuracy.

We invite the community to apply these benchmarks to their models and also to contribute new fairness tests or data sets. By working together we can continue to iterate on benchmarks to better spot and address fairness gaps in AI.

Read the paper: Fairness indicators for systematic assessments of visual feature extractors and get the code

We’d like to thank Adriana Romero-Soriano, Caner Hazirbas, and Levent Sagun, Research Scientists at Meta, for their contributions to this project.

Written By

Priya Goyal

Software Engineer

Nicolas Usunier

Research Scientist