An Operational Metrics Framework for ML Data

August 01, 2022


Maintainable, high-quality, rapidly built, and scalable ML datasets have been fundamental to multiple production AI applications that we have worked on. How have we built these datasets systematically? Our approach includes defining a set of operational metrics for ML data. Our framework organizes those metrics around five goals: time to launch, effect on model performance, properties of the data, data quality, and tracking dataset changes and history. In each area, we have defined more detailed metrics and created operational processes to track them. Through disciplined tracking, we have seen ML dataset improvements translate into ML performance improvements in diverse examples.
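As a rough illustration of how metrics grouped by these five goal areas might be recorded and compared across dataset versions, here is a minimal Python sketch. It is not taken from the paper: the area identifiers, the `MetricsSnapshot` class, the `diff` helper, the metric names, and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

# The five goal areas named in the abstract. The identifiers are hypothetical.
GOAL_AREAS = (
    "time_to_launch",
    "model_performance_effect",
    "data_properties",
    "data_quality",
    "dataset_changes",
)


@dataclass
class MetricsSnapshot:
    """One dated set of metric readings for one dataset version (hypothetical)."""
    dataset_version: str
    recorded_on: date
    metrics: dict = field(default_factory=dict)  # area -> {metric name: value}

    def record(self, area: str, name: str, value: float) -> None:
        # Reject readings that do not belong to one of the five goal areas.
        if area not in GOAL_AREAS:
            raise ValueError(f"unknown goal area: {area}")
        self.metrics.setdefault(area, {})[name] = value


def diff(old: MetricsSnapshot, new: MetricsSnapshot) -> dict:
    """Return new-minus-old for every metric present in both snapshots."""
    changes = {}
    for area, values in new.metrics.items():
        for name, value in values.items():
            if name in old.metrics.get(area, {}):
                changes[(area, name)] = value - old.metrics[area][name]
    return changes


# Made-up readings for two versions of a hypothetical dataset.
v1 = MetricsSnapshot("v1", date(2022, 1, 1))
v1.record("time_to_launch", "days_to_first_model", 30)
v1.record("data_properties", "num_examples", 10_000)

v2 = MetricsSnapshot("v2", date(2022, 3, 1))
v2.record("time_to_launch", "days_to_first_model", 21)
v2.record("data_properties", "num_examples", 12_500)

changes = diff(v1, v2)
```

Keeping each snapshot tied to a dataset version and a date is one simple way to make tracking disciplined: any two snapshots can be compared metric by metric, and a reading outside the five areas is rejected rather than silently stored.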



Written by

Anoop Sinha

Gunveer Gujral

Liz Jenkins

Nicolas Scheffer


ICML 2022 Workshop on DataPerf

Research Topics

Core Machine Learning
