July 06, 2022
In this position paper for the DataPerf2022 workshop at ICML2022 (paper 0891), we share our considerations for an end-to-end Data-Centric AI infrastructure vision for implementing Artificial Intelligence (AI). AI models are trained and evaluated on datasets that undergo various changes over their lifecycle (privacy, drift, errors, transformations, etc.). Data-Centric AI infrastructure helps practitioners understand and iterate on datasets for ML models. By adopting Data-Centric AI infrastructure, our customers could improve their model performance through faster, resource-efficient access to AI data. We hope to connect the scientific community with the AI data problems faced in a real production environment at exabyte scale.