April 18, 2022
Converting renewable resources into other fuels, such as hydrogen, is one scalable solution to the energy challenges posed by climate change. For this approach to be widely adopted, however, we need low-cost catalysts that drive the necessary chemical reactions at high rates. Unfortunately, finding new catalysts is a highly time- and resource-intensive process. Conventional methods, for example, allow researchers to computationally evaluate tens of thousands of chemical structures per year, yet there are billions of possible combinations of elements to test.
To address this challenge, Meta AI and Carnegie Mellon University’s (CMU) Department of Chemical Engineering have been collaborating on the Open Catalyst Project, which aims to build machine learning (ML) models that simulate chemical reactions and accelerate the discovery of low-cost catalysts. Historically, a lack of sufficient training data sets has been a roadblock for researchers developing these ML models. As part of this project, we’ve already made progress by open-sourcing OC20, the world’s largest training data set of materials for renewable energy storage.
Today, we’re announcing an entirely new data set focused on oxide catalysts for the Oxygen Evolution Reaction (OER), a critical chemical reaction used in green hydrogen fuel production via wind and solar energy. The OER data set contains ~8M data points from 40K unique simulations. We believe it’s the largest data set for oxide catalysis to date, spanning a swath of oxide materials across 52 elements. It includes interactions between the surfaces of the oxide materials and five important molecules (O, OH, H2O, OOH, and O2) involved in OER, in addition to surface interactions with CO, H, C, and N. It also explores interactions on the surface when crystal defects and multiple molecules are present. The data set and baseline models will be open-sourced in the coming months to help the global scientific community advance renewable energy technologies.
To identify promising catalysts, research scientists use quantum mechanical simulation tools like density functional theory (DFT) to predict the adsorption energies of small molecules on potential catalysts. Adsorption energy is a crucial property in determining how effective a catalyst will be. DFT uses quantum mechanics to compute the energy of, and forces on, the atoms in a given system. In a process known as a relaxation, the positions of the atoms are iteratively adjusted until the system reaches its lowest energy configuration. Each relaxation takes hundreds of hours to complete on a multicore machine.
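The relaxation loop described above can be sketched in a few lines of Python. This is a toy illustration, not DFT: it assumes a classical Lennard-Jones pair potential in reduced units as a cheap stand-in for the quantum mechanical energy, and plain steepest descent as the optimizer. The function names, step size, and force threshold are all hypothetical choices made for this sketch.

```python
import numpy as np

def lj_energy_and_forces(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and per-atom forces (toy stand-in for DFT)."""
    n_atoms = len(positions)
    energy = 0.0
    forces = np.zeros_like(positions)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            rij = positions[i] - positions[j]
            r = np.linalg.norm(rij)
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6**2 - sr6)
            # Force magnitude is -dE/dr; positive means the pair repels.
            f_mag = 24.0 * epsilon * (2.0 * sr6**2 - sr6) / r
            forces[i] += f_mag * rij / r
            forces[j] -= f_mag * rij / r
    return energy, forces

def relax(positions, step=0.01, fmax=1e-4, max_steps=10_000):
    """Move atoms along the forces until the largest force component is below fmax."""
    pos = np.array(positions, dtype=float)
    energy, forces = lj_energy_and_forces(pos)
    for _ in range(max_steps):
        if np.abs(forces).max() < fmax:
            break
        pos += step * forces  # steepest-descent update
        energy, forces = lj_energy_and_forces(pos)
    return pos, energy

# Two atoms that start too far apart relax toward the minimum-energy separation.
start = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
final_positions, final_energy = relax(start)
final_separation = np.linalg.norm(final_positions[0] - final_positions[1])
```

In this toy system each energy and force evaluation is nearly free. In a DFT relaxation, every such evaluation is an expensive quantum mechanical calculation, which is where the long run times come from.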
ML can accelerate this process: DFT simulations that currently take hours or days can be replaced with ML predictions that take a few seconds. These ML models need to be trained on a data set of DFT-computed configurations and energies so that their predictions match the DFT results. To build our new OER data set, we partnered with experts at CMU to select the materials to include, out of billions of possibilities, and to run the DFT calculations used to create baseline models.
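As a rough sketch of the idea of training a fast model on reference energies, the Python snippet below fits a linear least-squares model to energies from a toy reference function and then uses it for prediction. Everything here is an assumption made for illustration: the reference function is a Lennard-Jones pair energy rather than DFT, and the two hand-picked features are chosen so that the fit is essentially exact. It is not the project's baseline model, which must learn from full atomic structures.

```python
import numpy as np

def reference_energy(r):
    """Toy stand-in for an expensive DFT energy: Lennard-Jones pair energy."""
    r = np.asarray(r, dtype=float)
    return 4.0 * (r**-12 - r**-6)

def features(r):
    """Hand-picked features of the interatomic distance (hypothetical choice)."""
    r = np.asarray(r, dtype=float)
    return np.stack([r**-12, r**-6], axis=1)

# "Training set": distances paired with reference energies.
train_r = np.linspace(0.95, 2.5, 40)
train_e = reference_energy(train_r)

# Fit the model weights by linear least squares.
weights, *_ = np.linalg.lstsq(features(train_r), train_e, rcond=None)

def predict_energy(r):
    """Fast surrogate prediction from the fitted weights."""
    return features(r) @ weights

# Compare surrogate predictions with the reference on unseen distances.
test_r = np.array([1.0, 1.1225, 1.8])
max_error = np.max(np.abs(predict_energy(test_r) - reference_energy(test_r)))
```

The pattern is the same at any scale: pay for the expensive reference calculations once to build a training set, then reuse the trained model for cheap predictions on new inputs.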
Generating this data set required tens of millions of compute hours. Meta has committed to offsetting 100 percent of the carbon emissions from the compute resources used to generate it, as part of Meta’s Net Zero program.
Scalable solutions to renewable energy storage are essential to addressing the world’s rising energy needs while slowing climate change.
OER, together with its intermediate steps, is an important electrochemical reaction for hydrogen production. Because existing catalysts rely on scarce, expensive precious metal oxides, like ruthenium and iridium oxide, the need for efficient low-cost catalysts for OER has grown more pressing. Our new data set enables researchers to train and build ML models that can help quickly identify low-cost oxide catalysts.
Improved catalysts for OER will advance several renewable energy technologies, such as solar and wind fuel production, as well as rechargeable metal-air batteries, a type of renewable energy storage device that is useful for electric cars.
With this upcoming open source data set release, we hope to spur scientific progress by helping researchers overcome the computational limits of previous methods. More broadly, we hope it will help the computational chemistry community discover promising new materials at scale.