February 17, 2021
We are sharing a new benchmark for continual learning (CL), a means for improving upon traditional machine learning (ML) methods by training AI models to mimic the way humans learn new tasks. In CL, an AI model applies knowledge from previous tasks to solve new problems, rather than restarting its training from scratch every time. We expect that CL models will require less supervision, sidestepping one of the most significant shortcomings of modern AI systems: their reliance on large human-labeled data sets.
But developing effective CL models comes with its own challenges. When we fundamentally change how we train ML models, we must also change how we evaluate and compare them. Traditional ML models are generally measured by their accuracy after training on a given task, while CL models have to be evaluated along multiple dimensions: 1) how well they transfer knowledge between tasks, 2) how well they retain previously learned skills (avoiding the so-called catastrophic forgetting problem), and 3) how well they scale to a very large number of tasks. Until now, there have been no effective standard benchmarks for evaluating CL systems across these axes.
Our research, conducted in collaboration with Sorbonne University, proposes two components vital to the development of effective CL systems: a set of general properties that make up an effective CL learner and a standard benchmark, called CTrL, for evaluating CL systems. We’re making CTrL publicly available for the first time to help the community progress on this research problem. We have also open-sourced a new CL model, MNTDP, that offers superior performance on a variety of benchmarks, including our own.
Our primary contribution is identifying general properties — beyond avoiding catastrophic forgetting — required of an effective CL learner. For instance, the model’s memory consumption must grow slowly as tasks are added, and the learner must make more accurate predictions when it observes new examples that are related to tasks it tackled earlier.
Using these principles, we created CTrL to benchmark how efficiently CL models transfer knowledge from one task to the next and scale to a larger number of tasks. CTrL is based on the idea of comparing the performance of a model on the same task in two conditions: when the task is learned in isolation versus when it’s learned after the observation of a sequence of potentially related tasks.
By examining the model in both settings, CTrL lets us quantify how much knowledge is transferred from the sequence of observed tasks, and thus evaluate the model’s ability to transfer to similar tasks. Our benchmark proposes numerous streams of tasks to assess multiple dimensions of transfer, as well as a long sequence of tasks for assessing the ability of CL models to scale. You can access CTrL online here.
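The comparison described above can be sketched in a few lines. This is a minimal illustration, not CTrL’s exact metric: the function name and the example accuracies below are hypothetical, and the real benchmark evaluates several distinct streams rather than a single pair of numbers.

```python
def transfer(acc_after_stream, acc_in_isolation):
    """Illustrative transfer score for one target task.

    acc_after_stream: test accuracy on the task when it is learned
        after a sequence of potentially related tasks.
    acc_in_isolation: test accuracy on the same task when it is
        learned from scratch, on its own.

    A positive value suggests the model benefited from the earlier
    tasks (forward transfer); a negative value suggests interference.
    """
    return acc_after_stream - acc_in_isolation

# Hypothetical accuracies for the same target task in both conditions:
score = transfer(acc_after_stream=0.91, acc_in_isolation=0.85)
print(round(score, 2))
```

Running both conditions on the same held-out test set is what makes the two accuracies directly comparable.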
We also propose a new model that works well on both standard CL benchmarks and the newly introduced CTrL. When this model — which we call Modular Networks with Task-Driven Priors (MNTDP) — confronts a new task, it determines which previously learned modules can be applied as well as which new modules are required to solve the task. The more similar the current task is to a previous task, the more MNTDP will share modules.
MNTDP leverages a task-driven prior to limit the search space over the possible ways to combine modules. This drastically reduces computation while yielding better transfer quality. You can access MNTDP here and reproduce the results of our paper.
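To give a feel for how such a prior cuts down the search, here is a toy sketch. It is not the paper’s algorithm: the restriction shown (reuse a prefix of the most related previous model’s modules, then branch into new modules) and all names, paths, and scores are illustrative assumptions.

```python
def select_path(prev_best_path, new_modules, evaluate):
    """Toy module search under a prefix-reuse prior (illustrative only).

    Rather than scoring every combination of old and new modules, we
    only consider paths that keep the first (depth - k) modules of a
    related previous task's model and replace the last k layers with
    freshly trained modules, for k = 0..depth. `evaluate` scores a
    candidate path, e.g. by validation accuracy on the new task.
    """
    depth = len(prev_best_path)
    candidates = []
    for k in range(depth + 1):
        # Reuse the first (depth - k) old modules, append k new ones.
        path = prev_best_path[:depth - k] + new_modules[depth - k:]
        candidates.append(path)
    return max(candidates, key=evaluate)

# Hypothetical validation accuracies for each candidate path:
scores = {
    ("A1", "A2", "A3"): 0.70,  # full reuse of the old model
    ("A1", "A2", "N3"): 0.82,  # share two modules, one new layer
    ("A1", "N2", "N3"): 0.78,
    ("N1", "N2", "N3"): 0.75,  # train entirely from scratch
}
best = select_path(("A1", "A2", "A3"), ("N1", "N2", "N3"), lambda p: scores[p])
print(best)  # -> ("A1", "A2", "N3") under these illustrative scores
```

The point of the sketch is the search-space reduction: with a depth-3 network the prior leaves only 4 candidates to evaluate, instead of the exponential number of arbitrary old/new module combinations, and the winning path naturally shares more modules with tasks that are more similar.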
People take their ability to learn from past experiences for granted, but AI systems cannot yet do the same. Being able to use CL to train an AI the way you’d train a human is an exciting idea that bypasses many of the bottlenecks of classic ML approaches. ML models are typically trained in isolation, requiring a separate model for each new task, and have achieved high performance only with massive data sets in supervised settings or through billions of interactions in reinforcement learning settings.
If models can instead continuously learn from one task to the next, we can drastically reduce the amount of labeled data needed to train them. This, in turn, reduces the time and resources needed to build new models and opens the door to personalized systems better suited to meeting people’s needs — like a conversational agent that learns by talking to people, or a speech recognition system that continuously adapts to new idioms and changing circumstances.
We’ve long been committed to the principles of open science, and this research is still in its early stages, so we’re excited to work with the community on advancing CL. We are releasing the CTrL benchmark alongside the MNTDP model in the hope that they will help others reproduce our research and test their own CL systems going forward.