Responsible AI

Making federated learning faster and more scalable: A new asynchronous method

May 4, 2022

People increasingly care about how their data is used. Privacy-preserving ML techniques aim to enable the AI-powered services that people love while maintaining the privacy of their data. In the past, we shared the Opacus framework for training PyTorch models with differential privacy and the CrypTen framework for secure multiparty computation in PyTorch. Today, we are excited to share more about our efforts around scaling another privacy-enhancing technology, federated learning.

Federated learning (FL) is an important tool for preserving the privacy of people’s data when training AI models. It enables practitioners to train their models without requiring that people’s data ever leave their devices. For example, FL can train models that recognize voice commands without recording and transmitting audio clips to the cloud. However, the existing state-of-the-art approach — synchronous FL — has a significant shortcoming: Learning can deliver results only as fast as the slowest device, seriously degrading efficiency when scaling to millions of devices.

Today, we describe a novel asynchronous approach to FL that circumvents these problems. We believe we are running the first asynchronous FL system at scale, training a model on 100 million Android devices. With the ability to scale to millions of devices, FL can make significant leaps in training speed and improve efficiency. Our results show that our asynchronous FL system is five times faster than synchronous federated learning and can achieve the same result (a high-accuracy model) with eight times less communication than the synchronous approach.

While synchronous FL preserves privacy, it is substantially less efficient than training models in the cloud because it can only move as fast as the slowest device. In one synchronous round, a subset of clients (mobile devices) downloads the current model from the server and then cooperates to send the server an update that improves the model. Each round requires the cooperation of every client in it, so a round can be completed (and the next round started) only once all participating clients respond. In practice, this means that progress moves at the pace of the slowest clients — the stragglers.

To address the straggler problem in synchronous FL, it is common to start training on more clients than needed and then to drop the slowest ones from the round. For example, the round might start with 1,300 clients participating, and the server will finish the round after receiving 1,000 responses — dropping the slowest 300 clients. This leads to wasted work, however, and can also result in less accurate predictions on data from dropped clients, since their updates do not get reflected in the model.
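To make the over-selection mechanics concrete, here is a minimal Python sketch (not production code) using the illustrative 1,300/1,000 numbers above. The exponential model of per-client training time is an assumption made purely for illustration.

```python
import random

def run_sync_round(invited=1300, needed=1000, seed=0):
    """Toy synchronous round with over-selection: invite more clients than
    needed, keep the fastest `needed` responses, and drop the rest."""
    rng = random.Random(seed)
    # Hypothetical per-client training times (seconds); real device speeds vary widely.
    finish_times = sorted(rng.expovariate(1 / 60.0) for _ in range(invited))
    round_duration = finish_times[needed - 1]  # round ends when the 1,000th response arrives
    dropped = invited - needed                 # the slowest 300 clients' work is wasted
    return round_duration, dropped

duration, dropped = run_sync_round()
print(f"Round took {duration:.0f}s; work from {dropped} clients was discarded")
```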

Leaving no stragglers behind with an asynchronous approach

In asynchronous FL, there is no longer a fixed group of devices participating in each round. Instead, whenever a device is available for training, it downloads the current version of the model from the server, computes an update, and transmits it back to the server once it has finished. The server collects updates from individual clients until it has received the requisite number, and then it adjusts the model based on the combined updates. This way, responses are still aggregated before being revealed to the server, so privacy is preserved.
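As a rough illustration of the server-side flow just described, here is a minimal Python/PyTorch sketch that buffers incoming updates and applies them once enough have arrived. The class name, buffer size, and server learning rate are illustrative assumptions, not our production implementation.

```python
import torch

class AsyncFLServer:
    """Sketch of buffered asynchronous aggregation: client updates are
    accumulated as they arrive and applied once `buffer_size` are in."""

    def __init__(self, model_params, buffer_size=10, server_lr=1.0):
        self.params = model_params          # current global model (a flat tensor here)
        self.version = 0                    # bumped every time the global model changes
        self.buffer_size = buffer_size
        self.server_lr = server_lr
        self._accum = torch.zeros_like(model_params)
        self._count = 0

    def get_model(self):
        # A client downloads the current model and remembers its version.
        return self.params.clone(), self.version

    def receive_update(self, delta, client_version):
        # Updates arrive one at a time, whenever a client finishes.
        self._accum += delta
        self._count += 1
        if self._count >= self.buffer_size:
            # Apply the averaged buffered update, then start a new buffer.
            self.params += self.server_lr * self._accum / self._count
            self.version += 1
            self._accum.zero_()
            self._count = 0
```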

Another difference from synchronous FL is that there is no predefined notion of rounds: slower clients can still send their updates later, but those updates may be based on a stale version of the model. Our theoretical analysis takes the impact of staleness into account and suggests strategies to reduce its effects, such as down-weighting stale updates and bounding the maximum staleness allowed.
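For example, one simple down-weighting rule (a hypothetical choice shown only for illustration; the exact function and constants in our system may differ) shrinks each update's weight polynomially with its staleness and ignores updates past a staleness bound. In the server sketch above, `receive_update` could scale `delta` by `staleness_weight(self.version - client_version)` before accumulating it.

```python
def staleness_weight(staleness, exponent=0.5, max_staleness=100):
    """Down-weight stale updates; ignore anything older than `max_staleness`
    model versions. The polynomial form and constants are illustrative."""
    if staleness > max_staleness:
        return 0.0
    return 1.0 / (1.0 + staleness) ** exponent
```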

In building this new asynchronous method, we knew we had to use secure aggregation, a crucial privacy-enhancing technology commonly used in FL systems. Secure aggregation uses cryptographic protocols to ensure that the server and all participating clients only ever see a response aggregated over a group of devices, never an update from any individual device.
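To give a flavor of what "aggregated before being revealed" means, here is a toy sketch of the classic pairwise-masking idea often used for secure aggregation. This is not our TEE-based asynchronous protocol (described next); it only shows how individual updates can be hidden while their sum stays intact, and it omits the cryptography real protocols use to derive and protect the masks.

```python
import random

def mask_updates(raw_updates, seed=0):
    """Each pair of clients shares a random mask; one adds it, the other
    subtracts it, so the masks cancel when the server sums all contributions."""
    rng = random.Random(seed)
    n = len(raw_updates)
    pair_masks = {(i, j): rng.uniform(-100.0, 100.0)
                  for i in range(n) for j in range(i + 1, n)}
    masked = []
    for i, x in enumerate(raw_updates):
        y = x
        y += sum(pair_masks[(i, j)] for j in range(i + 1, n))  # masks shared with later clients
        y -= sum(pair_masks[(j, i)] for j in range(i))          # masks shared with earlier clients
        masked.append(y)
    return masked

raw = [0.2, -0.5, 1.1, 0.7]
masked = mask_updates(raw)
# Individual masked values look random, but the totals agree (up to float rounding).
print(sum(raw), sum(masked))
```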

Previously, there was no known way to implement secure aggregation when performing asynchronous operations, because all parties participating in secure aggregation needed to be determined in advance. Our approach introduces a novel asynchronous secure aggregation procedure leveraging trusted execution environments (TEEs) — specialized secure hardware running in the cloud. Our implementation provides security guarantees while limiting the amount of data that needs to pass through the TEE. This enables rapid processing of updates and prevents the TEE from becoming a bottleneck.

Speeding toward a more private future

This advancement yields several improvements that will benefit us as we move into the future and help build the metaverse. With improved speed and efficiency, asynchronous FL will enable engineers to iterate faster on model development and evaluation. This speedup also makes it possible to deploy models trained using FL on fresher data, which will significantly benefit features and applications that rely on recent trends.

There’s also a potential environmental benefit. The reduction in communication overhead makes it possible to train models that make high-quality predictions using data from fewer individuals, reducing the carbon footprint of FL training.

Finally, by incorporating stale updates from slower devices, we show that models trained using asynchronous FL can be more accurate. We found that our method’s predictions on data from slower devices were 47 percent more accurate than those of models trained with synchronous FL. These benefits show that asynchronous FL can help us scale and improve user experience while still protecting the privacy of people’s data.

Learn more about our approach to asynchronous FL by downloading our simulation framework and reading our papers.

Federated Learning with Buffered Asynchronous Aggregation

Papaya: Practical, Private, and Scalable Federated Learning

Written By

John Nguyen

Software Engineer

Dzmitry Huba

Software Engineer

Kshitiz Malik

Software Engineer

Mike Rabbat

Research Scientist