Combining online and offline tests to improve News Feed ranking

April 03, 2019

Written by Ben Letham and Eytan Bakshy


What the research is:

A/B testing is an important part of the product improvement cycle for machine learning (ML) technologies. But applying advanced techniques such as Bayesian optimization to optimize these systems can be challenging due to resource limitations. We propose a new multitask Bayesian optimization approach that combines observations from online A/B tests with a simple offline simulator. This model allows for jointly optimizing up to 20 parameters with as few as 40 total online A/B tests. We use the model to improve our News Feed ranking by identifying new system configurations to be tested online. We also provide an empirical analysis of its generalization behavior.

How it works:

The Facebook News Feed, like many recommender systems used at Facebook and Instagram, ranks content using a variety of signals, such as predictions from ML models. These signals are combined by a collection of rules that together form a configuration policy. Our ranking systems contain predictive models for the outcomes of interest, and their predictions serve as signals. Because of the system's complex dynamics, policy search to improve the system can be done only online, through A/B tests.

Offline simulation is a higher-throughput alternative to online A/B tests for evaluating changes to the ranking configuration. A naive approach is to replay logged sessions offline through a simulator that applies the changed configuration, and then use the existing models to predict the online outcomes on the resulting ranked list. This system is easy to implement because it relies on existing predictive models, but it fails to capture many important behavioral dynamics. Ultimately, it cannot be used as a replacement for online experiments.
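
To make the naive replay idea concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in rather than our production simulator: the function name `simulate_config`, the representation of a configuration as a weight vector over model predictions, and the per-slot `exposure` probabilities are all assumptions made for illustration.

```python
import numpy as np

def simulate_config(sessions, weights, exposure):
    """Replay logged sessions under a candidate configuration (hypothetical).

    sessions: list of arrays, each (n_items, n_signals) of model predictions.
    weights:  (n_signals,) candidate weights; stands in for the configuration.
    exposure: (k,) assumed probability that the item in each slot is seen.
    Returns the mean per-session predicted total for each signal.
    """
    totals = []
    for preds in sessions:
        scores = preds @ weights                      # combine signals into one score
        order = np.argsort(-scores, kind="stable")[: len(exposure)]  # re-rank, keep top k
        seen = exposure[: len(order), None]           # discount lower slots
        totals.append((preds[order] * seen).sum(axis=0))
    return np.mean(totals, axis=0)

# One logged session with three items and two predicted signals per item.
session = np.array([[0.9, 0.1],
                    [0.2, 0.8],
                    [0.5, 0.5]])
exposure = np.array([1.0, 0.5])

favor_first = simulate_config([session], np.array([1.0, 0.0]), exposure)
favor_second = simulate_config([session], np.array([0.0, 1.0]), exposure)
```

Note that the predictions in `sessions` never change when the configuration changes. That is exactly the limitation described above: a replay like this cannot reflect how people's behavior shifts in response to a different ranking.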

Instead, we use a multitask Gaussian process (MTGP) to combine a small number of online tests and a large number of (miscalibrated) offline tests into a single model. The MTGP effectively learns to correct the simulator error, and borrows strength from the offline tests to make accurate online predictions. We then use the MTGP in combination with Bayesian optimization to identify better policies to be tested online.
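
The following toy example sketches how a multitask model can borrow strength from a biased simulator. It is not the implementation from the paper. It assumes a one-dimensional configuration, an intrinsic coregionalization kernel with a hand-picked task covariance `B` and fixed hyperparameters, made-up `online_truth` and `offline_sim` functions, and plain expected improvement as the acquisition function. In practice the hyperparameters would be fit to data.

```python
import numpy as np
from math import erf

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between two 1-D arrays of configurations."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def mtgp_predict(x, task, y, x_new, task_new, B, noise=1e-4):
    """Posterior mean and variance of a zero-mean multitask GP.

    Covariance between observations i and j is B[task_i, task_j] * rbf(x_i, x_j).
    """
    K = B[np.ix_(task, task)] * rbf(x, x) + noise * np.eye(len(x))
    K_new = B[np.ix_(task_new, task)] * rbf(x_new, x)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_new.T)
    mean = K_new @ alpha
    var = B[task_new, task_new] - np.sum(v ** 2, axis=0)  # rbf(x, x) == 1
    return mean, var

def expected_improvement(mean, var, best):
    """Standard expected improvement for maximization (an assumed acquisition)."""
    sigma = np.sqrt(np.maximum(var, 1e-12))
    z = (mean - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (mean - best) * cdf + sigma * pdf

# Hypothetical ground truth and a miscalibrated (scaled and shifted) simulator.
online_truth = lambda x: np.sin(6.0 * x)
offline_sim = lambda x: 0.8 * np.sin(6.0 * x) + 0.1

ONLINE, OFFLINE = 0, 1
x_on = np.array([0.1, 0.5, 0.9])          # few expensive online tests
x_off = np.linspace(0.0, 1.0, 30)         # many cheap offline tests
x = np.concatenate([x_on, x_off])
task = np.array([ONLINE] * len(x_on) + [OFFLINE] * len(x_off))
y = np.concatenate([online_truth(x_on), offline_sim(x_off)])

B = np.array([[1.0, 0.9],                 # assumed task covariance:
              [0.9, 1.0]])                # tasks are strongly correlated

grid = np.linspace(0.0, 1.0, 50)
grid_task = np.full(len(grid), ONLINE)    # always predict the online outcome

mean_mt, var_mt = mtgp_predict(x, task, y, grid, grid_task, B)
mean_on, var_on = mtgp_predict(x_on, task[:3], y[:3], grid, grid_task, B)

rmse_mt = np.sqrt(np.mean((mean_mt - online_truth(grid)) ** 2))
rmse_on = np.sqrt(np.mean((mean_on - online_truth(grid)) ** 2))

# Pick the next configuration to test online.
ei = expected_improvement(mean_mt, var_mt, best=y[:3].max())
next_x = grid[np.argmax(ei)]
```

In this toy setting, `rmse_mt` comes out lower than `rmse_on`, and the multitask posterior variance is never larger than the online-only variance. The offline observations pin down the shape of the response, and the three online observations are enough to correct the simulator's scale and offset.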

Why it matters:

Reducing the number of required A/B tests enables engineers and researchers to use Bayesian optimization in settings where it would otherwise not be feasible. The ability to augment online experiments with simple, miscalibrated offline simulations allows us to accelerate improvements to the system while increasing the number of configurations with positive outcomes that are tested online. Read our previous work on using Bayesian optimization to tune online systems through A/B tests, and on how we use this approach for value model tuning in Instagram ranking systems, among others.

Read the full paper:

Bayesian Optimization for Policy Search via Online-Offline Experimentation