May 01, 2019

Written by Eytan Bakshy, Max Balandat, Kostya Kashin

How can researchers and engineers explore large configuration spaces that have complex trade-offs when it may take hours or days to evaluate any given configuration? This challenge frequently arises across many domains, including tuning hyperparameters for machine learning (ML) models, finding optimal product settings through A/B testing, and designing next-generation hardware.

Today we are open-sourcing two tools, Ax and BoTorch, that enable anyone to solve challenging exploration problems in both research and production — without the need for large quantities of data.

Ax is an accessible, general-purpose platform for understanding, managing, deploying, and automating adaptive experiments.

BoTorch, built on PyTorch, is a flexible, modern library for Bayesian optimization, a probabilistic method for data-efficient global optimization.

These tools, which have been deployed at scale here at Facebook, are part of our ongoing work in what we have termed “adaptive experimentation,” in which machine learning algorithms, with human guidance, sequentially determine what configurations to test next in order to achieve some set of goals. These methods work by modeling the relationship between configurations and outcomes from limited, potentially noisy experimental data, and then applying principled exploration strategies, such as bandit optimization and Bayesian optimization, to decide what to test next.
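Which bandit algorithm Ax uses is not specified here. As a minimal sketch of the bandit side of this idea, the code below runs Beta-Bernoulli Thompson sampling, one common bandit strategy, over a few arms (for example, variants of a survey prompt). The success rates passed in are made-up values used only to simulate outcomes, and the function name is illustrative. Each round it draws a plausible rate for every arm from that arm's Beta posterior, plays the arm with the highest draw, and updates that arm's counts.

```python
import random


def thompson_sampling(true_rates, num_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed set of arms.

    true_rates: hypothetical success probabilities, used only to simulate outcomes.
    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    num_arms = len(true_rates)
    successes = [0] * num_arms
    failures = [0] * num_arms
    pulls = [0] * num_arms
    for _ in range(num_rounds):
        # Sample one plausible success rate per arm from its Beta(1 + s, 1 + f) posterior.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(num_arms)]
        # Play the arm whose sampled rate is highest (exploration comes from posterior noise).
        arm = max(range(num_arms), key=lambda i: draws[i])
        pulls[arm] += 1
        # Simulate a noisy binary outcome and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls
```

For example, `thompson_sampling([0.05, 0.10, 0.20], num_rounds=2000)` should send most plays to the third arm while the weaker arms still receive occasional exploratory plays.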

At Facebook, adaptive experimentation is used to tackle a broad range of problems, including:

Increasing the efficiency of back-end infrastructure, such as just-in-time compilers, memory allocation, and data retrieval systems.

Tuning ranking models, such as those used by News Feed and Instagram, to improve user experience.

Optimizing algorithms for video playback, Facebook Live, and media uploads to deliver higher-quality, smoother video streaming.

Improving response rates on prompts to take surveys or raise awareness of products, such as the blood donations feature on Facebook.

Solving inverse problems in optics for the design of AR and VR hardware with Facebook Reality Labs.

Automating hyperparameter search for Facebook’s FBLearner machine learning platform to achieve high model accuracy with fewer computing resources.

Learning robust robot locomotion policies in simulated and real-world environments.

BoTorch advances the state of the art in Bayesian optimization research by leveraging the features of PyTorch, including auto-differentiation, massively parallel computation on modern hardware, and integration with deep learning modules. BoTorch provides a platform on which researchers can build, and it unlocks new areas of research for tackling complex optimization problems. Ax and BoTorch leverage probabilistic models that make efficient use of data and are able to meaningfully quantify the costs and benefits of exploring new regions of the problem space. In such data-limited settings, probabilistic models can offer significant benefits over standard deep learning methods such as neural networks, which often require large amounts of data to make accurate predictions and don’t provide good estimates of uncertainty.

We hope that by lowering the barrier to entry for adaptive experimentation, Ax will empower developers and researchers to explore more configurations in a principled and resource-efficient way. We also hope BoTorch will be a catalyst for research in this area by providing a powerful, versatile platform for Bayesian optimization research that integrates closely with popular deep learning libraries.

In this blog post, we’ll introduce both projects in detail and then provide a concrete example with code snippets to illustrate how simple it is to use our framework to find optimal configurations.

The goal of Bayesian optimization is to find an optimal configuration of a system with a limited budget of experimental trials. These methods employ a probabilistic surrogate model to make predictions about possible outcomes of unobserved configurations. To search for optimal configurations, we define an acquisition function that uses the surrogate model to assign each configuration a utility. Configurations with the highest utility are tested on the system, and the process repeats. The performance of a Bayesian optimization algorithm is therefore determined by three components: the surrogate model, the acquisition function, and the methods that numerically optimize the acquisition function.
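The three components can be seen in a deliberately small, self-contained sketch, which is not how BoTorch implements them: a NumPy Gaussian process with a fixed RBF kernel (lengthscale 0.2, chosen arbitrarily rather than fitted) as the surrogate model, closed-form expected improvement as the acquisition function, and exhaustive search over a 201-point grid on [0, 1] as the acquisition optimizer. The objective is a hypothetical one-dimensional test function, and all function names are illustrative.

```python
import math

import numpy as np


def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    # Squared-exponential covariance between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6, variance=1.0):
    # Exact GP posterior mean and standard deviation with a zero prior mean.
    K = rbf_kernel(x_train, x_train, variance=variance) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_train, x_test, variance=variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(variance - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best_f):
    # Closed-form EI (maximization) under a Gaussian predictive distribution.
    z = (mean - best_f) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best_f) * cdf + std * pdf


def bayes_opt(objective, num_init=3, num_iters=15, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)  # grid search stands in for the acquisition optimizer
    x = rng.uniform(0.0, 1.0, size=num_init)  # initial random design
    y = np.array([objective(v) for v in x])
    for _ in range(num_iters):
        # Standardize outcomes so the zero-mean, unit-variance GP prior is reasonable.
        scale = y.std() if y.std() > 0 else 1.0
        y_norm = (y - y.mean()) / scale
        mean, std = gp_posterior(x, y_norm, candidates)  # surrogate model
        ei = expected_improvement(mean, std, y_norm.max())  # acquisition function
        x_next = candidates[np.argmax(ei)]  # configuration with the highest utility
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))  # run the expensive "experiment"
    best = int(np.argmax(y))
    return x[best], y[best]
```

For example, `bayes_opt(lambda v: -(v - 0.7) ** 2)` should return a point close to 0.7 after a handful of evaluations. Practical systems fit the kernel hyperparameters to the data and optimize the acquisition function numerically rather than over a fixed grid.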

Facebook has previously used Bayesian optimization for simple hyperparameter optimization tasks, but we found existing tools were insufficient to meet our growing needs. So we developed new methods that would support optimization of multiple noisy objectives, scale to highly parallel test environments, leverage low-fidelity approximations, and optimize over high-dimensional parameter spaces. While a number of Bayesian optimization packages existed, they were difficult to extend or customize, and none supported all the features necessary to tackle the diversity of use cases we encounter at Facebook.

To address these challenges, we harnessed the computational capabilities of PyTorch and rethought how we implement models and optimization routines. The result of that work is BoTorch, which provides a modular, easily extensible interface for composing Bayesian optimization primitives, including probabilistic surrogate models, acquisition functions, and optimizers. It also offers support for:

Auto-differentiation, highly parallelized computations on modern hardware (including GPUs), and seamless integration with deep learning modules via PyTorch.

State-of-the-art probabilistic modeling in GPyTorch, including support for multitask Gaussian processes (GPs), scalable GPs, deep kernel learning, deep GPs, and approximate inference.

Monte Carlo-based acquisition functions via the reparameterization trick, which makes it straightforward to implement new ideas without having to impose restrictive assumptions about the underlying model.

In our work, we have found that BoTorch substantially improves developer efficiency for Bayesian optimization research. It opens the door for novel methods that do not admit analytic solutions, including batch acquisition functions and proper handling of rich multitask models with multiple correlated outcomes. BoTorch’s modular design makes it possible for researchers to swap out or rearrange individual components in order to customize all aspects of their algorithm, thereby empowering them to do state-of-the-art research on modern Bayesian optimization methods.

Ax provides easy-to-use APIs to interface with BoTorch, along with the management necessary for production-ready services and reproducible research. This allows developers to focus on the applied problems, such as exploring configurations and understanding trade-offs between objectives. Similarly, it allows researchers to spend more time focusing on the building blocks of Bayesian optimization. At Facebook, Ax has been broadly applied by engineers who do not have extensive experience with machine learning, as well as by AI researchers.

The figure below illustrates how Ax and BoTorch are used within the optimization ecosystem. At Facebook, Ax interfaces with our major A/B testing and machine learning platforms, as well as simulators and other types of backend systems, requiring minimal user involvement for deploying configurations and gathering results.

Ax enables developers to create custom optimization applications, or optimize on an ad hoc basis from a Jupyter notebook. New algorithms can be implemented using the BoTorch library or other libraries. Ax provides a framework for dispatching configurations to, and querying data from, external systems used in evaluation of configurations.

Ax lowers the barriers to adaptive experimentation for developers and researchers alike through the following core features:

Framework-agnostic interface for implementing new adaptive experimentation algorithms. While Ax makes heavy use of BoTorch for its optimization algorithms, generic NumPy and PyTorch interfaces are provided so that researchers and developers can plug in methods implemented in any framework.

Customizable, automated optimization routines. Ax selects the appropriate optimization strategy — choosing from Bayesian optimization, bandit optimization, and other techniques — according to features of the experiment. These default routines can be easily customized by users to meet the needs of their specific applications.

Tools for system understanding. Interactive visualizations that allow users to view the surrogate model, perform diagnostics, and understand trade-offs between different outcomes.

Human-in-the-loop optimization. In addition to supporting multiple objectives and advancing system understanding, Ax's underlying data model enables experimenters to safely evolve their search space and goals as new data is collected.

Ability to create custom optimization services. Multiple APIs allow using Ax either as a framework that controls deployment and data collection, or as a lightweight library that can be called via a remote service.

A benchmarking suite for evaluating new adaptive experimentation algorithms. Easily compare optimization performance of different algorithms on test problems and save results for reproducible research.

To show what it's like to work with Ax, here is an example of a simple optimization loop using the Booth function, a standard synthetic test function, as the evaluation function:

```python
from ax import optimize

best_parameters, _, _, _ = optimize(
    parameters=[
        {
            "name": "x1",
            "type": "range",
            "bounds": [-10.0, 10.0],
        },
        {
            "name": "x2",
            "type": "range",
            "bounds": [-10.0, 10.0],
        },
    ],
    # Booth function: (x1 + 2*x2 - 7)^2 + (2*x1 + x2 - 5)^2
    evaluation_function=lambda p: (p["x1"] + 2*p["x2"] - 7)**2 + (2*p["x1"] + p["x2"] - 5)**2,
    minimize=True,
)

best_parameters  # e.g. {'x1': 1.02, 'x2': 2.97}; the true minimum is at (1, 3)
```

Having shown the big picture of what BoTorch and Ax can do, we’ll now dive into the nuts and bolts of how to take a research idea from creation to production.

In many applications, it is desirable to explore the problem space using batches of design points (i.e., configurations). For instance, simulations or ML model training jobs for hyperparameter optimization can be run in parallel on a cluster of compute resources. Doing this kind of batched exploration optimally requires the acquisition function to assess the joint value of a set of design points. One such acquisition function is q-Expected Improvement (qEI), from Parallel Bayesian Global Optimization of Expensive Functions by Wang et al.
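For reference, qEI for a batch X = {x_1, ..., x_q} can be written as follows, using a maximization convention, with f* the best value observed so far and the expectation taken over the joint posterior of f(x_1), ..., f(x_q) (notation chosen here for illustration):

```latex
\mathrm{qEI}(X) = \mathbb{E}\left[\max_{j=1,\dots,q} \max\big(f(x_j) - f^*,\, 0\big)\right]
```

Because the maximum runs over all q points inside the expectation, the value depends on the joint distribution of the batch, so two highly correlated points add little beyond either one alone.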

qEI does not admit a tractable analytic expression in terms of the parameters of the posterior distribution. However, it can be estimated using Monte Carlo (MC) sampling via the reparameterization trick, which turns independent draws from the standard normal distribution into correlated posterior samples using the Cholesky decomposition of the posterior covariance.
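Concretely, if the posterior at the q points has mean mu and covariance Sigma = L L^T, each draw is xi_i = mu + L z_i with z_i ~ N(0, I), and qEI is estimated as the average over i of max_j max(xi_ij - f*, 0). The NumPy sketch below implements that estimator for a given Gaussian posterior. It uses ordinary pseudo-random normal draws rather than the quasi-Monte Carlo samples described in the caption below, and the function name `mc_qei` is hypothetical.

```python
import numpy as np


def mc_qei(mean, cov, best_f, num_samples=4096, seed=0):
    """Monte Carlo estimate of qEI for a joint Gaussian posterior over q points.

    mean: shape (q,) posterior mean at the q candidate points.
    cov:  shape (q, q) posterior covariance at those points.
    """
    rng = np.random.default_rng(seed)
    q = mean.shape[0]
    # Cholesky factor of the posterior covariance; a small jitter keeps it numerically PD.
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(q))
    # Reparameterization: correlated samples = mean + L @ z, with z ~ N(0, I).
    z = rng.standard_normal((num_samples, q))
    samples = mean + z @ L.T  # shape (num_samples, q)
    # Improvement for each sample and point, then the best improvement among the q points.
    improvement = np.clip(samples - best_f, 0.0, None)
    return improvement.max(axis=1).mean()
```

As a sanity check, with a single point whose posterior is N(0, 1) and f* = 0, the estimate should approach the analytic expected improvement 1/sqrt(2*pi), about 0.399. Because the samples are a differentiable function of mean and L, the same construction lets autograd propagate gradients to the candidate points.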

Illustration of Monte Carlo-based acquisition functions in BoTorch. The model provides the posterior distribution over function values at a given candidate set X. To compute the overall utility of the candidate set, quasi-Monte Carlo samples are drawn from the posterior, the value of each sample is evaluated, and these values are subsequently averaged.

Implementing this approximation in BoTorch is straightforward:

```python
import torch
from botorch.acquisition.monte_carlo import MCAcquisitionFunction
from botorch.acquisition.sampler import SobolQMCNormalSampler


class qExpectedImprovement(MCAcquisitionFunction):
    def __init__(self, model, best_f, num_samples=500):
        # quasi-Monte Carlo (Sobol) sampler drawing num_samples base samples
        sampler = SobolQMCNormalSampler(num_samples)
        super().__init__(model=model, sampler=sampler)
        self.register_buffer("best_f", torch.as_tensor(best_f))

    def forward(self, X):
        posterior = self.model.posterior(X)  # evaluate posterior at X
        samples = self.sampler(posterior)  # sample from posterior
        samples = samples.squeeze(-1)  # drop the trailing output dimension (single-outcome model)
        delta = (samples - self.best_f).clamp_min(0)  # compute improvement per sample
        delta_max = delta.max(dim=-1)[0]  # compute maximum across the q points
        qei = delta_max.mean(dim=0)  # average across samples
        return qei
```

Here, MCAcquisitionFunction is a subclass of torch.nn.Module, so we only need to implement a forward method. self.sampler() takes 500 quasi-Monte Carlo draws from the joint posterior distribution over function values (as modeled by the surrogate model) at the q design points X. For each of the 500 samples, we take the largest improvement across the q points over the best value observed so far (best_f); qEI is then the sample average of these maxima.

How do we optimize this quantity? PyTorch’s autograd makes it easy to compute gradients:

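```python
# assumes `model` is a fitted single-outcome BoTorch model on 10-dimensional inputs
qEI = qExpectedImprovement(model, best_f=0.0)
X = torch.rand(5, 10, requires_grad=True)  # q=5 candidate points in 10 dimensions
val = qEI(X)  # scalar joint utility of the 5 points
val.backward()  # backpropagate through the posterior samples
grad = X.grad  # gradient of qEI w.r.t. every coordinate of every point, shape (5, 10)
```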

This automatic gradient can then be plugged into optimizers that take full advantage of this information to efficiently find the set of design points that maximizes the full joint utility. The figure below shows the trajectory of the design points during optimization for a single batch of size q=4.

Illustration of acquisition function optimization for a single batch of four design points. Starting points are denoted by open circles, and their final locations by filled circles. qEI balances the exploration-exploitation trade-off by selecting points that have a high posterior mean (dark green, at left), a high amount of uncertainty (dark blue, at right), or some combination of the two.
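To make the multi-start idea concrete, here is a small NumPy sketch that maximizes a hypothetical two-peaked utility on [0, 1]^2, with a hand-coded gradient standing in for autograd. It runs plain projected gradient ascent from several random starting points and keeps the best result. This is a simpler stand-in for the quasi-second-order solvers and random-restart heuristic that Ax and BoTorch use, not their implementation; all names and constants are illustrative.

```python
import numpy as np

# Hypothetical stand-in for an acquisition surface on [0, 1]^2: two Gaussian bumps,
# with the taller one (the global maximum) centered at PEAK_A.
PEAK_A = np.array([0.2, 0.8])
PEAK_B = np.array([0.7, 0.3])
WIDTH = 0.1


def utility_and_grad(x):
    """Return the toy utility at x and its analytic gradient."""
    da, db = x - PEAK_A, x - PEAK_B
    ga = np.exp(-(da @ da) / WIDTH)
    gb = 0.5 * np.exp(-(db @ db) / WIDTH)
    grad = -2.0 / WIDTH * (ga * da + gb * db)
    return ga + gb, grad


def multistart_ascent(num_restarts=16, num_steps=300, lr=0.02, seed=0):
    """Projected gradient ascent from several random starts; keep the best end point."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, -np.inf
    for _ in range(num_restarts):
        x = rng.uniform(0.0, 1.0, size=2)  # random restart inside the unit box
        for _ in range(num_steps):
            _, grad = utility_and_grad(x)
            x = np.clip(x + lr * grad, 0.0, 1.0)  # ascent step, projected back onto the box
        val, _ = utility_and_grad(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

Restarts that begin near the lower bump converge to it, so a single start can get stuck; comparing final values across restarts is what recovers the higher peak at PEAK_A.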

The newly minted acquisition function can be plugged into an Ax optimization loop, which internally uses quasi-second-order numerical optimization algorithms in conjunction with a random-restart heuristic to optimize the joint utility of these q design points:

```python
from ax.modelbridge.factory import get_botorch


def get_qEI(model, best_f):
    return qExpectedImprovement(model, best_f=best_f)


# assumes `experiment` is an existing Ax experiment and `num_batches` is set
# collect some initial data...
for i in range(num_batches):
    # evaluate all trials that have not yet been evaluated
    data = experiment.eval()
    # set up the model
    model = get_botorch(
        experiment=experiment,
        data=data,
        search_space=experiment.search_space,
        acqf_constructor=get_qEI,
    )
    # generate 4 candidates and schedule them as one batch trial (q=4)
    trial = experiment.new_batch_trial(generator_run=model.gen(4))
```

The figure below shows how parallel evaluations can reduce the wall-clock time needed to optimize a problem.

Closed-loop optimization performance for qEI and random exploration (q denotes the parallelism of the algorithm).

Given that acquisition functions are a fundamental component of Bayesian optimization, it is important for researchers to easily prototype and test new variants of these functions. The above example shows how easy it is to do this by using a custom BoTorch acquisition function in a standard Ax optimization loop. Models and acquisition function optimizers can be customized in a similar fashion. As a result, researchers can focus on improving the underlying modeling and optimization algorithms in BoTorch and delegate the setup, management, deployment, and analysis to Ax.

Over time, we will refine the beta releases of our software, expand the set of available algorithms, and provide out-of-the-box integration with popular scheduling software. We look forward to working with the community to add user-contributed modules to Ax and BoTorch to further improve the platform.

Research areas we are particularly excited about include high-dimensional Bayesian optimization and multi-fidelity optimization. We also believe there are significant opportunities for improving the performance of numerical optimization of acquisition functions using novel parallelism-aware solvers. We plan to further explore these as well as other new features.

Used in tandem, Ax and BoTorch significantly accelerate the process of going from research to production, and we hope they will inspire new use cases for adaptive experimentation within the broader community.

Both Ax and BoTorch are available now, so engineers and researchers can start using them today.

*If you are interested in collaborating with the Ax and BoTorch teams, please reach out.*

*We’d like to acknowledge the contributions to BoTorch and Ax from many researchers, engineers, and data scientists at Facebook. BoTorch is designed to work seamlessly with GPyTorch, and it was developed in collaboration with Jake Gardner from Uber AI Labs and Geoff Pleiss and Andrew Gordon Wilson from Cornell University.*

Eytan Bakshy

Research Manager, Facebook

Max Balandat

Research Scientist, Facebook

Kostya Kashin

Engineering Manager, Facebook

Facebook © 2019