Introducing Theseus, a library for encoding domain knowledge in end to end AI models

July 20, 2022

What the research is:

Meta AI is open-sourcing Theseus, a library for an optimization technique called differentiable nonlinear least squares (NLS) that is particularly useful for applications like robotics and computer vision. Built on PyTorch, Theseus enables researchers to easily incorporate expert domain knowledge into modern AI architectures. It does this by expressing that knowledge as an optimization problem and adding it to the architecture as a modular “optimization layer” in the usual gradient-based learning process. This domain knowledge is distinct from the training data and can help the model make more accurate predictions. For instance, to ensure that a robot’s movements are smooth, researchers could include knowledge about the robot’s embodiment and movement patterns (called a kinematics model) as a layer while the robot is trained end to end to move.

Theseus is the first library to provide an application-agnostic framework for differentiable nonlinear optimization. Theseus is also highly efficient: it speeds up computation and reduces memory use by supporting batching, GPU acceleration, sparse solvers, and implicit differentiation. As a result, when solving a batch of large problems, its forward pass is up to four times faster than that of Google’s state-of-the-art, C++-based Ceres Solver (which does not support end-to-end learning).

Theseus fuses the best aspects of the two prevailing methods for injecting prior knowledge into an AI system. Before the advent of deep learning, researchers used simpler, standalone AI optimization algorithms to solve individual problems in robotics. Robotic systems learned the best way to carry out commands by calculating the minimum value of a hand-selected combination of factors, such as joint motion and energy use. This method was effective but inflexible; the application-specific optimization algorithms often proved difficult to adapt to new systems or environments. Deep learning methods, on the other hand, are much more scalable, but they require a massive amount of data, and they may produce solutions that are effective but also brittle outside of the training domain.

To train a deep learning model for a particular application, researchers use a carefully selected loss function to measure how well the model is predicting the data. But to update the model weights through backpropagation, each layer must be differentiable, allowing the error information to flow through the network. Traditional optimization algorithms are not end to end differentiable, so researchers face a trade-off: They can abandon optimization algorithms for end to end deep learning dedicated to the specific task — and risk losing optimization’s efficiency as well as its facility for generalization. Or, they can train the deep learning model offline and add it to the optimization algorithms at inference time. The second method has the benefit of combining deep learning and prior knowledge, but — because the deep learning model is trained without that pre-existing information or the task-specific error function — its predictions might prove inaccurate.

To blend these strategies in a way that mitigates their weaknesses and leverages their strengths, Theseus converts the results of optimization into a layer that can be plugged into any neural network architecture. That way, gradients can backpropagate through the optimization layer, allowing researchers to fine-tune with domain-specific knowledge on the final task loss as an integral part of the end to end deep learning model.

In the Theseus layer (green), the objective is composed of the output tensors of upstream neural models (gray) and prior knowledge (orange). The outputs of the Theseus layer are tensors that minimize the objective.

How it works:

NLS measures how much a nonlinear function varies from the actual data it is meant to predict. A small value means the function fits the data set well. NLS is prevalent in the formulation of many robotics and vision problems, from mapping and estimation to planning and control. For example, a robot’s route toward a desired goal can be formulated as an NLS optimization problem: To plot the fastest safe trajectory, the system finds the solution to a sum-of-costs objective that minimizes both travel duration and unwanted behavior, like sharp turns or collisions with obstacles in the environment. A sum-of-costs objective can also capture sensor measurement errors to optimize the past trajectories of a robot or camera.
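To make the idea of fitting a nonlinear function to data concrete, here is a minimal sketch in plain Python. It is not Theseus code: the model y = exp(a * x), the data, and the function names are all assumptions chosen for illustration, and the solver is a textbook Gauss-Newton iteration on a single parameter.

```python
import math

def gauss_newton_fit(xs, ys, a0=0.0, iters=20):
    """Fit y = exp(a * x) by minimizing 0.5 * sum(r_i**2),
    where r_i = exp(a * x_i) - y_i is the residual for one data point."""
    a = a0
    for _ in range(iters):
        residuals = [math.exp(a * x) - y for x, y in zip(xs, ys)]
        jacobian = [x * math.exp(a * x) for x in xs]  # d r_i / d a
        jtj = sum(j * j for j in jacobian)
        jtr = sum(j * r for j, r in zip(jacobian, residuals))
        if jtj == 0.0:
            break  # no curvature information, stop early
        a -= jtr / jtj  # Gauss-Newton step for a scalar parameter
    return a

def cost(a, xs, ys):
    """Sum-of-squares objective; a small value means a good fit."""
    return 0.5 * sum((math.exp(a * x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical noise-free data generated with a = 0.7.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]
a_hat = gauss_newton_fit(xs, ys)
print(a_hat, cost(a_hat, xs, ys))
```

In a robotics setting the single parameter would be replaced by a trajectory or a set of poses, and the residuals by cost terms such as travel time, turn sharpness, or sensor error, but the structure of the problem is the same.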

Theseus makes NLS differentiable, providing nonlinear optimization as a layer that researchers can insert into their neural network. Input tensors define a sum-of-weighted-squares objective function, and output tensors are arguments that produce the minimum of that objective. (In contrast, typical neural layers take input tensors through a linear transformation and some element-wise nonlinear activation function.) The ability to compute gradients end to end is retained by differentiating through the optimizer.
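One way to write down the layer described here is sketched below. The notation is an assumption made for illustration and does not come from the library: \theta are the optimization variables, \phi are the input tensors coming from upstream models or priors, c_i are the cost terms, w_i their weights, and L is the downstream task loss.

```latex
% Forward pass: the layer returns the minimizer of a weighted sum of squares.
\[
\theta^{*}(\phi) = \arg\min_{\theta} S(\theta;\phi),
\qquad
S(\theta;\phi) = \frac{1}{2} \sum_{i} \left\| w_i \, c_i(\theta;\phi) \right\|^{2}
\]

% Backward pass: the task loss reaches the inputs through the minimizer.
\[
\frac{\partial L}{\partial \phi}
  = \frac{\partial L}{\partial \theta^{*}} \,
    \frac{\partial \theta^{*}}{\partial \phi}
\]
```

The second line is just the chain rule. The factor \partial\theta^{*} / \partial\phi is the part that requires differentiating through the optimizer.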

This integrates the optimizer and known priors into the deep learning training loop, allowing models to encode domain knowledge and learn on the actual task loss. For instance, to ensure that a robot’s movements are smooth, researchers could include known robot kinematics in the optimizer; meanwhile, the deep learning model will extract the larger goal from perception or a language instruction during training. That way, researchers can develop the goal prediction model end to end with the known kinematics model in the training loop. This technique of modularly mixing known priors with neural components leads to improved data efficiency and generalization.
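The training setup described above can be sketched with a toy problem in plain Python. Everything here is a hypothetical stand-in rather than Theseus code: the upstream "network" is a single weight w, the known kinematics is a one-joint arm whose tip height is sin(theta), the prior pulls the joint toward a rest angle of 0, and the gradient through the optimization layer comes from the implicit function theorem.

```python
import math

LAM = 0.5  # assumed weight of the prior that pulls theta toward 0

def optimization_layer(goal, iters=30):
    """Gauss-Newton on residuals [sin(theta) - goal, sqrt(LAM) * theta]."""
    theta = 0.0
    for _ in range(iters):
        jtr = (math.sin(theta) - goal) * math.cos(theta) + LAM * theta
        jtj = math.cos(theta) ** 2 + LAM
        theta -= jtr / jtj
    return theta

def layer_grad(theta, goal):
    """d theta* / d goal at the solution, via the implicit function theorem."""
    hess = math.cos(theta) ** 2 - (math.sin(theta) - goal) * math.sin(theta) + LAM
    return math.cos(theta) / hess

def train(feature, theta_target, w=0.1, lr=0.5, steps=200):
    """Learn the upstream weight w on the task loss, with the layer in the loop."""
    losses = []
    for _ in range(steps):
        goal = w * feature                  # upstream model predicts the goal
        theta = optimization_layer(goal)    # forward pass through the layer
        losses.append(0.5 * (theta - theta_target) ** 2)
        dloss_dtheta = theta - theta_target
        # Chain rule: task loss -> layer output -> predicted goal -> weight.
        w -= lr * dloss_dtheta * layer_grad(theta, goal) * feature
    return w, losses

w, losses = train(feature=1.0, theta_target=0.5)
theta_final = optimization_layer(w * 1.0)
print(w, theta_final, losses[0], losses[-1])
```

The weight is never supervised directly. It is trained only on the loss measured after the optimizer has applied the kinematics and the prior, which is the sense in which the goal predictor is developed end to end.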

For efficiency, Theseus incorporates support for sparse solvers, automatic vectorization, batching, GPU acceleration, and gradient computation with implicit differentiation. Just as autodiff and GPU acceleration have propelled the evolution of PyTorch over NumPy, sparsity and implicit differentiation — on top of autodiff and GPU acceleration — power Theseus, in contrast to solvers like Ceres that typically support only sparsity. On a standard GPU, Theseus with a sparse solver is much faster and requires significantly less memory than a dense solver. Additionally, when Theseus is solving a batch of large problems, its forward pass is up to four times faster than that of Ceres, which has limited GPU support and does not support batching or end to end learning. Finally, implicit differentiation yields better gradients than standard unrolling. Implicit differentiation also has a constant memory and compute footprint with increasing optimization iterations, unlike unrolling, which scales linearly in compute and memory.
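The difference between unrolling and implicit differentiation can be illustrated with a small sketch in plain Python. It is not Theseus code, and it makes several assumptions: the objective is a toy scalar problem, the inner solver is plain gradient descent (swapped in for brevity in place of a second-order solver), and all names and constants are hypothetical. Unrolling runs a reverse pass over every stored iterate, while the implicit gradient uses only the final solution.

```python
import math

LAM = 0.5    # assumed weight of the prior term
ALPHA = 0.5  # assumed step size of the inner gradient-descent solver

def grad_theta(theta, x):
    """F = dS/dtheta for S = 0.5*(sin(theta) - x)**2 + 0.5*LAM*theta**2."""
    return (math.sin(theta) - x) * math.cos(theta) + LAM * theta

def hess_theta(theta, x):
    """dF/dtheta."""
    return math.cos(theta) ** 2 - (math.sin(theta) - x) * math.sin(theta) + LAM

def mixed(theta, x):
    """dF/dx."""
    return -math.cos(theta)

def solve(x, iters):
    """Run the inner solver and record every iterate it visits."""
    theta, tape = 0.0, []
    for _ in range(iters):
        tape.append(theta)
        theta -= ALPHA * grad_theta(theta, x)
    return theta, tape

def unrolled_grad(x, tape):
    """Reverse-mode pass over the stored iterates; storage grows with len(tape)."""
    adjoint, grad = 1.0, 0.0
    for theta in reversed(tape):
        grad += adjoint * (-ALPHA * mixed(theta, x))
        adjoint *= 1.0 - ALPHA * hess_theta(theta, x)
    return grad

def implicit_grad(x, theta_star):
    """Implicit function theorem at the solution; needs only the final iterate."""
    return -mixed(theta_star, x) / hess_theta(theta_star, x)

x = 0.6
theta_star, tape = solve(x, iters=100)
print(len(tape), unrolled_grad(x, tape), implicit_grad(x, theta_star))
```

In this sketch the tape grows by one entry per solver iteration, whereas the implicit gradient costs the same no matter how many iterations the solver ran. Once the solver has converged, the two gradients agree.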

Why it matters:

Theseus provides a common framework to leverage the complementary strengths of traditional robotics and vision approaches and deep learning. Differentiable optimization acts as an inductive prior, improving data efficiency and generalization, which is crucial in robotics because data and labels often do not come cheap, and application domains tend to be broad.

Recognizing the flexibility of differentiable NLS, previous researchers have reported state-of-the-art results with similar methods in a wide range of applications in robotics and vision, but existing implementations are task-specific and often inefficient. Theseus is application-agnostic, so the AI community can make faster progress by training accurate models that excel in multiple tasks and environments. We have developed several example applications, including pose graph optimization, tactile state estimation, bundle adjustment, motion planning, and homography estimation. We built these examples using the same underlying differentiable components, such as second-order optimizers, standard cost functions, and Lie groups.

Beyond pushing the current state of the art, our framework will enable avenues for future research into the role and possible evolution of structure in complex robot systems, learning end to end on such systems, and continually learning during real-world interactions.