Research

Advancing AI by teaching robots to learn

May 16, 2019

Robotics provides important opportunities for advancing artificial intelligence, because teaching machines to learn on their own in the physical world will help us develop more capable and flexible AI systems in other scenarios as well. Working with a variety of robots — including walking hexapods, articulated arms, and robotic hands fitted with tactile sensors — Facebook AI researchers are exploring new techniques to push the boundaries of what artificial intelligence can accomplish.

Doing this work means addressing the complexity inherent in using sophisticated physical mechanisms and conducting experiments in the real world, where the data is noisier, conditions are more variable and uncertain, and experiments have additional time constraints (because they cannot be accelerated when learning in a simulation). These are not simple issues to address, but they offer useful test cases for AI.

As in other AI research areas, much of our work in robotics concentrates on self-supervision, in which systems learn directly from raw data (rather than from extensive structured training data specific to a particular task) so they can adapt to new tasks and new circumstances. To do this in robotics, we’re advancing techniques such as model-based reinforcement learning (RL) to enable robots to teach themselves through trial and error using direct input from sensors. The projects highlighted here show how we’re using these self-supervised learning approaches to address some of the most essential challenges in this field: developing robots that can move around in and explore their surroundings and manipulate objects they encounter.

This work will lead to more capable robots, but more important, it will lead to AI that can learn more efficiently and better generalize to new applications.

Teaching robots to learn how to walk on their own

To push the limits of how machines can learn independently, we are developing model-based RL methods to enable a six-legged robot to learn to walk — without being given task-specific information or training.

The robot starts learning from scratch with no information about its environment or its physical capabilities, and then it uses a data-efficient RL algorithm to learn a controller that achieves a desired outcome, such as moving itself forward. As it gathers information, the model optimizes for rewards and improves its performance over time.
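
As a rough illustration (a minimal sketch with hypothetical names such as fit_dynamics_model and optimize_controller; the actual controllers and models used on the hexapod are not detailed in this post), this kind of model-based loop alternates between collecting data on the real robot, fitting a dynamics model to that data, and optimizing the controller against the learned model:

```python
import numpy as np

# Minimal model-based RL loop sketch. All names (env, fit_dynamics_model,
# optimize_controller) are hypothetical placeholders, not the actual system.

def rollout(env, controller, horizon=200):
    """Run the controller on the robot and record (state, action, next_state) tuples."""
    state = env.reset()
    transitions = []
    for _ in range(horizon):
        action = controller(state)
        next_state, done = env.step(action)
        transitions.append((state, action, next_state))
        state = next_state
        if done:
            break
    return transitions

def model_based_rl(env, fit_dynamics_model, optimize_controller, iterations=10):
    # Start from scratch: a random controller generates the first batch of experience.
    controller = lambda s: np.random.uniform(-1.0, 1.0, size=env.action_dim)
    data = []
    for _ in range(iterations):
        data += rollout(env, controller)          # interact with the real robot
        model = fit_dynamics_model(data)          # learn s_next ~ f(s, a) from all data so far
        controller = optimize_controller(model)   # improve the controller against the model
    return controller
```

Because most of the trial and error happens against the learned model rather than on hardware, the robot needs far fewer real-world interactions to improve.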

Learning to walk is challenging because the robot must reason about its balance, location, and orientation in space using only its sensors, in this case the sensors on the joints of each of its six legs (it has no sensors on its feet). These sensors are noisy, which makes this estimation difficult and prone to error.

Our goal is to reduce the number of interactions the robot needs to learn to walk, so it takes only hours instead of days or weeks. The techniques we are researching, which include Bayesian optimization as well as model-based RL, are designed to be generalized to work with a variety of different robots and environments. They could also help improve sample efficiency of RL for other applications beyond robotics, such as A/B testing or task scheduling.
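
To illustrate why Bayesian optimization is attractive when every trial is expensive (a sketch only, using scikit-learn's Gaussian process regressor and a made-up one-dimensional gait parameter, not the team's actual setup): a surrogate model of the reward is fit to the handful of controller evaluations run so far, and the next parameters to try are chosen where the surrogate's upper confidence bound is highest.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical objective: reward (e.g., forward distance walked) as a function
# of a single gait parameter; each call stands in for one costly real-world trial.
def evaluate_gait(theta):
    return float(-(theta - 0.3) ** 2 + 0.1 * np.random.randn())

candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
X = list(np.random.uniform(0.0, 1.0, size=(3, 1)))   # a few initial trials
y = [evaluate_gait(x[0]) for x in X]

for _ in range(15):                                   # far fewer trials than grid or random search
    gp = GaussianProcessRegressor().fit(np.array(X), np.array(y))
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 2.0 * std                            # upper confidence bound acquisition
    theta_next = candidates[np.argmax(ucb)]           # try where reward could plausibly be high
    X.append(theta_next)
    y.append(evaluate_gait(theta_next[0]))

best = X[int(np.argmax(y))]
print("best gait parameter:", best[0], "reward:", max(y))
```

In practice the controller has many parameters and the reward is measured by running the robot, but the same select-evaluate-refit loop applies.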

Using curiosity to learn more effectively

Curiosity is a central motivation for learning in humans, and in our recent research done in collaboration with colleagues at New York University, we’re applying this notion to improve how robots learn in the real world. “Curious” AI systems are rewarded for exploring and trying new things, as well as for accomplishing a specific goal. Whereas similar previous systems typically explore their environment randomly, ours does so in a structured manner, seeking to satisfy its curiosity by learning about its surroundings and thereby reducing model uncertainty. We have applied this technique successfully both in simulation and with a real-world robotic arm.

Our approach differs from other curiosity-driven robotics research in that we explicitly optimize for actions that resolve uncertainty. To generate higher rewards for actions that explore the uncertain parts of the dynamics model, we include the variance of the model’s prediction in the reward function evaluation. The system is aware of its model uncertainty and optimizes action sequences to both maximize rewards (achieving the desired task) and reduce that model uncertainty, making it better able to handle new tasks and conditions. It generates a greater variety of new data and learns more quickly — in some cases, in tens of iterations rather than hundreds or thousands.
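
One minimal way to write down that objective (a sketch; the ensemble-based uncertainty estimate and the weighting term are illustrative assumptions rather than the paper's exact formulation) is to score a candidate action sequence by the task reward predicted under the learned dynamics model plus a bonus proportional to the disagreement across an ensemble of models:

```python
import numpy as np

def curious_objective(action_seq, state, model_ensemble, task_reward, beta=1.0):
    """Score an action sequence by predicted task reward plus a model-uncertainty bonus.

    model_ensemble: hypothetical list of learned dynamics models, each mapping
    (state, action) -> predicted next state (as a NumPy array).
    beta: illustrative weight on the curiosity (uncertainty) term.
    """
    total_reward, total_uncertainty = 0.0, 0.0
    states = [state for _ in model_ensemble]              # roll each ensemble member forward
    for action in action_seq:
        states = [m(s, action) for m, s in zip(model_ensemble, states)]
        predictions = np.stack(states)
        mean_state = predictions.mean(axis=0)
        total_uncertainty += predictions.var(axis=0).sum()  # disagreement across models
        total_reward += task_reward(mean_state, action)
    # Maximizing this objective favors action sequences that both achieve the task
    # and visit regions where the model is still uncertain, so new data resolves
    # that uncertainty over time.
    return total_reward + beta * total_uncertainty
```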

An overview of our approach for model-based reinforcement learning. We start with motor “babbling” data to initialize the dynamics model, followed by an iterative loop of learning the model and updating an iLQR policy.

Our research has shown that seeking to resolve uncertainty can actually help the robot achieve a task even faster. Our model was also better able to generalize to new tasks and initial conditions.

Although it may seem counterintuitive that exploring an environment can be more effective than focusing solely on the specific goal, this curiosity-driven behavior can help the robot avoid pitfalls, such as getting trapped or stuck.

We hope this research will help us create systems that can respond with more flexibility in uncertain environments and learn new tasks. It can potentially enable the structured exploration necessary for faster, more efficient learning in other real-world RL tasks, and help us develop new ways to incorporate uncertainty into other models.

Learning through tactile sensing

Robots often rely primarily on computer vision, but touch is also an important and complex area of research. Given a particular manipulation task, for instance, a robot might use tactile sensing to complete the task if the object is obstructed from its view.

In collaboration with researchers from UC Berkeley, we developed a new method for learning from touch through self-supervision, without task-specific training data. Once the model is trained, we can assign a new goal and use the model to decide on the best sequence of actions to take. We took a predictive model originally developed for video input and used it instead to optimize deep model-based control policies that operate directly on the raw data — which, in this case, consists of high-dimensional maps — provided by a high-resolution tactile sensor. Our work shows that predictive models can be learned entirely without rewards, through diverse self-supervised exploratory interactions with the environment.
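
To give a sense of how such a predictive model can be used for control once a goal is assigned (a sketch under assumed interfaces; predict_frames, the goal-image cost, and the random-shooting planner below are illustrative stand-ins rather than the published method), one can sample candidate action sequences, predict the resulting tactile frames with the learned model, and execute the first action of the sequence whose predicted outcome is closest to the goal reading:

```python
import numpy as np

def plan_with_prediction_model(predict_frames, current_frame, goal_frame,
                               action_dim, horizon=5, num_samples=100):
    """Random-shooting planner sketch over a learned tactile/video prediction model.

    predict_frames(current_frame, actions) -> one predicted frame per step;
    this stands in for the learned prediction model.
    """
    best_cost, best_actions = np.inf, None
    for _ in range(num_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        predicted = predict_frames(current_frame, actions)
        cost = np.sum((predicted[-1] - goal_frame) ** 2)   # distance to the goal tactile image
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions[0]   # execute only the first action, then replan (MPC-style)
```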

Using this video prediction model, the robot was able to complete a series of complex tactile tasks: rolling a ball, moving a joystick, and turning a 20-sided die to a specified face. The model’s success shows the promise of using video prediction models to create systems that understand how the environment will react to touch.

In this die rolling task, we start from face 20, and the goal is to reach face 8. The second row shows the video predictions (at every third time step) for the best action sequence found at the first real-world time step. Red margins indicate real context frames, and green margins indicate predicted frames.

This research also creates new avenues to explore multimodal learning, which is important for a wide range of AI research, such as developing systems to better understand content across modalities.

Robotics: A long-term focus for AI research

The examples presented here are a few of the many robotics research projects under way at Facebook AI that will help us build AI that can learn more efficiently and better generalize to new applications, even in noisy and highly complex environments such as the physical world. We are focused on robotics work that will not only lead to more capable robots but will also push the limits of AI over the years and decades to come. If we want to move closer to machines that can think, plan, and reason the way people do, then we need to build AI systems that can learn for themselves in a multitude of scenarios — beyond the digital world.

We would like to acknowledge the contributions of the following people to the tactile sensing research described in the blog post: UC Berkeley’s Stephen Tian, Frederik Ebert, Dinesh Jayaraman, Mayur Mudigonda, Chelsea Finn, and Sergey Levine.

Authors

Franziska Meier

Research Scientist

Akshara Rai

Research Scientist