July 9, 2021
Humans can walk with relative ease over rocks, through mud, up and down hills, on thick carpets, and across bouncy trampolines. We can do so with tired muscles or twisted ankles and while carrying objects of all shapes, sizes, and weights. To accomplish this, we constantly make near-instantaneous adjustments to the changing conditions in our bodies and beneath our feet.
To be similarly successful in the real world, walking robots also must adapt to whatever surfaces they encounter, whatever objects they carry, and whatever condition they are in — even if they’ve never been exposed to those conditions before. And to avoid falling and potentially suffering damage, these adjustments must happen in fractions of a second.
Today, a team of researchers from Facebook AI, UC Berkeley, and Carnegie Mellon University’s School of Computer Science is announcing Rapid Motor Adaptation (RMA), a breakthrough in artificial intelligence that enables legged robots to adapt intelligently in real time to challenging, unfamiliar terrain and circumstances. RMA uses a novel combination of two policies, both learned entirely in simulation — a base policy trained through reinforcement learning (RL) and an adaptation module trained using supervised learning. Importantly, with RMA the robot demonstrates an aptitude fundamental to all intelligent agents — the ability to adapt to factors in its environment, such as the weight of a backpack suddenly thrust on it or the amount of friction on a new surface, without depending on any visual input at all.
Until now, legged robots have either been fully hand-coded for the environments they will inhabit or taught to navigate their environments through a combination of hand-coding and learning techniques. RMA is the first entirely learning-based system to enable a legged robot to adapt to its environment from scratch by exploring and interacting with the world.
Our tests demonstrate that an RMA-enabled robot outperforms alternative systems when walking over different surfaces, slopes, and obstacles, and when given different payloads to carry. This requires going beyond even sophisticated hand-coding, because it is difficult or impossible to preprogram a robot to adjust to the full range of real-world conditions, whether it’s a different type of rug, a deeper mud puddle, or a bouncier trampoline. Moreover, to work reliably, robots must be able to adjust not only to carrying different loads but also to expected wear and tear, like a dent on the bottom of a foot, a slightly worn-down part, or the countless other unpredictable changes that happen in the real world. Because its ability is based entirely on what it encounters, an RMA-enabled robot can adjust to situations programmers never even considered.
We are now sharing our work, including implementation details and experimental results, in this paper.
Improvements in hand-coding can boost a robot’s performance within a controlled environment, but the only way to truly adjust to the infinite variations found in the real world is to teach robots to actually adapt, similar to how people learn.
Giving robots this ability to adapt to changing real-world conditions requires teaching them through millions of repetitions, and the best way to do this is not in the real world, where they could get damaged or worn down while learning, but in simulation. RMA is trained end to end, directly outputting joint positions without relying on predefined leg motions or other control primitives.
However, a number of challenges emerge when these skills are first learned in simulation and then deployed in the real world. The physical robot and its model in the simulator are often different in small but important ways. There might be a slight latency between a control signal being sent and the actuator moving, for example, or a scuff on a foot that makes it less slippery than before, or the angle of a joint might be off by a hundredth of a degree.
The physical world itself also presents intricacies that a simulator, which is modeled on rigid bodies moving in free space, cannot accurately capture. Surfaces like a mattress or a mud puddle can deform on contact. An environment that’s fairly standardized in simulation becomes much more varied and complex in the real world, more so when one factors in the multitude of terrains that can exist in both indoor and outdoor spaces. And of course, factors in the real world are never static, so one real-world environment that a legged robot is able to master can be completely different from another.
RMA overcomes these challenges by using two distinct subsystems: a base policy and an adaptation module.
The base policy is learned in simulation with RL, using carefully curated information about different environments (like the amount of friction and the weight and shape of the payload). We set different variables — simulating more slippery or less slippery ground or the grade of an incline — so it learns the right controls for different conditions, and we encode info about those variables as “extrinsics.”
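The first training phase can be sketched as follows. This is a minimal illustration, not the actual implementation: the network sizes, the 17-dim environment-factor vector, the 8-dim extrinsics vector, and the 30-dim state are assumptions for the example, and trained weights are replaced with random ones. In simulation, an encoder (call it `mu`) compresses the privileged environment variables into the extrinsics vector, which the base policy consumes alongside the robot’s state to produce target joint positions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP layers; stands in for trained networks."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)  # hidden-layer nonlinearity
    return x

# Hypothetical dimensions, for illustration only.
ENV_DIM, Z_DIM, STATE_DIM, ACT_DIM = 17, 8, 30, 12

# mu: encodes privileged environment variables (friction, payload
# mass, incline, ...) into a compact extrinsics vector z.
# Those variables are only observable in simulation.
mu = mlp([ENV_DIM, 64, Z_DIM])

# pi: base policy maps (state, previous action, extrinsics)
# to 12 target joint positions, with no visual input.
pi = mlp([STATE_DIM + ACT_DIM + Z_DIM, 128, 128, ACT_DIM])

e_t = rng.standard_normal(ENV_DIM)    # simulated environment factors
x_t = rng.standard_normal(STATE_DIM)  # proprioceptive state
a_prev = np.zeros(ACT_DIM)            # previous action

z_t = forward(mu, e_t)
a_t = forward(pi, np.concatenate([x_t, a_prev, z_t]))
print(a_t.shape)  # one 12-dim joint-position target per control step
```

During RL training, `mu` and `pi` are optimized jointly so that the policy learns controls appropriate to each sampled environment, while the extrinsics vector becomes a compact summary of whatever those environment variables imply for locomotion.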
We can’t simply deploy the robot with only this base policy, because we don’t know the actual extrinsics it will encounter out in the real world. So we rely on information that the robot teaches itself about its surroundings — information based on its most recent body movement. We know that the discrepancy between a joint’s actual movement and the movement expected from a command depends on these extrinsics. For example, sudden leg obstructions stop the robot’s legs but also reveal information about the ground height around it. Similarly, on a soft surface the leg will extend farther as the foot sinks in, whereas on a hard surface it’ll stop sooner.
Since we know the actual extrinsics the robot encounters in simulation, we can use supervised learning to train the adaptation module to predict them from the recent history of the robot’s state.
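The second phase is plain regression, and a toy version can be sketched as below. The sizes are assumptions for illustration (a 50-step history of states and actions, an 8-dim extrinsics target), and the module is reduced to a single linear layer trained on one example, whereas the real adaptation module is a small learned network trained over many simulated trajectories.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: the last 50 (state, action) pairs are
# regressed onto the 8-dim extrinsics vector z.
H, STATE_DIM, ACT_DIM, Z_DIM = 50, 30, 12, 8
IN_DIM = H * (STATE_DIM + ACT_DIM)

# Adaptation module phi, reduced to one linear layer for brevity.
W = rng.standard_normal((IN_DIM, Z_DIM)) * 0.01

def phi(history):
    """Predict extrinsics from the flattened state-action history."""
    return history.reshape(-1) @ W

# In simulation the ground-truth extrinsics (the encoder's output)
# are known, so phi is trained by supervised regression.
history = rng.standard_normal((H, STATE_DIM + ACT_DIM))
z_true = rng.standard_normal(Z_DIM)

losses = []
lr = 1e-4
for _ in range(100):
    err = phi(history) - z_true
    losses.append(float(err @ err))  # squared-error loss
    # Gradient of the squared error with respect to W.
    W -= lr * np.outer(history.reshape(-1), 2.0 * err)

print(f"loss {losses[0]:.3f} -> {losses[-1]:.6f}")
```

The key point is that no real-world labels are ever needed: the simulator supplies both the input (recent movement history) and the target (the true extrinsics), so the module learns to infer conditions it can’t observe directly.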
With this combination of a base policy and an adaptation module, the robot can adapt to new conditions in fractions of a second.
Robots trained with prior RL-based approaches require several minutes, and sometimes human intervention, to adjust to new conditions, rendering them impractical in the real world.
When the RMA-enabled robot is deployed, the base policy and adaptation module work hand in hand and asynchronously — the base policy running at a faster speed, the adaptation module running much slower — to enable the robot to perform robust and adaptive locomotion without any fine-tuning. Running both policies asynchronously and at substantially different frequencies also helps deploy RMA with limited onboard compute, as is the case with our robot. The small base policy can keep the robot walking at a high frequency, while the bigger adaptation module can send the extrinsics vector at a low frequency when it’s ready. Running both policies asynchronously also adds robustness to somewhat unpredictable hardware speeds and timing.
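The two-frequency deployment loop can be sketched as follows. The frequencies (100 Hz for the base policy, 10 Hz for the adaptation module), the 50-step history window, and the placeholder networks are illustrative assumptions; the point is only the structure, in which the fast loop always uses the most recent extrinsics estimate rather than waiting for a fresh one.

```python
import numpy as np

rng = np.random.default_rng(2)

BASE_HZ, ADAPT_HZ = 100, 10  # illustrative control frequencies
STEPS = 100                  # one simulated second of walking

z = np.zeros(8)              # latest extrinsics estimate
history = []                 # rolling (state, action) buffer

def base_policy(state, z):
    # Placeholder for the trained base policy: state + extrinsics
    # in, 12 joint-position targets out.
    return np.tanh(state[:12] + z.mean())

def adaptation_module(history):
    # Placeholder for the trained adaptation module: predicts
    # extrinsics from the recent movement history.
    return np.tanh(np.mean(history, axis=0)[:8])

updates = 0
for t in range(STEPS):
    state = rng.standard_normal(30)
    action = base_policy(state, z)  # fast loop: every control step
    history.append(np.concatenate([state, action]))
    history = history[-50:]         # keep only the recent window
    if t % (BASE_HZ // ADAPT_HZ) == 0:  # slow loop: every 10th step
        z = adaptation_module(np.array(history))
        updates += 1

print(updates)  # 10 extrinsics updates in one second of walking
```

Because the fast loop never blocks on the slow one, a late extrinsics update merely means the base policy walks on a slightly stale estimate for a few more control steps, which is what makes the scheme tolerant of variable hardware timing.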
Our experiments have shown that the RMA-enabled robot successfully walks across several challenging environments, outperforming a non-RMA deployment and equaling or bettering the hand-coded controllers used in a Unitree robot. We executed all our real-world deployments with the same policy without any simulation calibration or real-world fine-tuning.
The robot was able to walk on sand, in mud, on hiking trails, in tall grass, and over a dirt pile without a single failure in all our trials. The robot successfully walked down steps along a hiking trail in 70 percent of the trials. It successfully navigated a cement pile and a pile of pebbles in 80 percent of the trials, despite never having encountered unstable or sinking ground, obstructive vegetation, or steps during training. It also maintained its height with a high success rate when moving with a 12 kg payload, which amounted to 100 percent of its body weight.
RMA is an exciting advance for robotics that could enable real-world deployment of new, highly effective, and adaptable walking robots. This work also shows how advancements in AI can transform the field of robotics, enhancing the capabilities of robots while also making those improvements more scalable to new conditions and applications. Methods that rely purely on learning could potentially work with much cheaper, less precise hardware, which would substantially bring down the cost of robots in the future. Increased efficiencies and reduced costs may mean that RMA-enabled robots could one day serve in myriad capacities, such as assistants in search and rescue operations, particularly in areas that are too dangerous or impractical for humans.
More broadly, we hope our work with RMA will help researchers build AI that can adapt in real time to unforeseen, rapidly changing, and highly complex conditions.
Beyond robotics, RMA points the way to building AI systems that can adapt to many difficult challenges in real time by leveraging data on the fly to understand the context in which a particular algorithm operates. This is a broad, long-term challenge that will require progress in many subfields beyond RL. But we are excited to see how the AI research community builds on our work with RMA — both in robotics and beyond.