Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

November 10, 2020


Adversarial Imitation Learning alternates between learning a discriminator -- which distinguishes expert demonstrations from generated ones -- and a generator's policy that produces trajectories able to fool this discriminator. This alternating optimization is known to be delicate in practice, since it compounds unstable adversarial training with brittle, sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator iteration and a learnable policy. When optimized, this discriminator directly yields the optimal generator policy. Consequently, the discriminator update solves the generator's optimization problem for free: learning a policy that imitates the expert requires no additional optimization loop. This formulation effectively halves the implementation and computational burden of Adversarial Imitation Learning algorithms by removing the reinforcement learning phase altogether. We show on a variety of tasks that our simpler approach is competitive with prevalent Imitation Learning methods.
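As a rough illustration of the structured discriminator described above, the sketch below implements a tabular version in plain numpy: the discriminator is defined in closed form from a learnable policy `pi` and the previous generator policy `pi_prev`, and a single binary cross-entropy loss over expert and generated state-action pairs trains `pi` directly, with no separate reinforcement learning step. All function and variable names here are hypothetical, and this is a minimal sketch under simplifying assumptions (discrete states and actions, single transitions rather than full trajectories), not the authors' implementation.

```python
import numpy as np

def structured_discriminator(pi, pi_prev, s, a):
    """D(s, a) = pi(a|s) / (pi(a|s) + pi_prev(a|s)).

    pi, pi_prev: arrays of shape (n_states, n_actions) holding action
    probabilities. Conditioning D on both policies is what lets the
    discriminator update recover the generator's policy directly.
    """
    return pi[s, a] / (pi[s, a] + pi_prev[s, a])

def discriminator_loss(pi, pi_prev, expert_sa, generated_sa):
    """Binary cross-entropy: expert pairs are labelled 1, generated pairs 0.

    Minimizing this over pi (with pi_prev held fixed) plays the role of
    both the discriminator step and the policy improvement step.
    """
    d_exp = np.array([structured_discriminator(pi, pi_prev, s, a)
                      for s, a in expert_sa])
    d_gen = np.array([structured_discriminator(pi, pi_prev, s, a)
                      for s, a in generated_sa])
    return -(np.log(d_exp).mean() + np.log(1.0 - d_gen).mean())
```

In this toy setting, a policy that puts more mass on the expert's actions attains a lower loss than one that does not, which is the sense in which fitting the discriminator "solves the generator's optimization problem for free".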



Written by

Paul Barde

Julien Roy

Wonseok Jeon

Joelle Pineau

Derek Nowrouzezahrai

Christopher Pal


NeurIPS 2020

Related Publications

August 15, 2019


PHYRE: A New Benchmark for Physical Reasoning

Understanding and reasoning about physics is an important ability of intelligent agents. We develop the PHYRE benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark…

Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick


July 03, 2019



Linguistic generalization and compositionality in modern artificial neural networks

In the last decade, deep artificial neural networks have achieved astounding performance in many natural language processing tasks. Given the high productivity of language, these models must possess effective generalization abilities. It is…

Marco Baroni


May 06, 2019


Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks. The agent is split into a low-level and a high-level policy. The low-level policy only accesses internal,…

Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam


April 24, 2017



Episodic Exploration for Deep Deterministic Policies for StarCraft Micro-Management

We consider scenarios from the real-time strategy game StarCraft as benchmarks for reinforcement learning algorithms. We focus on micromanagement, that is, the short-term, low-level control of team members during a battle. We propose several…

Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala

