HUMAN & MACHINE INTELLIGENCE

Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

May 6, 2019

Abstract

In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse-reward tasks. The agent is split into low-level policies and a high-level policy. Each low-level policy accesses only the internal, proprioceptive dimensions of the state observation. The low-level policies are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, each low-level policy is induced to be periodic through the use of a “phase function.” The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.
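The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the observation layout (`N_EXTERNAL`), the sine/cosine encoding of the phase, the displacement-based reward, and every function name below are assumptions made for illustration, and the policies are passed in as plain callables rather than learned networks.

```python
import math

# Assumption: the first N_EXTERNAL entries of an observation are external
# (non-proprioceptive, e.g. x/y position); the rest are proprioceptive.
N_EXTERNAL = 2
PERIOD = 10  # hypothetical phase period, in time steps


def split_observation(obs):
    """Return (external, proprioceptive) parts of an observation."""
    return list(obs[:N_EXTERNAL]), list(obs[N_EXTERNAL:])


def phase_features(t, period=PERIOD):
    """Assumed phase encoding: a point on the unit circle that repeats
    every `period` steps."""
    angle = 2.0 * math.pi * (t % period) / period
    return [math.sin(angle), math.cos(angle)]


def low_level_input(obs, t):
    """Low-level policies see only proprioceptive dimensions plus the phase."""
    _, proprio = split_observation(obs)
    return proprio + phase_features(t)


def low_level_reward(prev_obs, obs):
    """Assumed reward: size of the change in the external dimensions."""
    prev_ext, _ = split_observation(prev_obs)
    ext, _ = split_observation(obs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ext, prev_ext)))


def run_hierarchy(high_level, low_levels, env_step, obs, steps, hold=PERIOD):
    """Every `hold` steps the high-level policy picks which low-level
    policy to run; that policy then acts on its restricted input."""
    choice = 0
    choices = []
    for t in range(steps):
        if t % hold == 0:
            choice = high_level(obs)  # sees the full observation
        action = low_levels[choice](low_level_input(obs, t))
        obs = env_step(obs, action)  # hypothetical environment transition
        choices.append(choice)
    return obs, choices
```

In this sketch the high-level policy would be trained on the sparse task reward and the low-level policies on `low_level_reward`; the training loops themselves are omitted.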

Related Publications

November 04, 2019

NLP

SPEECH & AUDIO

Countering Language Drift via Visual Grounding

Emergent multi-agent communication protocols are very different from natural language and not easily interpretable by humans. We find that agents that were initially pretrained to produce natural language can also experience detrimental…

Jason Lee, Kyunghyun Cho, Douwe Kiela

November 02, 2019

NLP

SPEECH & AUDIO

Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galan-García et al., 2016), and the deployment of…

Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston

May 08, 2018

COMPUTER VISION

SPEECH & AUDIO

Optimization Methods for Large-Scale Machine Learning

This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…

Leon Bottou, Frank E. Curtis, Jorge Nocedal

October 27, 2019

COMPUTER VISION

SPEECH & AUDIO

Live Face De-Identification in Video

We propose a method for face de-identification that enables fully automatic video modification at high frame rates. The goal is to maximally decorrelate the identity, while having the perception (pose, illumination and expression) fixed. We…

Oran Gafni, Lior Wolf, Yaniv Taigman
