May 6, 2019
In this work we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse-reward tasks. The agent is split into low-level policies and a high-level policy. The low-level policies access only the internal, proprioceptive dimensions of the state observation. They are trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, each low-level policy is induced to be periodic with the use of a “phase function.” The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.
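As a rough illustration of the low-level setup described above, the sketch below splits an observation into proprioceptive and non-proprioceptive parts, rewards change in the non-proprioceptive part, and builds a periodic phase signal. It is not the authors' implementation. The function names, the L2 norm as the measure of change, the sine/cosine form of the phase function, and the choice to append the phase to the policy input are all assumptions made for illustration.

```python
import numpy as np


def split_observation(obs, proprio_idx):
    """Split a 1-D observation into proprioceptive and external parts.

    `proprio_idx` (hypothetical) lists the indices of the internal,
    proprioceptive dimensions; everything else is treated as external.
    """
    mask = np.zeros(obs.shape[0], dtype=bool)
    mask[proprio_idx] = True
    return obs[mask], obs[~mask]


def low_level_reward(ext_prev, ext_next):
    """Reward change in the non-proprioceptive dimensions.

    Assumption: change is measured with the L2 norm of the difference.
    """
    return float(np.linalg.norm(ext_next - ext_prev))


def phase_features(t, period):
    """Hypothetical phase function: position within a cycle as (sin, cos)."""
    angle = 2.0 * np.pi * (t % period) / period
    return np.array([np.sin(angle), np.cos(angle)])


def low_level_input(obs, proprio_idx, t, period):
    """Input for a low-level policy: proprioceptive dims plus phase.

    Assumption: the phase signal is appended to the policy input.
    """
    proprio, _ = split_observation(obs, proprio_idx)
    return np.concatenate([proprio, phase_features(t, period)])
```

In this sketch the high-level policy would only need to output an index that selects which low-level policy to run. How that choice is learned from the sparse task reward is outside what the sketch covers.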
Research Topics
Human & Machine Intelligence
November 04, 2019
Emergent multi-agent communication protocols are very different from natural language and not easily interpretable by humans. We find that agents that were initially pretrained to produce natural language can also experience detrimental…
Jason Lee, Kyunghyun Cho, Douwe Kiela
November 02, 2019
The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galan-García et al., 2016), and the deployment of…
Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston
May 08, 2018
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural…
Leon Bottou, Frank E. Curtis, Jorge Nocedal
October 27, 2019
We propose a method for face de-identification that enables fully automatic video modification at high frame rates. The goal is to maximally decorrelate the identity, while having the perception (pose, illumination and expression) fixed. We…
Oran Gafni, Lior Wolf, Yaniv Taigman