Learning to Set Waypoints for Audio-Visual Navigation

May 3, 2021


In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations. We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements: 1) waypoints that are dynamically set and learned end-to-end within the navigation policy, and 2) an acoustic memory that provides a structured, spatially grounded record of what the agent has heard as it moves. Both new ideas capitalize on the synergy of audio and visual data for revealing the geometry of an unmapped space. We demonstrate our approach on two challenging datasets of real-world 3D scenes, Replica and Matterport3D. Our model improves the state of the art by a substantial margin, and our experiments reveal that learning the links between sights, sounds, and space is essential for audio-visual navigation.
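To make the idea of a "structured, spatially grounded record of what the agent has heard" more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: the class name `AcousticMemory`, the 2D grid layout, the `cell_size` parameter, and the use of a running mean of a scalar intensity per cell are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class AcousticMemory:
    """Hypothetical grid map: each visited cell stores the mean audio intensity heard there."""

    cell_size: float = 1.0  # assumed grid resolution, in the same units as x and y
    _sum: Dict[Cell, float] = field(default_factory=dict)
    _count: Dict[Cell, int] = field(default_factory=dict)

    def _cell(self, x: float, y: float) -> Cell:
        # Floor division maps a continuous position to a discrete grid cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def record(self, x: float, y: float, intensity: float) -> None:
        """Store one audio observation at the agent's current position."""
        cell = self._cell(x, y)
        self._sum[cell] = self._sum.get(cell, 0.0) + intensity
        self._count[cell] = self._count.get(cell, 0) + 1

    def intensity_at(self, x: float, y: float) -> Optional[float]:
        """Mean intensity recorded in the cell containing (x, y), or None if unvisited."""
        cell = self._cell(x, y)
        if cell not in self._count:
            return None
        return self._sum[cell] / self._count[cell]

    def loudest_cell(self) -> Optional[Cell]:
        """Visited cell with the highest mean intensity, or None if nothing was recorded."""
        if not self._count:
            return None
        return max(self._count, key=lambda c: self._sum[c] / self._count[c])


# Usage: the agent hears the sound grow louder as it moves along the x axis.
memory = AcousticMemory(cell_size=0.5)
memory.record(0.2, 0.1, 0.3)
memory.record(1.2, 0.1, 0.7)
memory.record(1.3, 0.2, 0.9)
```

In this toy version, a policy could read the map to see where the sound was loudest so far. In the approach described above, such a memory is an input to a learned navigation policy rather than a hand-written rule.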


Written by

Changan Chen

Sagnik Majumder

Ziad Al-Halah

Ruohan Gao

Santhosh K. Ramakrishnan

Kristen Grauman


ICLR 2021

Research Topics

Reinforcement Learning


Related Publications

December 07, 2020


Joint Policy Search for Collaborative Multi-agent Imperfect Information Games

To learn good joint policies for multi-agent collaboration with imperfect information remains a fundamental challenge. While for two-player zero-sum games, coordinate-ascent approaches…

Stéphane d’Ascoli, Levent Sagun, Giulio Biroli


December 18, 2020



Reinforcement Learning-based Product Delivery Frequency Control

Frequency control is an important problem in modern recommender systems. It dictates the delivery frequency of recommendations to maintain product quality and efficiency.…

Yang Liu, Zhengxing Chen, Kittipat Virochsiri, Juan Wang, Jiahao Wu, Feng Liang


December 05, 2020


An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit the structure of the problem and have been shown to be asymptotically suboptimal. …

Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric


October 10, 2020



Active MR k-space Sampling with Reinforcement Learning

Deep learning approaches have recently shown great promise in accelerating magnetic resonance image (MRI) acquisition. The majority of existing work has focused on designing better reconstruction models…

Luis Pineda, Sumana Basu, Adriana Romero, Roberto Calandra, Michal Drozdzal

