Semantic Audio-Visual Navigation

April 21, 2021

Abstract

Recent work on audio-visual navigation assumes a constantly sounding target and restricts the role of audio to signaling the target's position. We introduce semantic audio-visual navigation, where objects in the environment make sounds consistent with their semantic meaning (e.g., toilet flushing, door creaking) and acoustic events are sporadic or short in duration. We propose a transformer-based model to tackle this new semantic AudioGoal task, incorporating an inferred goal descriptor that captures both spatial and semantic properties of the target. Our model’s persistent multimodal memory enables it to reach the goal even long after the acoustic event stops. In support of the new task, we also expand the SoundSpaces audio simulations to provide semantically grounded sounds for an array of objects in Matterport3D. Our method strongly outperforms existing audio-visual navigation methods by learning to associate semantic, acoustic, and visual cues. Project page: http://vision.cs.utexas.edu/projects/semantic-audio-visual-navigation.

AUTHORS

Written by

Changan Chen

Ziad Al-Halah

Kristen Grauman

Publisher

CVPR 2021

Research Topics

Natural Language Processing

Integrity
