Integrity

NLP

Semantic Audio-Visual Navigation

April 21, 2021

Abstract

Recent work on audio-visual navigation assumes a constantly-sounding target and restricts the role of audio to signaling the target's position. We introduce semantic audio-visual navigation, where objects in the environment make sounds consistent with their semantic meaning (e.g., toilet flushing, door creaking) and acoustic events are sporadic or short in duration. We propose a transformer-based model to tackle this new semantic AudioGoal task, incorporating an inferred goal descriptor that captures both spatial and semantic properties of the target. Our model’s persistent multimodal memory enables it to reach the goal even long after the acoustic event stops. In support of the new task, we also expand the SoundSpaces audio simulations to provide semantically grounded sounds for an array of objects in Matterport3D. Our method strongly outperforms existing audio-visual navigation methods by learning to associate semantic, acoustic, and visual cues. Project page: http://vision.cs.utexas.edu/projects/semantic-audio-visual-navigation.
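The idea of reaching the goal after the sound has stopped can be illustrated with a small sketch. This is not the paper's transformer model: it is a hand-written toy, and every name in it (`GoalDescriptor`, `update_descriptor`, `run_episode`) is hypothetical. It assumes the goal descriptor is a target offset plus a semantic label, that positions live in a fixed world-aligned frame (agent rotation is ignored), and that the agent's displacement is known at each step.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GoalDescriptor:
    # Hypothetical fields: target offset from the agent and a semantic label.
    dx: float
    dy: float
    category: str


def update_descriptor(
    prev: Optional[GoalDescriptor],
    audio_estimate: Optional[GoalDescriptor],
    agent_motion: Tuple[float, float],
) -> Optional[GoalDescriptor]:
    """Return the goal descriptor for the current step.

    audio_estimate is None when no sound is heard at this step.
    agent_motion is the agent's (dx, dy) displacement since the last step.
    """
    if audio_estimate is not None:
        # Sound is audible: use the fresh estimate.
        return audio_estimate
    if prev is None:
        # Nothing heard yet, so there is no goal to carry forward.
        return None
    # Sound has stopped: keep the semantic label and shift the stored
    # offset by the agent's own motion so it still points at the target.
    mx, my = agent_motion
    return GoalDescriptor(prev.dx - mx, prev.dy - my, prev.category)


def run_episode(
    steps: List[Tuple[Optional[GoalDescriptor], Tuple[float, float]]]
) -> List[Optional[GoalDescriptor]]:
    """Apply update_descriptor over an episode and record each step's goal."""
    memory: List[Optional[GoalDescriptor]] = []
    goal: Optional[GoalDescriptor] = None
    for audio_estimate, agent_motion in steps:
        goal = update_descriptor(goal, audio_estimate, agent_motion)
        memory.append(goal)
    return memory
```

For example, if a flush is heard once at offset (3, 4) and the agent then moves by (1, 0) and (1, 2) in silence, the stored descriptor keeps the "toilet" label while its offset shrinks to (2, 4) and then (1, 2). A learned model would replace these hand-written rules with inferred features and attention over its memory.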

AUTHORS

Changan Chen

Ziad Al-Halah

Kristen Grauman

Publisher

CVPR 2021

Related Publications

April 08, 2021

Responsible AI

Integrity

Towards measuring fairness in AI: the Casual Conversations dataset

Caner Hazirbas, Joanna Bitton, Brian Dolhansky, Jacqueline Pan, Albert Gordo, Cristian Canton Ferrer

February 07, 2020

Integrity

Generate, Segment and Refine: Towards Generic Manipulation Segmentation

Peng Zhou, Bor-Chun Chen, Xintong Han, Mahyar Najibi, Abhinav Shrivastava, Ser-Nam Lim, Larry S. Davis

February 24, 2018

Speech & Audio

Computer Vision

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

Kim Hazelwood, Sarah Bird, David Brooks, Soumith Chintala, Utku Diril, Dmytro Dzhulgakov, Mohamed Fawzy, Bill Jia, Yangqing Jia, Aditya Kalro, James Law, Kevin Lee, Jason Lu, Pieter Noordhuis, Misha Smelyanskiy, Liang Xiong, Xiaodong Wang

April 30, 2018

Computer Vision

Integrity

Countering Adversarial Images Using Input Transformations

Chuan Guo, Mayank Rana, Moustapha Cisse, Laurens van der Maaten

November 02, 2019

NLP

Speech & Audio

Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston

May 31, 2019

Integrity

Abusive Language Detection with Graph Convolutional Networks

Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis, Ekaterina Shutova

June 15, 2019

Computer Vision

Integrity

Feature Denoising for Improving Adversarial Robustness

Kaiming He, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Cihang Xie
