Listen to Look: Action Recognition by Previewing Audio

June 14, 2020

Abstract

In the face of the video data deluge, today’s expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an IMGAUD2VID framework that hallucinates clip-level features by distilling from lighter modalities (a single frame and its accompanying audio), reducing short-term temporal redundancy for efficient clip-level recognition. Second, building on IMGAUD2VID, we further propose IMGAUD-SKIMMING, an attention-based long short-term memory network that iteratively selects useful moments in untrimmed videos, reducing long-term temporal redundancy for efficient video-level recognition. Extensive experiments on four action recognition datasets demonstrate that our method achieves state-of-the-art results in both recognition accuracy and speed.
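The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the loss or the attention form, so the mean squared error objective, the dot-product attention, and all function names below (`distillation_loss`, `attention_weights`, `select_moment`) are assumptions chosen for illustration. The first function compares a cheap "student" feature (imagined here as coming from one frame plus audio) with an expensive "teacher" clip feature; the other two score candidate moments against a query and pick the highest-weighted one.

```python
import math


def distillation_loss(student_feat, teacher_feat):
    """Mean squared error between a student feature and a teacher feature.

    Assumption: a plain MSE objective stands in for whatever distillation
    loss the paper actually uses.
    """
    assert len(student_feat) == len(teacher_feat)
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def attention_weights(query, keys):
    """Softmax over dot-product scores between one query and each key.

    Assumption: unscaled dot-product attention, used only as an example.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_moment(query, keys):
    """Return the index of the moment with the largest attention weight."""
    weights = attention_weights(query, keys)
    return max(range(len(weights)), key=weights.__getitem__)


if __name__ == "__main__":
    # Hypothetical 2-D features for three candidate moments in a video.
    moment_feats = [[0.0, 1.0], [2.0, 0.0], [1.0, 0.0]]
    query = [1.0, 0.0]
    print(select_moment(query, moment_feats))
    print(distillation_loss([1.0, 2.0], [1.0, 4.0]))
```

In a full system the query would presumably come from the recurrent state of the skimming network and the selection would repeat over several steps; this sketch shows only a single selection step on made-up features.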

AUTHORS

Written by

Ruohan Gao

Tae-Hyun Oh

Kristen Grauman

Lorenzo Torresani

Publisher

Conference on Computer Vision and Pattern Recognition (CVPR)

Research Topics

Computer Vision

Related Publications

June 15, 2019

COMPUTER VISION

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture…

Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer

April 28, 2019

COMPUTER VISION

Inverse Path Tracing for Joint Material and Lighting Estimation

Modern computer vision algorithms have brought significant advancement to 3D geometry reconstruction. However, illumination and material reconstruction remain less studied, with current approaches assuming very simplified models for materials…

Dejan Azinović, Tzu-Mao Li, Anton Kaplanyan, Matthias Nießner

June 14, 2019

COMPUTER VISION

Thinking Outside the Pool: Active Training Image Creation for Relative Attributes

Current wisdom suggests more labeled image data is always better, and obtaining labels is the bottleneck. Yet curating a pool of sufficiently diverse and informative images is itself a challenge. In particular, training image curation is…

Aron Yu, Kristen Grauman

September 09, 2018

COMPUTER VISION

DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs

Consumer depth sensors are increasingly popular and have entered daily life, marked by their recent integration in the latest iPhone X. However, they still suffer from heavy noise, which dramatically limits their applications. Although plenty of…

Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu
