COMPUTER VISION

Learning State-Aware Visual Representations from Audible Interactions

November 10, 2022

Abstract

We propose a self-supervised algorithm to learn representations from egocentric video data. Recently, significant efforts have been made to capture humans interacting with their own environments as they go about their daily activities. As a result, several large egocentric datasets of interaction-rich multi-modal data have emerged. However, learning representations from videos can be challenging. First, given the uncurated nature of long-form continuous videos, learning effective representations requires focusing on moments in time when interactions take place. Second, visual representations of daily activities should be sensitive to changes in the state of the environment, yet current successful multi-modal learning frameworks encourage representation invariance over time. To address these challenges, we leverage audio signals to identify moments of likely interaction, which are conducive to better learning. We also propose a novel self-supervised objective that learns from audible state changes caused by interactions. We validate these contributions extensively on two large-scale egocentric datasets, EPIC-Kitchens-100 and the recently released Ego4D, and show improvements on several downstream tasks, including action recognition, long-term action anticipation, and object state change classification.
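The abstract names two ingredients: using audio to find moments of likely interaction, and an objective that learns from audible state changes. The sketch below is an illustrative assumption, not the authors' implementation. It uses peaks in short-time audio energy as a hypothetical stand-in for interaction detection, and an InfoNCE-style contrastive loss that matches the change between "before" and "after" visual embeddings to the audio embedding of the same moment. All function names, the energy heuristic, the frame sizes, and the temperature are hypothetical choices made for illustration.

```python
import numpy as np


def short_time_log_energy(audio, frame_len, hop):
    """Log energy of each frame_len-sample frame of a mono waveform, stepping by hop samples."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-8)


def candidate_interaction_frames(audio, frame_len=400, hop=160, min_gap=50, k=3):
    """HYPOTHETICAL heuristic (not the paper's detector): indices of the k
    highest-energy frames, kept at least min_gap frames apart, used here as a
    stand-in for 'moments of likely interaction'."""
    energy = short_time_log_energy(audio, frame_len, hop)
    picked = []
    for idx in np.argsort(energy)[::-1]:  # loudest frames first
        if all(abs(int(idx) - p) >= min_gap for p in picked):
            picked.append(int(idx))
        if len(picked) == k:
            break
    return sorted(picked)


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE loss: row i of `queries` should match row i of `keys`;
    every other row in the batch acts as a negative."""
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-8)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def state_change_loss(vis_before, vis_after, audio_emb, temperature=0.1):
    """HYPOTHETICAL state-change objective: the change in the visual embedding
    across an interaction (after - before) is contrasted against the audio
    embedding of the same moment, so the visual features must encode the change."""
    return info_nce(vis_after - vis_before, audio_emb, temperature)
```

Contrasting the difference of the two visual embeddings, rather than either embedding alone, is one simple way to make the objective reward sensitivity to change over time instead of invariance. In a real pipeline the embeddings would come from trainable video and audio encoders, and the loss would be minimized with an autodiff framework rather than NumPy.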

AUTHORS

Unnat Jain

Abhinav Gupta

Himangi Mittal

Pedro Morgado

Publisher

NeurIPS

Research Topics

Computer Vision

Related Publications

March 09, 2023

COMPUTER VISION

The Casual Conversations v2 Dataset

Bilal Porgali, Vítor Albiero, Jordan Ryda, Cristian Canton Ferrer, Caner Hazirbas

February 21, 2023

COMPUTER VISION

CORE MACHINE LEARNING

ArchRepair: Block-Level Architecture-Oriented Repairing for Deep Neural Networks

Felix Xu, Fuyuan Zhang, Hua Qi, Jianjun Zhao, Jianlang Chen, Lei Ma, Qing Guo, Zhijie Wang

January 10, 2023

COMPUTER VISION

CORE MACHINE LEARNING

Online Backfilling with No Regret for Large-Scale Image Retrieval

Gokhan Uzunbas, Joena Zhang, Sara Cao, Ser-Nam Lim, Taipeng Tian, Bohyung Han, Seonguk Seo

January 04, 2023

COMPUTER VISION

CORE MACHINE LEARNING

Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales

Xi Liu, Panganamala Kumar, Ruida Zhou, Tao Liu
