COMPUTER VISION

ImageBind: One Embedding Space To Bind Them All

May 09, 2023

Abstract

We present ImageBind, an approach to learn a joint embedding across six different modalities: images, text, audio, depth, thermal, and IMU data. We show that not all combinations of paired data are necessary to train such a joint embedding; image-paired data alone is sufficient to bind the modalities together. ImageBind can leverage recent large-scale vision-language models and extends their zero-shot capabilities to new modalities simply by using those modalities' natural pairing with images. It enables novel emergent applications ‘out-of-the-box’, including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection, and cross-modal generation. The emergent capabilities improve with the strength of the image encoder, and we set a new state of the art on emergent zero-shot recognition tasks across modalities, outperforming specialist supervised models. Finally, we show strong few-shot recognition results that outperform prior work, and we show that ImageBind serves as a new way to evaluate vision models on visual and non-visual tasks.
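The core idea above, aligning each modality to images in one shared space and then comparing or combining embeddings there, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a symmetric InfoNCE-style contrastive loss (a standard choice for image-paired alignment), an assumed temperature of 0.07, and hypothetical toy 2-D vectors standing in for encoder outputs. The helper names (`info_nce`, `symmetric_loss`, `zero_shot_classify`, `compose`) are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries: Sequence[Vector], keys: Sequence[Vector],
             temperature: float = 0.07) -> float:
    """Mean InfoNCE loss: queries[i] and keys[i] form the positive pair,
    and every keys[j] with j != i acts as a negative for queries[i].
    The temperature value is an assumption for this sketch."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive
    return total / len(q)


def symmetric_loss(image_emb: Sequence[Vector], other_emb: Sequence[Vector],
                   temperature: float = 0.07) -> float:
    """Image->other plus other->image loss; only image-paired data is used."""
    return (info_nce(image_emb, other_emb, temperature)
            + info_nce(other_emb, image_emb, temperature))


def zero_shot_classify(query: Vector, class_text_embs: Sequence[Vector]) -> int:
    """Index of the class-prompt text embedding most similar to the query."""
    q = l2_normalize(query)
    scores = [dot(q, l2_normalize(t)) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def compose(*embeddings: Vector) -> Vector:
    """Embedding arithmetic: sum unit-normalized embeddings, then renormalize."""
    normed = [l2_normalize(e) for e in embeddings]
    summed = [sum(column) for column in zip(*normed)]
    return l2_normalize(summed)


if __name__ == "__main__":
    # Hypothetical toy embeddings for two images and their paired audio clips.
    images = [[1.0, 0.0], [0.0, 1.0]]
    audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
    audio_swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(symmetric_loss(images, audio_aligned) < symmetric_loss(images, audio_swapped))
    print(zero_shot_classify([0.2, 0.8], [[1.0, 0.0], [0.0, 1.0]]))
    print(compose([1.0, 0.0], [0.0, 1.0]))
```

In this sketch the loss only ever pairs images with one other modality at a time; any alignment between, say, audio and text would be a by-product of both being pulled toward the same image embeddings, which mirrors the "emergent" behavior the abstract describes. Zero-shot recognition and modality composition then reduce to cosine similarity and vector addition in the shared space.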


AUTHORS


Rohit Girdhar

Alaa El-Nouby

Zhuang Liu

Mannat Singh

Kalyan Vasudev Alwala

Armand Joulin

Ishan Misra

Publisher

CVPR

Research Topics

Computer Vision

