COMPUTER VISION

ML APPLICATIONS

Self-Supervised Learning of Pretext-Invariant Representations

June 19, 2020

Abstract

The goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations. Many pretext tasks lead to representations that are covariant with image transformations. We argue that, instead, semantic representations ought to be invariant under such transformations. Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as “pearl”) that learns invariant representations based on pretext tasks. We use PIRL with a commonly used pretext task that involves solving jigsaw puzzles. We find that PIRL substantially improves the semantic quality of the learned image representations. Our approach sets a new state-of-the-art in self-supervised learning from images on several popular benchmarks for self-supervised learning. Despite being unsupervised, PIRL outperforms supervised pre-training in learning image representations for object detection. Altogether, our results demonstrate the potential of self-supervised representations with good invariance properties.
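The core idea in the abstract — pulling the representation of a transformed image (e.g. a shuffled jigsaw version) toward the representation of the original, against negatives — can be illustrated with a minimal noise-contrastive loss. This is a sketch of the general invariance objective, not the paper's exact implementation; the vector dimensions, temperature value, and function names here are illustrative assumptions, and a real encoder network stands in for the random vectors.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two representation vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nce_loss(v_img, v_transformed, negatives, tau=0.07):
    """Noise-contrastive loss: the representation of the transformed image
    should match its own image's representation more closely than any
    negative drawn from other images (e.g. a memory bank)."""
    pos = np.exp(cosine(v_img, v_transformed) / tau)
    neg = sum(np.exp(cosine(v_transformed, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
v_img = rng.normal(size=128)                      # stands in for encoder(image)
negatives = [rng.normal(size=128) for _ in range(9)]

# An invariant encoder maps image and transformed image to nearby vectors,
# so the loss is low; a covariant (unrelated) output gives a high loss.
loss_invariant = nce_loss(v_img, v_img + 0.01 * rng.normal(size=128), negatives)
loss_covariant = nce_loss(v_img, rng.normal(size=128), negatives)
assert loss_invariant < loss_covariant
```

Minimizing this loss over many images is what drives the encoder toward transformation-invariant representations; training a pretext classifier on the transformation itself would instead make them covariant.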


AUTHORS

Written by

Ishan Misra

Laurens van der Maaten

Publisher

Conference on Computer Vision and Pattern Recognition (CVPR)

Research Topics

Computer Vision

Related Publications

June 18, 2018

NLP

COMPUTER VISION

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering | Facebook AI Research

A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the…

Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi


June 17, 2018

NLP

COMPUTER VISION

Neural Baby Talk | Facebook AI Research

We introduce a novel framework for image captioning that can produce natural language explicitly grounded in entities that object detectors find in the image. Our approach reconciles classical slot filling approaches (that are generally better…

Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh


June 18, 2018

COMPUTER VISION

On the iterative refinement of densely connected representation levels for semantic segmentation | Facebook AI Research

State-of-the-art semantic segmentation approaches increase the receptive field of their models by using either a downsampling path composed of poolings/strided convolutions or successive dilated convolutions. However, it is not clear which…

Arantxa Casanova, Guillem Cucurull, Michal Drozdzal, Adriana Romero, Yoshua Bengio


June 18, 2018

SPEECH & AUDIO

COMPUTER VISION

Non-Local Neural Networks | Facebook AI Research

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired…

Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

