RESEARCH

COMPUTER VISION

ClusterFit: Improving Generalization of Visual Representations

May 19, 2020

Abstract

Pre-training convolutional neural networks with weakly supervised and self-supervised strategies is becoming increasingly popular for several computer vision tasks. However, due to the lack of strong discriminative signals, these learned representations may overfit to the pre-training objective (e.g., hashtag prediction) and not generalize well to downstream tasks. In this work, we present a simple strategy – ClusterFit (CF) to improve the robustness of the visual representations learned during pre-training. Given a dataset, we (a) cluster its features extracted from a pretrained network using k-means and (b) re-train a new network from scratch on this dataset using cluster assignments as pseudo-labels. We empirically show that clustering helps reduce the pre-training task-specific information from the extracted features thereby minimizing overfitting to the same. Our approach is extensible to different pretraining frameworks – weak- and self-supervised, modalities – images and videos, and pre-training tasks – object and action classification. Through extensive transfer learning experiments on 11 different target datasets of varied vocabularies and granularities, we show that CF significantly improves the representation quality compared to the state-of-the-art large-scale (millions / billions) weakly-supervised image and video models and self-supervised image models.

Download the Paper

AUTHORS

Written by

Xueting Yan

Ishan Misra

Abhinav GuptaDeepti GhadiyaramDhruv Mahajan

Publisher

Conference on Computer Vision and Pattern Recognition (CVPR)

Research Topics

Computer Vision

Related Publications

June 15, 2019

COMPUTER VISION

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search | Facebook AI Research

Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture…

Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer

June 15, 2019

April 28, 2019

COMPUTER VISION

Inverse Path Tracing for Joint Material and Lighting Estimation | Facebook AI Research

Modern computer vision algorithms have brought significant advancement to 3D geometry reconstruction. However, illumination and material reconstruction remain less studied, with current approaches assuming very simplified models for materials…

Dejan Azinović, Tzu-Mao Li, Anton Kaplanyan, Matthias Nießner

April 28, 2019

June 14, 2019

COMPUTER VISION

Thinking Outside the Pool: Active Training Image Creation for Relative Attributes | Facebook AI Research

Current wisdom suggests more labeled image data is always better, and obtaining labels is the bottleneck. Yet curating a pool of sufficiently diverse and informative images is itself a challenge. In particular, training image curation is…

Aron Yu, Kristen Grauman

June 14, 2019

September 09, 2018

COMPUTER VISION

DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs | Facebook AI Research

Consumer depth sensors are more and more popular and come to our daily lives marked by its recent integration in the latest iPhone X. However, they still suffer from heavy noises which dramatically limit their applications. Although plenty of…

Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu

September 09, 2018

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.