GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce

August 22, 2020


In this paper, we present GrokNet, a deployed image recognition system for commerce applications. GrokNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We achieve this by training on 7 datasets across several commerce verticals, using 80 categorical loss functions and 3 embedding losses. We share our experience of combining diverse sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. GrokNet has demonstrated gains in production applications and operates at Facebook scale.
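As a rough illustration of the multi-task setup described above, the sketch below shows a shared feature vector (standing in for the trunk output) feeding several categorical heads, whose losses are combined as a weighted sum. This is a minimal sketch under stated assumptions, not the GrokNet implementation: the task names, head weights, and loss weights are all hypothetical, the abstract does not say how the losses are weighted or combined, and the 3 embedding losses are left out entirely.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one example, given raw class scores and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def linear_head(features, weights):
    """One task head: one weight row per class, applied to the shared features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def multi_task_loss(features, heads, labels, task_weights):
    """Weighted sum of categorical losses over the tasks that have a label."""
    total = 0.0
    for task, label in labels.items():
        logits = linear_head(features, heads[task])
        total += task_weights[task] * softmax_cross_entropy(logits, label)
    return total


# Hypothetical toy setup: a 2-dimensional "trunk" output and two categorical tasks.
features = [1.0, 0.0]
heads = {
    "category": [[0.0, 0.0], [0.0, 0.0]],                      # 2 classes
    "color": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], # 4 classes
}
labels = {"category": 1, "color": 2}
task_weights = {"category": 1.0, "color": 0.5}  # assumed weights, not from the paper

loss = multi_task_loss(features, heads, labels, task_weights)
```

Because only tasks present in `labels` contribute, an example drawn from a dataset that annotates a single attribute still yields a valid loss. This is one simple way to mix datasets with different label sets, and it is shown here only as an assumption about how such training could be arranged.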



Written by

Sean Bell

Yiqun Liu

Sami Alsheikh

Yina Tang

Ed Pizzi

M. Henning

Karun Singh

Omkar Parkhi

Fedor Borisyuk


