June 14, 2020
Many visual scenes contain text that carries crucial information, and it is thus essential to understand text in images for downstream reasoning tasks. For example, a "deep water" label on a warning sign warns people about the danger in the…
Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach
April 25, 2020
The long-tail distribution of the visual world poses great challenges for deep learning-based classification models in handling the class imbalance problem.…
Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
April 25, 2020
Humans understand novel sentences by composing meanings and roles of core language components. In contrast, neural network models for natural language modeling fail when such compositional generalization is required. The main contribution of this paper is to hypothesize that language compositionality is a form of group-equivariance. Based on this hypothesis, we propose a set of tools for constructing equivariant sequence-to-sequence models. Through a variety of experiments on the SCAN tasks, we analyze the behavior of existing models under the lens of equivariance, and demonstrate that our equivariant architecture is able to achieve the type of compositional generalization required in human language understanding.
Jonathan Gordon, David Lopez-Paz, Marco Baroni, Diane Bouchacourt
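As a reading aid for the abstract above, the standard equivariance condition it builds on can be written in one line (the symbols f, G, g, and x here are generic notation, not taken from the paper):

    f(g \cdot x) = g \cdot f(x) \quad \text{for all } g \in G

Here f is the sequence-to-sequence model and G is a group acting on both inputs and outputs; for SCAN-style tasks one natural choice of G is permutations of interchangeable words (e.g., swapping "jump" and "walk"), under which the predicted action sequence should permute accordingly.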
September 15, 2019
Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words. In this paper, we show that character-based language models (LMs) can perform as well as word-based LMs for speech recognition, in word error…
Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert
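To make the OOV point in the snippet above concrete, here is a minimal sketch, not the paper's system (which scores hypotheses inside a beam-search decoder together with acoustic scores): a word-level LM can only score words it has seen in training, while a character-level LM assigns finite probability to any string. The toy corpus and the add-one-smoothed bigram model are illustrative assumptions.

# Illustrative sketch: character-level LMs have no out-of-vocabulary (OOV)
# problem, because they assign probability to any character string.
import math
from collections import Counter, defaultdict

corpus = ["the cat sat", "the dog sat", "the cat ran"]  # toy training data

# Word-level unigram LM: unseen words receive zero probability (-inf log prob).
word_counts = Counter(w for line in corpus for w in line.split())
total_words = sum(word_counts.values())

def word_logprob(word):
    count = word_counts.get(word, 0)
    return math.log(count / total_words) if count else float("-inf")

# Character-level bigram LM with add-one smoothing: every string is scorable.
pair_counts = defaultdict(Counter)
charset = set()
for line in corpus:
    text = "^" + line + "$"  # sentence boundary markers
    charset.update(text)
    for a, b in zip(text, text[1:]):
        pair_counts[a][b] += 1

def char_logprob(word):
    text = "^" + word + "$"
    logp = 0.0
    for a, b in zip(text, text[1:]):
        num = pair_counts[a][b] + 1  # add-one smoothing
        den = sum(pair_counts[a].values()) + len(charset)
        logp += math.log(num / den)
    return logp

print(word_logprob("dag"))  # -inf: the word LM cannot score an OOV word
print(char_logprob("dag"))  # finite: the character LM still assigns mass

In lexicon-free decoding this property means the recognizer is never forced to map an unseen word to a catch-all unknown token.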