CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text

November 05, 2019

Abstract

The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way. In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems. Motivated by classic work on inductive logic programming, CLUTRR requires that an NLU system infer kinship relations between characters in short stories. Successful performance on this task requires both extracting relationships between entities and inferring the logical rules governing these relationships. CLUTRR allows us to precisely measure a model's capacity for systematic generalization by evaluating on held-out combinations of logical rules, and it allows us to evaluate a model's robustness by adding curated noise facts. Our empirical results highlight a substantial performance gap between state-of-the-art NLU models (e.g., BERT and MAC) and a graph neural network model that works directly with symbolic inputs, with the graph-based model exhibiting both stronger generalization and greater robustness.
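To make the kind of reasoning described in the abstract concrete, the following is a minimal sketch of composing kinship relations along a chain of facts. It is an illustration only and is not taken from the paper or the CLUTRR code: the rule table `RULES`, the function `compose_chain`, and the convention that a fact `(A, r, B)` reads "A is the r of B" are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical composition table (not the benchmark's actual rule set).
# (r1, r2) -> r means: if A is the r1 of B and B is the r2 of C,
# then A is the r of C.
RULES: Dict[Tuple[str, str], str] = {
    ("father", "father"): "grandfather",
    ("father", "mother"): "grandfather",
    ("mother", "father"): "grandmother",
    ("mother", "mother"): "grandmother",
    ("brother", "father"): "uncle",
    ("brother", "mother"): "uncle",
    ("sister", "father"): "aunt",
    ("sister", "mother"): "aunt",
    ("grandfather", "father"): "great-grandfather",
    ("grandfather", "mother"): "great-grandfather",
}


def compose_chain(relations: List[str]) -> Optional[str]:
    """Fold a chain of relations left to right using RULES.

    relations[i] is the relation of entity i to entity i+1.
    Returns the inferred relation of the first entity to the last,
    or None if the chain is empty or some step has no rule.
    """
    if not relations:
        return None
    current = relations[0]
    for nxt in relations[1:]:
        combined = RULES.get((current, nxt))
        if combined is None:
            return None  # no rule covers this combination
        current = combined
    return current


# "Bob is the brother of Carl. Carl is the father of Dana."
print(compose_chain(["brother", "father"]))           # uncle
# A longer chain needs the same rules applied repeatedly.
print(compose_chain(["father", "father", "father"]))  # great-grandfather
```

In this toy setting, testing on held-out combinations of rules would amount to evaluating on chains whose rule pairings or lengths were never seen together during training, while noise facts would be extra relations in the story that do not lie on the chain linking the two queried characters.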

AUTHORS

Written by

Koustuv Sinha

Joelle Pineau

Will Hamilton

Jin Dong

Shagun Sodhani

Publisher

EMNLP
