RESEARCH

NLP

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

November 16, 2022

Abstract

Despite their wide adoption, the underlying training and memorization dynamics of very large language models are not well understood. We empirically study exact memorization in causal and masked language modeling, across model sizes and throughout the training process. We measure the effects of dataset size, learning rate, and model size on memorization, finding that larger language models memorize training data faster across all settings. Surprisingly, we show that larger models can memorize a larger portion of the data before overfitting and tend to forget less throughout the training process. We also analyze the memorization dynamics of different parts of speech and find that models memorize nouns and numbers first; we hypothesize and provide empirical evidence that nouns and numbers act as unique identifiers for individual training examples. Together, these findings present another piece of the broader puzzle of understanding what actually improves as models get bigger.
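
The exact-memorization measure the abstract refers to has a simple operational form: a token counts as memorized when the model's greedy (argmax) prediction, given its training context, reproduces the ground-truth token. Below is a minimal sketch of that measurement for a causal language model; it is not the authors' released code, and the GPT-2 checkpoint, HuggingFace transformers API, and toy input are placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the paper studies much larger causal and masked LMs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def exact_memorization(texts):
    """Fraction of next-token predictions whose greedy argmax matches the data."""
    correct, total = 0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        logits = model(input_ids=ids).logits       # (1, seq_len, vocab_size)
        preds = logits[:, :-1].argmax(dim=-1)      # prediction for token t+1 from prefix
        targets = ids[:, 1:]                       # ground-truth next tokens
        correct += (preds == targets).sum().item()
        total += targets.numel()
    return correct / total

# Toy example; in the paper this is tracked over training examples and checkpoints.
print(exact_memorization(["The quick brown fox jumps over the lazy dog."]))
```

Tracking this fraction over training steps, rather than held-out loss, is what lets one observe memorization and forgetting dynamics separately from overfitting.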


AUTHORS


Kushal Tirumala

Aram H. Markosyan

Armen Aghajanyan

Luke Zettlemoyer

Publisher

NeurIPS

Research Topics

Natural Language Processing (NLP)

Core Machine Learning

Related Publications

November 28, 2022

RESEARCH

CORE MACHINE LEARNING

Neural Attentive Circuits

Nicolas Ballas, Bernhard Schölkopf, Chris Pal, Francesco Locatello, Li Erran Li, Martin Weiss, Nasim Rahaman, Yoshua Bengio

November 27, 2022

RESEARCH

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

Andrea Tirinzoni, Aymen Al Marjani, Emilie Kaufmann

November 10, 2022

RESEARCH

COMPUTER VISION

Learning State-Aware Visual Representations from Audible Interactions

Unnat Jain, Abhinav Gupta, Himangi Mittal, Pedro Morgado

November 09, 2022

RESEARCH

VisCo Grids: Surface Reconstruction with Viscosity and Coarea Grids

Yaron Lipman, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, Lior Yariv
