NLP

ML APPLICATIONS

AD-Drop: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

October 31, 2022

Abstract

Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-Drop), which randomly discards some high-attribution positions to encourage the model to make predictions by relying more on low-attribution positions, thereby reducing overfitting. We also develop a cross-tuning strategy that alternates fine-tuning and AD-Drop to avoid dropping high-attribution positions excessively. Extensive experiments on various benchmarks show that AD-Drop yields consistent improvements over baselines. Analysis further confirms that AD-Drop serves as a strategic regularizer to prevent overfitting during fine-tuning.
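To make the masking step concrete, here is a minimal, hypothetical PyTorch sketch of the core idea: rank attention positions by an attribution score, treat the top fraction as drop candidates, and randomly mask a subset of them before the softmax. The function name `ad_drop_mask`, the gradient-free attribution placeholder, and the hyperparameter values are illustrative assumptions, not the paper's exact implementation (which derives attributions via self-attention attribution and applies a cross-tuning schedule).

```python
import torch

def ad_drop_mask(attribution: torch.Tensor,
                 candidate_ratio: float = 0.3,
                 drop_rate: float = 0.3) -> torch.Tensor:
    """Return a boolean mask that is True at attention positions to drop.

    attribution: [batch, heads, query_len, key_len] attribution scores
        (a simple proxy such as |gradient x attention| could stand in for
        the integrated-gradient attribution used in the paper).
    candidate_ratio: fraction of positions per head treated as high-attribution.
    drop_rate: probability of actually dropping each candidate position.
    """
    b, h, q, k = attribution.shape
    flat = attribution.reshape(b, h, q * k)
    n_candidates = max(1, int(candidate_ratio * q * k))
    # Highest-attribution positions are the drop candidates.
    top_idx = flat.topk(n_candidates, dim=-1).indices
    # Randomly select a subset of the candidates to drop.
    drop = (torch.rand(top_idx.shape, device=attribution.device) < drop_rate).float()
    mask = torch.zeros_like(flat)
    mask.scatter_(-1, top_idx, drop)
    return mask.reshape(b, h, q, k).bool()

if __name__ == "__main__":
    # Toy usage: mask high-attribution positions before the softmax.
    scores = torch.randn(2, 4, 8, 8)      # raw attention logits
    attribution = scores.abs()            # placeholder attribution scores
    mask = ad_drop_mask(attribution)
    attn = scores.masked_fill(mask, -1e4).softmax(dim=-1)
    print(attn.shape)                     # torch.Size([2, 4, 8, 8])
```

In this sketch the mask is applied as a large negative bias on the attention logits, so the dropped positions receive near-zero attention weight; the model is thus pushed to rely on the remaining, lower-attribution positions during fine-tuning.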

AUTHORS

Qifan Wang

Shaoliang Nie

Jinghao Deng

Tao Yang

Xiaojun Quan

Publisher

NeurIPS

Research Topics

Natural Language Processing (NLP)

Core Machine Learning

Related Publications

November 28, 2022

RESEARCH

CORE MACHINE LEARNING

Neural Attentive Circuits

Nicolas Ballas, Bernhard Schölkopf, Chris Pal, Francesco Locatello, Li Erran Li, Martin Weiss, Nasim Rahaman, Yoshua Bengio

November 16, 2022

RESEARCH

NLP

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

Kushal Tirumala, Aram H. Markosyan, Armen Aghajanyan, Luke Zettlemoyer

November 10, 2022

RESEARCH

COMPUTER VISION

Learning State-Aware Visual Representations from Audible Interactions

Unnat Jain, Abhinav Gupta, Himangi Mittal, Pedro Morgado

November 08, 2022

THEORY

RESEARCH

Beyond neural scaling laws: beating power law scaling via data pruning

Ari Morcos, Shashank Shekhar, Surya Ganguli, Ben Sorscher, Robert Geirhos
