AD-Drop: Attribution Driven Dropout for Robust Language Model Finetuning

October 31, 2022

Abstract

Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-Drop), which randomly discards some high-attribution positions, encouraging the model to rely more on low-attribution positions and thereby reducing overfitting. We also develop a cross-tuning strategy that alternates fine-tuning and AD-Drop to avoid dropping high-attribution positions excessively. Extensive experiments on various benchmarks show that AD-Drop yields consistent improvements over baselines. Analysis further confirms that AD-Drop serves as a strategic regularizer to prevent overfitting during fine-tuning.
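As a rough illustration of the idea, the sketch below builds an attribution-driven dropout mask in PyTorch. It is a minimal sketch, not the authors' released implementation: it assumes per-position attribution is approximated by attention weights times their gradients (a common proxy for self-attention attribution), and the function name, tensor shapes, and ratio parameters are hypothetical.

```python
import torch

def ad_drop_mask(attn, attn_grad, drop_ratio=0.3, candidate_ratio=0.3):
    """Hypothetical sketch: mask a random subset of high-attribution positions.

    attn, attn_grad: (batch, heads, seq, seq) attention weights and their
    gradients w.r.t. the task loss, taken from a preliminary forward/backward pass.
    Returns a 0/1 mask of the same shape (0 = dropped, 1 = kept).
    """
    # Approximate per-position attribution as attention * gradient.
    attribution = attn * attn_grad
    # Mark the top-k highest-attribution positions in each attention row as candidates.
    k = max(1, int(candidate_ratio * attribution.size(-1)))
    _, top_idx = attribution.topk(k, dim=-1)
    candidate = torch.zeros_like(attribution).scatter_(-1, top_idx, 1.0)
    # Randomly drop a fraction of those high-attribution candidates.
    drop = candidate * (torch.rand_like(attribution) < drop_ratio).float()
    return 1.0 - drop  # multiply into the attention weights before the value projection
```

Under the cross-tuning strategy described above, such a mask would be applied only in alternate fine-tuning rounds, interleaved with standard fine-tuning, so that high-attribution positions are not suppressed throughout training.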

AUTHORS

Qifan Wang

Shaoliang Nie

Jinghao Deng

Tao Yang

Xiaojun Quan

Publisher

NeurIPS

Research Topics

Natural Language Processing (NLP)

Core Machine Learning

Related Publications

February 24, 2023

NLP

LLaMA: Open and Efficient Foundation Language Models

Faisal Azhar, Hugo Touvron, Armand Joulin, Aurelien Rodriguez, Baptiste Rozière, Eric Hambro, Gautier Izacard, Guillaume Lample, Marie-Anne Lachaux, Naman Goyal, Thibaut Lavril, Timothee Lacroix, Xavier Martinet, Edouard Grave

February 20, 2023

INTEGRITY

NLP

UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

Maziar Sanjabi, Aaron Chan, Hamed Firooz, Lambert Mathias, Liang Tan, Shaoliang Nie, Xiaochang Peng, Xiang Ren

December 31, 2022

NLP

Textless Speech Emotion Conversion using Discrete & Decomposed Representations

Yossef Mordechay Adi, Abdelrahman Mohamed, Adam Polyak, Emmanuel Dupoux, Evgeny Kharitonov, Jade Copet, Morgane Rivière, Tu Anh Nguyen, Wei-Ning Hsu, Felix Kreuk

December 29, 2022

NLP

Staircase Attention for Recurrent Processing of Sequences

Dexter Ju, Jason Weston, Sainbayar Sukhbaatar, Stephen Roller
