Reducing Transformer Depth on Demand with Structured Dropout

April 26, 2020

Abstract

Overparameterized transformer networks have obtained state-of-the-art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time. In particular, we show that it is possible to select sub-networks of any depth from one large network without having to fine-tune them and with limited impact on performance. We demonstrate the effectiveness of our approach by improving the state of the art on machine translation, language modeling, summarization, question answering, and language understanding benchmarks. Moreover, we show that our approach leads to small BERT-like models of higher quality compared to training from scratch or using distillation.
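The idea can be illustrated with a short PyTorch sketch: during training, each transformer layer in the stack is skipped with a fixed probability, and at inference a shallower sub-network is obtained by keeping only a subset of the layers, with no fine-tuning. The LayerDropEncoder class, its layerdrop rate, and the keep-every-other-layer pruning rule below are illustrative assumptions, not the authors' released implementation.

# Minimal LayerDrop sketch (illustrative; not the authors' code).
import torch
import torch.nn as nn


class LayerDropEncoder(nn.Module):
    """Transformer encoder stack with layer-level (structured) dropout."""

    def __init__(self, num_layers: int = 12, d_model: int = 512,
                 nhead: int = 8, layerdrop: float = 0.2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
             for _ in range(num_layers)]
        )
        self.layerdrop = layerdrop  # probability of dropping an entire layer

    def forward(self, x: torch.Tensor, keep_every: int = 1) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            if self.training:
                # Training: skip the whole layer with probability layerdrop,
                # which regularizes the network against missing layers.
                if torch.rand(1).item() < self.layerdrop:
                    continue
            elif i % keep_every != 0:
                # Inference: prune to a shallower sub-network by keeping only
                # every keep_every-th layer, without any fine-tuning.
                continue
            x = layer(x)
        return x


# Usage: train the full 12-layer stack with LayerDrop, then evaluate a
# 6-layer sub-network selected at inference time.
model = LayerDropEncoder(num_layers=12, layerdrop=0.2)
tokens = torch.randn(2, 16, 512)           # (batch, sequence, d_model)
model.train()
out_full = model(tokens)                   # layers dropped at random
model.eval()
out_pruned = model(tokens, keep_every=2)   # pruned sub-network, no fine-tuning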

AUTHORS

Angela Fan

Edouard Grave

Armand Joulin

Publisher

International Conference on Learning Representations (ICLR)

Research Areas

ML Applications
