RESEARCH

NLP

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling

November 18, 2020

Abstract

Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks. On the other hand, traditional machine translation has a long history of leveraging unlabeled data through noisy channel modeling. The same idea has recently been shown to achieve strong improvements for neural machine translation. Unfortunately, naıve noisy channel modeling with modern sequence to sequence models is up to an order of magnitude slower than alternatives. We address this issue by introducing efficient approximations to make inference with the noisy channel approach as fast as strong ensembles while increasing accuracy. We also show that the noisy channel approach can outperform strong pre-training results by achieving a new state of the art on WMT Romanian-English translation.

Download the Paper

AUTHORS

Publisher

WMT

Related Publications

April 14, 2024

SPEECH & AUDIO

NLP

CoLLD: Contrastive Layer-to-Layer Distillation for Compressing Multilingual Pre-Trained Speech Encoders

Heng-Jui Chang, Ning Dong (AI), Ruslan Mavlyutov, Sravya Popuri, Andy Chung

April 14, 2024

February 21, 2024

INTEGRITY

NLP

Watermarking Makes Language Models Radioactive

Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

February 21, 2024

December 07, 2023

CONVERSATIONAL AI

NLP

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Davide Testuggine, Madian Khabsa

December 07, 2023

December 06, 2023

NLP

Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Typing and Polar Box Embeddings

Mattia Atzeni, Mike Plekhanov, Frederic Dreyer, Nora Kassner, Simone Merello, Louis Martin, Nicola Cancedda

December 06, 2023

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.