SPEECH & AUDIO

NLP

Real Time Speech Enhancement in the Waveform Domain

October 25, 2020

Abstract

We present a causal speech enhancement model that operates on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve the model's performance and generalization. We perform evaluations on several standard benchmarks, using both objective metrics and human judgements. The proposed model matches the state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.
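The abstract says the model is optimized in both the time and frequency domains but does not spell out the loss functions. As a rough illustration only, the sketch below assumes one common way to combine the two: an L1 term on the raw samples plus a multi-resolution STFT term (spectral convergence and log-magnitude error). The function names, the three (frame size, hop size) resolutions, and the equal weighting are all assumptions made for this example, not details taken from this page.

```python
import numpy as np


def stft_magnitude(signal, frame_size, hop_size):
    """Magnitude spectrogram from a Hann-windowed, framed real FFT."""
    window = np.hanning(frame_size)
    num_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size: i * hop_size + frame_size] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=-1))


def combined_loss(estimate, clean,
                  resolutions=((512, 128), (1024, 256), (2048, 512)),
                  eps=1e-8):
    """Time-domain L1 loss plus an averaged multi-resolution spectral loss.

    The resolutions and the unweighted sum are illustrative assumptions.
    """
    # Time-domain term: mean absolute error on the raw samples.
    time_loss = np.mean(np.abs(estimate - clean))

    spectral_loss = 0.0
    for frame_size, hop_size in resolutions:
        est_mag = stft_magnitude(estimate, frame_size, hop_size)
        ref_mag = stft_magnitude(clean, frame_size, hop_size)
        # Spectral convergence: relative Frobenius-norm error of magnitudes.
        convergence = (np.linalg.norm(ref_mag - est_mag)
                       / (np.linalg.norm(ref_mag) + eps))
        # Log-magnitude term: mean absolute error between log spectrograms.
        log_mag = np.mean(np.abs(np.log(ref_mag + eps) - np.log(est_mag + eps)))
        spectral_loss += convergence + log_mag

    return time_loss + spectral_loss / len(resolutions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0           # one second at 16 kHz (assumed rate)
    clean = np.sin(2 * np.pi * 440.0 * t)     # synthetic stand-in for clean speech
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    print("loss(clean, clean):", combined_loss(clean, clean))
    print("loss(noisy, clean):", combined_loss(noisy, clean))
```

The time-domain term penalizes sample-level errors, while the spectral terms compare magnitudes at several window sizes, so errors that are small per sample but visible in the spectrum still contribute. In a real training setup these terms would be written in an autodiff framework so that gradients can flow through the STFT.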

AUTHORS

Written by

Alexandre Defossez

Gabriel Synnaeve

Yossef Mordechay Adi

Publisher

InterSpeech
