SPEECH & AUDIO

NLP

A Two-stage Approach to Speech Bandwidth Extension

August 30, 2021

Abstract

Algorithms for speech bandwidth extension (BWE) may work in either the time domain or the frequency domain. Time-domain methods often do not sufficiently recover the high-frequency content of speech signals; frequency-domain methods are better at recovering the spectral envelope, but have difficulty reconstructing the details of the waveform. In this paper, we propose a two-stage approach for BWE, which enjoys the advantages of both time- and frequency-domain methods. The first stage is a frequency-domain neural network, which predicts the high-frequency part of the wide-band spectrogram from the narrow-band input spectrogram. The wide-band spectrogram is then converted into a time-domain waveform, and passed through the second stage to refine the temporal details. For the first stage, we compare a convolutional recurrent network (CRN) with a temporal convolutional network (TCN), and find that the latter is able to capture long-span dependencies equally well as the former while using a lot fewer parameters. For the second stage, we enhance the Wave-U-Net architecture with a multi-resolution short-time Fourier transform (MSTFT) loss function. A series of comprehensive experiments show that the proposed system achieves superior performance in speech enhancement (measured by both time- and frequency-domain metrics) as well as speech recognition.

Download the Paper

AUTHORS

Written by

Yun Wang

Christian Fuegen

Didi Zhang

Gil Keren

Kaustubh Kalgaonkar

Ju Lin

Publisher

Interspeech

Related Publications

April 22, 2024

NLP

Text Quality-Based Pruning for Efficient Training of Language Models

Vasu Sharma *, Karthik Padthe *, Newsha Ardalani, Kushal Tirumala, Russ Howes, Hu Xu, Bernie Huang, Daniel Li (FAIR), Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer

April 22, 2024

April 14, 2024

SPEECH & AUDIO

NLP

CoLLD: Contrastive Layer-to-Layer Distillation for Compressing Multilingual Pre-Trained Speech Encoders

Heng-Jui Chang, Ning Dong (AI), Ruslan Mavlyutov, Sravya Popuri, Andy Chung

April 14, 2024

April 05, 2024

CONVERSATIONAL AI

NLP

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao

April 05, 2024

March 05, 2024

SPEECH & AUDIO

Generative Pre-training for Speech with Flow Matching

Alex Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

March 05, 2024

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.