SING: Symbol-to-Instrument Neural Generator

October 26, 2018

Abstract

Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16 kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present a lightweight neural audio synthesizer for the original task of generating musical notes given a desired instrument, pitch, and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet, as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.
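To make the loss described in the abstract more concrete, the following is a minimal sketch of a distance between log spectrograms, written in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the transform parameters or the norm, so the frame size, hop length, Hann window, offset `eps`, and the mean absolute difference used here are all assumed, and the function names `log_power_spectrogram` and `spectral_loss` are hypothetical.

```python
import numpy as np


def log_power_spectrogram(wave, frame_size=1024, hop=256, eps=1.0):
    """Return log(eps + |STFT|^2) for a 1-D waveform.

    frame_size, hop and eps are illustrative assumptions, not values
    taken from the abstract.
    """
    window = np.hanning(frame_size)
    n_frames = 1 + (len(wave) - frame_size) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        wave[i * hop: i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # Power of the one-sided FFT of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(eps + power)


def spectral_loss(generated, target, **stft_args):
    """Mean absolute difference between two log power spectrograms.

    The choice of an absolute (L1-style) distance is an assumption.
    """
    diff = (log_power_spectrogram(generated, **stft_args)
            - log_power_spectrogram(target, **stft_args))
    return float(np.mean(np.abs(diff)))


# One second of audio at 16 kHz: a 440 Hz tone and an 880 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
note_a = np.sin(2 * np.pi * 440.0 * t)
note_b = np.sin(2 * np.pi * 880.0 * t)

loss_same = spectral_loss(note_a, note_a)   # identical signals give zero
loss_diff = spectral_loss(note_b, note_a)   # different pitches give a positive loss
```

Because the loss compares magnitudes only, it does not penalize a generated waveform whose phase differs from the target, which is one plausible reason such a loss suits a decoder that emits whole frames at once rather than one sample at a time.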


AUTHORS

Written by

Alexandre Defossez

Leon Bottou

Neil Zeghidour

Nicolas Usunier

Francis Bach

Publisher

NIPS

Research Topics

Speech & Audio

Related Publications

December 15, 2021

RESEARCH

Sample-and-threshold differential privacy: Histograms and applications

Akash Bharadwaj, Graham Cormode


August 30, 2021

SPEECH & AUDIO

NLP

A Two-stage Approach to Speech Bandwidth Extension

Yun Wang, Christian Fuegen, Didi Zhang, Gil Keren, Kaustubh Kalgaonkar, Ju Lin


January 09, 2021

RESEARCH

COMPUTER VISION

Tarsier: Evolving Noise Injection in Super-Resolution GANs

Baptiste Rozière, Camille Couprie, Olivier Teytaud, Andry Rasoanaivo, Hanhe Lin, Nathanaël Carraz Rakotonirina, Vlad Hosu


January 09, 2021

RESEARCH

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

Jean Tarbouriech, Alessandro Lazaric, Matteo Pirotta, Michal Valko
