RESEARCH

ML APPLICATIONS

TTS Skins: Speaker Conversion via ASR

August 02, 2020

Abstract

We present a fully convolutional wav-to-wav network for converting between speakers' voices, without relying on text. Our network is based on an encoder-decoder architecture, where the encoder is pre-trained for the task of Automatic Speech Recognition, and a multi-speaker waveform decoder is trained to reconstruct the original signal in an autoregressive manner. We train the network on narrated audiobooks, and demonstrate multi-voice TTS in those voices, by converting the voice of a TTS robot.

Download the Paper

AUTHORS

Written by

Adam Polyak

Lior Wolf

Yaniv Taigman

Publisher

INTERSPEECH

Related Publications

November 28, 2022

RESEARCH

CORE MACHINE LEARNING

Neural Attentive Circuits

Nicolas Ballas, Bernhard Schölkopf, Chris Pal, Francesco Locatello, Li Erran, Martin Weiss, Nasim Rahaman, Yoshua Bengio

November 28, 2022

November 27, 2022

RESEARCH

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

Andrea Tirinzoni, Aymen Al Marjani, Emilie Kaufmann

November 27, 2022

November 23, 2022

THEORY

CORE MACHINE LEARNING

Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy

Tal Hassner, Cuong N. Nguyen, Cuong V. Nguyen, Lam Si Tung Ho, Vu Dinh

November 23, 2022

November 16, 2022

RESEARCH

NLP

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

Kushal Tirumala, Aram H. Markosyan, Armen Aghajanyan, Luke Zettlemoyer

November 16, 2022

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.