Wei-Ning Hsu

RESEARCH SCIENTIST | NEW YORK CITY, UNITED STATES

Wei-Ning is a research scientist at Meta AI (formerly known as FAIR). His research focuses on representation learning, self-supervised learning, and structured generative modeling for unimodal and multimodal speech. He is passionate about reducing the supervision required for various speech applications and about developing technologies applicable to both written and unwritten languages.

Prior to joining Facebook, Wei-Ning received his Ph.D. and S.M. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2020 and 2018, respectively. He received his B.S. degree in Electrical Engineering from National Taiwan University in 2014.

Wei-Ning's Work

Wei-Ning's Publications

December 11, 2023

SPEECH & AUDIO

Audiobox: Unified Audio Generation with Natural Language Prompts

Wei-Ning Hsu, Akinniyi Akinyemi, Alice Rakotoarison, Andros Tjandra, Apoorv Vyas, Baishan Guo, Bapi Akula, Bowen Shi, Brian Ellis, Ivan Cruz, Jeff Wang, Jiemin Zhang, Mary Williamson, Matt Le, Rashel Moritz, Robbie Adkins, William Ngan, Xinyue Zhang, Yael Yungster, Yi-Chiao Wu

October 22, 2023

SPEECH & AUDIO

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Michael Auli, Wei-Ning Hsu, Alexander Liu, Heng-Jui Chang, James Glass

August 19, 2023

NLP

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossef Mordechay Adi, Emmanuel Dupoux

August 19, 2023

NLP

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang

July 23, 2023

NLP

COMPUTER VISION

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Michael Auli, Alexei Baevski, Arun Babu, Wei-Ning Hsu

June 16, 2023

SPEECH & AUDIO

NLP

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Matt Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossef (Yossi) Adi, Jay Mahadeokar, Wei-Ning Hsu

May 22, 2023

NLP

Scaling Speech Technology to 1,000+ Languages

Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

December 31, 2022

NLP

Textless Speech Emotion Conversion using Discrete & Decomposed Representations

Yossef Mordechay Adi, Abdelrahman Mohamed, Adam Polyak, Emmanuel Dupoux, Evgeny Kharitonov, Jade Copet, Morgane Rivière, Tu Anh Nguyen, Wei-Ning Hsu, Felix Kreuk

July 17, 2022

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Michael Auli, Alexei Baevski, Arun Babu, Jiatao Gu, Wei-Ning Hsu, Qiantong Xu

October 25, 2021

NLP

Unsupervised Speech Recognition

Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli
