Speech & Audio

Our team advances the state of the art in Speech & Audio. We create spoken language technology to make it faster and easier for people to build community and connect with others around the world. We work on all aspects of speech and audio processing, including speech recognition and synthesis, speaker identification, acoustic event detection, and music analysis and generation.

Our technology is deployed at scale. It powers voice interfaces for Portal and Oculus devices, as well as video understanding for Facebook and Instagram, which covers transcription, captioning, and content understanding. Our video understanding efforts are unique in their scope and scale, processing the billions of videos that Facebook and Instagram receive in dozens of languages.

Latest Publications

April 15, 2018


Towards End-to-End Spoken Language Understanding

Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, Anuj Kumar, Baiyang Liu, Yoshua Bengio

April 15, 2018