Speech & Audio

Our team advances the state of the art in Speech & Audio. We create spoken language technology that makes it faster and easier for people to build community and connect with others around the world. We work on all aspects of speech and audio processing, including speech recognition and synthesis, speaker identification, acoustic event detection, and music analysis and generation.

Our technology is deployed at scale. It powers voice interfaces for Portal and Oculus devices, as well as video understanding for Facebook and Instagram, including transcription, captioning, and content understanding. Our video understanding efforts are unique in their scope and scale: they process the billions of videos, in dozens of languages, that Facebook and Instagram receive.

Latest Publications

Towards End-to-End Spoken Language Understanding

In this paper, we present a study of an end-to-end learning system for spoken language understanding. With this unified approach, we can infer semantic meaning directly from audio features, without an intermediate text representation.

