RESEARCH

SPEECH & AUDIO

Training Millions of Personalized Dialogue Agents

October 31, 2018

Abstract

Current dialogue systems fail to engage users, especially when trained end-to-end without relying on proactive, scripted re-engagement strategies. Zhang et al. (2018) showed that the engagement level of end-to-end dialogue models increases when they are conditioned on text personas that provide the model with a personalized back-story. However, the dataset used in Zhang et al. (2018) is synthetic and contains only around 1k different personas. In this paper, we introduce a new dataset providing 5 million personas and 700 million persona-based dialogues. Our experiments show that, at this scale, training with personas still improves the performance of end-to-end systems. In addition, we show that other tasks benefit from the wide coverage of our dataset: fine-tuning our model on the data from Zhang et al. (2018) achieves state-of-the-art results.

AUTHORS

Written by

Pierre-Emmanuel Mazaré

Antoine Bordes

Martin Raison

Samuel Humeau

Publisher

EMNLP
