RESEARCH

COMPUTER VISION

Improving Generative Visual Dialog by Answering Diverse Questions

November 3, 2019

Abstract

Prior work on training generative Visual Dialog models with reinforcement learning (Das et al., 2017b) has explored a Q-BOT-A-BOT image-guessing game and shown that this ‘self-talk’ approach can lead to improved performance at the downstream dialog-conditioned image-guessing task. However, this improvement saturates and starts degrading after a few rounds of interaction, and does not lead to a better Visual Dialog model. We find that this is due in part to repeated interactions between Q-BOT and A-BOT during self-talk, which are not informative with respect to the image. To improve this, we devise a simple auxiliary objective that incentivizes Q-BOT to ask diverse questions, thus reducing repetitions and in turn enabling A-BOT to explore a larger state space during RL i.e. be exposed to more visual concepts to talk about, and varied questions to answer. We evaluate our approach via a host of automatic metrics and human studies, and demonstrate that it leads to better dialog, i.e. dialog that is more diverse (i.e. less repetitive), consistent (i.e. has fewer conflicting exchanges), fluent (i.e. more humanlike), and detailed, while still being comparably image-relevant as prior work and ablations.

Download the Paper

Related Publications

June 17, 2019

COMPUTER VISION

Graph-Based Global Reasoning Networks | Facebook AI Research

Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis

June 17, 2019

June 17, 2019

COMPUTER VISION

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | Facebook AI Research

Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan

June 17, 2019

June 18, 2019

COMPUTER VISION

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception | Facebook AI Research

Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

June 18, 2019

July 28, 2019

SPEECH & AUDIO

COMPUTER VISION

Learning to Optimize Halide with Tree Search and Random Programs | Facebook AI Research

Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, Jonathan Ragan-Kelley

July 28, 2019

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.