August 03, 2023
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modeling and allow for an intuitive and powerful user interface to drive the image generation process. Expressing spatial constraints, e.g. to position specific objects in particular locations, is cumbersome using text; and current text-based image generation models are not able to accurately follow such instructions. In this paper we consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content. We propose ZestGuide, a ``zero-shot'' segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models, and does not require any additional training. It leverages implicit segmentation maps that can be extracted from cross-attention layers, and uses them to align the generation with input masks. Our experimental results combine high image quality with accurate alignment of generated content with input segmentations, and improve over prior work both quantitatively and qualitatively, including methods that require training on images with corresponding segmentations. Compared to Paint with Words, the previous state-of-the art in image generation with zero-shot segmentation conditioning, we improve by 5 to 10 mIoU points on the COCO dataset with similar FID scores.
Written by
Jakob Verbeek
Guillaume Couairon
Marlene Careil
Matthieu Cord
Stephane Lathuiliere
Publisher
ICCV
Research Topics
April 18, 2024
Jonas Kohler, Albert Pumarola, Edgar Schoenfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet
April 18, 2024
March 20, 2024
Armen Avetisyan, Chris Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Julian Engel, Edward Miller, Richard Newcombe, Vasileios Balntas
March 20, 2024
February 13, 2024
Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
February 13, 2024
January 25, 2024
Felix Xu, Di Lin, Jianjun Zhao, Jianlang Chen, Lei Ma, Qing Guo, Wei Feng, Xuhong Ren
January 25, 2024
Product experiences
Foundational models
Product experiences
Latest news
Foundational models