September 20, 2021
Generative adversarial networks (GANs) are a well-established AI method to create images, whether photorealistic pictures or abstract collages. However, to date these models have had an important limitation: They can typically only generate images of objects or scenes that are closely related to the training data set.
A traditional GAN trained on images of cars shows impressive results when asked to generate other images of cars, for example, but will likely fail if asked to generate images of flowers or other objects outside of its automotive data set.
Facebook AI has made great strides in solving this problem with Instance-Conditioned GAN (IC-GAN), a new and simple image generation model that creates high-quality, diverse images — even if its input image doesn’t appear in the training set. Unlike previous methods, IC-GANs can generate realistic, unforeseen image combinations, such as camels surrounded by snow or zebras in a city. Our approach exhibits exceptional transfer capabilities across different types of objects. Researchers can use IC-GANs off the shelf with previously unseen data sets and still generate realistic-looking images, without requiring labeled data.
With these new capabilities, IC-GANs could be used to create new visual examples to augment data sets to include diverse objects and scenes; help artists and creators with more expansive, creative AI-generated content; and advance research in high-quality image generation.
Standard methods, called class-conditional GANs, condition on class labels, effectively partitioning the data into groups corresponding to those labels. This lets them generate higher-quality samples than their unconditional counterparts. And rather than producing only random images, these GANs can create images that fit a particular label, such as “clothing” or “car.” However, they rely on labeled data, which may be unavailable or infeasible to obtain.
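A minimal sketch of what class conditioning looks like in practice: the generator consumes random noise concatenated with an embedding of the target class. All dimensions and the embedding table below are hypothetical placeholders for illustration, not the actual IC-GAN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for this sketch.
NOISE_DIM, NUM_CLASSES, EMBED_DIM = 64, 10, 16

# A label-embedding table; in a real model this is learned, here it is random.
label_embeddings = rng.normal(size=(NUM_CLASSES, EMBED_DIM))

def generator_input(label: int) -> np.ndarray:
    """Build the conditioned input a class-conditional generator consumes:
    random noise concatenated with the embedding of the target class."""
    z = rng.normal(size=NOISE_DIM)
    return np.concatenate([z, label_embeddings[label]])

x = generator_input(label=3)  # condition generation on class 3, e.g. "car"
print(x.shape)                # (80,) = NOISE_DIM + EMBED_DIM
```

Because the class embedding steers every sample, the generator only ever learns the partitions the labels define, which is exactly the limitation IC-GAN relaxes.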
Previous label-free approaches to image generation (which use no labeled data) have been promising, but their output is typically of poor quality when trained on complex data sets such as ImageNet. They either use coarse, nonoverlapping data partitions, which produce very large clusters whose images depict very different objects and so won’t be semantically similar to the picture the model is trying to create, or fine partitions, which tend to degrade results because each cluster contains too few data points.
Our new approach, IC-GAN, works with both labeled and unlabeled data sets. It extends the GAN framework to model a mixture of local, overlapping data clusters. Given a single image (or “instance”), it generates images that resemble the instance’s closest neighbors in the data set. We feed those neighbors to the discriminator, forcing the generator to create samples similar to the neighborhood of each instance. This avoids the problem of partitioning the data into small clusters: because the neighborhoods overlap, the model uses the data set more efficiently.
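The neighborhood lookup at the heart of this idea can be sketched as a simple nearest-neighbor search over image features. The random feature matrix below stands in for real embeddings (e.g., from a self-supervised encoder); the function and dimensions are illustrative assumptions, not the paper's exact code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for feature vectors of 100 training images (32-dim each);
# a real system would compute these with a pretrained feature extractor.
features = rng.normal(size=(100, 32))
features /= np.linalg.norm(features, axis=1, keepdims=True)

def nearest_neighbors(instance_idx: int, k: int = 5) -> np.ndarray:
    """Indices of the k images most similar to the conditioning instance.
    During training, the discriminator treats these neighbors as the 'real'
    samples for the instance, pushing the generator to produce images that
    resemble the instance's neighborhood."""
    sims = features @ features[instance_idx]  # cosine similarity
    order = np.argsort(-sims)                 # most similar first
    return order[1 : k + 1]                   # skip the instance itself

neighbors = nearest_neighbors(instance_idx=0, k=5)
print(len(neighbors))  # 5
```

Since every image defines its own neighborhood, neighborhoods of nearby instances share members, which is what makes the clusters local and overlapping rather than a hard partition.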
Once the model is trained, we then test it on images it has never seen before. Using a single image, the model can generate visually rich images that are similar to the closest neighbors in the data set.
IC-GAN can be transferred to data sets not seen during training, both in class-conditional settings (where the training set includes labeled images) and when there are no labels at all. We do this by swapping out the conditioning instances at inference time; in the class-conditional case, we can swap either the instance conditioning or the class label. By combining instances and class labels appropriately, the class-conditional IC-GAN can create unusual scenes that are absent or very rare in current data sets. For example, given an image of a snowplow surrounded by snow and the class label “camel,” which doesn’t appear in the instance conditioning, we can generate camels surrounded by snow, bypassing the bias that camels live only in the desert.
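The inference-time swap described above amounts to pairing any instance embedding with any class label when building the generator's input. Everything here (the toy pooling "extractor," the dimensions, the class id) is a hypothetical stand-in used only to show the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

NOISE_DIM, EMBED_DIM, NUM_CLASSES = 64, 32, 1000
class_embeddings = rng.normal(size=(NUM_CLASSES, EMBED_DIM))

def embed_instance(image: np.ndarray) -> np.ndarray:
    """Toy stand-in for a learned feature extractor: flatten the image and
    keep the first EMBED_DIM values. A real model uses a trained encoder."""
    return image.reshape(-1)[:EMBED_DIM]

def conditioned_input(instance_img: np.ndarray, class_id: int) -> np.ndarray:
    """Inference-time conditioning: combine noise, the embedding of an
    arbitrary instance image, and an arbitrary class embedding, e.g. a
    snowy scene paired with the label 'camel'."""
    z = rng.normal(size=NOISE_DIM)
    h = embed_instance(instance_img)
    return np.concatenate([z, h, class_embeddings[class_id]])

snow_scene = rng.uniform(size=(4, 4, 3))        # toy 4x4 RGB "image"
x = conditioned_input(snow_scene, class_id=354)  # hypothetical class id
print(x.shape)  # (128,) = NOISE_DIM + EMBED_DIM + EMBED_DIM
```

The key point is that neither the instance nor the label needs to come from the training distribution, which is what enables combinations like camels in snow.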
IC-GAN can be used to augment data sets with items or objects that are not commonly found in the training data. And since it works across different domains, our approach can generate more diverse training data for object recognition models. A traditional GAN, for instance, could not generate images of zebras standing in urban areas, since its training data would likely contain only images of zebras in grasslands. We’ve shown that IC-GAN can use controlled semantics to generate unusual image combinations, like cows in the sand.
In the future, we hope to bring even more control to this model. Rather than controlling only the background and the central object, we want to explore placing more objects in a scene and determining where each item goes, creating complex, picture-perfect scenes.
By releasing our pretrained models into the open source community, along with code to reproduce the results from the paper, we hope this research will lead to AI models that generate images with more flexibility, accuracy, and efficiency than ever before.