ICVSS Computer Vision - Where are we?

Image synthesis with deep generative models

Phillip Isola

Massachusetts Institute of Technology, US

Abstract

I will cover the basic theory and applications of deep generative models for vision. In these models, a generator network is trained to map random noise to natural images. I will cover the different ways different models achieve this goal: autoregressive models maximize the likelihood of the training data under the model, while generative adversarial networks map to an output distribution that is indistinguishable from the distribution of natural images, according to an adversary trained to classify between these two distributions. These models can hallucinate realistic photos, but often we don't want to just make up images from scratch. More often, we are given some data and wish to make predictions based on it. For example, given the current view of the world, predict what the future will look like. This is an application for conditional generative models, which condition on observed data and make a prediction about unobserved data. Unlike traditional predictors, conditional generative models are adept at dealing with both high-dimensional inputs and high-dimensional outputs. I will show a number of applications in vision and robotics, including image-to-image translation and visual foresight for control.
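
To make the adversarial idea concrete, below is a minimal sketch (not the speaker's implementation) of the GAN setup described above, written in PyTorch under toy assumptions: a generator that maps random noise to flattened 28x28 images, a discriminator that scores real versus generated samples, and the hypothetical helper `train_step` for one adversarial update.

```python
import torch
import torch.nn as nn

# Toy dimensions (assumptions for illustration only).
noise_dim, img_dim = 64, 28 * 28

# Generator: maps random noise to an image-shaped vector.
generator = nn.Sequential(
    nn.Linear(noise_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),          # outputs in [-1, 1]
)

# Discriminator (the "adversary"): classifies real vs. generated samples.
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),             # probability of "real"
)

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    """One adversarial update on a batch of real images of shape (batch, img_dim)."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator step: separate real images from generated ones.
    fake_images = generator(torch.randn(batch, noise_dim))
    d_loss = bce(discriminator(real_images), real_labels) + \
             bce(discriminator(fake_images.detach()), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to make generated images indistinguishable from real.
    fake_images = generator(torch.randn(batch, noise_dim))
    g_loss = bce(discriminator(fake_images), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```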
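
For the conditional case (e.g. image-to-image translation or predicting a future frame), the sketch below shows one common way to condition: the generator takes an observed image as input instead of pure noise, and the discriminator judges (input, output) pairs. The class names `CondGenerator` and `PairDiscriminator` and all sizes are illustrative assumptions, not the specific architectures used in the talk.

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Maps an observed image to a predicted image of the same size."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, observed):
        return self.net(observed)

class PairDiscriminator(nn.Module):
    """Scores whether a candidate output is a plausible match for the observed input."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),   # patch-wise real/fake scores
        )

    def forward(self, observed, candidate):
        # Conditioning: classify the (observed, candidate) pair jointly.
        return self.net(torch.cat([observed, candidate], dim=1))

# Stand-in tensors for a real (input, target) training pair.
G, D = CondGenerator(), PairDiscriminator()
observed = torch.randn(1, 3, 64, 64)
target = torch.randn(1, 3, 64, 64)
predicted = G(observed)
real_score = D(observed, target)     # should be pushed toward "real"
fake_score = D(observed, predicted)  # should be pushed toward "fake"
```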