Conditional Image Synthesis with Auxiliary Classifier GANs

Augustus Odena, Christopher Olah, Jonathon Shlens

Google Brain. Correspondence to: Augustus Odena. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

Abstract

In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128×128 resolution image samples exhibiting global coherence. We expand on previous work for image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models. These analyses demonstrate that high resolution samples provide class information not present in low resolution samples. Across 1000 ImageNet classes, 128×128 samples are more than twice as discriminable as artificially resized 32×32 samples. In addition, 84.7% of the classes have samples exhibiting diversity comparable to real ImageNet data.

1. Introduction

Characterizing the structure of natural images has been a rich research endeavor. Natural images obey intrinsic invariances and exhibit multi-scale statistical structures that have historically been difficult to quantify (Simoncelli & Olshausen, 2001). Recent advances in machine learning offer an opportunity to substantially improve the quality of image models. Improved image models advance the state-of-the-art in image denoising (Ballé et al., 2015), compression (Toderici et al., 2016), in-painting (van den Oord et al., 2016a), and super-resolution (Ledig et al., 2016). Better models of natural images also improve performance in semi-supervised learning tasks (Kingma et al., 2014; Springenberg, 2015; Odena, 2016; Salimans et al., 2016) and reinforcement learning problems (Blundell et al., 2016).

One method for understanding natural image statistics is to build a system that synthesizes images de novo. There are several promising approaches for building image synthesis models. Variational autoencoders (VAEs) maximize a variational lower bound on the log-likelihood of the training data (Kingma & Welling, 2013; Rezende et al., 2014). VAEs are straightforward to train but introduce potentially restrictive assumptions about the approximate posterior distribution (but see (Rezende & Mohamed, 2015; Kingma et al., 2016)).
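The "variational lower bound" mentioned here is standard background that this excerpt does not spell out. For reference, one common form is below; the encoder q_phi and decoder p_theta notation is the usual convention, not something defined in this paper:

```latex
% Evidence lower bound (ELBO) maximized by a VAE:
% a reconstruction term minus the KL divergence from the prior.
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\left[ \log p_\theta(x \mid z) \right]
  - D_{\mathrm{KL}}\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```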

Autoregressive models dispense with latent variables and directly model the conditional distribution over pixels (van den Oord et al., 2016a;b). These models produce convincing samples but are costly to sample from and do not provide a latent representation. Invertible density estimators transform latent variables directly using a series of parameterized functions constrained to be invertible (Dinh et al., 2016). This technique allows for exact log-likelihood computation and exact inference, but the invertibility constraint is restrictive.

Generative adversarial networks (GANs) offer a distinct and promising approach that focuses on a game-theoretic formulation for training an image synthesis model (Goodfellow et al., 2014). Recent work has shown that GANs can produce convincing image samples on datasets with low variability and low resolution (Denton et al., 2015; Radford et al., 2015).

However, GANs struggle to generate globally coherent, high resolution samples, particularly from datasets with high variability. Moreover, a theoretical understanding of GANs is an ongoing research topic (Uehara et al., 2016; Mohamed & Lakshminarayanan, 2016).

In this work we demonstrate that adding more structure to the GAN latent space along with a specialized cost function results in higher quality samples. We exhibit 128×128 pixel samples from all classes of the ImageNet dataset (Russakovsky et al., 2015) with increased global coherence (Figure 1). Importantly, we demonstrate quantitatively that our high resolution samples are not just naive resizings of low resolution samples. In particular, downsampling our 128×128 samples to 32×32 leads to a 50% decrease in visual discriminability.
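As a rough illustration of what such a discriminability comparison involves (the paper's actual protocol appears later, in its results section), here is a sketch assuming a pretrained ImageNet classifier `classify`, generated samples `x_fake`, and intended labels `y`; all of these names are hypothetical:

```python
import torch
import torch.nn.functional as F

def discriminability(samples, labels, classify):
    """Top-1 accuracy of a pretrained classifier on generated samples.

    samples:  float tensor of shape (N, 3, 128, 128)
    labels:   long tensor of shape (N,) with each sample's intended class
    classify: callable mapping an image batch to (N, 1000) logits
              (e.g. a pretrained ImageNet network; assumed, not provided)
    """
    with torch.no_grad():
        preds = classify(samples).argmax(dim=1)
    return (preds == labels).float().mean().item()

def resize_bottleneck(samples, low_res=32):
    # Discard high-frequency detail by resizing 128 -> 32 -> 128, mimicking
    # the comparison against artificially resized low resolution samples.
    small = F.interpolate(samples, size=low_res, mode="bilinear", align_corners=False)
    return F.interpolate(small, size=samples.shape[-1], mode="bilinear", align_corners=False)

# Usage: compare accuracy at full resolution vs. after the 32x32 bottleneck.
# acc_full = discriminability(x_fake, y, classify)
# acc_low  = discriminability(resize_bottleneck(x_fake), y, classify)
```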

We also introduce a new metric for assessing the variability across image samples and employ this metric to demonstrate that our synthesized images exhibit diversity comparable to training data for a large fraction (84.7%) of ImageNet classes.

[Figure 1: 128×128 resolution samples from 5 classes (monarch butterfly, goldfinch, daisy, grey whale, redshank) taken from an AC-GAN trained on the ImageNet dataset. Note that the classes shown have been selected to highlight the success of the model and are not representative. Samples from all ImageNet classes are linked later in the text.]

In more detail, this work is the first to:

- Demonstrate an image synthesis model for all 1000 ImageNet classes at a 128×128 spatial resolution (or any spatial resolution; see Section 3).

- Measure how much an image synthesis model actually uses its output resolution (Section ).

- Measure perceptual variability and collapsing behavior in a GAN with a fast, easy-to-compute metric (Section ).

- Highlight that a high number of classes is what makes ImageNet synthesis difficult for GANs and provide an explicit solution (Section ).

- Demonstrate experimentally that GANs that perform well perceptually are not those that memorize a small number of examples (Section ).

- Achieve state of the art on the Inception score metric when trained on CIFAR-10 without using any of the techniques from (Salimans et al., 2016) (Section ).

2. Background

A generative adversarial network (GAN) consists of two neural networks trained in opposition to one another. The generator G takes as input a random noise vector z and outputs an image X_fake = G(z). The discriminator D receives as input either a training image or a synthesized image from the generator and outputs a probability distribution P(S | X) = D(X) over possible image sources. The discriminator is trained to maximize the log-likelihood it assigns to the correct source:

L = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake)]    (1)

The generator is trained to minimize the second term in Equation 1.
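A minimal PyTorch sketch of this objective, assuming generator and discriminator modules G and D are defined elsewhere and that D(x) returns the probability P(S = real | X); this illustrates Equation 1 under those assumptions and is not the authors' training code:

```python
import torch

def gan_losses(G, D, x_real, z):
    """Sketch of the two-term GAN objective of Equation 1.

    Assumes D(x) outputs P(S = real | X) as a probability in (0, 1)
    and that G and D are torch.nn.Module instances (not provided here).
    """
    x_fake = G(z)

    # The discriminator maximizes E[log P(real | X_real)] + E[log P(fake | X_fake)],
    # i.e. it minimizes the negative of that sum. Note P(fake | X) = 1 - D(X).
    d_loss = -(torch.log(D(x_real)).mean()
               + torch.log(1.0 - D(x_fake.detach())).mean())

    # The generator minimizes the second term, E[log P(fake | X_fake)].
    g_loss = torch.log(1.0 - D(x_fake)).mean()
    return d_loss, g_loss
```

In practice the generator loss is often replaced by maximizing log D(G(z)) instead, which gives stronger gradients early in training; that variant dates back to Goodfellow et al. (2014).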

The basic GAN framework can be augmented using side information. One strategy is to supply both the generator and discriminator with class labels in order to produce class-conditional samples (Mirza & Osindero, 2014). Class-conditional synthesis can significantly improve the quality of generated samples (van den Oord et al., 2016b). Richer side information such as image captions and bounding box localizations may improve sample quality further (Reed et al., 2016a;b).

Instead of feeding side information to the discriminator, one can task the discriminator with reconstructing side information. This is done by modifying the discriminator to contain an auxiliary decoder network [1] that outputs the class label for the training data (Odena, 2016; Salimans et al., 2016) or a subset of the latent variables from which the samples are generated (Chen et al., 2016). Forcing a model to perform additional tasks is known to improve performance on the original task (e.g. (Sutskever et al., 2014; Szegedy et al., 2014; Ramsundar et al., 2016)). In addition, an auxiliary decoder could leverage pre-trained discriminators (e.g. image classifiers) for further improving the synthesized images (Nguyen et al., 2016).

Motivated by these considerations, we introduce a model that combines both strategies for leveraging side information. That is, the model proposed below is class conditional, but with an auxiliary decoder that is tasked with reconstructing class labels.

[1] Alternatively, one can force the discriminator to work with the joint distribution (X, z) and train a separate inference network that computes q(z | X) (Dumoulin et al., 2016; Donahue et al., 2016).

3. AC-GANs

We propose a variant of the GAN architecture which we call an auxiliary classifier GAN (or AC-GAN). In the AC-GAN, every generated sample has a corresponding class label, c ~ p_c, in addition to the noise z. G uses both to generate images X_fake = G(c, z). The discriminator gives both a probability distribution over sources and a probability distribution over the class labels, P(S | X), P(C | X) = D(X).
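To make the two-headed discriminator concrete, here is a minimal PyTorch sketch; the trunk `features`, the dimension `feat_dim`, and the specific layers are illustrative assumptions, not the architecture used in the paper:

```python
import torch
import torch.nn as nn

class ACGANDiscriminator(nn.Module):
    """Discriminator with an auxiliary classifier head (sketch).

    `features` is assumed to be any convolutional trunk that maps an
    image batch to flat feature vectors of size `feat_dim`.
    """
    def __init__(self, features: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.features = features
        self.source_head = nn.Linear(feat_dim, 1)           # for P(S | X)
        self.class_head = nn.Linear(feat_dim, num_classes)  # for P(C | X)

    def forward(self, x):
        h = self.features(x)
        p_source = torch.sigmoid(self.source_head(h))  # probability of "real"
        class_logits = self.class_head(h)               # logits over class labels
        return p_source, class_logits
```

Sharing one trunk between the source head and the class head is the essence of the auxiliary-classifier idea: the class-prediction task shapes the same representation the real-vs-fake decision relies on.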

