Generation - NTU Speech Processing Laboratory

Generation
Hung-yi Lee

Network as Generator. The network takes a sample from a simple distribution as an additional input, and its outputs form a complex distribution. We know the formulation of the simple distribution, so we can sample from it.

Why distribution? Video prediction: a network takes the previous frames and predicts the next frame. The same previous frames may be followed by a turn left or a turn right, so a single fixed output is not enough. Giving the video prediction network a sample from a simple distribution, together with the previous frames, lets it produce different outputs.

Why distribution? Especially for the tasks that need creativity, where the same input has different outputs. Examples: drawing (input: "Character with red eyes") and chatbot.

Generative Adversarial Network (GAN). How to pronounce GAN?

All Kinds of GAN .. Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, "Variational Approaches for Auto-Encoding Generative Adversarial Networks", arXiv, 2017.

Anime Face Generation. Unconditional generation: the generator maps a sample from a normal distribution to a complex distribution. It is a neural network (that is, a function).

Discriminator output (a scalar): larger means real, smaller value means fake.

Idea of GAN. Brown, veins: butterflies are not brown; butterflies do not have veins.

Idea of GAN. Generator v1 is paired with Discriminator v1, which compares its outputs with real images. Generator v2 is then paired with Discriminator v2, and Generator v3 with Discriminator v3. This is where the term "adversarial" comes from.

Algorithm. Initialize generator and discriminator. In each training iteration:

Step 1: Fix generator G, and update discriminator D. Sample some real objects from the database and generate some fake objects by feeding randomly sampled vectors to G. Real objects are labeled 1 and generated objects 0. The discriminator learns to assign high scores to real objects and low scores to generated objects.

Step 2: Fix discriminator D, and update generator G. G and D are joined into one large network with a hidden layer between them; the D part is fixed and only the G part is updated. The generator learns to fool the discriminator.

Anime Face Generation. Generated samples are shown after 100, 1000, 2000, 5000, 10,000, 20,000, and 50,000 updates. The faces generated by machine.
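The two alternating steps above can be sketched on a toy 1-D problem. This is only an illustration under assumptions that are not in the slides: the "database" is samples from a Gaussian with mean 4, the generator is the linear map `a*z + b`, the discriminator is a logistic function `sigmoid(w*y + c)`, and the generator step maximizes `log D(G(z))`. All names are hypothetical.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(iterations=2000, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, c = 0.0, 0.0   # discriminator D(y) = sigmoid(w*y + c)  (assumed form)
    a, b = 1.0, 0.0   # generator G(z) = a*z + b               (assumed form)
    for _ in range(iterations):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # sample real objects
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]    # randomly sampled vectors
        fake = [a * z + b for z in zs]                      # generated objects

        # Step 1: fix G, update D by gradient ascent on
        # mean log D(real) + mean log(1 - D(fake)).
        gw = gc = 0.0
        for y in real:
            p = sigmoid(w * y + c)
            gw += (1.0 - p) * y
            gc += (1.0 - p)
        for y in fake:
            p = sigmoid(w * y + c)
            gw -= p * y
            gc -= p
        w += lr * gw / batch
        c += lr * gc / batch

        # Step 2: fix D, update G by gradient ascent on mean log D(G(z)).
        ga = gb = 0.0
        for z in zs:
            p = sigmoid(w * (a * z + b) + c)
            ga += (1.0 - p) * w * z
            gb += (1.0 - p) * w
        a += lr * ga / batch
        b += lr * gb / batch
    return a, b, w, c
```

Calling `train_toy_gan()` returns the final generator and discriminator parameters. With such a simple discriminator the result should not be expected to match the data distribution closely; the point is only the order of the two updates.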

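In 2019, with .. The first .. (Ian J. Goodfellow). Today ..

Theory behind GAN

Our Objective. The generator G maps samples from a normal distribution to a distribution $P_G$, which should be as close as possible to the data distribution $P_{data}$:

$G^* = \arg\min_G Div(P_G, P_{data})$

where $Div(P_G, P_{data})$ is the divergence between distributions $P_G$ and $P_{data}$. How to compute the divergence?

Sampling is good enough. Although we do not know the distributions of $P_G$ and $P_{data}$, we can sample from them. Sampling from $P_{data}$: draw examples from the database. Sampling from $P_G$: draw vectors from the normal distribution and pass them through G.

Discriminator. Train D on data sampled from $P_{data}$ and data sampled from $P_G$.

Training: $D^* = \arg\max_D V(D, G)$

Objective function for D: $V(D, G) = E_{y \sim P_{data}}[\log D(y)] + E_{y \sim P_G}[\log(1 - D(y))]$

$V(D, G)$ is the negative cross entropy, so maximizing it is the same as training a binary classifier to minimize cross entropy, with data sampled from $P_{data}$ as class 1 and data sampled from $P_G$ as class 2. The maximum objective value is related to JS divergence.

If the two sets of samples are hard to discriminate, the divergence is small and $\max_D V(D, G)$ is small. If they are easy to discriminate, the divergence is large.

$G^* = \arg\min_G \max_D V(D, G)$

The maximum objective value is related to JS divergence.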
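The statement that the discriminator objective is a negative cross entropy can be checked on a handful of scores. The sketch below assumes the discriminator outputs are already available as probabilities in (0, 1) and that the real and generated batches have the same size; the function names are hypothetical.

```python
import math

def objective_v(d_real, d_fake):
    # Sample estimate of V(D, G): mean log D(y) on real data
    # plus mean log(1 - D(y)) on generated data.
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def binary_cross_entropy(d_real, d_fake):
    # Cross entropy of a binary classifier: real data is class 1 (label 1),
    # generated data is class 2 (label 0), averaged over all samples.
    total = -sum(math.log(p) for p in d_real)
    total -= sum(math.log(1.0 - p) for p in d_fake)
    return total / (len(d_real) + len(d_fake))

d_real = [0.9, 0.8, 0.7]   # made-up discriminator scores on real samples
d_fake = [0.2, 0.1, 0.4]   # made-up discriminator scores on generated samples
v = objective_v(d_real, d_fake)
bce = binary_cross_entropy(d_real, d_fake)
```

With equal batch sizes, `v` equals `-2 * bce`, so maximizing one is minimizing the other. A discriminator that outputs 0.5 everywhere gives `V = -2 log 2`.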

Initialize generator and discriminator. In each training iteration:

Step 1: Fix generator G, and update discriminator D ($D^* = \arg\max_D V(D, G)$).

Step 2: Fix discriminator D, and update generator G.

Using the divergence you like. Can we use other divergence?

GAN is difficult to train .. There is a saying .. (I found this joke from ..'s Facebook.)

Tips for GAN

JS divergence is not suitable. In most cases, the generated distribution $P_G$ and the data distribution $P_{data}$ are not overlapped.

1. The nature of data: both $P_G$ and $P_{data}$ are low-dim manifolds in high-dim space. The overlap can be ignored.

2. Sampling: even though $P_G$ and $P_{data}$ have overlap, if you do not have enough sampling ..

What is the problem of JS divergence? $JS(P_{G_0}, P_{data}) = \log 2$, $JS(P_{G_1}, P_{data}) = \log 2$, .., $JS(P_{G_{100}}, P_{data}) = 0$. JS divergence is always $\log 2$ if two distributions do not overlap.

Intuition: if two distributions do not overlap, the binary classifier achieves 100% accuracy. The accuracy (or loss) means nothing during GAN training.

Wasserstein distance. Considering one distribution P as a pile of earth, and another distribution Q as the target. The Wasserstein distance is the average distance the earth mover has to move the earth.
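The claim that JS divergence is always log 2 for non-overlapping distributions can be verified on small discrete distributions. The sketch below assumes distributions are given as lists of probabilities over the same bins; the helper names are hypothetical.

```python
import math

def kl(p, q):
    # KL divergence for discrete distributions; terms with p_i = 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    # JS divergence: average KL of p and q to their mixture m.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
p_far  = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]   # far from p_data, no overlap
p_near = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]   # closer to p_data, still no overlap

js_far = js(p_far, p_data)
js_near = js(p_near, p_data)
js_same = js(p_data, p_data)
```

Both `js_far` and `js_near` equal log 2 even though `p_near` is closer to `p_data`, while `js_same` is 0. The divergence gives no signal until the distributions start to overlap.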

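For two point masses P and Q a distance d apart, $W(P, Q) = d$.

Wasserstein distance. There are many possible moving plans, some with a smaller and some with a larger average distance. Using the moving plan with the smallest average distance to define the Wasserstein distance.

What is the problem of JS divergence? With JS divergence: $JS(P_{G_0}, P_{data}) = \log 2$, $JS(P_{G_1}, P_{data}) = \log 2$, .., $JS(P_{G_{100}}, P_{data}) = 0$. With Wasserstein distance: $W(P_{G_0}, P_{data}) = d_0$, $W(P_{G_1}, P_{data}) = d_1$, .., $W(P_{G_{100}}, P_{data}) = 0$. Unlike JS divergence, the Wasserstein distance changes as $P_G$ moves toward $P_{data}$, even when the two distributions do not overlap.

WGAN. Evaluate Wasserstein distance between $P_{data}$ and $P_G$:

$\max_{D \in 1\text{-Lipschitz}} \{ E_{y \sim P_{data}}[D(y)] - E_{y \sim P_G}[D(y)] \}$

How to fulfill this constraint? D has to be smooth enough. Without the constraint, the training of D will not converge: D(y) is pushed toward $+\infty$ on real data and $-\infty$ on generated data. Keeping D smooth prevents this.

Original WGAN: weight clipping. Force the parameters w between c and -c. After parameter update, if w > c, w = c; if w < -c, w = -c.

Improved WGAN: gradient penalty. Keep the gradient close to 1.

Spectral Normalization: keep gradient norm smaller than 1 everywhere.

GAN is still challenging.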
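Two pieces of the discussion above are easy to sketch. The first function computes the Wasserstein distance between two equal-size 1-D samples, where the best moving plan simply matches sorted values; restricting to 1-D and equal sizes is an assumption made to keep the example short. The second applies the weight clipping rule from the original WGAN to a flat list of parameters. Both function names are hypothetical.

```python
def wasserstein_1d(xs, ys):
    # For equal-size 1-D samples, the moving plan with the smallest average
    # distance pairs the i-th smallest x with the i-th smallest y.
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)

def clip_weights(weights, c):
    # After a parameter update: if w > c, w = c; if w < -c, w = -c.
    return [max(-c, min(c, w)) for w in weights]

far = wasserstein_1d([0.0, 0.0, 0.0], [5.0, 5.0, 5.0])    # point masses 5 apart
near = wasserstein_1d([4.0, 4.0, 4.0], [5.0, 5.0, 5.0])   # point masses 1 apart
clipped = clip_weights([-0.5, 0.005, 2.0], c=0.01)
```

Here `far` is 5.0 and `near` is 1.0, so the distance shrinks as the samples approach each other, unlike the JS divergence. `clipped` is `[-0.01, 0.005, 0.01]`.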

Generator and discriminator need to match each other. The generator generates fake images to fool the discriminator; the discriminator tells the difference between real and fake. When one of them fails to improve, the other fails to improve as well (discriminator: "I cannot tell the difference .."; generator: "Cannot fool the discriminator ..").

More Tips. Tips from Soumith. Tips in DCGAN: guideline for network architecture design for image generation. Improved techniques for training GANs. Tips from BigGAN.

GAN for Sequence Generation. The generator is a decoder that picks each output token by max or sample. The discriminator gives the generated sequence a score, which is used to update the generator. This is non-differentiable: after a small change to the decoder's parameters, the tokens chosen by max are unchanged, so the discriminator score is unchanged.

Reinforcement learning (RL) is used for sequence generation. RL is difficult to train. GAN is difficult to train. Sequence Generation GAN (RL + GAN) ..

GAN for Sequence Generation. Usually, the generator is fine-tuned from a model learned by other approaches. However, with enough hyperparameter-tuning and tips, ScratchGAN can train from scratch ("Training Language GANs from Scratch").

Generative Models. This lecture: Generative Adversarial Network (GAN). Others: Variational Autoencoder (VAE), FLOW-based models.

Solution?
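The non-differentiability noted above for sequence generation can be seen with a few numbers. In this sketch the decoder outputs are made-up logits, tokens are picked by max, and `toy_score` is a hypothetical stand-in for a discriminator that only sees the chosen tokens.

```python
def greedy_tokens(logits_per_step):
    # Pick the index of the largest logit at each decoding step ("max").
    return [max(range(len(step)), key=step.__getitem__) for step in logits_per_step]

def toy_score(tokens):
    # Hypothetical discriminator: fraction of tokens equal to 1.
    return sum(1 for t in tokens if t == 1) / len(tokens)

logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]
# A small parameter change shifts the logits slightly.
nudged = [[v + 1e-3 * (i + 1) for i, v in enumerate(step)] for step in logits]

score_before = toy_score(greedy_tokens(logits))
score_after = toy_score(greedy_tokens(nudged))
```

The chosen tokens, and therefore the score, are identical before and after the nudge, so the score gives no gradient with respect to the decoder parameters.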

Typical learning approaches? Generative Latent Optimization (GLO), Gradient Origin Networks.

Conditional Generation. The generator takes a condition x together with z sampled from a normal distribution and outputs $y = G(x, z)$. Text-to-image: the condition is a text such as "red eyes", "black hair", "yellow hair", "dark circles", "red hair, green eyes", or "blue hair, red eyes".

Conditional GAN, D (original). D takes only the image y and outputs a scalar: y is real image or not. Real images are labeled 1 and generated images 0. Generator will learn to generate realistic images .. but completely ignore the input conditions.

Conditional GAN, D (better). D takes both the condition x (for example, "red eyes") and the image y and outputs a scalar: y is realistic or not + x and y are matched or not. True text-image pairs are labeled 1. A text paired with a generated image, and a text paired with a real image that does not match it, are labeled 0.

Conditional GAN. Image translation, or pix2pix: $y = G(x, z)$ with an image as the condition x. Testing: outputs for the same input are compared for supervised learning, GAN, and GAN + supervised.

Conditional GAN. Sound-to-image: x is a sound (for example, "a dog barking sound") and the output is an image. The images are generated by Chia-Hung Wan and Shun-Po ..

.. Head Generation

Conditional GAN. Multi-label image classifier = conditional generator: the image is the input condition and the labels are the generated output.

Learning from Unpaired Data. A deep network maps x to y, but the available x and y are unpaired. HW3: pseudo labeling. HW5: back translation. These still need some paired data.

Learning from Unpaired Data. Image Style Transfer: images from domain X and images from domain Y are unpaired. Can we learn the mapping without any paired data?
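The training examples for the better conditional discriminator described above can be sketched as follows. Images are represented by placeholder strings, and the way mismatched pairs are formed (each text is paired with the next pair's real image) is an assumption for illustration, not something the slides specify.

```python
def build_discriminator_examples(true_pairs, generated_pairs):
    """Return (text, image, label) triples for a conditional discriminator.

    true_pairs:      list of (text, real_image) that belong together
    generated_pairs: list of (text, generated_image)
    """
    examples = []
    for text, image in true_pairs:
        examples.append((text, image, 1))       # realistic and matched
    for text, image in generated_pairs:
        examples.append((text, image, 0))       # not realistic
    n = len(true_pairs)
    for i in range(n):
        text = true_pairs[i][0]
        other_text, other_image = true_pairs[(i + 1) % n]
        if other_text != text:
            examples.append((text, other_image, 0))  # realistic but not matched
    return examples

examples = build_discriminator_examples(
    true_pairs=[("red eyes", "real_img_a"), ("blue hair", "real_img_b")],
    generated_pairs=[("red eyes", "generated_img_1")],
)
```

Without the mismatched negatives, the discriminator has no reason to look at the text, which is how the generator ends up ignoring the input condition.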

Unsupervised Conditional Generation. Learning from unpaired data: a network maps an image from domain X to domain Y.

Cycle GAN. A generator $G_{X \to Y}$ takes an image from domain X and makes it become similar to domain Y. A discriminator $D_Y$ outputs a scalar: input image belongs to domain Y or not. Problem: the generator can ignore its input and still produce an image that looks like domain Y.

Cycle GAN. A second generator $G_{Y \to X}$ maps the output back to domain X, and the reconstruction should be as close as possible to the original input (cycle consistency). If the output ignores the input, there is a lack of information for reconstruction. The output has to be related to the input, so it is possible to reconstruct.

Cycle GAN. The same is done in the other direction: an image from domain Y is mapped to domain X and back, as close as possible to the original, with a discriminator $D_X$ whose scalar output says whether the input image belongs to domain X or not.

Similar methods: Cycle GAN, Dual GAN, Disco GAN.

Text Style Transfer. A seq2seq generator G turns a negative sentence into a positive one, and a discriminator D judges: positive or not? A second seq2seq model R maps the positive sentence back to a negative one to minimize the reconstruction error (Cycle GAN).
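The cycle consistency idea above can be sketched with toy "images" given as lists of numbers. The two generators are passed in as plain functions, and the L1 reconstruction error is an assumed choice of distance; all names are hypothetical.

```python
def cycle_consistency_loss(images_x, g_xy, g_yx):
    # Map each image X -> Y -> X and measure how far the reconstruction
    # is from the original (mean absolute difference per value).
    total = 0.0
    count = 0
    for x in images_x:
        reconstruction = g_yx(g_xy(x))
        total += sum(abs(a - b) for a, b in zip(x, reconstruction))
        count += len(x)
    return total / count

images_x = [[0.0, 0.5, 1.0], [0.25, 0.75, 0.5]]

# A pair of generators where the second undoes the first.
loss_related = cycle_consistency_loss(
    images_x,
    g_xy=lambda x: [v + 1.0 for v in x],
    g_yx=lambda y: [v - 1.0 for v in y],
)

# A generator that ignores its input leaves nothing to reconstruct from.
loss_ignoring = cycle_consistency_loss(
    images_x,
    g_xy=lambda x: [0.0 for _ in x],
    g_yx=lambda y: [0.0 for _ in y],
)
```

`loss_related` is 0 while `loss_ignoring` is positive, which is why adding this term discourages the generator from ignoring its input.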

Text Style Transfer. Examples: from negative sentence to positive one.

Other applications: Unsupervised Abstractive Summarization, Unsupervised Translation (Language 1 to Language 2), Unsupervised ASR (audio to text).

Evaluation of Generation

Quality of image. Human evaluation is expensive (and sometimes unfair/unstable). How to evaluate the quality of the generated images automatically? Feed each generated image y to an off-the-shelf image classifier (e.g., Inception net, VGG, etc.) to get a class distribution $P(c|y)$. Concentrated distribution means higher visual quality.

Diversity - Mode Collapse (figure: real data and generated data).

Diversity - Mode Dropping (BEGAN on CelebA; figure: generator at iteration t and generator at iteration t+1, real data and generated data).

Diversity. Feed every generated image $y^n$ to the CNN to get $P(c|y^n)$, then average over the N images: $P(c) = \frac{1}{N} \sum_n P(c|y^n)$. If the average is concentrated on one class, diversity is low. Uniform means higher variety.

Inception Score (IS): good quality and large diversity give a large IS.

What is the problem here?
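The two criteria above (concentrated per-image distributions, uniform average) are combined in the usual Inception Score formula, the exponential of the average KL divergence between $P(c|y^n)$ and $P(c)$. The sketch below assumes the classifier outputs are already available as lists of class probabilities; no real classifier is involved.

```python
import math

def inception_score(class_probs):
    # class_probs[n][c] is P(c | y^n) for generated image n.
    n = len(class_probs)
    k = len(class_probs[0])
    # Average distribution P(c) over all generated images.
    marginal = [sum(p[c] for p in class_probs) / n for c in range(k)]
    # Mean KL(P(c|y^n) || P(c)); zero-probability classes contribute 0.
    mean_kl = sum(
        sum(pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0)
        for p in class_probs
    ) / n
    return math.exp(mean_kl)

# Each image is classified confidently, and the classes are all covered.
good = inception_score([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
# Every image falls in the same class: low diversity.
low_diversity = inception_score([[1.0, 0.0, 0.0]] * 3)
# Classifier is unsure about every image: low quality.
low_quality = inception_score([[1 / 3, 1 / 3, 1 / 3]] * 3)
```

With three classes, `good` reaches the maximum of 3, while both `low_diversity` and `low_quality` give the minimum of 1.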

Fréchet Inception Distance (FID). Red points: real images; the other points: generated images. FID = Fréchet distance between the two Gaussians. Smaller is better.

"Are GANs Created Equal? A Large-Scale Study": FID, smaller is better.

We don't want memory GAN. Compare real data with generated data: generated data that is the same as real data .. or that simply flips real data ..

To learn more about evaluation .. "Pros and Cons of GAN Evaluation Measures".

Concluding Remarks
Introduction of Generative Models
Generative Adversarial Network (GAN)
Theory behind GAN
Tips for GAN
Conditional Generation
Learning from unpaired data
Evaluation of Generative Models
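The Fréchet distance between two Gaussians has a closed form. The sketch below is a deliberately simplified 1-D version, where the distance reduces to the squared difference of means plus the squared difference of standard deviations. Real FID fits multivariate Gaussians to Inception network features and needs a matrix square root, which is omitted here; the input numbers are made-up feature values.

```python
import statistics

def frechet_distance_1d(real, generated):
    # Fit a 1-D Gaussian to each sample, then apply the closed form
    # (mu_r - mu_g)^2 + (sigma_r - sigma_g)^2.
    mu_r = statistics.fmean(real)
    mu_g = statistics.fmean(generated)
    sd_r = statistics.pstdev(real)
    sd_g = statistics.pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2

real_features = [0.0, 1.0, 2.0, 3.0]
same = frechet_distance_1d(real_features, [0.0, 1.0, 2.0, 3.0])
shifted = frechet_distance_1d(real_features, [3.0, 4.0, 5.0, 6.0])
```

`same` is 0 and `shifted` is 9 (a mean shift of 3 with equal spread), in line with "smaller is better". A distance of 0 is also what a generator that memorizes the real data would get, which is the concern raised above.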
