Image Style Transfer Using Convolutional Neural Networks


Leon A. Gatys
Centre for Integrative Neuroscience, University of Tübingen, Germany
Bernstein Center for Computational Neuroscience, Tübingen, Germany
Graduate School of Neural Information Processing, University of Tübingen, Germany

Alexander S. Ecker
Centre for Integrative Neuroscience, University of Tübingen, Germany
Bernstein Center for Computational Neuroscience, Tübingen, Germany
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Baylor College of Medicine, Houston, TX, USA

Matthias Bethge
Centre for Integrative Neuroscience, University of Tübingen, Germany
Bernstein Center for Computational Neuroscience, Tübingen, Germany
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Abstract

Rendering the semantic content of an image in different styles is a difficult image processing task. Arguably, a major limiting factor for previous approaches has been the lack of image representations that explicitly represent semantic information and thus allow image content to be separated from style.

Here we use image representations derived from Convolutional Neural Networks optimised for object recognition, which make high-level image information explicit. We introduce A Neural Algorithm of Artistic Style that can separate and recombine the image content and style of natural images. The algorithm allows us to produce new images of high perceptual quality that combine the content of an arbitrary photograph with the appearance of numerous well-known artworks. Our results provide new insights into the deep image representations learned by Convolutional Neural Networks and demonstrate their potential for high-level image synthesis and manipulation.

1. Introduction

Transferring the style from one image onto another can be considered a problem of texture transfer. In texture transfer the goal is to synthesise a texture from a source image while constraining the texture synthesis in order to preserve the semantic content of a target image. For texture synthesis there exists a large range of powerful non-parametric algorithms that can synthesise photorealistic natural textures by resampling the pixels of a given source texture [7, 30, 8, 20].

Most previous texture transfer algorithms rely on these non-parametric methods for texture synthesis while using different ways to preserve the structure of the target image. For instance, Efros and Freeman introduce a correspondence map that includes features of the target image, such as image intensity, to constrain the texture synthesis procedure [8]. Hertzmann et al. use image analogies to transfer the texture from an already stylised image onto a target image [13]. Ashikhmin focuses on transferring the high-frequency texture information while preserving the coarse scale of the target image [1]. Lee et al. improve this algorithm by additionally informing the texture transfer with edge orientation information [22].

Although these algorithms achieve remarkable results, they all suffer from the same fundamental limitation: they use only low-level image features of the target image to inform the texture transfer. Ideally, however, a style transfer algorithm should be able to extract the semantic image content from the target image (e.g. the objects and the general scenery) and then inform a texture transfer procedure to render the semantic content of the target image in the style of the source image.

Therefore, a fundamental prerequisite is to find image representations that independently model variations in the semantic image content and the style in which it is presented. Such factorised representations were previously achieved only for controlled subsets of natural images, such as faces under different illumination conditions and characters in different font styles [29], or handwritten digits and house numbers [17].

Figure 1. Image representations in a Convolutional Neural Network (CNN). A given input image is represented as a set of filtered images at each processing stage in the CNN. While the number of different filters increases along the processing hierarchy, the size of the filtered images is reduced by some downsampling mechanism (e.g. max-pooling), leading to a decrease in the total number of units per layer of the network. Content Reconstructions. We can visualise the information at different processing stages in the CNN by reconstructing the input image from only knowing the network's responses in a particular layer. We reconstruct the input image from layers 'conv1_2' (a), 'conv2_2' (b), 'conv3_2' (c), 'conv4_2' (d) and 'conv5_2' (e) of the original VGG-Network. We find that reconstruction from lower layers is almost perfect (a-c). In higher layers of the network, detailed pixel information is lost while the high-level content of the image is preserved (d, e). Style Reconstructions. On top of the original CNN activations we use a feature space that captures the texture information of an input image. The style representation computes correlations between the different features in different layers of the CNN. We reconstruct the style of the input image from a style representation built on different subsets of CNN layers ('conv1_1' (a); 'conv1_1' and 'conv2_1' (b); 'conv1_1', 'conv2_1' and 'conv3_1' (c); 'conv1_1', 'conv2_1', 'conv3_1' and 'conv4_1' (d); 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1' and 'conv5_1' (e)). This creates images that match the style of a given image on an increasing scale while discarding information of the global arrangement of the scene.
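The caption describes the style representation only at a high level: correlations between filter responses within a layer, collected over several layers. For concreteness, the following minimal Python sketch computes such correlations as the Gram matrix of the vectorised feature maps, which is how the paper formalises them later (Section 2.2 of the original). The tensor shapes are illustrative, and this is an editorial sketch rather than the authors' code.

```python
import torch

def gram_matrix(F):
    """Correlations between filter responses in one layer.

    F: matrix of shape (N_l, M_l), one row per filter, one column per
       spatial position, i.e. the filter-response matrix F^l defined in
       Section 2.1 below.
    Returns the (N_l, N_l) matrix of inner products between the
    vectorised feature maps, G_ij = sum_k F_ik * F_jk.
    """
    return F @ F.t()

# Illustrative: a layer with 64 filters on a 32x32 feature map.
G = gram_matrix(torch.randn(64, 32 * 32))   # shape (64, 64)
```

Matching these matrices on growing subsets of layers, as in panels (a)-(e), captures texture at increasing scales while discarding the global arrangement of the scene.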

To generally separate content from style in natural images is still an extremely difficult problem. However, the recent advance of Deep Convolutional Neural Networks [18] has produced powerful computer vision systems that learn to extract high-level semantic information from natural images. It was shown that Convolutional Neural Networks trained with sufficient labeled data on specific tasks such as object recognition learn to extract high-level image content in generic feature representations that generalise across datasets [6] and even to other visual information processing tasks [19, 4, 2, 9, 23], including texture recognition [5] and artistic style classification [15].

In this work we show how the generic feature representations learned by high-performing Convolutional Neural Networks can be used to independently process and manipulate the content and the style of natural images. We introduce A Neural Algorithm of Artistic Style, a new algorithm to perform image style transfer.

Conceptually, it is a texture transfer algorithm that constrains a texture synthesis method by feature representations from state-of-the-art Convolutional Neural Networks. Since the texture model is also based on deep image representations, the style transfer method elegantly reduces to an optimisation problem within a single neural network. New images are generated by performing a pre-image search to match feature representations of example images. This general approach has been used before in the context of texture synthesis [12, 25, 10] and to improve the understanding of deep image representations [27, 24]. In fact, our style transfer algorithm combines a parametric texture model based on Convolutional Neural Networks [10] with a method to invert their image representations [24].

2. Deep image representations

The results presented below were generated on the basis of the VGG network [28], which was trained to perform object recognition and localisation [26] and is described extensively in the original work [28].

We used the feature space provided by a normalised version of the 16 convolutional and 5 pooling layers of the 19-layer VGG network. We normalized the network by scaling the weights such that the mean activation of each convolutional filter over images and positions is equal to one. Such re-scaling can be done for the VGG network without changing its output, because it contains only rectifying linear activation functions and no normalization or pooling over feature maps. We do not use any of the fully connected layers. The model is publicly available and can be explored in the caffe-framework [14]. For image synthesis we found that replacing the maximum pooling operation by average pooling yields slightly more appealing results, which is why the images shown were generated with average pooling.

2.1. Content representation

Generally each layer in the network defines a non-linear filter bank whose complexity increases with the position of the layer in the network. Hence a given input image $\vec{x}$ is encoded in each layer of the Convolutional Neural Network by the filter responses to that image.
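To make this setup concrete, here is a minimal PyTorch sketch (assuming torchvision 0.13 or later) of the preprocessing described above: the fully connected layers are dropped, max-pooling is swapped for average pooling, and the weights are frozen. Note that torchvision ships the stock ImageNet weights, not the mean-activation-normalised weights the paper uses, so that step is only flagged in a comment; layer_responses is an illustrative helper name for collecting the filter responses that encode an image.

```python
import torch.nn as nn
from torchvision import models

def build_vgg_features():
    # Stock pre-trained 19-layer VGG; `features` holds exactly the 16
    # convolutional and 5 pooling layers (plus ReLUs) and excludes the
    # fully connected layers. NOTE: these are the ordinary ImageNet
    # weights, not the mean-activation-normalised weights the paper uses.
    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()

    # Replace max-pooling with average pooling, which the authors found
    # yields slightly more appealing synthesis results.
    for i, layer in enumerate(vgg):
        if isinstance(layer, nn.MaxPool2d):
            vgg[i] = nn.AvgPool2d(kernel_size=2, stride=2)

    # Freeze the weights: only the synthesised image will be optimised.
    for p in vgg.parameters():
        p.requires_grad_(False)
    return vgg

def layer_responses(vgg, image):
    """Run image (a 1x3xHxW tensor) through the network and collect the
    filter responses after every convolution's rectifier."""
    responses, out = [], image
    for layer in vgg:
        out = layer(out)
        if isinstance(layer, nn.ReLU):
            responses.append(out)
    return responses
```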

A layer with $N_l$ distinct filters has $N_l$ feature maps each of size $M_l$, where $M_l$ is the height times the width of the feature map. So the responses in a layer $l$ can be stored in a matrix $F^l \in \mathcal{R}^{N_l \times M_l}$, where $F^l_{ij}$ is the activation of the $i^{th}$ filter at position $j$ in layer $l$. To visualise the image information that is encoded at different layers of the hierarchy, one can perform gradient descent on a white noise image to find another image that matches the feature responses of the original image (Fig 1, content reconstructions) [24]. Let $\vec{p}$ and $\vec{x}$ be the original image and the image that is generated, and $P^l$ and $F^l$ their respective feature representation in layer $l$. We then define the squared-error loss between the two feature representations

$$\mathcal{L}_{\mathrm{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2 . \tag{1}$$

The derivative of this loss with respect to the activations in layer $l$ equals

$$\frac{\partial \mathcal{L}_{\mathrm{content}}}{\partial F^l_{ij}} =
\begin{cases}
\left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\
0 & \text{if } F^l_{ij} < 0 ,
\end{cases} \tag{2}$$

from which the gradient with respect to the image $\vec{x}$ can be computed using standard error back-propagation (Fig 2, right).
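In an autograd framework, Eq. (2) does not need to be implemented by hand: differentiating Eq. (1) and back-propagating through the rectifier that produced $F^l$ reproduces it automatically. A minimal sketch, with illustrative tensor shapes:

```python
import torch

def content_loss(F, P):
    """Squared-error content loss of Eq. (1): 0.5 * sum_ij (F_ij - P_ij)^2.

    F: activations of the generated image in layer l, shape (N_l, M_l).
    P: activations of the original image in the same layer, held fixed.
    """
    return 0.5 * ((F - P) ** 2).sum()

F = torch.randn(64, 32 * 32, requires_grad=True)   # illustrative (N_l, M_l)
P = torch.randn(64, 32 * 32)
content_loss(F, P).backward()   # here F.grad == F - P; in the full network
                                # the upstream ReLU zeroes the gradient
                                # wherever the activation was clipped, as in Eq. (2)
```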

Thus we can change the initially random image $\vec{x}$ until it generates the same response in a certain layer of the Convolutional Neural Network as the original image $\vec{p}$.

When Convolutional Neural Networks are trained on object recognition, they develop a representation of the image that makes object information increasingly explicit along the processing hierarchy [10]. Therefore, along the processing hierarchy of the network, the input image is transformed into representations that are increasingly sensitive to the actual content of the image, but become relatively invariant to its precise appearance. Thus, higher layers in the network capture the high-level content in terms of objects and their arrangement in the input image but do not constrain the exact pixel values of the reconstruction very much (Fig 1, content reconstructions d, e). In contrast, reconstructions from the lower layers simply reproduce the exact pixel values of the original image (Fig 1, content reconstructions a-c).
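Putting the pieces together, the sketch below illustrates the reconstruction procedure: start from a white-noise image and run gradient descent until one layer's responses match those of the original image. It assumes the build_vgg_features and content_loss helpers sketched above; the default layer index and the choice of Adam as optimiser are illustrative assumptions, not values taken from the paper.

```python
import torch

def reconstruct_content(vgg, p, layer=12, steps=500, lr=0.05):
    """Gradient descent on a white-noise image x until its responses in
    one layer match those of the original image p (content reconstruction).

    vgg:   feature extractor from build_vgg_features() above.
    p:     original image as a (1, 3, H, W) tensor.
    layer: index into the Sequential whose output is matched (illustrative).
    """
    def response(img):
        out = img
        for i, module in enumerate(vgg):
            out = module(out)
            if i == layer:
                break
        return out

    P = response(p).detach()                      # fixed target responses P^l
    x = torch.randn_like(p, requires_grad=True)   # white-noise starting image
    opt = torch.optim.Adam([x], lr=lr)            # any optimiser would do; the
                                                  # paper does not prescribe one here
    for _ in range(steps):
        opt.zero_grad()
        loss = content_loss(response(x), P)       # Eq. (1)
        loss.backward()                           # gradient w.r.t. the image x
        opt.step()
    return x.detach()
```

Running this with a low layer index reproduces the near-perfect reconstructions of Fig 1 (a-c); a high index yields images that preserve content but not exact pixel values, as in Fig 1 (d, e).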

