Transcription of Convolutional Neural Network - National Taiwan University (國立臺灣大學)
Convolutional Neural Network
Hung-yi Lee

Can the network be simplified by considering the properties of images?

Why CNN for Image
- Some patterns are much smaller than the whole image.
- A neuron does not have to see the whole image to discover the pattern (e.g. a "beak" detector).
- Connecting to a small region requires fewer parameters.

Why CNN for Image
- The same patterns appear in different regions (e.g. an "upper-left beak" detector and a "middle beak" detector).
- The two detectors do almost the same thing, so they can use the same set of parameters.

Why CNN for Image
- Subsampling the pixels will not change the object: a subsampled bird is still a bird.
- We can subsample the pixels to make the image smaller, so the network needs fewer parameters to process the image.

The whole CNN
image -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> Flatten -> Fully Connected Feedforward network -> cat, dog, ...
(Convolution + Max Pooling can repeat many times.)

Convolution corresponds to:
- Property 1: Some patterns are much smaller than the whole image.
- Property 2: The same patterns appear in different regions.
Max Pooling corresponds to:
- Property 3: Subsampling the pixels will not change the object.
(Properties 1 and 2, listed above, correspond to Convolution.)

CNN - Convolution
6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1

The filter values are the network parameters to be learned. Each filter detects a small pattern (3 x 3). (Property 1)

Stride: with stride = 1, Filter 1 moves one pixel at a time and the first two outputs are 3 and -1. With stride = 2, the first two outputs are 3 and -3. We set stride = 1 below.

Filter 1, stride = 1, output (4 x 4):
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
The same filter gives 3 at both the upper-left and the lower-left position, where the same diagonal pattern appears. (Property 2)

Do the same process for every filter. Filter 2, stride = 1, output (4 x 4):
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
The outputs of all filters together form the Feature Map (here, two 4 x 4 images).

CNN - Colorful image
A colorful image is a stack of 6 x 6 matrices, one per color channel, and each filter is correspondingly a stack of 3 x 3 matrices.

Convolution v.s. Fully Connected
Apply Filter 1 to the 6 x 6 image, with the 36 pixels treated as the inputs of a fully connected layer, numbered 1, 2, 3, ...
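The convolution walkthrough above can be checked with a few lines of plain Python. This is a minimal sketch: the image and filter values are the ones on the slides, while the helper name `convolve2d` and its `stride` argument are assumptions made for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
# 6 x 6 image and the two 3 x 3 filters from the slides.
IMAGE = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
FILTER_1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
FILTER_2 = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]


def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map.

    Hypothetical helper, not part of the slides or of any library.
    """
    k = len(kernel)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride
            # Inner product between the filter and the patch it covers.
            row.append(sum(
                image[top + u][left + v] * kernel[u][v]
                for u in range(k) for v in range(k)
            ))
        feature_map.append(row)
    return feature_map


if __name__ == "__main__":
    for name, kernel in (("Filter 1", FILTER_1), ("Filter 2", FILTER_2)):
        print(name)
        for row in convolve2d(IMAGE, kernel):
            print(row)
    print("Filter 1, stride 2:", convolve2d(IMAGE, FILTER_1, stride=2))
```

With stride 1, each filter turns the 6 x 6 image into a 4 x 4 output, because a 3 x 3 window fits in 6 - 3 + 1 = 4 positions along each axis.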
In the fully connected view, the 36 pixels of the 6 x 6 image are the inputs (numbered 1 to 36) and each entry of the Filter 1 output is a neuron.
- The neuron producing the first output (3) connects only to inputs 1, 2, 3, 7, 8, 9, 13, 14, 15. It connects to only 9 inputs, so it is not fully connected. Fewer parameters!
- The neuron producing the next output (-1) connects to inputs 2, 3, 4, 8, 9, 10, 14, 15, 16 and uses the same 9 weights (shared weights). Even fewer parameters!

CNN - Max Pooling
Take the 4 x 4 outputs of Filter 1 and Filter 2, group each into 2 x 2 blocks, and keep the maximum of each block:

Filter 1:     Filter 2:
3 0           -1 1
3 1            0 3

After Convolution + Max Pooling, the 6 x 6 image becomes a new but smaller image (2 x 2). Each filter is a channel.

The whole CNN
Each Convolution + Max Pooling stage produces a new image that is smaller than the original image. The number of channels is the number of filters. This can repeat many times.

Flatten
The pooled values of all channels (3, 0, 3, 1 and -1, 1, 0, 3) are flattened into a single vector, which is the input of the Fully Connected Feedforward network.

CNN in Keras
Only the network structure and the input format are modified (vector -> 3-D tensor).
- First Convolution layer: there are 25 3 x 3 filters. Input shape = (28, 28, 1): 28 x 28 pixels; 1: black/white, 3: RGB.
- Max Pooling: each 2 x 2 block is replaced by its maximum.
- Shapes: input 1 x 28 x 28 -> Convolution -> 25 x 26 x 26 -> Max Pooling -> 25 x 13 x 13 -> Convolution -> 50 x 11 x 11 -> Max Pooling -> 50 x 5 x 5
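The max pooling, flatten, and layer-shape numbers above can be reproduced with a small sketch in plain Python. The two feature maps are copied from the convolution example; the helper names `max_pool`, `flatten`, and `trace_shapes` are assumptions made for illustration, and the row-by-row flattening order is just one possible convention.

```python
# 4 x 4 outputs of Filter 1 and Filter 2 (stride 1) from the slides.
FEATURE_MAP_1 = [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]
FEATURE_MAP_2 = [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]

# The Keras stack described above, as (layer kind, number of filters).
KERAS_LAYERS = [("conv", 25), ("pool", None), ("conv", 50), ("pool", None)]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + u][c * size + v]
                for u in range(size) for v in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def flatten(channels):
    """Concatenate all channels, row by row, into one flat vector."""
    return [value for channel in channels for row in channel for value in row]


def trace_shapes(channels, size, layers):
    """Follow (channels, height, width) through the layers.

    Assumes 3 x 3 filters with stride 1 and no padding, and 2 x 2 pooling
    that drops any leftover border row or column.
    """
    shapes = [(channels, size, size)]
    for kind, n_filters in layers:
        if kind == "conv":
            channels, size = n_filters, size - 2
        else:
            size = size // 2
        shapes.append((channels, size, size))
    return shapes


if __name__ == "__main__":
    pooled = [max_pool(FEATURE_MAP_1), max_pool(FEATURE_MAP_2)]
    print("pooled:", pooled)
    print("flattened:", flatten(pooled))
    for shape in trace_shapes(1, 28, KERAS_LAYERS):
        print("shape:", shape)
```

The second pooling step maps 11 x 11 to 5 x 5 because 11 is odd: the last row and column do not fill a 2 x 2 block and are dropped under the assumption stated in `trace_shapes`.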
How many parameters for each filter?
- First Convolution layer (25 3 x 3 filters, 1 input channel): 3 x 3 = 9
- Second Convolution layer (50 3 x 3 filters, 25 input channels): 3 x 3 x 25 = 225

Flatten: 50 x 5 x 5 -> 1250 -> Fully Connected Feedforward network -> output

Live Demo

What does machine learn?

First Convolution Layer
Typical-looking filters on the trained first layer: 11 x 11 (AlexNet).

How about higher layers?
Which images make a specific neuron activate?
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", CVPR, 2014

What does CNN learn?
input -> Convolution (25 3 x 3 filters) -> Max Pooling -> Convolution (50 3 x 3 filters) -> Max Pooling
The output of the second Convolution layer is 50 x 11 x 11. The output of the k-th filter is an 11 x 11 matrix with entries $a^k_{ij}$.
Degree of the activation of the k-th filter: $a^k = \sum_{i=1}^{11} \sum_{j=1}^{11} a^k_{ij}$
Find the input image that maximizes it: $x^* = \arg\max_x a^k$ (gradient ascent)
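The idea of finding $x^* = \arg\max_x a^k$ by gradient ascent on the input can be illustrated with a deliberately tiny sketch. It is not the procedure used in the lecture demo: it assumes a single 3 x 3 filter (Filter 1 from the convolution example) applied directly to a 6 x 6 input with no nonlinearity, so the gradient can be written by hand, and it clips pixels to [0, 1] to keep the maximization bounded. In a real CNN the gradient with respect to the input would come from backpropagation. All function names are hypothetical.

```python
FILTER_1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
SIZE = 6  # side length of the toy input image


def activation(image, kernel):
    """a^k: the sum of every entry of the stride-1 feature map."""
    k = len(kernel)
    n = len(image) - k + 1
    return sum(
        image[i + u][j + v] * kernel[u][v]
        for i in range(n) for j in range(n)
        for u in range(k) for v in range(k)
    )


def activation_gradient(size, kernel):
    """d a^k / d x_pq for this linear toy layer.

    Each pixel's gradient is the sum of the kernel weights that touch it
    as the filter slides, so it does not depend on the current image.
    """
    k = len(kernel)
    n = size - k + 1
    grad = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for u in range(k):
                for v in range(k):
                    grad[i + u][j + v] += kernel[u][v]
    return grad


def gradient_ascent(kernel, size=SIZE, steps=20, lr=0.1):
    """Projected gradient ascent on the input; pixels stay in [0, 1]."""
    x = [[0.5] * size for _ in range(size)]  # start from a uniform gray image
    grad = activation_gradient(size, kernel)
    for _ in range(steps):
        x = [
            [min(1.0, max(0.0, x[p][q] + lr * grad[p][q])) for q in range(size)]
            for p in range(size)
        ]
    return x


if __name__ == "__main__":
    start = [[0.5] * SIZE for _ in range(SIZE)]
    best = gradient_ascent(FILTER_1)
    print("activation before:", activation(start, FILTER_1))
    print("activation after: ", activation(best, FILTER_1))
```

The update moves the input, not the weights: the filter stays fixed while the image is changed step by step in the direction that increases the activation.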
For each filter k, gradient ascent on the activation $a^k$ gives one image $x^*$.

What does CNN learn? (neurons after Flatten)
input -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> flatten -> fully connected layers
Find an image maximizing the output of a neuron $a^j$: $x^* = \arg\max_x a^j$
Each figure corresponds to a neuron.

What does CNN learn? (output layer)
$x^* = \arg\max_x y^i$
Can we see digits (0 1 2 3 4 5 6 7 8)?
See: Deep Neural Networks are Easily Fooled.

What does CNN learn? (with a constraint on the image)
Compare $x^* = \arg\max_x y^i$ with $x^* = \arg\max_x \left( y^i - \sum_{i,j} |x_{ij}| \right)$, where the sum runs over all pixel values.

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", ICLR, 2014

Saliency: $\left| \frac{\partial y_k}{\partial x_{ij}} \right|$, where $y_k$ is the predicted class of the model and $x_{ij}$ is a pixel.

Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (pp. 818-833)

Deep Dream
- Given a photo, the machine adds what it sees.
- Modify the image so that the CNN exaggerates what it sees.

Deep Style
- Given a photo, make its style like famous paintings.
- One CNN gives the content of the photo, another CNN gives the style of the painting, and an image is sought whose CNN output (?) matches both.
- Reference: A Neural Algorithm of Artistic Style

More Application: Playing Go
- Network: input is the board (19 x 19 positions), output is the next move.
- Input: 19 x 19 vector (Black: 1, white: -1, none: 0). Output: 19 x 19 vector.
- A fully-connected feedforward network can be used, but CNN performs much better: the board is treated as a 19 x 19 matrix (image).

More Application: Playing Go (training)
- Training data: records of previous plays, each a sequence of moves.
- For every position in a record, the target is the move played next: that position = 1, else = 0.

Why CNN for playing Go?
- Some patterns are much smaller than the whole image. Alpha Go uses 5 x 5 for the first layer.
- The same patterns appear in different regions.

Why CNN for playing Go?
- Subsampling the pixels will not change the object.
- Alpha Go does not use Max Pooling.
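The board encoding and training targets described for Go above can be sketched as follows. This is a minimal illustration under stated assumptions: the coordinate convention (0-indexed row and column), the helper names, and the example moves are all hypothetical, and captures are ignored, so it is not a Go engine.

```python
BOARD_SIZE = 19
BLACK, WHITE, EMPTY = 1, -1, 0  # encoding from the slides


def encode_board(stones):
    """Return a 19 x 19 matrix from a {(row, col): "black" | "white"} dict."""
    board = [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]
    for (row, col), color in stones.items():
        board[row][col] = BLACK if color == "black" else WHITE
    return board


def encode_target(next_move):
    """Return a flat vector of length 19 x 19: 1 at the next move, else 0."""
    target = [0] * (BOARD_SIZE * BOARD_SIZE)
    row, col = next_move
    target[row * BOARD_SIZE + col] = 1
    return target


def training_pairs(moves):
    """Turn one recorded game into (board, target) training pairs.

    `moves` alternates black and white, black first. Each pair holds the
    board before a move and the one-hot target for that move.
    Captures are ignored in this sketch.
    """
    stones, pairs = {}, []
    for turn, move in enumerate(moves):
        pairs.append((encode_board(stones), encode_target(move)))
        stones[move] = "black" if turn % 2 == 0 else "white"
    return pairs


if __name__ == "__main__":
    record = [(4, 4), (9, 9), (4, 14)]  # hypothetical opening moves
    for board, target in training_pairs(record):
        stones_on_board = sum(abs(v) for row in board for v in row)
        print("stones on board:", stones_on_board, "target index:", target.index(1))
```

Keeping the input as a 19 x 19 matrix rather than flattening it is what lets a CNN treat the board as an image.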
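Max Pooling: how can this be explained for Go?

More Application: Speech
- The spectrogram (time on one axis, frequency on the other) is treated as an image and fed to the CNN.
- The filters move in the frequency direction.

More Application: Text
Source of image: Guobiao Mo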