Lecture 10: Recurrent Neural Networks

Fei-Fei Li & Justin Johnson & Serena Yeung, May 4, 2017

Slide 3: Administrative
- A1 grades will go out soon.
- A2 is due today (11:59pm).
- Midterm is in-class on Tuesday! We will send out details on where to go soon.

Slide 4: Extra Credit: Train Game
- More details on Piazza by early next week.

Slide 5: Last Time: CNN Architectures
- AlexNet. [Figure copyright Kaiming He, 2016. Reproduced with permission.]

Slide 6: Last Time: CNN Architectures
- [Figure: VGG16 and VGG19 layer diagrams (stacks of 3x3 conv and pool layers, followed by FC-4096, FC-4096, FC-1000, and softmax) alongside GoogLeNet. Figure copyright Kaiming He, 2016. Reproduced with permission.]

Slide 7: Last Time: CNN Architectures
- ResNet. [Figure: the full ResNet layer stack; a residual block adds an identity shortcut to a two-convolution path, computing F(x) + x followed by a ReLU. Figure copyright Kaiming He, 2016. Reproduced with permission.]

Slide 8: Last Time: CNN Architectures
- [Figure: DenseNet dense blocks, where each conv layer's input is the concatenation of all earlier feature maps, and FractalNet fractal blocks. Figures copyright Larsson et al., 2017. Reproduced with permission.]

Slide 9: Last Time: CNN Architectures

Slide 10: Last Time: CNN Architectures
- AlexNet and VGG have tons of parameters in the fully connected layers.
- AlexNet: ~62M parameters.
  - FC6: 256x6x6 -> 4096: 38M params
  - FC7: 4096 -> 4096: 17M params
  - FC8: 4096 -> 1000: 4M params
- ~59M params in the FC layers! (See the arithmetic check after this slide listing.)

Slide 11: Today: Recurrent Neural Networks

Slide 12: "Vanilla" Neural Networks
- The "vanilla" feed-forward network maps one input to one output (one-to-one).

Slide 13: Recurrent Neural Networks: Process Sequences
- e.g. Image Captioning: image -> sequence of words (one-to-many)

Slide 14: Recurrent Neural Networks: Process Sequences
- e.g. Sentiment Classification: sequence of words -> sentiment (many-to-one)

Slide 15: Recurrent Neural Networks: Process Sequences
- e.g. Machine Translation: sequence of words -> sequence of words (many-to-many)
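The FC parameter counts on slide 10 are plain shape arithmetic; a quick sanity check in Python (weight matrices plus bias vectors):

    # AlexNet fully connected layer parameter counts (weights + biases).
    fc6 = 256 * 6 * 6 * 4096 + 4096   # flattened 256x6x6 features -> 4096: ~37.8M
    fc7 = 4096 * 4096 + 4096          # 4096 -> 4096: ~16.8M
    fc8 = 4096 * 1000 + 1000          # 4096 -> 1000 class scores: ~4.1M
    print((fc6 + fc7 + fc8) / 1e6)    # ~58.6M of AlexNet's ~62M total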

Slide 16: Recurrent Neural Networks: Process Sequences
- e.g. Video classification on frame level (many-to-many)

Slide 17: Sequential Processing of Non-Sequence Data
- Classify images by taking a series of "glimpses".
- Ba, Mnih, and Kavukcuoglu, "Multiple Object Recognition with Visual Attention", ICLR 2015.

Slide 18: Sequential Processing of Non-Sequence Data
- Generate images one piece at a time!
- Gregor et al., "DRAW: A Recurrent Neural Network For Image Generation", ICML 2015. [Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.]

Slide 19: Recurrent Neural Network
- [Diagram: an input x feeds an RNN block with a recurrent loop.]

Slide 20: Recurrent Neural Network
- [Diagram: x -> RNN -> y.] Usually we want to predict a vector at some time steps.

Slide 21: Recurrent Neural Network
- We can process a sequence of vectors x by applying a recurrence formula at every time step:
  h_t = f_W(h_{t-1}, x_t)
  where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at time step t, and f_W is some function with parameters W.

Slide 22: Recurrent Neural Network
- Notice: the same function and the same set of parameters are used at every time step.

Slide 23: (Vanilla) Recurrent Neural Network
- The state consists of a single "hidden" vector h:
  h_t = tanh(W_hh h_{t-1} + W_xh x_t)
  y_t = W_hy h_t
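A minimal numpy sketch of the slide-23 update; the sizes H, D, O and the variable names are illustrative assumptions, not the lecture's code:

    import numpy as np

    H, D, O = 128, 64, 64               # hidden, input, output sizes (arbitrary)
    Whh = np.random.randn(H, H) * 0.01  # hidden-to-hidden weights (W_hh)
    Wxh = np.random.randn(H, D) * 0.01  # input-to-hidden weights (W_xh)
    Why = np.random.randn(O, H) * 0.01  # hidden-to-output weights (W_hy)

    def rnn_step(h_prev, x):
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t); y_t = W_hy h_t
        h = np.tanh(Whh @ h_prev + Wxh @ x)
        return h, Why @ h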

Slide 24: RNN: Computational Graph
- [Diagram: h0 and x1 feed f_W, producing h1.]

Slide 25: RNN: Computational Graph
- [Diagram: unrolled two steps: h0 -> f_W -> h1 -> f_W -> h2, with inputs x1, x2.]

Slide 26: RNN: Computational Graph
- [Diagram: unrolled over the whole sequence, up to hT.]

Slide 27: RNN: Computational Graph
- Re-use the same weight matrix W at every time-step.

Slide 28: RNN: Computational Graph: Many to Many
- [Diagram: each hidden state h1 ... hT produces an output y1 ... yT.]

Slide 29: RNN: Computational Graph: Many to Many
- [Diagram: each output y_t gets its own loss L1 ... LT.]

Slide 30: RNN: Computational Graph: Many to Many
- [Diagram: the total loss L is the sum of the per-step losses L1 ... LT.]

Slide 31: RNN: Computational Graph: Many to One
- [Diagram: only the final hidden state hT produces an output.]
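Weight sharing is easiest to see in code: unrolling the graph applies the same step function, with the same W matrices, at every position, and a many-to-many model sums one loss per step. A sketch reusing the hypothetical rnn_step above, with squared error standing in for the per-step loss:

    def forward_many_to_many(xs, targets, h0):
        # Unroll the recurrence; the same weights are reused at every step.
        h, total_loss = h0, 0.0
        for x_t, target_t in zip(xs, targets):
            h, y = rnn_step(h, x_t)                          # h_t, y_t
            total_loss += 0.5 * np.sum((y - target_t) ** 2)  # L_t
        return total_loss, h  # many-to-one would use only the final h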

Slide 32: RNN: Computational Graph: One to Many
- [Diagram: one input produces a sequence of outputs y1, y2, y3, ...]

Slide 33: Sequence to Sequence: Many-to-one + One-to-many
- Many to one: encode the input sequence in a single vector (using weights W1).

Slide 34: Sequence to Sequence: Many-to-one + One-to-many
- One to many: produce the output sequence from that single input vector (using weights W2).

Slides 35-37: Example: Character-level Language Model
- Vocabulary: [h, e, l, o]
- Example training sequence: "hello"
- [Diagram, built up over the three slides: each character is fed in as a one-hot vector, the hidden layer is updated, and the output layer gives scores over the vocabulary for the next character.]
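In code, the character-level setup on slides 35-37 amounts to one-hot encoding each character and shifting the sequence by one to get the targets. A small sketch (assumed details, not the lecture's code), sized so the rnn_step above would use D = O = 4:

    vocab = ['h', 'e', 'l', 'o']
    char_to_ix = {ch: i for i, ch in enumerate(vocab)}

    def one_hot(ch):
        v = np.zeros(len(vocab))
        v[char_to_ix[ch]] = 1.0
        return v

    seq = "hello"
    inputs  = [one_hot(ch) for ch in seq[:-1]]    # h, e, l, l
    targets = [char_to_ix[ch] for ch in seq[1:]]  # e, l, l, o (the next chars)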

Slides 38-40: Example: Character-level Language Model: Sampling
- Vocabulary: [h, e, l, o]
- At test-time, sample characters one at a time and feed them back to the model.
- [Diagram, built up over the slides: starting from "h", the model's output distribution is sampled to get "e", "l", "l", "o", and each sampled character becomes the next input.]
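A sampling loop in the same sketch: softmax the output scores into a distribution, draw a character, and feed its one-hot vector back in as the next input. All names are the assumed ones from the sketches above, with rnn_step built with D = O = len(vocab):

    def sample(h, seed_ch, n):
        # Sample n characters, feeding each one back as the next input.
        x, out = one_hot(seed_ch), []
        for _ in range(n):
            h, y = rnn_step(h, x)
            p = np.exp(y - np.max(y))
            p /= np.sum(p)                          # softmax over scores
            ix = np.random.choice(len(vocab), p=p)  # sample, don't argmax
            out.append(vocab[ix])
            x = one_hot(vocab[ix])                  # feed back to the model
        return ''.join(out)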

Slide 41: Backpropagation through time
- Forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient.

Slide 42: Truncated backpropagation through time
- Run forward and backward through chunks of the sequence instead of the whole sequence.

Slide 43: Truncated backpropagation through time
- Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.

Slide 44: Truncated backpropagation through time
- [Diagram: successive chunks, with the hidden state carried across chunk boundaries.]

Slide 45: A minimal character-level RNN, gist: 112 lines of Python

Slide 46: [Diagram: x -> RNN -> y.]

Slide 47: [Samples from the character-level model during training: the output starts as noise ("at first"), then improves with each "train more" stage.]
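Truncated BPTT in sketch form: the hidden state flows forward across chunk boundaries, but gradients are only computed within each fixed-size chunk. Here data, loss_and_grads, and sgd_update are hypothetical (a character sequence, a forward/backward pass over one chunk, and a parameter update), and the chunk length of 25 is just a typical choice:

    chunk = 25                 # backprop length (assumed typical value)
    h = np.zeros(H)            # hidden state carried forward "forever"
    for i in range(0, len(data) - chunk, chunk):
        xs = data[i : i + chunk]             # inputs for this chunk
        ts = data[i + 1 : i + chunk + 1]     # next-character targets
        # Forward and backward only within this chunk; gradients stop at
        # the chunk boundary, but the hidden state h does not.
        loss, grads, h = loss_and_grads(xs, ts, h)  # hypothetical helper
        sgd_update(grads)                           # hypothetical update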

Slide 48: [Generated text samples from the trained model.]

Slide 49: The Stacks Project: open source algebraic geometry textbook
- LaTeX source.
- The Stacks Project is licensed under the GNU Free Documentation License.

Slides 50-52: [Samples of generated algebraic-geometry "LaTeX", after training the model on the Stacks Project source.]

Slide 53: Generated C code
- [Samples of C code generated after training on source code.]

