DEEP LEARNING - REVIEW - Pennsylvania State …

DEEP LEARNING - REVIEW
Yann LeCun, Yoshua Bengio & Geoffrey Hinton

DEEP LEARNING - History, Background & Applications. Recent Revival. Convolutional Neural Networks. Recurrent Neural Networks.

WHAT IS DEEP LEARNING?
A particular class of learning algorithms. Rebranded neural networks with multiple layers. Inspired by the neuronal architecture of the brain. Renewed interest in the area due to a few recent breakthroughs. Learn parameters from data. Non-linear.

CONTEXT - HISTORY
1943 - McCulloch & Pitts develop a computational model for neural networks. Idea: neurons with a binary threshold activation function were analogous to first-order logic sentences.
1949 - Donald Hebb proposes Hebb's rule. Idea: Neurons that fire together, wire together!
1958 - Frank Rosenblatt creates the Perceptron.
1959 - Hubel and Wiesel characterize cells in the visual cortex.

1975 - Paul J. Werbos develops the backpropagation algorithm.
1980 - Neocognitron, a hierarchical, multilayered ANN.
1990 - Convolutional Neural Networks.

APPLICATIONS
Predict the activity of potential drug molecules. Reconstruct brain circuits. Predict the effects of mutations in non-coding regions of DNA. Speech/image recognition & language translation.

MULTILAYER NEURAL NETWORK
COMMON NON-LINEAR FUNCTIONS:
1) ReLU: $f(z) = \max(0, z)$
2) Hyperbolic tangent (a sigmoid-shaped function): $f(z) = \tanh(z)$
3) Logistic: $f(z) = 1/(1 + e^{-z})$
COST FUNCTION: $E = \frac{1}{2}(y_l - t_l)^2$, where $y_l$ is the output of unit $l$ and $t_l$ is its target value.
Source: Deep Learning, Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Nature 521, 436-444 (28 May 2015)

STOCHASTIC GRADIENT DESCENT
Source: Wikipedia.
Analogy: A person is stuck in the mountains and is trying to get down (i.e., trying to find the minimum). The person represents the backpropagation algorithm, and the path taken down the mountain represents the sequence of parameter settings that the algorithm will explore.
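The non-linear functions and the cost function listed above, together with the stochastic gradient descent that the mountain analogy describes, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the method of the cited review: it assumes a single logistic unit $y = f(wx + b)$ trained on a made-up four-point data set, and the names `sgd_train`, `lr` (the learning rate, i.e. the step length) and `epochs` are illustrative choices.

```python
import math
import random

# The three non-linearities: ReLU, tanh and the logistic function.
def relu(z):
    return max(0.0, z)

def tanh(z):
    return math.tanh(z)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(y, t):
    """Squared-error cost E = 1/2 (y - t)^2 for one output unit; dE/dy = y - t."""
    return 0.5 * (y - t) ** 2

def sgd_train(samples, lr=0.5, epochs=2000, seed=0):
    """Fit y = logistic(w * x + b) by stochastic gradient descent on E.

    lr is the learning rate (step length); each update uses a single sample.
    """
    rng = random.Random(seed)
    data = list(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                  # visit samples in random order
        for x, t in data:
            y = logistic(w * x + b)
            # Chain rule: dE/dz = (y - t) * y * (1 - y), since logistic'(z) = y(1 - y).
            dE_dz = (y - t) * y * (1.0 - y)
            w -= lr * dE_dz * x            # step against the gradient (downhill)
            b -= lr * dE_dz
    return w, b

# Usage: a toy 1-D problem; negative x -> target 0, positive x -> target 1.
toy = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = sgd_train(toy)
```

Each update looks at one sample (hence "stochastic") and moves the parameters a step proportional to `lr` against the gradient of $E$. In a multilayer network, backpropagation supplies the same kind of gradient for every layer by applying the chain rule repeatedly.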

The steepness of the hill represents the slope of the error surface at that point. The instrument used to measure steepness is differentiation (the slope of the error surface can be calculated by taking the derivative of the squared error function at that point). The direction he chooses to travel in is opposite to the gradient of the error surface at that point, i.e., the direction of steepest descent. The amount of time he travels before taking another measurement is the learning rate of the algorithm.

BACKPROPAGATION
COST FUNCTION: $E = \frac{1}{2}(y_l - t_l)^2$
Source: Deep Learning, Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Nature 521, 436-444 (28 May 2015)

WHY ALL THE BUZZ?
ImageNet: ~15M labeled high-resolution images. Roughly 22K categories. Collected from the web & labeled using Amazon Mechanical Turk.
[Figure: Error rates]

CONVOLUTIONAL NEURAL NETWORKS - CORE IDEA
Color image: 32 x 32 pixels on 3 color channels.

Pixel intensity: 0-255. Image representation: a 32 x 32 x 3 array of numbers, with each value ranging between 0 and 255. Idea: feed the numerical array to a ConvNet and obtain the probabilities for each class of objects as an N-dimensional vector, where N is the number of classes.

CONVOLUTIONAL NEURAL NETS
Multi-stage neural nets that model the V1, V2 and V3 areas of the visual cortex: (Convolutional layer + non-linearity (NL) layer + pooling layer)^n + fully connected layer. Highly correlated local values are easily detected. Ideal for volumetric data that come in multiple arrays, e.g., color images. Learn the 'essence' of images well.
APPLICATIONS IN COMPUTER VISION

INITIAL WORK - YANN LECUN
Primitive recognition without hand-coded features. Adaptive, yet constrained architecture. Handwritten digit recognition served as a simple and powerful model. Training sample: 9298 handwritten digits segmented from zip codes on mail passing through Buffalo, NY.

OVERVIEW
Source: Deep Learning, Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Nature 521, 436-444 (28 May 2015)

WHAT IS A CONVOLUTION?

The term has several meanings depending on the area of application. For images:
Convolution: the operation of applying filters/kernels across overlapping regions of the image.
Stride: the step size (in pixels) by which the filter moves between positions; it determines the extent of overlap during convolution.
Weight sharing: each filter uses the same set of weights and bias at every position in the image. This minimizes the number of parameters.
Filters: feature detectors (small matrices) that respond to edges, curves, colors etc. In classical image processing they are carefully designed by hand; a ConvNet learns them from data.
Receptive field: the area covered by a single filter. 3x3 and 5x5 are common sizes. AlexNet used 96 kernels on the input image.
Source: ~jduh/courses/Archive/geog481w07/

RELU & POOLING LAYERS
ReLU: applies the non-linear activation function max(0, x) to every value of the feature map. Other common functions include tanh and sigmoid. ReLU mitigates the vanishing gradient problem.
Pooling: reduces the spatial size and helps reduce overfitting. 2x2 max pooling is the most common pooling operation.
Dropout: random elimination of neurons during training to reduce overfitting.
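The convolution, ReLU, pooling and dropout operations described above, followed by a fully connected layer that turns features into class probabilities, can be sketched in plain Python. This is a simplified illustration under stated assumptions: a single-channel image (a color image would add a sum over its 3 channels), no padding, a hand-set vertical-edge filter standing in for learned ones, and a hypothetical 3-class fully connected layer with random weights. For an unpadded input of width $W$, filter size $F$ and stride $S$, the output width is $\lfloor (W - F)/S \rfloor + 1$.

```python
import math
import random

def conv2d(image, kernel, stride=1, bias=0.0):
    """Valid (unpadded) 2-D convolution of a single-channel image.
    The same kernel weights and bias are reused at every position."""
    k = len(kernel)                                  # receptive field is k x k
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            total = bias
            for di in range(k):
                for dj in range(k):
                    total += kernel[di][dj] * image[top + di][left + dj]
            row.append(total)
        out.append(row)
    return out

def relu(fmap):
    """Apply max(0, x) to every value of a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (an odd last row/column is dropped)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def dropout(values, p, rng):
    """Inverted dropout (training only): zero each value with probability p,
    scale survivors by 1 / (1 - p). Requires 0 <= p < 1."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]

def fully_connected(values, weights, biases):
    """One score per class: dot(weights[c], values) + biases[c]."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)                                  # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Usage: a 6x6 image, dark on the left and bright on the right.
image = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
# Hand-set vertical-edge filter, for illustration only.
edge_filter = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

fmap = relu(conv2d(image, edge_filter))              # 4x4 feature map
pooled = max_pool_2x2(fmap)                          # 2x2
features = [v for row in pooled for v in row]        # flatten to 4 values

rng = random.Random(0)
# Hypothetical fully connected layer: 3 classes, small random weights.
weights = [[rng.uniform(-0.1, 0.1) for _ in features] for _ in range(3)]
biases = [0.0, 0.0, 0.0]
probs = softmax(fully_connected(features, weights, biases))
```

Like most deep-learning libraries, `conv2d` here computes cross-correlation (the kernel is not flipped); because a ConvNet learns its kernels, this does not change what the network can represent. Dropout would be applied only during training, for example to `features` before the fully connected layer.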

[Figures: Pooling vs. Larger ...; Linearities; The Big Picture]

RECURRENT NEURAL NETWORKS - RNN
RNNs: neural nets with feedback loops. An RNN can be viewed as multiple copies of the same network, each passing a message to a successor. Used to train on sequential inputs, e.g., speech, DNA sequences etc. Operate over sequences of vectors. Predict the next character or word.

LONG SHORT-TERM MEMORY (LSTM)
Sequences have long-term dependencies: "the clouds are in the sky" vs. "I grew up in France... I speak fluent French."
Problem: it is hard to store information for very long. Solution: LSTM units, which use a memory cell and gates to keep information over many time steps.

... BY ANDREJ KARPATHY
Training data: 474MB of C code from GitHub. Multiple 3-layer LSTMs; a few days of training on GPUs. Parameters: ~10 million.

... AND BEYOND
RNNs augmented with memory in question answering systems. ConvNets + RNNs = novel applications.

FUTURE - DEEP LEARNING
Extension of recent successes from supervised learning to unsupervised learning.

End-to-end integration: reinforcement learning + ConvNets + RNNs. Natural language understanding. Complex systems that combine learning, memory and reasoning.

SOURCES: Andrej Karpathy's course; Wikipedia.
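The recurrent-network and LSTM slides earlier note that a plain RNN finds it hard to store information for very long. The sketch below illustrates this with scalar states: a vanilla RNN and a single-unit LSTM cell both see one input event followed by 50 empty time steps. All weights are hand-picked, hypothetical values chosen to make the gating visible, not trained parameters, and the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def run_rnn(xs, w_x=0.5, w_h=0.9, b=0.0):
    """Vanilla RNN with a scalar hidden state: the same weights are reused
    at every time step, and h is the message passed to the next step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell (scalar input and state)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g        # memory cell: keep old content, add new
    h = o * math.tanh(c)
    return h, c

def run_lstm(xs, p):
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, p)
    return h, c

# Hypothetical weights: the forget gate stays almost fully open; the input
# gate is nearly closed when x = 0 and nearly open when x = 1.
params = {
    "wf": 0.0, "uf": 0.0, "bf": 10.0,
    "wi": 20.0, "ui": 0.0, "bi": -10.0,
    "wo": 0.0, "uo": 0.0, "bo": 10.0,
    "wg": 2.0, "ug": 0.0, "bg": 0.0,
}

# One "event" followed by 50 empty time steps.
sequence = [1.0] + [0.0] * 50
rnn_final = run_rnn(sequence)[-1]             # decays towards 0
lstm_h, lstm_c = run_lstm(sequence, params)   # cell still holds the event
```

In the RNN, the trace of the event shrinks at every step because it is repeatedly multiplied by `w_h` < 1 and squashed by tanh. In the LSTM, the forget gate stays close to 1, so the cell state `c` carries the event almost unchanged. Real LSTMs use vectors, weight matrices and learned gate parameters.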
