
Machine Learning: Multi Layer Perceptrons


Outline: multi layer perceptrons (MLP); learning MLPs; function minimization: gradient descent & related methods


Transcription of Machine Learning: Multi Layer Perceptrons

1 Machine Learning: Multi Layer Perceptrons. Prof. Dr. Martin Riedmiller, Albert-Ludwigs-University Freiburg, AG Maschinelles Lernen.

Outline: Multi Layer Perceptrons (MLP); learning MLPs; function minimization: gradient descent & related methods.

Networks: single neurons are not able to solve complex tasks (they are restricted to linear calculations). Creating networks by hand is too expensive; we want to learn from data. Nonlinear features also have to be generated by hand; tessellations become intractable for larger dimensions. We therefore want a generic model that can adapt to some training data. Basic idea: the Multi Layer Perceptron (Werbos 1974; Rumelhart, McClelland, Hinton 1986), also named feed-forward network.

Neurons in a Multi Layer Perceptron: standard perceptrons calculate a discontinuous function, $\vec{x} \mapsto f_{step}(w_0 + \langle \vec{w}, \vec{x} \rangle)$. Due to technical reasons, neurons in MLPs calculate a smoothed variant of this, $\vec{x} \mapsto f_{log}(w_0 + \langle \vec{w}, \vec{x} \rangle)$ with $f_{log}(z) = \frac{1}{1 + e^{-z}}$; $f_{log}$ is called the logistic function. (Plot: the logistic function, rising smoothly from 0 to 1 around $z = 0$.)
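A minimal sketch (in Python, not part of the original slides) comparing the perceptron's step activation with the smoothed logistic variant; the function names and the numeric weight values are my own:

```python
import numpy as np

def f_step(z):
    # discontinuous threshold used by the standard perceptron
    return np.where(z >= 0.0, 1.0, 0.0)

def f_log(z):
    # logistic function: smooth, differentiable variant of the step
    return 1.0 / (1.0 + np.exp(-z))

w0, w = -1.0, np.array([2.0, 0.5])   # bias and weight vector (arbitrary example values)
x = np.array([0.3, 0.4])             # input pattern
z = w0 + np.dot(w, x)                # w0 + <w, x>
print(f_step(z), f_log(z))           # 0.0 vs. ~0.45, illustrating the smoothing
```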

2 Properties of the logistic function: monotonically increasing; $\lim_{z \to \infty} f_{log}(z) = 1$ and $\lim_{z \to -\infty} f_{log}(z) = 0$; $f_{log}(z) = 1 - f_{log}(-z)$; continuous and differentiable.

Multi Layer Perceptrons: a multi layer perceptron (MLP) is a finite acyclic graph. The nodes are neurons with logistic activation. The neurons of the $i$-th layer serve as input features for the neurons of the $(i+1)$-th layer. Very complex functions can be calculated by combining many neurons.

Multi Layer Perceptrons, more formally: an MLP is a finite directed acyclic graph. Nodes that are not the target of any connection are called input neurons. An MLP that should be applied to input patterns of dimension $n$ must have $n$ input neurons, one for each dimension. Input neurons are typically enumerated as neuron 1, neuron 2, neuron 3, and so on.
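A quick numerical check (a sketch of my own, not from the slides) of the stated symmetry, limit and monotonicity properties of the logistic function:

```python
import numpy as np

def f_log(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-8.0, 8.0, 17)
# symmetry: f_log(z) = 1 - f_log(-z)
assert np.allclose(f_log(z), 1.0 - f_log(-z))
# limits: values approach 0 for very negative z and 1 for very positive z
print(f_log(-8.0), f_log(8.0))        # ~0.0003 and ~0.9997
# monotonically increasing: consecutive differences are positive
assert np.all(np.diff(f_log(z)) > 0)
```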

3 Nodes that are not the source of any connection are called output neurons. An MLP can have more than one output neuron; the number of output neurons depends on the way the target values (desired values) of the training patterns are described. All nodes that are neither input neurons nor output neurons are called hidden neurons. Since the graph is acyclic, all neurons can be organized in layers, with the set of input neurons being the first layer.

Connections that hop over several layers are called shortcuts. Most MLPs have a connection structure with connections from all neurons of one layer to all neurons of the next layer, without shortcuts. All neurons are enumerated; $Succ(i)$ is the set of all neurons $j$ for which a connection $i \to j$ exists, and $Pred(i)$ is the set of all neurons $j$ for which a connection $j \to i$ exists. All connections are weighted with a real number; the weight of the connection $i \to j$ is named $w_{ji}$. All hidden and output neurons have a bias weight.
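One possible way (my own sketch, not from the slides) to hold this structure in code is to store, for every hidden and output neuron $i$, its predecessor set $Pred(i)$, the incoming weights $w_{ij}$ and the bias $w_{i0}$:

```python
# A tiny 2-2-1 MLP without shortcuts, stored per neuron.
# Neurons are enumerated 1..5: 1, 2 input; 3, 4 hidden; 5 output.
# The numeric weight values below are arbitrary illustration values.
mlp = {
    3: {"pred": [1, 2], "w": {1: 0.5, 2: -0.4}, "bias": 0.1},
    4: {"pred": [1, 2], "w": {1: -0.3, 2: 0.8}, "bias": -0.2},
    5: {"pred": [3, 4], "w": {3: 1.2, 4: -0.7}, "bias": 0.05},
}

def pred(i):
    """Pred(i): all neurons j with a connection j -> i."""
    return mlp[i]["pred"]

def succ(i):
    """Succ(i): all neurons j with a connection i -> j."""
    return [j for j, node in mlp.items() if i in node["pred"]]

print(pred(5), succ(1))   # [3, 4] and [3, 4]
```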

4 The bias weight of neuron $i$ is named $w_{i0}$.

Variables for the calculation: hidden and output neurons have a variable $net_i$ ("network input"); all neurons have a variable $a_i$ ("activation" / "output").

Applying a pattern $\vec{x} = (x_1, \dots, x_n)^T$ to the MLP: for each input neuron the respective element of the input pattern is presented, i.e. $a_i \leftarrow x_i$. For all hidden and output neurons $i$: after the values $a_j$ have been calculated for all predecessors $j \in Pred(i)$, calculate $net_i$ and $a_i$ as
$$net_i \leftarrow w_{i0} + \sum_{j \in Pred(i)} w_{ij} a_j, \qquad a_i \leftarrow f_{log}(net_i)$$

5 The network output is given by the $a_i$ of the output neurons.

Illustration: apply a pattern $\vec{x} = (x_1, x_2)^T$ to a small network with two input neurons; calculate the activation of the input neurons, $a_i \leftarrow x_i$; then propagate the activations forward.

6 Illustration (cont.): propagate the activations forward, step by step, through the hidden layers; finally, read the network output from both output neurons.

Algorithm (forward pass):
Require: pattern $\vec{x}$, MLP, enumeration of all neurons in topological order
Ensure: calculate output of MLP
  for all input neurons $i$ do
    set $a_i \leftarrow x_i$
  end for
  for all hidden and output neurons $i$ in topological order do
    set $net_i \leftarrow w_{i0} + \sum_{j \in Pred(i)} w_{ij} a_j$
    set $a_i \leftarrow f_{log}(net_i)$
  end for
  for all output neurons $i$ do
    assemble $a_i$ in output vector $\vec{y}$
  end for
  return $\vec{y}$
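Continuing the toy per-neuron representation sketched earlier, a hedged Python rendering of this forward pass (my own code, assuming neurons 3, 4, 5 are already listed in topological order):

```python
import numpy as np

def f_log(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(mlp, x, input_neurons, hidden_and_output, output_neurons):
    """Forward pass: returns the output vector y for input pattern x."""
    a = {}
    # input neurons: present the respective element of the input pattern
    for i, xi in zip(input_neurons, x):
        a[i] = xi
    # hidden and output neurons, visited in topological order
    for i in hidden_and_output:
        net_i = mlp[i]["bias"] + sum(mlp[i]["w"][j] * a[j] for j in mlp[i]["pred"])
        a[i] = f_log(net_i)
    # assemble the output vector from the output neurons
    return np.array([a[i] for i in output_neurons])

# reuse the 2-2-1 toy network from above
mlp = {
    3: {"pred": [1, 2], "w": {1: 0.5, 2: -0.4}, "bias": 0.1},
    4: {"pred": [1, 2], "w": {1: -0.3, 2: 0.8}, "bias": -0.2},
    5: {"pred": [3, 4], "w": {3: 1.2, 4: -0.7}, "bias": 0.05},
}
print(forward_pass(mlp, [0.3, 0.8], [1, 2], [3, 4, 5], [5]))
```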

7 Variants: neurons with logistic activation can only output values in the range $(0, 1)$. To enable output in a wider range of real numbers, variants are used: neurons with tanh activation function, $a_i = \tanh(net_i) = \frac{e^{net_i} - e^{-net_i}}{e^{net_i} + e^{-net_i}}$, and neurons with linear activation, $a_i = net_i$. (Plot: comparison of $f_{log}(2x)$ and $\tanh(x)$.) The calculation of the network output is similar to the case of logistic activation, except that the relationship between $net_i$ and $a_i$ is different. The activation function is a local property of each neuron.

Typical network topologies: for regression, output neurons with linear activation; for classification, output neurons with logistic/tanh activation; all hidden neurons with logistic activation; layered layout: input layer, first hidden layer, second hidden layer, ...
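As a small illustration (my own sketch) of the activation function being a local, per-neuron property, the toy network above could carry an activation name per neuron and dispatch on it:

```python
import numpy as np

# the activation function is a local property of each neuron
activations = {
    "logistic": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh":     np.tanh,
    "linear":   lambda z: z,
}

# e.g. a regression-style topology: logistic hidden neurons, linear output neuron
neuron_activation = {3: "logistic", 4: "logistic", 5: "linear"}

def activate(i, net_i):
    return activations[neuron_activation[i]](net_i)

print(activate(3, 0.5), activate(5, 0.5))   # ~0.622 and 0.5
```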

8 ... and finally the output layer, with connections from each neuron in layer $i$ to each neuron in layer $i+1$ and no shortcut connections.

Lemma: any boolean function can be realized by an MLP with one hidden layer; any bounded continuous function can be approximated with arbitrary precision by an MLP with one hidden layer. The proof was given by Cybenko (1989); the idea is to partition the input space into small cells. (A concrete boolean example is sketched just below.)

Training: given training data $D = \{(\vec{x}^{(1)}, \vec{d}^{(1)}), \dots, (\vec{x}^{(p)}, \vec{d}^{(p)})\}$ where $\vec{d}^{(i)}$ is the desired output (a real number for regression, a class label 0 or 1 for classification), and given the topology of an MLP, the task is to adapt the weights of the MLP.
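Referring back to the lemma, here is a hand-constructed illustration (my own sketch, not from the slides): XOR, a boolean function a single perceptron cannot represent, realized by an MLP with one hidden layer of two logistic neurons. Large weights make the logistic units behave almost like step units; all weight values are my own choices:

```python
import numpy as np

def f_log(z):
    return 1.0 / (1.0 + np.exp(-z))

def xor_mlp(x1, x2):
    # hidden layer: one neuron approximates OR, the other approximates AND
    h_or  = f_log(10.0 * (x1 + x2 - 0.5))
    h_and = f_log(10.0 * (x1 + x2 - 1.5))
    # output neuron: roughly "OR and not AND"
    return f_log(10.0 * (h_or - h_and - 0.5))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xor_mlp(x1, x2)))   # 0, 1, 1, 0
```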

9 Idea: minimize an error term
$$E(\vec{w}; D) = \frac{1}{2} \sum_{i=1}^{p} \|y(\vec{x}^{(i)}; \vec{w}) - \vec{d}^{(i)}\|^2$$
with $y(\vec{x}; \vec{w})$ the network output for input pattern $\vec{x}$ and weight vector $\vec{w}$, and $\|\vec{u}\|^2$ the squared length of vector $\vec{u}$: $\|\vec{u}\|^2 = \sum_{j=1}^{\dim(\vec{u})} (u_j)^2$.

Learning means calculating weights for which the error becomes minimal: $\text{minimize}_{\vec{w}}\, E(\vec{w}; D)$. If we interpret $E$ just as a mathematical function depending on $\vec{w}$ and forget about its semantics, then we are faced with a problem of mathematical optimization.

Optimization theory discusses mathematical problems of the form $\text{minimize}_{\vec{u}}\, f(\vec{u})$, where $\vec{u}$ can be any vector of suitable size.
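A hedged sketch of this error term, reusing the forward_pass function and the toy 2-2-1 network defined earlier, with an invented two-pattern training set:

```python
import numpy as np

def error(mlp, data):
    """E(w; D) = 1/2 * sum_i || y(x_i; w) - d_i ||^2 over the training set."""
    total = 0.0
    for x, d in data:
        y = forward_pass(mlp, x, [1, 2], [3, 4, 5], [5])
        total += np.sum((y - np.asarray(d)) ** 2)
    return 0.5 * total

# invented training patterns: (input pattern, desired output)
D = [([0.3, 0.8], [1.0]),
     ([0.9, 0.1], [0.0])]
print(error(mlp, D))
```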

10 But which weight vector solves this task and how can we calculate it?

Some simplifications: here we consider only functions $f$ which are continuous and differentiable. (Figure: a non-continuous function (disrupted), a continuous but non-differentiable function (folded), and a differentiable function (smooth).)

A global minimum $\vec{u}^*$ is a point such that $f(\vec{u}^*) \le f(\vec{u})$ for all $\vec{u}$. A local minimum $\vec{u}^+$ is a point such that there exists $r > 0$ with $f(\vec{u}^+) \le f(\vec{u})$ for all points $\vec{u}$ with $\|\vec{u} - \vec{u}^+\| < r$. (Figure: a curve with several local minima, one of which is the global minimum.)

Analytical way to find a minimum: for a local minimum $\vec{u}^+$, the gradient of $f$ becomes zero, $\frac{\partial f}{\partial u_i}(\vec{u}^+) = 0$ for all $i$. Hence, calculating all partial derivatives and looking for zeros is a good idea (cf. linear regression).
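A small numerical illustration (my own, not from the slides) of local versus global minima and of the gradient vanishing at both, using a simple one-dimensional quartic:

```python
import numpy as np

def f(u):
    # a simple function with one local and one global minimum
    return u**4 - 3.0 * u**2 + u

def df(u, h=1e-5):
    # numerical derivative via central differences
    return (f(u + h) - f(u - h)) / (2.0 * h)

u = np.linspace(-2.0, 2.0, 400001)
fu = f(u)
# the global minimum has the smallest value over the whole range
u_global = u[np.argmin(fu)]
# restricting the search to u > 0 isolates the second (local) minimum
u_local = u[u > 0][np.argmin(fu[u > 0])]
print(u_global, u_local)            # roughly -1.30 and 1.13
print(df(u_global), df(u_local))    # both close to 0: the gradient vanishes at minima
```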

