
4 Perceptron Learning


R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996

4.1 Learning algorithms for neural networks

In the two preceding chapters we discussed two closely related models, McCulloch-Pitts units and perceptrons, but the question of how to find the parameters adequate for a given task was left open. If two sets of points have to be separated linearly with a perceptron, adequate weights for the computing unit must be found. The operators that we used in the preceding chapter, for example for edge detection, used hand-customized weights. Now we would like to find those parameters automatically. The perceptron learning algorithm deals with this problem.

A learning algorithm is an adaptive method by which a network of computing units self-organizes to implement the desired behavior. This is done in some learning algorithms by presenting some examples of the desired input-output mapping to the network.

A correction step is executed iteratively until the network learns to produce the desired response. The learning algorithm is a closed loop of presentation of examples and of corrections to the network parameters, as shown in the figure.

[Figure: Learning process in a parametric system -- a loop in which input-output examples are presented, the error is computed, and the network parameters are fixed.]

In some simple cases the weights for the computing units can be found through a sequential test of stochastically generated numerical combinations. However, such algorithms which look blindly for a solution do not qualify as "learning". A learning algorithm must adapt the network parameters according to previous experience until a solution is found, if it exists.

4.1.1 Classes of learning algorithms

Learning algorithms can be divided into supervised and unsupervised methods.
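The closed loop just described can be sketched in a few lines of Python. The function and parameter names below are illustrative assumptions, and the correction step is left abstract, since the concrete rule depends on the learning algorithm in question.

```python
def learn(examples, compute_error, fix_parameters, params, max_iterations=1000):
    """Generic learning loop: present examples, compute the error,
    correct the network parameters, and repeat until the error vanishes."""
    for _ in range(max_iterations):
        error = compute_error(params, examples)    # deviation from the desired mapping
        if error == 0:                             # desired behavior reached
            return params
        params = fix_parameters(params, examples)  # correction step
    return params                                  # give up after max_iterations
```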

Supervised learning denotes a method in which some input vectors are collected and presented to the network. The output computed by the network is observed and the deviation from the expected answer is measured. The weights are corrected according to the magnitude of the error in the way defined by the learning algorithm. This kind of learning is also called learning with a teacher, since a control process knows the correct answer for the set of selected input vectors.

Unsupervised learning is used when, for a given input, the exact numerical output a network should produce is unknown. Assume, for example, that some points in two-dimensional space are to be classified into three clusters. For this task we can use a classifier network with three output lines, one for each class (see the figure below). Each of the three computing units at the output must specialize by firing only for inputs corresponding to elements of each cluster.

If one unit fires, the others must keep silent. In this case we do not know a priori which unit is going to specialize on which cluster. Generally we do not even know how many well-defined clusters are present. Since no teacher is available, the network must organize itself in order to be able to associate clusters with units.

[Figure: Three clusters and a classifier network -- two-dimensional points from clusters 1, 2, and 3 are fed through inputs x, y to a network with three output lines.]

Supervised learning is further divided into methods which use reinforcement or error correction. Reinforcement learning is used when after each presentation of an input-output example we only know whether the network produces the desired result or not. The weights are updated based on this information (that is, the Boolean values true or false) so that only the input vector can be used for weight correction.

In learning with error correction, the magnitude of the error, together with the input vector, determines the magnitude of the corrections to the weights, and in many cases we try to eliminate the error in a single correction step.

[Figure: Classes of learning algorithms -- supervised learning, divided into corrective learning and reinforcement learning, and unsupervised learning.]

The perceptron learning algorithm is an example of supervised learning with reinforcement. Some of its variants use supervised learning with error correction (corrective learning).
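To make the distinction concrete, the following Python sketch contrasts the two kinds of feedback for a single threshold unit. The update rules shown here are illustrative simplifications chosen for this example, not the exact rules derived in the book.

```python
import numpy as np

def fires(w, x, theta=0.0):
    """Threshold unit: output 1 when w . x >= theta, otherwise 0."""
    return 1.0 if np.dot(w, x) >= theta else 0.0

def reinforcement_update(w, x, target, lr=1.0, theta=0.0):
    """Only a Boolean success/failure signal is used.
    On failure the input vector itself drives the correction."""
    if fires(w, x, theta) == target:
        return w                              # correct: the error magnitude is unknown
    direction = 1.0 if target == 1.0 else -1.0
    return w + lr * direction * x             # wrong: move w toward or away from x

def corrective_update(w, x, target, lr=1.0):
    """The numerical size of the error scales the correction."""
    error = target - np.dot(w, x)             # signed deviation of the raw activation
    return w + lr * error * x
```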

4.1.2 Vector notation

In the following sections we deal with learning methods for perceptrons. To simplify the notation we adopt the following conventions. The input $(x_1, x_2, \ldots, x_n)$ to the perceptron is called the input vector. If the weights of the perceptron are the real numbers $w_1, w_2, \ldots, w_n$ and the threshold is $\theta$, we call $\mathbf{w} = (w_1, w_2, \ldots, w_n, w_{n+1})$ with $w_{n+1} = -\theta$ the extended weight vector of the perceptron and $(x_1, x_2, \ldots, x_n, 1)$ the extended input vector. The threshold computation of a perceptron will be expressed using scalar products. The arithmetic test computed by the perceptron is thus $\mathbf{w} \cdot \mathbf{x} \geq \theta$, if $\mathbf{w}$ and $\mathbf{x}$ are the weight and input vectors, and $\mathbf{w} \cdot \mathbf{x} \geq 0$ if $\mathbf{w}$ and $\mathbf{x}$ are the extended weight and input vectors. It will always be clear from the context whether normal or extended vectors are being used.

If, for example, we are looking for the weights and threshold needed to implement the AND function with a perceptron, the input vectors and their associated outputs are

$(0,0) \mapsto 0$, $(0,1) \mapsto 0$, $(1,0) \mapsto 0$, $(1,1) \mapsto 1$.

If a perceptron with threshold zero is used, the input vectors must be extended and the desired mappings are

$(0,0,1) \mapsto 0$, $(0,1,1) \mapsto 0$, $(1,0,1) \mapsto 0$, $(1,1,1) \mapsto 1$.

A perceptron with three still unknown weights $(w_1, w_2, w_3)$ can carry out this task.
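As a concrete illustration of the extended notation, the short Python sketch below checks whether a candidate extended weight vector implements the AND mappings listed above using the test w . x >= 0. The particular weight values are an assumption chosen for the example (any weights satisfying the four inequalities would do), not values given in the text.

```python
import numpy as np

# Extended input vectors for AND: (x1, x2, 1), the constant 1 absorbing the threshold.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1], dtype=float)

def output(w, x):
    """Perceptron test with extended vectors: fire (1) when w . x >= 0, else 0."""
    return float(np.dot(w, x) >= 0.0)

# Example extended weight vector (w1, w2, w3) with w3 = -theta; here theta = 1.5.
w = np.array([1.0, 1.0, -1.5])

for x, t in zip(X, targets):
    print(x[:2], "->", output(w, x), "(desired", t, ")")
```

Running the loop prints the desired outputs 0, 0, 0, 1 for the four extended input vectors.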

4.1.3 Absolute linear separability

The proof of convergence of the perceptron learning algorithm assumes that each perceptron performs the test $\mathbf{w} \cdot \mathbf{x} > 0$. So far we have been working with perceptrons which perform the test $\mathbf{w} \cdot \mathbf{x} \geq 0$. We must just show that both classes of computing units are equivalent when the training set is finite, as is always the case in learning problems. We need the following definition.

Definition. Two sets $A$ and $B$ of points in an $n$-dimensional space are called absolutely linearly separable if $n+1$ real numbers $w_1, \ldots, w_{n+1}$ exist such that every point $(x_1, x_2, \ldots, x_n) \in A$ satisfies $\sum_{i=1}^{n} w_i x_i > w_{n+1}$ and every point $(x_1, x_2, \ldots, x_n) \in B$ satisfies $\sum_{i=1}^{n} w_i x_i < w_{n+1}$.

If a perceptron with threshold zero can linearly separate two finite sets of input vectors, then only a small adjustment to its weights is needed to obtain an absolute linear separation. This is a direct corollary of the following proposition.

Proposition. Two finite sets of points, $A$ and $B$, in $n$-dimensional space which are linearly separable are also absolutely linearly separable.

Proof. Since the two sets are linearly separable, weights $w_1, \ldots, w_{n+1}$ exist for which, without loss of generality, the inequality

$\sum_{i=1}^{n} w_i a_i \geq w_{n+1}$

holds for all points $(a_1, \ldots, a_n) \in A$, and

$\sum_{i=1}^{n} w_i b_i < w_{n+1}$

holds for all points $(b_1, \ldots, b_n) \in B$. Let $\varepsilon = \max\{\sum_{i=1}^{n} w_i b_i - w_{n+1} \mid (b_1, \ldots, b_n) \in B\}$. It is clear that $\varepsilon < \varepsilon/2 < 0$. Let $w' = w_{n+1} + \varepsilon/2$. For all points in $A$ it holds that

$\sum_{i=1}^{n} w_i a_i - (w' - \tfrac{1}{2}\varepsilon) \geq 0$.

This means that

$\sum_{i=1}^{n} w_i a_i - w' \geq -\tfrac{1}{2}\varepsilon > 0$, that is, $\sum_{i=1}^{n} w_i a_i > w'$.

For all points in $B$ a similar argument holds, since

$\sum_{i=1}^{n} w_i b_i - w_{n+1} = \sum_{i=1}^{n} w_i b_i - (w' - \tfrac{1}{2}\varepsilon) \leq \varepsilon$,

and from this we deduce

$\sum_{i=1}^{n} w_i b_i - w' \leq \tfrac{1}{2}\varepsilon < 0$, that is, $\sum_{i=1}^{n} w_i b_i < w'$.

These two inequalities show that the sets $A$ and $B$ are absolutely linearly separable. If two sets are linearly separable in the absolute sense, then they are, of course, linearly separable in the conventional sense.
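The construction in the proof can be checked numerically. The sketch below uses two small hand-picked point sets, chosen only for illustration and not taken from the book: given weights that separate them with the non-strict and strict inequalities, it computes $\varepsilon$ and the shifted threshold $w' = w_{n+1} + \varepsilon/2$, and verifies that the separation becomes strict on both sides.

```python
import numpy as np

# Two finite, linearly separable sets in the plane (illustrative data).
# A contains one point lying exactly on the original boundary.
A = np.array([[2.0, 2.0], [1.5, 0.5], [2.5, 3.0]])   # points with w.a >= w_{n+1}
B = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])   # points with w.b <  w_{n+1}

w = np.array([1.0, 1.0])   # w_1, w_2
w_n1 = 2.0                 # threshold w_{n+1}

assert all(np.dot(w, a) >= w_n1 for a in A)
assert all(np.dot(w, b) < w_n1 for b in B)

# epsilon = max over B of (w.b - w_{n+1}); negative for a finite separable set.
eps = max(np.dot(w, b) - w_n1 for b in B)
w_prime = w_n1 + eps / 2

# With the shifted threshold both inequalities become strict.
print("epsilon =", eps, "w' =", w_prime)
assert all(np.dot(w, a) > w_prime for a in A)
assert all(np.dot(w, b) < w_prime for b in B)
```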

4.1.4 The error surface and the search method

A usual approach for starting the learning algorithm is to initialize the network weights randomly and to improve these initial parameters, looking at each step to see whether a better separation of the training set can be achieved. In this section we identify points $(x_1, x_2, \ldots, x_n)$ in $n$-dimensional space with the vector $\mathbf{x}$ with the same coordinates.

Definition. The open (closed) positive half-space associated with the $n$-dimensional weight vector $\mathbf{w}$ is the set of all points $\mathbf{x} \in \mathbb{R}^n$ for which $\mathbf{w} \cdot \mathbf{x} > 0$ ($\mathbf{w} \cdot \mathbf{x} \geq 0$). The open (closed) negative half-space associated with $\mathbf{w}$ is the set of all points $\mathbf{x} \in \mathbb{R}^n$ for which $\mathbf{w} \cdot \mathbf{x} < 0$ ($\mathbf{w} \cdot \mathbf{x} \leq 0$). We omit the adjectives "closed" or "open" whenever it is clear from the context which kind of linear separation is being used.

Let $P$ and $N$ stand for two finite sets of points in $\mathbb{R}^n$ which we want to separate linearly. A weight vector is sought so that the points in $P$ belong to its associated positive half-space and the points in $N$ to the negative half-space. The error of a perceptron with weight vector $\mathbf{w}$ is the number of incorrectly classified points. The learning algorithm must minimize this error function $E(\mathbf{w})$. One possible strategy is to use a local greedy algorithm which works by computing the error of the perceptron for a given weight vector, looking then for a direction in weight space in which to move, and updating the weight vector by selecting new weights in the selected search direction.
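The error function just defined is easy to state in code. The following sketch, with illustrative names, counts the points misclassified by a weight vector w, placing P in the closed positive half-space and N in the open negative half-space, in line with the convention above.

```python
import numpy as np

def error(w, P, N):
    """E(w): number of incorrectly classified points.
    Points in P should satisfy w . x >= 0, points in N should satisfy w . x < 0."""
    mistakes = sum(1 for x in P if np.dot(w, x) < 0)    # false negatives
    mistakes += sum(1 for x in N if np.dot(w, x) >= 0)  # false positives
    return mistakes
```

A local greedy search would then repeatedly evaluate this error for neighbouring weight vectors and move to one with a smaller value.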

We can visualize this strategy by looking at its effect in weight space. Let us take as an example a perceptron with constant threshold $\theta = 1$, as shown in the figure.

[Figure: Perceptron with constant threshold -- inputs x1 and x2 with weights w1 and w2 and threshold 1.]

We are looking for two weights, $w_1$ and $w_2$, which transform the perceptron into a binary AND gate. We can show graphically the error function for all combinations of the two variable weights. This has been done in the figure below for a range of values of the two weights. The solution region is the triangular area in the middle. The learning algorithm should reach this region starting from any other point in weight space. In this case, it is possible to descend from one surface to the next using purely local decisions at each point.

[Figure: Error surface for the AND problem as a function of the weights w1 and w2.]
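A small script can reproduce the shape of this error surface. The sweep interval for the weights is an assumption chosen for illustration; the weight pairs with zero error form the triangular solution region described above.

```python
import numpy as np

# AND inputs with desired outputs; the threshold is fixed at 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1], dtype=float)

def error(w1, w2, theta=1.0):
    """Number of AND examples misclassified by the perceptron (w1, w2, theta)."""
    outputs = (X @ np.array([w1, w2]) >= theta).astype(float)
    return int(np.sum(outputs != targets))

# Sample the error surface on a grid of weight values (interval chosen for illustration).
ws = np.linspace(-0.5, 1.5, 21)
surface = np.array([[error(w1, w2) for w1 in ws] for w2 in ws])

# Weight pairs with zero error form the triangular solution region
# (roughly: w1 < 1, w2 < 1, w1 + w2 >= 1).
solutions = [(round(w1, 2), round(w2, 2))
             for w2 in ws for w1 in ws if error(w1, w2) == 0]
print(surface)
print(solutions[:5])
```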

