
The Method of Least Squares

Steven J. Miller
Mathematics Department
Brown University
Providence, RI 02912

Abstract

The Method of Least Squares is a procedure to determine the best fit line to data; the proof uses simple calculus and linear algebra. The basic problem is to find the best fit straight line $y = ax + b$ given that, for $n \in \{1, \ldots, N\}$, the pairs $(x_n, y_n)$ are observed. The method easily generalizes to finding the best fit of the form $y = a_1 f_1(x) + \cdots + a_K f_K(x)$; it is not necessary for the functions $f_k$ to be linear in $x$: all that is needed is that $y$ be a linear combination of these functions.

Contents

1 Description of the Problem
2 Probability and Statistics Review
3 The Method of Least Squares

1 Description of the Problem

Often in the real world one expects to find linear relationships between variables. For example, the force of a spring linearly depends on the displacement of the spring: $y = kx$ (here $y$ is the force, $x$ is the displacement of the spring from rest, and $k$ is the spring constant).

To test the proposed relationship, researchers go to the lab and measure what the force is for various displacements. Thus they assemble data of the form $(x_n, y_n)$ for $n \in \{1, \ldots, N\}$; here $y_n$ is the observed force in Newtons when the spring is displaced $x_n$ meters.

[Figure 1: 100 simulated observations of displacement and force ($k = 5$).]

Unfortunately, it is extremely unlikely that we will observe a perfect linear relationship. There are two reasons for this. The first is experimental error; the second is that the underlying relationship may not be exactly linear, but rather only approximately linear. See Figure 1 for a simulated data set of displacements and forces for a spring with spring constant equal to 5.

The Method of Least Squares is a procedure, requiring just some calculus and linear algebra, to determine what the best fit line is to the data. Of course, we need to quantify what we mean by "best fit", which will require a brief review of some probability and statistics. A careful analysis of the proof will show that the Method is capable of great generalizations.

Instead of finding the best fit line, we could find the best fit given by any finite linear combination of specified functions. Thus the general problem is: given functions $f_1, \ldots, f_K$, find values of coefficients $a_1, \ldots, a_K$ such that the linear combination

\[ y = a_1 f_1(x) + \cdots + a_K f_K(x) \]

is the best approximation to the data.

2 Probability and Statistics Review

We give a quick introduction to the basic elements of probability and statistics which we need for the Method of Least Squares; for more details see [BD, CaBe, Du, Fe, Kel, LF, MoMc]. Given a sequence of data $x_1, \ldots, x_N$, we define the mean (or the expected value) to be $(x_1 + \cdots + x_N)/N$. We denote this by writing a line above $x$: thus

\[ \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n. \]

The mean is the average value of the data.

Consider the following two sequences of data: $\{10, 20, 30, 40, 50\}$ and $\{30, 30, 30, 30, 30\}$. Both sets have the same mean; however, the first data set has greater variation about the mean. This leads to the concept of variance, which is a useful tool to quantify how much a set of data fluctuates about its mean.

The variance of $\{x_1, \ldots, x_N\}$, denoted by $\sigma_x^2$, is

\[ \sigma_x^2 = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})^2; \]

the standard deviation $\sigma_x$ is the square root of the variance:

\[ \sigma_x = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})^2}. \]

Note that if the $x$'s have units of meters then the variance $\sigma_x^2$ has units of meters$^2$, and the standard deviation $\sigma_x$ and the mean $\bar{x}$ have units of meters. Thus it is the standard deviation that gives a good measure of the deviations of the $x$'s around their mean.
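As a quick illustration of these definitions, here is a minimal sketch (the notes themselves contain no code; NumPy and the $1/N$ convention above are assumed) that computes the mean, variance, and standard deviation of the two sequences considered earlier:

    import numpy as np

    for data in (np.array([10, 20, 30, 40, 50]),
                 np.array([30, 30, 30, 30, 30])):
        mean = data.mean()                  # x-bar = (1/N) sum x_n
        var = ((data - mean) ** 2).mean()   # sigma_x^2, with the 1/N convention used here
        std = np.sqrt(var)                  # sigma_x
        print(mean, var, std)

    # Both sequences have mean 30, but the variances are 200 and 0,
    # matching the observation that the first set fluctuates more.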

There are, of course, alternate measures one can use. For example, one could consider

\[ \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x}). \]

Unfortunately this is a signed quantity, and large positive deviations can cancel with large negative ones. In fact, the definition of the mean immediately implies the above is zero! This, then, would be a terrible measure of the variability in data, as it is zero regardless of the values of the data.

We can rectify this problem by using absolute values. This leads us to consider

\[ \frac{1}{N} \sum_{n=1}^{N} |x_n - \bar{x}|. \]

While this has the advantage of avoiding cancellation of errors (as well as having the same units as the $x$'s), the absolute value function is not a good function analytically: it is not differentiable. This is primarily why we consider the standard deviation (the square root of the variance); this will allow us to use the tools from calculus.

We can now quantify what we mean by "best fit". If we believe $y = ax + b$, then $y - (ax + b)$ should be zero. Thus given observations

\[ \{(x_1, y_1), \ldots, (x_N, y_N)\}, \]

we look at

\[ \{y_1 - (ax_1 + b), \ldots, y_N - (ax_N + b)\}. \]

The mean should be small (if it is a good fit), and the variance will measure how good a fit we have. Note that the variance for this data set is

\[ \sigma^2_{y - (ax+b)} = \frac{1}{N} \sum_{n=1}^{N} (y_n - (ax_n + b))^2. \]

Large errors are given a higher weight than smaller errors (due to the squaring). Thus our procedure favors many medium sized errors over a few large errors. If we used absolute values to measure the error, then all errors would be weighted equally; however, the absolute value function is not differentiable, and thus the tools of calculus become unavailable.

3 The Method of Least Squares

Given data $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, we may define the error associated to saying $y = ax + b$ by

\[ E(a, b) = \sum_{n=1}^{N} (y_n - (ax_n + b))^2. \]

This is just $N$ times the variance of the data set $\{y_1 - (ax_1 + b), \ldots, y_N - (ax_N + b)\}$. It makes no difference whether we study the variance or $N$ times the variance as our error, and note that the error is a function of two variables.

Our goal is to find values of $a$ and $b$ that minimize the error. In multivariable calculus we learn that this requires us to find the values of $(a, b)$ such that

\[ \frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0. \]

Note we do not have to worry about boundary points: as $|a|$ and $|b|$ become large, the fit will clearly get worse and worse. Thus we do not need to check on the boundary. Differentiating $E(a, b)$ yields

\[ \frac{\partial E}{\partial a} = \sum_{n=1}^{N} 2\,(y_n - (ax_n + b)) \cdot (-x_n), \qquad \frac{\partial E}{\partial b} = \sum_{n=1}^{N} 2\,(y_n - (ax_n + b)) \cdot (-1). \]

Setting $\partial E / \partial a = \partial E / \partial b = 0$ (and dividing by $-2$) yields

\[ \sum_{n=1}^{N} (y_n - (ax_n + b))\, x_n = 0, \qquad \sum_{n=1}^{N} (y_n - (ax_n + b)) = 0. \]

We may rewrite these equations as

\[ \Big(\sum_{n=1}^{N} x_n^2\Big) a + \Big(\sum_{n=1}^{N} x_n\Big) b = \sum_{n=1}^{N} x_n y_n, \qquad \Big(\sum_{n=1}^{N} x_n\Big) a + \Big(\sum_{n=1}^{N} 1\Big) b = \sum_{n=1}^{N} y_n. \]

We have obtained that the values of $a$ and $b$ which minimize the error satisfy the following matrix equation:

\[ \begin{pmatrix} \sum_{n=1}^{N} x_n^2 & \sum_{n=1}^{N} x_n \\ \sum_{n=1}^{N} x_n & \sum_{n=1}^{N} 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}. \]
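A minimal computational sketch of this matrix equation follows (the helper name least_squares_line is ours, not from the notes, and NumPy is assumed):

    import numpy as np

    def least_squares_line(x, y):
        """Return (a, b) minimizing E(a, b) = sum_n (y_n - (a x_n + b))^2."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        M = np.array([[np.sum(x * x), np.sum(x)],
                      [np.sum(x),     len(x)]])    # the matrix from the equation above
        rhs = np.array([np.sum(x * y), np.sum(y)])
        # solve raises LinAlgError only when det M = 0, which (as shown
        # below) happens exactly when all the x_n are equal.
        return np.linalg.solve(M, rhs)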

We will show the matrix is invertible, which implies

\[ \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{n=1}^{N} x_n^2 & \sum_{n=1}^{N} x_n \\ \sum_{n=1}^{N} x_n & \sum_{n=1}^{N} 1 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}. \]

Denote the matrix by $M$. The determinant of $M$ is

\[ \det M = \sum_{n=1}^{N} x_n^2 \cdot \sum_{n=1}^{N} 1 \; - \; \sum_{n=1}^{N} x_n \cdot \sum_{n=1}^{N} x_n. \]

As

\[ \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n, \]

we find that

\[ \det M = N \sum_{n=1}^{N} x_n^2 - (N\bar{x})^2 = N^2 \Big( \frac{1}{N} \sum_{n=1}^{N} x_n^2 - \bar{x}^2 \Big) = N^2 \cdot \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})^2, \]

where the last equality follows from simple algebra. Thus, as long as the $x_n$ are not all equal, $\det M$ will be non-zero and $M$ will be invertible. Thus we find that, so long as the $x$'s are not all equal, the best fit values of $a$ and $b$ are obtained by solving a linear system of equations; the solution is given by the matrix equation above.

Remark: The data plotted in Figure 1 was obtained by letting $x_n = 5 + .2n$ and then letting $y_n = 5x_n$ plus an error randomly drawn from a normal distribution with mean zero and standard deviation $4$ ($n \in \{1, \ldots, 100\}$). Using these values, we find a best fit line with intercept $b = .48$. As the expected relation is $y = 5x$, we expected a best fit value of $5$ for $a$ and $0$ for $b$. While our value for $a$ is very close to the true value, our value of $b$ is significantly off. We deliberately chose data of this nature to indicate the dangers in using the Method of Least Squares.
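The simulation in the remark can be reproduced along the following lines (a sketch: the random draws, and hence the fitted coefficients, will differ from those quoted above, and the seed is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(0)                # arbitrary seed
    n = np.arange(1, 101)
    x = 5 + 0.2 * n                               # x_n = 5 + .2n
    y = 5 * x + rng.normal(0, 4, size=n.size)     # y_n = 5 x_n plus N(0, 4) noise

    # Solve the 2x2 matrix equation from the previous section.
    M = np.array([[np.sum(x * x), np.sum(x)],
                  [np.sum(x),     len(x)]])
    a, b = np.linalg.solve(M, [np.sum(x * y), np.sum(y)])
    print(a, b)   # a lands near the true slope 5; b can stray noticeably from 0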

Just because we found the best value for the slope and the best value for the $y$-intercept does not mean that these are good estimates of the true values. The theory needs to be supplemented with techniques which provide error estimates. Thus we want to know something like: given this data, there is a $99\%$ chance that the true value of $a$ lies in one explicitly computable interval and the true value of $b$ in another; this is far more useful than just knowing the best fit values.

If instead we used

\[ E_{\mathrm{abs}}(a, b) = \sum_{n=1}^{N} |y_n - (ax_n + b)|, \]

then numerical techniques yield that the best fit value of $a$ is essentially unchanged, while the best fit value of $b$ is less than $10^{-10}$ in absolute value. The difference between these values and those from the Method of Least Squares is in the best fit value of $b$ (the less important of the two parameters), and is due to the different ways of weighting the errors; a numerical sketch of this minimization follows the exercise below.

Exercise: Generalize the Method of Least Squares to find the best fit quadratic to $y = ax^2 + bx + c$ (or more generally the best fit degree $m$ polynomial to $y = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_0$).
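As promised above, here is one way to minimize $E_{\mathrm{abs}}$ numerically. The notes do not say which numerical technique was used; this sketch uses SciPy's Nelder-Mead simplex method, a derivative-free search suited to the non-differentiable objective:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = 5 + 0.2 * np.arange(1, 101)            # data as in the simulation sketch above
    y = 5 * x + rng.normal(0, 4, size=x.size)

    def e_abs(params):
        a, b = params
        return np.sum(np.abs(y - (a * x + b)))  # E_abs(a, b)

    res = minimize(e_abs, x0=[5.0, 0.0], method="Nelder-Mead")  # arbitrary start
    print(res.x)   # best fit (a, b) under the absolute-error criterion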

While for any real world problem direct computation determines whether or not the resulting matrix is invertible, it is nice to be able to prove the determinant is always non-zero for the best fit line (if the $x$'s are not all equal).

Exercise: If the $x$'s are not all equal, must the determinant be non-zero for the best fit quadratic or the best fit cubic?

Looking at our proof of the Method of Least Squares, we note that it was not essential that we have $y = ax + b$; we could have had $y = a f(x) + b g(x)$, and the arguments would have proceeded similarly. The difference would be that we would now obtain

\[ \begin{pmatrix} \sum_{n=1}^{N} f(x_n)^2 & \sum_{n=1}^{N} f(x_n) g(x_n) \\ \sum_{n=1}^{N} f(x_n) g(x_n) & \sum_{n=1}^{N} g(x_n)^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{n=1}^{N} f(x_n) y_n \\ \sum_{n=1}^{N} g(x_n) y_n \end{pmatrix} \]

(a computational sketch of this generalization appears after the exercise below).

Exercise: Consider the generalization of the Method of Least Squares given above. Under what conditions is the matrix invertible?
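A sketch of the two-function generalization displayed above (the helper name least_squares_two and the example functions are ours, not from the notes; NumPy assumed):

    import numpy as np

    def least_squares_two(f, g, x, y):
        """Best fit y = a f(x) + b g(x), via the 2x2 matrix equation above."""
        F, G = f(x), g(x)
        M = np.array([[np.sum(F * F), np.sum(F * G)],
                      [np.sum(F * G), np.sum(G * G)]])
        rhs = np.array([np.sum(F * y), np.sum(G * y)])
        return np.linalg.solve(M, rhs)

    # Example: recover a = 2, b = 0.5 from noiseless y = 2 x^2 + 0.5 sin(x).
    x = np.linspace(0.0, 3.0, 50)
    y = 2 * x**2 + 0.5 * np.sin(x)
    a, b = least_squares_two(lambda t: t**2, np.sin, x, y)
    print(a, b)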

Exercise: The method of proof generalizes further to the case when one expects $y$ to be a linear combination of $K$ fixed functions. The functions need not be linear; all that is required is that we have a linear combination, say $a_1 f_1(x) + \cdots + a_K f_K(x)$. One then determines the $a_1, \ldots, a_K$ that minimize the variance (the sum of squares of the errors) by calculus and linear algebra. Find the matrix equation that the best fit coefficients $(a_1, \ldots, a_K)$ must satisfy.

Exercise: Consider the best fit line from the Method of Least Squares, so the best fit values are given by the matrix equation above. Is the point $(\bar{x}, \bar{y})$, where $\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$ and $\bar{y} = \frac{1}{N} \sum_{n=1}^{N} y_n$, on the best fit line? In other words, does the best fit line go through the average point?

References

[BD] P. Bickel and K. Doksum, Mathematical Statistics: Basic Ideas and Selected Topics, Holden-Day, San Francisco, 1977.

[CaBe] G. Casella and R. Berger, Statistical Inference, 2nd edition, Duxbury Advanced Series, Pacific Grove, CA, 2002.

[Du] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, 1996.

[Fe] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd edition, Vol. II, John Wiley & Sons, New York, 1971.

[Kel] D. Kelley, Introduction to Probability, Macmillan Publishing Company, London, 1994.

