
Chapter 7 Least Squares Estimation


Introduction

Least squares is a time-honored estimation procedure, developed independently by Gauss (1795), Legendre (1805), and Adrain (1808) and published in the first decade of the nineteenth century. It is perhaps the most widely used technique in geophysical data analysis. Unlike maximum likelihood, which can be applied to any problem for which we know the general form of the joint pdf, in least squares the parameters to be estimated must arise in expressions for the means of the observations. When the parameters appear linearly in these expressions the least squares estimation problem can be solved in closed form, and it is relatively straightforward to derive the statistical properties of the resulting parameter estimates.

A very simple example, which we will treat in some detail in order to illustrate the more general problem, is that of fitting a straight line to a collection of pairs of observations $(x_i, y_i)$, where $i = 1, 2, \ldots, n$.

We suppose that a reasonable model is of the form

    y = \beta_0 + \beta_1 x,        (1)

and we need a mechanism for determining $\beta_0$ and $\beta_1$. This is of course just a special case of many more general problems, including fitting a polynomial of order $p$, for which one would need to find $p + 1$ coefficients. The most commonly used method for finding a model is that of least squares estimation. It is supposed that $x$ is an independent (or predictor) variable which is known exactly, while $y$ is a dependent (or response) variable. The least squares (LS) estimates for $\beta_0$ and $\beta_1$ are those for which the predicted values of the curve minimize the sum of the squared deviations from the observations. That is, the problem is to find the values of $\beta_0, \beta_1$ that minimize the residual sum of squares

    S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2        (2)

Note that this involves the minimization of vertical deviations from the line (not the perpendicular distance) and is thus not symmetric in $y$ and $x$.

In other words, if $x$ is treated as the dependent variable instead of $y$ one might well expect a different result.

To find the minimizing values of $\beta_i$ in (2) we just solve the equations resulting from setting

    \frac{\partial S}{\partial \beta_0} = 0, \qquad \frac{\partial S}{\partial \beta_1} = 0,        (3)

namely

    \sum_i y_i = n \beta_0 + \beta_1 \sum_i x_i
    \sum_i x_i y_i = \beta_0 \sum_i x_i + \beta_1 \sum_i x_i^2        (4)

Solving for the $\beta_i$ yields the least squares parameter estimates:

    \hat{\beta}_0 = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - (\sum x_i)^2}, \qquad
    \hat{\beta}_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}        (5)

where the sums are implicitly taken from $i = 1$ to $n$ in each case. Having generated these estimates, it is natural to wonder how much faith we should have in $\hat{\beta}_0$ and $\hat{\beta}_1$, and whether the fit to the data is reasonable. Perhaps a different functional form would provide a more appropriate fit to the observations, for example one involving a series of independent variables, so that

    y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3        (6)

or decay curves

    f(t) = A e^{-\alpha t} + B e^{-\beta t},        (7)

or periodic functions

    f(t) = A \cos \omega_1 t + B \sin \omega_1 t + C \cos \omega_2 t + D \sin \omega_2 t.        (8)
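To make equation (5) concrete, here is a minimal Python sketch of the closed-form fit; the helper name fit_line and the synthetic data are illustrative, not part of the original notes:

    import numpy as np

    def fit_line(x, y):
        # Closed-form least squares estimates for y = b0 + b1*x, following eq. (5)
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        n = x.size
        Sx, Sy = x.sum(), y.sum()
        Sxx, Sxy = (x * x).sum(), (x * y).sum()
        denom = n * Sxx - Sx**2            # n*sum(x_i^2) - (sum x_i)^2
        b0 = (Sxx * Sy - Sx * Sxy) / denom
        b1 = (n * Sxy - Sx * Sy) / denom
        return b0, b1

    # Illustrative synthetic data with true intercept 2.0 and slope 0.5
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 25)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)
    print(fit_line(x, y))                  # estimates near (2.0, 0.5)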

In equations (7) and (8) the functions $f(t)$ are linear in $A$, $B$, $C$, and $D$, but nonlinear in the other parameters $\alpha$, $\beta$, $\omega_1$, and $\omega_2$. When the function to be fit is linear in the parameters, the partial derivatives of $S$ with respect to them yield equations that can be solved in closed form. Typically, nonlinear least squares problems do not have a closed-form solution and one must resort to an iterative procedure. However, it is sometimes possible to transform the nonlinear function to be fitted into a linear form. For example, the Arrhenius equation models the rate of a chemical reaction as a function of temperature via a two-parameter model with an unknown constant frequency factor $C$ and activation energy $E_A$, so that

    \kappa(T) = C e^{-E_A / kT}        (9)

Boltzmann's constant $k$ is known a priori. If one measures $\kappa$ at various values of $T$, then $C$ and $E_A$ can be found by a linear least squares fit to the transformed variables, $\log \kappa$ and $1/T$:

    \log \kappa(T) = \log C - \frac{E_A}{kT}        (10)
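As a sketch of this linearization, the following assumes hypothetical rate measurements; the temperatures and constants below are chosen only to illustrate the transform:

    import numpy as np

    k = 8.617e-5                                        # Boltzmann's constant, eV/K
    T = np.array([300.0, 350.0, 400.0, 450.0, 500.0])   # hypothetical temperatures, K
    kappa = 1e13 * np.exp(-0.8 / (k * T))               # synthetic rates: C = 1e13, E_A = 0.8 eV

    # Eq. (10): log kappa is linear in 1/T, with slope -E_A/k and intercept log C
    slope, intercept = np.polyfit(1.0 / T, np.log(kappa), 1)
    C_hat = np.exp(intercept)
    EA_hat = -slope * k
    print(C_hat, EA_hat)                                # recovers C and E_A (no noise added)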

Fitting a Straight Line

We return to the simplest of LS fitting problems, namely fitting a straight line to paired observations $(x_i, y_i)$, so that we can consider the statistical properties of LS estimates, assess the goodness of fit in the resulting model, and understand how regression is related to correlation. To make progress on these fronts we need to adopt some kind of statistical model for the noise associated with the measurements. In the standard statistical model (SSM) we suppose that $y$ is a linear function of $x$ plus some random noise,

    y_i = \beta_0 + \beta_1 x_i + e_i, \qquad i = 1, \ldots, n.        (11)

In (11) the values of $x_i$ are taken to be fixed, while the $e_i$ are independent random variables with $E(e_i) = 0$ and $\mathrm{Var}(e_i) = \sigma^2$; for the time being we make no further assumption about the exact distribution of the $e_i$.

Under the SSM it is straightforward to show that the LS estimate for a straight line is unbiased, that is, $E[\hat{\beta}_j] = \beta_j$. To do this for $\hat{\beta}_0$ we make use of the fact that $E[y_i] = \beta_0 + \beta_1 x_i$, and take the expected value of $\hat{\beta}_0$ in equation (5). This yields:

    E[\hat{\beta}_0] = \frac{\sum_i x_i^2 \sum_i E[y_i] - \sum_i x_i \sum_i x_i E[y_i]}{n \sum_i x_i^2 - (\sum_i x_i)^2}
                     = \frac{\sum_i x_i^2 (n \beta_0 + \beta_1 \sum_i x_i) - \sum_i x_i (\beta_0 \sum_i x_i + \beta_1 \sum_i x_i^2)}{n \sum_i x_i^2 - (\sum_i x_i)^2} = \beta_0        (12)

A similar proof establishes that $E[\hat{\beta}_1] = \beta_1$. Note that this proof only uses $E[e_i] = 0$ and the fact that the errors are additive: we did not need them to be normally distributed.
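A quick simulation can illustrate that unbiasedness needs only zero-mean additive errors. This sketch is entirely illustrative; it deliberately draws uniform rather than normal errors, then averages the LS estimates over many synthetic data sets:

    import numpy as np

    rng = np.random.default_rng(1)
    b0_true, b1_true, sigma = 2.0, 0.5, 1.0
    x = np.linspace(0.0, 10.0, 20)           # fixed x_i, as the SSM assumes

    fits = []
    for _ in range(5000):
        # zero-mean additive errors; uniform on [-sqrt(3)*sigma, sqrt(3)*sigma]
        # has variance sigma^2 but is not normal
        e = rng.uniform(-np.sqrt(3.0) * sigma, np.sqrt(3.0) * sigma, x.size)
        y = b0_true + b1_true * x + e
        fits.append(np.polyfit(x, y, 1))     # returns [b1_hat, b0_hat]

    b1_mean, b0_mean = np.mean(fits, axis=0)
    print(b0_mean, b1_mean)                  # close to 2.0 and 0.5: unbiased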

Under the SSM, $\mathrm{Var}[y_i] = \sigma^2$ and $\mathrm{Cov}[y_i, y_j] = 0$ for $i \neq j$. Making use of this it is possible (see Rice, p. 513) to calculate the variances of the $\hat{\beta}_i$ as

    \mathrm{Var}[\hat{\beta}_0] = \frac{\sigma^2 \sum_i x_i^2}{n \sum x_i^2 - (\sum x_i)^2}, \qquad
    \mathrm{Var}[\hat{\beta}_1] = \frac{n \sigma^2}{n \sum x_i^2 - (\sum x_i)^2}, \qquad
    \mathrm{Cov}[\hat{\beta}_0, \hat{\beta}_1] = \frac{-\sigma^2 \sum_i x_i}{n \sum x_i^2 - (\sum x_i)^2}        (13)

To show this we make use of the fact that equation (5) can be rewritten in the form

    \hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_i (x_i - \bar{x}) y_i}{\sum_i (x_i - \bar{x})^2}

Then

    \mathrm{Var}[\hat{\beta}_1] = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}

and similarly for the other expressions in (13). We see from (13) that the variances of the slope and intercept depend on the $x_i$ and on $\sigma^2$. The $x_i$ are known, so we just need a means of finding $\sigma^2$. In the SSM, $\sigma^2 = E[(y_i - \beta_0 - \beta_1 x_i)^2]$, so we can estimate $\sigma^2$ from the average squared deviation of the data about the fitted line:

    RSS = \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2        (14)

We will see later that

    s^2 = \frac{RSS}{n - 2}        (15)

is an unbiased estimate of $\sigma^2$. The number of degrees of freedom is $n - 2$ because two parameters have been estimated from the data. So our recipe for estimating $\mathrm{Var}[\hat{\beta}_0]$ and $\mathrm{Var}[\hat{\beta}_1]$ simply involves substituting $s^2$ for $\sigma^2$ in (13). We call these estimates $s_{\hat{\beta}_0}^2$ and $s_{\hat{\beta}_1}^2$.

If the $e_i$ are independent normally distributed random variables then $\hat{\beta}_0$ and $\hat{\beta}_1$ will be too, since they are just linear combinations of independent normal RVs. More generally, if the $e_i$ are independent and satisfy some not too demanding assumptions, then a version of the Central Limit Theorem will apply, and for large $n$, $\hat{\beta}_0$ and $\hat{\beta}_1$ are approximately normal RVs. An immediate and important consequence of this is that we can invoke either exact or approximate confidence intervals and hypothesis tests based on the $\hat{\beta}_i$ being normally distributed.
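Putting equations (13)-(15) together, the recipe above might be sketched in Python as follows; the helper name line_fit_errors is illustrative, and np.polyfit is used for the fit itself:

    import numpy as np

    def line_fit_errors(x, y):
        # Fit y = b0 + b1*x; return estimates and standard errors via eqs. (13)-(15)
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = x.size
        b1, b0 = np.polyfit(x, y, 1)
        rss = np.sum((y - b0 - b1 * x) ** 2)          # eq. (14)
        s2 = rss / (n - 2)                            # eq. (15), unbiased for sigma^2
        denom = n * np.sum(x * x) - np.sum(x) ** 2
        se_b0 = np.sqrt(s2 * np.sum(x * x) / denom)   # sqrt of eq. (13), s^2 for sigma^2
        se_b1 = np.sqrt(n * s2 / denom)
        return b0, b1, se_b0, se_b1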

It can be shown that

    \frac{\hat{\beta}_i - \beta_i}{s_{\hat{\beta}_i}} \sim t_{n-2}        (16)

and we can use the $t$-distribution to establish confidence intervals and for hypothesis testing. Perhaps the commonest application of hypothesis testing is in determining whether the $\beta_i$ are significantly different from zero. If not, there may be a case for excluding them from the model.
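For instance, a two-sided confidence interval based on (16) could be computed as below, assuming SciPy is available; the helper name is illustrative:

    from scipy.stats import t

    def beta_confidence_interval(beta_hat, se_beta, n, level=0.95):
        # Two-sided interval from eq. (16): (beta_hat - beta)/s_beta ~ t_{n-2}
        tcrit = t.ppf(0.5 + level / 2.0, df=n - 2)
        return beta_hat - tcrit * se_beta, beta_hat + tcrit * se_beta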

Assessing Fit

The most basic thing to do in assessing the fit is to examine the residuals from the model, in this case

    \hat{e}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i        (17)

They should be plotted as a function of $x$, which allows one to see systematic misfit or departures from the SSM. These may indicate the need for a more complex model or a transformation of variables. When the variance of the errors is a constant independent of $x$ the errors are said to be homoscedastic; when the opposite is true they are heteroscedastic. Rice provides some good examples of this in Chapter 14. When the variance varies with $x$ it is sometimes possible to find a transformation of the variables that corrects the problem, for example by fitting a variance-stabilizing transform such as $\sqrt{y}$ in place of $y$.

A common scenario one might wish to test is whether the intercept is zero. This can be done by calculating both slope and intercept and finding $s_{\hat{\beta}_0}$; one can then use the $t$-test on the hypothesis $H_0: \beta_0 = 0$ with

    t = \frac{\hat{\beta}_0}{s_{\hat{\beta}_0}}

Another strategy in assessing fit is to look at the sample distribution of the residuals, compared to a normal probability plot. Q-Q plots of the residuals provide a visual means of assessing things like gross departures from normality or identifying outliers.
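These diagnostics might be sketched as follows, again on illustrative synthetic data: a residual plot against $x$, a normal Q-Q plot of the residuals, and the $t$-test of $H_0: \beta_0 = 0$:

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 10.0, 40)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)   # synthetic, for illustration

    b1, b0 = np.polyfit(x, y, 1)
    e_hat = y - b0 - b1 * x                             # residuals, eq. (17)

    # Residual plot (look for trends or changing spread) and normal Q-Q plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.scatter(x, e_hat)
    ax1.axhline(0.0, color="k", linewidth=0.5)
    ax1.set_xlabel("x"); ax1.set_ylabel("residual")
    stats.probplot(e_hat, dist="norm", plot=ax2)        # gross non-normality, outliers
    plt.show()

    # t-test of H0: beta_0 = 0, using s_b0 from eq. (13) with s^2 for sigma^2
    n = x.size
    s2 = np.sum(e_hat**2) / (n - 2)
    s_b0 = np.sqrt(s2 * np.sum(x * x) / (n * np.sum(x * x) - np.sum(x) ** 2))
    t_stat = b0 / s_b0
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 2)
    print(t_stat, p_value)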

