
Robust Regression

John Fox & Sanford Weisberg

October 8, 2013

All estimation methods rely on assumptions for their validity. We say that an estimator or statistical procedure is robust if it provides useful information even if some of the assumptions used to justify the estimation method are not applicable. Most of this appendix concerns robust regression, estimation methods, typically for the linear regression model, that are insensitive to outliers and possibly high-leverage points. Other types of robustness, for example to model misspecification, are not discussed here. These methods were developed beginning in the mid-1960s. With the exception of the L1 methods described in Section 5, they are not widely used today. A recent mathematical treatment is given by ?.

1 Breakdown and Robustness

The finite-sample breakdown point of an estimator or procedure is the smallest fraction α of data points such that if the [nα] points go to infinity, the estimator or procedure also becomes infinite. The sample mean of x_1, ..., x_n is

\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\left[\sum_{i=1}^{n-1} x_i + x_n\right] = \frac{n-1}{n}\,\bar{x}_{n-1} + \frac{1}{n}\,x_n

and so if x_n is large enough then x̄_n can be made as large as desired, regardless of the other n − 1 values. Unlike the mean, the sample median, as an estimate of a population median, can tolerate up to 50% bad values. In general, breakdown cannot exceed 50%. (Why is that?)

2 M-Estimation

Linear least-squares estimates can behave badly when the error distribution is not normal, particularly when the errors are heavy-tailed. One remedy is to remove influential observations from the least-squares fit. Another approach, termed robust regression, is to use a fitting criterion that is not as vulnerable as least squares to unusual data. The most common general method of robust regression is M-estimation, introduced by ?.

This class of estimators can be regarded as a generalization of maximum-likelihood estimation, hence the term M-estimation. We consider only the linear model

y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i

for the ith of n observations. Given an estimator b for β, the fitted model is

\hat{y}_i = a + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik} = \mathbf{x}_i'\mathbf{b}

and the residuals are given by e_i = y_i − ŷ_i. With M-estimation, the estimates b are determined by minimizing a particular objective function over all b,

\sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - \mathbf{x}_i'\mathbf{b})

where the function ρ gives the contribution of each residual to the objective function. A reasonable ρ should have the following properties:

- Always nonnegative, ρ(e) ≥ 0
- Equal to zero when its argument is zero, ρ(0) = 0
- Symmetric, ρ(e) = ρ(−e)
- Monotone in |e_i|, ρ(e_i) ≥ ρ(e_i') for |e_i| > |e_i'|

For example, the least-squares ρ-function ρ(e_i) = e_i² satisfies these requirements, as do many others.
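These properties are easy to verify numerically. A small sketch (the function names are ours; the Huber ρ-function follows Table 1 below):

> rho.ls <- function(e) e^2
> rho.huber <- function(e, k = 1.345) ifelse(abs(e) <= k, e^2/2, k*abs(e) - k^2/2)
> e <- c(-3, -1, 0, 1, 3)
> rho.ls(e)     # nonnegative, zero at zero, symmetric, monotone in |e|
> rho.huber(e)  # same properties, but grows only linearly beyond k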

Let ψ = ρ′ be the derivative of ρ; ψ is called the influence curve. Differentiating the objective function with respect to the coefficients b and setting the partial derivatives to 0 produces a system of k + 1 estimating equations for the coefficients:

\sum_{i=1}^{n} \psi(y_i - \mathbf{x}_i'\mathbf{b})\,\mathbf{x}_i' = \mathbf{0}

Define the weight function w(e) = ψ(e)/e, and let w_i = w(e_i). Then the estimating equations may be written as

\sum_{i=1}^{n} w_i (y_i - \mathbf{x}_i'\mathbf{b})\,\mathbf{x}_i' = \mathbf{0}

Solving these estimating equations is equivalent to a weighted least-squares problem, minimizing \sum w_i e_i^2. The weights, however, depend upon the residuals, the residuals depend upon the estimated coefficients, and the estimated coefficients depend upon the weights. An iterative solution (called iteratively reweighted least-squares, IRLS) is therefore required:

1. Select initial estimates b^(0), such as the least-squares estimates.

2. At each iteration t, calculate residuals e_i^(t−1) and associated weights w_i^(t−1) = w[e_i^(t−1)] from the previous iteration.

3. Solve for new weighted-least-squares estimates

\mathbf{b}^{(t)} = \left[\mathbf{X}'\mathbf{W}^{(t-1)}\mathbf{X}\right]^{-1} \mathbf{X}'\mathbf{W}^{(t-1)}\mathbf{y}

where X is the model matrix, with x_i' as its ith row, and W^(t−1) = diag{w_i^(t−1)} is the current weight matrix.

Steps 2 and 3 are repeated until the estimated coefficients converge. The asymptotic covariance matrix of b is

V(\mathbf{b}) = \frac{E(\psi^2)}{[E(\psi')]^2}\,(\mathbf{X}'\mathbf{X})^{-1}

Using \sum [\psi(e_i)]^2 to estimate E(ψ²), and [\sum \psi'(e_i)/n]^2 to estimate [E(ψ′)]², produces the estimated asymptotic covariance matrix \hat{V}(\mathbf{b}) (which is not reliable in small samples).
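The IRLS algorithm is short enough to write out directly. The sketch below is ours, not the implementation behind the rlm() function used in Section 4; it fits a Huber M-estimator, re-estimating the scale at each iteration by the MAR/0.6745 rule described in the next section, and assumes the model matrix X already contains a column of ones:

> irls.huber <- function(X, y, k = 1.345, tol = 1e-8, maxit = 50) {
+     b <- solve(t(X) %*% X, t(X) %*% y)       # step 1: least-squares start
+     for (it in 1:maxit) {
+         e <- as.vector(y - X %*% b)          # step 2: residuals from the previous fit,
+         s <- median(abs(e))/0.6745           #   a robust scale estimate,
+         w <- pmin(1, k/abs(e/s))             #   and Huber weights w(e) = min(1, k/|e|)
+         b.new <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # step 3: weighted least squares
+         if (max(abs(b.new - b)) < tol) break # repeat steps 2 and 3 until convergence
+         b <- b.new
+     }
+     as.vector(b)
+ }

Applied to Duncan's data from Section 4, irls.huber(cbind(1, Duncan$income, Duncan$education), Duncan$prestige) should track the coefficients from rlm() closely, although rlm() handles the scale estimate and the convergence criterion more carefully.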

Objective Functions

Figure 1 compares the objective functions, and the corresponding ψ and weight functions, for three M-estimators: the familiar least-squares estimator; the Huber estimator; and the Tukey bisquare (or biweight) estimator. The objective and weight functions for the three estimators are also given in Table 1.

Both the least-squares and Huber objective functions increase without bound as the residual e departs from 0, but the least-squares objective function increases more rapidly. In contrast, the bisquare objective function eventually levels off (for |e| > k). Least-squares assigns equal weight to each observation; the weights for the Huber estimator decline when |e| > k; and the weights for the bisquare decline as soon as e departs from 0, and are 0 for |e| > k.

The value k for the Huber and bisquare estimators is called a tuning constant; smaller values of k produce more resistance to outliers, but at the expense of lower efficiency when the errors are normally distributed.

The tuning constant is generally picked to give reasonably high efficiency in the normal case; in particular, k = 1.345σ for the Huber and k = 4.685σ for the bisquare (where σ is the standard deviation of the errors) produce 95-percent efficiency when the errors are normal, and still offer protection against outliers.

In an application, we need an estimate of the standard deviation of the errors to use these results. Usually a robust measure of spread is used in preference to the standard deviation of the residuals. For example, a common approach is to take σ̂ = MAR/0.6745, where MAR is the median absolute residual.

                 Objective function                                   Weight function
Least-squares    ρ_LS(e) = e²                                        w_LS(e) = 1
Huber            ρ_H(e) = e²/2               for |e| ≤ k             w_H(e) = 1       for |e| ≤ k
                          k|e| − k²/2        for |e| > k                      k/|e|   for |e| > k
Bisquare         ρ_B(e) = (k²/6){1 − [1 − (e/k)²]³}  for |e| ≤ k     w_B(e) = [1 − (e/k)²]²  for |e| ≤ k
                          k²/6               for |e| > k                      0       for |e| > k

Table 1: Objective function and weight function for least-squares, Huber, and bisquare estimators.

Figure 1: Objective, ψ, and weight functions for the least-squares (top), Huber (middle), and bisquare (bottom) estimators.
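The weight functions in Table 1 translate directly into R and can be used to redraw the weight panels of Figure 1 (a sketch; the function names are ours). Note that R's built-in mad() multiplies the median absolute deviation by 1.4826 ≈ 1/0.6745, so for residuals centered near zero it closely matches the scale estimate just described:

> w.huber <- function(e, k = 1.345) pmin(1, k/abs(e))
> w.bisquare <- function(e, k = 4.685) ifelse(abs(e) <= k, (1 - (e/k)^2)^2, 0)
> e <- seq(-6, 6, length.out = 200)
> plot(e, w.bisquare(e), type = "l", ylab = "w(e)")   # bisquare: weights reach 0 beyond k
> lines(e, w.huber(e), lty = 2)                       # Huber: weights decline for |e| > k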

The tuning constants for these graphs are k = 1.345 for the Huber estimator and k = 4.685 for the bisquare. (One way to think about this scaling is that the standard deviation of the errors, σ, is taken as 1.)

3 Bounded-Influence Regression

Under certain circumstances, M-estimators can be vulnerable to high-leverage observations. A key concept in assessing influence is the breakdown point of an estimator: The breakdown point is the fraction of bad data that the estimator can tolerate without being affected to an arbitrarily large extent. For example, in the context of estimating the center of a distribution, the mean has a breakdown point of 0, because even one bad observation can change the mean by an arbitrary amount; in contrast, the median has a breakdown point of 50 percent.
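A two-line experiment in R (with made-up numbers) makes the contrast concrete:

> x <- c(2.1, 2.4, 2.5, 2.7, 3.0)   # five well-behaved observations
> c(mean(x), median(x))
> x[5] <- 1e6                       # corrupt a single observation
> c(mean(x), median(x))             # the mean explodes; the median is unchanged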

Very-high-breakdown estimators for regression have been proposed, and R functions for them are presented here. However, very-high-breakdown estimates should be avoided unless you have faith that the model you are fitting is correct, as very-high-breakdown estimates do not allow for diagnosis of model misspecification, ?.

One bounded-influence estimator is least-trimmed squares (LTS) regression. Order the squared residuals from smallest to largest:

(e^2)_{(1)}, (e^2)_{(2)}, \ldots, (e^2)_{(n)}

The LTS estimator chooses the regression coefficients b to minimize the sum of the smallest m of the squared residuals,

\mathrm{LTS}(\mathbf{b}) = \sum_{i=1}^{m} (e^2)_{(i)}

where, typically, m = ⌊n/2⌋ + ⌊(k + 2)/2⌋, a little more than half of the observations, and the floor brackets, ⌊ ⌋, denote rounding down to the next smallest integer. Although the LTS criterion is easily described, the mechanics of fitting the LTS estimator are complicated (?).
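Despite the complicated fitting mechanics, an LTS fit is a one-liner in R: the lqs() function in the MASS package implements it (as does ltsReg() in the robustbase package). A sketch, using the Duncan data set from the car package that is introduced in the next section:

> library(MASS)
> mod.lts <- lqs(prestige ~ income + education, data=Duncan, method="lts")
> coef(mod.lts)   # no coefficient standard errors are reported; see the note on bootstrapping below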

Moreover, bounded-influence estimators can produce unreasonable results in certain circumstances, ?, and there is no simple formula for coefficient standard errors. (Inference for the LTS estimator can be performed by bootstrapping, however; see the Appendix on bootstrapping for an example.)

4 An Illustration: Duncan's Occupational-Prestige Regression

Duncan's occupational-prestige regression was introduced in Chapter 1 of ?. The least-squares regression of prestige on income and education produces the following results:

> library(car)
> mod.ls <- lm(prestige ~ income + education, data=Duncan)
> summary(mod.ls)

Call:
lm(formula = prestige ~ income + education, data = Duncan)

Residuals:
   Min     1Q Median     3Q    Max
   ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...
...

Residual standard error: ... on 42 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: 101 on 2 and 42 DF, p-value: < 2e-16

Recall from the discussion of Duncan's data in ?

that two observations, ministers and railroad conductors, serve to decrease the income coefficient substantially and to increase the education coefficient, as we may verify by omitting these two observations from the regression:

> mod.ls.2 <- update(mod.ls, subset=-c(6,16))
> summary(mod.ls.2)

Call:
lm(formula = prestige ~ income + education, data = Duncan, subset = -c(6, 16))

Residuals:
   Min     1Q Median     3Q    Max
   ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...
...

Residual standard error: ... on 40 degrees of freedom
Multiple R-squared: ..., Adjusted R-squared: ...
F-statistic: 141 on 2 and 40 DF, p-value: < 2e-16

Alternatively, let us compute the Huber M-estimator for Duncan's regression model, using the rlm (robust linear model) function in the MASS library:

> library(MASS)
> mod.huber <- rlm(prestige ~ income + education, data=Duncan)
> summary(mod.huber)

Call: rlm(formula = prestige ~ income + education, data = Duncan)

Residuals:
   Min     1Q Median     3Q    Max
   ...

Coefficients:
          Value Std. Error t value
...
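The final IRLS weights are returned in the w component of the rlm object, so it is easy to see which occupations the Huber fit downweighted (a quick check to set against the case deletions above):

> plot(mod.huber$w, ylab="Huber weight")
> rownames(Duncan)[order(mod.huber$w)[1:4]]   # the most heavily downweighted occupations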

