
Generalized Linear Model Theory - Princeton University


Appendix B
Generalized Linear Model Theory

We describe the generalized linear model as formulated by Nelder and Wedderburn (1972), and discuss estimation of the parameters and tests of hypotheses.

B.1 The Model

Let $y_1, \ldots, y_n$ denote $n$ independent observations on a response. We treat $y_i$ as a realization of a random variable $Y_i$. In the general linear model we assume that $Y_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2$,

$$Y_i \sim N(\mu_i, \sigma^2),$$

and we further assume that the expected value $\mu_i$ is a linear function of $p$ predictors that take values $\mathbf{x}_i' = (x_{i1}, \ldots, x_{ip})$ for the $i$-th case, so that

$$\mu_i = \mathbf{x}_i'\boldsymbol{\beta},$$

where $\boldsymbol{\beta}$ is a vector of unknown parameters.

We will generalize this in two steps, dealing with the stochastic and systematic components of the model.

B.1.1 The Exponential Family

We will assume that the observations come from a distribution in the exponential family with probability density function

$$f(y_i) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}. \tag{B.1}$$

Here $\theta_i$ and $\phi$ are parameters and $a_i(\phi)$, $b(\theta_i)$ and $c(y_i, \phi)$ are known functions. In all models considered in these notes the function $a_i(\phi)$ has the form

$$a_i(\phi) = \phi/p_i,$$

where $p_i$ is a known prior weight, usually one. The parameters $\theta_i$ and $\phi$ are essentially location and scale parameters. It can be shown that if $Y_i$ has a distribution in the exponential family then it has mean and variance

$$E(Y_i) = \mu_i = b'(\theta_i), \tag{B.2}$$
$$\operatorname{var}(Y_i) = \sigma_i^2 = b''(\theta_i)\, a_i(\phi), \tag{B.3}$$

where $b'(\theta_i)$ and $b''(\theta_i)$ are the first and second derivatives of $b(\theta_i)$.
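The mean and variance formulas (B.2) and (B.3) are usually proved via the two standard likelihood score identities; a brief sketch of that argument (my addition, not spelled out in the original text) is:

```latex
% Sketch: mean and variance from the two score identities.
% Write the log-density as l_i = (y_i\theta_i - b(\theta_i))/a_i(\phi) + c(y_i,\phi),
% so the score with respect to \theta_i is
%   U_i = \partial l_i/\partial\theta_i = (Y_i - b'(\theta_i))/a_i(\phi).
% The identity E(U_i) = 0 then gives (B.2):
\[ E(Y_i) = b'(\theta_i). \]
% The identity E(\partial U_i/\partial\theta_i) + E(U_i^2) = 0 gives
%   -b''(\theta_i)/a_i(\phi) + \mathrm{var}(Y_i)/a_i(\phi)^2 = 0,
% and solving for the variance yields (B.3):
\[ \operatorname{var}(Y_i) = b''(\theta_i)\, a_i(\phi). \]
```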

When $a_i(\phi) = \phi/p_i$ the variance has the simpler form

$$\operatorname{var}(Y_i) = \sigma_i^2 = \phi\, b''(\theta_i)/p_i.$$

The exponential family just defined includes as special cases the normal, binomial, Poisson, exponential, gamma and inverse Gaussian distributions.

Example: The normal distribution has density

$$f(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2}\frac{(y_i - \mu_i)^2}{\sigma^2} \right\}.$$

Expanding the square in the exponent we get $(y_i - \mu_i)^2 = y_i^2 + \mu_i^2 - 2 y_i \mu_i$, so the coefficient of $y_i$ is $\mu_i/\sigma^2$. This result identifies $\theta_i$ as $\mu_i$ and $\phi$ as $\sigma^2$, with $a_i(\phi) = \phi$. Now write

$$f(y_i) = \exp\left\{ \frac{y_i \mu_i - \frac{1}{2}\mu_i^2}{\sigma^2} - \frac{y_i^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right\}.$$

This shows that $b(\theta_i) = \frac{1}{2}\theta_i^2$ (recall that $\theta_i = \mu_i$). Let us check the mean and variance:

$$E(Y_i) = b'(\theta_i) = \theta_i = \mu_i,$$
$$\operatorname{var}(Y_i) = b''(\theta_i)\, a_i(\phi) = \sigma^2.$$

Try to generalize this result to the case where $Y_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2/n_i$ for known constants $n_i$, as would be the case if the $Y_i$ represented sample means.
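As a quick numerical sanity check (my addition, not in the original), the exponential-family form with $\theta = \mu$, $b(\theta) = \theta^2/2$, $a(\phi) = \phi = \sigma^2$ and $c(y, \phi) = -y^2/(2\sigma^2) - \frac{1}{2}\log(2\pi\sigma^2)$ reproduces the usual normal density; all numbers below are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 4.0              # hypothetical mean and variance sigma^2
y = np.linspace(-5.0, 8.0, 7)      # a few test points

theta, phi = mu, sigma2            # theta = mu, a(phi) = phi = sigma^2
b = theta**2 / 2                   # b(theta) = theta^2 / 2
c = -y**2 / (2 * phi) - 0.5 * np.log(2 * np.pi * phi)

# Exponential-family form: f(y) = exp{(y*theta - b(theta))/a(phi) + c(y, phi)}
f_expfam = np.exp((y * theta - b) / phi + c)

# ...agrees with the usual normal density at every test point.
assert np.allclose(f_expfam, norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))
```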

In Problem Set 1 you will show that the exponential distribution with density

$$f(y_i) = \lambda_i \exp\{-\lambda_i y_i\}$$

belongs to the exponential family.

In Sections B.4 and B.5 we verify that the binomial and Poisson distributions also belong to this family.

B.1.2 The Link Function

The second element of the generalization is that instead of modeling the mean, as before, we will introduce a one-to-one continuous differentiable transformation $g(\mu_i)$ and focus on

$$\eta_i = g(\mu_i). \tag{B.4}$$

The function $g(\mu_i)$ will be called the link function. Examples of link functions include the identity, log, reciprocal, logit and probit.

We further assume that the transformed mean follows a linear model, so that

$$\eta_i = \mathbf{x}_i'\boldsymbol{\beta}. \tag{B.5}$$

The quantity $\eta_i$ is called the linear predictor. Note that the model for $\eta_i$ is pleasantly simple. Since the link function is one-to-one we can invert it to obtain

$$\mu_i = g^{-1}(\mathbf{x}_i'\boldsymbol{\beta}).$$

The model for $\mu_i$ is usually more complicated than the model for $\eta_i$.

Note that we do not transform the response $y_i$, but rather its expected value $\mu_i$. A model where $\log y_i$ is linear on $x_i$, for example, is not the same as a generalized linear model where $\log \mu_i$ is linear on $x_i$.

Example: The standard linear model we have studied so far can be described as a generalized linear model with normal errors and identity link, so that

$$\eta_i = \mu_i.$$

It also happens that $\mu_i$, and therefore $\eta_i$, is the same as $\theta_i$, the parameter in the exponential family density.
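To make the distinction between $\eta_i$ and $\mu_i$ concrete, here is a minimal sketch (my addition, with made-up numbers) using the logit link, whose inverse is the logistic function:

```python
import numpy as np

def logit(mu):
    """Link function g: maps a mean mu in (0, 1) to the linear predictor."""
    return np.log(mu / (1 - mu))

def expit(eta):
    """Inverse link g^{-1}: maps a linear predictor back to a mean in (0, 1)."""
    return 1 / (1 + np.exp(-eta))

x = np.array([1.0, 2.5])         # one case: intercept plus one predictor
beta = np.array([-1.0, 0.8])     # hypothetical coefficients

eta = x @ beta                   # linear predictor eta = x'beta: simple, unbounded
mu = expit(eta)                  # mean mu = g^{-1}(x'beta): stays in (0, 1)

# The link is one-to-one, so applying g to mu recovers eta exactly.
assert np.isclose(logit(mu), eta)
```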

When the link function makes the linear predictor $\eta_i$ the same as the canonical parameter $\theta_i$, we say that we have a canonical link. The identity is the canonical link for the normal distribution. In later sections we will see that the logit is the canonical link for the binomial distribution and the log is the canonical link for the Poisson distribution. This leads to some natural pairings:

Error      Link
Normal     Identity
Binomial   Logit
Poisson    Log

However, other combinations are also possible. An advantage of canonical links is that a minimal sufficient statistic for $\boldsymbol{\beta}$ exists, i.e. all the information about $\boldsymbol{\beta}$ is contained in a function of the data of the same dimensionality as $\boldsymbol{\beta}$.

B.2 Maximum Likelihood Estimation

An important practical feature of generalized linear models is that they can all be fit to data using the same algorithm, a form of iteratively re-weighted least squares. In this section we describe the algorithm.

Given a trial estimate of the parameters $\hat{\boldsymbol{\beta}}$, we calculate the estimated linear predictor $\hat\eta_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}}$ and use that to obtain the fitted values $\hat\mu_i = g^{-1}(\hat\eta_i)$. Using these quantities, we calculate the working dependent variable

$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\frac{d\eta_i}{d\mu_i}, \tag{B.6}$$

where the rightmost term is the derivative of the link function evaluated at the trial estimate.

Next we calculate the iterative weights

$$w_i = p_i \left/ \left[ b''(\theta_i) \left( \frac{d\eta_i}{d\mu_i} \right)^2 \right] \right., \tag{B.7}$$

where $b''(\theta_i)$ is the second derivative of $b(\theta_i)$ evaluated at the trial estimate and we have assumed that $a_i(\phi)$ has the usual form $\phi/p_i$.

This weight is inversely proportional to the variance of the working dependent variable $z_i$ given the current estimates of the parameters, with proportionality factor $\phi$.

Finally, we obtain an improved estimate of $\boldsymbol{\beta}$ by regressing the working dependent variable $z_i$ on the predictors $\mathbf{x}_i$ using the weights $w_i$, i.e. we calculate the weighted least-squares estimate

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{z}, \tag{B.8}$$

where $\mathbf{X}$ is the model matrix, $\mathbf{W}$ is a diagonal matrix of weights with entries $w_i$ given by (B.7) and $\mathbf{z}$ is a working response vector with entries $z_i$ given by (B.6).

The procedure is repeated until successive estimates change by less than a specified small amount.
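As a concrete illustration of the algorithm (my sketch, not code from the original), consider Poisson data with the canonical log link, anticipating the Poisson case treated later: there $\mu_i = e^{\eta_i}$, so $d\eta_i/d\mu_i = 1/\mu_i$, and with $b''(\theta_i) = \mu_i$ and $p_i = 1$ the weights (B.7) reduce to $w_i = \mu_i$. The function name and data below are made up for illustration:

```python
import numpy as np

def irls_poisson_log(X, y, tol=1e-8, max_iter=25):
    """Fit a Poisson GLM with log link by iteratively re-weighted least squares."""
    # Starting values: apply the link to the data, adjusted to avoid log(0).
    mu = np.maximum(y, 0.5)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu          # working dependent variable (B.6)
        w = mu                           # iterative weights (B.7): w_i = mu_i here
        WX = X * w[:, None]              # each row of X scaled by its weight
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)   # (X'WX)^{-1} X'Wz  (B.8)
        if np.max(np.abs(beta_new - beta)) < tol:        # stop when estimates settle
            beta = beta_new
            break
        beta = beta_new
        eta = X @ beta                   # updated linear predictor
        mu = np.exp(eta)                 # updated fitted values
    return beta

# Small synthetic check (data made up purely for illustration)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 1]))
print(irls_poisson_log(X, y))            # estimates should land near (0.5, 0.3)
```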

McCullagh and Nelder (1989) prove that this algorithm is equivalent to Fisher scoring and leads to maximum likelihood estimates. These authors consider the case of general $a_i(\phi)$ and include $\phi$ in their expression for the iterative weight. In other words, they use $w_i^* = w_i/\phi$, where $w_i$ is the weight used here. The proportionality factor $\phi$ cancels out when you calculate the weighted least-squares estimates using (B.8), so the estimator is exactly the same. I prefer to show $\phi$ explicitly rather than include it in $\mathbf{W}$.

Example: For normal data with identity link $\eta_i = \mu_i$, so the derivative is $d\eta_i/d\mu_i = 1$ and the working dependent variable is $y_i$ itself.

Since in addition $b''(\theta_i) = 1$ and $p_i = 1$, the weights are constant and no iteration is required.

In Sections B.4 and B.5 we derive the working dependent variable and the iterative weights required for binomial data with link logit and for Poisson data with link log. In both cases iteration will usually be necessary.

Starting values may be obtained by applying the link to the data, i.e. we take $\hat\mu_i = y_i$ and $\hat\eta_i = g(\hat\mu_i)$. Sometimes this requires a few adjustments, for example to avoid taking the log of zero, and we will discuss these at the appropriate time.

B.3 Tests of Hypotheses

We consider Wald tests and likelihood ratio tests, introducing the deviance statistic.

B.3.1 Wald Tests

The Wald test follows immediately from the fact that the information matrix for generalized linear models is given by

$$\mathbf{I}(\boldsymbol{\beta}) = \mathbf{X}'\mathbf{W}\mathbf{X}/\phi. \tag{B.9}$$
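Given (B.9), the large-sample variance of $\hat{\boldsymbol{\beta}}$ is estimated by $\phi(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$. A hedged sketch (my addition), continuing the hypothetical Poisson example above, where $\phi = 1$ and the converged weights are $w_i = \mu_i$:

```python
# Standard errors from the information matrix I(beta) = X'WX / phi (B.9).
# Continues the irls_poisson_log example; names and data are illustrative only.
beta_hat = irls_poisson_log(X, y)
mu = np.exp(X @ beta_hat)                   # fitted means at the estimate
phi = 1.0                                   # dispersion is 1 for the Poisson model
info = X.T @ (X * mu[:, None]) / phi        # X'WX / phi evaluated at beta_hat
se = np.sqrt(np.diag(np.linalg.inv(info)))  # sqrt of diagonal of I(beta)^{-1}
print(beta_hat / se)                        # Wald z-statistics for each coefficient
```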

