
Appendix B

Generalized Linear Model Theory

We describe the generalized linear model as formulated by Nelder and Wedderburn (1972), and discuss estimation of the parameters and tests of hypotheses.

B.1 The Model

Let $y_1, \dots, y_n$ denote $n$ independent observations on a response. We treat $y_i$ as a realization of a random variable $Y_i$. In the general linear model we assume that $Y_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2$,

$$Y_i \sim N(\mu_i, \sigma^2),$$

and we further assume that the expected value $\mu_i$ is a linear function of $p$ predictors that take values $\mathbf{x}_i' = (x_{i1}, \dots, x_{ip})$ for the $i$-th case, so that

$$\mu_i = \mathbf{x}_i'\boldsymbol\beta,$$

where $\boldsymbol\beta$ is a vector of unknown parameters.

We will generalize this in two steps, dealing with the stochastic and systematic components of the model.

B.1.1 The Exponential Family

We will assume that the observations come from a distribution in the exponential family with probability density function

$$f(y_i) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi)\right\}. \tag{B.1}$$

Here $\theta_i$ and $\phi$ are parameters and $a_i(\phi)$, $b(\theta_i)$ and $c(y_i, \phi)$ are known functions.

In all models considered in these notes the function $a_i(\phi)$ has the form

$$a_i(\phi) = \phi/p_i,$$

where $p_i$ is a known prior weight, usually one. The parameters $\theta_i$ and $\phi$ are essentially location and scale parameters. It can be shown that if $Y_i$ has a distribution in the exponential family then it has mean and variance

$$E(Y_i) = \mu_i = b'(\theta_i) \tag{B.2}$$

$$\mathrm{var}(Y_i) = \sigma_i^2 = b''(\theta_i)a_i(\phi), \tag{B.3}$$

where $b'(\theta_i)$ and $b''(\theta_i)$ are the first and second derivatives of $b(\theta_i)$. When $a_i(\phi) = \phi/p_i$ the variance has the simpler form

$$\mathrm{var}(Y_i) = \sigma_i^2 = \phi\, b''(\theta_i)/p_i.$$

The exponential family just defined includes as special cases the normal, binomial, Poisson, exponential, gamma and inverse Gaussian distributions.

Example: The normal distribution has density

$$f(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\frac{(y_i - \mu_i)^2}{\sigma^2}\right\}.$$

Expanding the square in the exponent we get $(y_i - \mu_i)^2 = y_i^2 + \mu_i^2 - 2y_i\mu_i$, so the coefficient of $y_i$ is $\mu_i/\sigma^2$. This result identifies $\theta_i$ as $\mu_i$ and $\phi$ as $\sigma^2$, with $a_i(\phi) = \phi$. Now write

$$f(y_i) = \exp\left\{\frac{y_i\mu_i - \frac{1}{2}\mu_i^2}{\sigma^2} - \frac{y_i^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right\}.$$

This shows that $b(\theta_i) = \frac{1}{2}\theta_i^2$ (recall that $\theta_i = \mu_i$). Let us check the mean and variance:

$$E(Y_i) = b'(\theta_i) = \theta_i = \mu_i,$$

$$\mathrm{var}(Y_i) = b''(\theta_i)a_i(\phi) = \sigma^2.$$

Try to generalize this result to the case where $Y_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2/n_i$ for known constants $n_i$, as would be the case if the $Y_i$ represented sample means.

Example: In Problem Set 1 you will show that the exponential distribution with density

$$f(y_i) = \lambda_i \exp\{-\lambda_i y_i\}$$

belongs to the exponential family.

In Sections B.4 and B.5 we verify that the binomial and Poisson distributions also belong to this family.
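As a quick check of (B.2) and (B.3) in the normal example above, here is a minimal sketch (my addition, not part of the original notes) using sympy to differentiate the cumulant function $b(\theta) = \theta^2/2$ with $a(\phi) = \sigma^2$:

```python
import sympy as sp

theta, sigma2 = sp.symbols('theta sigma2', positive=True)

b = theta**2 / 2                          # cumulant function b(theta) for the normal
mean = sp.diff(b, theta)                  # E(Y) = b'(theta)
variance = sp.diff(b, theta, 2) * sigma2  # var(Y) = b''(theta) * a(phi), with a(phi) = sigma^2

print(mean)      # theta, i.e. mu
print(variance)  # sigma2
```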

B.1.2 The Link Function

The second element of the generalization is that instead of modeling the mean, as before, we will introduce a one-to-one continuous differentiable transformation $g(\mu_i)$ and focus on

$$\eta_i = g(\mu_i). \tag{B.4}$$

The function $g(\mu_i)$ will be called the link function. Examples of link functions include the identity, log, reciprocal, logit and probit.

We further assume that the transformed mean follows a linear model, so that

$$\eta_i = \mathbf{x}_i'\boldsymbol\beta. \tag{B.5}$$

The quantity $\eta_i$ is called the linear predictor. Note that the model for $\eta_i$ is pleasantly simple. Since the link function is one-to-one we can invert it to obtain

$$\mu_i = g^{-1}(\mathbf{x}_i'\boldsymbol\beta).$$

The model for $\mu_i$ is usually more complicated than the model for $\eta_i$.

Note that we do not transform the response $y_i$, but rather its expected value $\mu_i$. A model where $\log y_i$ is linear on $x_i$, for example, is not the same as a generalized linear model where $\log \mu_i$ is linear on $x_i$.

Example: The standard linear model we have studied so far can be described as a generalized linear model with normal errors and identity link, so that $\eta_i = \mu_i$.

It also happens that $\mu_i$, and therefore $\eta_i$, is the same as $\theta_i$, the parameter in the exponential family density.
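As a small illustration (mine, not from the original notes), the sketch below implements three of the links named above together with their inverses, and verifies numerically that each is one-to-one:

```python
import numpy as np

# Each entry pairs a link g with its inverse g^{-1}.
links = {
    "identity": (lambda mu: mu,                    lambda eta: eta),
    "log":      (lambda mu: np.log(mu),            lambda eta: np.exp(eta)),
    "logit":    (lambda mu: np.log(mu / (1 - mu)),
                 lambda eta: 1 / (1 + np.exp(-eta))),
}

mu = 0.3
for name, (g, g_inv) in links.items():
    eta = g(mu)                         # eta = g(mu), the linear predictor scale
    assert np.isclose(g_inv(eta), mu)   # invert to recover mu = g^{-1}(eta)
    print(f"{name:8s} eta = {eta:+.4f}")
```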

When the link function makes the linear predictor $\eta_i$ the same as the canonical parameter $\theta_i$, we say that we have a canonical link. The identity is the canonical link for the normal distribution. In later sections we will see that the logit is the canonical link for the binomial distribution and the log is the canonical link for the Poisson distribution. This leads to some natural pairings:

Error      Link
Normal     Identity
Binomial   Logit
Poisson    Log

However, other combinations are also possible. An advantage of canonical links is that a minimal sufficient statistic for $\boldsymbol\beta$ exists, i.e. all the information about $\boldsymbol\beta$ is contained in a function of the data of the same dimensionality as $\boldsymbol\beta$.

B.2 Maximum Likelihood Estimation

An important practical feature of generalized linear models is that they can all be fit to data using the same algorithm, a form of iteratively re-weighted least squares. In this section we describe the algorithm.

Given a trial estimate of the parameters $\hat{\boldsymbol\beta}$, we calculate the estimated linear predictor $\hat\eta_i = \mathbf{x}_i'\hat{\boldsymbol\beta}$ and use that to obtain the fitted values $\hat\mu_i = g^{-1}(\hat\eta_i)$. Using these quantities, we calculate the working dependent variable

$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\frac{d\eta_i}{d\mu_i}, \tag{B.6}$$

where the rightmost term is the derivative of the link function evaluated at the trial estimate.

Next we calculate the iterative weights

$$w_i = p_i\left/\left[b''(\theta_i)\left(\frac{d\eta_i}{d\mu_i}\right)^2\right]\right., \tag{B.7}$$

where $b''(\theta_i)$ is the second derivative of $b(\theta_i)$ evaluated at the trial estimate and we have assumed that $a_i(\phi)$ has the usual form $\phi/p_i$. This weight is inversely proportional to the variance of the working dependent variable $z_i$ given the current estimates of the parameters, with proportionality factor $\phi$.
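To make (B.6) and (B.7) concrete, here is a sketch of mine (not part of the original notes) of one evaluation of $z_i$ and $w_i$ for a Poisson model with log link, the case treated later in the notes: with $\eta = \log\mu$ we have $d\eta_i/d\mu_i = 1/\mu_i$, and for the Poisson $b''(\theta_i) = \mu_i$ with $p_i = 1$, so the weight reduces to $w_i = \mu_i$.

```python
import numpy as np

def working_quantities(y, X, beta):
    """One evaluation of the working response z and weights w
    for a Poisson model with log link (p_i = 1)."""
    eta = X @ beta           # trial linear predictor eta_i = x_i' beta
    mu = np.exp(eta)         # fitted values mu_i = g^{-1}(eta_i)
    z = eta + (y - mu) / mu  # z_i = eta_i + (y_i - mu_i) d(eta)/d(mu), with d(eta)/d(mu) = 1/mu
    w = mu                   # w_i = 1 / [b''(theta_i) (1/mu_i)^2] = mu_i
    return z, w
```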

Finally, we obtain an improved estimate of $\boldsymbol\beta$ by regressing the working dependent variable $z_i$ on the predictors $\mathbf{x}_i$ using the weights $w_i$, i.e. we calculate the weighted least-squares estimate

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{z}, \tag{B.8}$$

where $\mathbf{X}$ is the model matrix, $\mathbf{W}$ is a diagonal matrix of weights with entries $w_i$ given by (B.7) and $\mathbf{z}$ is a response vector with entries $z_i$ given by (B.6).

The procedure is repeated until successive estimates change by less than a specified small amount. McCullagh and Nelder (1989) prove that this algorithm is equivalent to Fisher scoring and leads to maximum likelihood estimates. These authors consider the case of general $a_i(\phi)$ and include $\phi$ in their expression for the iterative weight. In other words, they use $w_i^* = w_i/\phi$, where $w_i$ is the weight used here. The proportionality factor $\phi$ cancels out when you calculate the weighted least-squares estimates using (B.8), so the estimator is exactly the same. I prefer to show $\phi$ explicitly rather than include it in $\mathbf{W}$.
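Putting the pieces together, a minimal IRLS loop for the same hypothetical Poisson/log case might look as follows. This is a sketch of mine, not production code; real software adds safeguards for starting values and convergence.

```python
import numpy as np

def irls_poisson(y, X, tol=1e-8, max_iter=25):
    """Fit a Poisson log-linear model by iteratively re-weighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working dependent variable (B.6)
        w = mu                             # iterative weights (B.7)
        XtW = X.T * w                      # X'W, with W = diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new                # successive estimates have stabilized
        beta = beta_new
    return beta

# Simulated example: counts with log-mean 0.5 + 0.3 x.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
print(irls_poisson(y, X))   # estimates close to (0.5, 0.3)
```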

Example: For normal data with identity link $\eta_i = \mu_i$, so the derivative is $d\eta_i/d\mu_i = 1$ and the working dependent variable is $y_i$ itself. Since in addition $b''(\theta_i) = 1$ and $p_i = 1$, the weights are constant and no iteration is required.

In Sections B.4 and B.5 we derive the working dependent variable and the iterative weights required for binomial data with link logit and for Poisson data with link log. In both cases iteration will usually be necessary.

Starting values may be obtained by applying the link to the data, i.e. we take $\hat\mu_i = y_i$ and $\hat\eta_i = g(\hat\mu_i)$. Sometimes this requires a few adjustments, for example to avoid taking the log of zero, and we will discuss these at the appropriate time.

B.3 Tests of Hypotheses

We consider Wald tests and likelihood ratio tests, introducing the deviance statistic.

B.3.1 Wald Tests

The Wald test follows immediately from the fact that the information matrix for generalized linear models is given by

$$\mathbf{I}(\boldsymbol\beta) = \mathbf{X}'\mathbf{W}\mathbf{X}/\phi, \tag{B.9}$$

so the large sample distribution of the maximum likelihood estimator $\hat{\boldsymbol\beta}$ is multivariate normal

$$\hat{\boldsymbol\beta} \sim N_p(\boldsymbol\beta, (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\phi), \tag{B.10}$$

with mean $\boldsymbol\beta$ and variance-covariance matrix $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\phi$.

Tests for subsets of $\boldsymbol\beta$ are based on the corresponding marginal normal distributions.

Example: In the case of normal errors with identity link we have $\mathbf{W} = \mathbf{I}$ (where $\mathbf{I}$ denotes the identity matrix), $\phi = \sigma^2$, and the exact distribution of $\hat{\boldsymbol\beta}$ is multivariate normal with mean $\boldsymbol\beta$ and variance-covariance matrix $(\mathbf{X}'\mathbf{X})^{-1}\sigma^2$.
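As a concrete illustration of (B.10) in this normal/identity case, the following sketch (my addition; numpy and scipy assumed) computes a Wald z statistic for one coefficient from the estimated variance-covariance matrix $(\mathbf{X}'\mathbf{X})^{-1}\hat\sigma^2$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.0]) + rng.normal(size=n)    # true slope is 0

beta = np.linalg.solve(X.T @ X, X.T @ y)             # OLS, which is the MLE here
phi = np.sum((y - X @ beta)**2) / (n - X.shape[1])   # sigma^2 from the residual mean square
cov = np.linalg.inv(X.T @ X) * phi                   # (X'WX)^{-1} phi with W = I

z = beta[1] / np.sqrt(cov[1, 1])                     # Wald statistic for the slope
print(z, 2 * stats.norm.sf(abs(z)))                  # two-sided p-value
```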

B.3.2 Likelihood Ratio Tests and the Deviance

We will show how the likelihood ratio criterion for comparing any two nested models, say $\omega_1 \subset \omega_2$, can be constructed in terms of a statistic called the deviance and an unknown scale parameter $\phi$.

Consider first comparing a model of interest $\omega$ with a saturated model $\Omega$ that provides a separate parameter for each observation.

Let $\hat\mu_i$ denote the fitted values under $\omega$ and let $\hat\theta_i$ denote the corresponding estimates of the canonical parameters. Similarly, let $\tilde\mu_i = y_i$ and $\tilde\theta_i$ denote the corresponding estimates under $\Omega$.

The likelihood ratio criterion to compare these two models in the exponential family has the form

$$-2\log\lambda = 2\sum_{i=1}^{n} \frac{y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)}{a_i(\phi)}.$$

Assume as usual that $a_i(\phi) = \phi/p_i$ for known prior weights $p_i$. Then we can write the likelihood-ratio criterion as follows:

$$-2\log\lambda = \frac{D(\mathbf{y}, \hat{\boldsymbol\mu})}{\phi}. \tag{B.11}$$

The numerator of this expression does not depend on unknown parameters and is called the deviance:

$$D(\mathbf{y}, \hat{\boldsymbol\mu}) = 2\sum_{i=1}^{n} p_i\left[y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\right]. \tag{B.12}$$

The likelihood ratio criterion $-2\log L$ is the deviance divided by the scale parameter $\phi$, and is called the scaled deviance.

Example: Recall that for the normal distribution we had $\theta_i = \mu_i$, $b(\theta_i) = \frac{1}{2}\theta_i^2$, and $a_i(\phi) = \sigma^2$, so the prior weights are $p_i = 1$. Thus, the deviance is

$$D(\mathbf{y}, \hat{\boldsymbol\mu}) = 2\sum\left\{y_i(y_i - \hat\mu_i) - \frac{1}{2}y_i^2 + \frac{1}{2}\hat\mu_i^2\right\} = 2\sum\left\{\frac{1}{2}y_i^2 - y_i\hat\mu_i + \frac{1}{2}\hat\mu_i^2\right\} = \sum(y_i - \hat\mu_i)^2,$$

our good old friend, the residual sum of squares.
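A quick numeric check of this identity (my own, with hypothetical fitted values):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])    # observed values
mu = np.array([1.5, 1.5, 2.5])   # fitted values under some model

# Deviance (B.12) for the normal: theta = mu, b(theta) = theta^2 / 2, p_i = 1.
D = 2 * np.sum(y * (y - mu) - y**2 / 2 + mu**2 / 2)

print(D, np.sum((y - mu)**2))    # both equal the residual sum of squares
```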

Let us now return to the comparison of two nested models $\omega_1$, with $p_1$ parameters, and $\omega_2$, with $p_2$ parameters, such that $\omega_1 \subset \omega_2$ and $p_2 > p_1$.

The log of the ratio of maximized likelihoods under the two models can be written as a difference of deviances, since the maximized log-likelihood under the saturated model cancels out. Thus, we have

$$-2\log\lambda = \frac{D(\omega_1) - D(\omega_2)}{\phi}. \tag{B.13}$$

The scale parameter $\phi$ is either known or estimated using the larger model $\omega_2$.

Large sample theory tells us that the asymptotic distribution of this criterion under the usual regularity conditions is $\chi^2$ with $\nu = p_2 - p_1$ degrees of freedom.

Example: In the linear model with normal errors we estimate the unknown scale parameter $\phi$ using the residual sum of squares of the larger model, so the criterion becomes

$$-2\log\lambda = \frac{\mathrm{RSS}(\omega_1) - \mathrm{RSS}(\omega_2)}{\mathrm{RSS}(\omega_2)/(n - p_2)}.$$

In large samples the approximate distribution of this criterion is $\chi^2$ with $\nu = p_2 - p_1$ degrees of freedom. Under normality, however, we have an exact result: dividing the criterion by $p_2 - p_1$ we obtain an $F$ with $p_2 - p_1$ and $n - p_2$ degrees of freedom. Note that as $n \to \infty$ the denominator degrees of freedom approach $\infty$ and $(p_2 - p_1)F$ converges to a $\chi^2$ with $p_2 - p_1$ degrees of freedom, so the asymptotic and exact criteria become equivalent.

In Sections B.4 and B.5 we will construct likelihood ratio tests for binomial and Poisson data. In those cases $\phi = 1$ (unless one allows over-dispersion and estimates $\phi$, but that's another story) and the deviance is the same as the scaled deviance. All our tests will be based on asymptotic $\chi^2$ results.
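The sketch below (mine, with simulated data) computes this criterion for two nested normal models and compares the asymptotic $\chi^2$ and exact $F$ versions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X1 = np.ones((n, 1))                    # omega_1: intercept only, p1 = 1
X2 = np.column_stack([np.ones(n), x])   # omega_2: adds the slope, p2 = 2
p1, p2 = X1.shape[1], X2.shape[1]

def rss(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ beta)**2)

crit = (rss(X1, y) - rss(X2, y)) / (rss(X2, y) / (n - p2))
print(stats.chi2.sf(crit, p2 - p1))               # asymptotic chi-squared p-value
F = crit / (p2 - p1)
print(stats.f.sf(F, p2 - p1, n - p2))             # exact F p-value under normality
```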

B.4 Binomial Errors and Link Logit

We apply the theory of generalized linear models to the case of binary data, and in particular to logistic regression models.

B.4.1 The Binomial Distribution

First we verify that the binomial distribution $B(n_i, \pi_i)$ belongs to the exponential family of Nelder and Wedderburn (1972). The binomial probability distribution function (pdf) is

$$f_i(y_i) = \binom{n_i}{y_i}\pi_i^{y_i}(1 - \pi_i)^{n_i - y_i}. \tag{B.14}$$

Taking logs we find that

$$\log f_i(y_i) = y_i\log(\pi_i) + (n_i - y_i)\log(1 - \pi_i) + \log\binom{n_i}{y_i}.$$

Collecting terms on $y_i$ we can write

$$\log f_i(y_i) = y_i\log\left(\frac{\pi_i}{1 - \pi_i}\right) + n_i\log(1 - \pi_i) + \log\binom{n_i}{y_i}.$$

This expression has the general exponential form

$$\log f_i(y_i) = \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi)$$

with the following equivalences. Looking first at the coefficient of $y_i$, we note that the canonical parameter is the logit of $\pi_i$,

$$\theta_i = \log\left(\frac{\pi_i}{1 - \pi_i}\right). \tag{B.15}$$

Solving for $\pi_i$ we see that

$$\pi_i = \frac{e^{\theta_i}}{1 + e^{\theta_i}}, \quad \text{so} \quad 1 - \pi_i = \frac{1}{1 + e^{\theta_i}}.$$

If we rewrite the second term in the pdf as a function of $\theta_i$, so $n_i\log(1 - \pi_i) = -n_i\log(1 + e^{\theta_i})$, we can identify the cumulant function $b(\theta_i)$ as

$$b(\theta_i) = n_i\log(1 + e^{\theta_i}).$$
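As a sanity check (my addition, using sympy) that this cumulant function reproduces the binomial mean through (B.2), i.e. $E(Y_i) = b'(\theta_i) = n_i\pi_i$:

```python
import sympy as sp

theta, n = sp.symbols('theta n', positive=True)

b = n * sp.log(1 + sp.exp(theta))          # cumulant function b(theta) identified above
mean = sp.diff(b, theta)                   # E(Y) = b'(theta) = n e^theta / (1 + e^theta)
pi = sp.exp(theta) / (1 + sp.exp(theta))   # inverse logit

print(sp.simplify(mean - n * pi))          # 0, confirming E(Y) = n * pi
```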

