
Weighting in the regression analysis of survey data with a cross-national application

Chris Skinner, Ben Mason

17 July 2012

Abstract

A class of survey weighting methods provides consistent estimation of regression coefficients under unequal probability sampling. The minimization of the variance of the estimated coefficients within this class is considered. A series of approximations leads to a simple modification of the usual design weight. One type of application where unequal probabilities of selection arise is in cross-national comparative surveys. In this case, our argument suggests the use of a certain kind of within-country weight. We investigate this idea in an application to data from the European Social Survey, where we fit a logistic regression model with vote in an election as the dependent variable and with various variables of political science interest included as explanatory variables. We show that the use of the modified weights leads to a considerable reduction in standard errors compared to design weights.

INTRODUCTION

Survey weighting is often used when regression models are estimated from survey data.



Author affiliations: Department of Statistics, London School of Economics and Political Science, London WC2A 2AE; School of Social Sciences, University of Southampton, Southampton SO17 1BJ.

Unweighted estimators of regression coefficients may be biased if the inclusion of units in the sample is correlated with the outcome variable conditional on the explanatory variables. Weighting by the reciprocals of the unit inclusion probabilities, as in Horvitz-Thompson estimation, enables this bias to be corrected and regression coefficients to be estimated consistently (Fuller, 2009, Sect. …). A potential disadvantage of weighting, however, is that it may inflate the variances of the coefficient estimators. This problem may be more acute the greater the variability of the sample weights.

A number of approaches exist to control the variance inflation while retaining consistency. A simple option takes the design variables, which account for variation in the sample inclusion probabilities, and includes these as additional explanatory variables in the model.

This is only appropriate, however, when this respecified model remains of scientific interest.

In this paper we consider approaches which seek to improve estimation efficiency by modifying the survey weights. Such approaches can be straightforward to implement for practitioners. We develop such an approach within the general estimating function framework of Thompson (1997, …). Our approach extends a variance minimization approach proposed by Fuller (2009, Sect. …) for linear regression and leads to a weight modification similar to the semi-parametric approach proposed by Pfeffermann and Sverchkov (1999).

We have presented the rationale for weighting as the removal of bias due to unequal selection probabilities. Another rationale that is sometimes given is that it addresses model misspecification, in the sense that the weighted estimator is consistent for a finite population regression coefficient, which is meaningful even if the model fails (Godambe and Thompson, 1986).

We shall discuss this issue too, but only as a secondary matter.

We explore the application of our modified weighting method to a specific cross-national regression analysis of European Social Survey (ESS) data. This application exemplifies a particular problem of weighting arising in cross-national comparative surveys when data are pooled across countries (Thompson, 2008, Section 3). It is common in the design of such surveys for sample sizes in different countries to be much less variable than population sizes and for this to lead to very different sampling fractions across countries. This implies that data from large countries may dominate pooled analyses employing Horvitz-Thompson weighting, leading to inefficient use of sample data (Thompson, 2008, Section 3). In this paper we show how the modified weighting approach may address this problem.

The paper is organized as follows. We develop the theory behind our modified weighting approach in Section 2.

We discuss the generic kind of cross-national application in Section 3. The specific application to ESS data is presented in Section 4 and some final discussion is provided in the final section.

ESTIMATION THEORY

Following Thompson (1997, Ch. 6), let the units in a finite population $U$ be labelled $j = 1, \ldots, N$ and let the row vector $(y_j, x_j)$ denote the associated values of a pair of response and explanatory variables in a regression analysis, where $y_j$ is the realized value of the random variable $Y_j$ and the $1 \times k$ vector $x_j$ is treated as fixed. Consider a regression model under which the $Y_j$ are independent, with a distribution depending on a $k \times 1$ column vector $\theta$ of parameters, such that

$$E_m\{\phi_j(Y_j; x_j, \theta)\} = 0 \quad \text{for } j = 1, \ldots, N, \qquad (1)$$

where $\phi_j(Y_j; x_j, \theta)$ is a $k \times 1$ vector estimating function and $E_m(\cdot)$ denotes expectation under the model. The population-level equations

$$\sum_{j=1}^{N} \phi_j(Y_j; x_j, \theta) = 0, \qquad (2)$$

are unbiased estimating equations in the terminology of Godambe and Thompson (2009). Assuming that in some asymptotic framework a solution $\hat{\theta}_U$ to these equations eventually exists uniquely and under additional regularity conditions (Godambe and Thompson, 2009), $\hat{\theta}_U$ converges in probability to $\theta$.

A particular instance of such an estimating function is the unit-level score function given by

$$\phi_j(Y_j; x_j, \theta) = \frac{\partial}{\partial \theta} \log f_j(Y_j; x_j, \theta),$$

where $f_j(Y_j; x_j, \theta)$ is the probability density or mass function for $Y_j$ and $\hat{\theta}_U$ is the census maximum likelihood estimator of $\theta$ which would apply if all population values of $(y_j, x_j)$ were observed. For illustration, if a binary variable $y_j$, taking values 0 or 1, obeys a logistic regression model, where $\theta$ is the vector of regression coefficients, we have $\log\{f_j(1; x_j, \theta)/f_j(0; x_j, \theta)\} = x_j\theta$ and

$$\phi_j(Y_j; x_j, \theta) = \{Y_j - f_j(1; x_j, \theta)\}\, x_j^T, \qquad (3)$$

where $T$ denotes transpose.

Suppose that the $(y_j, x_j)$ are only observed for units $j$ in a sample drawn by a probability sampling scheme from $U$ and let $I_j$, $j = 1, \ldots, N$, be the sample indicators, where $I_j = 1$ if unit $j$ is sampled and $I_j = 0$ if not. We shall be interested in the weighted estimator $\hat{\theta}_w$ which solves the sample estimating equations

$$\sum_{j=1}^{N} w_j I_j \phi_j(Y_j; x_j, \theta) = 0, \qquad (4)$$

where $w_j$ is a survey weight. Corresponding to condition (1) for the consistency of $\hat{\theta}_U$, these estimating equations are unbiased under the joint distribution induced by the design and the model and $\hat{\theta}_w$ is consistent for $\theta$ if

$$E_m E_p[w_j I_j \phi_j(Y_j; x_j, \theta)] = 0 \quad \text{for } j = 1, \ldots, N, \qquad (5)$$

where $E_p(\cdot)$ denotes expectation with respect to the sampling scheme. Two basic cases when condition (5) holds are:

(i) the $w_j$ are constant, so that $\hat{\theta}_w$ is the unweighted estimator, and sampling is noninformative, that is $I_j$ and $Y_j$ are independent (conditional on $x_j$) for each $j$. This arises, in particular, when sample inclusion depends just on a set of design variables which are included in the vector $x_j$. Fuller (2009, Section …) reviews tests of this noninformative condition, including a test proposed by DuMouchel and Duncan (1983).

(ii) $w_j = d_j$, the design (Horvitz-Thompson) weight given by $d_j = \pi_j^{-1}$, where $\pi_j = E_p(I_j)$ is the inclusion probability of unit $j$.

In each of these cases, the proof of (5) assumes model (1) holds. For example, to demonstrate that (5) holds in case (ii) we write $E_m E_p(d_j I_j \phi_j) = E_m\{E_p(d_j I_j)\phi_j\}$, where $\phi_j = \phi_j(Y_j; x_j, \theta)$, and use $E_p[d_j I_j] = 1$ and (1).

Following a similar argument, (5) holds for the class of cases, generalizing (ii), defined by:

(iii) $w_j = d_j q_j$, where $q_j = q(x_j)$ and $q(\cdot)$ is an arbitrary function.

The class of weighted estimators defined by such weights is the one of primary interest in this paper and within which we consider minimising the variance of linear combinations of the elements of $\hat{\theta}_w$. In regular cases (Thompson, 1997, …), the asymptotic covariance matrix of $\hat{\theta}_w$ is

$$\mathrm{var}_{mp}(\hat{\theta}_w) = J(\theta)^{-1}\, \mathrm{var}_{mp}\Big\{\sum_{j=1}^{N} w_j I_j \phi_j\Big\}\, J(\theta)^{-1}, \qquad (6)$$

where $J(\theta) = E_{mp}\big\{\sum_{j=1}^{N} w_j I_j\, \partial\phi_j/\partial\theta\big\}$ and, when $w_j = d_j q(x_j)$, we can write

$$J(\theta) = \sum_{j=1}^{N} q_j E_m(\partial\phi_j/\partial\theta).$$

We should like to choose $q_j$ so that the variance in (6) is minimized. For practical purposes, we consider it sufficient to minimize an approximation to this variance, since any weighted estimator in the class defined by $w_j = d_j q_j$ is consistent. We shall make a series of approximations to enable us to specify $q_j$ in a simple and practical way. Our first approximation is of independence between units, so that the covariances between the different $d_j q_j I_j \phi_j$ terms can be ignored. This is similar to the simplification employed by Fuller (2009, …) of assuming Poisson sampling.

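Under this approximation, we may rewrite (6) as

$$\mathrm{var}_{mp}(\hat{\theta}_w) \approx J(\theta)^{-1} \Big\{\sum_{j=1}^{N} \mathrm{var}_{mp}(d_j q_j I_j \phi_j)\Big\} J(\theta)^{-1}.$$

Moreover, we have

$$\mathrm{var}_{mp}(d_j q_j I_j \phi_j) = E_m\{\mathrm{var}_p(d_j q_j I_j \phi_j)\} + \mathrm{var}_m\{E_p(d_j q_j I_j \phi_j)\} = E_m\{(d_j - 1) q_j^2 \phi_j \phi_j^T\} + \mathrm{var}_m(q_j \phi_j) = E_m(d_j q_j^2 \phi_j \phi_j^T).$$

Hence, when $w_j = d_j q(x_j)$, the asymptotic covariance matrix can be expressed as

$$\mathrm{var}_{mp}(\hat{\theta}_w) \approx \Big\{\sum_{j=1}^{N} q_j E_m(\partial\phi_j/\partial\theta)\Big\}^{-1} \sum_{j=1}^{N} q_j^2 E_m(d_j \phi_j \phi_j^T) \Big\{\sum_{j=1}^{N} q_j E_m(\partial\phi_j/\partial\theta)\Big\}^{-1}.$$

As a second simplification, we assume that $\phi_j(Y_j; x_j, \theta)$ is a score function so that

$$E_m(\phi_j \phi_j^T) = -E_m(\partial\phi_j/\partial\theta) = H_j, \text{ say},$$

and also that we have a generalized linear model so that $\theta = \beta$ is the vector of regression coefficients and $\phi_j(Y_j; x_j, \beta) = \psi_j(Y_j; x_j\beta)\, x_j^T$, where $\psi_j(\cdot)$ is a scalar function. Then we may write $H_j = \tau_j^2 x_j^T x_j$, where $\tau_j^2 = E_m(\psi_j^2)$, $\psi_j = \psi_j(Y_j; x_j\beta)$ and

$$\mathrm{var}_{mp}(\hat{\theta}_w) \approx \Big\{\sum_{j=1}^{N} q_j \tau_j^2 x_j^T x_j\Big\}^{-1} \sum_{j=1}^{N} q_j^2 \nu_j x_j^T x_j \Big\{\sum_{j=1}^{N} q_j \tau_j^2 x_j^T x_j\Big\}^{-1}, \qquad (7)$$

where $\nu_j = E_m(d_j \psi_j^2)$. By analogy to the Gauss-Markov Theorem, the choice of $q_j$ which minimises the variance given by (7) of any linear combination of the elements of $\hat{\theta}_w$ is

$$q_j^{opt} = q^{opt}(x_j) \propto \tau_j^2/\nu_j = E_m(\psi_j^2 \mid x_j)/E_m(d_j \psi_j^2 \mid x_j). \qquad (8)$$

This generalizes an argument used by Fuller (2009, pp. 359, 360) for the special case of heteroskedastic normal error linear regression with $k = 1$. We make the conditioning on $x_j$ explicit on the right hand side of (8) to be clear that $q_j^{opt}$ depends on $x_j$. The quantity on the right hand side of (8) is not observed, but is estimable from auxiliary regressions of $\hat{\psi}_j^2$ and $d_j \hat{\psi}_j^2$ on $x_j$, where $\hat{\psi}_j = \psi_j(Y_j; x_j\hat{\beta})$ and $\hat{\beta}$ is a consistent estimator of $\beta$. These regressions and the estimation of $\beta$ could, for example, employ design-weighted estimation. We do not pursue this idea further in this paper, however. Rather, we make the further approximation that $d_j$ is uncorrelated with $\psi_j^2$ (given $x_j$) so that expression (8) simplifies to

$$q_j^{opt} \propto 1/E_m(d_j \mid x_j). \qquad (9)$$

Some justification for this approximation will be given in Section 4 in the context of our application. The form of weighting in (9) is similar to the semi-parametric approach of Pfeffermann and Sverchkov (1999), although they propose to take $q_j \propto 1/E_{mp}(d_j \mid x_j, I_j = 1)$.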
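As a rough illustration of the modified weighting in (9), the following Python sketch computes weights of the form d_j q_j with q_j = 1/E(d_j | x_j) and then solves the weighted logistic estimating equations (4) by Newton-Raphson. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names, the simulated two-country sample, and in particular the choice to estimate E(d_j | x_j) by a design-weighted mean of d_j within the cells of a categorical covariate.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Solve sum_j w_j {y_j - p_j(beta)} x_j^T = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))                     # weighted estimating function
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X  # minus its derivative in beta
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def modified_weights(d, cell):
    """Return (d_j * q_j, q_j) with q_j = 1 / Ehat(d | cell).

    Assumption: E(d | x) is estimated by the design-weighted mean of d
    within each cell, sum(d * d) / sum(d), computed from the sample.
    """
    d = np.asarray(d, dtype=float)
    cell = np.asarray(cell)
    q = np.empty_like(d)
    for c in np.unique(cell):
        m = cell == c
        q[m] = np.sum(d[m]) / np.sum(d[m] ** 2)
    return d * q, q


# Hypothetical pooled sample from two countries with very different
# sampling fractions (all values below are made up for illustration).
rng = np.random.default_rng(0)
n = 2000
country = rng.integers(0, 2, size=n)            # 0 = small country, 1 = large country
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), country, z])   # country is one of the covariates
beta_true = np.array([-0.5, 0.8, 1.0])
prob = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, prob).astype(float)
d = np.where(country == 1, 5000.0, 100.0) * rng.uniform(0.5, 1.5, size=n)

w_mod, q = modified_weights(d, country)
beta_design = fit_weighted_logistic(X, y, d)      # design (Horvitz-Thompson) weights
beta_mod = fit_weighted_logistic(X, y, w_mod)     # modified weights d_j * q_j
```

In this sketch q_j is constant within a country, so the modified weights keep the relative sizes of the design weights inside each country while removing the large difference in weight level between countries. That appears to be the sense in which a within-country weight arises when country is among the explanatory variables.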

