
Fitting generalized estimating equation (GEE ... - Stata



Fitting generalized estimating equation (GEE) regression models in Stata

Nicholas Horton
Epidemiology and Biostatistics
Boston University School of Public Health
3/16/2001

Outline
- Regression models for clustered or longitudinal data
- Brief review of GEEs
  - mean model
  - working correlation matrix
- Stata GEE implementation
- Example: Mental health service utilization
- Summary and conclusions

Regression models for clustered or longitudinal data
- Longitudinal, repeated measures, or clustered data commonly encountered
- Correlations between observations on a given subject may exist, and need to be accounted for
- If outcomes are multivariate normal, then established methods of analysis are available (Laird and Ware, Biometrics, 1982)
- If outcomes are binary or counts, likelihood-based inference less tractable

Generalized estimating equations
- Described by Liang and Zeger (Biometrika, 1986) and Zeger and Liang (Biometrics, 1986)

- Extend the generalized linear model to allow for correlated observations
- Characterize the marginal expectation (average response for observations sharing the same covariates) as a function of covariates
- Method accounts for the correlation between observations in generalized linear regression models by use of empirical (sandwich/robust) variance estimator
- Posits model for the working correlation matrix

The marginal mean model
- We assume the marginal regression model:
  \[ g(E[Y_{ij} \mid x_{ij}]) = x_{ij}' \beta \]
- where $x_{ij}$ is a $p \times 1$ vector of covariates, $\beta$ consists of the $p$ regression parameters of interest, $g(\cdot)$ is the link function, and $Y_{ij}$ denotes the jth outcome (for $j = 1, \ldots, J$) for the ith subject (for $i = 1, \ldots, N$)
- Common choices for the link function include:
    g(a) = a               (identity link)
    g(a) = log(a)          [for count data]
    g(a) = log(a/(1-a))    [logit link for binary data]

Model for the correlation
- Assuming no missing data, the J x J covariance matrix for $Y_i$ is modeled as:
  \[ V_i = \phi A_i^{1/2} R(\alpha) A_i^{1/2} \]
- where $\phi$ is a glm dispersion parameter, $A_i$ is a diagonal matrix of variance functions, and $R(\alpha)$ is the working correlation matrix of $Y_i$
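For reference, the mean model and the working covariance matrix $V_i$ combine in the estimating equations of Liang and Zeger (1986). The statement below is the standard textbook form, not something shown on the slides; $\mu_i$, $D_i$, $B$ and $M$ are notation introduced here for illustration.

\[
U(\beta) = \sum_{i=1}^{N} D_i' V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) = 0,
\qquad
\mu_{ij} = g^{-1}(x_{ij}' \beta),
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta'}
\]

For the logit link with a binary outcome, $\mu_{ij} = 1 / (1 + e^{-x_{ij}' \beta})$, the variance function is $\mu_{ij}(1 - \mu_{ij})$, so $A_i = \mathrm{diag}\{\mu_{ij}(1 - \mu_{ij})\}$ and $\phi = 1$. The empirical (sandwich) variance estimator then takes the form

\[
\widehat{\mathrm{Var}}(\hat\beta) = B^{-1} M B^{-1},
\qquad
B = \sum_{i=1}^{N} D_i' V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{N} D_i' V_i^{-1} (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)' V_i^{-1} D_i
\]

while the model-based estimator is $B^{-1}$ alone. The two coincide in large samples when $V_i$ is the true covariance of $Y_i$.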

- If mean model is correct, correlation structure may be mis-specified, but parameter estimates remain consistent
- Liang and Zeger showed that modeling correlation may boost efficiency
- But this is a large sample result; there must be enough clusters to estimate these parameters
- Variety of models that are supported in Stata

Independence
\[
R(\alpha) = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]
Number of parameters: 0

Exchangeable (compound symmetry)
\[
R(\alpha) = \begin{pmatrix}
1 & \alpha & \cdots & \alpha \\
\alpha & 1 & \cdots & \alpha \\
\vdots & \vdots & \ddots & \vdots \\
\alpha & \alpha & \cdots & 1
\end{pmatrix}
\]
Number of parameters: 1

Unstructured
\[
R(\alpha) = \begin{pmatrix}
1 & \alpha_{12} & \cdots & \alpha_{1J} \\
\alpha_{12} & 1 & \cdots & \alpha_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{1J} & \alpha_{2J} & \cdots & 1
\end{pmatrix}
\]
Number of parameters: J(J-1)/2

Auto-regressive
\[
R(\alpha) = \begin{pmatrix}
1 & \alpha & \cdots & \alpha^{J-1} \\
\alpha & 1 & \cdots & \alpha^{J-2} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha^{J-1} & \alpha^{J-2} & \cdots & 1
\end{pmatrix}
\]
Number of parameters: 1

Stationary (g-dependent)
\[
R(\alpha) = \begin{pmatrix}
1 & \alpha_{1} & \cdots & \alpha_{J-1} \\
\alpha_{1} & 1 & \cdots & \alpha_{J-2} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{J-1} & \alpha_{J-2} & \cdots & 1
\end{pmatrix}
\]
Number of parameters: g, where 0 < g <= J-1 (correlations at lags beyond g are set to zero)

Fixed
\[
R = \begin{pmatrix}
1 & c_{12} & \cdots & c_{1J} \\
c_{12} & 1 & \cdots & c_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
c_{1J} & c_{2J} & \cdots & 1
\end{pmatrix}
\]
Number of parameters: 0 (user specified)

Choosing a working correlation structure
- If J is small and data are balanced and complete, then an unstructured matrix is recommended
- If observations are mistimed, then use a structure that accounts for correlation as function of time (stationary or auto-regressive)
- If observations are clustered (no logical ordering), then exchangeable may be appropriate
- If number of clusters small, independent may be best
- Issues discussed further in Diggle, Liang and Zeger (1994, book)

Missing data
- Standard GEE models assume that missing observations are Missing Completely at Random (MCAR) in the sense of Little and Rubin (book, 1987)
- Robins, Rotnitzky and Zhao (JASA, 1995) proposed methods to allow for data that are missing at random (MAR)

- These methods not yet implemented in standard software (requires estimation of weights and more complicated variance formula)

Variance estimators
- Empirical (aka sandwich or robust/semi-robust): consistent when the mean model is correctly specified (if no missing data)
- Model-based (aka naive) [default in Stata]: consistent when both the mean model and the covariance model are correctly specified

Syntax for xtgee
    xtgee depvar varlist, family(family) link(link) corr(corr) i(idvar) t(timevar) robust
- Family: binomial, gaussian, gamma, igaussian, nbinomial, poisson
- Link: identity, cloglog, log, logit, nbinomial, opower, power, probit, reciprocal
- Correlation: independent, exchangeable, ar#, stationary#, nonstationary#, unstructured, fixed
- Also options to change the scale parameter, use weighted equations, specify offsets

Example: Mental Health Service Utilization
- Connecticut child studies (Zahner et al, AJPH, 1997)
- Outcome: use of general health, school, or mental health services (dichotomous report)
- Sample: 2,519 children
- Other dichotomous predictors:

- Age, gender, academic problems

Data format and variables
[column headers illegible in source]
    1  90111502 00001000
    2  90111502 00010100
    3  90111502 00020010
    4  80111206 00001000
    5  80111206 00010100
    6  80111206 00020010
    7  40111608 10001000
    8  40111608 10010100
    9  40111608 10020010

Stata code to fit model
    iis id
    tis setting
    xtdes
    xi: xtgee serv *mental * *mental * *mental *school, link(logit) corr(unst) family(binomial) robust
    xtcorr

Describe cross-sectional data (xtdes)
          id:  1, 2, ..., 2519                              n =       2519
     setting:  0, 1, ..., 2                                 T =          3
               Delta(type) = 1; (2-0)+1 = 3
               (id*setting uniquely identifies each observation)

    Distribution of T_i:   min    5%   25%   50%   75%   95%   max
                             3     3     3     3     3     3     3

         Freq.  Percent    Cum. |  Pattern
     ---------------------------+---------
         2519                   |  111
     ---------------------------+---------
         2519                   |  XXX

    (No missing data!)

Regression output
    GEE population-averaged model              Number of obs      =      7557
    Group and time vars:       id setting      Number of groups   =      2519
    Link:                           logit      Obs per group: min =         3
    Family:                      binomial                     avg =
    Correlation:             unstructured                     max =         3
                                               Wald chi2(11)      =
    Scale parameter:                    1      Prob > chi2        =

                     (standard errors adjusted for clustering on id)
    ------------------------------------------------------------------------
                 |           Semi-robust
            serv |   Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
    -------------+----------------------------------------------------------
         _Iold_1 | .1233576 .1441123 .4058124
          mental | .1933698 .0268992
    _IoldXment~1 | .2905076 .189558 .6620344
          school | .1850487 .1734874 .5250778
    _IoldXscho~1 | .330549 .162133 .6483239
         _Iboy_1 | .3652564 .1464068 .0783043 .6522084
    _IboyXment~1 | .1894824 .0934654
    _IboyXscho~1 | .1650033 .1695418
     _Iacadpro_1 | .7239641 .1445971 .440559
              ~1 | .1843236 .1911094 .5588912
    _IacaXscho~1 | .1669423 .8088873
                 | .1489399

Estimates of working correlation (xtcorr)
    Estimated within-id corr matrix R
    school mental general
    c1 c2 c3
    r1

Multidimensional test of OLD effect
    test _IoldXmenta_1=0
     ( 1)  _IoldXmenta_1 =
           chi2(  1) =
         Prob > chi2 =

    test _IoldXschoo_1=0, accumulate
     ( 1)  _IoldXschoo_1 =
     ( 2)  _IoldXmenta_1 =
           chi2(  2) =
         Prob > chi2 =

    test _Iold_1=0, accumulate
     ( 1)  _IoldXschoo_1 =
     ( 2)  _IoldXmenta_1 =
     ( 3)  _Iold_1 =
           chi2(  3) =
         Prob > chi2 =

Results from Example
- There is a significant interaction between service setting and academic problems (df=2, p< ), but not for age and setting (df=2, p= ) or gender and setting (df=2, p= )
- Overall, a higher proportion of boys use services (df=3, p= ) and older children use them more than younger children (df=3, p= )

More resources
- Generalized estimating equations: an annotated bibliography (Ziegler, Kastner and Blettner, Biometrical Journal, 1998)
- Review of software to fit generalized estimating equation regression models (Horton and Lipsitz, The American Statistician, 1999, article online at ~ )