
Generalized Estimating Equations - SAS

Introduction

The generalized estimating equations (GEEs) methodology, introduced by Liang and Zeger (1986), enables you to analyze correlated data that otherwise could be modeled as a generalized linear model. GEEs have become an important strategy in the analysis of correlated data. These data sets can arise from longitudinal studies, in which subjects are measured at different points in time, or from clustering, in which measurements are taken on subjects who share a common characteristic, such as belonging to the same litter.

SAS/STAT software provides two procedures that enable you to perform GEE analysis: the GENMOD procedure and the GEE procedure. Both procedures implement the standard generalized estimating equation approach for longitudinal data; this approach is appropriate for complete data or when data are missing completely at random (MCAR).
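For orientation, the estimating equation behind this approach can be sketched in the standard Liang and Zeger (1986) form. The notation is assumed here for illustration and is not taken from the text above: K clusters, response vector y_i with marginal mean mu_i(beta), diagonal matrix A_i of variance functions, working correlation matrix R_i(alpha), and dispersion parameter phi.

```latex
\sum_{i=1}^{K} \mathbf{D}_i^{\top} \mathbf{V}_i^{-1}
  \bigl( \mathbf{y}_i - \boldsymbol{\mu}_i(\boldsymbol{\beta}) \bigr) = \mathbf{0},
\qquad
\mathbf{D}_i = \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}},
\qquad
\mathbf{V}_i = \phi \, \mathbf{A}_i^{1/2} \, \mathbf{R}_i(\boldsymbol{\alpha}) \, \mathbf{A}_i^{1/2}
```

The regression parameters beta are the solution of this equation. The working correlation R_i(alpha) only needs to approximate the true within-cluster correlation, which is why empirical (robust) standard errors are usually reported alongside the model-based ones.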




When the data are missing at random (MAR), the weighted GEE method, which is implemented in the GEE procedure, produces valid inference. The weighted GEE method is described by Molenberghs and Kenward (2007); Fitzmaurice, Laird, and Ware (2011); Mallinckrodt (2013); and O'Kelly and Ratitch (2014).

The GENMOD Procedure

The GENMOD procedure enables you to perform GEE analysis by specifying a REPEATED statement in which you provide clustering information and a working correlation matrix. The generalized linear model estimates are used as the starting values. Both model-based and empirical standard errors of the parameter estimates are produced. Many correlation structures are available, including first-order autoregressive, exchangeable, independent, m-dependent, and unstructured.
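As a minimal sketch of how the REPEATED statement carries the clustering information and the working correlation choice, consider the following statements. The data set name visits and the variables y, x, subjid, and visit are hypothetical placeholders and are not part of the examples in this document.

```sas
/* Sketch only: visits, y, x, subjid, and visit are hypothetical names. */
proc genmod data=visits;
   class subjid visit;
   model y = x / dist=bin link=logit;
   /* SUBJECT= names the cluster; WITHIN= orders the measurements in a cluster. */
   /* TYPE= picks the working correlation: AR(1), EXCH, IND, MDEP(m), or UN.    */
   /* CORRW prints the working correlation matrix; MODELSE adds the             */
   /* model-based standard errors to the empirical ones shown by default.       */
   repeated subject=subjid / within=visit type=ar(1) corrw modelse;
run;
```

Changing only the TYPE= value is usually enough to compare how sensitive the estimates are to the assumed correlation structure.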

You can also input your own correlation structures. The GENMOD procedure also provides the following:

  - Type III tests for model effects
  - CONTRAST, LSMEANS, and ESTIMATE statements
  - alternating logistic regression estimation
  - models for ordinal data

The proportional odds model is a popular method of GEE analysis of ordinal data and is based on modeling cumulative logit functions. The GENMOD procedure also models cumulative probits and cumulative complementary log-log functions.

Example

A study of the effects of pollution on children produced the following data. The binary response indicates whether children exhibited symptoms during the period of study at ages 8, 9, 10, and 11. A logistic regression is fit to the data with the explanatory variables age, city of residence, and a passive smoking index.

The correlations among the binary outcomes are modeled as exchangeable. The following statements create the data set Children and fit a GEE model by using the GENMOD procedure.

    data children;
       input id city$ @@;
       do i=1 to 4;
          input age smoke symptom @@;
          output;
       end;
       datalines;
     1 steelcity  8 0 1  9 0 1  10 0 1  11 0 0
     2 steelcity  8 2 1  9 2 1  10 2 1  11 1 0
     3 steelcity  8 2 1  9 2 0  10 1 0  11 0 0
     4 greenhills 8 0 0  9 1 1  10 1 1  11 0 0
     5 steelcity  8 0 0  9 1 0  10 1 0  11 1 0
     6 greenhills 8 0 1  9 0 0  10 0 0  11 0 1
     7 steelcity  8 1 1  9 1 1  10 0 1  11 0 0
     8 greenhills 8 1 0  9 1 0  10 1 0  11 2 0
     9 greenhills 8 2 1  9 2 0  10 1 1  11 1 0
    10 steelcity  8 0 0  9 0 0  10 0 0  11 1 0
    11 steelcity  8 1 1  9 0 0  10 0 0  11 0 1
    12 greenhills 8 0 0  9 0 0  10 0 0  11 0 0
    13 steelcity  8 2 1  9 2 1  10 1 0  11 0 1
    14 greenhills 8 0 1  9 0 1  10 0 0  11 0 0
    15 steelcity  8 2 0  9 0 0  10 0 0  11 2 1
    16 greenhills 8 1 0  9 1 0  10 0 0  11 1 0
    17 greenhills 8 0 0  9 0 1  10 0 1  11 1 1
    18 steelcity  8 1 1  9 2 1  10 0 0  11 1 0
    19 steelcity  8 2 1  9 1 0  10 0 1  11 0 0
    20 greenhills 8 0 0  9 0 1  10 0 1  11 0 0
    21 steelcity  8 1 0  9 1 0  10 1 0  11 2 1
    22 greenhills 8 0 1  9 0 1  10 0 0  11 0 0
    23 steelcity  8 1 1  9 1 0  10 0 1  11 0 0
    24 greenhills 8 1 0  9 1 1  10 1 1  11 2 1
    25 greenhills 8 0 1  9 0 0  10 0 0  11 0 0
    ;
    run;

    proc genmod data=children;
       class id city smoke;
       model symptom = city age smoke / dist=bin type3;
       repeated subject=id / type=exch covb corrw;
       contrast 'Smoke=0 vs Smoke=1' smoke 1 -1 0;
    run;

The REPEATED statement requests a GEE analysis.

The SUBJECT=ID option identifies ID as the clustering variable, and the TYPE=EXCH option specifies an exchangeable correlation structure. The TYPE3 option in the MODEL statement requests Type III statistics for each effect in the model. The CONTRAST statement requests a test that compares the first and second levels of the SMOKE effect.

Output 1: GEE Analysis Results

The GENMOD Procedure

GEE Model Information

    Correlation Structure           Exchangeable
    Subject Effect                  id (25 levels)
    Number of Clusters              25
    Correlation Matrix Dimension    4
    Maximum Cluster Size            4
    Minimum Cluster Size            4

Covariance Matrix (Model-Based): rows and columns Prm1, Prm2, Prm4, Prm5, Prm6 (values not reproduced)

Covariance Matrix (Empirical): rows and columns Prm1, Prm2, Prm4, Prm5, Prm6 (values not reproduced)

Algorithm converged.

Working Correlation Matrix: rows Row1 to Row4, columns Col1 to Col4 (values not reproduced)

Exchangeable Working Correlation: Correlation (value not reproduced)

GEE Fit Criteria: QIC, QICu (values not reproduced)

Output 1 continued

Analysis Of GEE Parameter Estimates, Empirical Standard Error Estimates

    Columns: Parameter, Estimate, Standard Error, 95% Confidence Limits, Z, Pr > |Z|
    Rows:    Intercept; city greenhil; city steelcit; age; smoke 0; smoke 1; smoke 2
    (values not reproduced)

Score Statistics For Type 3 GEE Analysis

    Columns: Source, DF, Chi-Square, Pr > ChiSq
    Rows:    city (DF 1); age (DF 1); smoke (DF 2)
    (chi-square statistics and p-values not reproduced)

Contrast Results for GEE Analysis

    Columns: Contrast, DF, Chi-Square, Pr > ChiSq, Type
    Row:     Smoke=0 vs Smoke=1 (DF 1, Type Score)
    (chi-square statistic and p-value not reproduced)

The GEE Procedure

For longitudinal studies, missing data are common, and they can be caused by dropouts or skipped visits. If missing responses depend on previous responses, the usual GEE approach can lead to biased estimates.

For this reason, the GEE procedure also implements the weighted GEE method to handle missing responses that are caused by dropouts in longitudinal studies (Robins and Rotnitzky 1995; Preisser, Lohman, and Rathouz 2002).

The GEE procedure includes alternating logistic regression (ALR) analysis for binary and ordinal multinomial responses. In ordinary GEEs, the association between pairs of responses is modeled with correlations. The ALR approach provides an alternative by using the log odds ratio to model the association between pairs. For binary responses, the ALR algorithm of Carey, Zeger, and Diggle (1993) is implemented in both the GEE and GENMOD procedures. PROC GEE also implements the ALR algorithm of Heagerty and Zeger (1996), which extends the ALR approach to ordinal multinomial responses.
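The pairwise log odds ratio that ALR uses in place of a correlation can be written out explicitly. The symbols below are notation assumed for this sketch, not taken from the text: Y_ij and Y_ik are two binary responses from the same cluster i, psi_ijk is their odds ratio, z_ijk is a vector of pair-level covariates, and alpha is the association parameter vector.

```latex
\psi_{ijk} =
  \frac{\Pr(Y_{ij}=1,\,Y_{ik}=1)\,\Pr(Y_{ij}=0,\,Y_{ik}=0)}
       {\Pr(Y_{ij}=1,\,Y_{ik}=0)\,\Pr(Y_{ij}=0,\,Y_{ik}=1)},
\qquad
\log \psi_{ijk} = \mathbf{z}_{ijk}^{\top} \boldsymbol{\alpha}
```

Taking z_ijk to be the constant 1 gives a single common log odds ratio for every pair in a cluster, which plays the role that the exchangeable structure plays for correlations.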

An ordinary GEE with the independent working correlation structure is also available for both nominal and ordinal multinomial data.

Example

This example shows how you can use the GEE procedure to analyze longitudinal data that contain missing values. The data set is taken from a longitudinal study of women who used contraception during one year (Fitzmaurice, Laird, and Ware 2011). In this study, 1,151 women were randomly assigned to one of two treatments: 100 mg or 150 mg of depot medroxyprogesterone acetate (DMPA) at baseline and at three-month intervals. The response variable indicates the women's amenorrhea status during the four consecutive three-month intervals. The question of interest is whether the treatment has an effect on the rate of amenorrhea over time.

The example follows the analysis by Fitzmaurice, Laird, and Ware (2011). The following statements create the data set Amenorrhea:

    data Amenorrhea;
       input ID Dose Time Y@@;
       datalines;
       1 0 1 0
       1 0 2 .
       1 0 3 .
       1 0 4 .

    ... more lines ...

    1150 1 4 1
    1151 1 1 1
    1151 1 2 1
    1151 1 3 1
    1151 1 4 1
    ;

The variables in the data are as follows:

  - ID: patient's ID
  - Y: indicator of amenorrhea status (1 for amenorrhea; 0 otherwise)
  - Time: four consecutive three-month intervals with values 1, 2, 3, and 4
  - Dose: 0 for treatment with 100 mg injection; 1 for treatment with 150 mg injection

To prepare for the analysis, two additional variables are created:

  - Prevy: the patient's amenorrhea status in the previous three-month interval. For the baseline visit, this is set to an arbitrary nonmissing value (0 here).
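One way that Prevy could be derived is sketched below. It assumes that the Amenorrhea data set has one row per ID and Time; the output data set name AmenorrheaPrev is hypothetical. Only Prevy is constructed here.

```sas
/* Sketch only: assumes one row per ID and Time in Amenorrhea. */
proc sort data=Amenorrhea;
   by ID Time;
run;

data AmenorrheaPrev;              /* hypothetical output data set name */
   set Amenorrhea;
   by ID;
   Prevy = lag(Y);                /* Y from the preceding row; LAG runs on every row */
   if first.ID then Prevy = 0;    /* baseline visit: arbitrary nonmissing value */
run;
```

LAG is called unconditionally so that its queue advances on every row; the FIRST.ID check then overwrites the value that would otherwise carry over from the previous patient.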

