
Introduction to log-linear models


of PROC FREQ and PROC GENMOD procedures.

Statistics for Table of pview by choice

Statistic                      DF    Value       Prob
Chi-Square                      4    238.5354    <.0001
Likelihood Ratio Chi-Square     4    247.6951    <.0001
...

Criteria For Assessing Goodness Of Fit

Criterion    DF    Value       Value/DF
Deviance      4    247.6951    61.9238


Transcription of Introduction to log-linear models

Stat 504, Lecture 16: Introduction to log-linear models

Key Concepts:
- Benefits of models
- Two-way log-linear models
- Parameter constraints, estimation, and interpretation
- Inference for log-linear models

Objectives:
- Understand the structure of the log-linear models in two-way tables
- Understand the concepts of independence and association described via log-linear models in two-way tables

Useful Links:
- The CATMOD procedure in SAS
- The GENMOD procedure in SAS
- The SAS source on log-linear model: #stat_catmod_catmodllma
- Fitting log-linear models in [...]
- Fitting log-linear models in R via generalized linear models (glm())
- Agresti (2002) Ch. 8, 9
- Agresti (1996) Ch. 6, 7

Benefits of models over significance tests

Thus far our focus has been on describing interactions or associations between two or three categorical variables, mostly via single summary statistics and with significance testing. Models can handle more complicated situations and analyze the simultaneous effects of multiple variables, including mixtures of categorical and continuous variables. For example, the Breslow-Day statistic only works for $2 \times 2 \times K$ tables, while log-linear models allow us to test for homogeneous association in $I \times J \times K$ and higher-dimensional tables.

The structural form of the model describes the patterns of interactions and associations.

The model parameters provide measures of strength of associations.

In models, the focus is on estimating the model parameters. The basic inference tools (e.g., point estimation, hypothesis testing, and confidence intervals) will be applied to these parameters. When discussing models, we will keep in mind:
- Objective
- Model structure (e.g., variables, formula, equation)
- Model assumptions
- Parameter estimates and interpretation
- Model fit (e.g., goodness-of-fit tests and statistics)
- Model selection

For example, recall a simple linear regression model:
- Objective: Model the expected value of a continuous variable, $Y$, as a linear function of the continuous predictor, $X$: $E(Y_i) = \beta_0 + \beta_1 x_i$
- Model structure: $Y_i = \beta_0 + \beta_1 x_i + e_i$
- Model assumptions: $Y$ is normally distributed, the errors $e_i \sim N(0, \sigma^2)$ are independent with constant variance $\sigma^2$, and $X$ is fixed.
- Parameter estimates and interpretation: $\hat{\beta}_0$ is the estimate of $\beta_0$, the intercept, and $\hat{\beta}_1$ is the estimate of $\beta_1$, the slope. What is the interpretation of the slope?
- Model fit: $R^2$, residual analysis, F-statistic
- Model selection

See the handout labeled [...] on modeling average water usage given the amount of bread production:

Water = 2273 + [...] Production

Two-way ANOVA

Does the amount of sunlight and watering affect the growth of geraniums?

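Objective: Model the continuous response as a function of two factors.

Model structure:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$$

with $e_{ijk} \sim N(0, \sigma^2)$, $i = 1, \ldots, I$, $j = 1, \ldots, J$, and $k = 1, \ldots, n_{ij}$.

Model assumptions: At each combination of levels the outcome is normally distributed with the same variance: $y_{ijk} \sim N(\mu_{ij}, \sigma^2)$, where $\mu_{ij} = E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$.

This model is over-parametrized because the term $\gamma_{ij}$ already has $I \times J$ parameters corresponding to the cell means $\mu_{ij}$. The constant, $\mu$, and the main effects, $\alpha_i$ and $\beta_j$, give us an additional $1 + I + J$ parameters. We use constraints such as $\sum_i \alpha_i = \sum_j \beta_j = \sum_i \sum_j \gamma_{ij} = 0$ to deal with overparametrization.

Does level of watering affect the growth of potted geraniums?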

(Is there a significant main effect for factor A? $H_0$: $\alpha_i = 0$ for all $i$.)

Does level of sunlight affect the growth of potted geraniums? (Is there a significant main effect for factor B?)

Does the effect of level of sunlight depend on level of watering? (Is there a significant interaction between factors A and B?)

Analysis of Variance for YIELD

Source    DF    SS    MS    F    P
WATER      1
[...]      1
[...]      1
[...]     12
[...]     15

[Interval plot: 95% CIs for the mean by WATER level (HIGH, LOW)]

[Interval plot: 95% CIs for the mean by SUNLIGHT level (HIGH, LOW)]

Two-way log-linear model

Now let $\mu_{ij}$ be the expected counts, $E(n_{ij})$, in an $I \times J$ table. An analogous model to two-way ANOVA is

$$\log(\mu_{ij}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$$

or, in the notation used by Agresti,

$$\log(\mu_{ij}) = \lambda + \lambda^A_i + \lambda^B_j + \lambda^{AB}_{ij}$$

with constraints $\sum_i \lambda^A_i = \sum_j \lambda^B_j = \sum_i \sum_j \lambda^{AB}_{ij} = 0$ to deal with overparametrization.

Log-linear models specify how the cell counts depend on the levels of categorical variables.
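To make the constrained parametrization concrete, here is a minimal sketch in Python (a language chosen for this illustration; the lecture itself points to SAS and R). It assumes the usual effect-coding form of the sum-to-zero constraints, in which every row and every column of the interaction terms sums to zero; this is stronger than, and implies, the double-sum constraint written above. The function name `saturated_loglinear` and the 2 x 2 counts are hypothetical and not taken from the lecture.

```python
import math


def saturated_loglinear(counts):
    """Effect-coded parameters of the saturated two-way log-linear model.

    Assumes every cell count is positive, since log(0) is undefined.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    logs = [[math.log(c) for c in row] for row in counts]

    # Grand mean, row means and column means of the log counts.
    grand = sum(map(sum, logs)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in logs]
    col_means = [sum(col) / n_rows for col in zip(*logs)]

    lam_a = [m - grand for m in row_means]   # main effects of A
    lam_b = [m - grand for m in col_means]   # main effects of B
    lam_ab = [                               # interaction terms
        [logs[i][j] - row_means[i] - col_means[j] + grand
         for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return grand, lam_a, lam_b, lam_ab


counts = [[10, 20], [30, 40]]  # hypothetical 2 x 2 table, not from the lecture
lam, lam_a, lam_b, lam_ab = saturated_loglinear(counts)

# The saturated model reproduces the observed counts exactly.
fitted = [
    [math.exp(lam + lam_a[i] + lam_b[j] + lam_ab[i][j]) for j in range(2)]
    for i in range(2)
]
print(lam_a, lam_b, lam_ab)
```

In a 2 x 2 table under this coding, the interaction term for the first cell equals one quarter of the log odds ratio, which is one way to see that the interaction parameters carry the association.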

Log-linear models capture the association and interaction patterns among categorical variables. Log-linear modeling is natural for Poisson, multinomial, and product-multinomial sampling. These models are appropriate when there is no clear distinction between response and explanatory variables, or when there are more than two responses.

Example: General Social Survey

Cross-classification of respondents according to choice for president in the 1992 presidential election (Bush, Clinton, Perot) and political view on a 7-point scale (extremely liberal, liberal, slightly liberal, moderate, slightly conservative, conservative, extremely conservative).

:7502/D3/GSS96/

Let's consider a 3 x 3 table:

                Bush    Clinton    Perot    Total
Liberal           70        324       56      450
Moderate         195        332      101      628
Conservative     382        199      117      698
Total            647        855      274     1774

Are political view and choice independent?

You already know how to answer this via the chi-square test of independence, but now we want to model the cell counts with the log-linear model of independence and ask whether this model fits.

Two-way log-linear models

Given two categorical random variables, $A$ and $B$, there are two main models we will consider:
- Independence model, $(A, B)$
- Saturated model, $(AB)$

Objective: Model the cell counts: $\mu_{ij} = n\pi_{ij}$

Main assumption: The $N = IJ$ counts in the cells are assumed to be independent observations of a Poisson random variable.

Log-linear model of independence for 2-way tables

Recall independence in terms of cell probabilities as a product of marginal probabilities:

$$\pi_{ij} = \pi_{i+}\pi_{+j}, \quad i = 1, \ldots, I, \; j = 1, \ldots, J$$

In terms of cell frequencies:

$$\mu_{ij} = n\pi_{ij} = n\pi_{i+}\pi_{+j}, \quad i = 1, \ldots, I, \; j = 1, \ldots, J$$

By taking logarithms of the expected counts we obtain the log-linear model of independence:

$$\log \mu_{ij} = \log n + \log \pi_{i+} + \log \pi_{+j}$$

$$\log(\mu_{ij}) = \lambda + \lambda^A_i + \lambda^B_j$$

where $A$ and $B$ stand for two categorical variables.

This is an ANOVA-type representation where:
- $\lambda$ represents an overall effect, or a grand mean of the logarithms of the expected counts, and it ensures that $\sum_i \sum_j \mu_{ij} = n$.
- $\lambda^A_i$ represents a main effect of variable $A$, or a deviation from the grand mean, and it ensures that $\sum_j \mu_{ij} = n_{i+}$.

