
Introduction to log-linear models



Stat 504, Lecture 16

Key Concepts:
  Benefits of models
  Two-way log-linear models
  Parameters: constraints, estimation and interpretation
  Inference for log-linear models

Objectives:
  Understand the structure of the log-linear models in two-way tables
  Understand the concepts of independence and associations described via log-linear models in two-way tables

Useful Links:
  The CATMOD procedure in SAS
  The GENMOD procedure in SAS
  The SAS source on log-linear model analysis (#stat_catmod_catmodllma)
  Fitting log-linear models in R
  Fitting log-linear models in R via generalized linear models (glm())

Readings: Agresti (2002) Ch. 8, 9; Agresti (1996) Ch. 6, 7.

Benefits of models over significance tests

Thus far our focus has been on describing interactions or associations between two or three categorical variables, mostly via single summary statistics and significance testing. Models can handle more complicated situations and analyze the simultaneous effects of multiple variables, including mixtures of categorical and continuous variables. For example, the Breslow-Day statistic only works for 2 × 2 × K tables, while log-linear models allow us to test for homogeneous association in I × J × K and higher-dimensional tables. The structural form of the model describes the patterns of interactions and associations. The model parameters provide measures of strength of associations.

In models, the focus is on estimating the model parameters. The basic inference tools (point estimation, hypothesis testing, and confidence intervals) will be applied to these parameters. When discussing models, we will keep in mind:
  Objective
  Model structure (variables, formula, equation)
  Model assumptions
  Parameter estimates and interpretation
  Model fit (goodness-of-fit tests and statistics)
  Model selection

For example, recall a simple linear regression model.
  Objective: model the expected value of a continuous variable, Y, as a linear function of the continuous predictor, X: $E(Y_i) = \beta_0 + \beta_1 x_i$
  Model structure: $Y_i = \beta_0 + \beta_1 x_i + e_i$
  Model assumptions: Y is normally distributed, the errors $e_i \sim N(0, \sigma^2)$ are independent, X is fixed, and the variance $\sigma^2$ is constant.

  Parameter estimates and interpretation: $\hat{\beta}_0$ is the estimate of $\beta_0$, the intercept, and $\hat{\beta}_1$ is the estimate of $\beta_1$, the slope. What is the interpretation of the slope?
  Model fit: $R^2$, residual analysis, F-statistic
  Model selection

See the handout on modeling average water usage given the amount of bread production: Water = 2273 + Production

Two-way ANOVA

Does the amount of sunlight and watering affect the growth of geraniums?
  Objective: model the continuous response as a function of two factors.
  Model structure: $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$ with $e_{ijk} \sim N(0, \sigma^2)$, $i = 1, \dots, I$, $j = 1, \dots, J$, and $k = 1, \dots, n_{ij}$
  Model assumptions: at each combination of levels the outcome is normally distributed with the same variance: $y_{ijk} \sim N(\mu_{ij}, \sigma^2)$, where $\mu_{ij} = E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$

This model is over-parametrized because the term $\gamma_{ij}$ already has I × J parameters corresponding to the cell means $\mu_{ij}$. The constant, $\mu$, and the main effects, $\alpha_i$ and $\beta_j$, give us an additional 1 + I + J parameters. We use constraints such as $\sum_i \alpha_i = \sum_j \beta_j = \sum_i \sum_j \gamma_{ij} = 0$ to deal with this overparametrization.

  Does level of watering affect the growth of potted geraniums? (Is there a significant main effect for factor A? $H_0: \alpha_i = 0$ for all i.)
  Does level of sunlight affect the growth of potted geraniums? (Is there a significant main effect for factor B?)
  Does the effect of level of sunlight depend on level of watering? (Is there a significant interaction between factors A and B?)
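The over-parametrization can be made concrete with a short parameter count. This is only a sketch: it writes μ for the constant, α_i and β_j for the main effects and γ_ij for the interaction, and it assumes the common stronger form of the interaction constraint, in which γ_ij sums to zero over i for every j and over j for every i (this implies the double-sum constraint quoted above). Without constraints there are 1 + I + J + IJ parameters describing only IJ cell means; with the constraints the free parameters match the cell means exactly:

```latex
% Free parameters left after sum-to-zero constraints in the two-way ANOVA model.
% Assumption: \gamma_{ij} sums to zero over i for each j and over j for each i.
\begin{align*}
\underbrace{1}_{\mu} + \underbrace{(I-1)}_{\alpha_i} + \underbrace{(J-1)}_{\beta_j}
  + \underbrace{(I-1)(J-1)}_{\gamma_{ij}} &= IJ, \\
\text{for example, } I = J = 2:\quad 1 + 1 + 1 + 1 &= 4 .
\end{align*}
```

The same count carries over to the two-way log-linear model, where the saturated model has as many free parameters as there are cells.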

Analysis of Variance for YIELD

Source        DF   SS   MS   F   P
WATER          1
SUNLIGHT       1
Interaction    1
Error         12
Total         15

[Plots: individual 95% confidence intervals for the mean by WATER level (HIGH, LOW) and by SUNLIGHT level (HIGH, LOW)]

Two-way log-linear model

Now let $\mu_{ij}$ be the expected counts, $E(n_{ij})$, in an I × J table. An analogous model to two-way ANOVA is
  $\log(\mu_{ij}) = \mu + \alpha_i + \beta_j + \gamma_{ij}$
or, in the notation used by Agresti,
  $\log(\mu_{ij}) = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$
with constraints $\sum_i \lambda_i^A = \sum_j \lambda_j^B = \sum_i \sum_j \lambda_{ij}^{AB} = 0$ to deal with overparametrization.

Log-linear models specify how the cell counts depend on the levels of categorical variables. They model the association and interaction patterns among categorical variables. Log-linear modeling is natural for Poisson, multinomial and product-multinomial sampling. These models are appropriate when there is no clear distinction between response and explanatory variables, or when there are more than two responses.

Example: General Social Survey

Cross-classification of respondents according to choice for president in the 1992 presidential election (Bush, Clinton, Perot) and political view on a 7-point scale (extremely liberal, liberal, slightly liberal, moderate, slightly conservative, conservative, extremely conservative).

:7502/D3/GSS96/

Let's consider a 3 × 3 table:

               Bush  Clinton  Perot  Total
Liberal          70      324     56    450
Moderate        195      332    101    628
Conservative    382      199    117    698
Total           647      855    274   1776

Are political view and choice independent? You already know how to answer this via the chi-square test of independence, but now we want to model the cell counts with the log-linear model of independence and ask if this model fits well.

Two-way log-linear models

Given two categorical random variables, A and B, there are two main models we will consider:
  Independence model, (A, B)
  Saturated model, (AB)
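As a minimal sketch of asking whether the independence model fits the General Social Survey table above, the fitted counts under independence and the two usual fit statistics can be computed in plain Python. The function name `independence_fit` and the variable names are illustrative assumptions, not part of the lecture. The formulas used are the standard ones: fitted count = (row total × column total) / n, Pearson X² = Σ (observed − fitted)² / fitted, and deviance G² = 2 Σ observed × log(observed / fitted).

```python
import math

# Observed counts from the General Social Survey table above.
# Rows: Liberal, Moderate, Conservative; columns: Bush, Clinton, Perot.
observed = [
    [70, 324, 56],
    [195, 332, 101],
    [382, 199, 117],
]


def independence_fit(table):
    """Return fitted counts, Pearson X^2, deviance G^2 and df under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Fitted values under independence: (row total * column total) / n
    fitted = [[r * c / n for c in col_totals] for r in row_totals]

    cells = [
        (o, m)
        for obs_row, fit_row in zip(table, fitted)
        for o, m in zip(obs_row, fit_row)
    ]
    x2 = sum((o - m) ** 2 / m for o, m in cells)
    # Cells with an observed count of zero contribute nothing to G^2.
    g2 = 2 * sum(o * math.log(o / m) for o, m in cells if o > 0)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return fitted, x2, g2, df


fitted, x2, g2, df = independence_fit(observed)
print(f"X^2 = {x2:.2f}, G^2 = {g2:.2f}, df = {df}")
```

Both statistics are compared with a chi-square distribution on (I − 1)(J − 1) degrees of freedom; the 0.05 critical value for 4 degrees of freedom is about 9.49, so values far above that indicate that the independence model fits poorly.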

  Objective: model the cell counts, $\mu_{ij} = n \pi_{ij}$
  Main assumption: the N = IJ counts in the cells are assumed to be independent observations of a Poisson random variable.

Log-linear model of independence for two-way tables

Recall independence in terms of cell probabilities as a product of marginal probabilities:
  $\pi_{ij} = \pi_{i+} \pi_{+j}$, $i = 1, \dots, I$, $j = 1, \dots, J$
and in terms of cell frequencies:
  $\mu_{ij} = n \pi_{ij} = n \pi_{i+} \pi_{+j}$, $i = 1, \dots, I$, $j = 1, \dots, J$
By taking logarithms of the expected counts we obtain the log-linear model of independence:
  $\log \mu_{ij} = \log n + \log \pi_{i+} + \log \pi_{+j}$
  $\log(\mu_{ij}) = \lambda + \lambda_i^A + \lambda_j^B$
where A and B stand for two categorical variables.

This is an ANOVA-type representation, where
  $\lambda$ represents an overall effect, or a grand mean of the logarithms of the expected counts, and it ensures that $\sum_i \sum_j \mu_{ij} = n$.
  $\lambda_i^A$ represents a main effect of variable A, or a deviation from the grand mean, and it ensures that $\sum_j \mu_{ij} = n_{i+}$. It represents the effect of classification in row i.
  $\lambda_j^B$ represents a main effect of variable B, or a deviation from the grand mean, and it ensures that $\sum_i \mu_{ij} = n_{+j}$. This is the effect of classification in ???
  and $\sum_i \lambda_i^A = \sum_j \lambda_j^B = 0$.

The ML fitted values are the same as the expected values under the test of independence: $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$. Thus, the $X^2$ and $G^2$ for the test of independence are goodness-of-fit statistics for the log-linear model of independence, testing that the independence model holds vs.
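To see how the terms of the independence model behave under the sum-to-zero constraints, the following sketch recovers an overall term and the row and column effects from the fitted counts of the General Social Survey table. It uses only the row and column totals from that table. The names `lam`, `lam_A` and `lam_B` are illustrative stand-ins for λ, λ_i^A and λ_j^B, and computing them as a grand mean and mean deviations of the log fitted counts is one way to satisfy the constraints, assumed here for illustration.

```python
import math

# Margins of the General Social Survey table.
row_totals = [450, 628, 698]   # Liberal, Moderate, Conservative
col_totals = [647, 855, 274]   # Bush, Clinton, Perot
n = sum(row_totals)
I, J = len(row_totals), len(col_totals)

# Log of the fitted counts under independence: log(n_i+ * n_+j / n)
log_mu = [[math.log(r * c / n) for c in col_totals] for r in row_totals]

# Overall term: grand mean of the log fitted counts
lam = sum(sum(row) for row in log_mu) / (I * J)
# Row effects: row mean minus grand mean (these sum to zero over i)
lam_A = [sum(row) / J - lam for row in log_mu]
# Column effects: column mean minus grand mean (these sum to zero over j)
lam_B = [sum(log_mu[i][j] for i in range(I)) / I - lam for j in range(J)]

# Rebuild the fitted counts from the three sets of terms
rebuilt = [[math.exp(lam + a + b) for b in lam_B] for a in lam_A]

print("lambda   =", round(lam, 4))
print("lambda_A =", [round(a, 4) for a in lam_A])
print("lambda_B =", [round(b, 4) for b in lam_B])
print("row sums =", [round(sum(row), 4) for row in rebuilt])
```

Because the log fitted counts are exactly additive in row and column under independence, the rebuilt counts reproduce the fitted values, and their row and column sums return the observed margins.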

