Transcription of Factor Analysis - Harvard University
Factor Analysis
Qian-Li Xue
Biostatistics Program
Harvard Catalyst | The Harvard Clinical & Translational Science Center
Short course, October 27, 2016

Well-used latent variable models
Latent variable scale: Continuous
  Observed variable scale, continuous: Factor Analysis; LISREL
  Observed variable scale, discrete: Discrete FA; IRT (item response)
Latent variable scale: Discrete
  Observed variable scale, continuous: Latent profile; Growth mixture
  Observed variable scale, discrete: Latent class analysis; regression
General software: MPlus, Latent Gold, WinBugs (Bayesian), NLMIXED (SAS).

Objectives
What is factor analysis?
What do we need factor analysis for?
What are the modeling assumptions?
How do we specify, fit, and interpret factor models?
What is the difference between exploratory and confirmatory factor analysis?
What is model identifiability, and how do we assess it?

What is Factor Analysis
Factor analysis is a theory-driven statistical data reduction technique. It explains the covariance among observed random variables in terms of fewer unobserved random variables called factors.

An Example: General Intelligence (Charles Spearman, 1904)
[Path diagram: the latent factor General Intelligence (F) points to five observed test scores Y1, ..., Y5. Each score has its own unique term δ1, ..., δ5.]
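Here is a minimal numerical sketch of Spearman's idea using NumPy. It assumes the one-common-factor form $Y_i = \lambda_i F + \delta_i$ with standardized variables, which the following slides specify. The five loadings are hypothetical illustrative values and are not Spearman's data. Under this model, $\mathrm{corr}(Y_i, Y_j) = \lambda_i \lambda_j$ for $i \neq j$. This illustrates the data-reduction claim: five loadings reproduce all ten distinct correlations among five test scores. A simulated sample should agree with the implied matrix up to sampling error.

```python
import numpy as np

# Hypothetical standardized loadings of five tests on one general factor F
# (illustrative values only, not Spearman's data).
loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])

def implied_correlation(lam):
    """Correlation matrix implied by a one-factor model with var(Y_i) = var(F) = 1.

    Off-diagonal entries are lam_i * lam_j. Each diagonal entry is the
    communality lam_i**2 plus the uniqueness 1 - lam_i**2, i.e. exactly 1.
    """
    r = np.outer(lam, lam)
    np.fill_diagonal(r, 1.0)
    return r

def simulate(lam, n, seed=0):
    """Draw n observations of Y_i = lam_i * F + delta_i with independent normal F and delta_i."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(n)                     # common factor, variance 1
    unique_sd = np.sqrt(1.0 - lam ** 2)            # chosen so that var(Y_i) = 1
    delta = rng.standard_normal((n, lam.size)) * unique_sd
    return f[:, None] * lam + delta                # shape (n, 5)

R_model = implied_correlation(loadings)
R_sample = np.corrcoef(simulate(loadings, 100_000), rowvar=False)

print(np.round(R_model, 2))                 # 5 loadings -> 10 distinct correlations
print(np.abs(R_sample - R_model).max())     # only sampling error remains
```

Other loadings in (-1, 1) change the numbers but not the structure: every off-diagonal entry is still a product of two loadings.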
Why Factor Analysis?
1. Testing of theory: explain the covariation among multiple observed variables by mapping the variables to latent constructs (called factors).
2. Understanding the structure underlying a set of measures: gain insight into dimensions; construct validation (e.g., convergent validity).
3. Scale development: exploit redundancy to improve a scale's validity and reliability.

Part I. Exploratory Factor Analysis (EFA)

One Common Factor Model: Model Specification
[Path diagram: F points to Y1, Y2, Y3 with loadings λ1, λ2, λ3. Each Yi also receives its unique term δi.]
$Y_1 = \lambda_1 F + \delta_1$
$Y_2 = \lambda_2 F + \delta_2$
$Y_3 = \lambda_3 F + \delta_3$
The factor $F$ is not observed; only $Y_1$, $Y_2$, $Y_3$ are observed.
$\delta_i$ represents the variability in $Y_i$ NOT explained by $F$.
$Y_i$ is a linear function of $F$ and $\delta_i$.

One Common Factor Model: Model Assumptions
Factorial causation.
$F$ is independent of $\delta_j$, i.e., $\mathrm{cov}(F, \delta_j) = 0$.
$\delta_i$ and $\delta_j$ are independent for $i \neq j$, i.e., $\mathrm{cov}(\delta_i, \delta_j) = 0$.
Conditional independence: given the factor, the observed variables are independent of one another, $\mathrm{cov}(Y_i, Y_j \mid F) = 0$.

One Common Factor Model: Model Interpretation
Given all variables in standardized form, $\mathrm{var}(Y_i) = \mathrm{var}(F) = 1$.
Factor loadings: $\lambda_i = \mathrm{corr}(Y_i, F)$.
Communality of $Y_i$: $h_i^2 = \lambda_i^2 = [\mathrm{corr}(Y_i, F)]^2$ = % of the variance of $Y_i$ explained by $F$.
Uniqueness of $Y_i$: $1 - h_i^2$ = residual variance of $Y_i$.
Degree of factorial determination: $\sum_i \lambda_i^2 / n$, where $n$ = number of observed variables $Y$.

Two-Common Factor Model (Orthogonal): Model Specification
[Path diagram: F1 and F2 each point to all six observed variables Y1, ..., Y6 with loadings λi1 and λi2. Each Yi also receives its unique term δi.]
$Y_1 = \lambda_{11} F_1 + \lambda_{12} F_2 + \delta_1$
$Y_2 = \lambda_{21} F_1 + \lambda_{22} F_2 + \delta_2$
$Y_3 = \lambda_{31} F_1 + \lambda_{32} F_2 + \delta_3$
$Y_4 = \lambda_{41} F_1 + \lambda_{42} F_2 + \delta_4$
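$Y_5 = \lambda_{51} F_1 + \lambda_{52} F_2 + \delta_5$
$Y_6 = \lambda_{61} F_1 + \lambda_{62} F_2 + \delta_6$
F1 and F2 are common factors because they are shared by two or more variables!

Matrix Notation with n variables and m factors
$\mathbf{Y}_{n\times 1} = \boldsymbol{\Lambda}_{n\times m} \mathbf{F}_{m\times 1} + \boldsymbol{\delta}_{n\times 1}$, that is,
$\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ \vdots & \ddots & \vdots \\ \lambda_{n1} & \cdots & \lambda_{nm} \end{pmatrix} \begin{pmatrix} F_1 \\ \vdots \\ F_m \end{pmatrix} + \begin{pmatrix} \delta_1 \\ \vdots \\ \delta_n \end{pmatrix}$

Factor Pattern Matrix
$\boldsymbol{\Lambda}_{n\times m} = \begin{pmatrix} \lambda_{11} & \cdots & \lambda_{1m} \\ \vdots & \ddots & \vdots \\ \lambda_{n1} & \cdots & \lambda_{nm} \end{pmatrix}$
Columns represent derived factors.
Rows represent input variables.
Loadings represent the degree to which each of the variables correlates with each of the factors.
Loadings range from -1 to 1.
Inspection of the factor loadings reveals the extent to which each of the variables contributes to the meaning of each of the factors.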
High loadings provide the meaning and interpretation of the factors (~ regression coefficients).

Two-Common Factor Model (Orthogonal): Model Assumptions
Factorial causation.
F1 and F2 are independent of $\delta_j$, i.e., $\mathrm{cov}(F_1, \delta_j) = \mathrm{cov}(F_2, \delta_j) = 0$.
$\delta_i$ and $\delta_j$ are independent for $i \neq j$, i.e., $\mathrm{cov}(\delta_i, \delta_j) = 0$.
Conditional independence: given factors F1 and F2, the observed variables are independent of one another, $\mathrm{cov}(Y_i, Y_j \mid F_1, F_2) = 0$ for $i \neq j$.
Orthogonal (= independent) factors: $\mathrm{cov}(F_1, F_2) = 0$.
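The orthogonal two-factor assumptions determine the covariance structure of the observed variables. Assume standardized, uncorrelated factors and mutually uncorrelated $\delta_i$. Then $\mathrm{cov}(Y_i, Y_j) = \lambda_{i1}\lambda_{j1} + \lambda_{i2}\lambda_{j2}$ for $i \neq j$. In matrix form, $\Sigma = \Lambda\Lambda^\top + \Psi$ with diagonal $\Psi$.

The NumPy sketch below makes the following choices:
- It uses hypothetical loadings for Y1 to Y6. These are illustrative values, not taken from the course.
- It sets each uniqueness to one minus the communality, so that every $\mathrm{var}(Y_i) = 1$.
- It checks by simulation both the implied covariance and the conditional-independence assumption.

```python
import numpy as np

# Hypothetical loadings (rows: Y1..Y6, columns: F1, F2); illustrative values only.
Lambda = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.1],
    [0.1, 0.7],
    [0.2, 0.6],
    [0.1, 0.8],
])

def implied_covariance(lam):
    """Sigma = Lambda Lambda^T + Psi for orthogonal factors with var(F_j) = 1.

    Psi is diagonal (cov(delta_i, delta_j) = 0 for i != j) and holds
    1 - communality, so that every var(Y_i) = 1 (standardized form).
    """
    communality = (lam ** 2).sum(axis=1)      # h_i^2 = sum_j lambda_ij^2
    psi = np.diag(1.0 - communality)          # uniquenesses on the diagonal
    return lam @ lam.T + psi

Sigma = implied_covariance(Lambda)

# Simulate under the assumptions: F1, F2 independent of each other and of every delta_i.
rng = np.random.default_rng(1)
n = 100_000
F = rng.standard_normal((n, 2))
unique_sd = np.sqrt(np.diag(Sigma - Lambda @ Lambda.T))   # sqrt of the uniquenesses
delta = rng.standard_normal((n, 6)) * unique_sd
Y = F @ Lambda.T + delta

print(np.round(Sigma, 3))
print(np.abs(np.cov(Y, rowvar=False) - Sigma).max())      # sampling error only

# Conditional independence: after removing the part explained by F1 and F2,
# what remains is delta, whose components are uncorrelated up to sampling error.
resid_corr = np.corrcoef(Y - F @ Lambda.T, rowvar=False)
print(np.abs(resid_corr - np.eye(6)).max())
```

Setting a nonzero covariance between the two columns of `F` would break the orthogonality assumption. In that case the cross-product term $\lambda_{i1}\lambda_{j2}\,\mathrm{cov}(F_1, F_2)$ would no longer vanish.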
Two-Common Factor Model (Orthogonal): Model Interpretation
Given all variables in standardized form, $\mathrm{var}(Y_i) = \mathrm{var}(F_j) = 1$, AND orthogonal factors, $\mathrm{cov}(F_1, F_2) = 0$:
Factor loadings: $\lambda_{ij} = \mathrm{corr}(Y_i, F_j)$.
Communality of $Y_i$: $h_i^2 = \lambda_{i1}^2 + \lambda_{i2}^2$ = % of the variance of $Y_i$ explained by F1 AND F2.
Uniqueness of $Y_i$: $1 - h_i^2$.
Degree of factorial determination: $\sum_{i,j} \lambda_{ij}^2 / n$, where $n$ = number of observed variables $Y$.

Two-Common Factor Model: The Oblique Case
Given all variables in standardized form, $\mathrm{var}(Y_i) = \mathrm{var}(F_j) = 1$,
AND oblique factors ($\mathrm{cov}(F_1, F_2) \neq 0$).
The factor loading $\lambda_{ij}$ is no longer the correlation between $Y_i$ and $F_j$; it is the direct effect of $F_j$ on $Y_i$.
The calculation of the communality of $Y_i$ ($h_i^2$) is more complex because it must include the factor covariance: $h_i^2 = \lambda_{i1}^2 + \lambda_{i2}^2 + 2\lambda_{i1}\lambda_{i2}\,\mathrm{cov}(F_1, F_2)$.

Extracting initial factors
Least-squares method (principal axis factoring with iterated communalities)
Maximum likelihood method

Model Fitting: Extracting initial factors
Least-squares method (LS) (principal axis factoring with iterated communalities)
Goal: minimize the sum of squared differences between the observed and estimated correlation matrices.
Fitting steps:
a) Obtain initial estimates of the communalities ($h^2$): the squared multiple correlation between a variable and the remaining variables.
b) Solve the objective function $\det(R_{LS} - \eta I) = 0$. Here $R_{LS}$ is the correlation matrix with $h^2$ on the main diagonal (also termed the adjusted correlation matrix), and $\eta$ is an eigenvalue.
c) Re-estimate $h^2$ from the loadings of the retained factors.
d) Repeat b) and c) until no improvement can be made.

Model Fitting: Extracting initial factors
Maximum likelihood method (MLE)