Transcription of Path Analysis Introduction and Example
Path Analysis Introduction and Example
Joel S Steele, PhD
Winter 2017

Path Analysis

Model specification

There are two main ways of communicating the system of equations that represents a theoretical model: either with a set of simultaneous equations, or with a path diagram. Below we explore both and provide an example.

Model Assumptions

For this example we will be accepting a number of assumptions:

1. All causal relations are linear and additive
2. All models are recursive
   - results in uncorrelated error terms
   - no two-way causal relations
   - no feedback loops
3. Error terms are uncorrelated with other independent variables
4. There is a weak causal ordering
5. Causal closure, meaning all of the relevant causal variables are included in the model

If these assumptions are met, then we can use least squares regression for our estimation.
In what follows we will be fitting our model to the standardized data.

A simple example

Figure 1: Regression path diagram

Based on Figure 1 we have a simple multiple regression; this is not any more difficult than what we have seen previously. Everything that we know from multiple regression should replicate in this situation. However, there is another aspect to this illustration that is important, namely that the goal of path modeling, and the multivariate extensions such as SEM and latent variable modeling, is to reproduce the variance-covariance matrix of the variables included. In this example we will be using z-scores, so we will be interested in reproducing the correlation matrix among the variables $z_1$, $z_2$, and $z_3$.

Structural equation modeling approach

The equation that represents the path model above in Figure 1 can be expressed as,

$$z_{1i} = \beta_{12} z_{2i} + \beta_{13} z_{3i} + \beta_{1a} u_{ai}. \tag{1}$$

In our following steps we will work to compute the correlations among each of the variables, based on the model. That is, we will compute the correlations using Equation 1 above to see how each relation is decomposed based on our theoretical model.

In order to compute the model-based expected correlation between $z_1$ and $z_2$ we will multiply both sides of the equation by $z_2$ and average over the $n$ observations:

$$\frac{1}{n}\sum z_{1i} z_{2i} = \frac{1}{n}\sum \beta_{12} z_{2i} z_{2i} + \frac{1}{n}\sum \beta_{13} z_{3i} z_{2i} + \frac{1}{n}\sum \beta_{1a} u_{ai} z_{2i}$$

$$\frac{1}{n}\sum z_{1i} z_{2i} = \beta_{12}\frac{1}{n}\sum z_{2i} z_{2i} + \beta_{13}\frac{1}{n}\sum z_{3i} z_{2i} + \beta_{1a}\frac{1}{n}\sum u_{ai} z_{2i}$$

$$r_{12} = \beta_{12}(1) + \beta_{13} r_{23} + \beta_{1a} r_{a2}$$

It is important to note that, by assumption, errors are uncorrelated with all other predictors, thus $r_{a2} = 0$. With this substitution we obtain,

$$r_{12} = \beta_{12} + \beta_{13} r_{23} \tag{2}$$

which represents our model-based estimate of the correlation between $z_1$ and $z_2$.

In order to compute the model-based expected correlation between $z_1$ and $z_3$ we will multiply both sides of the equation by $z_3$ and average over the $n$ observations:

$$\frac{1}{n}\sum z_{1i} z_{3i} = \frac{1}{n}\sum \beta_{12} z_{2i} z_{3i} + \frac{1}{n}\sum \beta_{13} z_{3i} z_{3i} + \frac{1}{n}\sum \beta_{1a} u_{ai} z_{3i}$$

$$r_{13} = \beta_{12} r_{23} + \beta_{13}(1) + 0$$

$$r_{13} = \beta_{12} r_{23} + \beta_{13} \tag{3}$$

Parameter estimation of $\beta_{12}$ and $\beta_{13}$

Now that we have the model implied correlations for both $r_{12}$ and $r_{13}$, we can focus on the estimation of the parameters $\beta_{12}$ and $\beta_{13}$.
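Equations 2 and 3 are two linear equations in the two unknown paths, so they can be checked numerically. The following is a minimal R sketch, assuming hypothetical correlation values (0.50, 0.40, 0.30) that are not taken from this document; it solves the system and then plugs the solution back in to confirm that the model reproduces the correlations it started from.

```r
# Hypothetical correlations among z1, z2, z3 (illustrative values only)
r12 <- 0.50
r13 <- 0.40
r23 <- 0.30

# Equations 2 and 3 written as a linear system in (beta12, beta13):
#   r12 = beta12 * 1   + beta13 * r23
#   r13 = beta12 * r23 + beta13 * 1
Rxx <- matrix(c(1,   r23,
                r23, 1), nrow = 2, byrow = TRUE)
rxy <- c(r12, r13)

beta <- solve(Rxx, rxy)
names(beta) <- c("beta12", "beta13")

# Plug the solved paths back into Equations 2 and 3
implied_r12 <- beta[["beta12"]] + beta[["beta13"]] * r23
implied_r13 <- beta[["beta12"]] * r23 + beta[["beta13"]]

print(round(beta, 4))
print(c(implied_r12 = implied_r12, implied_r13 = implied_r13))
```

Because this model has as many paths as correlations with $z_1$, the implied correlations match the inputs exactly; the algebra that follows obtains the same solution in closed form.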
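Starting from the model implied relations among the variables, the estimation of these parameters can be expressed using our earlier solutions in Equations 2 and 3.

To begin, we will focus on the estimation of $\beta_{12}$. Our first step is to solve for the parameter $\beta_{13}$ from Equation 3. We do this in order to get an equation that expresses $\beta_{13}$ in terms of $\beta_{12}$; we will need this to solve for $\beta_{12}$.

$$r_{13} = \beta_{12} r_{23} + \beta_{13}$$

$$\beta_{13} = r_{13} - \beta_{12} r_{23}$$

Substituting this expression into Equation 2 we obtain,

$$r_{12} = \beta_{12} + (r_{13} - \beta_{12} r_{23}) r_{23}$$

$$r_{12} = \beta_{12} + r_{13} r_{23} - \beta_{12} r_{23}^2$$

$$r_{12} - r_{13} r_{23} = \beta_{12} - \beta_{12} r_{23}^2$$

$$r_{12} - r_{13} r_{23} = \beta_{12}(1 - r_{23}^2),$$

$$\beta_{12} = \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} \tag{4}$$

A similar process can be performed for the estimation of $\beta_{13}$.

Error of Estimation

Finally, we will solve for the model based correlation of $z_{1i}$ with itself. We multiply through our structural equation by $z_{1i}$,

$$\frac{1}{n}\sum z_{1i} z_{1i} = \frac{1}{n}\sum \beta_{12} z_{2i} z_{1i} + \frac{1}{n}\sum \beta_{13} z_{3i} z_{1i} + \frac{1}{n}\sum \beta_{1a} u_{ai} z_{1i}$$

$$1 = \beta_{12} r_{12} + \beta_{13} r_{13} + \beta_{1a} r_{1a}$$

$$\beta_{1a} r_{1a} = 1 - (\beta_{12} r_{12} + \beta_{13} r_{13})$$

Recall that the multiple $R^2$ for a model is equal to $\sum_{p=1}^{k} \beta_{yp} r_{yp}$, where $k$ is the number of predictors for the variable $y$.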
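Both Equation 4 and the $R^2$ identity just recalled can be checked against ordinary least squares. The R sketch below is only an illustration under stated assumptions: the data are simulated (they are not the data used in this document), and the expression for $\beta_{13}$ is the counterpart of Equation 4 obtained by swapping the roles of $z_2$ and $z_3$.

```r
set.seed(1)
n <- 200

# Simulated raw scores (hypothetical; not the data used in this document)
x2 <- rnorm(n)
x3 <- 0.3 * x2 + rnorm(n)
y  <- 0.5 * x2 + 0.4 * x3 + rnorm(n)

# Convert to z-scores so that regression slopes are standardized paths
z <- as.data.frame(scale(cbind(z1 = y, z2 = x2, z3 = x3)))
R <- cor(z)
r12 <- R["z1", "z2"]
r13 <- R["z1", "z3"]
r23 <- R["z2", "z3"]

# Equation 4, and its counterpart for beta13 (by symmetry)
beta12 <- (r12 - r13 * r23) / (1 - r23^2)
beta13 <- (r13 - r12 * r23) / (1 - r23^2)

# Least squares on the z-scores (no intercept needed: all means are 0)
fit <- lm(z1 ~ 0 + z2 + z3, data = z)

# R^2 as the sum of path-by-correlation products
R2_paths <- beta12 * r12 + beta13 * r13
R2_lm <- summary(fit)$r.squared

print(rbind(formula = c(beta12, beta13), lm = unname(coef(fit))))
print(c(R2_paths = R2_paths, R2_lm = R2_lm))
```

The two rows of the first table, and the two $R^2$ values, should agree up to floating point error.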
In our above equation this translates to $R^2 = \beta_{12} r_{12} + \beta_{13} r_{13}$, thus we can express the above equation as,

$$\beta_{1a} r_{1a} = 1 - R^2. \tag{5}$$

You may also notice that since $u_{ai}$ is uncorrelated with any other predictor, the correlation $r_{1a} = \beta_{1a}$. This results in our final expression of Equation 5,

$$\beta_{1a}^2 = 1 - R^2$$

$$\beta_{1a} = \sqrt{1 - R^2}. \tag{6}$$

This last expression is our standard error of the estimate from the regression.

Path Example

Motivation

The difference from what we have seen before is that now we are considering multiple equations with multiple outcomes possible. Note that each equation is still for a single outcome, but we can consider the entire system of equations. This allows us not only to see the influence of other inputs on relations among predictors and outcomes, as with moderation; in this framework we are also interested in the possible mechanisms of causal relations. These relations can be either direct or indirect, meaning that they can operate through other variables.

The data represent a subset of 62 academic professionals who were measured on a number of variables including:

- sex: Biological sex of respondent (male=1)
- time: Time, in years, since earning their PhD
- pub: Number of publications
- cit: Number of citations
- salary: Annual salary in dollars

Table 1: Descriptive statistics (columns: mean, sd, min, max)

  salary: min 37939, max 83503 (range 45564)

Below we present a path diagram in Figure 2, as well as the mathematical specification of the system of equations in Equation 7.

Raw correlations

It is always informative to look at the raw associations among the variables before any modeling is done. Below is the correlation table for these data.

Table 2: correlation raw data (variables: time, pub, sex, cit, salary)

The entire system can be expressed as,

$$\begin{aligned}
\text{time} &\sim \text{sex} \\
\text{pub} &\sim \text{sex} + \text{time} \\
\text{cit} &\sim \text{sex} + \text{time} + \text{pub} \\
\text{salary} &\sim \text{sex} + \text{time} + \text{pub} + \text{cit}
\end{aligned} \tag{7}$$

Figure 2: Path diagram

Model fit using linear multiple regression

Next we explore what the estimates will be for each of our linear equations using multiple regression estimation. Each of the four fits reports the columns Estimate, Std. Error, t value, and Pr(>|t|):

  time ~ sex
  pub ~ sex + time
  cit ~ sex + time + pub
  salary ~ sex + time + pub + cit

Structural Equation Modeling of the System

Next we will use the R package lavaan to fit the above model to our data.

    library(lavaan)

    # The object name was not preserved in this transcription; `model` is a placeholder.
    model = '
      time ~ sex
      pub ~ sex + time
      cit ~ sex + time + pub
      salary ~ sex + time + pub + cit
    '
    fit = sem(model, data = dat)
    summary(fit)  # any further arguments were not preserved in this transcription

  lavaan converged normally after 135 iterations

  Number of observations                            62

  Estimator                                         ML
  Minimum Function Test Statistic
  Degrees of freedom                                 0

Model test baseline model:

  Minimum Function Test Statistic
  Degrees of freedom                                10
  P-value

User model versus baseline model:

  Comparative Fit Index (CFI)
  Tucker-Lewis Index (TLI)

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)
  Loglikelihood unrestricted model (H1)

  Number of free parameters                         14
  Akaike (AIC)
  Bayesian (BIC)
  Sample-size adjusted Bayesian (BIC)

Root Mean Square Error of Approximation:

  RMSEA
  90 Percent Confidence Interval
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  time ~
    sex
  pub ~
    sex
    time
  cit ~
    sex
    time
    pub
  salary ~
    sex
    time
    pub
    cit

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)

Comparisons

Below we present tables of estimates from both the SEM as well as the multiple equations using linear regression.

Table 7: Standardized estimates from SEM (columns include lhs, rhs, and pvalue)

Table 8: Estimates from linear regression models (columns: Estimate, Std. Error, t value, Pr(>|t|); one row per path in the model)

Table 9: correlation raw data (variables: time, pub, sex, cit, salary)
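The comparison described above can be sketched in R as follows. This is a hedged illustration, not the original analysis: `dat` here is a simulated stand-in that only mimics the variable names (with salary expressed in thousands of dollars to keep the variances on similar scales), and the object names `formulas`, `ols`, and `sem_est` are hypothetical. The lavaan part runs only if the package is installed.

```r
# Simulated stand-in data; NOT the 62 academics analysed in this document
set.seed(2017)
n <- 62
sex    <- rbinom(n, 1, 0.5)
time   <- 7 + 1.5 * sex + rnorm(n, sd = 4)
pub    <- 5 + 1.0 * sex + 1.5 * time + rnorm(n, sd = 8)
cit    <- 10 + 0.5 * sex + 1.0 * time + 0.3 * pub + rnorm(n, sd = 10)
salary <- 40 + 1.0 * sex + 0.8 * time + 0.1 * pub + 0.2 * cit + rnorm(n, sd = 5)
dat <- data.frame(sex, time, pub, cit, salary)

# One least-squares regression per equation of the system
formulas <- list(time ~ sex,
                 pub ~ sex + time,
                 cit ~ sex + time + pub,
                 salary ~ sex + time + pub + cit)
ols <- unlist(lapply(formulas, function(f) {
  b <- coef(lm(f, data = dat))[-1]                # drop the intercept
  names(b) <- paste(all.vars(f)[1], "~", names(b)) # label as "outcome ~ predictor"
  b
}))

# The same system fitted in one step with lavaan, if available
if (requireNamespace("lavaan", quietly = TRUE)) {
  model <- '
    time ~ sex
    pub ~ sex + time
    cit ~ sex + time + pub
    salary ~ sex + time + pub + cit
  '
  fit <- lavaan::sem(model, data = dat)
  pe <- lavaan::parameterEstimates(fit)
  pe <- pe[pe$op == "~", ]                         # keep regression paths only
  sem_est <- setNames(pe$est, paste(pe$lhs, "~", pe$rhs))
  print(round(cbind(ols = ols, sem = sem_est[names(ols)]), 3))
} else {
  print(round(ols, 3))
}
```

For a recursive model like this one, the unstandardized path estimates from the two approaches are expected to agree closely; the standard errors can differ slightly because maximum likelihood and least squares use different divisors for the residual variance.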