
Lecture: IV and 2SLS Estimators (Wooldridge’s book chapter 15)


1 Endogeneity

The endogeneity issue arises when the key regressor is correlated with the error:

    \mathrm{cov}(x, u) \neq 0 \quad \text{(Endogeneity)} \qquad (1)

This can happen when (i) there are omitted variables; (ii) there is reverse causation or simultaneity; (iii) there is measurement error.

In the presence of endogeneity, the OLS estimator is biased,

    \hat{\beta}_{OLS} \xrightarrow{p} \beta + \text{bias} \qquad (2)

or, equivalently, the causal effect cannot be identified:

    \mathrm{cov}(y, x) = \beta \, \mathrm{cov}(x, x) + \mathrm{cov}(x, u) \qquad (3)

    \Rightarrow \quad \beta = \frac{\mathrm{cov}(y, x) - \mathrm{cov}(x, u)}{\mathrm{cov}(x, x)} \neq \frac{\mathrm{cov}(y, x)}{\mathrm{cov}(x, x)} \qquad (4)

The primary goal of econometrics is to resolve the endogeneity (identification) issue.

2 Instrumental Variables (IV) Can Help

If there is endogeneity, the 2SLS or IV estimator based on valid IVs is consistent (β can be identified). An IV is valid if (i) it is uncorrelated with the error term (exogeneity); (ii) it is correlated with the key regressor (relevance); and (iii) it has no direct effect on y, i.e., it is excluded from the structural form (exclusion).
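To make (1)-(4) concrete, here is a minimal simulation sketch in Python/NumPy. It is my illustration rather than part of the lecture, and the variable names and coefficient values are made up: with cov(x, u) > 0 the OLS slope settles near β plus cov(x, u)/var(x), while the estimate based on a valid instrument z stays near β.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 2.0

z = rng.normal(size=n)              # valid instrument: relevant and exogenous
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)    # structural error
x = z + v                           # endogenous regressor: cov(x, u) = 0.8 != 0
y = 1.0 + beta * x + u

# OLS slope: cov(y, x) / var(x)  ->  beta + cov(x, u) / var(x)   (eqs. 2-4)
ols = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
bias = np.cov(x, u)[0, 1] / np.var(x, ddof=1)

# IV slope: cov(y, z) / cov(x, z)  ->  beta, because cov(z, u) = 0
iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

print(f"true beta = {beta:.2f}")
print(f"OLS slope = {ols:.3f}   (beta + bias = {beta + bias:.3f})")
print(f"IV slope  = {iv:.3f}")
```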

A valid IV is hard to find. For instance, in a regression of autism on TV watching, the number of rainy days is a valid IV for watching TV if (1) it is uncorrelated with the autism gene (exogeneity); (2) it is correlated with watching TV (relevance); and (3) it has no direct effect on developing autism (exclusion). Notice that it is allowed to affect autism indirectly through watching TV.

3 Structural Form vs Reduced Form

Consider a linear model in which x2 is assumed to be exogenous:

    y = \beta_1 x_1 + \beta_2 x_2 + u \qquad (5)

We are interested in estimating β1, which measures the marginal effect of x1 on y. This is a reduced form if x1 is also exogenous.

OLS can be applied to the reduced form. This is a structural form if x1 is endogenous. Most economics models are structural forms, and OLS becomes biased; instead we may need to find an IV.

x2 cannot be used as the IV: it satisfies exogeneity, and maybe relevance, but it does not satisfy exclusion. A valid IV should be an exogenous variable that matters for x1 (relevance) but has only an indirect effect on y through its effect on x1 (exclusion).

β1 is just-identified if there is only one IV (excluded exogenous variable); in this case, the 2SLS estimator is also called the IV estimator. β1 is over-identified if there are multiple IVs. β1 is under-identified if there is no excluded exogenous variable.

For instance, we have over-identification if we observe both the number of rainy days and the number of snowy days; if only one is observed, we have just-identification.

5 Apple Story

You can think of x1 as a partially rotten apple consisting of two parts: the bad endogenous part (correlated with u) and the good exogenous part (uncorrelated with u). OLS is bad because it uses the whole apple. IV estimation is good because the IV is used as a knife to remove the endogenous part, so that only the exogenous part is used in the estimation. When people ask about your identification strategy, they typically wonder how the bad part of the apple is removed, or how the good part is isolated. We hope the good part is big, i.e., that the IV and x1 are not weakly related. It is a good idea to use more IVs (over-identification) to isolate a bigger exogenous part of the apple.
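The following sketch makes the apple metaphor concrete. It is my own Python/NumPy illustration with made-up simulated data, not something from the lecture: the first-stage fitted value plays the role of the good (exogenous) part and the first-stage residual the bad (endogenous) part.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

z1, z2, x2 = rng.normal(size=(3, n))   # two excluded IVs and one included exogenous control
v = rng.normal(size=n)                 # endogenous component of x1
u = 0.7 * v + rng.normal(size=n)       # structural error, correlated with v
x1 = z1 + 0.5 * z2 + x2 + v            # the "apple": partly exogenous, partly rotten

# First stage: regress x1 on all exogenous variables (constant, z1, z2, x2)
Z = np.column_stack([np.ones(n), z1, z2, x2])
gamma = np.linalg.lstsq(Z, x1, rcond=None)[0]
good = Z @ gamma        # fitted value: the exogenous part of the apple
bad = x1 - good         # residual: the endogenous part of the apple

print("corr(good part, u) =", round(np.corrcoef(good, u)[0, 1], 3))   # ~ 0
print("corr(bad part,  u) =", round(np.corrcoef(bad, u)[0, 1], 3))    # clearly nonzero
```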

6 Big Picture

7 Big Picture

The box defines the structural model in which y depends on x1, x2, and u. x1 is the variable of interest, for which we want to quantify the marginal (causal) effect. Here x1 is endogenous because it is linked to u, and OLS is biased because of the x1-u link. To solve the endogeneity (identification) issue, we need help from an IV variable z which is outside the box (exclusion), is related to x1 (relevance), and is unrelated to u (exogeneity). Notice that x2 is exogenous because there is no link between x2 and u, but x2 cannot be used as an IV because it is inside the box (fails exclusion). Instead, x2 is called a control variable (included exogenous variable).

Critical thinking: what if we do not control for x2? (Hint: think about the potential link between z and x2.)

You need to draw and justify this big picture if you decide to use the IV methodology.

8 Stata

Suppose there are two valid IVs, z1 and z2. The Stata command for the 2SLS estimator is

    ivreg y (x1 = z1 z2) x2, first

It is important to control for x2, which can make the exogeneity condition more likely to hold for z1 and z2. The option "first" reports the first-stage regression, which regresses x1 onto z1, z2, and x2. The residual of the first-stage regression is the bad part of the apple and can be used to implement the Hausman test. The weak IV test is just the F-value for testing that both coefficients of z1 and z2 are zero. The fitted value of the first-stage regression is the good part of the apple, so it is the IV variable used in the second stage. We obtain the 2SLS estimator by regressing y onto the first-stage fitted value and x2 using OLS (second stage). The ivreg command does all of this for you. Important: z1 and z2 are excluded exogenous variables, while x2 is an included exogenous variable (control variable).
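As a cross-check on what ivreg automates, here is a by-hand two-stage sketch in Python/NumPy. It is my illustration with simulated data and made-up true coefficients, not the lecture's code; the variable names mirror the Stata command above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

z1, z2, x2, v = rng.normal(size=(4, n))
u = 0.6 * v + rng.normal(size=n)        # structural error, correlated with v
x1 = z1 + z2 + 0.5 * x2 + v             # endogenous regressor
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u       # true beta1 = 2, beta2 = 3
ones = np.ones(n)

# First stage: x1 on constant, z1, z2, x2
Z = np.column_stack([ones, z1, z2, x2])
x1_hat = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Second stage: y on constant, x1_hat, x2
X_second = np.column_stack([ones, x1_hat, x2])
b_2sls = np.linalg.lstsq(X_second, y, rcond=None)[0]
print("2SLS estimates (const, beta1, beta2):", b_2sls.round(3))

# Caveat: ivreg also corrects the second-stage standard errors, which this
# naive two-step OLS would compute from the wrong residuals.
```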

9 Three Little Pigs Story

Recall the first-stage regression

    x = c_1 z_1 + c_2 z_2 + \ldots + c_m z_m + \text{included exogenous variables} + v

(Hausman Test): The null hypothesis is that the regressor is exogenous (so OLS is good and IV is not needed). We run the first-stage regression and save the residual \hat{v}. Then we run the auxiliary regression

    y = \beta x + d \hat{v} + \text{error}

and test H0: d = 0. A small p-value indicates that the regressor is endogenous and IV is needed.

(Stock-Yogo Test): The null hypothesis is that c_1 = c_2 = \ldots = c_m = 0, meaning that the IVs are irrelevant (weak IV). We reject the null hypothesis if the F statistic exceeds 10.

(Over-identification or Sargan's J Test): The key coefficient is over-identified if the number of IVs exceeds the number of endogenous regressors by q > 0. In that case we can test the null hypothesis that all IVs are exogenous. We run the auxiliary regression

    \hat{u}_{2SLS} = a_1 z_1 + a_2 z_2 + \ldots + a_m z_m + \text{included exogenous regressors} + e

and compute nR^2 \sim \chi^2(q). A big nR^2 leads to rejection, so at least one IV is invalid.
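The sketch below implements the three diagnostics on the same kind of simulated data as above, following the lecture's descriptions directly rather than a packaged routine. It is my Python/NumPy illustration; the data-generating values are made up, and q = 1 here because there are two IVs and one endogenous regressor.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 5_000

z1, z2, x2, v = rng.normal(size=(4, n))
u = 0.6 * v + rng.normal(size=n)
x1 = z1 + z2 + 0.5 * x2 + v
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u
ones = np.ones(n)

def ols(X, y):
    """OLS coefficients, residuals, and homoskedastic standard errors."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, e, se

# First stage: x1 on constant, z1, z2, x2; save the residual v_hat
Z = np.column_stack([ones, z1, z2, x2])
g, v_hat, _ = ols(Z, x1)

# Hausman-type test: add v_hat to the structural equation and test its coefficient
b, _, se = ols(np.column_stack([ones, x1, x2, v_hat]), y)
print("Hausman t-stat on v_hat:", round(b[3] / se[3], 2))   # large -> x1 endogenous

# Weak-IV check: F test that the coefficients on z1 and z2 are both zero
_, e_r, _ = ols(np.column_stack([ones, x2]), x1)            # restricted first stage
F = ((e_r @ e_r - v_hat @ v_hat) / 2) / (v_hat @ v_hat / (n - Z.shape[1]))
print("First-stage F:", round(F, 1), "(rule of thumb: want F > 10)")

# Sargan's J: regress the 2SLS residuals on all exogenous variables, use n*R^2 ~ chi2(q)
x1_hat = Z @ g
b2, _, _ = ols(np.column_stack([ones, x1_hat, x2]), y)
u_2sls = y - np.column_stack([ones, x1, x2]) @ b2           # residuals use the actual x1
_, e_s, _ = ols(Z, u_2sls)
R2 = 1 - (e_s @ e_s) / ((u_2sls - u_2sls.mean()) @ (u_2sls - u_2sls.mean()))
print("Sargan nR^2:", round(n * R2, 2), " p-value:", round(stats.chi2.sf(n * R2, df=1), 3))
```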

10 (Optional) IV Formula

Consider a simple regression y = \beta_0 + \beta_1 x + u, where x is endogenous: cov(x, u) ≠ 0. To derive a formula for the IV estimator, assume there is only one excluded exogenous variable z satisfying cov(z, u) = 0 and cov(x, z) ≠ 0. It follows that

    \mathrm{cov}(y, z) = \mathrm{cov}(\beta_0 + \beta_1 x + u,\; z) = \beta_1 \mathrm{cov}(x, z) \qquad (6)

    \hat{\beta}_{IV,1} = \frac{\mathrm{cov}(y, z)}{\mathrm{cov}(x, z)} \qquad (7)

Some old-school people want to rewrite it as

    \hat{\beta}_{IV,1} = \frac{\mathrm{cov}(y, z)/\mathrm{var}(z)}{\mathrm{cov}(x, z)/\mathrm{var}(z)} = \frac{\text{reduced-form OLS estimate}}{\text{first-stage OLS estimate}} \qquad (8)

When there are multiple instrumental variables, the IV estimator is called the 2SLS estimator:

    \hat{\beta}_{2SLS,1} = \frac{\mathrm{cov}(y, \hat{x})}{\mathrm{cov}(\hat{x}, \hat{x})} = \frac{\mathrm{cov}(y, \hat{x})}{\mathrm{var}(\hat{x})} = \text{OLS estimate of regressing } y \text{ onto } \hat{x} \qquad (9)

where \hat{x} is the fitted value from regressing x onto the multiple IV variables (first-stage regression).

11 (Optional) Matrix Algebra I

Let X be the matrix of regressors in the structural form, X = (x1, x2); note x1 is endogenous while x2 is exogenous. Let Z be the matrix of all exogenous variables, Z = (z1, z2, x2); note x2 is the included exogenous variable, while z1 and z2 are excluded exogenous variables. Define the projection matrix P = Z(Z'Z)^{-1}Z'. The fitted value from the first stage is \hat{X} = PX. \hat{X} is exogenous, and the fitted value for x2 is itself. The second stage uses \hat{X} as regressors and applies OLS:

    \hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}(\hat{X}'Y) \qquad (10)
    = (X'PX)^{-1}(X'PY) \qquad (11)
    = (X'Z(Z'Z)^{-1}Z'X)^{-1}(X'Z(Z'Z)^{-1}Z'Y) \qquad (12)

where we use the fact that P is symmetric and idempotent: P' = P and PP = P.

12 (Optional) Matrix Algebra II

It follows that \hat{\beta}_{2SLS} = \beta + (X'PX)^{-1}(X'PU), so \hat{\beta}_{2SLS} is unbiased if the IVs are valid. The variance-covariance matrix of \hat{\beta}_{2SLS} is (assuming homoskedasticity)

    \text{var-cov}(\hat{\beta}_{2SLS}) = \sigma^2 (X'PX)^{-1}

The CLT implies that in large samples

    \hat{\beta}_{2SLS} \sim N\!\left(\beta,\; \sigma^2 (X'PX)^{-1}\right)
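A quick numerical check of these formulas, as a Python/NumPy sketch of my own (simulated data; just-identified case with a single instrument and no extra controls so that (7) applies directly): the projection-matrix expression (12), the two-step route (10), and the covariance ratio (7) give the same slope.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

z = rng.normal(size=n)
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)
x = z + v                           # endogenous regressor
y = 1.0 + 2.0 * x + u

ones = np.ones(n)
X = np.column_stack([ones, x])      # structural regressors
Z = np.column_stack([ones, z])      # all exogenous variables (just-identified)

# Eq. (12): closed form via the projection matrix P = Z (Z'Z)^{-1} Z'
# (forming the n x n matrix P explicitly is fine only because n is small here)
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
b_matrix = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)

# Eq. (10): two-step route, regress y on the first-stage fitted values X_hat = P X
X_hat = P @ X
b_twostep = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Eq. (7): just-identified slope = cov(y, z) / cov(x, z)
b_ratio = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

print("matrix formula  :", b_matrix.round(4))
print("two-step OLS    :", b_twostep.round(4))
print("cov-ratio slope :", round(b_ratio, 4))
```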

A Wald statistic can then be constructed to test H0: R\beta = r:

    \text{Wald} = (R\hat{\beta}_{2SLS} - r)'\,[R\,\sigma^2 (X'PX)^{-1} R']^{-1}\,(R\hat{\beta}_{2SLS} - r)

13 No Free Lunch (trade-off between unbiasedness and efficiency)

Recall that the first-stage regression is basically a decomposition

    x = \hat{x} + \hat{r} \qquad (13)

which implies the following decomposition of the total sum of squares (TSS):

    TSS = ESS + RSS, \quad TSS \geq ESS \qquad (14)

or, in this case, loosely speaking,

    X'X \geq X'PX, \quad (X'X)^{-1} \leq (X'PX)^{-1}, \quad \text{var-cov}(\hat{\beta}_{OLS}) \leq \text{var-cov}(\hat{\beta}_{2SLS}) \qquad (15)

In words, the IV estimator is less efficient than the OLS estimator: it has a bigger variance (and a smaller t value). Intuitively, this is because only part of the apple is used.

14 (Optional) Matrix Algebra III

It is straightforward to account for heteroskedasticity. The robust variance-covariance matrix for \hat{\beta}_{2SLS} allowing for heteroskedasticity is

    \text{robust var-cov}(\hat{\beta}_{2SLS}) = (X'PX)^{-1} (X'PWPX) (X'PX)^{-1}

where W = E(UU'). To estimate the meat in the middle of that sandwich, use

    X'P\hat{W}PX = \hat{X}'\hat{W}\hat{X} = \sum_{i=1}^{n} \hat{u}_i^2 \, \hat{x}_i \hat{x}_i'

where \hat{u} denotes the 2SLS residuals.
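Finally, a small sketch (again mine; Python/NumPy with simulated heteroskedastic data) of the sandwich formula above: compute 2SLS through the fitted values, form the residuals with the actual regressors, and plug them into the meat term.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000

z1, z2, x2, v = rng.normal(size=(4, n))
u = (0.5 + 0.5 * np.abs(x2)) * rng.normal(size=n) + 0.6 * v   # heteroskedastic, endogenous error
x1 = z1 + z2 + 0.5 * x2 + v
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u
ones = np.ones(n)

X = np.column_stack([ones, x1, x2])        # structural regressors
Z = np.column_stack([ones, z1, z2, x2])    # all exogenous variables

# 2SLS via fitted values X_hat = P X, computed without forming the n x n matrix P
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# 2SLS residuals use the actual X, not X_hat
u_hat = y - X @ b

# Sandwich: (X'PX)^{-1} (sum_i u_i^2 x_hat_i x_hat_i') (X'PX)^{-1}
bread = np.linalg.inv(X_hat.T @ X_hat)              # equals (X'PX)^{-1} since X_hat'X_hat = X'PX
meat = (X_hat * u_hat[:, None] ** 2).T @ X_hat      # sum of u_hat_i^2 * x_hat_i x_hat_i'
robust_vcov = bread @ meat @ bread

print("2SLS estimates   :", b.round(3))
print("robust std errors:", np.sqrt(np.diag(robust_vcov)).round(4))
```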

