
Dealing With and Understanding Endogeneity


Enrique Pinzón, StataCorp LP
October 20, 2016, Barcelona


Transcription of Dealing With and Understanding Endogeneity

Importance of Endogeneity

Endogeneity occurs when a variable, observed or unobserved, that is not included in our model is related to a variable that we did include in the model.

Endogeneity contradicts two assumptions of model building:
- Unobservables have no effect or explanatory power.
- The covariates cause the outcome of interest.

Endogeneity prevents us from making causal claims.

Endogeneity is a fundamental concern of social scientists (first to the party).

Outline

1. Defining concepts and building our intuition
2. Stata built-in tools to solve endogeneity problems
3. Stata commands to address endogeneity in non-built-in situations

Defining concepts and building our intuition

Building our Intuition: A Regression Model

The regression model is given by:

$y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$

$E(\varepsilon_i \mid x_{1i}, \dots, x_{ki}) = 0$

Once we have the information in our regressors, on average what we did not include in our model has no importance:

$E(y_i \mid x_{1i}, \dots, x_{ki}) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki}$

Graphically

[Figure: the regression model shown graphically]

Examples of Endogeneity

- We want to explain wages and we use years of schooling as a covariate. Years of schooling is correlated with unobserved ability and work ethic.
- We want to explain the probability of divorce and we use employment status as a covariate. Employment status might be correlated with unobserved economic shocks.
- We want to explain graduation rates for different school districts and we use the fraction of the budget spent on education as a covariate. Budget decisions are correlated with unobservable political factors.
- We want to estimate the demand for a good using prices. Demand and prices are determined simultaneously.

A General Framework

If the unobservables (what we did not include in our model) are correlated with our covariates, then:

$E(\varepsilon \mid X) \neq 0$

This condition covers:
- Omitted variable bias
- Simultaneity
- Functional form misspecification
- Selection bias

A useful implication of the above condition is:

$E(X'\varepsilon) \neq 0$
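The link between the conditional-mean condition and the moment condition comes from the law of iterated expectations. A short sketch, using the same X and ε as in the framework above:

```latex
\begin{align*}
E(X'\varepsilon)
  &= E\big[\,E(X'\varepsilon \mid X)\,\big] \\
  &= E\big[\,X'\,E(\varepsilon \mid X)\,\big]
\end{align*}
```

If E(ε|X) = 0, the last line is zero, so E(X'ε) = 0. Read the other way, a nonzero E(X'ε) can only occur when E(ε|X) ≠ 0, which is why the moment E(X'ε) is a convenient place to look for endogeneity.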

Example 1: Omitted Variable Bias

The true model is given by:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

$E(\varepsilon \mid x_1, x_2) = 0$

The researcher does not incorporate $x_2$; they think:

$y = \beta_0 + \beta_1 x_1 + \nu$

The objective is to estimate $\beta_1$. In our framework we get a consistent estimate if $E(\nu \mid x_1) = 0$.

Example 1: Endogeneity

Using the definition of the true model, we know that:

$\nu = \beta_2 x_2 + \varepsilon$

and

$E(\nu \mid x_1) = \beta_2 E(x_2 \mid x_1)$

$E(\nu \mid x_1) = 0$ only if $\beta_2 = 0$ or $x_2$ and $x_1$ are uncorrelated.
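The step from the true model to the conditional mean of the short-regression error can be spelled out. In this sketch ν is simply the label used here for the error of the model that leaves out x2; the last line is the standard linear-projection result for the OLS slope, added for reference rather than taken from the slides.

```latex
\begin{align*}
\nu &= \beta_2 x_2 + \varepsilon \\
E(\nu \mid x_1)
  &= \beta_2 E(x_2 \mid x_1) + E(\varepsilon \mid x_1) \\
  &= \beta_2 E(x_2 \mid x_1) + E\big[\,E(\varepsilon \mid x_1, x_2) \mid x_1\,\big] \\
  &= \beta_2 E(x_2 \mid x_1) \\[6pt]
\hat{\beta}_1^{\text{short}}
  &\xrightarrow{\;p\;} \beta_1 + \beta_2\,\frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
\end{align*}
```

The bias term vanishes when β2 = 0 or when x1 and x2 are uncorrelated, matching the condition stated above.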

Example 1: Simulating Data

    . clear
    . set obs 10000
    number of observations (_N) was 0, now 10,000
    . set seed 111
    . // Generating a common component for x1 and x2
    . generate a = rchi2(1)
    . // Generating x1 and x2
    . generate x1 = rnormal() + a
    . generate x2 = rchi2(2)-3 + a
    . generate e = rchi2(1) - 1
    . // Generating the outcome
    . generate y = 1 - x1 + x2 + e

Example 1: Estimation

    . // estimating true model
    . quietly regress y x1 x2
    . estimates store real
    . // estimating model with omitted variable
    . quietly regress y x1
    . estimates store omitted
    . estimates table real omitted, se

    ----------------------------------------
        Variable |    real       omitted
    -------------+--------------------------
              x1 |       ...         ...
                 | .00915198   .01482454
              x2 | .99993928
                 | .00648263
           _cons |  .9920283   .32968254
                 | .01678995   .02983985
    ----------------------------------------
                                legend: b/se

Example 2: Simultaneity in a market equilibrium

The demand and supply equations for the market are given by:

$Q_d = \beta P_d + \varepsilon_d$

$Q_s = \theta P_s + \varepsilon_s$

If a researcher wants to estimate the demand equation $Q_d$ and ignores that $P_d$ is simultaneously determined, we have an endogeneity problem that fits in our framework.
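Before continuing with Example 2, the Example 1 estimates above can be checked by hand from the data-generating process. This is a back-of-the-envelope sketch that uses only the distributions in the simulation code (a is chi-squared with 1 degree of freedom, so E(a) = 1 and Var(a) = 2) and the linear-projection formula for the slope of the regression that omits x2.

```latex
\begin{align*}
\operatorname{Var}(x_1) &= \operatorname{Var}(\text{rnormal}) + \operatorname{Var}(a) = 1 + 2 = 3 \\
\operatorname{Cov}(x_1, x_2) &= \operatorname{Var}(a) = 2 \\
\operatorname{plim}\ \hat{\beta}_1^{\text{omitted}}
  &= \beta_1 + \beta_2\,\frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
   = -1 + 1\cdot\frac{2}{3} = -\frac{1}{3} \\
E(x_1) &= 1,\quad E(x_2) = 2 - 3 + 1 = 0,\quad E(e) = 0
  \;\Rightarrow\; E(y) = 1 - 1 + 0 + 0 = 0 \\
\operatorname{plim}\ \widehat{\_cons}^{\text{omitted}}
  &= E(y) - \Big(-\frac{1}{3}\Big) E(x_1) = \frac{1}{3}
\end{align*}
```

The implied intercept of about .333 is in line with the _cons of .32968254 reported for the omitted model, while the true-model column recovers values close to the coefficients of 1 used to generate y.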

Example 2: Assumptions and Equilibrium

We assume:
- All quantities are scalars.
- $\beta < 0$ and $\theta > 0$
- $E(\varepsilon_d) = E(\varepsilon_s) = E(\varepsilon_d \varepsilon_s) = 0$
- $E(\varepsilon_d^2) \equiv \sigma_d^2$

The equilibrium prices and quantities are given by:

$P = \frac{\varepsilon_s - \varepsilon_d}{\beta - \theta}$

$Q = \frac{\beta \varepsilon_s - \theta \varepsilon_d}{\beta - \theta}$

Example 2: Endogeneity

This is a simple linear model, so we can verify whether $E(P_d \varepsilon_d) = 0$. Using our equilibrium conditions and the fact that $\varepsilon_s$ and $\varepsilon_d$ are uncorrelated, we get:

$E(P_d \varepsilon_d) = E\left(\varepsilon_d \, \frac{\varepsilon_s - \varepsilon_d}{\beta - \theta}\right) = \frac{E(\varepsilon_s \varepsilon_d)}{\beta - \theta} - \frac{E(\varepsilon_d^2)}{\beta - \theta} = -\frac{E(\varepsilon_d^2)}{\beta - \theta} = -\frac{\sigma_d^2}{\beta - \theta}$

Example 2: Graphically

[Figure: the simultaneity example shown graphically]
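The equilibrium expressions are stated without the intermediate algebra. A sketch of the steps, with β denoting the demand slope and θ the supply slope, and market clearing taken to mean that demand and supply share a single price P and quantity Q:

```latex
\begin{align*}
\beta P + \varepsilon_d &= \theta P + \varepsilon_s \\
(\beta - \theta)\,P &= \varepsilon_s - \varepsilon_d
  \;\Rightarrow\; P = \frac{\varepsilon_s - \varepsilon_d}{\beta - \theta} \\
Q &= \beta P + \varepsilon_d
   = \frac{\beta\varepsilon_s - \beta\varepsilon_d + (\beta - \theta)\,\varepsilon_d}{\beta - \theta}
   = \frac{\beta\varepsilon_s - \theta\varepsilon_d}{\beta - \theta} \\
E(P\,\varepsilon_d)
  &= \frac{E(\varepsilon_s\varepsilon_d) - E(\varepsilon_d^2)}{\beta - \theta}
   = -\frac{\sigma_d^2}{\beta - \theta}
\end{align*}
```

Because β < 0 and θ > 0, the denominator β - θ is negative, so E(P ε_d) is strictly positive whenever σ_d² > 0: price is correlated with the demand shock, which is the endogeneity condition of the general framework.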

Example 3: Functional Form Misspecification

Suppose the true model is given by:

$y = \sin(x) + \varepsilon$

$E(\varepsilon \mid x) = 0$

But the researcher thinks that:

$y = x\beta + \nu$

Example 3: Real vs. Estimated Predicted values

[Figure: real vs. estimated predicted values]

Example 3: Endogeneity

Adding zero we have:

$y = x\beta - x\beta + \sin(x) + \varepsilon$

$y = x\beta + \nu, \quad \nu \equiv \sin(x) - x\beta + \varepsilon$

For our estimates to be consistent we need to have $E(\nu \mid X) = 0$, but:

$E(\nu \mid x) = \sin(x) - x\beta + E(\varepsilon \mid x) = \sin(x) - x\beta \neq 0$
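A small simulation in the style of Example 1 can make the misspecification visible. This is only a sketch: the range of x, the error distribution, the seed, and the variable names are assumptions made here, not taken from the slides.

```stata
clear
set seed 111
set obs 10000

// Assumed design: x uniform on (-4, 4) and a standard normal error
generate x = runiform(-4, 4)
generate e = rnormal()

// True model: y = sin(x) + e
generate y = sin(x) + e

// Misspecified linear model, as the researcher would estimate it
regress y x
predict double yhat_linear, xb

// True conditional mean, for comparison with the linear fit
generate double ytrue = sin(x)

// The gap sin(x) - fitted line changes systematically with x
twoway (line ytrue x, sort) (line yhat_linear x, sort)
```

The plot compares the true conditional mean with the linear prediction, in the spirit of the "Real vs. Estimated Predicted values" slide. The vertical distance between the two curves is the part of the error that still depends on x.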

Example 4: Sample Selection

We observe the outcome of interest for a subsample of the population. The subsample we observe is based on a rule. For example, we observe $y$ if $y_2 \geq 0$.

In a linear framework we have that:

$E(y \mid X_1, y_2 \geq 0) = X_1\beta + E(\varepsilon \mid X_1, y_2 \geq 0)$

If $E(\varepsilon \mid X_1, y_2 \geq 0) \neq 0$ we have selection bias. In the classic framework this happens if the selection rule is related to the unobservables.

Example 4: Endogeneity

If we define $X \equiv (X_1, y_2 \geq 0)$ we are back in our framework:

$E(y \mid X) = X_1\beta + E(\varepsilon \mid X)$

And we can define endogeneity as happening when:

$E(\varepsilon \mid X) \neq 0$

Example 4: Simulating data

    . clear
    . set seed 111
    . quietly set obs 20000
    . // Generating Endogenous Components
    . matrix C = (1, .8\ .8, 1)
    . quietly drawnorm e v, corr(C)
    . // Generating exogenous variables
    . generate x1 = rbeta(2,3)
    . generate x2 = rbeta(2,3)
    . generate x3 = rnormal()
    . generate x4 = rchi2(1)
    . // Generating outcome variables
    . generate y1 = x1 - x2 + e
    . generate y2 = 2 + x3 - x4 + v
    . quietly replace y1 = . if y2 <= 0

Example 4: Estimation

    . regress y1 x1 x2, nocons

          Source |       SS           df       MS      Number of obs   =    14,847
    -------------+----------------------------------   F(2, 14845)     =       ...
           Model |        ...          2         ...   Prob > F        =       ...
        Residual |        ...     14,845  .892750906   R-squared       =       ...
    -------------+----------------------------------   Adj R-squared   =       ...
           Total |        ...     14,847  .990508004   Root MSE        =    .94485

    ------------------------------------------------------------------------------
              y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |        ...   .0290464     ...     ...          ...         ...
              x2 |        ...   .0287341     ...     ...          ...         ...
    ------------------------------------------------------------------------------

What have we learnt

- Endogeneity manifests itself in many forms.
- These manifestations can be understood within a general framework.
- Mathematically, $E(\varepsilon \mid X) \neq 0$, which implies $E(X'\varepsilon) \neq 0$.
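Returning to Example 4, the selection problem can be checked directly in the simulated data, because the simulation keeps the unobservable e in memory. The following is a sketch meant to be run right after the Example 4 simulation code; the variable name selected is introduced here for illustration only.

```stata
// Assumes the Example 4 simulation has just been run, so e, v, y1, and y2 exist

// Indicator for observations that survive the selection rule (y1 not set to missing)
generate byte selected = !missing(y1)

// In the full sample, the unobservable e is centered near zero
summarize e

// In the selected sample it is not, because e and v are correlated by construction
summarize e if selected

// Same comparison as a regression of e on the selection indicator
regress e selected
```

A mean of e that differs between the full and the selected sample is the sample counterpart of E(ε | X1, y2 ≥ 0) ≠ 0, the condition that defines selection bias in the framework above.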

