
ECON4150 - Introductory Econometrics Lecture 13: Internal ...



ECON4150 - Introductory Econometrics
Lecture 13: Internal and external validity
Monique de Haan
Stock and Watson Chapter 9

Lecture outline
- Definitions of internal and external validity
- Threats to internal validity
  - Omitted variables
  - Functional form misspecification
  - Measurement error
  - Sample selection
  - Simultaneous causality
  - Heteroskedasticity and/or correlated error terms
- Threats to external validity
  - Differences in populations
  - Differences in settings
- Internal and external validity when regression analysis is used for forecasting

Correlation does not imply causation!!

The New England Journal of Medicine, Discussion:
The principal finding of this study is a surprisingly powerful correlation between chocolate intake per capita and the number of Nobel laureates in various countries.

Of course, a correlation between X and Y does not prove causation but indicates that either X influences Y, Y influences X, or X and Y are influenced by a common underlying mechanism. However, since chocolate consumption has been documented to improve cognitive function, it seems most likely that in a dose-dependent way, chocolate intake provides the abundant fertile ground needed for the sprouting of Nobel laureates. Obviously, these findings are hypothesis-generating only and will have to be tested in a prospective, randomized trial. The only possible outlier in Figure 1 seems to be Sweden. Given its per capita chocolate consumption of … kg per year, we would predict that Sweden should have produced a total of about 14 Nobel laureates, yet we observe 32. Considering that in this instance the observed number exceeds the expected number by a factor of more than 2, one cannot quite escape the notion that either the Nobel Committee in Stockholm has some inherent patriotic bias when assessing the candidates for these awards or, perhaps, that the Swedes are particularly sensitive to chocolate, and even minuscule amounts greatly enhance their cognition. A second hypothesis, reverse causation (that is, that enhanced cognitive performance could stimulate countrywide chocolate consumption), must also be considered.

It is conceivable that persons with superior cognitive function (i.e., the cognoscenti) are more aware of the health benefits of the flavanols in dark chocolate and are therefore prone to increasing their consumption. That receiving the Nobel Prize would in itself increase chocolate intake countrywide seems unlikely, although perhaps celebratory events associated with this unique …

[Figure 1. Correlation between Countries' Annual Per Capita Chocolate Consumption and the Number of Nobel Laureates per 10 Million Population. Scatter plot of chocolate consumption (kg/yr/capita) against Nobel laureates per 10 million population for Poland, Switzerland, Sweden, Norway, China, Brazil, Greece, Portugal, the United States, Germany, France, Finland, Italy, Australia, the Netherlands, Canada, Belgium, the United Kingdom, Ireland, Spain, Austria and Denmark.]

Source: New England Journal of Medicine.

Copyright 2012 Massachusetts Medical Society. All rights reserved.

Definitions of internal and external validity
- Internal validity: the statistical inferences about causal effects are valid for the population and setting being studied.
- External validity: the statistical inferences can be generalized from the population and setting studied to other populations and settings.

Internal validity in an OLS regression model
Suppose we are interested in the causal effect of $X_1$ on $Y$ and we estimate the following regression model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + u_i$$
Internal validity has two components:
1. The OLS estimator of $\beta_1$ is unbiased and consistent: $E[\hat\beta_1] = \beta_1$ and $\operatorname{plim}_{n\to\infty} \hat\beta_1 = \beta_1$.
2. Hypothesis tests should have the desired significance level and confidence intervals should have the desired confidence level.

Stata output for the returns to education regression:

    . regress ln_earnings education

          Source |       SS       df       MS              Number of obs =     602
    -------------+------------------------------           F(  1,   600) =
           Model |                 1                       Prob > F      =
        Residual |               600  .276691993           R-squared     =
    -------------+------------------------------           Adj R-squared =
           Total |               601  .327726767           Root MSE      =  .52602

    ------------------------------------------------------------------------------
     ln_earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       education |   .0932827    .0088202                    .0759605      .110605
           _cons |               .1243055
    ------------------------------------------------------------------------------

Is this regression internally valid?
- Is the causal effect of an additional year of education on average hourly earnings equal to …?
- If we increase the education of a random sample of individuals in … by one year, does this increase their average hourly earnings by …?

Internal validity
The 3 assumptions of an OLS regression model:
1. $E(u_i \mid X_{1i}) = 0$
2. $(X_{1i}, Y_i)$, $i = 1, \dots, N$ are independently and identically distributed
3. Big outliers are unlikely

Threats to internal validity
- Omitted variables
- Functional form misspecification
- Measurement error
- Sample selection
- Simultaneous causality
- Heteroskedasticity and/or correlated error terms
The first 5 are violations of assumption (1); the last one is a violation of assumption (2).

Omitted variables
Suppose we want to estimate the causal effect of $X_{1i}$ on $Y_i$. The true population regression model is
$$Y_i = \beta_0 + \beta_1 X_{1i} + \underbrace{\beta_2 X_{2i} + w_i}_{u_i}, \qquad E[w_i \mid X_{1i}, X_{2i}] = 0$$
But we estimate the following model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + u_i$$
We have that
$$\operatorname{plim}_{n\to\infty} \hat\beta_1 = \frac{Cov(X_{1i}, Y_i)}{Var(X_{1i})} = \beta_1 + \frac{Cov(X_{1i}, u_i)}{Var(X_{1i})} = \beta_1 + \frac{Cov(X_{1i}, \beta_2 X_{2i} + w_i)}{Var(X_{1i})} = \beta_1 + \frac{Cov(X_{1i}, \beta_2 X_{2i}) + Cov(X_{1i}, w_i)}{Var(X_{1i})} = \beta_1 + \beta_2 \frac{Cov(X_{1i}, X_{2i})}{Var(X_{1i})}$$
where the last step uses $Cov(X_{1i}, w_i) = 0$, which follows from $E[w_i \mid X_{1i}, X_{2i}] = 0$.

Omitted variables
$$\operatorname{plim}_{n\to\infty} \hat\beta_1 = \beta_1 + \beta_2 \frac{Cov(X_{1i}, X_{2i})}{Var(X_{1i})}$$
An omitted variable $X_{2i}$ leads to an inconsistent OLS estimate of the causal effect of $X_{1i}$ if:
1. the omitted variable $X_{2i}$ is a determinant of the dependent variable $Y_i$: $\beta_2 \neq 0$;
2. the omitted variable $X_{2i}$ is correlated with the regressor of interest $X_{1i}$: $Cov(X_{1i}, X_{2i}) \neq 0$.
The OLS regression fails to be internally valid only if one or more omitted variables satisfy both conditions; in that case the OLS estimator does not provide an unbiased and consistent estimate of the causal effect of $X_{1i}$.

Omitted variables
Are there important omitted variables in the returns to education regression above?
An important and often discussed omitted variable is ability:
1. Ability is likely a determinant of earnings.
2. Ability is likely correlated with education.
Since we expect $\beta_2 > 0$ and $Cov(X_{1i}, X_{2i}) > 0$,
$$\operatorname{plim}_{n\to\infty} \hat\beta_1 = \beta_1 + \beta_2 \frac{Cov(X_{1i}, X_{2i})}{Var(X_{1i})} > \beta_1$$
Omitting ability from the regression will lead OLS to overestimate the effect of education on earnings!
But can we include ability as an independent variable in the regression?

Functional form misspecification
Suppose that the true population regression model is
$$Y_i = \beta_0 + \beta_1 X_{1i} + \underbrace{\beta_2 X_{1i}^2 + w_i}_{u_i}, \qquad E[w_i \mid X_{1i}] = 0$$
But we estimate the following model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + u_i$$
We have that
$$\operatorname{plim}_{n\to\infty} \hat\beta_1 = \beta_1 + \frac{Cov(X_{1i}, u_i)}{Var(X_{1i})} = \beta_1 + \frac{Cov(X_{1i}, \beta_2 X_{1i}^2 + w_i)}{Var(X_{1i})} = \beta_1 + \beta_2 \frac{Cov(X_{1i}, X_{1i}^2)}{Var(X_{1i})}$$
If $\beta_2 \neq 0$, the simple linear regression model is not internally valid, because in general $Cov(X_{1i}, X_{1i}^2) \neq 0$ (it is strictly positive, for example, whenever $X_{1i}$ is non-constant and takes only positive values, as years of education do).

Functional form misspecification
Should we include education squared in the regression model?
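The omitted variable formula can be checked numerically. The sketch below is a minimal pure-Python simulation under made-up assumptions: a hypothetical omitted variable `x2` (think "ability") drawn from a standard normal, a regressor `x1` (think "education") equal to 12 + 2·x2 plus normal noise with variance 4, and assumed coefficients β1 = 0.08 and β2 = 0.05. None of these values come from the lecture's data, and `simulate_omitted_variable` is a hypothetical helper: it regresses Y on X1 alone and compares the slope with β1 + β2 Cov(X1, X2)/Var(X1).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_omitted_variable(n=100_000, beta0=1.0, beta1=0.08, beta2=0.05, seed=0):
    """Simulate Y = beta0 + beta1*X1 + beta2*X2 + w, then regress Y on X1 only.

    All parameter values are illustrative assumptions.
    Returns (short-regression slope, omitted variable formula evaluated with sample moments).
    """
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(n)]             # omitted variable, e.g. "ability"
    x1 = [12 + 2 * a + rng.gauss(0, 2) for a in x2]      # regressor correlated with x2, e.g. "education"
    y = [beta0 + beta1 * e + beta2 * a + rng.gauss(0, 0.5) for e, a in zip(x1, x2)]

    # Simple OLS slope of Y on X1: Cov(X1, Y) / Var(X1)
    short_slope = cov(x1, y) / cov(x1, x1)
    # Omitted variable formula: beta1 + beta2 * Cov(X1, X2) / Var(X1)
    predicted = beta1 + beta2 * cov(x1, x2) / cov(x1, x1)
    return short_slope, predicted


if __name__ == "__main__":
    short_slope, predicted = simulate_omitted_variable()
    print(f"true beta1 = 0.08, short-regression slope = {short_slope:.4f}, formula = {predicted:.4f}")
```

Under these assumptions Var(X1) = 8 and Cov(X1, X2) = 2, so both numbers should be close to 0.08 + 0.05 × 2/8 = 0.0925, above the true β1 = 0.08: omitting a positive determinant that is positively correlated with the regressor pushes the estimate up, as in the ability example.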

Stata output, adding education squared (education2) to the regression:

    . regress ln_earnings education education2

          Source |       SS       df       MS              Number of obs =     602
    -------------+------------------------------           F(  2,   599) =
           Model |                 2                       Prob > F      =
        Residual |               599   .27487877           R-squared     =
    -------------+------------------------------           Adj R-squared =
           Total |               601  .327726767           Root MSE      =  .52429

    ------------------------------------------------------------------------------
     ln_earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       education |               .0686496                                 .0765074
      education2 |   .0054138    .0024314                    .0006387     .0101889
           _cons |               .4785439
    ------------------------------------------------------------------------------

[Figure: Functional form misspecification. Logarithm of average hourly earnings (vertical axis, 1 to 5) against years of education (horizontal axis, 5 to 20), showing the fitted linear model and quadratic model.]
For the major part of the support, the linear and quadratic models are similar.

Measurement error
There are different types of measurement error:
1. Measurement error in the independent variable X
   - Classical measurement error
   - Measurement error correlated with X
   Both types of measurement error in X are a violation of internal validity.
2. Measurement error in the dependent variable Y
   - Less problematic than measurement error in X
   - Usually not a violation of internal validity
   - Leads to less precise estimates

Measurement error in X
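A similar sketch illustrates the functional form result and the option of adding a squared term. It assumes, purely for illustration, a regressor `x` drawn from N(12, 2²) (loosely mimicking years of education) and a true quadratic model with β1 = 0.02 and β2 = 0.003. It fits both the misspecified linear model and the quadratic model by solving the OLS normal equations with a small hand-written Gaussian elimination routine; `solve`, `ols` and `simulate_functional_form` are hypothetical helpers, not a library API. Only the standard library is used, and `math.fsum` keeps the large sums of powers accurate.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def ols(columns, y):
    """OLS coefficients from the normal equations X'X b = X'y (first column is the constant)."""
    xtx = [[math.fsum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [math.fsum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def simulate_functional_form(n=100_000, beta0=1.0, beta1=0.02, beta2=0.003, seed=1):
    """Simulate Y = beta0 + beta1*X + beta2*X^2 + w (illustrative values) and fit both models."""
    rng = random.Random(seed)
    x = [rng.gauss(12, 2) for _ in range(n)]              # assumed regressor distribution
    x_sq = [v * v for v in x]
    y = [beta0 + beta1 * v + beta2 * v2 + rng.gauss(0, 0.1) for v, v2 in zip(x, x_sq)]
    ones = [1.0] * n

    _, slope_linear = ols([ones, x], y)                   # misspecified linear model
    _, b1_quad, b2_quad = ols([ones, x, x_sq], y)         # correctly specified quadratic model
    predicted = beta1 + beta2 * cov(x, x_sq) / cov(x, x)  # beta1 + beta2 Cov(X, X^2) / Var(X)
    return slope_linear, predicted, b1_quad, b2_quad


if __name__ == "__main__":
    print(simulate_functional_form())
```

For a normally distributed regressor Cov(X, X²)/Var(X) = 2E[X] = 24, so with these assumed values the linear slope should be near 0.02 + 0.003 × 24 = 0.092 rather than β1 = 0.02, while the quadratic fit should recover β1 and β2 up to sampling noise. For any symmetric regressor, 0.092 also equals the quadratic model's marginal effect β1 + 2β2E[X] at the mean, which is one way to see why the linear and quadratic fits can look similar over most of the support.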

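Classical measurement error
Suppose we have the following population regression model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + u_i, \qquad E[u_i \mid X_{1i}] = 0$$
Suppose that we do not observe $X_{1i}$ but we observe $\tilde X_{1i}$, a noisy measure of $X_{1i}$:
$$\tilde X_{1i} = X_{1i} + \varepsilon_i$$
Adding and subtracting $\beta_1 \tilde X_{1i}$ gives
$$Y_i = \beta_0 + \beta_1 \tilde X_{1i} + \beta_1 (X_{1i} - \tilde X_{1i}) + u_i = \beta_0 + \beta_1 \tilde X_{1i} - \beta_1 \varepsilon_i + u_i$$
Classical measurement error: $Cov(X_{1i}, \varepsilon_i) = 0$, $Cov(\varepsilon_i, u_i) = 0$, $E[\varepsilon_i] = 0$, $Var(\varepsilon_i) = \sigma_\varepsilon^2$.
For example: measurement error due to someone making random mistakes when entering data.

Measurement error in X: classical measurement error
Suppose we estimate the following regression model:
$$Y_i = \beta_0 + \beta_1 \tilde X_{1i} + e_i, \qquad e_i = -\beta_1 \varepsilon_i + u_i$$
With classical measurement error, the probability limit of the OLS estimator of $\beta_1$ is
$$\operatorname{plim}_{n\to\infty} \hat\beta_1 = \beta_1 + \frac{Cov(\tilde X_{1i}, e_i)}{Var(\tilde X_{1i})}$$
Substituting $\tilde X_{1i} = X_{1i} + \varepsilon_i$ and $e_i = -\beta_1 \varepsilon_i + u_i$ gives
$$\operatorname{plim}_{n\to\infty} \hat\beta_1 = \beta_1 + \frac{Cov(X_{1i} + \varepsilon_i, -\beta_1 \varepsilon_i + u_i)}{Var(X_{1i} + \varepsilon_i)}$$
Using that $Cov(X_{1i}, \varepsilon_i) = Cov(X_{1i}, u_i) = Cov(\varepsilon_i, u_i) = 0$:
$$\operatorname{plim}_{n\to\infty} \hat\beta_1 = \beta_1 - \beta_1 \frac{Cov(\varepsilon_i, \varepsilon_i)}{Var(X_{1i}) + Var(\varepsilon_i)} = \beta_1 \left(1 - \frac{Var(\varepsilon_i)}{Var(X_{1i}) + Var(\varepsilon_i)}\right) = \beta_1 \left(\frac{Var(X_{1i}) + Var(\varepsilon_i)}{Var(X_{1i}) + Var(\varepsilon_i)} - \frac{Var(\varepsilon_i)}{Var(X_{1i}) + Var(\varepsilon_i)}\right) = \beta_1 \frac{Var(X_{1i})}{Var(X_{1i}) + \sigma_\varepsilon^2}$$
With classical measurement error, $\hat\beta_1$ is biased towards 0!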
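The attenuation result can also be checked by simulation. This minimal sketch assumes a true regressor with Var(X1) = 4, classical measurement error ε with σε² = 4 that is drawn independently of X1 and u, and β1 = 0.1; all of these are illustrative choices, not values from the lecture. `simulate_measurement_error` is a hypothetical helper that compares the infeasible regression on the true X1 with the feasible regression on the noisy measure.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_measurement_error(n=100_000, beta0=1.0, beta1=0.1, var_x=4.0, var_eps=4.0, seed=2):
    """Simulate Y = beta0 + beta1*X1 + u where only X1 + eps is observed (illustrative values)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(12, var_x ** 0.5) for _ in range(n)]         # X_1i, not observed in practice
    x_obs = [v + rng.gauss(0, var_eps ** 0.5) for v in x_true]        # noisy measure X_1i + eps_i
    y = [beta0 + beta1 * v + rng.gauss(0, 0.5) for v in x_true]

    slope_true = cov(x_true, y) / cov(x_true, x_true)  # infeasible regression on the true X
    slope_obs = cov(x_obs, y) / cov(x_obs, x_obs)      # feasible regression on the noisy X
    plim = beta1 * var_x / (var_x + var_eps)           # beta1 * Var(X) / (Var(X) + sigma_eps^2)
    return slope_true, slope_obs, plim


if __name__ == "__main__":
    print(simulate_measurement_error())
```

With Var(X1) = σε², the formula predicts a slope of 0.1 × 4/(4 + 4) = 0.05, half the true effect, while the regression on the true X1 should stay close to 0.1.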

