
Lecture 3: Multiple Regression - Columbia University

Prof. Sharyn O'Halloran
Sustainable Development U9611, Econometrics II, Spring 2005

Outline
Basics of Multiple Regression
  Dummy variables
  Interactive terms
  Curvilinear models
Review Strategies for Data Analysis
  Demonstrate the importance of inspecting, checking and verifying your data before accepting the results of your analysis.
  Suggest that regression analysis can be misleading without probing data, which could reveal relationships that a casual analysis could overlook.
Examples of Data Exploration

Multiple Regression
Data: Linear regression models (Sect. )




Linear regression models (Sect. )
1. Model with 2 X's: $\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
2. Ex: Y: 1st-year GPA, $X_1$: Math SAT, $X_2$: Verbal SAT
3. Ex: Y = log(tree volume), $X_1$: log(height), $X_2$: log(diameter)

Important notes about interpretation of the $\beta$'s
Geometrically, $\beta_0 + \beta_1 X_1 + \beta_2 X_2$ describes a plane.
For a fixed value of $X_1$, the mean of Y changes by $\beta_2$ for each one-unit increase in $X_2$.
If Y is expressed in logs, then Y changes by approximately $100\,\beta_2$ percent (for small $\beta_2$) for each one-unit increase in $X_2$, etc.
The meaning of a coefficient depends on which explanatory variables are included! $\beta_1$ in $\mu(Y \mid X_1) = \beta_0 + \beta_1 X_1$ is not the same as $\beta_1$ in $\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$.

Polynomial terms, e.g. $X^2$, for curvature (see Display ).
Indicator variables to model effects of categorical variables:
One indicator variable (X = 0, 1) to distinguish 2 groups. Ex: X = 1 for females, 0 for males.
(K-1) indicator variables to distinguish K groups.
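The note above, that the meaning of a coefficient depends on which explanatory variables are included, can be checked numerically. The following is only a sketch under stated assumptions: the data are made up for illustration (they are not from the lecture), and `solve` and `ols` are hypothetical helper names that fit the model through the normal equations rather than through any statistics package.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients [b0, b1, ...]; an intercept is added."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve(xtx, xty)


# Made-up data: X2 tends to rise with X1, and Y = 1 + 2*X1 + 3*X2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b_full = ols([x1, x2], y)   # approximately [1, 2, 3]
b_simple = ols([x1], y)     # slope on X1 is approximately 5, not 2

print("Y on X1 and X2:", [round(b, 6) for b in b_full])
print("Y on X1 only:  ", [round(b, 6) for b in b_simple])
```

Because X2 rises with X1 in this toy data, the one-variable fit folds part of X2's effect into the coefficient on X1, which is why the two values of the "same" coefficient differ.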

Specially constructed explanatory variables
Example: $X_2 = 1$ if fertilizer B was used, 0 if A or C was used; $X_3 = 1$ if fertilizer C was used, 0 if A or B was used.
Product terms for interaction:
$\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 X_2)$
$\mu(Y \mid X_1, X_2 = 7) = (\beta_0 + 7\beta_2) + (\beta_1 + 7\beta_3) X_1$
$\mu(Y \mid X_1, X_2 = -9) = (\beta_0 - 9\beta_2) + (\beta_1 - 9\beta_3) X_1$
The effect of $X_1$ on Y depends on the level of $X_2$.

Sex discrimination?
[Diagram: Years of experience -> Salary (+, $\beta_1$); Gender -> Salary (?, $\beta_2$)]
Observation: Disparity in salaries between males and females.
Theory: Salary is related to years of experience.
Hypothesis: If there is no discrimination, gender should not matter.
Null hypothesis: $H_0: \beta_2 = 0$

Hypothetical sex discrimination example
Data: $Y_i$ = salary for teacher i, $X_{1i}$ = their years of experience, $X_{2i}$ = 1 for male teachers, 0 for female teachers.

i   Y_i (salary)   X1 (years)   Gender   X2
1   23000          4            male     1
2   39000          30           female   0
3   25000          7            male     1
4   29000          17           female   0

Gender: categorical factor. $X_2$: indicator variable.

Model with Categorical Variables
Parallel lines model: $\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
for all females: $\mu(Y \mid X_1, X_2 = 0) = \beta_0 + \beta_1 X_1$
for all males: $\mu(Y \mid X_1, X_2 = 1) = \beta_0 + \beta_1 X_1 + \beta_2$
For the subpopulation of teachers at any particular years of experience, the mean salary for males is $\beta_2$ more than that for females.
Slope: $\beta_1$. Intercepts: Males: $\beta_0 + \beta_2$; Females: $\beta_0$.

Model with Interactions
$\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 X_2)$
for all females: $\mu(Y \mid X_1, X_2 = 0) = \beta_0 + \beta_1 X_1$
for all males: $\mu(Y \mid X_1, X_2 = 1) = \beta_0 + \beta_1 X_1 + \beta_2 + \beta_3 X_1$
The mean salary for inexperienced males ($X_1 = 0$) is $\beta_2$ (dollars) more than the mean salary for inexperienced females.
The rate of increase in salary with increasing experience is $\beta_3$ (dollars) more for males than for females.
Slopes: Males: $\beta_1 + \beta_3$; Females: $\beta_1$. Intercepts: Males: $\beta_0 + \beta_2$; Females: $\beta_0$.

Modelling curvature, parallel quadratic curves: $\mu(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2$
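To make the parallel-lines and interaction models concrete, the sketch below fits both to the four illustrative teachers in the hypothetical sex discrimination example (years of experience 4, 30, 7, 17), as best those rows can be read from this transcript. The helpers `solve` and `ols` are hypothetical names, not part of the lecture, and four observations are far too few for any inference: the interaction model has four parameters, so it reproduces the four salaries exactly.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients [b0, b1, ...]; an intercept is added."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve(xtx, xty)


# Illustrative rows: X1 = years of experience, X2 = 1 for male, Y = salary.
exper = [4.0, 30.0, 7.0, 17.0]
male = [1.0, 0.0, 1.0, 0.0]
salary = [23000.0, 39000.0, 25000.0, 29000.0]

# Parallel lines: common slope b1, intercept shifted by b2 for males.
parallel = ols([exper, male], salary)

# Interaction: the product term X1*X2 lets the slope differ by gender.
product = [x1 * x2 for x1, x2 in zip(exper, male)]
b0, b1, b2, b3 = ols([exper, male, product], salary)

female_line = (b0, b1)            # (intercept, slope) when X2 = 0
male_line = (b0 + b2, b1 + b3)    # (intercept, slope) when X2 = 1

print("parallel lines b0, b1, b2:", [round(b, 2) for b in parallel])
print("female intercept, slope:", [round(v, 2) for v in female_line])
print("male intercept, slope:  ", [round(v, 2) for v in male_line])
```

Reading the male line as (b0 + b2, b1 + b3) mirrors the slide: b2 shifts the intercept and b3 shifts the slope.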

Model with curvilinear effects
Modelling curvature, parallel quadratic curves: $\mu(\text{salary} \mid \text{exper}, \text{Gender}) = \beta_0 + \beta_1\,\text{exper} + \beta_2\,\text{Gender} + \beta_3\,\text{exper}^2$

Notes about indicator variables
A t-test for $H_0: \beta_2 = 0$ in the regression of Y on a single indicator variable $I_B$, $\mu(Y \mid I_B) = \beta_0 + \beta_2 I_B$, is the 2-sample (difference of means) t-test.
Regression when all explanatory variables are categorical is "analysis of variance".
Regression with categorical variables and one numerical X is often called "analysis of covariance".
These terms are used more in the medical sciences than in social science.
We'll just use the term "regression analysis" for all of these.

Causation and Correlation
Causal conclusions can be made from randomized experiments, but not from observational studies.
One way around this problem is to start with a model of your phenomenon.
Then you test the implications of the model.
These observations can disprove the model's hypotheses, but they cannot prove these hypotheses correct.
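Returning to the note on indicator variables: the claim that the t-test on the coefficient of a single indicator is the 2-sample (pooled-variance) t-test can be verified on any small data set. The sketch below uses made-up measurements for two groups; `pooled_t` and `slope_t` are hypothetical helper names, and the regression statistic is computed from the usual simple-regression formulas.

```python
import math
from statistics import mean, variance

# Made-up measurements for two groups (not from the lecture).
group_a = [4.1, 5.0, 6.2, 5.5]
group_b = [6.8, 7.4, 8.1, 7.0, 7.9]


def pooled_t(a, b):
    """Two-sample t statistic for mean(b) - mean(a) with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def slope_t(x, y):
    """Slope and its t statistic in the simple regression of y on x."""
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Indicator I_B: 0 for group A, 1 for group B.
y = group_a + group_b
indicator = [0.0] * len(group_a) + [1.0] * len(group_b)

b2, t_regression = slope_t(indicator, y)
t_two_sample = pooled_t(group_a, group_b)

print("slope b2 (difference of means):", b2)
print("t from regression:", t_regression)
print("t from 2-sample test:", t_two_sample)
```

The slope equals the difference of the group means, and the two t statistics agree up to floating-point rounding.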

Such observations merely fail to reject the null.

Models and Tests
A model is an underlying theory about how the world works: assumptions, key players, strategic interactions, outcome set.
Models can be qualitative, quantitative, formal, experimental, etc., but everyone uses models of some sort in their research.
Derive hypotheses: e.g., as per capita GDP increases, countries become more democratic.
Test hypotheses: collect data (outcome and key explanatory variables), identify the appropriate functional form, apply the appropriate estimation procedures, interpret the results.

The traditional scientific approach
Theory -> Operational Hypothesis -> Observation -> Measurement -> Statistical Test -> Empirical Findings
Virtuous cycle of theory informing data analysis, which informs theory building.

Theory: female education reduces childbearing.
Operational hypothesis: women with higher education should have fewer children than those with less education.
Statistical test: $CB_i = b_0 + b_1 \cdot educ_i + resid_i$
Empirical findings: is $b_1$ significant?
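As a rough illustration of the statistical-test step, the sketch below fits $CB_i = b_0 + b_1 \cdot educ_i + resid_i$ by least squares and reports the sign, magnitude, and t statistic of $b_1$. The eight observations are invented purely for illustration (they are not the Ghana data mentioned in the lecture), and `simple_regression` is a hypothetical helper name.

```python
import math

# Invented data: years of education and number of children for 8 women.
educ = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
children = [6.0, 5.0, 5.0, 4.0, 4.0, 3.0, 2.0, 2.0]


def simple_regression(x, y):
    """Return (b0, b1, se_b1, t) for the least-squares line y = b0 + b1*x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)   # residual variance uses n - 2 df
    return b0, b1, se_b1, b1 / se_b1


b0, b1, se_b1, t_stat = simple_regression(educ, children)

print("b0 =", round(b0, 3))
print("b1 =", round(b1, 3), "(sign and magnitude)")
print("se(b1) =", round(se_b1, 4))
print("t =", round(t_stat, 2), "on", len(educ) - 2, "degrees of freedom")
```

Whether $b_1$ is significant is then judged by comparing the t statistic with a t distribution on n - 2 degrees of freedom.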

Example of a scientific approach
Empirical findings: is $b_1$ positive or negative? What is its magnitude?
Observation: Using Ghana data? Women 15-49? Married or all women?
Measurement: How to measure education?

Strategies and Graphical Tools
Define the question of interest: a) specify theory; b) hypothesis to be tested. Review study design: assumptions, logic, data availability, correct errors.
Explore the data: use graphical tools; consider transformation; fit a tentative model; check outliers.
Formulate inferential model, derived from theory: state hypotheses in terms of model parameters.
Check model: a) model fit; b) examine residuals; c) see if terms can be eliminated. Check for non-constant variance; assess outliers. If the model is not OK, loop back.
Interpret results using appropriate tools: confidence intervals, tests, prediction intervals.
Presentation of results: tables, graphs, text.

Data Exploration
Graphical tools for exploration and communication:
Matrix of scatterplots ( )
Coded scatterplot ( ): different plotting codes for different categories
Jittered scatterplot ( )
Point identification
Consider transformations.
Fit a tentative model: e.g., linear, quadratic, interaction terms, etc.
Check outliers.

Scatter plots
[Figure: STATA command and scatter plot matrix, brain weight data before log transformation]
Scatter plot matrices provide a compact display of the relationship between a number of variable pairs.

Scatter plots
[Figure: STATA command and scatter plot matrix, brain weight data before log transformation]
Scatter plot matrices can also indicate outliers. Note the outliers in these plots.

[Figure: scatterplot matrix for brain weight data after log transformation]
Notice: the outliers are now gone!

Coded Scatter Plots
[Figure: STATA command and coded scatter plot]
Coded scatter plots are obtained by using different plotting codes for different categories. In this example, the variable time has two possible values (1, 2). Such values are coded in the scatterplot using different symbols.

Jittering
Provides a clearer view of overlapping points.

Point Identification
[Figure: STATA command and labelled scatter plot]
How to label points.

Transformations
[Figure: STATA command and plot of the variable]
This variable is clearly skewed. How should we correct it?

Transformations
Stata's ladder command shows a normality test for various transformations. Select the transformation with the lowest chi2 statistic (this tests each distribution for normality).

. ladder enroll

Transformation        formula           chi2(2)    P(chi2)
------------------------------------------------------------
cubic                 enroll^3          .
square                enroll^2          .
identity              enroll            .
square root           sqrt(enroll)
log                   log(enroll)
reciprocal root       1/sqrt(enroll)
reciprocal            1/enroll
reciprocal square     1/(enroll^2)
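The idea behind the ladder can be sketched in a few lines: apply each transformation, compute a normality statistic, and pick the smallest. The sketch below is not Stata's implementation. It substitutes the Jarque-Bera statistic (based on sample skewness and kurtosis) for the chi2 that Stata reports, and it uses synthetic, strictly positive, right-skewed values in place of the real `enroll` variable; `jarque_bera` and `LADDER` are hypothetical names.

```python
import math
from statistics import NormalDist


def jarque_bera(values):
    """Jarque-Bera normality statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    m = sum(values) / n
    d = [v - m for v in values]
    m2 = sum(x ** 2 for x in d) / n
    m3 = sum(x ** 3 for x in d) / n
    m4 = sum(x ** 4 for x in d) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)


# The eight rungs listed above (all require strictly positive data).
LADDER = [
    ("cubic", lambda x: x ** 3),
    ("square", lambda x: x ** 2),
    ("identity", lambda x: x),
    ("square root", math.sqrt),
    ("log", math.log),
    ("reciprocal root", lambda x: 1 / math.sqrt(x)),
    ("reciprocal", lambda x: 1 / x),
    ("reciprocal square", lambda x: 1 / x ** 2),
]

# Synthetic stand-in for enroll: exp of evenly spaced normal quantiles,
# so the values are right-skewed and their logs are close to normal.
n = 50
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
enroll = [math.exp(6 + zi) for zi in z]

ladder = sorted(
    ((name, jarque_bera([f(x) for x in enroll])) for name, f in LADDER),
    key=lambda item: item[1],
)
best = ladder[0][0]

for name, stat in ladder:
    print(f"{name:18s} {stat:12.3f}")
print("lowest statistic:", best)
```

With data built this way the log rung has the lowest statistic; for a real variable the ranking depends on the data, which is exactly what the ladder is meant to reveal.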

