
Econometrics II Lecture 2: Discrete Choice Models



Måns Söderbom
4 April 2011, University of Gothenburg

1. Introduction

Linear regression is primarily designed for modelling a continuous, quantitative variable - e.g. economic growth, the log of value-added or output, the log of earnings. Many economic phenomena of interest, however, concern variables that are not continuous, or perhaps not even quantitative. What characteristics (e.g. parental) affect the likelihood that an individual obtains a higher degree? What determines labour force participation (employed vs. not employed)? What factors drive the incidence of civil war?

Today we will discuss binary choice models. These are central models in applied econometrics: binary choice models are useful when our outcome variable of interest is binary - a common situation in applied work.

Moreover, the binary choice model is often used as an ingredient in other models. For example:

- In propensity score matching models (to be covered in Lecture 3), we identify the average treatment effect by comparing outcomes of treated and non-treated individuals who, a priori, have similar probabilities of being treated. The probability of being treated is typically modelled using probit.

- In Heckman's selection model, we use probit in the first stage to predict the likelihood that someone is included (selected) in the sample. We then control for the likelihood of being selected when estimating our equation of interest (e.g. a wage equation).

The binary choice model is also a good starting point if we want to study more complicated models. Later on in the course we will thus cover extensions of the binary choice model, such as models for multinomial or ordered response, and models combining continuous and discrete outcomes (corner response models).

Useful references for this lecture:

Greene, W. (2008). Econometric Analysis, 6th edition.
Angrist, Joshua and Jörn-Steffen Pischke (2009). Mostly Harmless Econometrics: An Empiricist's Companion.

In addition, for my empirical examples I will draw on material presented in the following paper:

Kingdon, G. (1996). 'The quality and efficiency of private and public education: a case-study of urban India', Oxford Bulletin of Economics and Statistics 58: 57-81.

2. Binary Response

Whenever the variable that we want to model is binary, it is natural to think in terms of probabilities: What is the probability that an individual with such and such characteristics owns a car? If some variable x changes by one unit, what is the effect on the probability of owning a car?

When the dependent variable y is binary, it is typically equal to one for all observations in the data for which the event of interest has happened ('success') and zero for the remaining observations ('failure').

Provided we have a random sample, the sample mean of this binary variable is an unbiased estimate of the unconditional probability that the event happens. That is, letting y denote our binary dependent variable, we have

Pr(y = 1) = E(y) = (Σ_i y_i) / N,

where N is the number of observations in the sample. Estimating the unconditional probability is trivial, but usually not the most interesting thing we can do with the data. Suppose we want to analyze what factors determine changes in the probability that y equals one. Can we use the classical linear regression framework to this end?

3. The Regression Approach

Consider the linear regression model

y = β_1 + β_2 x_2 + ... + β_K x_K + u = xβ + u,

where β is a K × 1 vector of parameters, x is a N × K matrix of explanatory variables, and u is a residual. Assume that the residual is uncorrelated with the regressors, so that endogeneity is not a problem.
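As a concrete illustration of both estimators above, here is a minimal numpy sketch on simulated data (the data-generating coefficients 0.2 and 0.5 are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulate a binary outcome whose success probability is linear in x2
# (an invented data-generating process, purely for illustration).
x2 = rng.uniform(0, 1, n)
y = (rng.uniform(0, 1, n) < 0.2 + 0.5 * x2).astype(float)

# Sample mean: unbiased estimate of the unconditional Pr(y = 1) = E(y).
p_hat = y.mean()

# OLS of y on a constant and x2 estimates beta in y = x @ beta + u.
X = np.column_stack([np.ones(n), x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(p_hat)      # close to the population mean 0.2 + 0.5 * E[x2] = 0.45
print(beta_hat)   # close to [0.2, 0.5]
```

In this linear setup the estimated slope is also the estimated partial effect of x2 on the response probability, a point developed below.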

This allows us to use OLS to estimate the parameters of interest. To interpret the results, note that if we take expectations on both sides of the equation above we obtain

E(y|x; β) = xβ.

Now, just like the unconditional probability that y equals one is equal to the unconditional expected value of y, E(y) = Pr(y = 1), the conditional probability that y equals one is equal to the conditional expected value of y:

Pr(y = 1|x) = E(y|x; β) = xβ.

Because probabilities must sum to one, it must also be that

Pr(y = 0|x) = 1 − xβ.

This equation defines a binary response model. In this particular model the probability of success, Pr(y = 1|x), is a linear function of the explanatory variables in the vector x. This is why using OLS with a binary dependent variable is called the linear probability model (LPM).

Notice that in the LPM the parameter β_j measures the change in the probability of 'success' resulting from a change in the variable x_j, holding other factors fixed:

ΔPr(y = 1|x) = β_j Δx_j.

This can be interpreted as a partial effect on the probability of 'success'.

EXAMPLE: Modelling the probability of going to a private, unaided school (PUA) in India. See the appendix, Table 1a.

Shortcomings of the Linear Probability Model

Clearly the LPM is straightforward to estimate; however, there are some important shortcomings. One undesirable property of the LPM is that, if we plug certain combinations of values for the independent variables into the model, we can get predictions either less than zero or greater than one. Of course a probability by definition falls within the (0,1) interval, so predictions outside this range are hard to interpret.
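The out-of-range problem is easy to reproduce: with a sufficiently spread-out regressor, OLS fitted values escape the unit interval. A simulated sketch (the data-generating process is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# A wide regressor: the true success probability is a clipped linear function.
x = rng.normal(0, 2, n)
y = (rng.uniform(0, 1, n) < np.clip(0.5 + 0.3 * x, 0, 1)).astype(float)

# Fit the LPM by OLS and inspect the fitted "probabilities" x @ beta_hat.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# Count fitted values outside the unit interval.
print((fitted > 1).sum(), (fitted < 0).sum())
```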

This is not an unusual result; for instance, based on the above LPM results, there are 61 observations for which the predicted probability is larger than one and 81 observations for which the predicted probability is less than zero. That is, 16 per cent of the predictions fall outside the (0,1) interval in this application (see Figure 1 in the appendix, and the summary statistics for the predictions reported below the table). As Angrist and Pischke (2009) put it: "...[linear regression] may generate fitted values outside the LDV boundaries. This fact bothers some researchers and has generated a lot of bad press for the linear probability model."

A related problem is that, conceptually, it does not make sense to say that a probability is linearly related to a continuous independent variable for all possible values.

If it were, then continually increasing this explanatory variable would eventually drive P(y = 1|x) above one or below zero. (The data for this example are taken from the study by Kingdon, 1996.) For example, the model above predicts that an increase in parental wealth by 1 unit increases the probability of going to a PUA school by about 1 percentage point. This may seem reasonable for families with average levels of wealth; however, in very rich or very poor families the wealth effect is probably smaller. In fact, when taken to the extreme, our model implies that a hundred-fold increase in wealth raises the probability of going to a PUA school by more than 1 (i.e. by more than 100 percentage points), which, of course, is impossible (the wealth variable ranges up to 82 in the data, so such a comparison is not unrealistic).

A third problem with the LPM - arguably less serious than those above - is that the residual is heteroskedastic by definition.

Why is this? Because y takes the value of 1 or 0, the residual in the equation above can take only two values, conditional on x: 1 − xβ and −xβ. Further, the respective probabilities of these events are xβ and 1 − xβ. Hence,

var(u|x) = Pr(y = 1|x)[1 − xβ]^2 + Pr(y = 0|x)[−xβ]^2
         = xβ[1 − xβ]^2 + (1 − xβ)[−xβ]^2
         = xβ[1 − xβ],

which clearly varies with the explanatory variables x. The OLS estimator is still unbiased, but the conventional formula for estimating the standard errors, and hence the t-values, will be wrong. The easiest way of solving this problem is to obtain estimates of the standard errors that are robust to heteroskedasticity.

EXAMPLE continued: see the appendix - LPM with robust standard errors, Table 1b; compare to the LPM with non-robust standard errors (Table 1a).

A fourth and related problem is that, because the residual can only take two values, it cannot be normally distributed.
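Heteroskedasticity-robust standard errors can be computed directly with White's 'sandwich' formula; the sketch below implements the HC0 variant (no small-sample correction) in numpy on simulated LPM data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Simulated LPM data: var(u | x) = xb(1 - xb) varies with x2 by construction.
x2 = rng.uniform(0, 1, n)
y = (rng.uniform(0, 1, n) < 0.2 + 0.5 * x2).astype(float)

X = np.column_stack([np.ones(n), x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)

# Conventional OLS standard errors (invalid under heteroskedasticity):
se_ols = np.sqrt(np.diag(XtX_inv) * (u @ u) / (n - X.shape[1]))

# White's heteroskedasticity-robust (HC0) sandwich estimator:
meat = X.T @ (X * (u**2)[:, None])
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_ols)
print(se_hc0)
```

Statistical packages apply the same idea, usually with a degrees-of-freedom or leverage correction (HC1-HC3).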

The problem of non-normality does not mean that the OLS point estimates are biased, but it does mean that inference in small samples cannot be based on the usual suite of normality-based distributions, such as the t distribution.

Summing up: the LPM can be useful as a first step in the analysis of binary choices, but awkward issues arise if we want to argue that we are modelling a probability. As we shall see next, probit and logit solve these particular problems. Nowadays, these are just as easy to implement as the LPM/OLS - but they are less straightforward to interpret. However, the LPM remains a reasonably popular modelling framework because certain econometric problems are easier to address within the LPM framework than with probits and logits. If, for whatever reason, we use the LPM, it is important to recognize that it tends to give better estimates of the partial effects on the response probability near the centre of the distribution of x than at extreme values (i.e. response probabilities close to 0 and 1).
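To preview why logit avoids the boundary problem, here is a minimal logit estimator fitted by Newton-Raphson in plain numpy (the true coefficients -1 and 2 are invented; the logistic function forces every fitted probability into (0, 1)):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Simulated logit data: Pr(y = 1 | x) = 1 / (1 + exp(-(-1 + 2 * x2))).
x2 = rng.normal(0, 1, n)
p_true = 1.0 / (1.0 + np.exp(1.0 - 2.0 * x2))
y = (rng.uniform(0, 1, n) < p_true).astype(float)

# Newton-Raphson on the logit log-likelihood (concave, so it converges fast).
X = np.column_stack([np.ones(n), x2])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)                      # variance weights at current fit
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))

p_fit = 1.0 / (1.0 + np.exp(-X @ beta))
print(beta)                      # close to the true values [-1, 2]
print(p_fit.min(), p_fit.max())  # strictly inside (0, 1)
```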

