Logistic Regression - Carnegie Mellon University

Transcription of Logistic Regression - Carnegie Mellon University

Chapter 12: Logistic Regression

12.1 Modeling Conditional Probabilities

So far, we either looked at estimating the conditional expectations of continuous variables (as in regression), or at estimating distributions. There are many situations, however, where we are interested in input-output relationships, as in regression, but the output variable is discrete rather than continuous. In particular, there are many situations where we have binary outcomes (it snows in Pittsburgh on a given day, or it doesn't; this squirrel carries plague, or it doesn't; this loan will be paid back, or it won't; this person will get heart disease in the next five years, or they won't).

In addition to the binary outcome, we have some input variables, which may or may not be continuous. How could we model and analyze such data?

We could try to come up with a rule which guesses the binary output from the input variables. This is called classification, and is an important topic in statistics and machine learning. However, simply guessing "yes" or "no" is pretty crude, especially if there is no perfect rule. (Why should there be?) Something which takes noise into account, and doesn't just give a binary answer, will often be useful.

In short, we want probabilities, which means we need to fit a stochastic model. What would be nice, in fact, would be to have the conditional distribution of the response Y, given the input variables, Pr(Y | X). This would tell us about how precise our predictions are. If our model says that there's a 51% chance of snow and it doesn't snow, that's better than if it had said there was a 99% chance of snow (though even a 99% chance is not a sure thing). We have seen how to estimate conditional probabilities non-parametrically, and could do this using the kernels for discrete variables from lecture 6.

While there are a lot of merits to this approach, it does involve coming up with a model for the joint distribution of the outputs Y and inputs X, which can be quite time-consuming. Let's pick one of the classes and call it "1" and the other "0". (It doesn't matter which is which.) Then Y becomes an indicator variable, and you can convince yourself that Pr(Y = 1) = E[Y]. Similarly, Pr(Y = 1 | X = x) = E[Y | X = x]. (In a phrase, "conditional probability is the conditional expectation of the indicator".) This helps us, because by this point we know all about estimating conditional expectations.
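As a quick sanity check on the identity Pr(Y = 1) = E[Y] for an indicator variable, here is a minimal simulation sketch; the success probability 0.3, the sample size, and the use of NumPy are illustrative assumptions, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

p_true = 0.3                               # hypothetical success probability
y = rng.binomial(1, p_true, size=100_000)  # indicator variable: 1 with probability p_true

# For a 0/1 indicator, the sample mean estimates E[Y], which equals Pr(Y = 1).
print(y.mean())                            # close to 0.3
```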

The most straightforward thing for us to do at this point would be to pick out our favorite smoother and estimate the regression function for the indicator variable; this will be an estimate of the conditional probability function. There are two reasons not to just plunge ahead with that idea. One is that probabilities must be between 0 and 1, but our smoothers will not necessarily respect that, even if all the observed y_i they get are either 0 or 1. The other is that we might be better off making more use of the fact that we are trying to estimate probabilities, by more explicitly modeling the probability.
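The first of these objections is easy to see concretely. The sketch below fits a straight line, the simplest such fit, to simulated 0/1 responses and gets fitted "probabilities" outside [0, 1]; the data-generating process and the choice of a linear fit are my own illustrative assumptions, since the notes do not commit to a particular smoother here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binary data whose true success probability increases with x.
x = rng.uniform(-4, 4, size=2000)
p_true = 1 / (1 + np.exp(-x))
y = rng.binomial(1, p_true)

# Treat the 0/1 indicator like any other response and fit a straight line to it.
slope, intercept = np.polyfit(x, y, deg=1)

grid = np.array([-4.0, 0.0, 4.0])
print(intercept + slope * grid)   # roughly [-0.1, 0.5, 1.1]: escapes [0, 1] at the ends
```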

Assume that Pr(Y = 1 | X = x) = p(x; θ) for some function p parameterized by θ, and further assume that observations are independent of each other. Then the (conditional) likelihood function is

\[
\prod_{i=1}^{n} \Pr(Y = y_i \mid X = x_i) = \prod_{i=1}^{n} p(x_i; \theta)^{y_i} \left(1 - p(x_i; \theta)\right)^{1 - y_i}
\]

Recall that in a sequence of Bernoulli trials y_1, ..., y_n, where there is a constant probability of success p, the likelihood is

\[
\prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i}
\]

As you learned in intro. stats, this likelihood is maximized when p = p̂ = (1/n) Σ_i y_i, the observed frequency of successes. If each trial had its own success probability p_i, this likelihood becomes

\[
\prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
\]

Without some constraints, estimating the inhomogeneous Bernoulli model by maximum likelihood doesn't work; we'd get p̂_i = 1 when y_i = 1, p̂_i = 0 when y_i = 0, and learn nothing.
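As a quick numerical check of the constant-p claim above, the following sketch evaluates the Bernoulli log-likelihood on a grid and confirms that it peaks at the sample mean; the sample size, seed, and grid search are illustrative choices, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.3, size=500)        # hypothetical Bernoulli sample

def bernoulli_loglik(p, y):
    """Log of prod_i p^{y_i} (1 - p)^{1 - y_i} for a single, constant p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

grid = np.linspace(0.001, 0.999, 999)
loglik = np.array([bernoulli_loglik(p, y) for p in grid])
print(grid[loglik.argmax()], y.mean())    # the two agree, up to the grid's resolution

# By contrast, the inhomogeneous model with a free p_i per observation just memorizes
# the data: setting p_i = y_i makes every factor p_i^{y_i} (1 - p_i)^{1 - y_i} equal
# to 1, so the likelihood hits its ceiling of 1 and nothing generalizes.
```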

If on the other hand we assume that the p_i aren't just arbitrary numbers, but are linked together, those constraints give non-trivial parameter estimates, and let us generalize. In the kind of model we are talking about, the constraint p_i = p(x_i; θ) tells us that p_i must be the same whenever x_i is the same, and if p is a continuous function, then similar values of x_i must lead to similar values of p_i. Assuming p is known (up to parameters), the likelihood is a function of θ, and we can estimate θ by maximizing the likelihood.

This lecture will be about this approach.

12.2 Logistic Regression

To sum up: we have a binary output variable Y, and we want to model the conditional probability Pr(Y = 1 | X = x) as a function of x; any unknown parameters in the function are to be estimated by maximum likelihood. By now, it will not surprise you to learn that statisticians have approached this problem by asking themselves "how can we use linear regression to solve this?"

1. The most obvious idea is to let p(x) be a linear function of x. Every increment of a component of x would add or subtract so much to the probability. The conceptual problem here is that p must be between 0 and 1, and linear functions are unbounded. Moreover, in many situations we empirically see "diminishing returns": changing p by the same amount requires a bigger change in x when p is already large (or small) than when p is close to 1/2. Linear models can't do this.

2. The next most obvious idea is to let log p(x) be a linear function of x, so that changing an input variable multiplies the probability by a fixed amount. The problem is that logarithms are unbounded in only one direction, and linear functions are not.

3. Finally, the easiest modification of log p which has an unbounded range is the logistic (or logit) transformation, log[p / (1 - p)]. We can make this a linear function of x without fear of nonsensical results. (Of course the results could still happen to be wrong, but they're not guaranteed to be wrong.)

This last alternative is logistic regression. Formally, the logistic regression model is that

\[
\log \frac{p(x)}{1 - p(x)} = \beta_0 + x \cdot \beta
\]

Solving for p, this gives

\[
p(x; \beta_0, \beta) = \frac{e^{\beta_0 + x \cdot \beta}}{1 + e^{\beta_0 + x \cdot \beta}} = \frac{1}{1 + e^{-(\beta_0 + x \cdot \beta)}}
\]

Notice that the overall specification is a lot easier to grasp in terms of the transformed probability than in terms of the untransformed probability.

To minimize the mis-classification rate, we should predict Y = 1 when p ≥ 0.5 and Y = 0 when p < 0.5. This means guessing 1 whenever β_0 + x·β is non-negative, and 0 otherwise.
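To make the model and the decision rule concrete, here is a sketch that fits β_0 and β by maximizing the conditional log-likelihood with plain gradient ascent on simulated data. The data-generating values, step size, and iteration count are illustrative assumptions; the notes only say the parameters are estimated by maximum likelihood, not how the maximization is carried out.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical single-feature data generated from a known logistic model.
n = 5000
x = rng.normal(size=n)
beta0_true, beta_true = -0.5, 2.0
p_true = 1 / (1 + np.exp(-(beta0_true + beta_true * x)))
y = rng.binomial(1, p_true)

# Design matrix with an intercept column; coef holds (beta_0, beta).
X = np.column_stack([np.ones(n), x])
coef = np.zeros(2)

# Maximize the conditional log-likelihood sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
# by gradient ascent; its gradient with respect to coef is X^T (y - p).
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ coef))
    coef += 0.5 * (X.T @ (y - p)) / n

print(coef)                        # roughly [-0.5, 2.0]

# Predict Y = 1 exactly when beta_0 + x * beta is non-negative (i.e. p >= 0.5).
y_hat = (X @ coef >= 0).astype(int)
print(np.mean(y_hat == y))         # in-sample accuracy of the plug-in classifier
```

Because the logit is linear in x, the boundary p = 0.5 corresponds exactly to β_0 + x·β = 0, which is why the rule that minimizes the mis-classification rate reduces to checking the sign of the linear predictor.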

