
Lecture 20 - Logistic Regression - Duke University


Transcription of Lecture 20 - Logistic Regression - Duke University

Statistics 102, Colin Rundel
April 15, 2013

Outline:
1 Background
2 GLMs
3 Logistic Regression
4 Additional Example

Background - Regression so far...

At this point we have covered:
- Simple linear regression: the relationship between a numerical response and a numerical or categorical predictor
- Multiple regression: the relationship between a numerical response and multiple numerical and/or categorical predictors

What we haven't seen is what to do when the predictors are weird (nonlinear, complicated dependence structure, etc.) or when the response is weird (categorical, count data, etc.)

Background - Recap of what you should know how to do...

- Model parameter interpretation
- Hypothesis tests for slope and intercept parameters
- Hypothesis tests for all regression parameters
- Confidence intervals for regression parameters
- Confidence and prediction intervals for predicted means and values (SLR only)
- Model diagnostics, residual plots, outliers
- R^2, adjusted R^2
- Model selection (MLR only)
- Simple transformations

Background - Odds

Odds are another way of quantifying the probability of an event, commonly used in gambling (and logistic regression).

For some event E,

    odds(E) = P(E) / P(E^c) = P(E) / (1 - P(E))

Similarly, if we are told the odds of E are x to y, then

    odds(E) = x / y = (x / (x + y)) / (y / (x + y))

which implies

    P(E) = x / (x + y),    P(E^c) = y / (x + y)

GLMs - Example - Donner Party

In 1846 the Donner and Reed families left Springfield, Illinois, for California by covered wagon. In July, the Donner Party, as it became known, reached Fort Bridger, Wyoming. There its leaders decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed by a difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when the region was hit by heavy snows in late October.
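The odds/probability identities above are easy to check numerically. The lecture's code is in R; the sketch below uses Python purely for illustration, and the helper names are mine:

```python
def odds(p):
    # odds(E) = P(E) / (1 - P(E))
    return p / (1 - p)

def prob_from_odds(x, y):
    # if the odds of E are "x to y", then P(E) = x / (x + y)
    return x / (x + y)

p = prob_from_odds(9, 1)   # odds of 9 to 1
print(p)                   # 0.9
print(odds(p))             # ~9.0, recovering the original odds
```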

By the time the last survivor was rescued on April 21, 1847, 40 of the 87 members had died from famine and exposure to extreme cold.

From Ramsey and Schafer (2002), The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed.)

Example - Donner Party - EDA

Status vs. Gender:

               Male   Female
    Died         20        5
    Survived     10       10

Status vs. Age:

    [Side-by-side boxplots of Age for the Died and Survived groups; ages range from roughly 20 to 60]

Example - Donner Party - ???

It seems clear that both age and gender have an effect on someone's survival; how do we come up with a model that will let us explore this relationship?
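From the counts in the Status vs. Gender table, the raw survival proportions fall out directly. A quick Python sketch (illustrative only, not part of the lecture):

```python
# Donner Party counts from the EDA table above
counts = {
    ("Male", "Died"): 20, ("Male", "Survived"): 10,
    ("Female", "Died"): 5, ("Female", "Survived"): 10,
}

def survival_rate(gender):
    survived = counts[(gender, "Survived")]
    died = counts[(gender, "Died")]
    return survived / (survived + died)

print(survival_rate("Male"))    # 10/30, about 0.33
print(survival_rate("Female"))  # 10/15, about 0.67
```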

Even if we set Died to 0 and Survived to 1, this isn't something we can transform our way out of; we need something else.

One way to think about the problem: we can treat Survived and Died as successes and failures arising from a binomial distribution, where the probability of a success is given by a transformation of a linear model of the predictors.

GLMs - Generalized linear models

It turns out that this is a very general way of addressing this type of problem in regression, and the resulting models are called generalized linear models (GLMs). Logistic regression is just one example of this type of model.

All generalized linear models have the following three characteristics:
1 A probability distribution describing the outcome variable
2 A linear model

    η = β0 + β1 x1 + ... + βn xn

3 A link function that relates the linear model to the parameter of the outcome distribution

    g(p) = η   or   p = g^{-1}(η)
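The three pieces above compose mechanically: a linear predictor is mapped through the inverse link to the distribution's parameter. A minimal Python sketch of that composition for a binary outcome (the function names and coefficient values are my own, not the lecture's):

```python
import math

def linear_predictor(betas, xs):
    # eta = beta0 + beta1*x1 + ... + betan*xn
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def inverse_link(eta):
    # logistic inverse link: p = g^{-1}(eta) = 1 / (1 + exp(-eta))
    return 1 / (1 + math.exp(-eta))

# made-up coefficients, just to show the pipeline eta -> p
eta = linear_predictor([0.5, -0.2], [1.0])
p = inverse_link(eta)
print(eta, p)   # p is a valid success probability in (0, 1)
```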

Logistic Regression

Logistic regression is a GLM used to model a binary categorical variable using numerical and categorical predictors.

We assume a binomial distribution produced the outcome variable, and we therefore want to model p, the probability of success for a given set of predictors.

To finish specifying the logistic model we just need to establish a reasonable link function that connects η to p.

There are a variety of options, but the most commonly used is the logit function:

    logit(p) = log( p / (1 - p) ),   for 0 ≤ p ≤ 1

Properties of the Logit

The logit function takes a value between 0 and 1 and maps it to a value between -∞ and +∞.
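A small numeric illustration of the logit's behavior, including the inverse mapping covered on the next slide (Python for illustration; the helper names are mine):

```python
import math

def logit(p):
    # logit(p) = log(p / (1 - p)); finite only for 0 < p < 1
    return math.log(p / (1 - p))

def inv_logit(x):
    # maps any real x back into (0, 1)
    return 1 / (1 + math.exp(-x))

print(logit(0.5))              # 0.0: even odds
print(logit(0.9))              # ~2.197: log odds of 9 to 1
print(inv_logit(logit(0.25)))  # round-trips back to ~0.25
```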

Inverse logit (logistic) function:

    g^{-1}(x) = exp(x) / (1 + exp(x)) = 1 / (1 + exp(-x))

The inverse logit function takes a value between -∞ and +∞ and maps it to a value between 0 and 1.

This formulation also has some use when it comes to interpreting the model, as the logit can be interpreted as the log odds of a success; more on this later.

The Logistic Regression model

The three GLM criteria give us:

    y_i ~ Binom(p_i)
    η = β0 + β1 x1 + ... + βn xn
    logit(p) = η

From which we arrive at

    p_i = exp(β0 + β1 x1,i + ... + βn xn,i) / (1 + exp(β0 + β1 x1,i + ... + βn xn,i))

Example - Donner Party - Model

In R we fit a GLM in the same way as a linear model, except using glm instead of lm, and we must also specify the type of GLM to fit using the family argument:

    summary(glm(Status ~ Age, data = donner, family = binomial))

    ## Call:
    ## glm(formula = Status ~ Age, family = binomial, data = donner)
    ##
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)

    ## (Intercept)      ...        ...     ...        .
    ## Age              ...        ...     ...        *
    ##
    ## Null deviance:     ... on 44 degrees of freedom
    ## Residual deviance: ... on 43 degrees of freedom
    ## AIC: ...
    ##
    ## Number of Fisher Scoring iterations: 4

Example - Donner Party - Prediction

Model:

    log( p / (1 - p) ) = β0 + β1 × Age

Odds / probability of survival for a newborn (Age = 0):

    log( p / (1 - p) ) = β0
    p / (1 - p) = exp(β0)
    p = exp(β0) / (1 + exp(β0))
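To make the newborn calculation above concrete: the fitted estimates did not survive in this transcription, so the intercept below is a hypothetical stand-in chosen only to show the arithmetic, not the lecture's fitted value (Python for illustration):

```python
import math

beta0 = 1.8   # hypothetical intercept, NOT the fitted value from the lecture

# At Age = 0 the linear predictor reduces to the intercept:
log_odds = beta0                          # log(p / (1 - p)) = beta0
odds_of_survival = math.exp(log_odds)     # p / (1 - p) = exp(beta0)
p = odds_of_survival / (1 + odds_of_survival)

print(odds_of_survival)  # ~6.05
print(p)                 # ~0.86
```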

