Lecture 14: Interpreting logistic regression models
Sandy Eckel (seckel@jhsph.edu), 15 May 2008

Slide 2: Logistic regression
- The framework and ideas of logistic regression are similar to linear regression.
- We still have a systematic and a probabilistic part to any model.
- Coefficients have a new interpretation, based on log(odds) and log(odds ratios).

Slide 3: Recall from last time: the logit function
- In logistic regression, we are always modelling the outcome $\log\left(\frac{p}{1-p}\right)$.
- We define the function $\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$.
- We often use the name "logit" for convenience.
- In logistic regression, we have the logit on the left-hand side of the equation.

Slide 4: Example: Public health graduate students
- 323 graduate students in introductory biostatistics took a health survey.
- Current smoking status was assessed, which we will predict with gender.
- Associating demographics with smoking is vital to planning public health programs.
- Information was also collected on age, exercise, and history of smoking: potential confounders of the association between gender and current smoking.
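The logit function recalled on slide 3 can be sketched in a few lines of Python. This is only an illustration: the function names `logit` and `inv_logit` are chosen here and do not come from the lecture.

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Inverse of logit: maps a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# A probability of 0.5 corresponds to odds of 1, so the log odds are 0.
print(logit(0.5))

# Round trip: probability -> log odds -> probability.
print(inv_logit(logit(0.2)))
```

Because the logit is on the left-hand side of the model, `inv_logit` is what turns a fitted linear predictor back into a probability.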

First, we will focus only on the association between gender and current smoking status.

Slide 5: Coding our two variables for the first example
- Outcome: smoking = 1 for current smokers, 0 for current nonsmokers
- Primary predictor: gender = 1 for men, 0 for women

Slide 6: Recall: an analogous linear regression model
In linear regression, if we had only one binary X like gender, we would be predicting two means:
- $\beta_0$: the mean outcome when X=0
- $\beta_0 + \beta_1$: the mean outcome when X=1
- $\beta_1$: the difference in mean outcome when X=1 vs. when X=0
$E(Y) = \beta_0 + \beta_1(\text{Gender})$

Slide 7: Logistic regression model and results
Logit estimates    Number of obs = 323    LR chi2(1) =    Prob > chi2 =    Log likelihood =    Pseudo R2 =
            | Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
gender      | .967966  .4547931  .0765879
(Intercept) | .3235656
(Only the values shown survive in this transcription of the Stata output.)
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Gender})$, where gender = 1 for men, 0 for women

Slide 8: Logistic regression: gender-specific results
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Gender})$
- For women, gender=0: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(0) = \beta_0$
- For men, gender=1: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$
- $\beta_1$ is the difference between men and women.
- $\beta_1$ is the change in log odds comparing men to women.

Slide 9: Logistic regression interpretation 1: log(odds) scale
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Gender})$, where gender = 1 for men, 0 for women
- $\beta_0$: the log odds of smoking for women
- $\beta_0 + \beta_1$: the log odds of smoking for men
- $\beta_1$: the difference in the log odds of smoking for men compared to women

Slide 10: What if we wanted to get the odds interpretation, not the log odds?
- We can start to "untransform" the equations.
- Recall: for women, X=0: $\log(\text{odds}) = \beta_0 + \beta_1(0) = \beta_0$
- For men, X=1: $\log(\text{odds}) = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$
- If $\log(a) = b$, then $\exp(\log(a)) = a = e^{b}$.
- Odds of smoking for men $= e^{\beta_0 + \beta_1}$
- Odds of smoking for women $= e^{\beta_0}$

Slide 11: Logistic regression interpretation 2: odds scale
- $e^{\beta_0}$: the odds of smoking for women (when X=0)
- $e^{\beta_0 + \beta_1}$: the odds of smoking for men (when X=1)
- In the past, we've compared two sets of odds by dividing to find the odds ratio (OR).

Slide 12: Comparing odds
- If we subtract the log odds, mathematically that's equivalent to dividing inside the log: $\log(a) - \log(b) = \log(a/b)$
- So, if $e^{\beta_0 + \beta_1}$ is the odds when X=1, and $e^{\beta_0}$ is the odds when X=0, then we want to divide them in order to compare them.
$\text{Odds Ratio} = \frac{\text{odds for men}}{\text{odds for women}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$

Slide 13: Logistic regression interpretation: the odds ratio
The odds of smoking for men are about 2.6 times the odds for women ($e^{\beta_1} = e^{0.967966} \approx 2.63$).
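The odds ratio for gender can be recomputed from the fitted slope. The sketch below assumes that the first two values surviving in the gender row of the Stata output (.967966 and .4547931) are the coefficient and its standard error. The 1.96 multiplier for a Wald 95% interval is the usual normal approximation and is an assumption here, not something stated on the slide.

```python
import math

# Assumed from the gender row of the Stata output: coefficient and standard error.
beta1 = 0.967966
se_beta1 = 0.4547931

# Exponentiating the slope (a log odds ratio) gives the odds ratio, men vs. women.
odds_ratio = math.exp(beta1)

# Wald 95% interval on the log odds scale, then exponentiated to the odds ratio scale.
z = 1.96
log_ci = (beta1 - z * se_beta1, beta1 + z * se_beta1)
or_ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))

print(f"odds ratio = {odds_ratio:.2f}")
print(f"95% CI for log OR = ({log_ci[0]:.4f}, {log_ci[1]:.4f})")
print(f"95% CI for OR = ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
```

Under these assumptions the lower limit on the log odds scale comes out near .0766, which agrees with the .0765879 that survives in the output and supports reading the two numbers as coefficient and standard error.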

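Based on this study, perhaps smoking cessation programs should be targeted toward men.
$\text{Odds Ratio} = \frac{\text{odds for men}}{\text{odds for women}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$

Slide 14: Useful math: ratios of exponentiated terms
We can usually simplify an equation like this:
$\text{Odds Ratio} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{(\beta_0 + \beta_1) - \beta_0} = e^{\beta_1}$, because $\frac{e^{a}}{e^{b}} = e^{a-b}$

Slide 15: Taking a ratio of odds to get the odds ratio
- $e^{\beta_0}$: the odds when X=0
- $e^{\beta_0 + \beta_1}$: the odds when X=1
- $e^{\beta_1} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}}$: the odds ratio comparing the odds when X=1 vs. X=0

Slide 16: Two interpretations of logistic regression slopes
- $\beta_0 + \beta_1$ = log(odds) (for X=1)
- $\beta_1$ = difference in log odds
- $e^{\beta_0 + \beta_1}$ = odds (for X=1)
- $e^{\beta_1}$ = odds ratio
- But we started with P(Y=1). Can we find that?

Slide 17: More useful math: how to get the probability from the odds
- $\text{odds} = \frac{\text{probability}}{1 - \text{probability}}$
- $\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$
- So for X=1: $P = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$

Slide 18: Finding the probability from the log odds
- Find the log odds. For X=0: $\log(\text{odds}) = \beta_0$. For X=1: $\log(\text{odds}) = \beta_0 + \beta_1$.
- Find the odds. For X=0: $\text{odds} = e^{\beta_0}$. For X=1: $\text{odds} = e^{\beta_0 + \beta_1}$.
- Transform odds into probability (next slide).

Slide 19: Finding the probability from the log odds, continued: odds into probability
- $p = \frac{\text{odds}}{1 + \text{odds}}$
- For X=1: $\text{probability} = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
- For X=0: $\text{probability} = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$

Slide 20: We could even go one step further
- For X=1: $p_1 = P(\text{smoke} \mid \text{male}) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
- For X=0: $p_2 = P(\text{smoke} \mid \text{female}) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
- Relative Risk (RR), men vs. women: $\frac{p_1}{p_2} = \frac{e^{\beta_0 + \beta_1} / (1 + e^{\beta_0 + \beta_1})}{e^{\beta_0} / (1 + e^{\beta_0})}$. There is no way to simplify this.

Slide 21: Remember to consider study design
- We can always calculate the relative risk.
- The relative risk is not appropriate for case-control studies.
- Again, this is because the investigators decide the number of cases and controls to study.
- The odds ratio is appropriate for all study designs.

Slide 22: In general
- Logistic regression is for a binary outcome.
- The left side of the equation is the log odds.
- We can transform the equation to find the odds and the probability.
- We can compare two groups using the difference of log odds, the log odds ratio, the odds ratio, or the relative risk.
- (Almost) everything we learned before applies.

Slide 23: Summary: useful math for logistic regression
- For X=1: $\log(\text{odds}) = \beta_0 + \beta_1(1)$
- $\log(a) - \log(b) = \log(a/b)$, so $\log(\text{odds} \mid X=1) - \log(\text{odds} \mid X=0) = \log(\text{OR for X=1 vs. X=0})$
- If $\log(a) = b$, then $\exp(\log(a)) = a = e^{b}$, so the odds for X=1 are $e^{\beta_0 + \beta_1}$
- $\frac{e^{a}}{e^{b}} = e^{a-b}$, so $\frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$
- $\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$, so the probability for X=1 is $\frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
- Also: $e^{ab} = (e^{a})^{b}$, so $e^{\beta_1 + \beta_1} = e^{2\beta_1} = (e^{\beta_1})^{2}$

Slide 24: Another example
- Regular physical examination is an important preventative public health measure.
- We'll study this outcome using the public health graduate student dataset.
- Outcome: no physical exam in the past two years
- Primary predictor: age (centered)
- Secondary predictor and potential confounder: regularly taking a multivitamin

Slide 25: Problem with outcome variable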

- The original physician visit variable was meant to be continuous, but it was collected categorically: time since last physician visit.
- Since it is now categorical and we wish to use it as the outcome for a regression model, we will make it binary and use logistic regression.
- Phys = 1 if over 2 years, 0 if 2 years or less

Length of time since last check-up | Freq.
-----------------------------------+------
Within the past year               | 182
Within the past 1-2 years          | 72
Within the past 2-5 years          | 53
5 or more years                    | 29
-----------------------------------+------
Total                              | 336
(The Percent column did not survive in this transcription.)

Slide 26: Goals
- Predict Phys (no physician visit within the past two years = 1) with centered Age (continuous).
- After adjusting for age, is taking a multivitamin (1 = yes) a statistically significant predictor for not regularly visiting a physician?
- Is taking a multivitamin a confounder for the age-physician visit relationship?
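The recoding of the check-up variable into the binary outcome Phys can be sketched from the frequency table alone. The category labels below are shortened for illustration, and the label "5 or more years" is reconstructed because the leading digit is missing in the transcription.

```python
import math

# Frequencies from the check-up table (time since last physician visit).
freq = {
    "within the past year": 182,
    "1-2 years": 72,
    "2-5 years": 53,
    "5 or more years": 29,  # label reconstructed; the digit is missing in the source
}

# Phys = 1 if the last visit was over 2 years ago, 0 if 2 years or less.
phys_one = {"2-5 years", "5 or more years"}

n_total = sum(freq.values())
n_phys1 = sum(n for label, n in freq.items() if label in phys_one)

p = n_phys1 / n_total       # sample proportion with Phys = 1
odds = p / (1 - p)          # sample odds of Phys = 1
log_odds = math.log(odds)   # the estimate an intercept-only logistic model would give

print(n_total, n_phys1)
print(round(p, 3), round(odds, 3), round(log_odds, 3))
```

The last quantity is the log odds that a logistic model with no predictors would estimate for these 336 respondents. It is not the intercept of Model 1, which also conditions on centered age.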

Slide 27: Results. Model 1: Intercept and Age
Logit estimates    Number of obs = 336    LR chi2(1) =    Prob > chi2 =    Log likelihood =    Pseudo R2 =
            | Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
agec        | .0176509  .0336365
(Intercept) | .1270539
(Only the values shown survive in this transcription of the Stata output.)
Note that agec = age - 30 (centered age).
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30)$

Slide 28: Model 1: Interpretation of coefficients on log odds scale
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30)$
- $\beta_0$: the log odds of not visiting a physician for a 30-year-old
- $\beta_1$: the difference in the log odds of not visiting a physician for a one year increase in age

Slide 29: Model 1: Predictions by age
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30)$
- For a 30-year-old: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(30 - 30) = \beta_0$
- For a 31-year-old: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(31 - 30) = \beta_0 + \beta_1$
- $\beta_1$ is the difference in the log odds associated with a 1 year increase in age.
- How did we get the "difference in log odds" interpretation of $\beta_1$?
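The predictions by age can be checked numerically. The fitted coefficients cannot be read reliably from the transcribed output, so the values of `beta0` and `beta1` below are hypothetical placeholders chosen only to illustrate the arithmetic of a model with age centered at 30.

```python
# Hypothetical coefficients for illustration only; not the fitted values from the lecture.
beta0 = -1.10   # log odds of no physician visit at age 30
beta1 = 0.02    # change in log odds per one year increase in age

def log_odds(age: float, center: float = 30.0) -> float:
    """Linear predictor of Model 1 with age centered at `center`."""
    return beta0 + beta1 * (age - center)

at_30 = log_odds(30)   # the centered term is zero, leaving the intercept
at_31 = log_odds(31)   # intercept plus one slope

print(at_30, at_31, at_31 - at_30)
```

Centering is what makes the intercept interpretable: at age 30 the slope term vanishes, and the difference between any two consecutive ages equals the slope.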

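Slide 30: Model 1: Interpretation of $\beta_1$ (diff log odds = log OR)
- $\log(a) - \log(b) = \log(a/b)$, so $\log(\text{odds} \mid X=31) - \log(\text{odds} \mid X=30) = \log(\text{OR for X=31 vs. X=30})$
- difference of log odds = log odds ratio
- Alternate interpretation for $\beta_1$: the log odds ratio of not visiting a physician associated with a one year increase in age

Slide 31: Model 1: Interpretation of $\beta_1$ (OR = ratio of odds)
Interpretation: log(odds ratio) for one year age difference
- For a 31-year-old: odds of not visiting a physician $= e^{\beta_0 + \beta_1(31 - 30)} = e^{\beta_0 + \beta_1}$
- For a 30-year-old: odds of not visiting a physician $= e^{\beta_0 + \beta_1(30 - 30)} = e^{\beta_0}$
- Odds ratio $= \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$

Slide 32: Model 1: Interpretation of $\beta_1$: odds ratio for one year age difference
- $e^{\beta_0}$ is the odds of not visiting a physician for 30-year-olds.
- $e^{\beta_0 + \beta_1}$ is the odds of not visiting a physician for 31-year-olds.
- $e^{\beta_1}$ is the odds ratio of not visiting a physician corresponding to a one year increase in age.

Slide 33: Model 1: Interpretation of $\beta_1$. What is the OR for a two year age difference?
Interpretation: odds ratio for two year age difference
- For a 32-year-old: odds of not visiting a physician $= e^{\beta_0 + 2\beta_1}$
- For a 30-year-old: odds of not visiting a physician $= e^{\beta_0}$
- Ratio $= \frac{e^{\beta_0 + 2\beta_1}}{e^{\beta_0}} = e^{2\beta_1} = (e^{\beta_1})^{2}$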
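The pattern for a two year difference extends to any number of years. The sketch below uses hypothetical coefficients (`beta0` and `beta1` are placeholders, not the lecture's estimates) and compares the shortcut $e^{k\beta_1}$ with the ratio of the two odds computed directly.

```python
import math

# Hypothetical coefficients for illustration only; not the fitted values from the lecture.
beta0 = -1.10
beta1 = 0.02

def odds(age: float) -> float:
    """Odds of not visiting a physician at a given age (age centered at 30)."""
    return math.exp(beta0 + beta1 * (age - 30))

def odds_ratio(k: float) -> float:
    """Odds ratio for a k year age difference: exp(k * beta1)."""
    return math.exp(k * beta1)

for k in (1, 2, 10):
    direct = odds(30 + k) / odds(30)   # ratio of the two odds
    print(k, round(odds_ratio(k), 4), round(direct, 4))
```

The two columns agree because the intercept cancels in the ratio, and `odds_ratio(k)` equals `odds_ratio(1) ** k`, which is the $(e^{\beta_1})^{2}$ identity generalized to k years.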

Slide 34: Model 1: Interpretation of $\beta_1$. What is the OR for a ten year age difference?
Interpretation: odds ratio for 10 year age difference
- For a 40-year-old: odds of not visiting a physician $= e^{\beta_0 + 10\beta_1}$
- For a 30-year-old: odds of not visiting a physician $= e^{\beta_0}$
- Ratio $= \frac{e^{\beta_0 + 10\beta_1}}{e^{\beta_0}} = e^{10\beta_1} = (e^{\beta_1})^{10}$

Slide 35: Model 1: Interpretation of $\beta_1$. What is the OR for any age difference?
- $e^{\beta_1} = \frac{\text{odds for 31-yr-old}}{\text{odds for 30-yr-old}}$ is the proportional increase of the odds of not visiting a physician corresponding to a one year increase in age.
- $e^{10\beta_1} = (e^{\beta_1})^{10}$ is the proportional increase of the odds of not visiting a physician corresponding to a ten year increase in age.

Slide 36: Model 1: How could we get a Relative Risk? (if it was appropriate based on our study design)
- Probability of not visiting a physician: $p = \frac{\text{odds}}{1 + \text{odds}}$
- For a 40-year-old: $p = \frac{e^{\beta_0 + 10\beta_1}}{1 + e^{\beta_0 + 10\beta_1}}$
- For a 30-year-old: $p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
- The relative risk (RR) is $\frac{e^{\beta_0 + 10\beta_1} / (1 + e^{\beta_0 + 10\beta_1})}{e^{\beta_0} / (1 + e^{\beta_0})}$

Slide 37: Model 1: Probabilities and Relative Risk for 10 year diff
- $\frac{e^{\beta_0}}{1 + e^{\beta_0}}$ is the probability of not visiting a physician for 30-year-olds.
- $\frac{e^{\beta_0 + 10\beta_1}}{1 + e^{\beta_0 + 10\beta_1}}$ is the probability of not visiting a physician for 40-year-olds.
- $\frac{e^{\beta_0 + 10\beta_1} / (1 + e^{\beta_0 + 10\beta_1})}{e^{\beta_0} / (1 + e^{\beta_0})}$ is the relative risk of not visiting a physician for 40-year-olds vs. 30-year-olds.

Slide 38: Remember those Goals?
- Predict Phys (no physician visit within the past two years = 1) with Age (continuous).
- After adjusting for age, is taking a multivitamin (1 = yes) a statistically significant predictor for not regularly visiting a physician?
- Is taking a multivitamin a confounder for the age-physician visit relationship?

Slide 39: Nested models
Adding a single new variable to the model:
- Model 1: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30)$
- Model 2: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1(\text{Age} - 30) + \beta_2(\text{Multivitamin})$

Slide 40: Logistic regression: comparing nested models that differ by one variable
- Compare models with a p-value or CI.
- What method is this? The Wald test, a test that applies the CLT, like the Z test comparing proportions in a 2x2 table and the $\chi^2$ test for independence in a 2x2 table. It is analogous to the t test for linear regression.
- H0: the new variable is not needed. Or, equivalently, H0: $\beta_{\text{new}} = 0$ in the population.

Slide 41:
Logit estimates    Number of obs = 317    LR chi2(2) =    Prob > chi2 =    Log likelihood =    Pseudo R2 =
            | Coef.