
11 Logistic Regression - Interpreting Parameters



Let us expand on the material in the last section, trying to make sure we understand the logistic regression model and can interpret Stata output. Consider first the case of a single binary predictor, where

$$x = \begin{cases} 1 & \text{if exposed to factor} \\ 0 & \text{if not,} \end{cases} \qquad y = \begin{cases} 1 & \text{if develops disease} \\ 0 & \text{if does not.} \end{cases}$$

Results can be summarized in a simple 2 × 2 contingency table as

                  Exposure
    Disease      1      0
    1 (+)        a      b
    0 (-)        c      d

where $\text{OR} = \dfrac{ad}{bc}$ (why?) and we interpret OR > 1 as indicating a risk factor, and OR < 1 as indicating a protective factor.

Recall the logistic model: $p(x)$ is the probability of disease for a given value of $x$, and

$$\operatorname{logit}(p(x)) = \log\left(\frac{p(x)}{1-p(x)}\right) = \alpha + \beta x.$$

Then

    for x = 0 (unexposed), $\operatorname{logit}(p(x)) = \operatorname{logit}(p(0)) = \alpha + \beta(0) = \alpha$
    for x = 1 (exposed), $\operatorname{logit}(p(x)) = \operatorname{logit}(p(1)) = \alpha + \beta(1) = \alpha + \beta$

Also,

    odds of disease among unexposed: $p(0)/(1-p(0))$
    odds of disease among exposed: $p(1)/(1-p(1))$

Now

$$\text{OR} = \frac{\text{odds of disease among exposed}}{\text{odds of disease among unexposed}} = \frac{p(1)/(1-p(1))}{p(0)/(1-p(0))}$$

and

$$\beta = \operatorname{logit}(p(1)) - \operatorname{logit}(p(0)) = \log\left(\frac{p(1)}{1-p(1)}\right) - \log\left(\frac{p(0)}{1-p(0)}\right) = \log\left(\frac{p(1)/(1-p(1))}{p(0)/(1-p(0))}\right) = \log(\text{OR}).$$

The regression coefficient in the population model is the log(OR), hence the OR is obtained by exponentiating $\beta$:

$$e^{\beta} = e^{\log(\text{OR})} = \text{OR}.$$

Remark: If we fit this simple logistic model to a 2 × 2 table, the estimated unadjusted OR (above) and the regression coefficient for x have the same relationship.

Example: Leukemia Survival Data (Section 10 p. 108). We can find the counts in the following table from the tabulate live iag command:

    Surv 1 yr?    Ag+ (x=1)    Ag- (x=0)
    Yes               9            2
    No                8           14

and (unadjusted) $\text{OR} = \dfrac{9(14)}{2(8)} = 7.875$.

Before proceeding with the Stata output, let me comment about coding of the outcome variable. Some packages are less rigid, but Stata enforces the (reasonable) convention that 0 indicates a negative outcome and all other values indicate a positive outcome. If you try to code something like 2 for survive a year or more and 1 for not survive a year or more, Stata coaches you with the error message

    outcome does not vary; remember:
        0 = negative outcome,
        all other nonmissing values = positive outcome

This data set uses 0 and 1 codes for the live variable; 0 and -100 would work, but not 1 and 2. Let's look at both regression estimates and direct estimates of unadjusted odds ratios from Stata.

    . logit live iag
    Logit estimates                         Number of obs = 33
                                            LR chi2(1)    = …
                                            Prob > chi2   = …
    Log likelihood = …                      Pseudo R2     = …
    ---------------------------------------------------------------------------
            live |   Coef.   Std. Err.     z    P>|z|    [95% Conf. Interval]
    -------------+-------------------------------------------------------------
             iag |     …     .8986321      …      …      .3024066       …
           _cons |     …     .7559289      …      …         …           …
    ---------------------------------------------------------------------------

    . logistic live iag
    Logistic regression                     Number of obs = 33
                                            LR chi2(1)    = …
                                            Prob > chi2   = …
    Log likelihood = …                      Pseudo R2     = …
    ---------------------------------------------------------------------------
            live | Odds Ratio   Std. Err.   z    P>|z|    [95% Conf. Interval]
    -------------+-------------------------------------------------------------
             iag |     …           …        …      …         …           …
    ---------------------------------------------------------------------------

Stata has fit

$$\operatorname{logit}(\hat{p}(x)) = \log\left(\frac{\hat{p}(x)}{1-\hat{p}(x)}\right) = \hat{\alpha} + \hat{\beta} x = \ldots + \ldots\, x,$$

with $\text{OR} = e^{\hat{\beta}} = 7.875$. This is identical to the hand calculation above. A 95% confidence interval for $\beta$ (the IAG coefficient) is $.302 \le \beta \le \ldots$ This logit scale is where the real work and theory is done. To get a confidence interval for the odds ratio, just exponentiate everything:

$$e^{.302} \le \text{OR} \le e^{\ldots}$$

What do you conclude?
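
As a cross-check on the hand calculation, the following minimal Python sketch recomputes the unadjusted odds ratio from the leukemia 2 × 2 counts, takes its log to get the slope on the logit scale, and forms an interval on that scale before exponentiating. The standard error uses the usual large-sample formula sqrt(1/a + 1/b + 1/c + 1/d); that formula, the 1.96 multiplier, and all variable names are assumptions made for this illustration rather than something derived in these notes.

```python
import math

# Leukemia counts from the table above (rows: survived yes/no; columns: Ag+/Ag-).
a, b = 9, 2    # survived at least one year: Ag+, Ag-
c, d = 8, 14   # did not survive one year:   Ag+, Ag-

odds_ratio = (a * d) / (b * c)     # unadjusted OR = ad / bc
beta_hat = math.log(odds_ratio)    # slope on the logit scale = log(OR)
alpha_hat = math.log(b / d)        # intercept = log odds of survival when x = 0

# Assumed large-sample standard error of log(OR) for a 2 x 2 table.
se_beta = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96  # approximate 97.5th percentile of the standard normal
ci_beta = (beta_hat - z * se_beta, beta_hat + z * se_beta)
ci_or = (math.exp(ci_beta[0]), math.exp(ci_beta[1]))

print(f"OR = {odds_ratio:.3f}, log(OR) = {beta_hat:.4f}, SE = {se_beta:.7f}")
print(f"alpha_hat = {alpha_hat:.4f}")
print(f"95% CI for beta: ({ci_beta[0]:.3f}, {ci_beta[1]:.3f})")
print(f"95% CI for OR:   ({ci_or[0]:.3f}, {ci_or[1]:.3f})")
```

The standard error computed this way agrees with the .8986321 shown in the Stata output, and the lower limit on the logit scale agrees with .3024066 to about three decimals (the small difference comes from rounding the multiplier to 1.96).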

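A More Complex Model

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2,$$

where $x_1$ is binary (as before) and $x_2$ is a continuous predictor. The regression coefficients are adjusted log-odds ratios.

To interpret $\beta_1$, fix the value of $x_2$:

    For $x_1 = 0$:
        log odds of disease $= \alpha + \beta_1(0) + \beta_2 x_2 = \alpha + \beta_2 x_2$
        odds of disease $= e^{\alpha + \beta_2 x_2}$
    For $x_1 = 1$:
        log odds of disease $= \alpha + \beta_1(1) + \beta_2 x_2 = \alpha + \beta_1 + \beta_2 x_2$
        odds of disease $= e^{\alpha + \beta_1 + \beta_2 x_2}$

Thus the odds ratio (going from $x_1 = 0$ to $x_1 = 1$) is

$$\text{OR} = \frac{\text{odds when } x_1 = 1}{\text{odds when } x_1 = 0} = \frac{e^{\alpha + \beta_1 + \beta_2 x_2}}{e^{\alpha + \beta_2 x_2}} = e^{\beta_1}$$

(remember $e^{a+b} = e^a e^b$, so $e^{a+b}/e^a = e^b$), i.e. $\beta_1 = \log(\text{OR})$. Hence $e^{\beta_1}$ is the relative increase in the odds of disease, going from $x_1 = 0$ to $x_1 = 1$ holding $x_2$ fixed (or adjusting for $x_2$).

To interpret $\beta_2$, fix the value of $x_1$:

    For $x_2 = k$ (any given value $k$):
        log odds of disease $= \alpha + \beta_1 x_1 + \beta_2 k$
        odds of disease $= e^{\alpha + \beta_1 x_1 + \beta_2 k}$
    For $x_2 = k + 1$:
        log odds of disease $= \alpha + \beta_1 x_1 + \beta_2 (k+1) = \alpha + \beta_1 x_1 + \beta_2 k + \beta_2$
        odds of disease $= e^{\alpha + \beta_1 x_1 + \beta_2 k + \beta_2}$

Thus the odds ratio (going from $x_2 = k$ to $x_2 = k + 1$) is

$$\text{OR} = \frac{\text{odds when } x_2 = k+1}{\text{odds when } x_2 = k} = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 k + \beta_2}}{e^{\alpha + \beta_1 x_1 + \beta_2 k}} = e^{\beta_2},$$

i.e. $\beta_2 = \log(\text{OR})$.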
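
The cancellation in the derivation above can be checked numerically. The sketch below uses made-up coefficient values (alpha, beta1, and beta2 are hypothetical, chosen only for illustration) and compares the odds ratio for x1 with e^beta1 at several values of x2, and the odds ratio for a one-unit increase in x2 with e^beta2 at several starting values k.

```python
import math

# Hypothetical coefficients, for illustration only (not estimates from any data set).
alpha, beta1, beta2 = -1.0, 0.7, -0.4

def odds(x1, x2):
    """Odds of disease under log(p / (1 - p)) = alpha + beta1*x1 + beta2*x2."""
    return math.exp(alpha + beta1 * x1 + beta2 * x2)

# OR for x1 (0 -> 1) with x2 held fixed, evaluated at several values of x2.
or_x1 = [odds(1, x2) / odds(0, x2) for x2 in (-2.0, 0.0, 3.5)]

# OR for x2 (k -> k + 1) with x1 held fixed, for both values of x1 and several k.
or_x2 = [odds(x1, k + 1) / odds(x1, k) for x1 in (0, 1) for k in (0.0, 1.5, 10.0)]

print("OR for x1 at different x2:", or_x1, "vs exp(beta1) =", math.exp(beta1))
print("OR for x2 at different k: ", or_x2, "vs exp(beta2) =", math.exp(beta2))
```

Every entry of each list matches the corresponding exponentiated coefficient up to floating-point rounding, which is the sense in which the coefficients are adjusted log-odds ratios.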

Hence $e^{\beta_2}$ is the relative increase in the odds of disease, going from $x_2 = k$ to $x_2 = k + 1$ holding $x_1$ fixed (or adjusting for $x_1$). Put another way, for every increase of 1 in $x_2$ the odds of disease increases by a factor of $e^{\beta_2}$. More generally, if you increase $x_2$ from $k$ to $k + \Delta$ then

$$\text{OR} = \frac{\text{odds when } x_2 = k + \Delta}{\text{odds when } x_2 = k} = e^{\beta_2 \Delta} = \left(e^{\beta_2}\right)^{\Delta}.$$

The Leukemia Data

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1\,\text{IAG} + \beta_2\,\text{LWBC},$$

where IAG is a binary variable and LWBC is a continuous predictor. Stata output seen earlier

    ---------------------------------------------------------------------------
            live |   Coef.   Std. Err.     z    P>|z|    [95% Conf. Interval]
    -------------+-------------------------------------------------------------
             iag |     …        …          …      …      .3818672       …
            lwbc |     …     .4609479      …      …         …           …
           _cons |     …        …          …      …         …           …
    ---------------------------------------------------------------------------

shows a fitted model of

$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{\alpha} + \hat{\beta}_1\,\text{IAG} + \hat{\beta}_2\,\text{LWBC}.$$

The estimated (adjusted) OR for IAG is $e^{\hat{\beta}_1}$, which of course we saw earlier in the Stata output

    ---------------------------------------------------------------------------
            live | Odds Ratio   Std. Err.   z    P>|z|    [95% Conf. Interval]
    -------------+-------------------------------------------------------------
             iag |     …           …        …      …         …           …
            lwbc |  .3299682    .1520981    …      …      .1336942    .8143885
    ---------------------------------------------------------------------------

The estimated odds that an Ag+ individual (IAG=1) survives at least one year is … times greater than the corresponding odds for an Ag- individual (IAG=0), regardless of the LWBC (although the LWBC must be the same for both individuals).

The estimated OR for LWBC is $e^{\hat{\beta}_2} = .330$ ($\approx 1/3$). For each increase in 1 unit of LWBC, the estimated odds of surviving at least a year decreases by roughly a factor of 3, regardless of one's IAG. Stated differently, if two individuals have the same Ag factor (either + or -) but differ on their values of LWBC by one unit, then the individual with the higher value of LWBC has about 1/3 the estimated odds of survival for a year as the individual with the lower LWBC value.

Confidence intervals for coefficients and ORs are related as before.
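
To make the link between the two scales concrete, this sketch starts from the LWBC odds ratio and confidence limits printed in the Stata output above, takes logs to recover the corresponding values on the coefficient scale, and raises the odds ratio to a power to get the odds ratio for a change of more than one unit. The two-unit change and the variable names are illustrative choices, not part of the notes.

```python
import math

# Values copied from the Stata odds-ratio output for LWBC.
or_lwbc = 0.3299682
or_ci = (0.1336942, 0.8143885)

# The coefficient and its interval are the logs of the OR and its interval.
beta2 = math.log(or_lwbc)
beta2_ci = (math.log(or_ci[0]), math.log(or_ci[1]))

# Odds ratio for an increase of delta units in LWBC: (e^beta2)^delta.
delta = 2  # illustrative choice
or_delta = or_lwbc ** delta

print(f"beta2 = {beta2:.3f}, 95% CI ({beta2_ci[0]:.3f}, {beta2_ci[1]:.3f})")
print(f"OR for a {delta}-unit increase in LWBC = {or_delta:.3f}")
```

Because the log is monotone, the interval endpoints keep their order when moving between scales, so an OR interval that excludes 1 corresponds to a coefficient interval that excludes 0.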

For IAG the 95% CI for $\beta_1$ yields the 95% CI for the adjusted IAG OR as follows:

$$.382 \le \beta_1 \le \ldots$$
$$e^{.382} \le e^{\beta_1} \le e^{\ldots}$$
$$\ldots \le \text{OR} \le \ldots$$

We estimate that the odds of an Ag+ individual (IAG=1) surviving at least a year to be … times the odds of an Ag- individual surviving at least one year. We are 95% confident the odds ratio is between … and …. How does this compare with the unadjusted odds ratio?

Similarly for LWBC, the 95% CI for $\beta_2$ yields the 95% CI for the adjusted LWBC OR as follows:

$$\ldots \le \beta_2 \le -.205$$
$$e^{\ldots} \le e^{\beta_2} \le e^{-.205}$$
$$.134 \le \text{OR} \le .814$$

We estimate the odds of surviving at least a year is reduced by a factor of 3 (i.e. multiplied by about 1/3) for each increase of 1 LWBC unit. We are 95% confident the reduction in odds is between .134 and .814.

Note that while this is the usual way of defining the OR for a continuous predictor variable, software may try to trick you. JMP IN for instance would report

$$\text{OR} = e^{\hat{\beta}_2(\max(\text{LWBC}) - \min(\text{LWBC}))} = .33^{\max(\text{LWBC}) - \min(\text{LWBC})},$$

the change from the smallest to the largest LWBC. That is a much smaller number. You just have to be careful and check what is being done by knowing these relationships.

General Model

We can have a lot more complicated models than we have been analyzing, but the principles remain the same. Suppose we have $k$ predictor variables where $k$ can be considerably more than 2 and the variables are a mix of binary and continuous. Then we write

$$\log\left(\frac{p}{1-p}\right) = \text{log odds of disease} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,$$

which is a logistic multiple regression model. Now fix values of $x_2, x_3, \ldots, x_k$, and we get

    odds of disease for $x_1 = c$:      $e^{\alpha + \beta_1 c + \beta_2 x_2 + \cdots + \beta_k x_k}$
    odds of disease for $x_1 = c + 1$:  $e^{\alpha + \beta_1 (c+1) + \beta_2 x_2 + \cdots + \beta_k x_k}$

The odds ratio, increasing $x_1$ by 1 and holding $x_2, x_3, \ldots, x_k$ fixed at any values, is

$$\text{OR} = \frac{e^{\alpha + \beta_1 (c+1) + \beta_2 x_2 + \cdots + \beta_k x_k}}{e^{\alpha + \beta_1 c + \beta_2 x_2 + \cdots + \beta_k x_k}} = e^{\beta_1}.$$

That is, $e^{\beta_1}$ is the increase in odds of disease obtained by increasing $x_1$ by 1 unit, holding $x_2, x_3, \ldots, x_k$ fixed (i.e. adjusting for levels of $x_2, x_3, \ldots, x_k$). For this to make sense

- $x_1$ needs to be binary or continuous.
- None of the remaining effects $x_2, x_3, \ldots, x_k$ can be an interaction (product) effect with $x_1$. I will say more about this later! The essential problem is that if one or more of $x_2, x_3, \ldots, x_k$ depends upon $x_1$ then you cannot mathematically increase $x_1$ and simultaneously hold $x_2, x_3, \ldots, x_k$ fixed.

Example: The UNM Trauma Data

The data to be analyzed here were collected on 3132 patients admitted to The University of New Mexico Trauma Center between the years 1991 and 1994. For each patient, the attending physician recorded their age, their revised trauma score (RTS), their injury severity score (ISS), whether their injuries were blunt (i.e. the result of a car crash: BP=0) or penetrating (i.e. gunshot wounds: BP=1), and whether they eventually survived their injuries (DEATH = 1 if died, DEATH = 0 if survived).
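
Returning to the general model and its caveat about interactions: the sketch below evaluates the odds ratio for a one-unit increase in x1 in a three-predictor model, first without and then with a product term x1*x2. All coefficient values, including the interaction coefficient gamma, are hypothetical and chosen only to illustrate the point.

```python
import math

# Hypothetical coefficients, for illustration only.
alpha = -2.0
betas = [0.5, -0.3, 0.8]   # beta1, beta2, beta3
gamma = 0.25               # hypothetical coefficient on the product x1 * x2

def odds(x, interaction=False):
    """Odds of disease for predictor values x = [x1, x2, x3]."""
    log_odds = alpha + sum(b * xi for b, xi in zip(betas, x))
    if interaction:
        log_odds += gamma * x[0] * x[1]
    return math.exp(log_odds)

def or_x1(c, x2, x3, interaction=False):
    """Odds ratio for raising x1 from c to c + 1 with x2 and x3 held fixed."""
    return odds([c + 1, x2, x3], interaction) / odds([c, x2, x3], interaction)

# Without an interaction the OR is e^beta1 whatever the other values are.
print(or_x1(0, 1.0, 0.0), or_x1(3, -2.0, 1.0), math.exp(betas[0]))

# With an x1*x2 product term the OR for x1 changes with x2.
print(or_x1(0, 1.0, 0.0, True), or_x1(0, 4.0, 0.0, True))
```

With the product term included, the odds ratio for x1 works out to e^(beta1 + gamma*x2), so there is no single number that describes the effect of x1 across all values of x2.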

Approximately 9% of patients admitted to the UNM Trauma Center eventually die from their injuries.

The ISS is an overall index of a patient's injuries, based on the approximately 1300 injuries cataloged in the Abbreviated Injury Scale. The ISS can take on values from 0 for a patient with no injuries to 75 for a patient with 3 or more life threatening injuries. The ISS is the standard injury index used by trauma centers throughout the …. The RTS is an index of physiologic injury, and is constructed as a weighted average of an incoming patient's systolic blood pressure, respiratory rate, and Glasgow Coma Scale. The RTS can take on values from 0 for a patient with no vital signs to 7.84 for a patient with normal vital signs.

… et al. (1981) proposed a logistic regression model to estimate the probability of a patient's survival as a function of RTS, the injury severity score ISS, and the patient's age, which is used as a surrogate for physiologic reserve.

