
11 Logistic Regression - Interpreting Parameters


Let us expand on the material in the last section, trying to make sure we understand the logistic regression model and can interpret Stata output. Consider first the case of a single binary predictor, where

$$x = \begin{cases} 1 & \text{if exposed to factor} \\ 0 & \text{if not,} \end{cases} \qquad \text{and} \qquad y = \begin{cases} 1 & \text{if develops disease} \\ 0 & \text{does not.} \end{cases}$$

Results can be summarized in a simple 2 × 2 contingency table as

                   Exposure
    Disease        1      0
    1 (+)          a      b
    0 (−)          c      d

where $OR = \dfrac{ad}{bc}$ (why?) and we interpret OR > 1 as indicating a risk factor, and OR < 1 as indicating a protective factor.

Recall the logistic model: $p(x)$ is the probability of disease for a given value of $x$, and

$$\text{logit}(p(x)) = \log\left(\frac{p(x)}{1-p(x)}\right) = \alpha + \beta x.$$

Then for $x = 0$ (unexposed), $\text{logit}(p(x)) = \text{logit}(p(0)) = \alpha + \beta(0) = \alpha$, and for $x = 1$ (exposed), $\text{logit}(p(x)) = \text{logit}(p(1)) = \alpha + \beta(1) = \alpha + \beta$.

Also,

    odds of disease among unexposed: $p(0)/(1-p(0))$
    odds of disease among exposed:   $p(1)/(1-p(1))$

Now

$$OR = \frac{\text{odds of disease among exposed}}{\text{odds of disease among unexposed}} = \frac{p(1)/(1-p(1))}{p(0)/(1-p(0))}$$

and

$$\beta = \text{logit}(p(1)) - \text{logit}(p(0)) = \log\left(\frac{p(1)}{1-p(1)}\right) - \log\left(\frac{p(0)}{1-p(0)}\right) = \log\left(\frac{p(1)/(1-p(1))}{p(0)/(1-p(0))}\right) = \log(OR).$$

The regression coefficient in the population model is the log(OR), hence the OR is obtained by exponentiating $\beta$:

$$e^{\beta} = e^{\log(OR)} = OR.$$

Remark: If we fit this simple logistic model to a 2 × 2 table, the estimated unadjusted OR (above) and the regression coefficient for x have the same relationship.

Example: Leukemia Survival Data (Section 10 p. 108). We can find the counts in the following table from the tabulate live iag command:

    Surv ≥ 1 yr?    Ag+ (x=1)    Ag- (x=0)
    Yes                 9            2
    No                  8           14

and (unadjusted) $OR = \dfrac{9(14)}{2(8)} = 7.875$.
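As a quick check of the hand calculation, the following Python sketch (not part of the original notes, which use Stata) computes the cross-product odds ratio from the leukemia counts and confirms that it equals the ratio of the two group odds, so that its log plays the role of $\beta$. The function name `odds_ratio` is purely illustrative.

```python
import math

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/(bc) for the 2x2 table
    [[a, b], [c, d]] (rows: outcome yes/no, columns: x=1 / x=0)."""
    return (a * d) / (b * c)

# Leukemia counts from the notes: survived >= 1 yr (yes/no) by Ag+ / Ag-
a, b, c, d = 9, 2, 8, 14

or_hat = odds_ratio(a, b, c, d)

# Odds of surviving within each group, computed directly from the counts
odds_exposed = a / c      # Ag+ : 9 survivors vs 8 non-survivors
odds_unexposed = b / d    # Ag- : 2 survivors vs 14 non-survivors

# The log odds ratio is what the logistic regression coefficient estimates
beta_hat = math.log(or_hat)

print(or_hat, odds_exposed / odds_unexposed, beta_hat)
```

The ratio of the two group odds and the cross-product formula agree, which is the answer to the "(why?)" above: $(a/c)/(b/d) = ad/(bc)$.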

Before proceeding with the Stata output, let me comment about coding of the outcome variable. Some packages are less rigid, but Stata enforces the (reasonable) convention that 0 indicates a negative outcome and all other values indicate a positive outcome. If you try to code something like 2 for survive a year or more and 1 for not survive a year or more, Stata coaches you with the error message

    outcome does not vary; remember:
        0 = negative outcome,
        all other nonmissing values = positive outcome

This data set uses 0 and 1 codes for the live variable; 0 and -100 would work, but not 1 and 2. Let's look at both regression estimates and direct estimates of unadjusted odds ratios from Stata.

    . logit live iag

    Logit estimates                                   Number of obs   =         33
    ------------------------------------------------------------------------------
            live |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             iag |              .8986321                       .3024066
           _cons |              .7559289
    ------------------------------------------------------------------------------

The logistic command (logistic live iag) reports the same fit on the odds ratio scale.

Stata has fit

$$\text{logit}(\hat p(x)) = \log\left(\frac{\hat p(x)}{1-\hat p(x)}\right) = \hat\alpha + \hat\beta x = \hat\alpha + \hat\beta\,\text{IAG},$$

with $\hat\alpha = \log(2/14) \approx -1.946$, $\hat\beta = \log(7.875) \approx 2.064$, and $OR = e^{\hat\beta} = 7.875$. This is identical to the hand calculation above.
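To see why the fitted model reproduces the 2 × 2 table, here is a small Python sketch (an illustration, not the author's Stata session). It assumes the single-binary-predictor model is saturated, so $\hat\alpha$ is the log odds in the Ag- group and $\hat\beta$ is the log odds ratio, and it uses the standard large-sample formula $\sqrt{1/a + 1/b + 1/c + 1/d}$ for the standard error of a log odds ratio, which the notes do not state explicitly.

```python
import math

def expit(t):
    """Inverse of the logit: converts log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

# Counts from the leukemia table: 9 of 17 Ag+ and 2 of 16 Ag- survived
alpha_hat = math.log(2 / 14)                # log odds of survival when IAG = 0
beta_hat = math.log((9 * 14) / (2 * 8))     # log of the unadjusted OR

p0 = expit(alpha_hat)               # fitted P(survive | IAG = 0)
p1 = expit(alpha_hat + beta_hat)    # fitted P(survive | IAG = 1)

# Large-sample standard error of the log odds ratio from the four cell counts
se_beta = math.sqrt(1 / 9 + 1 / 2 + 1 / 8 + 1 / 14)

print(p0, p1, se_beta)
```

The fitted probabilities equal the observed proportions 2/16 and 9/17, and the standard error agrees with the .8986321 that survives in the iag row of the logit output.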

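With $\hat\beta = \log\left(\frac{9(14)}{2(8)}\right) \approx 2.0637$ and the standard error .8986 from the output, a 95% confidence interval for $\beta$ (the IAG coefficient) is $2.0637 \pm 1.96(.8986)$, or

$$.3024 \le \beta \le 3.825.$$

This logit scale is where the real work and theory is done. To get a confidence interval for the odds ratio, just exponentiate everything:

$$e^{.3024} \le e^{\beta} \le e^{3.825}$$
$$1.353 \le OR \le 45.83$$

What do you conclude?

A More Complex Model

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2,$$

where $x_1$ is binary (as before) and $x_2$ is a continuous predictor. The regression coefficients are adjusted log-odds ratios.

To interpret $\beta_1$, fix the value of $x_2$:

For $x_1 = 0$:
    log odds of disease $= \alpha + \beta_1(0) + \beta_2 x_2 = \alpha + \beta_2 x_2$
    odds of disease $= e^{\alpha + \beta_2 x_2}$

For $x_1 = 1$:
    log odds of disease $= \alpha + \beta_1(1) + \beta_2 x_2 = \alpha + \beta_1 + \beta_2 x_2$
    odds of disease $= e^{\alpha + \beta_1 + \beta_2 x_2}$

Thus the odds ratio (going from $x_1 = 0$ to $x_1 = 1$) is

$$OR = \frac{\text{odds when } x_1 = 1}{\text{odds when } x_1 = 0} = \frac{e^{\alpha + \beta_1 + \beta_2 x_2}}{e^{\alpha + \beta_2 x_2}} = e^{\beta_1}$$

(remember $e^{a+b} = e^a e^b$, so $e^{a+b}/e^a = e^b$), i.e. $\beta_1 = \log(OR)$.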

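Hence $e^{\beta_1}$ is the relative increase in the odds of disease, going from $x_1 = 0$ to $x_1 = 1$ holding $x_2$ fixed (or adjusting for $x_2$).

To interpret $\beta_2$, fix the value of $x_1$:

For $x_2 = k$ (any given value $k$):
    log odds of disease $= \alpha + \beta_1 x_1 + \beta_2 k$
    odds of disease $= e^{\alpha + \beta_1 x_1 + \beta_2 k}$

For $x_2 = k + 1$:
    log odds of disease $= \alpha + \beta_1 x_1 + \beta_2 (k+1) = \alpha + \beta_1 x_1 + \beta_2 k + \beta_2$
    odds of disease $= e^{\alpha + \beta_1 x_1 + \beta_2 k + \beta_2}$

Thus the odds ratio (going from $x_2 = k$ to $x_2 = k + 1$) is

$$OR = \frac{\text{odds when } x_2 = k+1}{\text{odds when } x_2 = k} = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 k + \beta_2}}{e^{\alpha + \beta_1 x_1 + \beta_2 k}} = e^{\beta_2},$$

i.e. $\beta_2 = \log(OR)$. Hence $e^{\beta_2}$ is the relative increase in the odds of disease, going from $x_2 = k$ to $x_2 = k + 1$ holding $x_1$ fixed (or adjusting for $x_1$). Put another way, for every increase of 1 in $x_2$ the odds of disease increases by a factor of $e^{\beta_2}$.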
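The cancellation in these two derivations can be checked numerically. The Python sketch below uses made-up coefficients (`alpha`, `beta1`, `beta2` are hypothetical values chosen only for illustration, not estimates from any data set) and verifies that the odds ratio for $x_1$ does not depend on where $x_2$ is held, and that the odds ratio for a one-unit step in $x_2$ does not depend on $x_1$ or on the starting value $k$.

```python
import math

# Hypothetical coefficients, for illustration only
alpha, beta1, beta2 = -1.0, 0.8, -0.5

def odds(x1, x2):
    """Odds of disease under log(p/(1-p)) = alpha + beta1*x1 + beta2*x2."""
    return math.exp(alpha + beta1 * x1 + beta2 * x2)

def or_x1(x2):
    """Odds ratio going from x1 = 0 to x1 = 1 with x2 held fixed."""
    return odds(1, x2) / odds(0, x2)

def or_x2(x1, k):
    """Odds ratio going from x2 = k to x2 = k + 1 with x1 held fixed."""
    return odds(x1, k + 1) / odds(x1, k)

# The same ratio appears no matter where the other predictor is held
ors_for_x1 = [or_x1(x2) for x2 in (0.0, 2.5, 10.0)]
ors_for_x2 = [or_x2(x1, k) for x1 in (0, 1) for k in (-3.0, 0.0, 4.2)]

print(ors_for_x1, math.exp(beta1))
print(ors_for_x2, math.exp(beta2))
```

This constancy is a property of the model without an interaction term; if the model included a product term in $x_1 x_2$, the odds ratio for one predictor would change with the value of the other.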

More generally, if you increase $x_2$ from $k$ to $k + \Delta$ then

$$OR = \frac{\text{odds when } x_2 = k + \Delta}{\text{odds when } x_2 = k} = e^{\beta_2 \Delta} = \left(e^{\beta_2}\right)^{\Delta}.$$

The Leukemia Data

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1\,\text{IAG} + \beta_2\,\text{LWBC},$$

where IAG is a binary variable and LWBC is a continuous predictor. The Stata output seen earlier was

    ------------------------------------------------------------------------------
            live |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             iag |                                             .3818672
            lwbc |              .4609479
    ------------------------------------------------------------------------------

giving a fitted model of

$$\log\left(\frac{\hat p}{1-\hat p}\right) = \hat\alpha + \hat\beta_1\,\text{IAG} + \hat\beta_2\,\text{LWBC}.$$

The estimated (adjusted) OR for IAG is $e^{\hat\beta_1}$, which of course we saw earlier in the Stata output

    ------------------------------------------------------------------------------
            live | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             iag |
            lwbc |   .3299682   .1520981                       .1336942   .8143885
    ------------------------------------------------------------------------------

The estimated odds that an Ag+ individual (IAG=1) survives at least one year is greater than the corresponding odds for an Ag- individual (IAG=0), regardless of the LWBC (although the LWBC must be the same for both individuals).

The estimated OR for LWBC is $e^{\hat\beta_2} = e^{-1.109} = .33$ ($\approx 1/3$). For each increase of 1 unit of LWBC, the estimated odds of surviving at least a year decreases by roughly a factor of 3, regardless of one's IAG.
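The rule $OR = (e^{\beta_2})^{\Delta}$ for a change of $\Delta$ units can be tried on the LWBC odds ratio reported by Stata. The Python sketch below takes .3299682 from the output as given; the function name `or_for_change` and the particular values of $\Delta$ are illustrative choices, not part of the notes.

```python
import math

# Adjusted per-unit odds ratio for LWBC, as reported in the Stata output
or_lwbc = 0.3299682
beta2_hat = math.log(or_lwbc)   # the corresponding logit-scale coefficient

def or_for_change(delta):
    """Odds ratio for increasing LWBC by `delta` units, holding IAG fixed."""
    return math.exp(beta2_hat * delta)

one_unit = or_for_change(1)      # recovers the reported odds ratio
two_units = or_for_change(2)     # equals the per-unit OR squared
one_down = or_for_change(-1)     # a one-unit decrease inverts the OR

print(beta2_hat, one_unit, two_units, one_down)
```

A one-unit decrease multiplies the odds by about 3, which is the same "factor of 3" statement read in the other direction.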

Stated differently, if two individuals have the same Ag factor (either + or -) but differ on their values of LWBC by one unit, then the individual with the higher value of LWBC has about 1/3 the estimated odds of survival for a year as the individual with the lower LWBC value.

Confidence intervals for coefficients and ORs are related as before. For IAG, the 95% CI for $\beta_1$ yields the 95% CI for the adjusted IAG OR by exponentiating its limits: the lower limit .382 for $\beta_1$ gives $e^{.382} \approx 1.47$ as the lower limit for the OR.

We estimate the odds of an Ag+ individual (IAG=1) surviving at least a year to be $e^{\hat\beta_1}$ times the odds of an Ag- individual surviving at least one year. The 95% confidence interval for this odds ratio has lower limit 1.47, so it lies entirely above 1. How does this compare with the unadjusted odds ratio?

Similarly for LWBC, the 95% CI for $\beta_2$ yields the 95% CI for the adjusted LWBC OR as follows:

$$-2.012 \le \beta_2 \le -.205$$
$$e^{-2.012} \le e^{\beta_2} \le e^{-.205}$$
$$.134 \le OR \le .814$$

We estimate the odds of surviving at least a year is reduced by a factor of 3 (i.e., multiplied by about 1/3) for each increase of 1 LWBC unit. We are 95% confident that the factor multiplying the odds is between .134 and .814.

Note that while this is the usual way of defining the OR for a continuous predictor variable, software may try to trick you. JMP IN for instance would report

$$OR = e^{\hat\beta_2(\max(\text{LWBC}) - \min(\text{LWBC}))} = .33^{\max(\text{LWBC}) - \min(\text{LWBC})},$$

the change from the smallest to the largest LWBC. That is a much smaller number. You just have to be careful and check what is being done by knowing how the odds ratio is defined.

We can fit much more complicated models than the ones we have been analyzing, but the principles remain the same.
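The two steps for LWBC, moving between the logit scale and the odds ratio scale and the range-based odds ratio that JMP IN is described as reporting, can be sketched in Python as follows. The confidence limits are the ones from the Stata odds ratio output; the LWBC range passed to `range_or` is a hypothetical pair of values, since the notes do not give the observed minimum and maximum.

```python
import math

# 95% confidence limits for the LWBC odds ratio, from the Stata output
or_lo, or_hi = 0.1336942, 0.8143885

# Taking logs moves the interval back to the logit (coefficient) scale;
# exponentiating those limits returns the odds ratio interval.
beta_lo, beta_hi = math.log(or_lo), math.log(or_hi)

def range_or(or_per_unit, x_min, x_max):
    """Odds ratio comparing the largest to the smallest predictor value:
    the per-unit OR raised to the width of the range."""
    return or_per_unit ** (x_max - x_min)

# Hypothetical LWBC range of width 3, for illustration only
example_range_or = range_or(0.33, 2.0, 5.0)

print(beta_lo, beta_hi, example_range_or)
```

With a per-unit odds ratio below 1, any range wider than one unit gives a range odds ratio smaller than the per-unit value, which is why the two conventions should not be confused when reading output.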

