
Lecture 18: Multiple Logistic Regression



Transcription of Lecture 18: Multiple Logistic Regression

Lecture 18: Multiple Logistic Regression
Mulugeta Gebregziabher
701/755: Biostatistical Methods II, Spring 2007
Department of Biostatistics, Bioinformatics and Epidemiology
Medical University of South Carolina
Methods III: Advanced Regression Methods

Topics to be covered

Review
1. Purpose of empirical models: Association vs Prediction
2. Design of observational studies: cross-sectional, prospective, case-control
3. Randomization, Stratification and Matching

Multiple Logistic Regression
1. The model
2. Estimation and Interpretation of Parameters
3. Confounding and Interaction
4. Effects of omitted variables
5. Model Fitting Strategies
6. Goodness of Fit and Model Diagnostics

Matching (group and individual)
Conditional vs Unconditional analysis

Review: Purpose of empirical models

Empirical models are models fitted to provide succinct descriptions of relationships observed in data. They can take different forms; here we focus on regression models, which have wide applicability.
- They are data-driven models that provide a range of possible relationships between variables, often specified by mathematical convenience and a preference for simplicity.
- If the model fits well, inferences are possible about the nature of relationships between variables in the ranges where they are observed (NO extrapolation).
- Examples: association studies in epidemiology and prediction studies in clinical or policy-making research.

Association Studies
- Interest centers on which variables (variables of interest and adjustment variables) are in the model, and on the size and sign of their coefficients.
- The predicted value for each observation, or the model fit, is not of interest per se.

Example: adjusting for appropriate covariates, is broccoli intake associated with colorectal adenomatous polyps?
  $\text{logit}(\Pr(\text{polyps})) = \beta_0 + \beta_1\,\text{energy intake} + \dots + \beta_k\,\text{broccoli intake}$

Example: adjusting for age, is heart disease (HD) associated with hypertension?
  $\text{logit}(\Pr(\text{HD})) = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{hypertension}$

Prediction Studies
- Interest centers on being able to accurately estimate or predict the response for a given combination of predictors.
- We do not care much about which predictor variables make this possible or what their coefficients are (model fit is important).

Example: multiple logistic regression model for screening diabetes (Tabaei and Herman (2002), Diabetes Care, 25, 1999-2003):
  $\text{logit}(\Pr(\text{Diabetes})) = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Plasma glucose} + \beta_3\,\text{Postprandial time} + \beta_4\,\text{Female} + \beta_5\,\text{BMI}$
Estimates: $\beta_0 = \ldots$, $\beta_1 = \ldots$, $\beta_2 = \ldots$, $\beta_3 = \ldots$, $\beta_4 = \ldots$, $\beta_5 = \ldots$
A cutoff of 20% was used to predict previously undiagnosed diabetes, with sensitivity = 65% and specificity = 96%.
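To make the prediction-study idea concrete, the sketch below turns a fitted linear predictor into a probability and applies a 20% cutoff. It is only an illustration: the function names, the toy list of (predicted probability, true status) pairs, and the rule "flag when the probability is at least the cutoff" are assumptions for this sketch, and none of the numbers come from Tabaei and Herman.

```python
import math

def inverse_logit(eta):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_probability(beta, x):
    """beta[0] is the intercept; beta[1:] pairs up with the predictors in x."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return inverse_logit(eta)

def sensitivity_specificity(scored, cutoff):
    """scored is a list of (predicted probability, true status 0/1) pairs."""
    tp = sum(1 for p, d in scored if p >= cutoff and d == 1)
    fn = sum(1 for p, d in scored if p < cutoff and d == 1)
    tn = sum(1 for p, d in scored if p < cutoff and d == 0)
    fp = sum(1 for p, d in scored if p >= cutoff and d == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data, NOT taken from the cited study.
toy_scored = [(0.05, 0), (0.10, 0), (0.15, 1), (0.25, 1),
              (0.30, 0), (0.60, 1), (0.02, 0), (0.40, 1)]

sens, spec = sensitivity_specificity(toy_scored, cutoff=0.20)
print("sensitivity:", sens, "specificity:", spec)
```

Lowering the cutoff would flag more subjects, which typically raises sensitivity at the cost of specificity.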

Review: Designs for observational studies

We discuss three important designs that make heavy use of logistic regression. Let X denote an exposure or treatment and D an outcome indicator (disease, death, etc.). For binary X and D, each design yields a 2x2 table of X (rows X=1, X=0) by D (columns D=1, D=0) with row and column totals.

CROSS-SECTIONAL DESIGN: randomly select n from a population of N records.

PROSPECTIVE DESIGN: randomly select subjects separately from the exposed (X=1) and the unexposed (X=0).

CASE-CONTROL DESIGN: randomly select subjects separately from those with the outcome (D=1) and those without it (D=0).

Review: Example

Consider a hypothetical study of the association between maternal age and birth weight using data from 1000 hospital records. We can use any of the three designs discussed. Let X = I(maternal age <= 20 yrs) and D = I(birth weight <= 2500 g).

CROSS-SECTIONAL DESIGN: randomly select 200 from the 1000 records.

          D=1    D=0    Total
  X=1      10     40       50
  X=0      15    135      150
  Total    25    175      200

Review: Example

PROSPECTIVE DESIGN: randomly select 100 pregnant women aged <= 20 and 100 aged > 20.

          D=1    D=0    Total
  X=1      20     80      100
  X=0      10     90      100
  Total    30    170      200

CASE-CONTROL DESIGN: randomly select 100 infants with birth weight <= 2500 g and 100 with birth weight > 2500 g.

          D=1    D=0    Total
  X=1      40     23       63
  X=0      60     77      137
  Total   100    100      200

Randomization, Stratification and Matching

Investigators are usually interested in the net effect of a certain risk factor, controlling for confounding and effect-modifying factors; for example, controlling for age, race and gender differences (if these are not the main factors under consideration). This can be done using randomization, stratification and/or matching.
- Randomization: an intervention at the design stage to balance the groups under comparison on factors that are potential confounders of the relationship between X and D.
- Stratification: can be done at the design stage or at the analysis stage. It is also used to control for potential confounders of the relationship between X and D.
- Matching: can be 1:1 matching or group matching. It is done at the design stage and is used to control for potential confounders of the relationship between X and D.

Very Important Observation

We can measure the association between X and D using the ratio of proportions

  $PR = \dfrac{\Pr(D=1 \mid X=1)}{\Pr(D=1 \mid X=0)}$

or the ratio of odds

  $OR = \dfrac{\Pr(D=1 \mid X=1)\,/\,\Pr(D=0 \mid X=1)}{\Pr(D=1 \mid X=0)\,/\,\Pr(D=0 \mid X=0)} = \dfrac{n_{11}\, n_{00}}{n_{01}\, n_{10}}$

Measures of association

  Design            Pr(D=1|X=1)   Pr(D=1|X=0)   Pr(X=1|D=1)   Pr(X=1|D=0)   PR    OR
  Cross-sectional   10/50         ...           ...           ...           ...   ...

Very Important Observation
- The difference between the OR and the PR grows with Pr(D=1|X).
- The bottom line: use the OR as an approximation to the PR (RR) only when the disease is rare (Pr(D=1|X) < 10%).

[Figure: PR (RR) plotted against OR]
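The comparison across designs can be checked by hand from the three example tables for maternal age and birth weight. The sketch below (function and variable names are illustrative, not from the lecture) computes the ratio of proportions and the odds ratio for each table. Note that in the case-control table the row proportions are fixed by how many cases and controls were sampled, so the "PR" computed there does not estimate the population PR; it is shown only for contrast.

```python
# Each table is ((n11, n10), (n01, n00)): rows are X=1, X=0; columns are D=1, D=0.
tables = {
    "cross-sectional": ((10, 40), (15, 135)),
    "prospective": ((20, 80), (10, 90)),
    "case-control": ((40, 23), (60, 77)),
}

def proportion_ratio(table):
    """Pr(D=1 | X=1) / Pr(D=1 | X=0), computed from the row totals."""
    (n11, n10), (n01, n00) = table
    return (n11 / (n11 + n10)) / (n01 / (n01 + n00))

def odds_ratio(table):
    """Cross-product ratio n11 * n00 / (n01 * n10)."""
    (n11, n10), (n01, n00) = table
    return (n11 * n00) / (n01 * n10)

results = {name: (proportion_ratio(t), odds_ratio(t)) for name, t in tables.items()}

for name, (pr, or_value) in results.items():
    print(f"{name:16s} PR = {pr:.2f}   OR = {or_value:.2f}")
```

In these tables the OR is nearly the same under all three designs, while the ratio of proportions computed from the case-control table differs from the value in the other two.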

Multiple Logistic Regression

- Problem: It is likely that the outcome variable is determined not by a single predictor variable, but by many.
- Goal: To consider the simultaneous influence of several variables on the outcome. This helps to reveal relationships that may have been hidden in the univariate analysis.
- Suppose we have p variables (nominal, ordinal or continuous) measured on n individuals, with the data laid out one row per subject.

The Model

For $E(D \mid X) = \pi_{D|X}$, where D is the disease indicator and X is the exposure,

  $\text{logit}(\pi_{D|X}) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$

In matrix notation, this can be rewritten as

  $\text{logit}(\pi_{D|X}) = X\beta$

where $\beta = [\beta_0, \dots, \beta_p]^T$ and $X = [1, X_1, \dots, X_p]$ (the leading 1 carries the intercept).

This model can be used for different purposes:
- to estimate an adjusted effect of $X_i$ on D, controlling for confounding factors ($X_j$, $j \neq i$). (E.g. effect of condom use on STD, adjusting for number of partners.)
- to assess or investigate interaction or effect modification. (E.g. effect of seat belt use on fatality for speeders and non-speeders.)
- to obtain the best prediction model. (E.g. Gail et al., JNCI 81:1879-86, 1989, present a prediction model for a white woman's risk of breast cancer based on age at menarche, number of previous biopsies, age at first live birth, and number of first-degree relatives with breast cancer.)

Output from a typical regression package

A computer output from a typical regression package will include:
1. $\beta_0$, the intercept: the log-odds when all X's are zero.
2. $\beta_i$, the effect of $X_i$ with all other $X_j$'s held the same.
3. An overall test of $H_0: \beta_1 = \dots = \beta_p = 0$ vs $H_1$: some $\beta_i$'s are not equal to zero.
   (a) It can be tested using the LR test, which has a chi-square distribution with p degrees of freedom.
   (b) Note that rejecting the global null hypothesis means that some or all of the predictors considered do aid in predicting D, the outcome. On the other hand, failing to reject it does not imply that none of the covariates are important: the effect of some covariates can be masked by others.
4. A Wald test to assess the significance of each covariate in the model.

Example: A two variable model

  * Analysis of condom use and STD;
  proc format;
    value condom   0='worn' 1='not worn';
    value partners 0='<5'   1='>=5';
    value std      0='no'   1='yes';
  run;

  data std;
    input condom partners std repeat;
    datalines;
  1 0 0 10
  1 0 1 5
  1 1 0 30
  1 1 1 50
  0 0 0 52
  0 0 1 30
  0 1 0 8
  0 1 1 15
  ;
  run;

Example: Two variable model and SAS output

  $\text{logit}(\Pr(\text{std} = 1)) = \beta_0 + \beta_1\,\text{Condom} + \beta_2\,\text{Partners}$

  Testing Global Null Hypothesis: BETA=0

  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio   ...          2    ...
  Score              ...          2    ...
  Wald               ...          2    ...

  The LOGISTIC Procedure
  Analysis of Maximum Likelihood Estimates

  Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq   Exp(Est)
  Intercept   1    ...        ...              ...               ...          ...
  condom      1    ...        ...              ...               ...          ...
  partners    1    ...        ...              ...               ...          ...
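For readers without SAS, the two variable model above can be fitted to the same grouped counts with a few lines of standard-library Python. This is a minimal sketch of maximum likelihood via Newton-Raphson, not a reproduction of PROC LOGISTIC: the helper names are made up for illustration, and the `repeat` column is assumed to be the frequency of each covariate/outcome pattern.

```python
import math

# Grouped data from the slide: (condom, partners, std, count).
# Assumption: the last column ("repeat") is a frequency weight.
rows = [
    (1, 0, 0, 10), (1, 0, 1, 5),
    (1, 1, 0, 30), (1, 1, 1, 50),
    (0, 0, 0, 52), (0, 0, 1, 30),
    (0, 1, 0, 8), (0, 1, 1, 15),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def score_and_information(beta):
    """Gradient of the log-likelihood and the information matrix at beta."""
    score = [0.0] * 3
    info = [[0.0] * 3 for _ in range(3)]
    for condom, partners, std, count in rows:
        x = (1.0, condom, partners)
        eta = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        for j in range(3):
            score[j] += count * (std - p) * x[j]
            for k in range(3):
                info[j][k] += count * p * (1.0 - p) * x[j] * x[k]
    return score, info

def fit(max_iter=25, tol=1e-10):
    """Newton-Raphson: beta <- beta + information^{-1} * score."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        score, info = score_and_information(beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

beta = fit()
odds_ratios = [math.exp(b) for b in beta[1:]]
print("beta (intercept, condom, partners):", [round(b, 4) for b in beta])
print("adjusted OR (condom, partners):", [round(o, 3) for o in odds_ratios])
```

Exponentiating each slope gives the adjusted odds ratio for that covariate, which corresponds to the Exp(Est) column in the SAS table.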

Estimation and Interpretation of Parameters

- Estimation is done by maximum likelihood using the Newton-Raphson iterative algorithm (there is a closed-form solution for p = 1 with a binary covariate).
- Interpretation of $\text{logit}(\Pr(D=1)) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$:
  1. $\beta_0$ is the log-odds when $X_1 = \dots = X_p = 0$.
  2. $\beta_1$ is the log-odds ratio comparing levels of $X_1$ (e.g. $X_1 = 1$ vs $X_1 = 0$), or for a unit change in $X_1$, given that $X_2, \dots, X_p$ are held constant.
- In our example: $\text{logit}(\Pr(\text{std} = 1)) = \ldots + \ldots\,\text{Condom} + \ldots\,\text{Partners}$

Homework: Write the interpretation of the coefficients of condom use and number of partners.

Confounding and Interaction

The first step in multiple logistic regression is to test any a priori hypothesis of an interaction effect, followed by a confounding effect. These are the two ways an extraneous variable may affect the relationship between outcome and exposure.

Interaction: exists when the relationship between two variables is different for different levels of a third variable.
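One informal way to look for interaction and confounding in the condom/STD data is to compare the condom odds ratio within each level of partners against the crude odds ratio that ignores partners. The sketch below does this with the counts from the SAS data step; the function name is illustrative and the comparison is descriptive, not a formal test.

```python
# Counts keyed by (condom, partners, std), from the data step on the earlier slide.
# Coding from proc format: condom 1 = not worn, partners 1 = ">=5", std 1 = yes.
counts = {
    (1, 0, 0): 10, (1, 0, 1): 5,
    (1, 1, 0): 30, (1, 1, 1): 50,
    (0, 0, 0): 52, (0, 0, 1): 30,
    (0, 1, 0): 8, (0, 1, 1): 15,
}

def condom_odds_ratio(partner_levels):
    """Odds ratio of STD for condom=1 vs condom=0, pooling the given partner levels."""
    a = sum(counts[(1, p, 1)] for p in partner_levels)  # not worn, STD
    b = sum(counts[(1, p, 0)] for p in partner_levels)  # not worn, no STD
    c = sum(counts[(0, p, 1)] for p in partner_levels)  # worn, STD
    d = sum(counts[(0, p, 0)] for p in partner_levels)  # worn, no STD
    return (a * d) / (b * c)

or_few_partners = condom_odds_ratio([0])
or_many_partners = condom_odds_ratio([1])
or_crude = condom_odds_ratio([0, 1])

print(f"OR within partners <5:  {or_few_partners:.3f}")
print(f"OR within partners >=5: {or_many_partners:.3f}")
print(f"Crude OR:               {or_crude:.3f}")
```

Here the two stratum-specific odds ratios are close to each other, which gives little suggestion of interaction on the odds-ratio scale, while the crude odds ratio is quite different from both, which is the pattern one would expect if number of partners confounds the condom/STD relationship.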

