Transcription of Variable Selection - Biostatistics
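Chapter 10 Variable Selection

Variable selection is intended to select the "best" subset of predictors. Why bother?

1. We want to explain the data in the simplest way. The principle of Occam's Razor states that among several plausible explanations for a phenomenon, the simplest is best. Applied to regression analysis, this implies that the smallest model that fits the data is best.
2. Collinearity is caused by having too many variables trying to do the same job.
3. Cost: if the model is to be used for prediction, we can save time and/or money by not measuring redundant predictors.

Prior to variable selection, identify outliers and influential points, and maybe exclude them at least temporarily.

Some models have a natural hierarchy. For example, in polynomial models, $x^2$ is a higher-order term than $x$. When selecting variables, it is important to respect the hierarchy: lower-order terms should not be removed from the model before higher-order terms in the same variable. There are two common situations where this arises.

Polynomial models. Consider the model

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$.

Suppose we fit this model and find that the regression summary shows that the term in $x$ is not significant but the term in $x^2$ is. If we then removed the $x$ term, our reduced model would become

$y = \beta_0 + \beta_2 x^2 + \varepsilon$.

However, suppose we make a scale change $x \to x + a$; then the model would become

$y = \beta_0 + \beta_2 a^2 + 2\beta_2 a x + \beta_2 x^2 + \varepsilon$,

and the first-order term has reappeared. A change of scale should not change the interpretation of the model, which is why lower-order terms should not be removed in the presence of higher-order terms. Removing the first-order term here corresponds to the hypothesis that the predicted response is symmetric about, and has an optimum at, $x = 0$. Only when that hypothesis is meaningful in the context of the particular problem could we justify the removal of the lower-order term.

Models with interactions. Consider the second-order response surface model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon$.

We would not normally consider removing the $x_1 x_2$ interaction term without simultaneously considering the removal of the $x_1^2$ and $x_2^2$ terms. A joint removal would correspond to the clearly meaningful comparison of a quadratic surface with a linear one. Removing only the interaction would leave a surface that is aligned with the coordinate axes, and any rotation of the predictor space would reintroduce the interaction term; as with the polynomials, the interpretation should not depend on the particular basis chosen for the predictors.

Backward elimination is the simplest stepwise procedure. Where there is a complex hierarchy, it can be run by hand while keeping track of which variables are eligible for removal:

1. Start with all the predictors in the model.
2. Remove the predictor with the highest p-value greater than $\alpha_{\mathrm{crit}}$.
3. Refit the model and go to step 2.
4. Stop when all p-values are less than $\alpha_{\mathrm{crit}}$.

The threshold $\alpha_{\mathrm{crit}}$ is sometimes called the "p-to-remove" and does not have to be 5%.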
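The backward elimination loop is easy to automate. The sketch below is a minimal base-R version; backward_eliminate() is a hypothetical helper written for illustration, not a function from any package. It uses the t-test p-values printed by summary() as the removal criterion and assumes no coefficients are aliased. It is applied to R's built-in state.x77 data, the example data set introduced below.

```r
# Hypothetical helper: backward elimination by largest p-value.
backward_eliminate <- function(data, response, alpha = 0.05) {
  preds <- setdiff(names(data), response)        # start with all predictors
  while (length(preds) > 0) {
    fit <- lm(reformulate(preds, response = response), data = data)
    pv  <- coef(summary(fit))[, "Pr(>|t|)"]      # named vector, one per term
    pv  <- pv[names(pv) != "(Intercept)"]        # never drop the intercept
    if (max(pv) <= alpha) break                  # every remaining term passes
    preds <- setdiff(preds, names(which.max(pv)))  # drop the worst, then refit
  }
  preds
}

# check.names = TRUE turns "Life Exp" into "Life.Exp", "HS Grad" into "HS.Grad"
statedata <- data.frame(state.x77, check.names = TRUE)
kept <- backward_eliminate(statedata, "Life.Exp")
print(kept)
```

With alpha = 0.05 this is intended to mirror the hand-run elimination on the state data later in the chapter; passing alpha = 0.15 or 0.20 gives the looser, prediction-oriented cut-off.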
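If prediction performance is the goal, then a 15-20% cut-off may work best, although methods designed more directly for optimal prediction should be preferred.

Forward selection reverses the backward method: start with no variables in the model; for all predictors not in the model, check their p-values if they are added to the model, and add the one with the lowest p-value below $\alpha_{\mathrm{crit}}$; continue until no new predictor can be added. Stepwise regression combines backward elimination and forward selection: at each stage a variable may be added or removed, and there are several variations on exactly how this is done.

Some notes on stepwise procedures:

1. Because of the one-at-a-time nature of adding and dropping variables, it is possible to miss the "optimal" model.
2. The p-values should not be taken too literally. There is so much multiple testing occurring that their validity is dubious. The removal of less significant predictors tends to increase the significance of the remaining predictors, which overstates their importance.
3. The procedures are not directly linked to the final objectives of prediction or explanation, and so may not really help solve the problem of interest. With any variable selection method, remember that variables that are dropped can still be correlated with the response. It would be wrong to say these variables are unrelated to the response; it is just that they give no additional explanatory effect beyond the variables already in the model.
4. Stepwise selection tends to pick models that are smaller than desirable for prediction. To give a simple example, consider a regression with a single predictor whose slope is not quite statistically significant. We might not have enough evidence to say that it is related to $y$, but it still might be better to use it for predictive purposes.

We illustrate the variable selection methods on some data on the 50 US states. The variables are: population estimate as of July 1, 1975; per capita income (1974); illiteracy (1970, percent of population); life expectancy in years (1969-71); murder and non-negligent manslaughter rate per 100,000 population (1976); percent high-school graduates (1970); mean number of days with minimum temperature below 32 degrees (1931-1960) in the capital or a large city; and land area in square miles. We will take life expectancy as the response and the remaining variables as predictors. A fix is necessary to remove the spaces in some of the variable names.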
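Forward selection can be sketched in the same way. forward_select() below is a hypothetical helper written for illustration: at each step it fits one model per remaining candidate (the current predictors plus that candidate), reads the candidate's t-test p-value from summary(), and adds the best candidate while its p-value is below alpha. The state data just described are used as the example.

```r
# Hypothetical helper: forward selection by smallest p-value.
forward_select <- function(data, response, alpha = 0.05) {
  candidates <- setdiff(names(data), response)
  chosen <- character(0)                          # start with no predictors
  repeat {
    remaining <- setdiff(candidates, chosen)
    if (length(remaining) == 0) break
    # p-value of each remaining predictor when added to the current model
    pv <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(chosen, v), response = response), data = data)
      coef(summary(fit))[v, "Pr(>|t|)"]
    })
    if (min(pv) >= alpha) break                   # nothing significant left
    chosen <- c(chosen, names(which.min(pv)))     # add the best candidate
  }
  chosen
}

statedata <- data.frame(state.x77, check.names = TRUE)
selected <- forward_select(statedata, "Life.Exp")
print(selected)
```

Each step conditions on the variables already chosen, so the result need not agree with backward elimination; this is a concrete instance of the one-at-a-time caveat in the notes above.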
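> data(state)
> statedata <- data.frame(state.x77, row.names = state.abb, check.names = TRUE)
> g <- lm(Life.Exp ~ ., data = statedata)
> summary(g)
...
Residual standard error: ... on 42 degrees of freedom
Multiple R-Squared: ..., Adjusted R-squared: ...
F-statistic: ... on 7 and 42 degrees of freedom, p-value: ...

Which predictors should be included? Can you tell from the p-values? Looking at the coefficients, can you see what operation would be helpful? Does the murder rate decrease life expectancy? That is obvious a priori, but how should these results be interpreted?

We illustrate the backward method: at each stage we remove the predictor with the largest p-value over 0.05.

> g <- update(g, . ~ . - Area)
> summary(g)
> g <- update(g, . ~ . - Illiteracy)
> summary(g)
> g <- update(g, . ~ . - Income)
> summary(g)
> g <- update(g, . ~ . - Population)
> summary(g)
...
Residual standard error: ... on 46 degrees of freedom
Multiple R-Squared: ..., Adjusted R-squared: ...
F-statistic: 38 on 3 and 46 degrees of freedom, p-value: ...

The final removal of the Population variable is a close call. We may want to consider including this variable if it aids interpretation. Note that the removal of four predictors causes only a minor reduction in fit.

If there are $q$ potential predictors, then there are $2^q$ possible models. Criterion-based procedures fit all these models and choose the best one according to some criterion. Clever algorithms such as the "branch-and-bound" method can avoid actually fitting all the models; only likely candidates are evaluated.

Akaike and Bayes information criteria. The Akaike Information Criterion (AIC) and the Bayes Information Criterion (BIC) are commonly used criteria. In general,

$\mathrm{AIC} = -2 \log L + 2p$, while $\mathrm{BIC} = -2 \log L + p \log n$,

where $L$ is the maximized likelihood, $p$ is the number of parameters and $n$ is the number of cases. For linear regression models, the $-2$ log-likelihood (known as the deviance) is $n \log(\mathrm{RSS}/n)$, up to an additive constant. We want to minimize AIC or BIC. Larger models will fit better and so have smaller RSS, but they use more parameters, so the best choice balances fit against model size. Since $\log n > 2$ whenever $n \geq 8$, BIC penalizes larger models more heavily than AIC.

We can apply the AIC (and optionally the BIC, by setting k = log(n) in step()) to the state data. The step() function does not evaluate the AIC for all possible models but uses a search method that compares models sequentially. Thus it bears some comparison to the stepwise method described above, but with the advantage that no dubious p-values are used.

> g <- lm(Life.Exp ~ ., data = statedata)
> step(g)
Start:  AIC= ...
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area
...
Step:  AIC= ...
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost
...
Step:  AIC= ...
Life.Exp ~ Population + Murder + HS.Grad + Frost
...
Coefficients:
(Intercept)   Population       Murder      HS.Grad        Frost
...

The search ends at the Population, Murder, HS.Grad and Frost model: unlike the p-value-based backward elimination above, it retains Population.

Adjusted $R^2$, called $R_a^2$. Recall that $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$. Adding a variable to a model can only decrease the RSS and so can only increase $R^2$, so $R^2$ by itself is not a good criterion because it would always choose the largest possible model. Instead use

$R_a^2 = 1 - \dfrac{\mathrm{RSS}/(n-p)}{\mathrm{TSS}/(n-1)} = 1 - \dfrac{n-1}{n-p}\,(1 - R^2) = 1 - \dfrac{\hat\sigma^2_{\text{model}}}{\hat\sigma^2_{\text{null}}}$.

Adding a predictor will only increase $R_a^2$ if it reduces $\hat\sigma^2_{\text{model}} = \mathrm{RSS}/(n-p)$; for a single added predictor this happens exactly when its $t$-statistic exceeds 1 in absolute value. We want to maximize $R_a^2$: minimizing the standard error for prediction means minimizing $\hat\sigma^2_{\text{model}}$, which in turn means maximizing $R_a^2$.

Predicted residual sum of squares. PRESS is defined as $\sum_i \hat\varepsilon_{(i)}^2$, where the $\hat\varepsilon_{(i)}$ are the residuals calculated without using case $i$ in the fit. The model with the lowest PRESS is selected. This tends to pick larger models (which may be desirable if prediction is the objective).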
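All of these criteria can be computed from a single lm fit. criteria() below is a hypothetical helper written for this sketch. Its AIC and BIC drop the additive constants in the log-likelihood, as extractAIC() (and therefore step()) does; R's AIC() and BIC() keep the constants and also count $\sigma$ as a parameter, so their values differ by an amount that is constant for a given data set and model rankings are unchanged. PRESS uses the least-squares identity that the leave-one-out residual equals $\hat\varepsilon_i/(1-h_{ii})$, so no refitting is needed.

```r
# Hypothetical helper: AIC, BIC, adjusted R^2 and PRESS for an lm fit.
criteria <- function(fit) {
  e   <- residuals(fit)
  n   <- length(e)
  p   <- length(coef(fit))                   # parameters, including intercept
  rss <- sum(e^2)
  y   <- model.response(model.frame(fit))
  tss <- sum((y - mean(y))^2)
  h   <- hatvalues(fit)                      # leverages h_ii
  c(AIC   = n * log(rss / n) + 2 * p,        # deviance n log(RSS/n) + 2p
    BIC   = n * log(rss / n) + p * log(n),
    adjR2 = 1 - (rss / (n - p)) / (tss / (n - 1)),
    PRESS = sum((e / (1 - h))^2))            # leave-one-out residuals e_i/(1-h_ii)
}

statedata <- data.frame(state.x77, check.names = TRUE)
full  <- lm(Life.Exp ~ ., data = statedata)
small <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = statedata)
round(rbind(full = criteria(full), small = criteria(small)), 3)
```

Comparing the two rows shows how each criterion trades fit against size for the full model and for the model step() settles on.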
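Mallows' $C_p$ statistic. A good model should predict well, so the average mean squared error of prediction might be a good criterion:

$\dfrac{1}{\sigma^2} \sum_i E(\hat y_i - E y_i)^2$,

which can be estimated by the $C_p$ statistic

$C_p = \dfrac{\mathrm{RSS}_p}{\hat\sigma^2} + 2p - n$,

where $\hat\sigma^2$ is from the model with all predictors and $\mathrm{RSS}_p$ indicates the RSS from a model with $p$ parameters.

(a) $C_p$ is easy to compute.
(b) It is closely related to $R_a^2$ and the AIC.
(c) For the full model, $C_p = p$ exactly.
(d) If a $p$-parameter model fits, then $E(\mathrm{RSS}_p) = (n-p)\sigma^2$ and so $E(C_p) \approx p$. A model with a bad fit will have $C_p$ much bigger than $p$.

It is usual to plot $C_p$ against $p$. We desire models with small $p$ and $C_p$ around or less than $p$. Now we try the $C_p$ and $R_a^2$ methods on the state data. The default criterion for the leaps() function is Mallows' $C_p$:

> library(leaps)
> x <- model.matrix(g)[,-1]
> y <- statedata$Life.Exp
> g <- leaps(x, y)
> Cpplot(g)

The models are denoted by indices for the predictors, i.e. the columns of x: 1 = Population, 2 = Income, 3 = Illiteracy, 4 = Murder, 5 = HS.Grad, 6 = Frost, 7 = Area. The competition is between the "456" model (Murder, HS graduation and Frost) and the model that also includes Population. Both are on or below the $C_p = p$ line, indicating good fits; the choice is between the smaller model and the larger model, which fits a little better. Some even larger models fit in the sense that they are on or below the $C_p = p$ line, but we would not opt for these in the presence of smaller models that fit.

Now let's see which model the adjusted $R^2$ criterion selects:

> adjr <- leaps(x, y, method = "adjr2")
> maxadjr(adjr, 8)

We see that the Population, Frost, HS graduation and Murder model has the largest $R_a^2$. The best three-predictor model is in eighth place, but the intervening models are not attractive since they use more predictors than the best model.

Variable selection methods are sensitive to outliers and influential points. Let's check for high leverage points:

> h <- hat(x)
> names(h) <- state.abb
> rev(sort(h))

Alaska stands out, so let's try excluding it (Alaska is the second state in the data).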
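With only seven predictors an exhaustive search is cheap (127 non-empty subsets), so the $C_p$ and $R_a^2$ comparisons can also be reproduced in base R without leaps, including the refit without Alaska. all_subsets() below is a hypothetical helper written for this sketch: it computes $C_p = \mathrm{RSS}_p/\hat\sigma^2 + 2p - n$ with $\hat\sigma^2$ taken from the full model on the same data, and takes adjusted $R^2$ from summary().

```r
# Hypothetical helper: fit every non-empty subset of predictors.
all_subsets <- function(data, response) {
  preds  <- setdiff(names(data), response)
  n      <- nrow(data)
  full   <- lm(reformulate(preds, response = response), data = data)
  sigma2 <- summary(full)$sigma^2            # sigma-hat^2 from the full model
  # list of character vectors, one per subset of size 1, 2, ..., length(preds)
  subsets <- unlist(lapply(seq_along(preds),
                           function(k) combn(preds, k, simplify = FALSE)),
                    recursive = FALSE)
  do.call(rbind, lapply(subsets, function(s) {
    fit <- lm(reformulate(s, response = response), data = data)
    p   <- length(coef(fit))                 # parameters, including intercept
    rss <- sum(residuals(fit)^2)
    data.frame(model = paste(s, collapse = " + "),
               p     = p,
               Cp    = rss / sigma2 + 2 * p - n,
               adjR2 = summary(fit)$adj.r.squared)
  }))
}

statedata <- data.frame(state.x77, check.names = TRUE)
res <- all_subsets(statedata, "Life.Exp")
head(res[order(res$Cp), ], 5)                # smallest Cp
head(res[order(-res$adjR2), ], 5)            # largest adjusted R^2

# The same search with Alaska removed
res_noAK <- all_subsets(statedata[rownames(statedata) != "Alaska", ], "Life.Exp")
head(res_noAK[order(-res_noAK$adjR2), ], 5)
```

By construction the full model's $C_p$ equals $p = 8$ exactly, property (c) above, which makes a convenient sanity check on the computation.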
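> l <- leaps(x[-2,], y[-2], method = "adjr2")
> maxadjr(l)

We see that Area now makes it into the model. Transforming the predictors can also have an effect. Take a look at the variables:

> par(mfrow = c(3, 3))
> for (i in 1:8) boxplot(state.x77[, i], main = dimnames(state.x77)[[2]][i])

[Figure: boxplots of the state variables, one panel per variable]

We see that Population, Illiteracy and Area are skewed, so we try transforming them:

> nx <- cbind(log(x[,1]), x[,2], log(x[,3]), x[,4:6], log(x[,7]))

And now replot:

> par(mfrow = c(3, 3))
> apply(nx, 2, boxplot)

Now try the adjusted $R^2$ method again:

> a <- leaps(nx, y, method = "adjr2")
> maxadjr(a)

This changes the "best" model again, to log(Population), Frost, HS graduation and Murder. Its adjusted $R^2$ is the highest of the models we have seen so far.

Variable selection is a means to an end, not an end in itself. Stepwise methods use a restricted search through the space of potential models and use a dubious hypothesis-testing-based method for choosing between models. Criterion-based methods typically involve a wider search and compare models in a preferable manner. For this reason, I recommend that you use a criterion-based method.

Accept the possibility that several models may be suggested which fit about as well as each other. If this happens, consider:

1. Do the models have similar qualitative consequences?
2. Do they make similar predictions?
3. What is the cost of measuring the predictors?

If you find models that seem roughly equally good but lead to quite different conclusions, then it is clear that the data cannot answer the question of interest unambiguously.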
Be alert to the danger that a model contradictory to the tentative conclusions might be out there.