
Predicting Good Probabilities With Supervised Learning



Transcription of Predicting Good Probabilities With Supervised Learning

Ithaca, NY 14853

Appearing in Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005. Copyright 2005 by the author(s)/owner(s).

Abstract

We show that maximum margin methods such as boosted trees and boosted stumps push probability mass away from 0 and 1, yielding a characteristic sigmoid-shaped distortion in the predicted probabilities. Models such as Naive Bayes, which make unrealistic independence assumptions, push probabilities toward 0 and 1. We experiment with two ways of correcting the biased probabilities predicted by some learning methods, and quantitatively examine how much data they need to be effective. After calibration, the best probabilities are predicted by boosted trees, random forests, and SVMs.

1. Introduction

In many applications it is important to predict well calibrated probabilities; good accuracy or area under the ROC curve are not sufficient. This paper examines the probabilities predicted by ten supervised learning algorithms: SVMs, neural nets, decision trees, memory-based learning, bagged trees, random forests, boosted trees, boosted stumps, and naive bayes. We show how maximum margin methods such as SVMs, boosted trees, and boosted stumps tend to push predicted probabilities away from 0 and 1. This hurts the quality of the probabilities they predict and yields the characteristic sigmoid-shaped distortion. Models such as naive bayes have the opposite bias and tend to push predictions closer to 0 and 1.

Models such as bagged trees and neural nets have little or no bias and predict well calibrated probabilities. After examining the distortion (or lack of it) characteristic to each learning method, we experiment with two calibration methods for correcting these distortions:

Platt Scaling: a method for transforming SVM outputs from [−∞, +∞] to posterior probabilities (Platt, 1999).

Isotonic Regression: the method used by Zadrozny and Elkan (2002; 2001) to calibrate predictions from boosted naive bayes, SVM, and decision tree models.

Platt Scaling is most effective when the distortion in the predicted probabilities is sigmoid-shaped. Isotonic Regression is a more powerful calibration method that can correct any monotonic distortion. Unfortunately, this extra power comes at a price: a learning curve analysis shows that Isotonic Regression is more prone to overfitting, and thus performs worse than Platt Scaling, when data is scarce. Finally, we examine how good the probabilities predicted by each learning method are after each method's predictions have been calibrated. Random forests, neural nets, and bagged decision trees are the best learning methods for predicting well-calibrated probabilities prior to calibration, but after calibration the best methods are boosted trees, random forests, and SVMs.

2. Calibration Methods

In this section we describe the two calibration methods. Unfortunately, these methods are designed for binary classification, and it is not trivial to extend them to multiclass problems. One way to handle a multiclass problem is to transform it into binary problems, calibrate the binary models,

and recombine the predictions (Zadrozny & Elkan, 2002).

2.1. Platt Calibration

Platt (1999) proposed transforming SVM predictions to posterior probabilities by passing them through a sigmoid; we will see in Section 4 that a sigmoid transformation is also effective for other maximum margin methods. Let the output of a learning method be f(x). To get calibrated probabilities, pass the output through a sigmoid:

    P(y = 1 | f) = 1 / (1 + exp(A·f + B))    (1)

where the parameters A and B are fitted using maximum likelihood estimation from a fitting training set (f_i, y_i). Gradient descent is used to find A and B such that they are the solution to:

    argmin_{A,B} { −Σ_i [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ] }    (2)

where

    p_i = 1 / (1 + exp(A·f_i + B))    (3)

Two questions arise: where does the sigmoid train set come from, and how do we avoid overfitting to it?

If we use the same data set that was used to train the model we want to calibrate, we introduce unwanted bias. For example, if the model learns to discriminate the train set perfectly and orders all the negative examples before the positive examples, then the sigmoid transformation will output just a 0,1 function. So an independent calibration set must be used. This, however, is not a drawback, since the same set can also be used for model and parameter selection.

To avoid overfitting to the sigmoid train set, an out-of-sample model is used. If there are N+ positive examples and N− negative examples in the train set, for each training example Platt Calibration uses target values y+ and y− (instead of 1 and 0, respectively), where

    y+ = (N+ + 1) / (N+ + 2),    y− = 1 / (N− + 2)    (4)

For a more detailed treatment, and a justification of these particular target values, see (Platt, 1999).
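As a concrete illustration, here is a minimal sketch of Platt Calibration in Python. The function names (fit_platt, apply_platt), the fixed learning rate, and the iteration count are my own choices for readability; Platt's original implementation uses a more careful optimizer, and this plain gradient descent is only meant to mirror Eqs. (1)-(4).

```python
import numpy as np

def fit_platt(f, y, lr=0.1, iters=10000):
    """Fit P(y=1|f) = 1/(1 + exp(A*f + B)) by gradient descent on the
    negative log-likelihood of Eqs. (2)-(3), using Platt's smoothed
    targets from Eq. (4) in place of hard 0/1 labels."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    # Out-of-sample target values y+ and y- (Eq. 4).
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))
    A = 0.0
    B = np.log((n_neg + 1.0) / (n_pos + 1.0))  # start at the class prior
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(A * f + B))    # Eq. (1)
        # The cross-entropy gradient with respect to (A*f + B) is t - p.
        A -= lr * np.mean((t - p) * f)
        B -= lr * np.mean(t - p)
    return A, B

def apply_platt(f, A, B):
    """Map raw model outputs to calibrated probabilities (Eq. 1)."""
    return 1.0 / (1.0 + np.exp(A * np.asarray(f, dtype=float) + B))
```

As the text stresses, f and y here must come from an independent calibration set, not from the data used to train the model being calibrated.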

2.2. Isotonic Regression

Platt's sigmoid is effective when the distortion is sigmoid-shaped, but it cannot correct other distortions. Zadrozny and Elkan (2002; 2001) successfully used a more general method based on Isotonic Regression (Robertson et al., 1988) to calibrate predictions from SVMs, Naive Bayes, boosted Naive Bayes, and decision tree models. The only restriction this method places on the mapping function is that it be isotonic (monotonically increasing). That is, given the predictions f_i from a model and the true targets y_i, the basic assumption in Isotonic Regression is that:

    y_i = m(f_i) + ε_i    (5)

where m is an isotonic (monotonically increasing) function. Given a train set (f_i, y_i), the Isotonic Regression problem is finding the isotonic function m̂ such that

    m̂ = argmin_z Σ_i (y_i − z(f_i))²    (6)

One algorithm that finds a stepwise constant solution for the Isotonic Regression problem is the pair-adjacent violators (PAV) algorithm (Ayer et al., 1955):

    1. Input: training set (f_i, y_i) sorted according to f_i
    2. Initialize m̂_{i,i} = y_i, w_{i,i} = 1
    3. While ∃ i s.t. m̂_{k,i−1} ≥ m̂_{i,l}:
           Set w_{k,l} = w_{k,i−1} + w_{i,l}
           Set m̂_{k,l} = (w_{k,i−1}·m̂_{k,i−1} + w_{i,l}·m̂_{i,l}) / w_{k,l}
           Replace m̂_{k,i−1} and m̂_{i,l} with m̂_{k,l}
    4. Output the stepwise constant function: m̂(f) = m̂_{i,j} for f_i < f ≤ f_j
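The PAV algorithm above is easy to implement directly. The sketch below is a straightforward Python transcription; pav and apply_isotonic are illustrative names, and the block-merging loop is one of several equivalent ways to organize the pooling.

```python
import numpy as np

def pav(f, y):
    """Pair-adjacent violators: fit the stepwise-constant isotonic function
    minimizing sum_i (y_i - z(f_i))^2 (Eq. 6). Returns the right edge and
    pooled mean of each constant block."""
    order = np.argsort(f)
    f = np.asarray(f, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    means, weights, rights = [], [], []   # one entry per pooled block
    for fi, yi in zip(f, y):
        means.append(yi)
        weights.append(1.0)
        rights.append(fi)
        # While the last two blocks violate monotonicity, pool them into
        # one block holding their weighted mean (step 3 of the algorithm).
        while len(means) > 1 and means[-2] >= means[-1]:
            w = weights[-2] + weights[-1]
            m = (weights[-2] * means[-2] + weights[-1] * means[-1]) / w
            means.pop(); weights.pop(); rights.pop(len(rights) - 2)
            means[-1], weights[-1] = m, w
    return np.array(rights), np.array(means)

def apply_isotonic(f_new, rights, means):
    """m-hat(f) = m_{i,j} for f_i < f <= f_j: find the block containing f."""
    idx = np.searchsorted(rights, f_new, side="left")
    return means[np.clip(idx, 0, len(means) - 1)]
```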

As in the case of Platt Calibration, if we use the model training set (x_i, y_i) to get the training set (f(x_i), y_i) for Isotonic Regression, we introduce unwanted bias, so an independent calibration set is used here as well.

3. Data Sets

We compare the algorithms on 8 binary classification problems. ADULT, COVTYPE and LETTER are from the UCI Repository (Blake & Merz, 1998). COVTYPE has been converted to a binary problem by treating the largest class as positive and the rest as negative. We converted LETTER to boolean in two ways: LETTER.P1 treats the letter "O" as positive and the remaining 25 letters as negative, yielding a very unbalanced problem, while LETTER.P2 treats the first half of the alphabet as positive, yielding a difficult, but well balanced, problem. HS is the IndianPine92 data set (Gualtieri et al., 1999), where the difficult class Soybean-mintill is the positive class. SLAC is a problem from the Stanford Linear Accelerator.

    PROBLEM   #ATTR    TRAIN SIZE   TEST SIZE   %POZ
    ADULT     14/104   4000         35222       25%
    COVTYPE   54       4000         25000       36%

4. Qualitative Analysis of Predictions

For each problem we train models using ten decision tree styles, neural nets of many sizes, SVMs with many kernels, and so on, and for each problem and each learning algorithm we qualitatively examine the predictions of the best models. On real problems, where the true conditional probabilities are not known, model calibration can be visualized with reliability diagrams (DeGroot & Fienberg, 1982).

First, the prediction space is discretized into ten bins: cases with predicted value between 0 and 0.1 fall in the first bin, between 0.1 and 0.2 in the second bin, and so on. For each bin, the mean predicted value is plotted against the true fraction of positive cases; if the model is well calibrated, the points fall near the diagonal line.

Figure 1 shows histograms of the predicted values (top row) and reliability diagrams (middle and bottom rows) for boosted trees on the eight test problems. What is striking about the reliability diagrams is that they display a sigmoidal shape on seven of the eight problems¹, motivating the use of a sigmoid to map the predictions to calibrated probabilities. The reliability plots in the middle of the figure show sigmoids fitted using Platt's method, while the reliability plots in the bottom of the figure show the function fitted with Isotonic Regression.

Examining the histograms of predicted values (top row in Figure 1), note that almost all the values predicted by boosted trees lie in the central region, with few predictions approaching 0 or 1. The exception is LETTER.P1, a highly skewed data set that has only 3% positive class, though careful examination of the histogram shows that even on this problem there is a sharp drop in the number of cases predicted to have probability near 0.

To show how calibration transforms predictions, we plot histograms and reliability diagrams for the eight problems after calibration with Platt's method (Figure 2) and Isotonic Regression (Figure 3).

¹ Because boosting overfits on the ADULT problem, the best performance is obtained after only a few iterations; if boosting is allowed to continue for more iterations, it will display the same sigmoidal shape on ADULT.
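Before looking at those figures, note that the binning procedure behind each reliability diagram takes only a few lines. This sketch (reliability_curve is an illustrative name) computes the points that would be plotted; a well-calibrated model yields points near the diagonal.

```python
import numpy as np

def reliability_curve(p, y, n_bins=10):
    """For each of n_bins equal-width bins over [0,1], return the mean
    predicted value and the true fraction of positives, as in the
    reliability diagrams of (DeGroot & Fienberg, 1982). Empty bins are
    skipped."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    mean_pred, frac_pos = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # The last bin is closed on the right so that p == 1.0 is counted.
        mask = (p >= lo) & ((p < hi) if hi < 1.0 else (p <= hi))
        if mask.any():
            mean_pred.append(p[mask].mean())
            frac_pos.append(y[mask].mean())
    return np.array(mean_pred), np.array(frac_pos)
```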

The figures show that calibration undoes the shift in probability mass caused by boosting: after calibration, many more cases have predicted probabilities near 0 and 1. For boosted trees, transforming predictions using Platt Scaling or Isotonic Regression yields a significant improvement in the predicted probabilities. One difference between Platt Scaling and Isotonic Regression is apparent in the histograms: because Isotonic Regression generates a piecewise constant function, the histograms are coarse, while the histograms generated by Platt Scaling are smoother. See (Niculescu-Mizil & Caruana, 2005) for a more detailed analysis of boosting from the perspective of predicting well calibrated probabilities.

Figure 6 shows the prediction histograms for the ten learning methods on the SLAC problem before calibration, and after calibration with Platt's method and Isotonic Regression. Like boosted trees, SVMs and boosted stumps display sigmoid-shaped reliability plots (second and third rows, respectively, of Figure 6). Again, the sigmoidal shape of the reliability plots co-occurs with the concentration of mass in the center of the histograms of predicted values, away from 0 and 1.

The histograms of predicted values and reliability plots for neural nets tell a different story: their predictions are well calibrated to begin with, so calibration is a task that isn't really needed. If anything, scaling might hurt neural net calibration, as both methods have trouble fitting the tails properly, effectively pushing predictions away from 0 and 1.

The histograms in Figure 4 look similar to the histograms for boosted trees after Platt Scaling in Figure 2, giving us confidence that the histograms reflect the underlying structure of the problems.²

² SVM predictions are scaled to [0, 1] by (x − min)/(max − min).
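To tie the pieces together, here is a hypothetical end-to-end usage of the sketches above; the synthetic scores and constants are invented purely to show the call pattern, with both calibrators fitted on a held-out calibration set as the paper prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores concentrated in the central region, loosely mimicking
# the boosted-tree histograms described above (purely illustrative data).
y_cal = (rng.random(2000) < 0.5).astype(float)
f_cal = 0.5 + 0.15 * (2.0 * y_cal - 1.0) + 0.1 * rng.standard_normal(2000)

# Fit both calibrators on the independent calibration set only.
A, B = fit_platt(f_cal, y_cal)
rights, means = pav(f_cal, y_cal)

# Map new raw scores to calibrated probabilities with each method.
f_test = np.array([0.3, 0.5, 0.7])
print("Platt:   ", apply_platt(f_test, A, B))
print("Isotonic:", apply_isotonic(f_test, rights, means))
```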

[Figures 2 and 3: histograms of predicted values and reliability diagrams for the eight problems after calibration with Platt's method and with Isotonic Regression. Axes: Mean Predicted Value vs. Fraction of Positives.]

If this is the case, we could conclude that the LETTER and HS problems, given the available features, have well defined classes with a small number of cases in the "gray" region, while in the SLAC problem the two classes have considerable overlap. It is interesting to note that neural networks with a single sigmoid output unit can be viewed as a linear classifier (in the span of its hidden units) with a sigmoid at the output that calibrates the predictions; in this respect neural nets are similar to SVMs and boosted trees after they have been calibrated using Platt's method. Given this, it is not surprising that logistic regression predictions are also well calibrated.

[Figure: histograms of predicted values and reliability diagrams. Axes: Mean Predicted Value vs. Fraction of Positives.]

Because bagged trees are well calibrated, we can deduce that regular decision trees also are well calibrated on average, in the sense that if decision trees are trained on different samples of the data and their predictions averaged, the resulting predictions would be well calibrated. Unfortunately, a single decision tree has high variance, and this variance affects its calibration. Rows five, six, and seven in Figure 6 show the histograms (before and after calibration) and reliability diagrams for logistic regression, bagged trees, and decision trees on the SLAC problem.

Random forests are well calibrated on some of the problems, and not well calibrated on HS, COVTYPE, and MEDIS. RFs seem to exhibit, although to a lesser extent, the same behavior as the max-margin methods: predicted values are slightly pushed toward the middle of the histogram, and the reliability plots show a sigmoidal shape (more accentuated on the LETTER problems and less so on COVTYPE, MEDIS and HS).

