
Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999. (In Japanese, translation by Naoki Abe.)

A Short Introduction to Boosting

Yoav Freund and Robert E. Schapire
AT&T Labs Research, Shannon Laboratory
180 Park Avenue, Florham Park, NJ 07932
{yoav, schapire}@research.att.com

Abstract

Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting's relationship to support-vector machines. Some examples of recent applications of boosting are also described.

Introduction

A horse-racing gambler, hoping to maximize his winnings, decides to create a computer program that will accurately predict the winner of a horse race based on the usual information (number of races recently won by each horse, betting odds for each horse, etc.).

To create such a program, he asks a highly successful expert gambler to explain his betting strategy. Not surprisingly, the expert is unable to articulate a grand set of rules for selecting a winner. On the other hand, when presented with the data for a specific set of races, the expert has no trouble coming up with a rule of thumb for that set of races (such as, "Bet on the horse that has recently won the most races" or "Bet on the horse with the most favored odds"). Although such a rule of thumb, by itself, is obviously very rough and inaccurate, it is not unreasonable to expect it to provide predictions that are at least a little bit better than random guessing. Furthermore, by repeatedly asking the expert's opinion on different collections of races, the gambler is able to extract many rules of thumb. In order to use these rules of thumb to maximum advantage, there are two problems faced by the gambler: First, how should he choose the collections of races presented to the expert so as to extract rules of thumb from the expert that will be the most useful?

Second, once he has collected many rules of thumb, how can they be combined into a single, highly accurate prediction rule? Boosting refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb in a manner similar to that suggested above. This paper overviews some of the recent work on boosting. After introducing AdaBoost, we describe some of the basic underlying theory of boosting, including an explanation of why it often tends not to overfit.

Boosting has its roots in a theoretical framework for studying machine learning called the "PAC" learning model, due to Valiant [46]; see Kearns and Vazirani [32] for a good introduction to this model. Kearns and Valiant [30, 31] were the first to pose the question of whether a "weak" learning algorithm, which performs just slightly better than random guessing in the PAC model, can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [38] came up with the first provable polynomial-time boosting algorithm. A year later, Freund [17] developed a much more efficient boosting algorithm which, although optimal in a certain sense, suffered from certain practical drawbacks. The first experiments with these early boosting algorithms were carried out by Drucker, Schapire and Simard [16] on an OCR task. AdaBoost, introduced in 1995 by Freund and Schapire [23], solved many of the practical difficulties of the earlier boosting algorithms, and is the focus of this paper.

Pseudocode for AdaBoost is given in Figure 1. The algorithm takes as input a training set (x_1, y_1), ..., (x_m, y_m) where each x_i belongs to some domain or instance space X, and each label y_i is in some label set Y. For most of this paper, we assume Y = {-1, +1}; later, we discuss extensions to the multiclass case. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds t = 1, ..., T. One of the main ideas of the algorithm is to maintain a distribution or set of weights over the training set. The weight of this distribution on training example i on round t is denoted D_t(i). Initially, all weights are set equally, but on each round, the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the hard examples in the training set. The weak learner's job is to find a weak hypothesis h_t : X -> {-1, +1} appropriate for the distribution D_t. The goodness of a weak hypothesis is measured by its error

    \epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i] = \sum_{i : h_t(x_i) \neq y_i} D_t(i).

Notice that the error is measured with respect to the distribution D_t on which the weak learner was trained. In practice, the weak learner may be an algorithm that can use the weights D_t on the training examples. Alternatively, when this is not possible, a subset of the training examples can be sampled according to D_t, and these (unweighted) resampled examples can be used to train the weak learner. Returning to the horse-racing example, the instances x_i correspond to descriptions of horse races (such as which horses are running, what are the odds, the track records of each horse, etc.) and the labels y_i give the outcomes (i.e., the winners) of each race. The weak hypotheses are the rules of thumb provided by the expert gambler, where the subcollections of races that he examines are chosen according to the distribution D_t.
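To make the weak learner concrete, here is a minimal sketch in Python with NumPy of a decision stump trained directly on weighted data, so no resampling is needed. The helper names train_stump and stump_predict are our own illustrative choices, not code from the paper; the stump h(x) = s * sign(x_j - theta) is chosen to minimize the weighted error defined above.

    import numpy as np

    def train_stump(X, y, D):
        # Weak learner: pick the feature j, threshold theta and sign s whose
        # stump  h(x) = s if x[j] > theta else -s  minimizes the weighted
        # error, i.e. the sum of D[i] over misclassified examples i.
        m, n = X.shape
        best_err, best = np.inf, None
        for j in range(n):
            values = np.unique(X[:, j])
            # Candidate thresholds: below, between and above observed values.
            thresholds = np.concatenate(
                ([values[0] - 1.0], (values[:-1] + values[1:]) / 2.0,
                 [values[-1] + 1.0]))
            for theta in thresholds:
                pred = np.where(X[:, j] > theta, 1.0, -1.0)
                for s in (1.0, -1.0):
                    err = D[s * pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (j, theta, s)
        return best

    def stump_predict(stump, X):
        j, theta, s = stump
        return s * np.where(X[:, j] > theta, 1.0, -1.0)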

Given: (x_1, y_1), ..., (x_m, y_m) where x_i \in X, y_i \in Y = {-1, +1}.
Initialize D_1(i) = 1/m.
For t = 1, ..., T:
    Train weak learner using distribution D_t.
    Get weak hypothesis h_t : X -> {-1, +1} with error
        \epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i].
    Choose \alpha_t = (1/2) \ln((1 - \epsilon_t)/\epsilon_t).
    Update:
        D_{t+1}(i) = (D_t(i)/Z_t) * e^{-\alpha_t}  if h_t(x_i) = y_i
                     (D_t(i)/Z_t) * e^{\alpha_t}   if h_t(x_i) \neq y_i
                   = D_t(i) \exp(-\alpha_t y_i h_t(x_i)) / Z_t,
    where Z_t is a normalization factor (chosen so that D_{t+1} will be a distribution).
Output the final hypothesis:
    H(x) = sign( \sum_{t=1}^{T} \alpha_t h_t(x) ).

Figure 1: The boosting algorithm AdaBoost.

Once the weak hypothesis h_t has been received, AdaBoost chooses a parameter \alpha_t as in the figure. Intuitively, \alpha_t measures the importance that is assigned to h_t. Note that \alpha_t \geq 0 if \epsilon_t \leq 1/2 (which we can assume without loss of generality), and that \alpha_t gets larger as \epsilon_t gets smaller. The distribution D_t is next updated using the rule shown in the figure. The effect of this rule is to increase the weight of examples misclassified by h_t, and to decrease the weight of correctly classified examples. Thus, the weight tends to concentrate on "hard" examples. The final hypothesis H is a weighted majority vote of the T weak hypotheses where \alpha_t is the weight assigned to h_t. Schapire and Singer [42] show how AdaBoost and its analysis can be extended to handle weak hypotheses which output real-valued or confidence-rated predictions. That is, for each instance x, the weak hypothesis h_t outputs a prediction h_t(x) \in R whose sign is the predicted label (-1 or +1) and whose magnitude |h_t(x)| gives a measure of "confidence" in the prediction. In this paper, however, we focus only on the case of binary ({-1, +1}) valued weak-hypothesis predictions.
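The pseudocode of Figure 1 translates almost line for line into runnable code. The following sketch (same Python/NumPy setting and hypothetical helpers as above; the early stop when \epsilon_t \geq 1/2 and the small clamp on \epsilon_t are our own guards, not part of the paper's pseudocode) implements the round loop, the \alpha_t rule and the multiplicative weight update:

    def adaboost(X, y, T):
        # y must take values in {-1, +1}.
        m = len(y)
        D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
        hypotheses, alphas = [], []
        for t in range(T):
            h = train_stump(X, y, D)            # train weak learner on D_t
            pred = stump_predict(h, X)
            eps = D[pred != y].sum()            # weighted error epsilon_t
            if eps >= 0.5:                      # no weak edge left: stop early
                break
            eps = max(eps, 1e-12)               # avoid log(1/0) when eps = 0
            alpha = 0.5 * np.log((1.0 - eps) / eps)
            D *= np.exp(-alpha * y * pred)      # up-weight mistakes, down-weight hits
            D /= D.sum()                        # normalize: this division is Z_t
            hypotheses.append(h)
            alphas.append(alpha)
        return hypotheses, np.array(alphas)

    def adaboost_predict(hypotheses, alphas, X):
        # Final hypothesis H(x) = sign(sum_t alpha_t * h_t(x)).
        scores = sum(a * stump_predict(h, X) for h, a in zip(hypotheses, alphas))
        return np.where(scores >= 0, 1.0, -1.0)

For example, hypotheses, alphas = adaboost(X, y, T=50) followed by adaboost_predict(hypotheses, alphas, X) gives the combined prediction on the training set.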

Figure 2: Error curves and the margin distribution graph for boosting C4.5 on the "letter" dataset as reported by Schapire et al. [41]. Left: the training and test error curves (lower and upper curves, respectively) of the combined classifier as a function of the number of rounds of boosting. The horizontal lines indicate the test error rate of the base classifier as well as the test error of the final combined classifier. Right: the cumulative distribution of margins of the training examples after 5, 100 and 1000 iterations, indicated by short-dashed, long-dashed (mostly hidden) and solid curves, respectively.

The most basic theoretical property of AdaBoost concerns its ability to reduce the training error. Let us write the error \epsilon_t of h_t as 1/2 - \gamma_t. Since a hypothesis that guesses each instance's class at random has an error rate of 1/2 (on binary problems), \gamma_t thus measures how much better than random are h_t's predictions. Freund and Schapire [23] prove that the training error (the fraction of mistakes on the training set) of the final hypothesis H is at most

    \prod_t [ 2 \sqrt{\epsilon_t (1 - \epsilon_t)} ] = \prod_t \sqrt{1 - 4 \gamma_t^2} \leq \exp( -2 \sum_t \gamma_t^2 ).   (1)

Thus, if each weak hypothesis is slightly better than random so that \gamma_t \geq \gamma for some \gamma > 0, then the training error drops exponentially fast.
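To get a feel for the rate in bound (1), consider an illustrative calculation of ours, not the paper's: suppose every weak hypothesis has the same edge \gamma_t = 0.1, i.e., weighted error \epsilon_t = 0.4. Bound (1) then gives training error at most \exp(-2 T (0.1)^2) = \exp(-0.02 T), which is roughly 0.135 after T = 100 rounds and roughly 4.5 x 10^{-5} after T = 500 rounds, even though each individual rule of thumb is barely better than a coin flip.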

A similar property is enjoyed by previous boosting algorithms. However, previous algorithms required that such a lower bound \gamma be known a priori before boosting begins. In practice, knowledge of such a bound is very difficult to obtain. AdaBoost, on the other hand, is adaptive in that it adapts to the error rates of the individual weak hypotheses. This is the basis of its name: "Ada" is short for "adaptive." The bound given in Eq. (1), combined with the bounds on generalization error given below, prove that AdaBoost is indeed a boosting algorithm in the sense that it can efficiently convert a weak learning algorithm (which can always generate a hypothesis with a weak edge for any distribution) into a strong learning algorithm (which can generate a hypothesis with an arbitrarily low error rate, given sufficient data).
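To see the adaptivity concretely (again an illustrative calculation of the rule \alpha_t = (1/2) \ln((1 - \epsilon_t)/\epsilon_t), not an example from the paper): a weak hypothesis with error \epsilon_t = 0.4 receives weight \alpha_t = (1/2) \ln(0.6/0.4), about 0.20, while a much more accurate one with \epsilon_t = 0.1 receives \alpha_t = (1/2) \ln(0.9/0.1), about 1.10. AdaBoost assigns these weights round by round, with no prior knowledge of how accurate the weak hypotheses will turn out to be.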

Figure 3: Comparison of C4.5 versus boosting stumps and boosting C4.5 on a set of 27 benchmark problems as reported by Freund and Schapire [21]. Each point in each scatterplot shows the test error rate of the two competing algorithms on a single benchmark. The y-coordinate of each point gives the test error rate (in percent) of C4.5 on the given benchmark, and the x-coordinate gives the error rate of boosting stumps (left plot) or boosting C4.5 (right plot). All error rates have been averaged over multiple runs.

Freund and Schapire [23] showed how to bound the generalization error of the final hypothesis in terms of its training error, the sample size m, the VC-dimension d of the weak hypothesis space and the number of boosting rounds T. (The VC-dimension is a standard measure of the "complexity" of a space of hypotheses; see, for instance, Blumer et al. [5].) Specifically, they used techniques from Baum and Haussler [4] to show that the generalization error, with high probability, is at most

    \hat{\Pr}[H(x) \neq y] + \tilde{O}( \sqrt{ Td/m } ),

where \hat{\Pr}[\cdot] denotes empirical probability on the training sample. This bound suggests that boosting will overfit if run for too many rounds, i.e., as T becomes large. In fact, this sometimes does happen. However, in early experiments, several authors [9, 15, 36] observed empirically that boosting often does not overfit, even when run for thousands of rounds. Moreover, it was observed that AdaBoost would sometimes continue to drive down the generalization error long after the training error had reached zero, clearly contradicting the spirit of the bound above. For instance, the left side of Figure 2 shows the training and test curves of running boosting on top of Quinlan's C4.5 decision-tree learning algorithm [37] on the "letter" dataset. In response to these empirical findings, Schapire et al. [41], following the work of Bartlett [2], gave an alternative analysis in terms of the margins of the training examples.

Figure 4: Comparison of error rates for AdaBoost and four other text categorization methods (naive Bayes, probabilistic TF-IDF, Rocchio and sleeping experts) as reported by Schapire and Singer [43]. The algorithms were tested on two text corpora, Reuters newswire articles (left) and AP newswire headlines (right), and with varying numbers of class labels as indicated on the x-axis of each figure.

The margin of example (x, y) is defined to be

    margin(x, y) = y \sum_t \alpha_t h_t(x) / \sum_t \alpha_t.   (2)

It is a number in [-1, +1] which is positive if and only if H correctly classifies the example. Moreover, the magnitude of the margin can be interpreted as a measure of confidence in the prediction. Schapire et al. proved that larger margins on the training set translate into a superior upper bound on the generalization error. Specifically, the generalization error is at most

    \hat{\Pr}[ margin(x, y) \leq \theta ] + \tilde{O}( \sqrt{ d / (m \theta^2) } )   (3)

for any \theta > 0 with high probability. Note that this bound is entirely independent of T, the number of rounds of boosting. In addition, Schapire et al. proved that boosting is particularly aggressive at reducing the margin (in a quantifiable sense) since it concentrates on the examples with the smallest margins (whether positive or negative).
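The margins of the training examples are easy to read off a trained model. Continuing the earlier Python/NumPy sketch (same caveats: hypothetical helper names, not the paper's code), the following computes Eq. (2) for every example; plotting the cumulative distribution of the returned values after different numbers of rounds produces graphs in the style of the right panel of Figure 2:

    def margins(hypotheses, alphas, X, y):
        # margin(x_i, y_i) = y_i * sum_t alpha_t h_t(x_i) / sum_t alpha_t,  Eq. (2)
        scores = sum(a * stump_predict(h, X) for h, a in zip(hypotheses, alphas))
        return y * scores / np.sum(alphas)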

