
Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999. (In Japanese, translation by Naoki Abe.)

A Short Introduction to Boosting

Yoav Freund and Robert E. Schapire
AT&T Labs Research, Shannon Laboratory
180 Park Avenue, Florham Park, NJ 07932
{yoav, schapire}@research.att.com

Abstract

Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting's relationship to support-vector machines. Some examples of recent applications of boosting are also described.

Introduction

A horse-racing gambler, hoping to maximize his winnings, decides to create a computer program that will accurately predict the winner of a horse race based on the usual information (number of races recently won by each horse, betting odds for each horse, etc.).

To create such a program, he asks a highly successful expert gambler to explain his betting strategy. Not surprisingly, the expert is unable to articulate a grand set of rules for selecting a winner. On the other hand, when presented with the data for a specific set of races, the expert has no trouble coming up with a rule of thumb for that set of races (such as, "Bet on the horse that has recently won the most races" or "Bet on the horse with the most favored odds"). Although such a rule of thumb, by itself, is obviously very rough and inaccurate, it is not unreasonable to expect it to provide predictions that are at least a little bit better than random guessing. Furthermore, by repeatedly asking the expert's opinion on different collections of races, the gambler is able to extract many rules of thumb. In order to use these rules of thumb to maximum advantage, there are two problems faced by the gambler: First, how should he choose the collections of races presented to the expert so as to extract rules of thumb from the expert that will be the most useful?

Second, once he has collected many rules of thumb, how can they be combined into a single, highly accurate prediction rule? Boosting refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb in a manner similar to that suggested above. This paper overviews some of the recent work on boosting. After introducing AdaBoost, we describe some of the basic underlying theory of boosting, including an explanation of why it often tends not to overfit.

Boosting has its roots in a theoretical framework for studying machine learning called the "PAC" learning model, due to Valiant [46]; see Kearns and Vazirani [32] for a good introduction to this model. Kearns and Valiant [30, 31] were the first to pose the question of whether a "weak" learning algorithm, which performs just slightly better than random guessing in the PAC model, can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [38] came up with the first provable polynomial-time boosting algorithm. A year later, Freund [17] developed a much more efficient boosting algorithm which, although optimal in a certain sense, suffered from certain practical drawbacks. The first experiments with these early boosting algorithms were carried out by Drucker, Schapire and Simard [16] on an OCR task. AdaBoost, introduced in 1995 by Freund and Schapire [23], solved many of the practical difficulties of the earlier boosting algorithms, and is the focus of this paper.

Pseudocode for AdaBoost is given in Figure 1. The algorithm takes as input a training set (x_1, y_1), ..., (x_m, y_m) where each x_i belongs to some domain or instance space X, and each label y_i is in some label set Y. For most of this paper, we assume Y = {-1, +1}; later, we discuss extensions to the multiclass case. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds t = 1, ..., T. One of the main ideas of the algorithm is to maintain a distribution or set of weights over the training set. The weight of this distribution on training example i on round t is denoted D_t(i). Initially, all weights are set equally, but on each round, the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the hard examples in the training set. The weak learner's job is to find a weak hypothesis h_t : X -> {-1, +1} appropriate for the distribution D_t. The goodness of a weak hypothesis is measured by its error

    \epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i] = \sum_{i : h_t(x_i) \neq y_i} D_t(i).

Notice that the error is measured with respect to the distribution D_t on which the weak learner was trained. In practice, the weak learner may be an algorithm that can use the weights D_t on the training examples. Alternatively, when this is not possible, a subset of the training examples can be sampled according to D_t, and these (unweighted) resampled examples can be used to train the weak learner. Returning to the horse-racing example, the instances x_i correspond to descriptions of horse races (such as which horses are running, what are the odds, the track records of each horse, etc.) and the labels y_i give the outcomes (i.e., the winners) of each race. The weak hypotheses are the rules of thumb provided by the expert gambler, where the subcollections of races that he examines are chosen according to the distribution D_t.
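To make the weak learner concrete, here is a minimal sketch in Python with NumPy of a decision stump trained directly on weighted data, so no resampling is needed. The helper names train_stump and stump_predict are our own illustrative choices, not code from the paper; the stump h(x) = s * sign(x_j - theta) is chosen to minimize the weighted error defined above.

    import numpy as np

    def train_stump(X, y, D):
        # Weak learner: pick the feature j, threshold theta and sign s whose
        # stump  h(x) = s if x[j] > theta else -s  minimizes the weighted
        # error, i.e. the sum of D[i] over misclassified examples i.
        m, n = X.shape
        best_err, best = np.inf, None
        for j in range(n):
            values = np.unique(X[:, j])
            # Candidate thresholds: below, between and above observed values.
            thresholds = np.concatenate(
                ([values[0] - 1.0], (values[:-1] + values[1:]) / 2.0,
                 [values[-1] + 1.0]))
            for theta in thresholds:
                pred = np.where(X[:, j] > theta, 1.0, -1.0)
                for s in (1.0, -1.0):
                    err = D[s * pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (j, theta, s)
        return best

    def stump_predict(stump, X):
        j, theta, s = stump
        return s * np.where(X[:, j] > theta, 1.0, -1.0)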

Given: (x_1, y_1), ..., (x_m, y_m) where x_i \in X, y_i \in Y = {-1, +1}.
Initialize D_1(i) = 1/m.
For t = 1, ..., T:
    Train weak learner using distribution D_t.
    Get weak hypothesis h_t : X -> {-1, +1} with error
        \epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i].
    Choose \alpha_t = (1/2) \ln((1 - \epsilon_t)/\epsilon_t).
    Update:
        D_{t+1}(i) = (D_t(i)/Z_t) * e^{-\alpha_t}  if h_t(x_i) = y_i
                     (D_t(i)/Z_t) * e^{\alpha_t}   if h_t(x_i) \neq y_i
                   = D_t(i) \exp(-\alpha_t y_i h_t(x_i)) / Z_t,
    where Z_t is a normalization factor (chosen so that D_{t+1} will be a distribution).
Output the final hypothesis:
    H(x) = sign( \sum_{t=1}^{T} \alpha_t h_t(x) ).

Figure 1: The boosting algorithm AdaBoost.

Once the weak hypothesis h_t has been received, AdaBoost chooses a parameter \alpha_t as in the figure. Intuitively, \alpha_t measures the importance that is assigned to h_t. Note that \alpha_t \geq 0 if \epsilon_t \leq 1/2 (which we can assume without loss of generality), and that \alpha_t gets larger as \epsilon_t gets smaller. The distribution D_t is next updated using the rule shown in the figure. The effect of this rule is to increase the weight of examples misclassified by h_t, and to decrease the weight of correctly classified examples. Thus, the weight tends to concentrate on "hard" examples. The final hypothesis H is a weighted majority vote of the T weak hypotheses where \alpha_t is the weight assigned to h_t. Schapire and Singer [42] show how AdaBoost and its analysis can be extended to handle weak hypotheses which output real-valued or confidence-rated predictions. That is, for each instance x, the weak hypothesis h_t outputs a prediction h_t(x) \in R whose sign is the predicted label (-1 or +1) and whose magnitude |h_t(x)| gives a measure of "confidence" in the prediction. In this paper, however, we focus only on the case of binary ({-1, +1}) valued weak-hypothesis predictions.
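The pseudocode of Figure 1 translates almost line for line into runnable code. The following sketch (same Python/NumPy setting and hypothetical helpers as above; the early stop when \epsilon_t \geq 1/2 and the small clamp on \epsilon_t are our own guards, not part of the paper's pseudocode) implements the round loop, the \alpha_t rule and the multiplicative weight update:

    def adaboost(X, y, T):
        # y must take values in {-1, +1}.
        m = len(y)
        D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
        hypotheses, alphas = [], []
        for t in range(T):
            h = train_stump(X, y, D)            # train weak learner on D_t
            pred = stump_predict(h, X)
            eps = D[pred != y].sum()            # weighted error epsilon_t
            if eps >= 0.5:                      # no weak edge left: stop early
                break
            eps = max(eps, 1e-12)               # avoid log(1/0) when eps = 0
            alpha = 0.5 * np.log((1.0 - eps) / eps)
            D *= np.exp(-alpha * y * pred)      # up-weight mistakes, down-weight hits
            D /= D.sum()                        # normalize: this division is Z_t
            hypotheses.append(h)
            alphas.append(alpha)
        return hypotheses, np.array(alphas)

    def adaboost_predict(hypotheses, alphas, X):
        # Final hypothesis H(x) = sign(sum_t alpha_t * h_t(x)).
        scores = sum(a * stump_predict(h, X) for h, a in zip(hypotheses, alphas))
        return np.where(scores >= 0, 1.0, -1.0)

For example, hypotheses, alphas = adaboost(X, y, T=50) followed by adaboost_predict(hypotheses, alphas, X) gives the combined prediction on the training set.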

Figure 2: Error curves and the margin distribution graph for boosting C4.5 on the "letter" dataset as reported by Schapire et al. [41]. Left: the training and test error curves (lower and upper curves, respectively) of the combined classifier as a function of the number of rounds of boosting. The horizontal lines indicate the test error rate of the base classifier as well as the test error of the final combined classifier. Right: the cumulative distribution of margins of the training examples after 5, 100 and 1000 iterations, indicated by short-dashed, long-dashed (mostly hidden) and solid curves, respectively.

The most basic theoretical property of AdaBoost concerns its ability to reduce the training error. Let us write the error \epsilon_t of h_t as 1/2 - \gamma_t. Since a hypothesis that guesses each instance's class at random has an error rate of 1/2 (on binary problems), \gamma_t thus measures how much better than random are h_t's predictions. Freund and Schapire [23] prove that the training error (the fraction of mistakes on the training set) of the final hypothesis H is at most

    \prod_t [ 2 \sqrt{\epsilon_t (1 - \epsilon_t)} ] = \prod_t \sqrt{1 - 4 \gamma_t^2} \leq \exp( -2 \sum_t \gamma_t^2 ).   (1)

Thus, if each weak hypothesis is slightly better than random so that \gamma_t \geq \gamma for some \gamma > 0, then the training error drops exponentially fast.
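To get a feel for the rate in bound (1), consider an illustrative calculation of ours, not the paper's: suppose every weak hypothesis has the same edge \gamma_t = 0.1, i.e., weighted error \epsilon_t = 0.4. Bound (1) then gives training error at most \exp(-2 T (0.1)^2) = \exp(-0.02 T), which is roughly 0.135 after T = 100 rounds and roughly 4.5 x 10^{-5} after T = 500 rounds, even though each individual rule of thumb is barely better than a coin flip.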

A similar property is enjoyed by previous boosting algorithms. However, previous algorithms required that such a lower bound \gamma be known a priori before boosting begins. In practice, knowledge of such a bound is very difficult to obtain. AdaBoost, on the other hand, is adaptive in that it adapts to the error rates of the individual weak hypotheses. This is the basis of its name: "Ada" is short for "adaptive." The bound given in Eq. (1), combined with the bounds on generalization error given below, prove that AdaBoost is indeed a boosting algorithm in the sense that it can efficiently convert a weak learning algorithm (which can always generate a hypothesis with a weak edge for any distribution) into a strong learning algorithm (which can generate a hypothesis with an arbitrarily low error rate, given sufficient data).
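To see the adaptivity concretely (again an illustrative calculation of the rule \alpha_t = (1/2) \ln((1 - \epsilon_t)/\epsilon_t), not an example from the paper): a weak hypothesis with error \epsilon_t = 0.4 receives weight \alpha_t = (1/2) \ln(0.6/0.4), about 0.20, while a much more accurate one with \epsilon_t = 0.1 receives \alpha_t = (1/2) \ln(0.9/0.1), about 1.10. AdaBoost assigns these weights round by round, with no prior knowledge of how accurate the weak hypotheses will turn out to be.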

Figure 3: Comparison of C4.5 versus boosting stumps and boosting C4.5 on a set of 27 benchmark problems as reported by Freund and Schapire [21]. Each point in each scatterplot shows the test error rate of the two competing algorithms on a single benchmark. The y-coordinate of each point gives the test error rate (in percent) of C4.5 on the given benchmark, and the x-coordinate gives the error rate of boosting stumps (left plot) or boosting C4.5 (right plot). All error rates have been averaged over multiple runs.

Freund and Schapire [23] showed how to bound the generalization error of the final hypothesis in terms of its training error, the sample size m, the VC-dimension d of the weak hypothesis space and the number of boosting rounds T. (The VC-dimension is a standard measure of the "complexity" of a space of hypotheses; see, for instance, Blumer et al. [5].) Specifically, they used techniques from Baum and Haussler [4] to show that the generalization error, with high probability, is at most

    \hat{\Pr}[H(x) \neq y] + \tilde{O}( \sqrt{ Td/m } ),

where \hat{\Pr}[\cdot] denotes empirical probability on the training sample. This bound suggests that boosting will overfit if run for too many rounds, i.e., as T becomes large. In fact, this sometimes does happen. However, in early experiments, several authors [9, 15, 36] observed empirically that boosting often does not overfit, even when run for thousands of rounds. Moreover, it was observed that AdaBoost would sometimes continue to drive down the generalization error long after the training error had reached zero, clearly contradicting the spirit of the bound above. For instance, the left side of Figure 2 shows the training and test curves of running boosting on top of Quinlan's C4.5 decision-tree learning algorithm [37] on the "letter" dataset. In response to these empirical findings, Schapire et al. [41], following the work of Bartlett [2], gave an alternative analysis in terms of the margins of the training examples.

Figure 4: Comparison of error rates for AdaBoost and four other text categorization methods (naive Bayes, probabilistic TF-IDF, Rocchio and sleeping experts) as reported by Schapire and Singer [43]. The algorithms were tested on two text corpora, Reuters newswire articles (left) and AP newswire headlines (right), and with varying numbers of class labels as indicated on the x-axis of each figure.

The margin of example (x, y) is defined to be

    margin(x, y) = y \sum_t \alpha_t h_t(x) / \sum_t \alpha_t.   (2)

It is a number in [-1, +1] which is positive if and only if H correctly classifies the example. Moreover, the magnitude of the margin can be interpreted as a measure of confidence in the prediction. Schapire et al. proved that larger margins on the training set translate into a superior upper bound on the generalization error. Specifically, the generalization error is at most

    \hat{\Pr}[ margin(x, y) \leq \theta ] + \tilde{O}( \sqrt{ d / (m \theta^2) } )   (3)

for any \theta > 0 with high probability. Note that this bound is entirely independent of T, the number of rounds of boosting. In addition, Schapire et al. proved that boosting is particularly aggressive at reducing the margin (in a quantifiable sense) since it concentrates on the examples with the smallest margins (whether positive or negative).
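The margins of the training examples are easy to read off a trained model. Continuing the earlier Python/NumPy sketch (same caveats: hypothetical helper names, not the paper's code), the following computes Eq. (2) for every example; plotting the cumulative distribution of the returned values after different numbers of rounds produces graphs in the style of the right panel of Figure 2:

    def margins(hypotheses, alphas, X, y):
        # margin(x_i, y_i) = y_i * sum_t alpha_t h_t(x_i) / sum_t alpha_t,  Eq. (2)
        scores = sum(a * stump_predict(h, X) for h, a in zip(hypotheses, alphas))
        return y * scores / np.sum(alphas)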

