
A Tutorial on Support Vector Regression

Alex J. Smola (GMD) and Bernhard Schölkopf (GMD)

NeuroCOLT Technical Report Series NC-TR-98-030, October 1998. Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2. For more information see the NeuroCOLT website http://www.neurocolt.com or email neurocolt@neurocolt.com. Authors' addresses: smola@first.gmd.de and bs@first.gmd.de, GMD FIRST, Rudower Chaussee, Berlin, Germany.

Abstract. In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for regression and function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization and capacity control from an SV point of view.

Introduction

The purpose of this paper is twofold. It should serve as a self-contained introduction to Support Vector regression for readers new to this rapidly developing field of research.

On the other hand, it attempts to give an overview of recent developments in the field. To this end, we decided to organize the essay as follows. We start by giving a brief overview of the basic techniques, together with a short summary with a number of figures and diagrams. We then review current algorithmic techniques used for actually implementing SV machines; this may be of most interest to practitioners. The following sections cover more advanced topics such as extensions of the basic SV algorithm, connections between SV machines and regularization theory, and methods for carrying out model selection and capacity control. We conclude with a discussion of open questions and problems and current directions of SV research. Most of the results presented in this review paper have already been published elsewhere, but the comprehensive presentation and some details are new.

Historic Background

The SV algorithm is a nonlinear generalization of the Generalized Portrait algorithm developed in Russia in the sixties (Vapnik and Lerner; Vapnik and Chervonenkis). As such, it is firmly grounded in the framework of statistical learning theory, or VC theory.

VC theory has been developed over the last three decades by Vapnik, Chervonenkis and others (Vapnik and Chervonenkis; Vapnik). In a nutshell, VC theory characterizes properties of learning machines which enable them to generalize well to unseen data. In its present form, the SV machine was developed at AT&T Bell Laboratories by Vapnik and co-workers (Boser et al.; Guyon et al.; Cortes and Vapnik; Schölkopf et al.; Vapnik et al.). Due to this industrial context, SV research has up to date had a sound orientation towards real-world applications. Initial work focused on OCR (optical character recognition). Within a short period of time, SV classifiers became competitive with the best available systems for both OCR and object recognition tasks (Schölkopf). A similar approach, however using linear instead of quadratic programming, was taken at the same time in the USA, mainly by Mangasarian et al. (see also Schölkopf et al.). A comprehensive tutorial on SV classifiers has been published by Burges. But also in regression and time series prediction applications, excellent performances were soon obtained (Müller et al.; Drucker et al.; Stitson et al.; Mattera and Haykin).

A snapshot of the state of the art in SV learning was recently taken at the annual Neural Information Processing Systems conference (Schölkopf et al.). SV learning has now evolved into an active area of research. Moreover, it is in the process of entering the standard methods toolbox of machine learning (e.g. Haykin; Cherkassky and Mulier).

The Basic Idea

Suppose we are given training data \(\{(x_1, y_1), \dots, (x_\ell, y_\ell)\} \subset X \times \mathbb{R}\), where \(X\) denotes the space of the input patterns, for instance \(\mathbb{R}^d\). These might be, for instance, exchange rates for some currency measured at subsequent days together with corresponding econometric indicators. In \(\varepsilon\)-SV regression (Vapnik), our goal is to find a function \(f(x)\) that has at most \(\varepsilon\) deviation from the actually obtained targets \(y_i\) for all the training data, and at the same time is as flat as possible. In other words, we do not care about errors as long as they are less than \(\varepsilon\), but will not accept any deviation larger than this. This may be important if you want to be sure not to lose more than \(\varepsilon\) money when dealing with exchange rates, for instance.

For pedagogical reasons, we begin by describing the case of linear functions \(f\), taking the form

\[ f(x) = \langle w, x \rangle + b \quad \text{with } w \in X,\ b \in \mathbb{R}, \]

where \(\langle \cdot, \cdot \rangle\) denotes the dot product in \(X\). Flatness in this case means that one seeks a small \(w\).
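As an aside, the \(\varepsilon\)-tube idea is easy to try out in code. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic data and the values of C and epsilon are arbitrary illustrative choices, not taken from this tutorial.

    # Toy illustration of epsilon-SV regression (assumes scikit-learn).
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(50, 1))
    y = 2.0 * X.ravel() + 0.5 + rng.normal(scale=0.05, size=50)

    # kernel="linear" gives f(x) = <w, x> + b; epsilon is the tube width.
    model = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
    residuals = y - model.predict(X)

    # Deviations of at most epsilon incur no loss; only points outside
    # the tube become support vectors and are penalized.
    print("fraction inside the tube:", np.mean(np.abs(residuals) <= 0.1))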

One way to ensure this is to minimize the Euclidean norm, i.e. \(\|w\|^2\). (See Smola for an overview of other ways of specifying flatness of such functions.) Formally, we can write this problem as a convex optimization problem by requiring

\[ \text{minimize } \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon \\ \langle w, x_i \rangle + b - y_i \le \varepsilon. \end{cases} \]

The tacit assumption here was that such a function \(f\) actually exists that approximates all pairs \((x_i, y_i)\) with \(\varepsilon\) precision, or in other words, that the convex optimization problem is feasible. Sometimes, however, this may not be the case, or we may also want to allow for some errors. Analogously to the "soft margin" loss function of Cortes and Vapnik, one can introduce slack variables \(\xi_i, \xi_i^*\) to cope with otherwise infeasible constraints of the optimization problem. Hence we arrive at the formulation stated by Vapnik:

\[ \text{minimize } \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0. \end{cases} \]

The constant \(C > 0\) determines the trade-off between the flatness of \(f\) and the amount up to which deviations larger than \(\varepsilon\) are tolerated. The formulation above corresponds to dealing with a so-called \(\varepsilon\)-insensitive loss function \(|\xi|_\varepsilon\), described by

\[ |\xi|_\varepsilon := \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise.} \end{cases} \]

Figure 1 depicts the situation graphically.
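The loss function itself is straightforward to state in code. A sketch in NumPy (the function name is mine, not from the paper):

    import numpy as np

    def eps_insensitive(xi, eps):
        """|xi|_eps: zero inside the tube, linear penalty outside."""
        return np.maximum(np.abs(xi) - eps, 0.0)

    # With eps = 0.1, a residual of 0.05 costs nothing,
    # while a residual of -0.3 costs 0.2.
    print(eps_insensitive(np.array([0.05, -0.3]), 0.1))  # [0.  0.2]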

Only the points outside the shaded region contribute to the cost insofar as the deviations are penalized in a linear fashion.

[Figure 1: The soft margin loss setting for a linear SV machine.]

It turns out that the optimization problem can be solved more easily in its dual formulation. Moreover, as we will see below, the dual formulation provides the key for extending the SV machine to nonlinear functions. Hence we will use a standard dualization method utilizing Lagrange multipliers, as described in e.g. Fletcher.

Dual Formulation and Quadratic Programming

The key idea is to construct a Lagrange function from both the objective function (it will be called the primal objective function in the rest of this article) and the corresponding constraints, by introducing a dual set of variables. It can be shown that this function has a saddle point with respect to the primal and dual variables at the optimal solution (for details see e.g. Goldstein; Mangasarian; McCormick; Vanderbei, and the explanations below). Hence we proceed as follows:

\[ L := \tfrac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \xi_i^*) - \sum_i \alpha_i (\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b) - \sum_i \alpha_i^* (\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b) - \sum_i (\eta_i \xi_i + \eta_i^* \xi_i^*). \]

It is understood that the dual variables in the Lagrangian have to satisfy positivity constraints, i.e. \(\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0\). It follows from the saddle point condition that the partial derivatives of \(L\) with respect to the primal variables \((w, b, \xi_i, \xi_i^*)\) have to vanish for optimality:

\[ \partial_b L = \sum_i (\alpha_i^* - \alpha_i) = 0, \qquad \partial_w L = w - \sum_i (\alpha_i - \alpha_i^*)\, x_i = 0, \qquad \partial_{\xi_i^{(*)}} L = C - \alpha_i^{(*)} - \eta_i^{(*)} = 0. \]

Substituting these conditions into the Lagrangian yields the dual optimization problem:

\[ \text{maximize } -\tfrac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_i (\alpha_i + \alpha_i^*) + \sum_i y_i (\alpha_i - \alpha_i^*) \]
\[ \text{subject to } \sum_i (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C]. \]

In deriving the dual we already eliminated the dual variables \(\eta_i, \eta_i^*\) through the condition \(\eta_i^{(*)} = C - \alpha_i^{(*)}\), as these variables did not appear in the dual objective function anymore but were only present in the dual feasibility conditions. The optimality condition for \(w\) can be rewritten as

\[ w = \sum_i (\alpha_i - \alpha_i^*)\, x_i \quad \text{and therefore} \quad f(x) = \sum_i (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b. \]

This is the so-called Support Vector expansion, i.e. \(w\) can be completely described as a linear combination of the training patterns \(x_i\). In a sense, the complexity of a function's representation by SVs is independent of the dimensionality of the input space \(X\), and depends only on the number of SVs. Moreover, the complete algorithm can be described in terms of dot products between the data.
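To make the dual concrete: the problem above is a standard quadratic program, and any off-the-shelf QP solver can handle small instances. The following sketch assumes the cvxopt package and a linear kernel; stacking the \(\alpha_i\) and \(\alpha_i^*\) into one vector, the tiny ridge added for numerical stability, and all parameter values are choices of this sketch, not prescriptions of the tutorial.

    # Solve the epsilon-SVR dual as a QP (assumes cvxopt and NumPy).
    import numpy as np
    from cvxopt import matrix, solvers

    def fit_svr_dual(X, y, C=1.0, eps=0.1):
        m = len(y)
        K = X @ X.T                                # Gram matrix <x_i, x_j>
        # Stack z = [alpha; alpha*]; maximizing the dual objective is
        # equivalent to minimizing (1/2) z'Pz + q'z with these blocks.
        P = np.block([[K, -K], [-K, K]]) + 1e-8 * np.eye(2 * m)
        q = np.concatenate([eps - y, eps + y])
        # Box constraints 0 <= z <= C, written as G z <= h.
        G = np.vstack([-np.eye(2 * m), np.eye(2 * m)])
        h = np.concatenate([np.zeros(2 * m), np.full(2 * m, C)])
        # Equality constraint: sum_i (alpha_i - alpha_i*) = 0.
        A = np.hstack([np.ones(m), -np.ones(m)]).reshape(1, -1)
        solvers.options["show_progress"] = False
        sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h),
                         matrix(A), matrix(0.0))
        z = np.asarray(sol["x"]).ravel()
        alpha, alpha_star = z[:m], z[m:]
        w = X.T @ (alpha - alpha_star)             # the SV expansion of w
        return alpha, alpha_star, w

Patterns with \(\alpha_i - \alpha_i^* \ne 0\) are exactly the support vectors; all other training points drop out of the expansion.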

Even when evaluating \(f(x)\), we need not compute \(w\) explicitly (although this may be computationally more efficient in the linear setting). These observations will come in handy for the formulation of a nonlinear extension.

Computing b

So far we have neglected the issue of computing \(b\). The latter can be done by exploiting the so-called Karush-Kuhn-Tucker (KKT) conditions (Karush; Kuhn and Tucker). These state that at the optimal solution the product between dual variables and constraints has to vanish. In the SV case this means

\[ \alpha_i (\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b) = 0, \qquad \alpha_i^* (\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b) = 0 \]

and

\[ (C - \alpha_i)\, \xi_i = 0, \qquad (C - \alpha_i^*)\, \xi_i^* = 0. \]

This allows us to draw several useful conclusions. Firstly, only samples \((x_i, y_i)\) with corresponding \(\alpha_i^{(*)} = C\) lie outside the \(\varepsilon\)-insensitive tube around \(f\). Secondly, \(\alpha_i \alpha_i^* = 0\), i.e. there can never be a set of dual variables \(\alpha_i, \alpha_i^*\) which are both simultaneously nonzero, as this would require nonzero slacks in both directions. Finally, for \(\alpha_i^{(*)} \in (0, C)\) we have \(\xi_i^{(*)} = 0\), and moreover the second factor in the corresponding KKT condition has to vanish. Hence \(b\) can be computed as follows:

\[ b = y_i - \langle w, x_i \rangle - \varepsilon \quad \text{for } \alpha_i \in (0, C), \qquad b = y_i - \langle w, x_i \rangle + \varepsilon \quad \text{for } \alpha_i^* \in (0, C). \]

Another way of computing \(b\) will be discussed in the context of interior point optimization, where \(b\) turns out to be a by-product of the optimization process; further considerations are therefore deferred to the corresponding section.
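Continuing the QP sketch from above, \(b\) can be read off directly from these conditions once a multiplier strictly inside \((0, C)\) is found; the tolerance below is an arbitrary numerical threshold of this sketch.

    def compute_b(X, y, alpha, alpha_star, C, eps, tol=1e-6):
        """Recover b from a multiplier strictly inside (0, C)."""
        w = X.T @ (alpha - alpha_star)
        for i in range(len(y)):
            if tol < alpha[i] < C - tol:
                # Constraint y_i - <w, x_i> - b = eps is active here.
                return y[i] - X[i] @ w - eps
            if tol < alpha_star[i] < C - tol:
                # Constraint <w, x_i> + b - y_i = eps is active here.
                return y[i] - X[i] @ w + eps
        raise ValueError("no multiplier strictly inside (0, C)")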

