
Visualizing Data using t-SNE

Journal of Machine Learning Research 9 [...], 5000 LE Tilburg, [...] King's College Road, M5S 3G4 Toronto, ON, Canada

Editor: Yoshua Bengio

Abstract

We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two [...] a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency [...] better than existing techniques at creating a single map that reveals structure at many [...] particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, [...], we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is [...] illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap.

[...] multidimensional scaling

1. Introduction

Visualization of high-dimensional data is an important problem in many different domains, and deals with data of widely varying dimensionality. Cell nuclei that are relevant to breast cancer, for example, are described by approximately 30 variables (Street et al., 1993), whereas the pixel ...



Transcription of Visualizing Data using t-SNE


Keywords: visualization, dimensionality reduction, manifold learning, embedding algorithms, multidimensional scaling

1. Introduction

Visualization of high-dimensional data is an important problem in many different domains, and deals with data of widely varying dimensionality. Cell nuclei that are relevant to breast cancer, for example, are described by approximately 30 variables (Street et al., 1993), whereas the pixel intensity vectors used to represent images or the word-count vectors used to represent documents typically have [...], a variety of techniques for the visualization of such high-dimensional data have been proposed, many of which are reviewed by de Oliveira and Levkowitz (2003). Important techniques include iconographic displays such as Chernoff faces (Chernoff, 1973), pixel-based techniques (Keim, 2000), and techniques that represent the dimensions in the data as vertices in a graph (Battista et al., 1994). Most of these techniques simply provide tools to display more than two data dimensions, and leave the interpretation of the [...]

© 2008 Laurens van der Maaten and Geoffrey Hinton.

[...], dimensionality reduction methods convert the high-dimensional data set $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ into two or three-dimensional data $\mathcal{Y} = \{y_1, y_2, \ldots, y_n\}$ that can be displayed in a [...], we refer to the low-dimensional data representation $\mathcal{Y}$ as a map, [...] been proposed that differ in the type of structure they [...] (PCA; Hotelling, 1933) and classical multidimensional scaling (MDS;

Torgerson, 1952) [...] low-dimensional, non-linear manifold it is usually more important to keep the low-dimensional representations of very similar datapoints close together, which is typically not possible with a [...] large number of nonlinear dimensionality reduction techniques that aim to preserve the local structure of data have been proposed, many of which are reviewed by Lee and Verleysen (2007). In particular, we mention the following seven techniques: (1) Sammon mapping (Sammon, 1969), (2) curvilinear components analysis (CCA; Demartines and Hérault, 1997), (3) Stochastic Neighbor Embedding (SNE; Hinton and Roweis, 2002), (4) Isomap (Tenenbaum et al., 2000), (5) Maximum Variance Unfolding (MVU; Weinberger et al., 2004), (6) Locally Linear Embedding (LLE; Roweis and Saul, 2000), and (7) Laplacian Eigenmaps (Belkin and Niyogi, 2002). Despite the strong performance of these techniques on artificial data sets, they are often not very successful at visualizing real, [...], most of the techniques are not capable of retaining both the local and the global structure of the data in a [...], a recent study reveals that even a semi-supervised variant of MVU is not capable of separating handwritten digits into their natural clusters (Song et al.

, 2007). In this paper, we describe a way of converting a high-dimensional data set into a matrix of pairwise similarities and we introduce a new technique, called "t-SNE", [...] capable of capturing much of the local structure of the high-dimensional data very well, while also revealing global structure such as the presence of clusters at [...] illustrate the performance of t-SNE by comparing it to the seven dimensionality reduction techniques mentioned above on five data sets from a [...], most of the $(7+1) \times 5 = 40$ maps are presented in the supplemental material, [...], we outline SNE as presented by Hinton and Roweis (2002), [...], we present t-SNE, [...], [...], Section 5 shows how t-SNE can be modified to visualize real-world data sets that contain many more than 10,[...] more detail in [...]

[...] (SNE) [...] datapoint $x_i$ is the conditional probability, $p_{j|i}$, that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a [...], $p_{j|i}$ is relatively high, whereas for widely separated datapoints, $p_{j|i}$ will be almost infinitesimal (for reasonable values of the variance of the Gaussian, $\sigma_i$).

Mathematically, the conditional probability $p_{j|i}$ is given by

$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}, \qquad (1)$$

where $\sigma_i$ is the variance of the Gaussian that is centered on datapoint $x_i$. The method for determining the value of $\sigma_i$ is presented later in [...] modeling pairwise similarities, [...], it is possible to compute a similar conditional probability, which we denote by $q_{j|i}$. We set$^2$ the variance of the Gaussian that is employed in the computation of the conditional probabilities $q_{j|i}$ to $\frac{1}{\sqrt{2}}$. Hence, we model the similarity of map point $y_j$ to map point $y_i$ by

$$q_{j|i} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\left(-\|y_i - y_k\|^2\right)}.$$

Again, since we are only interested in modeling pairwise similarities, we set $q_{i|i} = 0$. [...] the map points $y_i$ and $y_j$ correctly model the similarity between the high-dimensional datapoints $x_i$ and $x_j$, [...], SNE aims to find a low-dimensional data representation that minimizes the mismatch between $p_{j|i}$ and $q_{j|i}$.
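As an illustration, Equation 1 and the corresponding expression for $q_{j|i}$ can be sketched in plain Python. This is a minimal sketch rather than the authors' implementation: the function names `sq_dist`, `conditional_p` and `conditional_q` are hypothetical, the self-similarity is assumed to be zero, and the value of $\sigma_i$ is simply passed in.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditional_p(X, i, sigma_i):
    """Return the list [p_{j|i} for every j], following Equation 1.

    The entry for j == i is assumed to be zero, so only pairwise
    similarities are modeled.
    """
    weights = [
        0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma_i ** 2))
        for j in range(len(X))
    ]
    total = sum(weights)  # normalizer: sum over k != i
    return [w / total for w in weights]

def conditional_q(Y, i):
    """Return [q_{j|i} for every j]; sigma is fixed at 1/sqrt(2), so 2*sigma^2 = 1."""
    return conditional_p(Y, i, 1.0 / math.sqrt(2.0))

# Three datapoints on a line: the nearby point gets almost all the mass.
X = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
p_given_0 = conditional_p(X, 0, sigma_i=1.0)
```

A practical implementation would vectorize these loops and guard against the normalizer underflowing to zero for very small $\sigma_i$; the sketch omits both for readability.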

A natural measure of the faithfulness with which $q_{j|i}$ models $p_{j|i}$ is the Kullback-Leibler divergence (which is in this case equal to the cross-entropy up to an additive constant). SNE minimizes the sum of Kullback-Leibler divergences over all datapoints using a [...] given by

$$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}, \qquad (2)$$

in which $P_i$ represents the conditional probability distribution over all other datapoints given datapoint $x_i$, and $Q_i$ represents the conditional probability distribution over all other map points given map point $y_i$. Because the Kullback-Leibler divergence is not symmetric, different types of error in the pairwise distances in the low-dimensional map are not weighted equally. In particular, there is a large cost for using widely separated map points to represent nearby datapoints (i.e., [...] small $q_{j|i}$ to model a large $p_{j|i}$), but there is only a [...], the SNE cost function focuses on retaining the local structure of the data in the map (for reasonable values of the variance of the Gaussian in the high-dimensional space, $\sigma_i$).

Footnote 1. [...] data sets that consist of pairwise similarities between objects rather than high-dimensional vector representations of each object, [...], human word association data consists of the probability of producing each possible word in response to a given word, as a result of which it is [...]

Footnote 2. [...], we lose the property that the data is a perfect model of itself if we embed it in a space of the same dimensionality, because in the high-dimensional space, we used a different variance [...]
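The cost in Equation 2 can be sketched directly from its definition. The snippet below is a hedged illustration, not the authors' code: `sne_cost` is a hypothetical name, and the layout `P[i][j]` = $p_{j|i}$, `Q[i][j]` = $q_{j|i}$ is an assumption made for the example.

```python
import math

def sne_cost(P, Q):
    """Sum over i of KL(P_i || Q_i), as in Equation 2.

    P[i][j] is assumed to hold p_{j|i} and Q[i][j] to hold q_{j|i}.
    Diagonal entries and zero-probability terms contribute nothing.
    """
    cost = 0.0
    for i, (p_row, q_row) in enumerate(zip(P, Q)):
        for j, (p, q) in enumerate(zip(p_row, q_row)):
            if j != i and p > 0.0:
                cost += p * math.log(p / q)
    return cost

# One datapoint whose close neighbor (p = 0.9) is modeled as far away (q = 0.1).
P = [[0.0, 0.9, 0.1]]
Q = [[0.0, 0.1, 0.9]]
mismatch_cost = sne_cost(P, Q)
```

In this toy case the term with large $p_{j|i}$ and small $q_{j|i}$ dominates the sum, which is the asymmetry the text describes; the cost is zero when `Q` equals `P`.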

The remaining parameter to be selected is the variance $\sigma_i$ of the Gaussian that is centered over each high-dimensional datapoint, $x_i$. It is not likely that there is a single value of $\sigma_i$ that is optimal for all datapoints in the data set because the density of the data is likely to vary. In dense regions, a smaller value of $\sigma_i$ is [...] particular value of $\sigma_i$ induces a probability distribution, $P_i$, [...] which increases as [...] binary search for the value of $\sigma_i$ that produces a $P_i$ with a fixed perplexity that is [...] defined as

$$Perp(P_i) = 2^{H(P_i)},$$

where $H(P_i)$ is the Shannon entropy of $P_i$ measured in bits

$$H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}.$$

The perplexity can be interpreted as a smooth measure of the effective [...] fairly robust to changes in the perplexity, [...] is performed using a [...] surprisingly simple form

$$\frac{\delta C}{\delta y_i} = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)\left(y_i - y_j\right).$$

Physically, the gradient may be interpreted as the resultant force created by a set of springs between the map point $y_i$ and all other map points $y_j$.
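The binary search for $\sigma_i$ described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the search interval `[lo, hi]`, the tolerance and the iteration cap are all hypothetical choices. It relies on the perplexity of $P_i$ growing monotonically with $\sigma_i$.

```python
import math

def perplexity(p_row):
    """Perp(P_i) = 2 ** H(P_i), with H the Shannon entropy in bits."""
    entropy_bits = -sum(p * math.log2(p) for p in p_row if p > 0.0)
    return 2.0 ** entropy_bits

def p_row_for_sigma(sq_dists, sigma):
    """p_{j|i} over the other points, given squared distances from x_i.

    Subtracting the smallest distance leaves the normalized result
    unchanged but keeps at least one weight equal to 1, which avoids
    dividing by zero when sigma is very small.
    """
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def find_sigma(sq_dists, target_perp, lo=1e-3, hi=1e3, tol=1e-6, max_iter=200):
    """Bisect on sigma until Perp(P_i) is within tol of target_perp."""
    sigma = 0.5 * (lo + hi)
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        perp = perplexity(p_row_for_sigma(sq_dists, sigma))
        if abs(perp - target_perp) < tol:
            break
        if perp > target_perp:
            hi = sigma  # distribution too flat: shrink sigma
        else:
            lo = sigma  # distribution too peaked: grow sigma
    return sigma

# Squared distances from x_i to five other datapoints (made-up values).
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma_i = find_sigma(sq_dists, target_perp=3.0)
```

The target perplexity must lie strictly between 1 and the number of other datapoints, since those are the limits reached as $\sigma_i$ goes to zero and to infinity.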

All springs exert a force along the direction $(y_i - y_j)$. The spring between $y_i$ and $y_j$ repels or attracts the map points depending on whether the distance between the two in the map is [...] proportional to its length, and also proportional to its stiffness, which is the mismatch $(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})$ [...] initialized by sampling map points randomly from an isotropic Gaussian with small variance that is [...], a relatively large momentum term is added to [...], the current gradient is added to an exponentially decaying sum of previous gradients in order to determine the changes in the coordinates of the map points at [...], the gradient update with a momentum term is given by

$$\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta \frac{\delta C}{\delta \mathcal{Y}} + \alpha(t) \left(\mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)}\right),$$

[...] $\mathcal{Y}^{(t)}$ indicates the solution at iteration $t$, $\eta$ indicates the learning rate, and $\alpha(t)$ represents the momentum at [...], in the early stages of the optimization, Gaussian noise is [...] the variance of the noise changes very slowly at the critical point at which the global structure of the map starts to form, SNE tends to find maps with a [...], this requires sensible choices of the initial amount of Gaussian noise and the rate at which it [...], [...] is therefore common to run the optimization several times on a [...], SNE is inferior to methods that allow convex optimization and it [...] discussed SNE as it was presented by Hinton and Roweis (2002).
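The spring interpretation and the momentum update can be made concrete with a small sketch. The names `sne_gradient` and `momentum_step` are hypothetical, and `P[i][j]` is assumed to hold $p_{j|i}$ (likewise for `Q`). One further assumption is flagged loudly: the update is printed with a plus sign in front of the gradient term, whereas the sketch subtracts the gradient so that a step reduces the cost.

```python
def sne_gradient(P, Q, Y):
    """dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) * (y_i - y_j).

    P[i][j] is assumed to hold p_{j|i}; Q[i][j] holds q_{j|i}.
    """
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Spring stiffness: mismatch between data and map similarities.
            stiffness = P[i][j] - Q[i][j] + P[j][i] - Q[j][i]
            for d in range(dim):
                grad[i][d] += 2.0 * stiffness * (Y[i][d] - Y[j][d])
    return grad

def momentum_step(Y_prev, Y_prev2, grad, eta, alpha):
    """One update: Y(t) = Y(t-1) - eta * grad + alpha * (Y(t-1) - Y(t-2)).

    The minus sign on the gradient is an assumption of this sketch
    (descent direction); eta is the learning rate, alpha the momentum.
    """
    return [
        [y1 - eta * g + alpha * (y1 - y2) for y1, y2, g in zip(r1, r2, rg)]
        for r1, r2, rg in zip(Y_prev, Y_prev2, grad)
    ]

# Toy one-dimensional map with three points and made-up similarities.
Y = [[0.0], [1.0], [3.0]]
P = [[0.0, 0.9, 0.1], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
Q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
Y_next = momentum_step(Y, Y, sne_gradient(P, Q, Y), eta=0.1, alpha=0.5)
```

When the map similarities already match the data similarities, every stiffness is zero and the gradient vanishes, so only the momentum term moves the points.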

Although SNE constructs reasonably good visualizations, it is hampered by a cost function that is difficult to optimize and by a problem we refer to as the "crowding problem". In this section, we present a new technique called "t-Distributed Stochastic Neighbor Embedding" or "t-SNE" [...] ways: (1) it uses a symmetrized version of the SNE cost function with simpler gradients that was briefly introduced by Cook et al. (2007) and (2) it uses a Student-t distribution rather than a Gaussian to compute the similarity between two points in the low-dimensional space. t-SNE employs a [...], we first discuss the symmetric version of SNE ([...]). Subsequently, we discuss the crowding problem ([...]), and the use of heavy-tailed distributions to address this problem ([...]). We conclude the section by describing our approach to the optimization of the t-SNE cost function ([...]).

[...] to minimizing the sum of the Kullback-Leibler divergences between the conditional probabilities $p_{j|i}$ and $q_{j|i}$, it is also possible to minimize a single Kullback-Leibler divergence between a joint probability distribution, $P$, in the high-dimensional space and a joint probability distribution, $Q$, in the low-dimensional space:

$$C = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},$$

where again, [...] refer to this type of SNE as symmetric SNE, because it has the property that $p_{ij} = p_{ji}$ and $q_{ij} = q_{ji}$ for $\forall i, j$.

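In symmetric SNE, [...] $q_{ij}$ are given by

$$q_{ij} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq l} \exp\left(-\|y_k - y_l\|^2\right)}. \qquad (3)$$

The obvious way to define the pairwise similarities in the high-dimensional space $p_{ij}$ is

$$p_{ij} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)}{\sum_{k \neq l} \exp\left(-\|x_k - x_l\|^2 / 2\sigma^2\right)},$$

but this causes problems when a high-dimensional datapoint $x_i$ is an outlier (i.e., all pairwise distances $\|x_i - x_j\|^2$ are large for $x_i$). For such an outlier, the values of $p_{ij}$ are extremely small for all $j$, [...] result, the position of the map point is [...] circumvent this problem by defining the joint probabilities $p_{ij}$ in the high-dimensional space to be the symmetrized conditional probabilities, that is, we set $p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$. This ensures that $\sum_j p_{ij} > \frac{1}{2n}$ for all datapoints $x_i$, as a result of which each datapoint $x_i$ makes a [...], the simpler form of its gradient, [...] fairly similar to that of asymmetric SNE, and is given by

$$\frac{\delta C}{\delta y_i} = 4 \sum_j \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right).$$

Footnote 3. [...] visualization of the data is not nearly as problematic as picking the model that does best on a [...], the aim is to see the structure in the training data, [...]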
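The symmetrized joint probabilities, Equation 3 and the symmetric SNE gradient can be sketched together. As before, this is an illustrative sketch with hypothetical function names, and `P_cond[i][j]` is assumed to hold the conditional probability $p_{j|i}$; it is not the authors' implementation.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def symmetrize(P_cond):
    """p_ij = (p_{j|i} + p_{i|j}) / (2n), with P_cond[i][j] = p_{j|i}."""
    n = len(P_cond)
    return [
        [(P_cond[i][j] + P_cond[j][i]) / (2.0 * n) for j in range(n)]
        for i in range(n)
    ]

def joint_q(Y):
    """q_ij from Equation 3: Gaussian kernel normalized over all pairs k != l."""
    n = len(Y)
    W = [
        [0.0 if k == l else math.exp(-sq_dist(Y[k], Y[l])) for l in range(n)]
        for k in range(n)
    ]
    total = sum(sum(row) for row in W)
    return [[w / total for w in row] for row in W]

def symmetric_sne_gradient(P, Q, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j)."""
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for d in range(dim):
                grad[i][d] += 4.0 * (P[i][j] - Q[i][j]) * (Y[i][d] - Y[j][d])
    return grad

# Made-up conditional probabilities for three datapoints (rows sum to 1).
P_cond = [[0.0, 0.9, 0.1], [0.5, 0.0, 0.5], [0.2, 0.8, 0.0]]
P = symmetrize(P_cond)
Y = [[0.0], [1.0], [3.0]]
grad = symmetric_sne_gradient(P, joint_q(Y), Y)
```

Because every row of `P_cond` sums to one, each row of the symmetrized matrix sums to more than $\frac{1}{2n}$ whenever some other point assigns the datapoint nonzero probability, which is the outlier guarantee stated in the text.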

