Visualizing Data using t-SNE

Journal of Machine Learning Research 9 [...] 5000 LE Tilburg [...] King's College Road, M5S 3G4 Toronto, ON, Canada

Editor: Yoshua Bengio

Abstract

We present a new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two [...] a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency [...] better than existing techniques at creating a single map that reveals structure at many [...] particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds.

[...] We show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is [...] illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, [...]

Keywords: visualization, dimensionality reduction, manifold learning, embedding algorithms, [...]

[...] different domains, and deals with data of widely varying dimensionality. Cell nuclei that are relevant to breast cancer, for example, are described by approximately 30 variables (Street et al., 1993), whereas the pixel intensity vectors used to represent images or the word-count vectors used to represent documents typically have [...] a variety of techniques for the visualization of such high-dimensional data have been proposed, many of which are reviewed by de Oliveira and Levkowitz (2003). Important techniques include iconographic displays such as Chernoff faces (Chernoff, 1973), pixel-based techniques (Keim, 2000), and techniques that represent the dimensions in the data as vertices in a graph (Battista et al., 1994). Most of these techniques simply provide tools to display more than two data dimensions, and leave the interpretation of the [...]

© 2008 Laurens van der Maaten and Geoffrey Hinton.

[...] dimensionality reduction methods convert the high-dimensional dataset $X = \{x_1, x_2, \ldots, x_n\}$ into two or three-dimensional data $Y = \{y_1, y_2, \ldots, y_n\}$ that can be displayed in a [...] we refer to the low-dimensional data representation $Y$ as a map, [...] been proposed that differ in the type of structure they [...] (PCA; Hotelling, 1933) and classical multidimensional scaling (MDS; Torgerson, 1952) [...] low-dimensional, non-linear manifold it is usually more important to keep the low-dimensional representations of very similar datapoints close together, which is typically not possible with a [...] large number of nonlinear dimensionality reduction techniques that aim to preserve the local structure of data have been proposed, many of which are reviewed by Lee and Verleysen (2007). In particular, we mention the following seven techniques: (1) Sammon mapping (Sammon, 1969), (2) curvilinear components analysis (CCA; Demartines and Hérault, 1997), (3) Stochastic Neighbor Embedding (SNE; Hinton and Roweis, 2002), (4) Isomap (Tenenbaum et al., 2000), (5) Maximum Variance Unfolding (MVU; Weinberger et al., 2004), (6) Locally Linear Embedding (LLE; Roweis and Saul, 2000), and (7) Laplacian Eigenmaps (Belkin and Niyogi, 2002).

Despite the strong performance of these techniques on artificial datasets, they are often not very successful at visualizing real, [...] most of the techniques are not capable of retaining both the local and the global structure of the data in a [...] a recent study reveals that even a semi-supervised variant of MVU is not capable of separating handwritten digits into their natural clusters (Song et al., 2007).

In this paper, we describe a way of converting a high-dimensional dataset into a matrix of pairwise similarities and we introduce a new technique, called t-SNE, [...] capable of capturing much of the local structure of the high-dimensional data very well, while also revealing global structure such as the presence of clusters at [...] illustrate the performance of t-SNE by comparing it to the seven dimensionality reduction techniques mentioned above on five datasets from a [...] most of the (7+1) × 5 = 40 maps are presented in the supplemental material, [...] we outline SNE as presented by Hinton and Roweis (2002), [...] we present t-SNE, [...] Section 5 shows how t-SNE can be modified to visualize real-world datasets that contain many more than 10 [...] more detail in [...]

[...] (SNE) [...] datapoint $x_i$ is the conditional probability, $p_{j|i}$, that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked in proportion to their probability density under a [...] $p_{j|i}$ is relatively high, whereas for widely separated datapoints, $p_{j|i}$ will be almost infinitesimal (for reasonable values of the variance of the Gaussian, $\sigma_i$).

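Mathematically, the conditional probability $p_{j|i}$ is given by

$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \tag{1}$$

where $\sigma_i$ is the variance of the Gaussian that is centered on datapoint $x_i$. The method for determining the value of $\sigma_i$ is presented later in [...] modeling pairwise similarities, [...] it is possible to compute a similar conditional probability, which we denote by $q_{j|i}$. We set the variance of the Gaussian that is employed in the computation of the conditional probabilities $q_{j|i}$ to $1/\sqrt{2}$. Hence, we model the similarity of map point $y_j$ to map point $y_i$ by

$$q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}.$$

Again, since we are only interested in modeling pairwise similarities, we set $q_{i|i} = 0$. [...] the map points $y_i$ and $y_j$ correctly model the similarity between the high-dimensional datapoints $x_i$ and $x_j$, [...] SNE aims to find a low-dimensional data representation that minimizes the mismatch between $p_{j|i}$ and $q_{j|i}$.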
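The two conditional probabilities can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `conditional_p` and `conditional_q`, the toy points, and the fixed values in `sigmas` are all hypothetical, and the diagonal entries are set to zero so that each point is excluded from its own neighbor distribution.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def conditional_p(X, sigmas):
    """Row i holds p_{j|i} as in Equation 1 (hypothetical helper name)."""
    n = len(X)
    P = []
    for i in range(n):
        # Unnormalized Gaussian weights; a point is never its own neighbor.
        w = [0.0 if j == i
             else math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigmas[i] ** 2))
             for j in range(n)]
        total = sum(w)  # assumes at least one weight does not underflow to 0
        P.append([wj / total for wj in w])
    return P

def conditional_q(Y):
    """Row i holds q_{j|i}: same form as p_{j|i} with 2 * sigma^2 = 1."""
    n = len(Y)
    Q = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(Y[i], Y[j]))
             for j in range(n)]
        total = sum(w)
        Q.append([wj / total for wj in w])
    return Q

# Toy data (made up for illustration): two nearby points and one distant one.
X = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 3.0, 3.0]]
Y = [[0.0, 0.0], [0.5, 0.0], [2.0, 2.0]]
P = conditional_p(X, sigmas=[1.0, 1.0, 1.0])
Q = conditional_q(Y)
```

Each row of `P` and `Q` is a probability distribution over the other points, so `P[0][1]` (the nearby point) comes out much larger than `P[0][2]` (the distant one).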

A natural measure of the faithfulness with which $q_{j|i}$ models $p_{j|i}$ is the Kullback-Leibler divergence (which is in this case equal to the cross-entropy up to an additive constant). SNE minimizes the sum of Kullback-Leibler divergences over all datapoints using a [...] given by

$$C = \sum_i KL(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}, \tag{2}$$

in which $P_i$ represents the conditional probability distribution over all other datapoints given datapoint $x_i$, and $Q_i$ represents the conditional probability distribution over all other map points given map point $y_i$. Because the Kullback-Leibler divergence is not symmetric, different types of error in the pairwise distances in the low-dimensional map are not weighted equally.
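As a rough sketch of Equation 2, the cost can be evaluated directly from two row-stochastic matrices. The function name `sne_cost` and the small hand-written matrices are hypothetical. The sketch assumes every $q_{j|i}$ is positive wherever $p_{j|i}$ is positive, and skips zero-probability terms because they contribute nothing to the sum.

```python
import math

def sne_cost(P, Q):
    """Sum over i of KL(P_i || Q_i), where row i holds p_{j|i} or q_{j|i}."""
    cost = 0.0
    for p_row, q_row in zip(P, Q):
        for p, q in zip(p_row, q_row):
            if p > 0.0:  # terms with p = 0 contribute nothing
                cost += p * math.log(p / q)
    return cost

# Hand-written example distributions (illustrative values only).
P = [[0.0, 0.9, 0.1],
     [0.8, 0.0, 0.2],
     [0.5, 0.5, 0.0]]
Q = [[0.0, 0.6, 0.4],
     [0.7, 0.0, 0.3],
     [0.5, 0.5, 0.0]]

cost_mismatch = sne_cost(P, Q)  # positive: the map distorts rows 0 and 1
cost_perfect = sne_cost(P, P)   # zero: the map reproduces P exactly
```

The cost is zero only when every $Q_i$ matches its $P_i$, and positive otherwise.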

In particular, there is a large cost for using widely separated map points to represent nearby datapoints (a small $q_{j|i}$ to model a large $p_{j|i}$), but there is only a small cost for using nearby map points to represent widely separated datapoints. This small cost comes from wasting some of the probability mass in the relevant $Q$ distributions. In other words, the SNE cost function focuses on retaining the local structure of the data in the map (for reasonable values of the variance of the Gaussian in the high-dimensional space, $\sigma_i$).

[...] datasets that consist of pairwise similarities between objects rather than high-dimensional vector representations of each object, [...] human word association data consists of the probability of producing each possible word in response to a given word, as a result of which it is [...]

[...] we lose the property that the data is a perfect model of itself if we embed it in a space of the same dimensionality, because in the high-dimensional space, we used a different variance [...]

The remaining parameter to be selected is the variance $\sigma_i$ of the Gaussian that is centered over each high-dimensional datapoint, $x_i$. It is not likely that there is a single value of $\sigma_i$ that is optimal for all datapoints in the dataset because the density of the data is likely to vary. In dense regions, a smaller value of $\sigma_i$ is [...] particular value of $\sigma_i$ induces a probability distribution, $P_i$, [...] which increases as [...] binary search for the value of $\sigma_i$ that produces a $P_i$ with a fixed perplexity that is [...] defined as

$$Perp(P_i) = 2^{H(P_i)},$$

where $H(P_i)$ is the Shannon entropy of $P_i$ measured in bits

$$H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}.$$

The perplexity can be interpreted as a smooth measure of the effective [...] fairly robust to changes in the perplexity, [...] is performed using a [...] surprisingly simple form

$$\frac{\delta C}{\delta y_i} = 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)(y_i - y_j).$$

Physically, the gradient may be interpreted as the resultant force created by a set of springs between the map point $y_i$ and all other map points $y_j$.
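The perplexity search and the gradient can both be sketched in plain Python. Everything named here is an assumption made for illustration: the helpers `row_p`, `perplexity`, `sigma_for_perplexity` and `sne_gradient`, the search bounds, the iteration count, and the toy inputs. The search bisects on a logarithmic scale and relies on perplexity growing monotonically with $\sigma_i$. The gradient function evaluates the formula above term by term, with row $i$ of each matrix holding the probabilities conditioned on point $i$.

```python
import math

def row_p(dists_sq, sigma):
    """p_{j|i} for one point, from squared distances to all other points."""
    d_min = min(dists_sq)  # shifting by d_min avoids an all-zero row
    w = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in dists_sq]
    total = sum(w)
    return [wj / total for wj in w]

def perplexity(p):
    """Perp(P_i) = 2 ** H(P_i), with the entropy H measured in bits."""
    h = -sum(pj * math.log2(pj) for pj in p if pj > 0.0)
    return 2.0 ** h

def sigma_for_perplexity(dists_sq, target, lo=1e-3, hi=1e3, iters=60):
    """Bisect on a log scale for the sigma whose P_i has the target perplexity."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if perplexity(row_p(dists_sq, mid)) > target:
            hi = mid  # distribution too flat: shrink sigma
        else:
            lo = mid  # distribution too peaked: grow sigma
    return math.sqrt(lo * hi)

def sne_gradient(P, Q, Y):
    """Gradient of C with respect to each y_i; P[i][j] holds p_{j|i}."""
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            coeff = 2.0 * (P[i][j] - Q[i][j] + P[j][i] - Q[j][i])
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

# Toy inputs, made up for illustration.
d2 = [1.0, 4.0, 9.0, 16.0, 25.0]  # squared distances from one point
sigma = sigma_for_perplexity(d2, target=3.0)

P = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.5, 0.5, 0.0]]
Q = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
g = sne_gradient(P, Q, Y)
```

In the toy example, points 0 and 1 are more similar in `P` than in `Q`, so a gradient descent step (subtracting a multiple of `g[0]`) moves `Y[0]` toward `Y[1]`. This matches the picture of a spring pulling the two map points together.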

