Example: bachelor of science

A Tutorial on Spectral Clustering - arXiv

A Tutorial on Spectral ClusteringUlrike von LuxburgMax Planck Institute for Biological CyberneticsSpemannstr. 38, 72076 T ubingen, article appears in Statistics and Computing, 17 (4), original publication is available recent years, Spectral Clustering has become one of the most popular modern clusteringalgorithms. It is simple to implement, can be solved efficiently by standard linear algebra software,and very often outperforms traditional Clustering algorithms such as the k-means algorithm. Onthe first glance Spectral Clustering appears slightly mysterious, and it is not obvious to see whyit works at all and what it really does. The goal of this Tutorial is to give some intuition onthose questions. We describe different graph Laplacians and their basic properties, present themost common Spectral Clustering algorithms, and derive those algorithms from scratch by severaldifferent approaches.

A Tutorial on Spectral Clustering Ulrike von Luxburg Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tubingen, Germany ulrike.luxburg@tuebingen.mpg.de

Tags:

  Tutorials, Spectral, Clustering, Spectral clustering

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of A Tutorial on Spectral Clustering - arXiv

1 A Tutorial on Spectral ClusteringUlrike von LuxburgMax Planck Institute for Biological CyberneticsSpemannstr. 38, 72076 T ubingen, article appears in Statistics and Computing, 17 (4), original publication is available recent years, Spectral Clustering has become one of the most popular modern clusteringalgorithms. It is simple to implement, can be solved efficiently by standard linear algebra software,and very often outperforms traditional Clustering algorithms such as the k-means algorithm. Onthe first glance Spectral Clustering appears slightly mysterious, and it is not obvious to see whyit works at all and what it really does. The goal of this Tutorial is to give some intuition onthose questions. We describe different graph Laplacians and their basic properties, present themost common Spectral Clustering algorithms, and derive those algorithms from scratch by severaldifferent approaches.

2 Advantages and disadvantages of the different Spectral Clustering algorithmsare : Spectral Clustering ; graph Laplacian1 IntroductionClustering is one of the most widely used techniques for exploratory data analysis, with applicationsranging from statistics, computer science, biology to social sciences or psychology. In virtually everyscientific field dealing with empirical data, people attempt to get a first impression on their data bytrying to identify groups of similar behavior in their data. In this article we would like to introducethe reader to the family of Spectral Clustering algorithms. Compared to the traditional algorithms such ask-means or single linkage, Spectral Clustering has many fundamental advantages. Results ob-tained by Spectral Clustering often outperform the traditional approaches, Spectral Clustering is verysimple to implement and can be solved efficiently by standard linear algebra Tutorial is set up as a self-contained introduction to Spectral Clustering .

3 We derive spectralclustering from scratch and present different points of view to why Spectral Clustering works. Apartfrom basic linear algebra, no particular mathematical background is required by the reader. However,we do not attempt to give a concise review of the whole literature on Spectral Clustering , which isimpossible due to the overwhelming amount of literature on this subject. The first two sectionsare devoted to a step-by-step introduction to the mathematical objects used by Spectral Clustering :similarity graphs in Section 2, and graph Laplacians in Section 3. The Spectral Clustering algorithmsthemselves will be presented in Section 4. The next three sections are then devoted to explainingwhy those algorithms work. Each section corresponds to one explanation: Section 5 describes a graphpartitioning approach, Section 6 a random walk perspective, and Section 7 a perturbation theoryapproach.

4 In Section 8 we will study some practical issues related to Spectral Clustering , and discussvarious extensions and literature related to Spectral Clustering in Section [ ] 1 Nov 20072 Similarity graphsGiven a set of data pointsx1,..xnand some notion of similaritysij 0 between all pairs of datapointsxiandxj, the intuitive goal of Clustering is to divide the data points into several groups suchthat points in the same group are similar and points in different groups are dissimilar to each other. Ifwe do not have more information than similarities between data points, a nice way of representing thedata is in form of thesimilarity graphG= (V,E). Each vertexviin this graph represents a data pointxi. Two vertices are connected if the similaritysijbetween the corresponding data pointsxiandxjispositive or larger than a certain threshold, and the edge is weighted bysij.

5 The problem of clusteringcan now be reformulated using the similarity graph: we want to find a partition of the graph suchthat the edges between different groups have very low weights (which means that points in differentclusters are dissimilar from each other) and the edges within a group have high weights (which meansthat points within the same cluster are similar to each other). To be able to formalize this intuition wefirst want to introduce some basic graph notation and briefly discuss the kind of graphs we are goingto Graph notationLetG= (V,E) be an undirected graph with vertex setV={v1,..,vn}. In the following we assumethat the graphGis weighted, that is each edge between two verticesviandvjcarries a non-negativeweightwij 0. The weightedadjacency matrixof the graph is the matrixW= (wij)i,j=1.

6 ,n. Ifwij= 0 this means that the verticesviandvjare not connected by an edge. AsGis undirected werequirewij=wji. The degree of a vertexvi Vis defined asdi=n j= that, in fact, this sum only runs over all vertices adjacent tovi, as for all other verticesvjtheweightwijis 0. Thedegree matrixDis defined as the diagonal matrix with the degreesd1,..,dnon the diagonal. Given a subset of verticesA V, we denote its complementV\AbyA. Wedefine the indicator vector1A= (f1,..,fn) Rnas the vector with entriesfi= 1 ifvi Aandfi= 0 otherwise. For convenience we introduce the shorthand notationi Afor the set of indices{i|vi A}, in particular when dealing with a sum like i Awij. For two not necessarily disjoint setsA,B Vwe defineW(A,B) := i A,j consider two different ways of measuring the size of a subsetA V:|A|:= the number of vertices inAvol(A) := i ,|A|measures the size ofAby its number of vertices, while vol(A) measures the size ofAby summing over the weights of all edges attached to vertices inA.

7 A subsetA Vof a graph isconnected if any two vertices inAcan be joined by a path such that all intermediate points also lieinA. A subsetAis called a connected component if it is connected and if there are no connectionsbetween vertices inAandA. The nonempty setsA1,..,Akform a partition of the graph ifAi Aj= andA1 .. Ak= Different similarity graphsThere are several popular constructions to transform a given setx1,..,xnof data points with pairwisesimilaritiessijor pairwise distancesdijinto a graph. When constructing similarity graphs the goal isto model the local neighborhood relationships between the data -neighborhood graph:Here we connect all points whose pairwise distances are smaller than .As the distances between all connected points are roughly of the same scale (at most ), weighting theedges would not incorporate more information about the data to the graph.

8 Hence, the -neighborhoodgraph is usually considered as an unweighted neighbor graphs:Here the goal is to connect vertexviwith vertexvjifvjis amongthek-nearest neighbors ofvi. However, this definition leads to a directed graph, as the neighborhoodrelationship is not symmetric. There are two ways of making this graph undirected. The first way isto simply ignore the directions of the edges, that is we connectviandvjwith an undirected edge ifviis among thek-nearest neighbors ofvjorifvjis among thek-nearest neighbors ofvi. The resultinggraph is what is usually calledthek-nearest neighbor graph. The second choice is to connect verticesviandvjif bothviis among thek-nearest neighbors ofvjandvjis among thek-nearest neighbors ofvi. The resulting graph is called themutualk-nearest neighbor graph. In both cases, after connectingthe appropriate vertices we weight the edges by the similarity of their fully connected graph:Here we simply connect all points with positive similarity with eachother, and we weight all edges bysij.

9 As the graph should represent the local neighborhood re-lationships, this construction is only useful if the similarity function itself models local neighbor-hoods. An example for such a similarity function is the Gaussian similarity functions(xi,xj) =exp( xi xj 2/(2 2)), where the parameter controls the width of the neighborhoods. This pa-rameter plays a similar role as the parameter in case of the -neighborhood graphs mentioned above are regularly used in Spectral Clustering . To our knowledge, theoreticalresults on the question how the choice of the similarity graph influences the Spectral Clustering resultdo not exist. For a discussion of the behavior of the different graphs we refer to Section Graph Laplacians and their basic propertiesThe main tools for Spectral Clustering are graph Laplacian matrices.

10 There exists a whole field ded-icated to the study of those matrices, called Spectral graph theory ( , see Chung, 1997). In thissection we want to define different graph Laplacians and point out their most important will carefully distinguish between different variants of graph Laplacians. Note that in the literaturethere is no unique convention which matrix exactly is called graph Laplacian . Usually, every authorjust calls his matrix the graph Laplacian. Hence, a lot of care is needed when reading literature ongraph the following we always assume thatGis an undirected, weighted graph with weight matrixW,wherewij=wji 0. When using eigenvectors of a matrix, we will not necessarily assume that theyare normalized. For example, the constant vector1and a multiplea1for somea6= 0 will be consideredas the same eigenvectors.


Related search queries