Example: bankruptcy

Santo Fortunato - arXiv

Community detection in graphsSanto Fortunato Complex Networks and Systems Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino, modern science of networks has brought significant advances to our understanding of complexsystems. One of the most relevant features of graphs representing real systems is communitystructure, or clustering, i. e. the organization of vertices in clusters, with many edges joiningvertices of the same cluster and comparatively few edges joining vertices of different clusters. Suchclusters, or communities, can be considered as fairly independent compartments of a graph , playinga similar role like, e. g., the tissues or the organs in the human body. Detecting communitiesis of great importance in sociology, biology and computer science, disciplines where systems areoften represented as graphs. This problem is very hard and not yet satisfactorily solved, despitethe huge effort of a large interdisciplinary community of scientists working on it over the past fewyears.

Community detection in graphs Santo Fortunato Complex Networks and Systems Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino,

Tags:

  Graph, Aston, Santo fortunato, Fortunato, Graphs santo fortunato

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Santo Fortunato - arXiv

1 Community detection in graphsSanto Fortunato Complex Networks and Systems Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino, modern science of networks has brought significant advances to our understanding of complexsystems. One of the most relevant features of graphs representing real systems is communitystructure, or clustering, i. e. the organization of vertices in clusters, with many edges joiningvertices of the same cluster and comparatively few edges joining vertices of different clusters. Suchclusters, or communities, can be considered as fairly independent compartments of a graph , playinga similar role like, e. g., the tissues or the organs in the human body. Detecting communitiesis of great importance in sociology, biology and computer science, disciplines where systems areoften represented as graphs. This problem is very hard and not yet satisfactorily solved, despitethe huge effort of a large interdisciplinary community of scientists working on it over the past fewyears.

2 We will attempt a thorough exposition of the topic, from the definition of the main elementsof the problem, to the presentation of most methods developed, with a special focus on techniquesdesigned by statistical physicists, from the discussion of crucial issues like the significance ofclustering and how methods should be tested and compared against each other, to the descriptionof applications to real Introduction2II. Communities in real-world networks4 III. Elements of Community Detection8A. Computational complexity9B. Communities91. Basics92. Local definitions103. Global definitions114. Definitions based on vertex similarity12C. Partitions131. Basics132. Quality functions: modularity14IV. Traditional methods16A. graph partitioning16B. Hierarchical clustering19C. Partitional clustering19D. Spectral clustering20V. Divisive algorithms23A. The algorithm of Girvan and Newman23B. Other methods25VI. Modularity-based methods27A.

3 Modularity optimization271. Greedy techniques272. Simulated annealing293. Extremal optimization294. Spectral optimization305. Other optimization strategies33B. Modifications of modularity34C. Limits of modularity38 VII. Spectral Algorithms41 VIII. Dynamic Algorithms43A. Spin models43 Electronic address: Random walk45C. Synchronization47IX. Methods based on statistical inference48A. Generative models49B. Blockmodeling, model selection and informationtheory52X. Alternative methods54XI. Methods to find overlapping communities58A. Clique percolation58B. Other techniques60 XII. Multiresolution methods and cluster hierarchy62A. Multiresolution methods63B. Hierarchical methods65 XIII. Detection of dynamic communities66 XIV. Significance of clustering70XV. Testing Algorithms73A. Benchmarks74B. Comparing partitions: measures77C. Comparing algorithms79 XVI. General properties of real clusters82 XVII. Applications on real-world networks85A.

4 Biological networks85B. Social networks86C. Other networks88 XVIII. Outlook90A. Elements of graph Theory921. Basic Definitions922. graph Matrices943. Model graphs94 References96 [ ] 25 Jan 20102I. INTRODUCTIONThe origin of graph theory dates back to Euler s solu-tion of the puzzle of K onigsberg s bridges in 1736 (Euler,1736). Since then a lot has been learned about graphsand their mathematical properties (Bollobas, 1998). Inthe 20th century they have also become extremely usefulas representation of a wide variety of systems in differentareas. Biological, social, technological, and informationnetworks can be studied as graphs, and graph analysishas become crucial to understand the features of thesesystems. For instance, social network analysis started inthe 1930 s and has become one of the most importanttopics in sociology (Scott, 2000; Wasserman and Faust,1994). In recent times, the computer revolution has pro-vided scholars with a huge amount of data and computa-tional resources to process and analyze these data.

5 Thesize of real networks one can potentially handle has alsogrown considerably, reaching millions or even billions ofvertices. The need to deal with such a large number ofunits has produced a deep change in the way graphs areapproached (Albert and Barab asi, 2002; Barratet al.,2008; Boccalettiet al., 2006; Mendes and Dorogovtsev,2003; Newman, 2003; Pastor-Satorras and Vespignani,2004).Graphs representing real systems are not regular like,e. g., lattices. They are objects where order coexists withdisorder. The paradigm of disordered graph is the ran-dom graph , introduced by P. Erd os and A. R enyi (Erd osand R enyi, 1959). In it, the probability of having anedge between a pair of vertices is equal for all possiblepairs (see Appendix). In a random graph , the distribu-tion of edges among the vertices is highly instance, the distribution of the number of neigh-bours of a vertex, ordegree, is binomial, so most ver-tices have equal or similar degree.

6 Real networks arenot random graphs, as they display big inhomogeneities,revealing a high level of order and organization. The de-gree distribution is broad, with a tail that often followsa power law: therefore, many vertices with low degreecoexist with some vertices with large degree. Further-more, the distribution of edges is not only globally, butalso locally inhomogeneous, with high concentrations ofedges within special groups of vertices, and low concen-trations between these groups. This feature of real net-works is calledcommunity structure(Girvan and New-man, 2002), orclustering, and is the topic of this review(for earlier reviews see Refs. (Danonet al., 2007; Fortu-nato and Castellano, 2009; Newman, 2004a; Porteret al.,2009; Schaeffer, 2007)). Communities, also calledclustersormodules, are groups of vertices which probably sharecommon properties and/or play similar roles within thegraph. In Fig. 1 a schematic example of a graph withcommunities is offers a wide variety of possible group organi-zations: families, working and friendship circles, villages,towns, nations.

7 The diffusion of Internet has also ledto the creation of virtual groups, that live on the Web, FIG. 1A simple graph with three communities, enclosedby the dashed circles. Reprinted figure with permission fromRef. ( Fortunato and Castellano, 2009).c 2009 by online communities. Indeed, social communities havebeen studied for a long time (Coleman, 1964; Freeman,2004; Kottak, 2004; Moody and White, 2003). Communi-ties also occur in many networked systems from biology,computer science, engineering, economics, politics, protein-protein interaction networks, communities arelikely to group proteins having the same specific functionwithin the cell (Chen and Yuan, 2006; Rives and Galitski,2003; Spirin and Mirny, 2003), in the graph of the WorldWide Web they may correspond to groups of pages deal-ing with the same or related topics (Dourisboureet al.,2007; Flakeet al., 2002), in metabolic networks they maybe related to functional modules such as cycles and path-ways (Guimer`a and Amaral, 2005; Pallaet al.)

8 , 2005),in food webs they may identify compartments (Krauseet al., 2003; Pimm, 1979), and so can have concrete applications. Cluster-ing Web clients who have similar interests and are ge-ografically near to each other may improve the perfor-mance of services provided on the World Wide Web, inthat each cluster of clients could be served by a dedi-cated mirror server (Krishnamurthy and Wang, 2000).Identifying clusters of customers with similar interestsin the network of purchase relationships between cus-tomers and products of online retailers (like, , ) enables to set up efficient recommen-dation systems (Reddyet al., 2002), that better guidecustomers through the list of items of the retailer andenhance the business opportunities. Clusters of largegraphs can be used to create data structures in orderto efficiently store the graph data and to handle naviga-3tional queries, like path searches (Agrawal and Jagadish,1994; Wuet al.

9 , 2004).Ad hoc networks(Perkins, 2001),i. e. self-configuring networks formed by communicationnodes acting in the same region and rapidly changing(because the devices move, for instance), usually haveno centrally maintained routing tables that specify hownodes have to communicate to other nodes. Grouping thenodes into clusters enables one to generate compact rout-ing tables while the choice of the communication pathsis still efficient (Steenstrup, 2001).Community detection is important for other reasons,too. Identifying modules and their boundaries allows fora classification of vertices, according to their structuralposition in the modules. So, vertices with a central posi-tion in their clusters, i. e. sharing a large number of edgeswith the other group partners, may have an importantfunction of control and stability within the group; ver-tices lying at the boundaries between modules play an im-portant role of mediation and lead the relationships andexchanges between different communities (alike to Cser-mely s creative elements (Csermely, 2008)).

10 Such clas-sification seems to be meaningful in social (Burt, 1976;Freeman, 1977; Granovetter, 1973) and metabolic net-works (Guimer`a and Amaral, 2005). Finally, one canstudy the graph where vertices are the communities andedges are set between clusters if there are connections be-tween some of their vertices in the original graph and/orif the modules overlap. In this way one attains a coarse-grained description of the original graph , which unveilsthe relationships between modules1. Recent studies indi-cate that networks of communities have a different degreedistribution with respect to the full graphs (Pallaet al.,2005); however, the origin of their structures can be ex-plained by the same mechanism (Pollneret al., 2006).Another important aspect related to community struc-ture is the hierarchical organization displayed by mostnetworked systems in the real world. Real networks areusually composed by communities including smaller com-munities, which in turn include smaller communities, human body offers a paradigmatic example of hier-archical organization: it is composed by organs, organsare composed by tissues, tissues by cells, etc.


Related search queries