b, Renaud Lambiotte and Etienne Lefebvre …

[ ] 25 Jul 2008 Fast unfolding of communities in large networksVincent D. Blondel1;a, Jean-Loup Guillaume1,2;b, RenaudLambiotte1,3;cand Etienne Lefebvre11 Department of Mathematical Engineering, Universit e catholique de Louvain, 4avenue Georges Lemaitre, B-1348 Louvain-la-Neuve, Belgium2 LIP6, Universit e Pierre et Marie Curie, 4 place Jussieu, 75005 Paris, France3 Institute for Mathematical Sciences, Imperial College London, 53 Prince s Gate,South Kensington campus, SW72PG, propose a simple method to extract the community structure of largenetworks. Our method is a heuristic method that is based on modularity is shown to outperform all other known community detection method in terms ofcomputation time.

Moreover, the quality of the communitiesdetected is very good, asmeasured by the so-called modularity. This is shown first by identifying languagecommunities in a Belgian mobile phone network of millioncustomers and byanalyzing a web graph of 118 million nodes and more than one billion links. Theaccuracy of our algorithm is also verified on ad-hoc modular : Random graphs, networks; Critical phenomena of socio-economic systems;Socio-economic networksFast unfolding of communities in large networks21. IntroductionSocial, technological and information systems can often bedescribed in terms of complexnetworks that have a topology of interconnected nodes combining organization andrandomness [1, 2].

The typical size of large networks such associal network services,mobile phone networks or the web now counts in millions when not billions of nodesand these scales demand new methods to retrieve comprehensive information from theirstructure. A promising approach consists in decomposing the networks into sub-unitsor communities, which are sets of highly inter-connected nodes [3]. The identification ofthese communities is of crucial importance as they may help to uncover a-priori unknownfunctional modules such as topics in information networks or cyber-communities in socialnetworks. Moreover, the resulting meta-network, whose nodes are the communities, maythen be used to visualize the original network problem of community detection requires the partition of a network intocommunities of densely connected nodes, with the nodes belonging to differentcommunities being only sparsely connected.

Precise formulations of this optimizationproblem are known to be computationally intractable. Several algorithms have thereforebeen proposed to find reasonably good partitions in a reasonably fast way. This searchfor fast algorithms has attracted much interest in recent years due to the increasingavailability of large network data sets and the impact of networks on every day life. Onecan distinguish several types of community detection algorithms: divisive algorithmsdetect inter-community links and remove them from the network [4, 5, 6], agglomerativealgorithms merge similar nodes/communities recursively [7] and optimization methodsare based on the maximisation of an objective function [8, 9,10].

The quality of thepartitions resulting from these methods is often measured by the so-called modularityof the partition. The modularity of a partition is a scalar value between -1 and 1that measures the density of links inside communities as compared to links betweencommunities [4, 11]. In the case of weighted networks (weighted networks are networksthat have weights on their links, such as the number of communications between twomobile phone users), it is defined as [12]Q=12mXi,j Aij kikj2m (ci, cj),(1)whereAijrepresents the weight of the edge betweeniandj,ki=PjAijis the sum ofthe weights of the edges attached to vertexi,ciis the community to which vertexiisassigned, the -function (u, v) is 1 ifu=vand 0 otherwise andm= has been used to compare the quality of the partitions obtained bydifferent methods, but also as an objective function to optimize [13].

Unfortunately,exact modularity optimization is a problem that is computationally hard [14] and soapproximation algorithms are necessary when dealing with large networks. The fastestapproximation algorithm for optimizing modularity on large networks was proposedby Clauset et al. [8]. That method consists in recurrently merging communitiesthat optimize the production of modularity. Unfortunately, this greedy algorithm mayFast unfolding of communities in large networks3 Figure of the steps of our algorithm. Each pass is made of two phases:one where modularity is optimized by allowing only local changes of communities;one where the found communities are aggregated in order to build a new network ofcommunities.

The passes are repeated iteratively until no increase of modularity values of modularity that are significantly lower than what can be found byusing, for instance, simulated annealing [15]. Moreover, the method proposed in [8] has atendency to produce super-communities that contain a largefraction of the nodes, evenon synthetic networks that have no significant community structure. This artefact alsohas the disadvantage to slow down the algorithm considerably and makes it inapplicableto networks of more than a million nodes. This undesired effect has been circumventedby introducing tricks in order to balance the size of the communities being merged,thereby speeding up the running time and making it possible to deal with networks thathave a few million nodes [16].

The largest networks that have been dealt with so far in the literature are a protein-protein interaction network of 30739 nodes [17], a network of about 400000 items on saleon the website of a large on-line retailer [8], and a Japanesesocial networking systems ofabout million users [16]. These sizes still leave considerable room for improvement[18] considering that, as of today, the social networking service Facebook has about64 million active users, the mobile network operator Vodaphone has about 200 millioncustomers and Google indexes several billion web-pages. Let us also notice that in mostlarge networks such as those listed above there are several natural organization levels communities divide themselves into sub-communities andit is thus desirable to obtaincommunity detection methods that reveal this hierarchicalstructure [19].

Fast unfolding of communities in large networks42. MethodWe now introduce our algorithm that finds high modularity partitions of large networksin short time and that unfolds a complete hierarchical community structure for thenetwork, thereby giving access to different resolutions of community detection. Contraryto all the other community detection algorithms, the network size limits that we arefacing with our algorithm are due to limited storage capacity rather than limitedcomputation time: identifying communities in a 118 millionnodes network took only152 minutes [20].Our algorithm is divided in two phases that are repeated iteratively.

Assume thatwe start with a weighted network ofNnodes. First, we assign a different community toeach node of the network. So, in this initial partition thereare as many communities asthere are nodes. Then, for each nodeiwe consider the neighboursjofiand we evaluatethe gain of modularity that would take place by removingifrom its community and byplacing it in the community ofj. The nodeiis then placed in the community for whichthis gain is maximum (in case of a tie we use a breaking rule), but only if this gain ispositive. If no positive gain is possible,istays in its original community. This processis applied repeatedly and sequentially for all nodes until no further improvement can beachieved and the first phase is then complete.

b, Renaud Lambiotte and Etienne Lefebvre …

Tags:

Information

Transcription of b, Renaud Lambiotte and Etienne Lefebvre …

Related search queries

b, Renaud Lambiotte and Etienne Lefebvre …

Tags:

Information

Documents from same domain

Related documents

Related search queries