metapath2vec: Scalable Representation Learning for ...


metapath2vec: Scalable Representation Learning for Heterogeneous Networks

Yuxiao Dong*, Microsoft Research, Redmond, WA 98052, yuxdong@microsoft.com
Nitesh V. Chawla, University of Notre Dame, Notre Dame, IN
Ananthram Swami, Army Research Laboratory, Adelphi, MD

ABSTRACT
We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of the conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to perform node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.

CCS CONCEPTS
• Information systems → Social networks; • Computing methodologies → Unsupervised learning; Learning latent representations; Knowledge representation and reasoning;

KEYWORDS
Network Embedding; Heterogeneous Representation Learning; Latent Representations; Feature Learning; Heterogeneous Information Networks

ACM Reference format:
Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In Proceedings of KDD '17, August 13-17, 2017, Halifax, NS, Canada, 10 pages.

*This work was done when Yuxiao was a student at the University of Notre Dame.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD '17, August 13-17, 2017, Halifax, NS, Canada
© 2017 ACM. 978-1-4503-4887-4/17/08

[Figure 1: 2D PCA projections of the 128D embeddings of 16 top CS conferences and corresponding high-profile authors. Panels: (a) DeepWalk / node2vec; (b) PTE; (c) metapath2vec; (d) metapath2vec++.]

1 INTRODUCTION
Neural network-based learning models can represent latent embeddings that capture the internal relations of rich, complex data across various modalities, such as image, audio, and language [15]. Social and information networks are similarly rich and complex data that encode the dynamics and types of human interactions, and are similarly amenable to representation learning using neural networks. In particular, by mapping the way that people choose friends and maintain connections as a social language, recent advances in natural language processing (NLP) [3] can be naturally applied to network representation learning, most notably the group of NLP models known as word2vec [17, 18]. A number of recent research publications have proposed word2vec-based network representation learning frameworks, such as DeepWalk [22], LINE [30], and node2vec [8]. Instead of handcrafted network feature design, these representation learning methods enable the automatic discovery of useful and meaningful (latent) features from the "raw networks."

However, this work has thus far focused on representation learning for homogeneous networks, i.e., networks with a single type of nodes and relationships.
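To make the homogeneous setting concrete, here is a minimal Python sketch of the word2vec-style idea behind frameworks such as DeepWalk: uniform random walks are treated as "sentences," and skip-gram (center, context) pairs are read off a sliding window. The toy graph, walk length, and window size are hypothetical, and this is an illustration under those assumptions, not the reference implementation of DeepWalk or node2vec (node2vec, for example, biases its walks with extra parameters that this sketch omits). Note that the walk never consults node types, so every node is treated identically.

```python
import random

def uniform_random_walk(adj, start, length, rng):
    """DeepWalk-style walk: each step moves to a uniformly chosen neighbor.

    Node and edge types are never consulted, so every node is treated alike.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def skipgram_pairs(walk, window):
    """Treat a walk as a sentence and emit (center, context) pairs within the window."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

if __name__ == "__main__":
    # Hypothetical toy graph as undirected adjacency lists.
    adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
    walk = uniform_random_walk(adj, "a", length=6, rng=random.Random(0))
    print(walk)
    print(skipgram_pairs(walk, window=2))
```

The resulting pairs are what a skip-gram model would be trained on; in a heterogeneous network, authors, papers, and venues would all be mixed into the same undifferentiated context.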

Yet a large number of social and information networks are heterogeneous in nature, involving a diversity of node types and/or relationships between nodes [25]. These heterogeneous networks present unique challenges that cannot be handled by representation learning models that are specifically designed for homogeneous networks. Take, for example, a heterogeneous academic network: How do we effectively preserve the concept of "word-context" among multiple types of nodes, e.g., authors, papers, venues, organizations, etc.? Can random walks, such as those used in DeepWalk and node2vec, be applied to networks of multiple types of nodes? Can we directly apply homogeneous network-oriented embedding architectures (e.g., skip-gram) to heterogeneous networks?

Table 1: Case study of similarity search in the heterogeneous DBIS data used in [26].
Method: PathSim [26] | DeepWalk / node2vec [8, 22] | LINE (1st+2nd) [30] | PTE [29] | metapath2vec | metapath2vec++
Input: meta-paths | heterogeneous random walk paths | heterogeneous edges | heterogeneous edges | probabilistic meta-paths | probabilistic meta-paths
Query (same two for every method): PKDD | C. Faloutsos
Top-five results by rank:
1: ICDM, J. Han, R. Pan, W. Aggarwal, KDD, C. Aggarwal, A. Aggarwal, KDD, R. Agrawal
2: SDM, R. Agrawal, M. Tong, S. Yu, ICDM, P. Yu, M. Pei, PAKDD, J. Han
3: PAKDD, J. Pei, R. Yang, A. Gunopulos, SDM, Y. Tao, P. Yu, ICDM, J. Pei
4: KDD, C. Aggarwal, G. Filho, M. Koudas, DMKD, N. Koudas, M. Cheng, DMKD, C. Aggarwal
5: DMKD, H. Jagadish, F. Chan, S. Vlachos, PAKDD, R. Rastogi, M. Ganti, SDM, P. Yu

By solving these challenges, the latent heterogeneous network embeddings can be further applied to various network mining tasks, such as node classification [13], clustering [27, 28], and similarity search [26, 35]. In contrast to conventional meta-path-based methods [25], the advantage of latent-space representation learning lies in its ability to model similarities between nodes without connected meta-paths. For example, if two authors have never published papers in the same venue (imagine one publishes 10 papers, all in NIPS, and the other has 10 publications, all in ICML), their "APCPA"-based PathSim similarity [26] would be zero. Network representation learning naturally overcomes this limitation.

We formalize the heterogeneous network representation learning problem, where the objective is to simultaneously learn the low-dimensional and latent embeddings for multiple types of nodes.
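The PathSim argument above can be checked with a small worked example. PathSim [26] scores two objects x and y under a symmetric meta-path as 2·paths(x, y) / (paths(x, x) + paths(y, y)). The sketch below assumes each paper appears in exactly one venue, so the number of APCPA path instances between two authors equals the dot product of their per-venue paper counts. The function names and author identifiers are hypothetical.

```python
def apcpa_commuting_matrix(author_venue_counts):
    """M[x][y] = number of A-P-C-P-A path instances between authors x and y.

    author_venue_counts[a][c] is the number of papers author a published in venue c.
    Assuming one venue per paper, the path count is the dot product of the two
    authors' per-venue count vectors.
    """
    authors = list(author_venue_counts)
    venues = sorted({c for counts in author_venue_counts.values() for c in counts})
    vec = {a: [author_venue_counts[a].get(c, 0) for c in venues] for a in authors}
    return {
        x: {y: sum(p * q for p, q in zip(vec[x], vec[y])) for y in authors}
        for x in authors
    }

def pathsim(m, x, y):
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0

if __name__ == "__main__":
    # The scenario from the text: 10 papers all in NIPS vs. 10 papers all in ICML.
    counts = {"author1": {"NIPS": 10}, "author2": {"ICML": 10}}
    m = apcpa_commuting_matrix(counts)
    print(pathsim(m, "author1", "author2"))  # no shared venue, so 0.0
```

Because the two authors share no venue, the cross count is 0 and PathSim is 0 no matter how related NIPS and ICML are. A hypothetical third author with 5 papers in each venue would score 2·50 / (100 + 50) ≈ 0.67 against the first author, which shows that the measure only sees connected meta-path instances.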

We present the metapath2vec framework and its extension, metapath2vec++. The goal of metapath2vec is to maximize the likelihood of preserving both the structures and semantics of a given heterogeneous network. In metapath2vec, we first propose meta-path-based [25] random walks in heterogeneous networks to generate heterogeneous neighborhoods with network semantics for various types of nodes. Second, we extend the skip-gram model [18] to facilitate the modeling of geographically and semantically close nodes. Finally, we develop a heterogeneous negative sampling-based method, referred to as metapath2vec++, that enables the accurate and efficient prediction of a node's heterogeneous neighborhood.

The proposed metapath2vec and metapath2vec++ models are different from conventional network embedding models, which focus on homogeneous networks [8, 22, 30]. Specifically, conventional models suffer from the identical treatment of different types of nodes and relations, leading to the production of indistinguishable representations for heterogeneous nodes, as is evident from our evaluation.
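A meta-path-based random walk can be sketched as follows. This section does not spell out the transition rule, so the sketch assumes a simple one: the walker follows a meta-path whose first and last types coincide (such as APVPA) and, at each step, moves uniformly at random to a neighbor whose type matches the next type on the path, stopping early if no such neighbor exists. The function name and the toy network are hypothetical.

```python
import random

def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Meta-path-guided random walk (sketch under the assumptions stated above).

    metapath is a type sequence such as ["A", "P", "V", "P", "A"] whose first and
    last types coincide, so the walk cycles through metapath[:-1] repeatedly.
    """
    if len(metapath) < 2 or metapath[0] != metapath[-1]:
        raise ValueError("expected a meta-path whose first and last types coincide")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    cycle = metapath[:-1]  # drop the repeated endpoint type
    walk = [start]
    while len(walk) < length:
        next_type = cycle[len(walk) % len(cycle)]
        # Only neighbors of the required next type are eligible.
        candidates = [u for u in adj[walk[-1]] if node_type[u] == next_type]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk

if __name__ == "__main__":
    # Hypothetical toy academic network: authors (A), papers (P), venues (V).
    node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
    adj = {
        "a1": ["p1"], "a2": ["p2"],
        "p1": ["a1", "v1"], "p2": ["a2", "v1"],
        "v1": ["p1", "p2"],
    }
    print(metapath_walk(adj, node_type, "a1", ["A", "P", "V", "P", "A"], 9, random.Random(0)))
```

Unlike a uniform walk, the sequence of node types here is fixed by the meta-path (A, P, V, P, A, P, ...), so the resulting "sentences" carry the semantics of the chosen relation.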

Further, the metapath2vec and metapath2vec++ models also differ from the Predictive Text Embedding (PTE) model [29] in several ways. First, PTE is a semi-supervised learning model that incorporates label information for text data. Second, the heterogeneity in PTE comes from the text network wherein a link connects two words, a word and its document, or a word and its label. Essentially, the raw input of PTE is words and its output is the embedding of each word, rather than of multiple types of objects.

We summarize the differences among these methods in Table 1, which lists their inputs to the learning algorithms, as well as the top-five similarity search results in the DBIS network for the same two queries used in [26] (see Section 4 for details). By modeling the heterogeneous neighborhood and further leveraging the heterogeneous negative sampling technique, metapath2vec++ achieves the best top-five similarity results for both types of queries. Figure 1 shows the 2D projections of the learned embeddings for 16 CS conferences and corresponding high-profile researchers in each field.
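The heterogeneous negative sampling idea attributed to metapath2vec++ above can be sketched as a loss for a single (node, context) pair. For illustration we assume that negatives for a context node of type t are drawn only from nodes of type t (uniformly here, for simplicity; word2vec-style trainers usually sample from a smoothed frequency distribution instead), and that separate input and output embedding tables are used, as in word2vec. All names are hypothetical, and this is not the authors' implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sample_typed_negatives(nodes_by_type, context_type, exclude, k, rng):
    """Draw k negatives uniformly from nodes sharing the context node's type."""
    pool = [u for u in nodes_by_type[context_type] if u != exclude]
    return [rng.choice(pool) for _ in range(k)]

def pair_loss(emb_in, emb_out, v, c, negatives):
    """Negative log-likelihood of one (v, c) pair with negative sampling:

    loss = -log sigmoid(X_c . X_v) - sum over negatives u of log sigmoid(-X_u . X_v)
    """
    loss = -math.log(sigmoid(dot(emb_out[c], emb_in[v])))
    for u in negatives:
        loss -= math.log(sigmoid(-dot(emb_out[u], emb_in[v])))
    return loss

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical nodes grouped by type; 2-dimensional embeddings for brevity.
    nodes_by_type = {"A": ["a1", "a2", "a3"], "V": ["v1", "v2", "v3"]}
    all_nodes = [n for ns in nodes_by_type.values() for n in ns]
    emb_in = {n: [rng.uniform(-0.1, 0.1) for _ in range(2)] for n in all_nodes}
    emb_out = {n: [rng.uniform(-0.1, 0.1) for _ in range(2)] for n in all_nodes}
    # Author a1 with venue context v1: negatives are venues only.
    negs = sample_typed_negatives(nodes_by_type, "V", exclude="v1", k=3, rng=rng)
    print(negs, pair_loss(emb_in, emb_out, "a1", "v1", negs))
```

Restricting the negative pool by type means a venue context is contrasted only against other venues, rather than against authors or papers that a homogeneous sampler would also draw.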

Remarkably, we find that metapath2vec++ is capable of automatically organizing these two types of nodes and implicitly learning the internal relationships between them, as suggested by the similar directions and distances of the arrows connecting each pair. For example, it learns J. Dean → OSDI and C. D. Manning → ACL. metapath2vec is also able to group each author-conference pair closely, such as R. E. Tarjan and FOCS. None of these properties is discoverable from conventional network embedding models.

To summarize, our work makes the following contributions:
(1) Formalizes the problem of heterogeneous network representation learning and identifies its unique challenges resulting from network heterogeneity.
(2) Develops effective and efficient network embedding frameworks, metapath2vec & metapath2vec++, for preserving both structural and semantic correlations of heterogeneous networks.
(3) Through extensive experiments, demonstrates the efficacy and scalability of the presented methods in various heterogeneous network mining tasks, such as node classification (achieving relative improvements of 35–319% over benchmarks) and node clustering (achieving relative gains of 13–16% over baselines).

(4) Demonstrates the automatic discovery of internal semantic relationships between different types of nodes in heterogeneous networks by metapath2vec & metapath2vec++, which are not discoverable by existing methods.

2 PROBLEM DEFINITION
We formalize the representation learning problem in heterogeneous networks, which was first briefly introduced in [21]. Specifically, we leverage the definition of heterogeneous networks in [25, 27] and present the learning problem with its inputs and outputs.

Definition (Heterogeneous Network). A heterogeneous network is defined as a graph G = (V, E, T) in which each node v and each link e are associated with their mapping functions φ(v): V → T_V and ψ(e): E → T_E, respectively, where T_V and T_E denote the sets of object and relation types and |T_V| + |T_E| > 2.

For example, one can represent the academic network in Figure 2(a) with authors (A), papers (P), venues (V), and organizations (O) as nodes, wherein edges indicate the coauthor (A–A), publish (A–P, P–V), and affiliation (O–A) relationships. Taking a heterogeneous network as input, we formalize the problem of heterogeneous network representation learning as follows.

Definition (Heterogeneous Network Representation Learning). Given a heterogeneous network G, the task is to learn the d-dimensional latent representations X ∈ ℝ^(|V|×d), d ≪ |V|, that are able to capture the structural and semantic relations among the nodes.
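The definitions above map onto a small data structure. This is a hypothetical sketch: φ and ψ are stored as dictionaries, relation types are plain strings, edges are keyed by (u, v) so only one relation per ordered node pair is kept, and the check |T_V| + |T_E| > 2 separates a heterogeneous network from a homogeneous one (a single node type plus a single relation type gives exactly 2). The initializer only illustrates the shape of X, one d-dimensional row per node; its uniform range is an arbitrary choice, not taken from the paper.

```python
import random
from dataclasses import dataclass, field

@dataclass
class HeteroNetwork:
    """G = (V, E, T) with type mappings phi: V -> T_V and psi: E -> T_E."""
    phi: dict = field(default_factory=dict)  # node -> node type
    psi: dict = field(default_factory=dict)  # (u, v) edge -> relation type

    def add_node(self, v, node_type):
        self.phi[v] = node_type

    def add_edge(self, u, v, relation):
        if u not in self.phi or v not in self.phi:
            raise KeyError("both endpoints must be added as typed nodes first")
        self.psi[(u, v)] = relation

    def node_types(self):
        return set(self.phi.values())

    def relation_types(self):
        return set(self.psi.values())

    def is_heterogeneous(self):
        # The definition requires |T_V| + |T_E| > 2.
        return len(self.node_types()) + len(self.relation_types()) > 2

def init_embeddings(g, d, rng):
    """Hypothetical initializer: one d-dimensional row per node, i.e. X in R^(|V| x d)."""
    return {v: [rng.uniform(-0.5 / d, 0.5 / d) for _ in range(d)] for v in g.phi}

if __name__ == "__main__":
    g = HeteroNetwork()
    for v, t in [("a1", "A"), ("a2", "A"), ("p1", "P"), ("v1", "V"), ("o1", "O")]:
        g.add_node(v, t)
    g.add_edge("a1", "a2", "coauthor")
    g.add_edge("a1", "p1", "publish")
    g.add_edge("p1", "v1", "publish")
    g.add_edge("o1", "a1", "affiliation")
    print(g.is_heterogeneous(), sorted(g.node_types()), sorted(g.relation_types()))
```

A coauthor-only network built with this class would report False, matching the intuition that it is homogeneous; adding papers and the publish relation makes it heterogeneous.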

