Knowledge Graph Refinement: A Survey of Approaches and ...

Semantic Web 0 (2016) 1–0, IOS Press

Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Editor(s): Philipp Cimiano, Universität Bielefeld, Germany
Solicited review(s): Natasha Noy, Google Inc., USA; Philipp Cimiano, Universität Bielefeld, Germany; two anonymous reviewers

Heiko Paulheim, Data and Web Science Group, University of Mannheim, B6 26, 68159 Mannheim, Germany
E-mail:

Abstract. In recent years, different Web knowledge graphs, both free and commercial, have been created. While Google coined the term Knowledge Graph in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent ones. Those graphs are often constructed from semi-structured knowledge, such as Wikipedia, or harvested from the web with a combination of statistical and linguistic methods.

The results are large-scale knowledge graphs that try to make a good trade-off between completeness and correctness. In order to further increase the utility of such knowledge graphs, various refinement methods have been proposed, which try to infer and add missing knowledge to the graph, or identify erroneous pieces of information. In this article, we provide a survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed and the evaluation methodologies used.

Keywords: Knowledge Graphs, Refinement, Completion, Correction, Error Detection, Evaluation

1. Introduction

Knowledge graphs on the Web are a backbone of many information systems that require access to structured knowledge, be it domain-specific or domain-independent. The idea of feeding intelligent systems and agents with general, formalized knowledge of the world dates back to classic Artificial Intelligence research in the 1980s [91].

More recently, with the advent of Linked Open Data [5] sources like DBpedia [56], and with Google's announcement of the Google Knowledge Graph in 2012, representations of general world knowledge as graphs have drawn a lot of attention.

There are various ways of building such knowledge graphs. They can be curated like Cyc [57], edited by the crowd like Freebase [9] and Wikidata [104], or extracted from large-scale, semi-structured web knowledge bases such as Wikipedia, like DBpedia [56] and YAGO [101]. Furthermore, information extraction methods for unstructured or semi-structured information have been proposed, which lead to knowledge graphs like NELL [14], PROSPERA [70], or Knowledge Vault [21].

Whichever approach is taken for constructing a knowledge graph, the result will never be perfect [10]. As a model of the real world or a part thereof, formalized knowledge cannot reasonably reach full coverage, i.e., contain information about each and every entity in the universe.

Furthermore, it is unlikely, in particular when heuristic methods are applied, that the knowledge graph is fully correct; there is usually a trade-off between coverage and correctness, which is addressed differently in each knowledge graph [111].

To address those shortcomings, various methods for knowledge graph refinement have been proposed.

© 2016 IOS Press and the authors. All rights reserved.

In many cases, those methods are developed by researchers outside the organizations or communities which create the knowledge graphs. They rather take an existing knowledge graph and try to increase its coverage and/or correctness by various means. Since such works are reviewed in this survey, the focus of this survey is not knowledge graph construction, but knowledge graph refinement.

For this survey, we view knowledge graph construction as a construction from scratch, i.e., using a set of operations on one or more sources to create a knowledge graph.

In contrast, knowledge graph refinement assumes that there is already a knowledge graph given which is improved, e.g., by adding missing knowledge or identifying and removing errors. Usually, those methods directly use the information given in a knowledge graph, e.g., as training information for automatic approaches. Thus, the methods for both construction and refinement may be similar, but not the same, since the latter work on a given graph, while the former do not.

It is important to note that for many knowledge graphs, one or more refinement steps are applied when creating and/or before publishing the graph. For example, logical reasoning is applied on some knowledge graphs for validating the consistency of statements in the graph, and removing the inconsistent statements. Such post-processing operations (i.e., operations applied after the initial construction of the graph) would be considered as refinement methods for this survey, and are included in the survey.

Decoupling knowledge base construction and refinement has different advantages.

First, it allows, at least in principle, for developing methods for refining arbitrary knowledge graphs, which can then be applied to improve multiple knowledge graphs (this claim is discussed critically later in the article). Compared with fine-tuning the heuristics that create a single knowledge graph, the impact of such generic refinement methods can thus be larger. Second, evaluating refinement methods in isolation from the knowledge graph construction step allows for a better understanding and a cleaner separation of effects, i.e., it facilitates more qualified statements about the effectiveness of a proposed approach.

The rest of this article is structured as follows. Section 2 gives a brief introduction to knowledge graphs in the Semantic Web. In sections 3 and 4, we present a categorization of approaches and evaluation methodologies. In sections 5 and 6, we present the review of methods for completion (i.e., increasing coverage) and error detection (i.e., increasing correctness) of knowledge graphs.

We conclude with a critical reflection of the findings in section 7, followed by a summary.

2. Knowledge Graphs in the Semantic Web

From the early days, the Semantic Web has promoted a graph-based representation of knowledge, e.g., by pushing the RDF standard. In such a graph-based knowledge representation, entities, which are the nodes of the graph, are connected by relations, which are the edges of the graph (e.g., Shakespeare has written Hamlet), and entities can have types, denoted by "is a" relations (e.g., Shakespeare is a writer, Hamlet is a play). In many cases, the sets of possible types and relations are organized in a schema or ontology, which defines their interrelations and restrictions of their usage.

With the advent of Linked Data [5], it was proposed to interlink different datasets in the Semantic Web.
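As a minimal sketch of the graph-based representation described in this section, statements can be held as subject-predicate-object triples, with types expressed through a dedicated "is a" predicate. The predicate names (`has_written`, `is_a`) and the helper functions below are illustrative assumptions, not identifiers from RDF or from any of the knowledge graphs discussed here.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All identifiers are hypothetical and chosen only for illustration.
TRIPLES = [
    ("Shakespeare", "has_written", "Hamlet"),  # relation: an edge between two entities
    ("Shakespeare", "is_a", "Writer"),         # type assertion
    ("Hamlet", "is_a", "Play"),                # type assertion
]


def types_of(entity, triples):
    """Return the set of types asserted for an entity via is_a edges."""
    return {o for s, p, o in triples if s == entity and p == "is_a"}


def relations_of(entity, triples):
    """Return the outgoing non-type edges of an entity, grouped by relation."""
    grouped = defaultdict(set)
    for s, p, o in triples:
        if s == entity and p != "is_a":
            grouped[p].add(o)
    return dict(grouped)


if __name__ == "__main__":
    print(types_of("Shakespeare", TRIPLES))
    print(relations_of("Shakespeare", TRIPLES))
```

A flat list of triples keeps the sketch close to the node-and-edge picture in the text; a real system would typically rely on an RDF store or library instead.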

By means of interlinking, the collection of interlinked datasets could be understood as one large, global knowledge graph (although very heterogeneous in nature). To date, roughly 1,000 datasets are interlinked in the Linked Open Data cloud, with the majority of links connecting identical entities in two datasets [95].

The term Knowledge Graph was coined by Google in 2012, referring to their use of semantic knowledge in Web Search ("Things, not strings"), and has recently also been used to refer to Semantic Web knowledge bases such as DBpedia or YAGO. From a broader perspective, any graph-based representation of some knowledge could be considered a knowledge graph (this would include any kind of RDF dataset, as well as description logic ontologies). However, there is no common definition of what a knowledge graph is and what it is not. Instead of attempting a formal definition of what a knowledge graph is, we restrict ourselves to a minimum set of characteristics of knowledge graphs, which we use to tell knowledge graphs from other collections of knowledge which we would not consider as knowledge graphs.

A knowledge graph

1. mainly describes real world entities and their interrelations, organized in a graph,
2. defines possible classes and relations of entities in a schema,
3. allows for potentially interrelating arbitrary entities with each other, and
4. covers various topical domains.

The first two criteria clearly define the focus of a knowledge graph to be the actual instances (A-box in description logic terminology), with the schema (T-box) playing only a minor role. Typically, this means that the number of instance-level statements is several orders of magnitude larger than that of schema-level statements (cf. Table 1). In contrast, the schema can remain rather shallow, at a small degree of formalization. In that sense, mere ontologies without any instances (such as DOLCE [27]) would not be considered as knowledge graphs.

Likewise, we do not consider WordNet [67] as a knowledge graph, since it is mainly concerned with common nouns and words and their relations (although a few proper nouns, i.e., instances, are also included).

The third criterion introduces the possibility to define arbitrary relations between instances, which are not restricted in their domain and/or range. This is a property which is hardly found in relational databases, which follow a strict schema.

Furthermore, knowledge graphs are supposed to cover at least a major portion of the domains that exist in the real world, and are not supposed to be restricted to only one domain (such as geographic entities). In that sense, large, but single-domain datasets, such as GeoNames, would not be considered a knowledge graph.

Knowledge graphs on the Semantic Web are typically provided using Linked Data [5] as a standard. They can be built using different methods: they can be curated by an organization or a small, closed group of people, crowd-sourced by a large, open group of individuals, or created with heuristic, automatic or semi-automatic means.
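The distinction between instance-level (A-box) and schema-level (T-box) statements drawn above can be made concrete with a small sketch that splits a set of triples by predicate and compares the two counts. The predicate names in `SCHEMA_PREDICATES` and the example data are assumptions made for illustration; real knowledge graphs use their own vocabularies, and the actual statement counts are those reported in the survey, not the toy numbers here.

```python
# Hypothetical predicates treated as schema-level (T-box) for this sketch.
SCHEMA_PREDICATES = {"subclass_of", "domain", "range"}

# Toy data: two schema-level statements and four instance-level statements.
EXAMPLE = [
    ("Writer", "subclass_of", "Person"),
    ("has_written", "range", "Work"),
    ("Shakespeare", "is_a", "Writer"),
    ("Hamlet", "is_a", "Play"),
    ("Shakespeare", "has_written", "Hamlet"),
    ("Shakespeare", "born_in", "Stratford-upon-Avon"),
]


def split_statements(triples):
    """Split triples into (instance_level, schema_level) lists by predicate."""
    schema = [t for t in triples if t[1] in SCHEMA_PREDICATES]
    instance = [t for t in triples if t[1] not in SCHEMA_PREDICATES]
    return instance, schema


def instance_to_schema_ratio(triples):
    """Return the ratio of instance-level to schema-level statements."""
    instance, schema = split_statements(triples)
    if not schema:
        # No schema statements at all: the ratio is unbounded.
        return float("inf")
    return len(instance) / len(schema)


if __name__ == "__main__":
    print(instance_to_schema_ratio(EXAMPLE))
```

Under the characterization given in the text, a knowledge graph would show a ratio of several orders of magnitude, whereas a pure ontology without instances would have no instance-level statements at all.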
