
Distant supervision for relation extraction without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
Stanford University / Stanford, CA

Models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision. For each pair of entities that appears in some Freebase relation, we find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier.





Our algorithm combines the advantages of supervised IE (combining 400,000 noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain). Our model is able to extract 10,000 instances of 102 relations at a precision of 67.6%. We also analyze feature performance, showing that syntactic parse features are particularly helpful for relations that are ambiguous or lexically distant in their expression.

1 Introduction

At least three learning paradigms have been applied to the task of extracting relational facts from text (for example, learning that a person is employed by a particular organization, or that a geographic entity is located in a particular region).

In supervised approaches, sentences in a corpus are first hand-labeled for the presence of entities and the relations between them. The NIST Automatic Content Extraction (ACE) RDC 2003 and 2004 corpora, for example, include over 1,000 documents in which pairs of entities have been labeled with 5 to 7 major relation types and 23 to 24 subrelations, totaling 16,771 relation instances. ACE systems then extract a wide variety of lexical, syntactic, and semantic features, and use supervised classifiers to label the relation mention holding between a given pair of entities in a test set sentence, optionally combining relation mentions (Zhou et al., 2005; Zhou et al., 2007; Surdeanu and Ciaramita, 2007).

Supervised relation extraction suffers from a number of problems, however. Labeled training data is expensive to produce and thus limited in quantity. Also, because the relations are labeled on a particular corpus, the resulting classifiers tend to be biased toward that text domain.

An alternative approach, purely unsupervised information extraction, extracts strings of words between entities in large amounts of text, and clusters and simplifies these word strings to produce relation-strings (Shinyama and Sekine, 2006; Banko et al., 2007). Unsupervised approaches can use very large amounts of data and extract very large numbers of relations, but the resulting relations may not be easy to map to relations needed for a particular knowledge base.

A third approach has been to use a very small number of seed instances or patterns to do bootstrap learning (Brin, 1998; Riloff and Jones, 1999; Agichtein and Gravano, 2000; Ravichandran and Hovy, 2002; Etzioni et al., 2005; Pennacchiotti and Pantel, 2006; Bunescu and Mooney, 2007; Rozenfeld and Feldman, 2008). These seeds are used with a large corpus to extract a new set of patterns, which are used to extract more instances, which are used to extract more patterns, in an iterative fashion. The resulting patterns often suffer from low precision and semantic drift.

We propose an alternative paradigm, distant supervision, that combines some of the advantages of each of these approaches. Distant supervision is an extension of the paradigm used by Snow et al. (2005) for exploiting WordNet to extract hypernym (is-a) relations between entities, and is similar to the use of weakly labeled data in bioinformatics (Craven and Kumlien, 1999; Morgan et al., 2004).

Relation name                             New instance
/location/location/contains               Paris, Montmartre
/location/location/contains               Ontario, Fort Erie
/music/artist/origin                      Mighty Wagon, Cincinnati
/people/deceased_person/place_of_death    Fyodor Kamensky, Clearwater
/people/person/nationality                Marianne Yvonne Heemskerk, Netherlands
/people/person/place_of_birth             Wavell Wayne Hinds, Kingston
/book/author/works_written                Upton Sinclair, Lanny Budd
/business/company/founders                WWE, Vince McMahon
/people/person/profession                 Thomas Mellon, judge

Table 1: Ten relation instances extracted by our system that did not appear in Freebase.

Our algorithm uses Freebase (Bollacker et al., 2008), a large semantic database, to provide distant supervision for relation extraction.

Freebase contains 116 million instances of 7,300 relations between 9 million entities. The intuition of distant supervision is that any sentence that contains a pair of entities that participate in a known Freebase relation is likely to express that relation in some way. Since there may be many sentences containing a given entity pair, we can extract very large numbers of (potentially noisy) features that are combined in a logistic regression classifier. Thus whereas the supervised training paradigm uses a small labeled corpus of only 17,000 relation instances as training data, our algorithm can use much larger amounts of data: more text, more relations, and more instances.
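This matching heuristic can be sketched in a few lines of Python. The sketch is illustrative only, not the authors' implementation: the function names, the single-token entity mentions, and the word-sequence feature are simplifying assumptions.

```python
# Distant supervision sketch: any sentence containing an entity pair that
# stands in a known Freebase relation is treated as a (noisy) positive
# training example for that relation. All names here are hypothetical.

def words_between(tokens, e1, e2):
    """Lexical feature: the word sequence between two entity mentions.
    Assumes each entity appears as a single token in the sentence."""
    i, j = tokens.index(e1), tokens.index(e2)
    lo, hi = min(i, j), max(i, j)
    return tuple(tokens[lo + 1:hi])

def generate_training_data(sentences, freebase_pairs):
    """sentences: list of (tokens, entity_mentions) pairs.
    freebase_pairs: dict mapping (entity1, entity2) -> relation name.
    Returns (relation, entity_pair, feature) training triples."""
    examples = []
    for tokens, entities in sentences:
        for i, e1 in enumerate(entities):
            for e2 in entities[i + 1:]:
                relation = freebase_pairs.get((e1, e2))
                if relation is not None:
                    feature = words_between(tokens, e1, e2)
                    examples.append((relation, (e1, e2), feature))
    return examples
```

For example, a sentence mentioning Paris and Montmartre would yield a noisy training example for /location/location/contains, with the intervening words as its lexical feature.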

We use 1.2 million Wikipedia articles and 1.8 million instances of 102 relations connecting 940,000 entities. In addition, combining vast numbers of features in a large classifier helps obviate problems with bad features. Because our algorithm is supervised by a database, rather than by labeled text, it does not suffer from the problems of overfitting and domain-dependence that plague supervised systems. Supervision by a database also means that, unlike in unsupervised approaches, the output of our classifier uses canonical names for relations. Our paradigm offers a natural way of integrating data from multiple sentences to decide if a relation holds between two entities.

Because our algorithm can use large amounts of unlabeled data, a pair of entities may occur multiple times in the test set. For each pair of entities, we aggregate the features from the many different sentences in which that pair appeared into a single feature vector, allowing us to provide our classifier with more information, resulting in more accurate labels. Table 1 shows examples of relation instances extracted by our system. We also use this system to investigate the value of syntactic versus lexical (word sequence) features in relation extraction. While syntactic features are known to improve the performance of supervised IE, at least using clean hand-labeled ACE data (Zhou et al., 2007; Zhou et al., 2005), we do not know whether syntactic features can improve the performance of unsupervised or distantly supervised IE. Most previous research in bootstrapping or unsupervised IE has used only simple lexical features, thereby avoiding the computational expense of parsing (Brin, 1998; Agichtein and Gravano, 2000; Etzioni et al., 2005), and the few systems that have used unsupervised IE have not compared the performance of these two types of feature.

2 Previous work

Except for the unsupervised algorithms discussed above, previous supervised or bootstrapping approaches to relation extraction have typically relied on relatively small datasets, or on only a small number of distinct relations.
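The aggregation step described above, pooling evidence from every sentence that mentions an entity pair into one feature vector, can be sketched as follows. This is a minimal illustration under simplifying assumptions (features reduced to hashable tokens, counts instead of the paper's binary features), not the authors' code.

```python
from collections import Counter, defaultdict

def aggregate_features(mentions):
    """mentions: iterable of (entity_pair, feature) tuples, one per
    sentence in which the pair occurs. Returns a single combined
    feature-count vector per entity pair, so the classifier sees the
    pooled evidence from all sentences rather than one sentence at a time."""
    vectors = defaultdict(Counter)
    for pair, feature in mentions:
        vectors[pair][feature] += 1
    return dict(vectors)
```

A pair like (WWE, Vince McMahon) might surface with "founded by" in one sentence and "chairman of" in another; after aggregation both features sit in the same vector, and the combined evidence supports the /business/company/founders label more strongly than either sentence alone.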

