Distant supervision for relation extraction without labeled data

Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
Stanford University / Stanford, CA

Abstract

Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision. For each pair of entities that appears in some Freebase relation, we find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier.
Our algorithm combines the advantages of supervised IE (combining 400,000 noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain). Our model is able to extract 10,000 instances of 102 relations at a precision of 67.6%. We also analyze feature performance, showing that syntactic parse features are particularly helpful for relations that are ambiguous or lexically distant in their expression.

1 Introduction

At least three learning paradigms have been applied to the task of extracting relational facts from text (for example, learning that a person is employed by a particular organization, or that a geographic entity is located in a particular region).
In supervised approaches, sentences in a corpus are first hand-labeled for the presence of entities and the relations between them. The NIST Automatic Content Extraction (ACE) RDC 2003 and 2004 corpora, for example, include over 1,000 documents in which pairs of entities have been labeled with 5 to 7 major relation types and 23 to 24 subrelations, totaling 16,771 relation instances. ACE systems then extract a wide variety of lexical, syntactic, and semantic features, and use supervised classifiers to label the relation mention holding between a given pair of entities in a test set sentence, optionally combining relation mentions (Zhou et al., 2005; Zhou et al., 2007; Surdeanu and Ciaramita, 2007).

Supervised relation extraction suffers from a number of problems, however. Labeled training data is expensive to produce and thus limited in quantity. Also, because the relations are labeled on a particular corpus, the resulting classifiers tend to be biased toward that text domain.

An alternative approach, purely unsupervised information extraction, extracts strings of words between entities in large amounts of text, and clusters and simplifies these word strings to produce relation-strings (Shinyama and Sekine, 2006; Banko et al., 2007). Unsupervised approaches can use very large amounts of data and extract very large numbers of relations, but the resulting relations may not be easy to map to relations needed for a particular knowledge base.

A third approach has been to use a very small number of seed instances or patterns to do bootstrap learning (Brin, 1998; Riloff and Jones, 1999; Agichtein and Gravano, 2000; Ravichandran and Hovy, 2002; Etzioni et al.
, 2005; Pennacchiotti and Pantel, 2006; Bunescu and Mooney, 2007; Rozenfeld and Feldman, 2008). These seeds are used with a large corpus to extract a new set of patterns, which are used to extract more instances, which are used to extract more patterns, in an iterative fashion. The resulting patterns often suffer from low precision and semantic drift.

We propose an alternative paradigm, distant supervision, that combines some of the advantages of each of these approaches. Distant supervision is an extension of the paradigm used by Snow et al. (2005) for exploiting WordNet to extract hypernym (is-a) relations between entities, and is similar to the use of weakly labeled data in bioinformatics (Craven and Kumlien, 1999; Morgan et al.
, 2004).

relation name                            new instance
/location/location/contains              Paris, Montmartre
/location/location/contains              Ontario, Fort Erie
/music/artist/origin                     Mighty Wagon, Cincinnati
/people/deceased_person/place_of_death   Fyodor Kamensky, Clearwater
/people/person/nationality               Marianne Yvonne Heemskerk, Netherlands
/people/person/place_of_birth            Wavell Wayne Hinds, Kingston
/book/author/works_written               Upton Sinclair, Lanny Budd
/business/company/founders               WWE, Vince McMahon
/people/person/profession                Thomas Mellon, judge

Table 1: Ten relation instances extracted by our system that did not appear in Freebase.

Our algorithm uses Freebase (Bollacker et al., 2008), a large semantic database, to provide distant supervision for relation extraction.
Freebase contains 116 million instances of 7,300 relations between 9 million entities. The intuition of distant supervision is that any sentence that contains a pair of entities that participate in a known Freebase relation is likely to express that relation in some way. Since there may be many sentences containing a given entity pair, we can extract very large numbers of (potentially noisy) features that are combined in a logistic regression classifier. Thus whereas the supervised training paradigm uses a small labeled corpus of only 17,000 relation instances as training data, our algorithm can use much larger amounts of data: more text, more relations, and more instances.
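The distant supervision heuristic can be sketched in a few lines: every sentence that mentions both entities of a known relation pair becomes a (noisy) training example for that relation. The tiny knowledge base, corpus, and feature extractor below are illustrative stand-ins for Freebase, Wikipedia text, and the paper's actual feature set, not the authors' implementation.

```python
# Sketch of the distant-supervision labeling heuristic: any sentence
# containing an entity pair from a known relation is treated as a
# noisy training example for that relation.

# Illustrative stand-in for a Freebase-style relation table.
FREEBASE = {
    ("Steven Spielberg", "Saving Private Ryan"): "/film/director/film",
    ("Edwin Hubble", "Marshfield"): "/people/person/place_of_birth",
}

# Illustrative stand-in for a large unlabeled corpus.
CORPUS = [
    "Steven Spielberg's film Saving Private Ryan opened in 1998 .",
    "Edwin Hubble was born in Marshfield , Missouri .",
    "Saving Private Ryan was released in 1998 .",
]

def lexical_features(sentence, e1, e2):
    """Extract a toy lexical feature: the word string between the entities."""
    between = sentence.split(e1)[-1].split(e2)[0].strip()
    return ["BETWEEN:" + between]

def build_training_set(corpus, kb):
    """Pair every sentence with every known relation whose two entities
    it contains, yielding (features, relation-label) training examples."""
    examples = []
    for sentence in corpus:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                examples.append((lexical_features(sentence, e1, e2), relation))
    return examples

train = build_training_set(CORPUS, FREEBASE)
# One example per sentence that matched a known entity pair; the third
# sentence mentions only one entity of a pair, so it yields nothing.
```

At scale, millions of such noisy examples are pooled, and the classifier is expected to let spurious matches wash out statistically rather than filtering them per sentence.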
We use 1.2 million Wikipedia articles and 1.8 million instances of 102 relations connecting 940,000 entities. In addition, combining vast numbers of features in a large classifier helps obviate problems with bad features. Because our algorithm is supervised by a database, rather than by labeled text, it does not suffer from the problems of overfitting and domain-dependence that plague supervised systems. Supervision by a database also means that, unlike in unsupervised approaches, the output of our classifier uses canonical names for relations. Our paradigm offers a natural way of integrating data from multiple sentences to decide if a relation holds between two entities.
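One way to picture this multi-sentence integration: all features observed for an entity pair, across every sentence mentioning it, are merged into a single vector before classification. A minimal sketch, in which the feature strings are made up for illustration:

```python
from collections import Counter

def aggregate_features(mentions):
    """Merge the feature lists from every sentence mentioning one entity
    pair into a single count vector; the classifier sees only this
    aggregate, not the individual sentences."""
    combined = Counter()
    for feature_list in mentions:
        combined.update(feature_list)
    return combined

# Hypothetical features for one entity pair seen in three sentences.
mentions = [
    ["BETWEEN:was born in", "NER:PER-LOC"],
    ["BETWEEN:, a native of", "NER:PER-LOC"],
    ["BETWEEN:was born in", "NER:PER-LOC"],
]

vec = aggregate_features(mentions)
# vec counts each feature across all three sentences, so evidence
# from weakly informative sentences accumulates.
```

A pair mentioned in many sentences thus contributes one richer example rather than many sparse ones, which is the source of the accuracy gain described above.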
Because our algorithm can use large amounts of unlabeled data, a pair of entities may occur multiple times in the test set. For each pair of entities, we aggregate the features from the many different sentences in which that pair appeared into a single feature vector, allowing us to provide our classifier with more information, resulting in more accurate labels.

Table 1 shows examples of relation instances extracted by our system. We also use this system to investigate the value of syntactic versus lexical (word sequence) features in relation extraction. While syntactic features are known to improve the performance of supervised IE, at least using clean hand-labeled ACE data (Zhou et al.
, 2007; Zhou et al., 2005), we do not know whether syntactic features can improve the performance of unsupervised or distantly supervised IE. Most previous research in bootstrapping or unsupervised IE has used only simple lexical features, thereby avoiding the computational expense of parsing (Brin, 1998; Agichtein and Gravano, 2000; Etzioni et al., 2005), and the few systems that have used unsupervised IE have not compared the performance of these two types of feature.

2 Previous work

Except for the unsupervised algorithms discussed above, previous supervised or bootstrapping approaches to relation extraction have typically relied on relatively small datasets, or on only a small number of distinct relations.