
Exploring Pre-trained Language Models for Event Extraction and Generation

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5284–5294, Florence, Italy, July 28 – August 2, 2019.

Sen Yang, Dawei Feng, Linbo Qiao, Zhigang Kan, Dongsheng Li
National University of Defense Technology, Changsha, China

Abstract

Traditional approaches to the task of ACE event extraction usually depend on manually annotated data, which is often laborious to create and limited in size. Therefore, in addition to the difficulty of event extraction itself, insufficient training data hinders the learning process as well. To promote event extraction, we first propose an event extraction model that overcomes the roles overlap problem by separating the argument prediction in terms of roles.

Moreover, to address the problem of insufficient training data, we propose a method to automatically generate labeled data by editing prototypes, and we screen out the generated samples by ranking their quality. Experiments on the ACE 2005 dataset demonstrate that our extraction model surpasses most existing extraction methods, and that incorporating our generation method yields a further significant improvement, obtaining new state-of-the-art results on the event extraction task by raising the F1 scores of both trigger classification and argument classification.

1 Introduction

Event extraction is a key and challenging task for many NLP applications. It aims to detect event triggers and arguments.

Figure 1 illustrates a sentence containing an event of type Meet triggered by "meeting", with two arguments, "President Bush" and "several Arab leaders", both of which play the role Entity.

[Figure 1: An event of type Meet is highlighted in the sentence "President Bush is going to be meeting with several Arab leaders", including one trigger and two arguments, both labeled Entity.]
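
To keep the terminology straight, the sketch below renders this example as a small data structure (our own illustrative Python representation, not the paper's code): an event has a type, a trigger, and a list of (role, argument) pairs.

```python
# Illustrative ACE-style event representation (our own names, not the paper's).
from dataclasses import dataclass, field

@dataclass
class Event:
    event_type: str        # e.g. "Meet"
    trigger: str           # the word(s) evoking the event
    arguments: list = field(default_factory=list)  # (role, argument text) pairs

# The Figure 1 example: one trigger, two arguments that both play Entity.
meet = Event(
    event_type="Meet",
    trigger="meeting",
    arguments=[("Entity", "President Bush"),
               ("Entity", "several Arab leaders")],
)
```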

There are two interesting issues in event extraction that deserve more attention. On the one hand, roles in an event vary greatly in frequency (Figure 2), and multiple roles can overlap on some words, even sharing the same argument (the roles overlap problem). For example, in the sentence "The explosion killed the bomber and three shoppers", "killed" triggers an Attack event, and the argument "the bomber" plays the role Attacker as well as the role Victim at the same time. About 10% of the events in the ACE 2005 dataset (Doddington et al., 2004) exhibit the roles overlap problem. Despite this evidence, however, little attention has been paid to the problem; on the contrary, it is often simplified away in the evaluation settings of many approaches. In most previous work, if an argument plays multiple roles in an event simultaneously, the model is counted as correct as long as its prediction hits any one of them, which is far too loose a criterion to apply in the real world.
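
To make the difference concrete, here is a hypothetical sketch contrasting the lenient hit-any-role criterion with a stricter all-roles criterion; the function names and data are ours, for illustration only.

```python
# Gold annotation for "The explosion killed the bomber and three shoppers":
# the argument "the bomber" plays two roles in the Attack event.
gold = {"the bomber": {"Attacker", "Victim"}}

def lenient_correct(gold_roles, pred_roles):
    # Counts as correct if the prediction hits any one gold role.
    return bool(gold_roles & pred_roles)

def strict_correct(gold_roles, pred_roles):
    # Counts as correct only if every role the argument plays is predicted.
    return gold_roles == pred_roles

pred = {"the bomber": {"Victim"}}  # only one of the two roles recovered
print(lenient_correct(gold["the bomber"], pred["the bomber"]))  # True
print(strict_correct(gold["the bomber"], pred["the bomber"]))   # False
```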

Therefore, we design an effective mechanism to solve this problem and adopt more rigorous evaluation criteria.

On the other hand, most deep learning based methods for event extraction so far follow the supervised learning paradigm, which requires large amounts of labeled training data, yet annotating data accurately and at scale is very laborious. To relieve existing methods of the shortage of predefined event data, event generation approaches are often used to produce additional events for training (Yang et al., 2018; Zeng et al., 2018; Chen et al., 2017), and distant supervision (Mintz et al., 2009) is a commonly used technique for labeling an external corpus to this end.

But the quality and quantity of events generated with distant supervision depend heavily on the source data. In fact, an external corpus can also be exploited by pre-trained language models to generate sentences. We therefore turn to pre-trained language models, attempting to leverage the knowledge they learn from large-scale corpora for event generation.

[Figure 2: Frequency of roles that appear in events of type Injure in the ACE 2005 dataset.]

Overall, this paper proposes a framework based on pre-trained language models that includes an event extraction model, serving as our baseline, and a labeled event generation method. Our proposed event extraction model consists of a trigger extractor and an argument extractor, the latter of which refers to the result of the former for inference.
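
A schematic sketch of this two-stage design (our own pseudo-interface; the actual extractors are the BERT-based classifiers described in Section 3):

```python
# Stage 1 finds triggers and their event types; stage 2 predicts arguments
# conditioned on each detected trigger, mirroring the dependency above.
def extract_events(sentence, trigger_extractor, argument_extractor):
    events = []
    for trigger, event_type in trigger_extractor(sentence):
        arguments = argument_extractor(sentence, trigger, event_type)
        events.append({"type": event_type, "trigger": trigger,
                       "arguments": arguments})
    return events
```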

In addition, we improve the performance of the argument extractor by re-weighting the loss function based on the importance of the roles.

Pre-trained language models have also been applied to generating labeled data. Inspired by the work of Guu et al. (2018), we take existing samples as prototypes for event generation, which involves two key steps: argument replacement and adjunct token rewriting. By scoring the quality of the generated samples, we can pick out those of high quality; incorporating them with the existing data further improves the performance of our event extractor.
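
As a rough illustration of the prototype-editing idea (this toy code is ours; the paper's argument replacement and adjunct token rewriting are driven by pre-trained models and a quality scorer):

```python
import random

def replace_argument(prototype, role, candidates):
    """Swap the argument playing `role` for another argument seen in that role."""
    new_arg = random.choice(candidates[role])
    text = prototype["text"].replace(prototype["args"][role], new_arg)
    return {"text": text, "args": {**prototype["args"], role: new_arg}}

prototype = {
    "text": "President Bush is going to be meeting with several Arab leaders",
    "args": {"Entity": "President Bush"},
}
candidates = {"Entity": ["the prime minister", "the delegates"]}
edited = replace_argument(prototype, "Entity", candidates)
# A second step (not sketched here) would rewrite adjunct tokens such as
# "is going to be" while keeping the trigger and arguments fixed.
print(edited["text"])
```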

2 Related Work

Event Extraction. In terms of analysis granularity, there is document-level event extraction (Yang et al., 2018) and sentence-level event extraction (Zeng et al., 2018); in this paper we focus on statistical methods for the latter. These methods fall into two categories: the feature-based ones, which rely on hand-designed features for extraction (Liao and Grishman, 2010; Liu et al., 2010; Miwa et al., 2009; Liu et al., 2016; Hong et al., 2011; Li et al., 2013b), and the neural ones, which take advantage of neural networks to learn features automatically (Chen et al., 2015; Nguyen and Grishman, 2015; Feng et al., 2016).

Event Generation. External resources such as Freebase, FrameNet and WordNet are commonly employed to generate events and enrich the training data. Several previous event generation approaches (Chen et al., 2017; Zeng et al., 2018) rest on a strong assumption of distant supervision to label events in an unsupervised corpus.

In fact, however, co-occurring entities may not bear the expected relationship. In addition, Huang et al. (2016) incorporate abstract meaning representation and distributional semantics to extract events, while Liu et al. (2016, 2017) mine additional events from the frames in FrameNet.

Pre-trained Language Model. Pre-trained language models are capable of capturing the meaning of words dynamically, taking their context into consideration. McCann et al. (2017) exploit a language model pre-trained on a supervised translation corpus for the target task. ELMo (Embeddings from Language Models) (Peters et al., 2018) obtains context-sensitive embeddings by encoding characters with stacked bidirectional LSTMs (Long Short-Term Memory) and a residual structure (He et al., 2016).

Howard and Ruder (2018) obtain comparable results on text classification. GPT (Generative Pre-Training) (Radford et al., 2018) improves the state of the art on 9 of 12 tasks, and BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2018) breaks the records of 11 NLP tasks and has received wide attention.

3 Extraction Model

This section describes our approach to extracting the events that occur in plain text. We treat event extraction as a two-stage task, consisting of trigger extraction and argument extraction, and propose a Pre-trained Language Model based Event Extractor (PLMEE). Figure 3 illustrates the architecture of PLMEE: it consists of a trigger extractor and an argument extractor, both of which rely on the feature representation of BERT.

3.1 Trigger Extractor

The trigger extractor aims to predict whether a token triggers an event.
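
As a minimal sketch of what such a token-level trigger classifier can look like on top of BERT, using the HuggingFace transformers library (our illustration under an assumed label inventory; PLMEE's actual heads and evaluation differ in detail):

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Assumption for illustration: 33 ACE event subtypes plus one "not a trigger" label.
NUM_LABELS = 34

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained("bert-base-uncased",
                                                   num_labels=NUM_LABELS)

sentence = "President Bush is going to be meeting with several Arab leaders"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, seq_len, NUM_LABELS)

# A token is predicted to trigger an event of type t when t is its argmax label.
predictions = logits.argmax(dim=-1)
```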

