
ERNIE: Enhanced Language Representation with Informative Entities


Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1441-1451, Florence, Italy, July 28 - August 2, 2019.

Zhengyan Zhang 1,2,3, Xu Han 1,2,3, Zhiyuan Liu 1,2,3, Xin Jiang 4, Maosong Sun 1,2,3, Qun Liu 4
1 Department of Computer Science and Technology, Tsinghua University, Beijing, China
2 Institute for Artificial Intelligence, Tsinghua University, Beijing, China
3 State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing, China
4 Huawei Noah's Ark Lab

Abstract

Language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code and experiment details of this paper are publicly available.

1 Introduction

Pre-trained language representation models, including feature-based (Mikolov et al., 2013; Pennington et al., 2014; Peters et al., 2017, 2018) and fine-tuning (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2019) approaches, can capture rich language information from text and then benefit many NLP applications. BERT (Devlin et al., 2019), as one of the most recently proposed models, obtains the state-of-the-art results on various NLP applications by simple fine-tuning, including named entity recognition (Sang and De Meulder, 2003), question answering (Rajpurkar et al., 2016; Zellers et al., 2018), natural language inference (Bowman et al., 2015), and text classification (Wang et al., 2018).

[Figure 1: An example of incorporating extra knowledge information for language understanding. The example text is "Bob Dylan wrote Blowin' in the Wind in 1962, and wrote Chronicles: Volume One in 2004." The solid lines present the existing knowledge facts, the red dotted lines present the facts extracted from the sentence in red, and the green dot-dash lines present the facts extracted from the sentence in green.]

Although pre-trained language representation models have achieved promising results and worked as a routine component in many NLP tasks, they neglect to incorporate knowledge information for language understanding. As shown in Figure 1, without knowing that Blowin' in the Wind and Chronicles: Volume One are a song and a book respectively, it is difficult to recognize the two occupations of Bob Dylan, i.e., songwriter and writer, on the entity typing task. Furthermore, it is nearly impossible to extract the fine-grained relations, such as composer and author, on the relation classification task. For the existing pre-trained language representation models, these two sentences are syntactically ambiguous, like "UNK wrote UNK in UNK". Hence, considering rich knowledge information can lead to better language understanding and accordingly benefits various knowledge-driven applications, e.g., entity typing and relation classification.
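
To make the Figure 1 example concrete, the involved facts can be written as (head, relation, tail) triples. The sketch below is purely illustrative: the relation names and directions are our own and not taken from a specific KG schema, but it shows what the KG already provides and what the two tasks are expected to recover.

```python
# Facts assumed to be already in the KG (the types the text alone does not reveal).
# Entity and relation names here are illustrative placeholders.
existing_kg_facts = [
    ("Blowin' in the Wind", "is_a", "Song"),
    ("Chronicles: Volume One", "is_a", "Book"),
]

# Entity typing: recover the occupations of "Bob Dylan" from the sentences.
entity_typing_target = {"Bob Dylan": ["Songwriter", "Writer"]}

# Relation classification: recover the fine-grained relation for each entity pair.
relation_classification_targets = [
    ("Blowin' in the Wind", "composer", "Bob Dylan"),
    ("Chronicles: Volume One", "author", "Bob Dylan"),
]
```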

For incorporating external knowledge into language representation models, there are two main challenges. (1) Structured Knowledge Encoding: regarding the given text, how to effectively extract and encode its related informative facts in KGs for language representation models is an important problem. (2) Heterogeneous Information Fusion: the pre-training procedure for language representation is quite different from the knowledge representation procedure, leading to two individual vector spaces. How to design a special pre-training objective to fuse lexical, syntactic, and knowledge information is another challenge.

To overcome the challenges mentioned above, we propose Enhanced Language RepresentatioN with Informative Entities (ERNIE), which pre-trains a language representation model on both large-scale textual corpora and KGs:

(1) For extracting and encoding knowledge information, we first recognize named entity mentions in text and then align these mentions to their corresponding entities in KGs. Instead of directly using the graph-based facts in KGs, we encode the graph structure of KGs with knowledge embedding algorithms like TransE (Bordes et al., 2013), and then take the informative entity embeddings as input for ERNIE. Based on the alignments between text and KGs, ERNIE integrates entity representations in the knowledge module into the underlying layers of the semantic module.
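
To make the knowledge-encoding step concrete, the following is a minimal PyTorch sketch of a TransE-style scorer, in which a triple (h, r, t) is plausible when the embedding of h plus the embedding of r is close to the embedding of t. It is a didactic illustration of the kind of knowledge embedding algorithm mentioned above, not the training pipeline used in the paper; the class name, dimensions, and entity/relation counts are placeholder choices.

```python
import torch
import torch.nn as nn

class TransE(nn.Module):
    """Minimal TransE scorer: (h, r, t) is plausible when e_h + e_r is close to e_t."""

    def __init__(self, num_entities, num_relations, dim=100):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)

    def score(self, h, r, t):
        # Negative L2 distance: higher score means a more plausible triple.
        return -(self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

    def margin_loss(self, pos, neg, margin=1.0):
        # Margin-based ranking loss between true triples and corrupted ones.
        return torch.relu(margin - self.score(*pos) + self.score(*neg)).mean()

# Usage sketch with toy sizes: after training on KG triples, the rows of
# model.ent.weight serve as the informative entity embeddings fed to ERNIE.
model = TransE(num_entities=10_000, num_relations=200)
h, r, t = torch.tensor([0, 1]), torch.tensor([3, 4]), torch.tensor([2, 5])
print(model.score(h, r, t))
```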

(2) Similar to BERT, we adopt the masked language model and the next sentence prediction as the pre-training objectives. Besides, for the better fusion of textual and knowledge features, we design a new pre-training objective by randomly masking some of the named entity alignments in the input text and asking the model to select appropriate entities from KGs to complete the alignments. Unlike the existing pre-trained language representation models only utilizing local context to predict tokens, our objectives require models to aggregate both context and knowledge facts for predicting both tokens and entities, and lead to a knowledgeable language representation model.
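
A rough sketch of this entity-alignment objective is given below: some token-to-entity alignments are hidden at random, and the model is trained to pick the originally aligned entity from a set of candidate entity embeddings. The function name, masking rate, and the assumption that token states and entity embeddings already live in a shared vector space are our own simplifications, not the paper's exact formulation.

```python
import random
import torch
import torch.nn.functional as F

def masked_entity_alignment_loss(token_states, entity_emb, alignments, mask_prob=0.15):
    """Sketch of an entity-alignment prediction objective.

    token_states: (seq_len, dim) hidden states from the text encoder.
    entity_emb:   (num_candidates, dim) pre-trained entity embeddings,
                  assumed to be projected into the token space already.
    alignments:   list of (token_position, entity_index) pairs for entity mentions.
    """
    losses = []
    for pos, ent_idx in alignments:
        if random.random() < mask_prob:  # randomly mask this alignment
            # Score every candidate entity against the mention's hidden state and
            # train the model to recover the originally aligned entity.
            logits = token_states[pos] @ entity_emb.t()
            target = torch.tensor([ent_idx])
            losses.append(F.cross_entropy(logits.unsqueeze(0), target))
    return torch.stack(losses).mean() if losses else torch.zeros(())
```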

We conduct experiments on two knowledge-driven NLP tasks, i.e., entity typing and relation classification. The experimental results show that ERNIE significantly outperforms the state-of-the-art model BERT on these knowledge-driven tasks, by taking full advantage of lexical, syntactic, and knowledge information. We also evaluate ERNIE on other common NLP tasks, and ERNIE still achieves comparable results.

2 Related Work

Many efforts are devoted to pre-training language representation models for capturing language information from text and then utilizing the information for specific NLP tasks. These pre-training approaches can be divided into two classes, i.e., feature-based approaches and fine-tuning approaches.

The early work (Collobert and Weston, 2008; Mikolov et al., 2013; Pennington et al., 2014) focuses on adopting feature-based approaches to transform words into distributed representations. As these pre-trained word representations capture syntactic and semantic information in textual corpora, they are often used as input embeddings and initialization parameters for various NLP models, and offer significant improvements over random initialization (Turian et al., 2010). Since these word-level models often suffer from the word polysemy, Peters et al. (2018) further adopt the sequence-level model (ELMo) to capture complex word features across different linguistic contexts and use ELMo to generate context-aware word embeddings.

Different from the above-mentioned feature-based language approaches only using the pre-trained language representations as input features, Dai and Le (2015) train auto-encoders on unlabeled text, and then use the pre-trained model architecture and parameters as a starting point for other specific NLP models. Inspired by Dai and Le (2015), more pre-trained language representation models for fine-tuning have been proposed. Howard and Ruder (2018) present AWD-LSTM (Merity et al., 2018) to build a universal language model (ULMFiT). Radford et al. (2018) propose a generative pre-trained Transformer (Vaswani et al., 2017) (GPT) to learn language representations. Devlin et al. (2019) propose a deep bidirectional model with multiple-layer Transformers (BERT), which achieves the state-of-the-art results for various NLP tasks.

Although both feature-based and fine-tuning language representation models have achieved great success, they ignore the incorporation of knowledge information. As demonstrated in recent work, injecting extra knowledge information can significantly enhance original models, such as reading comprehension (Mihaylov and Frank, 2018; Zhong et al., 2018), machine translation (Zaremoodi et al., …)

