Prompt-Learning for Fine-Grained Entity Typing

Ning Ding1, Yulin Chen3, Xu Han1,5, Guangwei Xu2, Pengjun Xie2, Hai-Tao Zheng3, Zhiyuan Liu1,5, Juanzi Li1, Hong-Gee Kim4

1 Department of Computer Science and Technology, Tsinghua University  2 Alibaba Group  3 SIGS, Tsinghua University  4 Seoul National University  5 State Key Lab on Intelligent Technology and Systems, Tsinghua University

Abstract

As an effective approach to tune pre-trained language models (PLMs) for specific tasks, prompt-learning has recently attracted much attention from researchers. By using cloze-style language prompts to stimulate the versatile knowledge of PLMs, prompt-learning can achieve promising results on a series of NLP tasks, such as natural language inference, sentiment classification, and knowledge probing. In this work, we investigate the application of prompt-learning to fine-grained entity typing in fully supervised, few-shot, and zero-shot scenarios. We first develop a simple and effective prompt-learning pipeline by constructing entity-oriented verbalizers and templates and conducting masked language modeling. Further, to tackle the zero-shot regime, we propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types. Extensive experiments on three fine-grained entity typing benchmarks (with up to 86 classes) under fully supervised, few-shot, and zero-shot settings show that prompt-learning methods significantly outperform fine-tuning baselines, especially when the training data is insufficient.

[Figure 1: Examples of prompt-learning to stimulate the knowledge of PLMs by formalizing specific tasks as equivalent cloze-style tasks. The panels show an MLM filling [MASK] in prompts for knowledge probing ("[CLS] iPhone is produced by [MASK]." -> "Apple"), sentiment classification (~2 classes; "[CLS] I like this. It was [MASK]. [SEP]" -> "great" for Positive), natural language inference (~3 classes, Entailment/Neutral/Contradiction; "[CLS] What happened to his lab? [MASK], his lab was torn down in 1904. [SEP]" -> "Yes" for Entailment), and entity typing (>46 classes, e.g., Person/Location/Organization; "[CLS] Bob Dylan, the author of 'Blowing in the Wind', won the Nobel Prize in Literature in 2016. Bob Dylan is [MASK]. [SEP]" -> label words such as PERSON, ARTIST, AUTHOR), with label words mapped to class sets.]

1 Introduction

In recent years, pre-trained language models (PLMs) have been widely explored and have become a key instrument for natural language understanding (Devlin et al., 2019; Liu et al., 2019) and generation (Radford et al., 2018; Raffel et al., 2020). By applying self-supervised learning on large-scale unlabeled corpora, PLMs can capture rich lexical (Jawahar et al., 2019), syntactic (Hewitt and Manning, 2019; Wang et al., 2021), and factual knowledge (Petroni et al., 2019) that well benefits downstream NLP tasks. Considering the versatile knowledge contained in PLMs, many research efforts have been devoted to stimulating task-specific knowledge in PLMs and adapting it to downstream NLP tasks. Fine-tuning with extra classifiers has been one typical solution for adapting PLMs to specific tasks, and it achieves promising results on various NLP tasks (Qiu et al., 2020; Han et al., 2021a).

Some recent efforts on probing the knowledge of PLMs show that, by writing natural language prompts, we can induce PLMs to complete factual knowledge (Petroni et al., 2019). GPT-3 further utilizes the information provided by prompts to conduct few-shot learning and achieves impressive results (Brown et al., 2020). Inspired by this, prompt-learning has been introduced. As shown in Figure 1, in prompt-learning, downstream tasks are formalized as equivalent cloze-style tasks, and PLMs are asked to handle these cloze-style tasks instead of the original ones. Compared with conventional fine-tuning methods, prompt-learning does not require extra neural layers and intuitively bridges the objective form gap between pre-training and fine-tuning. Sufficient empirical analysis shows that, whether with manually picked hand-crafted prompts (Liu et al., 2021b; Han et al., 2021b) or automatically built auto-generated prompts (Shin et al., 2020; Gao et al., 2020; Lester et al., 2021), tuning models with prompts is surprisingly effective for the knowledge stimulation and model adaptation of PLMs, especially in the low-data regime.

Intuitively, prompt-learning is applicable to fine-grained entity typing, which aims at classifying marked entities from input sequences into specific types in a pre-defined label set. We discuss this topic with a motivating example, "He is from New York". By adding a prompt with a masking token [MASK], the sentence becomes "He is from New York. In this sentence, New York is [MASK]." Due to the wealth of knowledge acquired during pre-training, PLMs can compute a probability distribution over the vocabulary at the masked position, assigning a relatively higher probability to the word "city" than to the word "person". In other words, with simple prompts, the abstract entity attributes contained in PLMs can be efficiently exploited, which is meaningful for downstream entity-related tasks.
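To make this concrete, the following is a minimal sketch of such zero-shot scoring with a masked language model via the Hugging Face transformers library; the template wording, the model checkpoint, and the candidate label words are our own illustrative choices, not the paper's pipeline:

import torch
from transformers import BertForMaskedLM, BertTokenizer

# Score candidate type words at the [MASK] position of an entity-oriented prompt.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

prompt = "He is from New York. In this sentence, New York is [MASK]."
candidates = ["city", "person", "organization"]  # hypothetical label words

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits          # [1, seq_len, vocab_size]
probs = logits[0, mask_pos].softmax(dim=-1)  # distribution over the vocabulary

for word in candidates:
    word_id = tokenizer.convert_tokens_to_ids(word)
    print(f"P([MASK] = {word}) = {probs[word_id].item():.4f}")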

In this work, we comprehensively explore the application of prompt-learning to fine-grained entity typing in fully supervised, few-shot, and zero-shot settings. Particularly, we first introduce a naive pipeline, where we construct entity-oriented prompts and formalize fine-grained entity typing as a cloze-style task. This simple pipeline yields promising results in our experiments, especially when supervision is insufficient. Then, to tackle the zero-shot scenario where no explicit supervision exists in training, we develop a self-supervised strategy under our prompt-learning pipeline. Our self-supervised strategy attempts to automatically summarize entity types by optimizing the similarity of the predicted probability distributions of paired examples in prompt-learning.
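The strategy itself is specified later in the paper; purely as an illustration of what a distribution-level objective over paired examples could look like, the sketch below uses a symmetric KL divergence between the predicted [MASK] distributions of a pair. Both the function and the choice of divergence are our assumptions, not the paper's formulation:

import torch.nn.functional as F

def paired_distribution_loss(logits_a, logits_b):
    # Symmetric KL divergence between the predicted [MASK] distributions of
    # two examples that should describe the same entity type.
    # logits_a, logits_b: [batch, vocab_size] logits at the [MASK] position.
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)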

Three popular benchmarks are used for our experiments: FEW-NERD (Ding et al., 2021b), OntoNotes (Weischedel et al., 2013), and BBN (Weischedel and Brunstein, 2005). All these datasets have a complex type hierarchy consisting of rich entity types, requiring models to have good capabilities of entity attribute detection. Empirically, our method yields significant improvements on these benchmark datasets, especially under the zero-shot and few-shot settings. We also present an analysis that points out both the superiority and the bottleneck of prompt-learning in fine-grained entity typing, which may advance further efforts to extract entity attributes using PLMs. Our source code and pre-trained models will be publicly available.

2 Background

In this section, we first give a problem definition of the entity typing task, followed by an introduction of conventional vanilla fine-tuning and prompt-based tuning with PLMs.

Problem Definition

The input of entity typing is a dataset D = {x1, ..., xn} with n sentences, where each sentence x contains a marked entity mention m. For each input sentence x, entity typing aims at predicting the entity type y ∈ Y of its marked mention m, where Y is a pre-defined set of entity types. Entity typing is typically regarded as a context-aware classification task. For example, in the sentence "London is the fifth album by the rock band Jesus Jones", the entity mention "London" should be classified as Music rather than Location. In the era of PLMs, using a pre-trained neural language model (e.g., BERT) as the encoder and performing model tuning for classifying types has become a standard paradigm.

Vanilla Fine-tuning

In the vanilla fine-tuning paradigm of entity typing, for each token ti in an input sequence x = {[CLS], t1, ..., m, ..., tT, [SEP]} with a marked entity mention m = {ti, ..., tj}, the PLM M produces a contextualized representation {h[CLS], h1, ..., hT, h[SEP]}. Empirically, we choose the embedding of the [CLS] token, h[CLS], as the final representation, which is fed into an output layer to predict the probability distribution over the label space:

    P(y ∈ Y | x) = softmax(W h[CLS] + b),    (1)

where W and b are learnable parameters. W, b, and all parameters of the PLM are tuned by maximizing the objective function (1/n) Σ_{i=1}^{n} log P(yi | xi), where yi is the golden type label of xi. For fine-grained entity typing, datasets usually use a hierarchical label space, such as PERSON/ARTIST (FEW-NERD) and ORGANIZATION/PARTY (OntoNotes).
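For reference, here is a minimal PyTorch sketch of this baseline, assuming a BERT encoder; the class name, argument names, and checkpoint are illustrative:

import torch.nn as nn
from transformers import BertModel

class ClsBaseline(nn.Module):
    # Vanilla fine-tuning head from Equation (1): a linear layer over h_[CLS].
    def __init__(self, num_types, plm_name="bert-base-cased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(plm_name)
        self.output = nn.Linear(self.encoder.config.hidden_size, num_types)  # W, b

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h_cls = hidden.last_hidden_state[:, 0]  # embedding of the [CLS] token
        return self.output(h_cls)               # logits; softmax gives P(y | x)

Maximizing the objective above then corresponds to minimizing the standard cross-entropy loss on these logits.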

[Figure 2: The illustration of prompt-learning for fine-grained entity typing with supervision, taking the hard-encoding prompt strategy as an example. The entity mention is copied into the prompt to form an input such as "[CLS] London is one of the biggest cities in the world. London is a [MASK]. [SEP]"; the MLM head predicts label words (e.g., "city"), which a mapping ties to class sets such as LOCATION/CITY.]

Prompt-based Tuning

In prompt-based tuning, for each label y ∈ Y, we
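As Figure 2 suggests, the key ingredient is a verbalizer that ties label words to class sets and reads type scores off the MLM's [MASK] distribution. Below is a minimal sketch of one such aggregation, with a hypothetical label-word mapping and a simple mean over label-word probabilities; the paper's actual verbalizer and aggregation may differ:

# Hypothetical verbalizer: each type maps to a set of label words,
# mirroring the label-words-to-class-sets mapping in Figure 2.
VERBALIZER = {
    "PERSON/ARTIST": ["artist", "singer", "musician"],
    "LOCATION/CITY": ["city", "town"],
}

def class_scores(mask_probs, tokenizer):
    # Aggregate the [MASK] vocabulary distribution into per-type scores by
    # averaging each type's label-word probabilities (one simple choice).
    # mask_probs: [vocab_size] probabilities at the [MASK] position, e.g.,
    # the `probs` tensor computed in the earlier zero-shot snippet.
    scores = {}
    for type_name, words in VERBALIZER.items():
        ids = tokenizer.convert_tokens_to_ids(words)
        scores[type_name] = mask_probs[ids].mean().item()
    return scores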

