
Get To The Point: Summarization with Pointer-Generator Networks

Abigail See (Stanford University), Peter J. Liu (Google Brain), Christopher D. Manning (Stanford University)

Abstract: Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways. First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator.



Second, we use coverage to keep track of what has been summarized, which discourages repetition. We apply our model to the CNN / Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.

1 Introduction

Summarization is the task of condensing a piece of text to a shorter version that contains the main information from the original. There are two broad approaches to summarization: extractive and abstractive. Extractive methods assemble summaries exclusively from passages (usually whole sentences) taken directly from the source text, while abstractive methods may generate novel words and phrases not featured in the source text, as a human-written abstract usually does.

The extractive approach is easier, because copying large chunks of text from the source document ensures baseline levels of grammaticality and accuracy. On the other hand, sophisticated abilities that are crucial to high-quality summarization, such as paraphrasing, generalization, or the incorporation of real-world knowledge, are possible only in an abstractive framework (see Figure 5).

Original Text (truncated): lagos, nigeria (cnn) a day after winning nigeria's presidency, muhammadu buhari told cnn's christiane amanpour that he plans to aggressively fight corruption that has long plagued nigeria and go after the root of the nation's [...] he'll rapidly give attention to curbing violence in the northeast part of nigeria, where the terrorist group boko haram operates. by cooperating with neighboring nations chad, cameroon and niger, he said his administration is confident it will be able to thwart criminals and others contributing to nigeria's instability. for the first time in nigeria's history, the opposition defeated the ruling party in democratic elections. buhari defeated incumbent goodluck jonathan by about 2 million votes, according to nigeria's independent national electoral commission. the win comes after a long history of military rule, coups and botched attempts at democracy in africa's most populous [...]

Baseline Seq2Seq + Attention: UNK UNK says his administration is confident it will be able to destabilize nigeria's economy. [...] his administration is confident it will be able to thwart criminals and [...] says the country has long nigeria and nigeria's [...]

Pointer-Generator: muhammadu buhari says he plans to aggressively fight corruption in the northeast part of nigeria. he says he'll rapidly give attention to curbing violence in the northeast part of nigeria. he says his administration is confident it will be able to thwart criminals.

Pointer-Generator + Coverage: muhammadu buhari says he plans to aggressively fight corruption that has long plagued nigeria. he says his administration is confident it will be able to thwart criminals. the win comes after a long history of military rule, coups and botched attempts at democracy in africa's most populous [...]

Figure 1: Comparison of output of 3 abstractive summarization models on a news article. The baseline model makes factual errors, a nonsensical sentence and struggles with OOV words muhammadu buhari. The pointer-generator model is accurate but repeats itself. Coverage eliminates repetition. The final summary is composed from several fragments.

Due to the difficulty of abstractive summarization, the great majority of past work has been extractive (Kupiec et al., 1995; Paice, 1990; Saggion and Poibeau, 2013). However, the recent success of sequence-to-sequence models (Sutskever et al., 2014), in which recurrent neural networks (RNNs) both read and freely generate text, has made abstractive summarization viable (Chopra et al., 2016; Nallapati et al., 2016; Rush et al., 2015; Zeng et al., 2016).

[Figure 2 diagram: encoder hidden states over the source text "Germany emerge victorious in 2-0 win against Argentina on Saturday ...", decoder hidden states over the partial summary, and the attention distribution, context vector and vocabulary distribution used to produce the next word "beat".]

Figure 2: Baseline sequence-to-sequence model with attention. The model may attend to relevant words in the source text to generate novel words, e.g., to produce the novel word beat in the abstractive summary Germany beat Argentina 2-0, the model may attend to the words victorious and win in the source text.

Though these systems are promising, they exhibit undesirable behavior such as inaccurately reproducing factual details, an inability to deal with out-of-vocabulary (OOV) words, and repeating themselves (see Figure 1). In this paper we present an architecture that addresses these three issues in the context of multi-sentence summaries.

While most recent abstractive work has focused on headline generation tasks (reducing one or two sentences to a single headline), we believe that longer-text summarization is both more challenging (requiring higher levels of abstraction while avoiding repetition) and ultimately more useful. Therefore we apply our model to the recently-introduced CNN / Daily Mail dataset (Hermann et al., 2015; Nallapati et al., 2016), which contains news articles (39 sentences on average) paired with multi-sentence summaries, and show that we outperform the state-of-the-art abstractive system by at least 2 ROUGE points.

Our hybrid pointer-generator network facilitates copying words from the source text via pointing (Vinyals et al., 2015), which improves accuracy and handling of OOV words, while retaining the ability to generate new words.

The network, which can be viewed as a balance between extractive and abstractive approaches, is similar to Gu et al.'s (2016) CopyNet and Miao and Blunsom's (2016) Forced-Attention Sentence Compression, that were applied to short-text summarization. We propose a novel variant of the coverage vector (Tu et al., 2016) from Neural Machine Translation, which we use to track and control coverage of the source document. We show that coverage is remarkably effective for eliminating repetition.
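The pointer and coverage mechanisms are defined formally later in the paper; as a rough, non-authoritative sketch of the two ideas described above, the toy NumPy snippet below mixes a vocabulary distribution with an attention-based copy distribution over the source words, and accumulates a coverage vector as a running sum of attention. All names and numbers (source_tokens, vocab, vocab_dist, attn, p_gen) are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

# Toy source sentence (contains the out-of-vocabulary name "buhari")
# and a tiny output vocabulary. All values here are made up.
source_tokens = ["muhammadu", "buhari", "won", "the", "election"]
vocab = ["the", "won", "election", "says", "[UNK]"]

vocab_dist = np.array([0.2, 0.3, 0.3, 0.1, 0.1])   # generator's distribution over vocab
attn = np.array([0.05, 0.70, 0.10, 0.05, 0.10])    # attention over source tokens
p_gen = 0.4                                        # probability of generating vs. copying

# Mix the two distributions over an "extended vocabulary" that also
# contains the source words, so OOV words can be copied via pointing.
final_dist = {w: p_gen * p for w, p in zip(vocab, vocab_dist)}
for w, a in zip(source_tokens, attn):
    final_dist[w] = final_dist.get(w, 0.0) + (1.0 - p_gen) * a

print(max(final_dist, key=final_dist.get))  # -> "buhari", copied from the source

# Coverage idea: keep a running sum of attention distributions from
# previous decoder steps, so the model can be discouraged from
# attending to (and repeating) the same source words again.
coverage = np.zeros(len(source_tokens))
coverage += attn   # after one decoder step
```

The point of the mixture is that probability mass can flow to source-only words such as "buhari", so out-of-vocabulary names can be reproduced exactly, while common words can still be generated from the vocabulary.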

2 Our Models

In this section we describe (1) our baseline sequence-to-sequence model, (2) our pointer-generator model, and (3) our coverage mechanism that can be added to either of the first two models. The code for our models is available online.

2.1 Sequence-to-sequence attentional model

Our baseline model is similar to that of Nallapati et al. (2016), and is depicted in Figure 2. The tokens of the article w_i are fed one-by-one into the encoder (a single-layer bidirectional LSTM), producing a sequence of encoder hidden states h_i. On each step t, the decoder (a single-layer unidirectional LSTM) receives the word embedding of the previous word (while training, this is the previous word of the reference summary; at test time it is the previous word emitted by the decoder), and has decoder state s_t. The attention distribution a^t is calculated as in Bahdanau et al. (2015):

e^t_i = v^T tanh(W_h h_i + W_s s_t + b_attn)    (1)
a^t = softmax(e^t)    (2)

where v, W_h, W_s and b_attn are learnable parameters.
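As a concrete reading of equations (1) and (2), the following NumPy sketch computes the attention distribution a^t over the encoder states and the resulting context vector; the dimensions, random parameter values and variable names are illustrative assumptions rather than the paper's hyperparameters or released code.

```python
import numpy as np

rng = np.random.default_rng(0)
src_len, enc_dim, dec_dim, attn_dim = 6, 8, 8, 10   # toy sizes

h = rng.normal(size=(src_len, enc_dim))   # encoder hidden states h_i
s_t = rng.normal(size=dec_dim)            # decoder state s_t at step t

# Learnable parameters v, W_h, W_s, b_attn (randomly initialized here).
W_h = rng.normal(size=(attn_dim, enc_dim))
W_s = rng.normal(size=(attn_dim, dec_dim))
b_attn = np.zeros(attn_dim)
v = rng.normal(size=attn_dim)

# Equation (1): e^t_i = v^T tanh(W_h h_i + W_s s_t + b_attn)
e_t = np.array([v @ np.tanh(W_h @ h_i + W_s @ s_t + b_attn) for h_i in h])

# Equation (2): a^t = softmax(e^t)
a_t = np.exp(e_t - e_t.max())
a_t /= a_t.sum()

# The attention distribution weights the encoder states into a context vector.
context = a_t @ h
print(a_t.round(3), a_t.sum())   # a valid probability distribution over source tokens
```

Later sections of the paper combine this context vector with the decoder state to produce the vocabulary distribution at each step.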

