
CHAPTER 13: Constituency Parsing


Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright 2021. All rights reserved. Draft of December 29, 2021.

One morning I shot an elephant in my pajamas. How he got into my pajamas I don't know.
Groucho Marx, Animal Crackers, 1930

Syntactic parsing is the task of assigning a syntactic structure to a sentence. This chapter focuses on constituency structures, those assigned by context-free grammars of the kind described in Chapter 12. In the next chapter we'll introduce dependency parses, an alternative kind of parse structure. Parse trees can be used in applications such as grammar checking: a sentence that cannot be parsed may have grammatical errors (or at least be hard to read). Parse trees can be an intermediate stage of representation for semantic analysis (as we show in Chapter 16) and thus play a role in applications like question answering. For example, to answer the question

Which flights to Denver depart before the Seattle flight?

we'll need to know that the questioner wants a list of flights going to Denver, not flights going to Seattle, and the parse structure (knowing that to Denver modifies flights, and which flights to Denver is the subject of depart) can help us.

We begin by discussing ambiguity and the problems it presents, and then give the Cocke-Kasami-Younger (CKY) algorithm (Kasami 1965, Younger 1967), the standard dynamic programming approach to syntactic parsing. We've already seen other dynamic programming algorithms like minimum edit distance (Chapter 2) and Viterbi (Chapter 8). The vanilla CKY algorithm returns an efficient representation of the set of parse trees for a sentence, but doesn't tell us which parse tree is the right one. For that, we need to augment CKY with scores for each possible constituent. We'll see how to do this with neural span-based parsers. And we'll introduce other methods like supertagging for parsing CCG, partial parsing methods for use in situations in which a superficial syntactic analysis of an input may be sufficient, and the standard set of metrics for evaluating parser accuracy.

Ambiguity

Ambiguity is the most serious problem faced by syntactic parsers.

Chapter 8 introduced the notions of part-of-speech ambiguity and part-of-speech disambiguation. Here, we introduce a new kind of ambiguity, called structural ambiguity, illustrated with a new toy grammar L1, shown in Figure 13.1, which adds a few rules to the L0 grammar from the last chapter. Structural ambiguity occurs when the grammar can assign more than one parse to a sentence. Groucho Marx's well-known line as Captain Spaulding in Animal Crackers is ambiguous because the phrase in my pajamas can be part of the NP headed by elephant or a part of the verb phrase headed by shot. Figure 13.2 illustrates these two analyses of Marx's line using rules from L1.

Figure 13.1: The L1 miniature English grammar and lexicon.

  Grammar:
    S → NP VP
    S → Aux NP VP
    S → VP
    NP → Pronoun
    NP → Proper-Noun
    NP → Det Nominal
    Nominal → Noun
    Nominal → Nominal Noun
    Nominal → Nominal PP
    VP → Verb
    VP → Verb NP
    VP → Verb NP PP
    VP → Verb PP
    VP → VP PP
    PP → Preposition NP

  Lexicon:
    Det → that | this | the | a
    Noun → book | flight | meal | money
    Verb → book | include | prefer
    Pronoun → I | she | me
    Proper-Noun → Houston | NWA
    Aux → does
    Preposition → from | to | on | near | through

Figure 13.2: Two parse trees for an ambiguous sentence. The parse on the left corresponds to the humorous reading in which the elephant is in the pajamas; the parse on the right corresponds to the reading in which Captain Spaulding did the shooting in his pajamas. (The tree diagrams are not reproduced here.)
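To make later examples concrete, here is one possible encoding of the L1 rules from Figure 13.1 as plain Python data; the names GRAMMAR and LEXICON and the (left-hand side, right-hand side) rule shape are conventions invented for this sketch, not notation from the chapter.

```python
# The L1 grammar of Figure 13.1, encoded as (lhs, rhs) rule pairs.
GRAMMAR = [
    ("S", ["NP", "VP"]),
    ("S", ["Aux", "NP", "VP"]),
    ("S", ["VP"]),
    ("NP", ["Pronoun"]),
    ("NP", ["Proper-Noun"]),
    ("NP", ["Det", "Nominal"]),
    ("Nominal", ["Noun"]),
    ("Nominal", ["Nominal", "Noun"]),
    ("Nominal", ["Nominal", "PP"]),
    ("VP", ["Verb"]),
    ("VP", ["Verb", "NP"]),
    ("VP", ["Verb", "NP", "PP"]),
    ("VP", ["Verb", "PP"]),
    ("VP", ["VP", "PP"]),
    ("PP", ["Preposition", "NP"]),
]

# The L1 lexicon: each preterminal maps to the words it can produce.
LEXICON = {
    "Det": ["that", "this", "the", "a"],
    "Noun": ["book", "flight", "meal", "money"],
    "Verb": ["book", "include", "prefer"],
    "Pronoun": ["I", "she", "me"],
    "Proper-Noun": ["Houston", "NWA"],
    "Aux": ["does"],
    "Preposition": ["from", "to", "on", "near", "through"],
}
```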

Structural ambiguity, appropriately enough, comes in many forms. Two common kinds of ambiguity are attachment ambiguity and coordination ambiguity. A sentence has an attachment ambiguity if a particular constituent can be attached to the parse tree at more than one place. The Groucho Marx sentence is an example of PP-attachment ambiguity. Various kinds of adverbial phrases are also subject to this kind of ambiguity. For instance, in the following example the gerundive-VP flying to Paris can be part of a gerundive sentence whose subject is the Eiffel Tower or it can be an adjunct modifying the VP headed by saw:

(13.1) We saw the Eiffel Tower flying to Paris.

In coordination ambiguity phrases can be conjoined by a conjunction like and. For example, the phrase old men and women can be bracketed as [old [men and women]], referring to old men and old women, or as [old men] and [women], in which case it is only the men who are old.

These ambiguities combine in complex ways in real sentences, like the following news sentence from the Brown corpus:

(13.2) President Kennedy today pushed aside other White House business to devote all his time and attention to working on the Berlin crisis address he will deliver tomorrow night to the American people over nationwide television and radio.

This sentence has a number of ambiguities, although since they are semantically unreasonable, it requires a careful reading to see them. The last noun phrase could be parsed [nationwide [television and radio]] or [[nationwide television] and radio]. The direct object of pushed aside should be other White House business but could also be the bizarre phrase [other White House business to devote all his time and attention to working] (i.e., a structure like Kennedy affirmed [his intention to propose a new budget to address the deficit]). Then the phrase on the Berlin crisis address he will deliver tomorrow night to the American people could be an adjunct modifying the verb pushed.

A PP like over nationwide television and radio could be attached to any of the higher VPs or NPs (e.g., it could modify people or night).

The fact that there are many grammatically correct but semantically unreasonable parses for naturally occurring sentences is an irksome problem that affects all parsers. Fortunately, the CKY algorithm below is designed to efficiently handle structural ambiguities. And as we'll see in the following section, we can augment CKY with neural methods to choose a single correct parse by syntactic disambiguation.

CKY Parsing: A Dynamic Programming Approach

Dynamic programming provides a powerful framework for addressing the problems caused by ambiguity in grammars. Recall that a dynamic programming approach systematically fills in a table of solutions to sub-problems. The complete table has the solution to all the sub-problems needed to solve the problem as a whole. In the case of syntactic parsing, these sub-problems represent parse trees for all the constituents detected in the input.

The dynamic programming advantage arises from the context-free nature of our grammar rules: once a constituent has been discovered in a segment of the input we can record its presence and make it available for use in any subsequent derivation that might require it.
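To illustrate this table-filling idea concretely before the full algorithm is presented, here is a minimal CKY-style recognizer sketch for a grammar already in CNF; the function name cky_recognize and the shapes assumed for lexicon and binary_rules are choices made for this sketch, not the book's pseudocode.

```python
from collections import defaultdict

def cky_recognize(words, lexicon, binary_rules):
    """Minimal CKY-style recognizer sketch for a CNF grammar.

    lexicon: maps each word to the set of preterminals A with A -> word.
    binary_rules: maps a pair (B, C) to the set of parents A with A -> B C.
    Both shapes are assumptions made for this sketch.
    """
    n = len(words)
    table = defaultdict(set)              # table[(i, j)] = non-terminals spanning words[i:j]
    for i, w in enumerate(words):
        table[(i, i + 1)] |= lexicon.get(w, set())
    for width in range(2, n + 1):         # widen spans from two words up to the whole input
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):     # try every split point of the span
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        # reuse the recorded sub-constituents B and C
                        table[(i, j)] |= binary_rules.get((B, C), set())
    return "S" in table[(0, n)]
```

With a lexicon and binary-rule table built from the CNF grammar of Figure 13.3 below, a call like cky_recognize("book that flight".split(), lexicon, binary_rules) should return True.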

This provides both time and storage efficiencies since subtrees can be looked up in a table, not reanalyzed. This section presents the Cocke-Kasami-Younger (CKY) algorithm, the most widely used dynamic-programming based approach to parsing. Chart parsing (Kaplan 1973, Kay 1982) is a related approach, and dynamic programming methods are often referred to as chart parsing methods.

Conversion to Chomsky Normal Form

The CKY algorithm requires grammars to first be in Chomsky Normal Form (CNF). Recall from Chapter 12 that grammars in CNF are restricted to rules of the form A → B C or A → w. That is, the right-hand side of each rule must expand either to two non-terminals or to a single terminal. Restricting a grammar to CNF does not lead to any loss in expressiveness, since any context-free grammar can be converted into a corresponding CNF grammar that accepts exactly the same set of strings as the original grammar.

Let's start with the process of converting a generic CFG into one represented in CNF.
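The CNF restriction is simple enough to state directly in code; the helper below is only an illustration, with an invented name and an assumed set-of-non-terminals argument.

```python
def is_cnf_rule(lhs, rhs, nonterminals):
    """Return True if lhs -> rhs fits Chomsky Normal Form: either two
    non-terminals or exactly one terminal on the right-hand side."""
    if len(rhs) == 2:
        return all(sym in nonterminals for sym in rhs)   # A -> B C
    if len(rhs) == 1:
        return rhs[0] not in nonterminals                # A -> w
    return False
```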

Assuming we're dealing with an ε-free grammar, there are three situations we need to address in any generic grammar: rules that mix terminals with non-terminals on the right-hand side, rules that have a single non-terminal on the right-hand side, and rules in which the length of the right-hand side is greater than two.

The remedy for rules that mix terminals and non-terminals is to simply introduce a new dummy non-terminal that covers only the original terminal. For example, a rule for an infinitive verb phrase such as INF-VP → to VP would be replaced by the two rules INF-VP → TO VP and TO → to.

Rules with a single non-terminal on the right are called unit productions. We can eliminate unit productions by rewriting the right-hand side of the original rules with the right-hand side of all the non-unit production rules that they ultimately lead to. More formally, if A ⇒* B by a chain of one or more unit productions and B → γ is a non-unit production in our grammar, then we add A → γ for each such rule in the grammar and discard all the intervening unit productions.
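A possible implementation sketch of this unit-production step, under the same (lhs, rhs) rule encoding used above; the function name is invented for this example.

```python
def eliminate_unit_productions(rules, nonterminals):
    """Sketch of unit-production elimination (illustrative, not the book's
    pseudocode).  rules is a list of (lhs, rhs) pairs with rhs a list of
    symbols; nonterminals is the set of non-terminal symbols."""
    # Step 1: for every non-terminal A, find every B reachable from A
    # by a chain of zero or more unit productions.
    reach = {A: {A} for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if len(rhs) == 1 and rhs[0] in nonterminals:      # unit production A -> B
                for A in nonterminals:
                    if lhs in reach[A] and rhs[0] not in reach[A]:
                        reach[A].add(rhs[0])
                        changed = True
    # Step 2: copy every non-unit rule B -> gamma up to each A that
    # reaches B, and drop the unit productions themselves.
    new_rules = []
    for lhs, rhs in rules:
        if len(rhs) == 1 and rhs[0] in nonterminals:
            continue
        for A in nonterminals:
            if lhs in reach[A]:
                new_rules.append((A, rhs))
    return new_rules
```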

As we demonstrate with our toy grammar, this can lead to a substantial flattening of the grammar and a consequent promotion of terminals to fairly high levels in the resulting trees.

Rules with right-hand sides longer than 2 are normalized through the introduction of new non-terminals that spread the longer sequences over several new rules. Formally, if we have a rule like

  A → B C γ

we replace the leftmost pair of non-terminals with a new non-terminal and introduce a new production, resulting in the following new rules:

  A → X1 γ
  X1 → B C

In the case of longer right-hand sides, we simply iterate this process until the offending rule has been replaced by rules of length 2. The choice of replacing the leftmost pair of non-terminals is purely arbitrary; any systematic scheme that results in binary rules would suffice. In our current grammar, the rule S → Aux NP VP would be replaced by the two rules S → X1 VP and X1 → Aux NP.
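A matching sketch of the binarization step, again using the invented rule encoding; applied to S → Aux NP VP it yields X1 → Aux NP and S → X1 VP, as in the text.

```python
def binarize(rules):
    """Sketch of the binarization step (illustrative).  Any rule whose
    right-hand side is longer than two symbols is split by repeatedly
    replacing its leftmost pair with a fresh non-terminal X1, X2, ..."""
    new_rules, counter = [], 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"
            new_rules.append((fresh, rhs[:2]))    # X_k -> B C
            rhs = [fresh] + rhs[2:]               # A -> X_k gamma
        new_rules.append((lhs, rhs))
    return new_rules
```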

The entire conversion process can be summarized as follows:

1. Copy all conforming rules to the new grammar unchanged.
2. Convert terminals within rules to dummy non-terminals.
3. Convert unit productions.
4. Make all rules binary and add them to the new grammar.

Figure 13.3 shows the results of applying this entire conversion procedure to the L1 grammar introduced earlier. Note that this figure doesn't show the original lexical rules; since these original lexical rules are already in CNF, they all carry over unchanged to the new grammar. Figure 13.3 does, however, show the various places where the process of eliminating unit productions has, in effect, created new lexical rules. For example, all the original verbs have been promoted to both VPs and to Ss in the converted grammar.

Figure 13.3: L1 and its conversion to CNF.

  L1 Grammar                 L1 in CNF
  S → NP VP                  S → NP VP
  S → Aux NP VP              S → X1 VP
                             X1 → Aux NP
  S → VP                     S → book | include | prefer
                             S → Verb NP
                             S → X2 PP
                             S → Verb PP
                             S → VP PP
  NP → Pronoun               NP → I | she | me
  NP → Proper-Noun           NP → TWA | Houston
  NP → Det Nominal           NP → Det Nominal
  Nominal → Noun             Nominal → book | flight | meal | money
  Nominal → Nominal Noun     Nominal → Nominal Noun
  Nominal → Nominal PP       Nominal → Nominal PP
  VP → Verb                  VP → book | include | prefer
  VP → Verb NP               VP → Verb NP
  VP → Verb NP PP            VP → X2 PP
                             X2 → Verb NP
  VP → Verb PP               VP → Verb PP
  VP → VP PP                 VP → VP PP
  PP → Preposition NP        PP → Preposition NP
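Putting the earlier sketches together on the Python encoding of L1 gives a hypothetical end-to-end conversion; the output corresponds closely to Figure 13.3, although the fresh non-terminal names will differ and duplicate rules introduced during binarization are not merged. Step 2 is omitted because L1 has no rules mixing terminals and non-terminals.

```python
# Hypothetical end-to-end use of the sketches above (GRAMMAR, LEXICON,
# eliminate_unit_productions, and binarize are the illustrative
# definitions given earlier, not anything from the book).
lexical_rules = [(pos, [w]) for pos, words in LEXICON.items() for w in words]
all_rules = GRAMMAR + lexical_rules
nonterminals = {lhs for lhs, _ in all_rules}
cnf_rules = binarize(eliminate_unit_productions(all_rules, nonterminals))
for lhs, rhs in sorted(set((l, tuple(r)) for l, r in cnf_rules)):
    print(lhs, "->", " ".join(rhs))
```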

