Transcription of CHAPTER 12 Constituency Grammars
1 Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright 2021. Allrights reserved. Draft of December 29, GrammarsBecause the Nightby Bruce Springsteen and Patty SmithThe Fire Next Timeby James BaldwinIf on a winter s night a travelerby Italo CalvinoLove Actuallyby Richard CurtisSuddenly Last Summerby Tennessee WilliamsA Scanner Darklyby Philip K. DickSix titles that are not constituents, from Geoffrey K. Pullum onLanguage Log (who was pointing out their incredible rarity).The study of grammar has an ancient pedigree. The grammar of sanskrit wasdescribed by the Indian grammarian P sometime between the 7th and 4th cen-turies BCE, in his famous treatise the adhy ay ( 8 books ). And our wordsyn-taxcomes from the Greeks yntaxis, meaning setting out together or arrangement ,syntaxand refers to the way words are arranged together. We have seen various syntacticnotions in previous chapters: ordering of sequences of words ( CHAPTER 2), probabil-ities for these word sequences ( CHAPTER 3), and the use of part-of-speech categoriesas a grammatical equivalence class for words ( CHAPTER 8).
2 In this CHAPTER and thenext three we introduce a variety of syntactic phenomena that go well beyond thesesimpler approaches, together with formal models for capturing them in a computa-tionally useful bulk of this CHAPTER is devoted to context-free Grammars . Context-free gram-mars are the backbone of many formal models of the syntax of natural language (and,for that matter, of computer languages). As such, they play a role in many computa-tional applications, including grammar checking, semantic interpretation, dialogueunderstanding, and machine translation. They are powerful enough to express so-phisticated relations among the words in a sentence, yet computationally tractableenough that efficient algorithms exist for parsing sentences with them (as we showin CHAPTER 13). And in CHAPTER 16 we show how they provide a systematic frame-work for semantic interpretation. Here we also introduce the concept of lexicalizedgrammars, focusing on one example,combinatory categorial grammar , CHAPTER 14 we introduce a formal model of grammar calledsyntactic depen-denciesthat is an alternative to these Constituency Grammars , and we ll give algo-rithms fordependency parsing.
3 Both Constituency and dependency formalisms areimportant for language , we provide a brief overview of the grammar of English, illustrated froma domain with relatively simple sentences called ATIS (Air Traffic Information Sys-tem) (Hemphill et al., 1990). ATIS systems were an early spoken language systemfor users to book flights, by expressing sentences likeI d like to fly to ConstituencySyntactic Constituency is the idea that groups of words can behave as single units,or constituents. Part of developing a grammar involves building an inventory of theconstituents in the language. How do words group together in English? Considerthenoun phrase, a sequence of words surrounding at least one noun. Here are somenoun phraseexamples of noun phrases (thanks to Damon Runyon):Harry the Horsea high-class spot such as Mindy sthe Broadway coppersthe reason he comes into the Hot Boxtheythree parties from BrooklynWhat evidence do we have that these words group together (or form constituents )?
4 One piece of evidence is that they can all appear in similar syntactic environments,for example, before a parties from Brooklynarrive..a high-class spot such as Mindy sattracts..the Broadway copperslove..theysitBut while the whole noun phrase can occur before a verb, this is not true of eachof the individual words that make up a noun phrase. The following are not grammat-ical sentences of English (recall that we use an asterisk (*) to mark fragments thatare not grammatical English sentences):*fromarrive..*asattracts..*th eis..*spotsat..Thus, to correctly describe facts about the ordering of these words in English, wemust be able to say things like Noun Phrases can occur before verbs .Other kinds of evidence for Constituency come from what are calledpreposedorpreposedpostposedconstru ctions. For example, the prepositional phraseon September sev-postposedenteenthcan be placed in a number of different locations in the following examples,including at the beginning (preposed) or at the end (postposed):On September seventeenth, I d like to fly from Atlanta to DenverI d like to flyon September seventeenthfrom Atlanta to DenverI d like to fly from Atlanta to Denveron September seventeenthBut again, while the entire phrase can be placed differently, the individual wordsmaking up the phrase cannot be:*On September, I d like to fly seventeenthfrom Atlanta to Denver*OnI d like to fly September seventeenthfrom Atlanta to Denver*I d like to fly on Septemberfrom Atlanta to Denver Context-Free GrammarsThe most widely used formal system for modeling constituent structure in Englishand other natural languages is theContext-Free grammar , orCFG.
5 CONTEXT-FREEGRAMMARS3free Grammars are also calledPhrase-Structure Grammars , and the formalismis equivalent toBackus-Naur Form, orBNF. The idea of basing a grammar onconstituent structure dates back to the psychologist Wilhelm Wundt 1900 but wasnot formalized until Chomsky (1956) and, independently, Backus (1959).A context-free grammar consists of a set ofrulesorproductions, each of whichrulesexpresses the ways that symbols of the language can be grouped and ordered to-gether, and alexiconof words and symbols. For example, the following productionslexiconexpress that anNP(ornoun phrase) can be composed of either aProperNounorNPa determiner (Det) followed by aNominal; aNominalin turn can consist of one Det NominalNP ProperNounNominal Noun|Nominal NounContext-free rules can be hierarchically embedded, so we can combine the previousrules with others, like the following, that express facts about the lexicon:Det aDet theNoun flightThe symbols that are used in a CFG are divided into two classes.
6 The symbolsthat correspond to words in the language ( the , nightclub ) are calledterminalterminalsymbols; the lexicon is the set of rules that introduce these terminal symbols. Thesymbols that express abstractions over these terminals are callednon-terminals. Innon-terminaleach context-free rule, the item to the right of the arrow ( ) is an ordered list of oneor more terminals and non-terminals; to the left of the arrow is a single non-terminalsymbol expressing some cluster or generalization. The non-terminal associated witheach word in the lexicon is its lexical category, or part of CFG can be thought of in two ways: as a device for generating sentencesand as a device for assigning a structure to a given sentence. Viewing a CFG as agenerator, we can read the arrow as rewrite the symbol on the left with the stringof symbols on the right .So starting from the symbol:NPwe can use our first rule to rewriteNPas:Det Nominaland then rewriteNominalas:Det Nounand finally rewrite these parts-of-speech as:a flightWe say the stringa flightcan be derived from the non-terminalNP.
7 Thus, a CFGcan be used to generate a set of strings. This sequence of rule expansions is called aderivationof the string of words. It is common to represent a derivation by aparsederivationtree(commonly shown inverted with the root at the top). Figure shows the treeparse treerepresentation of this the parse tree shown in Fig. , we can say that the nodeNPdominatesdominatesall the nodes in the tree (Det,Nom,Noun,a,flight). We can say further that itimmediately dominates the formal language defined by a CFG is the set of strings that are derivablefrom the designatedstart symbol. Each grammar must have one designated startstart symbol1 When talking about these rules we can pronounce the rightarrow as goes to , and so we mightread the first rule above as NP goes to Det Nominal .4 CHAPTER12 CONSTITUENCYGRAMMARSNPNomNounflightDetaF igure parse tree for a flight.
8 Symbol, which is often calledS. Since context-free Grammars are often used to definesentences,Sis usually interpreted as the sentence node, and the set of strings thatare derivable fromSis the set of sentences in some simplified version of s add a few additional rules to our inventory. The following rule expressesthe fact that a sentence can consist of a noun phrase followed by averb phrase:verb phraseS NP VPI prefer a morning flightA verb phrase in English consists of a verb followed by assorted other things;for example, one kind of verb phrase consists of a verb followed by a noun phrase:VP Verb NPprefer a morning flightOr the verb may be followed by a noun phrase and a prepositional phrase:VP Verb NP PPleave Boston in the morningOr the verb phrase may have a verb followed by a prepositional phrase alone:VP Verb PPleaving on ThursdayA prepositional phrase generally has a preposition followed by a noun example, a common type of prepositional phrase in the ATIS corpus is used toindicate location or direction:PP Preposition NPfrom Los AngelesTheNPinside aPPneed not be a location;PPsare often used with times anddates, and with other nouns as well.
9 They can be arbitrarily complex. Here are tenexamples from the ATIS corpus:to Seattleon these flightsin Minneapolisabout the ground transportation in Chicagoon Wednesdayof the round trip flight on United Airlinesin the eveningof the AP fifty seven flighton the ninth of Julywith a stopover in NashvilleFigure gives a sample lexicon, and Fig. summarizes the grammar ruleswe ve seen so far, which we ll callL0. Note that we can use the or-symbol|toindicate that a non-terminal has alternate possible can use this grammar to generate sentences of this ATIS-language . Westart withS, expand it toNP VP, then choose a random expansion ofNP(let s say, toI), and a random expansion ofVP(let s say, toVerb NP), and so on until we generatethe stringI prefer a morning flight. Figure shows a parse tree that represents acomplete derivation ofI prefer a morning can also represent a parse tree in a more compact format calledbracketednotation; here is the bracketed representation of the parse tree of Fig.
10 CONTEXT-FREEGRAMMARS5 Noun flights|flight|breeze|tri p|morningVerb is|prefer|like|need|want|fly|doAdjective chea pest|non-stop|first|latest|other|directP ronoun me|I|you|itProper-Noun Alaska|Baltimore|Los Angeles|Chicago|United|AmericanDetermine r the|a|an|this|these|thatPreposition from|to|on|near|inConjunction and|or|butFigure lexicon RulesExamplesS NP VPI + want a morning flightNP PronounI|Proper-NounLos Angeles|Det Nominala + flightNominal Nominal Nounmorning + flight|NounflightsVP Verbdo|Verb NPwant + a flight|Verb NP PPleave + Boston + in the morning|Verb PPleaving + on ThursdayPP Preposition NPfrom + Los AngelesFigure grammar forL0, with example phrases for each parse tree for I prefer a morning flight according to grammarL0.( ) [S[NP[ProI]] [VP[Vprefer] [NP[Deta] [Nom[Nmorning] [Nom[Nflight]]]]]]A CFG like that ofL0defines a formal language. We saw in CHAPTER 2 that a for-mal language is a set of strings.