Example: bachelor of science

Chapter 6 Formal Language Theory

Chapter 6 Formal Language TheoryIn this Chapter , we introduce Formal Language Theory , the computationaltheories of languages and grammars. The models are actuallyinspired byformal logic, enriched with insights from the Theory of begin with the definition of a Language and then proceed to aroughcharacterization of the basicChomsky hierarchy. We then turn to a more de-tailed consideration of the types of languages in the hierarchy and LanguagesWhat is a Language ? Formally, a languageLis defined as as set (possiblyinfinite) of strings over some finite 7 ( Language )A languageLis a possibly infinite set of stringsover a finite alphabet .We define as the set of all possible strings over some alphabet . ThusL . The set of all possible languages over some alphabet is theset ofall possible subsets of , 2 or ( ).

Formal Language Theory In this chapter, we introduce formal language theory, the computational theories of languages and grammars. The models are actually inspired by formal logic, enriched with insights from the theory of computation. We begin with the definition of a language

Tags:

  Language, Formal, Formal language

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Chapter 6 Formal Language Theory

1 Chapter 6 Formal Language TheoryIn this Chapter , we introduce Formal Language Theory , the computationaltheories of languages and grammars. The models are actuallyinspired byformal logic, enriched with insights from the Theory of begin with the definition of a Language and then proceed to aroughcharacterization of the basicChomsky hierarchy. We then turn to a more de-tailed consideration of the types of languages in the hierarchy and LanguagesWhat is a Language ? Formally, a languageLis defined as as set (possiblyinfinite) of strings over some finite 7 ( Language )A languageLis a possibly infinite set of stringsover a finite alphabet .We define as the set of all possible strings over some alphabet . ThusL . The set of all possible languages over some alphabet is theset ofall possible subsets of , 2 or ( ).

2 This may seem rather simple,but is actually perfectly adequate for our GrammarsA grammar is a way to characterize a languageL, a way to list out whichstrings of are inLand which are not. IfLis finite, we could simply list94 Chapter 6. Formal Language THEORY95the strings, but languages by definition need not be finite. Infact, all of thelanguages we are interested in are infinite. This is, as we showed in Chapter 2,also true of human the material of this Chapter to that of the preceding two, wecan view a grammar as a logical system by which we can prove things. Forexample, we can view the strings of a Language as WFFs. If we can provesome stringuwith respect to some languageL, then we would conclude thatuis inL, way to view a grammar as a logical system is as a set of formalstatements we can use to prove that some particular stringufollows fromsome initial assumption.

3 This, in fact, is precisely how we presented thesyntax of sentential logic in Chapter 4. For example, we can think of thesymbol WFF as the initial assumption or symbol of any derivational tree ofa well-formed formula of sentential logic. We then follow the rules for atomicstatements (page 47) and WFFs (page 47).Our notion of grammar will be more specific, of course. The grammarincludes a set of rules from which we can derive strings. These rules areeffectively statements of logical equivalence of the form: , where and are again the WFFs of sentential logic. We know a formula like(p q ) is well-formed because we can progress upward from atomic statementsto WFFs showing how each fits the rules cited above. For example, weknow thatpis an atomic statement andqis an atomic statement.

4 We alsoknow that ifqis an atomic statement, then so isq . We also know thatany atomic statement is a WFF. Finally, we know that two WFFs can beassembled together into a WFF with parentheses around the whole thing anda conjunction in the can represent all these steps in the form if we add someadditional symbols. Let s adoptWfor a WFF andAfor an atomic we know thatpandqcan be atomic statements, then this is equivalent toA pandA q. Likewise, we know that any atomic statement followedby a prime is also an atomic statement:A A . We know that any atomicstatement is a WFF:W A. Last, we know that any two WFFs can be1 These statements seem to go in only one direction, yet they are not bound by therestriction we saw in first-order logic where a substitutionbased on logical consequencecan only apply to an entire formula.

5 It s probably best to understand these statementsas more like biconditionals, rather than conditionals, even though the traditional symbolhere is the same as for a logical 6. Formal Language THEORY96conjoined:W (W W).Each of these rules is part of the grammar of the syntax of WFFs. Ifevery part of a formula follows one of the rules of the grammarof the syntaxof WFFs, then we say that the formula is indeed a to the example (p q ), we can show that every part of theformula follows one of these rules by constructing a tree.( )W(WAp WAAq )Each branch corresponds to one of the rules we posited. The mother ofeach branch corresponds to and the daughters to . The elements at thevery ends of branches are referred to asterminal elements, and the elementshigher in the tree are allnon-terminal elements.

6 If all branches correspondto actual rules of the grammar and the top node is a legal starting node,then the string is syntactically well-formed with respect to that , we define a grammar as{VT,VN,S,R}, whereVTis the set ofterminal elements,VNis the set of non-terminals,Sis a member ofVN, andRis a finite set of rules of the form above. The symbolSis defined as theonly legal root non-terminal. As in the preceding example, we use capitalletters for non-terminals and lowercase letters for 8 (Grammar){VT,VN,S,R}, whereVTis the set of terminalelements,VNis the set of non-terminals,Sis a member ofVN, andRis afinite set of more closely atR, we will require that the left side of a rulecontain at least one non-terminal element and any number of other define asVT VN, all of the terminals and non-terminals a finite set of ordered pairs from VN.

7 Thus is equivalenttoh , 6. Formal Language THEORY97 Definition 9 (Rule)Ris a finite set of ordered pairs from VN ,where =VT can now consider grammars of different types. The simplestcaseto consider first, from this perspective, arecontext-freegrammars, orType2grammars. In such a grammar, all rules ofRare of the formA ,whereAis a single non-terminal element ofVNand is a string of terminalsfromVTand non-terminals fromVN. Such a rule says that a non-terminalAcan dominate the string in a tree. These are the traditionalphrase-structuretaught in introductory linguistics courses. The set of languagesthat can be generated with such a system is fairly restrictedand derivationsare straightforwardly represented with a syntactic tree.

8 The partial grammarwe exemplified above for sentential logic was of this somewhat more powerful system can be had if we allowcontext-sensitiverewrite rules, / (where cannot be ). Such a rule says thatAcan dominate in a tree if is preceded by and followed by . Ifwe set trees aside, and just concentrate on string equivalences, then this isequivalent to A . Context-sensitive grammars are also referred toasType the other direction from context-free grammars, that is towardlesspowerful grammars, we have theregularorright-linearorType grammars only contain rules of the following form:A xBorA non-terminalAcan be rewritten as a single terminal elementxor asingle non-terminal followed by a single terminal.( ) 1 context-sensitiveA / 2 context-freeA 3right-linear(A x BA x)We will see that these three types of grammars allow for successivelymore restrictive languages and can be paired with specific types of abstractmodels of computers.

9 We will also see that the Formal properties of the mostrestrictive grammar types are quite well understood and that as we move upthe hierarchy, the systems become less and less well understood, or, moreand more 6. Formal Language THEORY98 Let s look at a few examples. For all of these, assume the alphabet is ={a,b,c}.How might we define a grammar for the Language that includes all stringscomposed of one instance ofbpreceded by any number of instances ofa:{b,ab,aab,aaab,..}? We must first decide what sort of grammar to writeamong the three types we ve discussed. In general, context-free grammarsare the easiest and most intuitive to write. In this case, we might havesomething like this:( )S A bA A A aThis is an instance of a context-free grammar because all rules have a singlenon-terminal on the left and a string of terminals and non-terminals on theright.

10 This grammar cannot be right-linear because it includes rules wherethe right side has a non-terminal followed by a terminal. This grammarcannot be context-sensitive because it contains rules where the right side is . For the stringsb,ab, andaab, this produces the following trees.( )SA bSAA abSAAA aabIn terms of our Formal characterization of grammars, we have: Chapter 6. Formal Language THEORY99( )VT={a,b}VN={S,A}S=SR= S A bA A A a Other grammars are possible for this Language too. For example:( )S bS A bA aA A aThis grammar is context-free, but also qualifies as context-sensitive. We nolonger have on the right side of any rule and a single non-terminal on the leftqualifies as a string of terminals and non-terminals. This grammar producesthe following trees for the same three strings.


Related search queries