Introduction to Syntax

From Wiki**3

Revision as of 16:26, 3 April 2010

What is a Grammar?

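An unrestricted grammar is a quadruple <amsmath>G=(V,\Sigma,R,S)</amsmath>, where <amsmath>V</amsmath> is an alphabet; <amsmath>\Sigma</amsmath> is the set of terminal symbols (<amsmath>\Sigma\subseteq{}V</amsmath>); <amsmath>V-\Sigma</amsmath> is the set of non-terminal symbols; <amsmath>S\in{}(V-\Sigma)</amsmath> is the initial symbol; and <amsmath>R</amsmath> is a set of rules (a finite subset of <amsmath>V^*(V-\Sigma)V^*\times{}V^*</amsmath>).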

The following are defined:

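  • Direct derivation: <amsmath>u\underset{\text{\tiny G}}{\Rightarrow}v\;\Leftrightarrow{}u=x\alpha{}y\wedge{}v=x\beta{}y\wedge{}(\alpha,\beta)\in{}R</amsmath>, with <amsmath>x,y\in{}V^*</amsmath>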
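  • Derivation: <amsmath>w_0\underset{\text{\tiny G}}{\Rightarrow}w_1\underset{\text{\tiny G}}{\Rightarrow}\cdots\underset{\text{\tiny G}}{\Rightarrow}w_n\;\Leftrightarrow{}w_0\overset{*}{\underset{\text{\tiny G}}{\Rightarrow}}w_n</amsmath>
  • Generated language: <amsmath>L(G) = \{ w\: |\: w \in \Sigma^*  \wedge{}S\overset{*}{\underset{\text{\tiny G}}{\Rightarrow}}w \}</amsmath>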
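
A direct derivation step can be illustrated with plain string rewriting. The sketch below is only an illustration under stated assumptions: each symbol is a single character, rules are (left side, right side) pairs of strings, and the name direct_derivations and the sample grammar S -> aSb | ab are hypothetical, not part of the original text.

```python
def direct_derivations(w, rules):
    """Return every string obtainable from w by applying one rule exactly once."""
    results = set()
    for lhs, rhs in rules:
        start = w.find(lhs)
        while start != -1:
            # Replace this single occurrence of the left side with the right side.
            results.add(w[:start] + rhs + w[start + len(lhs):])
            start = w.find(lhs, start + 1)
    return results


# Hypothetical sample grammar: S -> aSb | ab (one character per symbol).
RULES = [("S", "aSb"), ("S", "ab")]

step1 = direct_derivations("S", RULES)    # strings reachable from S in one step
step2 = direct_derivations("aSb", RULES)  # strings reachable from aSb in one step
```

Chaining such steps until no non-terminal is left yields the strings of the generated language; for this sample grammar, S derives aSb, which in turn derives aabb.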

Context-free Grammars

The above grammar is unrestricted in its derivation capabilities and directly supports context-dependent rules.

In our coverage of programming language processing, though, only context-free grammars will be considered. This means that the set <amsmath>R</amsmath> is defined on <amsmath>(V-\Sigma)\times{}V^*</amsmath>, i.e., the left side of each rule is a single non-terminal.
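
The restriction can be checked mechanically. The following is a minimal sketch, assuming rules are (left side, right side) pairs of strings with one character per symbol; the function name is_context_free and the sample rules are hypothetical.

```python
def is_context_free(rules, nonterminals):
    """True if the left side of every rule is exactly one non-terminal."""
    return all(lhs in nonterminals for lhs, _rhs in rules)


# Hypothetical examples: the first rule set is context-free, the second is not,
# because the left side "aS" carries context around the non-terminal.
cf_rules = [("S", "aSb"), ("S", "ab")]
context_rules = [("aS", "Sa")]

cf_ok = is_context_free(cf_rules, {"S"})
context_ok = is_context_free(context_rules, {"S"})
```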

The FIRST and FOLLOW sets

Computing the FIRST Set

The FIRST set for a given string or symbol can be computed as follows:

  1. If a is a terminal symbol, then FIRST(a) = {a}
  2. If X is a non-terminal symbol and X -> ε is a production then add ε to FIRST(X)
  3. If X is a non-terminal symbol and X -> Y1...Yn is a production, then
    a ∈ FIRST(X) if a ≠ ε, a ∈ FIRST(Yi), and ε ∈ FIRST(Yj) for all j < i (i.e., Yj ⇒* ε); if ε ∈ FIRST(Yi) for all i, then ε ∈ FIRST(X)

As an example, consider the production X -> Y1...Yn and assume it is the only production for X:

  • If Y1 ⇏* ε (Y1 cannot derive ε) then FIRST(X) = FIRST(Y1)
  • If Y1 ⇒* ε and Y2 ⇏* ε then FIRST(X) = (FIRST(Y1) \ {ε}) ∪ FIRST(Y2)
  • If Yi ⇒* ε (∀i) then FIRST(X) = ∪i(FIRST(Yi) \ {ε}) ∪ {ε}

The FIRST set can also be computed for a string Y1...Yn much in the same way as in case 3 above.
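
The three rules can be sketched as a fixed-point iteration. This is a minimal illustration under stated assumptions, not a definitive implementation: the grammar is a dictionary mapping each non-terminal to a list of right-hand sides (tuples of symbols, with the empty tuple standing for an ε production), any symbol that is not a key of the dictionary is treated as a terminal, and the names first_of_string, compute_first and the sample expression grammar are hypothetical.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the current FIRST table."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # rule 1: FIRST(a) = {a} for a terminal
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                  # sym cannot vanish: stop here
    result.add(EPS)                        # every symbol (or none) can derive ε
    return result


def compute_first(grammar):
    """Apply rules 2 and 3 repeatedly until no FIRST set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# Hypothetical sample grammar:
#   E  -> T E'
#   E' -> + T E' | ε
#   T  -> id | ( E )
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("id",), ("(", "E", ")")],
}
FIRST = compute_first(GRAMMAR)
```

For this sample grammar the iteration yields FIRST(T) = FIRST(E) = {id, (} and FIRST(E') = {+, ε}. The helper first_of_string also covers the string case mentioned above.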

Computing the FOLLOW Set

The FOLLOW set is computed for non-terminals and indicates the set of terminal symbols that can appear immediately after a given non-terminal in some derivation. The special symbol $ is used to represent the end of phrase (end of input).

  1. If X is the grammar's initial symbol then {$} ⊆ FOLLOW(X)
  2. If A -> αXβ is a production, then FIRST(β)\{ε} ⊆ FOLLOW(X)
  3. If A -> αX or A -> αXβ (β ⇒* ε), then FOLLOW(A) ⊆ FOLLOW(X)

The rules should be applied repeatedly, over all productions, until no FOLLOW set changes.
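
The FOLLOW rules can be sketched in the same fixed-point style. The block below is a self-contained illustration under the same assumptions as before: the grammar is a dictionary from non-terminals to lists of right-hand sides (tuples, with the empty tuple for ε), symbols that are not keys are terminals, and all function names and the sample grammar are hypothetical.

```python
EPS, END = "ε", "$"


def first_of(symbols, first):
    """FIRST of a symbol sequence; symbols missing from `first` are terminals."""
    out = set()
    for s in symbols:
        f = first.get(s, {s})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                      # rule 1: $ follows the initial symbol
    changed = True
    while changed:                              # repeat until nothing changes
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                for i, x in enumerate(rhs):
                    if x not in grammar:        # FOLLOW is only defined for non-terminals
                        continue
                    beta_first = first_of(rhs[i + 1:], first)
                    new = beta_first - {EPS}    # rule 2: FIRST(β) \ {ε}
                    if EPS in beta_first:       # rule 3: β is empty or derives ε
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow


# Hypothetical sample grammar: E -> T E' ; E' -> + T E' | ε ; T -> id | ( E )
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("id",), ("(", "E", ")")],
}
FOLLOW = compute_follow(GRAMMAR, "E")
```

For this sample grammar the result is FOLLOW(E) = FOLLOW(E') = {$, )} and FOLLOW(T) = {+, $, )}.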

Exercises