Latest revision as of 22:46, 3 December 2018

Compiladores
Introdução ao Desenvolvimento de Compiladores
Aspectos Teóricos de Análise Lexical
A Ferramenta Flex
Introdução à Sintaxe
Análise Sintáctica Descendente
Gramáticas Atributivas
A Ferramenta YACC
Análise Sintáctica Ascendente
Análise Semântica
Geração de Código
Tópicos de Optimização

What is a Grammar?

An unrestricted grammar is a quadruple [math]G=(V,\Sigma,R,S)[/math], where [math]V[/math] is an alphabet; [math]\Sigma[/math] is the set of terminal symbols ([math]\Sigma\subseteq{}V[/math]); [math](V-\Sigma)[/math] is the set of non-terminal symbols; [math]S[/math] is the initial symbol; and [math]R[/math] is a set of rules (a finite subset of [math](V^*(V-\Sigma)V^*)\times{}V^*[/math]).

The following are defined:

Direct derivation: [math]u\underset{G}{\Rightarrow}v\;\text{iff}\;\exists_{w_1,w_2\in{}V^*}: \exists_{(u',v')\in{}R}: u=w_1u'w_2 \wedge v=w_1v'w_2[/math]
Derivation: [math]w_0\underset{G}{\Rightarrow}w_1\underset{G}{\Rightarrow}\cdots\underset{G}{\Rightarrow}w_n\;\Leftrightarrow{}w_0\overset{*}{\underset{G}{\Rightarrow}}w_n[/math]
Generated language: [math]L(G) = \{ w\: |\: w \in \Sigma^* \wedge{}S\overset{*}{\underset{G}{\Rightarrow}}w \}[/math]

Context-free Grammars

The above grammar is unrestricted in its derivation capabilities and directly supports context-dependent rules.

In our study of programming language processing, though, only context-free grammars will be considered. This means that the [math]R[/math] set is defined on [math](V-\Sigma)\times{}V^*[/math], i.e., only non-terminals are allowed on the left side of a rule.

Even though we gave a formal definition of grammar, seldom will we use them described in that form. Most grammars are described in Backus-Naur Form (BNF), Extended Backus–Naur Form (EBNF), or other metasyntax.

Consider the following example grammar:

 E -> E + E | T
 T -> T * T | F
 F -> ( E ) | val

This grammar provides a description of the syntax of simple additive and multiplicative expressions. The language will be any expression involving the two operators, sets of parentheses and values (val). Tokens (terminals) are in blue. Token val represents any number and is a terminal in this grammar: in a compiler implementation this would be recognized at the lexical level.

The FIRST and FOLLOW sets

Computing the FIRST Set

The FIRST set for a given string or symbol can be computed as follows:

If a is a terminal symbol, then FIRST(a) = {a}
If X is a non-terminal symbol and X -> ε is a production then add ε to FIRST(X)
If X is a non-terminal symbol and X -> Y₁...Y_n is a production, then
a ∈ FIRST(X) if a ∈ FIRST(Y_i) and ε ∈ FIRST(Y_j), i>j (i.e., Y_j [math]\overset{*}{\Rightarrow}[/math] ε)

As an example, consider production X -> Y₁...Y_n

If Y₁ [math]\overset{*}{\not\Rightarrow}[/math] ε then FIRST(X) = FIRST(Y₁)
If Y₁ [math]\overset{*}{\Rightarrow}[/math] ε and Y₂ [math]\overset{*}{\not\Rightarrow}[/math] ε then FIRST(X) = FIRST(Y₁) \ {ε} ∪ FIRST(Y₂)
If Y_i [math]\overset{*}{\Rightarrow}[/math] ε (∀i) then FIRST(X) = ∪_i(FIRST(Y_i)\{ε}) ∪ {ε}

The FIRST set can also be computed for a string Y₁...Y_n much in the same way as in case 3 above.

Computing the FOLLOW Set

The FOLLOW set is computed for non-terminals and indicates the set of terminal symbols that are possible after a given non-terminal. The special symbol $ is used to represent the end of phrase (end of input).

If X is the grammar's initial symbol then {$} ⊆ FOLLOW(X)
If A -> αXβ is a production, then FIRST(β)\{ε} ⊆ FOLLOW(X)
If A -> αX or A -> αXβ (β [math]\overset{*}{\Rightarrow}[/math] ε), then FOLLOW(A) ⊆ FOLLOW(X)

The algorithm should be repeated until the FOLLOW set remains unchanged.

LOOKAHEADs

The LOOKAHEAD set is defined for a rule as follows:

LOOKAHEAD(R -> Body) = FIRST(Body) ∪ FOLLOW(R), when Body [math]\overset{*}{\Rightarrow}[/math] ε

Exercises

Exercise 1 - simple ambiguous grammar.

@@ Line 4: / Line 4: @@
 = What is a Grammar? =
-An unrestricted grammar is a quadruple <amsmath>G=(V,\Sigma,R,S)</amsmath>, where <amsmath>V</amsmath> is an alphabet; <amsmath>\Sigma</amsmath> is the set of terminal symbols (<amsmath>\Sigma\subseteq{}V</amsmath>); <amsmath>(V-\Sigma)</amsmath> is the set of non-terminal symbols; <amsmath>S</amsmath> is the initial symbol; and <amsmath>R</amsmath> is a set of rules (a finite subset of <amsmath>(V^*(V-\Sigma)V^*)\times{}V^*</amsmath>).
+An unrestricted grammar is a quadruple <math>G=(V,\Sigma,R,S)</math>, where <math>V</math> is an alphabet; <math>\Sigma</math> is the set of terminal symbols (<math>\Sigma\subseteq{}V</math>); <math>(V-\Sigma)</math> is the set of non-terminal symbols; <math>S</math> is the initial symbol; and <math>R</math> is a set of rules (a finite subset of <math>(V^*(V-\Sigma)V^*)\times{}V^*</math>).
 The following are defined:
-* Direct derivation: <amsmath>u\underset{\text{\tiny G}}{\Rightarrow}v\;\text{iff}\;\exists_{w_1,w_2\in{}V^*}: \exists_{(u',v')\in{}R}: u=w_1u'w_2 \wedge v=w_1v'w_2</amsmath>
+* Direct derivation: <math>u\underset{G}{\Rightarrow}v\;\text{iff}\;\exists_{w_1,w_2\in{}V^*}: \exists_{(u',v')\in{}R}: u=w_1u'w_2 \wedge v=w_1v'w_2</math>
-* Derivation: <amsmath>w_0\underset{\text{\tiny G}}{\Rightarrow}w_1\underset{\text{\tiny G}}{\Rightarrow}\cdots\underset{\text{\tiny G}}{\Rightarrow}w_n\;\Leftrightarrow{}w_0\overset{*}{\underset{\text{\tiny G}}{\Rightarrow}}w_n</amsmath>
+* Derivation: <math>w_0\underset{G}{\Rightarrow}w_1\underset{G}{\Rightarrow}\cdots\underset{G}{\Rightarrow}w_n\;\Leftrightarrow{}w_0\overset{*}{\underset{G}{\Rightarrow}}w_n</math>
-* Generated language: <amsmath>L(G) = \{ w\: |\: w \in \Sigma^*  \wedge{}S\overset{*}{\underset{\text{\tiny G}}{\Rightarrow}}w \}</amsmath>
+* Generated language: <math>L(G) = \{ w\: |\: w \in \Sigma^*  \wedge{}S\overset{*}{\underset{G}{\Rightarrow}}w \}</math>
 == Context-free Grammars ==
@@ Line 16: / Line 16: @@
 The above grammar is unrestricted in its derivation capabilities and directly supports context-dependent rules.
-In our study of programming language processing, though, only context-free grammars will be considered. This means that the <amsmath>R</amsmath> set is defined on <amsmath>(V-\Sigma)\times{}V^*</amsmath>, i.e., only non-terminals are allowed on the left side of a rule.
+In our study of programming language processing, though, only context-free grammars will be considered. This means that the <math>R</math> set is defined on <math>(V-\Sigma)\times{}V^*</math>, i.e., only non-terminals are allowed on the left side of a rule.
-Even though we gave a formal definition of grammar, seldom will we use them described in that form. Most grammars are described in [[wikipedia:Bakus-Naur Form|Bakus-Naur Form]] (BNF),  [[wikipedia:Extended Backus–Naur Form|Extended Backus–Naur Form]] (EBNF), or other [[wikipedia:metasyntax|metasyntax]].
+Even though we gave a formal definition of grammar, seldom will we use them described in that form. Most grammars are described in [[wikipedia:Backus–Naur Form|Backus-Naur Form]] (BNF),  [[wikipedia:Extended Backus–Naur Form|Extended Backus–Naur Form]] (EBNF), or other [[wikipedia:metasyntax|metasyntax]].
 Consider the following example grammar:
@@ Line 37: / Line 37: @@
 # If X is a non-terminal symbol and X -> ε is a production then add ε to FIRST(X)
 # If X is a non-terminal symbol and X -> Y<sub>1</sub>...Y<sub>n</sub> is a production, then
-#:a ∈ FIRST(X) if a ∈ FIRST(Y<sub>i</sub>) and ε ∈ FIRST(Y<sub>j</sub>), i>j (i.e., Y<sub>j</sub> <amsmath>\overset{*}{\Rightarrow}</amsmath> ε)
+#:a ∈ FIRST(X) if a ∈ FIRST(Y<sub>i</sub>) and ε ∈ FIRST(Y<sub>j</sub>), i>j (i.e., Y<sub>j</sub> <math>\overset{*}{\Rightarrow}</math> ε)
 As an example, consider production X -> Y<sub>1</sub>...Y<sub>n</sub>
-* If Y<sub>1</sub> <amsmath>\overset{*}{\nRightarrow}</amsmath> ε then FIRST(X) = FIRST(Y<sub>1</sub>)
+* If Y<sub>1</sub> <math>\overset{*}{\not\Rightarrow}</math> ε then FIRST(X) = FIRST(Y<sub>1</sub>)
-* If Y<sub>1</sub> <amsmath>\overset{*}{\Rightarrow}</amsmath> ε and Y<sub>2</sub> <amsmath>\overset{*}{\nRightarrow}</amsmath> ε then FIRST(X) = FIRST(Y<sub>1</sub>) \ {ε} ∪ FIRST(Y<sub>2</sub>)
+* If Y<sub>1</sub> <math>\overset{*}{\Rightarrow}</math> ε and Y<sub>2</sub> <math>\overset{*}{\not\Rightarrow}</math> ε then FIRST(X) = FIRST(Y<sub>1</sub>) \ {ε} ∪ FIRST(Y<sub>2</sub>)
-* If Y<sub>i</sub> <amsmath>\overset{*}{\Rightarrow}</amsmath> ε (∀i) then FIRST(X) = ∪<sub>i</sub>(FIRST(Y<sub>i</sub>)\{ε}) ∪ {ε}
+* If Y<sub>i</sub> <math>\overset{*}{\Rightarrow}</math> ε (∀i) then FIRST(X) = ∪<sub>i</sub>(FIRST(Y<sub>i</sub>)\{ε}) ∪ {ε}
 The FIRST set can also be computed for a string Y<sub>1</sub>...Y<sub>n</sub> much in the same way as in case 3 above.
@@ Line 52: / Line 52: @@
 # If X is the grammar's initial symbol then {$} ⊆ FOLLOW(X)
 # If A -> αXβ is a production, then FIRST(β)\{ε} ⊆ FOLLOW(X)
-# If A -> αX or A -> αXβ (β <amsmath>\overset{*}{\Rightarrow}</amsmath> ε), then FOLLOW(A) ⊆ FOLLOW(X)
+# If A -> αX or A -> αXβ (β <math>\overset{*}{\Rightarrow}</math> ε), then FOLLOW(A) ⊆ FOLLOW(X)
 The algorithm should be repeated until the FOLLOW set remains unchanged.
@@ Line 60: / Line 60: @@
 The LOOKAHEAD set is defined for a rule as follows:
-LOOKAHEAD(R -> Body) = FIRST(Body) ∪ FOLLOW(R), when Body <amsmath>\overset{*}{\Rightarrow}</amsmath> ε
+LOOKAHEAD(R -> Body) = FIRST(Body) ∪ FOLLOW(R), when Body <math>\overset{*}{\Rightarrow}</math> ε
 = Exercises =
@@ Line 66: / Line 66: @@
 * [[Introduction to Syntax/Exercise 1|Exercise 1]] - simple ambiguous grammar.
-[[category:Teaching]]
+[[category:Compiladores]]
-[[category:Compilers]]
+[[category:Ensino]]

Difference between revisions of "Introduction to Syntax"

From Wiki**3