Difference between revisions of "Theoretical Aspects of Lexical Analysis"

From Wiki**3

(Building the NFA: Thompson's Algorithm)
(Building the NFA: Thompson's Algorithm)
Line 25: Line 25:
  
 
== Building the NFA: Thompson's Algorithm ==
 
== Building the NFA: Thompson's Algorithm ==
 
<dot>
 
rankdir=LR;   
 
node [shape = doublecircle]; LR_0 LR_3 LR_8;
 
node [shape = circle];
 
LR_0 -> LR_2 [ label = "SS(B)" ];
 
LR_0 -> LR_1 [ label = "SS(S)" ];
 
LR_1 -> LR_3 [ label = "S($end)" ];
 
LR_2 -> LR_6 [ label = "SS(b)" ];
 
LR_2 -> LR_5 [ label = "SS(a)" ];
 
LR_2 -> LR_4 [ label = "S(A)" ];
 
        LR_4 -> LR_8 [ label = "S(D)" ]; 
 
LR_5 -> LR_7 [ label = "S(a)" ];
 
LR_5 -> LR_5 [ label = "S(b)" ];
 
LR_6 -> LR_6 [ label = "S(b)" ];
 
LR_6 -> LR_5 [ label = "S(a)" ];
 
LR_7 -> LR_8 [ label = "S(b)" ];
 
LR_7 -> LR_5 [ label = "S(a)" ];
 
LR_8 -> LR_6 [ label = "S(b)" ];
 
LR_8 -> LR_5 [ label = "S(a)" ];
 
</dot>
 
  
 
== Building DFAs from NFAs ==
 
== Building DFAs from NFAs ==

Revision as of 21:14, 15 March 2008

Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).

In this case, we will be using regular expressions for recognizing portions of the input text.

Regular Expressions

Regular expressions are defined considering a finite alphabet Σ = { a, b, ..., c } and the empty string ε:

The languages (sets of strings) for each of these entities are:

  • {ε}, for ε
  • {a}, for an entry a in Σ

The following primitive constructors are defined:

  • concatenation
  • alternative
  • Kleene-star (*)

Extensions (derived from the above):

  • Transitive closure (+) - a+ ("one or more 'a'")
  • Optionality (?) - a? ("zero or one 'a'")
  • Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)

Recognizing/Matching Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking