Difference between revisions of "Theoretical Aspects of Lexical Analysis"

From Wiki**3

(Regular Expressions)
Line 6: Line 6:
 
== Regular Expressions ==
 
== Regular Expressions ==
  
Regular expressions are defined considering an alphabet { a, b, ..., c } and the empty string ''eps''.
+
Regular expressions are defined considering a finite alphabet '''Σ = { a, b, ..., c }''' and the empty string '''ε''':
* eps
 
* a
 
  
Primitive constructors:
+
The languages (sets of strings) for each of these entities are:
 +
* '''{ε}''', for '''ε'''
 +
* '''{a}''', for an entry '''a''' in '''Σ'''
 +
 
 +
The following primitive constructors are defined:
 
* concatenation
 
* concatenation
 
* alternative
 
* alternative
 
* Kleene-star (*)
 
* Kleene-star (*)
  
Extensions:
+
Extensions (derived from the above):
 
* Transitive closure (+) - a+ ("one or more 'a'")
 
* Transitive closure (+) - a+ ("one or more 'a'")
 
* Optionality (?) - a? ("zero or one 'a'")
 
* Optionality (?) - a? ("zero or one 'a'")

Revision as of 03:24, 14 March 2008

Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).

In this case, we will be using regular expressions for recognizing portions of the input text.

Regular Expressions

Regular expressions are defined considering a finite alphabet Σ = { a, b, ..., c } and the empty string ε:

The languages (sets of strings) for each of these entities are:

  • {ε}, for ε
  • {a}, for an entry a in Σ

The following primitive constructors are defined:

  • concatenation
  • alternative
  • Kleene-star (*)

Extensions (derived from the above):

  • Transitive closure (+) - a+ ("one or more 'a'")
  • Optionality (?) - a? ("zero or one 'a'")
  • Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)

Recognizing Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking