Theoretical Aspects of Lexical Analysis

Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).

In this case, we will be using regular expressions for recognizing portions of the input text.

Regular Expressions

Regular expressions are defined considering a finite alphabet Î£ = { a, b, ..., c } and the empty string Îµ:

The languages (sets of strings) for each of these entities are:

{Îµ}, for Îµ
{a}, for an entry a in Î£

The following primitive constructors are defined:

concatenation
alternative
Kleene-star (*)

Extensions (derived from the above):

Transitive closure (+) - a+ ("one or more 'a'")
Optionality (?) - a? ("zero or one 'a'")
Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)

Recognizing Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking

Theoretical Aspects of Lexical Analysis

From Wiki**3

Revision as of 01:24, 14 March 2008 by Root (talk | contribs) (→Regular Expressions)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Regular Expressions

Recognizing Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking

Theoretical Aspects of Lexical Analysis

From Wiki**3

Revision as of 01:24, 14 March 2008 by Root (talk | contribs) (→Regular Expressions)(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Regular Expressions

Recognizing Regular Expressions

Building the NFA: Thompson's Algorithm

Building DFAs from NFAs

DFA Minimization

Input Processing

Recognizing Multiple Expressions

Example 1: Ambiguous Expressions

Example 2: Backtracking

Revision as of 01:24, 14 March 2008 by Root (talk | contribs) (→Regular Expressions)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)