Theoretical Aspects of Lexical Analysis

Lexical analysis, the first step in the compilation process, splits the input into segments and classifies them: each segment of the input (a lexeme) is assigned a label (its token).

Here, we use regular expressions to recognize portions of the input text.

Regular Expressions

Regular expressions are defined over a finite alphabet Σ = { a, b, ..., c } and the empty string ε:

The languages (sets of strings) for each of these entities are:

  • {ε}, for ε
  • {a}, for an entry a in Σ

The following primitive constructors are defined (the languages they denote are sketched below):

  • concatenation (rs)
  • alternative (r|s)
  • Kleene-star (r*)
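
The language L(r) denoted by a regular expression r follows inductively from these constructors; a sketch of the usual definitions, in LaTeX notation:

  \begin{align*}
    L(\varepsilon) &= \{\varepsilon\}, \qquad L(a) = \{a\} \text{ for each } a \in \Sigma \\
    L(r\,s)        &= \{\, xy \mid x \in L(r),\ y \in L(s) \,\}  && \text{(concatenation)} \\
    L(r \mid s)    &= L(r) \cup L(s)                             && \text{(alternative)} \\
    L(r^{*})       &= \textstyle\bigcup_{n \ge 0} L(r)^{n}       && \text{(Kleene-star, with } L(r)^{0} = \{\varepsilon\}\text{)}
  \end{align*}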

Extensions (derived from the above, as illustrated below):

  • Transitive closure (+) - a+ ("one or more 'a'")
  • Optionality (?) - a? ("zero or one 'a'")
  • Character classes - [a-z] ("any character in the 'a-z' range"; a class matches exactly one character)
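
These extensions add no expressive power: each can be rewritten using only the primitive constructors. A minimal sketch checking the equivalences with Python's re module (the sample strings below are chosen only for illustration):

  import re

  # Derived operators rewritten with the primitive constructors only.
  # The empty branch in "(a|)" plays the role of the empty string ε.
  equivalences = [
      ("a+",    "aa*"),       # transitive closure: one or more 'a'
      ("a?",    "(a|)"),      # optionality: zero or one 'a'
      ("[a-c]", "(a|b|c)"),   # character class: exactly one character from the range
  ]

  samples = ["", "a", "aa", "b", "c", "ab"]
  for extended, primitive in equivalences:
      for s in samples:
          assert bool(re.fullmatch(extended, s)) == bool(re.fullmatch(primitive, s))
  print("each derived form matches exactly the same samples as its primitive form")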

Recognizing/Matching Regular Expressions

Building the NFA: Thompson's Algorithm
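
A minimal sketch of the construction, assuming the regular expression has already been parsed into a small abstract syntax tree (the tuple encoding and the NFA class below are purely illustrative). Each operator is translated into an NFA fragment with one start and one accepting state, and fragments are glued together with ε-transitions:

  EPS = None  # label used for ε-transitions

  class NFA:
      def __init__(self):
          self.trans = {}      # state -> list of (symbol or EPS, next state)
          self.count = 0

      def state(self):
          s = self.count
          self.count += 1
          self.trans[s] = []
          return s

      def add(self, src, label, dst):
          self.trans[src].append((label, dst))

  def build(ast, nfa):
      """Thompson's construction: return (start, accept) of the fragment for `ast`.
      `ast` is ('sym', a), ('cat', r, s), ('alt', r, s) or ('star', r)."""
      kind = ast[0]
      if kind == 'sym':                         # a single symbol:  start --a--> accept
          i, f = nfa.state(), nfa.state()
          nfa.add(i, ast[1], f)
          return i, f
      if kind == 'cat':                         # rs: ε-link the accept of r to the start of s
          i1, f1 = build(ast[1], nfa)
          i2, f2 = build(ast[2], nfa)
          nfa.add(f1, EPS, i2)
          return i1, f2
      if kind == 'alt':                         # r|s: fresh start/accept, ε into both branches
          i, f = nfa.state(), nfa.state()
          i1, f1 = build(ast[1], nfa)
          i2, f2 = build(ast[2], nfa)
          nfa.add(i, EPS, i1); nfa.add(i, EPS, i2)
          nfa.add(f1, EPS, f); nfa.add(f2, EPS, f)
          return i, f
      if kind == 'star':                        # r*: ε-loop around r plus an ε bypass
          i, f = nfa.state(), nfa.state()
          i1, f1 = build(ast[1], nfa)
          nfa.add(i, EPS, i1); nfa.add(i, EPS, f)
          nfa.add(f1, EPS, i1); nfa.add(f1, EPS, f)
          return i, f
      raise ValueError(kind)

  # (a|b)*c, written as a tuple tree
  nfa = NFA()
  start, accept = build(('cat', ('star', ('alt', ('sym', 'a'), ('sym', 'b'))), ('sym', 'c')), nfa)
  print(len(nfa.trans), "states; start =", start, "accept =", accept)

The resulting automaton is nondeterministic and usually far from minimal; the next two steps turn it into an equivalent, and then minimal, DFA.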

Building DFAs from NFAs
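
A sketch of the usual subset (powerset) construction, assuming the NFA is given as a transition table with ε-transitions like the one above; each DFA state is the ε-closure of a set of NFA states (the three-state NFA at the bottom, for strings ending in "ab", is hand-written for illustration):

  from collections import deque

  EPS = None  # ε-transition label

  def eps_closure(states, trans):
      """All NFA states reachable from `states` using only ε-transitions."""
      stack, closure = list(states), set(states)
      while stack:
          s = stack.pop()
          for label, t in trans.get(s, []):
              if label is EPS and t not in closure:
                  closure.add(t)
                  stack.append(t)
      return frozenset(closure)

  def subset_construction(trans, start, accepts, alphabet):
      """Build a DFA whose states are ε-closures of sets of NFA states."""
      d_start = eps_closure({start}, trans)
      d_trans, d_accepts = {}, set()
      queue, seen = deque([d_start]), {d_start}
      while queue:
          S = queue.popleft()
          if S & accepts:
              d_accepts.add(S)
          for a in alphabet:
              moved = {t for s in S for label, t in trans.get(s, []) if label == a}
              T = eps_closure(moved, trans)
              d_trans[S, a] = T
              if T not in seen:
                  seen.add(T)
                  queue.append(T)
      return d_start, d_trans, d_accepts

  # NFA accepting strings over {a, b} that end in "ab"
  nfa_trans = {0: [('a', 0), ('b', 0), ('a', 1)], 1: [('b', 2)], 2: []}
  d_start, d_trans, d_accepts = subset_construction(nfa_trans, 0, accepts={2}, alphabet={'a', 'b'})
  print(len({S for S, _ in d_trans}), "DFA states,", len(d_accepts), "accepting")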

DFA Minimization
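
A sketch of minimization by partition refinement (Moore's algorithm): start from the accepting/non-accepting split and keep splitting any group whose states disagree, for some input symbol, on which group they move to. Hopcroft's algorithm is a faster refinement of the same idea. The four-state DFA below is made up for illustration; it accepts strings ending in 'b' and contains two pairs of equivalent states:

  def minimize(states, alphabet, trans, accepts):
      """Partition-refinement DFA minimization.
      `trans` maps (state, symbol) -> state and must be total (a complete DFA)."""
      partition = [set(accepts), set(states) - set(accepts)]   # initial split
      partition = [group for group in partition if group]
      changed = True
      while changed:
          changed = False
          refined = []
          for group in partition:
              # Group states by the signature of which block each symbol leads to.
              buckets = {}
              for s in group:
                  sig = tuple(
                      next(i for i, g in enumerate(partition) if trans[s, a] in g)
                      for a in sorted(alphabet)
                  )
                  buckets.setdefault(sig, set()).add(s)
              refined.extend(buckets.values())
              if len(buckets) > 1:
                  changed = True
          partition = refined
      return partition

  # DFA over {a, b} accepting strings that end in 'b'; states 2 and 3 duplicate 0 and 1.
  states = {0, 1, 2, 3}
  trans = {(0, 'a'): 2, (0, 'b'): 1, (1, 'a'): 2, (1, 'b'): 1,
           (2, 'a'): 0, (2, 'b'): 3, (3, 'a'): 0, (3, 'b'): 3}
  print(minimize(states, {'a', 'b'}, trans, accepts={1, 3}))   # -> [{1, 3}, {0, 2}]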

Input Processing
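
One common discipline for driving the DFA over the input is maximal munch (longest match): keep consuming characters, remember the last position at which an accepting state was seen, and when the automaton gets stuck backtrack to that position and emit the lexeme. A small sketch, with a hand-written DFA for the expression a(bc)* chosen only to show the backtracking:

  def longest_match(dfa_trans, start, accepts, text, pos):
      """Return the end of the longest accepted prefix of text[pos:], or None."""
      state, last_accept, i = start, None, pos
      while i < len(text) and (state, text[i]) in dfa_trans:
          state = dfa_trans[state, text[i]]
          i += 1
          if state in accepts:
              last_accept = i          # furthest accepting position seen so far
      return last_accept               # characters past it are given back to the input

  # DFA for a(bc)*   (0 --a--> 1, 1 --b--> 2, 2 --c--> 1; state 1 accepts)
  trans, accepts = {(0, 'a'): 1, (1, 'b'): 2, (2, 'c'): 1}, {1}

  text, pos = "abcb", 0
  while pos < len(text):
      end = longest_match(trans, 0, accepts, text, pos)
      if end is None:
          print("no lexeme starts at position", pos, repr(text[pos]))
          break
      print("lexeme:", text[pos:end])
      pos = end

On this input the automaton reads "abcb", but the last accepting state was seen after "abc", so the final 'b' is given back to the input and the lexeme is "abc".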

Recognizing Multiple Expressions
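
To recognize several token expressions simultaneously, one usual approach is to build one NFA per expression, add a fresh start state with ε-transitions into each of them, and remember which expression each accepting state belongs to; after determinization a DFA state may contain accepting states of several expressions, and a priority rule (for instance, declaration order) decides which token wins. A minimal sketch of the merge step, using the same transition-table form as above (the two tiny token automata are hypothetical):

  EPS = None  # ε-transition label

  def merge_nfas(nfas):
      """Combine several NFAs (one per token expression) into a single NFA.
      Each entry of `nfas` is (token, trans, start, accept); states 0..n-1 are
      renamed by offsetting, a fresh start state 0 gets an ε-transition into
      every sub-automaton, and accept_token records which token each renamed
      accepting state reports."""
      merged, accept_token, offset = {0: []}, {}, 1
      for token, trans, start, accept in nfas:
          for s, edges in trans.items():
              merged[s + offset] = [(label, t + offset) for label, t in edges]
          merged[0].append((EPS, start + offset))
          accept_token[accept + offset] = token
          offset += len(trans)
      return merged, 0, accept_token

  # Two token expressions, as hand-written NFAs:
  #   IF : the keyword "if"           0 --i--> 1 --f--> 2
  #   ID : a single letter a or b     0 --a--> 1, 0 --b--> 1
  if_nfa = ("IF", {0: [('i', 1)], 1: [('f', 2)], 2: []}, 0, 2)
  id_nfa = ("ID", {0: [('a', 1), ('b', 1)], 1: []}, 0, 1)
  trans, start, accept_token = merge_nfas([if_nfa, id_nfa])
  print(accept_token)   # which renamed accepting state reports which token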

Example 1: Ambiguous Expressions

Example 2: Backtracking