* Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)
== Recognizing/Matching Regular Expressions ==
== Building the NFA: Thompson's Algorithm ==
Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) will be assigned a label (the token).
Here, regular expressions are used to recognize portions of the input text.
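As a rough illustration of splitting input into lexemes and labelling each with a token, the sketch below uses Python's built-in <code>re</code> module. The token names (<code>NUMBER</code>, <code>IDENT</code>, <code>PLUS</code>, <code>SKIP</code>) and the <code>tokenize</code> function are assumptions made up for this example, and Python's matcher is a backtracking engine rather than an automaton built with Thompson's algorithm, so this only shows the input/output behaviour of a lexer, not how the recognizer is constructed.

```python
import re

# Hypothetical token classes, chosen only for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),   # one or more digits
    ("IDENT", r"[a-z]+"),    # one or more lowercase letters
    ("PLUS", r"\+"),         # the literal '+' character
    ("SKIP", r"[ \t]+"),     # whitespace, recognized but not reported
]

# One combined pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the given input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            # m.lastgroup is the token label, m.group() is the lexeme.
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("abc + 42"))
```

Each pair in the result holds the label first and the matched segment second, mirroring the token/lexeme distinction made above.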
Regular expressions are defined considering a finite alphabet Σ = { a, b, ..., c } and the empty string ε:
The languages (sets of strings) for each of these entities are:
The following primitive constructors are defined:
Extensions (derived from the above):