{{TOCright}}
Lexical analysis, the first step in the compilation process, splits the input data into segments and classifies them. Each segment of the input (a lexeme) is assigned a label (the token).

In this case, we will use regular expressions to recognize portions of the input text.

== Regular Expressions ==
Regular expressions are defined over an alphabet { a, b, ..., c } and the empty string ''eps''. The base cases are:
* ''eps'' (the empty string)
* a (a single symbol of the alphabet)

Primitive constructors:
* concatenation - ab ("'a' followed by 'b'")
* alternative - a|b ("'a' or 'b'")
* Kleene star (*) - a* ("zero or more 'a'")

Extensions:
* Transitive closure (+) - a+ ("one or more 'a'")
* Optionality (?) - a? ("zero or one 'a'")
* Character classes - [a-z] ("any character in the 'a-z' range" - only one character is matched)
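These operators can be tried out directly; as an illustration, here they are in the syntax of Python's <code>re</code> module (which writes the alternative as <code>|</code>):

```python
import re

# re.fullmatch returns a match object only if the WHOLE string matches,
# and None otherwise.
assert re.fullmatch(r"ab", "ab")          # concatenation: 'a' followed by 'b'
assert re.fullmatch(r"a|b", "b")          # alternative: 'a' or 'b'
assert re.fullmatch(r"a*", "")            # Kleene star: zero or more 'a'
assert re.fullmatch(r"a+", "aaa")         # transitive closure: one or more 'a'
assert not re.fullmatch(r"a+", "")        # ...but at least one 'a' is required
assert re.fullmatch(r"a?", "a")           # optionality: zero or one 'a'
assert re.fullmatch(r"[a-z]", "q")        # character class: one char in range
assert not re.fullmatch(r"[a-z]", "qq")   # only a single character is matched
```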
== Recognizing Regular Expressions ==