Lexical analysis, the first step in the compilation process, splits the input into segments and classifies them. Each segment of the input (a lexeme) is assigned a label (its token).
Here, we use regular expressions to recognize portions of the input text.
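As a rough illustration of how regular expressions can split input into lexemes and label them with tokens, here is a minimal sketch in Python. The token names (`NUMBER`, `IDENT`, `OP`, `SKIP`) and their patterns are assumptions made for this example only; they do not come from any particular language definition.

```python
import re

# Hypothetical token classes: (token name, regular expression).
# Both the names and the patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),  # whitespace is recognized but not reported
]

# Combine all patterns into one expression with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the given input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("x = 42 + y1")` yields the pairs `("IDENT", "x")`, `("OP", "=")`, `("NUMBER", "42")`, `("OP", "+")`, `("IDENT", "y1")`: each lexeme paired with its token.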
Regular expressions are defined considering a finite alphabet Σ = { a, b, ..., c } and the empty string ε:
The languages (sets of strings) for each of these entities are:
The following primitive constructors are defined:
Extensions (derived from the above):
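The idea that extensions are derived from the primitive constructors can be sketched in code. The example below assumes the usual primitive constructors (concatenation, alternative, and Kleene closure) over symbols and ε, and shows two commonly used derived forms, one-or-more and optional, as hypothetical helpers `plus` and `optional`. All class and function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:          # the empty string ε
    pass


@dataclass(frozen=True)
class Sym:          # a single symbol of the alphabet
    ch: str


@dataclass(frozen=True)
class Cat:          # concatenation: left followed by right
    left: object
    right: object


@dataclass(frozen=True)
class Alt:          # alternative: left or right
    left: object
    right: object


@dataclass(frozen=True)
class Star:         # Kleene closure: zero or more repetitions
    inner: object


def ends(r, s, i):
    """Return every position j such that s[i:j] belongs to the language of r."""
    if isinstance(r, Eps):
        return {i}
    if isinstance(r, Sym):
        return {i + 1} if i < len(s) and s[i] == r.ch else set()
    if isinstance(r, Cat):
        return {k for j in ends(r.left, s, i) for k in ends(r.right, s, j)}
    if isinstance(r, Alt):
        return ends(r.left, s, i) | ends(r.right, s, i)
    if isinstance(r, Star):
        reached, frontier = {i}, {i}
        while frontier:  # expand only newly reached positions, so this terminates
            new = {k for j in frontier for k in ends(r.inner, s, j)} - reached
            reached |= new
            frontier = new
        return reached
    raise TypeError(f"not a regular expression: {r!r}")


def matches(r, s):
    """True if the whole string s belongs to the language of r."""
    return len(s) in ends(r, s, 0)


# Derived forms, expressed only with the primitives above.
def plus(r):        # one or more: r followed by zero or more r
    return Cat(r, Star(r))


def optional(r):    # zero or one: r or the empty string
    return Alt(r, Eps())
```

With these definitions, `matches(plus(Sym("a")), "aaa")` is true while `matches(plus(Sym("a")), "")` is false, and `optional(Sym("a"))` accepts both the empty string and `"a"`. Neither helper needs a new case in `ends`, which is the sense in which such forms are derived.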