Lexical analysis, the first step of compilation, splits the input into segments and classifies them: each segment of the input (a lexeme) is assigned a label (its token).
In our case, we use regular expressions to recognize portions of the input text.
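As a minimal sketch of this lexeme/token pairing, the fragment below uses Python's re module with one named group per token class; the token names and patterns (NUMBER, IDENT, OP, SKIP) are illustrative assumptions, not part of the original text.

```python
import re

# Each (token name, regular expression) pair is an illustrative assumption.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),          # one or more digits
    ("IDENT",  r"[A-Za-z]\w*"),  # letter, then letters/digits/underscores
    ("OP",     r"[+\-*/=]"),     # single-character operators
    ("SKIP",   r"\s+"),          # whitespace: recognized but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (token, lexeme) pairs for every match in the input."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = 42 + y")))
# → [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```

A production lexer would also track line/column positions and report unmatched characters as errors; this sketch only shows the segmentation-and-labeling idea.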
Regular expressions are defined over a finite alphabet Σ = { a, b, ..., c } and the empty string ε:
- ε is a regular expression;
- each symbol a ∈ Σ is a regular expression.
The languages (sets of strings) denoted by each of these entities are:
- L(ε) = { ε }
- L(a) = { a }, for each a ∈ Σ
The following primitive constructors are defined (for regular expressions r and s):
- alternation r|s, with L(r|s) = L(r) ∪ L(s)
- concatenation rs, with L(rs) = { xy : x ∈ L(r), y ∈ L(s) }
- Kleene star r*, with L(r*) = the set of concatenations of zero or more strings from L(r)
Extensions (derived from the above):
- r+ = rr* (one or more repetitions)
- r? = r|ε (optional occurrence)
- character classes, e.g. [a-c] = a|b|c
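The derived forms can be checked against their primitive definitions with any regex engine; the sketch below (Python's re module, illustrative single-symbol patterns) confirms that r+ behaves as rr*, r? as r|ε, and a character class as an alternation.

```python
import re

# r+ (one or more) is equivalent to rr*: here r is the single symbol 'a'.
assert bool(re.fullmatch("a+", "aaa")) == bool(re.fullmatch("aa*", "aaa"))
assert bool(re.fullmatch("a+", "")) == bool(re.fullmatch("aa*", ""))

# r? (optional) is equivalent to r|ε: the empty alternative matches ε.
assert bool(re.fullmatch("a?", "")) == bool(re.fullmatch("a|", ""))
assert bool(re.fullmatch("a?", "a")) == bool(re.fullmatch("a|", "a"))

# A character class [a-c] is shorthand for the alternation a|b|c.
for s in ["a", "b", "c", "d"]:
    assert bool(re.fullmatch("[a-c]", s)) == bool(re.fullmatch("a|b|c", s))

print("all equivalences hold")
```

Note that practical regex engines add non-regular features (backreferences, lookaround); the equivalences above concern only the classical operators.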
Regular expressions can be recognized by finite automata; the diagram below shows one such automaton (double circles mark accepting states):
<dot>
rankdir=LR;
node [shape = doublecircle]; LR_0 LR_3 LR_8;
node [shape = circle];
LR_0 -> LR_2 [ label = "SS(B)" ];
LR_0 -> LR_1 [ label = "SS(S)" ];
LR_1 -> LR_3 [ label = "S($end)" ];
LR_2 -> LR_6 [ label = "SS(b)" ];
LR_2 -> LR_5 [ label = "SS(a)" ];
LR_2 -> LR_4 [ label = "S(A)" ];
LR_4 -> LR_8 [ label = "S(D)" ];
LR_5 -> LR_7 [ label = "S(a)" ];
LR_5 -> LR_5 [ label = "S(b)" ];
LR_6 -> LR_6 [ label = "S(b)" ];
LR_6 -> LR_5 [ label = "S(a)" ];
LR_7 -> LR_8 [ label = "S(b)" ];
LR_7 -> LR_5 [ label = "S(a)" ];
LR_8 -> LR_6 [ label = "S(b)" ];
LR_8 -> LR_5 [ label = "S(a)" ];
</dot>