Difference between revisions of "Theoretical Aspects of Lexical Analysis"

Revision as of 21:14, 15 March 2008

Regular expressions are defined considering a finite alphabet Î£ = { a, b, ..., c } and the empty string Îµ:

The languages (sets of strings) for each of these entities are:

The following primitive constructors are defined:

Extensions (derived from the above):

Transitive closure (+) - a+ ("one or more 'a'")
Optionality (?) - a? ("zero or one 'a'")
Character classes - [a-z] ("all chars in the 'a-z' range" - only one character is matched)

@@ Line 25: / Line 25: @@
 == Building the NFA: Thompson's Algorithm ==
-<dot>
-	rankdir=LR;
-	node [shape = doublecircle]; LR_0 LR_3 LR_8;
-	node [shape = circle];
-	LR_0 -> LR_2 [ label = "SS(B)" ];
-	LR_0 -> LR_1 [ label = "SS(S)" ];
-	LR_1 -> LR_3 [ label = "S($end)" ];
-	LR_2 -> LR_6 [ label = "SS(b)" ];
-	LR_2 -> LR_5 [ label = "SS(a)" ];
-	LR_2 -> LR_4 [ label = "S(A)" ];
-        LR_4 -> LR_8 [ label = "S(D)" ];
-	LR_5 -> LR_7 [ label = "S(a)" ];
-	LR_5 -> LR_5 [ label = "S(b)" ];
-	LR_6 -> LR_6 [ label = "S(b)" ];
-	LR_6 -> LR_5 [ label = "S(a)" ];
-	LR_7 -> LR_8 [ label = "S(b)" ];
-	LR_7 -> LR_5 [ label = "S(a)" ];
-	LR_8 -> LR_6 [ label = "S(b)" ];
-	LR_8 -> LR_5 [ label = "S(a)" ];
-</dot>
 == Building DFAs from NFAs ==