Theoretical Aspects of Lexical Analysis/Exercise 16

From Wiki**3

< Theoretical Aspects of Lexical Analysis

Compute the non-deterministic finite automaton (NFA) by using Thompson's algorithm. Compute the minimal deterministic finite automaton (DFA).
The alphabet is Σ = { a, b, c }. Indicate the number of processing steps for the given input string.

  • G = { bc*, a*|c, a|b* }, input string = aaabcacc

NFA

The following is the result of applying Thompson's algorithm. State 5 recognizes the first expression (token T1); state 13 recognizes token T2; and state 21 recognizes token T3.

nfa 0 0 s->0 5 5 13 13 21 21 1 1 0->1 6 6 0->6 14 14 0->14 2 2 1->2 b 2->5 3 3 2->3 4 4 3->4 c 4->5 4->3 7 7 6->7 11 11 6->11 8 8 7->8 10 10 7->10 12 12 11->12 c 9 9 8->9 a 10->13 9->8 9->10 12->13 15 15 14->15 17 17 14->17 16 16 15->16 a 18 18 17->18 20 20 17->20 16->21 19 19 18->19 b 20->21 19->18 19->20

DFA

Determination table for the above NFA:

In α∈Σ move(In, α) ε-closure(move(In, α)) In+1 = ε-closure(move(In, α))
- - 0 0, 1, 6, 7, 8, 10, 11, 13, 14, 15, 17, 18, 20, 21 0 (T2)
0 a 9, 16 8, 9, 10, 13, 16, 21 1 (T2)
0 b 2, 19 2, 3, 5, 18, 19, 20, 21 2 (T1)
0 c 12 12, 13 3 (T2)
1 a 9 8, 9, 10, 13 4 (T2)
1 b - - -
1 c - - -
2 a - - -
2 b 19 18, 19, 20, 21 5 (T3)
2 c 4 3, 4, 5 6 (T1)
3 a - - -
3 b - - -
3 c - - -
4 a 9 8, 9, 10, 13 4 (T2)
4 b - - -
4 c - - -
5 a - - -
5 b 19 18, 19, 20, 21 5 (T3)
5 c - - -
6 a - - -
6 b - - -
6 c 4 3, 4, 5 6 (T1)

Graphically, the DFA is represented as follows:

dfa 0 0 s->0 1 1 0->1 a 2 2 0->2 b 3 3 0->3 c 4 4 1->4 a 5 5 2->5 b 6 6 2->6 c 4->4 a 5->5 b 6->6 c

The minimization tree is as follows. Note that before considering transition behavior, states are split according to the token they recognize.

mintree {0, 1, 2, 3, 4, 5, 6} {0, 1, 2, 3, 4, 5, 6} {}        {}        {0, 1, 2, 3, 4, 5, 6}->{}        NF {0, 1, 2, 3, 4, 5, 6} {0, 1, 2, 3, 4, 5, 6} {0, 1, 2, 3, 4, 5, 6}->{0, 1, 2, 3, 4, 5, 6}  F {2, 6} {2, 6} {0, 1, 2, 3, 4, 5, 6} ->{2, 6}  T1 {0, 1, 3, 4} {0, 1, 3, 4} {0, 1, 2, 3, 4, 5, 6} ->{0, 1, 3, 4}  T2 {5} {5} {0, 1, 2, 3, 4, 5, 6} ->{5}  T3 {2} {2} {2, 6}->{2}  b {6} {6} {2, 6}->{6} {0, 1, 4} {0, 1, 4} {0, 1, 3, 4}->{0, 1, 4}  a {3} {3} {0, 1, 3, 4}->{3} {0} {0} {0, 1, 4}->{0}  b {1, 4} {1, 4} {0, 1, 4}->{1, 4} {1, 4} {1, 4} {1, 4}->{1, 4}  a, b, c

Given the minimization tree, the final minimal DFA is as follows.

dfa 0 0 s->0 14 14 0->14 a 2 2 0->2 b 3 3 0->3 c 14->14 a 5 5 2->5 b 6 6 2->6 c 5->5 b 6->6 c

Input Analysis

In Input In+1 / Token
0 aaabcacc$ 14
14 aabcacc$ 14
14 abcacc$ 14
14 bcacc$ T2 (aaa)
0 bcacc$ 2
2 cacc$ 6
6 acc$ T1 (bc)
0 acc$ 14
14 cc$ T2 (a)
0 cc$ 3
3 c$ T2 (c)
0 c$ 3
3 $ T2 (c)

The input string aaabcacc is, after 13 steps, split into five tokens: T2 (corresponding to lexeme aaa), T1 (bc), T2 (a), T2 (c), T2 (c).