Bottom-Up Parsing

From Wiki**3

Revision as of 14:56, 28 April 2008 by Root (talk | contribs) (New page: Bottom-up parsers start from the input tokens and try to build increasingly complex structures until the initial symbol is reached. The bottom-up parsers covered in this section are the o...)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Bottom-up parsers start from the input tokens and try to build increasingly complex structures until the initial symbol is reached.

The bottom-up parsers covered in this section are the ones belonging to the LR family (left-right rightmost derivation). These parsers are also known as shift/reduce parsers, after the actions they perform (shift = input consumption; reduce = node building). The parsers covered are LR(0) (a simple parser without any kind of lookahead), SLR (a parser that builds only when the input matches the elements in the follow set of a rule being reduced), and LALR(1) (a parser that restricts the follow symbols that can be matched in each state).

The sections below describe, first, the LR(0) and SLR(1) algorithms, and then, the LALR(1) algorithm. All three parsers share the same "shift" and "goto" actions, but differ in the restrictions put on "reduce" actions.

Building SLR Parse Tables

For building SLR tables we will define the notion of LR(0) item.

An LR(0) item is, informally, a production with a dot (•). The dot represents the portion of the rule that was seen by the parser so far. When the dot crosses over a symbol (advancing its position within the rule), the parser has performed a shift action (if the symbol is a terminal) or a reduce (if the symbol is a non-terminal).

For instance:

 E -> • A B C
 E -> A • B C

Internally, the parser may use a pair of integers for representing an item (one for the rule, another for the dot's position).

For each production in a grammar, there is a set of LR(0) items, each corresponding to a possible position of the dot.

The main idea is to build an automaton that recognizes viable prefixes from the input. This automaton may be thought of as an NFA in which every single LR(0) item and corresponding transition is considered. This NFA could then be converted into a DFA and minimized. This process is, however, too long and, since a direct approach exists, unnecessary.

The following steps describe the direct construction of the DFA.