The Flex Lexical Analyzer

From Wiki**3

Revision as of 01:22, 16 March 2012 by Root (Basic Concepts)

Flex is a tool that recognizes language elements specified as regular expressions. These regular expressions follow the corresponding formal definition (alphabet and primitive operators). Flex also supports the derived operators (see theory page), as well as a number of meta- and pseudo-characters, such as end-of-file markers and special or non-printable characters. It can also recognize characters by their properties (such as being a digit). Together, these features make Flex very capable with respect to alphabet specification.
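
As an illustration of these features, the following minimal sketch combines character classes, escaped non-printable characters, and the end-of-file pseudo-pattern. The definition names DIGIT and LETTER and the printed messages are assumptions made for this example only.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
DIGIT   [[:digit:]]
LETTER  [[:alpha:]_]
%%
{DIGIT}+                     { printf("integer: %s\n", yytext); }
{LETTER}({LETTER}|{DIGIT})*  { printf("identifier: %s\n", yytext); }
[ \t\r\n]+                   { /* skip whitespace, including non-printable characters */ }
.                            { printf("other: %s\n", yytext); }
<<EOF>>                      { printf("end of input\n"); return 0; }
%%
int main(void) { return yylex(); }
```

Assuming the specification is saved as spec.l (a hypothetical file name), it can be built with "flex spec.l" followed by compiling the generated lex.yy.c with a C compiler.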

To recognize strings from regular expression specifications, Flex uses a finite state machine (a finite automaton). Although it is not its primary task, Flex can also use a stack in conjunction with its state machine, which makes it theoretically capable of processing context-free languages (only theoretically, since doing so is less convenient than using a dedicated tool such as BYACC or Bison).
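
Flex exposes this stack through start conditions: with %option stack, the generated scanner provides yy_push_state and yy_pop_state. The sketch below uses them to strip nested comments, a construct that a plain finite automaton cannot match. The start condition name COMMENT is an assumption made for this example.

```lex
%option noyywrap stack noyy_top_state
%{
#include <stdio.h>
%}
%x COMMENT
%%
"/*"             { yy_push_state(COMMENT); /* enter the outermost comment */ }
<COMMENT>"/*"    { yy_push_state(COMMENT); /* one nesting level deeper */ }
<COMMENT>"*/"    { yy_pop_state();         /* back to the enclosing level */ }
<COMMENT>.|\n    { /* discard comment text */ }
.|\n             { ECHO; /* copy everything else to the output */ }
%%
int main(void) { return yylex(); }
```

Each opening delimiter pushes the current state and each closing delimiter pops it, so the scanner returns to the initial state only when every nested comment has been closed.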

Basic Concepts

Flex is a code generator that reads a specification file and generates the lexical analyzer as a C or C++ module (depending on the options). The specification is the same in both cases (only a few details in the actions differ). In current versions, the code generated for the C analyzer is compatible with a C++ compiler, so actions can be written in C++ even without using the class-based scanner. This may be a good approach, since C++ data structures are usually more programmer-friendly than their C counterparts, leading to fewer bugs.

A Flex specification file is composed of three parts:

  • Definitions
  • Rules (used to build the automaton) and their actions (for manipulating the strings that match the regular expressions)
  • Code
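
The three parts are separated by lines containing only %%. A minimal sketch of a complete specification, assuming a simple line and word counter as the task (the names WORD, lines, and words are chosen for this example only), could look like this:

```lex
%option noyywrap
%{
/* Definitions: C declarations copied verbatim into the generated scanner. */
#include <stdio.h>
static int lines = 0, words = 0;
%}
WORD    [^ \t\n]+

%%
  /* Rules: each pattern is followed by its action. */
{WORD}      { words++; }
\n          { lines++; }
.           { /* ignore the remaining spaces and tabs */ }

%%
/* Code: copied verbatim to the end of the generated scanner. */
int main(void) {
    yylex();
    printf("%d lines, %d words\n", lines, words);
    return 0;
}
```

Note that the comment inside the rules section is indented, so that Flex does not try to interpret it as a pattern.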

Structure of a Flex Specification

How to Debug a Flex Specification

Flex provides several flags and variables that activate its debugging functionality, so there is no need to insert ad hoc printf calls or similar code.

In the Flex specification file:

%option debug

This flag suffices when developing in C. When using C++ scanners (%option c++), the flag still generates the debug code, but the scanner must also be told to actually output debug information. This is done by calling the set_debug method with a non-zero argument. A convenient place for the call is the start of the rules section of the Flex specification file (note that the action is indented and is not attached to any rule):

 %%
                        { set_debug(1); }
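
For C scanners, the trace produced by %option debug is controlled at run time by the global variable yy_flex_debug, which the generated scanner sets to 1 by default when the option is present. The following sketch, whose rules and command-line convention are assumptions made for this example, enables the trace only when the program receives an argument:

```lex
%option noyywrap debug
%{
#include <stdio.h>
%}
%%
[0-9]+      { printf("number: %s\n", yytext); }
.|\n        { /* ignore everything else */ }
%%
int main(int argc, char *argv[]) {
    (void)argv;                   /* only the argument count is used */
    /* yy_flex_debug is defined by the generated scanner. */
    yy_flex_debug = (argc > 1);   /* trace only when an argument is given */
    return yylex();
}
```

When the trace is enabled, the scanner reports on the standard error stream which rule accepted each piece of input.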

Also, the YYDEBUG environment variable will activate the debug messages both in Flex and YACC, allowing simultaneous debugging of token recognition by the scanner and of the corresponding use by the parser. On the command line (the exact syntax depends on the shell in use):

 export YYDEBUG=1

See Also

Examples

Exercises

  • Exercise 1 - Printing the strings present in a C/C++ program.
  • Exercise 2 - Printing the number of times "." and "->" are used as operators.
  • Exercise 3 - Printing the comments present in a C/C++ program.
  • Exercise 4 - Removing the actions from a YACC specification.
  • Exercise 5 - Removing the actions from a Flex specification.
  • Exercise 6 - Playlist processing.
  • Exercise 7 - Text processing.