Compiler Construction in C++ Using Objects and Patterns

From Wiki**3

OBSOLETE: do not use

This is a manual on object- and pattern-oriented compiler construction in C++.

Introduction

A compiler is a program that takes as input a program written in a source language and translates it into another language, the target language, typically (although not necessarily) one understandable by a machine.

Language translation tools come in various types: translators (between two high-level languages), compilers (from a high-level to a low-level language), decompilers (from a low-level to a high-level language), and rewriters (within the same language).

Modern compilers usually take source code and produce object code in a form usable by other programs, such as a linker or a virtual machine. Examples:

  • C/C++ are typically compiled into object code for later linking (.o files);
  • Java is typically compiled (e.g., by Sun's compiler) into binary class files (.class), which are then used by the virtual machine. Besides class files, the GCC Java compiler can also produce object files (as for C/C++).

Compiler or interpreter: although both kinds of language analysis tools share some properties, they differ in how they handle the analyzed code. A compiler simply produces an equivalent version of the source program in a target language. An interpreter, like a compiler, analyzes and translates its input, but instead of producing a target version it directly executes the actions described in the source program.
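To make the distinction concrete, here is a minimal C++ sketch, assuming a toy language with only integer literals, `+` and `*`. The names `Expr`, `num`, `bin`, `interpret`, `compile`, `run`, and the `PUSH`/`ADD`/`MUL` instructions of the target stack machine are hypothetical, chosen for illustration only. `interpret` performs the actions described by the tree directly, while `compile` only emits an equivalent program, which `run` executes separately.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree for a toy language: an integer literal (op == 0)
// or a binary operation ('+' or '*') with two children.
struct Expr {
  char op;
  int value;  // meaningful only for literals
  std::unique_ptr<Expr> left, right;

  explicit Expr(int v) : op(0), value(v) {}
  Expr(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : op(o), value(0), left(std::move(l)), right(std::move(r)) {}
};

std::unique_ptr<Expr> num(int v) { return std::make_unique<Expr>(v); }

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
  return std::make_unique<Expr>(op, std::move(l), std::move(r));
}

// Interpreter: executes the actions described by the tree and returns the result.
int interpret(const Expr &e) {
  if (e.op == 0) return e.value;
  int l = interpret(*e.left), r = interpret(*e.right);
  return e.op == '+' ? l + r : l * r;
}

// Compiler: emits an equivalent program for a hypothetical stack machine
// (postfix order) without executing anything.
void compile(const Expr &e, std::vector<std::string> &out) {
  if (e.op == 0) {
    out.push_back("PUSH " + std::to_string(e.value));
    return;
  }
  compile(*e.left, out);
  compile(*e.right, out);
  out.push_back(e.op == '+' ? "ADD" : "MUL");
}

// Executes the generated code, e.g. to check that it is equivalent to the source.
int run(const std::vector<std::string> &code) {
  std::vector<int> stack;
  for (const std::string &insn : code) {
    if (insn.compare(0, 5, "PUSH ") == 0) {
      stack.push_back(std::stoi(insn.substr(5)));
    } else {
      assert(stack.size() >= 2);  // binary operations need two operands
      int r = stack.back(); stack.pop_back();
      int l = stack.back(); stack.pop_back();
      stack.push_back(insn == "ADD" ? l + r : l * r);
    }
  }
  assert(stack.size() == 1);
  return stack.back();
}
```

For example, for `(2 + 3) * 4` built with `bin('*', bin('+', num(2), num(3)), num(4))`, `interpret` returns 20 directly, while `compile` emits `PUSH 2`, `PUSH 3`, `ADD`, `PUSH 4`, `MUL`, which `run` evaluates to the same value.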

Who Should Read This Document?

This document is for those who want to use the flex and yacc tools beyond the C programming language and to apply object-oriented (OO) programming techniques to compiler construction. In the following text, the C++ programming language is used, but the rationale is valid for other OO languages as well. Note, however, that C++ works with C tools almost without change, something that may not be true of other languages (although there may exist tools similar to flex and yacc that support them).

The use of C++ is not motivated merely by its being a "better" C, a claim some would dispute. Rather, it is motivated by the advantages that can be gained from bringing OO design principles to bear on compiler construction problems. In this regard, C++ is a more obvious choice than C: even though someone who has mastered OO design can apply it in almost any language, C++ remains a better choice than C simply because it offers direct support for those principles and a stricter type system. At the same time, C++ is not so far removed from C that traditional compiler development techniques and tools have to be abandoned.

Going beyond basic OO principles into the world of design patterns is just a small step, but one that accounts for much of the overall gain of this change: effective use of a few well-chosen design patterns (especially, but not only, the Composite and Visitor patterns) contributes to a much more robust compiler and a much easier development process.
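A minimal sketch of how the two patterns fit together, assuming a toy tree with only integer literals and additions. The class names (`Node`, `IntegerNode`, `AddNode`, `Visitor`, `Evaluator`, `Printer`) are hypothetical and are not meant to reproduce the CDK library's own hierarchy. The Composite pattern gives leaves (`IntegerNode`) and inner nodes (`AddNode`) a common `Node` interface; the Visitor pattern places each operation over the tree (here, evaluation and printing) in its own class, selected through double dispatch in `accept`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class IntegerNode;
class AddNode;

// Visitor: one operation over the tree, with one method per node type.
class Visitor {
public:
  virtual ~Visitor() = default;
  virtual void visitInteger(const IntegerNode &node) = 0;
  virtual void visitAdd(const AddNode &node) = 0;
};

// Composite: the interface shared by leaves and inner nodes.
class Node {
public:
  virtual ~Node() = default;
  virtual void accept(Visitor &v) const = 0;
};

// Leaf node.
class IntegerNode : public Node {
  int _value;
public:
  explicit IntegerNode(int value) : _value(value) {}
  int value() const { return _value; }
  void accept(Visitor &v) const override { v.visitInteger(*this); }
};

// Inner (composite) node: owns its two children.
class AddNode : public Node {
  std::unique_ptr<Node> _left, _right;
public:
  AddNode(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
      : _left(std::move(l)), _right(std::move(r)) {}
  const Node &left() const { return *_left; }
  const Node &right() const { return *_right; }
  void accept(Visitor &v) const override { v.visitAdd(*this); }
};

// One visitor evaluates the tree, using an operand stack...
class Evaluator : public Visitor {
  std::vector<int> _stack;
public:
  void visitInteger(const IntegerNode &node) override { _stack.push_back(node.value()); }
  void visitAdd(const AddNode &node) override {
    node.left().accept(*this);
    node.right().accept(*this);
    int r = _stack.back(); _stack.pop_back();
    int l = _stack.back(); _stack.pop_back();
    _stack.push_back(l + r);
  }
  int result() const { assert(_stack.size() == 1); return _stack.back(); }
};

// ...another produces text, standing in for a code generator.
class Printer : public Visitor {
  std::string _out;
public:
  void visitInteger(const IntegerNode &node) override { _out += std::to_string(node.value()); }
  void visitAdd(const AddNode &node) override {
    _out += "(";
    node.left().accept(*this);
    _out += " + ";
    node.right().accept(*this);
    _out += ")";
  }
  const std::string &output() const { return _out; }
};
```

Applied to the tree for `1 + (2 + 3)`, `Evaluator` yields 6 and `Printer` yields `(1 + (2 + 3))`. Adding a new operation, such as code generation, means writing a new `Visitor` subclass without touching the node classes; the trade-off is that adding a new node type requires a new method in every visitor.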

The document assumes basic knowledge of object-oriented design as well as of abstract data type definition. Knowledge of design patterns is desirable, but not necessary: the patterns used in the text will be briefly presented. Nevertheless, useful insights can be gained from reading a patterns book, such as the "Gang of Four" book (Gamma et al., listed under Further Reading).

Regarding C++

Using C++ is not only a way of getting a "better C", but also a way of applying OO architecture principles in a native environment (the same principles could be applied to C development, at the cost of greater development effort). Thus, we are not interested in simply taking a C++ compiler and our old C code and hoping for the best. Rather, using C++ is intended to affect every step of compiler development, from the organization of the compiler as a whole to the makeup of each component.

Using C++ is not only a decision about which language to write the code in: it is also a matter of who or what writes the compiler code. For a human programmer, using C++ is just a matter of competence; tools that generate part of the compiler's code, however, must be chosen carefully so that the code they generate works as expected. Some of the most common compiler development support tools already support C++ natively, as is the case with the GNU Flex lexical analyser generator and the GNU Bison parser generator. Other tools, such as Berkeley YACC (BYACC), support only C. In the former case, the generated code and the objects it supports need only be integrated into the architecture; in the latter case, further adaptation may be needed, either by the programmer or through specialized wrappers. As will be seen, BYACC-generated parsers in particular, although they are C code, are simple to adapt to C++.

Organization

This text follows both the structure and the development process of a compiler. Thus, the first part deals with lexical analysis or, by another name, with the morphological analysis of the language being recognized. The second part presents syntax analysis in general and LALR(1) parsers in particular. The last part is dedicated to semantic analysis and to the deep structure of a program as represented by a linguistic structure. Semantic processing also covers code generation, translation, and interpretation, as well as other processes built with similar development techniques.

The appendices present the code used throughout the document. In particular, they give detailed descriptions of each class hierarchy. They also present the structure of the final compiler in terms of code: both the code written by the compiler developer and the support code for compiler development and for final program execution.

  • Using C++ and the CDK Library
  • Lexical Analysis
    • Theoretical Aspects of Lexical Analysis
    • The Flex Lexical Analyser
    • Lexical Analysis Case
  • Syntactic Analysis
    • Theoretical Aspects of Syntax
    • Using Berkeley YACC
    • Syntactic Analysis Case
  • Semantic Analysis
    • The Syntax-Semantics Interface
    • Semantic Analysis and Code Generation
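The three phases outlined above can be sketched end to end for a toy expression language (integers, `+`, `*`, and parentheses). This is an illustrative sketch under stated assumptions, not the architecture developed in later chapters: the syntax phase is a hand-written recursive-descent parser standing in for the LALR(1) parser that BYACC would generate, and `scan`, `Token`, `Node`, `Parser`, `generate`, and the `PUSH`/`ADD`/`MUL` instructions are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Phase 1, lexical analysis: characters -> tokens.
// Kinds: 'n' (number), '+', '*', '(', ')', and '$' (end of input).
struct Token {
  char kind;
  int value;  // meaningful only for numbers
};

std::vector<Token> scan(const std::string &src) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {
      ++i;
    } else if (std::isdigit(c)) {
      int v = 0;
      while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
        v = v * 10 + (src[i++] - '0');
      out.push_back({'n', v});
    } else if (c == '+' || c == '*' || c == '(' || c == ')') {
      out.push_back({src[i++], 0});
    } else {
      throw std::runtime_error("lexical error: unexpected character");
    }
  }
  out.push_back({'$', 0});
  return out;
}

// Phase 2, syntax analysis: tokens -> tree, by recursive descent over
//   expr -> term {'+' term}   term -> factor {'*' factor}   factor -> NUM | '(' expr ')'
struct Node {
  char op;    // 0 for a literal, otherwise '+' or '*'
  int value;  // meaningful only for literals
  std::unique_ptr<Node> left, right;
};

class Parser {
  std::vector<Token> _toks;
  std::size_t _pos = 0;

  char peek() const { return _toks[_pos].kind; }
  void expect(char kind) {
    if (peek() != kind) throw std::runtime_error("syntax error");
    ++_pos;
  }
  static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    return std::unique_ptr<Node>(new Node{op, 0, std::move(l), std::move(r)});
  }
  std::unique_ptr<Node> factor() {
    if (peek() == 'n')
      return std::unique_ptr<Node>(new Node{0, _toks[_pos++].value, nullptr, nullptr});
    expect('(');
    std::unique_ptr<Node> inner = expr();
    expect(')');
    return inner;
  }
  std::unique_ptr<Node> term() {
    std::unique_ptr<Node> n = factor();
    while (peek() == '*') {
      ++_pos;
      std::unique_ptr<Node> rhs = factor();
      n = binary('*', std::move(n), std::move(rhs));
    }
    return n;
  }
  std::unique_ptr<Node> expr() {
    std::unique_ptr<Node> n = term();
    while (peek() == '+') {
      ++_pos;
      std::unique_ptr<Node> rhs = term();
      n = binary('+', std::move(n), std::move(rhs));
    }
    return n;
  }

public:
  explicit Parser(std::vector<Token> toks) : _toks(std::move(toks)) {}
  std::unique_ptr<Node> parse() {
    std::unique_ptr<Node> root = expr();
    expect('$');  // the whole input must be consumed
    return root;
  }
};

// Phase 3, semantic processing: tree -> code for a hypothetical stack machine.
void generate(const Node &n, std::string &code) {
  if (n.op == 0) {
    code += "PUSH " + std::to_string(n.value) + "\n";
    return;
  }
  assert(n.left && n.right);
  generate(*n.left, code);
  generate(*n.right, code);
  code += (n.op == '+') ? "ADD\n" : "MUL\n";
}
```

For instance, `scan("2 * (3 + 4)")` yields eight tokens (including the end marker `$`), parsing builds a tree whose root is the `*` node, and `generate` produces `PUSH 2`, `PUSH 3`, `PUSH 4`, `ADD`, `MUL`, one instruction per line. Each phase only consumes the previous phase's output, which is what allows a generated scanner or parser to replace the hand-written one.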

See Also

  • The CDK Library
  • Postfix Code Generator
  • The Runtime Library

Further Reading

  • A. V. Aho, R. Sethi, J. D. Ullman (1986). Compilers: Principles, Techniques, and Tools. Addison-Wesley Publishing Company. ISBN 0-201-10194-7.
  • E. Gamma, R. Helm, R. Johnson, J. Vlissides (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley. ISBN 0-201-63361-2.
  • NASM, The Netwide Assembler. http://freshmeat.net/projects/nasm/
  • P. R. dos Santos (2004). postfix.h
  • W3C (2001). XML - Extensible Markup Language. http://www.w3.org/XML/