Semantic Analysis

Semantic analysis is mostly concerned with types associated with language objects and how these types are used by the language constructs that depend on them, such as functions and arithmetic operators.

Types can be implicitly specified (e.g., in literals) and inferred (e.g., from operations). This is the case in languages such as Python and other scripting languages, which determine types at run time. It is also the case in languages such as C++ (auto) and Java (var), which perform type inference at compile time.
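As a small illustration of compile-time inference, the following C++ sketch lets the compiler deduce each declared type from its initializer. The function name infer_example is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical example function: every type below is inferred by the compiler.
inline double infer_example() {
  auto i = 42;                  // type implied by an int literal
  auto d = 1.5;                 // type implied by a double literal
  auto s = std::string("abc");  // type taken from the initializer expression
  auto sum = i + d;             // int + double: type inferred from the operation

  // The inferred types are known (and checked) at compile time.
  static_assert(std::is_same<decltype(i), int>::value, "i is an int");
  static_assert(std::is_same<decltype(d), double>::value, "d is a double");
  static_assert(std::is_same<decltype(s), std::string>::value, "s is a string");
  static_assert(std::is_same<decltype(sum), double>::value, "sum is a double");

  assert(s.size() == 3);
  return sum;
}
```

None of the checks above cost anything at run time: a mismatch would be reported by the compiler, which is the behaviour that distinguishes this form of inference from the run-time typing of scripting languages.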

On the other hand, typed entities may be explicitly declared. This is how most statically compiled languages work: the program's entities are explicitly typed and types may be verified by the compiler.

This section focuses on type checking based on the nodes of the abstract syntax tree: specifically, those that declare typed entities (such as functions and variables) and those that use those entities (such as function calls and operators). The entities themselves must, of course, record their own types, so that compliance can be verified wherever they are used.

Representing Typed Information in the AST

Type information is present in the AST itself. This information may be directly set by the parser, during syntactic analysis, e.g. in declarations, or it may be set -- the most usual way -- during semantic analysis.

The main nodes involved in representing types are the following:

  • typed_node -- this is the superclass of any node that bears a type. It also provides a convenient interface for checking and managing types.
  • expression_node -- this is a subclass of typed_node that represents program expressions, that is, any value that can be used by a program. Expressions may be primitive, e.g. literals, or composed by other expressions, e.g. operators.
  • lvalue_node -- left-values denote write-compatible memory locations. They are not the usual values denoted by expression nodes, although any left-value can be converted into an expression, either by considering the memory address it represents (a pointer) or the value at that location (rvalue_node). Left-values are usually known as variables: variable_node, in the simplest case; index_node (for instance), in a more elaborate one.
  • Other cases of typed nodes correspond, in certain languages, to function and variable declarations.

Class cdk::typed_node

File typed_node.h
#ifndef __CDK15_AST_TYPEDNODE_NODE_H__
#define __CDK15_AST_TYPEDNODE_NODE_H__

#include <cdk/ast/basic_node.h>
#include <cdk/types/types.h>
#include <memory>

namespace cdk {

  /**
   * Typed nodes store a type description.
   */
  class typed_node: public basic_node {
  protected:
    // This must be a pointer, so that we can anchor a dynamic
    // object and be able to change/delete it afterwards.
    std::shared_ptr<basic_type> _type;

  public:
    /**
     * @param lineno the source code line number corresponding to
     * the node
     */
    typed_node(int lineno) :
        basic_node(lineno), _type(nullptr) {
    }

    std::shared_ptr<basic_type> type() {
      return _type;
    }
    void type(std::shared_ptr<basic_type> type) {
      _type = type;
    }

    bool is_typed(typename_type name) const {
      return _type->name() == name;
    }

  };

} // cdk

#endif
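Note that the constructor in the listing initializes _type to nullptr, so calling is_typed before a type has been assigned would dereference a null pointer. The following is a minimal sketch of a null-safe variant. It does not use the CDK: the types in namespace sketch are simplified, hypothetical stand-ins for the classes shown above.

```cpp
#include <cassert>
#include <memory>

namespace sketch {

  // Simplified stand-in for cdk::typename_type (assumption, not the CDK definition).
  enum typename_type { TYPE_UNSPEC, TYPE_INT, TYPE_DOUBLE };

  // Simplified stand-in for cdk::basic_type: only the type name is kept.
  struct basic_type {
    typename_type _name;
    explicit basic_type(typename_type name) : _name(name) {}
    typename_type name() const { return _name; }
  };

  // Stand-in mirroring the interface of the listing above.
  class typed_node {
    std::shared_ptr<basic_type> _type;  // nullptr until a type is assigned

  public:
    std::shared_ptr<basic_type> type() const { return _type; }
    void type(std::shared_ptr<basic_type> type) { _type = type; }

    // Unlike the listing, this variant answers false for untyped nodes
    // instead of dereferencing a null pointer.
    bool is_typed(typename_type name) const {
      return _type != nullptr && _type->name() == name;
    }
  };

  inline void example() {
    typed_node node;                   // as if just built by the parser
    assert(!node.is_typed(TYPE_INT));  // no type yet, and no crash
    node.type(std::make_shared<basic_type>(TYPE_INT));
    assert(node.is_typed(TYPE_INT));
  }

} // sketch
```

A type checker can also use the null state deliberately, for instance to recognize nodes that have not been visited yet.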

Class cdk::expression_node

File expression_node.h
#ifndef __CDK15_AST_EXPRESSIONNODE_NODE_H__
#define __CDK15_AST_EXPRESSIONNODE_NODE_H__

#include <cdk/ast/typed_node.h>

namespace cdk {

  /**
   * Expressions are typed nodes that have a value.
   */
  class expression_node: public typed_node {

  protected:
    /**
     * @param lineno the source code line corresponding to the node
     */
    expression_node(int lineno) :
        typed_node(lineno) {
    }

  };

} // cdk

#endif

Class cdk::lvalue_node

File lvalue_node.h
#ifndef __CDK15_LVALUE_NODE_H__
#define __CDK15_LVALUE_NODE_H__

#include <cdk/ast/typed_node.h>
#include <string>

namespace cdk {

  /**
   * Class for describing syntactic tree leaves for lvalues.
   */
  class lvalue_node: public typed_node {
  protected:
    lvalue_node(int lineno) :
        typed_node(lineno) {
    }

  };

} // cdk

#endif

Declarations and definitions

Declarations and definitions can be seen as typed nodes since they declare entities that bear types. The precise definition of these nodes depends on the language and, thus, the nodes are not provided by the CDK. In general, though, they all have to be able to store one or more names (the entity or entities) being declared/defined (variables, functions, and so on) and, possibly, other information (e.g., access qualifiers).

Representing and Manipulating Types

Types are used to characterize the memory used by the various language entities (described by one or more AST nodes).

Types should not be confused with AST nodes.

The CDK has five base definitions. They are, in general, sufficient for most languages and are easily extended.

  • basic_type -- this is the abstract superclass. It is used mostly to refer to unknown or general types.
  • primitive_type -- this class is used to represent any "atomic" data type (that is, unstructured or non-reference types).
  • reference_type -- this class is used to describe reference/pointer types.
  • structured_type -- this class allows for the definition of complex (i.e., hierarchical) data types: it is suitable for describing tuples or structures/classes.
  • functional_type -- this class allows for the definition of types for function objects: it is suitable for describing input/output types for functions.

Class cdk::basic_type

In addition to providing a base representation for all types, the header defines two operators (== and !=) for comparing any two types by size and name.

File basic_type.h
#ifndef __CDK17_TYPES_BASIC_TYPE_H__
#define __CDK17_TYPES_BASIC_TYPE_H__

#include <cdk/types/typename_type.h>
#include <cstdlib>
#include <memory>

namespace cdk {

  /**
   * This class represents a general type concept.
   */
  class basic_type {
    size_t _size = 0; // in bytes
    typename_type _name = TYPE_UNSPEC;

  protected:

    struct explicit_call_disabled {};

  protected:

    basic_type() :
        _size(0), _name(TYPE_UNSPEC) {
    }
    basic_type(size_t size, typename_type name) :
        _size(size), _name(name) {
    }

    virtual ~basic_type() noexcept = 0;

  public:

    size_t size() const { return _size; }
    typename_type name() const { return _name; }

  };

  inline bool operator==(const std::shared_ptr<basic_type> t1, const std::shared_ptr<basic_type> t2) {
    return t1->size() == t2->size() && t1->name() == t2->name();
  }
  inline bool operator!=(const std::shared_ptr<basic_type> t1, const std::shared_ptr<basic_type> t2) {
    return !(t1 == t2);
  }

} // cdk

#endif

Class cdk::primitive_type

File primitive_type.h
#ifndef __CDK17_TYPES_PRIMITIVE_TYPE_H__
#define __CDK17_TYPES_PRIMITIVE_TYPE_H__

#include <cdk/types/typename_type.h>
#include <cdk/types/basic_type.h>
#include <cstdlib>

namespace cdk {

  /**
   * Primitive (i.e., non-structured non-indirect) types.
   */
  class primitive_type: public basic_type {
  public:
    //primitive_type() :
    //    basic_type(0, TYPE_UNSPEC) {
    //}
    explicit primitive_type(explicit_call_disabled, size_t size, typename_type name) :
        basic_type(size, name) {
    }

    ~primitive_type() = default;


  public:

    static auto create(size_t size, typename_type name) {
      return std::make_shared<primitive_type>(explicit_call_disabled(), size, name);
    }

    static auto cast(std::shared_ptr<basic_type> type) {
      return std::dynamic_pointer_cast<primitive_type>(type);
    }

  };

} // cdk

#endif

Class cdk::reference_type

File reference_type.h
#ifndef __CDK17_TYPES_REFERENCE_TYPE_H__
#define __CDK17_TYPES_REFERENCE_TYPE_H__

#include <cdk/types/basic_type.h>

namespace cdk {

  /**
   * This class represents a reference type concept (such as a C pointer or a C++ reference).
   */
  struct reference_type: public basic_type {
    std::shared_ptr<basic_type> _referenced = nullptr;

  public:
    explicit reference_type(explicit_call_disabled, size_t size,  std::shared_ptr<basic_type> referenced) :
        basic_type(size, TYPE_POINTER), _referenced(referenced) {
    }

    ~reference_type() = default;

    std::shared_ptr<basic_type> referenced() const {
      return _referenced;
    }

  public:

    static auto create(size_t size, std::shared_ptr<basic_type> referenced) {
      return std::make_shared<reference_type>(explicit_call_disabled(), size, referenced);
    }

    static auto cast(std::shared_ptr<basic_type> type) {
      return std::dynamic_pointer_cast<reference_type>(type);
    }

  };

} // cdk

#endif
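Because the comparison operators in basic_type.h look only at size and name, two pointer types of the same size compare equal even when they refer to different types. A language that needs to tell them apart could compare the referenced types recursively. The sketch below shows one way to do so. It uses simplified, hypothetical stand-ins (namespace sketch) rather than the CDK classes, and the names shallow_equal and deep_equal are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

namespace sketch {

  // Simplified stand-ins for the CDK definitions (assumptions).
  enum typename_type { TYPE_INT, TYPE_DOUBLE, TYPE_POINTER };

  struct basic_type {
    std::size_t size;
    typename_type name;
    basic_type(std::size_t s, typename_type n) : size(s), name(n) {}
    virtual ~basic_type() = default;
  };

  struct reference_type : basic_type {
    std::shared_ptr<basic_type> referenced;
    explicit reference_type(std::shared_ptr<basic_type> r) :
        basic_type(4, TYPE_POINTER), referenced(std::move(r)) {}
  };

  using type_ptr = std::shared_ptr<basic_type>;

  // Same criterion as the operator== shown for basic_type: size and name only.
  inline bool shallow_equal(const type_ptr &a, const type_ptr &b) {
    assert(a && b);
    return a->size == b->size && a->name == b->name;
  }

  // Also follows pointer types down to the types they refer to.
  inline bool deep_equal(const type_ptr &a, const type_ptr &b) {
    if (!a || !b) return !a && !b;  // two missing types are considered equal
    if (!shallow_equal(a, b)) return false;
    auto ra = std::dynamic_pointer_cast<reference_type>(a);
    auto rb = std::dynamic_pointer_cast<reference_type>(b);
    if (!ra || !rb) return !ra && !rb;  // neither is a pointer: already equal
    return deep_equal(ra->referenced, rb->referenced);
  }

} // sketch
```

With these definitions, a pointer to an integer and a pointer to a double are equal under shallow_equal but different under deep_equal.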

Class cdk::structured_type

File structured_type.h
#ifndef __CDK17_TYPES_STRUCTURED_TYPE_H__
#define __CDK17_TYPES_STRUCTURED_TYPE_H__

#include <vector>
#include <numeric>
#include <cdk/types/basic_type.h>

namespace cdk {

  /**
   * This class represents a structured type concept.
   */
  class structured_type: public basic_type {
    std::vector<std::shared_ptr<basic_type>> _components;

  private:
    size_t compute_size(const std::vector<std::shared_ptr<basic_type>> &components) {
      size_t size = 0;
      for (auto component : components)
        size += component->size();
      return size;
    }

  public:

    explicit structured_type(explicit_call_disabled, const std::vector<std::shared_ptr<basic_type>> &components) :
        basic_type(compute_size(components), TYPE_STRUCT), _components(components) {
      // EMPTY
    }

    ~structured_type() = default;

  public:

    std::shared_ptr<basic_type> component(size_t ix) { return _components[ix]; }
    const std::vector<std::shared_ptr<basic_type>>& components() const { return _components; }
    size_t length() const { return _components.size(); }

  public:

    static auto create(const std::vector<std::shared_ptr<basic_type>> &types) {
      return std::make_shared<structured_type>(explicit_call_disabled(), types);
    }

    static auto cast(std::shared_ptr<basic_type> type) {
      return std::dynamic_pointer_cast<structured_type>(type);
    }

  };

} // cdk

#endif

Class cdk::functional_type

File functional_type.h
#ifndef __CDK17_TYPES_FUNCTIONAL_TYPE_H__
#define __CDK17_TYPES_FUNCTIONAL_TYPE_H__

#include <vector>
#include <numeric>
#include <cdk/types/basic_type.h>
#include <cdk/types/structured_type.h>

namespace cdk {

  /**
   * This class represents a functional type concept.
   */
  class functional_type: public basic_type {
    std::shared_ptr<structured_type> _input;
    std::shared_ptr<structured_type> _output;

  public:

    // size 4 is because this is actually just a pointer
    explicit functional_type(explicit_call_disabled, const std::vector<std::shared_ptr<basic_type>> &input, const std::vector<std::shared_ptr<basic_type>> &output) :
        basic_type(4, TYPE_FUNCTIONAL), _input(structured_type::create(input)), _output(structured_type::create(output)) {
      // EMPTY
    }

    ~functional_type() = default;

  public:

    std::shared_ptr<basic_type> input(size_t ix) {
      return _input->component(ix);
    }

    std::shared_ptr<basic_type> output(size_t ix) {
      return _output->component(ix);
    }

    const std::shared_ptr<structured_type> &input() const {
      return _input;
    }
    const std::shared_ptr<structured_type> &output() const {
      return _output;
    }

    size_t input_length() const {
      return _input->length();
    }

    size_t output_length() const {
      return _output->length();
    }

  public:

    static auto create(const std::vector<std::shared_ptr<basic_type>> &input_types, const std::vector<std::shared_ptr<basic_type>> &output_types) {
      return std::make_shared<functional_type>(explicit_call_disabled(), input_types, output_types);
    }

    static auto create(const std::vector<std::shared_ptr<basic_type>> &input_types, const std::shared_ptr<basic_type> &output_type) {
      std::vector<std::shared_ptr<basic_type>> output_types = { output_type };
      return std::make_shared<functional_type>(explicit_call_disabled(), input_types, output_types);
    }

    static auto create(const std::vector<std::shared_ptr<basic_type>> &output_types) {
      std::vector<std::shared_ptr<basic_type>> input_types;
      return std::make_shared<functional_type>(explicit_call_disabled(), input_types, output_types);
    }

    static auto create(const std::shared_ptr<basic_type> &output_type) {
      std::vector<std::shared_ptr<basic_type>> output_types = { output_type };
      return create(output_types);
    }

    static auto cast(std::shared_ptr<basic_type> type) {
      return std::dynamic_pointer_cast<functional_type>(type);
    }

  };

} // cdk

#endif
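A typical use of a functional type is validating a call: the number and types of the arguments are compared with the declared inputs, and the call expression receives the output type. The following sketch illustrates the idea with a simplified, hypothetical stand-in for functional_type (plain type names instead of type objects, and a single output). The rule that an integer is accepted where a double is expected is an assumption about the language being implemented, not something the CDK imposes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

  // Simplified stand-in for cdk::typename_type (assumption).
  enum typename_type { TYPE_INT, TYPE_DOUBLE, TYPE_STRING, TYPE_VOID };

  // Simplified stand-in for a functional type: input types and one output type.
  struct functional_type {
    std::vector<typename_type> input;
    typename_type output;
  };

  // Returns the type of the call expression, or throws on a type error.
  inline typename_type check_call(const functional_type &fn,
                                  const std::vector<typename_type> &args) {
    if (args.size() != fn.input.size())
      throw std::invalid_argument("wrong number of arguments");
    for (std::size_t ix = 0; ix < args.size(); ix++) {
      bool same = args[ix] == fn.input[ix];
      // Assumed language rule: an int may be passed where a double is expected.
      bool widening = args[ix] == TYPE_INT && fn.input[ix] == TYPE_DOUBLE;
      if (!same && !widening)
        throw std::invalid_argument("wrong type for argument " + std::to_string(ix + 1));
    }
    return fn.output;
  }

  inline void example() {
    functional_type f{{TYPE_INT, TYPE_DOUBLE}, TYPE_DOUBLE};  // (int, double) -> double
    typename_type result = check_call(f, {TYPE_INT, TYPE_INT});
    assert(result == TYPE_DOUBLE);  // second argument was widened
  }

} // sketch
```

Reporting the position of the offending argument, as done here, tends to make the resulting compiler diagnostics easier to act on.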

Convenience functions for handling types

The following functions are provided for convenience: they allow for writing clearer code.

Implementations have been omitted for the sake of clarity (they are available in the CDK).

File types.h
#ifndef __CDK17_TYPES_TYPES_H__
#define __CDK17_TYPES_TYPES_H__

#include <cdk/types/basic_type.h>
#include <cdk/types/primitive_type.h>
#include <cdk/types/reference_type.h>
#include <cdk/types/structured_type.h>
#include <cdk/types/functional_type.h>
#include <memory>
#include <string>

namespace cdk {

  inline std::string to_string(std::shared_ptr<basic_type> type) {
    if (type->name() == TYPE_INT) return "integer";
    if (type->name() == TYPE_DOUBLE) return "double";
    if (type->name() == TYPE_STRING) return "string";
    if (type->name() == TYPE_VOID) return "void";
    if (type->name() == TYPE_POINTER) {
      auto r = cdk::reference_type::cast(type)->referenced();
      return "pointer to " + to_string(r);
    } else {
      return "(unknown or unsupported type)";
    }
  }

} // cdk

#endif

The Symbol Table

The public interface of the symbol table is as follows (all non-public parts have been omitted, as have the construction/destruction methods):

  • push - create a new context and make it current.
  • pop - destroy the current context: the previous context becomes the current one. If the first context is reached no operation is performed.
  • insert - define a new identifier in the local (current) context: name is the symbol's name; symbol is the symbol. Returns true if this is a new identifier (may shadow another defined in an upper context). Returns false if the identifier already exists in the current context.
  • replace_local - replace the data corresponding to a symbol in the current context: name is the symbol's name; symbol is the symbol. Returns true if the symbol exists; false if the symbol does not exist in the current context.
  • replace - replace the data corresponding to a symbol (look for the symbol in all available contexts, starting with the innermost one): name is the symbol's name; symbol is the symbol. Returns true if the symbol exists; false if the symbol does not exist in any of the contexts.
  • find_local - search for a symbol in the local (current) context: name is the symbol's name. Returns the symbol if it exists; nullptr if the symbol does not exist in the current context.
  • find - search for a symbol in the available contexts, starting with the current (innermost) one and proceeding until reaching the outermost context: name is the symbol's name; from is the number of contexts to skip, counting up from the current one (default: zero). Returns the symbol and corresponding attributes if it is found; nullptr if the symbol cannot be found in any of the contexts.

File symbol_table.h (interface summary)
namespace cdk {

  template<typename Symbol>
  class symbol_table {
  public:
    void push();

    void pop();

    bool insert(const std::string &name, std::shared_ptr<Symbol> symbol);

    bool replace_local(const std::string &name, std::shared_ptr<Symbol> symbol);

    bool replace(const std::string &name, std::shared_ptr<Symbol> symbol);

    std::shared_ptr<Symbol> find_local(const std::string &name);

    std::shared_ptr<Symbol> find(const std::string &name, size_t from = 0) const;

  };

} // cdk
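To make the behaviour of contexts concrete, the following is a minimal sketch of how part of such an interface could be implemented, as a stack of maps. It is not the CDK implementation: the class lives in a hypothetical namespace sketch, omits replace and replace_local, and its internal representation is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

  template<typename Symbol>
  class symbol_table {
    using context = std::map<std::string, std::shared_ptr<Symbol>>;
    std::vector<context> _contexts;  // back() is the current context

  public:
    symbol_table() : _contexts(1) {}  // the first (global) context always exists

    void push() { _contexts.emplace_back(); }

    // The first context is never destroyed.
    void pop() { if (_contexts.size() > 1) _contexts.pop_back(); }

    // Fails (returns false) only if the name exists in the current context.
    bool insert(const std::string &name, std::shared_ptr<Symbol> symbol) {
      return _contexts.back().emplace(name, std::move(symbol)).second;
    }

    std::shared_ptr<Symbol> find_local(const std::string &name) const {
      auto it = _contexts.back().find(name);
      if (it == _contexts.back().end()) return nullptr;
      return it->second;
    }

    // Searches outwards, after skipping the `from` innermost contexts.
    std::shared_ptr<Symbol> find(const std::string &name, std::size_t from = 0) const {
      if (from >= _contexts.size()) return nullptr;
      for (std::size_t ix = _contexts.size() - from; ix-- > 0; ) {
        auto it = _contexts[ix].find(name);
        if (it != _contexts[ix].end()) return it->second;
      }
      return nullptr;
    }
  };

  inline void example() {
    symbol_table<int> table;  // symbols are plain ints in this example
    table.insert("x", std::make_shared<int>(1));
    table.push();
    table.insert("x", std::make_shared<int>(2));  // shadows the outer x
    assert(*table.find("x") == 2);
    assert(*table.find("x", 1) == 1);  // skip the current context
    table.pop();
    assert(*table.find("x") == 1);
  }

} // sketch
```

The example shows shadowing: after push, a second x can be inserted, and it hides the outer one until the matching pop.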

Symbol representation

Symbols describe named program entities and store their properties. They provide support for the semantic processor: declarations create new symbols; expressions and left-values refer to those symbols.

A simple representation could be as follows. Note that this definition is just an example and contains only minimal information. It should be extended to account for the needs of the language being implemented.

File symbol.h (Tiny language)
#ifndef __TINY_TARGETS_SYMBOL_H__
#define __TINY_TARGETS_SYMBOL_H__

#include <string>
#include <memory>
#include <cdk/types/basic_type.h>

namespace tiny {

  class symbol {
    std::string _name; // identifier
    std::shared_ptr<cdk::basic_type> _type; // type (type id + type size)
  public:
    // constructors, destructor, getters, etc.

  public:
    // critical for type checking (interface similar to that of class cdk::typed_node)
    std::shared_ptr<cdk::basic_type> type() const { return _type; }
    void set_type(std::shared_ptr<cdk::basic_type> t) { _type = t; }
    bool is_typed(cdk::typename_type name) const { return _type->name() == name; }
  };

  // this function simplifies symbol creation in the type_checker visitor (see below)
  inline auto make_symbol(const std::string &name, std::shared_ptr<cdk::basic_type> type, /* rest of ctor args */) {
    return std::make_shared<symbol>(name, type, /* rest of ctor args */);
  }

} // tiny
#endif
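The two roles of symbols (being created by declarations and being referred to by uses) can be sketched as follows. This is a hedged illustration only: the symbol class is a reduced, hypothetical version with an explicit constructor, a single std::map stands in for the symbol table, and the functions declare and type_of are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

namespace sketch {

  // Simplified stand-in for cdk::typename_type (assumption).
  enum typename_type { TYPE_INT, TYPE_DOUBLE };

  // Hypothetical minimal symbol: just a name and a type tag.
  class symbol {
    std::string _name;
    typename_type _type;

  public:
    symbol(const std::string &name, typename_type type) : _name(name), _type(type) {}
    const std::string &name() const { return _name; }
    typename_type type() const { return _type; }
  };

  // A single scope; a real compiler would use a full symbol table.
  using scope = std::map<std::string, std::shared_ptr<symbol>>;

  // Declaration: creates a new symbol; returns false on redeclaration.
  inline bool declare(scope &symbols, const std::string &name, typename_type type) {
    return symbols.emplace(name, std::make_shared<symbol>(name, type)).second;
  }

  // Use: looks the symbol up and returns its type; throws if it is undeclared.
  inline typename_type type_of(const scope &symbols, const std::string &name) {
    auto it = symbols.find(name);
    if (it == symbols.end()) throw std::runtime_error("undeclared variable: " + name);
    return it->second->type();
  }

  inline void example() {
    scope symbols;
    bool first = declare(symbols, "x", TYPE_INT);
    bool again = declare(symbols, "x", TYPE_DOUBLE);  // rejected: already declared
    assert(first && !again);
    assert(type_of(symbols, "x") == TYPE_INT);  // the first declaration stands
  }

} // sketch
```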

Type Checking: Using Visitors

Type checking is the process of verifying whether the types used in the various language constructs are appropriate. It can be performed at compile time (static type checking) or at run time.

The type checking discussed here is the static approach, i.e., checking at compile time whether the types of objects are consistent with the operations that manipulate them.

In the approach followed by CDK-based compilers, code generation is carried out by visitors that traverse the abstract syntax tree, evaluating each node and generating the corresponding code. Node evaluation may depend on the specificities of the data types being manipulated, the simplest of which is the data type's size, important in all memory-related operations.
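Type checking can be organized in the same way, as a visitor that annotates each node with its type. The following self-contained sketch shows the idea for literals and a binary addition. All class names are hypothetical stand-ins in namespace sketch, not the CDK's own node or visitor classes, and the typing rule (numeric operands only; the result is double if either operand is double) is an assumption about the language.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

namespace sketch {

  enum typename_type { TYPE_UNSPEC, TYPE_INT, TYPE_DOUBLE, TYPE_STRING };

  struct literal_node;
  struct add_node;

  // One method per node class, as in the visitor pattern.
  struct visitor {
    virtual ~visitor() = default;
    virtual void do_literal_node(literal_node &node) = 0;
    virtual void do_add_node(add_node &node) = 0;
  };

  struct expression_node {
    typename_type type = TYPE_UNSPEC;  // filled in during semantic analysis
    virtual ~expression_node() = default;
    virtual void accept(visitor &v) = 0;
  };

  struct literal_node : expression_node {
    typename_type literal_type;  // known from the literal itself
    explicit literal_node(typename_type t) : literal_type(t) {}
    void accept(visitor &v) override { v.do_literal_node(*this); }
  };

  struct add_node : expression_node {
    std::shared_ptr<expression_node> left, right;
    add_node(std::shared_ptr<expression_node> l, std::shared_ptr<expression_node> r) :
        left(std::move(l)), right(std::move(r)) {}
    void accept(visitor &v) override { v.do_add_node(*this); }
  };

  struct type_checker : visitor {
    void do_literal_node(literal_node &node) override {
      node.type = node.literal_type;
    }

    void do_add_node(add_node &node) override {
      assert(node.left && node.right);
      node.left->accept(*this);   // type the operands first
      node.right->accept(*this);
      auto numeric = [](typename_type t) { return t == TYPE_INT || t == TYPE_DOUBLE; };
      if (!numeric(node.left->type) || !numeric(node.right->type))
        throw std::runtime_error("wrong type in argument of add expression");
      bool is_double = node.left->type == TYPE_DOUBLE || node.right->type == TYPE_DOUBLE;
      node.type = is_double ? TYPE_DOUBLE : TYPE_INT;
    }
  };

} // sketch
```

Once the tree has been annotated this way, a code generation visitor can read the type stored in each node (for example, to select integer or floating-point instructions) without repeating the analysis.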

Examples

Type checking example: the Tiny language

The following example considers a simple grammar and performs the whole of the semantic analysis process and, finally, generates the corresponding C code. The semantic analysis process must account for variables (they must be declared before they can be used) and for their types (all types must be used correctly).

Type checking example: the Simple language

The following example considers an evolution of Compact, called Simple. Where Compact forces some verification via syntactic analysis (and thus offers little flexibility), Simple has a richer grammar and, consequently, admits constructions that may be incorrect regarding the types of operators, functions, and their arguments. Type checking in this case is built-in, since, without it, it would be impossible to guarantee the correctness of any expression.

Exercises