The YACC Parser Generator/Example: Calculator with Variables

From Wiki**3

< The YACC Parser Generator

This example implements a simple calculator. The calculator has an unspecified number of integer variables and the common binary integer operators (namely, addition, subtraction, multiplication, division, and modulus), and unary integer operators (+ and -).

The language will contain the following concepts (tokens): VAR (a variable: the corresponding s attribute will contain its name); INT (an integer: the corresponding i attribute contains its value); and the operators (see below).

The Lexical Analyzer (Flex) Specification

The lexical analyzer (vars.l) is very simple and limited to recognizing variable names (token VAR), integers numbers (token INT), and the operators themselves.

%option noyywrap
%{
#include <cstdlib>
#include <string>
#include "y.tab.h"
%}
%%
[_[:alpha:]][_[:alnum:]]*     yylval.s = new std::string(yytext);  return VAR;
[[:digit:]]+                  yylval.i = strtol(yytext, NULL, 10); return INT;
[-+*/%=^:,]                   return *yytext;
.|\n                          ; /* ignore all the rest */
%%

The Syntactic Analyzer (YACC) Specification

The syntactic analyzer will be built to immediately compute the expressions in a syntax-directed fashion as they occur. It is, thus, important to use trees that build nodes as the expressions occur (left-recursive grammars). If the grammar were right-recursive, the last node would be the first to be built and the (syntax-directed) evaluation would be from the last expression to the first.

%{
#include <iostream>
#include <string>
#include <map>
static std::map<std::string, int> vars;
inline void yyerror(const char *s) { std::cout << s << std::endl; }
%}

%union { int i; std::string *s; }

%token tBATATA
%token<i> tINT
%token<s> tVAR
%type<i> expr

%right '='
%left '+' '-'
%left '*' '/' '%'
%right tBATATA

%%

list: stmt
    | list stmt
    ;

stmt: expr ','
    | expr ':'          { std::cout << $1 << std::endl; }
    ;

expr: tINT               { $$ = $1; }
    | tVAR               { $$ = vars[*$1];      delete $1; }
    | tVAR '=' expr      { $$ = vars[*$1] = $3; delete $1; }
    | expr '+' expr     { $$ = $1 + $3; }
    | expr '-' expr     { $$ = $1 - $3; }
    | expr '*' expr     { $$ = $1 * $3; }
    | expr '/' expr     { $$ = $1 / $3; }
    | expr '%' expr     { $$ = $1 % $3; }
    | '+' expr  %prec tBATATA    { $$ =  $2; }
    | '-' expr  %prec tBATATA    { $$ = -$2; }
    | '(' expr ')'              { $$ =  $2; }
    ;

%%
extern int yylex();
extern int yyparse();
int main() { yyparse(); }

Token BATATA is not recognized by the lexical analyzer and is only used to specify the precedence of the unary operators (+ and -) by specifying that the corresponding rules should be reduced with the precedence associated with the token BATATA (the highest in the grammar).

How to Compile?

The Flex specification is processed as follows (the file lex.yy.c is produced):

 flex vars.l

The YACC specification is processed as follows (files y.tab.h, needed by the Flex-generated code, and y.tab.c):

 yacc -dtv vars.y

Compiling the C/C++ code (it is C++ simply because we programmed the extra code in that language):

 g++ -c lex.yy.c
 g++ -c y.tab.c
 g++ -o vars y.tab.o lex.yy.o