Difference between revisions of "The YACC Parser Generator/Example: Calculator with Variables"

From Wiki**3

< The YACC Parser Generator
(The Syntactic Analyzer (YACC) Specification)
 
(13 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
This example implements a simple calculator. The calculator has an unspecified number of integer variables and the common binary integer operators (namely, addition, subtraction, multiplication, division, and modulus), and unary integer operators (+ and -).
 
This example implements a simple calculator. The calculator has an unspecified number of integer variables and the common binary integer operators (namely, addition, subtraction, multiplication, division, and modulus), and unary integer operators (+ and -).
 +
 +
The language will contain the following concepts (tokens): '''VAR''' (a variable: the corresponding '''s''' attribute will contain its name); '''INT''' (an integer: the corresponding '''i''' attribute contains its value); and the operators (see below).
  
 
== The Lexical Analyzer (Flex) Specification ==
 
== The Lexical Analyzer (Flex) Specification ==
  
The lexical analyzer is very simple and limited to recognizing variable names, integers numbers, and the operators themselves.
+
The lexical analyzer (<tt>vars.l</tt>) is very simple and limited to recognizing variable names (token '''VAR'''), integers numbers (token '''INT'''), and the operators themselves.
<text>
+
<source lang="text">
</text>
+
%option noyywrap
 +
%{
 +
#include <cstdlib>
 +
#include <string>
 +
#include "y.tab.h"
 +
%}
 +
%%
 +
[_[:alpha:]][_[:alnum:]]*    yylval.s = new std::string(yytext);  return VAR;
 +
[[:digit:]]+                  yylval.i = strtol(yytext, NULL, 10); return INT;
 +
[-+*/%=^:,]                  return *yytext;
 +
.|\n                          ; /* ignore all the rest */
 +
%%
 +
</source>
  
 
== The Syntactic Analyzer (YACC) Specification ==
 
== The Syntactic Analyzer (YACC) Specification ==
  
 +
The syntactic analyzer will be built to immediately compute the expressions in a syntax-directed fashion as they occur. It is, thus, important to use trees that build nodes as the expressions occur (left-recursive grammars). If the grammar were right-recursive, the last node would be the first to be built and the (syntax-directed) evaluation would be from the last expression to the first.
 +
<source lang="text">
 +
%{
 +
#include <iostream>
 +
#include <string>
 +
#include <map>
 +
static std::map<std::string, int> vars;
 +
inline void yyerror(const char *s) { std::cout << s << std::endl; }
 +
%}
 +
 +
%union { int i; std::string *s; }
 +
 +
%token tBATATA
 +
%token<i> tINT
 +
%token<s> tVAR
 +
%type<i> expr
 +
 +
%right '='
 +
%left '+' '-'
 +
%left '*' '/' '%'
 +
%right tBATATA
 +
 +
%%
 +
 +
list: stmt
 +
    | list stmt
 +
    ;
 +
 +
stmt: expr ','
 +
    | expr ':'          { std::cout << $1 << std::endl; }
 +
    ;
 +
 +
expr: tINT              { $$ = $1; }
 +
    | tVAR              { $$ = vars[*$1];      delete $1; }
 +
    | tVAR '=' expr      { $$ = vars[*$1] = $3; delete $1; }
 +
    | expr '+' expr    { $$ = $1 + $3; }
 +
    | expr '-' expr    { $$ = $1 - $3; }
 +
    | expr '*' expr    { $$ = $1 * $3; }
 +
    | expr '/' expr    { $$ = $1 / $3; }
 +
    | expr '%' expr    { $$ = $1 % $3; }
 +
    | '+' expr  %prec tBATATA    { $$ =  $2; }
 +
    | '-' expr  %prec tBATATA    { $$ = -$2; }
 +
    | '(' expr ')'              { $$ =  $2; }
 +
    ;
 +
 +
%%
 +
extern int yylex();
 +
extern int yyparse();
 +
int main() { yyparse(); }
 +
</source>
 +
 +
Token '''BATATA''' is not recognized by the lexical analyzer and is only used to specify the precedence of the unary operators (+ and -) by specifying that the corresponding rules should be reduced with the precedence associated with the token '''BATATA''' (the highest in the grammar).
 +
 +
== How to Compile? ==
 +
 +
The Flex specification is processed as follows (the file <tt>lex.yy.c</tt> is produced):
 +
 +
  flex vars.l
 +
 +
The YACC specification is processed as follows (files <tt>y.tab.h</tt>, needed by the Flex-generated code, and <tt>y.tab.c</tt>):
 +
 +
  yacc -dtv vars.y
 +
 +
Compiling the C/C++ code (it is C++ simply because we programmed the extra code in that language):
 +
 +
  g++ -c lex.yy.c
 +
  g++ -c y.tab.c
 +
  g++ -o vars y.tab.o lex.yy.o
  
[[category:Compilers]]
+
[[category:Compiladores]]
[[category:Teaching]]
+
[[category:Ensino]]

Latest revision as of 10:34, 16 April 2021

This example implements a simple calculator. The calculator has an unspecified number of integer variables and the common binary integer operators (namely, addition, subtraction, multiplication, division, and modulus), and unary integer operators (+ and -).

The language will contain the following concepts (tokens): VAR (a variable: the corresponding s attribute will contain its name); INT (an integer: the corresponding i attribute contains its value); and the operators (see below).

The Lexical Analyzer (Flex) Specification

The lexical analyzer (vars.l) is very simple and limited to recognizing variable names (token VAR), integers numbers (token INT), and the operators themselves.

%option noyywrap
%{
#include <cstdlib>
#include <string>
#include "y.tab.h"
%}
%%
[_[:alpha:]][_[:alnum:]]*     yylval.s = new std::string(yytext);  return VAR;
[[:digit:]]+                  yylval.i = strtol(yytext, NULL, 10); return INT;
[-+*/%=^:,]                   return *yytext;
.|\n                          ; /* ignore all the rest */
%%

The Syntactic Analyzer (YACC) Specification

The syntactic analyzer will be built to immediately compute the expressions in a syntax-directed fashion as they occur. It is, thus, important to use trees that build nodes as the expressions occur (left-recursive grammars). If the grammar were right-recursive, the last node would be the first to be built and the (syntax-directed) evaluation would be from the last expression to the first.

%{
#include <iostream>
#include <string>
#include <map>
static std::map<std::string, int> vars;
inline void yyerror(const char *s) { std::cout << s << std::endl; }
%}

%union { int i; std::string *s; }

%token tBATATA
%token<i> tINT
%token<s> tVAR
%type<i> expr

%right '='
%left '+' '-'
%left '*' '/' '%'
%right tBATATA

%%

list: stmt
    | list stmt
    ;

stmt: expr ','
    | expr ':'          { std::cout << $1 << std::endl; }
    ;

expr: tINT               { $$ = $1; }
    | tVAR               { $$ = vars[*$1];      delete $1; }
    | tVAR '=' expr      { $$ = vars[*$1] = $3; delete $1; }
    | expr '+' expr     { $$ = $1 + $3; }
    | expr '-' expr     { $$ = $1 - $3; }
    | expr '*' expr     { $$ = $1 * $3; }
    | expr '/' expr     { $$ = $1 / $3; }
    | expr '%' expr     { $$ = $1 % $3; }
    | '+' expr  %prec tBATATA    { $$ =  $2; }
    | '-' expr  %prec tBATATA    { $$ = -$2; }
    | '(' expr ')'              { $$ =  $2; }
    ;

%%
extern int yylex();
extern int yyparse();
int main() { yyparse(); }

Token BATATA is not recognized by the lexical analyzer and is only used to specify the precedence of the unary operators (+ and -) by specifying that the corresponding rules should be reduced with the precedence associated with the token BATATA (the highest in the grammar).

How to Compile?

The Flex specification is processed as follows (the file lex.yy.c is produced):

 flex vars.l

The YACC specification is processed as follows (files y.tab.h, needed by the Flex-generated code, and y.tab.c):

 yacc -dtv vars.y

Compiling the C/C++ code (it is C++ simply because we programmed the extra code in that language):

 g++ -c lex.yy.c
 g++ -c y.tab.c
 g++ -o vars y.tab.o lex.yy.o