Difference between revisions of "The YACC Parser Generator/Example: Calculator with Variables"

Revision as of 19:49, 4 May 2008

This example implements a simple calculator. The calculator has an unspecified number of integer variables and the common binary integer operators (namely, addition, subtraction, multiplication, division, and modulus), and unary integer operators (+ and -).

The language will contain the following concepts (tokens): VAR (a variable: the corresponding s attribute will contain its name); INT (an integer: the corresponding i attribute contains its value); and the operators (see below).

The Lexical Analyzer (Flex) Specification

The lexical analyzer (vars.l) is very simple and limited to recognizing variable names (token VAR), integers numbers (token INT), and the operators themselves. <text> %option noyywrap %{

include <cstdlib>
include <string>
include "y.tab.h"

%} %% [_[:alpha:]][_[:alnum:]]* yylval.s = new std::string(yytext); return VAR; digit:+ yylval.i = strtol(yytext, NULL, 10); return INT; [-+*/%=^:,] return *yytext; .|\n ; /* ignore all the rest */ %% </text>

The Syntactic Analyzer (YACC) Specification

The syntactic analyzer will be built to immediately compute the expressions in a syntax-directed fashion as they occur. It is, thus, important to use trees that build nodes as the expressions occur (left-recursive grammars). If the grammar were right-recursive, the last node would be the first to be built and the (syntax-directed) evaluation would be from the last expression to the first. <text> %{

include <iostream>
include <string>
include <map>

static std::map<std::string, int> vars; %}

%union { int i; std::string *s; }

%token BATATA %token INT %token ~~VAR %type<i> expr~~

~~%right '=' %left '+' '-' %left '*' '/' '%' %right BATATA~~

%%

~~list: stmt~~

| list stmt ;

stmt: expr ','

| expr ':' { std::cout << $1 << std::endl; } ;

expr: INT { $$ = $1; }

| VAR { $$ = vars[*$1]; delete $1; } | VAR '=' expr { $$ = vars[*$1] = $3; delete $1; } | expr '+' expr { $$ = $1 + $3; } | expr '-' expr { $$ = $1 - $3; } | expr '*' expr { $$ = $1 * $3; } | expr '/' expr { $$ = $1 / $3; } | expr '%' expr { $$ = $1 % $3; } | '+' expr %prec BATATA { $$ = $2; } | '-' expr %prec BATATA { $$ = -$2; } | '(' expr ')' { $$ = $2; } ;

%% extern int yylex(); extern int yyparse(); void yyerror(char *s) { std::cout << s << std::endl; } int main() { yyparse(); } </text>
Token BATATA is not recognized by the lexical analyzer and is only used to specify the precedence of the unary operators (+ and -) by specifying that the corresponding rules should be reduced with the precedence associated with the token BATATA (the highest in the grammar).

How to Compile?

The Flex specification is processed as follows (the file lex.yy.c is produced):

flex vars.l

The YACC specification is processed as follows (files y.tab.h, needed by the Flex-generated code, and y.tab.c):

yacc -dtv vars.y

Compiling the C/C++ code (it is C++ simply because we programmed the extra code in that language):

~~g++ -c lex.yy.c g++ -c y.tab.c g++ -o vars y.tab.o lex.yy.o~~

@@ Line 74: / Line 74: @@
 </text>
-Token '''BATATA''' is not recognized by the lexical analyzer and is used only to specify the precedence of the unary operators (+ and -) by specifying that the corresponding rules should be reduced with the precedence associated with the token '''BATATA''' (the highest in the grammar).
+Token '''BATATA''' is not recognized by the lexical analyzer and is only used to specify the precedence of the unary operators (+ and -) by specifying that the corresponding rules should be reduced with the precedence associated with the token '''BATATA''' (the highest in the grammar).
 == How to Compile? ==