Code Generation

Compiladores
Introdução ao Desenvolvimento de Compiladores
Aspectos Teóricos de Análise Lexical
A Ferramenta Flex
Introdução à Sintaxe
Análise Sintáctica Descendente
Gramáticas Atributivas
A Ferramenta YACC
Análise Sintáctica Ascendente
Análise Semântica
Geração de Código
Tópicos de Optimização

Topics

Interpretation: syntax-directed and based on abstract syntax tree.
Code generation: syntax-directed and based on abstract syntax tree.

The approach described here is based on generation from the intermediate representation in the form of a an abstract syntax tree formed by the nodes created during the parsing process.

The target machine is the Postfix engine described below.

The Postfix Machine

SL0 - single stack, large stack, 0 operand machine.

Three special purpose registers: IP (instruction pointer), for jumps and function calls; FP (frame pointer), the basis of the frame pointer for each function; SP (stack pointer), keeps track of the top of the stack.

The stack grows in the direction of lower addresses.

Almost all operations use the stack as input and output (exceptions are PUSH and POP, for returning values from functions in accordance with CDecl), and instructions that use either immediate arguments or arguments in global memory (i.e., with explicit named addressing).

The Postfix reference guide (a.k.a. Appendix B) contains further details about the Postfix machine and its operations.

Basic Structures (code)

The mnemonics are those of the Postfix language. Comments indicate missing parts.

Cycles

Basic structure of a "while" cycle:

<asm> LABEL condition

evaluate condition expression

JZ endwhile

evaluate block

JMP condition LABEL endwhile </asm>

Basic structure of a C-style "for" cycle:

<asm>

evaluate initializer

LABEL condition

evaluate condition expression

JZ endfor

evaluate block

LABEL increment

evaluate increment expression

JMP condition LABEL endfor </asm>

"Break" and "Continue" Instructions

The C-style "break" and "continue" instructions must jump, respectively, to the end of the cycle or to its condition (for deciding on whether to execute the next iteration). In the case of "for" cycles, the jump to the condition is not direct, but rather via a jump to the increment label.

If there are multiple nested cycles, the code generator must keep track of the innermost one, so that it can generate the appropriate jumps. This can be achieved through the use of address stacks (one for the condition labels and another for increment labels).

If-Then and If-Then-Else Instructions

Basic structure of an "if-then" instruction:

<asm>

evaluate condition expression

JZ endif

evaluate then block

LABEL endif </asm>

Basic structure of an "if-then-else" instruction:

<asm>

evaluate condition expression

JZ else

evaluate then block

JMP endif LABEL else

evaluate else block

LABEL endif </asm>

Functions

The function structure presented here assumes a "void" return. If a return were desired, the use of POP or DPOP would be required before the LEAVE instruction.

"sizeoflocalvars" is the number of bytes needed to store all of the function's local variables. If this value is 0 (zero), START may be used instead of ENTER.

If the function is not global (e.g. C-style "static" function), then GLOBAL should not be used. <asm> GLOBAL functionname,FUNC LABEL functionname ENTER sizeoflocalvars

evaluate function body

LEAVE RET </asm>

Basic Structures (data)

Global (permanent) variables

We include in this category C-like "static" function variables (the only difference is that they are not accessed with their explicit names, but rather with a automatic symbol table-based name).

Addressing of these variables is by name, as shown in the following examples.

<asm>

a = 1

INT 1 ; put 1 on the stack DUP ; duplicate value ADDR a ; put address of "a" on the stack STORE ; write the value on the given address </asm>

If the resulting value is not needed, throw it away:

<asm>

a = 1; // expression as instruction

INT 1 ; put 1 on the stack DUP ; duplicate value ADDR a ; put address of "a" on the stack STORE ; write the value on the given address TRASH 4 </asm>

<asm>

a = b (both global)

ADDR b ; put address of "b" on the stack LOAD ; read the value at the given address DUP ; duplicate value (= is an expression) ADDR a ; put address of "a" on the stack STORE ; write the value on the given address </asm>

If ADDR is immediately followed by LOAD, then the two instructions may be replaced by a single one with the same effect: ADDRV. Similarly, ADDR+STORE = ADDRA.

<asm>

a = 1

INT 1 ; put 1 on the stack DUP ; duplicate value (= is an expression) ADDRA a ; write the value in the address corresponding to "a" </asm>

<asm>

a = b (both global)

ADDRV b ; read value in the address of "b" and put it on the stack DUP ; duplicate value (= is an expression) ADDRA a ; write the value in the address corresponding to "a" </asm>

Local variables

In the following examples, we assume that both a and b are local variables (thus having negative offsets relative to the framepointer). If they were function arguments, the offsets would be positive.

<asm>

a = 1

INT 1 ; put 1 on the stack DUP ; duplicate value (= is an expression) LOCAL -4 ; put address of "a" (assuming offset -4) on the stack STORE ; write the value on the given address </asm>

<asm>

a = b (both local)

LOCAL -8 ; put address of "b" (assuming offset -8) on the stack LOAD ; read the value at the given address DUP ; duplicate value (= is an expression) LOCAL -4 ; put address of "a" (assuming offset -4) on the stack STORE ; write the value on the given address </asm>

If LOCAL is immediately followed by LOAD, then the two instructions may be replaced by a single one with the same effect: LOCV. Similarly, LOCAL+STORE = LOCA.

<asm>

a = 1

INT 1 ; put 1 on the stack DUP ; duplicate value (= is an expression) LOCA -4 ; write the value on the address of "a" (assuming offset -4) </asm>

<asm>

a = b (both local)

LOCV -8 ; read value in the address of "b" (assuming offset -8) and put it on the stack DUP ; duplicate value (= is an expression) LOCA -4 ; write the value on the address of "a" (assuming offset -4) </asm>

Examples

Example 1 - translating a C function to Postfix
Example 2 - simple program printing a global string
Example 3 - C function with pointer arithmetic to Postfix
Example 4 - C while cycle

A simple pure-Postfix example to illustrate duplication of stack values for double-precision floating point numbers.

Duplication of Floating Point Numbers on the Stack

Exercises

Exercise 5 - "gcd"
Exercise 6 - "powmod"
Exercise 7 - "traverse"
Exercise 8 - set of small exercises and questions
Exercise 9 - "mkvec" and "compute", Test 2, 2012/06/06

Code Generation

From Wiki**3

Revision as of 16:29, 6 April 2015 by Root (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Topics

The Postfix Machine

Basic Structures (code)

Cycles

"Break" and "Continue" Instructions

If-Then and If-Then-Else Instructions

Functions

Basic Structures (data)

Global (permanent) variables

Local variables

Examples

Exercises

Code Generation

From Wiki**3

Revision as of 16:29, 6 April 2015 by Root (talk | contribs) (diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Contents

Topics

The Postfix Machine

Basic Structures (code)

Cycles

"Break" and "Continue" Instructions

If-Then and If-Then-Else Instructions

Functions

Basic Structures (data)

Global (permanent) variables

Local variables

Examples

Exercises

Revision as of 16:29, 6 April 2015 by Root (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)