Code Generation

From Wiki**3


Revision as of 20:24, 6 December 2018


Topics

  • Interpretation: syntax-directed and based on abstract syntax tree.
  • Code generation: syntax-directed and based on abstract syntax tree.

The approach described here generates code from an intermediate representation: the abstract syntax tree formed by the nodes created during parsing.

The target machine is the Postfix engine described below.

The Postfix Machine

SL0 - single stack, large stack, 0 operand machine.

Three special-purpose registers: IP (instruction pointer), used for jumps and function calls; FP (frame pointer), the base of the stack frame of each function; SP (stack pointer), which keeps track of the top of the stack.

The stack grows in the direction of lower addresses.

Almost all operations use the stack for input and output. The exceptions are the instructions for returning values from functions (in accordance with CDecl) and the instructions that use either immediate arguments or arguments in global memory (i.e., with explicit named addressing).

The Postfix reference guide (a.k.a. Appendix B) contains further details about the Postfix machine and its operations.

Basic Structures (code)

The mnemonics are those of the Postfix language. Comments indicate missing parts.

Cycles

Basic structure of a "while" cycle:

LABEL condition
;; evaluate condition expression
JZ endwhile
;; evaluate block
JMP condition
LABEL endwhile
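
As a rough sketch of how a code generator might emit this structure, the Python function below builds the instruction list for a "while" cycle. The names `emit_while` and `new_label`, and the `_L<n>` label scheme, are assumptions made for illustration (the page only fixes the order of the instructions). The condition and the block are passed in as already-generated instruction lists.

```python
import itertools

# Hypothetical label counter: each call to new_label() yields a fresh name.
_label_ids = itertools.count(1)

def new_label():
    """Return a fresh label name such as '_L1' (naming scheme is an assumption)."""
    return f"_L{next(_label_ids)}"

def emit_while(condition_code, block_code):
    """Return Postfix instructions for: while (condition) block."""
    cond, end = new_label(), new_label()
    return (
        [f"LABEL {cond}"]
        + condition_code          # leaves the condition value on the stack
        + [f"JZ {end}"]           # leave the cycle when the condition is zero
        + block_code
        + [f"JMP {cond}", f"LABEL {end}"]
    )

if __name__ == "__main__":
    # while (a) a = 0;  with "a" global
    print("\n".join(emit_while(["ADDRV a"], ["INT 0", "ADDRA a"])))
```

Generating fresh labels on every call keeps nested or consecutive cycles from clashing, which fixed names such as "condition" and "endwhile" would do.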

Basic structure of a C-style "for" cycle:

;; evaluate initializer
LABEL condition
;; evaluate condition expression
JZ endfor
;; evaluate block
LABEL increment
;; evaluate increment expression
JMP condition
LABEL endfor

"Break" and "Continue" Instructions

The C-style "break" and "continue" instructions must jump, respectively, to the end of the cycle or to its condition (for deciding on whether to execute the next iteration). In the case of "for" cycles, the jump to the condition is not direct, but rather via a jump to the increment label.

If there are multiple nested cycles, the code generator must keep track of the innermost one, so that it can generate the appropriate jumps. This can be achieved through the use of address stacks (one for the condition labels and another for increment labels).
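
A minimal sketch of such label stacks in Python follows. The class name `CycleLabels` and its methods are hypothetical. The sketch keeps one stack for the labels that "continue" jumps to (the condition label of a "while", the increment label of a "for") and, as an assumption beyond what the text states, a second stack for the end-of-cycle labels that "break" jumps to.

```python
class CycleLabels:
    """Tracks the jump targets of the enclosing cycles (innermost last)."""

    def __init__(self):
        self._continue_targets = []  # condition label (while) or increment label (for)
        self._break_targets = []     # end-of-cycle labels

    def enter(self, continue_label, break_label):
        """Call when the generator starts visiting a cycle."""
        self._continue_targets.append(continue_label)
        self._break_targets.append(break_label)

    def leave(self):
        """Call when the generator finishes visiting a cycle."""
        self._continue_targets.pop()
        self._break_targets.pop()

    def emit_break(self):
        if not self._break_targets:
            raise SyntaxError("'break' outside a cycle")
        return [f"JMP {self._break_targets[-1]}"]

    def emit_continue(self):
        if not self._continue_targets:
            raise SyntaxError("'continue' outside a cycle")
        return [f"JMP {self._continue_targets[-1]}"]


if __name__ == "__main__":
    cycles = CycleLabels()
    cycles.enter("condition1", "endwhile1")   # outer "while"
    cycles.enter("increment2", "endfor2")     # inner "for"
    print(cycles.emit_continue())  # targets the inner increment label
    print(cycles.emit_break())     # targets the end of the inner cycle
    cycles.leave()
    print(cycles.emit_continue())  # targets the outer condition label
```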

If-Then and If-Then-Else Instructions

Basic structure of an "if-then" instruction:

;; evaluate condition expression
JZ endif
;; evaluate then block
LABEL endif

Basic structure of an "if-then-else" instruction:

;; evaluate condition expression
JZ else
;; evaluate then block
JMP endif
LABEL else
;; evaluate else block
LABEL endif
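
Both structures can be produced by one generator routine. The Python sketch below is an illustration only: `emit_if` and the `_else<n>` / `_endif<n>` label names are assumptions, and the condition and blocks are given as already-generated instruction lists.

```python
import itertools

_label_ids = itertools.count(1)  # hypothetical source of unique label numbers

def emit_if(condition_code, then_code, else_code=None):
    """Return Postfix instructions for if-then or if-then-else."""
    n = next(_label_ids)
    endif = f"_endif{n}"
    if else_code is None:
        return condition_code + [f"JZ {endif}"] + then_code + [f"LABEL {endif}"]
    else_label = f"_else{n}"
    return (
        condition_code
        + [f"JZ {else_label}"]      # condition is zero: skip the then block
        + then_code
        + [f"JMP {endif}"]          # then block done: skip the else block
        + [f"LABEL {else_label}"]
        + else_code
        + [f"LABEL {endif}"]
    )

if __name__ == "__main__":
    # if (a) b = 1; else b = 2;  with "a" and "b" global
    print("\n".join(emit_if(["ADDRV a"], ["INT 1", "ADDRA b"], ["INT 2", "ADDRA b"])))
```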

Functions

The function structure presented here assumes a "void" return. If the function returns a value, STFVAL32 or STFVAL64 must be used before the LEAVE instruction.

"sizeoflocalvars" is the number of bytes needed to store all of the function's local variables. If this value is 0 (zero), START may be used instead of ENTER.

If the function is not global (e.g. C-style "static" function), then GLOBAL should not be used.

GLOBAL functionname,FUNC
LABEL functionname
ENTER sizeoflocalvars
;; evaluate function body
LEAVE
RET
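
The choices described above (GLOBAL or not, START or ENTER) can be sketched as a small Python helper. The function name `emit_function` and its parameters are assumptions for illustration; the emitted instructions are the ones shown in the structure above, still for a "void" return.

```python
def emit_function(name, body_code, local_bytes=0, is_global=True):
    """Return Postfix instructions for a function with a "void" return."""
    code = []
    if is_global:
        code.append(f"GLOBAL {name},FUNC")   # omitted for "static" functions
    code.append(f"LABEL {name}")
    # START may replace ENTER when no space for local variables is needed.
    code.append(f"ENTER {local_bytes}" if local_bytes > 0 else "START")
    code += body_code
    code += ["LEAVE", "RET"]
    return code

if __name__ == "__main__":
    print("\n".join(emit_function("f", [";; evaluate function body"], local_bytes=8)))
```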

Basic Structures (data)

Defining global (permanent) variables

We include in this category C-like "static" function variables (the only difference is that they are not defined with their explicit names, but rather with a name that reflects their being visible only inside their functions).

These variables are addressed by name and therefore have memory labels:

;; int a
BSS             ; select segment for uninitialized variables
ALIGN           ; align memory
GLOBAL a, OBJ   ; declare a global symbol (linker)
LABEL  a        ; define the label
SALLOC 4        ; allocate space (sizeof(int) == 4)

;; int a = 3
DATA            ; select segment for initialized variables
ALIGN           ; align memory
GLOBAL a, OBJ   ; declare a global symbol (linker)
LABEL  a        ; define the label
SINT 3          ; allocate space by defining initial value

;; const int a = 3
RODATA          ; select segment for constant objects
ALIGN           ; align memory
GLOBAL a, OBJ   ; declare a global symbol (linker)
LABEL  a        ; define the label
SINT 3          ; allocate space by defining initial value

;; char *s = "abcdef"
RODATA          ; select segment for constant objects (string literal)
ALIGN           ; align memory
LABEL  _L123    ; define the label
SSTRING "abcdef"  ; allocate space by defining initial value (string literal)
DATA            ; select segment for initialized variables
ALIGN           ; align memory
GLOBAL s, OBJ   ; declare a global symbol (linker)
LABEL  s        ; define the label
SADDR _L123     ; allocate space by defining initial value (address of literal)
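
The segment choice in the integer examples above follows a simple rule, sketched here in Python. The helper `declare_int` is hypothetical, it covers only 4-byte integers, and it places any declaration without an initial value in BSS.

```python
def declare_int(name, initial=None, constant=False):
    """Return Postfix directives defining a 4-byte global integer."""
    if initial is None:
        segment, allocation = "BSS", "SALLOC 4"      # uninitialized: reserve space only
    else:
        segment = "RODATA" if constant else "DATA"
        allocation = f"SINT {initial}"               # space comes with the initial value
    return [segment, "ALIGN", f"GLOBAL {name}, OBJ", f"LABEL {name}", allocation]

if __name__ == "__main__":
    for declaration in (declare_int("a"), declare_int("a", 3), declare_int("a", 3, constant=True)):
        print("\n".join(declaration), end="\n\n")
```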

Accessing global (permanent) variables

We include in this category C-like "static" function variables (the only difference is that they are not accessed with their explicit names, but rather with an automatically generated name based on the symbol table).

Addressing of these variables is by name, as shown in the following examples.

;; a = 1
INT 1    ; put 1 on the stack
DUP32      ; duplicate value
ADDR a   ; put address of "a" on the stack
STINT    ; write the value on the given address

If the resulting value is not needed, throw it away:

;; a = 1;  // expression as instruction
INT 1    ; put 1 on the stack
DUP32    ; duplicate value
ADDR a   ; put address of "a" on the stack
STINT    ; write the value on the given address
TRASH 4  ; discard the unused copy of the value (4 bytes)

;; a = b (both global)
ADDR b   ; put address of "b" on the stack
LDINT    ; read the value at the given address
DUP32    ; duplicate value (= is an expression)
ADDR a   ; put address of "a" on the stack
STINT    ; write the value on the given address
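
The stack effect of these sequences can be checked with a small simulation. The Python sketch below is a toy model, not the Postfix engine: it covers only the six instructions used above, treats each stack entry as one 4-byte value, and uses variable names in place of real addresses. The function name `run` is an assumption.

```python
def run(program, memory):
    """Execute a tiny subset of Postfix; returns the final stack (top is last)."""
    stack = []
    for line in program:
        op, _, arg = line.partition(" ")
        if op == "INT":
            stack.append(int(arg))
        elif op == "DUP32":
            stack.append(stack[-1])
        elif op == "ADDR":
            stack.append(arg)                 # the name stands in for the address
        elif op == "LDINT":
            stack.append(memory[stack.pop()])
        elif op == "STINT":
            address = stack.pop()             # the address is on top...
            memory[address] = stack.pop()     # ...the value to store is just below
        elif op == "TRASH":
            del stack[len(stack) - int(arg) // 4:]   # each entry models 4 bytes
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack

if __name__ == "__main__":
    memory = {"a": 0, "b": 7}
    # a = 1 used as an expression: one copy of the value stays on the stack.
    print(run(["INT 1", "DUP32", "ADDR a", "STINT"], memory), memory)
    # a = 1; used as an instruction: TRASH leaves the stack empty.
    print(run(["INT 1", "DUP32", "ADDR a", "STINT", "TRASH 4"], memory), memory)
```

Without DUP32, STINT would consume the only copy of the value, and an enclosing expression such as b = (a = 1) would have nothing left to use.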

If ADDR is immediately followed by LDINT, then the two instructions may be replaced by a single one with the same effect: ADDRV. Similarly, ADDR immediately followed by STINT may be replaced by ADDRA.

;; a = 1
INT 1    ; put 1 on the stack
DUP32    ; duplicate value (= is an expression)
ADDRA a  ; write the value in the address corresponding to "a"

;; a = b (both global)
ADDRV b  ; read value in the address of "b" and put it on the stack
DUP32    ; duplicate value (= is an expression)
ADDRA a  ; write the value in the address corresponding to "a"

Accessing local variables

In the following examples, we assume that both a and b are local variables (thus having negative offsets relative to the frame pointer). If they were function arguments, the offsets would be positive.

;; a = 1
INT 1     ; put 1 on the stack
DUP32     ; duplicate value (= is an expression)
LOCAL -4  ; put address of "a" (assuming offset -4) on the stack
STINT     ; write the value on the given address

;; a = b (both local)
LOCAL -8  ; put address of "b" (assuming offset -8) on the stack
LDINT     ; read the value at the given address
DUP32     ; duplicate value (= is an expression)
LOCAL -4  ; put address of "a" (assuming offset -4) on the stack
STINT     ; write the value on the given address
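
The offsets used in these examples have to be assigned by the code generator. The Python sketch below shows one possible scheme, under stated assumptions: every variable is 4 bytes wide, and the first argument sits at offset +8 because the saved frame pointer and the return address are taken to occupy the 8 bytes above FP (a typical 32-bit CDecl layout, not something this page specifies). The names `assign_offsets` and `load_variable` are hypothetical.

```python
def assign_offsets(arguments, local_variables, size=4):
    """Map each name to an offset relative to the frame pointer."""
    offsets = {}
    # Assumption: the saved frame pointer and the return address take the
    # first 8 bytes above FP, so the first argument is at +8.
    offset = 8
    for name in arguments:
        offsets[name] = offset
        offset += size
    offset = 0
    for name in local_variables:
        offset -= size            # locals live below FP: -4, -8, ...
        offsets[name] = offset
    return offsets

def load_variable(name, offsets):
    """Instructions that push the value of a local variable or argument."""
    return [f"LOCAL {offsets[name]}", "LDINT"]

if __name__ == "__main__":
    offsets = assign_offsets(arguments=["x", "y"], local_variables=["a", "b"])
    print(offsets)
    print(load_variable("b", offsets))
```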

If LOCAL is immediately followed by LDINT, then the two instructions may be replaced by a single one with the same effect: LOCV. Similarly, LOCAL immediately followed by STINT may be replaced by LOCA.

;; a = 1
INT 1    ; put 1 on the stack
DUP32    ; duplicate value (= is an expression)
LOCA -4  ; write the value on the address of "a" (assuming offset -4)

;; a = b (both local)
LOCV -8  ; read value in the address of "b" (assuming offset -8) and put it on the stack
DUP32    ; duplicate value (= is an expression)
LOCA -4  ; write the value on the address of "a" (assuming offset -4)
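
The replacements described for ADDR and LOCAL can be applied mechanically as a peephole pass over the generated instructions. The Python sketch below is one way to do it, assuming the program is a list of instruction strings without trailing comments; the names `FUSIONS` and `fuse` are hypothetical.

```python
# Fusion rules taken from the text: (first, second) -> single instruction.
FUSIONS = {
    ("ADDR", "LDINT"): "ADDRV",
    ("ADDR", "STINT"): "ADDRA",
    ("LOCAL", "LDINT"): "LOCV",
    ("LOCAL", "STINT"): "LOCA",
}

def fuse(program):
    """Replace adjacent instruction pairs by their single-instruction forms."""
    result = []
    for line in program:
        if result:
            prev_op, _, prev_arg = result[-1].partition(" ")
            fused = FUSIONS.get((prev_op, line.strip()))
            if fused:
                result[-1] = f"{fused} {prev_arg}"   # keep the name or offset
                continue
        result.append(line)
    return result

if __name__ == "__main__":
    # a = b (both local), before and after the pass
    program = ["LOCAL -8", "LDINT", "DUP32", "LOCAL -4", "STINT"]
    print(fuse(program))
```

Only immediately adjacent pairs are fused, so a LABEL or any other instruction between the two halves leaves them untouched.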

Examples

  • Example 1 - translating a C function to Postfix
  • Example 2 - simple program printing a global string
  • Example 3 - C function with pointer arithmetic to Postfix
  • Example 4 - C while cycle

A simple pure-Postfix example to illustrate duplication of stack values for double-precision floating point numbers.

Exercises