The approach described here generates code from an intermediate representation: the abstract syntax tree formed by the nodes created during parsing.
The target machine is the Postfix engine described below.
The Postfix machine is a stack machine possessing a single large stack and 0-operand instructions (SL0).
It has three special-purpose registers: IP (instruction pointer, a.k.a. program counter), used for jumps and function calls; FP (frame pointer, a.k.a. base pointer), the base of each function's stack frame; and SP (stack pointer), which keeps track of the top of the stack.
The stack grows in the direction of lower addresses.
Almost all operations use the stack for both input and output. The exceptions are the instructions for returning values from functions (in accordance with CDecl) and the instructions that use either immediate arguments or arguments in global memory (i.e., with explicit named addressing).
The Postfix reference guide contains further details about the Postfix machine and its operations.
The mnemonics used in the examples are those of the Postfix language. Comments (lines starting with ;;) stand for parts that are omitted.
Basic structure of a while cycle:
LABEL condition
;; evaluate condition expression
JZ endcycle
;; evaluate block
JMP condition
LABEL endcycle
Basic structure of a do-while cycle:
LABEL startcycle
;; evaluate block
LABEL condition
;; evaluate condition expression
JNZ startcycle
LABEL endcycle
Basic structure of a C-style for cycle:
;; evaluate initializer
LABEL condition
;; evaluate condition expression
JZ endfor
;; evaluate block
LABEL increment
;; evaluate increment expression
JMP condition
LABEL endfor
The C-style break and continue instructions must jump, respectively, to the end of the cycle or to its condition (for deciding on whether to execute the next iteration). In the case of for cycles, the jump to the condition is not direct, but rather via a jump to the increment label.
If there are multiple nested cycles, the code generator must keep track of the innermost one, so that it can generate the appropriate jumps. This can be achieved through the use of address stacks (one for break labels and another for continue labels).
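The two label stacks described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of any particular compiler: the class LoopGenerator, its methods, and the generated label names (_cond1, _end2, ...) are all hypothetical, and the condition and block are represented by callbacks that emit placeholder comments.

```python
class LoopGenerator:
    """Hypothetical helper: emits Postfix text for nested while cycles."""

    def __init__(self):
        self.code = []             # emitted Postfix lines
        self.break_labels = []     # top of stack = end label of innermost cycle
        self.continue_labels = []  # top of stack = condition label of innermost cycle
        self.counter = 0

    def new_label(self, prefix):
        # Every cycle needs its own labels, so number them.
        self.counter += 1
        return f"_{prefix}{self.counter}"

    def emit(self, line):
        self.code.append(line)

    def gen_while(self, gen_condition, gen_block):
        cond = self.new_label("cond")
        end = self.new_label("end")
        self.break_labels.append(end)
        self.continue_labels.append(cond)
        self.emit(f"LABEL {cond}")
        gen_condition()
        self.emit(f"JZ {end}")
        gen_block()
        self.emit(f"JMP {cond}")
        self.emit(f"LABEL {end}")
        # Leaving the cycle: the enclosing cycle becomes the innermost again.
        self.continue_labels.pop()
        self.break_labels.pop()

    def gen_break(self):
        self.emit(f"JMP {self.break_labels[-1]}")

    def gen_continue(self):
        self.emit(f"JMP {self.continue_labels[-1]}")


g = LoopGenerator()

def inner_block():
    g.gen_break()      # must leave the inner cycle only

def outer_block():
    g.gen_while(lambda: g.emit(";; evaluate inner condition"), inner_block)
    g.gen_continue()   # must go back to the outer condition

g.gen_while(lambda: g.emit(";; evaluate outer condition"), outer_block)
print("\n".join(g.code))
```

For a for cycle, the same scheme would push the increment label, rather than the condition label, onto the continue stack.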
Basic structure of an "if-then" instruction:
;; evaluate condition expression
JZ endif
;; evaluate then block
LABEL endif
Basic structure of an "if-then-else" instruction:
;; evaluate condition expression
JZ else
;; evaluate then block
JMP endif
LABEL else
;; evaluate else block
LABEL endif
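The label names in these templates (endif, else) are placeholders: when several conditionals appear in the same program, each one needs its own labels. The sketch below shows one way to do this, assuming numbered labels in the style _L1, _L2, ...; the functions new_label and gen_if are hypothetical, and the condition and blocks are passed in as lists of already generated Postfix lines.

```python
import itertools

_label_ids = itertools.count(1)

def new_label():
    """Return a fresh label name (hypothetical naming scheme)."""
    return f"_L{next(_label_ids)}"

def gen_if(condition, then_block, else_block=None):
    """Build the Postfix lines for an if-then or if-then-else instruction."""
    code = list(condition)
    if else_block is None:
        endif = new_label()
        code.append(f"JZ {endif}")
        code += then_block
        code.append(f"LABEL {endif}")
    else:
        else_label = new_label()
        endif = new_label()
        code.append(f"JZ {else_label}")
        code += then_block
        code.append(f"JMP {endif}")
        code.append(f"LABEL {else_label}")
        code += else_block
        code.append(f"LABEL {endif}")
    return code

# An if-then nested in the then block of an if-then-else.
inner = gen_if([";; evaluate inner condition"], [";; evaluate inner then block"])
outer = gen_if([";; evaluate outer condition"], inner, [";; evaluate else block"])
print("\n".join(outer))
```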
The function structure presented below assumes a "void" return. If the function returns a value, STFVAL32 or STFVAL64 must be used before the LEAVE instruction.
"sizeoflocalvars" is the number of bytes needed to store all of the function's local variables. If this value is 0 (zero), START may be used instead of ENTER.
If the function is not global (e.g. C-style "static" function), then GLOBAL should not be used.
GLOBAL functionname,FUNC
LABEL functionname
ENTER sizeoflocalvars
;; evaluate function body
LEAVE
RET
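The choices described above (GLOBAL or not, ENTER versus START, STFVAL32 or STFVAL64 before LEAVE) can be gathered in a single routine. The following is a sketch only: gen_function and its parameters are hypothetical names, and it assumes, for simplicity, that a returning function leaves its result on the stack at the end of its body.

```python
def gen_function(name, body, local_bytes=0, is_global=True, return_size=None):
    """Build the Postfix lines for a function.

    body        -- already generated Postfix lines for the function body
    local_bytes -- bytes needed by all local variables ("sizeoflocalvars")
    return_size -- None for "void", 4 or 8 for a value left on the stack
                   by the body (an assumption of this sketch)
    """
    code = []
    if is_global:                      # C-style "static" functions skip this
        code.append(f"GLOBAL {name},FUNC")
    code.append(f"LABEL {name}")
    code.append("START" if local_bytes == 0 else f"ENTER {local_bytes}")
    code += body
    if return_size == 4:
        code.append("STFVAL32")
    elif return_size == 8:
        code.append("STFVAL64")
    elif return_size is not None:
        raise ValueError("return_size must be None, 4 or 8")
    code += ["LEAVE", "RET"]
    return code

print("\n".join(gen_function("f", [";; evaluate function body"])))
print("\n".join(gen_function("g", [";; evaluate function body"],
                             local_bytes=8, is_global=False, return_size=4)))
```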
We include in the category of global variables C-like "static" function variables. The only difference is that they are not defined with their explicit names, but rather with names that reflect their being visible only inside their functions. Note, however, that "static" variables, even though they exist in globally accessible memory, do not have public names and, hence, are not declared GLOBAL.
These variables are addressed by name; thus, they have memory labels:
;; int a
BSS ; select segment for uninitialized variables
ALIGN ; align memory
GLOBAL a, OBJ ; declare a global symbol (linker)
LABEL a ; define the label
SALLOC 4 ; allocate space (sizeof(int) == 4)
;; int a = 3
DATA ; select segment for initialized variables
ALIGN ; align memory
GLOBAL a, OBJ ; declare a global symbol (linker)
LABEL a ; define the label
SINT 3 ; allocate space by defining initial value
;; const int a = 3
RODATA ; select segment for constant objects
ALIGN ; align memory
GLOBAL a, OBJ ; declare a global symbol (linker)
LABEL a ; define the label
SINT 3 ; allocate space by defining initial value
;; char *s = "abcdef"
RODATA ; select segment for constant objects (string literal)
ALIGN ; align memory
LABEL _L123 ; define the label
SSTRING "abcdef" ; allocate space by defining initial value (string literal)
DATA ; select segment for initialized variables
ALIGN ; align memory
GLOBAL s, OBJ ; declare a global symbol (linker)
LABEL s ; define the label
SADDR _L123 ; allocate space by defining initial value (address of literal)
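The three integer declarations above differ only in the segment and in how the space is allocated. A code generator might select them as in the sketch below. The function declare_int and its parameters are hypothetical, it covers only 4-byte integers (not the string literal case), and the name _f_counter used for a "static" variable is just an illustrative choice.

```python
def declare_int(name, initial=None, constant=False, public=True):
    """Build the Postfix lines that declare a 4-byte integer variable."""
    if initial is None:
        if constant:
            raise ValueError("a constant needs an initial value")
        segment, allocation = "BSS", "SALLOC 4"       # uninitialized
    else:
        segment = "RODATA" if constant else "DATA"    # initialized
        allocation = f"SINT {initial}"
    code = [segment, "ALIGN"]
    if public:                       # "static" variables are not declared GLOBAL
        code.append(f"GLOBAL {name}, OBJ")
    code += [f"LABEL {name}", allocation]
    return code

print("\n".join(declare_int("a")))                       # int a
print("\n".join(declare_int("a", 3)))                    # int a = 3
print("\n".join(declare_int("a", 3, constant=True)))     # const int a = 3
print("\n".join(declare_int("_f_counter", 0, public=False)))
```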
;; L22 example
;; var f = (int i) -> int:
;; return i
TEXT _F123 ; select segment for this (anonymous) function
ALIGN ; align memory
LABEL _F123 ; define the label
ENTER 0
LOCAL +8 ; &i
LDINT ; i
STFVAL32
LEAVE
RET
DATA ; select segment for initialized variables
ALIGN ; align memory
GLOBAL f, OBJ ; declare a global symbol (linker)
LABEL f ; define the label
SADDR _F123 ; allocate space by defining initial value (address of function)
We include in the category of global variables C-like "static" function variables (the only difference is that they are not accessed by their explicit names, but rather by an automatic, symbol-table-based name).
These variables are addressed by name, as shown in the following examples.
;; a = 1
INT 1 ; put 1 on the stack
DUP32 ; duplicate value
ADDR a ; put address of "a" on the stack
STINT ; write the value on the given address
If the resulting value is not needed, throw it away:
;; a = 1; // expression as instruction
INT 1 ; put 1 on the stack
DUP32 ; duplicate value
ADDR a ; put address of "a" on the stack
STINT ; write the value on the given address
TRASH 4 ; discard the duplicated value (4 bytes)
;; a = b (both global)
ADDR b ; put address of "b" on the stack
LDINT ; read the value at the given address
DUP32 ; duplicate value (= is an expression)
ADDR a ; put address of "a" on the stack
STINT ; write the value on the given address
If ADDR is immediately followed by LDINT, the two instructions may be replaced by a single one with the same effect: ADDRV. Similarly, ADDR followed by STINT may be replaced by ADDRA. Note that, in most cases, this replacement is an optimization, and its use by the code generator will be impossible or very limited.
;; a = 1
INT 1 ; put 1 on the stack
DUP32 ; duplicate value (= is an expression)
ADDRA a ; write the value in the address corresponding to "a"
;; a = b (both global)
ADDRV b ; read value in the address of "b" and put it on the stack
DUP32 ; duplicate value (= is an expression)
ADDRA a ; write the value in the address corresponding to "a"
In the following examples, we assume that both a and b are local variables (thus having negative offsets relative to the frame pointer). If they were function arguments, the offsets would be positive.
;; a = 1
INT 1 ; put 1 on the stack
DUP32 ; duplicate value (= is an expression)
LOCAL -4 ; put address of "a" (assuming offset -4) on the stack
STINT ; write the value on the given address
;; a = b (both local)
LOCAL -8 ; put address of "b" (assuming offset -8) on the stack
LDINT ; read the value at the given address
DUP32 ; duplicate value (= is an expression)
LOCAL -4 ; put address of "a" (assuming offset -4) on the stack
STINT ; write the value on the given address
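The offsets used in these examples (-4 for a, -8 for b) have to be assigned by the compiler when it processes the declarations. The sketch below shows one possible scheme. It is built on assumptions: local variables receive consecutive negative offsets in declaration order, and arguments start at offset +8 (the offset used for the first argument in the LOCAL +8 example earlier) and follow each other at increasing addresses. The function assign_offsets is a hypothetical name.

```python
def assign_offsets(arg_sizes, local_sizes, first_arg_offset=8):
    """Assign frame-pointer-relative offsets.

    arg_sizes, local_sizes -- sizes in bytes, in declaration order
    Returns (argument offsets, local offsets, total bytes of locals).
    """
    arg_offsets = []
    offset = first_arg_offset
    for size in arg_sizes:
        arg_offsets.append(offset)   # arguments: positive offsets
        offset += size

    local_offsets = []
    offset = 0
    for size in local_sizes:
        offset -= size               # locals: negative offsets
        local_offsets.append(offset)

    return arg_offsets, local_offsets, -offset

# Two int arguments; two int locals (a and b in the examples above).
args, local_vars, local_bytes = assign_offsets([4, 4], [4, 4])
print(args, local_vars, local_bytes)
```

The third returned value is the number of bytes that would be passed to ENTER for this function.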
If LOCAL is immediately followed by LDINT, the two instructions may be replaced by a single one with the same effect: LOCV. Similarly, LOCAL followed by STINT may be replaced by LOCA. Note that, in most cases, this replacement is an optimization, and its use by the code generator will be impossible or very limited.
;; a = 1
INT 1 ; put 1 on the stack
DUP32 ; duplicate value (= is an expression)
LOCA -4 ; write the value on the address of "a" (assuming offset -4)
;; a = b (both local)
LOCV -8 ; read value in the address of "b" (assuming offset -8) and put it on the stack
DUP32 ; duplicate value (= is an expression)
LOCA -4 ; write the value on the address of "a" (assuming offset -4)
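Since a code generator that emits instructions one at a time can rarely produce the combined forms directly, one option is a separate pass over the generated instructions. The following sketch assumes the instructions are available as a list of (mnemonic, operand) pairs, which is an assumption of this example and not something the Postfix tools require. The names FUSED and peephole are hypothetical.

```python
# Pairs of adjacent instructions and the single instruction replacing them.
FUSED = {
    ("ADDR", "LDINT"): "ADDRV",
    ("ADDR", "STINT"): "ADDRA",
    ("LOCAL", "LDINT"): "LOCV",
    ("LOCAL", "STINT"): "LOCA",
}

def peephole(instructions):
    """Replace address+load and address+store pairs by their combined forms.

    instructions -- list of (mnemonic, operand) pairs; operand is None
                    for instructions without one.
    """
    result = []
    i = 0
    while i < len(instructions):
        mnemonic, operand = instructions[i]
        if i + 1 < len(instructions):
            next_mnemonic, next_operand = instructions[i + 1]
            fused = FUSED.get((mnemonic, next_mnemonic))
            if fused is not None and next_operand is None:
                result.append((fused, operand))  # keep the name or offset
                i += 2
                continue
        result.append((mnemonic, operand))
        i += 1
    return result

# a = b (both local), as generated without the combined instructions.
before = [("LOCAL", "-8"), ("LDINT", None), ("DUP32", None),
          ("LOCAL", "-4"), ("STINT", None)]
after = peephole(before)
print(after)
```

Because a LABEL between the two instructions is itself an entry in the list, a pair that is split by a jump target is left untouched.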
A simple pure-Postfix example to illustrate duplication of stack values for double-precision floating point numbers.