Well, if you've made it this far, your compiler is well on its way to producing code. In this project, you will be extending your compiler to produce runnable assembly code for use on Sun SPARC workstations. By the end of this project, your compiler will finally be able to create runnable FLAME programs.
DO NOT WAIT UNTIL THE LAST MINUTE TO START THIS PROJECT!
If your compiler is still not parsing correct programs, it would be a very good idea to get that working as soon as possible. Please see Ian or myself.
For example, if a user supplies a FLAME program hello.flm like this:
fun main()
begin
print("Hello World\n")
end
The output of your compiler will be a file hello.s that looks something like this:
! generated by flame.py
.section ".text"
! function: main
.global main
.type main,2
main:
save %sp, 64, %sp
sethi %hi(.L1), %o0
or %o0, %lo(.L1), %o0
call flprint
nop
call _exit
nop
.section ".rodata"
.L1: .asciz "Hello World\n"
This file will then be assembled with the Sun assembler and linked with a small FLAME runtime library to produce
an executable. For example:
This project assumes no prior experience with SPARC assembly language (details follow). However, you will have to do a considerable amount of experimentation to make sure things work correctly.% python flame.py hello.flm % as hello.s % ld hello.o flame.o -lc % a.out Hello World %
import sys import os.path filename = sys.argv[1] outname = os.path.splitext(filename)[0] + ".s"
The output of your parser should be the root node of a well-formed syntax tree representing the contents of the input program. If a syntax error or type error occurred during parsing, you may want to have your parser return None to indicate an error.f = open(filename) data = f.read() f.close() import flameparse top = flameparse.parse(data)
def generate(file,top):
print >>file, "! Created by flame.py"
print >>file, "! Yourname, CS326 (Spring 2001)"
top = flameparse.parse(data)
if top:
import flamegen
outf = open(outfile,"w")
flamegen.generate(outf,top)
outf.close()
Congratulations, your compiler just produced its first output file.% python flame.py simple.flm % more simple.s ! Created by flame.py ! Yourname, CS326 (Spring 2001) %
To walk the tree, create a collection of functions that try to print information about each statement in the source program. For example (note: you will have to modify this code to match your syntax tree representation):
def emit_program(out,top):
print >>out,"\n! program"
funclist = (get list of functions)
for f in funclist:
emit_function(out,f)
def emit_function(out,func):
print >>out,"\n! function: %s (start) " % func.name
...
statements = (get list of statements)
emit_statements(out,statements)
...
print >>out,"! function: %s (end) " % func.name
def emit_statements(out,statements):
for s in statements:
emit_statement(out,s)
def emit_statement(out,s):
if s.name == 'print':
emit_print(out,s)
elif s.name == 'read':
emit_read(out,s)
elif s.name == 'write':
emit_write(out,s):
elif s.name == 'while':
emit_while(out,s):
...
def emit_print(out,s):
print >>out, "\n! print (start)"
...
print >>out, "! print (end)"
def emit_while(out,s):
print >>out, "\n! while (start)"
...
statement = (get body of while)
emit_statement(out,statement)
...
print >>out, "! while (end)"
For each statement in your program, make the corresponding emit function print a starting and ending
marker as shown in the example. Note: lines starting with "!" are comments in assembler.
Now, modify your generate() function to walk through a program and print information about each function and individual statements. For example, if you supplied the following program:
/* Compute GCD */
fun main()
x: int;
y: int;
g: int;
begin
read(x);
read(y);
g := y;
while x > 0 do
begin
g := x;
x := y - (y/x)*x;
y := g
end;
write(g)
end
The output of your compiler should look roughly like this:
Notes:! Created by flame.py ! Yourname, CS326 (Spring 2001) ! program ! function: main (start) ! read (start) ! read (end) ! read (start) ! read (end) ! assign (start) ! assign (end) ! while (start) ! assign (start) ! assign (end) ! assign (start) ! assign (end) ! assign (start) ! assign (end) ! while(end) ! write (start) ! write (end) ! function: main (end)
Write a function eval_expression() that produces pseudocode for expression evaluation using this technique. For example:push 2 # Stack : 2 push 3 # Stack : 3 2 push 4 # Stack : 4 3 2 mutiply # Stack : 12 2 add # Stack : 14 push 9 # Stack : 9 14 sub # Stack : 5 val := pop # Stack : # val gets the value 5
def eval_expression(out,expr):
if expr.name == 'number':
print >>out, "! push", expr.value
elif expr.name == 'id':
print >>out, "! push", expr.value
elif expr.name == '+':
left = expr.children[0]
right = expr.children[1]
eval_expression(out,left)
eval_expression(out,right)
print >>out, "! add"
elif expr.name == '-':
left = expr.children[0]
right = expr.children[1]
eval_expression(out,left)
eval_expression(out,right)
print >>out, "! sub"
...
In addition, produce similar pseudocode for the relational operations and
boolean operators (and,or,not).
Now, modify the emit functions for the statements involving expressions to produce expression pseudocode. For example:
def emit_write(out,s):
print >>out, "\n! write (start)"
expr = (get expr from s)
eval_expression(out,expr)
print >>out, "! expr := pop"
print >>out, "! write(expr)"
print >>out, "! write (end)"
With these modifications, the output of your compiler will look something like this:
Notes:! Created by flame.py ! Yourname, CS326 (Spring 2001) ! program ! function: main (start) ! read (start) ! read (end) ! read (start) ! read (end) ! assign (start) ! push y ! g := pop ! assign (end) ! while (start) ! push x ! push 0 ! gt ! relop := pop ! assign (start) ! push x ! g := pop ! assign (end) ! assign (start) ! push y ! push y ! push x ! div ! push x ! mul ! sub ! x := pop ! assign (end) ! assign (start) ! push g ! y := pop ! assign (end) ! while(end) ! write (start) ! push g ! expr := pop ! write(expr) ! write (end) ! function: main (end)
! push 2 ! push 4 ! push i ! mul ! push 10 ! add ! index := pop ! push a[index] ! add ! x := pop
def emit_while(out,s):
print >>out, "\n! while (start)"
print >>out, "! test:"
relop = (get relation expression from s)
eval_expression(out,relop)
print >>out, "! relop := pop"
print >>out, "! if not relop: goto done"
statement = (get statement from s)
emit_statement(out,statement)
print >>out, "! goto test"
print >>out, "! done:"
print >>out, "! while (end)"
With these changes, your program should now look like this:
Notes:! Created by flame.py ! Yourname, CS326 (Spring 2001) ! program ! function: main (start) ! read (start) ! read (end) ! read (start) ! read (end) ! assign (start) ! push y ! g := pop ! assign (end) ! while (start) ! test: ! push x ! push 0 ! gt ! relop := pop ! if not relop: goto done ! assign (start) ! push x ! g := pop ! assign (end) ! assign (start) ! push y ! push y ! push x ! div ! push x ! mul ! sub ! x := pop ! assign (end) ! assign (start) ! push g ! y := pop ! assign (end) ! goto test ! done: ! while(end) ! write (start) ! push g ! expr := pop ! write(expr) ! write (end) ! function: main (end)
test:
compute relational expression
if false: goto done
statements
goto test # Go back for another test
done:
... continue with program ...
compute relational expression if false: goto else statements for true goto next else: statements for false next: ... next statement ...
The pseudocode would look like this:x := foo(3,2*x+y,b) + 2
! push 3 ! arg1 := pop ! push 2 ! push x ! mul ! push y ! add ! arg2 := pop ! push b ! arg3 := pop ! push foo(arg1,arg2,arg3) ! push 2 ! add ! x := pop
The general structure of your output with these two segments is specified in the Sun assembler as follows:
! Created by flame.py
! Yourname, CS326 (Spring 2001)
.section ".text"
Instructions
.section ".rodata"
Data
Since these two sections are generally created at the same time, the easiest way to deal with the
data segment is to create a special in-memory string file for it like this:
import StringIO
data = StringIO.StringIO()
...
def emit_print(out,n):
value = (get string literal value)
label = new_label()
# Drop a literal in the data segment
print >>data, '%s: .asciz "%s"' % (label, value)
...
# emit rest of print statement
def generate(out,top):
...
print >>out, ' .section ".text"'
print >>data, ' .section ".rodata"'
emit all of the code
...
# Append the data segment at the end of the output
print >>out, data.getvalue()
Within the assembly code created by your compiler, you will need to give names to various things. These
names are often known as labels. Labels are always indicated by a name and a colon like this:
.section ".text"
label1:
statements
label2:
statements
.section ".rodata"
label3:
data
label4:
data
As a general rule, the only label names explicitly defined by a FLAME
program are the names of functions. For all other labels, you should
write a utility function new_label() that manufactures a new
label name each time it is called. Usually these manufactured names
correspond to parts of control-flow constructs such as if and while.
They may also refer to global data in the data segment. Normally,
these types of special labels are given names such as ".L12" or ".L23"
so that they don't conflict with valid program function names (which can
never start with a ".").
In addition, some labels may be declared as global so that they can be accessed externally. The .global directive is used to define a global label. All of your functions should be declared in this manner. For example:
.global main
main:
statements
To illustrate the use of labels, modify your emit functions to insert labels an appropriate locations. For example,
def emit_while(out,s):
print >>out, "\n! while (start)"
test_label = new_label()
done_label = new_label()
print >>out, "%s:" % test_label
relop = (get relation expression from s)
eval_expression(out,relop)
print >>out, "! relop := pop"
print >>out, "! if not relop: goto %s" % done_label
statement = (get statement from s)
emit_statement(out,statement)
print >>out, "! goto %s" % test_label
print >>out, "%s:" % done_label
print >>out, "! while (end)"
With these changes, your output should now look like this (notice the use of
.L1, .L2, and main):
! Created by flame.py
! Yourname, CS326 (Spring 2001)
.section ".text"
! program
! function: main (start)
.global main
main:
! read (start)
! read (end)
! read (start)
! read (end)
! assign (start)
! push y
! g := pop
! assign (end)
! while (start)
.L1:
! push x
! push 0
! gt
! relop := pop
! if not relop: goto .L2
! assign (start)
! push x
! g := pop
! assign (end)
! assign (start)
! push y
! push y
! push x
! div
! push x
! mul
! sub
! x := pop
! assign (end)
! assign (start)
! push g
! y := pop
! assign (end)
! goto .L1
.L2:
! while(end)
! write (start)
! push g
! expr := pop
! write(expr)
! write (end)
! function: main (end)
.section ".rodata"
funcname:
save %sp, -96, %sp
...
instructions
...
ret
restore
The first instruction (save) is used to allocate a new stack frame. The number 96 is the size of the stack frame
size in bytes. The actual number you supply here will depend on the number of local variables, temporaries, and
function calling conventions used by your compiler. The stack frame size must always be larger than 64 bytes and be a multiple
of 8. The ret instruction is used to return from a subroutine. The restore instruction
restores the stack pointer to its previous value (undoing the effects of the save instruction).
(Note: the SPARC has delayed branches so the restore instruction appearing immediately after ret
always executes before the ret actually takes effect).
Within the body of a function, two registers are used to hold information about the location of the stack frame. The %fp register contains the address of the top of the stack frame (the value of %sp before the save instruction executes). The %sp register contains the address of the bottom of the stack frame (after the save instruction). Please note that the stack grows downward so the value of %fp is always larger than than the value of %sp.
The first 64 bytes immediately above the stack pointer %sp are always reserved by the system for register storage. Your compiler should never access or store data in the range of memory addresses [%sp, %sp+63]!.
Local variables and temporaries are stored below the frame pointer. For example, if you had a FLAME function like this:
fun foo()
x : int;
y : float;
a : int;
b : int[20];
begin
...
end
You would need to allocate at least 4+4+4+4*20=92 bytes of memory for local variables.
Thus, the start of your procedure might look like this:
foo:
save %sp, -160, %sp
...
The value of 160 is formed by taking the 92 bytes of memory needed for local variables,
adding 64 bytes for register storage, and adding 4 bytes of padding to make the value a multiple of 8.
Of course, if your procedure had additional temporary variables, you would need to allocate
extra space for them as well.
The local variables themselves might be stored in the following memory locations:
x:int -> [%fp - 4]
y:float -> [%fp - 8]
a:int -> [%fp - 12]
b:int[20] -> [%fp - 92]
_____________ < %fp
| x:int |
|-------------| < %fp - 4
| y:float |
|-------------| < %fp - 8
| a:int |
|-------------| < %fp - 12
| b[19] |
|-------------|
| b[18] |
|-------------|
// //
|-------------|
| b[0] |
|-------------| < %fp - 92
| PAD |
|-------------| < %fp - 96, %sp+64
| |
| RESERVED |
| |
|-------------| < %sp, %fp - 160
Within each procedure, the following CPU registers may be used for temporary calculations.
The input registers %i0-%i5 are commonly used to hold the first 6 input arguments to a function.
The output registers %o0-%o5 are used to hold output parameters when a procedure wants to call another subroutine. There is a special relationship between the %i and %o registers---specifically, whenever a procedure call is made, the output registers in the caller turn into the input registers in the called procedure. This shifting is really due to the behavior of SPARC register windows (described later).
The global registers %g0-%g7 are shared by all procedures and should only be used for temporary calculations such as computing memory addresses. The register %g0 is hardwired to the value 0. Registers %g4-%g7 are reserved by the operating system and should generally be avoided by your compiler.
The %i6 register is the same as the frame pointer %fp. The %o6 register is the same as the stack pointer %sp. The %i7 register contains the return address of a subroutine (the value of the PC that will be restored when a procedure returns).
Specific instructions:
funcname:
save %sp, -128, %sp
...
.Ln:
ret
restore
The special .Ln label at the end is used to label the location of the function
return. You will be using this a little later.
.Ln:
mov 0, %o0 ! only appears in main
call _exit ! only appears in main
nop ! only appears in main
ret
restore
To call these functions from assembly language, you will need to use the following code sequences:void flwritei(int x); /* Write an integer */ void flwritef(float f); /* Write a float */ int flreadi(); /* Read an integer */ float flreadf(); /* Read a float */ void flprint(char *s); /* Print a string */
! call flwritei(int)
mov val, %o0
call flwritei
nop
! call flwritef(float)
mov val, %o0
call flwritef
nop
! call flreadi()
call flreadi
nop
st %o0, result
! call flreadf()
call flreadf
nop
st %o0, result
! call flprint()
sethi %hi(addr), %o0
or %o0, %lo(addr), %o0
call flprint
nop
In these example, val is assumed to be a register, and result is assumed to be a memory location.
addr is the address of the string literal in the program data segment. See the next section.
fun main()
begin
print("Hello World\n")
end
To do it, implement the emit function for the print statement by making it produce the following
code:
.Ln: .asciz "Hello World\n"
sethi %hi(.Ln), %o0
or %o0, %lo(.Ln), %o0
call flprint
nop
Now, let's try out your hello world program:
% flame.py hello.flm
% cat hello.s
! Created by flame.py
! Yourname, CS326 (Spring 2001)
.section ".text"
! program
! function: main (start)
.global main
main:
save %sp, -128, %sp
sethi %hi(.L2), %o0
or %o0, %lo(.L2), %o0
call flprint
nop
.L1:
mov 0, %o0
call _exit
nop
ret
restore
! function: main (end)
.section ".rodata"
.L2: .asciz "Hello World\n"
% as hello.s
% ld hello.o flame.o -lc
% a.out
Hello World
%
Congratulations, you have just compiled your first runnable FLAME program.
The local variables to each function should be contained in some kind of list or parse tree structure created back in Project 2/3. In this part of the project, you will walk through this list and figure out where the variable is suppose to be stored on the function stack frame. Recall that the local variables are usually stored at the top of the stack frame (right below the %fp).
First, you need to know the size of each datatype:
Now, write a function allocate_locals() that iterates over the local variable list and assigns a frame offset to each variable. The frame offset should be stored in the symbol table entry for each local. For example, if you had the function,int 4 bytes (32 bits) float 4 bytes (32 bits) int[n] 4*n bytes float[n] 4*n bytes
fun main()
x : int;
y : float;
a : int;
b : int[20];
begin
...
end
you might assign the following offset values (see Part 7):
The allocate_locals() function should return the total number of bytes needed to store the local variables.x.symtab.frame_offset = -4 [%fp - 4] y.symtab.frame_offset = -8 [%fp - 8] a.symtab.frame_offset = -12 [%fp - 12] b.symtab.frame_offset = -92 [%fp - 92]
Finally, make the following modifications to your code generator:
The easiest way to translate the stack pseudocode to a real implementation is to treat the %l0-%l7 registers as always representing the top 8 elements of the expression stack. Thus, if you wanted to evaluate the expression 2+3*4-9, it might be evaluated as follows:
SPARC assembly Pseudocode ---------------------------------------------- mov 2, %l0 push 2 mov 3, %l1 push 3 mov 4, %l2 push 4 mov %l1,%o0 call .mul multiply mov %l2,%o1 mov %o0, %l1 add %l0,%l1,%l0 add mov 9, %l1 push 9 sub %l0,%l1,%l0 sub st %l0, value val := pop
To implement this, create a couple of functions that implement the functionality of a stack machine. For example, your implementation might have the following functions or methods:
One problem with the stack implementation is that once more than 8 values are on the stack, old values will have to be spilled to memory in order to accomodate new values. To handle memory spills, create a temporary variable region that is located below all of the local variables in the function stack frame. Now, modify the push() and pop() functions as follows:print >>out,"mov 2, %s" % push() print >>out,"mov 3, %s" % push() print >>out,"mov 4, %s" % push() r = pop() l = pop() print >>out,"mov %s, %%o0" % l print >>out,"call .mul" print >>out,"mov %s, %%o1" % r print >>out,"mov %%o0, %s" % push() r = pop() l = pop() print >>out,"add %s, %s, %s" % (l,r,push()) print >>out,"mov 9, %s" % push() r = pop() l = pop() print >>out,"sub %s, %s, %s" % (l,r,push()) result = pop() print >>out,"st %s, %s" % (result,val)
The SPARC assembly code for evaluating this on a stack might look like this:1+(2+(3+(4+(5+(6+(7+(8+(9+10))))))))
You implementation will have to keep track of the maximum number of spilled variables during expression evaluation. This number will have to be added to the stack frame size given to the save statement.mov 1, %l0 ! push 1 mov 2, %l1 ! push 2 mov 3, %l2 ! push 3 mov 4, %l3 ! push 4 mov 5, %l4 ! push 5 mov 6, %l5 ! push 6 mov 7, %l6 ! push 7 mov 8, %l7 ! push 8 st %l0, [%fp - 160] ! spill %l0 (1) mov 9, %l0 ! push 9 st %l1, [%fp - 164] ! spill %l1 (2) mov 10, %l1 ! push 10 add %l0,%l1,%l0 ! (9+10) add %l7,%l0,%l7 ! 8+(9+10) add %l6,%l7,%l6 ! 7+(8+(9+10)) add %l5,%l6,%l5 ! 6+(7+(8+(9+10))) add %l4,%l5,%l4 ! 5+(6+(7+(8+(9+10)))) add %l3,%l4,%l3 ! 4+(5+(6+(7+(8+(9+10))))) add %l2,%l3,%l2 ! 3+(4+(5+(6+(7+(8+(9+10)))))) ld [%fp - 164], %l1 ! recover 2 add %l1, %l2, %l1 ! 2+(3+(4+(5+(6+(7+(8+(9+10))))))) ld [%fp - 160], %l0 ! recover 1 add %l0, %l1, %l0 ! 1+(2+(3+(4+(5+(6+(7+(8+(9+10))))))))
Here are a few general code generation rules you will need to use to evaluate expressions:
If the value has absolute value greater than 4095, do the following:mov constant, %ln
sethi %hi(.Lm), %g1
or %g1, %lo(.Lm), %g1
ld [%g1], %ln
...
.section rodata
.Lm: .word constant
sethi %hi(.Lm), %g1
or %g1, %lo(.Lm), %g1
ld [%g1], %ln
...
.section rodata
.Lm: .float floatconstant
where offset is the frame pointer offset (stored in the symbol table) for the local variable and %ln is the register corresponding to the top of the expression stack.ld [%fp + offset], %ln
add %li, %lj, %lk ! %li + %lj -> %lk sub %li, %lj, %lk ! %li - %lj -> %lk
mov %li, %o0 ! %li * %lj -> %lk call .mul mov %lj, %o1 ! Note: this is in branch delay slot mov %o0, %lk mov %li, %o0 ! %li / %lj -> %lk call .div mov %lj, %o1 ! Note: this is in branch delay slot mov %o0, %lk
and %li, %lj, %lk ! %li and %lj -> %lk or %li, %lj, %lk ! %li or %lj -> %lk
! < cmp %li, %lj ! if %li < %lj then %lk = 1 else %lk = 0 bl .Lm ! branch less than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! <= cmp %li, %lj ! if %li <= %lj then %lk = 1 else %lk = 0 ble .Lm ! branch less than or equal mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! > cmp %li, %lj ! if %li > %lj then %lk = 1 else %lk = 0 bg .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! >= cmp %li, %lj ! if %li >= %lj then %lk = 1 else %lk = 0 bge .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! == cmp %li, %lj ! if %li == %lj then %lk = 1 else %lk = 0 be .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! != cmp %li, %lj ! if %li != %lj then %lk = 1 else %lk = 0 bne .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm:
neg %li, %lj ! -%li -> %lk
xor %li, 1, %lj ! not %li -> %lk
Gets translated into code like this,a := expr
! evaluate expression
...
! assignment
st %lm, [%fp + offset]
where %lm is the register containing the result of the expression and
offset is the frame offset of the variable.
! evaluate expression
...
! write
mov %lm, %o0
call flwritei
nop
where %lm is the register containing the result of the expression given to write.
Test out your write() statement and assignment functions by compiling a few simple programs such as this:
fun main()
a : int;
b : int;
c : int;
begin
a := 3;
b := a + 10;
c := b * 2;
write(c)
end
! while statement
.Lm:
! Evaluate relop
...
cmp %li, %g0 ! %li contains result of relop
be .Ln ! == 0, false, exit
nop
...
statements
...
ba .Lm
nop
.Ln:
The general form of a if-else-statement is as follows:
! if-else statement
! Evaluate relop
...
cmp %li, %g0 ! %li contains result of relop
be .Lm ! == 0, false, goto else
nop
! then part
...
statements
...
ba .Ln ! skip else part
nop
! else part
.Lm:
...
statements
...
.Ln:
To access an array element, you will need to do three things:
! Evaluate expr
...
! compute offset
sll %lm, 2, %lm ! Multiply by 4 (left shift 2 bits)
add %fp, %lm, %lm ! Add to frame pointer
ld [%lm + offset], %ln
In addition, you may want to put arrays bounds checking in this code as well.
Storing to an array is similar. You will first need to compute the array index and memory address before a value can be stored.
Note: When evaluating expressions such as a + b[4*i-2] + c, the computation of the array index will use the same expression stack as the outer expression. Just make sure your stack allocator allows for nested expression evaluations and everything should work out.
When arrays are passed as parameters to function, they should be passed by reference (e.g., a pointer to the first element of the array is passed instead of a copy of the array itself).
The code you generate will roughly following this prototype:foo(a1,a2,a3,a4)
evaluate a1 expression stack: a1 evaluate a2 expression stack: a2 a1 evaluate a3 expression stack: a3 a2 a1 evaluate a4 expression stack: a4 a3 a2 a1 ! Make actual function call t = pop() store t, arg4 t = pop() store t, arg3 t = pop() store t, arg2 t = pop() store t, arg1 call foo
.section ".data"
DISPLAY: .word 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
! number of nesting levels here
import StringIO
f = StringIO.StringIO() # Create an in-memory file object
emit_function_body(f,node)
print >>outf,"! function (start)"
print >>outf,"%s:" % function_name
print >>outf," save %%sp, -%d, %%sp" % frame_size
print >>outf, f.getvalue()
Of course, there is more than one way to do this.
% python flame.py testname.flm % a.out
% tar -cf yourlogin.tar flame
% cp yourlogin.tar /stage/classes/current/CS326/handin/project4