Milestone #3: Front-End (Scanner & Parser)¶
Due: Thursday July 13th at 11:59pm
This milestone will require you to get started implementing the front-end for the compiler.
Working Individually or as a Pair¶
You are allowed to work alone or in teams of two for the compiler project. Before creating your repository via the invitation, you will first need to
Decide if you want to work alone or find a partner in the class.
Come up with a fun team name or if you are working alone a name for your compiler.
The first person (or yourself if you are working alone) to accept the Github classroom invite will enter in the team name. If working in a pair, the second group member will need to click on their team name in-order to join the repository.
Once you have done these two steps, then you can move on to Getting started section.
Getting started¶
A Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.
To actually get your private repository, you will need this invitation URL:
Compiler Project Invitation (Please check the Post “Compiler Project is ready” Ed)
When you click on an invitation URL, you will have to complete the following steps:
See step 3 in “Working Individually or as a Pair” section for the first step.
- You now need to clone your repository (i.e., download it to your machine).
Make sure you’ve set up SSH access on your GitHub account.
For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs51300-sum23/proj-TEAM_NAME-GITHUB-USERNAME.git.
If you do not know how to use
git clone
to clone your repository then follow this guide that Github provides: Cloning a Repository
If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.
Language Overview: GoLite¶
Please make sure you read over the document that describes the language we will be implementing this quarter: Language Overview
Compiler Structure¶
To help with code modularity and ease of grading, we will require a specific repository structure throughout the implementation of the compiler. We will provide specific names to directories that are required; however, you are able to include additional directories and files to help with your implementation. You are not allowed to use any external third-party libraries unless Professor Samuels specifically gives you permission to do so. You may only use the libraries provided in the standard library of your chosen language. If you think you need to use an external library then ask before using it.
The initial contents of the repository includes directories and files needed to help you get started implementing the front-end of the compiler. Lets review these directories now:
benchmarks - Throughout the implementation of the compiler, you can place sample Golite programs in this directory to help test the various phases. Professor Samuels during week 4 will provide additional benchmark tests to place inside this directory.
golite - The directory that includes the main driver for the compiler (i.e., the file that includes the
main
function or entry point code into the compiler).grammars - The directory that will contain the ANTLR grammar files and the full ANTLR runtime jar file (i.e.,
antlr-4.11.1-complete.jar
). If you are running ANTLR remotely then you will need to install Java and Java Runtime on your local machine: Install Java Runtime . The Java is already installed on the linux servers. You will also see a file calledgenerate.sh
. This is a bash script used to generate the lexer and parser. Right now, it’s preset to produce Go code but you can change theDlanguage=YOUR_LANGUAGE
flag to your specific language. When ran, thegenerate.sh
files will automatically place the generated lexer and parser into the parser and lexer directories.lexer - The directory contains the autogenerated lexer files. Additionally, you will need to write code to connect the autogenerated code into the your compiler implementation.
parser - The directory contains the autogenerated parser files. Additionally, you will need to write code to connect the autogenerated code into the your compiler implementation.
build.sh - You are required to write a bash script that fully builds the compiler into an executable. This script must also call any other scripts that generate/build any additional components of the compiler. If you using a language that does not build an executable (i.e., Python), then just generating/building all the required components is fine. You will provide instructions in the README file that explains how to run your compiler.
README - This document provide specific instructions to run your compiler on the CS Linux Servers. You must be detailed and explicit with these instructions. Please break these instructions down into specific steps (ie.. Step 1, 2, 3, etc.). Also provide instructions on how to run a a program and specify flags. Failing to provide a straight-forward way of executing your compiles will result in significant deductions to your score.
As we build each phase of the compiler, we will state the additional required directories. Always make sure that your compiler runs correctly on the CS Linux Servers. If it doesn’t run on those servers then you will receive an automatic zero for your completeness score for the project. We need to be able to grade them specifically on those machines. Again feel free to include additional directories/files as needed.
Compiler Implementation Languages¶
You are required to implement the entire compiler in one of the following languages:
Java
C++
Java (or dialect)
Python 3
Go
You are not allowed to implement the compiler in any other language. We recommend you implement the compiler in Go since he and TA will be able to help you faster. Implementing, the compiler in other languages may result in not fully be able to help quickly fix problems. Please keep this in mind that building and fixing the compiler will be your sole responsibility but the course staff will help in any way possible.
Milestone Requirement: Scanner & Parser¶
The main requirement for the first milestone is to get a working scanner and parser completed. You must use ANTLR (ANother Tool for Language Recognition) to build your scanner (i.e., lexer) and parser. The grammar files must be placed inside the grammars
directory. We already have created this directory for you and have placed the ANTLR jar file inside this directory.
Task 0: Learning ANTLR¶
Your first task is to understand ANTLR system by reading over it’s documentation:
You are required to read over all the sections under the Sections area, with the exception of these sections:
Unicode U+FFFF, U+10FFFF character streams
Parsing binary streams
Parser and lexer interpreters
Resources
Professor Samuels will shortly post a video explaining how you can implement the Cal language using ANTLR and connecting it to a compiler implementation. However, We want you to first read over these docs before watching the video so you understand better what’s happening in the video.
Task 1: Implement the Lexer using ANTLR¶
Assume the following Golite program is defined in a file simple.golite
inside the benchmarks/simple/
directory:
1 func main() {
2
3 var a int;
4
5 a = 3 + 4 + 5;
6
7 printf("%d", a);
8
9 }
For the lexer, the compiler must read in a Golite program and produce a token stream that will be provided to the parser. The default execution of the compiler at this point is to just run the lexer and parser. However, add a flag -lex
to command line arguments such that it will print out each token on a separate line for a given program. With the -lex
command the execution of the compiler stops at the end of lexing.
Note
Make sure to think about your function decomposition inside driver file because you will be adding more command line arguments to the compiler as the quarter progresses.
Sample Run using Go ($: is just mimicking the command line)
$: go run golite.go -lex simple.golite
Token.PACKAGE(1)
Token.IDENT(1, "MAIN")
Token.SEMICOLON(1)
Token.IMPORT(3)
Token.FMT(3)
Token.SEMICOLON(3)
Token.FUNC(5)
Token.IDENT(5, "MAIN")
Token.LPAREN(6)
... (I'm not printing them all out but you get the idea)
Your output does not have to look like the above output. We just want to see a series of tokens being printed out to verify that your scanner is working. Your implementation must go under the lexer
directory along with the generated lexer code. Please note you will need to write more code to incorporate the generated lexer into your compiler.
Task 2: Implement the Parser and the Abstract Syntax Tree (AST)¶
For the parser, the compiler must read in a Golite program and parse through the input source code. This will require that you connect your lexer code to your parser code. For this milestone, you must also design and construct an abstract syntax tree based on the language specification and formal grammar. You must determine the appropriate representations for your nodes based on the formal grammar and the AST will be the final output for this milestone. To help you with this construction, here are few nodes you should think about defining:
Statement nodes: Create a node for each type of statement in the grammar. Additionally, we recommend that you define a
Statement
interface that all statement nodes can inherit methods unique to statements.Expression nodes: Create a
Binary
,Unary
andField
nodes to handle binary expressions (i.e., integer and boolean), unary expressions, and field expressions (e.g.,mystruct.coords.x
). We also recommend that you define aExpression
interface that all expression nodes can inherit methods unique to expressions. Variables (i.e., identifiers) are also considered expressions; therefore, it should have its own type of expression node. DO NOT define a node for each type of term (e.g.,BoolTerm
andSimpleTerm
etc.). These all represent the forming of binary expressions so you can handle them easily by generating a binary expression node.Literal nodes: For each type of literal value (i.e., integer, boolean, etc.), define a node that represents that type of literal value. Since literals are expressions, they should also inherit from your
Expression
interface.Left hand-side value: Every assignment statement has a left-hand side (i.e., an l-value). You should define a node that can handle l-values that are just variables (e.g.,
x
andy
) and field l-values (e.g.,point.x
andpoint.y
).I/O and allocation nodes: Define nodes for reading (i.e.,
scan
) and printing (i.e.,print
) along with nodes to handle memory allocation/deallocation (i.e.,new
anddelete
).Finally you should define nodes for type declarations, functions, a program.
You may define a few more nodes then this but these suggestions should cover most node definitions you’ll need in your AST. Please refer back to the lecture slides and videos for tips and suggestions about the AST construction.
At this point, the default output of the compiler will print out the AST. We are not requiring a specific printing format. You are free to print the AST as you wish. However, you must define String()
methods for each AST node to print out its representation as string. If there is an error in the lexing or parsing of a program then the compiler must produce a formatted syntax error(s). The syntax error message must be formatted as follows
syntax error(LINE NUMBER): MESSAGE
where LINE NUMBER
is the line number, COLUMN NUMBER
is the column number where the error occurred. The MESSAGE
is a helpful error message. You can either collect all possible errors in the program or exit early if you find one syntax error (the latter is easier to implement).
Assume the following Golite program is defined in a file bad.golite
inside the benchmarks/bad/
directory:
1 func main() {
2
3 var a bool;
4
5 a = +;
6
7 printf("%d", a);
8
9 }
Sample Run using Go ($: is just mimicking the command line)
$: go run golite.go simple.golite
func main() {
var a int;
a = 3 + 4 + 5;
printf("%d", a);
}
$: go run golite.go bad.golite
syntax error(5,3): expected expression on right-hand side but found '+'
The message does not have to match the one above. You can just use the message you get back from ANTLR. Notice the default execution of the compiler at this point is just running through both the lexer then the parser. Additionally, the print out of the AST looks very similar to the original source code. You should try to do the same in your implementation; however, it does not to be exact but close to the original source code.
Grading and What to Submit¶
You must provide the following to get full credit for this milestone:
Submit a few Golite programs (e.g.,
simple1.golite
,simple2.golite
, etc.) that verifies your lexer and parser are running correctly.Make sure to explicitly fill out the
README
file as explained above.
The milestone is 6% of your grade and the exact weights for grading are:
6% credit: 75% and above of all tokens are being identified and parsed correctly.
5%-3% credit: 74%-50% of all tokens are being identified and parsed correctly.
3%-0% credit: No solution provided or less than 50% of the tokens in the language are being identified.
We will verify “correctness” by checking the output of your AST from parsing.
Submission¶
Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Milestone #3” assignment page via two ways,
Uploading from Github directly (recommended way): You can link your Github account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop window will appear. Click on “Github” and then “Connect to Github” to connect your Github account to Gradescope. Once you connect (you will only need to do this once), then you can select the repsotiory you wish to upload and the branch (which should always be “main” or “master”) for this course.
Uploading via a Zip file: You can also upload a zip file of the assignment directory.