Milestone #1: Front-End (Scanner & Parser)¶
Due: Friday February 3rd at 11:59pm
This milestone will require you to get started implementing the front-end for the compiler.
Working Individually or as a Pair¶
You are allowed to work alone or in teams of two for the compiler project. Before creating your repository via the invitation, you will first need to
Decide if you want to work alone or find a partner in the class.
Come up with a fun team name or if you are working alone a name for your compiler.
The first person (or yourself if you are working alone) to accept the Github classroom invite will enter in the team name. If working in a pair, the second group member will need to click on their team name in-order to join the repository.
Once you have done these two steps, then you can move on to Getting started section.
Getting started¶
A Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.
To actually get your private repository, you will need this invitation URL:
When you click on an invitation URL, you will have to complete the following steps:
See step 3 in “Working Individually or as a Pair” section for the first step.
- You now need to clone your repository (i.e., download it to your machine).
Make sure you’ve set up SSH access on your GitHub account.
For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs51300-win23/proj-TEAM_NAME-GITHUB-USERNAME.git.
If you do not know how to use
git clone
to clone your repository then follow this guide that Github provides: Cloning a Repository
If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.
Language Overview: GoLite¶
Please make sure you read over the document that describes the language we will be implementing this quarter: Language Overview
Compiler Structure¶
To help with code modularity and ease of grading, we will require a specific repository structure throughout the implementation of the compiler. We will provide specific names to directories that are required; however, you are able to include additional directories and files to help with your implementation. You are not allowed to use any external third-party libraries unless Professor Samuels specifically gives you permission to do so. You may only use the libraries provided in the standard library of your chosen language. If you think you need to use an external library then ask before using it.
The initial contents of the repository includes directories and files needed to help you get started implementing the front-end of the compiler. Lets review these directories now:
benchmarks - Throughout the implementation of the compiler, you can place sample Golite programs in this directory to help test the various phases. Professor Samuels during week 4 will provide additional benchmark tests to place inside this directory.
golite - The directory that includes the main driver for the compiler (i.e., the file that includes the
main
function or entry point code into the compiler).grammars - The directory that will contain the ANTLR grammar files and the full ANTLR runtime jar file (i.e.,
antlr-4.11.1-complete.jar
). If you are running ANTLR remotely then you will need to install Java and Java Runtime on your local machine: Install Java Runtime . The Java is already installed on the linux servers. You will also see a file calledgenerate.sh
. This is a bash script used to generate the lexer and parser. Right now, it’s preset to produce Go code but you can change theDlanguage=YOUR_LANGUAGE
flag to your specific language. When ran, thegenerate.sh
files will automatically place the generated lexer and parser into the parser and lexer directories.lexer - The directory contains the autogenerated lexer files. Additionally, you will need to write code to connect the autogenerated code into the your compiler implementation.
parser - The directory contains the autogenerated parser files. Additionally, you will need to write code to connect the autogenerated code into the your compiler implementation.
build.sh - You are required to write a bash script that fully builds the compiler into an executable. This script must also call any other scripts that generate/build any additional components of the compiler. If you using a language that does not build an executable (i.e., Python), then just generating/building all the required components is fine. You will provide instructions in the README file that explains how to run your compiler.
README - This document provide specific instructions to run your compiler on the CS Linux Servers. You must be detailed and explicit with these instructions. Please break these instructions down into specific steps (ie.. Step 1, 2, 3, etc.). Also provide instructions on how to run a a program and specify flags. Failing to provide a straight-forward way of executing your compiles will result in significant deductions to your score.
As we build each phase of the compiler, we will state the additional required directories. Always make sure that your compiler runs correctly on the CS Linux Servers. If it doesn’t run on those servers then you will receive an automatic zero for your completeness score for the project. We need to be able to grade them specifically on those machines. Again feel free to include additional directories/files as needed.
Compiler Implementation Languages¶
You are required to implement the entire compiler in one of the following languages:
Java
C++
Java (or dialect)
Python 3
Go
You are not allowed to implement the compiler in any other language. Professor Samuels recommends you implement the compiler in Go since he and TA will be able to help you faster. Implementing, the compiler in other languages may result in not fully be able to help quickly fix problems. Please keep this in mind that building and fixing the compiler will be your sole responsibility but the course staff will help in any way possible.
Milestone Requirement: Scanner & Parser¶
The main requirement for the first milestone is to get a working scanner and parser completed. You must use ANTLR (ANother Tool for Language Recognition) to build your scanner (i.e., lexer) and parser. The grammar files must be placed inside the grammars
directory. We already have created this directory for you and have placed the ANTLR jar file inside this directory.
Task 0: Learning ANTLR¶
Your first task is to understand ANTLR system by reading over it’s documentation:
You are required to read over all the sections under the Sections area, with the exception of these sections:
Unicode U+FFFF, U+10FFFF character streams
Parsing binary streams
Parser and lexer interpreters
Resources
Professor Samuels will shortly post a video explaining how you can implement the Cal language using ANTLR and connecting it to a compiler implementation. However, I want you to first read over these docs before watching the video so you understand better what’s happening in the video.
Task 1: Implement the Lexer using ANTLR¶
Assume the following Golite program is defined in a file simple.golite
inside the benchmarks/simple/
directory:
1 func main() {
2
3 var a int;
4
5 a = 3 + 4 + 5;
6
7 printf("%d", a);
8
9 }
For the lexer, the compiler must read in a Golite program and produce a token stream that will be provided to the parser. The default execution of the compiler at this point is to just run the lexer and parser. However, add a flag -lex
to command line arguments such that it will print out each token on a separate line for a given program. With the -lex
command the execution of the compiler stops at the end of lexing.
Note
Make sure to think about your function decomposition inside driver file because you will be adding more command line arguments to the compiler as the quarter progresses.
Sample Run using Go ($: is just mimicking the command line)
$: go run golite.go -lex simple.golite
Token.PACKAGE(1)
Token.IDENT(1, "MAIN")
Token.SEMICOLON(1)
Token.IMPORT(3)
Token.FMT(3)
Token.SEMICOLON(3)
Token.FUNC(5)
Token.IDENT(5, "MAIN")
Token.LPAREN(6)
... (I'm not printing them all out but you get the idea)
Your output does not have to look like the above output. We just want to see a series of tokens being printed out to verify that your scanner is working. Your implementation must go under the lexer
directory along with the generated lexer code. Please note you will need to write more code to incorporate the generated lexer into your compiler.
Task 2: Implement the Parser using ANTLR¶
Assume the following Golite program is defined in a file bad.golite
inside the benchmarks/bad/
directory:
1 func main() {
2
3 var a bool;
4
5 a = +;
6
7 printf("%d", a);
8
9 }
For the parser, the compiler must read in a Golite program and parse through the input source code. This will require that you connect your lexer code to your parser code. For this milestone, if the input source code parses correctly then nothing is printed to the console. However, you should start thinking about implementing your Abstract Syntax Tree because this will be the final output of the parser but it is not required to be implemented for this milestone. For now, the default execution of the compiler will produce nothing or produce a formatted syntax error(s). The syntax error message must be formatted as follows
syntax error(LINE NUMBER): MESSAGE
where LINE NUMBER
is the line number, COLUMN NUMBER
is the colunmn number where the error occurred. The MESSAGE
is a helpful error message. You can either collect all possible errors in the program or exit early if you find one syntax error (the latter is easier to implement).
Sample Run using Go ($: is just mimicking the command line)
$: go run golite.go simple.golite
$: go run golite.go bad.golite
syntax error(5,3): expected expression on right-hand side but found '+'
The message does not have to match the one above. You can just use the message you get back from ANTLR. Notice the default execution of the compiler at this point is just running through both the lexer then the parser.
Grading and What to Submit¶
You must provide the following to get full credit for this milestone:
Submit a few Golite programs (e.g.,
simple1.golite
,simple2.golite
, etc.) that verifies your lexer and parser are running correctly.Make sure to explicitly fill out the
README
file as explained above.
The milestone is 5% of your grade and the exact weights for grading are:
5% credit: 75% of all tokens are being identified and parsed correctly.
3% credit: 50% of all tokens are being identified and parsed correctly.
0% credit: No solution provided or less than 50% of the tokens in the language are being identified.
Submission¶
Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Milestone #1” assignment page via two ways,
Uploading from Github directly (recommended way): You can link your Github account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop window will appear. Click on “Github” and then “Connect to Github” to connect your Github account to Gradescope. Once you connect (you will only need to do this once), then you can select the repsotiory you wish to upload and the branch (which should always be “main” or “master”) for this course.
Uploading via a Zip file: You can also upload a zip file of the assignment directory.