A Snap Parser¶
Our first goal is to be able to successfully parse our goal into an
AST. An AST is a
programmatic representation of the structure of our program (which you
may have encountered via the ast
module in Python).
Create a test file named test_example.py
in your Snap package’s test
folder, and create a test case within it. The test case will guide us along the
way towards at least being able to parse our example program. We’ll handle
interpreting it soon.
We wish to parse the example program we had earlier:
foo = 2 + 7
bar = "cat" + "dog"
print(foo)
print(bar)
into an AST. There are numerous ways to represent the above source code as an AST. Feel free to be a bit creative, but as a sampling, we will represent the first sample of this piece of code as the following (RPython) AST:
Assign("foo", BinaryOperation("+", Integer(2), Integer(7)))
See if you can write a (failing) test case that asserts that the result of parsing the above source should produce that AST. You’ll need to create a few classes corresponding to the various node types in your AST.
Note
Hooray! Tests don’t need to be valid RPython, they won’t run within your interpreter, so you have freedom to write them however you’d like.
We’ll follow roughly typical TDD (if you’re unfamiliar, don’t worry too much about it), so create the function that will parse your AST and make your example test pass by simply giving it a fake implementation that simply returns the (overall) AST we want for our example program.
We’ll now build out our parser in small chunks until we successfully can parse the above program.
RPly and an Actual Start to Our Parser¶
We need a way to actually parse our source code into the AST we’ve
just designed. There are a number of paths forward. The most obvious
is to implement a parser ourselves “by hand”, but there are a number
of tools that exist to make this easier. The RPython standard library
has the rpython.rlib.parsing
module, but we’ll use the RPly library, which has
a slightly nicer API.
There are only two pages of documentation, so have a quick read through them to see how you will likely want to proceed. You’ll likely find consulting the CyCy parser helpful minutes as well but try and wait until you’re off the ground.
You may want to consult additional resources on EBNF notation if this is your first exposure to it. For our purposes superficial knowledge should suffice until you want to extend the parser (later) as well, so it can also wait.
Break down the things you will need to parse into smaller chunks, write unit tests for the (small AST) that will result from parsing them, and then implement enough of a parser to make your tests pass, extending your implementation at each step. For example, you might want to take the following path:
- Parse integer literals
- Parse sums of integers
- Parse assignment of an integer
- Parse the assignment of a sum
- ...
slowly building up your parser via your unit tests.
Translation¶
Let’s not forget about translation! Generally speaking, you should not necessarily attempt to translate your code not after change or commit (it’s tedious and long as you’ll soon notice). Rely on your tests to check that your code works, then simply periodically attempt to translate, and fix any invalid RPython you’ve introduced since the last time.
Let’s set up enough to be able to translate our project and check if it’s valid RPython. Recall that translation only operates on code paths that are actually reachable from your entry point, which we’re about to define, and only on objects after import-time, where you’re able to fully utilize Python.
Note
If you forgot about the type unification requirement of RPython, you’ll likely see your first translation error with your AST nodes!
These errors are cryptic, but if you stare at them and trust them for long enough, they often are clear enough to point you in the right direction.
See if you can solve your error.
More About non-RPython¶
With the above note again in mind about translation operating only on code paths your interpreter will traverse (and not ones used only during tests or debugging), it’s useful to set yourself up for easy debugging of your parser.
Give your AST nodes a helpful __repr__
and see what you can do in
regular Python to help yourself as you go along.
You’ve likely already implemented __eq__
and __ne__
methods on
your nodes to get your tests to pass. These too do not need to be valid
RPython since they likely will only be used in tests.