My principles while writing the grammar for this language have been:
- Do not change the syntax to suit the grammar, unless the proposed new syntax makes more sense than the original.
- Do as much work as possible during the parsing phase: do not fix a parsing problem by adding rules which add more work to the semantic analysis phase.
The result of this should be a cleaner, easier to use language. However it can cause problems while writing the grammar, usually in the form of shift-reduce conflicts. Some of the conflicts I’ve encountered are below.
Current problem
My current status in the language is that I’ve just started on statement parsing (having finished expressions). However, there’s one problem which looks like it’s very difficult to solve nicely while leaving the syntax unchanged.
The problem is that the start of a tuple type can look exactly like the start of a parenthesised expression. To illustrate the problem, here is an example of what a statement that starts with tuple type looks like:
(Comparable<String>, int, String) x, y, z = foo();
and here is an example of what a statement which starts with a parenthesised expression can look like:
(a < b ? foo : bar).test();
You can quite easily see that both of these start with a left parenthesis, followed by a Name, followed by a left angle bracket. Also, they are both statements, and are allowed in exactly the same places in a program.
The problem here is what the parser does once it gets to the left angle bracket. It can either shift the Name (which has now been reduced to a QName, or qualified name) onto the parse stack in anticipation of the first scenario (a list of type parameters), or it can reduce it into a Field Access Expression, in anticipation of a comparison with some other expression. This is a shift-reduce conflict, and makes it impossible to parse the language without rewriting part of the grammar. In this case, it will be difficult to rewrite the grammar because it spans so much of the Expression and Type hierarchies.
Other (solved) problems
I have encountered a few other problems while writing the grammar for this language. Here are some of them.
Comparison vs Shift Expression
If “>>” is actually two right angle bracket tokens, then it becomes hard to disambiguate between a shift expression and relational (comparison) expression; the following statements are hard to decide between:
int a = b >> c; int a = b > c ? 8 : 16;
In the case of finding something followed by a right angle bracket, the parser can either shift and expect another angle bracket for a shift expression, or it can reduce and expect a relational expression. The easy way to fix this is to make “>>” into a single token. This, however, causes another problem. Type parameters can easily end in “>>”, for example:
Iterator<Comparable<String>> iterator = blah();
To fix this, a set of rules for type parameters had to be devised so that the programmer doesn’t have to add a space between the two angle brackets. It involves rules for TypeParameterListRAngle, which is a type parameter list with a single trailing right angle bracket. The solution requires the rules for both PointerType and Type to have variants which end in right angle brackets, along with TypeParameter, WildcardTypeParameter, TypeArgument, and others (the grammar is at the end of this post).
Type vs Expression
This problem was to do with closure creation. The original syntax for closure creation was:
{int -> String throws Exception} intToString = String closure(int x) throws Exception { throw new NotImplementedYetException(); }
But this caused the same conflict as the one I am currently working on. When this happened, I didn’t realize that there were other situations where Type and Expression could both occur in the same place. I now realize that the problem I am currently working on is a more general version of this one, and once it is fixed, the old syntax for closures would work just as well.
However, because I didn’t realize at the time, I came up with this new syntax for closures:
{int -> String throws Exception} intToString = closure(int x -> String) throws Exception { return "" + x; }
The only reason I switched to this syntax instead of solving the problem was that I thought that, by using the arrow to indicate the return type list, this was more in keeping with the closure’s type signature.
Static Initializer vs Field/Method definition
Static initializers start with the same thing as a field/method definition, a member header. This is basically a combination of an optional access specifier and an optional list of modifiers. The reason for this is to avoid shift-reduce conflicts between the static keyword and the member header on other members. A bad member header for a static initializer (anything but a single “static” modifier) is detected and causes an error during parsing.
The difference between static initializers and fields/methods starts immediately after a member header. A static initializer starts with a block, but a field/method starts with a type, which could be a closure type. Since blocks and closure types both start with open braces, the start of a statement could also be the start of a list of types. This can again cause the Type vs Expression problem, but once that is solved a comma could indicate an assignee list as part of an assignment statement, or a type list as part of a closure type. This may be relatively easily solved by using a right recursive list for both types and assignees, but there is an alternative syntax which would make things slightly easier for the parser:
class Foo { static Foo { String a, b = "hello", "world!"; } }
My argument for this syntax is that it makes it look slightly more like a constructor, which also does initialization. However, it does necessitate more typing (when you actually use a static initializer). So although this has currently been solved by changing the syntax, I would be happy to change it back again if anyone presents a reasonable argument for the old (or indeed any other!) syntax.
Grammar
Here is the current Grammar for the language. It still has conflicts and there are some things haven’t been implemented yet, but feel free to have a look anyway. When I have finished the grammar I will likely attach the final version to a blog post.