Language Update and Syntax Fragments

I’ve been working on my language again recently. Most importantly, I’ve implemented an LALR(1) parser generator. Now that it’s working, I can move on to the grammar and the AST (Abstract Syntax Tree).

So far, I have the start of a parser that takes a list of tokens (the tokenizer currently just returns a constant list of tokens) and parses it directly into an AST. Currently, it can generate ASTs for the top level of the grammar, as far as the following:

CompilationUnit = epsilon
                | CompilationUnit PackageDeclaration
                | CompilationUnit ImportDeclaration
                | CompilationUnit TypeDefinition

PackageDeclaration = package QName ;

ImportDeclaration = import QName ;
                  | import QName . * ;
                  | import static QName ;
                  | import static QName . * ;

TypeDefinition = ClassDefinition
               | InterfaceDefinition

AccessSpecifier = public | private | package | protected
                | package protected | protected package
                | epsilon

Modifiers = Modifiers Modifier | epsilon
Modifier = static | abstract | final | immutable | synchronized
         | NativeSpecification | transient | volatile
NativeSpecification = native ( StringLiteral )

ClassDefinition     = AccessSpecifier Modifiers class Name ClassExtendsClause ImplementsClause { MemberList }
InterfaceDefinition = AccessSpecifier Modifiers interface Name InterfaceExtendsClause { MemberList }

ClassExtendsClause     = extends PointerType | epsilon
ImplementsClause       = implements InterfaceList | epsilon
InterfaceExtendsClause = extends InterfaceList | epsilon
InterfaceList          = InterfaceList , PointerType | PointerType

QName = Name | QName . Name

That’s just a fairly basic representation of the top level of the language. It’s quite similar to Java in a few respects, but it also has some noticeable differences.
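
To make the tokens-to-AST step a bit more concrete, here is a rough sketch in Java of parsing just the PackageDeclaration, ImportDeclaration and QName rules from a constant token list. It is a small hand-written parser, not the output of an LALR(1) generator, and every name in it (TopLevelSketch, the record types, peekIs and so on) is made up for illustration rather than taken from the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every class and method name here is illustrative.
final class TopLevelSketch {
    // AST nodes for the part of the grammar covered by this sketch.
    record QName(List<String> names) {}
    interface Declaration {}
    record PackageDeclaration(QName name) implements Declaration {}
    record ImportDeclaration(QName name, boolean isStatic, boolean wildcard) implements Declaration {}

    private final List<String> tokens;
    private int pos = 0;

    TopLevelSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // CompilationUnit = epsilon | CompilationUnit PackageDeclaration | CompilationUnit ImportDeclaration
    List<Declaration> parseCompilationUnit() {
        List<Declaration> declarations = new ArrayList<>();
        while (pos < tokens.size()) {
            String keyword = next();
            if (keyword.equals("package")) {
                QName name = parseQName();
                expect(";");
                declarations.add(new PackageDeclaration(name));
            } else if (keyword.equals("import")) {
                declarations.add(parseImport());
            } else {
                throw new IllegalStateException("unexpected token: " + keyword);
            }
        }
        return declarations;
    }

    // Covers all four ImportDeclaration alternatives: optional "static", optional ". *".
    private ImportDeclaration parseImport() {
        boolean isStatic = peekIs(0, "static");
        if (isStatic) pos++;
        QName name = parseQName();
        boolean wildcard = peekIs(0, ".");
        if (wildcard) {
            pos++;
            expect("*");
        }
        expect(";");
        return new ImportDeclaration(name, isStatic, wildcard);
    }

    // QName = Name | QName . Name (any token is accepted as a Name in this sketch)
    private QName parseQName() {
        List<String> names = new ArrayList<>();
        names.add(next());
        // Consume ". Name" pairs, but leave ". *" for parseImport.
        while (peekIs(0, ".") && !peekIs(1, "*")) {
            pos++;
            names.add(next());
        }
        return new QName(names);
    }

    private boolean peekIs(int offset, String expected) {
        return pos + offset < tokens.size() && tokens.get(pos + offset).equals(expected);
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String actual = next();
        if (!actual.equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + actual);
        }
    }
}
```

Feeding it the tokens for "package a . b ;" followed by "import static c . * ;" should give one PackageDeclaration and one static wildcard ImportDeclaration. The left-recursive rules (CompilationUnit, QName) turn into loops here, which is the usual trade when writing this by hand instead of generating it.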

Another thing I’ve been doing is thinking about how certain other parts of the language will work. You’ll notice that there are no generics/templates in the above grammar – this is because I haven’t even fully decided on the syntax or the semantics yet. So here are a few thoughts about various things:

Closure Types

{A -> B} f;
{A, B -> void} g;
{ -> void} run;
{A, B, C -> D, E} h;
{int, String -> String, int} swap;
{File -> String throws IOException} read;
{File, {String -> String[]} -> String[] throws IOException} splitFile;

This is currently my favourite choice for the syntax of representing the type of a closure (or any method pointer). Closures can handle exceptions, which is important if they are to be assigned from methods. They also support tuples for both parameters and results. This is important, as all function arguments and results will be tuples in this language. This allows any set of values to be passed in or out of a function.
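
For comparison, here is roughly what two of those closure types look like when approximated in Java, which has neither tuples nor a function type that carries its exceptions. This is only a sketch: Pair and Closure are hypothetical helper types invented for this example, and READ is a stand-in that does no real file I/O.

```java
import java.io.IOException;

// Hypothetical sketch: Pair and Closure are illustrative stand-ins, not part of any library.
final class ClosureTypeSketch {
    // Stand-in for a two-element tuple.
    record Pair<A, B>(A first, B second) {}

    // Roughly {A -> R throws E}: a function type that carries its checked exception.
    @FunctionalInterface
    interface Closure<A, R, E extends Exception> {
        R call(A argument) throws E;
    }

    // Roughly {int, String -> String, int}; RuntimeException means "throws nothing checked".
    static final Closure<Pair<Integer, String>, Pair<String, Integer>, RuntimeException> SWAP =
            p -> new Pair<>(p.second(), p.first());

    // Roughly {String -> String throws IOException}; a made-up example that fakes the read.
    static final Closure<String, String, IOException> READ = path -> {
        if (path.isEmpty()) throw new IOException("empty path");
        return "contents of " + path;
    };
}
```

Calling SWAP.call(new Pair<>(1, "a")) gives back the pair ("a", 1), while any call to READ has to handle IOException. The boxing into Pair and the extra type parameter for the exception are exactly the noise that the {A, B -> C, D throws E} syntax is meant to avoid.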

They do not, however, support default arguments. Default arguments are really just going to be syntactic sugar applied when calling certain types of functions, and you cannot find the values of the default arguments without also being able to see which function is being called. This restriction will also affect inheritance of functions with default arguments.
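
One way to picture this is a sketch in Java, where a default argument is emulated by an extra overload that fills in the missing value. The method repeat, its default of 2, and viaClosure are all hypothetical names chosen for the example.

```java
import java.util.function.BiFunction;

// Hypothetical sketch: repeat, its default value and viaClosure are illustrative only.
final class DefaultArgumentSketch {
    // The "real" function, taking every argument explicitly.
    static String repeat(String text, int times) {
        return text.repeat(times);
    }

    // The sugar: a call that omits `times` is rewritten to pass the default value 2.
    static String repeat(String text) {
        return repeat(text, 2);
    }

    // Only the function type is visible here, not which function it came from,
    // so there is no default to look up and both arguments must be supplied.
    static String viaClosure(BiFunction<String, Integer, String> f) {
        return f.apply("ab", 3);
    }
}
```

A direct call such as repeat("ab") can use the default because the compiler can see the declaration. Inside viaClosure, the value passed in might be any two-argument function, so the default is simply not available.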

Closure Creation

Closures can be created in two different ways. The first, assigning from a method, is trivially simple:

(String, int) swap(int x, String y)
{
   return y, x;
}
...in some other piece of code...
{int, String -> String, int} swapValues = swap; // if there are multiple definitions of swap(), this infers the correct one
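
The overload inference mentioned in that comment works much like method references in Java, where the declared type of the variable picks the matching overload. A minimal sketch, with Pair as a hypothetical stand-in for a tuple and a second swap overload added purely to show the selection:

```java
import java.util.function.BiFunction;

// Hypothetical sketch: Pair stands in for the built-in tuples described in this post.
final class MethodAssignmentSketch {
    record Pair<A, B>(A first, B second) {}

    // Two overloads of swap that differ only in parameter order.
    static Pair<String, Integer> swap(int x, String y) {
        return new Pair<>(y, x);
    }

    static Pair<Integer, String> swap(String x, int y) {
        return new Pair<>(y, x);
    }

    // The declared type of the variable selects swap(int, String).
    static final BiFunction<Integer, String, Pair<String, Integer>> SWAP_VALUES =
            MethodAssignmentSketch::swap;
}
```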

The other way, creating an anonymous closure, is slightly more involved:

{int, String -> String, int} swapClosure = (String, int) closure(int a, String b)
{
   return b, a;
}; // the semicolon is required as the closure creation is part of the expression, not its own statement

This creates an anonymous closure using approximately the same syntax as a method definition. The differences are that access specifiers and modifiers are not necessary, and that the anonymous closure has no name. It uses the ‘closure’ keyword in place of one, because a closure has no use for a name before it is assigned to a variable.
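
The closest existing analogue I know of is a Java lambda with a block body, sketched below. Pair is again a hypothetical tuple stand-in, and demo exists only so the example can be run.

```java
import java.util.function.BiFunction;

// Hypothetical sketch: Pair and demo are illustrative names.
final class AnonymousClosureSketch {
    record Pair<A, B>(A first, B second) {}

    static Pair<String, Integer> demo() {
        // The lambda has no name of its own; it is only reachable through the variable.
        BiFunction<Integer, String, Pair<String, Integer>> swapClosure = (a, b) -> {
            return new Pair<>(b, a);
        }; // the semicolon ends the assignment statement, not the lambda body
        return swapClosure.apply(1, "one");
    }
}
```

The main difference is that the lambda takes its parameter and result types from the variable it is assigned to, whereas the closure syntax above states them explicitly.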

Templates/Generics

Currently, I’m not sure how this part of the type system should be implemented. There are, however, a few constraints on them:

A type parameter must be able to represent:
- A single class/interface type
- A primitive type (int, long, double, boolean etc.)
- A tuple of multiple different types. (facilitates maps where the key/value is a tuple, among other things)
- A closure (method pointer)
- Nothing at all (like the Void (with a capital V) type in Java)
It must also be able to support some restrictions on type parameters, such as:
- X is a type which extends/implements some base class/interface Y
- X is a superclass of some other type Y
It must also support wildcards, so that, for example, a method can take a Set<? extends SetItem>.
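
The restrictions and wildcards listed above already exist in Java's generics, so a short Java sketch shows what is being asked for. SetItem comes from the example above; Coin, heaviest, totalWeight and addCoins are hypothetical names invented here.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Coin and the three methods are illustrative only.
final class BoundsSketch {
    interface SetItem {
        int weight();
    }

    record Coin(int weight) implements SetItem {}

    // "X extends/implements Y": an upper bound on a type parameter.
    static <X extends SetItem> X heaviest(List<X> items) {
        X best = items.get(0); // throws if the list is empty
        for (X item : items) {
            if (item.weight() > best.weight()) best = item;
        }
        return best;
    }

    // Wildcard: accepts a Set of any element type that extends SetItem.
    static int totalWeight(Set<? extends SetItem> items) {
        int total = 0;
        for (SetItem item : items) total += item.weight();
        return total;
    }

    // "X is a superclass of Y": Java only allows this lower bound on wildcards.
    static void addCoins(List<? super Coin> sink) {
        sink.add(new Coin(1));
    }
}
```

Note that none of this covers primitives, tuples or closures as type arguments, which is where the constraints above go beyond what Java offers.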

Also, it would be much nicer if it didn’t use type erasure, so that the type information is available at runtime, and so that new instances of a type argument and arrays of instances of a type argument can be created.
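
To show what erasure costs in practice, here is a sketch of the workaround Java code needs: since `new T()` and `new T[n]` do not compile there, the caller has to hand in factories instead. The class ErasureSketch and the method filled are hypothetical names for this example.

```java
import java.util.function.IntFunction;
import java.util.function.Supplier;

// Hypothetical sketch: ErasureSketch and filled are illustrative names.
final class ErasureSketch {
    // With erasure, T is unknown at runtime, so the caller supplies
    // one factory for instances and one for arrays.
    static <T> T[] filled(int length, Supplier<T> newInstance, IntFunction<T[]> newArray) {
        T[] result = newArray.apply(length);
        for (int i = 0; i < length; i++) {
            result[i] = newInstance.get();
        }
        return result;
    }
}
```

A call looks like filled(3, StringBuilder::new, StringBuilder[]::new). With type information kept at runtime, both extra parameters could disappear and the method could create the instances and the array itself.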

It is likely that there will be problems with some of these constraints (there will have to be some boxing of tuples for generic function calls), but having such an expressive type system would be a great advantage in a new language.

If anyone’s reading this and has a suggestion on how to do any of this better (or how to implement templates/generics at all) please don’t hesitate to comment below or email me.

Anthony's Blog
