Nullability is a feature that allows you to specify whether or not something can be null. This is useful because it allows the language to enforce that access to a field or method is via a non-null value. This should make NullPointerExceptions a thing of the past!
In fact, the only place in plinth where you might get a run-time error because of a null value is when you cast from a nullable type to a not-null type.
Let’s start by looking at some of the things we can and can’t do with nullable values (the `?:` and `?.` operators are based on those in languages like Groovy):
// casting between null and not-null values:
Foo foo = new Foo();
?Foo nullable = foo;
Foo notNull = cast<Foo> nullable; // throws a run time error if nullable is null

// accessing fields and methods:
stdout::println(nullable.someText); // error: cannot access a field on a nullable object
nullable.doSomething(); // error: cannot call a method on a nullable object

// the null coalescing operator (giving a default value):
?uint index = null;
uint notNullIndex = index ?: 0;

// the null traversal operator:
// (the result is always null if the left-hand-side is null)
?string str = null;
?string subString = str?.substring(0, 2); // method call (the arguments are only evaluated if str is not null)
?#[]ubyte bytes = str?.bytes; // field access
?{string -> boolean} equalsFunc = str?.equals; // method access
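For readers who don't know Groovy, here is a rough model of the semantics described in the comments above, written in Python rather than plinth. It is only a sketch: the helper names `coalesce`, `traverse` and `cast_not_null` are made up for illustration and are not part of plinth or of any library, and Python's `None` stands in for null.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def coalesce(value: Optional[T], default: T) -> T:
    """Rough model of `value ?: default`: the default is used only for None."""
    return default if value is None else value


def traverse(value: Optional[T], access: Callable[[T], R]) -> Optional[R]:
    """Rough model of `value?.member`: `access` never runs when value is None."""
    return None if value is None else access(value)


def cast_not_null(value: Optional[T]) -> T:
    """Rough model of `cast<T> value`: the one place a null error can surface."""
    if value is None:
        raise TypeError("cannot cast null to a not-null type")
    return value


calls = []  # records each time the method argument is evaluated


def start_index() -> int:
    calls.append("start_index")
    return 0


missing: Optional[str] = None
present: Optional[str] = "plinth"

index_default = coalesce(None, 0)  # null, so the default is used
zero_kept = coalesce(0, 7)         # 0 is a real value, not null, so it is kept

# The lambda body (including the call to start_index) only runs for `present`.
no_substring = traverse(missing, lambda s: s[start_index():2])
substring = traverse(present, lambda s: s[start_index():2])
```

Note that `coalesce` tests for null specifically, not for "falsy" values, which is why `zero_kept` stays at zero.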
Before I implemented nullability in Plinth, every type was not-null, including custom type definitions. The main thing you need to consider with not-null fields is that they have no default value; this makes plinth’s default behaviour very different from that of most major object-oriented languages.
Because some fields have no default value, they must be initialised before the end of the constructor. In fact, they must be initialised before the newly allocated object could possibly be accessed from outside the constructor. This means that, until all of the fields of `this` have been given values (or have default ones like zero or null), the value of `this` cannot escape the constructor. In plinth, making sure of this amounts to stopping people from using `this` (except for things like `this.field = value;`) and stopping people from calling non-static methods.
So far this is all fine, but what happens when we have two (or more) classes which need not-null references to each other? This problem is difficult to solve, for a number of reasons. I’ll use the following as a very simple example:
class A
{
  B b;
  this(B b) // <-- constructor
  {
    this.b = b;
  }
}
class B
{
  A a;
  this(A a)
  {
    this.a = a;
  }
}
Obviously, in this scenario, one of the constructors must always be run first. This seems like it makes it impossible to create either of them, but let’s suppose that we have a way of allocating an object without calling its constructor. When an object has just been allocated, it’s still ‘uninitialised’ until its constructor has been run. Now, if we could pass an uninitialised object into another constructor, we might be able to do something like this:
uninitialised A allocA = alloc A;
uninitialised B allocB = new B(allocA);
A a = new (allocA) A(allocB); // creates an A inside allocA, like C++'s placement-new operators
B b = allocB;
But now we have more problems. B’s constructor can’t let `this` escape until all of the fields are initialised, but now after it’s done `this.a = a;` it can do whatever it wants, including letting ‘this’ escape and trying to access data inside a (which still hasn’t been initialised yet). We need to mark that the constructor can take certain uninitialised values, and then make sure that it can’t let ‘this’ escape even after everything’s been initialised. We could do this with an ‘uninitialised’ modifier:
class B
{
  A a;
  uninitialised this(uninitialised A a)
  {
    this.a = a;
  }
}
Now, we know exactly what isn’t allowed to escape: ‘this’ and ‘a’ (we only allow uninitialised parameters in uninitialised constructors). Then, once we’ve created `allocB`, we know that it being initialised depends only on `allocA` being initialised, since it is the only uninitialised argument.
Next, we call the constructor on `allocA`, to produce `a` (A’s constructor will have to be modified the same way as B’s). We now also know that `allocA` being initialised depends only on `allocB` being initialised. We know the dependency cycle! All that we need to do is find the solution which makes the most things initialised (i.e. the greatest fixpoint of these simultaneous equations). Then we will know that both of them are initialised, thus allowing the assignments to `a` and `b`.
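To make the "greatest fixpoint" idea concrete, here is a small sketch in Python (not plinth, and not how the compiler does or will do it). It assumes each object's equation is "initialised if its constructor has run and everything it depends on is initialised". The function name `initialised_set` and the string labels for objects are invented for this illustration.

```python
def initialised_set(constructed, depends_on):
    """Greatest solution of:
    init(x) = (x in constructed) and all(init(d) for d in depends_on[x])
    """
    # Start optimistically: assume every known object is initialised...
    candidates = set(depends_on)
    changed = True
    while changed:
        changed = False
        # ...then repeatedly throw out anything whose equation does not hold.
        for obj in sorted(candidates):
            holds = obj in constructed and all(
                dep in candidates for dep in depends_on[obj]
            )
            if not holds:
                candidates.discard(obj)
                changed = True
    return candidates


# After `uninitialised B allocB = new B(allocA);`
# allocA is allocated but its constructor has not run yet.
after_line_2 = initialised_set(
    constructed={"allocB"},
    depends_on={"allocA": set(), "allocB": {"allocA"}},
)

# After `A a = new (allocA) A(allocB);`
# both constructors have run, and each object depends on the other.
after_line_3 = initialised_set(
    constructed={"allocA", "allocB"},
    depends_on={"allocA": {"allocB"}, "allocB": {"allocA"}},
)
```

Here `after_line_2` is empty and `after_line_3` contains both objects. Starting from "everything is initialised" and removing counterexamples is what makes this the greatest fixpoint. Starting from "nothing is initialised" and only adding objects that can be proven initialised would never accept the cycle, since each object needs the other to be accepted first.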
This scheme is based loosely on some discussions with James Elford and on papers he recommended to me, in particular the Freedom Before Commitment paper[1]. However, it is not yet anywhere near implemented in plinth, for several reasons:
- I’ve just come up with it recently, and I have lots of other things before it on my to-do list (interfaces! properties! exceptions!).
- Processing the type system for uninitialised values would be extremely complicated, and would involve each type having a list of things which its initialisation state is conditional upon. I’d like to fill in the rest of the type system before adding this complexity.
- It can be easily added later. The only things which would not be backwards compatible with the current language are the keywords.
- There are workarounds which can be used in the meantime (see below).
That said, this cyclic initialisation scheme is something I would very much like to implement eventually, once all of the problems with uninitialised variables have been well-thought-out.
The slightly ugly workaround which can be used until this scheme is implemented is to have a nullable field and a property to access it. This will give a run-time error whenever a field is accessed on a not-properly-initialised object:
class A
{
  ?B realB;
  property B b
    getter { return cast<B> realB; }
    setter { realB = value; };
  this()
  {
  }
  void setB(B b)
  {
    // this method is only necessary if we make property's setter private
    this.b = b; // calls property setter
  }
}
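The same pattern can be sketched in Python for anyone who wants to try it out. This is an analogy rather than a translation: `_real_b` plays the role of the nullable `realB` field, the property getter plays the role of `cast<B> realB`, and the choice of `RuntimeError` is my own assumption about what a run-time error might look like.

```python
from typing import Optional


class A:
    def __init__(self) -> None:
        # Nullable backing field, like `?B realB` in the plinth version.
        self._real_b: Optional["B"] = None

    @property
    def b(self) -> "B":
        # Equivalent of `cast<B> realB`: fails at run time if still null.
        if self._real_b is None:
            raise RuntimeError("b was read before it was set")
        return self._real_b

    @b.setter
    def b(self, value: "B") -> None:
        self._real_b = value


class B:
    def __init__(self, a: A) -> None:
        self.a = a  # B can take a real A, since A no longer needs a B up front


# Build the cycle: A first (with no B), then B, then patch A.
a = A()
try:
    a.b
    read_before_set_failed = False
except RuntimeError:
    read_before_set_failed = True

b = B(a)
a.b = b  # calls the property setter
```

The cost of the workaround is visible in the `try` block: between `A()` and `a.b = b`, nothing but a run-time check stops code from reading `a.b` too early.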
Next week’s post will talk about binary compatibility, and how objects in plinth are represented in LLVM bitcode.
References:
[1] Summers, Alexander J., and Peter Müller. “Freedom Before Commitment.” (2010).
You gave this code sample in the post:
uninitialised A allocA = alloc A;
uninitialised B allocB = new B(allocA);
A a = new (allocA) A(allocB); // creates an A inside allocA, like C++'s placement-new operators
B b = allocB;
It looks like, at line 3, `allocB` is an as-yet uninitialized reference (it stores a reference to `allocA`, which is in turn uninitialized). But what if we never run line 4? Can `a` escape? Then we would have a reference to an `A` which holds a reference to `allocB`, which has uninitialized type. And what if we re-assigned the field of `allocB` before we ran the “initialization” step of `allocA`, as in:
uninitialised A allocA1 = alloc A;
uninitialised A allocA2 = alloc A;
uninitialised B allocB = new B(allocA1);
allocB.a = allocA2; // this should be illegal
How about the following?
class C
{
  D d;
  E e;
  uninitialised this(uninitialised E, uninitialised D) { ... }
}
class D
{
  C c;
}
class E
{
  C c;
}
uninitialised E e' = alloc E;
uninitialised D d' = alloc D;
uninitialised C c' = new C(e', d');
E e = new (e') E(c');
// What's the type of e.c.d? It should be "uninitialised D", but nothing in the type of "e" will give this away.
I think the problem you’ve got here is that the initialisation state of an object is inherently dependent on that of others (its fields). In the code from the post, you need some way of saying that `a` from line 3 still isn’t initialised all the way up until after line 4.
For the benefit of anyone who hasn’t read Freedom Before Commitment, it actually sidesteps the problem I described, rather vaguely, above. Rather than declaring an “uninitialised” version of the variable and calling some `init` method later, it types a new reference as `uninitialised` until all the objects passed into its constructor have been initialised. This is what we’re seeing with `allocB` becoming assignable to a reference of type `B` in the example Anthony gives. The problem is that we have to be quite careful; we still have to treat the (now “fully” initialised) `a` as `uninitialised` until `allocB` is also fully initialised. So in the last example I gave, we’d treat `e` as uninitialised, even though we’ve passed through its initialisation routine, until `c'` becomes fully initialised, which will be after we call:
D d = new (d') D(c')
It’s worth reading Summers & Müller, because I can’t do the subtleties of the problem justice in the comments section of a blog.
Fähndrich et al. give a different model of initialisation which achieves more or less the same effect here; by making the period for which an object must be considered uninitialised more explicit, they make the problem clearer.
Thanks for the comments!
The way that I plan to solve the problems you’ve mentioned is very similar to what they do in the Freedom Before Commitment paper. Whenever you call a constructor, any uninitialised arguments to that constructor are taken as initialisation-dependencies for the result. Similarly, whenever you assign an uninitialised value to a field, that value becomes an initialisation-dependency.
Then, when you try to cast one of these values to initialised, you first have to make sure all of its dependencies would also be initialised if it was.
So in my original example:
uninitialised A allocA = alloc A;
uninitialised B allocB = new B(allocA);
A a = new (allocA) A(allocB); // creates an A inside allocA, like C++'s placement-new operators
B b = allocB;
During line 3, when the constructor is run, both allocA and allocB are immediately initialised, and from that point on they can be assigned to normal variables. Line 4 doesn’t really need to be there, it’s just to demonstrate that both of them are now initialised.
Now your example:
uninitialised A allocA1 = alloc A;
uninitialised A allocA2 = alloc A;
uninitialised B allocB = new B(allocA1);
allocB.a = allocA2; // this should be illegal
This is actually fine. After line 3, allocB depends only on allocA1; but after line 4, it depends on *both* allocA1 and allocA2, since allocA1 was passed to the constructor and allocA2 was assigned to a field. Note that we have absolutely no idea what happens in the constructor, so we can’t discount allocA1 as a dependency.
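Here is a rough Python model of that bookkeeping, in case it helps. It is a sketch under my own assumptions, not the planned implementation: the `InitTracker` class and its method names are invented, objects are identified by plain strings, and "can be cast to initialised" is modelled as "every object reachable through the dependencies has had its constructor run".

```python
class InitTracker:
    """Toy model of initialisation-dependencies (hypothetical, for illustration)."""

    def __init__(self):
        self.constructed = set()  # objects whose constructor has run
        self.deps = {}            # object -> objects its initialisation depends on

    def alloc(self, name):
        # Allocated, but the constructor has not run yet.
        self.deps[name] = set()

    def construct(self, name, *uninitialised_args):
        # Every uninitialised constructor argument becomes a dependency,
        # because we cannot see what the constructor did with it.
        self.deps.setdefault(name, set()).update(uninitialised_args)
        self.constructed.add(name)

    def assign_field(self, name, uninitialised_value):
        # Storing an uninitialised value in a field adds another dependency.
        self.deps[name].add(uninitialised_value)

    def can_cast_to_initialised(self, name):
        # Walk everything reachable from `name`; all of it must be constructed.
        seen, stack = set(), [name]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current not in self.constructed:
                return False
            stack.extend(self.deps[current])
        return True


tracker = InitTracker()
tracker.alloc("allocA1")
tracker.alloc("allocA2")
tracker.construct("allocB", "allocA1")       # line 3
tracker.assign_field("allocB", "allocA2")    # line 4: allowed, adds a dependency
deps_after_line_4 = set(tracker.deps["allocB"])
castable_after_line_4 = tracker.can_cast_to_initialised("allocB")

# Hypothetical continuation: run A's constructor on both allocations.
tracker.construct("allocA1", "allocB")
castable_with_a2_pending = tracker.can_cast_to_initialised("allocB")
tracker.construct("allocA2", "allocB")
castable_at_the_end = tracker.can_cast_to_initialised("allocB")
```

In this model `allocB` only becomes castable once both `allocA1` and `allocA2` have been constructed, which matches the point that the earlier constructor argument cannot be discounted.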
This is a really interesting point, and it crosses over with binary compatibility a bit. Since the constructor could be in an external library that might be upgraded, we can’t even be sure that we know the full list of fields in the object: If a class is sealed (so it can’t have any subclasses), then it is binary compatible to add new fields to the end of its definition – I’ll cover this properly in next week’s post.
Incidentally, this is why things had to be done slightly differently from the system in the Freedom Before Commitment paper, because there you have to know the full list of fields to be initialised at the point where you call the constructor. In this system, you can have objects whose constructors haven’t run yet, so you can always pass two objects into each other’s constructors.
As you said: to anyone who hasn’t, it’s worth reading the Design section in Summers & Müller, because I’ve probably glossed over a fair amount of detail here.