Last week, I implemented the instanceof operator. Doing so made me think in much more detail about how casting and instanceof should work, since there are several ways they could.
The Unified Type System
A unified type system is one where every type can be cast to a single all-encompassing super-type. In Plinth, that type is ‘object’, or, more precisely, ‘?#object’, to account for nullability (?) and immutability (#).
When you cast, say, a double to an object, you are implicitly performing a heap allocation so that the double can be stored on the heap inside something that looks like an object. In fact, it is just an object with a double tacked onto the end, whose virtual functions are replaced by ones that extract that double and call the real functions. The same system is used to cast tuples, compounds, functions, and all of the other primitives to objects.
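To make the boxing idea concrete, here is a minimal Python model of it. It assumes a simplified layout: a box holding a stand-in for the RTTI pointer, a table of virtual functions, and the wrapped value appended at the end. The names Box, box_double, call_virtual, and the "toString" entry are hypothetical. Plinth itself does this in generated LLVM code, not in anything like this.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical model of a boxed primitive: an "object" header
# (RTTI + virtual function table) with the raw value tacked onto the end.
@dataclass
class Box:
    rtti: str                                  # stands in for a pointer to an RTTI block
    vft: Dict[str, Callable[["Box"], object]]  # virtual function table
    value: object                              # the primitive stored after the header

def double_to_string(d: float) -> str:
    # The "real" function, which works on an unboxed double.
    return repr(d)

def boxed_double_to_string(box: Box) -> str:
    # Replacement virtual function: extract the double, then forward to the real one.
    return double_to_string(box.value)

def box_double(d: float) -> Box:
    # Casting double -> object allocates a fresh box (the heap allocation).
    return Box(rtti="double", vft={"toString": boxed_double_to_string}, value=d)

def call_virtual(obj: Box, name: str):
    # Callers only see an object, so they go through its VFT.
    return obj.vft[name](obj)

obj = box_double(2.5)
print(call_virtual(obj, "toString"))  # 2.5
```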
On the other hand, arrays and classes already look enough like objects to be cast to them directly (using the same pointer).
Because we might want to use instanceof on an object, or to check whether to throw a CastError before actually performing a cast, every object stores a pointer to a block of run-time type information (RTTI). This RTTI records the sort of type (e.g. ‘primitive’ or ‘array’) and the properties of the individual type (e.g. which kind of primitive it is, or an array’s base type). When we cast a double to an object, that object gets an RTTI block for double.
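An RTTI block can be pictured as a small tagged record. Which record is used encodes the sort of type, and its fields hold that sort's properties. This is a hedged Python sketch. The class names and fields (PrimitiveRTTI, ArrayRTTI, element_nullable) are assumptions for illustration, not Plinth's actual RTTI layout.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical RTTI records: the class used encodes the "sort" of type,
# and its fields hold the properties specific to that sort.
@dataclass(frozen=True)
class PrimitiveRTTI:
    kind: str                       # e.g. "double", "int", "boolean"

@dataclass(frozen=True)
class ArrayRTTI:
    base_type: "RTTI"               # element type, itself described by RTTI
    element_nullable: bool = False  # []?T rather than []T

RTTI = Union[PrimitiveRTTI, ArrayRTTI]

def describe(rtti: RTTI) -> str:
    # Render an RTTI block using Plinth-style type syntax.
    if isinstance(rtti, PrimitiveRTTI):
        return rtti.kind
    prefix = "?" if rtti.element_nullable else ""
    return "[]" + prefix + describe(rtti.base_type)

print(describe(PrimitiveRTTI("double")))                                     # double
print(describe(ArrayRTTI(ArrayRTTI(PrimitiveRTTI("double")))))               # [][]double
print(describe(ArrayRTTI(PrimitiveRTTI("boolean"), element_nullable=True)))  # []?boolean
```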
Casting
Different casts are performed in very different ways. A cast from uint to long simply zero-extends the binary value from 32 bits to 64 bits, whereas a cast from a class Foo to object is just a matter of reinterpreting the same pointer as a different type.
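The primitive conversions are easiest to see on raw bit patterns. This Python sketch models 32- and 64-bit values as non-negative ints holding their two's-complement bits. The helper names are hypothetical. For comparison it also includes sign-extension (as used for signed sources) and truncation.

```python
# Model fixed-width integers as Python ints holding the raw bit pattern.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def zero_extend_32_to_64(bits: int) -> int:
    # uint -> long: the upper 32 bits are filled with zeros,
    # so the numeric value is unchanged.
    return bits & MASK32

def sign_extend_32_to_64(bits: int) -> int:
    # int -> long: the upper 32 bits become copies of bit 31.
    bits &= MASK32
    if bits & (1 << 31):
        bits |= MASK64 ^ MASK32
    return bits

def truncate_64_to_32(bits: int) -> int:
    # long -> int: keep only the low 32 bits.
    return bits & MASK32

all_ones_32 = 0xFFFFFFFF                         # uint 4294967295, or int -1
print(hex(zero_extend_32_to_64(all_ones_32)))    # 0xffffffff
print(hex(sign_extend_32_to_64(all_ones_32)))    # 0xffffffffffffffff
print(hex(truncate_64_to_32(0x1_0000_0005)))     # 0x5
```

The class-to-object cast has no counterpart here, because it does no work at run time: the same pointer is simply given a different static type.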
Casting from an object to some other type is much more complicated. If the object’s RTTI does not match the destination type, we need to throw a CastError. If it does match, we might have to extract a value from inside the object, reinterpret the object as a class, or search through the RTTI for a super-interface’s virtual function table (VFT) and pair it with the object pointer in a tuple.
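A checked cast out of object can be sketched as a dispatch on the RTTI. It compares the RTTI with the destination type, throws CastError on a mismatch, and otherwise either unboxes the value or reuses the same reference. This Python model rests on several assumptions:

* Objects are plain records with an RTTI field.
* Classes list their super-types in a hypothetical `supertypes` set.
* The interface case, which pairs a VFT with the pointer, is omitted.

All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

class CastError(Exception):
    pass

@dataclass
class RTTI:
    sort: str                                 # "primitive" or "class" in this sketch
    name: str                                 # primitive kind or class name
    supertypes: FrozenSet[str] = frozenset()  # hypothetical: a class's super-types

@dataclass
class Obj:
    rtti: RTTI
    payload: Optional[object] = None          # boxed primitive value, if any

def cast_object(obj: Obj, sort: str, name: str):
    """Model of cast<T> obj, where T is described by (sort, name)."""
    rtti = obj.rtti
    if sort == "primitive":
        # Only an exact RTTI match is accepted; then extract the boxed value.
        if rtti.sort == "primitive" and rtti.name == name:
            return obj.payload
    elif sort == "class":
        # A class instance is reused as-is (same reference) if its class
        # or one of its super-types matches.
        if rtti.sort == "class" and (rtti.name == name or name in rtti.supertypes):
            return obj
    raise CastError(f"cannot cast {rtti.name} to {name}")

boxed = Obj(RTTI("primitive", "long"), payload=5)
print(cast_object(boxed, "primitive", "long"))  # 5
```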
But what happens if we do the following:
long a = 5;
object obj = a;
int b = cast<int> obj;
The object is a long, and longs can easily be cast to ints: the conversion is just a truncation from 64 bits to 32 bits.
The problem is that the cast cannot know in advance that obj is a long. At run time, obj could hold anything from a boolean to a string, in which case the result should definitely be a CastError. Since the RTTI for long doesn’t match what we expect for an int, the cast is not allowed.
With a lot of work, however, it would be possible to allow it. We could write a very long chain of checks for whether the run-time type is int, or ubyte, or boolean, or float, or {uint -> string}, or [][]double. In practice we would only need to check the types we could convert from, but that would still require a huge amount of LLVM code for such a small amount of Plinth code.
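To see how this grows, here is a hedged Python sketch of what an allowed transitive cast<int> would need. Every run-time type that could be converted to int gets its own branch and conversion. The table is hypothetical and deliberately incomplete: it assumes all of Plinth's integer types can be cast to int. Each entry would become a separate block of generated code, and every other destination type would need a table of its own.

```python
MASK32 = (1 << 32) - 1

class CastError(Exception):
    pass

def to_signed32(v: int) -> int:
    # Keep the low 32 bits and reinterpret them as a two's-complement int.
    v &= MASK32
    return v - (1 << 32) if v & (1 << 31) else v

# Hypothetical: one conversion per run-time type that might be castable to int.
# Values are Python ints holding the source type's numeric value.
TO_INT = {
    "int":    lambda v: v,   # already an int
    "byte":   lambda v: v,   # sign-extension preserves the value
    "ubyte":  lambda v: v,   # zero-extension preserves the value
    "short":  lambda v: v,
    "ushort": lambda v: v,
    "uint":   to_signed32,   # same 32 bits, reinterpreted
    "long":   to_signed32,   # truncation from 64 to 32 bits
    "ulong":  to_signed32,
    # ...plus floating-point types and more, if those were allowed too
}

def transitive_cast_to_int(runtime_type: str, value: int) -> int:
    # Every key in TO_INT is another run-time check and another code path.
    try:
        convert = TO_INT[runtime_type]
    except KeyError:
        raise CastError(f"cannot cast {runtime_type} to int") from None
    return convert(value)

print(transitive_cast_to_int("long", 5))              # 5
print(transitive_cast_to_int("long", (1 << 32) + 7))  # 7
```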
This sort of “transitive” cast gets especially cumbersome once you consider that tuples such as (ubyte, short, boolean) can be converted to (int, long, boolean). If such casts were allowed, casting from an object to a tuple would require recursively looking through the RTTI for each of the tuple’s values: first to check whether the conversion is possible, and then to find out how to perform it.
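For tuples, even the first step (deciding whether the conversion is possible) has to recurse. This hypothetical Python sketch makes these assumptions:

* Primitive types are represented as strings, and Plinth tuples as Python tuples.
* A small, assumed table of lossless integer widenings stands in for the real conversion rules.

```python
from typing import Tuple, Union

# Hypothetical type representation: a primitive name, or a tuple of types.
PlinthType = Union[str, Tuple["PlinthType", ...]]

# Assumed lossless widenings (source -> allowed destinations); a small subset.
WIDENS_TO = {
    "ubyte": {"ushort", "short", "uint", "int", "ulong", "long"},
    "short": {"int", "long"},
    "int": {"long"},
}

def can_convert(src: PlinthType, dst: PlinthType) -> bool:
    if isinstance(src, tuple) or isinstance(dst, tuple):
        # Both must be tuples of equal length, and every element must convert.
        return (isinstance(src, tuple) and isinstance(dst, tuple)
                and len(src) == len(dst)
                and all(can_convert(s, d) for s, d in zip(src, dst)))
    return src == dst or dst in WIDENS_TO.get(src, set())

print(can_convert(("ubyte", "short", "boolean"), ("int", "long", "boolean")))  # True
print(can_convert(("ubyte", "short", "boolean"), ("int", "long")))             # False
```

A real transitive cast would then need a second recursive pass that builds the per-element conversions, which is the other half of the cost described above.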
In Plinth, transitive casts are not allowed. To perform the cast we attempted above, we have to cast in two explicit steps:
int b = cast<int> cast<long> obj;
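The non-transitive rule can be modelled in a few lines. The cast out of object succeeds only on an exact RTTI match, and the second cast, from long to int, is then a plain truncation known at compile time. The Python names here (Boxed, cast_from_object, long_to_int) are hypothetical stand-ins for compiler-generated code.

```python
from dataclasses import dataclass

class CastError(Exception):
    pass

@dataclass
class Boxed:
    rtti: str   # run-time type recorded when the value was boxed
    value: int

def cast_from_object(obj: Boxed, target: str) -> int:
    # Non-transitive: only an exact match with the stored RTTI is allowed.
    if obj.rtti != target:
        raise CastError(f"cannot cast {obj.rtti} to {target}")
    return obj.value

def long_to_int(v: int) -> int:
    # Statically known conversion: truncate to 32 bits (two's complement).
    v &= (1 << 32) - 1
    return v - (1 << 32) if v & (1 << 31) else v

obj = Boxed("long", 5)            # long a = 5; object obj = a;

try:
    cast_from_object(obj, "int")  # cast<int> obj
except CastError as e:
    print("CastError:", e)        # CastError: cannot cast long to int

b = long_to_int(cast_from_object(obj, "long"))  # cast<int> cast<long> obj
print(b)                          # 5
```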
Instanceof
The instanceof operator (which might be renamed to ‘is’ at some point) allows you to check whether a value is an instance of a given type. For example:
long a = 5;
object obj = a;
boolean isInt = obj instanceof int; // false
boolean isFoo = obj instanceof ?#Foo; // type error: cannot check against a nullable or immutable type
As shown, you cannot check whether something is nullable or data-immutable. This is because these are properties of the reference, not of the value itself: even if the reference is immutable, the value behind it usually won’t be. Similarly, the value behind a nullable field is not itself nullable, because a value can never be null; only a reference can. The exception is nested types: []boolean is a different type from []?boolean, because whether an array’s elements can be null is part of the array’s own type.
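A type checker could enforce this rule by inspecting only the outermost type, which leaves nested element types free to carry nullability. This is a hedged Python sketch with a hypothetical type representation (PType, check_instanceof_target). In Plinth the check happens at compile time, and the exception here only stands in for a compile error.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PType:
    name: str                          # e.g. "boolean", "Foo", or "[]" for arrays
    nullable: bool = False             # written ? in Plinth
    immutable: bool = False            # written # in Plinth
    element: Optional["PType"] = None  # element type, for arrays

class TypeCheckError(Exception):
    pass

def check_instanceof_target(t: PType) -> None:
    # Only the outermost type is restricted: nullability and immutability
    # describe the reference, not the value being tested.
    if t.nullable or t.immutable:
        raise TypeCheckError("cannot check against a nullable or immutable type")

array_of_bool = PType("[]", element=PType("boolean"))                          # []boolean
array_of_nullable_bool = PType("[]", element=PType("boolean", nullable=True))  # []?boolean

check_instanceof_target(array_of_nullable_bool)  # allowed: the ? is on the element type
print(array_of_bool == array_of_nullable_bool)   # False: they are distinct types
```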
There are several ways instanceof could have worked. For example, it could return true whenever the value can be assigned to the type without losing data, so a uint with value 3 would be an instance of ubyte, because it fits in an 8-bit unsigned value. This scheme would have required the same sort of transitive checking described above.
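For integers, this rejected semantics would amount to a range check on the run-time value. The Python sketch below assumes 8-, 16-, 32-, and 64-bit widths for Plinth's signed and unsigned integer types and uses a hypothetical function name. It only illustrates the scheme Plinth decided against.

```python
# Assumed inclusive (min, max) ranges of Plinth's integer types.
RANGES = {
    "byte":   (-(1 << 7),  (1 << 7) - 1),
    "ubyte":  (0,          (1 << 8) - 1),
    "short":  (-(1 << 15), (1 << 15) - 1),
    "ushort": (0,          (1 << 16) - 1),
    "int":    (-(1 << 31), (1 << 31) - 1),
    "uint":   (0,          (1 << 32) - 1),
    "long":   (-(1 << 63), (1 << 63) - 1),
    "ulong":  (0,          (1 << 64) - 1),
}

def fits_instanceof(value: int, target: str) -> bool:
    # Rejected semantics: true whenever the value could be stored in the
    # target type without losing data, regardless of its actual type.
    lo, hi = RANGES[target]
    return lo <= value <= hi

print(fits_instanceof(3, "ubyte"))    # True: a uint holding 3 fits in 8 bits
print(fits_instanceof(300, "ubyte"))  # False
```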
Another option would have been to return true whenever the assignment would work in a single step without losing data (i.e. the same scheme, but without the transitive checks), which would show exactly when casting would work. This could give you true for (ubyte, Foo, string) instanceof (uint, Bar, string) if Foo is a subtype of Bar. It could even allow nullable types, returning true when a null value is checked against a nullable type. The problem is that this system is often confusing, and it can’t tell you unambiguously whether the number you have actually is a ubyte.
Plinth uses the simplest and hopefully most obvious system: value instanceof type is true only when the value is already an instance of type. For classes and interfaces, this means checking against all super-types as well, because an instance of a sub-type is also an instance of each of its super-types. But checking a short against int results in false, even though the cast is known at compile time to be just a sign-extension from 16 to 32 bits.
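Plinth's chosen rule can be sketched as an exact match on primitives plus a walk up the type hierarchy for classes and interfaces. In this Python model, the hierarchy is hypothetical (Foo extends Bar, which implements an interface Baz), and so are all function names.

```python
# Hypothetical hierarchy: each class or interface maps to its direct super-types.
SUPERTYPES = {
    "Foo": ["Bar"],
    "Bar": ["Baz"],   # Baz is an interface implemented by Bar
    "Baz": [],
}

def is_subtype(runtime_type: str, target: str) -> bool:
    # An instance of a sub-type is also an instance of every super-type.
    if runtime_type == target:
        return True
    return any(is_subtype(s, target) for s in SUPERTYPES.get(runtime_type, []))

def instanceof(runtime_type: str, target: str) -> bool:
    if runtime_type in SUPERTYPES:
        return is_subtype(runtime_type, target)
    # Primitives: only an exact match counts, even when a lossless
    # conversion (such as short -> int sign-extension) exists.
    return runtime_type == target

print(instanceof("Foo", "Baz"))    # True
print(instanceof("short", "int"))  # False
```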