Casting and Instanceof

Last week, I implemented the instanceof operator. Doing this made me consider in much more detail how casting and instanceof should work, as there are several ways that they could.

The Unified Type System

A unified type system is one where every type can be cast to some all-encompassing super-type. In Plinth, the name of that type is ‘object’, or to be more specific ‘?#object’ in order to account for nullability and immutability.

When you cast, say, a double to an object, you are implicitly performing a heap allocation so that the double can be stored on the heap inside something that looks like an object. In fact, it is just an object with a double tacked onto the end, and with its virtual functions replaced by ones which extract that double and call the real functions. The same system is used to cast tuples, compounds, functions, and all of the other primitives to objects.

On the other hand, arrays and classes already look enough like objects to be cast to them directly (using the same pointer).

Because we might want to use instanceof on an object, or check whether to throw a CastError before we actually do a cast, every object stores a pointer to a block of run-time type information (RTTI). This RTTI holds data about the sort of type (e.g. ‘primitive’ or ‘array’), and the properties of the individual type (e.g. the kind of primitive it is, or an array’s base type). When we cast a double to an object, that object has an RTTI block for double.
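As a rough illustration of the boxing described above, here is a small Python model (not Plinth, and not how the compiler lays things out in memory). The names RTTI, BoxedObject and box_double are made up for this sketch; the only things taken from the text are that the RTTI records a sort of type plus some per-type detail, and that a boxed double carries the RTTI for double.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTTI:
    """Hypothetical stand-in for a run-time type information block."""
    sort: str    # the sort of type, e.g. "primitive" or "array"
    detail: str  # e.g. which primitive it is, or an array's base type


@dataclass
class BoxedObject:
    """Something that looks like an object, with a value tacked onto the end."""
    rtti: RTTI
    payload: object


# One shared RTTI block per type (an assumption of this sketch).
DOUBLE_RTTI = RTTI(sort="primitive", detail="double")


def box_double(value: float) -> BoxedObject:
    # Models the implicit heap allocation performed when casting double to object.
    return BoxedObject(rtti=DOUBLE_RTTI, payload=value)


boxed = box_double(2.5)
```

The boxed value can later be inspected through boxed.rtti without knowing in advance that it holds a double.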

Casting

Different casts are performed in very different ways. A cast from uint to long simply zero-extends the binary value from 32 bits to 64 bits, whereas a cast from a class Foo to object is just a matter of reinterpreting the same pointer as a different type.
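To make the zero-extension concrete, here is a Python sketch that works on raw bit patterns. The helper names zero_extend and sign_extend are invented for this example; the point is only that the same 32 bits give different 64-bit results depending on whether the source type is unsigned (uint) or signed (int).

```python
def zero_extend(bits: int, from_width: int) -> int:
    """Treat the low from_width bits as unsigned; the new high bits are all zero."""
    return bits & ((1 << from_width) - 1)


def sign_extend(bits: int, from_width: int) -> int:
    """Treat the low from_width bits as two's complement; the sign bit is copied upwards."""
    bits &= (1 << from_width) - 1
    sign_bit = 1 << (from_width - 1)
    return bits - (1 << from_width) if bits & sign_bit else bits


pattern = 0xFFFFFFFF                          # the same 32 bits in both cases
as_long_from_uint = zero_extend(pattern, 32)  # 4294967295
as_long_from_int = sign_extend(pattern, 32)   # -1
```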

Casting from an object to some other type is much more complicated. If the RTTI for the object does not match the destination type properly, we need to throw a CastError. If it does match, we might have to extract a value from inside the object, or maybe reinterpret that object as a class, or possibly search through the RTTI for a super-interface’s VFT and tuple it with the object pointer.

But what happens if we do the following:

long a = 5;
object obj = a;
int b = cast<int> obj;

The object holds a long, and a long can be cast to an int easily: the cast is just a truncation from 64 bits to 32 bits.

The problem is knowing that obj is a long. It might be anything from a boolean to a string, in which case the result should definitely be a CastError. Since the RTTI for long doesn’t match the RTTI we expect for an int, the cast won’t be allowed.

However, with a lot of work, it would be possible to allow it. We could write a really long line of checks for whether the run-time type is int or ubyte or boolean or float or {uint -> string} or [][]double. We would actually only need to check the types that we could convert from, but it would still require a huge amount of LLVM code for such a small amount of Plinth code.

This type of “transitive” cast gets especially cumbersome when you consider the fact that you can convert tuples like (ubyte, short, boolean) to (int, long, boolean). If this type of cast were allowed, casting from an object to a tuple would require looking through the RTTI for each of the tuple’s values (recursively) to first check whether the conversion is possible, and second find out how to perform it.
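To show why this gets recursive, here is a Python sketch of the kind of check a transitive cast to a tuple would need. This is exactly the work Plinth avoids, so it does not describe anything in the compiler. Types are modelled as strings, tuple types as Python tuples, and ASSUMED_WIDENINGS contains only the two conversions from the example above rather than Plinth's full conversion rules.

```python
# Only the conversions from the example above; the real list would be much longer.
ASSUMED_WIDENINGS = {("ubyte", "int"), ("short", "long")}


def convertible(source, target) -> bool:
    """Would a value whose run-time type is `source` be convertible to `target`?"""
    if isinstance(source, tuple) and isinstance(target, tuple):
        # Tuples must have the same length, and every element is checked recursively.
        return len(source) == len(target) and all(
            convertible(s, t) for s, t in zip(source, target)
        )
    if isinstance(source, tuple) or isinstance(target, tuple):
        return False  # a tuple never converts to a non-tuple, or vice versa
    return source == target or (source, target) in ASSUMED_WIDENINGS


flat_ok = convertible(("ubyte", "short", "boolean"), ("int", "long", "boolean"))
nested_ok = convertible((("ubyte", "boolean"), "short"), (("int", "boolean"), "long"))
mismatch = convertible(("ubyte", "short"), ("int", "boolean"))
```

Even this only answers whether the conversion is possible; actually performing it would need a second recursive pass over the same structure.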

In Plinth, transitive casts are not allowed. In order to handle the sort of cast we tried to do above, we would have to do something like:

int b = cast<int> cast<long> obj;
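The same behaviour can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the run-time type is just a string, and Boxed, unbox and long_to_int are names invented for the example. A cast out of an object only succeeds on an exact run-time type match, and the narrowing to int is a separate, ordinary truncation.

```python
from dataclasses import dataclass


class CastError(Exception):
    """Stands in for Plinth's CastError."""


@dataclass
class Boxed:
    type_name: str  # simplified stand-in for the RTTI pointer
    payload: int


def unbox(obj: Boxed, target: str) -> int:
    # No transitive casts: the run-time type must match the target exactly.
    if obj.type_name != target:
        raise CastError(f"cannot cast {obj.type_name} to {target}")
    return obj.payload


def long_to_int(value: int) -> int:
    # Keep the low 32 bits and reinterpret them as a signed 32-bit value.
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


obj = Boxed("long", 5)

try:
    unbox(obj, "int")            # like: cast<int> obj
    direct_cast_failed = False
except CastError:
    direct_cast_failed = True

b = long_to_int(unbox(obj, "long"))  # like: cast<int> cast<long> obj
```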

Instanceof

The instanceof operator (which might be renamed to ‘is’ at some point) allows you to check whether a value is an instance of a given type. For example:

long a = 5;
object obj = a;
boolean isInt = obj instanceof int; // false
boolean isFoo = obj instanceof ?#Foo; // type error: cannot check against a nullable or immutable type

As shown, you cannot check whether something is nullable or data-immutable. This is because these are properties of the reference, not the value itself: even if the reference is immutable, the value behind the reference usually won’t be. Similarly, the value behind a nullable field won’t itself be nullable – it can’t be null. The exception to this is when checking nested types: []boolean is different from []?boolean.

There are several different ways in which instanceof could have worked. For example, it could return true whenever the value is assignable to the type without losing data, so a uint with value 3 would be an instance of ubyte, because it fits inside an 8-bit unsigned value. This scheme would have required the same sort of transitive checking described above.

Another way would have been to return true whenever the assignment would work in a single step without losing data (i.e. the same, but without the transitive checks), which would illustrate exactly when casting would work. This could give you true for (ubyte, Foo, string) instanceof (uint, Bar, string). It could even allow nullable values, returning true if a null value were checked against a nullable type. The problem is that this system is very often confusing and can’t tell you unambiguously whether this number you’ve got is a ubyte.

Plinth uses the simplest and hopefully the most obvious system: value instanceof type is only true when the value is already an instance of type. For classes and interfaces, this means checking against all super-types as well, because those super-types are part of the sub-type. But comparing a short to an int will result in false even if the cast is known at compile time to be just a sign-extension to 32 bits.
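The rule can be sketched in Python as follows. TypeInfo and is_instance are hypothetical names, and the Foo/Bar relationship (Foo being a sub-type of Bar) is assumed from the earlier tuple example. The check succeeds on an exact match or on any super-type, and never consults conversion rules.

```python
from dataclasses import dataclass, field


@dataclass
class TypeInfo:
    """Hypothetical run-time description of a type and its direct super-types."""
    name: str
    supertypes: list = field(default_factory=list)


def is_instance(runtime_type: TypeInfo, target: TypeInfo) -> bool:
    # True only if the value already is a `target`: an exact match,
    # or a match somewhere among its (transitive) super-types.
    if runtime_type.name == target.name:
        return True
    return any(is_instance(s, target) for s in runtime_type.supertypes)


SHORT = TypeInfo("short")
INT = TypeInfo("int")
BAR = TypeInfo("Bar")
FOO = TypeInfo("Foo", supertypes=[BAR])  # assumed: Foo extends Bar

foo_is_bar = is_instance(FOO, BAR)      # a Foo is also a Bar
bar_is_foo = is_instance(BAR, FOO)      # but not the other way round
short_is_int = is_instance(SHORT, INT)  # convertible, but not an instance
```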

Anthony's Blog

My very own piece of write-only memory.
