Plinth has an extensive type system. As described in last week’s post, there are several different sorts of user-defined types, but there are also lots of different sorts of built-in types:
Primitives
There are three main sorts of primitive type: integers, floating point numbers, and booleans. Integers have both signed and unsigned variants.
ubyte | byte | boolean
ushort | short | float
uint | int | double
ulong | long |
As in several other languages (and in contrast to C/C++), bytes are 8 bits, shorts are 16 bits, ints are 32 bits, and longs are 64 bits.
One thing to note is that there is no char type. This is because it would just be an alias for an integer large enough to hold all of the possible character code points (i.e. uint), and we may as well use integers instead. Character literals will be allowed, but they will be equivalent to an integer literal for the character code they represent (so writing ‘a’ will be the same as writing 97).
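The widths and the character-literal rule above can be checked with a small sketch. It is written in Python rather than Plinth, purely as an analogy: the names on the left of the table are Plinth's, while the `struct` format codes and the `WIDTHS_IN_BITS` and `code_point` names are assumptions made for this illustration.

```python
import struct

# Standard (platform-independent) sizes, selected with the "<" prefix.
# Keys are Plinth type names; the format codes are Python's, not Plinth's.
WIDTHS_IN_BITS = {
    "byte": struct.calcsize("<b") * 8,
    "short": struct.calcsize("<h") * 8,
    "int": struct.calcsize("<i") * 8,
    "long": struct.calcsize("<q") * 8,
}

# A character literal is just the integer code point it denotes.
code_point = ord("a")
```

Here `WIDTHS_IN_BITS` maps the four names to 8, 16, 32 and 64, and `code_point` is 97, matching the ‘a’ example.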
Arrays
Array types are written slightly differently than they are in most other languages. Here are a few array declarations and creations:
[]uint arrayOfUnsignedIntegers = new [5]uint;
[]?float arrayOfNullableFloats = new [9]?float;
[]string arrayOfStrings = new []string {"Hello", "Arrays"};
?[]#object nullableArrayOfImmutableObjects = null;
As you can see, this syntax allows us to represent the nullability and immutability of each part of the type precisely and unambiguously. With the postfix notation that most languages use, this sort of type would be much more difficult to understand.
Array accesses, however, are still postfix. This makes much more sense for indexing, because the indices of a multidimensional array then appear in the same order as the dimensions in the type signature:
[][]double matrix = new [2][10]double;
double lastValue = matrix[1][9];
[]double lastRow = matrix[matrix.length - 1];
double sameLastValue = lastRow[lastRow.length - 1];
If an array is nullable, array element (or even length) accesses on it are not allowed. You must first cast it to a not-null array type.
Tuples
Tuple types are written by enclosing a list of types in parentheses. They can be declared and assigned to as follows:
(uint, string) test = foo();
(?[]object, boolean) asdf = null, false;
The assignment syntax is quite flexible, and allows you to declare multiple variables at once, while extracting their values from e.g. a function call:
uint x, y = 5, 4; // this is equivalent to: (uint, uint) x, y = 5, 4;
x, y = y, x;
([]string, double) str, d = function();
Tuples can also be nullable:
?(uint, string) test2 = test;
?(bigint, double) tuple = check ? bigint(2), 5.3 : null;
There are two different ways to extract a single value from a tuple:
(bigint, string) values = getValues();
bigint large = values ! 1; // (the index into the tuple must be a constant value - calculations are not allowed)
(bigint, string) _, text = values;
large, _ = values;
Of course, splitting up a nullable tuple in this way is not allowed; you would have to cast it to not-null first.
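For readers who know tuple unpacking from other languages, the same ideas can be sketched in Python. This is an analogy only, not Plinth: `get_values` and `unpack` are hypothetical helpers, the value `2**70` merely stands in for a bigint too large for 64 bits, and Python indexes from 0 whereas the `values ! 1` example above appears to select the first element.

```python
from typing import Optional, Tuple


def get_values() -> Tuple[int, str]:
    # Hypothetical stand-in for the post's getValues().
    return (2**70, "text")


x, y = 5, 4
x, y = y, x          # swap without a temporary

values = get_values()
large = values[0]    # extract by constant index (0-based in Python)
_, text = values     # extract by destructuring, discarding the rest


def unpack(maybe: Optional[Tuple[int, str]]) -> Tuple[int, str]:
    # A nullable tuple has to be proven non-null before it is split up.
    if maybe is None:
        raise ValueError("tuple is null")
    first, second = maybe
    return first, second
```

The explicit `None` check in `unpack` plays the role of the cast to not-null: the split only happens once the value is known to be present.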
Object
The object type is the super-type of all other types. It can be used as follows:
object nothing = object();
object integer = 5;
object tuple = 7, "hello";
?object nullable = null;
#object immutableArray = new [4]short;
As with several other types, objects can be nullable and/or immutable.
Objects are always stored on the heap. This means that in order to convert primitives, compound types, tuples, and functions to objects, we must store them on the heap, boxed inside objects. This is unfortunate, since this conversion is often implicit and is not as noticeable as a normal heap allocation using ‘new’ would be, but it is required in order to have a unified type system, and it is the only way of doing generics without templated code.
Some weird situations start occurring when you convert a compound type to an object. Compound types are usually passed by value, but when we store them as objects on the heap, all references to them will use the same instance of that object, bypassing the usual pass-by-value semantics.
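The difference between the two behaviours can be sketched in Python, with the caveat that this is an emulation rather than Plinth code. Python has no value types, so pass-by-value is imitated here with an explicit copy; `Point` and `move_right` are hypothetical names chosen for the illustration.

```python
import copy
from dataclasses import dataclass


@dataclass
class Point:
    # Hypothetical compound type, used only for illustration.
    x: int
    y: int


def move_right(p: Point) -> None:
    p.x += 1


# Pass-by-value, emulated by handing the callee its own copy.
original = Point(0, 0)
move_right(copy.copy(original))

# Boxed on the heap: every reference names the same instance.
boxed = Point(0, 0)
alias = boxed
move_right(alias)
```

After this runs, `original` is unchanged because only the copy was modified, while `boxed` has moved because `alias` refers to the very same object.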
Functions
Function types are written as a list of parameter types followed by a return type:
{uint, string -> void} foo = someMethod;
#{ -> string} toStr = 9.81.toString;
All methods can be accessed as function-typed values in this way. The interesting thing about function values is that they store not only a function pointer, but also a pointer to the callee to call the function on. This allows you to pass around a method that changes some of an object’s state. For types which are not just pointers (i.e. primitives, compound types, tuples, functions), the callee is first stored as an object on the heap.
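Python's bound methods pair a function with its callee in a similar way, so they make a convenient analogy. The sketch below is not Plinth, and `Counter`, `increment` and `bump` are hypothetical names.

```python
class Counter:
    # Hypothetical class, used only for illustration.
    def __init__(self) -> None:
        self.count = 0

    def increment(self) -> None:
        self.count += 1


counter = Counter()
bump = counter.increment   # a function value: code plus callee

# Calling the value mutates the object it was taken from.
bump()
bump()
```

In Python the two halves are visible as `bump.__func__` (the code) and `bump.__self__` (the callee), and after the two calls `counter.count` is 2.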
Functions can be nullable and/or immutable, but immutability means something different for functions than it does for objects and arrays. Objects and arrays have data-immutability, where an immutable value cannot have any of its data modified. In contrast, functions have method-immutability, in the sense that an immutable method cannot modify any global state. If you call an immutable function, you can be fairly sure that it won’t have any side-effects (modulo native methods and mutable fields).
Strings
string is actually just a compound type, but string values are special because they can be created by using string literals in double quotes, and because they can be concatenated using the + operator. Other than this, strings are an immutable compound type which has methods for common operations like finding the character at a given position, or finding its length.
Strings have full support for Unicode; in fact, they are stored in UTF-8 by default because it usually uses much less memory than UTF-32. Since UTF-8 is a variable-length format, with a single character taking anywhere from one to four bytes, getting the length of a string is an O(n) operation. Under the hood, the string type lazily calculates the UTF-32 representation the first time it is needed (e.g. when finding the length), and stores it for later use.
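The lazy-decoding idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Plinth's actual string implementation: the `LazyString` class, its method names, and the `decode_count` field (which exists only to make the caching observable) are all hypothetical.

```python
from typing import List, Optional


class LazyString:
    """Hypothetical sketch, not Plinth's actual string type."""

    def __init__(self, utf8: bytes) -> None:
        self._utf8 = utf8
        self._code_points: Optional[List[int]] = None
        self.decode_count = 0  # only here to make the caching observable

    def _decoded(self) -> List[int]:
        if self._code_points is None:
            # O(n) walk over the variable-length encoding, done once.
            self._code_points = [ord(c) for c in self._utf8.decode("utf-8")]
            self.decode_count += 1
        return self._code_points

    def length(self) -> int:
        return len(self._decoded())

    def char_at(self, index: int) -> int:
        return self._decoded()[index]


# Five characters, six bytes: the accented letter takes two bytes in UTF-8.
s = LazyString("h\u00e9llo".encode("utf-8"))
```

The first call to `length()` or `char_at()` pays for the decode; later calls reuse the stored code points, so `decode_count` never rises above 1.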
Here is the source code for string, and some actual plinth code which isn’t just an example.
Next week’s post will be on plinth’s generated LLVM bitcode, and run-time type information.