Generics/Templates

I’ve been thinking more about the generics/templates problem today. I can think of two ways of doing it, each having its own advantages and disadvantages.

Templating

Doing things this way would mean checking the types with the generic type arguments, and then creating a new specialization in machine code for each combination of type parameters. The type parameters would only apply to instances of a class (static members cannot relate to the type parameters).

This has the advantage that separate methods are created for each specialization of the type parameters, which means that if someone uses a tuple as a generic type parameter, it does not have to be autoboxed into a pointer type to be passed to the method (a win for performance, since it requires fewer heap allocations).

It also has a major disadvantage. If a library defines a generic class, then any code using that library will have to include its specializations as part of its own machine code. This could cause problems for dynamically linked programs, for example in the following program:

class LibStack<E>
{
  E[] stack = new E[0];
  void push(E e) { ... }
  E pop() { ... }
  E peek()
  {
    return stack[getFrontPos(stack.length)];
  }
  static int getFrontPos(int size)
  {
    return size - 1;
  }
}

(Try to ignore the bad implementation for a minute, it helps me to demonstrate the problem)

So far, this works fine and an application writer has decided to use LibStack<Int> in their application, which generates the specialization for Int inside their executable. This works great until the library writers decide to change the front position of the stack to be the start of the array.

static int getFrontPos(int size)
{
  return 0;
}

This piece of code is changed, along with the code for push(), peek() and pop(). Unfortunately, the new code for push(), peek() and pop() will only reach the application’s code when it is recompiled. So the static method will be changed (as it is loaded dynamically from the library), but the push, peek and pop methods are stored in the application’s code, and so the old versions are used. This results in application code such as the following breaking:

void main()
{
  LibStack<Int> stack = new LibStack<Int>();
  stack.push(new Int(1));
  stack.push(new Int(2));
  stack.push(new Int(3));
  println(stack.peek()); // not necessarily how std-out will be implemented in the language
}

With the old version of the library, this will print 3 as intended, however the same binary loading the new version of the library will print 1 instead.

Generics with Run Time Type Information

Doing things this way would provide the same interface to the programmer. The class/method/whatever would be type-checked with generic type arguments, and the type parameters for classes would only apply to the instances of a class, not the static members. The difference is that this way, only one copy of the class and its methods is created in machine code.

This has the advantage that additional machine code is not generated for each specialization, and so it avoids the problem with templates above as well as lowering code size.

The disadvantage of this way of doing things is that each function can only take a certain number of arguments, so tuples and other primitive types must be autoboxed into a pointer type before they can be used by a generic class/method/whatever. This boxing causes a heap allocation (essentially it would internally call a method with ‘new BoxedIntStringLongTuple(a, b, c)’ instead of just ‘(a, b, c)’ as the parameter).

This method currently sounds very much like Java’s Generics, which have type erasure so that you cannot, for example, create an array of a generic type parameter. However there is something that can be done about this: Run Time Type Information. Basically, this involves passing information about the type parameter into the classes/methods/etc that use it. So a generic class would have generic type information embedded into it, and methods of that class could query this type information to create an array of that type (of course, this would really be a boxed type if a primitive type such as a tuple or a closure was used).

Autoboxing (which causes lots of heap allocations) also provides us with an advantage. If only pointer types are used in generics, then all functions that use generic type parameters have the opportunity to call standard Object functions on them; that’s right: equals(), hashCode() and others can be provided for boxed tuples, closures, and other primitive types. These functions would have to be implemented sensibly. For tuples, they should only delegate their calculations to the values the tuple contains. But if you want to create a Set<{ -> void}>, or a Map<(int, {int -> String}), String>, it should be possible.

Again, these are only ideas, comments/suggestions for improvements are appreciated.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.