Gypsum now has type parameters!

Published on 2014-12-09
Tagged: compilers gypsum

I'm happy to introduce a useful new feature to Gypsum: type parameters. Type parameters are also known as generics in other languages. They enable parametric polymorphism, providing abstraction over types for functions and classes.

Example: Identity function

Let's start with the simplest possible example. Type parameters allow us to implement an identity function (a function that returns its argument) with a correct return type. Without type parameters, the function would look like this:

def id(x: Object): Object = x
...
id("foo")   // the type of this expression is `Object`

If we call this function with a String, we get an Object back. Looking at the implementation, we know that we'll actually get the same string back, but the compiler doesn't know that just from looking at the return type. Unsatisfying.

Here's how we would implement this function using a type parameter:

def id[static T](x: T): T = x
...
id[String]("foo")   // the type of this expression is `String`

T is a type parameter. It's basically a placeholder that represents some other object type. When we call the id function, we pass a type argument in brackets before the regular argument. When we pass String in this example, both the parameter type and the return type effectively become String. It's as if we defined the function like this instead:

def id(x: String): String = x

Note that the type parameter T has a static attribute. I'll explain this a little later.

Example: Linked list

A well-known example of type parameter utility is the humble linked list. Actually, type parameters are just as useful for any container, but linked lists are very easy to implement in a short example.

class List[static E](value: E, next: List[E]?)
...
var strings = List[String]("foo", List[String]("bar", null))

The code above declares a class List. Each List object has a value of type E. When we refer to the type of a List object, we write the type argument in brackets, like we did with the identity function earlier. For example, we could have a List[String] (list of strings) as above. Each List object has a next field of type List[E]?, which points to the next node in the list. The ? sigil indicates that next might be null.

We can define methods within List which use the type parameter E implicitly. For example, here is a take method, which returns a new list comprising the first few elements of the receiver.

class List[static E](value: E, next: List[E]?)
  def to-string: String =
    value.to-string + (if (next !== null) " " + next.to-string else "")

  def take(n: i64): List[E]? =
    if (next === null || n == 0)
      null
    else
      List[E](value, next.take(n - 1))


def main =
  var strings = List[String]("foo",
                  List[String]("bar",
                    List[String]("baz", null)))
  var taken-strings = strings.take(2)
  print(taken-strings.to-string + "\n")
  // this will print "foo bar\n"

Example: The map method

We can also define methods that have their own type parameters. In the example below, map takes a function as a parameter. It applies the function to each element of the list and returns a new list with the results. Gypsum doesn't have first-class closures yet, so we approximate them with an abstract class (another new feature).

abstract class Function[static P, static R]
  abstract def apply(param: P): R


class List[static E](value: E, next: List[E]?)
  def to-string: String =
    value.to-string + (if (next !== null) " " + next.to-string else "")

  def map[static R](fn: Function[E, R]): List[R] =
    var mapped-value = fn.apply(value)
    var mapped-next = if (next === null) null else next.map[R](fn)
    List[R](mapped-value, mapped-next)


class AppendString(suffix: String) <: Function[String, String]
  def apply(param: String) = param + suffix


def main =
  var strings = List[String]("foo",
                  List[String]("bar",
                    List[String]("baz", null)))
  var mapped-strings = strings.map[String](AppendString("x"))
  print(mapped-strings.to-string + "\n")
  // this will print "foox barx bazx\n"

Kinds of type parameters

You're probably wondering why there was a static attribute on all of the type parameters in the examples above. Gypsum will eventually support three different kinds of type parameters.

Static parameters (marked with the static attribute) are just like generics in Java: they are really only there so the compiler can perform some type checking. They are erased at compile-time, so if you have a List object at run-time, there's no way to find out what kind of elements it may contain.

Dynamic parameters (not marked with any attribute) are not erased. When you call a function that has dynamic type parameters, the compiler will generate code to pass special type values as arguments to that function. I imagine an is-instance-of function could be implemented this way. Likewise, when you instantiate a class that has dynamic type parameters, the type values will be stored in the new object.

Dynamic parameters are more useful than static parameters, but they will have some overhead, both in time and space. There is a tradeoff here, so Gypsum will leave the choice up to you. Dynamic parameters are the default (but right now, you'll get a NotImplementedError if you try to use them).

Template parameters (marked with the template attribute) are like templates in C++. When you define a class or function with static or dynamic type parameters, you end up with just one definition in bytecode that can be used with different types. The definition has to be generic, which means it can only deal with reference types (i.e., object types) since reference values are all the same size and have a common ancestor class (the Object class). Template parameters will not have this limitation. When you use a function or class with a template parameter, the definition is copied, and the template parameters are replaced with the type arguments you pass. This allows type arguments to be value types or primitive types.

Of course, there is some overhead for this in code size. There may be a speed advantage though. C++ programmers sometimes avoid virtual methods and use the curiously recurring template pattern to implement static polymorphism. Gypsum will let you decide how you want to make that tradeoff.

For now, Gypsum only supports static parameters. Dynamic and template parameters will come some time in the future (hopefully soon).

Conclusion

Type parameters are a really useful feature for statically-typed languages. They allow a more expressive type system, which means the compiler can do a better job of checking your code. They also let you avoid a lot of grungy run-time casts and type checks, which can reduce performance. They even provide some level of self-documentation in code: List[String] is a lot more descriptive than List.

Both Gypsum and its VM, CodeSwitch, are open source, and are available on GitHub. If you want to play with an example, see map-list.gy in the examples directory. Most of the code in this article was borrowed from there.