Existential types in Gypsum
Near the end of my work on pattern matching, I added existential types to Gypsum. I've never had good intuition for existential types, so I dreaded having to add them, but unfortunately they are necessary for any kind of casting or type testing, which is essential for pattern matching.
Existential types allow you to express that you have an object with a known class, but you don't know what's inside it. For example, instead of having a list of strings, you have a list of "something". In technical terms, you have an instance of some paraterized class, but you don't know the type arguments. Existential types are similar to wildcard types in Java.
List[String] // normal class type forsome [X] List[X] // existential type List[_] // syntactic sugar for existential type
Existential types are required in systems with erased type arguments when you're performing down-casts with type checks. That was a mouthful, so let's break it down. Type arguments in Gypsum (like the
List[String]) are erased at compile time. The compiler uses these type arguments to check the type safety of the program, but they are not present at run-time. If you have some random object, you can check its class to determine that it's a
List, but no part of the object tells you it should only contain
Strings. You could look at the elements of the
List, but it might be empty, and you might come up with the wrong answer anyway: all the elements of a
List[Object] might happen to be
Strings. In the future, Gypsum will support dynamic type arguments (not erased) that will allow this kind of check, but there will be a performance penalty for using them (they take up space and have to be loaded and stored), so erased type arguments (marked with the
static keyword) will probably always be more common.
Because type arguments are erased, when we down-cast a value to a subtype, we cannot know the type arguments of the new type. Existential types express this lack of knowledge, which is why we need them.
There are many situations when we don't actually need to know the type arguments when we're dealing with an object. For example, if we want to know the length of a list, it doesn't really matter what's inside it.
def get-length-if-arg-is-list(obj: Object): i64 = match (obj) case list: forsome [X] List[X] => list.length case _ => 0
You've already seen a couple examples of existential type syntax, but here it is in detail.
An existential type begins with the
forsome keyword, followed by a list of type parameters in brackets, followed by an inner type. The type parameters may have upper and lower bounds. If the bounds are omitted, the upper bound is
Object, and the lower bound is
ty ::= forsome [tp, ...] ty | ... tp ::= T <: tyupper >: tylower
The type parameters may only be referenced by the inner type and by each other. They are invalid outside of the existential type.
There is some syntactic sugar to make existential types less clunky and more familiar to programmers who have used Java wildcard types. The
_ symbol (blank) can be used as a type argument. If a class type has blank type arguments, the whole type is wrapped in an existential, and the blanks are replaced with fresh type parameters.
// These types are equivalent. Foo[_, _, _] forsome [X, Y, Z] Foo[X, Y, Z]
The bounds of these type arguments are automatically set to the bounds of the class's type parameters.
The subtype relation
The hardest part about implementing existential types was understanding how they interact with the rest of the type system. But this is the most important part. The key function in the Gypsum compiler is
Type.lub (in ir_types.py). This function computes the least upper bound of two types in the type lattice. The least upper bound is the most specific type which is a supertype of both types.
Type.isSubtypeOf is implemented using this.
Here are the rules for computing the least upper bound of two types, at least one of which is existential, in plain English.
rightbe the types being combined.
rightInnerbe their inner types. If one of them is not existential, then the inner type is the type itself.
innerLubbe the least upper bound of
innerLubreferences any type parameters declared in
right, then the least upper bound is an existential type wrapped around
innerLubdeclaring the variables that were referenced.
- If no parameters are referenced, the least upper bound is
There are two important details to be aware of.
First, any type is equivalent to itself wrapped in an existential. This works in both directions, and it doesn't throw any information away.
// These types are equivalent. Foo[Bar, Baz] forsome [X] Foo[Bar, Baz]
This rule is applied at the beginning, when we get the inner types, and at the end, when we decide whether to wrap
innerLub in an existential.
Second, the least upper bound of the inner types of two existential types determines the least upper bound of the existential types themselves. This is the middle part.
// Suppose we have a class with covariant type parameters. class Foo[static +A, static +B] // This is the least upper bound of two types. // Recall: `Nothing` is a subtype of all class types. // `â‹` denotes least upper bound. forsome [X] Foo[X, Nothing] â‹ forsome [Y] Foo[Nothing, Y] = forsome [X, Y] Foo[X, Y]
In a previous article on pattern matching, I had an example like this:
def get-or-else[static T](opt: Option[T], default: T): T = match (opt) case Some[T](value) => value case _ => default
The first match case has a destructuring pattern. The type of
value is determined by the return type of the
Option.try-match method. That method returns
Option[T], so the type of
value is inferred to be
How does that work though? How can
Option[T] when it's supposed to test and match objects in general? The answer is that
try-match is overloaded. There are actually two implementations.
One implementation handles options specifically. Since we know statically that the argument is an option, we don't need to do any type testing and can just return it.
class Option[static +T] ... static def try-match(opt: Option[T]) = opt
The other implementation handles objects in general. It performs a type test and a down-cast as part of a variable pattern.
class Option[static +T] static def try-match(obj: Object): Option[Object] = match (obj) case opt: Option[_] => opt case _ => None
Note that the type of
Option[_], an existential type. We return
Option[Object] though, which is equivalent.
Frequently, we have an object with an existential type, and we want to call a method or load a property. This is always allowed when the result has nothing to do with the existential type (like the
length property in the
get-length-if-arg-is-list example above). It's even allowed most of the time even when existential variables are involved, but the result is up-cast to a more generic, non-existential.
Here's the general logic for accessing a property (either a method or field):
- Peel off any existential layers from the receiver (object) type and set the existential type parameters aside. The effective receiver type will be a regular class type.
- Determine what property is being accessed, based on the name and the types of the arguments (if there are arguments).
- Determine the type of the result. If type arguments were applied, this may involve type substitution.
- If the result type doesn't reference any of the existential variables peeled off earlier, then that is the final result type.
- If a property is being assigned, report an error. We cannot assign to an existential type safely, because we don't really know what it is.
- Otherwise, up-cast the result type to its base class type. Repeat until no existential variables are referenced.
Let's walk through these steps with this call:
let opt: forsome [X] Some[X] let value = opt.get
- The receiver type is
forsome [X] Some[X].
- We peel off existential types. The effective receiver type is
- We look up the
getmethod. Its return type is
Tis the type parameter in
- We perform type argument substitution, replacing
X. So the result type is
- The result type does reference an existential variable, and we are not assigning it. We up-cast to its upper bound,
- The result type is
I'm glad existential types are finally implemented. I still feel like my grasp on them is tenuous, and I'm anxious that my implementation of the least upper bound rules might not be completely right. Most theoretical treatments of existential types have special
close expressions for dealing with them explicitly. In Gypsum, I wanted most operations involving existential types to be implicit, so there's not a direct mapping to theory.
If you're interested in reading more about existential types, I found the paper Towards an Existential Types Model for Java Wildcards [pdf] to be a useful comparison.
The book Types and Programming Languages by Pierce is always a good reference for type systems.