Type inference

Published on 2008-01-01
Tagged: compilers fenris

This entry was taken from my old Fenris development blog. It was originally posted on May 8, 2008. Although the Fenris project is now defunct, I think these old entries have some general appeal, so I've added them to this blog.

Type inference is a common feature in functional languages that I would like to bring into Fenris. Although Fenris resembles Java syntactically, I don't think it should inherit Java's verbosity. In most situations, it's easy for the compiler to figure out which types you intend to use; you shouldn't really have to explicitly type every variable or function parameter. There are also cases where type inference would significantly clean up syntax, such as immediate functions, arrays, and records (which now require an explicit type).

Type inference for Fenris is a difficult problem because definitions can be made in an arbitrary order. In the normal type inference algorithm, you can figure out types one definition at a time, but something more imaginative will have to be done for Fenris. The purpose of this entry is to describe the initial design (which will probably change a lot over the next few weeks).

Type inference will definitely take place in the IR phase. Wherever a type is needed but not given, the compiler will generate an erased type with a unique identifier. Erased types just act as placeholders and will be replaced after type inference using a substitution system already in use for generics. The unique identifier acts as a key for a constraint table in the IR environment. Each erased type will be mapped to a list of type constraints that are built as the program is comipiled to IR. A constraint describes a certain property of a type. Here are some I've already thought of:

is a class
is a subtype of _
has member _
has member _ with arguments _
is a closure with arguments _, return type _
is an array
supports binary operator _ with other type _
supports unary operator _

I'll probably think of more later. Frequently, we'll need to unify two types. This is done by taking the union of the constraint list of each type and mapping both identifiers to the same list.

Once we have a full list of constraints, we will need to actually assign types for the erased types. This is done by making a big list of possible types and using the constraints to eliminate them. In some cases, we will have several related candidates; in those cases, we choose the most general type. If multiple candidate types are not related, an ambiguity error is reported. If there are no candidate types, the erased type is overconstrained, which is also an error.

Most of the above constraints rely on other erased types being decided first. We can think of this as directed acyclic (hopefully) graph where the nodes are erased types and the edges are dependencies. The fixpoint algorithm is useful for solving systems like these. It will loop until no more types can be decided, which (again hopefully) means that all types have been decided.

After type inference is complete, we'll need to rewrite the code. The obvious part of this involves replacing erased types with real types, but this also includes distinguishing between record and object operations and adding generic parameters where needed. This may actually be fairly difficult to get right.