Method resolution

Published on 2008-01-01
Tagged: compilers fenris

This entry was taken from my old Fenris development blog. It was originally posted on March 6, 2008. Although the Fenris project is now defunct, I think these old entries have some general appeal, so I've added them to this blog.

I've decided on at least a tenative approach to calling methods polymorphically. It seems the best compromise is to use a closed hash table instead of a vtable and interface table. This allows very simple method lookup and average O(1) for all operations.

Method hash table format
R	n - 1
m₁ key	m₁ pointer
m₂ key	m₂ pointer
...	...
m_n key	m_n pointer

The first two words of the hash table tell its size and hash function. R is a random number, and n is the size. The size is guaranteed to be a power of two. Hash codes are computed as H(k) = (k ^ R) & (n - 1). We store n - 1 in the table so the subtraction isn't necessary.

After the header are key-value pairs for methods. Every method defined in a class or interface has a unique key. Keys are actually addresses of bytes in an empty segment; the linker guarantees that they are unique. The values in this case are the addresses of the actual methods. Note that the same method may have multiple entries in the hash table for a class, since it may be defined in multiple interfaces and/or superclasses.

Of course every hash table has the potential for collisions. The run-of-the-mill chained hash table is extensible, but is hard on caches. Since we will never need to add methods to the table after it is initially populated, it is better to use a closed implementation. If a collision occurs, the program probes entries in the table in a circular fashion after the initial slot. Tentative C code for method lookup is as follows:

void* method_lookup( int* hash_table, int key )
{
    int R = hash_table[0];
    int n_minus_1 = hash_table[1];
    int index = ( R ^ key ) & n_minus_1;
    int method = hash_table[index + 2];

    while ( method != key )
    {
        if ( method == 0 )
            return method_rehash( hash_table, index, key );
        index += 2;
        index &= n_minus_1 << 1;
        method = hash_table[index + 2];
    }

    return (void*) method;    
}

The keys aren't finalized until the program is linked, so we cannot construct the hash tables ahead of time. The method_rehash function will look up the appropriate pointers for methods as they are needed, populating the hash table lazily. It may be based either on a separate, binary searched table (which can be constructed at compile time) or dynamic lookup.

This approach is very similar to method lookup in Objective C. It is advantageous over other techniques because interface coercion is a no-op and method lookup is average O(1) for both interface and class method calls. It does not incur the overhead of storing interface table pointers in memory (as tuples with the object pointer), and it is relatively fast and cache-friendly. It is also fairly to implement. The only disadvantage is that it is not quite as fast as normal method calls in C++.

At least initially, the method_lookup function will be part of the runtime (rather than inlined). This means that the implementation can be tweaked with minimal modification to the compiler. The first implementation may even use a simple binary search tree rather than a hash table.