Memory management in CodeSwitch

Published on 2015-09-12
Tagged: codeswitch garbage-collection gypsum virtual-machines

CodeSwitch has its own garbage collected heap, which is used not only for objects allocated by interpreted code, but also for most internal data structures. In this article, I'll describe how the heap is built, how the garbage collector works, and how it tracks pointers to the heap from C++ code.

For readers who don't follow this blog regularly, CodeSwitch is the virtual machine that interprets code for Gypsum, an experimental programming language I'm working on. Both can be found on GitHub.

Heap layout

The heap is organized into chunks, which are 1 MB ranges of memory obtained from the operating system using mmap. Each chunk has a header which contains a pointer to the chunk's allocation range (contiguous area of free space where new blocks can be allocated), a pointer to the head of the chunk's free list (a linked list of free blocks), and a marking bitmap used by the garbage collector. The chunk header also contains a pointer to the VM object, which simplifies a lot of operations and removes the need for a global or thread-local pointer. The rest of the chunk is used to store blocks, which are actual allocated objects.

Header

VM pointer

Allocation range

Free list head

Marking bitmap
...

Blocks
...

Every chunk is aligned at a 1 MB boundary in the virtual address space. This alignment lets us easily find the chunk for any given block, which is handy for allocation, garbage collection, and other operations.

chunk = block & ~(1 * MB - 1)

Check out memory.h to see how Chunk is defined.

Blocks and allocation

A block is anything that can be allocated on the heap. Each kind of block is defined using a C++ class derived from the Block root class in block.h. The first word in each block is a pointer to a meta, a block that contains meta-data about other blocks. Because the first word is used this way, Block and its subclasses cannot have virtual methods, since the compiler would use the first word for the vtable pointer.

Metas store lots of info about blocks they represent, including the type of the block, the size of the block, the size of the block's elements (if it's an array), and which words in the block may contain pointers. For objects (instances of the Object class), the meta also contains a pointer to the Class and pointers to virtual methods. Metas for non-objects are constructed manually (see Roots::initialize in roots.cpp). Metas for objects can also be generated automatically from classes (see Class::buildInstanceMeta in class.cpp).

Blocks are allocated using overloaded new operators. For most kinds of blocks, this operator just takes the Heap as an argument, but some blocks use additional arguments to determine how large they are. For example, the new operator for String (string.h) takes the number of characters in the string. new operators which take extra arguments will partially initialize blocks using these arguments, so they don't have to be passed to the constructor, too. new operators are also responsible for setting the meta of a newly allocated block.

// Example: allocating a string.
u32 chars[3] = {'f', 'o', 'o'};
String* str = new(heap, 3) String(chars);

new operators obtain free memory from the Heap (see Heap::allocate in heap.cpp for the entry point). The heap tries to obtain memory from the current allocation range (a contiguous block of free memory) by incrementing the pointer at the lower end of the range. If there's not enough space, the heap searches the free list in each chunk (a linked list of Free blocks that track non-contiguous free space) for a new allocation range that is large enough. If that doesn't work, the heap may expand by allocating a new chunk from the OS. If the heap decides not to expand, an AllocationError is thrown. The caller must catch it, invoke the garbage collector, then retry. All classes have static factory methods which automate the process of calling new, catching the exception, invoking the garbage collector, and trying again. So it's rare to call new directly.

Tracking pointers from C++

C++ is not designed to deal with garbage collection, but it's a flexible language, and you can do just about anything with it if you bang on it hard enough. There are two main challenges. First, the garbage collector needs to be able to find all pointers to blocks on the heap in order to determine which blocks are reachable and which blocks are dead. Second, the garbage collector needs to be able to move blocks around, which means it must be possible to rewrite pointers. (Conservative garbage collectors dodge both of these issues, but they have other problems).

The solution to both of these problems is to use handles (see handle.h). A handle is a C++ "smart pointer" object that points to a slot in a global table (HandleStorage) which contains all the pointers to garbage collected objects. The garbage collector can scan and update entries in this table all at once. CodeSwitch C++ code almost always uses handles when dealing with pointers to blocks. The only situation where raw pointers can be used is in sections where the garbage collector is guaranteed not to run. That means no allocation can be performed in those sections.

There are two different kinds of handles with different lifetimes (both heavily inspired by V8). Persistent handles have no lifetime restriction. Their constructor allocates a slot in the global handle table from a free list, and their destructor frees it. Local handles have their lifetime bound to a HandleScope. When a handle scope is created, it allocates a chunk of empty slots, and local handles are allocated linearly out of this. Local handles do not need to be freed, and their destructor does nothing. This makes them a little faster than persistent handles. When a handle scope is destroyed, its slots are also destroyed, which is why local handles can't outlive their enclosing handle scope. Since handle scopes can only be created on the stack, this also means that local handles can't be stored in objects.

To recap, local handles are faster than persistent handles, so they are used more widely. However, their lifetime restriction means they can only be used on the stack. Persistent handles are needed, especially for pointers from non-garbage-collected C++ objects to garbage-collected blocks. All handles returned through the API are persistent.

Note that handles are not needed for pointers from blocks to other blocks on the heap. The garbage collector can find these pointers by scanning the heap and using the Meta blocks mentioned earlier. These pointers are managed using a wrapper class called Ptr, which ensures write barriers are performed. A write barrier is a bit of code that notifies the garbage collector whenever a pointer is changed. This is required by incremental and concurrent garbage collectors. The current garbage collector is neither of these, but it will be some day. It's really hard to add a write barrier later, so I've designed for it now.

The garbage collector

The garbage collector is currently in the "simplest possible thing that will work" stage. For those familiar with garbage collection terminology, it is a stop-the-world, mark-sweep collector. You can find the implementation in gc.cpp.

Mark-sweep is the algorithm used to perform the collection. It consists of two phases. The marking phase discovers all live blocks on the heap which are reachable, either directly or indirectly, from a set of root pointers. The root pointers are mostly just the slots in the global handle table mentioned earlier. The marking algorithm performs a depth-first-search, starting from the root pointers. When it reaches a block that it hasn't seen yet, it sets the bit in the marking bitmap that corresponds to the first word of the block. Remember, each chunk has a marking bitmap as part of its header. The marking bitmap has a bit for each word in the chunk. Once the marking phase is complete, each live block has its mark bit set, and each dead block has its marking bit clear.

After the marking phase, the sweeping phase begins. The garbage collector iterates through each block in each chunk, checking mark bits. Dead blocks are converted into Free blocks by changing the meta and storing a block size. Consecutive free blocks are coalesced into larger free blocks. Pointers to free blocks in each chunk are kept in a C++ vector. Once the iteration is complete, the blocks are sorted by size (largest first), and the free list is constructed. Each free block points to the next free block in the list. A pointer to the first free block is stored in the chunk header.

If a chunk contains no live blocks, it is possible to release it back to the OS. However this is not done yet; right now, the heap never shrinks.

Conclusion

Hopefully, you've found the last few articles on CodeSwitch interesting. It's a pretty simple system, but it's architecturally similar to other VMs, especially V8 (which I worked on at my previous job). I wrote about garbage collection in V8 a couple years ago, so you might check that out if you want to know more.

There are a lot of new language features in Gypsum, so I'll get back to writing about that soon!