Arrays in Gypsum

Published on 2016-01-26
Tagged: gypsum

View All Posts

I recently added arrays to Gypsum, but they are quite different than traditional arrays. In most languages (like C or Java), arrays are a primitive that stand on their own. You can build other data structures like array lists and hash maps out of them. In Gypsum, array elements can be integrated into any class. The normal class members come first in memory, then the array elements follow immediately. This is mainly desirable from an efficiency perspective: it provides better cache behavior by improving spatial locality.

Compact memory layout

Consider the built-in String class. A String object contains a pointer to a meta (all blocks in CodeSwitch start with a meta), a length, and a contiguous sequence of code points. Here's the memory layout for the string "hello":

Offset Value Description
0: String (meta)
8: 5 (length)
16: h elements
20: e
24: l
28: l
32: o

This is more efficient than storing the string header and characters in separate blocks, as we would be forced to in Java, if we were implementing a string class from scratch with arrays. With this layout, we can retrieve a character from a string using a single load instruction.

String is a built-in class of course, so it's easy to implement it this way with some special handling. The new thing here is that you can use this compact array layout for any class.

New syntax

In order to add array elements to a class, some new syntax is required: an arrayelements declaration.

final class Array[static T]
  arrayelements T, get, set, length

This declaration can only be used in final classes because the first array element needs to be located at a predictable offset from the beginning of the object, after all the normal fields. If a non-final class had array elements, a subclass could declare additional fields that would overlap with the elements.

The arrayelements declaration has four parts.

  1. T - the type of the elements. All elements have the same type. It can be any type.
  2. get - the name of a method that will return an element. The method will take one argument: the index of the element to return.
  3. set - the name of a method that will change the value of an element. The method will take two arguments: the index of the element, and the value to store.
  4. length - the name of a method that will return the number of array elements.

The get, set, and length methods are generated by the compiler. You can name these anything. You can also use attributes like public, private, and override when declaring them.

There's also a bit of new syntax to allocate objects with array elements, since you need to specify the length.

let array = new(12i32) Array[String]

The new keyword is followed by the length in parenthesis. Once the object is allocated, its length cannot be changed.

Example: MutableArray

This is all well and good, but what if you just need a regular array? Just a collection of elements? You can implement this easily using arrayelements. Here's the MutableArray class from the standard library.

public final class MutableArray[static T]
  arrayelements T, public get, public set, public length

That's it! You can use this like an array in any other language.

import std.MutableArray

def fill[static T](value: T, count: i32) =
  let array = new(count) MutableArray[T]
  var i = 0i32
  while (i < count)
    array.set(i, value)
    i += 1i32
  array

Example: covariant immutable Array

You might recall that Gypsum supports covariant and contravariant type parameters. It's actually possible for the array element type to be covariant, as long as the array is immutable. This means, for example, that Array[String] would be a subtype of Array[Object]. This is not safe with mutable arrays, since you could create an Array[String], cast it to Array[Object], then store something that's not a String in there. Java and C# let you get away with this at the cost of some inefficient run-time type checking.

In Gypsum, all we need to do to make an array immutable is to add the final keyword to the arrayelements declaration. Here's the definition of the Array class from the standard library.

public final class Array[static +T]
  private def this(a: MutableArray[T]) =
    var i = 0i32
    while (i < length)
      set(i, a.get(i))
      i += 1i32

  static def create(a: MutableArray[T]) = new(a.length) Array[T](a)

  final arrayelements T, public get, private set, public length

The + on the type parameter indicates it's covariant.

The final keyword changes two rules. First, the element type will be considered covariant instead of invariant. Second, the set method can only be called in the initializer and in constructors. In these functions, the actual type of the object is known, and we don't need to worry about variance. Outside of these functions, it's unsafe to make changes to elements because we don't really know the element type. In this class, the set method is marked private to avoid exposing it to clients, since they can never call it.

Example: BitSet

Let's look at something that's not just an array. Here's an implementation of a bit set, a set of non-negative 32-bit integers represented by a bitmap. Each BitSet object contains a sequence of bytes which stores the bitmap.

import std.IllegalArgumentException

final class BitSet
  // The constructor is private. It doesn't do anything since there
  // are no normal fields.
  private def this = {}

  // Clients use this method to create a new `BitSet`. This calculates
  // the required capacity, then allocates a new `BitSet`. Allocated
  // objects are zero-initialized automatically.
  static def create(max: i32) =
    let bytes-needed = (max + 7i32) / 8i32
    new(bytes-needed) BitSet

  def contains(n: i32) =
    if (in-range(n))
      let byte = get-byte(byte-index(n))
      (byte & bit-mask(bit-index(n))) != 0i8
    else
      false

  def add(n: i32) =
    if (!in-range(n))
      throw IllegalArgumentException()
    var byte = get-byte(byte-index(n))
    byte |= bit-mask(bit-index(n))
    set-byte(byte-index(n), byte)

  def remove(n: i32) =
    if (!in-range(n))
      throw IllegalArgumentException()
    var byte = get-byte(byte-index(n))
    byte &= ~bit-mask(bit-index(n))
    set-byte(byte-index(n), byte)

  private def in-range(n: i32) = n >= 0i32 && n / 8i32 < capacity
  private def byte-index(n: i32) = n / 8i32
  private def bit-index(n: i32) = (n % 8i32).to-i8
  private def bit-mask(ix: i8) = 1i8 << ix

  // This is the important part. Each element is a signed byte (`i8`).
  // The accessor methods are all declared `private` to avoid exposing
  // implementation details.
  arrayelements i8,
      private get-byte,
      private set-byte,
      private capacity

You can see the full example on GitHub.