Pattern matching in Gypsum

Published on 2016-01-01
Tagged: gypsum

Have you written code in a language that has pattern matching? It's an incredibly powerful tool, and I really miss it when I'm writing code at work. I've always found it strange that this feature hasn't found its way into Java, C++, or some other mainstream language.

Introduction to pattern matching

You might think of pattern matching as a switch statement on steroids. You examine a value using several patterns, then execute an expression based on which one of the patterns successfully matched. switch statements are a primitive form of pattern matching: they only let you compare values with known constants. With true pattern matching, you can perform type tests, structural tests, and you can bind parts of the value to variables which can be referenced inside the expression for the matching case. This is amazingly useful for dealing with structured, polymorphic data.

Pattern matching is probably best explained with examples. The code below is written in Gypsum, which I'm proud to say now supports pattern matching.

The example below is a factorial function. You could implement this easily with an if expression, but it's good to start with something simple.

def factorial(n: i64): i64 =
  // The `match` expression performs a pattern match on the
  // expression in parenthesis.
  match (n)
    // The first two patterns match specific values.
    // If `n` is 0 or 1, then 1 is returned.
    case 0 => 1
    case 1 => 1

    // The `_` pattern matches anything that wasn't already
    // matched in the cases above.
    case _ => n * factorial(n - 1)

Options are objects which wrap values that may or may not be present. An option will either be Some if a value is present or None if not. Options are used as a safe replacement for null. If you miss a check for null, the compiler won't complain, and you may get a NullPointerException at run-time. You need to explicitly unwrap options, or the compiler will give you a type error. Pattern matching makes it very easy to do this unwrapping.

import std.Option, Some

// This function gets the value inside an option or returns
// a default value if there is none.
def get-or-else[static T](opt: Option[T], default: T) =
  match (opt)
    // This case will match if there is something inside the
    // option. The value will be bound to the variable `value`.
    case Some[T](value) => value

    // Cases match in order, so this will match if the previous
    // case didn't match, which means there's nothing inside.
    case _ => default

The example below evaluates arithmetic expression trees. This might be used in an interpreter. See arithmetic-evaluation.gy for the full example.

class Integer(value: i64)
  ...

abstract class Expr

final class ConstExpr(value: Integer) <: Expr
  ...

final case NegExpr(expr: Expr) <: Expr
  ...

final case AddExpr(left: Expr, right: Expr) <: Expr
  ...

final case SubExpr(left: Expr, right: Expr) <: Expr
  ...

final case MulExpr(left: Expr, right: Expr) <: Expr
  ...

final case DivExpr(left: Expr, right: Expr) <: Expr
  ...


// This function recursively evaluates an arithmetic expression
// tree. For each node in the tree, we determine what kind of
// expression it is, evaluate any sub-expressions, then perform
// an arithemtic operation to combine the results.
def evaluate(expr: Expr): Integer =
  // This `match` uses destructuring patterns. These patterns 
  // check the type of `expr`, and if the type matches, they 
  // extract internal values and bind them to local variables
  // such as `left` and `right`.
  match (expr)
    case ConstExpr(value) => value
    case NegExpr(subexpr) => -evaluate(subexpr)
    case AddExpr(left, right) => evaluate(left) + evaluate(right)
    case SubExpr(left, right) => evaluate(left) - evaluate(right)
    case MulExpr(left, right) => evaluate(left) * evaluate(right)
    case DivExpr(left, right) => evaluate(left) / evaluate(right)

Hopefully those examples gave you some idea of why pattern matching is a desirable feature. It is widely supported in functional languages like Haskell, OCaml, F#, and Scala, but unfortunately, there's very little support in mainstream languages. Gypsum now supports pattern matching (which is why I'm writing this).

Aside: Options and tuples in Gypsum

In order to support pattern matching, I've added two useful data structures to the Gypsum standard library: options and tuples.

Options are pretty much as described above. There is an Option base class, which describes options in general. Some is a subclass that represents options that actually have a value. None-class and its singleton value None represents options that have no value. You can see the full source code in option.gy (it's actually quite short).

import std.Option, Some, None

let foo: Option[String] = Some[String]("something")
let bar: Option[String] = None

Tuples are objects that contain a fixed number of values of various types. Tuples are normal classes in the standard library, but Gypsum has some special syntactic sugar to support them.

import std.Tuple2

// This statement uses normal Gypsum syntax (no syntactic sugar).
let foo: Tuple2[String, String] = Tuple2[String, String]("abc", "def")

// This statement uses special tuple syntax for the type and expression.
let foo: (String, String) = ("abc", "def")

The source code for tuples is in tuple.gy. Tuples with up to 10 elements are supported.

Pattern matching in Gypsum

Pattern matching in Gypsum revolves around patterns, which are a family of syntactic entities, separate from expressions and types. Each pattern describes how to recognize a value or some structure within a value. There are many different kinds of patterns, and we'll go into some detail on each of them here.

Literal patterns are the simplest kind of pattern: they match a specific, literal value. You saw these earlier in the factorial example. If you only used literal patterns, the match expression would basically be a switch statement. An example of a literal pattern:

match (x)
  case 42 => ... // matches if `x` is 42

Constant patterns are a lot like literal patterns, but they match against a symbolic constant.

let x = 42
match (y)
  case x => ... // matches if `x` equals `y`

Variable patterns match anything. They bind the matched value to a local variable. Variables patterns can also include a type test.

let x: Object = "foo"
match (x)
  case y: String => y.length  // matches if `x` is a `String`

Note that constant patterns and variable patterns without type tests use the same syntax. The compiler decides which was intended based on whether the symbol refers to another definition. This means that variable patterns cannot shadow any definition in scope.

The wildcard pattern is similar to a variable pattern, but it doesn't bind the value to a variable. It has an optional type test. It's frequently used as the last case in a match expression, since it acts as a "catch all".

match (x)
  case _: String => ...  // matches if `x` is a `String`
  case _ => ...          // matches anything else

Tuple patterns match tuples. They contain sub-patterns which match tuple components.

match (x)
  case _, y: String => ...  // matches if the second component
                            // of a tuple is a `String`

Destructuring patterns are the most generally useful kind of pattern. A destructuring pattern usually checks whether an object has a particular type, then extracts its fields and matches them against sub-patterns (which is why it's called "destructuring").

let opt: Option[String] = ...
match (opt)
  case Some[String](value) => value  // destructuring pattern for `Some`

Destructuring patterns are customizable. When a destructuring pattern is used, the compiler will look for a matcher function. The matcher function may be an independent function named by the symbol in the pattern or a static method (named try-match) of a class named by the pattern. If the name in the pattern refers to an object, the matcher function may also be a regular method of that object.

Matcher functions are expected to return an Option which is Some if the match is successful and None if not. The value inside the option may be a scalar value or a tuple; if a tuple is returned, the components are matched against multiple sub-patterns.

// Standalone matcher function.
def foo(obj: Object): Option[String] = Some[String]("foo")

// Static matcher method. This is the most common.
class Bar
  static def try-match(obj: Object): Option[(String, String)] =
    Some[(String, String)](("foo", "bar"))

// Regular matcher method.
class Matcher
  def try-match(obj: Object): Option[(String, String, String)] =
    Some[(String, String, String)](("foo", "bar", "baz"))

let baz = Matcher()

...

match (x)
  case foo(a) => ...
  case Bar(a, b) => ...
  case baz(a, b, c) => ...

This might seem a little abstract, so let's see how try-match methods would be defined for some of those arithmetic classes from the earlier example.

class ConstExpr(value: Integer) <: Expr
  static def try-match(obj: Object): Option[Integer] =
    match (obj)
      case e: ConstExpr => Some[Integer](e.value)
      case _ => None

ConstExpr is a simple case. We check whether the matched object is an instance of ConstExpr using a variable pattern match, and if it is, we return Some with its integer value. The value can be matched in a sub-pattern.

Below is the code for AddExpr. It's a bit more elaborate, since it has two components (the left and right sub-expressions). As before, we check whether the matched object is an instance of AddExpr using a variable pattern. If it is, we return Some with a pair containing the sub-expressions.

final class AddExpr(left: Expr, right: Expr) <: Expr
  static def try-match(obj: Object) =
    match (obj)
      case add: AddExpr => Some[(Expr, Expr)]((add.left, add.right))
      case _ => None

You might use these together in an optimization function which looks for AddExprs where the left and right expressions are ConstExprs and replaces them with a single ConstExpr for the sum.

def optimize(expr: Expr): Expr =
  match (expr)
    case AddExpr(ConstExpr(left), ConstExpr(right)) =>
      ConstExpr(left + right)
    case _ => expr

One other neat thing about match expressions in Gypsum is that you can use conditional expressions with each case. For example, we could use in the optimization function above to simplify AddExprs with a constant zero operand:

def optimize(expr: Expr) =
  match (expr)
    case AddExpr(left, ConstExpr(const)) if const.value == 0 => left
    case AddExpr(ConstExpr(const), right) if const.value == 0 => right
    ...
    case _ => expr

One last thing to watch out for with match expressions: each pattern will be attempted in order. It is the programmer's responsibility to ensure that at least one of the patterns will match any incoming value. If none of the patterns match, a MatchException will be thrown.

Other uses for patterns in Gypsum

Patterns are used in a few places outside match expressions. In particular, you can use patterns in variable and parameter definitions.

def f(tuple: (String, String), _: Object) =
  let x, y = tuple

Any pattern used here must always be matchable. It doesn't make sense to perform type tests and casts in variable definitions and function calls. What would be the point of a statically typed language?

The other major use of pattern matching is in catch blocks.

try
  function-that-might-throw(x, y)
catch
  case e: NullPointerException => log(e.to-string)
  case e: ArrayIndexOutOfBoundsException => log(e.to-string)

If none of the cases are matched in a catch block, the exception is automatically rethrown.

Future work

Although pattern matching in Gypsum is already at a very useful stage, there are still plenty of improvements that can be made. A syntactic wart you may have noticed above is the need to specify type arguments for destructuring patterns, even when the type arguments are obvious. I plan to add more advanced type inference in the near future so these won't be necessary.

Another nice thing would be if multiple patterns could use the same expression. This would be especially useful for exception handling, since you frequently want to handle different kinds of exceptions the same way.

try
  function-that-might-throw(x, y)
catch
  case e: NullPointerException |
  case e: ArrayIndexOutOfBoundsException => log(e.to-string)

Finally, I'd like the compiler to warn the programmer when a match expression is not exhaustive. Frequently, you want to match against all sub-classes of a given base class. The compiler should tell you when you leave one out (or more commonly, when you add a subclass without updating the match).

match (expr)
  case ConstExpr(const) => ...
  case NegExpr(expr) => ...
  case AddExpr(left, right) => ...
  case MulExpr(left, right) => ...
  case DivExpr(left, right) => ...

// Warn the user that `SubExpr` is missing!