Pattern matching in Gypsum
Published on 2016-01-01
Have you written code in a language that has pattern matching? It's an incredibly powerful tool, and I really miss it when I'm writing code at work. I've always found it strange that this feature hasn't found its way into Java, C++, or some other mainstream language.
Introduction to pattern matching
You might think of pattern matching as a
switch statement on steroids. You examine a value using several patterns, then execute an expression based on which one of the patterns successfully matched.
switch statements are a primitive form of pattern matching: they only let you compare values with known constants. With true pattern matching, you can perform type tests, structural tests, and you can bind parts of the value to variables which can be referenced inside the expression for the matching case. This is amazingly useful for dealing with structured, polymorphic data.
Pattern matching is probably best explained with examples. The code below is written in Gypsum, which I'm proud to say now supports pattern matching.
The example below is a factorial function. You could implement this easily with an
if expression, but it's good to start with something simple.
def factorial(n: i64): i64 = // The `match` expression performs a pattern match on the // expression in parenthesis. match (n) // The first two patterns match specific values. // If `n` is 0 or 1, then 1 is returned. case 0 => 1 case 1 => 1 // The `_` pattern matches anything that wasn't already // matched in the cases above. case _ => n * factorial(n - 1)
Options are objects which wrap values that may or may not be present. An option will either be
Some if a value is present or
None if not. Options are used as a safe replacement for
null. If you miss a check for
null, the compiler won't complain, and you may get a
NullPointerException at run-time. You need to explicitly unwrap options, or the compiler will give you a type error. Pattern matching makes it very easy to do this unwrapping.
import std.Option, Some // This function gets the value inside an option or returns // a default value if there is none. def get-or-else[static T](opt: Option[T], default: T) = match (opt) // This case will match if there is something inside the // option. The value will be bound to the variable `value`. case Some[T](value) => value // Cases match in order, so this will match if the previous // case didn't match, which means there's nothing inside. case _ => default
The example below evaluates arithmetic expression trees. This might be used in an interpreter. See arithmetic-evaluation.gy for the full example.
class Integer(value: i64) ... abstract class Expr final class ConstExpr(value: Integer) <: Expr ... final case NegExpr(expr: Expr) <: Expr ... final case AddExpr(left: Expr, right: Expr) <: Expr ... final case SubExpr(left: Expr, right: Expr) <: Expr ... final case MulExpr(left: Expr, right: Expr) <: Expr ... final case DivExpr(left: Expr, right: Expr) <: Expr ... // This function recursively evaluates an arithmetic expression // tree. For each node in the tree, we determine what kind of // expression it is, evaluate any sub-expressions, then perform // an arithemtic operation to combine the results. def evaluate(expr: Expr): Integer = // This `match` uses destructuring patterns. These patterns // check the type of `expr`, and if the type matches, they // extract internal values and bind them to local variables // such as `left` and `right`. match (expr) case ConstExpr(value) => value case NegExpr(subexpr) => -evaluate(subexpr) case AddExpr(left, right) => evaluate(left) + evaluate(right) case SubExpr(left, right) => evaluate(left) - evaluate(right) case MulExpr(left, right) => evaluate(left) * evaluate(right) case DivExpr(left, right) => evaluate(left) / evaluate(right)
Hopefully those examples gave you some idea of why pattern matching is a desirable feature. It is widely supported in functional languages like Haskell, OCaml, F#, and Scala, but unfortunately, there's very little support in mainstream languages. Gypsum now supports pattern matching (which is why I'm writing this).
Aside: Options and tuples in Gypsum
In order to support pattern matching, I've added two useful data structures to the Gypsum standard library: options and tuples.
Options are pretty much as described above. There is an
Option base class, which describes options in general.
Some is a subclass that represents options that actually have a value.
None-class and its singleton value
None represents options that have no value. You can see the full source code in option.gy (it's actually quite short).
import std.Option, Some, None let foo: Option[String] = Some[String]("something") let bar: Option[String] = None
Tuples are objects that contain a fixed number of values of various types. Tuples are normal classes in the standard library, but Gypsum has some special syntactic sugar to support them.
import std.Tuple2 // This statement uses normal Gypsum syntax (no syntactic sugar). let foo: Tuple2[String, String] = Tuple2[String, String]("abc", "def") // This statement uses special tuple syntax for the type and expression. let foo: (String, String) = ("abc", "def")
The source code for tuples is in tuple.gy. Tuples with up to 10 elements are supported.
Pattern matching in Gypsum
Pattern matching in Gypsum revolves around patterns, which are a family of syntactic entities, separate from expressions and types. Each pattern describes how to recognize a value or some structure within a value. There are many different kinds of patterns, and we'll go into some detail on each of them here.
Literal patterns are the simplest kind of pattern: they match a specific, literal value. You saw these earlier in the factorial example. If you only used literal patterns, the
match expression would basically be a
switch statement. An example of a literal pattern:
match (x) case 42 => ... // matches if `x` is 42
Constant patterns are a lot like literal patterns, but they match against a symbolic constant.
let x = 42 match (y) case x => ... // matches if `x` equals `y`
Variable patterns match anything. They bind the matched value to a local variable. Variables patterns can also include a type test.
let x: Object = "foo" match (x) case y: String => y.length // matches if `x` is a `String`
Note that constant patterns and variable patterns without type tests use the same syntax. The compiler decides which was intended based on whether the symbol refers to another definition. This means that variable patterns cannot shadow any definition in scope.
The wildcard pattern is similar to a variable pattern, but it doesn't bind the value to a variable. It has an optional type test. It's frequently used as the last case in a
match expression, since it acts as a "catch all".
match (x) case _: String => ... // matches if `x` is a `String` case _ => ... // matches anything else
Tuple patterns match tuples. They contain sub-patterns which match tuple components.
match (x) case _, y: String => ... // matches if the second component // of a tuple is a `String`
Destructuring patterns are the most generally useful kind of pattern. A destructuring pattern usually checks whether an object has a particular type, then extracts its fields and matches them against sub-patterns (which is why it's called "destructuring").
let opt: Option[String] = ... match (opt) case Some[String](value) => value // destructuring pattern for `Some`
Destructuring patterns are customizable. When a destructuring pattern is used, the compiler will look for a matcher function. The matcher function may be an independent function named by the symbol in the pattern or a static method (named
try-match) of a class named by the pattern. If the name in the pattern refers to an object, the matcher function may also be a regular method of that object.
Matcher functions are expected to return an
Option which is
Some if the match is successful and
None if not. The value inside the option may be a scalar value or a tuple; if a tuple is returned, the components are matched against multiple sub-patterns.
// Standalone matcher function. def foo(obj: Object): Option[String] = Some[String]("foo") // Static matcher method. This is the most common. class Bar static def try-match(obj: Object): Option[(String, String)] = Some[(String, String)](("foo", "bar")) // Regular matcher method. class Matcher def try-match(obj: Object): Option[(String, String, String)] = Some[(String, String, String)](("foo", "bar", "baz")) let baz = Matcher() ... match (x) case foo(a) => ... case Bar(a, b) => ... case baz(a, b, c) => ...
This might seem a little abstract, so let's see how
try-match methods would be defined for some of those arithmetic classes from the earlier example.
class ConstExpr(value: Integer) <: Expr static def try-match(obj: Object): Option[Integer] = match (obj) case e: ConstExpr => Some[Integer](e.value) case _ => None
ConstExpr is a simple case. We check whether the matched object is an instance of
ConstExpr using a variable pattern match, and if it is, we return
Some with its integer value. The value can be matched in a sub-pattern.
Below is the code for
AddExpr. It's a bit more elaborate, since it has two components (the left and right sub-expressions). As before, we check whether the matched object is an instance of
AddExpr using a variable pattern. If it is, we return
Some with a pair containing the sub-expressions.
final class AddExpr(left: Expr, right: Expr) <: Expr static def try-match(obj: Object) = match (obj) case add: AddExpr => Some[(Expr, Expr)]((add.left, add.right)) case _ => None
You might use these together in an optimization function which looks for
AddExprs where the left and right expressions are
ConstExprs and replaces them with a single
ConstExpr for the sum.
def optimize(expr: Expr): Expr = match (expr) case AddExpr(ConstExpr(left), ConstExpr(right)) => ConstExpr(left + right) case _ => expr
One other neat thing about
match expressions in Gypsum is that you can use conditional expressions with each case. For example, we could use in the optimization function above to simplify
AddExprs with a constant zero operand:
def optimize(expr: Expr) = match (expr) case AddExpr(left, ConstExpr(const)) if const.value == 0 => left case AddExpr(ConstExpr(const), right) if const.value == 0 => right ... case _ => expr
One last thing to watch out for with
match expressions: each pattern will be attempted in order. It is the programmer's responsibility to ensure that at least one of the patterns will match any incoming value. If none of the patterns match, a
MatchException will be thrown.
Other uses for patterns in Gypsum
Patterns are used in a few places outside
match expressions. In particular, you can use patterns in variable and parameter definitions.
def f(tuple: (String, String), _: Object) = let x, y = tuple
Any pattern used here must always be matchable. It doesn't make sense to perform type tests and casts in variable definitions and function calls. What would be the point of a statically typed language?
The other major use of pattern matching is in
try function-that-might-throw(x, y) catch case e: NullPointerException => log(e.to-string) case e: ArrayIndexOutOfBoundsException => log(e.to-string)
If none of the cases are matched in a catch block, the exception is automatically rethrown.
Although pattern matching in Gypsum is already at a very useful stage, there are still plenty of improvements that can be made. A syntactic wart you may have noticed above is the need to specify type arguments for destructuring patterns, even when the type arguments are obvious. I plan to add more advanced type inference in the near future so these won't be necessary.
Another nice thing would be if multiple patterns could use the same expression. This would be especially useful for exception handling, since you frequently want to handle different kinds of exceptions the same way.
try function-that-might-throw(x, y) catch case e: NullPointerException | case e: ArrayIndexOutOfBoundsException => log(e.to-string)
Finally, I'd like the compiler to warn the programmer when a
match expression is not exhaustive. Frequently, you want to match against all sub-classes of a given base class. The compiler should tell you when you leave one out (or more commonly, when you add a subclass without updating the
match (expr) case ConstExpr(const) => ... case NegExpr(expr) => ... case AddExpr(left, right) => ... case MulExpr(left, right) => ... case DivExpr(left, right) => ... // Warn the user that `SubExpr` is missing!