CodeSwitch API: native functions

Published on 2016-06-12
Tagged: codeswitch gypsum

As part of the API update I discussed in the previous article, I've added the capability for CodeSwitch to call native functions written in C++. This means that when you write a package, part of it can be written in Gypsum, and part of it in C++. This is useful for implementing new low-level primitives, such as files and sockets. It's also necessary for interacting with libraries written in native code, like Qt.

Declaring and registering native functions

In order for CodeSwitch to call a native function, the function must be declared in Gypsum code, and the native implementation must be registered with CodeSwitch.

Declaring a native function in Gypsum is easy. Just add the native attribute to the function definition, and don't include a body. Top-level functions may be native. Methods (both static and non-static) may also be native. Constructors and overloaded functions cannot be native.

native def top-level-function: i64

class Foo
  native def normal-method: i64

  static native def static-method: i64

Every function declared with the native attribute must have a C++ implementation that CodeSwitch can find. There are three ways to provide this implementation:

Explicit registration: functions may be registered explicitly through the CodeSwitch API. When you create a new VM you may provide a VMOptions object. This object contains (among other things) a list of tuples called nativeFunctions. Each tuple contains the name of a package, the name of a native function declared within that package, and a pointer to a C++ implementation of that function.
Native libraries: you can compile native functions for a package into a shared library (.so or .dylib file) and install that in the same directory as the CodeSwitch package. These libraries and the native functions within them will be loaded automatically when needed. Native functions loaded this way must follow a specific naming convention (details below) so the VM can find them.
Static linking: native functions may also be compiled into the same binary file as CodeSwitch. This may simplify distribution in some situations. The VM will dynamically load these functions when needed. As with native libraries, these functions must follow a specific naming convention so the VM can find them. They must also be visible in the dynamic symbol table, which may require some special attributes and compiler flags.

When CodeSwitch searches for the implementation of a function, by default, it searches the VMOptions.nativeFunctions list first, then searches native libraries. Statically linked functions are not searched by default. This search order can be changed globally by setting VMOptions.nativeFunctionSearchOrder. The search order can also be set on a per-package basis when calling VM.loadPackage.
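To make the default search order concrete, here is a minimal model of the lookup in plain C++. It is not CodeSwitch's implementation: NativeSource, NativeResolver, NativeFunction, and the lookup callbacks are hypothetical names invented for this sketch. A real VM would consult VMOptions.nativeFunctions and the dynamic loader rather than callbacks.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for this sketch; none of these names are CodeSwitch API.
enum class NativeSource { REGISTERED, LIBRARY, LINKED };

// Opaque pointer to a native implementation.
using NativeFunction = const void*;

// A lookup returns nullptr when the source has no matching function.
using Lookup = std::function<NativeFunction(const std::string& package,
                                            const std::string& name)>;

struct NativeResolver {
  Lookup registered;  // models the VMOptions.nativeFunctions list
  Lookup library;     // models native libraries installed next to the package
  Lookup linked;      // models functions linked into the VM binary

  // Default order from the text: registered functions first, then libraries.
  // Statically linked functions are skipped unless they are added here.
  std::vector<NativeSource> order{NativeSource::REGISTERED,
                                  NativeSource::LIBRARY};

  // Returns the first implementation found, or nullptr if there is none.
  NativeFunction find(const std::string& package,
                      const std::string& name) const {
    assert(!name.empty());
    for (NativeSource source : order) {
      const Lookup& lookup = lookupFor(source);
      if (!lookup) continue;  // this source was not configured
      if (NativeFunction fn = lookup(package, name)) return fn;
    }
    return nullptr;
  }

 private:
  const Lookup& lookupFor(NativeSource source) const {
    switch (source) {
      case NativeSource::REGISTERED: return registered;
      case NativeSource::LIBRARY: return library;
      case NativeSource::LINKED: return linked;
    }
    return linked;  // unreachable; keeps compilers quiet
  }
};
```

In this model, changing the search order amounts to replacing the order vector, which is roughly what setting VMOptions.nativeFunctionSearchOrder does globally.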

Function naming convention

Native functions compiled into libraries or linked into the VM must follow a specific naming convention so the VM can find them. The compiled symbol name must be the package name (with dots replaced by two underscores) followed by the full declared function name (again, with dots between declaring scopes replaced by two underscores). The package name and declared name are separated by three underscores. Any characters which are not valid C identifier characters are replaced by single underscores.

This is best explained by example, so let's consider the File.exists method in the std.io package. The full name would be std__io___File__exists.

std.io      → std__io        # package name
File.exists → File__exists   # function name

std__io___File__exists
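The naming rule is mechanical, so it can be written down as a small function. This is a sketch of the convention as described above, not code from CodeSwitch: nativeSymbolName and its helpers are hypothetical names. It also assumes ASCII input, so each byte of a non-ASCII character would become its own underscore, which may not match what the VM does.

```cpp
#include <cassert>
#include <string>

// True for characters allowed in a C identifier (ASCII only).
static bool isIdentifierChar(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c == '_';
}

// Mangles one dotted name: "." becomes "__", and any other character that
// is not valid in a C identifier becomes "_".
static std::string mangleName(const std::string& name) {
  std::string out;
  for (char c : name) {
    if (c == '.') {
      out += "__";
    } else if (isIdentifierChar(c)) {
      out += c;
    } else {
      out += '_';
    }
  }
  return out;
}

// Joins the mangled package name and function name with "___".
std::string nativeSymbolName(const std::string& package,
                             const std::string& function) {
  assert(!package.empty() && !function.empty());
  return mangleName(package) + "___" + mangleName(function);
}
```

With this sketch, nativeSymbolName("std.io", "File.exists") produces std__io___File__exists, and a function named left-pad in a package named leftpad becomes leftpad___left_pad.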

Native implementations must be declared with extern "C". C++ compilers usually encode some type information into compiled symbol names. This is called mangling, and it is how overloading is implemented in C++. Unfortunately, each compiler and operating system does this differently, so CodeSwitch can't reliably search for mangled symbols. Declaring with extern "C" turns off mangling.

One other detail: normally, when compiling a library, it's a good idea to exclude functions from the dynamic symbol table by default. The -fvisibility=hidden flag does this on g++ and clang. This prevents other libraries from linking to internal functions that may change. However, native functions must be visible in order for CodeSwitch to find them, so you may need to explicitly make them visible in the dynamic symbol table with __attribute__((visibility("default"))).

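Let's tie all this together. Here's a full definition of the File.exists method.

#include <sys/stat.h>

using codeswitch::Object;
using codeswitch::VM;

extern "C" __attribute__((visibility("default")))
bool std__io___File__exists(VM* vm, Object self) {
  // getPath is a helper (not shown) that reads the path from the File object.
  auto path = getPath(vm, self);
  struct stat st;
  int ret = stat(path.c_str(), &st);
  return ret == 0;
}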

Native function calling convention

Interoperability between languages is a major goal of CodeSwitch, so I wanted to make native functions feel natural. Parameters are passed in as regular parameters. Return values are just returned. Exceptions can be thrown and caught like regular C++ exceptions. There is a fairly unsurprising mapping between C++ types and Gypsum types.

Parameters

The first parameter of every native implementation must be a pointer to codeswitch::VM. This provides access to the rest of the VM, and is needed because CodeSwitch has no global state. Native functions may use this to load packages, look up functions, create new objects, or do anything else that native code can do through the API.

Native functions that implement non-static methods take a codeswitch::Object as their second parameter. This is a reference to the receiver (the object the method was called on). This is just like the this pointer in C++.

After those required parameters, native function parameters correspond directly to the parameters in the Gypsum declaration. For example, if this function is declared in Gypsum:

native def left-pad(s: String, width: i64): String

Then the C++ function's parameters will look like this:

using codeswitch::String;
using codeswitch::VM;

extern "C" __attribute__((visibility("default")))
String leftpad___left_pad(VM* vm, String s, int64_t width) {
   ...
}

Return values

To return something from a native function, just return it. The return type must correspond with a Gypsum type, according to the table below. Note that unit functions in Gypsum are void in C++.

using codeswitch::VM;

extern "C" __attribute__((visibility("default")))
int64_t math___abs(VM* vm, int64_t x) {
  return x >= 0 ? x : -x;
}

Throwing and catching exceptions

Exceptions are wrapped using the codeswitch::Exception class. You can throw and catch them as you normally would in C++. A reference to the actual exception object being thrown can be retrieved with the get method.

using codeswitch::Exception;
using codeswitch::Object;
using codeswitch::VM;

extern "C" __attribute__((visibility("default")))
void utils___frob(VM* vm, double x) {
  try {
    frob();
  } catch (Exception& e) {
    Object obj = e.get();
    log(obj);
    throw;  // rethrow the original exception without copying it
  }
}

Types in Gypsum and C++

CodeSwitch converts between Gypsum and C++ types according to the table below. Since type information is lost after C++ code is compiled, CodeSwitch has no way of checking types in native code. Correctness is therefore up to you. If a native function's C++ signature doesn't match its Gypsum declaration, the behavior is undefined: expect garbage values or crashes.

Gypsum type	C++ type
unit	void
boolean	bool
i8	int8_t
i16	int16_t
i32	int32_t
i64	int64_t
f32	float
f64	double
String	codeswitch::String
any other object	codeswitch::Object
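Since nothing checks these types for you, one option is to write the expected signature next to the implementation and let the C++ compiler compare the two. The sketch below does this for the abs example, assuming its Gypsum declaration is native def abs(x: i64): i64. AbsSignature is a hypothetical name, and the forward declaration of codeswitch::VM stands in for the real headers so the sketch compiles on its own. The compiler can only check that the C++ side is self-consistent; keeping the alias in sync with the Gypsum declaration is still a manual step.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for the real CodeSwitch header, so this sketch is self-contained.
namespace codeswitch { class VM; }
using codeswitch::VM;

extern "C" {
// Expected C++ signature, written by hand from the table:
// the leading VM*, then i64 -> int64_t, returning i64 -> int64_t.
using AbsSignature = int64_t(VM*, int64_t);

__attribute__((visibility("default")))
int64_t math___abs(VM* vm, int64_t x) {
  (void)vm;                // the VM is not needed here
  assert(x != INT64_MIN);  // -x would overflow for the minimum value
  return x >= 0 ? x : -x;
}
}

// Fails to compile if the implementation drifts from the expected signature,
// for example if a parameter is changed to int32_t.
static_assert(std::is_same<decltype(math___abs), AbsSignature>::value,
              "math___abs does not match the signature derived from Gypsum");
```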

Conclusion

I'm really pleased with how native functions interact with interpreted code. I think CodeSwitch has a much more intuitive system than JNI or V8.

CodeSwitch can be more intuitive because native code that uses the API must be written in C++. This lets us use objects, destructors, and exceptions in a way that makes sense.

This is a strength, but it's also a weakness: there are many languages that compile to native code that I'd like CodeSwitch to interoperate with. C, Rust, and Haskell all come to mind. Most other languages provide a foreign function interface for C because C is the lowest common denominator of native languages: everything is compatible with it. I may provide a separate interface for C in the future, but I think C++ will always have the primary native API.