Writing Bazel rules: simple binary rule

Published on 2018-07-31
Edited on 2023-09-10
Tagged: bazel go

This article is part of the series "Writing Bazel rules".

Bazel is an open source build system created by Google. It has a number of strengths that make it a good fit for large projects: distributed build, test, and cache; integrated code generation; support for multiple languages. It also scales extremely well. Bazel is used to build targets in Google's internal monorepo, which contains billions of lines of code. Large targets may include hundreds of thousands of actions, but incremental builds can still complete in seconds.

In this series of articles, I want to focus on one of Bazel's key strengths: the ability to extend the build system to support new languages with extensions written in Starlark. This time, we'll cover writing a simple rule that compiles and links a Go binary from sources. I'll cover libraries, tests, toolchains, and more in future articles.

The learning curve for extending Bazel is steeper than that of simpler build systems like Make or SCons. Bazel rules are highly structured, and learning this structure takes time. However, this structure helps you avoid introducing unnecessary complication and unexpected dependencies in large, complex builds.

How Bazel works

In each of these articles, I'll cover some of the theory of how Bazel works. Since this is the first article in the series, we'll start with the basics.

Starlark

Starlark is Bazel's configuration and extension language. It's essentially Python without some of the advanced features: Starlark has no classes, exceptions, or generators, and the module system is different. Starlark avoids being Turing complete by forbidding recursion dynamically and only allowing loops over data structures with fixed size. You can find a full list of differences in Bazel's documentation and in the language spec. These limitations prevent the build system from getting too complicated; most of the complexity should be pushed out into tools.
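To make the recursion and loop limits concrete, here's a minimal sketch of how a recursive algorithm is usually rewritten for Starlark: an explicit stack plus a loop over a fixed-size range. It's written in plain Python so it can run outside Bazel (in Starlark you'd call fail instead of raising an exception, and there are no while loops). The flatten function and its max_steps bound are hypothetical names chosen for illustration.

```python
def flatten(nested, max_steps = 1000):
    """Flattens arbitrarily nested lists without recursion.

    max_steps is an assumed upper bound on the amount of work; a bounded
    range is the only kind of loop available when while loops and
    recursion are off the table.
    """
    result = []
    stack = [nested]
    for _ in range(max_steps):
        if not stack:
            break
        item = stack.pop()
        if isinstance(item, list):
            # Push children in reverse so they pop in their original order.
            stack.extend(reversed(item))
        else:
            result.append(item)
    if stack:
        # Starlark has no exceptions; there, this would be fail(...).
        raise ValueError("input too large for max_steps")
    return result

print(flatten([1, [2, [3, 4]], 5]))  # prints [1, 2, 3, 4, 5]
```

The bound is awkward, but it guarantees that evaluation terminates, which is the property the language is designed to keep.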

Aside: Starlark can be used on its own outside of Bazel. Facebook's Buck build system also uses Starlark. Alan Donovan gave a talk on Starlark at GothamGo 2017 with an example of using Starlark to configure a web server. He's also published an embeddable Starlark interpreter written in Go.

Repositories, packages, rules, labels

To build things in Bazel, you need to write build files (named BUILD or BUILD.bazel). They look like this:

load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library", "go_test")

go_library(
    name = "fetch_repo_lib",
    srcs = [
        "fetch_repo.go",
        "module.go",
        "vcs.go",
    ],
    importpath = "github.com/bazelbuild/bazel-gazelle/cmd/fetch_repo",
    visibility = ["//visibility:private"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

go_binary(
    name = "fetch_repo",
    embed = [":fetch_repo_lib"],
    visibility = ["//visibility:public"],
)

go_test(
    name = "fetch_repo_test",
    srcs = ["fetch_repo_test.go"],
    embed = [":fetch_repo_lib"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

Build files contain a number of targets, written as Starlark function calls. The syntax is declarative: you say what you want to build, not how to build it. In this example, we're defining a Go library ("fetch_repo_lib") with a handful of source files. A binary ("fetch_repo") is built from that library. We also have a test ("fetch_repo_test") built from that library and an additional source file ("fetch_repo_test.go").

Each build file implicitly defines a Bazel package. A package consists of the targets declared in the build file and all of the files in the package's directory and subdirectories, excluding targets and files defined in other packages' subdirectories. Visibility restrictions are usually applied at the package level, and globs (wildcard patterns used to match source files) end at package boundaries. Frequently (not always), you'll have one package per directory.
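The package boundary rule can be sketched in a few lines: a file belongs to the nearest enclosing directory that has a build file. This is a plain Python model, not Bazel code; package_of and the directory names in the example are made up for illustration.

```python
import posixpath

def package_of(file_path, package_dirs):
    """Returns the package that owns file_path, or None if there isn't one.

    package_dirs is the set of directories (relative to the repository
    root, "" for the root itself) that contain a BUILD or BUILD.bazel file.
    """
    directory = posixpath.dirname(file_path)
    while True:
        if directory in package_dirs:
            return directory
        if directory == "":
            return None
        directory = posixpath.dirname(directory)

# Hypothetical layout: build files in the root and in cmd/fetch_repo.
packages = {"", "cmd/fetch_repo"}
print(package_of("cmd/fetch_repo/testdata/a.txt", packages))
print(package_of("cmd/other/main.go", packages))
```

In this model, testdata has no build file of its own, so its files belong to cmd/fetch_repo and a glob in that package can match them. cmd/other/main.go falls through to the root package.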

Targets and files are named using labels, which are strings that look like "@io_bazel_rules_go//go:def.bzl". Labels have three parts: a repository name (io_bazel_rules_go), a package name (go), and a file or target name (def.bzl). The repository name and the package name may be omitted when a label refers to something in the same repository or package.
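Here's a rough sketch of how a label splits into those three parts, including the defaults that apply when the repository or package is omitted. It's plain Python rather than anything Bazel provides, parse_label is a hypothetical helper, and it ignores some forms that real labels allow (for example, a bare "@repo").

```python
def parse_label(label, current_repo = "", current_pkg = ""):
    """Splits a label into (repository, package, name).

    current_repo and current_pkg stand in for the repository and package
    of the build file in which the label appears.
    """
    repo, pkg = current_repo, current_pkg
    rest = label
    if rest.startswith("@"):
        repo, sep, rest = rest[1:].partition("//")
        if not sep:
            raise ValueError("unsupported label: " + label)
        rest = "//" + rest
    if rest.startswith("//"):
        pkg, sep, name = rest[2:].partition(":")
        if not sep:
            # "//foo/bar" is shorthand for "//foo/bar:bar".
            name = pkg.rsplit("/", 1)[-1]
    else:
        # Relative label such as ":fetch_repo_lib" or "fetch_repo.go".
        name = rest.lstrip(":")
    return (repo, pkg, name)

print(parse_label("@io_bazel_rules_go//go:def.bzl"))
print(parse_label(":fetch_repo_lib", current_pkg = "cmd/fetch_repo"))
```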

Repositories are defined in a file called WORKSPACE, which lives in the root directory of a project. I'll get into repository rules more in a future article. For now, just think of them as git repositories with names.

Loading, analysis, and execution

Bazel builds targets in three phases: loading, analysis, and execution (actually there are more, but these are the phases you need to understand when writing rules).

In the loading phase, Bazel reads and evaluates build files. It builds a graph of targets and dependencies. For example, if you ask to build fetch_repo_test above, Bazel will build a graph with a fetch_repo_test node that depends on fetch_repo_test.go, :fetch_repo_lib, and @org_golang_x_tools_go_vcs//:vcs via srcs, embed, and deps edges, respectively.
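As a mental model, the target graph for that example can be written as a dictionary of nodes and labeled edges. The sketch below is plain Python, not Bazel's internal representation; TARGET_GRAPH and transitive_closure are names invented here, and the edges are copied from the build file shown earlier.

```python
# Each target maps attribute names (edge labels) to the things it points at.
TARGET_GRAPH = {
    ":fetch_repo_test": {
        "srcs": ["fetch_repo_test.go"],
        "embed": [":fetch_repo_lib"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
    ":fetch_repo_lib": {
        "srcs": ["fetch_repo.go", "module.go", "vcs.go"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
}

def transitive_closure(graph, root):
    """Returns every node reachable from root, in breadth-first order."""
    seen = []
    queue = [root]
    while queue:
        node = queue.pop(0)
        if node in seen:
            continue
        seen.append(node)
        # Source files and external targets have no entry here; they're leaves.
        for edges in graph.get(node, {}).values():
            queue.extend(edges)
    return seen

for node in transitive_closure(TARGET_GRAPH, ":fetch_repo_test"):
    print(node)
```

Note that :fetch_repo (the binary) never appears: only what the requested target transitively depends on is visited.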

In the analysis phase, Bazel evaluates rules in the target graph. Rules declare files and actions that will produce those files. The output of analysis is the file-action graph. Bazel has built-in rules for Java, C++, Python, and a few other things. Other rules are implemented in Starlark. It's important to note that rules cannot directly perform any I/O; they merely tell Bazel how it should execute programs to build targets. This means rules can't make any decisions based on the contents of source files (so no automatic dependency discovery).

In the execution phase, Bazel runs actions in the file-action graph needed to produce files that are out of date. Bazel has several strategies for running actions. Locally, it runs actions within a sandbox that only exposes declared inputs. This makes builds more hermetic, since it's harder to accidentally depend on system files that vary from machine to machine. Bazel may also run actions on remote build servers where this isolation happens automatically.
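The "out of date" idea can be sketched with a toy file-action graph. This is a simplified Python model under my own assumptions (actions are listed in dependency order, and an action is stale when the digest of its input contents changes); Bazel's real action cache considers more than this, such as the command line. All names here are hypothetical.

```python
import hashlib

def digest(contents):
    """Hashes a list of file contents into a single key."""
    h = hashlib.sha256()
    for text in contents:
        h.update(hashlib.sha256(text.encode()).digest())
    return h.hexdigest()

def run_out_of_date(actions, files, cache):
    """Runs only the actions whose inputs changed; returns their mnemonics.

    files maps file names to contents, and cache maps output names to the
    input digest that last produced them.
    """
    ran = []
    for action in actions:  # assumed to be in dependency order
        inputs = [files[name] for name in action["inputs"]]
        key = digest(inputs)
        if action["output"] in files and cache.get(action["output"]) == key:
            continue  # up to date
        files[action["output"]] = action["run"](inputs)
        cache[action["output"]] = key
        ran.append(action["mnemonic"])
    return ran

# Stand-ins for the compiler and linker: they just wrap their inputs in text.
ACTIONS = [
    {"mnemonic": "GoCompile", "inputs": ["hello.go", "message.go"],
     "output": "main.a", "run": lambda ins: "archive(" + "+".join(ins) + ")"},
    {"mnemonic": "GoLink", "inputs": ["main.a"],
     "output": "hello", "run": lambda ins: "binary(" + ins[0] + ")"},
]

files = {"hello.go": "package main // v1", "message.go": "package main"}
cache = {}
print(run_out_of_date(ACTIONS, files, cache))  # first build runs both actions
print(run_out_of_date(ACTIONS, files, cache))  # nothing changed, nothing runs
```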

Setting up the repository

Okay, we've gotten all the theory out of the way for today. Let's dive into the code. We're going to write "rules_go_simple", a simplified version of github.com/bazelbuild/rules_go. Don't worry if you don't know Go — there's not any Go code in here today, and the implementation for other languages will be mostly the same.

I've created an example repository at github.com/jayconrod/rules_go_simple. For this article, we'll be looking at the v1 branch. In later articles, we'll add features to branches with higher version numbers.

The first thing we need is a WORKSPACE file. Every Bazel project should have one of these in the repository root directory. WORKSPACE configures external dependencies that we need to build and test our project. In our case, we have one dependency, @bazel_skylib, which we use to quote strings in shell commands.

Bazel only evaluates the WORKSPACE file for the current project; WORKSPACE files of dependencies are ignored. We declare all our dependencies inside a function in deps.bzl so that projects that depend on rules_go_simple can share our dependencies.

Here's our WORKSPACE file.

workspace(name = "rules_go_simple")

load("@rules_go_simple//:deps.bzl", "go_rules_dependencies")

go_rules_dependencies()

Here's deps.bzl. Note that the _maybe function is private (since it starts with _) and cannot be loaded from other files.

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

def go_rules_dependencies():
    """Declares external repositories that rules_go_simple depends on. This
    function should be loaded and called from WORKSPACE files."""

    # bazel_skylib is a set of libraries that are useful for writing
    # Bazel rules. We use it to handle quoting arguments in shell commands.
    _maybe(
        git_repository,
        name = "bazel_skylib",
        remote = "https://github.com/bazelbuild/bazel-skylib",
        commit = "3fea8cb680f4a53a129f7ebace1a5a4d1e035914",
    )

def _maybe(rule, name, **kwargs):
    """Declares an external repository if it hasn't been declared already."""
    if name not in native.existing_rules():
        rule(name = name, **kwargs)

Note that declaring a repository doesn't automatically download it. Bazel will only download a repository if it needs something inside.
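The reason for wrapping declarations in _maybe is that the first declaration of a name wins. Here's a small Python stand-in that shows the effect, with a dictionary playing the role of native.existing_rules(). The maybe function below mirrors _maybe but is not Bazel code, and the first commit string is a placeholder rather than a real revision.

```python
def maybe(registry, name, **kwargs):
    """Records a repository unless one with the same name already exists."""
    if name not in registry:
        registry[name] = kwargs

registry = {}

# A project that depends on rules_go_simple declares its own bazel_skylib
# first (placeholder commit, for illustration only).
maybe(registry, "bazel_skylib", commit = "commit-chosen-by-main-project")

# Later it calls go_rules_dependencies(), which tries to declare it again.
maybe(registry, "bazel_skylib", commit = "3fea8cb680f4a53a129f7ebace1a5a4d1e035914")

print(registry["bazel_skylib"]["commit"])  # prints commit-chosen-by-main-project
```

So a project that needs a different version of a shared dependency can declare it before calling go_rules_dependencies, and our declaration quietly steps aside.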

Declaring the go_binary rule

To define our binary rule, we'll create a new file, internal/rules.bzl. We'll start with a declaration like this:

go_binary = rule(
    implementation = _go_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile for the main package of this binary",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
        ),
    },
    doc = "Builds an executable program from Go source code",
    executable = True,
)

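You may want to refer to the Bazel documentation for rule and attr here. There's a lot here, so let's break it down. go_binary is created by calling the rule function, and its implementation argument names the function Bazel calls during analysis for each go_binary target (_go_binary_impl, covered below). attrs declares the attributes the rule accepts: srcs is a list of labels restricted to files ending in .go, and _stdlib is a private attribute (its name starts with _), so build files can't set it and it always takes its default value, //internal:stdlib. Finally, executable = True means a go_binary target can be run with bazel run.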

Note that all rules support a set of common attributes like name, visibility, and tags. These don't need to be declared explicitly.

Implementing go_binary

Let's look at our implementation function next.

def _go_binary_impl(ctx):
    # Declare an output file for the main package and compile it from srcs. All
    # our output files will start with a prefix to avoid conflicting with
    # other rules.
    main_archive = ctx.actions.declare_file("{name}_/main.a".format(name = ctx.label.name))
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        stdlib = ctx.files._stdlib,
        out = main_archive,
    )

    # Declare an output file for the executable and link it. Note that output
    # files may not have the same name as the rule, so we still need to use the
    # prefix here.
    executable_path = "{name}_/{name}".format(name = ctx.label.name)
    executable = ctx.actions.declare_file(executable_path)
    go_link(
        ctx,
        main = main_archive,
        stdlib = ctx.files._stdlib,
        out = executable,
    )

    # Return the DefaultInfo provider. This tells Bazel what files should be
    # built when someone asks to build a go_binary rule. It also says which
    # file is executable (in this case, there's only one).
    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]

Implementation functions take a single argument, a ctx object. This provides an API used to access rule attributes and to declare files and actions. It also exposes lots of useful metadata.

The first thing we do here is compile the main package. (For readers unfamiliar with Go, packages are the compilation unit; multiple .go source files may be compiled into a single .a package file.) We declare a main.a output file using ctx.actions.declare_file, which returns a File object. We then call go_compile to declare the compile action (which we'll get to in just a minute).

Next, we'll link our main.a into a standalone executable. We declare our executable file, then call go_link (which we'll also define in just a minute).
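To see what the name prefix does, here's the same string formatting from the implementation function pulled out into a tiny Python helper. output_paths is a hypothetical name; the format strings are the ones used above, and the resulting paths are relative to the target's package in Bazel's output tree.

```python
def output_paths(target_name):
    """Returns the relative paths declared for a go_binary with this name."""
    return {
        "main_archive": "{name}_/main.a".format(name = target_name),
        "executable": "{name}_/{name}".format(name = target_name),
    }

print(output_paths("hello"))
```

For a target named hello, both files land in a hello_ directory, so they can't collide with the outputs of another target in the same package.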

Finally, we need to tell Bazel what we've done by returning a list of providers. A provider is a struct returned by a rule that contains information needed by other rules and by Bazel itself. DefaultInfo is a special provider that all rules should return. Here, we store two useful pieces of information. files is a depset (more on depsets another time) that lists the files that should be built when another rule depends on our rule or when someone runs bazel build on our rule. No one cares about the main.a file, so we just return the binary file here. And executable points to our executable file. If someone runs bazel run on our rule, this is the file that gets run.

go_compile and go_link actions

I chose to define the go_compile and go_link actions in separate functions. They could easily have been inlined in the rule above. However, actions are frequently shared by multiple rules. In future articles, when we define go_library and go_test rules, we'll need to compile more packages, and we'll need to link a new kind of binary. We can't call go_binary from those rules, so it makes sense to pull these actions out into functions in actions.bzl.

Here's go_compile:

load("@bazel_skylib//lib:shell.bzl", "shell")

def go_compile(ctx, *, srcs, stdlib, out):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        stdlib: list containing an importcfg file and a package directory
            for the standard library.
        out: output .a file. Should have the importpath as a suffix,
            for example, library "example.com/foo" should have the path
            "somedir/example.com/foo.a".
    """
    stdlib_importcfg = stdlib[0]
    cmd = "go tool compile -o {out} -importcfg {importcfg} -- {srcs}".format(
        out = shell.quote(out.path),
        importcfg = shell.quote(stdlib_importcfg.path),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = srcs + stdlib,
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        mnemonic = "GoCompile",
        use_default_shell_env = True,
    )

This function builds a Bash command to invoke the compiler, then calls run_shell to declare an action that runs that command. run_shell takes our command, a list of input files that will be made available in the sandbox, and a list of output files that Bazel will expect.
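To show why every path goes through a quoting function, here's the same command construction in plain Python, using shlex.quote from the standard library as a stand-in for shell.quote. The two aren't identical (shlex.quote only adds quotes when a string needs them, so its output may look different), and compile_command and the file names are made up for this sketch.

```python
import shlex

def compile_command(out, importcfg, srcs):
    """Builds the compile command line as a single shell string."""
    return "go tool compile -o {out} -importcfg {importcfg} -- {srcs}".format(
        out = shlex.quote(out),
        importcfg = shlex.quote(importcfg),
        srcs = " ".join([shlex.quote(src) for src in srcs]),
    )

cmd = compile_command(
    out = "hello_/main.a",
    importcfg = "stdlib/importcfg",
    srcs = ["hello.go", "my file.go"],  # note the space in the second name
)
print(cmd)

# Splitting the string the way a shell would recovers the original arguments.
print(shlex.split(cmd))
```

Without quoting, "my file.go" would reach the compiler as two separate arguments, and a file name containing shell metacharacters could do something much worse.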

Our go_link function is similar.

def go_link(ctx, *, out, stdlib, main):
    """Links a Go executable.

    Args:
        ctx: analysis context.
        out: output executable file.
        stdlib: list containing an importcfg file and a package directory
            for the standard library.
        main: archive file for the main package.
    """
    stdlib_importcfg = stdlib[0]
    cmd = "go tool link -o {out} -importcfg {importcfg} -- {main}".format(
        out = shell.quote(out.path),
        importcfg = shell.quote(stdlib_importcfg.path),
        main = shell.quote(main.path),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = [main] + stdlib,
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        mnemonic = "GoLink",
        use_default_shell_env = True,
    )

I wanted to keep this article from getting too absurdly long, so I chose to keep things simple instead of doing it the Right Way. In general, I'd caution against using any Bash commands in Bazel actions for several reasons. It's hard to write portable commands (macOS has different versions of most shell commands than Linux, with different flags; and on Windows you'll probably need to rewrite everything in PowerShell). It's hard to get quoting and escaping right (definitely use shell.quote from @bazel_skylib). It's hard to avoid including some implicit dependency. Bazel tries to isolate you from this a bit with the sandbox; I had to use use_default_shell_env = True to be able to find go on PATH. We should generally avoid using tools installed on the user's system since they may differ across systems, but again, we're keeping it simple this time.

Instead of writing Bash commands, it's better to compile tools with Bazel and use those. That lets you write more sophisticated (and reproducible) build logic in your language of choice.

Exposing a public interface

It's useful to have declarations for all public symbols in one file. This way, you can refactor your rules without requiring users to update load statements in their projects. load statements import a public symbol from another .bzl file into the current file. They also expose that symbol for other files loading the current file. So all we have to do is create one file that loads our public symbols. That's def.bzl.

load("//internal:rules.bzl", _go_binary = "go_binary")

go_binary = _go_binary

Edit: In very old versions of Bazel, simply loading a symbol in a .bzl file would make it available for loading in other files. In newer versions, a symbol must be defined in order for it to be loadable. It's still a good practice to put your public definitions in one file, but it takes a little more work. Above, we load the internal go_binary as _go_binary, then redefine that as go_binary.

Testing the go_binary rule

To test go_binary, we can define a sh_test rule that runs a go_binary rule and checks its output. Here's our build file, tests/BUILD.bazel:

load("//:def.bzl", "go_binary")

sh_test(
    name = "hello_test",
    srcs = ["hello_test.sh"],
    args = ["$(location :hello)"],
    data = [":hello"],
)

go_binary(
    name = "hello",
    srcs = [
        "hello.go",
        "message.go",
    ],
)

Our go_binary rule has two sources, hello.go and message.go. It just prints "Hello, world!". Our test has a data dependency on the hello binary. This means that when the test is run, Bazel will build hello and make it available. To avoid hardcoding the location of the binary in the test, we pass it in as an argument. See "$(location)" substitution for how this works.

Here's our test script:

#!/bin/bash

set -euo pipefail

program="$1"
got=$("$program")
want="Hello, world!"

if [ "$got" != "$want" ]; then
  cat >&2 <<EOF
got:
$got

want:
$want
EOF
  exit 1
fi

You can test this out with bazel test //tests:hello_test.