Writing Bazel rules: repository rules

Published on 2019-11-09
Edited on 2020-02-01
Tagged: bazel go


This article is part of the series "Writing Bazel rules".

In the last few articles, we've built a set of Bazel rules with a small but useful set of features. These rules have one major problem though: they depend on a toolchain installed on the host system. This causes several issues. It requires developers to install a toolchain when they might not otherwise need to. It means that builds are not reproducible: if two developers on the same project have different versions of Go installed, they'll build different binaries. It also breaks remote execution: the host's toolchain may not be available on the execution platform.

In this article, we'll start solving these problems by defining a repository rule that downloads a Go toolchain and generates a custom build file. We won't get all the way through configuring the rest of the rules to use the new toolchain — that's a complex topic that will be covered in the next article in this series.

What is a repository rule?

A repository rule is a special function that can be used in a WORKSPACE file to define an external workspace. You've probably used http_archive and git_repository already. These are repository rules that ship with Bazel.

Much like regular rules, you can define a repository rule by calling a special function (repository_rule) and providing an implementation function and a number of attributes. Once defined, the rule may be used in a WORKSPACE file or a function called from WORKSPACE. Bazel will call a repository rule's implementation function when a file is needed in an external workspace defined with that repository rule.
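For example, http_archive — one of the built-in repository rules mentioned above — is used in a WORKSPACE file like this (the name, URL, and checksum here are placeholders):

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "my_dependency",  # placeholder name
    urls = ["https://example.com/my_dependency-1.0.tar.gz"],  # placeholder URL
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",  # placeholder
    strip_prefix = "my_dependency-1.0",
)
```

When a file in @my_dependency is needed, Bazel downloads and extracts the archive, verifies the checksum, and exposes the contents as an external workspace.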

Let's start with a small example. Create a file named deps.bzl with the hello_repo repository rule below. This can be used to create an external workspace with a file named hello.txt that contains a custom message.

def _hello_repo_impl(ctx):
    ctx.file("hello.txt", ctx.attr.message)
    ctx.file("BUILD.bazel", 'exports_files(["hello.txt"])')

hello_repo = repository_rule(
    implementation = _hello_repo_impl,
    attrs = {
        "message": attr.string(
            mandatory = True,
        ),
    },
)

Now create a WORKSPACE file that defines an external workspace using hello_repo.

load("//:deps.bzl", "hello_repo")

hello_repo(
    name = "hello",
    message = "Hello, world!",
)

Finally, create an empty BUILD.bazel file in the same directory.

Run the command bazel build @hello//:hello.txt. Bazel will evaluate the repository rule, which will create a directory tree inside Bazel's cache. To see the result of this, run bazel info output_base to get the main output directory, then look in the external/hello subdirectory.

$ cd $(bazel info output_base)/external/hello
$ ls
BUILD.bazel  hello.txt  WORKSPACE
$ cat hello.txt
Hello, world!

Repository rules have a number of important differences compared with regular rules. The ctx parameter is a repository_ctx, which has a very different API. repository_ctx lets you download files, execute commands, and access the file system.
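As a rough sketch of that API surface (all names, URLs, and checksums here are hypothetical, not part of the rules we're building):

```starlark
def _sketch_impl(ctx):
    # Download a file and verify its checksum.
    ctx.download(
        url = ["https://example.com/data.txt"],  # placeholder URL
        output = "data.txt",
        sha256 = "...",  # placeholder checksum
    )

    # Execute a command on the host and inspect the result.
    result = ctx.execute(["uname", "-s"])
    if result.return_code != 0:
        fail("uname failed: " + result.stderr)

    # Write files directly into the external workspace.
    ctx.file("BUILD.bazel", 'exports_files(["data.txt"])')
```

None of this would be allowed in a regular rule's implementation function, which can only create actions over declared inputs and outputs.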

Another important difference is that repository rules are evaluated during the loading phase, rather than the analysis phase. This means (among many other things) that repository rules cannot create actions or depend on files created by actions. The Bazel documentation has more information.

go_download rule

Let's define a rule that downloads a Go distribution, then installs a custom build file. This will be defined in a new file within rules_go_simple, //internal:repo.bzl.

We'll start with the rule declaration:

go_download = repository_rule(
    implementation = _go_download_impl,
    attrs = {
        "urls": attr.string_list(
            mandatory = True,
            doc = "List of mirror URLs for a Go distribution archive",
        ),
        "sha256": attr.string(
            mandatory = True,
            doc = "Expected SHA-256 sum of the downloaded archive",
        ),
        "goos": attr.string(
            mandatory = True,
            values = ["darwin", "linux", "windows"],
            doc = "Host operating system for the Go distribution",
        ),
        "goarch": attr.string(
            mandatory = True,
            values = ["amd64"],
            doc = "Host architecture for the Go distribution",
        ),
        "_build_tpl": attr.label(
            default = "@rules_go_simple//internal:BUILD.dist.bazel.tpl",
        ),
    },
    doc = "Downloads a standard Go distribution and installs a build file",
)

This definition references our implementation function (_go_download_impl, which we'll see in a moment) and defines a number of attributes.

Let's look at the implementation function next. I've omitted the code that generates constraints for brevity; we'll cover it in the next article.

def _go_download_impl(ctx):
    # Download the Go distribution.
    ctx.report_progress("downloading")
    ctx.download_and_extract(
        ctx.attr.urls,
        sha256 = ctx.attr.sha256,
        stripPrefix = "go",
    )

    # Add a build file to the repository root directory.
    # We need to fill in some template parameters, based on the platform.
    constraint_str = ... # omitted for brevity

    substitutions = {
        "{goos}": ctx.attr.goos,
        "{goarch}": ctx.attr.goarch,
        "{exe}": ".exe" if ctx.attr.goos == "windows" else "",
        "{exec_constraints}": constraint_str,
        "{target_constraints}": constraint_str,
    }
    ctx.template(
        "BUILD.bazel",
        ctx.attr._build_tpl,
        substitutions = substitutions,
    )

We first call ctx.download_and_extract. This method downloads an archive (which may be .zip, .tar.gz, or a number of other formats), verifies its SHA-256 sum, and extracts it into the workspace directory.

Next, we call ctx.template. This method loads a template (which may be a file or a string), performs a number of string substitutions, then writes the content to a file. Note that if you ever want to just copy a file, you can call ctx.template without any substitutions.
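The copy trick might look like this (assuming a hypothetical _notice_tpl label attribute on the rule):

```starlark
# Copies the file verbatim: a template with no substitutions.
ctx.template(
    "NOTICE",
    ctx.attr._notice_tpl,  # hypothetical label attribute
    substitutions = {},
)
```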

That's it. This is a relatively small rule, which is good: complexity in repository rules is harder to deal with than complexity in other parts of the build. Ideally, every repository rule would be this small, but that's rarely the case. For example, if you need to authenticate to a private server or communicate over a custom protocol, you may need to build and execute a custom binary. It's challenging to do this in a way that works across platforms, doesn't depend on other toolchains, and doesn't introduce non-reproducibility. Sometimes it's best to check in a pre-compiled custom binary and just run that.

Things to watch out for

Avoid depending on the host system. Requiring a host tool or library forces developers to install something outside the repository, which makes it harder for new developers to get started. It may also break remote execution, since host tools and libraries may not be available on the remote execution platform.

Try to stay reproducible and deterministic. The repository_ctx API provides rules with direct access to the host system without sandboxing. Take care not to let information from the host system slip into the build, such as directory names, environment variables, or timestamps.
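A sketch of the difference (the rule and attribute names are hypothetical): reading host state bakes it into generated files, while an explicit attribute keeps the input declared in WORKSPACE and identical for every developer.

```starlark
def _tool_repo_impl(ctx):
    # Anti-pattern: host-specific state leaks into the generated files.
    # ctx.file("config.txt", ctx.os.environ.get("HOME", ""))

    # Better: take the value as an explicit attribute, declared in
    # WORKSPACE and therefore the same on every machine.
    ctx.file("config.txt", ctx.attr.version)
    ctx.file("BUILD.bazel", 'exports_files(["config.txt"])')

tool_repo = repository_rule(
    implementation = _tool_repo_impl,
    attrs = {"version": attr.string(mandatory = True)},
)
```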

Understand how repository rules are executed. Repository rules can resolve labels with the ctx.path method. If resolving a label requires another repository rule to be evaluated first, evaluation of the current repository rule will be stopped and restarted later from the beginning. Try to resolve any labels that need to be resolved at the beginning of the rule body, before anything that takes a long time like an uncached download. After being successfully evaluated, a repository rule may be re-evaluated if a call in WORKSPACE changes, the .bzl file containing the rule definition changes, or the file pointed to by any resolved label changes.
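A sketch of that ordering (the rule shape here is hypothetical, not part of go_download): resolve the template label before the download, so a restart triggered by label resolution doesn't repeat the expensive fetch.

```starlark
def _my_repo_impl(ctx):
    # Resolve labels first: if this forces another repository to be
    # evaluated, this whole function restarts from the beginning.
    build_tpl = ctx.path(ctx.attr._build_tpl)

    # The slow part runs only after all labels are resolved.
    ctx.download_and_extract(ctx.attr.urls, sha256 = ctx.attr.sha256)
    ctx.template("BUILD.bazel", build_tpl, substitutions = {})
```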

Avoid depending on external Starlark code. Repository rules are typically loaded from .bzl files before most external workspaces have been defined. Requiring a library like bazel-skylib adds complication for users writing WORKSPACE files, since they must ensure your dependencies are loaded first. It may be better to keep a private copy of these dependencies in your own repository.

Testing is difficult. bazel-skylib has a Starlark unit testing framework which can help test complicated functions. However, you can't really be confident in a repository rule without a good integration test that sets up a Bazel workspace that uses the rule, recursively invokes Bazel, then verifies the result. bazel-integration-testing has some tools to do this. rules_go has a go_bazel_test rule that does this for Go, but it may not be useful for other languages.

Conclusion

Repository rules let developers download code, execute commands directly on the host system, and dynamically generate build files. These capabilities are powerful, but they're restricted in other parts of the build for good reasons. Downloaded data could be corrupted and must be verified. Commands on the host system may not work the same (or at all) on different platforms. No one wants to debug multiple layers of dynamically generated build files.

That said, repository rules are the only way to integrate with other dependency management systems like Go modules, Maven, or npm. When written correctly, they can provide a lot of value to users. So proceed with caution.