Writing Bazel rules: repository rules
In the last few articles, we've built a set of Bazel rules with a small but useful set of features. These rules have one major problem though: they depend on tools installed on the host system. That requires developers to install tools they might not otherwise need. Teammates will likely end up with different versions of those tools, so they'll build different binaries, and programs may work on one machine but not another. This also breaks remote execution and caching: the host's toolchain may not be available on the remote platform.
In this article, we'll solve part of this problem by defining a repository rule that downloads a Go toolchain and generates a custom build file for our go_stdlib and go_tool_binary targets. We won't configure the rest of the rules to use the new toolchain yet — that's a complex topic that will be covered in the next article in this series.
What is a repository rule?
A repository rule is a Bazel extension written in Starlark that defines a repo, a directory containing source code and build files, usually fetched from an external source. For example, http_archive and git_repository are repository rules built into Bazel. A repository rule can only be used in a project's MODULE.bazel file, not in BUILD.bazel.
A module dependency (declared with bazel_dep) also creates a repo directory, but it's a bit different. A module is downloaded from a module registry (typically the Bazel Central Registry) and must be published ahead of time. We'll cover modules in a later article.
A module extension also provides a way to create repo directories, though it uses repository rules to do so. A module extension is useful for integrating with an external package management system or for toolchainization. We'll also cover these in a later article.
For now, we'll just focus on repository rules, since they're a basic building block and are relatively easy to use.
A historical note: before Bazel introduced Bzlmod and MODULE.bazel files, external dependencies were declared in WORKSPACE files. Repository rules were the only way to fetch external code in workspace mode. Bazel had a very complicated evaluation model in workspace mode, so I'm glad it's being removed in Bazel 9. But repository rules continue to be useful in module mode.
Much like a regular rule, to define a repository rule, you call a special function (repository_rule) and provide an implementation function and a number of attributes. Once defined, the rule may be loaded into MODULE.bazel with a use_repo_rule call.
Let's start with a small example. Start in an empty directory. Create a file named repo.bzl with the hello_repo repository rule below. This rule creates a repo with a file named hello.txt that contains a custom message.
def _hello_repo_impl(ctx):
    ctx.file("hello.txt", ctx.attr.message)
    ctx.file("BUILD.bazel", 'exports_files(["hello.txt"])')

hello_repo = repository_rule(
    implementation = _hello_repo_impl,
    attrs = {
        "message": attr.string(
            mandatory = True,
        ),
    },
)
Create an empty BUILD.bazel file in the same directory. This is just needed to define a package so that we can load repo.bzl.
Create a MODULE.bazel file that declares a repo using hello_repo. Be careful with the syntax here: use_repo_rule is not like a load statement: you must assign its result to a variable.
hello_repo = use_repo_rule("//:repo.bzl", "hello_repo")

hello_repo(
    name = "hello",
    message = "Hello, world!",
)
Run the command bazel build @hello//:hello.txt. Bazel will evaluate the repository rule, which will create a directory tree inside Bazel's cache. To see the result of this, run bazel info output_base to get the main output directory, then look in the external/+_repo_rules+hello subdirectory.
$ cd $(bazel info output_base)/external/+_repo_rules+hello
$ ls
BUILD.bazel hello.txt REPO.bazel
$ cat hello.txt
Hello, world!
Repository rules have a number of important differences compared with regular rules. The ctx parameter is a repository_ctx, which has a very different API: it lets you download files, execute commands, and access the file system.
Another important difference is that repository rules are evaluated during the loading phase, rather than the analysis phase. This means (among many other things) that repository rules cannot create actions or depend on files created by regular actions. If a repository rule uses a custom tool in its implementation, you'll need to either download a prebuilt binary or build the tool within the repository rule without using regular actions.
go_download rule
Let's define a rule that downloads a Go distribution, then generates a custom build file. We'll add this to internal/repo.bzl, a new file in rules_go_simple, keeping the definition separate from rules.bzl. It's almost always a good idea to keep repository rules in separate files to minimize the load statements they need: repository rules can get very complicated when they depend on other repository rules being loaded first.
We'll start with the rule declaration:
go_download = repository_rule(
    implementation = _go_download_impl,
    attrs = {
        "urls": attr.string_list(
            mandatory = True,
            doc = "List of mirror URLs where a Go distribution archive can be downloaded",
        ),
        "sha256": attr.string(
            mandatory = True,
            doc = "Expected SHA-256 sum of the downloaded archive",
        ),
        "goos": attr.string(
            mandatory = True,
            values = ["darwin", "linux", "windows"],
            doc = "Host operating system for the Go distribution",
        ),
        "goarch": attr.string(
            mandatory = True,
            values = ["amd64", "arm64"],
            doc = "Host architecture for the Go distribution",
        ),
        "_build_tpl": attr.label(
            default = "//internal:BUILD.bazel.go_download.tpl",
        ),
    },
    doc = "Downloads a standard Go distribution and installs a build file",
)
This definition references our implementation function (_go_download_impl, which we'll see in a moment) and defines a number of attributes.
- urls lists mirror URLs where the distribution can be downloaded. It's often a good idea to save a copy of your dependencies to your own mirror in case something disappears upstream.
- sha256 is a cryptographic sum used to verify the download isn't corrupted or tampered with. This also allows Bazel to cache downloads across local workspaces: Bazel's download cache is keyed by SHA-256 sums.
- goos and goarch are values we'll use to generate the build file.
- _build_tpl is a label for the template we use to generate the build file. This is a hidden attribute (its name starts with _), which means it must have a default value. We point to //internal:BUILD.bazel.go_download.tpl.
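With these attributes in place, declaring the repo in MODULE.bazel looks much like the hello_repo example earlier. Here's a sketch; the repo name, version, URL, and sha256 below are placeholders, not real values:

```python
go_download = use_repo_rule("//internal:repo.bzl", "go_download")

go_download(
    name = "go_linux_amd64",
    # Placeholder URL and checksum for illustration only.
    urls = ["https://example.com/mirror/go1.x.y.linux-amd64.tar.gz"],
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
    goos = "linux",
    goarch = "amd64",
)
```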
Let's look at the implementation function next. I've omitted the code that generates constraints for brevity; we'll cover it in the next article when we discuss platforms and toolchains.
def _go_download_impl(ctx):
    # Download the Go distribution.
    ctx.report_progress("downloading")
    ctx.download_and_extract(
        ctx.attr.urls,
        sha256 = ctx.attr.sha256,
        strip_prefix = "go",
    )

    # Add a build file to the repository root directory.
    # We need to fill in some template parameters, based on the platform.
    ctx.report_progress("generating build file")
    os_constraint = _GOOS_TO_CONSTRAINT.get(ctx.attr.goos)
    if os_constraint == None:
        fail("unsupported goos: " + ctx.attr.goos)
    arch_constraint = _GOARCH_TO_CONSTRAINT.get(ctx.attr.goarch)
    if arch_constraint == None:
        fail("unsupported goarch: " + ctx.attr.goarch)
    constraints = [os_constraint, arch_constraint]
    constraint_str = ",\n    ".join(['"%s"' % c for c in constraints])
    substitutions = {
        "{goos}": ctx.attr.goos,
        "{goarch}": ctx.attr.goarch,
        "{exe}": ".exe" if ctx.attr.goos == "windows" else "",
        "{exec_constraints}": constraint_str,
        "{target_constraints}": constraint_str,
    }
    ctx.template(
        "BUILD.bazel",
        ctx.attr._build_tpl,
        substitutions = substitutions,
    )
We first call ctx.download_and_extract. This method downloads an archive (which may be .zip, .tar.gz, or a number of other formats), verifies its SHA-256 sum, and extracts it into the repo directory.

Next, we call ctx.template. This method loads a template (which may be a file or a string), performs a number of string substitutions, then writes the content to a file. If you ever want to just copy a file, you can call ctx.template without any substitutions.
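For context, the template being filled in might look roughly like the following. This is a hypothetical sketch of BUILD.bazel.go_download.tpl, not the actual file; the real one declares targets for the toolchain and standard library:

```python
# Hypothetical template sketch. ctx.template replaces {goos}, {goarch},
# {exe}, {exec_constraints}, and {target_constraints} with concrete
# values before writing the final BUILD.bazel file.
exports_files(["bin/go{exe}"])

filegroup(
    name = "stdlib_srcs",
    srcs = glob(["src/**"]),
    visibility = ["//visibility:public"],
)
```

The {exec_constraints} and {target_constraints} markers would be expanded inside toolchain declarations, which is why the implementation joins them into a quoted, comma-separated string.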
That's it. This is a relatively small rule, which is good: complexity in repository rules is harder to deal with than complexity in other parts of the build. Ideally, all rules should be this small, but this is rarely the case. For example, if you need to authenticate to a private server or communicate over a custom protocol, you may need to build and execute a custom binary. It's challenging to do this in a way that works across platforms, doesn't depend on other toolchains, and doesn't introduce non-reproducibility. Sometimes it's best to check in a pre-compiled custom binary and just run that.
We won't actually make use of our go_download rule yet. That's in the next article.
Things to watch out for
Don't depend on the host system. Requiring a host tool or library forces developers to install something outside the repository, which makes it harder for new developers to get started. It may also break remote execution, since host tools and libraries may not be available on the remote execution platform. Instead, build tools within repository rules or download pre-compiled binaries.
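As an example of the "download pre-compiled binaries" approach, here's a sketch of a repository rule that fetches a single prebuilt tool instead of relying on one from the host's PATH. The rule name and attributes are hypothetical:

```python
def _prebuilt_tool_impl(ctx):
    # Download a single prebuilt binary, verify it, and mark it executable.
    ctx.download(
        url = ctx.attr.urls,
        sha256 = ctx.attr.sha256,
        output = "bin/tool",
        executable = True,
    )
    ctx.file("BUILD.bazel", 'exports_files(["bin/tool"])')

prebuilt_tool = repository_rule(
    implementation = _prebuilt_tool_impl,
    attrs = {
        "urls": attr.string_list(mandatory = True),
        "sha256": attr.string(mandatory = True),
    },
)
```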
Stay reproducible and deterministic. The repository_ctx API provides rules with direct access to the host system without sandboxing. Take care not to let information from the host system slip into the build, such as directory names, environment variables, or timestamps.
Understand how repository rules are executed. Repository rules can convert labels to paths with the ctx.path method. This even works on dynamically constructed labels. Be careful though: this can lead to a lot of complication. Repos declared with repo rules exist in a different namespace per module, and repo rules used within module extensions behave differently again. It's usually better to inject dependencies through attributes, as we did with _build_tpl above, than to use ctx.path. See External dependencies overview in the official documentation for a brief explanation, or Mike Bland's Migrating to Bazel Modules (a.k.a. Bzlmod) for a much more thorough treatment.
Testing is difficult. bazel-skylib has a Starlark unit testing framework which can help test complicated functions. However, you can't really be confident in a repository rule without a good integration test that sets up a Bazel workspace that uses the rule, recursively invokes Bazel, then verifies the result. rules_bazel_integration_test has some tools to do this. rules_go has a go_bazel_test rule that does this for Go, but it can't be used directly for other languages.
Conclusion
Repository rules let developers download code, execute commands directly on the host system, and dynamically generate build files. These capabilities are powerful, but they're restricted in other parts of the build for good reasons. Downloaded data can be corrupted and must be verified. Commands on the host system may not work the same (or at all) on different platforms. No one wants to debug multiple layers of dynamically generated build files.
That said, repository rules are a useful way to fetch external code and generate build files. As we'll see later, they can also be used with module extensions to integrate with other dependency management systems like Go modules, Maven, or npm. When written correctly, they can provide a lot of value to users. So proceed with caution.