Writing Bazel rules: repository rules
In the last few articles, we've built a set of Bazel rules with a small but useful set of features. These rules have one major problem though: they depend on tools installed on the host system. That requires developers to install tools they might not otherwise need. Teammates will likely end up with different versions of those tools, so they'll build different binaries, and programs may work on one machine but not another. This also breaks remote execution and caching: the host's toolchain may not be available on the remote platform.
In this article, we'll solve part of this problem by defining a repository rule that downloads a Go toolchain and generates a custom build file for our go_stdlib and go_tool_binary targets. We won't configure the rest of the rules to use the new toolchain yet — that's a complex topic that will be covered in the next article in this series.
What is a repository rule?
A repository rule is a Bazel extension written in Starlark that defines a repo, a directory containing source code and build files, usually fetched from an external source. For example, http_archive and git_repository are repository rules built into Bazel. A repository rule can only be used in a project's MODULE.bazel file, not in BUILD.bazel.
A module dependency (declared with bazel_dep) also creates a repo directory, but it's a bit different. A module is downloaded from a module registry (typically the Bazel Central Registry) and must be published ahead of time. We'll cover modules in a later article.
A module extension also provides a way to create repo directories, though it uses repository rules to do so. A module extension is useful for integrating with an external package management system or for toolchainization. We'll also cover these in a later article.
For now, we'll just focus on repository rules, since they're a basic building block and are relatively easy to use.
A historical note: before Bazel introduced Bzlmod and MODULE.bazel files, external dependencies were declared in WORKSPACE files. Repository rules were the only way to fetch external code in workspace mode. Bazel had a very complicated evaluation model in workspace mode, so I'm glad it's being removed in Bazel 9. But repository rules continue to be useful in module mode.
Much like a regular rule, to define a repository rule, you call a special function (repository_rule) and provide an implementation function and a number of attributes. Once defined, the rule may be loaded into MODULE.bazel with a use_repo_rule call.
Let's start with a small example. Start in an empty directory. Create a file named repo.bzl with the hello_repo repository rule below. This rule creates a repo with a file named hello.txt that contains a custom message.
def _hello_repo_impl(ctx):
    ctx.file("hello.txt", ctx.attr.message)
    ctx.file("BUILD.bazel", 'exports_files(["hello.txt"])')

hello_repo = repository_rule(
    implementation = _hello_repo_impl,
    attrs = {
        "message": attr.string(
            mandatory = True,
        ),
    },
)
Create an empty BUILD.bazel file in the same directory. This is just needed to define a package so that we can load repo.bzl.
Create a MODULE.bazel file that declares a repo using hello_repo. Be careful with the syntax here: use_repo_rule is not like a load statement: you must assign its result to a variable.
hello_repo = use_repo_rule("//:repo.bzl", "hello_repo")

hello_repo(
    name = "hello",
    message = "Hello, world!",
)
Run the command bazel build @hello//:hello.txt. Bazel will evaluate the repository rule, which will create a directory tree inside Bazel's cache. To see the result of this, run bazel info output_base to get the main output directory, then look in the external/+_repo_rules+hello subdirectory.
$ cd $(bazel info output_base)/external/+_repo_rules+hello
$ ls
BUILD.bazel hello.txt REPO.bazel
$ cat hello.txt
Hello, world!
Repository rules have a number of important differences compared with regular rules. The ctx parameter is a repository_ctx, which has a very different API: it lets you download files, execute commands, and access the file system.
Another important difference is that repository rules are evaluated during the loading phase, rather than the analysis phase. This means (among many other things) that repository rules cannot create actions or depend on files created by regular actions. If a repository rule uses a custom tool in its implementation, you'll need to either download a prebuilt binary or build the tool within the repository rule without using regular actions.
go_download rule
Let's define a rule that downloads a Go distribution, then generates a custom build file. We'll add this to internal/repo.bzl, a new file in rules_go_simple, keeping the definition separate from rules.bzl. It's almost always a good idea to keep repository rules in separate files to minimize the load statements they need: repository rules can get very complicated when they depend on other repository rules being loaded first.
We'll start with the rule declaration:
go_download = repository_rule(
    implementation = _go_download_impl,
    attrs = {
        "urls": attr.string_list(
            mandatory = True,
            doc = "List of mirror URLs where a Go distribution archive can be downloaded",
        ),
        "sha256": attr.string(
            mandatory = True,
            doc = "Expected SHA-256 sum of the downloaded archive",
        ),
        "goos": attr.string(
            mandatory = True,
            values = ["darwin", "linux", "windows"],
            doc = "Host operating system for the Go distribution",
        ),
        "goarch": attr.string(
            mandatory = True,
            values = ["amd64", "arm64"],
            doc = "Host architecture for the Go distribution",
        ),
        "_build_tpl": attr.label(
            default = "//internal:BUILD.bazel.go_download.tpl",
        ),
    },
    doc = "Downloads a standard Go distribution and installs a build file",
)
This definition references our implementation function (_go_download_impl, which we'll see in a moment) and defines a number of attributes.
- urls lists mirror URLs where the distribution can be downloaded. It's often a good idea to save a copy of your dependencies to your own mirror in case something disappears upstream.
- sha256 is a cryptographic sum used to verify the download isn't corrupted or tampered with. This also allows Bazel to cache downloads across local workspaces: Bazel's download cache is keyed by SHA-256 sums.
- goos and goarch are values we'll use to generate the build file.
- _build_tpl is a label for the template we use to generate the build file. This is a hidden attribute (its name starts with _), which means it must have a default value. We point to //internal:BUILD.bazel.go_download.tpl.
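With these attributes in place, declaring the repo in MODULE.bazel looks much like the hello_repo example earlier. Here's a sketch; the repo name, version, URL, and sha256 below are placeholders, not real values:

```python
go_download = use_repo_rule("//internal:repo.bzl", "go_download")

go_download(
    name = "go_linux_amd64",
    # Placeholder URL and checksum for illustration only.
    urls = ["https://example.com/mirror/go1.x.y.linux-amd64.tar.gz"],
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
    goos = "linux",
    goarch = "amd64",
)
```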
Let's look at the implementation function next. I've omitted the code that generates constraints for brevity; we'll cover it in the next article when we discuss platforms and toolchains.
def _go_download_impl(ctx):
    # Download the Go distribution.
    ctx.report_progress("downloading")
    ctx.download_and_extract(
        ctx.attr.urls,
        sha256 = ctx.attr.sha256,
        strip_prefix = "go",
    )

    # Add a build file to the repository root directory.
    # We need to fill in some template parameters, based on the platform.
    ctx.report_progress("generating build file")
    os_constraint = _GOOS_TO_CONSTRAINT.get(ctx.attr.goos)
    if os_constraint == None:
        fail("unsupported goos: " + ctx.attr.goos)
    arch_constraint = _GOARCH_TO_CONSTRAINT.get(ctx.attr.goarch)
    if arch_constraint == None:
        fail("unsupported goarch: " + ctx.attr.goarch)
    constraints = [os_constraint, arch_constraint]
    constraint_str = ",\n    ".join(['"%s"' % c for c in constraints])
    substitutions = {
        "{goos}": ctx.attr.goos,
        "{goarch}": ctx.attr.goarch,
        "{exe}": ".exe" if ctx.attr.goos == "windows" else "",
        "{exec_constraints}": constraint_str,
        "{target_constraints}": constraint_str,
    }
    ctx.template(
        "BUILD.bazel",
        ctx.attr._build_tpl,
        substitutions = substitutions,
    )
We first call ctx.download_and_extract. This method downloads an archive (which may be .zip, .tar.gz, or a number of other formats), verifies its SHA-256 sum, and extracts it into the repo directory.

Next, we call ctx.template. This method loads a template (which may be a file or a string), performs a number of string substitutions, then writes the content to a file. If you ever want to just copy a file, you can call ctx.template without any substitutions.
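For context, the template being filled in might look roughly like the following. This is a hypothetical sketch of BUILD.bazel.go_download.tpl, not the actual file; the real one declares targets for the toolchain and standard library:

```python
# Hypothetical template sketch. ctx.template replaces {goos}, {goarch},
# {exe}, {exec_constraints}, and {target_constraints} with concrete
# values before writing the final BUILD.bazel file.
exports_files(["bin/go{exe}"])

filegroup(
    name = "stdlib_srcs",
    srcs = glob(["src/**"]),
    visibility = ["//visibility:public"],
)
```

The {exec_constraints} and {target_constraints} markers would be expanded inside toolchain declarations, which is why the implementation joins them into a quoted, comma-separated string.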
That's it. This is a relatively small rule, which is good: complexity in repository rules is harder to deal with than complexity in other parts of the build. Ideally, all rules should be this small, but this is rarely the case. For example, if you need to authenticate to a private server or communicate over a custom protocol, you may need to build and execute a custom binary. It's challenging to do this in a way that works across platforms, doesn't depend on other toolchains, and doesn't introduce non-reproducibility. Sometimes it's best to check in a pre-compiled custom binary and just run that.
We won't actually make use of our go_download rule yet. That's in the next article.
Things to watch out for
Don't depend on the host system. Requiring a host tool or library forces developers to install something outside the repository, which makes it harder for new developers to get started. It may also break remote execution, since host tools and libraries may not be available on the remote execution platform. Instead, build tools within repository rules or download pre-compiled binaries.
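As an example of the "download pre-compiled binaries" approach, here's a sketch of a repository rule that fetches a single prebuilt tool instead of relying on one from the host's PATH. The rule name and attributes are hypothetical:

```python
def _prebuilt_tool_impl(ctx):
    # Download a single prebuilt binary, verify it, and mark it executable.
    ctx.download(
        url = ctx.attr.urls,
        sha256 = ctx.attr.sha256,
        output = "bin/tool",
        executable = True,
    )
    ctx.file("BUILD.bazel", 'exports_files(["bin/tool"])')

prebuilt_tool = repository_rule(
    implementation = _prebuilt_tool_impl,
    attrs = {
        "urls": attr.string_list(mandatory = True),
        "sha256": attr.string(mandatory = True),
    },
)
```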
Stay reproducible and deterministic. The repository_ctx API provides rules with direct access to the host system without sandboxing. Take care not to let information from the host system slip into the build, such as directory names, environment variables, or timestamps.
Understand how repository rules are executed. Repository rules can convert labels to paths with the ctx.path method. This even works on dynamically constructed labels. Be careful though: this can lead to a lot of complication. Repos declared with repo rules exist in a different namespace per module, and repo rules used within module extensions behave differently again. It's usually better to inject dependencies through attributes, as we did with _build_tpl above, than to use ctx.path. See External dependencies overview in the official documentation for a brief explanation, or Mike Bland's Migrating to Bazel Modules (a.k.a. Bzlmod) for a much more thorough treatment.
Testing is difficult. bazel-skylib has a Starlark unit testing framework which can help test complicated functions. However, you can't really be confident in a repository rule without a good integration test that sets up a Bazel workspace that uses the rule, recursively invokes Bazel, then verifies the result. rules_bazel_integration_test has some tools to do this. rules_go has a go_bazel_test rule that does this for Go, but it can't be used directly for other languages.
Conclusion
Repository rules let developers download code, execute commands directly on the host system, and dynamically generate build files. These capabilities are powerful, but they're restricted in other parts of the build for good reasons. Downloaded data can be corrupted and must be verified. Commands on the host system may not work the same (or at all) on different platforms. No one wants to debug multiple layers of dynamically generated build files.
That said, repository rules are a useful way to fetch external code and generate build files. As we'll see later, they can also be used with module extensions to integrate with other dependency management systems like Go modules, Maven, or npm. When written correctly, they can provide a lot of value to users. So proceed with caution.