Writing Bazel rules: platforms and toolchains
One of Bazel's biggest strengths is its ability to isolate a build from the host system. This enables reproducible builds and remote execution, which lets Bazel scale to huge projects. This isolation isn't completely automatic; rules must cooperate with Bazel to ensure the correct tools are used when the host, execution, and target platforms may all be different.
In the previous article, we defined a repository rule that let us download and verify a Go toolchain. This time, we'll configure our simple set of rules to use that toolchain. After this, our rules will be almost completely independent from the host system. Our users will be able to check out and build a Go project without needing to install Go themselves.
Concepts
Before we get to the actual code, let's go over some platform and toolchain jargon. There's a lot this time. You may also want to read through the official documentation on Platforms and Toolchains.
A platform is a description of where software can run, defined with the platform rule. The host platform is where Bazel itself runs. The execution platform is where actions run. Normally, this is the same as the host platform, but if you're using remote execution, the execution platform may be different. The target platform is where the software you're building should run. By default, this is also the same as the host platform, but if you're cross-compiling, it can be different.
A platform is described by a list of constraint values, defined with the constraint_value rule. A constraint value is a fact about a platform, for example, the CPU is x86_64, or the operating system is Linux. There are a number of constraint values defined in the platforms module. Bazel itself depends on this module, but if you use it, you should declare your own dependency with bazel_dep in MODULE.bazel so you get a predictable minimum version. You can list the constraints in that module with bazel query "@platforms//...". You can also define your own.
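As a concrete sketch, a custom platform is just a platform target listing constraint values. The constraint labels below come from the platforms module; the package and target names are hypothetical:

```starlark
# Hypothetical BUILD.bazel fragment defining a platform for 64-bit ARM Linux.
# The constraint labels come from the @platforms module.
platform(
    name = "linux_arm64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:arm64",
    ],
)
```

With suitably constrained toolchains registered, you could then build for this platform by passing a flag like --platforms=//platforms:linux_arm64.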
A constraint setting is a category of constraint values, at most one of which may be true for any platform. A constraint setting is defined with the constraint_setting rule. @platforms//os:os and @platforms//cpu:cpu are the two main settings to worry about, but again, you can define your own.
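To make "define your own" concrete, here's a hypothetical custom setting; the names are invented for illustration and don't appear in rules_go_simple:

```starlark
# Hypothetical BUILD.bazel fragment: a custom constraint_setting with two
# mutually exclusive values. A platform may list at most one of them.
constraint_setting(name = "go_compiler")

constraint_value(
    name = "gc",
    constraint_setting = ":go_compiler",
)

constraint_value(
    name = "tinygo",
    constraint_setting = ":go_compiler",
)
```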
A toolchain is a special target defined with the toolchain rule that associates a toolchain implementation with a list of constraint values for both the target and execution platforms.

A toolchain type is a target defined with the toolchain_type rule, which is a name that identifies a kind of toolchain.

A toolchain implementation is a target that represents the actual toolchain by listing the files that are part of the toolchain (for example, the compiler and standard library) and code needed to use the toolchain. A toolchain implementation must return a ToolchainInfo provider.
So that's a lot to take in. How does it all fit together?
Anyone who's defining a toolchain needs to declare a toolchain_type target. This is just a unique symbol.
The actual toolchains are defined with toolchain targets that point to implementations. We'll define a go_toolchain rule for our implementation, but you can use any rule that returns a ToolchainInfo provider.
A rule can request a toolchain using its type by setting the toolchains parameter in its rule declaration. The rule implementation can then access the toolchain through ctx.toolchains.
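Putting those two pieces together, a minimal sketch of a rule that requests our toolchain type might look like this (the rule name is hypothetical; the toolchain type label is the one we define later in this article):

```starlark
def _my_rule_impl(ctx):
    # Look up the resolved toolchain by its type label.
    toolchain = ctx.toolchains["@rules_go_simple//:toolchain_type"]
    # ... use the ToolchainInfo fields to create actions ...
    return []

my_rule = rule(
    implementation = _my_rule_impl,
    # Declaring the toolchain type here is what makes the lookup above work.
    toolchains = ["@rules_go_simple//:toolchain_type"],
)
```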
Users register toolchains they'd like to use by calling the register_toolchains function in their MODULE.bazel file or by passing the --extra_toolchains flag on the command line.
Finally, when Bazel begins a build, it checks the constraints for the execution and target platforms. It then selects a suitable set of toolchains that are compatible with those constraints. Bazel will provide the ToolchainInfo objects of those toolchains to the rules that request them.
…
Got all that? Actually I'm not sure I do either. It's an elegant system, but it's difficult to grasp. If you want to see how Bazel selects or rejects registered toolchains, use the --toolchain_resolution_debug flag.
The key insight is that Bazel's toolchains are a dynamic dependency injection system. If you've ever used something like Dagger or Guice, Bazel's toolchains are conceptually similar. A toolchain type is like an interface. A toolchain is like a static method with a @Provides annotation. A rule that requires a toolchain is like a constructor with an @Inject annotation. The system automatically finds a suitable implementation for every injected interface, based on the constraint values in the execution and target platforms.
Migrating rules to toolchains
Let's start using toolchains in rules_go_simple. We're now on the v5 branch.
First, we'll declare a toolchain_type. Rules can request this with the label @rules_go_simple//:toolchain_type.
toolchain_type(
    name = "toolchain_type",
    visibility = ["//visibility:public"],
)
Since a toolchain_type is basically an interface, we should document what can be done with that interface. Starlark is a dynamically typed language, and there's no place to write down required method or field names. I declared a dummy provider in providers.bzl with some documentation, but you could write this in a README or wherever makes sense for your project.
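A documentation-only provider along these lines would record the expected interface. This is a sketch: the build file comments later refer to a GoToolchain provider, but the field descriptions here are my paraphrase, not the actual providers.bzl contents:

```starlark
# providers.bzl (sketch): GoToolchain is never returned by any rule; it
# exists solely to document the fields of the ToolchainInfo that our
# toolchain implementation returns.
GoToolchain = provider(
    doc = "Documents the public interface of a Go toolchain.",
    fields = {
        "compile": "Function that registers an action compiling one Go package.",
        "link": "Function that registers an action linking a Go executable.",
        "build_test": "Function that registers actions building a test executable.",
        "internal": "Private files and metadata; for use by the toolchain only.",
    },
)
```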
Next, we'll create our toolchain implementation rule, go_toolchain.
def _go_toolchain_impl(ctx):
    # Find important files and paths.
    go_cmd = find_go_cmd(ctx.files.tools)
    env = {"GOROOT": paths.dirname(paths.dirname(go_cmd.path))}

    # Return a ToolchainInfo provider. This is the object that rules get
    # when they ask for the toolchain.
    return [platform_common.ToolchainInfo(
        # Functions that generate actions. Rules may call these.
        # This is the public interface of the toolchain.
        compile = go_compile,
        link = go_link,
        build_test = go_build_test,

        # Internal data. Contents may change without notice.
        # Think of these like private fields in a class. Actions may use these
        # (they are methods of the class) but rules may not (they are clients).
        internal = struct(
            go_cmd = go_cmd,
            env = env,
            builder = ctx.executable.builder,
            tools = ctx.files.tools,
            stdlib = ctx.file.stdlib,
        ),
    )]

go_toolchain = rule(
    implementation = _go_toolchain_impl,
    attrs = {
        "builder": attr.label(
            mandatory = True,
            executable = True,
            cfg = "exec",
            doc = "Executable that performs most actions",
        ),
        "tools": attr.label_list(
            mandatory = True,
            doc = "Compiler, linker, and other executables from the Go distribution",
        ),
        "stdlib": attr.label(
            mandatory = True,
            allow_single_file = True,
            cfg = "target",
            doc = "Package files for the standard library compiled by go_stdlib",
        ),
    },
    doc = "Gathers functions and file lists needed for a Go toolchain",
)
go_toolchain is a normal rule that returns a ToolchainInfo provider. When rules request the toolchain, they will get one of these structs. There are no mandatory fields, so you can put anything in here. I included three "methods" (which are actually just functions): compile, link, and build_test. These correspond with the actions our rules need to create, so rules will call these instead of creating actions directly. I also included an internal struct field, which includes private files and metadata. Our methods may access the internal struct, but clients of the toolchain should not, since these values can change without notice.
Next, we'll declare go_toolchain and toolchain targets in BUILD.bazel.go_download.tpl. This file is a template that gets expanded into a build file for the go_download repository rule. See the previous article for details.
# toolchain_impl gathers information about the Go toolchain.
# See the GoToolchain provider.
go_toolchain(
    name = "toolchain_impl",
    builder = ":builder",
    stdlib = ":stdlib",
    tools = [":tools"],
)

# toolchain is a Bazel toolchain that expresses execution and target
# constraints for toolchain_impl. This target should be registered by
# calling register_toolchains in a MODULE.bazel file.
toolchain(
    name = "toolchain",
    exec_compatible_with = [
        {exec_constraints},
    ],
    target_compatible_with = [
        {target_constraints},
    ],
    toolchain = ":toolchain_impl",
    toolchain_type = "@rules_go_simple//:toolchain_type",
)
We use the goos and goarch attributes to set the {exec_constraints} and {target_constraints} template parameters in the go_download rule. See repo.bzl.
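The mapping itself can be simple dictionary lookups. Here's a hypothetical sketch of how goos/goarch values might be translated into the constraint-label lists substituted into the template; the helper name, dictionaries, and formatting are my assumptions, not the actual repo.bzl code:

```python
# Hypothetical sketch: translate goos/goarch into @platforms constraint
# labels for substitution into {exec_constraints} and {target_constraints}.
_GOOS_CONSTRAINTS = {
    "darwin": "@platforms//os:macos",
    "linux": "@platforms//os:linux",
    "windows": "@platforms//os:windows",
}

_GOARCH_CONSTRAINTS = {
    "amd64": "@platforms//cpu:x86_64",
    "arm64": "@platforms//cpu:arm64",
}

def format_constraints(goos, goarch):
    """Returns a quoted, comma-separated constraint list for the template."""
    constraints = [_GOOS_CONSTRAINTS[goos], _GOARCH_CONSTRAINTS[goarch]]
    return ", ".join(['"%s"' % c for c in constraints])
```

The repository rule would then expand the template with a substitution like {exec_constraints} mapped to format_constraints(goos, goarch).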
To complete the toolchain implementation, we'll modify our go_compile, go_link, and go_build_test functions. They can obtain the toolchain using ctx.toolchains. Here's go_compile after this change:
def go_compile(ctx, *, srcs, importpath, deps, out):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        importpath: the path other libraries may use to import this package.
        deps: list of GoLibraryInfo objects for direct dependencies.
        out: output .a File.
    """
    toolchain = ctx.toolchains["@rules_go_simple//:toolchain_type"]

    args = ctx.actions.args()
    args.add("compile")
    args.add("-stdlib", toolchain.internal.stdlib.path)
    dep_infos = [d.info for d in deps]
    args.add_all(dep_infos, before_each = "-arc", map_each = _format_arc)
    if importpath:
        args.add("-p", importpath)
    args.add("-o", out)
    args.add_all(srcs)

    inputs = (srcs +
              [dep.info.archive for dep in deps] +
              [toolchain.internal.stdlib] +
              toolchain.internal.tools)
    ctx.actions.run(
        outputs = [out],
        inputs = inputs,
        executable = toolchain.internal.builder,
        arguments = [args],
        env = toolchain.internal.env,
        mnemonic = "GoCompile",
    )
Finally, we'll update our rules to request the toolchain and call these functions. Here's go_library after this change.
def _go_library_impl(ctx):
    # Load the toolchain.
    toolchain = ctx.toolchains["@rules_go_simple//:toolchain_type"]

    # Declare an output file for the library package and compile it from srcs.
    archive = ctx.actions.declare_file("{name}.a".format(name = ctx.label.name))
    toolchain.compile(
        ctx,
        srcs = ctx.files.srcs,
        importpath = ctx.attr.importpath,
        deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
        out = archive,
    )

    # Return the output file and metadata about the library.
    runfiles = _collect_runfiles(
        ctx,
        direct_files = ctx.files.data,
        indirect_targets = ctx.attr.data + ctx.attr.deps,
    )
    return [
        DefaultInfo(
            files = depset([archive]),
            runfiles = runfiles,
        ),
        GoLibraryInfo(
            info = struct(
                importpath = ctx.attr.importpath,
                archive = archive,
            ),
            deps = depset(
                direct = [dep[GoLibraryInfo].info for dep in ctx.attr.deps],
                transitive = [dep[GoLibraryInfo].deps for dep in ctx.attr.deps],
            ),
        ),
    ]

go_library = rule(
    implementation = _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the library",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to binaries using this library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
    toolchains = ["@rules_go_simple//:toolchain_type"],
)
Registering toolchains
Users must register toolchains so Bazel can make use of them. This happens in MODULE.bazel. Before doing this, we'll use the go_download repository rule to define two repos: one for linux/amd64 and one for darwin/arm64.
go_download = use_repo_rule("//:go.bzl", "go_download")

go_download(
    name = "go_darwin_arm64",
    goarch = "arm64",
    goos = "darwin",
    sha256 = "544932844156d8172f7a28f77f2ac9c15a23046698b6243f633b0a0b00c0749c",
    urls = ["https://go.dev/dl/go1.25.0.darwin-arm64.tar.gz"],
)

go_download(
    name = "go_linux_amd64",
    goarch = "amd64",
    goos = "linux",
    sha256 = "2852af0cb20a13139b3448992e69b868e50ed0f8a1e5940ee1de9e19a123b613",
    urls = ["https://go.dev/dl/go1.25.0.linux-amd64.tar.gz"],
)

register_toolchains(
    "@go_darwin_arm64//:toolchain",
    "@go_linux_amd64//:toolchain",
)
When Bazel starts, it considers the target platforms (set with --platforms), the execution platforms (set with register_execution_platforms or --extra_execution_platforms), and all registered toolchains (set with register_toolchains or --extra_toolchains). Bazel attempts to select a toolchain for each execution platform and target platform that satisfies all platform constraints. If multiple toolchains are compatible, Bazel picks the first registered toolchain. Modules using register_toolchains are considered in breadth-first pre-order, so toolchains registered in the main module take priority over others. You can get Bazel to show its work using the --toolchain_resolution_debug flag, which takes a regular expression matching the toolchain type.
NOTE: Registering toolchains declared in repository rules as we're doing above has a major disadvantage: Bazel needs to evaluate the repository rule in order to read the toolchain definition. This means we need to download both the macOS and Linux archives, even though we only want to use one. We'll fix this in the next article when we get to module extensions and toolchainization.
Using toolchains
Let's check whether this works by building //tests:hello, a minimal "hello world" go_binary:
$ bazel build --subcommands --toolchain_resolution_debug=. //tests:hello
Starting local Bazel server (8.3.1) and connecting to it...
INFO: Invocation ID: b6fa5c3d-a819-4252-9790-860e5b47e8db
INFO: ToolchainResolution: Target platform @@platforms//host:host: Selected execution platform @@platforms//host:host,
INFO: ToolchainResolution: Performing resolution of //:toolchain_type for target platform @@platforms//host:host
ToolchainResolution: Toolchain @@+_repo_rules+go_darwin_arm64//:toolchain_impl is compatible with target platform, searching for execution platforms:
ToolchainResolution: Compatible execution platform @@platforms//host:host
ToolchainResolution: All execution platforms have been assigned a //:toolchain_type toolchain, stopping
ToolchainResolution: Recap of selected //:toolchain_type toolchains for target platform @@platforms//host:host:
ToolchainResolution: Selected @@+_repo_rules+go_darwin_arm64//:toolchain_impl to run on execution platform @@platforms//host:host
INFO: ToolchainResolution: Target platform @@platforms//host:host: Selected execution platform @@platforms//host:host, type //:toolchain_type -> toolchain @@+_repo_rules+go_darwin_arm64//:toolchain_impl
INFO: ToolchainResolution: Target platform @@platforms//host:host: Selected execution platform @@platforms//host:host,
INFO: ToolchainResolution: Target platform @@platforms//host:host: Selected execution platform @@platforms//host:host,
INFO: Analyzed target //tests:hello (65 packages loaded, 7253 targets configured).
INFO: Found 1 target...
Target //tests:hello up-to-date:
bazel-bin/tests/hello
INFO: Elapsed time: 3.056s, Critical Path: 0.03s
INFO: 1 process: 9 action cache hit, 1 internal.
INFO: Build completed successfully, 1 total action
Since we're not cross-compiling, Bazel only considered the platform @@platforms//host:host. Let's see what constraints that has.
$ bazel query --output=build @@platforms//host
platform(
    name = "host",
    constraint_values = ["@platforms//cpu:aarch64", "@platforms//os:osx"],
)
And what constraints did we say that toolchain was compatible with?
$ bazel query --output=build @go_darwin_arm64//:toolchain
toolchain(
    name = "toolchain",
    toolchain_type = "//:toolchain_type",
    exec_compatible_with = ["@platforms//os:macos", "@platforms//cpu:aarch64"],
    target_compatible_with = ["@platforms//os:macos", "@platforms//cpu:aarch64"],
    toolchain = "@go_darwin_arm64//:toolchain_impl",
)
Almost the same. @platforms//os:macos is an alias pointing to @platforms//os:osx, so this toolchain satisfies all constraints from the platform.
Conclusion
Platforms and toolchains are a mechanism for decoupling a set of rules from the tools they depend on. This is most immediately useful for isolating the build from the machine it runs on. It also provides flexibility for users: it lets developers (not necessarily the original rule authors) write their own rules compatible with existing toolchains and their own toolchains compatible with existing rules. In our case, someone could create a toolchain for gccgo or TinyGo, and it would work with rules_go_simple as long as it satisfies the interface we documented for our toolchain_type. Someone else could write a go_proto_library rule that builds generated code with the same compiler as go_library.
Ultimately, the toolchain system separates what is being built (rules) from how to build it (toolchain). This means when you change one component, you don't need to rewrite all the build files in your repository. Change is isolated, which is important in any system that needs to scale.