Writing Bazel rules: data and runfiles

Published on 2018-10-02
Tagged: bazel go

Bazel has a neat feature that can simplify a lot of work with tests and executables: the ability to make data files available at run-time using data attributes. You may have seen these in rules like this:

cc_library(
    name = "server_lib",
    srcs = ["server.cc"],
    data = ["private.key"],
)

When a file is listed in a data attribute (or something that behaves like a data attribute), Bazel makes that file available at run-time to tests and to executables started with bazel run. This is useful for all kinds of things, such as plugins, configuration files, certificates and keys, and resources.

In this article, we'll add data attributes to the go_library and go_binary rules in rules_go_simple, the set of rules we've been working on. This won't take long: we only need to add a few lines of code for each rule.

Data and runfiles

We can start by adding a data attribute to our rules. Here's the new declaration for go_library. The attribute in go_binary is similar.

go_library = rule(
    _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibrary],
            doc = "Direct dependencies of the library",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to binaries using this library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
)

Bazel uses runfiles objects to track files that should be made available at run-time. You can create new runfiles objects with ctx.runfiles. To actually make files available, you need to put one of these in the runfiles field of the DefaultInfo provider returned by your rule. Recall that DefaultInfo is used to list the output files and executables produced by a rule.

Here's how we create the DefaultInfo provider for go_library. Again, go_binary is similar.

return [
    DefaultInfo(
        files = depset([archive]),
        runfiles = ctx.runfiles(collect_data = True),
    ),
    ...
]

The expression ctx.runfiles(collect_data = True) gathers the files listed in the data attribute, along with the runfiles returned by rules in the srcs and deps attributes. That means any library can have data files, and they will be available to any test that links the library, as well as to any binary started with bazel run that links it.

There are a few different ways to call ctx.runfiles. If you set collect_data = True, as we did above, Bazel collects data runfiles from dependencies in the srcs, deps, and data attributes. If you set collect_default = True, Bazel collects default runfiles from the same dependencies. I have no idea what the distinction is between data and default runfiles, but when you construct DefaultInfo, you can set the data_runfiles or default_runfiles fields explicitly. If you set only runfiles, your files are treated as both data and default runfiles.

What if you want to build the list of files explicitly? This is useful if you want to collect files from non-standard attributes, or if you create files within your rule. ctx.runfiles accepts a files argument, which is a simple list of files. You can access runfiles from your dependencies with an expression like dep[DefaultInfo].data_runfiles, where dep is a Target. You can combine runfiles objects using runfiles.merge, which returns a new runfiles object. So we could have implemented go_library like this:

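# Gather runfiles: the files listed directly in data, plus the data
# runfiles of every target in deps and data (matching collect_data = True).
runfiles = ctx.runfiles(files = ctx.files.data)
for dep in ctx.attr.deps + ctx.attr.data:
    runfiles = runfiles.merge(dep[DefaultInfo].data_runfiles)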

# Return the output file and metadata about the library.
return [
    DefaultInfo(
        files = depset([archive]),
        runfiles = runfiles,
    ),
    ...
]

NOTE: When you have an attribute that is a label or label_list, you can access a list of all the files from all the labels using ctx.files (for example: ctx.files.data). This is almost always more convenient than going through ctx.attr (which gives you a Target or a list of Targets), since each target may have multiple files. If your label has allow_single_file = True set, you can also access the file through ctx.file. And if executable = True, you can access it through ctx.executable.

Testing data and runfiles

We test our new support for runfiles with a simple binary that depends on a library. Both the binary and the library have data files, and the test verifies that they are present.

sh_test(
    name = "data_test",
    srcs = ["data_test.sh"],
    args = ["$(location :list_data_bin)"],
    data = [":list_data_bin"],
)

go_binary(
    name = "list_data_bin",
    srcs = ["list_data_bin.go"],
    deps = [":list_data_lib"],
    data = ["foo.txt"],
)

go_library(
    name = "list_data_lib",
    srcs = ["list_data_lib.go"],
    data = ["bar.txt"],
    importpath = "rules_go_simple/v3/tests/list_data_lib"
)

You can run this test with the command below.

bazel test //v3/tests/...

Runfiles on Windows

When Bazel executes a binary on Unix platforms, it creates a tree of symbolic links to the binary's runfiles. Windows requires you to be an administrator to create symbolic links (or have developer mode turned on in newer versions of Windows 10). This means Bazel can't create symbolic links to runfiles on Windows. Instead, Bazel creates a manifest file, which lists all the runfiles. The manifest is pointed to by the RUNFILES_MANIFEST_FILE environment variable, which is set for tests. Nothing points to the manifest file for binaries run with bazel run, but you should find a file named MANIFEST in the initial working directory of the binary.

The manifest file maps runfile paths beginning with the workspace name to absolute paths on the file system. Each line contains the two paths for one file, delimited by a space. If you have runfiles with spaces in their paths... well, don't do that I guess. Note that both the runfile path and the absolute path are slash-delimited.

For the built-in Java and C++ rules, there are runfiles libraries at @bazel_tools//tools/java/runfiles and @bazel_tools//tools/cpp/runfiles. For sh_test rules, there's also a predeclared rlocation function. Unfortunately, there's no general-purpose utility for locating runfiles in other languages. The MANIFEST format described above is undocumented, and it may change in the future.