Skip to content

RUNFILES_DIR inheritance causes issues when non-Python process invokes py_binaryΒ #3518

@sfc-gh-mkeller

Description

@sfc-gh-mkeller

🐞 bug report

Affected Rule

The issue is caused by the rule: py_console_script_binary (and likely py_binary in general) when used as a tool invoked by another process that has inherited RUNFILES_DIR from a different py_binary.

The specific file affected is _bazel_site_init.py generated by the bootstrap, specifically the _find_runfiles_root() function.

Is this a regression?

Yes, the previous version in which this bug was not present was: 1.6.3

This appears to be a variant of the issue that was fixed in 1.7.0 ("The stage1 bootstrap script now correctly handles nested RUNFILES_DIR environments, fixing issues where a py_binary calls another py_binary"), but the fix doesn't cover the case where a non-Python process sits between the two Python binaries.

Description

When using multiple py_binary tools in a Bazel genrule, where one tool (e.g., grpcio-tools) invokes a non-Python binary (e.g., protoc) that then invokes another py_binary as a plugin (e.g., protoc-gen-mypy), the second Python binary fails with ModuleNotFoundError because it inherits the wrong RUNFILES_DIR from the first Python binary.

The chain of execution is:

  1. grpcio-tools (py_binary) β†’ sets RUNFILES_DIR to its own runfiles
  2. grpcio-tools invokes protoc (C++ binary) β†’ inherits RUNFILES_DIR unchanged
  3. protoc invokes protoc-gen-mypy (py_console_script_binary) as a plugin β†’ inherits wrong RUNFILES_DIR
  4. protoc-gen-mypy bootstrap sees RUNFILES_DIR pointing to grpcio-tools.runfiles, correctly determines it doesn't contain its own files
  5. Fallback to __file__-based path calculation fails because __file__ resolves to the symlink target (outside .runfiles/) rather than the symlink path (inside .runfiles/)
  6. sys.path gets populated with non-existent paths β†’ ModuleNotFoundError

πŸ”¬ Minimal Reproduction

# BUILD.bazel
load("@rules_python//python/entry_points:py_console_script_binary.bzl", "py_console_script_binary")

py_console_script_binary(
    name = "protoc-gen-mypy",
    pkg = "@pypi//:mypy-protobuf",
)

py_binary(
    name = "grpcio-tools",
    srcs = ["grpcio_tools.py"],
    deps = ["@pypi//:grpcio-tools"],
)

genrule(
    name = "generate_protos",
    srcs = ["example.proto"],
    outs = ["example_pb2.py", "example_pb2.pyi"],
    cmd = """
        $(location :grpcio-tools) \
            --plugin=protoc-gen-mypy=$(location :protoc-gen-mypy) \
            --mypy_out=$(GENDIR) \
            $(location example.proto)
    """,
    tools = [
        ":grpcio-tools",
        ":protoc-gen-mypy",
    ],
)

The key aspect is that grpcio-tools invokes protoc (C++ binary), which then invokes protoc-gen-mypy as a subprocess plugin. The non-Python protoc binary passes through the environment unchanged.

πŸ”₯ Exception or Error


ModuleNotFoundError: No module named 'mypy_protobuf'
--mypy_out: protoc-gen-mypy: Plugin failed with status code 1.

With RULES_PYTHON_BOOTSTRAP_VERBOSE=1, the root cause is visible:


bootstrap: stage 2: initial environ: RUNFILES_DIR='.../grpcio-tools.runfiles'

bazel_site_init: runfiles_root: .../bin/tool/src/protos
# ^^^ WRONG! Should be .../protoc-gen-mypy.runfiles

bazel_site_init: append sys.path: .../bin/tool/src/protos/sfc_rules_python++pip+pypi/.../[email protected]/site-packages
# ^^^ This path doesn't exist - the actual files are in protoc-gen-mypy.runfiles/...

🌍 Your Environment

Operating System:

  
Linux 5.10.238 (aarch64)
  

Output of bazel version:

  
Build label: 8.2.1-sf5
  

Rules_python version:

  
1.7.0
  

Anything else relevant?

Root Cause: The _find_runfiles_root() function in _bazel_site_init.py has a fallback that calculates the runfiles root from __file__:

num_dirs_to_runfiles_root = _SELF_RUNFILES_RELATIVE_PATH.count("/") + 1
runfiles_root = os.path.dirname(__file__)
for _ in range(num_dirs_to_runfiles_root):
    runfiles_root = os.path.dirname(runfiles_root)

However, when Python imports a module through a symlink, __file__ is set to the symlink target, not the symlink path. The symlink target is:

.../bin/tool/src/protos/_protoc-gen-mypy.venv/.../site-packages/_bazel_site_init.py

But the symlink path (where the file is actually accessed from) is:

.../protoc-gen-mypy.runfiles/_main/tool/src/protos/_protoc-gen-mypy.venv/.../site-packages/_bazel_site_init.py

The symlink target doesn't have .runfiles/_main/ in its path, so the directory traversal lands in the wrong location.

Workaround: Create wrapper scripts that unset RUNFILES_DIR before invoking the plugins:

cat > wrapper/protoc-gen-mypy << 'EOF'
#!/bin/bash
unset RUNFILES_DIR
exec "$ACTUAL_BINARY" "$@"
EOF

This forces the bootstrap to use the __file__-based fallback without the incorrect RUNFILES_DIR validation failing first... but then the symlink issue means we'd still have a problem. Actually, unsetting RUNFILES_DIR works because when RUNFILES_DIR is unset, the bootstrap uses a different code path that correctly finds the runfiles via RUNFILES_MANIFEST_FILE or other means.

I want to disclaim that I've used AI to find this issue and the possible solution, as I'm not too familiar with Bazel yet. However; its solution seems to have fixed our issue locally and the explanation seems to match the other linked issue and its accompanying PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions