What happened + What you expected to happen
Description
When submitting a Ray job via the Ray Jobs REST API, the
worker_process_setup_hook specified in runtime_env is executed in the
driver process, rather than in the worker processes.
According to the documentation:
https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html
worker_process_setup_hook: A callable that runs in the worker process on each worker node.
Expected behavior
worker_process_setup_hook should run in the worker process on each worker node, regardless of whether the job is submitted via the Python API or the Jobs REST API.
Actual behavior
When the job is submitted using the Ray Jobs REST API, the hook runs in the
driver process, not in worker processes.
Impact
This blocks per-worker initialization (e.g. environment setup, local resource
configuration) for jobs submitted via the Jobs API.
Versions / Dependencies
Environment:
- Ray version: 2.35.0
- Python version: 3.9 (container), 3.11.2 (development)
- Submission method: Ray Jobs REST API (via ray.run_job())
- OS: Debian 11 (in Docker container)
- Installation: Bazel-managed dependencies in uber-one monorepo
Reproduction script
Reproduction (high level)
- Submit a Ray job via the Jobs REST API
- Provide worker_process_setup_hook in runtime_env
- Log process context (PID / role) inside the hook
- Observe the hook runs in the driver process
We can provide a minimal reproduction script if helpful.
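In the meantime, here is a rough sketch of the steps above using only the standard library. It is not the exact script we run. Assumptions: the hook is passed as an importable "module.function" string (a callable cannot be sent as JSON), the module name repro_hook and the entrypoint my_job.py are placeholders, and the cluster's Jobs endpoint is at http://127.0.0.1:8265. The file would also need to be importable on the cluster.

```python
import json
import os
import sys
import urllib.request


def log_process_context():
    """Hypothetical setup hook: report which process executed it."""
    info = {
        "pid": os.getpid(),
        "parent_pid": os.getppid(),
        # Heuristic only: a Ray worker is normally launched from a different
        # script than the job entrypoint, so the script name hints at the role.
        "argv0": os.path.basename(sys.argv[0]) if sys.argv and sys.argv[0] else "",
    }
    print("[setup_hook] " + json.dumps(info), file=sys.stderr, flush=True)
    return info


def build_job_request(entrypoint, hook_path):
    """Build the JSON body for POST /api/jobs/ with the hook in runtime_env."""
    return {
        "entrypoint": entrypoint,
        "runtime_env": {"worker_process_setup_hook": hook_path},
    }


def submit_job(address, body):
    """POST the job to the Jobs REST API and return the decoded JSON reply."""
    request = urllib.request.Request(
        address.rstrip("/") + "/api/jobs/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder names: repro_hook is this file, my_job.py is the entrypoint.
    body = build_job_request("python my_job.py", "repro_hook.log_process_context")
    print(json.dumps(body, indent=2))
    # Uncomment against a running cluster (address is an assumption):
    # print(submit_job("http://127.0.0.1:8265", body))
```

Comparing the PID printed by the hook with the PID printed by the entrypoint script and by a remote task shows which process ran it. In our case the hook's PID matches the driver.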
Issue Severity
None