
Conversation

@nrghosh (Contributor) commented Jan 17, 2026

Summary

  • Upgrade vLLM dependency from 0.13.0 to 0.14.0 (initially pinned to 0.14.0rc1 for testing)

Fixes

  1. PoolingParams.normalize renamed to use_activation (python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py) - vllm#32243; see the sketch after this list

  2. Multi-GPU DP tests switched to MoE models (doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py, dp_pd_example.py) - vLLM now makes DP ranks independent for dense models - vllm#30739
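
A minimal sketch of fix 1, assuming the keyword rename from vllm#32243 (illustrative only, not the actual test code):

```python
from vllm import PoolingParams

# vLLM <= 0.13.0 used the old field name:
# pooling = PoolingParams(normalize=True)

# vLLM 0.14.0 renames it to use_activation (per vllm#32243):
pooling = PoolingParams(use_activation=True)
```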

Dependency Changes

  1. PyTorch 2.9.1 now required (default wheel compiled against CUDA 12.9)
  2. compressed-tensors ≥0.13.0 for updated quantization support
  3. CUDA 12.9 default (up from 12.4 in 0.13.0)
  4. protobuf ≥6.33.2 ([grpc] Support gRPC server entrypoint, vllm-project/vllm#30190) - see the quick version check below
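
A quick way to sanity-check these pins in a built environment (standard library only; distribution names assumed to match PyPI):

```python
from importlib.metadata import version

# Expected after this upgrade: vllm 0.14.0, torch 2.9.1,
# compressed-tensors >= 0.13.0, protobuf >= 6.33.2
for dist in ("vllm", "torch", "compressed-tensors", "protobuf"):
    print(dist, version(dist))
```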

Testing

  • LLM CPU tests
  • LLM multi-GPU tests
  • LLM GPU tests
  • LLM Batch Release tests (run locally)
  • LLM Serve Release tests (run locally)
  • Verify no breaking API changes

@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request upgrades the vLLM dependency to version 0.14.0rc1, updating the version in requirements.txt, setup.py, and the Dockerfile. A detailed analysis document is also added, which is a great addition. My review focuses on the accuracy of that document; I found a couple of inconsistencies that should be addressed for clarity and correctness. Otherwise, the changes look good.

@nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from e3d235b to 01d9154 on January 17, 2026 00:35
@eicherseiji added the "go (add ONLY when ready to merge, run all tests)" label on Jan 17, 2026
# Those pins for the sake of workarounds should not be advertised as constraints
# on future releases in setup.py.
vllm[audio]>=0.13.0
vllm[audio] @ git+https://github.com/vllm-project/[email protected]
Collaborator:

why are we upgrading to an rc release?

@nrghosh (Contributor, Author) replied:

This is to get ahead on testing so we can be ready - they haven't released 0.14.0 just yet.

@nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from 261437a to 8cc3ce8 on January 21, 2026 19:53
@nrghosh changed the title from "[LLM] Upgrade vLLM to 0.14.0" to "[deps][LLM] Upgrade vLLM to 0.14.0" on Jan 21, 2026
@nrghosh force-pushed the nrghosh/vllm-0.14.0-rc branch from cf7f2be to b766902 on January 22, 2026 00:11
@nrghosh (Contributor, Author) left a comment

  • Running LLM release tests - CPU/GPU LLM tests are unblocked
  • The main blocker appears to be the protobuf upgrade conflict plus vLLM 0.14.0 requiring a torch upgrade to torch==2.9.1+cpu

cc @aslonnie @elliot-barn

@nrghosh (Contributor, Author) left a comment

The multi-GPU test regression is fixed (passes locally with vLLM 0.14.0) but is now OOMing on CI: https://buildkite.com/ray-project/premerge/builds/58312/steps/table?sid=019be30d-ed6f-4ed6-94c7-6d9c87068347

cc @eicherseiji in case we want to request the CI GPUs be bumped from T4 -> L4 (iirc) or fix it on the config side

nrghosh and others added 8 commits January 26, 2026 15:58
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: Nikhil Ghosh <[email protected]>
- Use a MoE model (Deepseek-V2-Lite) because vllm-project/vllm#30739 changes how vLLM handles DP ranks: it forces dp_size=1 and dp_rank=0 for non-MoE models

- Fixes doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py and
 doc/source/llm/doc_code/serve/multi_gpu/dp_pd_example.py

- vLLM 0.14.0 commit bd877162e optimizes DP for dense models by making each rank independent, preserving DP coordination only for MoE models, where it is needed for expert parallelism

- Impact: Ray's DPServer DP coordination (rank assignment, stats addresses) was ignored for dense models like Qwen2.5-0.5B-Instruct, causing cascading assertion failures

- Fix: The tests now use an MoE model where vLLM's DP coordination is preserved. Outside of this test, dense model deployments should use Ray Serve replicas (num_replicas) instead of vLLM's data_parallel_size; see the sketch below.
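
A hedged sketch of that guidance, using field names from the ray.serve.llm docs (illustrative only, not the exact doc_code examples):

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# MoE model: vLLM 0.14.0 keeps DP coordination, so data_parallel_size is honored.
moe_config = LLMConfig(
    model_loading_config=dict(
        model_id="deepseek-v2-lite",
        model_source="deepseek-ai/DeepSeek-V2-Lite",
    ),
    engine_kwargs=dict(data_parallel_size=2),
)

# Dense model: scale with Ray Serve replicas instead of vLLM data parallelism,
# since 0.14.0 forces dp_size=1 / dp_rank=0 for non-MoE models.
dense_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=2, max_replicas=2),
    ),
)

serve.run(build_openai_app({"llm_configs": [moe_config, dense_config]}))
```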

Signed-off-by: Nikhil Ghosh <[email protected]>
@duyleekun commented:

https://github.com/vllm-project/vllm/releases/tag/v0.15.0 released, just saying :)

Signed-off-by: Jeffrey Wang <[email protected]>
# Remove the GPU constraints, numpy pin, and scipy pin (LLM requires numpy>=2 and compatible scipy)
cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
Contributor:

This was modified by Claude. We'll see if we need it.

# Those pins for the sake of workarounds should not be advertised as constraints
# on future releases in setup.py.
vllm[audio]>=0.14.0
vllm[audio] @ git+https://github.com/vllm-project/[email protected]
Contributor:

0.15.0 is somehow still unavailable. Will check again later.

Signed-off-by: Jeffrey Wang <[email protected]>
@jeffreywang-anyscale (Contributor) commented:

Ran the following locally and everything succeeded. Trying to wrap my head around why premerge fails.

bash ci/ci.sh compile_pip_dependencies
bash ci/compile_llm_requirements.sh
bazel run //ci/raydepsets:raydepsets -- build --all-configs

Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
- --python-version=3.11
- --unsafe-package ray
- --python-platform=linux
# Use manylinux_2_31 for vllm 0.15.0 wheel compatibility
Contributor:

hint: Wheels are available for `vllm` (v0.15.0) on the following platforms: `manylinux_2_31_aarch64`, `manylinux_2_31_x86_64`

@jeffreywang-anyscale (Contributor) commented Jan 31, 2026:

The `linux` platform defaults to manylinux_2_28_x86_64, for which vLLM 0.15.0 does not publish wheels.
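
For context, a minimal sketch of the compatibility rule behind that failure (tag names taken from the hint above; the glibc comparison is the standard manylinux rule):

```python
# A resolver targeting manylinux_2_28 only accepts wheels built against
# glibc <= 2.28, so a manylinux_2_31 wheel (glibc 2.31) is rejected until
# the target platform is raised to manylinux_2_31 or newer.
def manylinux_compatible(wheel_glibc: tuple, target_glibc: tuple) -> bool:
    return wheel_glibc <= target_glibc

print(manylinux_compatible((2, 31), (2, 28)))  # False: vllm 0.15.0 wheel rejected
print(manylinux_compatible((2, 31), (2, 31)))  # True once we target manylinux_2_31
```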

Signed-off-by: Jeffrey Wang <[email protected]>
@duyleekun commented:

What's the current Ray policy on vLLM version support? 0.15 introduces a lot of breaking changes, and some users might want to mix vLLM versions between Ray apps.
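
Not an answer to the policy question, but a hedged sketch of the mechanism usually reached for when mixing versions between apps: per-task (or per-deployment) runtime_env pip installs. Whether this is recommended for ray.serve.llm deployments is exactly what is being asked here.

```python
import ray

ray.init()

# Hypothetical example: each task builds its own environment with a different vLLM pin.
@ray.remote(runtime_env={"pip": ["vllm==0.14.0"]})
def vllm_version_a():
    import vllm
    return vllm.__version__

@ray.remote(runtime_env={"pip": ["vllm==0.15.0"]})
def vllm_version_b():
    import vllm
    return vllm.__version__

print(ray.get([vllm_version_a.remote(), vllm_version_b.remote()]))
```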

Labels: go (add ONLY when ready to merge, run all tests)

6 participants