[deps][LLM] Upgrade vLLM to 0.14.0 #60253
base: master
Conversation
Code Review
This pull request upgrades the vLLM dependency to version 0.14.0rc1, updating the version in requirements.txt, setup.py, and the Dockerfile. A detailed analysis document is also added, which is a great addition. My review focuses on the accuracy of that document; I've found a couple of inconsistencies that should be addressed for clarity and correctness. Otherwise, the changes look good.
Force-pushed from e3d235b to 01d9154
# Those pins for the sake of workarounds should not be advertised as constraints
# on future releases in setup.py.
- vllm[audio]>=0.13.0
+ vllm[audio] @ git+https://github.com/vllm-project/[email protected]
why are we upgrading to an rc release?
This is to get a head start on testing so we can be ready; they haven't released 0.14.0 just yet.
Force-pushed from 261437a to 8cc3ce8
Force-pushed from cf7f2be to b766902
- Running LLM release tests; CPU/GPU LLM tests are unblocked.
- The main blocker seems to be the protobuf upgrade conflict, plus vLLM 0.14.0 requiring a torch upgrade to torch==2.9.1+cpu.
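Since the torch floor is the blocker called out above, here is a small, hedged sketch of a runtime guard one could use while testing the upgrade. The 2.9.1 minimum is an assumption taken from the comment, not from vLLM's published dependency metadata.

```python
# Hypothetical fail-fast check that the installed torch meets the assumed minimum.
from importlib.metadata import version
from packaging.version import Version

ASSUMED_MIN_TORCH = Version("2.9.1")  # assumption based on the comment above

installed = Version(version("torch").split("+")[0])  # drop local tags like "+cpu"
if installed < ASSUMED_MIN_TORCH:
    raise RuntimeError(
        f"torch {installed} found, but >= {ASSUMED_MIN_TORCH} is expected for vLLM 0.14.0"
    )
```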
nrghosh left a comment
The multi-GPU test regression is fixed (runs locally with vLLM 0.14.0) but is now OOMing on CI: https://buildkite.com/ray-project/premerge/builds/58312/steps/table?sid=019be30d-ed6f-4ed6-94c7-6d9c87068347
cc @eicherseiji in case we want to request the CI GPUs be bumped from T4 -> L4 (iirc), or fix it on the config side.
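If the fix ends up on the config side rather than the GPU type, a hedged sketch of the kind of knobs involved; the values are illustrative and not taken from the failing test.

```python
# Sketch of config-side OOM mitigation on a 16 GB T4; values are illustrative.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small dense model referenced in this PR
    gpu_memory_utilization=0.80,         # leave headroom instead of the 0.9 default
    max_model_len=4096,                  # cap context length to shrink the KV cache
    enforce_eager=True,                  # skip CUDA graph capture to reduce memory overhead
)
print(llm.generate(["Hello"])[0].outputs[0].text)
```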
Signed-off-by: Nikhil Ghosh <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: Nikhil Ghosh <[email protected]>
Signed-off-by: Nikhil Ghosh <[email protected]>
…ctivation Signed-off-by: Nikhil Ghosh <[email protected]>
- Use an MoE model (Deepseek-V2-Lite) because vllm-project/vllm#30739 changes how vLLM handles DP ranks: it overrides dp_size to 1 and dp_rank to 0 for non-MoE models.
- Fixes doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py and doc/source/llm/doc_code/serve/multi_gpu/dp_pd_example.py.
- vLLM 0.14.0 commit bd877162e optimizes DP for dense models by making each rank independent, preserving DP coordination only for MoE models, where it is needed for expert parallelism.
- Impact: Ray's DPServer DP coordination (rank assignment, stats addresses) was ignored for dense models like Qwen2.5-0.5B-Instruct, causing cascading assertion failures.
- Fix: the tests now use an MoE model where vLLM's DP coordination is preserved. Outside of this test, dense model deployments should use Ray Serve replicas (num_replicas) instead of vLLM's data_parallel_size (see the sketch below).
Signed-off-by: Nikhil Ghosh <[email protected]>
Signed-off-by: Nikhil Ghosh <[email protected]>
Force-pushed from 43094cc to ee57de3
https://github.com/vllm-project/vllm/releases/tag/v0.15.0 released, just saying :)
Force-pushed from ee57de3 to 10801be
Signed-off-by: Jeffrey Wang <[email protected]>
Force-pushed from 10801be to eca9898
# Remove the GPU constraints, numpy pin, and scipy pin (LLM requires numpy>=2 and compatible scipy)
cp "python/${FILENAME}" "/tmp/ray-deps/${FILENAME}"
- sed -e '/^--extra-index-url /d' -e '/^--find-links /d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
+ sed -e '/^--extra-index-url /d' -e '/^--find-links /d' -e '/^numpy==/d' -e '/^scipy==/d' "/tmp/ray-deps/${FILENAME}" > "/tmp/ray-deps/${FILENAME}.tmp"
This was modified by Claude. We'll see if we need it.
# Those pins for the sake of workarounds should not be advertised as constraints
# on future releases in setup.py.
- vllm[audio]>=0.14.0
+ vllm[audio] @ git+https://github.com/vllm-project/[email protected].0
0.15.0 is somehow still unavailable. Will check again later.
Signed-off-by: Jeffrey Wang <[email protected]>
Ran the following locally and everything succeeded. Trying to wrap my head around why premerge fails.
Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
Force-pushed from d706411 to 04eb5d2
Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
…ck locally Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
Signed-off-by: Jeffrey Wang <[email protected]>
- --python-version=3.11
- --unsafe-package ray
- --python-platform=linux
# Use manylinux_2_31 for vllm 0.15.0 wheel compatibility
hint: Wheels are available for `vllm` (v0.15.0) on the following platforms: `manylinux_2_31_aarch64`, `manylinux_2_31_x86_64`
`linux` defaults to `manylinux_2_28_x86_64`, which vLLM 0.15.0 does not support.
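For reference, a rough way (assuming the `packaging` library is available) to see which manylinux platform tags an environment accepts; it mirrors the tag-compatibility check a resolver applies when targeting a given platform, which is why a manylinux_2_31-only wheel will not resolve against a manylinux_2_28 target.

```python
# Print whether this environment accepts the two platform tags in question.
from packaging.tags import sys_tags

accepted_platforms = {tag.platform for tag in sys_tags()}
for plat in ("manylinux_2_28_x86_64", "manylinux_2_31_x86_64"):
    print(plat, "accepted" if plat in accepted_platforms else "not accepted")
```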
Signed-off-by: Jeffrey Wang <[email protected]>
What's the current Ray policy on vLLM version support? 0.15 introduces a lot of breaking changes, and some users might want to mix vLLM versions between Ray apps.
Summary
- vLLM 0.14.0rc1 → 0.14.0 for testing

Fixes
- PoolingParams.normalize → use_activation (python/ray/llm/tests/batch/gpu/stages/test_vllm_engine_stage.py) - vllm#32243 (see the compatibility sketch at the end)
- Multi-GPU DP tests switched to MoE models (doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py, dp_pd_example.py) - vLLM now makes DP ranks independent for dense models - vllm#30739

Dependency Changes
Testing
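Relating to the PoolingParams item under Fixes, a hedged sketch of how test code could stay compatible across the rename; the try/except fallback is illustrative and not necessarily the approach taken in the updated test file.

```python
# vLLM 0.14 renamed PoolingParams.normalize to use_activation (vllm#32243).
# Illustrative fallback: try the new keyword first, then the old one.
from vllm import PoolingParams

try:
    params = PoolingParams(use_activation=True)  # vLLM >= 0.14
except TypeError:
    params = PoolingParams(normalize=True)       # older vLLM releases
print(params)
```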