
Commit 41220ed

Docs: remove vLLM install step from mbridge vllm quickstart (#618)
Co-authored-by: Claude Sonnet 4.6 <[email protected]>
1 parent e61f285 commit 41220ed

1 file changed

Lines changed: 4 additions & 12 deletions

File tree

docs/llm/mbridge/optimized/vllm.md
@@ -18,15 +18,7 @@ This section shows how to use scripts and APIs to export a Megatron-Bridge LLM t
 nvcr.io/nvidia/nemo:vr
 ```
 
-3. Install vLLM by executing the following command inside the container if it is not available in the container:
-
-```shell
-cd /opt/Export-Deploy
-uv sync --inexact --link-mode symlink --locked --extra vllm $(cat /opt/uv_args.txt)
-
-```
-
-4. Run the following deployment script to verify that everything is working correctly. The script exports the Llama Megatron-Bridge checkpoint to vLLM and subsequently serves it on the Triton server:
+3. Run the following deployment script to verify that everything is working correctly. The script exports the Llama Megatron-Bridge checkpoint to vLLM and subsequently serves it on the Triton server:
 
 ```shell
 python /opt/Export-Deploy/scripts/deploy/nlp/deploy_vllm_triton.py \
@@ -35,15 +27,15 @@ This section shows how to use scripts and APIs to export a Megatron-Bridge LLM t
 --tensor_parallelism_size 1
 ```
 
-5. If the test yields a shared memory-related error, increase the shared memory size using ``--shm-size`` (gradually by 50%, for example).
+4. If the test yields a shared memory-related error, increase the shared memory size using ``--shm-size`` (gradually by 50%, for example).
 
-6. In a separate terminal, access the running container as follows:
+5. In a separate terminal, access the running container as follows:
 
 ```shell
 docker exec -it nemo-fw bash
 ```
 
-7. To send a query to the Triton server, run the following script:
+6. To send a query to the Triton server, run the following script:
 
 ```shell
 python /opt/Export-Deploy/scripts/deploy/nlp/query_vllm.py -mn llama -p "The capital of Canada is" -mat 50
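The final query step passes short flags (`-mn`, `-p`, `-mat`) to `query_vllm.py`. As a minimal sketch of how such a command line could be parsed, here is a hypothetical `argparse` setup; the long option names and defaults below are assumptions for illustration, not taken from the actual script in `/opt/Export-Deploy`:

```python
import argparse

# Hypothetical sketch of query_vllm.py's flag handling; the long names
# (--model_name, --prompt, --max_output_len) and the default value are
# assumptions, not the real script's definitions.
parser = argparse.ArgumentParser(description="Query a model served on Triton")
parser.add_argument("-mn", "--model_name", required=True,
                    help="name the model was deployed under (e.g. llama)")
parser.add_argument("-p", "--prompt", required=True,
                    help="prompt text to send to the server")
parser.add_argument("-mat", "--max_output_len", type=int, default=128,
                    help="maximum number of tokens to generate")

# Same arguments as the quickstart's example invocation.
args = parser.parse_args(
    ["-mn", "llama", "-p", "The capital of Canada is", "-mat", "50"]
)
print(args.model_name, args.max_output_len)
```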
