
Commit b7c6d4f

feat: update dockerfile to 12.9.1 (#267)

* feat: update dockerfile to 12.9.1
* update readme on VLLM_NIGHTLY build arg
1 parent d69cc02 commit b7c6d4f

2 files changed: 18 additions & 3 deletions

Dockerfile (3 additions & 3 deletions)
```diff
@@ -1,13 +1,13 @@
-FROM nvidia/cuda:12.8.0-base-ubuntu22.04
+FROM nvidia/cuda:12.9.1-base-ubuntu22.04
 
 RUN apt-get update -y \
     && apt-get install -y python3-pip
 
-RUN ldconfig /usr/local/cuda-12.8/compat/
+RUN ldconfig /usr/local/cuda-12.9/compat/
 
 # Install vLLM with FlashInfer - use CUDA 12.8 PyTorch wheels (compatible with vLLM 0.15.0)
 RUN python3 -m pip install --upgrade pip && \
-    python3 -m pip install "vllm[flashinfer]==0.15.0" --extra-index-url https://download.pytorch.org/whl/cu128
+    python3 -m pip install "vllm[flashinfer]==0.15.0" --extra-index-url https://download.pytorch.org/whl/cu129
 
 
 
```
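A quick way to sanity-check the bump locally is to build the image and ask the installed PyTorch which CUDA toolkit its wheels target; with the cu129 extra index above, this should report a 12.9-series version. The image tag below is illustrative, not from the repo:

```bash
# Build the updated image (tag name is illustrative).
docker build -t vllm-worker:test .

# Print the CUDA version the installed PyTorch wheels were built against;
# with the cu129 wheel index this should be a 12.9-series value.
docker run --rm vllm-worker:test \
    python3 -c "import torch; print(torch.version.cuda)"
```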

README.md (15 additions & 0 deletions)
````diff
@@ -80,6 +80,7 @@ To build an image with the model baked in, you must specify the following docker
 - `WORKER_CUDA_VERSION`: `12.1.0` (`12.1.0` is recommended for optimal performance).
 - `TOKENIZER_NAME`: Tokenizer repository if you would like to use a different tokenizer than the one that comes with the model. (default: `None`, which uses the model's tokenizer)
 - `TOKENIZER_REVISION`: Tokenizer revision to load (default: `main`).
+- `VLLM_NIGHTLY`: Set to `true` to replace the pinned vLLM release with the latest nightly build and the latest `transformers` from source. Useful for testing unreleased vLLM features. (default: `false`)
 
 For the remaining settings, you may apply them as environment variables when running the container. Supported environment variables are listed in the [Environment Variables](#environment-variables) section.
 
@@ -89,6 +90,20 @@ For the remaining settings, you may apply them as environment variables when run
 docker build -t username/image:tag --build-arg MODEL_NAME="openchat/openchat_3.5" --build-arg BASE_PATH="/models" .
 ```
 
+### Example: Building with vLLM Nightly
+
+To use the latest unreleased vLLM build (installs from the nightly wheel index and `transformers` from source):
+
+```bash
+docker build -t username/image:tag --build-arg VLLM_NIGHTLY=true .
+```
+
+You can combine it with other arguments:
+
+```bash
+docker build -t username/image:tag --build-arg VLLM_NIGHTLY=true --build-arg MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" --build-arg BASE_PATH="/models" .
+```
+
 ### (Optional) Including Huggingface Token
 
 If the model you would like to deploy is private or gated, you will need to include it during build time as a Docker secret, which will protect it from being exposed in the image and on DockerHub.
````
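The commit does not show how the Dockerfile consumes `VLLM_NIGHTLY`, but per the README text, enabling it swaps the pinned release for the nightly wheel and `transformers` from source. A rough sketch of the two install paths (not the repo's exact code; the nightly wheel index URL follows vLLM's installation docs and should be verified there):

```bash
# Default path (VLLM_NIGHTLY=false): the pinned release from the Dockerfile.
python3 -m pip install "vllm[flashinfer]==0.15.0" --extra-index-url https://download.pytorch.org/whl/cu129

# Nightly path (VLLM_NIGHTLY=true): latest vLLM nightly wheel plus
# transformers built from source (URLs per vLLM/HF docs; verify before use).
python3 -m pip install -U vllm --extra-index-url https://wheels.vllm.ai/nightly
python3 -m pip install -U git+https://github.com/huggingface/transformers.git
```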
