
Commit b7c6d4f

feat: update dockerfile to 12.9.1 (#267)

* feat: update dockerfile to 12.9.1
* update readme on VLLM_NIGHTLY build arg
1 parent d69cc02 commit b7c6d4f

2 files changed: 18 additions & 3 deletions

Dockerfile (3 additions & 3 deletions)
```diff
@@ -1,13 +1,13 @@
-FROM nvidia/cuda:12.8.0-base-ubuntu22.04
+FROM nvidia/cuda:12.9.1-base-ubuntu22.04
 
 RUN apt-get update -y \
     && apt-get install -y python3-pip
 
-RUN ldconfig /usr/local/cuda-12.8/compat/
+RUN ldconfig /usr/local/cuda-12.9/compat/
 
 # Install vLLM with FlashInfer - use CUDA 12.8 PyTorch wheels (compatible with vLLM 0.15.0)
 RUN python3 -m pip install --upgrade pip && \
-    python3 -m pip install "vllm[flashinfer]==0.15.0" --extra-index-url https://download.pytorch.org/whl/cu128
+    python3 -m pip install "vllm[flashinfer]==0.15.0" --extra-index-url https://download.pytorch.org/whl/cu129
 
 
 
```
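A quick way to sanity-check the bump locally is to build the image and ask the installed PyTorch which CUDA toolkit its wheels target; with the cu129 extra index above, this should report a 12.9-series version. The image tag below is illustrative, not from the repo:

```bash
# Build the updated image (tag name is illustrative).
docker build -t vllm-worker:test .

# Print the CUDA version the installed PyTorch wheels were built against;
# with the cu129 wheel index this should be a 12.9-series value.
docker run --rm vllm-worker:test \
    python3 -c "import torch; print(torch.version.cuda)"
```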

README.md (15 additions & 0 deletions)
````diff
@@ -80,6 +80,7 @@ To build an image with the model baked in, you must specify the following docker
 - `WORKER_CUDA_VERSION`: `12.1.0` (`12.1.0` is recommended for optimal performance).
 - `TOKENIZER_NAME`: Tokenizer repository if you would like to use a different tokenizer than the one that comes with the model. (default: `None`, which uses the model's tokenizer)
 - `TOKENIZER_REVISION`: Tokenizer revision to load (default: `main`).
+- `VLLM_NIGHTLY`: Set to `true` to replace the pinned vLLM release with the latest nightly build and the latest `transformers` from source. Useful for testing unreleased vLLM features. (default: `false`)
 
 For the remaining settings, you may apply them as environment variables when running the container. Supported environment variables are listed in the [Environment Variables](#environment-variables) section.
 
@@ -89,6 +90,20 @@ For the remaining settings, you may apply them as environment variables when run
 docker build -t username/image:tag --build-arg MODEL_NAME="openchat/openchat_3.5" --build-arg BASE_PATH="/models" .
 ```
 
+### Example: Building with vLLM Nightly
+
+To use the latest unreleased vLLM build (installs from the nightly wheel index and `transformers` from source):
+
+```bash
+docker build -t username/image:tag --build-arg VLLM_NIGHTLY=true .
+```
+
+You can combine it with other arguments:
+
+```bash
+docker build -t username/image:tag --build-arg VLLM_NIGHTLY=true --build-arg MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct" --build-arg BASE_PATH="/models" .
+```
+
 ### (Optional) Including Huggingface Token
 
 If the model you would like to deploy is private or gated, you will need to include it during build time as a Docker secret, which will protect it from being exposed in the image and on DockerHub.
````
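The commit does not show how the Dockerfile consumes `VLLM_NIGHTLY`, but per the README text, enabling it swaps the pinned release for the nightly wheel and `transformers` from source. A rough sketch of the two install paths (not the repo's exact code; the nightly wheel index URL follows vLLM's installation docs and should be verified there):

```bash
# Default path (VLLM_NIGHTLY=false): the pinned release from the Dockerfile.
python3 -m pip install "vllm[flashinfer]==0.15.0" --extra-index-url https://download.pytorch.org/whl/cu129

# Nightly path (VLLM_NIGHTLY=true): latest vLLM nightly wheel plus
# transformers built from source (URLs per vLLM/HF docs; verify before use).
python3 -m pip install -U vllm --extra-index-url https://wheels.vllm.ai/nightly
python3 -m pip install -U git+https://github.com/huggingface/transformers.git
```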
