vLLM x Intel-Gaudi

vLLM Hardware Plugin for Intel® Gaudi®

| Documentation | Intel® Gaudi® Documentation | Optimizing Training Platform Guide |


Latest News 🔥

  • [2026/03] Version 0.16.0 is now available, built on vLLM 0.16.0 and fully compatible with Intel® Gaudi® v1.23.0.

    This release introduces validated support and critical stability fixes for Qwen3-VL models leveraging HPUMMEncoderAttention. Performance and stability were improved through backported Mamba architecture optimizations, Docker and UBI infrastructure enhancements, and a forced CPU loading mechanism for INC quantization to prevent OOM errors.

  • [2026/02] Version 0.15.1 is now available, built on vLLM 0.15.1 and fully compatible with Intel® Gaudi® v1.23.0.

    This release introduces validated support for Granite 4.0-h and Qwen3-VL (dense and MoE variants) on Intel Gaudi 3, alongside significant Llama 4 stability fixes. It also features major prefill performance improvements via full chunked prefill attention, FlashAttention online merge, b2b matmul operations, and KV cache sharing. Additionally, this version adds HPU ops for Mamba/SSM architectures to enable hybrid models, and introduces new support for ModelOpt FP8 quantization.

  • [2026/02] Version 0.14.1 is now available, built on vLLM 0.14.1 and fully compatible with Intel® Gaudi® v1.23.0. It introduces support for Granite 4.0-h and Qwen3-VL models.

  • [2026/01] Version 0.13.0 is now available, built on vLLM 0.13.0 and fully compatible with Intel® Gaudi® v1.23.0. It introduces experimental dynamic quantization for MatMul and KV‑cache operations to improve performance and also supports additional models.


About

The vLLM Hardware Plugin for Intel® Gaudi® integrates Intel® Gaudi® AI accelerators with vLLM to optimize large language model inference. It follows the [RFC]: Hardware pluggable and [RFC]: Enhancing vLLM Plugin Architecture principles, providing a modular interface for Intel® Gaudi® hardware. For more information, see the Plugin System document.
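As a rough illustration of the plugin mechanism, the sketch below lists the platform plugins visible to the current Python environment. It assumes Python 3.10 or later and that platform plugins register under the vllm.platform_plugins entry-point group described in the vLLM Plugin System document. The function name list_vllm_platform_plugins is hypothetical and not part of this repository. An entry for this plugin should appear only after the installation steps in Getting Started have been completed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm-gaudi): print every entry point
# registered under the "vllm.platform_plugins" group, one per line.
# Assumes python3 >= 3.10, where entry_points() accepts a group= filter.
list_vllm_platform_plugins() {
    python3 - <<'PY'
from importlib.metadata import entry_points

for ep in entry_points(group="vllm.platform_plugins"):
    print(f"{ep.name} -> {ep.value}")
PY
}

# Usage:
#   list_vllm_platform_plugins
```

If nothing is printed, no platform plugin is installed in the active environment.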

Getting Started

  1. Set up your execution environment. Additionally, to achieve the best performance on HPU, follow the methods outlined in the Optimizing Training Platform Guide.

  2. Get the last verified vLLM commit. Although the vLLM Hardware Plugin for Intel® Gaudi® tracks the latest vLLM commits, upstream API changes may introduce compatibility issues. The commit saved in the VLLM_STABLE_COMMIT file on the vllm/last-good-commit-for-vllm-gaudi branch has been validated with the plugin.

    git clone https://github.com/vllm-project/vllm-gaudi
    cd vllm-gaudi
    export VLLM_COMMIT_HASH=$(git show "origin/vllm/last-good-commit-for-vllm-gaudi:VLLM_STABLE_COMMIT" 2>/dev/null)
    cd ..

  3. Install vLLM using pip or build it from source:

    # Build vLLM from source for empty platform, reusing existing torch installation
    git clone https://github.com/vllm-project/vllm
    cd vllm
    git checkout "$VLLM_COMMIT_HASH"
    pip install -r <(sed '/^torch/d' requirements/build.txt)
    VLLM_TARGET_DEVICE=empty pip install --no-build-isolation -e .
    cd ..

  4. Install vLLM Hardware Plugin for Intel® Gaudi® from source:

    cd vllm-gaudi
    pip install -e .
    cd ..

    For all available installation methods, including NIXL, see the Installation guide.
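Because the command in step 2 redirects errors to /dev/null, a failed lookup leaves VLLM_COMMIT_HASH empty without any message. A minimal guard, sketched below, can catch that before the checkout in step 3. The function name require_commit_hash is hypothetical and not part of this repository; it only checks that the value is non-empty and does not verify that the commit exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm-gaudi): fail fast when the
# pinned vLLM commit could not be resolved in step 2.
require_commit_hash() {
    local hash="${1:-}"
    if [ -z "$hash" ]; then
        echo "VLLM_COMMIT_HASH is empty; check that the vllm-gaudi clone fetched origin/vllm/last-good-commit-for-vllm-gaudi" >&2
        return 1
    fi
    return 0
}

# Usage, between steps 2 and 3:
#   require_commit_hash "$VLLM_COMMIT_HASH" || exit 1
```

Running the guard right after step 2 keeps a missing hash from going unnoticed in the later git checkout.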

Contributing

We welcome and value any contributions and collaborations.

Contact Us

  • For technical questions and feature requests, please use GitHub Issues.

About

Community maintained hardware plugin for vLLM on Intel Gaudi
