munder-sa

Pinned

SageAttention_win_blackwell Public

Forked from thu-ml/SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2

comfyui-llamacpp-client Public

Forked from fidecastro/comfyui-llamacpp-client

ComfyUI client for llama-server from llama.cpp

Python 1