Pull requests: vllm-project/vllm
Pull requests list
Triton MLA GQA perf fixes (4x improvement at 80k context)
v1
#33529
opened Feb 2, 2026 by
koush
3 of 5 tasks
Adds padding and perf improvements to wvSplitK_fp8
rocm
Related to AMD ROCm
#33527
opened Feb 2, 2026 by
amd-hhashemi
5 tasks
Update get_expert_mapping to include self parameter
#33525
opened Feb 1, 2026 by
Otsutsukii
5 tasks
[Core][Scheduler] Fix FCFS queue ordering for skipped waiting requests
v1
#33522
opened Feb 1, 2026 by
harsh543
[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support
nvidia
#33517
opened Feb 1, 2026 by
Code4me2
[Bugfix] Add SM110/SM120 device capability checks for NVFP4 MoE backends
bug
Something isn't working
nvidia
#33516
opened Feb 1, 2026 by
Code4me2
Fix reasoning_tokens for text-based parsers in Responses API
frontend
#33513
opened Feb 1, 2026 by
anencore94
fix(ROCm): Make flash_attn import optional in MLA attention
rocm
Related to AMD ROCm
#33511
opened Feb 1, 2026 by
rabi
[FIX] guidance: use max(vocab_size, len(tokenizer)) for n_vocab
structured-output
v1
#33509
opened Feb 1, 2026 by
FredericOdermatt
1 of 5 tasks
[Feature]: Qwen3-Next dual-stream execution in_proj_qkvz in_proj_ba
qwen
Related to Qwen models
#33505
opened Feb 1, 2026 by
SouthWest7
Draft
1 of 5 tasks
Add **kwargs parameter to v1 FlashAttentionImpl as catch-all
v1
#33504
opened Feb 1, 2026 by
haojin2
5 tasks done
feat(spec_decode): fuse EAGLE step slot mapping and metadata updates
speculative-decoding
v1
#33503
opened Feb 1, 2026 by
sladyn98
3 of 5 tasks
Scheduler: skip KV-blocked requests to prevent blocking (#31731)
v1
#33499
opened Feb 1, 2026 by
harsh543
[Experimental][Refactor] Refactor vision chunk modality processing for unification
documentation
Improvements or additions to documentation
multi-modality
Related to multi-modality (#4194)
needs-rebase
[Bugfix] Fix assertion error in flashmla backend with fullgraph enabled
bug
Something isn't working
v1
#33496
opened Feb 1, 2026 by
Kurumi5210
5 tasks
[Doc]: update paths for Offline/Online/Others example sections
documentation
Improvements or additions to documentation
#33494
opened Feb 1, 2026 by
soyr-redhat
4 of 5 tasks
Perf tuning and expansion of cases covered for wvSplitKrc
rocm
Related to AMD ROCm
#33493
opened Feb 1, 2026 by
amd-hhashemi
5 tasks
Sort safetensors files to ensure deterministic loading order
#33491
opened Feb 1, 2026 by
Lumosis
1 of 5 tasks