Pull requests: vllm-project/vllm
Pull requests list
[Bugfix] Improve DCP/PCP error messages with actionable backend guidance
bug
Something isn't working
v1
#39036
opened Apr 5, 2026 by Pawansingh3889
[vLLM IR] Cache the fx_replacement to avoid re-tracing the same impl
#39034
opened Apr 5, 2026 by gcanlin
5 tasks
[Bugfix][MoE] Fix hardcoded SharedExperts output buffer size for DBO ubatches
bug
Something isn't working
#39033
opened Apr 5, 2026 by Gregory-Pereira
NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config
#39032
opened Apr 5, 2026 by netanel-haber
Draft
[Core] Per-group BlockPool for hybrid Mamba/attention models
v1
#39031
opened Apr 5, 2026 by arbi-dev
4 of 5 tasks
nano_nemotron_vl: fix tensor device mismatch exception when video profiling
ready
ONLY add when PR is ready to merge/full CI is needed
#39029
opened Apr 5, 2026 by netanel-haber
Gemma4 multi-turn, tool calling, and reasoning fixes
documentation
Improvements or additions to documentation
frontend
tool-calling
Add structure to Related to CPU backends
documentation
Improvements or additions to documentation
intel-gpu
Related to Intel GPU
nvidia
ready
ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
rocm
Related to AMD ROCm
requirements/ directory
ci/build
cpu
#39024
opened Apr 5, 2026 by hmellor
[MoE][Fix] Fix DeepEP HT hardcoded per_act_token_quant=False
#39023
opened Apr 5, 2026 by thc1006
2 tasks
[Perf] Remove per-step KV offload touch, touch once at request_finished
kv-connector
v1
#39021
opened Apr 5, 2026 by kfirtoledo
2 tasks
fix(attention): fix Gemma4 support for old gpus like Turing
v1
#39018
opened Apr 5, 2026 by lisp19
[MoE] BF16 Triton MoE Perf regression - restore low latency path
ready
ONLY add when PR is ready to merge/full CI is needed
#39016
opened Apr 5, 2026 by milesial
[vLLM IR] rework gemma_rms_norm
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
#39014
opened Apr 5, 2026 by ZJY0516
5 tasks
Refactor move experts
ci/build
documentation
Improvements or additions to documentation
nvidia
performance
Performance-related issues
ready
ONLY add when PR is ready to merge/full CI is needed
rocm
Related to AMD ROCm
#39013
opened Apr 5, 2026 by Jackmin801
1 task
Fix async spec decode TOCTOU race and underflow on aborted requests
v1
#39012
opened Apr 5, 2026 by gagandhakrey
Update MusicFlamingo and add AudioFlamingoNext
documentation
Improvements or additions to documentation
multi-modality
Related to multi-modality (#4194)
new-model
Requests to new models
#39011
opened Apr 5, 2026 by lashahub
4 of 5 tasks
[MoE] Move remaining PrepareAndFinalize to prepare finalize folder
ready
ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
#39009
opened Apr 5, 2026 by Jackmin801
1 task
[MoE] Move GPT OSS Triton kernel experts into fused_moe/experts/
documentation
Improvements or additions to documentation
gpt-oss
Related to GPT-OSS models
ready
ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
#39007
opened Apr 5, 2026 by Jackmin801
3 tasks done
[MoE] Move DEEP_GEMM into experts/ subdirectory
documentation
Improvements or additions to documentation
needs-rebase
performance
Performance-related issues
ready
ONLY add when PR is ready to merge/full CI is needed
ready-run-all-tests
Trigger CI with all tests for wide-ranging PRs
#39005
opened Apr 5, 2026 by Jackmin801
5 tasks done
[Frontend] Add /v1/files upload endpoint for multimodal inputs (#38531)
documentation
Improvements or additions to documentation
frontend
multi-modality
Related to multi-modality (#4194)
#39003
opened Apr 4, 2026 by Alberto-Codes
4 of 5 tasks
[ROCm] Support unlimited sequence lengths via multi-pass reduction
rocm
Related to AMD ROCm
v1
#39001
opened Apr 4, 2026 by ekuznetsov139
[BugFix][Parser] Fixing Qwen3.5 tool call parsing
bug
Something isn't working
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
tool-calling
#38996
opened Apr 4, 2026 by Gregory-Pereira
[Quantization] - Layerwise reloading of Attention/KV quantized models
#38995
opened Apr 4, 2026 by Josephasafg
3 of 5 tasks