forked from vllm-project/vllm
Pull requests: ROCm/vllm
Pull requests list
[CI] Fix CT W4A16 e2e memory-wait timeout on Strix Halo CI runner
#991
opened Jun 8, 2026 by
mgehre-amd
wvSplitK int4: pad weight K-stride by +128 B on gfx1151
#989
opened Jun 8, 2026 by
mgehre-amd
feat: Add FlexMLRT NPU vision backend for Qwen2.5-VL
#984
opened Jun 1, 2026 by
liangliangchang
•
Draft
5 tasks
[ROCm][MoE] Modular MoE: alias fused_out with output to skip finalize copy
#940
opened May 19, 2026 by
mgehre-amd
2 tasks done
feat: Add NPU+GPU async pipelining for vision-language models
#936
opened May 14, 2026 by
liangliangchang
•
Draft
4 of 5 tasks
Annotate VLM/audio tower nn.Linear calls in PyTorch profiles
#934
opened May 13, 2026 by
mgehre-amd
[ROCm][quant] INC: route w4a16-sym MoE through HybridW4A16 HIP path
#929
opened May 8, 2026 by
mgehre-amd
•
Draft
5 tasks
[bench] wvSplitK skinny GEMM: capture timed iters into a CUDA graph
#928
opened May 8, 2026 by
mgehre-amd
•
Draft
Auto-build flash-attn wheels on push, upload to S3
#910
opened Apr 30, 2026 by
mgehre-amd
•
Draft
1 task
[ROCm][DSv4] Share AITER decode dequant + fp8-cast buffers across layers (rebased, stacked on #902)
#903
opened Apr 27, 2026 by
ChuanLi1101
•
Draft
2 of 4 tasks
[ROCm][DSv4] Make AITER sparse decode cudagraph-clean (rebased, stacked on #901)
#902
opened Apr 27, 2026 by
ChuanLi1101
•
Draft
2 of 5 tasks
[ROCm][DSv4] AITER-accelerated MLA decode for DeepSeek V4 on MI355X (rebased on tj/dsv4prrebase)
#901
opened Apr 27, 2026 by
ChuanLi1101
•
Draft
1 of 4 tasks
[Do Not Merge] For review purpose: Rocm/aiter mla dsv4 decode cudagraph
#900
opened Apr 26, 2026 by
tjtanaavllm
•
Draft
5 tasks
[ROCm] support topk_softplus for all number of experts
#899
opened Apr 25, 2026 by
tjtanaa
5 tasks
Tune hybrid_triton_w4a16 prefill kernel for gfx1151
#879
opened Apr 15, 2026 by
mgehre-amd
•
Draft
3 tasks done
Enable FLASH_ATTN backend with upstream flash-attn CK on ROCm for decode
#866
opened Apr 10, 2026 by
mgehre-amd
•
Draft
1 task