[Feat]: Add 2D-tiled causal_conv1d prefill kernel for gated delta net by yiijin · Pull Request #1104 · ROCm/ATOM

yiijin · 2026-06-05T10:05:17Z

Motivation

Add a 2D-tiled prefill kernel variant for causal conv1d. By processing multiple tokens per thread block simultaneously, it achieves a ~3.4x speedup over the original ATOM kernel.

Technical Details

Add _causal_conv1d_fwd_kernel_tile: 2D tiled kernel with [BLOCK_N, BLOCK_M] coalesced loads
Replace per-token loop with batch tile loads + vectorized convolution + fused SiLU via v_rcp_f32
Add _causal_conv1d_fn_tile wrapper with configurable BLOCK_M/BLOCK_N/num_warps
Set the environment variable ATOM_CAUSAL_CONV1D_KERNEL=nontile to fall back to the original kernel
Default config: BLOCK_M=64, BLOCK_N=32, num_warps=4
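
To make the tiling idea above concrete, here is a minimal pure-Python sketch (not the Triton kernel from this PR). It compares a per-token causal conv1d with fused SiLU against a variant that walks the same data in [BLOCK_N, BLOCK_M] tiles. The function names `conv_per_token` and `conv_tiled` are hypothetical, the input layout `x[dim][seqlen]` and `weight[dim][width]` is assumed from the PR's shape unpacking, and a zero initial convolution state is assumed for simplicity.

```python
import math


def silu(v):
    # SiLU(v) = v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def conv_per_token(x, weight):
    """Reference path: one output token at a time (zero initial state assumed)."""
    dim, seqlen, width = len(x), len(x[0]), len(weight[0])
    out = [[0.0] * seqlen for _ in range(dim)]
    for c in range(dim):
        for t in range(seqlen):
            acc = 0.0
            for k in range(width):
                src = t - (width - 1) + k  # causal: only current and past tokens
                if src >= 0:
                    acc += weight[c][k] * x[c][src]
            out[c][t] = silu(acc)
    return out


def conv_tiled(x, weight, block_m=64, block_n=32):
    """Tiled path: each (channel tile, token tile) pair is handled as one unit."""
    dim, seqlen, width = len(x), len(x[0]), len(weight[0])
    out = [[0.0] * seqlen for _ in range(dim)]
    for n0 in range(0, dim, block_n):
        chans = range(n0, min(n0 + block_n, dim))  # bounds guard on channels
        for m0 in range(0, seqlen, block_m):
            toks = range(m0, min(m0 + block_m, seqlen))  # bounds guard on tokens
            acc = {(c, t): 0.0 for c in chans for t in toks}
            # Apply one filter tap to the whole tile before moving to the next tap.
            for k in range(width):
                shift = width - 1 - k
                for c in chans:
                    for t in toks:
                        if t - shift >= 0:
                            acc[(c, t)] += weight[c][k] * x[c][t - shift]
            for (c, t), v in acc.items():
                out[c][t] = silu(v)
    return out
```

Both functions accumulate the taps in the same order, so they should agree exactly; the tiled form only changes how the work is grouped, which is what lets a GPU kernel issue wide, coalesced loads.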

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

This PR introduces a new 2D-tiled Triton prefill kernel for causal_conv1d (used by Gated Delta Net / Mamba-style ops). It improves prefill performance by processing multiple tokens per program instance, and adds an env-var switch to fall back to the original 1D per-token kernel.

Changes:

Added _causal_conv1d_fwd_kernel_tile and _causal_conv1d_fn_tile implementing a 2D-tiled [BLOCK_N, BLOCK_M] prefill path with fused SiLU.
Added a dispatcher in causal_conv1d_fn to select tiled vs original kernel via ATOM_CAUSAL_CONV1D_KERNEL=nontile.
Updated compute_causal_conv1d_metadata to generate metadata for both BLOCK_M=8 and BLOCK_M=64.
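
The env-var dispatch described above could look roughly like the following sketch. Only the variable name `ATOM_CAUSAL_CONV1D_KERNEL` and the value `nontile` come from the PR; the helper name `select_causal_conv1d_kernel`, the returned labels, and the exact-match comparison are assumptions made for illustration.

```python
import os


def select_causal_conv1d_kernel(env=None):
    """Return "nontile" when the fallback is requested, otherwise "tile".

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    # Assumption: any value other than the exact string "nontile" keeps the tiled path.
    if env.get("ATOM_CAUSAL_CONV1D_KERNEL", "") == "nontile":
        return "nontile"
    return "tile"
```

Reading the variable at call time rather than at import time keeps the switch usable from tests, at the cost of a dictionary lookup per call.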

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`atom/model_ops/mamba_ops/causal_conv1d.py`	Adds the 2D-tiled prefill kernel + Python wrapper and env-var dispatch.
`atom/model_ops/attentions/gdn_attn.py`	Extends causal-conv metadata generation to support the new `BLOCK_M=64` tiled path.
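
As a rough illustration of why metadata is generated per BLOCK_M, the sketch below maps each program instance to a (sequence, token tile) pair for every supported tile size. This is not the PR's `compute_causal_conv1d_metadata`; the function name `tile_metadata`, the dictionary layout, and the use of cumulative sequence lengths as input are all assumptions.

```python
def tile_metadata(cu_seqlens, block_ms=(8, 64)):
    """For each BLOCK_M, list the (sequence index, tile index) handled by each program."""
    meta = {}
    for block_m in block_ms:
        entries = []
        for seq in range(len(cu_seqlens) - 1):
            length = cu_seqlens[seq + 1] - cu_seqlens[seq]
            n_tiles = -(-length // block_m)  # ceiling division
            entries.extend((seq, tile) for tile in range(n_tiles))
        meta[block_m] = entries
    return meta
```

For two sequences of lengths 10 and 65, this yields 11 programs at BLOCK_M=8 but only 3 at BLOCK_M=64, so the two kernels cannot share one launch grid.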

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

+    dim, cu_seqlen = x.shape
+    _, width = weight.shape
+    state_len = width - 1
+    np2_statelen = triton.next_power_of_2(state_len)
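
The snippet above derives the convolution state length from the filter width and rounds it up to a power of two, which Triton needs because `tl.arange` only accepts power-of-two extents. A pure-Python sketch of the same arithmetic, with a stand-in `next_power_of_2` so it runs without Triton (the helper `state_sizes` is hypothetical):

```python
def next_power_of_2(n):
    """Smallest power of two >= n, for n >= 1 (stand-in for triton.next_power_of_2)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


def state_sizes(width):
    """Return (state_len, np2_statelen) for a causal conv filter of the given width."""
    state_len = width - 1  # number of past tokens the filter needs
    return state_len, next_power_of_2(state_len)
```

For the widths 2, 3, and 4 this gives padded state lengths of 1, 2, and 4; the extra slot for width 4 has to be masked off when loading or storing state.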


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

- Add `idx_feats < dim` guard to `is_v_block` to prevent out-of-bounds stores when dim is not a multiple of BLOCK_N.
- Remove unused `original_x_dtype` assignment in `_causal_conv1d_fn_tile`.

Co-authored-by: Cursor <cursoragent@cursor.com>
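
The guard from this commit can be illustrated outside Triton. The sketch below computes which lanes of a channel tile are in bounds; the names `idx_feats`, `dim`, and BLOCK_N follow the commit message, while the helper `feature_mask` is hypothetical.

```python
def feature_mask(pid_n, block_n, dim):
    """Per-lane validity mask for the channel tile owned by program `pid_n`."""
    idx_feats = [pid_n * block_n + i for i in range(block_n)]
    # Lanes at or beyond `dim` must not load or store.
    return [idx < dim for idx in idx_feats]
```

With dim=40 and BLOCK_N=32, the second tile covers channels 32 to 63, so only its first 8 lanes are valid; without the mask the remaining 24 lanes would write past the end of the tensor.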

…ride check

- Assert KERNEL_WIDTH in {2, 3, 4} to fail fast on unsupported widths.
- Validate metadata contains the selected BLOCK_M key.
- Tighten stride check to require stride(0)==1 (channel-last), consistent with the original _causal_conv1d_fn.

Co-authored-by: Cursor <cursoragent@cursor.com>
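
The three checks listed in this commit might be expressed along these lines. This is a sketch under assumptions: the function `validate_tile_inputs` and its argument names are hypothetical, the metadata is assumed to be a mapping keyed by BLOCK_M, and strides are passed as a plain tuple rather than read from a tensor.

```python
def validate_tile_inputs(kernel_width, metadata, block_m, x_strides):
    """Fail fast before launching the tiled kernel."""
    if kernel_width not in (2, 3, 4):
        raise ValueError(f"unsupported KERNEL_WIDTH: {kernel_width}")
    if block_m not in metadata:
        raise KeyError(f"metadata has no entry for BLOCK_M={block_m}")
    # x has shape (dim, cu_seqlen); stride(0) == 1 means channels are contiguous.
    if x_strides[0] != 1:
        raise ValueError("x must be channel-last (stride(0) == 1)")
```

Checking on the host side turns what would otherwise be silent wrong results or out-of-bounds accesses on the GPU into an immediate Python exception.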

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings June 5, 2026 10:05

Copilot started reviewing on behalf of yiijin June 5, 2026 10:05

Copilot AI reviewed Jun 5, 2026

Comment thread atom/model_ops/mamba_ops/causal_conv1d.py

Comment thread atom/model_ops/mamba_ops/causal_conv1d.py Outdated

yiijin force-pushed the conv branch from 665ec3e to 4a42b41 Compare June 5, 2026 10:22

Copilot AI review requested due to automatic review settings June 5, 2026 10:27

Copilot started reviewing on behalf of yiijin June 5, 2026 10:27

Copilot AI reviewed Jun 5, 2026

Copilot AI review requested due to automatic review settings June 5, 2026 10:38

Copilot started reviewing on behalf of yiijin June 5, 2026 10:38

Copilot AI reviewed Jun 5, 2026

Comment thread atom/model_ops/mamba_ops/causal_conv1d.py

Comment thread atom/model_ops/mamba_ops/causal_conv1d.py

Comment thread atom/model_ops/mamba_ops/causal_conv1d.py

yiijin force-pushed the conv branch from f2d7866 to f238e20 Compare June 8, 2026 03:59

yiijin and others added 9 commits June 7, 2026 23:19

opt causal_conv1d_fn

cff7db5

add 2d-tile optimized causal conv kernel

6156ff7

align functionalities with origin ATOM causal_conv1d of optimized kernel

f5c0857

refine causal_conv1d_tile kernel and unify api usage

8e23c7c

remove unnecessary annotation

e1e9a11

style: apply Black formatting to causal_conv1d.py

c359ebd

Co-authored-by: Cursor <cursoragent@cursor.com>

fix kernel dispatch

7f8c394

yiijin force-pushed the conv branch from f238e20 to 7f8c394 Compare June 8, 2026 04:19

Copilot AI review requested due to automatic review settings June 8, 2026 04:19

Copilot started reviewing on behalf of yiijin June 8, 2026 04:20

yiijin changed the title ~~Add 2D-tiled causal_conv1d prefill kernel for gated delta net~~ [Feat]: Add 2D-tiled causal_conv1d prefill kernel for gated delta net Jun 8, 2026

Copilot AI reviewed Jun 8, 2026

yiijin wants to merge 9 commits into ROCm:main from yiijin:conv

2 participants
