cuTile Python Parity Checklist
- `ct.atomic_add` on bfloat16 (sm_90+) (NVIDIA/cutile-python@e517b6d)
- Tiled-view atomic ops: `tv.atomic_add`, `atomic_max`, `atomic_min`, `atomic_and`, `atomic_or`, `atomic_xor` (NVIDIA/cutile-python@e517b6d)
- `ct.tiled_view(..., traversal_steps=...)` + load/store via `StridedView` (NVIDIA/cutile-python@85da1e3)
- `ct.load_advanced_indexing` / `ct.store_advanced_indexing`: GatherScatterView via advanced indexing (NVIDIA/cutile-python@c2360bd, renamed in NVIDIA/cutile-python@d10a5da)
- `ct.mma_scaled`: block-scaled narrow-precision MMA. See "Add `muladd_scaled` and FP8 `muladd`" (#239) (NVIDIA/cutile-python@8ce0189)
- `ct.mma(..., use_fast_acc=True)`: FP8 MMA fast accumulator. See "Add `muladd_scaled` and FP8 `muladd`" (#239) (NVIDIA/cutile-python@0d172bf)
- `@ct.kernel(num_worker_warps=...)`: entry hint for warp-specialized kernels. See "Add `num_worker_warps` entry hint" (#245) (NVIDIA/cutile-python@bfb2960)
- `ct.pack_to_bytes` / `ct.unpack_from_bytes`. See "Add `reinterpret` as interface to `bitcast`, `pack`, and `unpack`" (#238) (NVIDIA/cutile-python@fb2bdd0)
- `ct.exp(x, rounding_mode=...)`: expose `RoundingMode.FULL`/`APPROX`. See "Initial support for Tile IR 13.3." (#234) (NVIDIA/cutile-python@b2d3f82)
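
For reviewers checking the pack/unpack/bitcast item, the round-trip property a parity test would likely assert can be sketched on the host with Python's standard `struct` module. This is only an analogy, not cuTile code: `pack_floats`, `unpack_floats`, and `bitcast_f32_to_u32` are hypothetical helper names, and the little-endian float32 layout is an assumption made for illustration. The real `ct.pack_to_bytes` / `ct.unpack_from_bytes` signatures are not reproduced here.

```python
import struct


def pack_floats(values):
    """Pack numbers into raw little-endian float32 bytes (hypothetical helper)."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_floats(raw):
    """Reinterpret raw bytes as little-endian float32 values (hypothetical helper)."""
    size = struct.calcsize("<f")  # 4 bytes per float32
    if len(raw) % size:
        raise ValueError("byte length must be a multiple of the element size")
    return list(struct.unpack(f"<{len(raw) // size}f", raw))


def bitcast_f32_to_u32(x):
    """Return the uint32 that shares its bit pattern with the float32 value x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# All three values are exactly representable in float32, so the round trip is lossless.
values = [1.0, -2.5, 0.15625]
raw = pack_floats(values)
restored = unpack_floats(raw)
```

A parity check along these lines would confirm that unpacking the packed bytes returns the original values, and that a bitcast preserves the bit pattern (for example, float32 `1.0` maps to `0x3F800000`).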