jool-space/PTX.jl

PTX.jl

A Julia interface for NVIDIA PTX, composing additively with CUDA.jl.

using PTX, CUDACore

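function add_kernel!(c, a, b)
    # Read the thread index from the %tid.x special register (0-based).
    tid = ptx"mov.u32"(sreg"%tid.x")
    # Convert to a 1-based Julia array index.
    i = Int(tid) + 1
    # Emit a single-precision add for this thread's element.
    c[i] = ptx"add.f32"(a[i], b[i])
    return
end

let
    n = 128
    # cu converts the Float64 output of randn to Float32, matching add.f32.
    a = cu(randn(n))
    b = cu(randn(n))
    c = similar(a)
    # One thread per element; CUDA.jl handles the launch.
    @cuda threads=n add_kernel!(c, a, b)
    # Check the kernel result against the ordinary broadcasted sum.
    c == a + b
end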

PTX.jl covers what CUDA.jl leaves out: full TensorCore shape coverage (including TF32, FP8, and sub-byte types), TMA descriptors, cluster APIs, mbarriers, FP8 conversions, setmaxnreg, match.sync, prmt, and the rest of what <mma.h>, <cuda_pipeline.h>, and <cuda/barrier> ultimately compile down to. Composition with CUDA.jl is strictly additive: CUDA.jl owns launch, memory, and control flow; PTX.jl owns emission of specialty ops.

ptx_to_julia(source) turns a .ptx file into idiomatic Julia, where each register becomes a variable and each instruction becomes a ptx"..."(...) call. See the Transpiler docs.
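
As a rough illustration of that mapping, the sketch below calls the transpiler on a PTX file and shows, in comments, what two instructions might become. The file name kernel.ptx is hypothetical, passing the file contents as a string (rather than a path) is an assumption, and the translated lines are a guess modelled on the add_kernel! example above rather than actual transpiler output; the Transpiler docs are the authority on all three.

```julia
using PTX

# Hypothetical input file. Assumption: `source` is the PTX text itself;
# the real function may expect a file path instead.
source = read("kernel.ptx", String)

# Return type is not stated in the README, so the result is only bound here.
julia_code = ptx_to_julia(source)

# Assumed shape of the translation: one variable per register,
# one ptx"..." call per instruction.
#
#   PTX:    mov.u32 %r1, %tid.x;
#   Julia:  r1 = ptx"mov.u32"(sreg"%tid.x")
#
#   PTX:    add.f32 %f3, %f1, %f2;
#   Julia:  f3 = ptx"add.f32"(f1, f2)
```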

Credits

Primary design inspiration: pyptx by Patrick Toulmé. The parser, IR, and several wrappers and example kernels are ported from pyptx (Apache 2.0); see per-file headers and LICENSE.
