Merged
3 changes: 2 additions & 1 deletion CLAUDE.md
@@ -87,10 +87,11 @@ uv run ruff format gpu_test/

- **Stack Type**: `!forth.stack` - untyped stack, programmer ensures type safety
- **Operations**: All take stack as input and produce stack as output (except `forth.stack`)
- **Supported Words**: literals (integer `42` and float `3.14`), `DUP DROP SWAP OVER ROT NIP TUCK PICK ROLL`, `+ - * / MOD`, `F+ F- F* F/` (float arithmetic), `FEXP FSQRT FLOG FABS FNEG` (float math intrinsics), `FMAX FMIN` (float min/max), `AND OR XOR NOT LSHIFT RSHIFT`, `= < > <> <= >= 0=`, `F= F< F> F<> F<= F>=` (float comparison), `S>F F>S` (int/float conversion), `@ !` (global memory), `F@ F!` (float global memory), `S@ S!` (shared memory), `SF@ SF!` (float shared memory), `CELLS`, `IF ELSE THEN`, `BEGIN UNTIL`, `BEGIN WHILE REPEAT`, `DO LOOP +LOOP I J K`, `LEAVE UNLOOP EXIT`, `{ a b -- }` (local variables in word definitions), `TID-X/Y/Z BID-X/Y/Z BDIM-X/Y/Z GDIM-X/Y/Z GLOBAL-ID` (GPU indexing).
- **Supported Words**: literals (integer `42` and float `3.14`), `DUP DROP SWAP OVER ROT NIP TUCK PICK ROLL`, `+ - * / MOD`, `F+ F- F* F/` (float arithmetic), `FEXP FSQRT FLOG FABS FNEG` (float math intrinsics), `FMAX FMIN` (float min/max), `AND OR XOR NOT LSHIFT RSHIFT`, `= < > <> <= >= 0=`, `F= F< F> F<> F<= F>=` (float comparison), `S>F F>S` (int/float conversion), `@ !` (global memory), `F@ F!` (float global memory), `S@ S!` (shared memory), `SF@ SF!` (float shared memory), `I8@ I8! SI8@ SI8!` (i8 memory), `I16@ I16! SI16@ SI16!` (i16 memory), `I32@ I32! SI32@ SI32!` (i32 memory), `HF@ HF! SHF@ SHF!` (f16 memory), `BF@ BF! SBF@ SBF!` (bf16 memory), `F32@ F32! SF32@ SF32!` (f32 memory), `CELLS`, `IF ELSE THEN`, `BEGIN UNTIL`, `BEGIN WHILE REPEAT`, `DO LOOP +LOOP I J K`, `LEAVE UNLOOP EXIT`, `{ a b -- }` (local variables in word definitions), `TID-X/Y/Z BID-X/Y/Z BDIM-X/Y/Z GDIM-X/Y/Z GLOBAL-ID` (GPU indexing).
- **Float Literals**: Numbers containing `.` or `e`/`E` are parsed as f64 (e.g. `3.14`, `-2.0`, `1.0e-5`, `1e3`). Stored on the stack as i64 bit patterns; F-prefixed words perform bitcast before/after operations.
- **Kernel Parameters**: Declared in the `\!` header. `\! kernel <name>` is required and must appear first. `\! param <name> i64[<N>]` becomes a `memref<Nxi64>` argument; `\! param <name> i64` becomes an `i64` argument. `\! param <name> f64[<N>]` becomes a `memref<Nxf64>` argument; `\! param <name> f64` becomes an `f64` argument (bitcast to i64 when pushed to stack). Using a param name in code emits `forth.param_ref` (arrays push address; scalars push value).
- **Shared Memory**: `\! shared <name> i64[<N>]` or `\! shared <name> f64[<N>]` declares GPU shared (workgroup) memory. Emits a tagged `memref.alloca` at kernel entry; ForthToGPU converts it to a `gpu.func` workgroup attribution. Using the shared name in code pushes its base address onto the stack. Use `S@`/`S!` for i64 or `SF@`/`SF!` for f64 shared accesses. Cannot be referenced inside word definitions.
- **Reduced-Width Memory**: `I8@ I16@ I32@` load a narrow integer, sign-extend to i64. `I8! I16! I32!` truncate i64 to narrow integer, store. `HF@ BF@ F32@` load a narrow float, extend to f64, bitcast to i64. `HF! BF! F32!` bitcast i64 to f64, truncate to narrow float, store. `S`-prefixed variants (`SI8@`, `SHF!`, etc.) use shared memory (address space 3).
- **Conversion**: `!forth.stack` → `memref<256xi64>` with explicit stack pointer
- **GPU**: Functions wrapped in `gpu.module`, `main` gets `gpu.kernel` attribute, configured with bare pointers for NVVM conversion
- **Local Variables**: `{ a b c -- }` at the start of a word definition binds read-only locals. Pops values from the stack in reverse name order (c, b, a) using `forth.pop`, stores SSA values. Referencing a local emits `forth.push_value`. SSA values from the entry block dominate all control flow, so locals work across IF/ELSE/THEN, loops, etc. On GPU, locals map directly to registers.
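
The float-literal and reduced-width memory rules in the list above can be sketched in plain Python. This is only an illustrative model, not the compiler's lowering: the helper names (`f64_to_cell`, `store_int`, `load_float`, and so on) are hypothetical, memory is modelled as a little-endian `bytearray` indexed by byte offset, and bf16 is left out because Python's `struct` module has no bf16 format and the list does not state a rounding mode.

```python
import struct

# Hypothetical model: every stack cell is a signed 64-bit integer.

def f64_to_cell(value: float) -> int:
    """Bitcast an f64 to the i64 bit pattern kept on the stack."""
    return struct.unpack("<q", struct.pack("<d", value))[0]

def cell_to_f64(cell: int) -> float:
    """Bitcast an i64 stack cell back to f64."""
    return struct.unpack("<d", struct.pack("<q", cell))[0]

def store_int(mem: bytearray, addr: int, cell: int, bits: int) -> None:
    """Model of I8! / I16! / I32!: truncate the i64 cell, then store."""
    nbytes = bits // 8
    truncated = cell & ((1 << bits) - 1)  # keep only the low `bits` bits
    mem[addr:addr + nbytes] = truncated.to_bytes(nbytes, "little")

def load_int(mem: bytearray, addr: int, bits: int) -> int:
    """Model of I8@ / I16@ / I32@: load narrow, sign-extend to i64."""
    nbytes = bits // 8
    return int.from_bytes(mem[addr:addr + nbytes], "little", signed=True)

# struct format codes for the narrow float widths modelled here.
_FLOAT_FMT = {16: "<e", 32: "<f"}

def store_float(mem: bytearray, addr: int, cell: int, bits: int) -> None:
    """Model of HF! / F32!: bitcast i64 to f64, truncate, store."""
    # struct.pack raises OverflowError if the value does not fit the
    # narrow format; real hardware behaviour may differ.
    mem[addr:addr + bits // 8] = struct.pack(_FLOAT_FMT[bits], cell_to_f64(cell))

def load_float(mem: bytearray, addr: int, bits: int) -> int:
    """Model of HF@ / F32@: load narrow float, extend to f64, bitcast."""
    (narrow,) = struct.unpack(_FLOAT_FMT[bits], mem[addr:addr + bits // 8])
    return f64_to_cell(narrow)

if __name__ == "__main__":
    mem = bytearray(16)
    store_int(mem, 0, 200, 8)       # 200 does not fit in a signed i8
    print(load_int(mem, 0, 8))      # sign-extension yields a negative value
    store_float(mem, 8, f64_to_cell(0.1), 32)
    print(cell_to_f64(load_float(mem, 8, 32)))  # 0.1 rounded through f32
```

Under this model a store followed by a load is not an identity: integers outside the narrow range wrap and come back sign-extended, and floats lose precision on the way through the narrow format.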
184 changes: 184 additions & 0 deletions include/warpforth/Dialect/Forth/ForthOps.td
@@ -404,6 +404,190 @@ def Forth_SharedStoreFOp : Forth_StackOpBase<"shared_storef"> {
}];
}

//===----------------------------------------------------------------------===//
// Reduced-width memory operations.
//===----------------------------------------------------------------------===//

// --- i8 ---
def Forth_LoadI8Op : Forth_StackOpBase<"load_i8"> {
let summary = "Load i8 value from memory, sign-extend to i64";
let description = [{
Pops an address, loads an i8, sign-extends to i64, pushes result.
Forth semantics: ( addr -- value )
}];
}
def Forth_StoreI8Op : Forth_StackOpBase<"store_i8"> {
let summary = "Truncate i64 to i8 and store to memory";
let description = [{
Pops address and value, truncates i64 to i8, stores to memory.
Forth semantics: ( x addr -- )
}];
}
def Forth_SharedLoadI8Op : Forth_StackOpBase<"shared_load_i8"> {
let summary = "Load i8 from shared memory, sign-extend to i64";
let description = [{
Pops an address, loads an i8 from shared memory, sign-extends to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_SharedStoreI8Op : Forth_StackOpBase<"shared_store_i8"> {
let summary = "Truncate i64 to i8 and store to shared memory";
let description = [{
Pops address and value, truncates i64 to i8, stores to shared memory.
Forth semantics: ( x addr -- )
}];
}

// --- i16 ---
def Forth_LoadI16Op : Forth_StackOpBase<"load_i16"> {
let summary = "Load i16 value from memory, sign-extend to i64";
let description = [{
Pops an address, loads an i16, sign-extends to i64, pushes result.
Forth semantics: ( addr -- value )
}];
}
def Forth_StoreI16Op : Forth_StackOpBase<"store_i16"> {
let summary = "Truncate i64 to i16 and store to memory";
let description = [{
Pops address and value, truncates i64 to i16, stores to memory.
Forth semantics: ( x addr -- )
}];
}
def Forth_SharedLoadI16Op : Forth_StackOpBase<"shared_load_i16"> {
let summary = "Load i16 from shared memory, sign-extend to i64";
let description = [{
Pops an address, loads an i16 from shared memory, sign-extends to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_SharedStoreI16Op : Forth_StackOpBase<"shared_store_i16"> {
let summary = "Truncate i64 to i16 and store to shared memory";
let description = [{
Pops address and value, truncates i64 to i16, stores to shared memory.
Forth semantics: ( x addr -- )
}];
}

// --- i32 ---
def Forth_LoadI32Op : Forth_StackOpBase<"load_i32"> {
let summary = "Load i32 value from memory, sign-extend to i64";
let description = [{
Pops an address, loads an i32, sign-extends to i64, pushes result.
Forth semantics: ( addr -- value )
}];
}
def Forth_StoreI32Op : Forth_StackOpBase<"store_i32"> {
let summary = "Truncate i64 to i32 and store to memory";
let description = [{
Pops address and value, truncates i64 to i32, stores to memory.
Forth semantics: ( x addr -- )
}];
}
def Forth_SharedLoadI32Op : Forth_StackOpBase<"shared_load_i32"> {
let summary = "Load i32 from shared memory, sign-extend to i64";
let description = [{
Pops an address, loads an i32 from shared memory, sign-extends to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_SharedStoreI32Op : Forth_StackOpBase<"shared_store_i32"> {
let summary = "Truncate i64 to i32 and store to shared memory";
let description = [{
Pops address and value, truncates i64 to i32, stores to shared memory.
Forth semantics: ( x addr -- )
}];
}

// --- f16 ---
def Forth_LoadF16Op : Forth_StackOpBase<"load_f16"> {
let summary = "Load f16 from memory, extend to f64, bitcast to i64";
let description = [{
Pops an address, loads f16, extends to f64, bitcasts to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_StoreF16Op : Forth_StackOpBase<"store_f16"> {
let summary = "Bitcast i64 to f64, truncate to f16, store to memory";
let description = [{
Pops address and value, bitcasts i64 to f64, truncates to f16, stores.
Forth semantics: ( x addr -- )
}];
}
def Forth_SharedLoadF16Op : Forth_StackOpBase<"shared_load_f16"> {
let summary = "Load f16 from shared memory, extend to f64, bitcast to i64";
let description = [{
Pops an address, loads f16 from shared memory, extends to f64, bitcasts to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_SharedStoreF16Op : Forth_StackOpBase<"shared_store_f16"> {
let summary = "Bitcast i64 to f64, truncate to f16, store to shared memory";
let description = [{
Pops address and value, bitcasts i64 to f64, truncates to f16, stores to shared memory.
Forth semantics: ( x addr -- )
}];
}

// --- bf16 ---
def Forth_LoadBF16Op : Forth_StackOpBase<"load_bf16"> {
let summary = "Load bf16 from memory, extend to f64, bitcast to i64";
let description = [{
Pops an address, loads bf16, extends to f64, bitcasts to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_StoreBF16Op : Forth_StackOpBase<"store_bf16"> {
let summary = "Bitcast i64 to f64, truncate to bf16, store to memory";
let description = [{
Pops address and value, bitcasts i64 to f64, truncates to bf16, stores.
Forth semantics: ( x addr -- )
}];
}
def Forth_SharedLoadBF16Op : Forth_StackOpBase<"shared_load_bf16"> {
let summary = "Load bf16 from shared memory, extend to f64, bitcast to i64";
let description = [{
Pops an address, loads bf16 from shared memory, extends to f64, bitcasts to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_SharedStoreBF16Op : Forth_StackOpBase<"shared_store_bf16"> {
let summary = "Bitcast i64 to f64, truncate to bf16, store to shared memory";
let description = [{
Pops address and value, bitcasts i64 to f64, truncates to bf16, stores to shared memory.
Forth semantics: ( x addr -- )
}];
}

// --- f32 ---
def Forth_LoadF32Op : Forth_StackOpBase<"load_f32"> {
let summary = "Load f32 from memory, extend to f64, bitcast to i64";
let description = [{
Pops an address, loads f32, extends to f64, bitcasts to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_StoreF32Op : Forth_StackOpBase<"store_f32"> {
let summary = "Bitcast i64 to f64, truncate to f32, store to memory";
let description = [{
Pops address and value, bitcasts i64 to f64, truncates to f32, stores.
Forth semantics: ( x addr -- )
}];
}
def Forth_SharedLoadF32Op : Forth_StackOpBase<"shared_load_f32"> {
let summary = "Load f32 from shared memory, extend to f64, bitcast to i64";
let description = [{
Pops an address, loads f32 from shared memory, extends to f64, bitcasts to i64.
Forth semantics: ( addr -- value )
}];
}
def Forth_SharedStoreF32Op : Forth_StackOpBase<"shared_store_f32"> {
let summary = "Bitcast i64 to f64, truncate to f32, store to shared memory";
let description = [{
Pops address and value, bitcasts i64 to f64, truncates to f32, stores to shared memory.
Forth semantics: ( x addr -- )
}];
}

def Forth_ParamRefOp : Forth_Op<"param_ref", [Pure]> {
let summary = "Push kernel parameter address onto stack";
let description = [{