feat(demo): GPT-2 interactive inference with WarpForth attention #55

Merged
tetsuo-cpp merged 2 commits into canon from feat/gpt2-demo on Feb 23, 2026

Conversation

tetsuo-cpp (Owner) commented on Feb 23, 2026

Summary

  • Add end-to-end GPT-2 text generation demo using a WarpForth-compiled attention kernel
  • Attention kernel uses f32 global memory (F32@/F32!) with f64 shared memory for softmax precision
  • PyCUDA wrapper shares PyTorch's CUDA context via autoprimaryctx for zero-copy tensor passing
  • Add f32 attention GPU end-to-end test validating reduced-width float memory ops
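The mixed-precision scheme in the second bullet can be sketched in NumPy terms: attention scores are stored at f32 width (matching the kernel's F32@/F32! global-memory ops), while the softmax reduction accumulates at f64, as the f64 shared memory does. This is an illustrative model of the numerics only, not the WarpForth kernel itself.

```python
import numpy as np

def softmax_f32_scores_f64_accum(scores_f32):
    """Softmax over float32 scores with float64 reduction.

    Mirrors the kernel's scheme: values live at f32 width, but the
    max/sum accumulation runs at f64 to limit rounding error.
    """
    s = scores_f32.astype(np.float64)        # widen before reducing
    s -= s.max()                             # stabilize the exponent
    e = np.exp(s)
    return (e / e.sum()).astype(np.float32)  # narrow back to f32

scores = np.array([0.5, 1.5, -0.25, 2.0], dtype=np.float32)
probs = softmax_f32_scores_f64_accum(scores)
```

The widen-reduce-narrow pattern matters most for long rows, where an f32 running sum of exponentials loses low-order bits; doing only the reduction in f64 keeps the memory traffic at f32 width.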

Closes #46
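The context-sharing bullet above can be sketched as follows, assuming a CUDA-capable host with torch and pycuda installed. The module file name "attention.cubin", the entry-point name "attention", and the launch configuration are hypothetical stand-ins for the WarpForth-compiled kernel; the real wrapper lives in the demo code.

```python
import numpy as np
import torch

torch.cuda.init()  # ensure PyTorch's primary context exists first
import pycuda.autoprimaryctx  # noqa: F401  # retain that primary context instead of creating a new one
import pycuda.driver as cuda

mod = cuda.module_from_file("attention.cubin")  # hypothetical kernel binary
attention = mod.get_function("attention")       # hypothetical entry point

# Tensors allocated by PyTorch never leave the device; only their raw
# pointers cross the PyCUDA boundary, so no host round-trip or copy occurs.
q = torch.randn(64, 64, device="cuda", dtype=torch.float32)
out = torch.empty_like(q)
attention(
    np.uintp(q.data_ptr()),
    np.uintp(out.data_ptr()),
    block=(64, 1, 1),
    grid=(64, 1),
)
torch.cuda.synchronize()
```

Because pycuda.autoprimaryctx attaches to the device's existing primary context rather than pushing a fresh one, PyTorch and the PyCUDA-launched kernel see the same address space, which is what makes passing data_ptr() values directly safe.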

Test plan

  • Verify cmake --build build --target check-warpforth passes
  • Run uv run ruff check demo/ gpu_test/ — no lint errors
  • On a GPU instance: compile kernel, run gpt2_generate.py, verify coherent text output

@tetsuo-cpp tetsuo-cpp merged commit 7f82e8f into canon Feb 23, 2026
1 check passed
@tetsuo-cpp tetsuo-cpp deleted the feat/gpt2-demo branch February 23, 2026 05:15
