feat(demo): GPT-2 interactive inference with WarpForth attention #55

Merged
tetsuo-cpp merged 2 commits into canon from feat/gpt2-demo on Feb 23, 2026

Conversation

tetsuo-cpp (Owner) commented on Feb 23, 2026

Summary

  • Add end-to-end GPT-2 text generation demo using a WarpForth-compiled attention kernel
  • Attention kernel uses f32 global memory (F32@/F32!) with f64 shared memory for softmax precision
  • PyCUDA wrapper shares PyTorch's CUDA context via autoprimaryctx for zero-copy tensor passing
  • Add f32 attention GPU end-to-end test validating reduced-width float memory ops
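The mixed-precision scheme in the second bullet can be sketched in NumPy terms: attention scores are stored at f32 width (matching the kernel's F32@/F32! global-memory ops), while the softmax reduction accumulates at f64, as the f64 shared memory does. This is an illustrative model of the numerics only, not the WarpForth kernel itself.

```python
import numpy as np

def softmax_f32_scores_f64_accum(scores_f32):
    """Softmax over float32 scores with float64 reduction.

    Mirrors the kernel's scheme: values live at f32 width, but the
    max/sum accumulation runs at f64 to limit rounding error.
    """
    s = scores_f32.astype(np.float64)        # widen before reducing
    s -= s.max()                             # stabilize the exponent
    e = np.exp(s)
    return (e / e.sum()).astype(np.float32)  # narrow back to f32

scores = np.array([0.5, 1.5, -0.25, 2.0], dtype=np.float32)
probs = softmax_f32_scores_f64_accum(scores)
```

The widen-reduce-narrow pattern matters most for long rows, where an f32 running sum of exponentials loses low-order bits; doing only the reduction in f64 keeps the memory traffic at f32 width.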

Closes #46
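The context-sharing bullet above can be sketched as follows, assuming a CUDA-capable host with torch and pycuda installed. The module file name "attention.cubin", the entry-point name "attention", and the launch configuration are hypothetical stand-ins for the WarpForth-compiled kernel; the real wrapper lives in the demo code.

```python
import numpy as np
import torch

torch.cuda.init()  # ensure PyTorch's primary context exists first
import pycuda.autoprimaryctx  # noqa: F401  # retain that primary context instead of creating a new one
import pycuda.driver as cuda

mod = cuda.module_from_file("attention.cubin")  # hypothetical kernel binary
attention = mod.get_function("attention")       # hypothetical entry point

# Tensors allocated by PyTorch never leave the device; only their raw
# pointers cross the PyCUDA boundary, so no host round-trip or copy occurs.
q = torch.randn(64, 64, device="cuda", dtype=torch.float32)
out = torch.empty_like(q)
attention(
    np.uintp(q.data_ptr()),
    np.uintp(out.data_ptr()),
    block=(64, 1, 1),
    grid=(64, 1),
)
torch.cuda.synchronize()
```

Because pycuda.autoprimaryctx attaches to the device's existing primary context rather than pushing a fresh one, PyTorch and the PyCUDA-launched kernel see the same address space, which is what makes passing data_ptr() values directly safe.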

Test plan

  • Verify cmake --build build --target check-warpforth passes
  • Run uv run ruff check demo/ gpu_test/ — no lint errors
  • On a GPU instance: compile kernel, run gpt2_generate.py, verify coherent text output

@tetsuo-cpp tetsuo-cpp merged commit 7f82e8f into canon Feb 23, 2026
1 check passed
@tetsuo-cpp tetsuo-cpp deleted the feat/gpt2-demo branch February 23, 2026 05:15
