This document explains the internal design of go-webgpu for contributors. It covers the layered architecture, the FFI contract, the enum translation layer, and the testing strategy.
- High-Level Overview
- FFI Layer Architecture
- Struct Layout Contract
- convert.go — The Enum Translation Layer
- gputypes Relationship
- Async Callbacks
- Testing Strategy
- Ecosystem Context
```
Your Go Application
    │  uses public API (Go strings, typed values)
    ▼
go-webgpu/wgpu (public package)
    ├─ Public structs: Go-idiomatic types, gputypes aliases
    ├─ Method receivers: error-returning wrappers
    └─ gputypes_aliases.go: re-exports for single-import UX
    │  calls
    ▼
Internal FFI layer
    ├─ Wire structs: must match C ABI exactly (field order, padding)
    ├─ convert.go: translates enum values between gputypes and wgpu-native
    ├─ loader*.go: cross-platform dynamic library loading
    └─ wgpu.go / procs: syscall procedure handles
    │  calls
    ▼
wgpu-native v29 (Rust), dynamically loaded .dll / .so / .dylib
    │  calls
    ▼
Vulkan / Metal / D3D12 / OpenGL
```
Design principle: the public API is Go-idiomatic. The FFI-level ABI complexity is entirely contained in the internal layer and never surfaces to users.
Public structs use Go-friendly types. Wire structs must match C memory layout exactly.
```go
// Public struct: created by the user
type TextureDescriptor struct {
	Label     StringView
	Usage     gputypes.TextureUsage // Go-typed
	Dimension gputypes.TextureDimension
	Format    gputypes.TextureFormat
	// ...
}

// Wire struct: passed directly to wgpu-native
type textureDescriptorWire struct {
	NextInChain uintptr
	Label       StringView
	Usage       uint64 // CRITICAL: wgpu-native uses uint64 for flags
	Dimension   uint32 // converted: gputypes value → wgpu-native value
	Format      uint32 // converted via map in convert.go
	// ...
}
```

Conversion happens inside each method before the FFI call:
```go
func (d *Device) CreateTexture(desc *TextureDescriptor) (*Texture, error) {
	wire := textureDescriptorWire{
		Usage:     uint64(desc.Usage),                     // bitflag: direct cast
		Dimension: toWGPUTextureDimension(desc.Dimension), // +1 shift
		Format:    toWGPUTextureFormat(desc.Format),       // lookup table
		// ...
	}
	// Call wgpu-native with &wire, not &desc
	// ...
}
```

The user never sees wire structs. They are unexported and created only at the FFI call site.
All wgpu-native functions are loaded once at Init() time via platform-specific loaders:
- `loader_windows.go`: `syscall.LazyDLL` / `syscall.LazyProc`
- `loader_unix.go`: goffi `ffi.LoadLibrary` / `ffi.GetSymbol`
Procedures are package-level variables (procCreateInstance, procDeviceGetQueue, etc.) used directly in method bodies.
Every wire struct field order, type, and padding must exactly match the C struct in webgpu.h.
This is verified by abi_test.go (271 assertions) using unsafe.Sizeof and unsafe.Offsetof. The assertions run under `go test` and need neither a GPU nor the native library.
Rules:
- Field order must follow the C struct definition (copy from webgpu.h comments)
- Enum fields that are `uint32` in C must be `uint32` in Go (not Go enum types)
- Flag fields that are `uint64` in C must be `uint64` in Go (wgpu-native extends several flag types to 64-bit)
- Pointer fields are `uintptr` (8 bytes on 64-bit, covers `WGPUSomething*` and `void*`)
- Booleans are `uint32` (`WGPUBool` = C `uint32_t`)
- `StringView` is `{Data uintptr, Length uintptr}`: two pointer-sized fields (16 bytes on 64-bit)
Any struct change requires updating both the Go struct and abi_test.go.
Example of a v29 ABI-breaking change: Limits gained NextInChain uintptr as its first field and two fields changed position. Every limit query would have been silently corrupted if the wire struct had not been updated to match.
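A layout assertion of the kind described above can be sketched like this. The struct is a local stand-in built from the field list in this document, the expected numbers are worked out by hand for a 64-bit target rather than taken from webgpu.h, and `layoutCheck` / `mismatches` are hypothetical helper names. In a real test file the mismatches would be reported through `t.Errorf`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringView mirrors the two-word layout described above (local copy).
type stringView struct {
	Data   uintptr
	Length uintptr
}

// bufferDescriptorWire is a stand-in wire struct for illustration only.
type bufferDescriptorWire struct {
	NextInChain      uintptr
	Label            stringView
	Usage            uint64
	Size             uint64
	MappedAtCreation uint32
	_pad             [4]byte
}

// layoutCheck records one observed value and the value expected from C.
type layoutCheck struct {
	name      string
	got, want uintptr
}

// bufferDescriptorChecks lists size and offset expectations.
// ASSUMPTION: 64-bit target, 8-byte pointers, 8-byte aligned uint64.
func bufferDescriptorChecks() []layoutCheck {
	var w bufferDescriptorWire
	return []layoutCheck{
		{"sizeof", unsafe.Sizeof(w), 48},
		{"offsetof Label", unsafe.Offsetof(w.Label), 8},
		{"offsetof Usage", unsafe.Offsetof(w.Usage), 24},
		{"offsetof Size", unsafe.Offsetof(w.Size), 32},
		{"offsetof MappedAtCreation", unsafe.Offsetof(w.MappedAtCreation), 40},
	}
}

// mismatches returns one message per check whose observed value differs.
func mismatches(checks []layoutCheck) []string {
	var out []string
	for _, c := range checks {
		if c.got != c.want {
			out = append(out, fmt.Sprintf("%s: got %d, want %d", c.name, c.got, c.want))
		}
	}
	return out
}

func main() {
	bad := mismatches(bufferDescriptorChecks())
	for _, m := range bad {
		fmt.Println("ABI mismatch:", m)
	}
	fmt.Println("mismatches:", len(bad))
}
```

Inserting, removing, or reordering a field changes at least one offset, so a check like this fails as soon as the Go struct drifts from the C definition.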
convert.go bridges two independent numbering systems: gputypes (Go-idiomatic, WebGPU JS spec) and wgpu-native v29 (C spec, with some structural differences).
The wgpu-native v29 C header introduces BindingNotUsed=0 as a sentinel for binding-related enums. gputypes has no such sentinel and starts from Undefined=0, so every wgpu-native value after the sentinel is one higher than its gputypes counterpart.
| Enum | Reason | Conversion |
|---|---|---|
| `BufferBindingType` | v29 adds `BindingNotUsed=0` | `Undefined=0` → 0, others +1 |
| `SamplerBindingType` | same | same |
| `TextureSampleType` | same | same |
| `StorageTextureAccess` | same | same |
| `VertexFormat` | gputypes omits single-component 8/16-bit variants added in v29 | full lookup table |
| `VertexStepMode` | gputypes has `VertexBufferNotUsed=1` removed in v29; values shifted | lookup table |
Conversion functions follow the naming convention toWGPU<EnumName> and return uint32.
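The shift rule from the table can be sketched as below. The Go-side type and constants are local stand-ins with assumed values (they are not imported from gputypes), so treat the numbers as an illustration of the rule rather than a copy of convert.go.

```go
package main

import "fmt"

// bufferBindingType stands in for the gputypes enum.
// ASSUMPTION: values start at Undefined=0 with no sentinel.
type bufferBindingType uint32

const (
	bindingUndefined       bufferBindingType = iota // 0
	bindingUniform                                  // 1
	bindingStorage                                  // 2
	bindingReadOnlyStorage                          // 3
)

// toWGPUBufferBindingType applies the rule from the table:
// Undefined maps to 0, every other value is shifted up by one
// because the C side reserves 0 for BindingNotUsed.
func toWGPUBufferBindingType(v bufferBindingType) uint32 {
	if v == bindingUndefined {
		return 0
	}
	return uint32(v) + 1
}

func main() {
	for _, v := range []bufferBindingType{
		bindingUndefined, bindingUniform, bindingStorage, bindingReadOnlyStorage,
	} {
		fmt.Printf("go=%d -> native=%d\n", v, toWGPUBufferBindingType(v))
	}
}
```

Enums whose differences are not a uniform shift, such as the vertex format case in the table, need an explicit lookup table instead of arithmetic.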
The following enums have the same values in gputypes and wgpu-native v29 and need no conversion. Bitflag types match because they use the power-of-2 values defined by the WebGPU spec.
- `TextureFormat`: gputypes v0.3.0 matches v29 exactly, including `R16*`/`RG16*` Unorm/Snorm variants
- All flag types: `BufferUsage`, `TextureUsage`, `ShaderStage`, `ColorWriteMask`, `MapMode`
- `LoadOp`, `StoreOp`, `BlendFactor`, `BlendOperation`
- `PrimitiveTopology`, `FrontFace`, `CullMode`, `IndexFormat`
- `FilterMode`, `MipmapFilterMode`, `AddressMode`, `CompareFunction`, `StencilOperation`
- `TextureViewDimension`, `TextureDimension`, `TextureAspect`
- `PresentMode`, `CompositeAlphaMode`, `PowerPreference`

For these types a direct cast is safe: uint32(value) for enum fields, uint64(value) for flag fields that are 64-bit in the wire struct.
When adding support for a new enum, check whether it has a BindingNotUsed=0 or structural gap. Compare the gputypes values against the webgpu.h C header values before deciding whether a converter is needed.
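One way to make that comparison mechanical is to put both value tables side by side and list the names that disagree. The sketch below assumes you have transcribed the two tables by hand; `diffEnum` is a hypothetical helper and the sample values are placeholders, not the real constants.

```go
package main

import (
	"fmt"
	"sort"
)

// diffEnum returns the names whose Go-side and C-side values differ, plus
// names present on only one side. An empty result suggests that a direct
// cast is enough; anything else calls for a converter.
func diffEnum(goVals, cVals map[string]uint32) []string {
	seen := make(map[string]bool, len(goVals))
	var out []string
	for name, g := range goVals {
		seen[name] = true
		if c, ok := cVals[name]; !ok || c != g {
			out = append(out, name)
		}
	}
	for name := range cVals {
		if !seen[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Placeholder tables for a made-up enum with a C-side sentinel at 0.
	goSide := map[string]uint32{"Undefined": 0, "Filtering": 1, "Comparison": 2}
	cSide := map[string]uint32{"BindingNotUsed": 0, "Undefined": 1, "Filtering": 2, "Comparison": 3}

	diff := diffEnum(goSide, cSide)
	fmt.Println("needs converter:", len(diff) > 0)
	fmt.Println("differing names:", diff)
}
```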
gogpu/gputypes is the shared type registry for the entire gogpu ecosystem. It defines WebGPU enums and struct types following the WebGPU JavaScript specification numbering.
go-webgpu re-exports gputypes types and constants as type aliases in gputypes_aliases.go:
```go
// gputypes_aliases.go
type TextureFormat = gputypes.TextureFormat

const TextureFormatBGRA8Unorm = gputypes.TextureFormatBGRA8Unorm

// ...
```

This means users can write wgpu.TextureFormatBGRA8Unorm with a single import. There is no wrapping or copying: wgpu.TextureFormat and gputypes.TextureFormat are the same type at the Go type system level.
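The difference between an alias and a newly defined type can be seen in a single file. In this sketch `upstreamFormat` plays the role of the gputypes type, and the constant value is a placeholder rather than the real one.

```go
package main

import "fmt"

// upstreamFormat plays the role of gputypes.TextureFormat (local stand-in).
type upstreamFormat uint32

// Placeholder value, not the real constant.
const upstreamBGRA8Unorm upstreamFormat = 23

// Alias declaration: TextureFormat is another name for upstreamFormat.
type TextureFormat = upstreamFormat

const TextureFormatBGRA8Unorm = upstreamBGRA8Unorm

// wrappedFormat is a defined type, shown only for contrast: it would need
// an explicit conversion wherever upstreamFormat is expected.
type wrappedFormat upstreamFormat

// describe accepts the upstream type only.
func describe(f upstreamFormat) string {
	return fmt.Sprintf("format %d", uint32(f))
}

// typeName reports the dynamic type of v as Go sees it.
func typeName(v any) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	// No conversion needed: the alias and the upstream type are identical.
	fmt.Println(describe(TextureFormatBGRA8Unorm))
	fmt.Println(typeName(TextureFormatBGRA8Unorm) == typeName(upstreamBGRA8Unorm))
	fmt.Println(typeName(wrappedFormat(23)) == typeName(upstreamBGRA8Unorm))
}
```

Because the alias introduces no new type, values flow between the two packages without conversions, which is also why the numeric values cannot be adjusted at this layer.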
Important: gputypes values are NOT the same numbers as wgpu-native C values for the enums listed in the conversion table above. convert.go is the bridge between these two numbering systems. Never pass a gputypes enum value directly to an FFI function without checking whether a converter exists for that type.
wgpu-native uses callbacks for RequestAdapter, RequestDevice, MapAsync, and error scopes. The Go implementation converts these to synchronous calls using channels:
- A `*Request` state struct with a `done chan struct{}` is allocated and registered in a global map
- The request ID is passed as `Userdata1` to the C callback
- The callback (registered via `ffi.NewCallback`) looks up the request by ID, writes results, and closes `done`
- The calling goroutine blocks on `<-req.done` in a select loop that also calls `ProcessEvents()`
Callback function pointers are created once via sync.Once and reused across calls. The global registry is protected by a sync.Mutex.
Thread safety: each callback modifies only its own request struct, and deletes the map entry atomically under the lock before writing to the struct, so there is no race between the callback and the waiting goroutine.
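The registry pattern can be sketched without any FFI. Everything here is a simplified stand-in: `onCallback` plays the role of the C callback, the `processEvents` closure plays the role of `ProcessEvents()`, and the result is a plain string instead of an adapter handle.

```go
package main

import (
	"fmt"
	"sync"
)

// request holds the state for one in-flight asynchronous call.
type request struct {
	done   chan struct{}
	result string
}

var (
	mu      sync.Mutex
	nextID  uintptr
	pending = map[uintptr]*request{}
)

// register allocates a request and stores it under a fresh ID.
func register() (uintptr, *request) {
	mu.Lock()
	defer mu.Unlock()
	nextID++
	req := &request{done: make(chan struct{})}
	pending[nextID] = req
	return nextID, req
}

// onCallback stands in for the C callback. It removes the entry under the
// lock first, so a duplicate invocation finds nothing and cannot close the
// channel twice.
func onCallback(id uintptr, result string) {
	mu.Lock()
	req, ok := pending[id]
	delete(pending, id)
	mu.Unlock()
	if !ok {
		return
	}
	req.result = result
	close(req.done) // publishes result to the waiter
}

// wait pumps events until the request completes.
func wait(req *request, processEvents func()) string {
	for {
		select {
		case <-req.done:
			return req.result
		default:
			processEvents()
		}
	}
}

// pendingCount reports how many requests are still registered.
func pendingCount() int {
	mu.Lock()
	defer mu.Unlock()
	return len(pending)
}

// requestAdapter shows the synchronous wrapper: the fake event pump fires
// the callback on its third invocation.
func requestAdapter() string {
	id, req := register()
	polls := 0
	return wait(req, func() {
		polls++
		if polls == 3 {
			onCallback(id, "adapter-0")
		}
	})
}

func main() {
	fmt.Println(requestAdapter(), "pending:", pendingCount())
}
```

Passing an integer ID rather than a Go pointer through the userdata slot avoids handing Go memory to native code that may hold it across calls.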
Tests are organized into three tiers by GPU requirement:
abi_test.go: 271 assertions verifying:
- `unsafe.Sizeof(wireStruct)` matches C `sizeof(WGPUStruct)`
- `unsafe.Offsetof(wireStruct.Field)` matches C field offsets
- Enum constant values match webgpu.h integers
- gputypes type alignment with wgpu-native values for pass-through enums
These tests have zero external dependencies and catch ABI regressions immediately.
Tests that exercise logic without making FFI calls to wgpu-native:
- `TestMat4*`, `TestVec3*`: math helpers
- `TestStructSizes*`: additional struct size checks
- `TestWGPUError*`: error type assertions
- `TestNullGuard*`: nil/released handle defensive checks
- `Fuzz*`: FFI boundary fuzz targets (seed corpus only in CI)
Filter: -run "Mat4|Vec3|StructSizes|CheckInit|WGPUError|Fuzz|NullGuard" in .github/workflows/.
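The `-run` flag takes an unanchored regular expression, so a name is selected if any alternative appears anywhere in it. The sketch below applies the CI pattern to a few test names; the names themselves are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// ciFilter is the pattern quoted above.
const ciFilter = "Mat4|Vec3|StructSizes|CheckInit|WGPUError|Fuzz|NullGuard"

// selected returns the names that an unanchored match on pattern would pick.
func selected(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Hypothetical test names, one per tier.
	names := []string{
		"TestMat4Multiply",
		"TestNullGuardBuffer",
		"FuzzStringView",
		"TestAdapterRequest",
		"TestBufferMap",
	}
	for _, n := range selected(ciFilter, names) {
		fmt.Println("runs in CI:", n)
	}
}
```

A new GPU-free test only runs in CI if its name contains one of the listed fragments, so naming is part of the contract.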
Tests that require a real GPU and wgpu-native loaded:
- `TestAdapter*`, `TestDevice*`, `TestBuffer*`, `TestSurface*`
- `TestLeak*`, `TestErrorScope*`
These require WGPU_NATIVE_PATH pointing to a valid wgpu-native binary. GitHub Actions runners have no GPU, so these tests are excluded from CI filter patterns.
Buffer mapping in WebGPU is inherently async: the GPU must finish any in-flight work on the buffer before the CPU can access it. go-webgpu provides three access patterns:
Buffer.Map(ctx, mode, offset, size) error — the recommended path for most applications.
- Calls `mapAsyncStart` which issues `wgpuBufferMapAsync` with a Go callback registered in `mapRequests` (global map, protected by `sync.Mutex`)
- The callback (`mapCallbackHandler`) is a C-callable function pointer created once via `ffi.NewCallback`
- After submitting the request, `Map` kicks an initial `Device.Poll(false)` (for synchronous-complete backends)
- If not immediately complete, a background goroutine drives `Device.Poll` continuously so the mapping resolves without the caller needing to pump events
- Blocks on `<-req.done` or `ctx.Done()`, whichever fires first
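The blocking path can be sketched with a fake device in place of the GPU. `fakeDevice`, `mapBlocking`, and `demoMap` are hypothetical names, and the 1 ms ticker is an arbitrary choice for the illustration, not the library's polling interval.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// fakeDevice stands in for a GPU device: the mapping completes after a
// fixed number of Poll calls.
type fakeDevice struct {
	mu        sync.Mutex
	pollsLeft int
	done      chan struct{}
}

func newFakeDevice(polls int) *fakeDevice {
	if polls < 1 {
		polls = 1
	}
	return &fakeDevice{pollsLeft: polls, done: make(chan struct{})}
}

// Poll simulates driving the device and closes done when work finishes.
func (d *fakeDevice) Poll() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.pollsLeft > 0 {
		d.pollsLeft--
		if d.pollsLeft == 0 {
			close(d.done)
		}
	}
}

// mapBlocking follows the steps listed above: one initial poll, then a
// background poller, then a wait on completion or cancellation.
func mapBlocking(ctx context.Context, d *fakeDevice) error {
	d.Poll() // covers backends that complete synchronously
	select {
	case <-d.done:
		return nil
	default:
	}

	stop := make(chan struct{})
	defer close(stop) // stops the poller when we return
	go func() {
		ticker := time.NewTicker(time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-stop:
				return
			case <-ticker.C:
				d.Poll()
			}
		}
	}()

	select {
	case <-d.done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoMap runs one mapping that completes and one that is cancelled.
func demoMap() (completed error, cancelled error) {
	completed = mapBlocking(context.Background(), newFakeDevice(3))

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before the mapping can finish
	cancelled = mapBlocking(ctx, newFakeDevice(1_000_000))
	return completed, cancelled
}

func main() {
	completed, cancelled := demoMap()
	fmt.Println("completed:", completed)
	fmt.Println("cancelled:", cancelled)
}
```

The background poller is what lets a caller block without running its own event loop; the deferred `close(stop)` ensures it does not outlive the call.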
Buffer.MapAsync(mode, offset, size) (*MapPending, error) — for callers that want to do other work while waiting.
- Same `mapAsyncStart` path as `Map`
- Returns a `*MapPending` immediately without blocking
- `MapPending.Status()` performs a non-blocking `select` on `req.done`
- `MapPending.Wait(ctx)` blocks with context support
- Caller must drive `Device.Poll` externally until `Status()` returns ready
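The non-blocking status check and the context-aware wait can be sketched on a bare channel. The `mapPending` type here is a local stand-in, and `Status` returning a plain bool is an assumption made for the example; the real return type may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// mapPending is a stand-in for the handle returned by the async path.
type mapPending struct {
	done chan struct{}
}

// Status reports completion without blocking.
// ASSUMPTION: a bool result; the real method may return a richer status.
func (p *mapPending) Status() bool {
	select {
	case <-p.done:
		return true
	default:
		return false
	}
}

// Wait blocks until the mapping completes or ctx is cancelled.
func (p *mapPending) Wait(ctx context.Context) error {
	select {
	case <-p.done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoPending exercises both methods before and after completion.
func demoPending() (before, after bool, waitErr error, timedOut bool) {
	p := &mapPending{done: make(chan struct{})}
	before = p.Status()

	// A mapping that nobody drives never completes, so Wait times out.
	stuck := &mapPending{done: make(chan struct{})}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
	defer cancel()
	timedOut = errors.Is(stuck.Wait(ctx), context.DeadlineExceeded)

	close(p.done) // stands in for the callback firing during a poll
	after = p.Status()
	waitErr = p.Wait(context.Background())
	return before, after, waitErr, timedOut
}

func main() {
	before, after, waitErr, timedOut := demoPending()
	fmt.Println(before, after, waitErr, timedOut)
}
```

The timeout case shows why the caller must keep polling the device: without it nothing closes the channel and `Wait` can only end through the context.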
After Map or MapAsync resolves, Buffer.MappedRange(offset, size) (*MappedRange, error) wraps Buffer.GetMappedRange and validates the buffer state (must be BufferMapStateMapped). The returned MappedRange.Bytes() returns a []byte backed by the GPU-mapped memory, valid until Buffer.Unmap().
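A byte-slice view over externally owned memory can be built with `unsafe.Slice`. This is a sketch of the general technique only: `mappedRange` is a local stand-in, and an ordinary Go buffer plays the role of the GPU-mapped region so the example runs anywhere.

```go
package main

import (
	"fmt"
	"unsafe"
)

// mappedRange is a stand-in for a mapped region: a base pointer and a size.
type mappedRange struct {
	ptr  unsafe.Pointer
	size int
}

// Bytes returns a slice that aliases the mapped memory. No data is copied,
// so the slice is only meaningful while the mapping is alive.
func (m *mappedRange) Bytes() []byte {
	if m.ptr == nil || m.size == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(m.ptr), m.size)
}

// demoView writes through the view and returns the underlying memory.
func demoView() []byte {
	backing := make([]byte, 8) // plays the role of GPU-mapped memory
	m := &mappedRange{ptr: unsafe.Pointer(&backing[0]), size: len(backing)}

	view := m.Bytes()
	copy(view, []byte{1, 2, 3, 4}) // lands directly in backing
	return backing
}

func main() {
	fmt.Println(demoView())
}
```

Because the slice aliases memory that Go does not own, callers should copy out anything they need before unmapping rather than retaining the slice.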
```
caller
  │ Map(ctx, ...)
  ▼
mapAsyncStart ──► wgpuBufferMapAsync (C FFI)
  │                    │
  │               C callback ──► mapCallbackHandler (Go)
  │                                   │ closes req.done
  ▼
select req.done / ctx.Done
  │ done
  ▼
MappedRange ──► Bytes() ──► []byte view into GPU memory
  │
  ▼
Unmap ──► wgpuBufferUnmap (C FFI)
```
Adapter.Limits() and Device.Limits() return a cached Limits value with no error. Limits are fetched once via wgpuAdapterGetLimits / wgpuDeviceGetLimits at RequestAdapter / RequestDevice time and stored inside the Adapter / Device struct.
This design has two benefits:
- No error handling at call site: limits are always available once you have a valid Adapter or Device
- No FFI overhead on repeated access, which is common in render loops that check `MaxUniformBufferBindingSize` or similar
The cached value is read-only and thread-safe (written once before the struct is returned to the caller).
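The fetch-once design can be sketched with a counter in place of the FFI call. All names below are local stand-ins and the limit value is a placeholder.

```go
package main

import "fmt"

// limits is a cut-down stand-in with a single field.
type limits struct {
	MaxUniformBufferBindingSize uint64
}

// adapter caches its limits at construction time.
type adapter struct {
	limits limits
}

// fetchCount tracks how often the simulated FFI query runs.
var fetchCount int

// fetchLimits stands in for the native limits query.
func fetchLimits() limits {
	fetchCount++
	return limits{MaxUniformBufferBindingSize: 65536} // placeholder value
}

// requestAdapter queries limits once, before the adapter is handed out.
func requestAdapter() *adapter {
	return &adapter{limits: fetchLimits()}
}

// Limits returns the cached copy: no error, no FFI call.
func (a *adapter) Limits() limits {
	return a.limits
}

func main() {
	a := requestAdapter()
	var total uint64
	for i := 0; i < 1000; i++ { // e.g. a render loop
		total += a.Limits().MaxUniformBufferBindingSize
	}
	fmt.Println("native queries:", fetchCount, "sum:", total)
}
```

Returning the struct by value means callers cannot mutate the cached copy, which keeps concurrent reads safe without a lock.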
Every public descriptor type has an unexported *Wire counterpart used at the FFI boundary. The pattern is always:
- Public struct: Go-idiomatic types (`string`, `bool`, `*T`, `[]T`)
- Conversion: inside the method body, construct the wire struct from the public struct
- Wire struct: C-layout types (`uintptr`, `uint32`, `uint64`, `StringView`, `Bool`)
- FFI call: pass `unsafe.Pointer` to the wire struct
The wire struct is a local variable on the stack; its address is safe to pass to wgpu-native only for the duration of the FFI call (wgpu-native copies all descriptor data synchronously).
```go
// Public descriptor: what the user writes
type BufferDescriptor struct {
	Label            string
	Usage            gputypes.BufferUsage
	Size             uint64
	MappedAtCreation bool
}

// Wire struct: matches WGPUBufferDescriptor byte-for-byte
type bufferDescriptorWire struct {
	NextInChain      uintptr
	Label            StringView           // {Data uintptr, Length uintptr}
	Usage            gputypes.BufferUsage // uint64 on wgpu-native
	Size             uint64
	MappedAtCreation Bool // uint32 (WGPUBool)
	_pad             [4]byte
}
```

The 271 ABI assertions in abi_test.go verify that every wire struct matches the C header whenever the test suite runs.
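The four steps can be put together in one runnable sketch. The types are local copies, `fakeCreate` stands in for the native call, and `createBuffer` / `demoCreate` are hypothetical names. `unsafe.StringData` requires Go 1.20 or newer.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

type stringView struct {
	Data   uintptr
	Length uintptr
}

// bufferDescriptor is the Go-friendly side (local stand-in).
type bufferDescriptor struct {
	Label            string
	Usage            uint64
	Size             uint64
	MappedAtCreation bool
}

// bufferDescriptorWire is the C-layout side (local stand-in).
type bufferDescriptorWire struct {
	NextInChain      uintptr
	Label            stringView
	Usage            uint64
	Size             uint64
	MappedAtCreation uint32
	_pad             [4]byte
}

// toStringView points at the string's bytes without copying them.
func toStringView(s string) stringView {
	if s == "" {
		return stringView{}
	}
	return stringView{
		Data:   uintptr(unsafe.Pointer(unsafe.StringData(s))),
		Length: uintptr(len(s)),
	}
}

// toBool converts a Go bool to the 32-bit C representation.
func toBool(b bool) uint32 {
	if b {
		return 1
	}
	return 0
}

// createBuffer builds the wire struct and passes its address to call.
func createBuffer(desc *bufferDescriptor, call func(wire unsafe.Pointer) uintptr) uintptr {
	wire := bufferDescriptorWire{
		Label:            toStringView(desc.Label),
		Usage:            desc.Usage,
		Size:             desc.Size,
		MappedAtCreation: toBool(desc.MappedAtCreation),
	}
	handle := call(unsafe.Pointer(&wire))
	// The label is referenced only through a uintptr, which the garbage
	// collector does not trace, so keep desc alive until the call returns.
	runtime.KeepAlive(desc)
	return handle
}

// fakeCreate stands in for the native function: it reads the wire struct
// and folds a few fields into a fake handle so the caller can check them.
func fakeCreate(p unsafe.Pointer) uintptr {
	w := (*bufferDescriptorWire)(p)
	return uintptr(w.Size) + w.Label.Length + uintptr(w.MappedAtCreation)
}

// demoCreate runs the conversion for a sample descriptor.
func demoCreate() uintptr {
	desc := &bufferDescriptor{Label: "vertices", Size: 256, MappedAtCreation: true}
	return createBuffer(desc, fakeCreate)
}

func main() {
	fmt.Println("fake handle:", demoCreate())
}
```

Storing pointers as `uintptr` keeps the struct layout identical to C, but it also hides them from the garbage collector, which is why the `runtime.KeepAlive` call matters in this sketch.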
```
born-ml/born (ML framework)
    │  uses gogpu for GPU compute
    ▼
gogpu/gogpu (graphics framework)
    │  backend selection
   ┌┴──────────────────────────┐
   ▼                           ▼
go-webgpu/webgpu           gogpu/wgpu
(FFI → wgpu-native)        (Pure Go WebGPU)
   │
   ▼
go-webgpu/goffi (Pure Go FFI)

Shared: gogpu/gputypes ← WebGPU type definitions used by all
```
| Project | Approach | Runtime requirement |
|---|---|---|
| go-webgpu/webgpu | FFI to wgpu-native (Rust) | wgpu_native.dll / .so / .dylib |
| gogpu/wgpu | Pure Go WebGPU implementation | None |
The two implementations share gputypes as their type contract. This is what makes gogpu backend switching possible: the same gputypes.TextureFormat constant works with both backends, and only the FFI translation (our convert.go) changes.