A minimal userspace library for dispatching compute work to AMDGPU hardware directly through the Linux KFD (Kernel Fusion Driver) interface. The goal of this project is to be a self-contained C++ library to explore and learn AMDGPU internals.
Important: This is a personal project done for educational purposes only. It is not affiliated with, nor a replacement for, AMD's ROCm stack. The interface is largely untested and intentionally incomplete. For a supported compute runtime, use ROCm.
libkfd talks directly to /dev/kfd and the DRM render nodes to manage GPU
resources from user space. It is written in C++23 but the core library does not
depend on any C++ runtime features. The library handles:
- Context -- Opens the KFD device, enumerates the GPU topology, and initializes library state.
- Device -- Represents a single GPU node. Owns the DRM fd, GPUVM aperture, doorbell mapping, and trap handler.
- Memory -- RAII buffer type for allocating VRAM or GTT (system) memory, mapping it to device page tables, and pinning host memory.
- Queues -- User-space ring buffers for submitting PM4 compute packets (ComputeQueue) or SDMA memory-movement packets (SDMAQueue) via MMIO doorbells.
- Loader -- Loads AMDGPU ELF objects into GPU VRAM, performs relocations, and provides symbol lookup.
- Signals -- Combines KFD events (interrupt-driven wakeup) with GPU-writable fence slots for synchronization across queues and devices.
- Trap handler -- Per-architecture trap handler binaries are embedded at build time and installed on each device for exception delivery.
- Topology -- Parses the KFD sysfs topology to enumerate nodes, memory banks, caches, and IO links.
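
As an illustration of the topology step, the kernel exposes each node under /sys/class/kfd/kfd/topology/nodes/<N>/properties as plain text, one "name value" pair per line. The sketch below is not libkfd's implementation. The names parse_properties, GfxVersion, and decode_gfx_version are hypothetical, and the input is passed as a string so the example runs without a GPU.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parse "name value" lines, as found in a KFD topology
// node's sysfs properties file, into a name -> value map.
inline std::map<std::string, std::uint64_t>
parse_properties(const std::string &text) {
  std::map<std::string, std::uint64_t> props;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string name;
    std::uint64_t value = 0;
    // Lines that do not match "name value" are skipped.
    if (fields >> name >> value)
      props[name] = value;
  }
  return props;
}

// Hypothetical holder for a decoded GFX ISA version.
struct GfxVersion {
  unsigned major_ver;
  unsigned minor_ver;
  unsigned stepping;
};

// gfx_target_version packs the ISA version in decimal as
// major * 10000 + minor * 100 + stepping.
inline GfxVersion decode_gfx_version(std::uint64_t v) {
  return {unsigned(v / 10000), unsigned((v / 100) % 100), unsigned(v % 100)};
}
```

For example, a gfx_target_version of 110000 decodes to 11.0.0 (gfx1100), and 90010 decodes to 9.0.10 (gfx90a, since the stepping is printed in hexadecimal in the target name).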

Requirements:
- Clang >= 21 -- Required for C23 #embed, C++23 features, and the AMDGPU cross-compilation targets used for trap handlers and test kernels. A nightly build or a build from LLVM trunk will work.
- CMake >= 3.28
- Ninja (recommended) or Make
- libdrm with amdgpu support (libdrm_amdgpu via pkg-config)
- xcb, xcb-dri3, xcb-present (optional, for the computetoy tool)
- Linux kernel >= 6.7 with KFD enabled (typically CONFIG_HSA_AMD=y)
- An AMDGPU -- GFX9 (Vega), GFX10 (RDNA 1/2), GFX11 (RDNA 3), or GFX12 (RDNA 4)

# presets: `debug`, `release`, `asan`, `tsan`.
cmake --preset release
cmake --build --preset release

To install:
cmake --preset release
cmake --build --preset release
cmake --install build/release --prefix /usr/local

After installing, run kfdinfo to verify that the system has usable GPU
devices:

/usr/local/bin/kfdinfo

After installing, use find_package:
find_package(libkfd REQUIRED)
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE libkfd::kfd)

Or pull the source directly with FetchContent:
include(FetchContent)
FetchContent_Declare(libkfd
  GIT_REPOSITORY https://github.com/jhuber6/libkfd.git
  GIT_TAG master
)
FetchContent_MakeAvailable(libkfd)
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE libkfd::kfd)

Both approaches require Clang >= 21 and libdrm_amdgpu to be available on the
system.

The core workflow is: open a context, get a device, create queues, allocate memory, load a kernel, dispatch, and synchronize. A signal is a monotonically decreasing counter, so the example below waits for it to reach 0.
#include <libkfd/libkfd.h>
// Open /dev/kfd and enumerate GPUs.
auto ctx = KFD_EXPECT(kfd::Context::create());
auto &dev = ctx.devices().front();
// Create a compute queue.
auto compute = KFD_EXPECT(kfd::ComputeQueue::create(dev));
// Load a GPU ELF code object and look up a kernel.
auto exe = KFD_EXPECT(kfd::Executable::load(dev, elf_bytes, compute));
auto kernel = KFD_EXPECT(exe.kernel("my_kernel.kd"));
// Allocate and map a GTT buffer for kernel arguments.
auto buf = KFD_EXPECT(kfd::Buffer::allocate(
    dev, size, kfd::MemType::GTT, kfd::MemFlags::WRITABLE));
KFD_EXPECT(buf.map(dev));
// Set up dispatch dimensions and build the kernarg buffer.
kfd::DispatchConfig cfg{.grid = {.x = num_blocks}, .block = {.x = 256}};
auto kernarg = KFD_EXPECT(kernel.alloc());
kernel.fill(kernarg, my_args, cfg);
// Dispatch and wait for completion.
auto sig = KFD_EXPECT(kfd::Signal::create(ctx));
KFD_EXPECT(compute.dispatch(kernel, cfg, kernarg, sig));
KFD_EXPECT(sig.wait(kfd::Condition::EQ, 0, UINT64_MAX));

Error handling uses std::expected<T, kfd::Error>, where kfd::Error wraps
standard Linux errno values. The KFD_EXPECT macro unwraps a value or prints
the error and exits.
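
The wait at the end of the workflow relies on the signal counting down to 0. A host-only model of that behavior might look like the following. It assumes the signal starts at the number of outstanding dispatches and that each completion decrements it by one. The ModelSignal class and its methods are hypothetical and not part of libkfd. The real Signal combines KFD events with GPU-writable fence slots, which this sketch replaces with simple polling.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Host-only model of a completion signal. All names are hypothetical.
class ModelSignal {
public:
  explicit ModelSignal(std::uint64_t initial) : value_(initial) {}

  // Called once per finished unit of work; the counter only ever decreases.
  void complete() { value_.fetch_sub(1, std::memory_order_release); }

  std::uint64_t load() const { return value_.load(std::memory_order_acquire); }

  // Poll until the counter equals `target` or the timeout expires.
  // Returns true if the target value was observed.
  bool wait_eq(std::uint64_t target, std::chrono::milliseconds timeout) const {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (load() != target) {
      if (std::chrono::steady_clock::now() >= deadline)
        return false;
      std::this_thread::yield();
    }
    return true;
  }

private:
  std::atomic<std::uint64_t> value_;
};
```

A signal created with ModelSignal sig(2) only satisfies sig.wait_eq(0, ...) after complete() has been called twice, which is why one signal can track several dispatches.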
A complete working example is in tools/sandbox/, which runs
a SAXPY kernel on the GPU.
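
For a one-dimensional kernel such as SAXPY, the dispatch dimensions have to cover every element, and the kernel has to ignore the surplus threads in the last block. The sketch below emulates such a launch on the CPU. It assumes, as the workflow above suggests, that grid counts blocks and block counts threads per block. blocks_for and saxpy_reference are hypothetical helper names, not libkfd functions.

```cpp
#include <vector>

// Number of blocks needed so that blocks * block_size >= n.
inline unsigned blocks_for(unsigned n, unsigned block_size) {
  return (n + block_size - 1) / block_size;
}

// CPU emulation of a SAXPY launch: visit every (block, thread) pair the GPU
// would run and apply the usual global-index computation and bounds check.
inline void saxpy_reference(std::vector<float> &y, const std::vector<float> &x,
                            float a, unsigned n, unsigned block_size) {
  unsigned num_blocks = blocks_for(n, block_size);
  for (unsigned block = 0; block < num_blocks; ++block) {
    for (unsigned thread = 0; thread < block_size; ++thread) {
      unsigned i = thread + block * block_size;
      if (i < n) // threads past the end of the arrays do nothing
        y[i] = a * x[i] + y[i];
    }
  }
}
```

With n = 1000 and a block size of 256, blocks_for returns 4, so 1024 threads run and the last 24 fall through the bounds check. A reference like this can also be used to check the buffer read back from the GPU.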
GPU kernels are plain C compiled as freestanding AMDGPU executables. Use
<gpuintrin.h> for portable builtins, or the raw Clang attributes directly:
// saxpy.c
#include <gpuintrin.h>
__gpu_kernel void saxpy(float *y, const float *x, float a, unsigned n) {
  unsigned i = __gpu_thread_id_x() + __gpu_block_id_x() * __gpu_num_threads_x();
  if (i < n)
    y[i] = a * x[i] + y[i];
}

Compile to an AMDGPU ELF for a specific GPU architecture. The resulting ELF can
be loaded at runtime via kfd::Executable::load like in the SAXPY example.

clang --target=amdgcn--amdhsa -mcpu=gfx1100 -nostdlibinc -O2 saxpy.c -o image

The test suite covers all of the basic operations. It uses Catch2 and can be invoked through ctest.

ctest -j8

The project includes some command-line tools to serve as examples. The
gpu-loader utility requires the LLVM libc GPU headers.

| Tool | Description |
|---|---|
| kfdinfo | Prints a detailed summary of the GPU topology - identity, compute layout, memory, caches, IO links, and firmware versions. |
| sandbox | Runs a SAXPY kernel end-to-end as a minimal libkfd demo. |
| gpu-loader | An llvm-gpu-loader equivalent that launches a main() function on the GPU as a hosted environment. |
| computetoy | Runs a compute kernel as a procedural image synthesizer and presents it to an X11 window via DRI3. Requires xcb, xcb-dri3, and xcb-present. |

include/libkfd/   Public headers
  detail/         Internal utilities (ELF, allocators, mutex, etc.)
  packets/        PM4 and SDMA packet definitions
lib/              Library implementation
  detail/         Internal utilities
  device/         Trap handler assembly
tests/
  detail/         Unit tests (no GPU needed)
  core/           Core subsystem tests (GPU needed)
  device/         Full device integration tests (GPU needed)
tools/            Command-line tools
cmake/            CMake and pkg-config install templates

Apache-2.0. See LICENSE for details.