A Diffusion Model Research Framework
Alchemy Lab is a modular research infrastructure for building, training, and deploying diffusion models for image generation.
It is designed to:
- Enable fast, configuration-driven experimentation
- Facilitate evaluation, monitoring, and analysis
- Bridge research prototypes and scalable systems
It is not intended to compete with high-level libraries such as those offered by Hugging Face, or with inference engines such as vLLM or SGLang. Instead, it is closer to a personal research platform.
Key features include:

- Modular diffusion core
- Composable UNet-style architectures
- Support for latent diffusion
- Configuration-driven experiment management
- Structured training harness with clear separation from model primitives
- Support for distributed training (DDP)
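The DDP support presumably follows the usual torchrun launch pattern; the sketch below is illustrative only (the helper names are hypothetical, not the harness's actual API), and the `nccl` backend matches the `dist` section of the example config later in this document.

```python
# Hedged sketch of typical DDP initialisation under torchrun-style
# environment variables. Falls back to single-process when launched
# without torchrun. Not the actual Alchemy Lab API.
import os


def dist_env():
    """Read the rank/world-size env vars torchrun sets (single-process defaults)."""
    return (
        int(os.environ.get("RANK", 0)),
        int(os.environ.get("WORLD_SIZE", 1)),
        int(os.environ.get("LOCAL_RANK", 0)),
    )


def setup_ddp(model):
    """Wrap a model in DistributedDataParallel when running multi-process."""
    rank, world, local = dist_env()
    if world > 1:
        import torch.distributed as dist  # assumed dependency
        from torch.nn.parallel import DistributedDataParallel as DDP

        dist.init_process_group(backend="nccl")
        model = DDP(model.cuda(local), device_ids=[local])
    return model
```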
Alchemy Lab is organised as a monorepo with three pillars:
- core - mathematical primitives and model components
- lab - experiment configuration and training infrastructure
- runtime - inference and deployment capabilities
```
src/alchemy/
|-- core/     # diffusion primitives, model components
|-- lab/      # training infrastructure
|-- runtime/  # inference
```
Alchemy Lab may be installed using uv after cloning to your local machine:

```shell
git clone https://github.com/j9smith/alchemy-lab
cd alchemy-lab
uv sync
```

Once installed, experiments can be parameterised by amending the configuration files found in lab/configs, and then executed via the entrypoint lab/cli/train.py:
```shell
cd alchemy-lab/src/alchemy/lab/cli
uv run python train.py
```

Example config file:
```yaml
defaults:
  - model: unet2d
  - vae: sd_vae_ft_mse
  - data: celeba_256
  - optim: adamw
  - loss: eps_linear
  - logging: default
  - checkpoints: default

train:
  resume: "checkpoint.pt"
  precision: fp32
  max_steps: 50000
  lr: 0.0002
  ema_decay: 0.9999
  log_dir: "./log_dir/"
  experiment_name: "default"
  save_every_n_steps: 5000
  save_path: "./weights/"
  save_prefix: "unet"

dist:
  backend: nccl
```

Images can be sampled by loading saved checkpoints via the cli/sample.py script:
```shell
uv run python sample.py --ckpt ./weights/unet_stepXXX.pt --device cuda --use_ema --n 24
```

Sampled images are stored in output/samples.png.
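As background for what sampling involves: with an epsilon-prediction model and a linear beta schedule (suggested by `loss: eps_linear` in the example config), DDPM-style ancestral sampling reduces to the update below. This is a scalar toy sketch under those assumptions, not the project's actual sampler.

```python
# Toy DDPM ancestral sampling for an eps-prediction model with a linear
# beta schedule, reduced to scalars for illustration.
import math
import random


def linear_betas(T=1000, lo=1e-4, hi=0.02):
    """Linear noise schedule from lo to hi over T steps."""
    return [lo + (hi - lo) * t / (T - 1) for t in range(T)]


def ddpm_sample_scalar(eps_model, T=1000, seed=0):
    """Run the reverse process x_T -> x_0 using predicted noise eps."""
    rng = random.Random(seed)
    betas = linear_betas(T)
    alphas = [1 - b for b in betas]
    abar, p = [], 1.0
    for a in alphas:  # cumulative product of alphas
        p *= a
        abar.append(p)
    x = rng.gauss(0, 1)  # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t)  # model's noise prediction
        # posterior mean for the eps-parameterisation
        x = (x - betas[t] / math.sqrt(1 - abar[t]) * eps) / math.sqrt(alphas[t])
        if t > 0:  # add noise except at the final step
            x += math.sqrt(betas[t]) * rng.gauss(0, 1)
    return x
```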
Training can be profiled by leveraging in-code NVTX annotations to produce NVIDIA Nsight Systems reports. To do so, run the dedicated profiling script lab/cli/profiling.py under nsys (ensure that nsys is installed):
```shell
nsys profile \
  --output ~/reports/alchemy_$(date +%Y%m%d_%H%M%S) \
  --trace cuda,nvtx,osrt,cublas,cudnn \
  --capture-range cudaProfilerApi \
  --capture-range-end stop \
  --stats true \
  --gpu-metrics-devices all \
  uv run python profiling.py
```

Alchemy Lab also includes a lightweight inference server implemented in C++ using TensorRT for accelerated GPU inference.
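A note on the nsys invocation above: `--capture-range cudaProfilerApi` only records between explicit cudaProfilerStart/Stop calls, so the profiling script presumably wraps its loop in something like the sketch below (assuming PyTorch; the helper names and warmup logic are hypothetical, not the script's actual contents).

```python
# Hedged sketch of cudaProfilerApi capture-range markers plus NVTX range
# annotations; degrades to a no-op when CUDA/PyTorch is unavailable.
import contextlib

try:
    import torch
    _cuda = torch.cuda.is_available()
except ImportError:
    _cuda = False


@contextlib.contextmanager
def nvtx_range(name):
    """Label a region in the Nsight Systems timeline; no-op off-GPU."""
    if _cuda:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _cuda:
            torch.cuda.nvtx.range_pop()


def profile_steps(train_step, n_steps=20, warmup=5):
    """Run a few steps, starting profiler capture after warmup."""
    for step in range(n_steps):
        if _cuda and step == warmup:
            torch.cuda.profiler.start()  # nsys begins recording here
        with nvtx_range(f"step_{step}"):
            train_step(step)
    if _cuda:
        torch.cuda.profiler.stop()  # nsys stops recording here
```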
The runtime requires:

- An NVIDIA GPU with TensorRT installed
- Docker with the NVIDIA Container Toolkit
- ONNX model files exported from a trained checkpoint via lab/cli/export.py
- TensorRT .plan files converted from the exported ONNX models via trtexec
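Under the hood, the export step presumably resembles torch.onnx.export with dynamic batch axes; in the sketch below, the tensor names (xt, t, eps) and shapes are inferred from the trtexec commands later in this section and are assumptions, not the actual export.py implementation.

```python
# Hypothetical ONNX export sketch: keep the batch dimension dynamic so
# trtexec can build an engine with min/opt/max shape profiles.
DYNAMIC_AXES = {"xt": {0: "batch"}, "t": {0: "batch"}, "eps": {0: "batch"}}


def export_denoiser(model, path="denoiser.onnx"):
    """Export a denoiser to ONNX with a dynamic batch axis (sketch)."""
    import torch  # assumed dependency of the training stack

    model.eval()
    xt = torch.randn(1, 4, 32, 32)        # example latent input
    t = torch.zeros(1, dtype=torch.long)  # example timestep index
    torch.onnx.export(
        model, (xt, t), path,
        input_names=["xt", "t"], output_names=["eps"],
        dynamic_axes=DYNAMIC_AXES,
        opset_version=17,
    )
```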
First, a trained checkpoint must be exported to ONNX. The following exports both the denoiser and (if specified in the config) the decoder:
cd src/alchemy/lab/cli
```shell
cd src/alchemy/lab/cli
uv run python export.py --ckpt ./weights/unet_stepXXX.pt --use_ema
```

Then convert to TensorRT plans (must be run on the target GPU inside the runtime container; adjust shapes as necessary):
```shell
trtexec --onnx=denoiser.onnx --saveEngine=denoiser.plan \
  --minShapes=xt:1x4x32x32,t:1 --optShapes=xt:4x4x32x32,t:4 --maxShapes=xt:8x4x32x32,t:8
trtexec --onnx=decoder.onnx --saveEngine=decoder.plan \
  --minShapes=latent:1x4x32x32 --optShapes=latent:4x4x32x32 --maxShapes=latent:8x4x32x32
```

Start the dev container from src/alchemy/runtime/:
```shell
docker compose up -d
docker compose exec api bash
```

Build and launch the server:
```shell
cd build
cmake --build .
./alchemy-runtime
```

To generate an image while the server is running, from another terminal:
```shell
curl -X POST http://localhost:8000/generate -o output.ppm
xdg-open output.ppm
```

Alchemy Lab is very much a work in progress. Planned extensions include:
- DiT architecture
- Mixed precision training
- Distributed training (FSDP)
- Performance enhancements
- ONNX export
- Generic deployment infrastructure
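As a closing aside on the runtime section above: the /generate endpoint returns a binary PPM (P6) image, which can be sanity-checked without an image viewer. The parser below is a minimal sketch of the standard P6 header layout; it ignores the `#` comment lines that real PPM headers may contain.

```python
# Minimal binary PPM (P6) header parser: magic number, then whitespace-
# separated width, height, and maxval, then one whitespace byte before
# the raw pixel data.
def parse_ppm_header(data: bytes):
    """Return (width, height, maxval, pixel_bytes) for a P6 PPM."""
    assert data.startswith(b"P6"), "expected binary PPM"
    fields, i = [], 2
    while len(fields) < 3:
        while data[i:i + 1].isspace():  # skip whitespace run
            i += 1
        j = i
        while not data[j:j + 1].isspace():  # collect one numeric field
            j += 1
        fields.append(int(data[i:j]))
        i = j
    i += 1  # exactly one whitespace byte separates maxval from pixels
    w, h, maxval = fields
    return w, h, maxval, data[i:]
```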

