Preliminary release (v0.4.x) — Rengu Flow is under active development. APIs, config keys, CLI commands, and documentation may change in breaking or non-breaking ways between releases. Pin versions and re-read the docs when upgrading.
A TOML-driven training framework for diffusion models. You describe a run in a config file — model, adapter, optimizer, LR scheduler, dataset, and training options — and Rengu Flow launches it with DeepSpeed. Everything is modular and registry-based: models, adapters, optimizers, and schedulers are selected and configured entirely from TOML, with an optional local web UI on top.
- Config-first — One main TOML points to a dataset TOML and sets model, adapter, optimizer, scheduler, and training options. No code changes to start a run.
- Modular & registry-based — Models, adapters, optimizers, and schedulers are pluggable; extend the framework by registering new ones (see architecture).
- VRAM-aware — Block swap (CPU offload of transformer blocks), activation checkpointing and offload, optional quantization, gradient release, and OOM-skip let large models train on a single consumer GPU.
- Self-managing dependencies — The `rengu` CLI drives uv: it creates the venv, installs Python, and pulls only the optional extras a given config needs (e.g. Cosmos deps when `[model] type = "cosmos_predict2"`).
- External control — Drop signal files in the run directory (`save`, `save_quit`, `export_model`, `preview_now`, …) to checkpoint, export, preview, or exit cleanly — no API or restart required.
- Local web UI — Optional control panel to build configs and datasets, launch and queue runs, send signals, and watch live progress and previews.
- Adapters & full finetune — LoRA, LoKr (vendored, ComfyUI/Forge-compatible saves), the full LyCORIS algorithm family, and adapter-free full-model finetuning.
- Datasets — Directory datasets with aspect-ratio buckets, multi-resolution schedules, a disk cache (v2) for latents and text embeddings, tag dropout / caption variants, and opt-in augmentation presets.
- Dataset Studio (`rengu prep`) — Auto-tagging, captioning, watermark cleanup, and a bulk tag editor for preparing training data (guide).
- Training loop — Periodic evaluation on held-out datasets, image previews during training, min-SNR / debiased loss weighting, EMA, and `torch.compile`.
- Checkpointing & export — Resume checkpoints vs. inference exports, retention limits, scheduled saves, and optional async export (guide).
- Experiment tracking — A single sink fans out to a local manifest, TensorBoard, and (opt-in) Weights & Biases.
- Optimizers & schedulers — Built-in names, fully-qualified import paths, and vendored/extended optimizers (e.g. Prodigy, Automagic, K-Optimizers) selected from TOML.
| Model | Type | Adapters | Notes |
|---|---|---|---|
| Stable Diffusion XL | `sdxl` | LoRA, LoKr, LyCORIS, full finetune | Optional UNet-only via `freeze_text_encoders`. |
| Cosmos Predict2 / Anima | `cosmos_predict2` (alias `anima`) | LoRA, LoKr, LyCORIS, full finetune | DiT + Wan VAE + Qwen3/T5. Anima checkpoints are this architecture; `type = "anima"` is accepted as a legacy alias. Needs the `cosmos` extra. |
Adapter selection. Set [adapter] type = "lora" / "lokr" / a lycoris_* type; omit [adapter] entirely for full-model finetune. The LyCORIS family (requires the lycoris extra) covers: lycoris_locon, lycoris_loha, lycoris_lokr, lycoris_dora, lycoris_dylora, lycoris_glora, lycoris_diag_oft, lycoris_boft. See SDXL training, Cosmos Predict2 / Anima, and full-model finetuning.
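The selection rule above can be sketched in a few lines of Python. This is an illustration only, not Rengu Flow's own loader: the helper name `adapter_mode` and the two sample configs are hypothetical, and only the `[adapter] type` values come from this README.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for Python 3.10 (assumed installed)

# LyCORIS adapter types listed in this README.
LYCORIS_TYPES = frozenset({
    "lycoris_locon", "lycoris_loha", "lycoris_lokr", "lycoris_dora",
    "lycoris_dylora", "lycoris_glora", "lycoris_diag_oft", "lycoris_boft",
})


def adapter_mode(config_text: str) -> str:
    """Return 'lora', 'lokr', 'lycoris', or 'full finetune' for a config.

    Hypothetical helper: it mirrors the documented rule, not the real loader.
    """
    config = tomllib.loads(config_text)
    adapter = config.get("adapter")
    if adapter is None:
        # No [adapter] table at all means full-model finetuning.
        return "full finetune"
    adapter_type = adapter.get("type")
    if adapter_type in ("lora", "lokr"):
        return adapter_type
    if adapter_type in LYCORIS_TYPES:
        return "lycoris"
    raise ValueError(f"unknown adapter type: {adapter_type!r}")


LORA_CONFIG = """
[model]
type = "sdxl"

[adapter]
type = "lora"
"""

FULL_FINETUNE_CONFIG = """
[model]
type = "sdxl"
"""

print(adapter_mode(LORA_CONFIG))           # lora
print(adapter_mode(FULL_FINETUNE_CONFIG))  # full finetune
```

Real configs carry more keys than `type`; see the per-model guides for the full set.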
You install these (system level):
- OS — Linux, or Windows via WSL2. Native Windows is not supported for training; see the WSL workflow.
- NVIDIA GPU + driver — a CUDA-capable GPU with a driver recent enough for CUDA 13.x (check the "CUDA Version" reported by `nvidia-smi`).
- CUDA Toolkit 13.x — provides `nvcc`, which DeepSpeed uses to JIT-compile its C++/CUDA ops. Its major version must match the PyTorch build (CUDA 13); without it, DeepSpeed's compiled ops fail to build. Point `CUDA_HOME` at the toolkit if it is not auto-detected.
- uv — required on `PATH` for `./rengu` and `./start-ui.sh`. uv creates `.venv` and installs a compatible Python (3.10–3.13) automatically; no separate system `python3` needed.
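Before running `./rengu init`, it can help to confirm that the system-level pieces are visible. The sketch below is a hypothetical pre-flight check that uses only the Python standard library; it is not part of Rengu Flow, and it only looks for the executables and the `CUDA_HOME` variable named above, without checking any versions.

```python
import os
import shutil


def check_prerequisites(env=None):
    """Report which system-level prerequisites are visible.

    Hypothetical helper: returns a dict mapping each item to True (found)
    or False (missing). It does not verify driver or toolkit versions.
    """
    env = os.environ if env is None else env
    search_path = env.get("PATH", "")
    report = {
        tool: shutil.which(tool, path=search_path) is not None
        for tool in ("uv", "nvidia-smi", "nvcc")
    }
    # CUDA_HOME is only needed when the toolkit is not auto-detected.
    report["CUDA_HOME"] = bool(env.get("CUDA_HOME"))
    return report


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:<12}{'found' if found else 'MISSING'}")
```

A missing `nvcc` on `PATH` is not necessarily a problem if `CUDA_HOME` points at a toolkit install, so treat the report as a hint rather than a verdict.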
Installed automatically (by rengu init / uv sync):
- PyTorch 2.12 + CUDA 13.0 (`torch==2.12.0+cu130`), torchvision 0.27, and DeepSpeed 0.19, from the PyTorch `cu130` index.
- The CUDA 13 runtime stack — cuDNN, cuBLAS, NCCL, cuFFT, cuRAND, … — ships inside those PyTorch wheels (as `nvidia-*-cu13` packages). You do not install cuDNN or the runtime libraries separately.
- All remaining Python deps (diffusers, PEFT, safetensors, …). Exact versions are pinned in pyproject.toml and uv.lock.
Tested stack (May 2026, WSL2 + NVIDIA): Python 3.13, torch 2.12.0+cu130, torchvision 0.27.0+cu130, deepspeed 0.19.0. Verified end-to-end on an 8 GB RTX 3000 Ada with smoke tests for SDXL LoRA, Cosmos LoRA, and SDXL full finetune.
From the repository root (Linux):
./rengu init # create rengu.local.toml + uv sync (base training stack)
./rengu init ui # also install the web UI extra
The ./rengu wrapper runs uv sync on first use, so the venv is built automatically. Install optional extras by listing profiles:
./rengu init cosmos lycoris # Cosmos Predict2 + LyCORIS adapters
./rengu init all # every documented extra
| Profile | Installs |
|---|---|
| `base` | Core training stack (default) |
| `ui` | Local web control panel |
| `cosmos` / `cosmos_predict2` | Cosmos Predict2 / Anima |
| `lycoris` | LyCORIS adapter family (incl. LoKr backend) |
| `optim` | Extended optimizers |
| `kaon` | K-Optimizers (git-pinned: Adakaon, AdaMuon, KProdigy, …) |
| `prep` | Dataset Studio (taggers, captioners, watermark cleanup) |
| `dev` | Test/dev tools |
| `all` | All of the above |
./rengu init --only-config writes rengu.local.toml and directories without syncing. Advanced users can run uv sync themselves and call .venv/bin/rengu directly. Before any train/validate/cache, Rengu Flow inspects the config and auto-installs any missing extras it needs.
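To make the auto-install step concrete, here is a rough sketch of how a config could be mapped to the profiles it needs. It is an assumption-laden illustration, not Rengu Flow's actual resolver: the function `required_extras` is hypothetical, and it covers only the two rules this README states (Cosmos models need `cosmos`, `lycoris_*` adapters need `lycoris`).

```python
def required_extras(config: dict) -> set:
    """Return the optional profiles a parsed config appears to need.

    Hypothetical helper covering only the rules stated in this README;
    the real resolver may consider optimizers and other settings too.
    """
    extras = set()
    model_type = config.get("model", {}).get("type")
    if model_type in ("cosmos_predict2", "anima"):
        extras.add("cosmos")
    adapter_type = config.get("adapter", {}).get("type", "")
    if adapter_type.startswith("lycoris_"):
        extras.add("lycoris")
    return extras


config = {
    "model": {"type": "cosmos_predict2"},
    "adapter": {"type": "lycoris_loha"},
}
extras = required_extras(config)
# Equivalent manual install, using the documented profile syntax.
print(" ".join(["./rengu", "init", *sorted(extras)]))  # ./rengu init cosmos lycoris
```

Installing the same profiles up front with `./rengu init` avoids the sync happening at the start of the first run.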
./rengu update # fast-forward pull, re-sync from uv.lock, rebuild UI if present
`rengu update` pulls the latest project code, re-syncs dependencies from the lockfile, and recompiles the web UI if it was built locally. It also refreshes the optional profiles you already installed (so git-pinned extras like `kaon` move to their new commit pin); profiles you never installed are left alone. Useful flags: `--all-extras` (every documented extra), `--no-pull` (skip the git pull), and `--force` (discard local tracked code changes and hard-reset when a fast-forward is blocked — never touches untracked/ignored files, so your UI data dir and jobs.db are safe). Check your version with `./rengu version`.
- Set up the environment:
  ./rengu init
- Local settings (optional). `rengu init` creates `rengu.local.toml` (gitignored) for machine settings — UI host/port, default GPU count, master port, and subprocess env vars. Model checkpoint paths go in the training TOML, not here. See `rengu.local.toml.example`.
- Create a training config from an example:
  cp examples/minimal_config_lora_sdxl.toml my_train.toml
  Edit `my_train.toml`: set the `dataset` path and `[model]` paths (e.g. `checkpoint_path` for SDXL).
- Train:
  ./rengu train --config my_train.toml
  - Validate without training:
    ./rengu validate --config my_train.toml
  - Build the dataset cache only:
    ./rengu cache --config my_train.toml
  - Resume from the latest checkpoint:
    ./rengu train --config my_train.toml --resume-from-checkpoint
  - Run DeepSpeed directly:
    deepspeed --num_gpus=1 -m rengu_flow.main --config my_train.toml
See the CLI guide for every command, flag, and rengu.local.toml key.
| Command | Description |
|---|---|
| `rengu init [profiles…]` | Create rengu.local.toml + UI data dir, uv sync the chosen profiles |
| `rengu update [profiles…]` | Pull, re-sync from uv.lock, refresh installed extras, rebuild UI |
| `rengu version` | Print Rengu Flow version, git commit, and installed kaon version |
| `rengu train --config PATH` | Launch a DeepSpeed training run (`--num-gpus`, `--master-port`, `--resume-from-checkpoint`) |
| `rengu validate --config PATH` | Validate a training config and exit |
| `rengu cache --config PATH` | Build the dataset cache and exit |
| `rengu dump-dataset PATH` | Inspect a dataset TOML |
| `rengu prep <tag\|caption\|clean\|models>` | Dataset Studio: tagging, captioning, watermark cleanup, model list/download |
| `rengu ui [start\|serve\|dev\|build\|reset-db]` | Run or build the local web UI |
Trailing args after -- are forwarded to the trainer (e.g. ./rengu train --config my.toml -- --regenerate_cache). Full flag reference: CLI guide.
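When launching runs from a script, the documented flags and the `--` forwarding rule can be assembled as an argument list. The helper below is a hypothetical sketch: `build_train_command` is not part of Rengu Flow, and it assumes `--num-gpus` and `--master-port` each take a single value as a separate argument, which the CLI guide should confirm.

```python
import shlex


def build_train_command(config, num_gpus=None, master_port=None,
                        resume=False, trainer_args=()):
    """Build an argv list for `./rengu train` (hypothetical helper)."""
    cmd = ["./rengu", "train", "--config", str(config)]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]  # assumed to take one value
    if master_port is not None:
        cmd += ["--master-port", str(master_port)]  # assumed to take one value
    if resume:
        cmd.append("--resume-from-checkpoint")
    if trainer_args:
        # Everything after "--" is forwarded to the trainer unchanged.
        cmd += ["--", *trainer_args]
    return cmd


cmd = build_train_command("my.toml", num_gpus=1,
                          trainer_args=["--regenerate_cache"])
print(shlex.join(cmd))
# ./rengu train --config my.toml --num-gpus 1 -- --regenerate_cache
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues with paths that contain spaces.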
./rengu ui start
Builds the frontend if needed, serves the API, and opens a browser. The UI lets you edit training configs and datasets, launch and queue runs, send signal files, and watch live progress and previews. Live progress is parsed from the job's stdout — no extra config required. See the Web UI user guide.
Training reacts to signal files dropped in the run directory (also exposed as buttons in the UI):
| Signal | Effect |
|---|---|
| `save` / `save_quit` | Checkpoint now (and exit). |
| `export_model` / `export_model_quit` | Export inference weights now (and exit). |
| `preview_now` | Render the configured preview prompts on the next step. |
| `continue` / `quit` | Resume after an export-wait pause / exit without saving. |
Details and the full list: signal files.
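Dropping a signal from a script might look like the sketch below. It assumes a signal is an empty file named exactly after the signal, with no extension, placed directly in the run directory; the signal files guide is the authority on the real format. The helper `send_signal` is hypothetical, and the demo writes into a temporary directory rather than a real run.

```python
import tempfile
from pathlib import Path

# Signal names listed in the table above; the full list is in the guide.
KNOWN_SIGNALS = frozenset({
    "save", "save_quit", "export_model", "export_model_quit",
    "preview_now", "continue", "quit",
})


def send_signal(run_dir, name):
    """Create an empty signal file in run_dir (hypothetical helper).

    Assumes the file name alone carries the signal and content is ignored.
    """
    if name not in KNOWN_SIGNALS:
        raise ValueError(f"unknown signal: {name!r}")
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        raise FileNotFoundError(f"run directory not found: {run_dir}")
    signal_path = run_dir / name
    signal_path.touch()
    return signal_path


# Demo against a throwaway directory standing in for a run directory.
with tempfile.TemporaryDirectory() as tmp:
    created = send_signal(tmp, "save")
    print(created.name)  # save
```

Validating the name up front guards against typos such as `save-quit`, which would otherwise create a file the trainer never looks for.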
- User guide — Training each model, dataset config and prep, optimizers, previews, checkpoints, signal files, the CLI, and the web UI. Start at the docs index.
- Developer guide — Architecture, the model pipeline contract, adding optimizers/schedulers, networks/adapters, VRAM optimization, and testing.
- Implementation backlog — Planned / deferred work.
Rengu Flow is a preliminary release under active development; treat config keys and CLI flags as subject to change between versions. A few specific notes:
- Linux / WSL2 only. Native Windows is not supported. On WSL2, do not set `PYTORCH_CUDA_ALLOC_CONF = "expandable_segments:True"` — Rengu Flow detects WSL and applies safe defaults automatically (see the CLI guide).
- Built-in models are SDXL and Cosmos Predict2 / Anima. Other architectures (e.g. Flux) are not yet registered — see backlog.
- Cosmos `load_and_fuse_adapter` is intentionally unsupported; load adapter weights instead.
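For the WSL2 note above, a quick self-check of your own environment could look like this. How Rengu Flow itself detects WSL is not described here, so the sketch swaps in a common heuristic (WSL kernels report "microsoft" in their release string); both function names are hypothetical.

```python
import os
import platform


def looks_like_wsl(kernel_release: str) -> bool:
    """Heuristic: WSL kernel release strings contain 'microsoft'."""
    return "microsoft" in kernel_release.lower()


def risky_allocator_setting(env, kernel_release: str) -> bool:
    """True if expandable segments are enabled while running under WSL."""
    alloc_conf = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    return looks_like_wsl(kernel_release) and "expandable_segments:True" in alloc_conf


if risky_allocator_setting(os.environ, platform.uname().release):
    print("warning: unset PYTORCH_CUDA_ALLOC_CONF expandable_segments on WSL2")
```

The check only reads the environment; it changes nothing and is safe to run on any Linux host.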
Rengu Flow incorporates and adapts work from several projects. See THIRD_PARTY_NOTICES.md for full notices and licenses:
- diffusion-pipe (GPL-3.0) — design and portions of the training flow; vendored optimizers under `rengu_flow/vendor/diffusion_pipe_optimizers/`.
- NVIDIA Cosmos Predict2 (Apache-2.0) — DiT and LLM-adapter modeling code.
- Alibaba Wan VAE — used by the Cosmos pipeline.
- AI Toolkit / Ostris (MIT) — the Automagic optimizer.
Rengu Flow is distributed under the GNU General Public License v3.0 or later (LICENSE). See THIRD_PARTY_NOTICES.md for incorporated components and their licenses.