Add optimized/tflite: portable CPU (LiteRT / TFLite) release by Cortexelus · Pull Request #61 · Stability-AI/stable-audio-3

Cortexelus · 2026-07-02T04:59:13Z

Portable CPU (LiteRT / TFLite) release of the SA3 pipeline, parallel to optimized/mlx and optimized/tensorRT. Runs anywhere ai_edge_litert runs (macOS / Linux, x86 / ARM, via the XNNPACK delegate).

What's here

scripts/sa3_tflite.py — CLI matching the sa3_mlx / sa3_trt flag names and modes: text-to-audio, audio-to-audio (--init-audio), inpainting (--inpaint-range), CFG (--cfg + --negative-prompt + --apg), plus --threads and --cfg-batched.
scripts/weights.py — lazy HuggingFace auto-download (same ensure_local + symlink-from-cache pattern as MLX), pulling the fp32 .tflite models from stabilityai/stable-audio-3-optimized/tflite/*.
models/defs/tflite_pipeline.py — tokenizer + T5Gemma wrapper + pingpong sampler.
models/tokenizer.model — bundled SentencePiece (4 MB); the .tflite T5Gemma is encoder-only, so the tokenizer ships in-repo (cf. the TensorRT release bundling tokenizer.json).
install.sh / bootstrap.sh / sa3 wrapper / install.py — mirror the MLX release (uv-based; portable, no Apple-Silicon gate).
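As a rough illustration of the "ensure_local + symlink-from-cache" pattern mentioned for scripts/weights.py, here is a minimal sketch. It is not the code in this PR: the function bodies, the `fetch` parameter, and any model filename passed in are assumptions made for illustration. Only the repo id and the `tflite` subfolder come from the description above. `hf_hub_download` is the real `huggingface_hub` call and returns the path of the cached file.

```python
import os
from pathlib import Path
from typing import Callable

REPO_ID = "stabilityai/stable-audio-3-optimized"  # from the PR description
SUBFOLDER = "tflite"                               # from the PR description


def hf_fetch(filename: str) -> str:
    """Download one file into the HuggingFace cache and return its cached path."""
    # Imported lazily so the dependency is only needed on a cache miss.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=filename, subfolder=SUBFOLDER)


def ensure_local(filename: str, models_dir: Path,
                 fetch: Callable[[str], str] = hf_fetch) -> Path:
    """Return models_dir/filename, fetching and symlinking it on first use.

    Hypothetical helper: the name follows the PR text, the body is an assumption.
    """
    target = Path(models_dir) / filename
    if target.exists():              # real file or live symlink: nothing to do
        return target
    cached = Path(fetch(filename)).resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink():          # dangling link left behind by a cleared cache
        target.unlink()
    os.symlink(cached, target)       # link into the cache instead of copying
    return target
```

Symlinking rather than copying keeps a single copy of each model on disk, and the download only happens the first time a given file is requested.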

Notes

Baked-I/O variable-length graphs (conditioner + patch/unpatch in-graph). The canonical DiT is variable-batch, so CFG runs cond+uncond as one batch=2 invoke by default (--cfg-batched; ~7-29% faster on Apple-Silicon AMX) or as two sequential batch=1 invokes (--no-cfg-batched, like TensorRT). Both paths produce bit-identical output.
Monotonic-rebuild audio-to-audio schedule, self-contained here. The equivalent change to the MLX/TRT samplers is a separate PR (sa3-a2a-monotonic-schedule).
SAME-L chunked decode (chunk=64, overlap=8); SAME-S decodes whole.
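The batched versus sequential CFG note can be sketched with NumPy. This is an illustration under stated assumptions, not the pipeline's code: `toy_dit` is a hypothetical stand-in for one DiT invoke (built from elementwise arithmetic so each batch row is computed independently), and the guidance formula is plain classifier-free guidance. The `--apg` variant is not shown.

```python
import numpy as np


def toy_dit(x: np.ndarray, cond: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a DiT invoke; rows of the batch do not interact."""
    return x * cond + x  # x: (B, C, T), cond: (B, C, 1)


def cfg_batched(x, cond, uncond, scale):
    """One batch=2 invoke: row 0 is conditional, row 1 is unconditional."""
    xb = np.concatenate([x, x], axis=0)
    cb = np.concatenate([cond, uncond], axis=0)
    out = toy_dit(xb, cb)
    v_cond, v_uncond = out[:1], out[1:]
    return v_uncond + scale * (v_cond - v_uncond)


def cfg_sequential(x, cond, uncond, scale):
    """Two batch=1 invokes, combined with the same guidance formula."""
    v_cond = toy_dit(x, cond)
    v_uncond = toy_dit(x, uncond)
    return v_uncond + scale * (v_cond - v_uncond)


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 16)).astype(np.float32)
cond = rng.standard_normal((1, 4, 1)).astype(np.float32)
uncond = np.zeros((1, 4, 1), dtype=np.float32)

batched = cfg_batched(x, cond, uncond, scale=np.float32(3.0))
sequential = cfg_sequential(x, cond, uncond, scale=np.float32(3.0))
identical = np.array_equal(batched, sequential)
```

For this toy model the two paths match exactly because every output element depends only on its own batch row. Whether a real graph stays bit-identical across batch sizes depends on its kernels, which is what the PR reports having checked.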

All fp32 except T5Gemma (fp16). Models are already live on HF. Tested end-to-end (HF download to 20 s generation, healthy audio; audio-to-audio verified).
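The chunked decode noted for SAME-L (chunk=64, overlap=8) could look roughly like the sketch below. The PR does not say how overlapping regions are merged or what unit the sizes are in. This sketch assumes latent frames and a linear crossfade with weight normalization. `toy_decoder`, `hop`, and `chunked_decode` are hypothetical names used only for illustration.

```python
import numpy as np


def toy_decoder(latents: np.ndarray, hop: int = 4) -> np.ndarray:
    """Hypothetical decoder: (C, T) latents -> (T * hop,) samples, pointwise in time."""
    return np.repeat(latents.mean(axis=0), hop)


def chunked_decode(latents, decode, hop, chunk=64, overlap=8):
    """Decode overlapping windows and crossfade them (assumed merge strategy)."""
    n = latents.shape[-1]
    if n <= chunk:                       # short input: decode whole
        return decode(latents)
    stride = chunk - overlap
    out = np.zeros(n * hop)
    weight = np.zeros(n * hop)
    # Strictly positive ramp so the summed weight is never zero.
    ramp = np.linspace(0.0, 1.0, overlap * hop + 2)[1:-1]
    start = 0
    while True:
        end = min(start + chunk, n)
        audio = decode(latents[:, start:end])
        w = np.ones(audio.shape[-1])
        if start > 0:                    # fade in over the shared head
            w[: overlap * hop] = ramp
        if end < n:                      # fade out over the shared tail
            w[-overlap * hop:] = ramp[::-1]
        out[start * hop:end * hop] += w * audio
        weight[start * hop:end * hop] += w
        if end == n:
            break
        start += stride
    return out / weight


latents = np.random.default_rng(0).standard_normal((4, 150))
whole = toy_decoder(latents)
chunked = chunked_decode(latents, toy_decoder, hop=4)
```

With a decoder that is pointwise in time, as here, the chunked result matches the whole decode up to rounding. A real decoder with a receptive field would differ near chunk edges, which is why the overlap is there.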

Mirrors optimized/mlx and optimized/tensorRT: a LiteRT/TFLite CPU release of the SA3 pipeline. Auto-downloads the fp32 .tflite models from HuggingFace (stabilityai/stable-audio-3-optimized/tflite/*) via scripts/weights.py, exactly like the MLX release.

CLI (scripts/sa3_tflite.py) matches the sa3_mlx / sa3_trt flag names and modes: text-to-audio, audio-to-audio (--init-audio), inpainting (--inpaint-range), and CFG (--cfg + --negative-prompt + --apg), plus --threads.

- Baked-I/O variable-length graphs (conditioner + patch/unpatch in-graph). The DiT is variable-batch, so CFG runs cond+uncond as one batch=2 invoke by default (--cfg-batched; ~7-29% faster on Apple-Silicon AMX) or sequential batch=1 (--no-cfg-batched, like TensorRT); bit-identical either way.
- Monotonic-rebuild pingpong schedule (matches the MLX/TRT audio-to-audio fix).
- SentencePiece tokenizer bundled (models/tokenizer.model, 4 MB) since the .tflite T5Gemma is encoder-only.
- SAME-L chunked decode (chunk=64, overlap=8); SAME-S decodes whole.

install.sh / bootstrap.sh / the ./sa3 wrapper mirror the MLX release.

Cortexelus force-pushed the optimized-tflite-release branch from e0d4614 to 06e6418 · July 2, 2026 05:09

Cortexelus merged commit ea9ba36 into main Jul 2, 2026
1 check passed

Cortexelus merged 1 commit into main from optimized-tflite-release
