feat: add KittenTTS Nano CoreML conversion (15M param distilled Kokoro/StyleTTS2) #33
Alex-Wengg wants to merge 1 commit into `main`
…o/StyleTTS2): ONNX INT8 → dequantized FP32 PyTorch → CoreML mlprogram conversion pipeline. Reconstructs the full model architecture from the ONNX graph and fixes 9 bugs, including LSTM gate reordering, Snake activations, dilations, and fp32 phase-accumulation precision drift in the CoreML runtime. The converted model achieves 0.963 correlation with the PyTorch reference.
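Two of the fixes listed above can be sketched in NumPy. The sketch assumes the standard ONNX `LSTM` initializer layout (`W`, `R`, `B`, with gates stacked in i, o, f, c order) and PyTorch's `nn.LSTM` parameter names (gates stacked in i, f, g, o order). The Snake form is the x + (1/α)·sin²(αx) variant used in StyleTTS2-style iSTFTNet decoders; whether KittenTTS uses exactly this variant is an assumption. The function names are hypothetical and not taken from `convert_kittentts.py`.

```python
import numpy as np


def reorder_gates(w: np.ndarray) -> np.ndarray:
    """Reorder 4 stacked gate blocks (axis 0) from ONNX i,o,f,c to PyTorch i,f,g,o."""
    i, o, f, c = np.split(w, 4, axis=0)
    return np.concatenate([i, f, c, o], axis=0)  # PyTorch calls the cell gate "g"


def onnx_lstm_to_torch(W, R, B, direction: int = 0, layer: int = 0) -> dict:
    """Map one direction of ONNX LSTM initializers to nn.LSTM-style parameter arrays.

    W: [num_directions, 4*H, input_size]
    R: [num_directions, 4*H, H]
    B: [num_directions, 8*H]  (input biases followed by recurrent biases)
    """
    suffix = f"_l{layer}" + ("_reverse" if direction == 1 else "")
    b_ih, b_hh = np.split(B[direction], 2)
    return {
        f"weight_ih{suffix}": reorder_gates(W[direction]),
        f"weight_hh{suffix}": reorder_gates(R[direction]),
        f"bias_ih{suffix}": reorder_gates(b_ih),
        f"bias_hh{suffix}": reorder_gates(b_hh),
    }


def snake(x: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Snake activation x + (1/alpha) * sin^2(alpha * x); alpha broadcasts per channel."""
    return x + (1.0 / alpha) * np.sin(alpha * x) ** 2
```

Because the reordered and un-reordered matrices have identical shapes, a missing reorder still loads into `nn.LSTM` (via `torch.from_numpy` and `load_state_dict`) without any error, which is why this class of bug only shows up as degraded output.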
```
coreml/
├── convert_kittentts.py          # Conversion script (model architecture + weight loading + CoreML export)
├── README.md                     # This file
├── kitten_tts_nano_weights.npz   # Extracted dequantized weights (numpy)
└── kitten_tts_nano_weights.pt    # Extracted weights (PyTorch state dict)
```
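The listing includes weights dequantized from the INT8 ONNX model. A minimal sketch of that step, assuming the quantized weights feed ONNX `DequantizeLinear` nodes (quantized tensor, scale, optional zero point) and applying that operator's formula y = (x - zero_point) * scale. The function name and argument handling are illustrative, not the script's actual code.

```python
import numpy as np


def dequantize_linear(x, scale, zero_point=None, axis: int = 1) -> np.ndarray:
    """y = (x - zero_point) * scale, as defined by ONNX DequantizeLinear.

    A scalar scale means per-tensor quantization; a 1-D scale means per-axis
    quantization along `axis` (ONNX's default axis is 1).
    """
    scale = np.asarray(scale, dtype=np.float32)
    if zero_point is None:
        zp = np.zeros(scale.shape, dtype=np.int32)
    else:
        # Widen before subtracting: int8/uint8 arithmetic would wrap around.
        zp = np.asarray(zero_point).astype(np.int32)
    if scale.ndim == 1:
        # Reshape per-axis parameters so they broadcast along `axis` only.
        shape = [1] * np.ndim(x)
        shape[axis] = -1
        scale, zp = scale.reshape(shape), zp.reshape(shape)
    return (np.asarray(x).astype(np.int32) - zp).astype(np.float32) * scale
```

Each dequantized array can then be stored under its PyTorch parameter name, for example with `np.savez(path, **arrays)`.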
🔴 Missing pyproject.toml and uv.lock required by AGENTS.md for target directories
Both AGENTS.md files explicitly require that each target directory bundles its own pyproject.toml and uv.lock: "Each target directory is self-contained: pyproject.toml, uv.lock, conversion scripts, docs, and sample assets." Every other coreml target directory in the repo has these files (e.g. models/tts/kokoro/coreml/, models/tts/magpie/coreml/, models/vad/silero-vad/coreml/, etc.). The models/tts/kittentts/coreml/ directory only contains README.md and convert_kittentts.py, missing both pyproject.toml and uv.lock. This means uv sync cannot be run from this target directory, breaking the standard development workflow described in AGENTS.md.
Prompt for agents
Add a pyproject.toml and uv.lock to models/tts/kittentts/coreml/. Follow the pattern from models/tts/kokoro/coreml/pyproject.toml. The pyproject.toml should declare the project dependencies (torch, coremltools, onnx, onnxruntime, numpy, scipy, phonemizer, huggingface_hub) with requires-python = ">=3.10". Then run uv lock to generate the uv.lock file. Also update the README.md Files section (lines 186-191) to include pyproject.toml and uv.lock in the directory listing, and change the Quick Start prerequisites section (lines 13-17) to use uv sync instead of raw pip install.
`@@ -0,0 +1,1297 @@`
`#!/usr/bin/env python3`
🟡 Conversion script filename uses underscores instead of kebab-case
AGENTS.md specifies "Lowercase-kebab-case for files/dirs". The file is named convert_kittentts.py (underscores) instead of convert-kittentts.py (kebab-case). The established pattern for conversion scripts in the repo consistently uses kebab-case: convert-coreml.py, compare-models.py, convert-gguf.py in models/vad/silero-vad/coreml/.
Prompt for agents
Rename models/tts/kittentts/coreml/convert_kittentts.py to models/tts/kittentts/coreml/convert-kittentts.py to follow the kebab-case naming convention specified in AGENTS.md. Also update the references in models/tts/kittentts/README.md (line 24), models/tts/kittentts/coreml/README.md (lines 28, 31, 34, 51, 187), and the coreml/README.md Files section accordingly.
Summary
Key CoreML fix: phase accumulation precision
`torch.cumsum` over 42k steps causes fp32 drift between the CoreML and PyTorch runtimes: higher harmonics (the 9th, at 1800 Hz) lose correlation (0.79). Fixed with a chunked cumsum: reshape the phase increments into 300-step frames, cumsum within each frame, and carry the wrapped phase from one frame to the next. Source-module correlation: 0.954 → 0.9999.

Verification
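The phase drift and the chunked fix can be reproduced numerically. This NumPy sketch compares one long float32 cumsum against the framed version (300-step frames, wrapped carry), using a float64 cumsum as ground truth. The 24 kHz sample rate, the 200 Hz fundamental (chosen so the 9th harmonic sits at 1800 Hz), and the function names are assumptions for illustration; the PR applies the fix to `torch.cumsum` in the PyTorch model, and NumPy is used here only so the sketch runs standalone.

```python
import numpy as np

FRAME = 300  # frame length from the PR description


def naive_phase(inc: np.ndarray) -> np.ndarray:
    """Single float32 cumsum over all samples, wrapped to [0, 1) cycles."""
    return np.mod(np.cumsum(inc, dtype=np.float32), 1.0)


def chunked_phase(inc: np.ndarray, frame: int = FRAME) -> np.ndarray:
    """Cumsum within fixed-size frames, plus a wrapped carry from earlier frames."""
    n = inc.shape[-1]
    if n % frame:
        raise ValueError("pad the increments to a whole number of frames")
    x = inc.astype(np.float32).reshape(-1, frame)        # (n_frames, frame)
    local = np.cumsum(x, axis=1, dtype=np.float32)        # phase inside each frame
    advance = np.mod(local[:, -1], 1.0)                   # wrapped phase gained per frame
    # Exclusive running sum of the wrapped advances = phase at each frame start.
    carry = np.concatenate([np.zeros(1, np.float32), np.cumsum(advance[:-1], dtype=np.float32)])
    carry = np.mod(carry, 1.0)
    return np.mod(local + carry[:, None], 1.0).reshape(n)


def max_circular_error(phase: np.ndarray, ref: np.ndarray) -> float:
    """Largest phase error in cycles, treating 0.999 and 0.001 as close."""
    d = np.mod(phase.astype(np.float64) - ref + 0.5, 1.0) - 0.5
    return float(np.max(np.abs(d)))


if __name__ == "__main__":
    sr, f0, harmonic, n = 24_000, 200.0, 9, 42_000           # assumed values
    inc = np.full(n, harmonic * f0 / sr, dtype=np.float32)   # cycles per sample
    ref = np.mod(np.cumsum(inc.astype(np.float64)), 1.0)     # float64 reference
    print("naive  :", max_circular_error(naive_phase(inc), ref))
    print("chunked:", max_circular_error(chunked_phase(inc), ref))
```

Within a frame the running sum never exceeds 300 increments, and the carry is wrapped to [0, 1), so float32 keeps far more fractional precision than a single running sum that grows into the thousands of cycles.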
Test plan
- `python convert_kittentts.py --seconds 5 --output kittentts_5s.mlpackage` to verify conversion
- `python convert_kittentts.py --verify-only` to verify weight loading (561/573 params)

🤖 Generated with Claude Code