feat: add KittenTTS Nano CoreML conversion (15M param distilled Kokoro/StyleTTS2)#33

Open
Alex-Wengg wants to merge 1 commit into main from feat/kittentts-coreml

Alex-Wengg (Member) commented Mar 21, 2026

Summary

  • Adds a complete ONNX INT8 → CoreML FP32 conversion pipeline for KittenTTS Nano (15M-param distilled Kokoro/StyleTTS2 model, 24 kHz)
  • Reconstructs the full PyTorch model architecture from ONNX graph analysis and loads the dequantized weights (561/573 params; the other 12 use defaults)
  • Fixes 9 bugs found during conversion: LSTM gate reordering, BERT weight mapping, Snake activations, resblock dilations, reflection padding, conv_post padding, BatchNorm→LayerNorm, NoiseResBlock dilations, and FP32 phase-accumulation precision drift in the CoreML runtime
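
Two of the weight-loading steps can be sketched in NumPy, assuming the INT8 weights feed standard ONNX `DequantizeLinear` nodes and the LSTMs are standard ONNX `LSTM` nodes. ONNX dequantizes as `y = (x - zero_point) * scale` and packs LSTM gates in `i, o, f, c` order, while PyTorch's `nn.LSTM` expects `i, f, g, o` (`g` is the cell gate). ONNX also concatenates input and recurrent biases into a single `B` tensor. The helper names below are hypothetical, cover one layer only, and are not the PR's actual mapping code.

```python
import numpy as np

# ONNX LSTM packs gate blocks as [i, o, f, c]; PyTorch nn.LSTM expects [i, f, g, o]
# (g is the cell gate, called c in ONNX). Entry k is the ONNX block for PyTorch slot k.
ONNX_TO_TORCH_GATES = [0, 2, 3, 1]


def dequantize_linear(q, scale, zero_point, axis=None):
    """NumPy mirror of ONNX DequantizeLinear: y = (q - zero_point) * scale.

    A 1-D scale/zero_point is treated as per-axis along `axis`; scalars are per-tensor.
    """
    q = np.asarray(q).astype(np.float32)
    scale = np.asarray(scale, dtype=np.float32)
    zp = np.asarray(zero_point, dtype=np.float32)
    if axis is not None and scale.ndim == 1:
        shape = [1] * q.ndim
        shape[axis] = -1  # broadcast per-channel params along `axis`
        scale = scale.reshape(shape)
        zp = zp.reshape(shape)
    return (q - zp) * scale


def reorder_gates(w, hidden_size):
    """Reorder the leading 4*hidden_size axis from ONNX [i,o,f,c] to PyTorch [i,f,g,o]."""
    blocks = w.reshape(4, hidden_size, *w.shape[1:])
    return blocks[ONNX_TO_TORCH_GATES].reshape(w.shape)


def onnx_lstm_to_torch(W, R, B, hidden_size):
    """Map one ONNX LSTM node's W/R/B (num_directions first) to nn.LSTM layer-0 keys."""
    state = {}
    for d in range(W.shape[0]):
        suffix = "_l0" + ("_reverse" if d == 1 else "")
        state["weight_ih" + suffix] = reorder_gates(W[d], hidden_size)
        state["weight_hh" + suffix] = reorder_gates(R[d], hidden_size)
        # ONNX B = [Wb (input biases), Rb (recurrent biases)], each 4*hidden_size long.
        state["bias_ih" + suffix] = reorder_gates(B[d, :4 * hidden_size], hidden_size)
        state["bias_hh" + suffix] = reorder_gates(B[d, 4 * hidden_size:], hidden_size)
    return state
```

The resulting arrays could then be wrapped with `torch.from_numpy` and passed to the `load_state_dict` method of an `nn.LSTM` whose `bidirectional` and `batch_first` settings match the graph.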

Key CoreML fix: phase accumulation precision

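Running torch.cumsum over 42k steps accumulates fp32 rounding error differently in the CoreML and PyTorch runtimes, so the two outputs drift apart. Higher harmonics suffer most: the 9th harmonic (1800 Hz) drops to 0.79 correlation. Fixed with a chunked cumsum: reshape the per-step phase increments into 300-step frames, cumsum within each frame, and carry the wrapped phase from one frame to the next. Source-module correlation: 0.954 → 0.9999.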
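
The chunked cumsum described above can be sketched in NumPy with float32 arithmetic to mimic the fp32 runtime. This assumes the phase is tracked in cycles (wrapped to [0, 1)) from a per-sample frequency track. The function name `chunked_phase` and the vectorized exclusive prefix sum used for the inter-frame carry are illustrative choices, not the PR's PyTorch code. Each within-frame cumsum only sees values up to about `frame * f0 / sample_rate` cycles. The carry sums one wrapped value (< 1 cycle) per frame, so its magnitude grows with the number of frames rather than the number of samples.

```python
import numpy as np


def chunked_phase(f0_hz, sample_rate=24000, frame=300):
    """Accumulate phase (in cycles, wrapped to [0, 1)) frame by frame in float32.

    f0_hz: 1-D per-sample frequency track in Hz.
    """
    f0 = np.asarray(f0_hz, dtype=np.float32)
    n = f0.shape[0]
    pad = (-n) % frame                                    # pad to a multiple of `frame`
    inc = np.pad(f0 / np.float32(sample_rate), (0, pad))  # per-sample increment in cycles
    frames = inc.reshape(-1, frame)                       # (num_frames, frame)
    # Short cumsum inside each frame: values stay small, so fp32 rounding stays small.
    local = np.cumsum(frames, axis=1, dtype=np.float32)
    # Wrap each frame's total advance to [0, 1) before carrying it forward.
    totals = np.mod(local[:, -1], np.float32(1.0))
    # Exclusive prefix sum: phase at the start of each frame, wrapped again.
    carry = np.mod(np.cumsum(totals, dtype=np.float32) - totals, np.float32(1.0))
    phase = np.mod(local + carry[:, None], np.float32(1.0))
    return phase.reshape(-1)[:n]
```

The carry is kept as a prefix sum rather than a Python loop because, assuming the model is exported by tracing, a loop over frames would be unrolled into the converted graph.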

Verification

| Metric | Value |
|---|---|
| CoreML vs PyTorch correlation | 0.963 |
| RMS ratio (CoreML/ONNX) | 0.99 |
| Whisper transcription match | Identical |
| Parameters loaded | 561/573 (12 use defaults) |
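
The PR does not show how the correlation and RMS-ratio rows were computed. The sketch below is one plausible way, assuming both outputs are mono waveforms at the same sample rate that are compared sample by sample after trimming to a common length. `waveform_metrics` is a hypothetical helper.

```python
import numpy as np


def waveform_metrics(candidate, reference):
    """Return (Pearson correlation, RMS ratio candidate/reference) over the common length."""
    n = min(len(candidate), len(reference))  # trim to the shorter signal
    a = np.asarray(candidate[:n], dtype=np.float64)
    b = np.asarray(reference[:n], dtype=np.float64)
    corr = float(np.corrcoef(a, b)[0, 1])  # sample-wise Pearson correlation
    rms_ratio = float(np.sqrt(np.mean(a ** 2)) / np.sqrt(np.mean(b ** 2)))
    return corr, rms_ratio
```

Sample-wise correlation is sensitive to even a one-sample offset, so the outputs would need to be time-aligned first. The Whisper transcription check complements it by comparing content rather than raw waveforms.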

Test plan

  • Run `python convert_kittentts.py --seconds 5 --output kittentts_5s.mlpackage` to verify conversion
  • Run `python convert_kittentts.py --verify-only` to verify weight loading (561/573 params)
  • Compare CoreML output with ONNX reference audio
  • Verify Whisper transcription matches between CoreML and ONNX outputs

🤖 Generated with Claude Code


…o/StyleTTS2)

ONNX INT8 → dequantized FP32 PyTorch → CoreML mlprogram conversion pipeline.
Reconstructs the full model architecture from the ONNX graph and fixes 9 bugs, including
LSTM gate reordering, Snake activations, dilations, and fp32 phase-accumulation
precision drift in the CoreML runtime. Achieves 0.963 correlation with the PyTorch reference.
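
The commit does not say what was wrong with the Snake activations. For reference, Snake is usually written as `x + (1/alpha) * sin(alpha * x)**2` with a learned per-channel `alpha`. The NumPy sketch below shows only that usual definition. The `(channels, time)` layout and the `eps` guard are assumptions, not taken from the PR.

```python
import numpy as np


def snake(x, alpha, eps=1e-9):
    """Snake activation x + (1/alpha) * sin(alpha * x)**2 with per-channel alpha.

    x: (channels, time) array; alpha: (channels,) learned parameters.
    eps guards against division by zero (an assumption; not taken from the PR).
    """
    a = np.asarray(alpha, dtype=np.float64)[:, None]  # broadcast over time
    return x + np.sin(a * x) ** 2 / (a + eps)
```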

devin-ai-integration bot left a comment

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Comment on lines +186 to +191
```
coreml/
├── convert_kittentts.py # Conversion script (model architecture + weight loading + CoreML export)
├── README.md # This file
├── kitten_tts_nano_weights.npz # Extracted dequantized weights (numpy)
└── kitten_tts_nano_weights.pt # Extracted weights (PyTorch state dict)
```

🔴 Missing pyproject.toml and uv.lock required by AGENTS.md for target directories

Both AGENTS.md files explicitly require that each target directory bundles its own pyproject.toml and uv.lock: "Each target directory is self-contained: pyproject.toml, uv.lock, conversion scripts, docs, and sample assets." Every other coreml target directory in the repo has these files (e.g. models/tts/kokoro/coreml/, models/tts/magpie/coreml/, models/vad/silero-vad/coreml/, etc.). The models/tts/kittentts/coreml/ directory only contains README.md and convert_kittentts.py, missing both pyproject.toml and uv.lock. This means uv sync cannot be run from this target directory, breaking the standard development workflow described in AGENTS.md.

Prompt for agents
Add a pyproject.toml and uv.lock to models/tts/kittentts/coreml/. Follow the pattern from models/tts/kokoro/coreml/pyproject.toml. The pyproject.toml should declare the project dependencies (torch, coremltools, onnx, onnxruntime, numpy, scipy, phonemizer, huggingface_hub) with requires-python = ">=3.10". Then run uv lock to generate the uv.lock file. Also update the README.md Files section (lines 186-191) to include pyproject.toml and uv.lock in the directory listing, and change the Quick Start prerequisites section (lines 13-17) to use uv sync instead of raw pip install.

```
@@ -0,0 +1,1297 @@
#!/usr/bin/env python3
```

🟡 Conversion script filename uses underscores instead of kebab-case

AGENTS.md specifies "Lowercase-kebab-case for files/dirs". The file is named convert_kittentts.py (underscores) instead of convert-kittentts.py (kebab-case). The established pattern for conversion scripts in the repo consistently uses kebab-case: convert-coreml.py, compare-models.py, convert-gguf.py in models/vad/silero-vad/coreml/.
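
The naming rule quoted from AGENTS.md can be captured for script names as "lowercase alphanumeric words joined by single hyphens". `is_kebab_case` below is a hypothetical check, not a tool that exists in the repo. It only inspects the file stem, so conventional names such as README.md would need an explicit exemption.

```python
import re
from pathlib import Path

# Lowercase alphanumeric words separated by single hyphens, e.g. "convert-coreml".
KEBAB_STEM = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab_case(filename: str) -> bool:
    """True if the file's stem (name without its last suffix) is lowercase-kebab-case."""
    return KEBAB_STEM.fullmatch(Path(filename).stem) is not None
```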

Prompt for agents
Rename models/tts/kittentts/coreml/convert_kittentts.py to models/tts/kittentts/coreml/convert-kittentts.py to follow the kebab-case naming convention specified in AGENTS.md. Also update the references in models/tts/kittentts/README.md (line 24), models/tts/kittentts/coreml/README.md (lines 28, 31, 34, 51, 187), and the coreml/README.md Files section accordingly.