feat: add TDT CoreML export for parakeet-tdt-ctc-110m by JarbasAl · Pull Request #25 · FluidInference/mobius

JarbasAl · 2026-03-16T06:01:36Z

Add convert-tdt-coreml.py which exports the TDT decoder components (fused mel+encoder, RNNT decoder LSTM, joint decision with duration) instead of the CTC head. The CTC export only produces blank-dominant log-probabilities unsuitable for greedy transcription in hybrid models.

Components:

convert-tdt-coreml.py: Full TDT export pipeline (iOS 18 target)
individual_components.py: Shared torch.nn.Module wrappers for tracing
Updated README.md: Documents both TDT and CTC export paths
Updated pyproject.toml: Adds script entry point and includes

companion PR: FluidInference/FluidAudio#383

AI Disclosure

Claude Opus did most of the work


- Replace logits[..., -self.num_extra:] with logits[..., self.vocab_with_blank:] to fix Python -0: slicing returning all logits when num_extra == 0
- Guard duration argmax with num_extra > 0 check, return zeros otherwise
- Upgrade num_extra == 0 warning to error since TDT export is invalid without duration head
- Fix _save_mlpackage: set iOS18 deployment target (matching export), remove unnecessary try/except
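The slicing pitfall fixed in this commit can be reproduced without the model. The sketch below uses plain Python lists in place of the joint network's logits tensor. `split_joint_logits` and `greedy_duration` are hypothetical helper names, not functions from convert-tdt-coreml.py, and the layout (token logits first, then `num_extra` duration logits) is assumed from the commit message.

```python
def split_joint_logits(logits, vocab_with_blank):
    """Split one joint-output row into (token_logits, duration_logits).

    Indexing from the front stays correct even when there are no
    duration logits, unlike logits[-num_extra:].
    """
    return logits[:vocab_with_blank], logits[vocab_with_blank:]


def greedy_duration(duration_logits):
    """Argmax over duration logits, or 0 when the duration head is absent."""
    if not duration_logits:
        return 0
    return max(range(len(duration_logits)), key=lambda i: duration_logits[i])


# Three token logits (blank included) and no duration logits.
logits = [0.1, 0.7, 0.2]
vocab_with_blank = 3
num_extra = len(logits) - vocab_with_blank  # 0

# Buggy form: -0 == 0, so the slice is logits[0:] and returns every logit.
buggy_durations = logits[-num_extra:]

# Fixed form: slice from the known start of the duration block.
_, fixed_durations = split_joint_logits(logits, vocab_with_blank)

print(buggy_durations)  # [0.1, 0.7, 0.2]
print(fixed_durations)  # []
print(greedy_duration(fixed_durations))  # 0
```

The same reasoning applies to tensor slicing along the last axis: a negative start index only means "count from the end" when it is non-zero.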

- Bump fsspec 2024.9.0 -> 2024.12.0 (required by nemo-toolkit 2.3.1)
- Bump datasets 3.1.0 -> 3.3.2 (compatible with new fsspec)
- Add missing transitive deps: editdistance, pyannote.metrics, ipython

The 110m model has no iOS 18-only ops — the int64->int32 warnings during conversion are just precision downcasts, not spec-version-gated operations. Verified all 4 components export at spec version 8 (iOS 17) and inference produces correct transcription via FluidAudio CLI.
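One way to check the spec version of an exported package is to read it back with coremltools. This is a minimal sketch, assuming coremltools is installed. `spec_version_of` and `min_ios_for_spec` are hypothetical helpers rather than part of this PR, and the table lists the Core ML specification versions for recent iOS releases.

```python
# Core ML specification version -> first iOS release that supports it.
SPEC_VERSION_TO_IOS = {6: "iOS 15", 7: "iOS 16", 8: "iOS 17", 9: "iOS 18"}


def min_ios_for_spec(spec_version):
    """Return the minimum iOS release for a Core ML spec version."""
    return SPEC_VERSION_TO_IOS.get(spec_version, "unknown")


def spec_version_of(mlpackage_path):
    """Read the specification version stored in an exported .mlpackage."""
    # Imported lazily so the helpers above work without coremltools.
    import coremltools as ct

    # skip_model_load avoids compiling the model; only the spec is needed.
    model = ct.models.MLModel(mlpackage_path, skip_model_load=True)
    return model.get_spec().specificationVersion


# Example (path is a placeholder for one of the exported components):
# version = spec_version_of("Preprocessor.mlpackage")
# print(version, min_ios_for_spec(version))
```

A package exported with an iOS 17 deployment target should report spec version 8 here; a higher number would suggest the converter pulled in newer ops.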

devin-ai-integration

Devin Review found 2 new potential issues.


devin-ai-integration · 2026-03-16T20:15:57Z

models/stt/parakeet-tdt-ctc-110m/coreml/README.md

-  - `parakeet_ctc_decoder.mlpackage` — encoder -> log_probs
+Key differences from the 0.6B export:
+- **Fused frontend**: mel spectrogram + encoder are a single `Preprocessor.mlpackage` (0.6B has separate Preprocessor + Encoder)
+- **iOS 18 deployment target**: Required for int ops in the encoder's positional encoding


🟡 README claims iOS 18 deployment target but code uses iOS 17

The README at line 66 states "iOS 18 deployment target: Required for int ops in the encoder's positional encoding" as a key difference from the 0.6B export. However, commit 7475673 explicitly changed the deployment target from iOS 18 to iOS 17 in the code (convert-tdt-coreml.py:57 and convert-tdt-coreml.py:185), confirming that iOS 18 is not actually required. The README was not updated to reflect this fix, leaving stale documentation that will mislead users into believing they need iOS 18.

Suggested change

- **iOS 18 deployment target**: Required for int ops in the encoder's positional encoding

- **iOS 17 deployment target**: int64→int32 precision downcasts are handled automatically; no iOS 18-only ops


devin-ai-integration · 2026-03-16T20:15:58Z

models/stt/parakeet-tdt-ctc-110m/coreml/convert-tdt-coreml.py

+            "decoder_layers": decoder_layers,
+            "checkpoint": checkpoint_meta,
+            "coreml": {
+                "compute_units": export_settings.compute_units.name,


🟡 Metadata records CPU_ONLY but mel+encoder is exported with CPU_AND_NE

The metadata coreml.compute_units field at convert-tdt-coreml.py:490 records export_settings.compute_units.name which is hardcoded to "CPU_ONLY" (line 184). However, the Preprocessor (mel+encoder) is actually converted with compute_units_override=melenc_cu (line 328), which defaults to CPU_AND_NE via the --mel-encoder-cu CLI option (line 165). This means the metadata misrepresents the actual compute unit configuration of the exported model, which could mislead downstream tools or developers reading the metadata to understand model behavior.

Prompt for agents

In models/stt/parakeet-tdt-ctc-110m/coreml/convert-tdt-coreml.py, the metadata at line 490 records export_settings.compute_units.name (always "CPU_ONLY") for the overall coreml configuration. However, the mel+encoder (Preprocessor) component is converted with melenc_cu (defaults to CPU_AND_NE). Either:

1. Change line 490 to record melenc_cu.name instead, or
2. Add per-component compute_units to the metadata components section (e.g., add a "compute_units" field to each component dict), or
3. Remove the top-level compute_units from the coreml metadata since it doesn't represent a single consistent value across components.
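The per-component option could look roughly like the sketch below. It is illustrative only: the `ComputeUnit` enum is a local stand-in for the coremltools enum, and `build_components_metadata` and the component keys are hypothetical names, not the ones used in convert-tdt-coreml.py.

```python
from enum import Enum


class ComputeUnit(Enum):
    """Local stand-in for the compute-unit enum used during conversion."""
    CPU_ONLY = 1
    CPU_AND_NE = 2


def build_components_metadata(component_names, default_cu, overrides=None):
    """Record the compute units actually used for each exported component.

    overrides maps a component name to the compute unit it was converted
    with; components without an override fall back to default_cu.
    """
    overrides = overrides or {}
    return {
        name: {"compute_units": overrides.get(name, default_cu).name}
        for name in component_names
    }


# Assumed setup: only the fused mel+encoder overrides the default.
components = build_components_metadata(
    ["preprocessor", "decoder", "joint_decision"],
    default_cu=ComputeUnit.CPU_ONLY,
    overrides={"preprocessor": ComputeUnit.CPU_AND_NE},
)
print(components["preprocessor"])  # {'compute_units': 'CPU_AND_NE'}
print(components["decoder"])       # {'compute_units': 'CPU_ONLY'}
```

Recording the value per component keeps the metadata truthful even if more overrides are added later, at the cost of a slightly larger metadata file.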


JarbasAl mentioned this pull request Mar 16, 2026

feat: support parakeet-tdt-ctc-110m hybrid model FluidInference/FluidAudio#383

Open


JarbasAl marked this pull request as draft March 16, 2026 18:09

JarbasAl added 2 commits March 16, 2026 19:13

fix: resolve dependency conflicts in pyproject.toml

73c97bf


JarbasAl marked this pull request as ready for review March 16, 2026 19:35


devin-ai-integration bot reviewed Mar 16, 2026


Alex-Wengg merged commit e72bfc9 into FluidInference:main Mar 16, 2026

Alex-Wengg merged 4 commits into FluidInference:main from TigreGotico:feat/tdt-ctc-110m-coreml-export
