feat: support parakeet-tdt-ctc-110m hybrid model #383

Open
JarbasAl wants to merge 5 commits into FluidInference:main from TigreGotico:feat/tdt-ctc-110m-support
Conversation


@JarbasAl JarbasAl commented Mar 16, 2026

Add AsrModelVersion.tdtCtc110m for the 110M parameter hybrid TDT-CTC model. Key differences from the 0.6B models:

  • Fused preprocessor+encoder (no separate Encoder.mlmodelc)
  • Smaller dimensions: encoderHidden=512, vocabSize=1024, 1 LSTM layer
  • Array-format vocabulary (vocab.json) instead of dict format
  • blankId=1024 (same as v2)
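
The version-dependent differences above could be captured on the model-version enum itself. A minimal hypothetical sketch — the type and property names mirror this description, not necessarily FluidAudio's actual API:

```swift
// Hypothetical sketch: version-dependent dimensions as described in this
// PR. Names and the non-110m values are illustrative assumptions, not
// FluidAudio's actual API.
enum AsrModelVersion {
    case v2, v3, tdtCtc110m

    // 110m encoder emits 512-wide frames; the 0.6B models use 1024.
    var encoderHiddenSize: Int { self == .tdtCtc110m ? 512 : 1024 }

    // 110m uses a single LSTM layer; the 0.6B models use two.
    var decoderLayers: Int { self == .tdtCtc110m ? 1 : 2 }

    // The 110m export fuses preprocessor+encoder into one compiled
    // model, so no separate Encoder.mlmodelc is loaded for it.
    var hasFusedFrontend: Bool { self == .tdtCtc110m }
}
```

Hanging the dimensions off the version keeps the decoder and frame-view code free of per-model special cases.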

Changes:

  • AsrModels: optional encoder, fused frontend loading, array vocab support
  • AsrManager: version-aware decoder state shapes, fused frontend availability
  • AsrTranscription: skip encoder step when preprocessor output is fused
  • TdtDecoderState: parameterized LSTM layer count
  • TdtDecoderV3: use config.encoderHiddenSize instead of auto-detection
  • EncoderFrameView: accept explicit hidden size parameter
  • TranscribeCommand: --model-version tdt-ctc-110m, --model-dir flags
  • ModelNames: parakeetTdtCtc110m repo, fused model requirements
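
The "array vocab support" item could be handled by a loader that accepts both shapes of vocab.json: the 110m's array of token strings (index == token id) and the 0.6B's token-to-id dictionary. A sketch under that assumption — this is not FluidAudio's actual loader:

```swift
import Foundation

// Hypothetical sketch: load vocab.json whether it is an array of tokens
// (110m export) or a {"token": id} dictionary (0.6B exports). The two
// accepted shapes are assumptions based on this PR's description.
func loadVocabulary(from data: Data) throws -> [Int: String] {
    let json = try JSONSerialization.jsonObject(with: data)
    if let array = json as? [String] {
        // Array format: the position in the array is the token id.
        return Dictionary(uniqueKeysWithValues:
            array.enumerated().map { ($0.offset, $0.element) })
    }
    if let dict = json as? [String: Int] {
        // Dict format: token string maps to its id; invert it.
        return Dictionary(uniqueKeysWithValues:
            dict.map { ($0.value, $0.key) })
    }
    throw CocoaError(.fileReadCorruptFile)
}
```

Returning a single `[Int: String]` from both branches lets the decoder stay format-agnostic.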

Companion PR: FluidInference/mobius#25

Why is this change needed?

Better support for https://huggingface.co/nvidia/parakeet-tdt_ctc-110m

AI Disclosure

I have never worked with Swift before; Claude Opus did most of the work.




Alex-Wengg (Member) commented Mar 16, 2026

@JarbasAl did you test this on iOS? We originally had a fused preprocessor+encoder and it had incompatibility issues on iOS.

Also, what about the benchmarks?

case qwen3Asr = "FluidInference/qwen3-asr-0.6b-coreml/f32"
case qwen3AsrInt8 = "FluidInference/qwen3-asr-0.6b-coreml/int8"
case multilingualG2p = "FluidInference/charsiu-g2p-byt5-coreml"
case parakeetTdtCtc110m = "FluidInference/parakeet-tdt-ctc-110m-coreml"
Alex-Wengg (Member) commented Mar 16, 2026:
we don't have this on FluidInference HF

JarbasAl (Author) commented:
I assumed you would upload it before merging; I also sent a companion PR to mobius for the conversion.

@JarbasAl JarbasAl marked this pull request as draft March 16, 2026 18:09
Default ASRConfig uses encoderHiddenSize=1024 but the 110m model produces
encoder output with hidden size 512, causing a runtime crash in
EncoderFrameView. Adapt the config from the model version before passing
it to the decoder.
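
The fix described in this commit amounts to deriving the config from the model version before constructing the decoder. A hypothetical sketch — `ASRConfig`'s fields and defaults here are assumptions, not FluidAudio's actual types:

```swift
// Hypothetical sketch of version-aware config adaptation. ASRConfig's
// fields and defaults are assumptions based on this PR's description.
struct ASRConfig {
    var encoderHiddenSize: Int = 1024  // default matches the 0.6B models
    var blankId: Int = 1024
}

func adaptedConfig(for version: String,
                   base: ASRConfig = ASRConfig()) -> ASRConfig {
    var config = base
    if version == "tdt-ctc-110m" {
        // The 110m encoder emits 512-wide frames; leaving the default
        // 1024 here is what crashed EncoderFrameView at runtime.
        config.encoderHiddenSize = 512
    }
    return config
}
```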
- Accept --model-version tdt-ctc-110m/110m
- Use model-version-aware ASRConfig (blankId, encoderHiddenSize)
- Fix CI debug path to use AsrModels.defaultCacheDirectory
- Update usage text
- TranscribeCommand: add --model-dir and tdt-ctc-110m to help text,
  fix modelVersionLabel ternary that mislabeled 110m as "v3" in JSON
- TdtDecoderV3.prepareJointInput: use config.encoderHiddenSize instead
  of convenience init that hardcodes 1024
@JarbasAl JarbasAl marked this pull request as ready for review March 16, 2026 19:35
JarbasAl (Author) commented Mar 16, 2026

@JarbasAl did you test this on iOS? We originally had a fused preprocessor+encoder and it had incompatibility issues on iOS.

Also, what about the benchmarks?

I only tested on a Mac mini, not iOS. But I should note I had to use the iOS 18 target for the conversion to work.

EDIT: I take that back; it works with 17.


The AsrModels struct holds strong references to MLModel objects.
Without clearing it, cleanup() only nil'd the individual model
properties but the AsrModels copy still retained all four models.
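
The retain issue this commit describes can be reproduced with any struct that holds class references: a minimal sketch, assuming `cleanup()` previously nil'd only the individual properties. The types here are illustrative stand-ins, not FluidAudio's:

```swift
// Minimal sketch of the retain bug: a struct copy holds strong
// references to class instances, so nil-ing the individual properties
// alone does not release them. `Model` stands in for MLModel.
final class Model {}

struct AsrModels {
    var encoder: Model?
    var decoder: Model?
}

final class AsrManager {
    var asrModels: AsrModels?
    var encoder: Model?

    func cleanup() {
        encoder = nil    // not enough on its own...
        asrModels = nil  // ...the struct copy must be cleared too
    }
}
```

Because `AsrModels` is a value type, the copy stored in `asrModels` keeps its own strong references until the whole property is set to nil.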
Alex-Wengg (Member) commented:

hi @JarbasAl
Thanks for the contribution! What's the intended use case for this model? The differences listed in the description (fused frontend, smaller hidden size, array vocab) are structural traits rather than advantages over the 0.6B. What motivated the conversion of this model?

JarbasAl (Author) commented Mar 16, 2026

hi @JarbasAl Thanks for the contribution! What's the intended use case for this model? The differences listed in the description (fused frontend, smaller hidden size, array vocab) are structural traits rather than advantages over the 0.6B. What motivated the conversion of this model?

I am developing an application with FluidAudio where I use a proprietary fine-tuned version of that model; STT is the odd component not using FluidAudio directly.

I figured it could be useful for the community to share support; the 110m model is very lightweight.

WER improved ~3% on my test data by using this instead of the CTC export.

var timeJump: Int?

init() throws {
init(decoderLayers: Int = 2) throws {
Alex-Wengg (Member) commented:

Any reason why Int = 2?

var chunkIndex = 0
var chunkDecoderState = TdtDecoderState.make()
var chunkDecoderState = TdtDecoderState.make(
decoderLayers: manager.asrModels?.version.decoderLayers ?? 2
Alex-Wengg (Member) commented Mar 16, 2026:

2?

BrandonWeng (Member) left a comment:
We probably need to double-check if this runs on iOS or not. We previously had issues with iOS when we tried combining the mel processor and the encoder.

#118


If there's no problem, @Alex-Wengg, can't we just replace the existing CTC 110m with this instead of maintaining both?

SGD2718 added labels Mar 17, 2026: enhancement (New feature or request), speech-to-text (issues related to transcription/asr)
Alex-Wengg (Member) commented:

We probably need to double-check if this runs on iOS or not. We previously had issues with iOS when we tried combining the mel processor and the encoder.

#118

If there's no problem, @Alex-Wengg, can't we just replace the existing CTC 110m with this instead of maintaining both?

This is in theory possible, but I will need to do some testing first. The custom vocab research paper did not mention anything about preprocessor specifications, and preprocessors are generally pretty simple too.
