feat: add CTC greedy/beam search decoding with ARPA language model support #384

Open
JarbasAl wants to merge 3 commits into FluidInference:main from TigreGotico:feat/ctc-lm-decoding

JarbasAl commented Mar 16, 2026

Adds standalone CTC decoding utilities that work with CTC log-probabilities:

  • ctcGreedyDecode: argmax per timestep with repeat collapse
  • ctcBeamSearch: prefix beam search with optional ARPA LM rescoring
  • ARPALanguageModel: loads unigram/bigram ARPA files for beam search

Both decoders accept [[Float]] (from CtcKeywordSpotter) and MLMultiArray (from direct CoreML inference) inputs.
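
For readers unfamiliar with CTC, the greedy path described above (argmax per timestep, then repeat collapse) can be sketched roughly as follows. This is only an illustration, not the PR's `ctcGreedyDecode`: the function name, the `[time][vocab]` layout, and the caller-supplied `blankId` are all assumptions.

```swift
// Illustrative sketch only; `greedyDecodeSketch` and `blankId` are hypothetical names.
// Assumes logProbs is laid out as [time][vocab].
func greedyDecodeSketch(logProbs: [[Float]], blankId: Int) -> [Int] {
    var tokens: [Int] = []
    var previous = blankId
    for frame in logProbs {
        guard !frame.isEmpty else { continue }
        // Argmax over the vocabulary for this timestep.
        var best = 0
        for index in 1..<frame.count where frame[index] > frame[best] {
            best = index
        }
        // Collapse consecutive repeats, then drop blanks.
        if best != previous && best != blankId {
            tokens.append(best)
        }
        previous = best
    }
    return tokens
}

// Per-frame argmax is 1, 1, blank, 1, 2, 2.
let greedyExample: [[Float]] = [
    [-5, -0.1, -5], [-5, -0.1, -5], [-0.1, -5, -5],
    [-5, -0.1, -5], [-5, -5, -0.1], [-5, -5, -0.1],
]
let greedyTokens = greedyDecodeSketch(logProbs: greedyExample, blankId: 0)
```

Note that the blank between the two runs of token 1 is what allows the same token to appear twice in the output.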

Why is this change needed?

To reduce word error rate (WER) by decoding with domain-specific language models.

AI Disclosure

I have never worked with Swift before; Claude Opus did most of the work.


…pport

Adds standalone CTC decoding utilities that work with CTC log-probabilities:
- ctcGreedyDecode: argmax per timestep with repeat collapse
- ctcBeamSearch: prefix beam search with optional ARPA LM rescoring
- ARPALanguageModel: loads unigram/bigram ARPA files for beam search

Both decoders accept [[Float]] (from CtcKeywordSpotter) and MLMultiArray
(from direct CoreML inference) inputs.
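
A rough sketch of what prefix beam search with an optional LM term might look like is given here for orientation. It is not the PR's `ctcBeamSearch`: every name is hypothetical, the input is assumed to be `[time][vocab]` log-probabilities, and the LM is assumed to be applied as an already-weighted log-score added whenever a prefix is extended by a token.

```swift
import Foundation

// Illustrative sketch only; all names below are hypothetical, not the PR's API.

/// log(exp(a) + exp(b)) computed without overflow.
func logAdd(_ a: Float, _ b: Float) -> Float {
    if a == -Float.infinity { return b }
    if b == -Float.infinity { return a }
    let hi = max(a, b)
    return hi + log1p(exp(min(a, b) - hi))
}

/// Log-probability mass of one prefix, split by how its paths end.
struct PrefixScore {
    var blank = -Float.infinity     // paths ending in the blank symbol
    var nonBlank = -Float.infinity  // paths ending in a real token
    var total: Float { logAdd(blank, nonBlank) }
}

func prefixBeamSearchSketch(
    logProbs: [[Float]],                     // assumed layout: [time][vocab]
    blankId: Int,
    beamWidth: Int,
    lmScore: (([Int], Int) -> Float)? = nil  // assumed: weighted LM log-score for appending a token
) -> [Int] {
    var beams: [[Int]: PrefixScore] = [[]: PrefixScore(blank: 0, nonBlank: -Float.infinity)]
    for frame in logProbs {
        var next: [[Int]: PrefixScore] = [:]
        // Accumulate probability mass for a prefix reached by several paths.
        func merge(_ prefix: [Int], blank: Float, nonBlank: Float) {
            if blank == -Float.infinity && nonBlank == -Float.infinity { return }
            var entry = next[prefix] ?? PrefixScore()
            entry.blank = logAdd(entry.blank, blank)
            entry.nonBlank = logAdd(entry.nonBlank, nonBlank)
            next[prefix] = entry
        }
        for (prefix, score) in beams {
            for (token, logP) in frame.enumerated() {
                if token == blankId {
                    // A blank leaves the prefix unchanged.
                    merge(prefix, blank: score.total + logP, nonBlank: -Float.infinity)
                    continue
                }
                let bonus = lmScore?(prefix, token) ?? 0
                if token == prefix.last {
                    // A repeat with no blank in between collapses into the same prefix...
                    merge(prefix, blank: -Float.infinity, nonBlank: score.nonBlank + logP)
                    // ...while a repeat after a blank emits the token again.
                    merge(prefix + [token], blank: -Float.infinity,
                          nonBlank: score.blank + logP + bonus)
                } else {
                    merge(prefix + [token], blank: -Float.infinity,
                          nonBlank: score.total + logP + bonus)
                }
            }
        }
        // Keep only the `beamWidth` best prefixes.
        let kept = next.sorted { $0.value.total > $1.value.total }.prefix(beamWidth)
        beams = Dictionary(uniqueKeysWithValues: kept.map { ($0.key, $0.value) })
    }
    return beams.max { $0.value.total < $1.value.total }?.key ?? []
}

// One ambiguous frame: acoustics slightly prefer token 1, a toy LM prefers token 2.
let ambiguousFrame: [[Float]] = [[-5, -0.6, -0.8]]
let acousticOnly = prefixBeamSearchSketch(logProbs: ambiguousFrame, blankId: 0, beamWidth: 2)
let withToyLM = prefixBeamSearchSketch(
    logProbs: ambiguousFrame, blankId: 0, beamWidth: 2,
    lmScore: { _, token in token == 2 ? 1.0 : 0.0 })
```

Tracking the blank-ending and non-blank-ending mass separately is what lets the search distinguish a collapsed repeat from a genuinely repeated token.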
30 tests covering:
- logAddExp math utilities (5 tests)
- decodeCtcTokenIds token decoding (4 tests)
- CTC greedy decode with [[Float]] and MLMultiArray (6 tests)
- CtcBeam struct properties (4 tests)
- CTC beam search without/with MLMultiArray (4 tests)
- ARPA LM loading, unigram/bigram parsing, scoring (7 tests)
- Beam search with LM overriding acoustic score (1 test)
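
The `logAddExp` utility named in the test list presumably computes log(exp(a) + exp(b)) in a numerically stable way. A minimal sketch of the usual formulation, under a hypothetical name and not taken from the PR, is:

```swift
import Foundation

// Illustrative sketch only; `logAddExpSketch` is a hypothetical name.
// Computes log(exp(a) + exp(b)) by factoring out the larger argument,
// so exp() is only ever called on a value <= 0 and cannot overflow.
func logAddExpSketch(_ a: Float, _ b: Float) -> Float {
    // Negative infinity represents probability zero, the identity for this operation.
    if a == -Float.infinity { return b }
    if b == -Float.infinity { return a }
    let hi = max(a, b)
    let lo = min(a, b)
    return hi + log1p(exp(lo - hi))
}

let logOfTwo = logAddExpSketch(0, 0)         // log(1 + 1)
let farTail = logAddExpSketch(-1000, -1001)  // naive exp(-1000) would underflow to 0
```

Beam search needs this operation whenever two different alignments collapse to the same prefix and their probabilities have to be summed in log space.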

devin-ai-integration bot left a comment

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 6 additional findings in Devin Review.

- Move \end\ check inside hasPrefix("\\") block so it's reachable
- Replace nested if with flat conditional in beam search LM scoring
- Add whitespace trimming to ARPALineReader EOF path
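
To illustrate why the `\end\` check has to live inside the backslash-prefix branch, here is a rough sketch of a unigram/bigram ARPA reader. It is not the PR's `ARPALanguageModel` or `ARPALineReader`: the type and function names are hypothetical, and the sketch assumes standard ARPA layout (log10 probability, words, optional backoff weight) with the usual backoff rule for unseen bigrams.

```swift
import Foundation

// Illustrative sketch only; all names below are hypothetical.
struct ARPASketch {
    var unigrams: [String: (logProb: Float, backoff: Float)] = [:]
    var bigrams: [String: Float] = [:]  // keyed by "w1 w2"
}

func parseARPASketch(_ text: String) -> ARPASketch {
    var model = ARPASketch()
    var order = 0
    for rawLine in text.components(separatedBy: .newlines) {
        // Trim every line, including a final line with no trailing newline.
        let line = rawLine.trimmingCharacters(in: .whitespaces)
        if line.isEmpty { continue }
        if line.hasPrefix("\\") {
            // Every section marker starts with a backslash, so \end\ is tested here;
            // placed after this branch it would never be reached.
            if line == "\\end\\" { break }
            if line == "\\1-grams:" {
                order = 1
            } else if line == "\\2-grams:" {
                order = 2
            } else {
                order = 0  // \data\ header or an n-gram order this sketch ignores
            }
            continue
        }
        let fields = line.split(whereSeparator: { $0 == " " || $0 == "\t" }).map(String.init)
        guard let logProb = fields.first.flatMap({ Float($0) }) else { continue }
        if order == 1, fields.count >= 2 {
            let backoff = fields.count >= 3 ? (Float(fields[2]) ?? 0) : 0
            model.unigrams[fields[1]] = (logProb, backoff)
        } else if order == 2, fields.count >= 3 {
            model.bigrams[fields[1] + " " + fields[2]] = logProb
        }
    }
    return model
}

/// log10 P(w2 | w1), backing off to the unigram when the bigram is missing.
func bigramScoreSketch(_ model: ARPASketch, _ w1: String, _ w2: String) -> Float? {
    if let seen = model.bigrams[w1 + " " + w2] { return seen }
    guard let unigram = model.unigrams[w2] else { return nil }
    return (model.unigrams[w1]?.backoff ?? 0) + unigram.logProb
}

// Made-up toy model; the line after \end\ must be ignored.
let sampleARPA = #"""
\data\
ngram 1=2
ngram 2=1

\1-grams:
-1.0 hello -0.5
-1.25 world

\2-grams:
-0.3 hello world

\end\
-9.0 late words
"""#
let sampleModel = parseARPASketch(sampleARPA)
```
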

SGD2718 added the enhancement (New feature or request) and speech-to-text (issues related to transcription/asr) labels Mar 17, 2026