[codex] Support external speech recognition models and Xiaomi MiMo ASR #294
PR AI Review / PR AI Semantic Preflight
This PR extends speech recognition from a local-only model/boolean toggle to a configurable provider system (Local, Tencent Cloud ASR, Xiaomi MiMo ASR). It adds a SpeechRecognitionConfig domain model plus TencentCloudAsrService and XiaomiMimoAsrService, and refactors SpeechTranscriptionService to route transcription by provider. The UI adds a speech recognition capability section to the AI setup page. Test coverage is broad (6 new or modified test files covering services, the ViewModel, and widgets). The main risks are the refactor of the recording core path and the network dependency of external ASR providers.
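Routing transcription by provider, with real-time support made explicit, can be sketched as follows. This is a minimal Python illustration and not the PR's Dart code. SpeechProvider, Transcriber, SpeechTranscriptionRouter, and the supports_realtime flag are hypothetical names. The sketch does not assume which providers actually support real-time transcription; the caller supplies that capability.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class SpeechProvider(Enum):
    # Hypothetical enum for the three providers named in the PR.
    LOCAL = "local"
    TENCENT_CLOUD = "tencent_cloud"
    XIAOMI_MIMO = "xiaomi_mimo"


@dataclass(frozen=True)
class Transcriber:
    # Converts recorded or imported audio bytes into final text.
    transcribe: Callable[[bytes], str]
    # True if this backend can also stream partial text while recording.
    supports_realtime: bool


class SpeechTranscriptionRouter:
    """Hypothetical router: dispatches each request to its provider's transcriber."""

    def __init__(self, transcribers: Dict[SpeechProvider, Transcriber]) -> None:
        self._transcribers = dict(transcribers)

    def _lookup(self, provider: SpeechProvider) -> Transcriber:
        transcriber = self._transcribers.get(provider)
        if transcriber is None:
            raise ValueError(f"no transcriber registered for {provider.value}")
        return transcriber

    def supports_realtime(self, provider: SpeechProvider) -> bool:
        return self._lookup(provider).supports_realtime

    def transcribe(self, provider: SpeechProvider, audio: bytes, *, realtime: bool = False) -> str:
        transcriber = self._lookup(provider)
        # Refuse explicitly instead of silently falling back to another provider.
        if realtime and not transcriber.supports_realtime:
            raise RuntimeError(f"{provider.value} does not support real-time transcription")
        return transcriber.transcribe(audio)


if __name__ == "__main__":
    # Stub transcribers; the capability values here are illustrative only.
    router = SpeechTranscriptionRouter({
        SpeechProvider.LOCAL: Transcriber(lambda audio: f"local:{len(audio)}", supports_realtime=True),
        SpeechProvider.XIAOMI_MIMO: Transcriber(lambda audio: f"mimo:{len(audio)}", supports_realtime=False),
    })
    print(router.transcribe(SpeechProvider.XIAOMI_MIMO, b"\x00\x01"))  # prints "mimo:2"
```

When a provider cannot stream, the sketch rejects the real-time request outright instead of switching providers, which keeps the real-time/final-text distinction visible to the caller. The PR description does not say whether it handles the unsupported case the same way.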
PR Preflight Summary
PR Policy Preflight
No deterministic policy findings.
PR Flutter Quality
Flutter Analyzer Baseline
No new analyzer issues introduced by this PR.
Flutter Test Baseline
No new Flutter test failures introduced by this PR.
Reply
I agree with the feature direction: speech recognition should support external providers beyond the local model. However, I would not merge this PR in its current shape because the implementation does not match the expected product behavior.
After reviewing the code, the implementation effectively hard-codes three providers:
More importantly, this PR replaces the generic cloud transcription path that already exists on
The Tencent Cloud path also has a high setup burden: users need to understand and enter
The MiMo reuse path is also too narrow. It only reuses configs where
I suggest changing the design as follows:
Overall, the feature is valuable, but the current PR feels like “add two China ASR providers” rather than “reuse the user’s existing model configuration and choose the best available speech transcription capability.” I’d recommend making the default path LLM-config-first, then keeping Tencent/MiMo as optional advanced providers.
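The reviewer's LLM-config-first ordering could be sketched as follows. This is a hypothetical Python illustration of one possible resolution order and comes from neither the PR nor the review. LlmConfig, supports_transcription, Resolution, and resolve_speech_provider are invented names. The rule that an advanced provider the user selected but never configured falls through to the LLM-config path is also an assumption.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional, Sequence

# Optional advanced providers, per the review's suggestion.
ADVANCED_PROVIDERS = frozenset({"tencent_cloud", "xiaomi_mimo"})


@dataclass(frozen=True)
class LlmConfig:
    name: str
    # Hypothetical flag: whether this existing model config can also transcribe audio.
    supports_transcription: bool


@dataclass(frozen=True)
class Resolution:
    source: str  # "explicit", "llm_config", or "local_fallback"
    target: str  # provider id or LLM config name


def resolve_speech_provider(
    explicit: Optional[str],
    configured_advanced: AbstractSet[str],
    llm_configs: Sequence[LlmConfig],
) -> Resolution:
    # 1. Honor an explicit choice of the local model.
    if explicit == "local":
        return Resolution("explicit", "local")
    # 2. Honor an explicit advanced provider only if the user finished configuring it.
    if explicit in ADVANCED_PROVIDERS and explicit in configured_advanced:
        return Resolution("explicit", explicit)
    # 3. Default path: reuse the first existing model config that can transcribe.
    for config in llm_configs:
        if config.supports_transcription:
            return Resolution("llm_config", config.name)
    # 4. Nothing usable: fall back to the on-device model.
    return Resolution("local_fallback", "local")
```

The explicit checks come first, so Tencent Cloud and MiMo stay opt-in. The scan over existing model configs then makes reuse the default, which matches the ordering the review asks for.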
Background
Closes #293
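This change extends speech recognition from the original local-model/boolean toggle to a configurable provider, so users can choose either the local model or an external model for recognition. The first external providers are Tencent Cloud ASR and Xiaomi MiMo ASR, and the change is compatible with the new AI service configuration page.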
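One way to keep the old use_local_speech_to_text boolean meaningful while adding a provider field is sketched below. This is a hypothetical Python sketch and not the PR's SpeechRecognitionConfig. The speech_recognition_provider key, the dict-backed preferences, and the choice to map false or absent to "no explicit provider" are all assumptions. Only the legacy key name comes from the PR description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional

LEGACY_KEY = "use_local_speech_to_text"       # key named in the PR description
PROVIDER_KEY = "speech_recognition_provider"  # hypothetical new key


class SpeechProvider(Enum):
    LOCAL = "local"
    TENCENT_CLOUD = "tencent_cloud"
    XIAOMI_MIMO = "xiaomi_mimo"


@dataclass(frozen=True)
class SpeechRecognitionConfig:
    # None means "no explicit choice"; the app's default resolution applies.
    provider: Optional[SpeechProvider]

    @classmethod
    def load(cls, prefs: Dict[str, Any]) -> "SpeechRecognitionConfig":
        raw = prefs.get(PROVIDER_KEY)
        if raw is not None:
            try:
                return cls(SpeechProvider(raw))
            except ValueError:
                pass  # unknown value (e.g. from a newer build): fall back to the legacy key
        if prefs.get(LEGACY_KEY) is True:
            return cls(SpeechProvider.LOCAL)
        return cls(None)

    def save(self, prefs: Dict[str, Any]) -> None:
        if self.provider is None:
            prefs.pop(PROVIDER_KEY, None)
        else:
            prefs[PROVIDER_KEY] = self.provider.value
        # Keep the legacy boolean meaningful for builds that only read it.
        prefs[LEGACY_KEY] = self.provider is SpeechProvider.LOCAL
```

Because every save also writes the legacy boolean, an older build that only reads use_local_speech_to_text still picks the local model when the local model is the selected provider.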
Requirements covered
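- mimo-v2.5-asr, supporting final-text recognition after recording ends or after audio is imported.
- /anthropic addresses are normalized, for ASR, to /v1/chat/completions on the same host.
- use_local_speech_to_text semantics remain compatible.
Implementation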
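- SpeechRecognitionConfig / provider domain model, which centrally stores the Local, Tencent Cloud, and Xiaomi MiMo configurations.
- TencentCloudAsrService, XiaomiMimoAsrService, and a unified real-time transcription interface.
- SpeechTranscriptionService dispatches to the local, Tencent Cloud, or MiMo transcription path based on the provider, and explicitly distinguishes real-time capability.
Verification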
- https://token-plan-sgp.xiaomimimo.com/anthropic: appending /anthropic/chat/completions directly returns 404; after normalizing to /v1/chat/completions, mimo-v2.5-asr returns the correct transcript.
- flutter test --no-pub --concurrency=1: 633 passed, 7 skipped.
- flutter analyze --no-pub: No issues found.
- git diff --check --cached: passed.
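The /anthropic normalization verified above might look like the following minimal Python sketch. mimo_asr_endpoint is a hypothetical helper. The rewrite from /anthropic to /v1/chat/completions on the same host follows the PR description. The handling of /v1 and bare-host base URLs is an assumption added for completeness.

```python
from urllib.parse import urlsplit, urlunsplit


def mimo_asr_endpoint(base_url: str) -> str:
    """Hypothetical helper: map a configured base URL to the chat-completions endpoint used for ASR."""
    parts = urlsplit(base_url.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {base_url!r}")
    path = parts.path.rstrip("/")
    if path.endswith("/anthropic"):
        # Appending /chat/completions to an Anthropic-style base returns 404,
        # so keep the same host and switch to the /v1/chat/completions route.
        path = "/v1/chat/completions"
    elif path.endswith("/v1"):
        # Assumption: an OpenAI-style /v1 base only needs the route appended.
        path += "/chat/completions"
    elif not path.endswith("/chat/completions"):
        # Assumption: a bare host gets the full /v1/chat/completions route.
        path += "/v1/chat/completions"
    # Query string and fragment are dropped; only scheme, host, and path are kept.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Normalizing the URL in a single helper keeps the 404-prone concatenation out of the request code, and it lets a unit test pin the exact URL from the verification note.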