
When changing the transcription model from base.en to small.en, the whole system gets stuck and the UI is not updated #47

@tahahasan88


Hi,

If I change the model from base.en to small.en, there appears to be a race condition that does not occur with the base.en model.
The model detects my voice correctly, as I can see in the logs, but the system does not forward my text to the LLM engine.

```python
from typing import Any, Dict

DEFAULT_RECORDER_CONFIG: Dict[str, Any] = {
    "use_microphone": False,
    "spinner": False,
    "model": "small.en",
    "realtime_model_type": "small.en",
    "use_main_model_for_realtime": False,
    "language": "en",  # Default, will be overridden by source_language in init
    "silero_sensitivity": 0.05,
    "webrtc_sensitivity": 3,
    "post_speech_silence_duration": 0.7,
    "min_length_of_recording": 0.5,
    "min_gap_between_recordings": 0,
    "enable_realtime_transcription": True,
    "realtime_processing_pause": 0.03,
    "silero_use_onnx": True,
    "silero_deactivity_detection": True,
    "early_transcription_on_silence": 0,
    "beam_size": 3,
    "beam_size_realtime": 3,
    "no_log_file": True,
    "wake_words": "jarvis",
    "wakeword_backend": "pvporcupine",
    "allowed_latency_limit": 500,
    # Callbacks will be added dynamically in _create_recorder
    "debug_mode": True,
    "initial_prompt_realtime": "The sky is blue. When the sky... She walked home. Because he... Today is sunny. If only I...",
    "faster_whisper_vad_filter": False,
}
```
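Switching models as described touches two keys: `model` (used for the final transcription) and `realtime_model_type` (used for realtime transcription, since `use_main_model_for_realtime` is `False`). Below is a minimal sketch, assuming the working setup had both keys set to base.en, of deriving the small.en variant without mutating the baseline. `BASE_CONFIG` and `with_model` are hypothetical names, and the `mixed` variant (final pass on small.en, realtime kept on base.en) is only a suggested way to check whether the stall follows the realtime model or the final one.

```python
from typing import Any, Dict

# Hypothetical baseline: only the model-related keys from the issue's config are shown.
BASE_CONFIG: Dict[str, Any] = {
    "model": "base.en",
    "realtime_model_type": "base.en",
    "use_main_model_for_realtime": False,
    "beam_size": 3,
    "beam_size_realtime": 3,
}


def with_model(config: Dict[str, Any], model: str, realtime_model: str) -> Dict[str, Any]:
    """Return a new dict with the final and realtime models replaced; `config` is not modified."""
    return {**config, "model": model, "realtime_model_type": realtime_model}


# Both passes on small.en, as in the failing setup.
small = with_model(BASE_CONFIG, "small.en", "small.en")
# Hypothetical diagnostic: final pass on small.en, realtime pass kept on base.en.
mixed = with_model(BASE_CONFIG, "small.en", "base.en")

print(small["model"], small["realtime_model_type"])  # small.en small.en
print(mixed["model"], mixed["realtime_model_type"])  # small.en base.en
print(BASE_CONFIG["model"])  # base.en
```

If the `mixed` variant does not hang, that would point at the realtime pass on small.en rather than at the final transcription.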

The base.en model is not accurate enough for my transcription needs. Has anyone faced this issue? This is the log when using small.en:

```
41:32.33 server INFO 🖥️🚦 State ToClient 0, ttsClientON 0, ChunkSent 0, hot 0, synth 0 gen 0 valid 0 tts_q_fin 0 mic_inter 0
41:32.55 uvicorn.ac INFO 127.0.0.1:65010 - "GET /static/pcmWorkletProcessor.js HTTP/1.1" 200
41:32.56 uvicorn.ac INFO 127.0.0.1:65010 - "GET /static/ttsPlaybackProcessor.js HTTP/1.1" 200
41:34.18 transcribe INFO 👂▶️ Recording started.
41:34.18 server INFO 🖥️🎙️ Recording started. TTS Client Playing: False
41:34.19 faster_whi INFO Processing audio with duration 00:01.024
41:34.19 faster_whi INFO VAD filter removed 00:00.000 of audio
41:34.70 transcribe INFO 👂🤫 Silence state changed: ACTIVE

HOT
41:35.05 server INFO 🖥️🧠 HOT: None
4
41:35.14 transcribe INFO 👂🔚 Potential sentence end detected (timed out):
... (many more "Potential sentence end detected (timed out)" lines omitted) ...
41:53.47 transcribe INFO 👂⏹️ Recording stopped.
41:53.48 server INFO 🖥️🏁 =================== USER TURN END ===================
41:53.48 server INFO 🖥️🎙️ ⏸️ Microphone interrupted (end of turn)
41:53.48 server INFO 🖥️🔊 TTS STREAM RELEASED
41:53.48 transcribe INFO 👂🔚 Potential sentence end detected (timed out):
41:53.48 server INFO 🖥️🧠 Adding user request to history: 'Yo buddy!'
41:53.48 server INFO 🖥️📤 →→Client: {'type': 'final_user_request', 'content': 'Yo buddy!'}
41:53.48 transcribe INFO 👂🤫 Silence state changed: INACTIVE
41:53.48 transcribe INFO 👂🔚 Potential sentence end detected (timed out):
41:53.48 server INFO 🖥️🚦 State ToClient 1, ttsClientON 0, ChunkSent 0, hot 0, synth 0 gen 0 valid 0 tts_q_fin 0 mic_inter 1
41:53.49 faster_whi INFO Processing audio with duration 00:02.240
41:53.49 faster_whi INFO VAD filter removed 00:00.000 of audio
41:55.48 server INFO 🖥️🎙️ interruption flag reset after 2 seconds
41:55.48 server INFO 🖥️🚦 State ToClient 1, ttsClientON 0, ChunkSent 0, hot 0, synth 0 gen 0 valid 0 tts_q_fin 0 mic_inter 0
41:56.22 transcribe INFO 👂✅ Final user text: Are you listening?
41:56.22 turndetect INFO 🎤🔄 Resetting TurnDetection state.
41:56.22 server INFO
🖥️✅ FINAL USER REQUEST (STT Callback): Are you listening?
42:18.34 uvicorn.er INFO Shutting down
42:18.34 server ERRO 🖥️💥 RUNTIME_ERROR in process_incoming_data: RuntimeError('Cannot call "receive" once a disconnect message has been received.')
42:18.34 uvicorn.er INFO connection closed
42:18.34 audio_in INFO 👂🚫 Audio processing task cancelled.
42:18.34 audio_in INFO 👂⏹️ Audio chunk processing loop finished.
42:18.34 server INFO 🖥️🧹 Cleaning up WebSocket tasks...
42:18.34 server INFO 🖥️❌ WebSocket session ended.
42:18.44 uvicorn.er INFO Waiting for application shutdown.
42:18.44 server INFO 🖥️⏹️ Server shutting down
42:18.44 audio_in INFO 👂🛑 Shutting down AudioInputProcessor...
42:18.44 audio_in INFO 👂🛑 Signaling TranscriptionProcessor to shut down.
42:18.44 transcribe INFO 👂🔌 Shutting down TranscriptionProcessor...
42:18.44 transcribe INFO 👂🔌 Calling recorder shutdown()...
RealtimeSTT shutting down
42:19.83 transcribe INFO 👂🔌 Recorder shutdown() method completed.
42:19.83 transcribe INFO 👂🔌 TranscriptionProcessor shutdown process finished.
42:19.83 audio_in INFO 👂🚫 Cancelling background transcription task (Task-3)...
42:19.83 audio_in INFO 👂👋 AudioInputProcessor shutdown sequence initiated.
42:19.83 audio_in INFO 👂🚫 Transcription loop (Task-3) cancelled.
42:19.83 audio_in INFO 👂⏹️ Background transcription task (Task-3) finished.
42:19.83 uvicorn.er INFO Application shutdown complete.
42:19.83 uvicorn.er INFO Finished server process [52524]
```
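The timestamps help quantify the stall. With `post_speech_silence_duration` at 0.7 s, the gap between "Recording started" (41:34.18) and "Recording stopped" (41:53.47) is long for a short utterance like 'Yo buddy!'. Here is a small sketch that computes such gaps. It assumes the `MM:SS.ss` (minutes:seconds) format seen in the log and no rollover into the next hour. `to_seconds` and `gap` are hypothetical helper names.

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'MM:SS.ss' log timestamp (minutes:seconds.hundredths) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def gap(start: str, end: str) -> float:
    """Seconds between two timestamps, assuming both fall in the same hour."""
    return round(to_seconds(end) - to_seconds(start), 2)


# Timestamps copied from the small.en log.
print(gap("41:34.18", "41:53.47"))  # recording started -> recording stopped: 19.29
print(gap("41:53.48", "41:56.22"))  # user turn end -> final user text: 2.74
```

Running the same calculation on a base.en log would show whether the roughly 19 s recording window is specific to small.en.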