fix: Improve voice agent first reply latency #54

Open
jalmari-sor wants to merge 1 commit into main from fix/improve-voice-latency

Conversation

@jalmari-sor
Contributor

closes #

Checklist

  • I have performed a self-review of my code
  • If it is a core feature and/or a bugfix, I have added thorough tests.
  • Does this PR introduce a breaking change? 🚨

# Rename prewarm entries from the temporary key to the real call_sid
# so the rest of the pipeline (MCP check, twilio_stream, _end_session)
# can look up the session by call_sid as normal.
if prewarm_key in self._active_sessions:
Contributor

Bug: race condition — prewarm session may not be re-keyed in time

_prewarm_audio_session runs concurrently and writes to _active_sessions[prewarm_key] only after its own await chain completes. This re-key check executes right after make_outbound_call() returns — with no synchronisation between the two. When the task has not yet registered the session, the check is False and _active_sessions[call_sid] is never populated.

When twilio_stream later calls _start_audio_session(call_sid), the session is not found, a second cold-start session is created, and the prewarmed session under prewarm_key is never cleaned up (leaking a live Gemini WebSocket connection).

Fix: eliminate the rename entirely. Create the _prewarm_audio_session task only after make_outbound_call() returns, pass it call_sid, and register the session directly under call_sid. Prewarming still overlaps with the time it takes for the call to be answered, but there is no temporary key left to rename or leak.
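
A minimal sketch of that ordering, not the PR's actual code. The class name `VoiceAgent`, the `start_call` method, the `_prewarm_tasks` dict, the placeholder SID, and the `asyncio.sleep` stand-ins for the Twilio and Gemini calls are all assumptions made for illustration. Awaiting the stored task inside `_start_audio_session` goes one step beyond the suggestion above; it is one possible way to cover the case where the stream connects before prewarming has finished.

```python
import asyncio


class VoiceAgent:
    """Hypothetical stand-in for the class under review."""

    def __init__(self) -> None:
        self._active_sessions: dict[str, dict] = {}
        # Assumption: in-flight prewarm tasks are tracked per call_sid.
        self._prewarm_tasks: dict[str, asyncio.Task] = {}

    async def make_outbound_call(self, to: str) -> str:
        await asyncio.sleep(0)  # stands in for the real outbound-call request
        return "CA123"  # placeholder call_sid

    async def _prewarm_audio_session(self, call_sid: str) -> None:
        await asyncio.sleep(0.01)  # stands in for opening the model connection
        # Keyed under the real call_sid from the start: nothing to rename.
        self._active_sessions[call_sid] = {"prewarmed": True}

    async def start_call(self, to: str) -> str:
        # call_sid is known before the prewarm task exists.
        call_sid = await self.make_outbound_call(to)
        self._prewarm_tasks[call_sid] = asyncio.create_task(
            self._prewarm_audio_session(call_sid)
        )
        return call_sid

    async def _start_audio_session(self, call_sid: str) -> dict:
        # If prewarming is still running, wait for it instead of cold-starting
        # a second session alongside it.
        task = self._prewarm_tasks.pop(call_sid, None)
        if task is not None:
            await task
        return self._active_sessions.setdefault(call_sid, {"prewarmed": False})
```

With this shape, a stream that arrives early waits on the same task instead of creating a duplicate session, and the only key ever written to `_active_sessions` is the real `call_sid`.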

# answered before or after prewarm completes (prewarmed path ignores
# init_config in _start_audio_session).
effective_prompt = system_prompt or DEFAULT_SYSTEM_PROMPT
if context_messages:
Contributor

Cleanup: context_messages → system_prompt merging is duplicated

The same '\n'.join(context_messages) + f-string pattern appears here (lines 701–703) and again in twilio_transport.py lines 159–162. Any change to the merge format (adding a header, escaping, trimming) must be applied in both places. Consider extracting a shared helper, e.g.:

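def _build_effective_prompt(base: str, context_messages: list[str] | None) -> str:
    """Merge optional context messages into the base system prompt."""
    if not context_messages:
        return base
    # Join outside the f-string: a backslash inside an f-string expression
    # is a SyntaxError before Python 3.12.
    context = "\n".join(context_messages)
    return f"{base}\n\nContext:\n{context}"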
