
[Live] run_live(): Fire-and-forget tool calls cause duplicate model responses via orphaned function response reinjection #4902

@smwitkowski

Description


🔴 Required Information

Describe the Bug

When using run_live() (or bidi_stream_query()) with tools that the model calls after finishing speech ("fire-and-forget" pattern), the function response sent via send_content() arrives after turnComplete has already broken the receive() loop. On re-entry via the while True loop in _receive_from_model(), the orphaned response is consumed by the model as fresh input, triggering a complete duplicate response.

This affects any tool where the model has finished speaking before calling it — UI suggestion chips, session state updates, FAQ/search tools on short answers. The duplication rate correlates with how many zero-audio tool calls occur per turn: 100% on greeting turns with 3 tools, 47-80% on other tool-calling turns.

This is distinct from the multi-agent transfer duplication fixed in v1.20.0 (cf21ca3). That fix addressed agent_transfer tool calls. This covers single-agent fire-and-forget tool calls, which follow the same code path but are not covered by the existing guard.

Reproduces on adk web (Google's own dev server) — not specific to any custom infrastructure.

Steps to Reproduce

  1. Create a file greeter/agent.py:
from google.adk.agents import Agent

def suggest_topics(topics: list[str]) -> dict:
    """Display follow-up topic suggestions to the user."""
    return {"displayed": topics}

root_agent = Agent(
    model="gemini-live-2.5-flash-native-audio",
    name="greeter",
    instruction=(
        "You are a friendly greeter. "
        "Greet the user warmly in 1-2 sentences. "
        "After greeting, call suggest_topics with 3 relevant follow-up topics."
    ),
    tools=[suggest_topics],
)
  2. Create an empty greeter/__init__.py.

  3. Start adk web:

export GOOGLE_GENAI_USE_VERTEXAI=True
export GOOGLE_CLOUD_PROJECT=<your-project>
export GOOGLE_CLOUD_LOCATION=us-central1
adk web . --port 9001 --no-reload
  4. Open http://127.0.0.1:9001 in a browser and select the greeter agent.

  5. Click the microphone button and say "Hello". (Text input goes through generateContent, which does not support the live audio model.)

  6. Observe two symptoms:

    • Empty turns: The model calls suggest_topics, an empty turnComplete fires (no audio), then a second turnComplete delivers the actual greeting. The user hears one greeting but the session contains two turn cycles.
    • Lost tool results: On subsequent turns, ask the model to search for topics. It may say "Sure, let me find some topics," call the tool, receive results, but then never verbalize the results — the user has to prompt again. The tool response is consumed by the feedback loop but produces no spoken output.

Alternatively, reproduce programmatically via Runner.run_live():

import asyncio
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.agents.live_request_queue import LiveRequestQueue, LiveRequest
from google import genai

def suggest_topics(topics: list[str]) -> dict:
    """Display follow-up topic suggestions to the user."""
    return {"displayed": topics}

agent = Agent(
    model="gemini-live-2.5-flash-native-audio",
    name="greeter",
    instruction="Greet the user warmly. After greeting, call suggest_topics with 3 topics.",
    tools=[suggest_topics],
)

session_service = InMemorySessionService()
runner = Runner(agent=agent, app_name="repro", session_service=session_service)

async def run():
    session = await session_service.create_session(app_name="repro", user_id="test")
    queue = LiveRequestQueue()
    # A single text turn: the agent should greet once and then call suggest_topics.
    queue.send(LiveRequest(content=genai.types.Content(
        role="user", parts=[genai.types.Part(text="Hello")]
    )))
    turn_completes = 0
    async for event in runner.run_live(
        user_id="test", session_id=session.id, live_request_queue=queue
    ):
        # Expected: exactly one turnComplete for this one user turn.
        if getattr(event, "turn_complete", None) is not None:
            turn_completes += 1
            print(f"turnComplete #{turn_completes}")
            if turn_completes >= 2:
                # A second turnComplete for a single user turn is the duplicate described here.
                queue.close()
                print("BUG: Duplicate response detected")
                return

asyncio.run(run())

Expected Behavior

The model should respond once per user turn. Tool responses sent back to the model after turnComplete should not trigger a new model turn with duplicate content.

Observed Behavior

The observable symptoms vary depending on whether the orphaned tool response triggers a spoken response or an empty turn:

  1. Duplicate spoken responses (common in programmatic/text input): The model responds twice with substantially identical content — a full re-generation (34-70 audio chunks). Most visible in automated testing.

  2. Lost tool results (common in browser/mic input): The model calls a tool, an empty turnComplete fires immediately (fire-and-forget), and the tool results are consumed by the feedback loop but produce no spoken follow-up. The user has to re-prompt. Example from a live mic session — user asks about topics, model says "Sure, let me search," calls suggest_topics, receives results, but never verbalizes them.

  3. Silent extra turns: Empty turnComplete events (no audio, no transcription) appear between spoken turns, inflating turn counts and disrupting session state.

The adk web session DB from a live microphone interaction shows the pattern clearly:

+     0ms  USER: "hello"
+ 29281ms  FUNCTION_CALL: suggest_topics
+ 29283ms  FUNCTION_RESPONSE: suggest_topics
+ 29287ms  TURN_COMPLETE                   ← empty, no audio (fire-and-forget)
+ 30699ms  TRANSCRIPTION: "Hello there! Welcome..."
+ 30703ms  TURN_COMPLETE                   ← actual spoken response

... later in conversation:

+129273ms  FUNCTION_CALL: suggest_topics
+129275ms  FUNCTION_RESPONSE: suggest_topics
+129287ms  TURN_COMPLETE                   ← empty
+129405ms  TURN_COMPLETE                   ← empty again, results NEVER spoken
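To spot this pattern in other sessions, one option is to split the event stream at each TURN_COMPLETE and label every resulting cycle. This is a minimal sketch, assuming the session events have already been exported into simple (offset, kind) records like the excerpts above; TraceEvent and classify_cycles are hypothetical helpers, not ADK APIs.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    """Hypothetical flattened session event: offset in ms plus a kind label."""
    t_ms: int
    kind: str  # USER, FUNCTION_CALL, FUNCTION_RESPONSE, TRANSCRIPTION, TURN_COMPLETE


def classify_cycles(events: list[TraceEvent]) -> list[str]:
    """Split the trace at each TURN_COMPLETE and label every resulting turn cycle."""
    labels, cycle = [], []
    for ev in events:
        cycle.append(ev)
        if ev.kind != "TURN_COMPLETE":
            continue
        kinds = {e.kind for e in cycle}
        if "TRANSCRIPTION" in kinds:
            labels.append("spoken")
        elif "FUNCTION_RESPONSE" in kinds:
            labels.append("empty-after-tool")  # fire-and-forget cycle
        else:
            labels.append("empty")
        cycle = []
    return labels


# The two excerpts above, concatenated (the elided middle of the session is skipped).
trace = [
    TraceEvent(0, "USER"),
    TraceEvent(29281, "FUNCTION_CALL"),
    TraceEvent(29283, "FUNCTION_RESPONSE"),
    TraceEvent(29287, "TURN_COMPLETE"),
    TraceEvent(30699, "TRANSCRIPTION"),
    TraceEvent(30703, "TURN_COMPLETE"),
    TraceEvent(129273, "FUNCTION_CALL"),
    TraceEvent(129275, "FUNCTION_RESPONSE"),
    TraceEvent(129287, "TURN_COMPLETE"),
    TraceEvent(129405, "TURN_COMPLETE"),
]
labels = classify_cycles(trace)
print(labels)
```

On these excerpts it prints ['empty-after-tool', 'spoken', 'empty-after-tool', 'empty']: the single "hello" produces two cycles, and the later tool call is followed by two cycles with no speech at all.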

Traced event sequence from the programmatic minimal repro:

+1164ms  FUNCTION_CALL: suggest_topics
+1164ms  FUNCTION_RESPONSE: suggest_topics
+1174ms  TURN_COMPLETE #1          ← fire-and-forget (10ms after tool call)
+6168ms  TRANSCRIPTION: "Hello! It's great to chat with you..."
+6170ms  TURN_COMPLETE #2          ← DUPLICATE (5s later, full greeting)

Sub-millisecond trace data from 5 instrumented trials on a production agent (3 tools):

| Trial | turnComplete (ms) | tool_response_sent (ms) | Delta (sent - TC) | Outcome |
|---|---|---|---|---|
| 0 | 455716.8 | 455718.2 | +1.4 ms | Orphaned (TC first) |
| 1 | 467824.1 | 467821.7 | -2.4 ms | Sent first, model ignores |
| 2 | 486189.5 | 486190.1 | +0.6 ms | Orphaned (TC first) |
| 3 | 500995.8 | 500997.8 | +2.0 ms | Orphaned (TC first) |
| 4 | 515989.3 | 515989.8 | +0.5 ms | Orphaned (TC first) |
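The Delta column is tool_response_sent minus turnComplete, so a positive value means turnComplete was read before the tool response went out. A small check that recomputes it from the published numbers (plain arithmetic on the table; classify is a hypothetical helper):

```python
# (trial, turnComplete ms, tool_response_sent ms), copied from the table above
TRIALS = [
    (0, 455716.8, 455718.2),
    (1, 467824.1, 467821.7),
    (2, 486189.5, 486190.1),
    (3, 500995.8, 500997.8),
    (4, 515989.3, 515989.8),
]


def classify(tc_ms: float, sent_ms: float) -> tuple[float, str]:
    """Positive delta: turnComplete was observed before the tool response was sent."""
    delta = round(sent_ms - tc_ms, 1)  # round away float noise at the table's 0.1 ms resolution
    return delta, ("orphaned" if delta > 0 else "sent first")


results = {trial: classify(tc, sent) for trial, tc, sent in TRIALS}
orphaned = sum(1 for _, outcome in results.values() if outcome == "orphaned")
print(results)
print(f"{orphaned}/{len(TRIALS)} orphaned")
```

Four of the five trials lose the race, by margins of 0.5 to 2.0 ms; in the remaining trial the response went out first but, per the table, the model ignored it anyway.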

Environment Details

  • ADK Library Version: 1.27.1 (also reproduced on 1.24.1)
  • Desktop OS: macOS 15.4 (also reproduced on Debian 12 / Cloud Run)
  • Python Version: 3.13.0 (also tested 3.12)
  • genai SDK: google-genai 1.14.0

Model Information

  • Are you using LiteLLM: No
  • Which model: gemini-live-2.5-flash-native-audio (via Vertex AI, GOOGLE_GENAI_USE_VERTEXAI=True)

🟡 Optional Information

Regression

This affects ADK 1.24.1 through 1.27.1 (all versions tested). The v1.20.0 fix (cf21ca3) addressed multi-agent transfer duplication but did NOT cover this single-agent fire-and-forget pattern.

| ADK version | Greeting duplication rate (3 tools) |
|---|---|
| 1.24.1 | 95% (50 trials) |
| 1.27.1 | 100% (10 trials) |

How often has this issue occurred?

Always (100%) on greeting turns with post-speech tool calls. 47-80% on other tool-calling turns depending on audio duration.

Minimal Reproduction Code

See Steps to Reproduce above — both adk web browser and programmatic Runner.run_live() reproductions included.

Logs

Traced output from the programmatic reproduction:

ADK Duplicate Response — Minimal Reproduction
  Model:      gemini-live-2.5-flash-native-audio
  Iterations: 5
  ADK:        1.27.1

      +   1164ms  FUNCTION_CALL: suggest_topics
      +   1164ms  FUNCTION_RESPONSE: suggest_topics
      +   1174ms  TURN_COMPLETE #1
      +   6168ms  TRANSCRIPTION_FINISHED: Hello! It's great to chat with you. How can I help you today?
      +   6170ms  TURN_COMPLETE #2
  Trial 0: DUPE (tc=2)
      +   1155ms  FUNCTION_CALL: suggest_topics
      +   1155ms  FUNCTION_RESPONSE: suggest_topics
      +   1156ms  TURN_COMPLETE #1
      +   2806ms  TRANSCRIPTION_FINISHED: Hello! I'm the greeter agent...
      +   2808ms  TURN_COMPLETE #2
  Trial 1: DUPE (tc=2)

  BUG CONFIRMED: 5/5 trials produced duplicate responses.

Session DB from adk web live microphone interaction (ADK's own SQLite storage):

+     0ms  USER: hello
+ 29281ms  FUNCTION_CALL: suggest_topics
+ 29283ms  FUNCTION_RESPONSE: suggest_topics
+ 29287ms  TURN_COMPLETE              ← empty, fire-and-forget (no audio)
+ 30699ms  TRANSCRIPTION: "Hello there! Welcome..."
+ 30703ms  TURN_COMPLETE              ← actual spoken response (2nd turn cycle)

Additional Context

Root cause analysis

The interaction between two code sections in base_llm_flow.py creates a feedback cycle:

Function response reinjection (lines ~536-543):

yield event
# send back the function response to models
if event.get_function_responses():
    invocation_context.live_request_queue.send_content(event.content)

The yield suspends the generator. While suspended, turnComplete arrives and is buffered. When execution resumes, send_content() fires but receive() has already exited.

The while-True loop (lines ~700+):

while True:
    async with llm_connection.receive() as resp:
        async for event in self._postprocess_live(...):
            yield event
    await asyncio.sleep(0)

After receive() returns on turnComplete, the loop re-enters receive(). The orphaned tool response lands in this new cycle as fresh input.
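The feedback cycle formed by these two sections can be reproduced without ADK or the Live API in a small asyncio model. This is a simplified sketch, not ADK code: fake_live_model, fake_receive and receive_from_model are hypothetical stand-ins that use plain strings instead of events and asyncio.Queue instead of the WebSocket and LiveRequestQueue. It assumes, as the traces suggest, that the server emits the tool call and an empty turnComplete back to back and treats any later input as the start of a new turn.

```python
import asyncio


async def fake_live_model(upstream: asyncio.Queue, downstream: asyncio.Queue) -> None:
    """Hypothetical stand-in for the Live API server: every input starts a new turn."""
    while True:
        msg = await upstream.get()
        if msg is None:  # shutdown signal for the demo
            return
        if msg == "user: Hello":
            # Post-speech tool call: the call and an empty turnComplete are emitted
            # back to back, so turnComplete is buffered before the client reacts.
            downstream.put_nowait("function_call")
            downstream.put_nowait("turn_complete")
        elif msg == "function_response":
            # The orphaned response arrives as fresh input and triggers a full turn.
            downstream.put_nowait("transcription: Hello! ...")
            downstream.put_nowait("turn_complete")


async def fake_receive(downstream: asyncio.Queue):
    """Stand-in for llm_connection.receive(): yields messages until turn_complete."""
    while True:
        msg = await downstream.get()
        yield msg
        if msg == "turn_complete":
            return


async def receive_from_model(upstream, downstream, max_cycles):
    """Same shape as the while-True loop, bounded so the demo terminates."""
    for _ in range(max_cycles):
        async for msg in fake_receive(downstream):
            yield msg
            # Reinjection runs only after the consumer resumes this generator
            # (ADK does it after yielding the function-response event; collapsed here).
            if msg == "function_call":
                upstream.put_nowait("function_response")
        await asyncio.sleep(0)


async def main() -> list:
    upstream, downstream = asyncio.Queue(), asyncio.Queue()
    model = asyncio.create_task(fake_live_model(upstream, downstream))
    upstream.put_nowait("user: Hello")
    seen = [msg async for msg in receive_from_model(upstream, downstream, max_cycles=2)]
    upstream.put_nowait(None)
    await model
    return seen


events = asyncio.run(main())
print(events)
```

One user input yields two turn_complete messages: the first cycle ends on the already-buffered empty turnComplete, and the second cycle is produced entirely by the re-injected function response, which is the same shape as the TURN_COMPLETE #1 / #2 traces above.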

Why fire-and-forget happens: the tools are registered as BLOCKING (the default), but when the model has nothing to say alongside the tool call, it emits turnComplete with 0 audio and an empty transcription at the same time. The model treats the tool as a side effect and does not wait for its result.

Audio duration correlation: When the model is still streaming audio while calling a tool, the response has time to arrive before turnComplete. When the model has 0 remaining audio, the 0-2ms window is too narrow.

| Scenario | Audio chunks | Response arrives before TC? | Duplicate? |
|---|---|---|---|
| Tool called mid-speech | 52 chunks (~3s) | Yes | No |
| Tool called post-speech | 0 chunks | No (0-2ms race) | Yes |
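The correlation can be written as a one-line race condition. This is a deliberately simplified model and an assumption, not measured server behavior: it takes the table's ~3 s for 52 chunks as the per-chunk time, assumes chunks are streamed at roughly playback speed, and assumes turnComplete follows the last chunk. MS_PER_CHUNK and response_beats_turn_complete are hypothetical names.

```python
# Assumed audio time per chunk, from the table: 52 chunks ~ 3 s (about 58 ms each).
MS_PER_CHUNK = 3000 / 52


def response_beats_turn_complete(remaining_chunks: int, response_latency_ms: float) -> bool:
    """Simplified model: turnComplete follows the last audio chunk, so the tool
    response wins only if it is sent before the remaining audio finishes streaming."""
    return response_latency_ms < remaining_chunks * MS_PER_CHUNK


# The traces above show roughly 2 ms between FUNCTION_CALL and FUNCTION_RESPONSE.
print(response_beats_turn_complete(52, 2.0))  # tool called mid-speech
print(response_beats_turn_complete(0, 2.0))   # tool called post-speech
```

With zero remaining chunks the window is zero, so any nonzero response latency loses, which matches the 100% duplication on greeting turns where every tool call comes after speech.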

Isolation testing

| Layer | Duplicates | Trials |
|---|---|---|
| Direct Gemini API (no ADK) | 0% | 40 |
| ADK with while-True loop removed | 0% | 275 |
| ADK unmodified (1.27.1) | 44% overall, 100% on greeting | 275 |
| adk web (Google's server) | 100% on greeting | 10 |

Proposed fixes

Option A: Send function response before yielding — TESTED, does NOT work

We tested moving send_content() before yield (10 trials). Result: 10/10 still duplicate. The race is not between send_content and yield — it's between the model sending turnComplete and ADK processing the tool call. By the time ADK sees the FUNCTION_CALL, the turnComplete is already buffered on the WebSocket. send_content() only enqueues to the LiveRequestQueue; it doesn't prevent the already-buffered turnComplete from being read by receive().

# Tested (still races — turnComplete already buffered before send_content runs):
if event.get_function_responses():
    invocation_context.live_request_queue.send_content(event.content)
yield event
# Result: 10/10 DUPE

Option B: Drain the WebSocket receive buffer before re-entering the while-True loop. Ensure any pending send_content responses are delivered and acknowledged before receive() reads the next message.
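Option B leaves the exact mechanism open. One possible reading is sketched here with asyncio.Queue stand-ins for the LiveRequestQueue (outgoing) and the WebSocket receive buffer (incoming). flush_then_drain and the sender task are hypothetical, and what to do with the drained messages (for example, dropping an empty turnComplete) is left to the caller.

```python
import asyncio
import contextlib


async def flush_then_drain(outgoing: asyncio.Queue, incoming: asyncio.Queue) -> list:
    """Hypothetical Option B step, run before re-entering receive():
    1. wait until every queued outgoing message (e.g. a function response)
       has been taken and marked done by the sender task;
    2. collect whatever the server had already buffered, without waiting."""
    await outgoing.join()
    stale = []
    while True:
        try:
            stale.append(incoming.get_nowait())
        except asyncio.QueueEmpty:
            return stale


async def demo():
    outgoing, incoming = asyncio.Queue(), asyncio.Queue()
    sent = []

    async def sender():
        # Stand-in for the task that forwards LiveRequestQueue items to the socket.
        while True:
            item = await outgoing.get()
            sent.append(item)
            outgoing.task_done()

    task = asyncio.create_task(sender())
    outgoing.put_nowait("function_response")
    incoming.put_nowait("turn_complete")  # already buffered, as in the traces
    stale = await flush_then_drain(outgoing, incoming)
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return sent, stale


sent, stale = asyncio.run(demo())
print(sent, stale)
```

The function response is confirmed as handed off before anything else happens, and the already-buffered turnComplete comes back in the drained list, so the caller sees it before starting the next receive() cycle.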

Option C: Extend the v1.20.0 guard (cf21ca3) to cover fire-and-forget tool calls (not just agent transfers). Detect when a cycle had function responses + 0 audio output, and suppress re-entry.
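The detection half of Option C can be expressed as a per-cycle predicate. A minimal sketch, assuming the flow can count function responses and audio chunks within one receive() cycle; CycleStats, stats_for_cycle and is_fire_and_forget are hypothetical names, and what the guard does once the predicate fires (suppressing re-entry, per the proposal) is not shown.

```python
from dataclasses import dataclass


@dataclass
class CycleStats:
    """Hypothetical per-receive()-cycle counters (not an ADK type)."""
    function_responses: int = 0
    audio_chunks: int = 0


def stats_for_cycle(event_kinds: list[str]) -> CycleStats:
    """Accumulate counters from the event kinds seen in one receive() cycle."""
    stats = CycleStats()
    for kind in event_kinds:
        if kind == "function_response":
            stats.function_responses += 1
        elif kind == "audio_chunk":
            stats.audio_chunks += 1
    return stats


def is_fire_and_forget(stats: CycleStats) -> bool:
    """Option C condition: function responses were sent but no audio was produced."""
    return stats.function_responses > 0 and stats.audio_chunks == 0


post_speech = stats_for_cycle(["function_call", "function_response", "turn_complete"])
mid_speech = stats_for_cycle(
    ["audio_chunk"] * 52 + ["function_call", "function_response", "turn_complete"]
)
print(is_fire_and_forget(post_speech), is_fire_and_forget(mid_speech))
```

Keying on "function responses and zero audio" rather than on a specific tool name is what would let the guard cover fire-and-forget calls in general, not only agent transfers.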

Option D (validated): Remove the while-True loop entirely

We monkey-patched _receive_from_model() to process one receive() cycle and return (no while True re-entry). Result: 0% duplication across 275 trials (10 conversation patterns, 4 tiers, LLM-judged). Trade-off: +11pp no-response rate (the loop was implicitly retrying on silence), mitigable with client-side retry.
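The no-response trade-off of Option D is meant to be handled on the client. A sketch of such a retry, assuming the client can run one receive() cycle at a time and re-send the last user input; respond_with_retry and both callbacks are hypothetical, and "no response" is approximated as a cycle without any transcription.

```python
import asyncio
from typing import Awaitable, Callable


async def respond_with_retry(
    receive_one_cycle: Callable[[], Awaitable[list[str]]],
    resend_last_input: Callable[[], Awaitable[None]],
    max_attempts: int = 2,
) -> list[str]:
    """Run single receive() cycles (the Option D shape). If a cycle produced no
    transcription, re-send the last user input and try again."""
    for attempt in range(max_attempts):
        events = await receive_one_cycle()
        if any(e.startswith("transcription") for e in events):
            return events
        if attempt + 1 < max_attempts:
            await resend_last_input()
    return []  # still silent after max_attempts cycles


async def demo():
    # Scripted cycles: the first is silent, the second carries speech.
    cycles = iter([["turn_complete"], ["transcription: Hello!", "turn_complete"]])
    resent = []

    async def receive_one_cycle():
        return next(cycles)

    async def resend_last_input():
        resent.append("Hello")

    events = await respond_with_retry(receive_one_cycle, resend_last_input)
    return events, resent


events, resent = asyncio.run(demo())
print(events, resent)
```

max_attempts is kept small on purpose: re-sending input on a session where the model was merely slow could itself produce a second response, which is the symptom being removed.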

Happy to contribute a PR if the team confirms the preferred direction.

Related Issues


Labels

live [Component]: This issue is related to live, voice and video chat
