[Live] `run_live()`: Fire-and-forget tool calls cause duplicate model responses via orphaned function response reinjection #4902

## 🔴 Required Information

### Describe the Bug

When using `run_live()` (or `bidi_stream_query()`) with tools that the model calls after finishing speech ("fire-and-forget" pattern), the function response sent via `send_content()` arrives **after** `turnComplete` has already broken the `receive()` loop. On re-entry via the `while True` loop in `_receive_from_model()`, the orphaned response is consumed by the model as fresh input, triggering a complete duplicate response.

This affects **any tool** where the model has finished speaking before calling it — UI suggestion chips, session state updates, FAQ/search tools on short answers. The duplication rate correlates with how many zero-audio tool calls occur per turn: 100% on greeting turns with 3 tools, 47-80% on other tool-calling turns.

This is distinct from the multi-agent transfer duplication fixed in v1.20.0 (cf21ca3). That fix addressed `agent_transfer` tool calls. This covers **single-agent** fire-and-forget tool calls, which follow the same code path but are not covered by the existing guard.

Reproduces on `adk web` (Google's own dev server) — not specific to any custom infrastructure.

### Steps to Reproduce

1. Create a file `greeter/agent.py`:

```python
from google.adk.agents import Agent

def suggest_topics(topics: list[str]) -> dict:
    """Display follow-up topic suggestions to the user."""
    return {"displayed": topics}

root_agent = Agent(
    model="gemini-live-2.5-flash-native-audio",
    name="greeter",
    instruction=(
        "You are a friendly greeter. "
        "Greet the user warmly in 1-2 sentences. "
        "After greeting, call suggest_topics with 3 relevant follow-up topics."
    ),
    tools=[suggest_topics],
)
```

2. Create an empty `greeter/__init__.py`.

3. Start `adk web`:
```bash
export GOOGLE_GENAI_USE_VERTEXAI=True
export GOOGLE_CLOUD_PROJECT=<your-project>
export GOOGLE_CLOUD_LOCATION=us-central1
adk web . --port 9001 --no-reload
```

4. Open `http://127.0.0.1:9001` in browser, select **greeter** agent.

5. Click the **microphone button** and say "Hello". (Text input goes through `generateContent`, which does not support the live audio model.)

6. Observe two symptoms:
   - **Empty turns:** The model calls `suggest_topics`, an empty `turnComplete` fires (no audio), then a second `turnComplete` delivers the actual greeting. The user hears one greeting but the session contains two turn cycles.
   - **Lost tool results:** On subsequent turns, ask the model to search for topics. It may say "Sure, let me find some topics," call the tool, receive results, but then **never verbalize** the results — the user has to prompt again. The tool response is consumed by the feedback loop but produces no spoken output.

Alternatively, reproduce programmatically via `Runner.run_live()`:

```python
import asyncio
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.agents.live_request_queue import LiveRequestQueue, LiveRequest
from google import genai

def suggest_topics(topics: list[str]) -> dict:
    """Display follow-up topic suggestions to the user."""
    return {"displayed": topics}

agent = Agent(
    model="gemini-live-2.5-flash-native-audio",
    name="greeter",
    instruction="Greet the user warmly. After greeting, call suggest_topics with 3 topics.",
    tools=[suggest_topics],
)

session_service = InMemorySessionService()
runner = Runner(agent=agent, app_name="repro", session_service=session_service)

async def run():
    session = await session_service.create_session(app_name="repro", user_id="test")
    queue = LiveRequestQueue()
    queue.send(LiveRequest(content=genai.types.Content(
        role="user", parts=[genai.types.Part(text="Hello")]
    )))
    turn_completes = 0
    async for event in runner.run_live(
        user_id="test", session_id=session.id, live_request_queue=queue
    ):
        if getattr(event, "turn_complete", None) is not None:
            turn_completes += 1
            print(f"turnComplete #{turn_completes}")
            if turn_completes >= 2:
                queue.close()
                print("BUG: Duplicate response detected")
                return

asyncio.run(run())
```

### Expected Behavior

The model should respond **once** per user turn. Tool responses sent back to the model after `turnComplete` should not trigger a new model turn with duplicate content.

### Observed Behavior

The observable symptoms vary depending on whether the orphaned tool response triggers a spoken response or an empty turn:

1. **Duplicate spoken responses (common in programmatic/text input):** The model responds twice with substantially identical content — a full re-generation (34-70 audio chunks). Most visible in automated testing.

2. **Lost tool results (common in browser/mic input):** The model calls a tool, an empty `turnComplete` fires immediately (fire-and-forget), and the tool results are consumed by the feedback loop but produce no spoken follow-up. The user has to re-prompt. Example from a live mic session — user asks about topics, model says "Sure, let me search," calls `suggest_topics`, receives results, but never verbalizes them.

3. **Silent extra turns:** Empty `turnComplete` events (no audio, no transcription) appear between spoken turns, inflating turn counts and disrupting session state.

`adk web` session DB from a live microphone interaction shows the pattern clearly:

```
+     0ms  USER: "hello"
+ 29281ms  FUNCTION_CALL: suggest_topics
+ 29283ms  FUNCTION_RESPONSE: suggest_topics
+ 29287ms  TURN_COMPLETE                   ← empty, no audio (fire-and-forget)
+ 30699ms  TRANSCRIPTION: "Hello there! Welcome..."
+ 30703ms  TURN_COMPLETE                   ← actual spoken response

... later in conversation:

+129273ms  FUNCTION_CALL: suggest_topics
+129275ms  FUNCTION_RESPONSE: suggest_topics
+129287ms  TURN_COMPLETE                   ← empty
+129405ms  TURN_COMPLETE                   ← empty again, results NEVER spoken
```
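
Empty turn cycles like the ones above can be flagged mechanically. This is a minimal sketch that assumes trace lines follow the `+<offset>ms  KIND: ...` format shown in this issue; `find_empty_turns` and `EVENT_LINE` are hypothetical helpers, not part of ADK. The function reports every `TURN_COMPLETE` that has no transcription since the previous turn boundary.

```python
import re

# Hypothetical pattern for lines such as "+ 29287ms  TURN_COMPLETE" or "+129405ms  TURN_COMPLETE #2".
EVENT_LINE = re.compile(r"^\+\s*(\d+)ms\s+([A-Z_]+)")


def find_empty_turns(trace: str) -> list[int]:
    """Return offsets (ms) of TURN_COMPLETE events with no transcription
    since the previous TURN_COMPLETE (or the start of the trace)."""
    empty: list[int] = []
    spoke = False
    for raw in trace.splitlines():
        match = EVENT_LINE.match(raw.strip())
        if match is None:
            continue  # skip annotations such as "... later in conversation:"
        offset_ms, kind = int(match.group(1)), match.group(2)
        if kind.startswith("TRANSCRIPTION"):
            spoke = True
        elif kind == "TURN_COMPLETE":
            if not spoke:
                empty.append(offset_ms)
            spoke = False  # a new turn cycle starts after every turnComplete
    return empty


SAMPLE = """
+     0ms  USER: "hello"
+ 29281ms  FUNCTION_CALL: suggest_topics
+ 29283ms  FUNCTION_RESPONSE: suggest_topics
+ 29287ms  TURN_COMPLETE
+ 30699ms  TRANSCRIPTION: "Hello there! Welcome..."
+ 30703ms  TURN_COMPLETE
+129273ms  FUNCTION_CALL: suggest_topics
+129275ms  FUNCTION_RESPONSE: suggest_topics
+129287ms  TURN_COMPLETE
+129405ms  TURN_COMPLETE
"""

print(find_empty_turns(SAMPLE))  # [29287, 129287, 129405]
```

The three offsets match the three turn cycles annotated as empty in the session DB excerpt.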

Traced event sequence from the programmatic minimal repro:

```
+1164ms  FUNCTION_CALL: suggest_topics
+1164ms  FUNCTION_RESPONSE: suggest_topics
+1174ms  TURN_COMPLETE #1          ← fire-and-forget (10ms after tool call)
+6168ms  TRANSCRIPTION: "Hello! It's great to chat with you..."
+6170ms  TURN_COMPLETE #2          ← DUPLICATE (5s later, full greeting)
```

Sub-millisecond trace data from 5 instrumented trials on a production agent (3 tools):

| Trial | `turnComplete` (ms) | `tool_response_sent` (ms) | Delta (sent − TC) | Outcome |
|-------|---------------------|---------------------------|-------|---------|
| 0 | 455716.8 | 455718.2 | +1.4ms | Orphaned (TC first) |
| 1 | 467824.1 | 467821.7 | -2.4ms | Sent first, model ignores |
| 2 | 486189.5 | 486190.1 | +0.6ms | Orphaned (TC first) |
| 3 | 500995.8 | 500997.8 | +2.0ms | Orphaned (TC first) |
| 4 | 515989.3 | 515989.8 | +0.5ms | Orphaned (TC first) |
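
The Delta column is `tool_response_sent` minus `turnComplete`, so a positive value means the tool response left after `turnComplete` had already ended the `receive()` cycle. The small sketch below recomputes the column and the orphan count from the table values; the `Trial` class is illustrative only and does not correspond to any ADK type.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Illustrative record for one row of the timing table above."""
    turn_complete_ms: float
    tool_response_sent_ms: float

    @property
    def delta_ms(self) -> float:
        # Positive: the response was sent after turnComplete (orphaned).
        return round(self.tool_response_sent_ms - self.turn_complete_ms, 1)

    @property
    def orphaned(self) -> bool:
        return self.delta_ms > 0


trials = [
    Trial(455716.8, 455718.2),
    Trial(467824.1, 467821.7),
    Trial(486189.5, 486190.1),
    Trial(500995.8, 500997.8),
    Trial(515989.3, 515989.8),
]

print([t.delta_ms for t in trials])        # [1.4, -2.4, 0.6, 2.0, 0.5]
print(sum(t.orphaned for t in trials))     # 4
```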

### Environment Details

- **ADK Library Version:** 1.27.1 (also reproduced on 1.24.1)
- **Desktop OS:** macOS 15.4 (also reproduced on Debian 12 / Cloud Run)
- **Python Version:** 3.13.0 (also tested 3.12)
- **genai SDK:** google-genai 1.14.0

### Model Information

- **Are you using LiteLLM:** No
- **Which model:** `gemini-live-2.5-flash-native-audio` (via Vertex AI, `GOOGLE_GENAI_USE_VERTEXAI=True`)

---

## 🟡 Optional Information

### Regression

This affects ADK 1.24.1 through 1.27.1 (all versions tested). The v1.20.0 fix (cf21ca3) addressed multi-agent transfer duplication but did NOT cover this single-agent fire-and-forget pattern.

| ADK Version | Greeting duplication rate (3 tools) |
|-------------|-------------------------------------|
| 1.24.1 | 95% (50 trials) |
| 1.27.1 | 100% (10 trials) |

### How often has this issue occurred?

**Always (100%)** on greeting turns with post-speech tool calls. 47-80% on other tool-calling turns depending on audio duration.

### Minimal Reproduction Code

See Steps to Reproduce above — both `adk web` browser and programmatic `Runner.run_live()` reproductions included.

### Logs

Traced output from the programmatic reproduction:

```
ADK Duplicate Response — Minimal Reproduction
  Model:      gemini-live-2.5-flash-native-audio
  Iterations: 5
  ADK:        1.27.1

      +   1164ms  FUNCTION_CALL: suggest_topics
      +   1164ms  FUNCTION_RESPONSE: suggest_topics
      +   1174ms  TURN_COMPLETE #1
      +   6168ms  TRANSCRIPTION_FINISHED: Hello! It's great to chat with you. How can I help you today?
      +   6170ms  TURN_COMPLETE #2
  Trial 0: DUPE (tc=2)
      +   1155ms  FUNCTION_CALL: suggest_topics
      +   1155ms  FUNCTION_RESPONSE: suggest_topics
      +   1156ms  TURN_COMPLETE #1
      +   2806ms  TRANSCRIPTION_FINISHED: Hello! I'm the greeter agent...
      +   2808ms  TURN_COMPLETE #2
  Trial 1: DUPE (tc=2)

  BUG CONFIRMED: 5/5 trials produced duplicate responses.
```

Session DB from `adk web` live microphone interaction (ADK's own SQLite storage):

```
+     0ms  USER: hello
+ 29281ms  FUNCTION_CALL: suggest_topics
+ 29283ms  FUNCTION_RESPONSE: suggest_topics
+ 29287ms  TURN_COMPLETE              ← empty, fire-and-forget (no audio)
+ 30699ms  TRANSCRIPTION: "Hello there! Welcome..."
+ 30703ms  TURN_COMPLETE              ← actual spoken response (2nd turn cycle)
```

### Additional Context

#### Root cause analysis

The interaction between two code sections in `base_llm_flow.py` creates a feedback cycle:

**Function response reinjection (lines ~536-543):**

```python
yield event
# send back the function response to models
if event.get_function_responses():
    invocation_context.live_request_queue.send_content(event.content)
```

The `yield` suspends the generator. While suspended, `turnComplete` arrives and is buffered. When execution resumes, `send_content()` fires but `receive()` has already exited.
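
The suspension point can be illustrated with a toy async generator. `postprocess` and `receive_loop` are hypothetical stand-ins, not ADK code, and the `"turn_complete buffered"` entry simulates the `turnComplete` that arrives while the generator is parked at `yield`. The resulting log shows `send_content` running only after that point. As the Option A test below shows, reordering alone does not fix the bug, so this sketch illustrates the ordering only, not the full race.

```python
import asyncio


async def postprocess(events: list[str], log: list[str]):
    """Toy stand-in for the reinjection block: yield the event first, send it back after."""
    for event in events:
        yield event
        # Execution resumes here only when the consumer asks for the next item.
        if event == "function_response":
            log.append("send_content")


async def receive_loop() -> list[str]:
    log: list[str] = []
    async for event in postprocess(["function_call", "function_response"], log):
        log.append(f"yielded {event}")
        if event == "function_response":
            # Simulated: while the generator is suspended at `yield`,
            # the model's turnComplete arrives and is buffered.
            log.append("turn_complete buffered")
    return log


log = asyncio.run(receive_loop())
print(log)
# ['yielded function_call', 'yielded function_response', 'turn_complete buffered', 'send_content']
```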

**The while-True loop (lines ~700+):**

```python
while True:
    async with llm_connection.receive() as resp:
        async for event in self._postprocess_live(...):
            yield event
    await asyncio.sleep(0)
```

After `receive()` returns on `turnComplete`, the loop re-enters `receive()`. The orphaned tool response lands in this new cycle as fresh input.
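
The feedback cycle can be reproduced in a toy asyncio model with no ADK or network involved. `ToyLiveModel`, `receive_cycle`, and `run` are hypothetical names; the toy model simply treats any message that arrives after its `turnComplete` as a new turn, which is the server behavior this issue infers from the traces. With `reenter=True` (a second pass of the while-True loop) the client surfaces two `turnComplete` events; with a single pass it surfaces one. The toy only counts what the client reads and does not model what the real server does with the orphaned response.

```python
import asyncio


class ToyLiveModel:
    """Hypothetical stand-in for a Live API session (not ADK or google-genai code).

    Every message that arrives while the model is idle starts a new turn.
    """

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()   # client -> model
        self.outbox: asyncio.Queue = asyncio.Queue()  # model -> client

    async def serve(self) -> None:
        while (msg := await self.inbox.get()) is not None:
            if msg == "user: hello":
                # Fire-and-forget: tool call and turnComplete, no audio.
                await self.outbox.put("function_call")
                await self.outbox.put("turn_complete")
            else:
                # The orphaned tool response is treated as fresh input.
                await self.outbox.put("audio")
                await self.outbox.put("turn_complete")


async def receive_cycle(model: ToyLiveModel, trace: list[str]) -> None:
    """One receive() pass: read events until turn_complete."""
    while True:
        event = await model.outbox.get()
        trace.append(event)
        if event == "function_call":
            # The tool result is sent back, but the model has already ended its turn.
            await model.inbox.put("function_response")
        elif event == "turn_complete":
            return


async def run(reenter: bool) -> int:
    model = ToyLiveModel()
    server = asyncio.create_task(model.serve())
    trace: list[str] = []
    await model.inbox.put("user: hello")
    await receive_cycle(model, trace)
    if reenter:
        # Next pass of the while-True loop picks up the reaction to the orphan.
        await receive_cycle(model, trace)
    await model.inbox.put(None)  # stop the toy server
    await server
    return trace.count("turn_complete")


print(asyncio.run(run(reenter=True)))   # 2
print(asyncio.run(run(reenter=False)))  # 1
```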

**Why fire-and-forget happens:** Tools are declared with the default `BLOCKING` behavior, but when the model has nothing to say alongside the tool call, it emits the tool call and a `turnComplete` with no audio and an empty transcription at effectively the same time. The model treats the tool as a side effect and does not wait for its response.

**Audio duration correlation:** When the model is still streaming audio while calling a tool, the response has time to arrive before `turnComplete`. When the model has 0 remaining audio, the 0-2ms window is too narrow.

| Scenario | Audio chunks | Response arrives before TC? | Duplicate? |
|----------|-------------|----------------------------|-----------|
| Tool called mid-speech | 52 chunks (~3s) | Yes | No |
| Tool called post-speech | 0 chunks | No (0-2ms race) | **Yes** |

#### Isolation testing

| Layer | Duplicates | Trials |
|-------|-----------|--------|
| Direct Gemini API (no ADK) | **0%** | 40 |
| ADK with while-True loop removed | **0%** | 275 |
| ADK unmodified (1.27.1) | 44% overall, 100% on greeting | 275 |
| `adk web` (Google's server) | **100%** on greeting | 10 |

#### Proposed fixes

**Option A: Send function response before yielding — TESTED, does NOT work**

We tested moving `send_content()` before `yield` (10 trials). Result: 10/10 still duplicate. The race is not between `send_content` and `yield` — it's between the model sending `turnComplete` and ADK processing the tool call. By the time ADK sees the `FUNCTION_CALL`, the `turnComplete` is already buffered on the WebSocket. `send_content()` only enqueues to the `LiveRequestQueue`; it doesn't prevent the already-buffered `turnComplete` from being read by `receive()`.

```python
# Tested (still races — turnComplete already buffered before send_content runs):
if event.get_function_responses():
    invocation_context.live_request_queue.send_content(event.content)
yield event
# Result: 10/10 DUPE
```

**Option B:** Drain the WebSocket receive buffer before re-entering the while-True loop. Ensure any pending `send_content` responses are delivered and acknowledged before `receive()` reads the next message.

**Option C:** Extend the v1.20.0 guard (cf21ca3) to cover fire-and-forget tool calls (not just agent transfers). Detect when a cycle had function responses + 0 audio output, and suppress re-entry.
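
A minimal sketch of the Option C detection step, assuming per-cycle counters are available inside the flow. `CycleStats` and its methods are hypothetical and not part of ADK; they only encode the heuristic described above (function responses sent plus zero audio in the same cycle). How the flow acts on the signal, for example by suppressing re-entry, is left open.

```python
from dataclasses import dataclass


@dataclass
class CycleStats:
    """Hypothetical per-receive()-cycle bookkeeping (not an ADK class)."""
    function_responses_sent: int = 0
    audio_chunks: int = 0

    def record(self, *, function_response: bool = False, audio: bool = False) -> None:
        if function_response:
            self.function_responses_sent += 1
        if audio:
            self.audio_chunks += 1

    def is_fire_and_forget(self) -> bool:
        # Function responses went back to the model, but the model produced no
        # audio in this cycle: the responses likely landed after turnComplete.
        return self.function_responses_sent > 0 and self.audio_chunks == 0


# The two scenarios from the audio-duration table above.
mid_speech = CycleStats(function_responses_sent=1, audio_chunks=52)
post_speech = CycleStats(function_responses_sent=1, audio_chunks=0)
print(mid_speech.is_fire_and_forget(), post_speech.is_fire_and_forget())  # False True
```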

**Option D (validated): Remove the while-True loop entirely**

We monkey-patched `_receive_from_model()` to process one `receive()` cycle and return (no `while True` re-entry). Result: **0% duplication across 275 trials** (10 conversation patterns, 4 tiers, LLM-judged). Trade-off: +11pp no-response rate (the loop was implicitly retrying on silence), mitigable with client-side retry.
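
Option D's trade-off (a higher no-response rate) suggests pairing a single-cycle receive with a bounded client-side retry. The sketch below is a hypothetical client-side helper: `receive_once` stands for one `receive()` cycle without re-entry and `resend_user_turn` for re-sending the last user input; neither is an ADK API, and events are plain strings for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def run_turn_with_retry(
    receive_once: Callable[[], Awaitable[list[str]]],
    resend_user_turn: Callable[[], Awaitable[None]],
    max_retries: int = 1,
) -> list[str]:
    """Run one receive() cycle; if it produced no audio, re-send and retry."""
    events = await receive_once()
    for _ in range(max_retries):
        if "audio" in events:
            break
        # Silent cycle: replace the loop's implicit retry with an explicit, bounded one.
        await resend_user_turn()
        events = await receive_once()
    return events


async def demo() -> tuple[list[str], int]:
    # Fake cycles: the first is silent, the second carries audio.
    cycles = [["turn_complete"], ["audio", "turn_complete"]]
    resends = 0

    async def receive_once() -> list[str]:
        return cycles.pop(0)

    async def resend_user_turn() -> None:
        nonlocal resends
        resends += 1

    events = await run_turn_with_retry(receive_once, resend_user_turn, max_retries=2)
    return events, resends


print(asyncio.run(demo()))  # (['audio', 'turn_complete'], 1)
```

Bounding the retries keeps a persistently silent model from turning the mitigation into a new loop.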

Happy to contribute a PR if the team confirms the preferred direction.

### Related Issues

- **#3395** — Multiple responses after agent transfer + session resumption (multi-agent variant)
- **#930** — Single input triggers duplicate LLM processing cycle (closed as logging-only)
- **#3697** — Streaming content duplication in tool call responses
- **#2215** — ADK removes events between function responses
- **v1.20.0 (cf21ca3)** — Multi-agent transfer duplication fix; single-agent NOT covered
- **v1.22.0 (e32f017)** — Orphaned function responses from ContextFilterPlugin; different mechanism
- **[python-genai #2117](https://github.com/googleapis/python-genai/issues/2117)** — Premature turnComplete (P2, ~40 devs, 8 months open)
- **[LiveKit #4554](https://github.com/livekit/agents/issues/4554)** — Gemini Live speaks twice after function calls
- **[Pipecat #1564](https://github.com/pipecat-ai/pipecat/issues/1564)** — Long-running function calls break tool response processing
- **[Google AI Forum](https://discuss.ai.google.dev/t/scheduling-silent-in-non-blocking-function-response-not-preventing-duplicate-audio-generation/114361)** — SILENT scheduling doesn't prevent duplicate audio

