Why
The ReAct loop in repi/investigation/react_loop.py runs strictly one tool call per LLM turn. Investigations that need to inspect several services or time windows pay the latency and token cost of N full ReAct iterations for what is conceptually a single fan-out. All major providers (OpenAI, Anthropic, Mistral, Gemini) support parallel function / tool calls in a single assistant turn: the model emits multiple tool calls and the runtime dispatches them concurrently.
Scope (in)
Update the ReAct loop to recognise multiple tool calls in a single assistant message (the providers' adapters already return tool-call arrays where supported).
Dispatch those calls concurrently with asyncio.gather, respecting the configured rate limit (see #13).
Persist each tool call + observation as its own step row in investigation_steps so the audit trail stays granular.
Update the system prompt to tell the model it may batch independent calls in one turn (e.g. inspect three services for the same window).
Keep single-call behaviour as the default for providers / models that don't support batching.
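A minimal sketch of how one assistant turn's tool calls could be fanned out while still yielding one step record per call. ToolCall, StepRecord, dispatch_tool_calls and max_concurrency are hypothetical names for illustration, not the existing repi types, and the semaphore only stands in for whatever rate-limit setting #13 introduces.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

# Hypothetical shapes: the real adapter and store types in repi may differ.
@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]

@dataclass
class StepRecord:
    turn: int
    tool_call_id: str
    tool_name: str
    observation: str

ToolFn = Callable[..., Awaitable[str]]

async def dispatch_tool_calls(
    turn: int,
    tool_calls: list[ToolCall],
    tools: dict[str, ToolFn],
    max_concurrency: int = 4,  # assumed knob; stands in for the configured limit
) -> list[StepRecord]:
    """Run every tool call from one assistant turn; return one record per call."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def run_one(call: ToolCall) -> StepRecord:
        async with limiter:
            try:
                observation = await tools[call.name](**call.arguments)
            except Exception as exc:
                # A failed or unknown tool becomes an observation, not a crash.
                observation = f"error: {exc!r}"
        return StepRecord(turn, call.id, call.name, observation)

    # gather returns results in input order, so records line up with tool_calls.
    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))
```

A turn with a single tool call is just a list of length one, so providers or models that never batch would take the same path and keep today's behaviour.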
Scope (out)
Cross-turn dependency resolution (the model still serialises calls when one depends on another's output — that's a model decision, not loop behaviour).
Provider adapter changes beyond surfacing the tool-call array.
Acceptance
A query like "compare error rates across api-gateway, auth-service, payments in the last hour" finishes in fewer ReAct iterations than today, e.g. one batched turn for the three services instead of three sequential ones.
The persisted investigation_steps table still has one row per tool call (not per turn), so the UI's step view is unchanged.
Existing single-call behaviour passes the eval harness without regression.
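The first two criteria could be checked with a scripted model, roughly as sketched below. Turn, fake_tool and run_loop are hypothetical stand-ins written for this example; a real test in tests/investigation/test_react_loop.py would drive the actual loop and store instead.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins; the real loop, adapter, and store live in repi.
@dataclass
class Turn:
    tool_calls: list[tuple[str, dict]] = field(default_factory=list)
    answer: Optional[str] = None

async def fake_tool(service: str) -> str:
    return f"{service}: 0.1% errors"

async def run_loop(turns: list[Turn]) -> tuple[int, list[dict], Optional[str]]:
    """Toy loop: one iteration per assistant turn, one step row per tool call."""
    steps: list[dict] = []
    for iteration, turn in enumerate(turns, start=1):
        if turn.answer is not None:
            return iteration, steps, turn.answer
        observations = await asyncio.gather(
            *(fake_tool(**args) for _, args in turn.tool_calls)
        )
        for (name, args), obs in zip(turn.tool_calls, observations):
            steps.append(
                {"iteration": iteration, "tool": name, "args": args, "observation": obs}
            )
    return len(turns), steps, None

SERVICES = ["api-gateway", "auth-service", "payments"]

# Scripted model output: one batched turn versus one call per turn.
batched = [
    Turn(tool_calls=[("error_rate", {"service": s}) for s in SERVICES]),
    Turn(answer="done"),
]
serial = [Turn(tool_calls=[("error_rate", {"service": s})]) for s in SERVICES]
serial.append(Turn(answer="done"))

batched_iters, batched_steps, _ = asyncio.run(run_loop(batched))
serial_iters, serial_steps, _ = asyncio.run(run_loop(serial))

assert batched_iters < serial_iters                  # fewer ReAct iterations
assert len(batched_steps) == len(serial_steps) == 3  # still one row per tool call
```

In this toy setup the batched script finishes in two iterations and the serial one in four, while both persist three step rows, which is the property the step view depends on.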
Files
repi/investigation/react_loop.py
repi/llm/adapters.py — confirm each provider surfaces the tool-call array.
repi/investigation/store.py — per-step persistence (likely no change).
tests/investigation/test_react_loop.py
Depends on
#13 (rate-limit configurability) — parallel calls amplify rate-limit pressure; do that one first or together.