Why
A frequent investigation move is "compare what these two services emitted in the same window" — for example, did the auth service start erroring before or after the gateway? Today the LLM has to make 3–4 separate tool calls (get_service_summary for each, search_logs per service, then mental diffing) and the comparison happens implicitly in the model's reasoning, which is slow and easy to get wrong.
A dedicated tool returns a side-by-side diff in one call: per-service error/warning counts, top-N repeating signatures, first/last timestamp inside the window, and the signatures that appear in one service but not the other (the actual interesting signal).
Scope (in)
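- New tool compare_services(s1: str, s2: str, time_from: str, time_to: str, project_id: uuid|None) -> dict in repi/investigation/tools.py.
- Returns:
  {
    "s1": {
      "service": "...",
      "error_count": ...,
      "warning_count": ...,
      "first_ts": "...",
      "last_ts": "...",
      "top_signatures": [{ "signature": "...", "count": ... }, ...]
    },
    "s2": { ... same shape ... },
    "only_in_s1": [ "<signature>", ... ],
    "only_in_s2": [ "<signature>", ... ],
    "shared": [ "<signature>", ... ]
  }
- Signatures extracted with repi/retrieval/cluster_view.extract_signature (same util build_timeline uses).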
- Single SQL pass per service via aggregate query; no per-row Python looping.
- Schema entry in TOOL_SCHEMAS so the LLM discovers it.
- Mentioned in the ReAct system prompt as "prefer this over two separate get_service_summary calls when comparing services".
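A minimal sketch of how the single aggregate pass and the signature partitioning could fit together. Everything storage-related here is an assumption made for illustration, not the project's schema: a hypothetical logs(service, level, ts, signature) table with a precomputed signature column, ISO-8601 text timestamps, level values 'ERROR' and 'WARNING', a half-open window [time_from, time_to), and SQLite as the backend. The extra conn and top_n parameters are also hypothetical, and project_id filtering is left out for brevity.

```python
import sqlite3
from typing import Any, Optional

# Hypothetical table for this sketch: logs(service, level, ts, signature).
# ts is assumed to be ISO-8601 text, so string comparison orders correctly.
_AGGREGATE_SQL = """
    SELECT signature,
           COUNT(*)               AS n,
           SUM(level = 'ERROR')   AS errors,
           SUM(level = 'WARNING') AS warnings,
           MIN(ts)                AS first_ts,
           MAX(ts)                AS last_ts
    FROM logs
    WHERE service = ? AND ts >= ? AND ts < ?
    GROUP BY signature
    ORDER BY n DESC, signature
"""


def _summarize(conn: sqlite3.Connection, service: str,
               time_from: str, time_to: str, top_n: int):
    """Run one aggregate query for a service and fold the grouped rows."""
    groups = conn.execute(_AGGREGATE_SQL, (service, time_from, time_to)).fetchall()
    first: Optional[str] = min((g[4] for g in groups), default=None)
    last: Optional[str] = max((g[5] for g in groups), default=None)
    summary = {
        "service": service,
        "error_count": sum(g[2] or 0 for g in groups),
        "warning_count": sum(g[3] or 0 for g in groups),
        "first_ts": first,
        "last_ts": last,
        # Rows arrive ordered by count, so the head of the list is the top-N.
        "top_signatures": [{"signature": g[0], "count": g[1]} for g in groups[:top_n]],
    }
    signatures = {g[0] for g in groups if g[0] is not None}
    return summary, signatures


def compare_services(conn: sqlite3.Connection, s1: str, s2: str,
                     time_from: str, time_to: str, top_n: int = 5) -> dict[str, Any]:
    """Side-by-side summary plus the signature sets that differ or overlap."""
    summary1, sigs1 = _summarize(conn, s1, time_from, time_to, top_n)
    summary2, sigs2 = _summarize(conn, s2, time_from, time_to, top_n)
    return {
        "s1": summary1,
        "s2": summary2,
        "only_in_s1": sorted(sigs1 - sigs2),
        "only_in_s2": sorted(sigs2 - sigs1),
        "shared": sorted(sigs1 & sigs2),
    }
```

Grouping by signature lets one query per service feed the counts, the first/last timestamps, the top-N list, and the set partitioning; Python only iterates over grouped rows, never over individual log lines.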
Scope (out)
- More than two services (a future compare_many could come later; two-arg covers the most common case).
- Cross-project comparison.
Acceptance
- Tool returns a populated diff against a seeded eval dataset that has overlapping and divergent signatures.
- A scripted scenario that previously took 4+ tool calls finishes in 1.
- Unit test in tests/investigation/test_tools.py covers only_in_s1 / only_in_s2 / shared partitioning.
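The partitioning test could look roughly like the sketch below. It is written pytest-style with plain assert statements. partition_signatures is a hypothetical stand-in defined inline so the example is self-contained; the real test would presumably call compare_services against seeded data instead.

```python
from typing import Iterable


def partition_signatures(sigs_s1: Iterable[str], sigs_s2: Iterable[str]) -> dict:
    """Hypothetical helper: split signatures into s1-only, s2-only, and shared."""
    s1, s2 = set(sigs_s1), set(sigs_s2)
    return {
        "only_in_s1": sorted(s1 - s2),
        "only_in_s2": sorted(s2 - s1),
        "shared": sorted(s1 & s2),
    }


def test_overlapping_and_divergent_signatures():
    result = partition_signatures(
        ["db timeout", "slow query"],
        ["db timeout", "upstream 502"],
    )
    assert result["only_in_s1"] == ["slow query"]
    assert result["only_in_s2"] == ["upstream 502"]
    assert result["shared"] == ["db timeout"]


def test_identical_services_share_everything():
    result = partition_signatures(["a", "b"], ["b", "a"])
    assert result == {"only_in_s1": [], "only_in_s2": [], "shared": ["a", "b"]}


def test_one_side_empty():
    result = partition_signatures([], ["a"])
    assert result == {"only_in_s1": [], "only_in_s2": ["a"], "shared": []}
```

The three buckets are disjoint and together cover every signature seen in either service, which is the property these cases check from different angles.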
Files
repi/investigation/tools.py — add compare_services + register in TOOL_SCHEMAS.
repi/investigation/react_loop.py — dispatch table.
tests/investigation/test_tools.py
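For orientation, registration and dispatch might be wired roughly as follows. The actual shapes of TOOL_SCHEMAS and the dispatch table in react_loop.py are not shown in this issue, so the JSON-Schema-style entry, the TOOL_DISPATCH name, the dispatch function, and the stub body of compare_services are all assumptions made for illustration.

```python
import inspect
from typing import Any, Callable, Optional


def compare_services(s1: str, s2: str, time_from: str, time_to: str,
                     project_id: Optional[str] = None) -> dict:
    """Stub standing in for the real tool; returns only the outer shape."""
    return {"s1": {"service": s1}, "s2": {"service": s2},
            "only_in_s1": [], "only_in_s2": [], "shared": []}


# Assumed schema layout: name, description, JSON-Schema parameters.
TOOL_SCHEMAS: list[dict[str, Any]] = [
    {
        "name": "compare_services",
        "description": "Side-by-side diff of what two services logged in one time window.",
        "parameters": {
            "type": "object",
            "properties": {
                "s1": {"type": "string"},
                "s2": {"type": "string"},
                "time_from": {"type": "string"},
                "time_to": {"type": "string"},
                "project_id": {"type": ["string", "null"], "format": "uuid"},
            },
            "required": ["s1", "s2", "time_from", "time_to"],
        },
    },
]

# Assumed dispatch table: tool name -> callable.
TOOL_DISPATCH: dict[str, Callable[..., dict]] = {
    "compare_services": compare_services,
}


def dispatch(name: str, arguments: dict[str, Any]) -> dict:
    """Look up a tool by name and call it with the model-supplied arguments."""
    handler = TOOL_DISPATCH.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**arguments)


def schema_mismatches() -> list[str]:
    """List tools whose schema properties differ from the handler's parameters."""
    problems = []
    for schema in TOOL_SCHEMAS:
        handler = TOOL_DISPATCH.get(schema["name"])
        if handler is None:
            problems.append(f"{schema['name']}: no handler")
            continue
        declared = set(schema["parameters"]["properties"])
        accepted = set(inspect.signature(handler).parameters)
        if declared != accepted:
            problems.append(f"{schema['name']}: {sorted(declared ^ accepted)}")
    return problems
```

A consistency check like schema_mismatches could run in the test suite so a schema entry never drifts from the function it describes.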