Feat/drift alignment #74
Merged
Stream 2 of the drift-alignment plan: promote signature-based event clustering from an ingest-time implementation detail to a first-class product surface. Per the feedback: "Humans don't want 5000 logs. They want event clusters."

- repi/retrieval/cluster_view.py extracts the signature from each retrieved chunk's templated text body (log_chunks rows store "Signature: <sig>\nExamples: ..."; we read the prefix back out rather than re-running get_signature() over the templated string) and groups by signature. Aggregates: count, deduped service set, first/last timestamp. Singletons are dropped by default; they're the per-turn timeline's job, not this panel's.
- /chat emits a `clusters: [...]` key on the SSE done event. Each entry: signature, count, services, first_ts, last_ts. The list is empty when nothing crosses the min_count=2 threshold; the UI then hides the panel entirely.
- web/components/chat/EventClusters.tsx renders an inline collapsible card under the assistant turn: count badge, signature in mono, service badges, time range. Default-open when ≤5 clusters.
- Caveat documented in the docstring and the UI subtitle: clusters are over the retrieved top-K, not a corpus-wide aggregate. A real /clusters endpoint with a first-class signature column is the next step if that distinction starts to matter.
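
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code in cluster_view.py: the chunk dictionary keys (`text`, `service`, `timestamp`), the function bodies, the choice to skip chunks that lack the prefix, and the sort by descending count are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List

SIGNATURE_PREFIX = "Signature: "  # prefix of the templated body, per the commit message


def extract_signature(text: str) -> str:
    """Read the signature back out of the first line; return '' if the prefix is absent."""
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith(SIGNATURE_PREFIX):
        return ""
    return first_line[len(SIGNATURE_PREFIX):].strip()


def cluster_chunks(chunks: Iterable[Dict[str, Any]], min_count: int = 2) -> List[Dict[str, Any]]:
    """Group retrieved chunks by signature and aggregate count, services, and time range."""
    groups: Dict[str, Dict[str, Any]] = {}
    for chunk in chunks:
        signature = extract_signature(chunk.get("text", ""))
        if not signature:
            continue  # assumption: chunks without a signature are skipped
        group = groups.setdefault(
            signature,
            {"signature": signature, "count": 0, "services": set(),
             "first_ts": None, "last_ts": None},
        )
        group["count"] += 1
        if chunk.get("service"):
            group["services"].add(chunk["service"])
        ts = chunk.get("timestamp")  # assumed to be an ISO 8601 string or None
        if ts is not None:
            if group["first_ts"] is None or ts < group["first_ts"]:
                group["first_ts"] = ts
            if group["last_ts"] is None or ts > group["last_ts"]:
                group["last_ts"] = ts
    # Drop singletons (count below min_count) and make the service set JSON-friendly.
    clusters = [
        {**group, "services": sorted(group["services"])}
        for group in groups.values()
        if group["count"] >= min_count
    ]
    clusters.sort(key=lambda c: c["count"], reverse=True)  # ordering is an assumption
    return clusters
```

Because timestamps are compared with `<` and `>`, the sketch only behaves if every timestamp has the same type, which is why a single string form matters.
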
Addresses PR #71 review.

- repi/api/chat.py: factor out `_normalize_ts` and route both chunk-construction sites through it. The RRF path and the find_logs_by_id entity-bias path used different inline forms; both now produce an ISO 8601 string or None, full stop. This closes the mixed-type hazard that cluster_view's `<`/`>` comparisons (and Stream 3's `sorted(...)`) would have hit if a future change to either source path reintroduced raw datetimes.
- repi/retrieval/cluster_view.py: drop the get_signature fallback for chunks without a `Signature:` prefix. Re-running the masking regex over the whole templated body would also mask numerics inside `Examples: ...`, producing a signature that doesn't match what the ingestor would have stored for the same raw line: silent mis-clustering. Log a warning so we notice dual-source state, and return empty so cluster_chunks skips the chunk.
- web/components/chat/EventClusters.tsx: break-all → break-words on the signature code element. break-all chops mid-word and reads badly on code-shaped strings.
- tests/api/test_chat_timestamp_normalisation.py pins the _normalize_ts contract (None passthrough, naive/aware datetime → ISO, string idempotent).
- tests/retrieval/test_cluster_view.py is updated for the new empty-on-untemplated contract and asserts that the warning lands.
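
The contract pinned by the timestamp test (None passthrough, naive/aware datetime → ISO, string idempotent) could look something like the sketch below. The function name `normalize_ts_sketch` is hypothetical, and converting aware datetimes to UTC before dropping the offset is an assumption; the actual `_normalize_ts` may treat aware values differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Union


def normalize_ts_sketch(value: Union[None, str, datetime]) -> Optional[str]:
    """Return an ISO 8601 string or None, never a raw datetime."""
    if value is None:
        return None  # None passes through unchanged
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            # Assumption: aware datetimes are shifted to UTC and made naive.
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
        return value.isoformat()
    return str(value)  # strings are returned as-is, so the call is idempotent


# Usage: every shape collapses to one comparable type.
aware = datetime(2024, 1, 2, 5, 4, 5, tzinfo=timezone(timedelta(hours=2)))
print(normalize_ts_sketch(aware))
print(normalize_ts_sketch(normalize_ts_sketch(aware)))
```

With a single string form, `<`, `>`, and `sorted(...)` never see a mix of `str` and `datetime`, which would raise a `TypeError` in Python 3.
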
…panel

Stream 3 of the drift-alignment plan. Promotes the timeline view from an internal ReAct tool (investigation.tools.get_timeline) to a first-class chat artifact. Per the feedback: timelines are "insanely useful" and people love them; a chronological narrative beats a chunk dump for any RCA story.

- repi/retrieval/timeline_view.py builds the timeline from the chunks the chat path has already hydrated, with no second DB roundtrip. It sorts chronologically (ISO strings sort lexically; the chat path normalises to UTC upstream via _dh.to_iso), then collapses consecutive runs with identical (service, level, signature) into one entry carrying first_ts / last_ts / repeat_count. The user sees "auth-service ERROR x12 14:02–14:04" instead of twelve near-identical lines.
- Collapse key is (service, level, signature). Two ERRORs and a WARNING with the same masked template stay separate: INFO setup ≠ ERROR fallout, and cross-service hits with the same signature are coincidence, not a run.
- Chunks without a timestamp are dropped. Placing them in chronological order would require fabricating a position, and "where exactly" is precisely what a timeline answers.
- /chat emits a `timeline: [...]` key on the SSE done event alongside the existing clusters payload from Stream 2.
- web/components/chat/Timeline.tsx renders a vertical timeline under the assistant message: HH:MM:SS on the left (full ISO on hover), service + level badges (level color-coded: ERROR red, WARNING amber, INFO blue), repeat count when >1, signature in mono. Default-open when ≤15 entries, collapsible above that.
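
The sort-then-collapse step lends itself to a short sketch. This is an illustration under assumptions, not the contents of timeline_view.py: the entry keys (`timestamp`, `service`, `level`, `signature`) and the function name `build_timeline_sketch` are hypothetical. It relies on `itertools.groupby`, which groups only consecutive items with equal keys, so a run interrupted by another service starts a new entry.

```python
from itertools import groupby
from typing import Any, Dict, Iterable, List


def _collapse_key(entry: Dict[str, Any]) -> tuple:
    """Runs collapse only when service, level, and signature all match."""
    return (entry["service"], entry["level"], entry["signature"])


def build_timeline_sketch(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Entries without a timestamp are dropped instead of given a made-up position.
    dated = [e for e in entries if e.get("timestamp")]
    # ISO 8601 strings in one timezone sort chronologically as plain strings.
    dated.sort(key=lambda e: e["timestamp"])
    timeline = []
    for (service, level, signature), run in groupby(dated, key=_collapse_key):
        items = list(run)
        timeline.append({
            "service": service,
            "level": level,
            "signature": signature,
            "first_ts": items[0]["timestamp"],
            "last_ts": items[-1]["timestamp"],
            "repeat_count": len(items),
        })
    return timeline
```
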
Addresses PR #72 review.

- Rename cluster_view._extract_signature → extract_signature. The moment timeline_view imported it, the leading underscore was no longer telling the truth: it's a shared primitive across two modules. Linters and readers were both being misled by the name.
- timeline_view: import the renamed symbol; add a debug-log tally for chunks dropped because they lack a signature. A spike in that count signals dual-source state (external imports, pre-ingestor data) and matches the warning cluster_view already emits at extraction time.
Stream 4 of the drift-alignment plan. Closes the polish gap so /chat feels like a product, not a debug surface. Token streaming is the only plan item explicitly deferred: making it production-quality across five provider adapters with partial-stream error handling deserves its own PR rather than riding along here.

- /chat ChatRequest gains optional `previous_chunk_ids: list[str]`. The frontend passes the last assistant turn's cited IDs; the backend reads them via the existing vector_store.get_chunks_by_ids (indexed PK lookup) and uses them to default-fill service + ±5min time envelope when the current intent has no explicit filter. This is a soft hint: caller filters and resolver output always win.
- /chat done payload gains `cited_chunks: [...]`, a minimal projection (chunk_id, service, level, timestamp, 600-char text window matching the LLM prompt) so the new UI evidence panel renders without a follow-up roundtrip.
- web/components/chat/CitedChunks.tsx is a third inline collapsible under the assistant turn, default-closed (it's the debug-grade view; the story is in Timeline and Clusters above it). The stacked-collapsibles approach was chosen over a tabbed roll-up after the simpler alternative surfaced.
- Timeline and EventClusters gain optional controlled-mode `open` + `onOpenChange` props. They fall through to uncontrolled internal state when callers don't pass them, so there is no behavior change for existing call sites.
- ChatMessageView gains "Show timeline", "Show clusters", and "Investigate deeper" quick-action buttons under the assistant turn. The first two open the corresponding panel and scrollIntoView it; the third invokes a parent-supplied onInvestigateDeeper(query) callback, which page.tsx wires to flip the Deep Research toggle and re-run the same query through /investigate.
- README and the chat empty state are reframed: repi now leads with the observability framing (continuous ingestion, hybrid retrieval, event clusters, incident timelines, optional autonomous root-cause investigation) instead of "log investigation engine."
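
The follow-up bias can be pictured with a small sketch. Everything named here is hypothetical: `followup_time_envelope`, `apply_soft_hint`, and the filter keys are illustrative, and treating the ±5min envelope as "earliest cited timestamp minus the window to latest plus the window" is an assumption about how the envelope is derived.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, Optional, Tuple

FOLLOWUP_WINDOW = timedelta(minutes=5)  # the ±5min figure from the commit message


def followup_time_envelope(
    timestamps: Iterable[Optional[str]],
    window: timedelta = FOLLOWUP_WINDOW,
) -> Optional[Tuple[str, str]]:
    """Widen the span of the previous turn's cited timestamps by `window` on each side."""
    parsed = [datetime.fromisoformat(ts) for ts in timestamps if ts]
    if not parsed:
        return None
    return ((min(parsed) - window).isoformat(), (max(parsed) + window).isoformat())


def apply_soft_hint(explicit: Dict[str, Any], hint: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in only what the caller and resolver left unset; explicit values always win."""
    merged = dict(hint)
    merged.update({key: value for key, value in explicit.items() if value is not None})
    return merged


# Usage: the previous turn cited chunks at 14:02 and 14:04.
envelope = followup_time_envelope(["2024-05-01T14:02:00", "2024-05-01T14:04:00"])
hint = {"service": "auth-service", "time_range": envelope}
print(apply_soft_hint({"service": "gateway", "time_range": None}, hint))
```

In the usage lines the explicit `gateway` filter survives, while the missing time range is filled from the hint.
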
…tings knob

Addresses PR #73 review.

- ChatRequest.previous_chunk_ids is now Field(default_factory=list, max_length=50). This bounds the indexed-PK fetch and rejects malformed payloads. The legitimate caller only ever sends the last assistant turn's citations (<=10 in practice); 50 is generous headroom.
- Service-narrowing now gates on dominance: pin the previous turn's top service only when its count is at least SERVICE_DOMINANCE_RATIO (2x) times the runner-up's. Below that ratio the previous turn straddled services (a cross-service incident), and pinning one would hide the other half on the follow-up. Both branches (pin / skip) log at debug, so sanity-checking on real conversations is one tail away.
- Hardcoded `timedelta(minutes=5)` → `Settings.FOLLOWUP_BIAS_WINDOW_MINUTES` (default 5). Same conceptual dial as TIME_WINDOW_INITIAL_MINUTES, kept separate so operators can tune them independently.
- ChatRequest docstring corrected: it said "neither in filters nor from the resolver" (the logical inverse) and now matches the "either missing" behaviour the code actually implements.
- Inline imports of Counter and timedelta moved to module top so the dependency graph stays visible to tooling.
- CHUNK_TEXT_WINDOW = 600 constant extracted and used at both sites (the LLM evidence block and the SSE cited_chunks payload), so they can no longer drift apart.
- web/app/page.tsx: comment pinning the no-race invariant on onInvestigateDeeper. handleSend reads its second arg, not React state, so the setDeepResearch call is purely UI sync: order doesn't matter, and setState's asynchrony can't reroute the request.
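
The dominance gate reduces to a comparison between the top two service counts. The sketch below is illustrative only: `pick_dominant_service` is a hypothetical name, and pinning when the previous turn cited a single service is an assumption the commit message does not spell out.

```python
from collections import Counter
from typing import Iterable, Optional

SERVICE_DOMINANCE_RATIO = 2  # the 2x figure from the commit message


def pick_dominant_service(services: Iterable[Optional[str]]) -> Optional[str]:
    """Return the service to pin on the follow-up, or None to leave it unpinned."""
    counts = Counter(service for service in services if service)
    if not counts:
        return None
    ranked = counts.most_common(2)
    top_service, top_count = ranked[0]
    if len(ranked) == 1:
        return top_service  # assumption: a lone service counts as dominant
    runner_up_count = ranked[1][1]
    if top_count >= SERVICE_DOMINANCE_RATIO * runner_up_count:
        return top_service
    return None  # the turn straddled services, so pinning would hide half the story


# Usage: three auth hits against two db hits is not dominant (3 < 2 * 2).
print(pick_dominant_service(["auth", "auth", "auth", "db", "db"]))
print(pick_dominant_service(["auth", "auth", "db"]))
```
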
feat(chat): event clusters surfaced in /chat + UI panel
- log_parser: support log4j comma-millis, syslog (year inferred, never in the future), and nginx/apache access logs; normalise all timestamps to naive UTC at a single point; map syslog error:/fatal: body tokens to real levels
- LogIngestor.ingest returns IngestStats (chunk_count, lines_total, lines_with_timestamp, level_counts); warns when zero timestamps parse
- POST /ingest surfaces parse-quality fields and refreshes known_services so a freshly ingested service is immediately visible to the resolver
- verified on LogHub OpenSSH + Zookeeper: 0/2000 -> 2000/2000 timestamps parsed, level counts match file truth (1318 WARN + 13 ERROR)
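
Syslog timestamps carry no year, so "year inferred, never future" needs a rule. One plausible rule, sketched here under assumptions, is to try the current year first and step back until the result is not later than now. The function name and the eight-year search bound are hypothetical and not taken from log_parser; `%b` also depends on the process locale using English month names.

```python
from datetime import datetime

SYSLOG_FORMAT = "%Y %b %d %H:%M:%S"


def parse_syslog_timestamp(stamp: str, now: datetime) -> datetime:
    """Parse a year-less syslog timestamp, choosing the latest year that is not in the future."""
    # Walk back a few years so Feb 29 still resolves when the current year is not a leap year.
    for year in range(now.year, now.year - 8, -1):
        try:
            candidate = datetime.strptime(f"{year} {stamp}", SYSLOG_FORMAT)
        except ValueError:
            continue  # for example Feb 29 in a non-leap year
        if candidate <= now:
            return candidate
    raise ValueError(f"cannot place syslog timestamp {stamp!r} before {now.isoformat()}")


# Usage: a December line read on 2 January belongs to the previous year.
now = datetime(2024, 1, 2, 0, 0, 0)
print(parse_syslog_timestamp("Dec 31 23:59:58", now).isoformat())
```
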
Settings read a cwd-relative path while the CLI anchored to the repo root — 'repi serve' from any other directory silently booted with class defaults (openai provider, no key). Resolve by walking up from cwd, then the package-anchored path; fall back to the cwd-relative default so a fresh PUT /config can still create the file.
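
The lookup order described above (walk up from cwd, then the package-anchored path, then the cwd-relative default) might be sketched like this. The function name and its parameters are hypothetical; only the `.repi/config.json` location comes from the commit messages.

```python
from pathlib import Path

CONFIG_RELATIVE = Path(".repi") / "config.json"


def resolve_config_path(cwd: Path, package_root: Path) -> Path:
    """Find the config file, or return the path where a new one should be created."""
    # 1. Walk up from the working directory, nearest match first.
    for directory in (cwd, *cwd.parents):
        candidate = directory / CONFIG_RELATIVE
        if candidate.is_file():
            return candidate
    # 2. Fall back to the path anchored at the installed package's repo root.
    anchored = package_root / CONFIG_RELATIVE
    if anchored.is_file():
        return anchored
    # 3. Nothing exists yet: return the cwd-relative default so a first write can create it.
    return cwd / CONFIG_RELATIVE
```

Returning a non-existent path in the last step is deliberate in this sketch: it keeps a fresh `PUT /config` able to create the file.
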
…e SSE stream
The documented POST-then-poll flow never executes (the loop runs only while a
client is attached to /investigations/{id}/stream), and the stream URL was
wrong. Document the two-step flow in README and CLAUDE.md.
uv sync --frozen + pytest with uv caching; also gitignore local tmp-ui-tests/ and assets/ scratch folders.
…up bias, cited-chunks)
test_doctor_* implicitly depended on the dev machine's real .repi/config.json existing — first CI run on a fresh checkout exposed it. Point REPO_ROOT/CONFIG_DIR/CONFIG_FILE at a tmp config in both tests.
No description provided.