Feat/drift alignment by VarunGitGood · Pull Request #74 · VarunGitGood/repi

VarunGitGood · 2026-06-09T14:55:37Z

No description provided.

Stream 2 of the drift-alignment plan: promote signature-based event clustering from an ingest-time implementation detail to a first-class product surface. Per the feedback: "Humans don't want 5000 logs. They want event clusters."

- repi/retrieval/cluster_view.py extracts the signature from each retrieved chunk's templated text body (log_chunks rows store "Signature: <sig>\nExamples: ..."; we read the prefix back out rather than re-running get_signature() over the templated string) and groups by signature. Aggregates: count, deduped service set, first/last timestamp. Singletons are dropped by default; they're the per-turn timeline's job, not this panel's.
- /chat emits a `clusters: [...]` key on the SSE done event. Each entry: signature, count, services, first_ts, last_ts. Empty list when nothing crosses the min_count=2 threshold; the UI then hides the panel entirely.
- web/components/chat/EventClusters.tsx renders an inline collapsible card under the assistant turn: count badge, signature in mono, service badges, time range. Default-open when ≤5 clusters.
- Caveat documented in the docstring and the UI subtitle: clusters are over the retrieved top-K, not a corpus-wide aggregate. A real /clusters endpoint with a first-class signature column is the next step if that distinction starts to matter.
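The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code in repi/retrieval/cluster_view.py: the chunk dict shape (`text`, `service`, `timestamp`), the function names, and the choice to skip chunks without a `Signature:` prefix are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Any

SIGNATURE_PREFIX = "Signature: "  # prefix format quoted in the commit message


def extract_signature(text: str) -> str:
    """Read the stored signature back from the templated body ("" if absent)."""
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith(SIGNATURE_PREFIX):
        return ""
    return first_line[len(SIGNATURE_PREFIX):].strip()


def cluster_chunks(chunks: list[dict[str, Any]], min_count: int = 2) -> list[dict[str, Any]]:
    """Group retrieved chunks by signature and aggregate each group."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for chunk in chunks:
        signature = extract_signature(chunk.get("text", ""))
        if signature:  # ASSUMPTION: untemplated chunks are skipped
            groups[signature].append(chunk)

    clusters = []
    for signature, members in groups.items():
        if len(members) < min_count:  # singletons are dropped by default
            continue
        # ISO 8601 strings in one timezone compare correctly as plain strings.
        stamps = [m["timestamp"] for m in members if m.get("timestamp")]
        clusters.append({
            "signature": signature,
            "count": len(members),
            "services": sorted({m["service"] for m in members if m.get("service")}),
            "first_ts": min(stamps) if stamps else None,
            "last_ts": max(stamps) if stamps else None,
        })
    # Largest clusters first (ordering is an assumption of this sketch).
    clusters.sort(key=lambda c: c["count"], reverse=True)
    return clusters
```

Because the input is only the retrieved top-K, the counts here describe the current answer's evidence and not the whole corpus, which is the caveat the commit message calls out.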

Addresses PR #71 review.

- repi/api/chat.py: factor out `_normalize_ts` and route both chunk-construction sites through it. The RRF path and the find_logs_by_id entity-bias path used different inline forms; both now produce an ISO 8601 string or None, full stop. Closes the mixed-type hazard that cluster_view's `<`/`>` (and Stream 3's `sorted(...)`) would have hit if a future change to either source path reintroduced raw datetimes.
- repi/retrieval/cluster_view.py: drop the get_signature fallback for chunks without a `Signature:` prefix. Re-running the masking regex over the whole templated body would also mask numerics inside `Examples: ...`, producing a signature that doesn't match what the ingestor would have stored for the same raw line, which means silent mis-clustering. Log a warning so we notice dual-source state and return empty so cluster_chunks skips the chunk.
- web/components/chat/EventClusters.tsx: break-all → break-words on the signature code element. break-all chops mid-word and reads ugly on code-shaped strings.
- tests/api/test_chat_timestamp_normalisation.py pins the _normalize_ts contract (None passthrough, naive/aware datetime → ISO, string idempotent).
- tests/retrieval/test_cluster_view.py updated for the new empty-on-untemplated contract and asserts that the warning lands.
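The `_normalize_ts` contract listed above (None passthrough, naive/aware datetime → ISO, string idempotent) might look something like the sketch below. The function name without the underscore, the treatment of naive datetimes as UTC, and the conversion of aware datetimes to UTC are assumptions for illustration; the real helper may differ.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def normalize_ts(value: Union[datetime, str, None]) -> Optional[str]:
    """Return an ISO 8601 string or None, never a raw datetime."""
    if value is None:
        return None  # None passthrough
    if isinstance(value, str):
        return value  # already a string: idempotent
    if value.tzinfo is None:
        # ASSUMPTION: naive datetimes are interpreted as UTC.
        value = value.replace(tzinfo=timezone.utc)
    # ASSUMPTION: aware datetimes are converted to UTC before formatting.
    return value.astimezone(timezone.utc).isoformat()
```

With every chunk timestamp passing through one function like this, downstream `<`, `>`, and `sorted(...)` calls only ever compare strings with strings.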

…panel

Stream 3 of the drift-alignment plan. Promotes the timeline view from an internal ReAct tool (investigation.tools.get_timeline) to a first-class chat artifact. Per the feedback: timelines are "insanely useful" and people love them; a chronological narrative beats a chunk dump for any RCA story.

- repi/retrieval/timeline_view.py builds the timeline from the chunks the chat path has already hydrated, with no second DB roundtrip. Sorts chronologically (ISO strings sort lexically; chat path normalises to UTC upstream via _dh.to_iso), then collapses consecutive runs with identical (service, level, signature) into one entry carrying first_ts / last_ts / repeat_count. The user sees "auth-service ERROR x12 14:02–14:04" instead of twelve near-identical lines.
- Collapse key is (service, level, signature). Two ERRORs and a WARNING with the same masked template stay separate: INFO setup ≠ ERROR fallout, and cross-service hits with the same signature are coincidence, not a run.
- Chunks without a timestamp are dropped. Placing them in chronological order would require fabricating a position, and "where exactly" is precisely what a timeline answers.
- /chat emits a `timeline: [...]` key on the SSE done event alongside the existing clusters payload from Stream 2.
- web/components/chat/Timeline.tsx renders a vertical timeline under the assistant message: HH:MM:SS on the left (full ISO on hover), service + level badges (level color-coded: ERROR red, WARNING amber, INFO blue), repeat count when >1, signature in mono. Default-open when ≤15 entries, collapsible above that.
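The sort-then-collapse step described above can be illustrated with a short sketch. It assumes each chunk is a dict that already carries `service`, `level`, `signature`, and an ISO 8601 UTC `timestamp`; those field names and the function name are hypothetical, not taken from timeline_view.py.

```python
from typing import Any


def build_timeline(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort chunks chronologically and collapse consecutive identical runs."""
    # Chunks without a timestamp are dropped: they have no position to occupy.
    dated = [c for c in chunks if c.get("timestamp")]
    # ISO 8601 strings normalised to one timezone sort correctly as text.
    dated.sort(key=lambda c: c["timestamp"])

    entries: list[dict[str, Any]] = []
    last_key = None
    for chunk in dated:
        key = (chunk["service"], chunk["level"], chunk["signature"])
        if entries and key == last_key:
            # Same (service, level, signature) as the previous entry: extend the run.
            entries[-1]["last_ts"] = chunk["timestamp"]
            entries[-1]["repeat_count"] += 1
        else:
            entries.append({
                "service": chunk["service"],
                "level": chunk["level"],
                "signature": chunk["signature"],
                "first_ts": chunk["timestamp"],
                "last_ts": chunk["timestamp"],
                "repeat_count": 1,
            })
            last_key = key
    return entries
```

Only consecutive matches collapse, so an unrelated event in the middle of a burst splits it into two entries, which keeps the narrative order intact.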

Addresses PR #72 review.

- Rename cluster_view._extract_signature → extract_signature. The moment timeline_view imported it, the leading underscore was no longer telling the truth: it's a shared primitive across two modules. Linters and readers were both being misled by the name.
- timeline_view: import the renamed symbol; add a debug-log tally for chunks dropped because they lack a signature. A spike in that count signals dual-source state (external imports, pre-ingestor data) and matches the warning cluster_view already emits at extraction time.

Stream 4 of the drift-alignment plan. Closes the polish gap so /chat feels like a product, not a debug surface. Token streaming is the only plan item explicitly deferred: making it production-quality across five provider adapters with partial-stream error handling deserves its own PR rather than ride along here.

- /chat ChatRequest gains optional `previous_chunk_ids: list[str]`. The frontend passes the last assistant turn's cited IDs; the backend reads them via the existing vector_store.get_chunks_by_ids (indexed PK lookup) and uses them to default-fill service + ±5min time envelope when the current intent has no explicit filter. Soft hint: caller filters and resolver output always win.
- /chat done payload gains `cited_chunks: [...]`, a minimal projection (chunk_id, service, level, timestamp, 600-char text window matching the LLM prompt) so the new UI evidence panel renders without a follow-up roundtrip.
- web/components/chat/CitedChunks.tsx: third inline collapsible under the assistant turn, default-closed (it's the debug-grade view; the story is in Timeline and Clusters above it). Stacked-collapsibles approach chosen over a tabbed roll-up after the simpler-alternative surface-up.
- Timeline and EventClusters gain optional controlled-mode `open` + `onOpenChange` props. Falls through to uncontrolled internal state when callers don't pass them, so there is no behavior change for existing call sites.
- ChatMessageView gains "Show timeline", "Show clusters", and "Investigate deeper" quick-action buttons under the assistant turn. The first two open the corresponding panel and scrollIntoView it; the third invokes a parent-supplied onInvestigateDeeper(query) callback which page.tsx wires to flip the Deep Research toggle and re-run the same query through /investigate.
- README and the chat empty state reframe: repi now leads with the observability framing (continuous ingestion, hybrid retrieval, event clusters, incident timelines, optional autonomous root-cause investigation) instead of "log investigation engine."
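As a rough illustration of the `cited_chunks` projection mentioned above, the sketch below trims a hydrated chunk down to the five listed fields. The chunk dict shape, the `TEXT_WINDOW` name, and the function name are hypothetical; only the field list and the 600-character window come from the commit message.

```python
from typing import Any

TEXT_WINDOW = 600  # characters; hypothetical name for the 600-char window


def project_cited_chunk(chunk: dict[str, Any]) -> dict[str, Any]:
    """Reduce a hydrated chunk to the minimal payload the evidence panel needs."""
    return {
        "chunk_id": chunk["chunk_id"],
        "service": chunk.get("service"),
        "level": chunk.get("level"),
        "timestamp": chunk.get("timestamp"),
        # Same window the LLM prompt sees, so the UI shows what the model read.
        "text": (chunk.get("text") or "")[:TEXT_WINDOW],
    }
```

Sharing one window size between the prompt and the payload means the evidence panel never shows text the model did not see.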

…tings knob

Addresses PR #73 review.

- ChatRequest.previous_chunk_ids now Field(default_factory=list, max_length=50). Bounds the indexed-PK fetch and rejects malformed payloads. The legitimate caller only ever sends the last assistant turn's citations (<=10 in practice); 50 is generous headroom.
- Service-narrowing now gates on dominance: pin the previous turn's top service only when its count >= SERVICE_DOMINANCE_RATIO (2x) the runner-up. Below that ratio the previous turn straddled services (a cross-service incident) and pinning one would hide the other half on the followup. Both branches (pin / skip) log at debug so sanity-checking on real conversations is one tail away.
- Hardcoded `timedelta(minutes=5)` → `Settings.FOLLOWUP_BIAS_WINDOW_MINUTES` (default 5). Same conceptual dial as TIME_WINDOW_INITIAL_MINUTES, kept separate so operators tune them independently.
- ChatRequest docstring corrected: was "neither in filters nor from the resolver" (logical inverse), now matches the actual "either missing" behaviour the code implements.
- Inline imports of Counter and timedelta moved to module-top so the dependency graph stays visible to tooling.
- CHUNK_TEXT_WINDOW = 600 constant extracted and used at both sites (the LLM evidence block and the SSE cited_chunks payload). Now they can't drift apart.
- web/app/page.tsx: comment pinning the no-race invariant on onInvestigateDeeper. handleSend reads its second arg, not React state, so the setDeepResearch call is purely UI sync; order doesn't matter, and setState's asynchrony can't reroute the request.
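The dominance gate and the time envelope described above could be sketched as follows. This is an assumption-laden illustration: the function name, the chunk dict shape, and the choice to build the envelope around the earliest and latest previous timestamps are hypothetical, and the two constants stand in for the settings named in the commit message.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Any, Optional

SERVICE_DOMINANCE_RATIO = 2.0       # top service must have >= 2x the runner-up
FOLLOWUP_BIAS_WINDOW_MINUTES = 5    # stand-in for the Settings value


def followup_hint(prev_chunks: list[dict[str, Any]]) -> dict[str, Optional[str]]:
    """Derive a soft service/time hint from the previous turn's cited chunks."""
    hint: dict[str, Optional[str]] = {"service": None, "start": None, "end": None}

    counts = Counter(c["service"] for c in prev_chunks if c.get("service"))
    ranked = counts.most_common(2)
    if ranked:
        top_service, top_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        # Pin only when one service clearly dominates; otherwise the previous
        # turn straddled services and pinning would hide half of the incident.
        if top_count >= SERVICE_DOMINANCE_RATIO * runner_up:
            hint["service"] = top_service

    stamps = [datetime.fromisoformat(c["timestamp"]) for c in prev_chunks if c.get("timestamp")]
    if stamps:
        # ASSUMPTION: the envelope spans earliest-to-latest, padded on both sides.
        window = timedelta(minutes=FOLLOWUP_BIAS_WINDOW_MINUTES)
        hint["start"] = (min(stamps) - window).isoformat()
        hint["end"] = (max(stamps) + window).isoformat()
    return hint
```

A caller would apply the hint only to filter fields that are still empty after the caller's own filters and the resolver have run, which keeps it a soft default and not an override.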

feat(chat): event clusters surfaced in /chat + UI panel

vercel · 2026-06-09T14:55:43Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
repi	Ready	Preview, Comment	Jun 11, 2026 11:06am

- log_parser: support log4j comma-millis, syslog (year inferred, never future), nginx/apache access logs; normalise all timestamps to naive UTC at one point; syslog error:/fatal: body tokens map to real levels
- LogIngestor.ingest returns IngestStats (chunk_count, lines_total, lines_with_timestamp, level_counts); warns when zero timestamps parse
- POST /ingest surfaces parse-quality fields and refreshes known_services so a freshly ingested service is immediately visible to the resolver
- verified on LogHub OpenSSH + Zookeeper: 0/2000 -> 2000/2000 timestamps, level counts match file truth (1318 WARN + 13 ERROR)
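Classic syslog timestamps carry no year, so "year inferred, never future" needs a rule. One plausible reading, sketched below, is to pick the most recent year for which the parsed timestamp does not fall after the current time. The function name and the explicit `now` parameter are hypothetical; the actual log_parser logic is not shown in this PR text.

```python
from datetime import datetime


def parse_syslog_ts(stamp: str, now: datetime) -> datetime:
    """Parse a yearless stamp such as 'Jun  9 14:55:37' without landing in the future."""
    # Try the current year first, then step back. Stepping back several years
    # also covers 'Feb 29', which only parses in a leap year.
    for year in range(now.year, now.year - 9, -1):
        try:
            candidate = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # this date does not exist in that year (or stamp is malformed)
        if candidate <= now:
            return candidate
    raise ValueError(f"could not infer a year for syslog timestamp: {stamp!r}")
```

Passing `now` in explicitly keeps the function deterministic under test; a log line stamped "Dec 31" and read on January 5 resolves to the previous year.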

Settings read a cwd-relative path while the CLI anchored to the repo root — 'repi serve' from any other directory silently booted with class defaults (openai provider, no key). Resolve by walking up from cwd, then the package-anchored path; fall back to the cwd-relative default so a fresh PUT /config can still create the file.
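The lookup order described above (walk up from cwd, then the package-anchored path, then the cwd-relative default) can be sketched like this. The function name and its parameters are hypothetical; the `.repi/config.json` location is taken from the test-isolation commit elsewhere in this PR.

```python
from pathlib import Path

CONFIG_RELATIVE = Path(".repi") / "config.json"


def resolve_config_path(cwd: Path, package_root: Path) -> Path:
    """Find the config file regardless of where the CLI was launched from."""
    # 1. Walk up from the working directory, nearest match wins.
    for directory in (cwd, *cwd.parents):
        candidate = directory / CONFIG_RELATIVE
        if candidate.is_file():
            return candidate
    # 2. Fall back to the path anchored at the installed package / repo root.
    anchored = package_root / CONFIG_RELATIVE
    if anchored.is_file():
        return anchored
    # 3. Nothing exists yet: return the cwd-relative default so a first write
    #    (for example a fresh PUT /config) has somewhere to create the file.
    return cwd / CONFIG_RELATIVE
```

Returning a path in the final branch even though no file exists there is deliberate in this sketch: the caller can distinguish "missing" via `is_file()` while still knowing where to create the file.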

…e SSE stream The documented POST-then-poll flow never executes (the loop runs while a client is attached to /investigations/{id}/stream) and the stream URL was wrong. Document the two-step flow in README and CLAUDE.md.

uv sync --frozen + pytest with uv caching; also gitignore local tmp-ui-tests/ and assets/ scratch folders.

…up bias, cited-chunks)

test_doctor_* implicitly depended on the dev machine's real .repi/config.json existing — first CI run on a fresh checkout exposed it. Point REPO_ROOT/CONFIG_DIR/CONFIG_FILE at a tmp config in both tests.

Feat/drift alignment

VarunGitGood and others added 7 commits June 8, 2026 22:08

Merge pull request #71 from VarunGitGood/feat/drift-alignment-s2

0bb0763

feat(chat): event clusters surfaced in /chat + UI panel

VarunGitGood added 5 commits June 11, 2026 16:17

ci: run pytest on pushes to main and all PRs

207ed6f

Merge branch 'feat/drift-alignment-s4' (s3+s4: timeline panel, follow…

0ad32e2

…up bias, cited-chunks)

vercel Bot deployed to Preview June 11, 2026 10:53 View deployment

test(doctor): isolate config-presence check from the host filesystem

a7957ba

vercel Bot deployed to Preview June 11, 2026 10:58 View deployment

chore: bump version to 0.2.0

78d434f

vercel Bot deployed to Preview June 11, 2026 11:06 View deployment

VarunGitGood merged commit f132a08 into main Jun 11, 2026
4 checks passed

VarunGitGood deleted the feat/drift-alignment branch June 14, 2026 11:59

VarunGitGood added a commit that referenced this pull request Jun 14, 2026

Merge pull request #74 from VarunGitGood/feat/drift-alignment

ffa1ec8

Feat/drift alignment

Feat/drift alignment #74
VarunGitGood merged 14 commits into main from feat/drift-alignment
