Feat/drift alignment #74

Merged
VarunGitGood merged 14 commits into main from feat/drift-alignment
Jun 11, 2026
Conversation

@VarunGitGood

Owner

No description provided.

VarunGitGood and others added 7 commits June 8, 2026 22:08
Stream 2 of the drift-alignment plan — promote signature-based event
clustering from an ingest-time implementation detail to a first-class
product surface. Per the feedback: "Humans don't want 5000 logs. They
want event clusters."

- repi/retrieval/cluster_view.py extracts the signature from each
  retrieved chunk's templated text body (log_chunks rows store
  "Signature: <sig>\nExamples: ..." — we read the prefix back out
  rather than re-running get_signature() over the templated string)
  and groups by signature. Aggregates: count, deduped service set,
  first/last timestamp. Singletons are dropped by default; they're
  the per-turn timeline's job, not this panel's.
- /chat emits a `clusters: [...]` key on the SSE done event. Each
  entry: signature, count, services, first_ts, last_ts. Empty list
  when nothing crosses the min_count=2 threshold; the UI then hides
  the panel entirely.
- web/components/chat/EventClusters.tsx renders an inline
  collapsible card under the assistant turn: count badge, signature
  in mono, service badges, time range. Default-open when ≤5 clusters.
- Caveat documented in the docstring and the UI subtitle: clusters
  are over the retrieved top-K, not a corpus-wide aggregate. A real
  /clusters endpoint with a first-class signature column is the next
  step if that distinction starts to matter.
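
A minimal sketch of the grouping described above, not the code in repi/retrieval/cluster_view.py. It assumes each retrieved chunk is a dict with hypothetical `text`, `service`, and `timestamp` keys, that timestamps are already ISO 8601 strings, and that chunks without a `Signature:` prefix are simply skipped:

```python
from collections import defaultdict
from typing import Any

SIGNATURE_PREFIX = "Signature: "


def extract_signature(text: str) -> str:
    """Read the stored signature back out of a templated chunk body."""
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith(SIGNATURE_PREFIX):
        return ""  # no prefix: the caller skips this chunk
    return first_line[len(SIGNATURE_PREFIX):].strip()


def cluster_chunks(chunks: list[dict[str, Any]], min_count: int = 2) -> list[dict[str, Any]]:
    """Group retrieved chunks by signature and aggregate each group."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for chunk in chunks:
        signature = extract_signature(chunk.get("text", ""))
        if signature:
            groups[signature].append(chunk)

    clusters = []
    for signature, members in groups.items():
        if len(members) < min_count:
            continue  # singletons are left to the timeline
        stamps = [m["timestamp"] for m in members if m.get("timestamp")]
        clusters.append({
            "signature": signature,
            "count": len(members),
            # Deduplicated, sorted so the payload is stable across runs.
            "services": sorted({m["service"] for m in members if m.get("service")}),
            # ISO 8601 strings in one timezone compare correctly as text.
            "first_ts": min(stamps) if stamps else None,
            "last_ts": max(stamps) if stamps else None,
        })
    clusters.sort(key=lambda c: c["count"], reverse=True)
    return clusters
```

The result only covers the chunks passed in, which matches the caveat above: these are clusters over the retrieved top-K, not over the whole corpus.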
Addresses PR #71 review.

- repi/api/chat.py: factor `_normalize_ts` and route both chunk-construction
  sites through it. The RRF path and the find_logs_by_id entity-bias path
  used different inline forms; both now produce ISO 8601 string or None,
  full stop. Closes the mixed-type hazard that cluster_view's `<`/`>` (and
  Stream 3's `sorted(...)`) would have hit if a future change to either
  source path reintroduced raw datetimes.
- repi/retrieval/cluster_view.py: drop the get_signature fallback for chunks
  without a `Signature:` prefix. Re-running the masking regex over the
  whole templated body would also mask numerics inside `Examples: ...`,
  producing a signature that doesn't match what the ingestor would have
  stored for the same raw line — silent mis-clustering. Log a warning so
  we notice dual-source state and return empty so cluster_chunks skips
  the chunk.
- web/components/chat/EventClusters.tsx: break-all → break-words on the
  signature code element. break-all chops mid-word and reads ugly on
  code-shaped strings.
- tests/api/test_chat_timestamp_normalisation.py pins the _normalize_ts
  contract (None passthrough, naive/aware datetime → ISO, string
  idempotent).
- tests/retrieval/test_cluster_view.py updated for the new
  empty-on-untemplated contract, asserts the warning lands.
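
The `_normalize_ts` contract pinned by the tests (None passthrough, datetime to ISO, string idempotent) could look roughly like the sketch below. The function name here is hypothetical, and the handling of aware datetimes (convert to UTC, then drop the offset) is an assumption rather than something the commit message states:

```python
from datetime import datetime, timezone
from typing import Optional, Union


def normalize_ts_sketch(value: Union[datetime, str, None]) -> Optional[str]:
    """Return an ISO 8601 string or None, never a raw datetime."""
    if value is None:
        return None
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            # Assumption: aware values are converted to UTC and stored naive,
            # so every string has the same shape and sorts lexically.
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
        return value.isoformat()
    # Strings are assumed to be ISO 8601 already and pass through unchanged.
    return str(value)
```

Routing every chunk-construction site through one helper like this is what removes the mixed-type hazard: `<`, `>`, and `sorted(...)` only ever see strings or None.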
…panel

Stream 3 of the drift-alignment plan. Promotes the timeline view from an
internal ReAct tool (investigation.tools.get_timeline) to a first-class
chat artifact. Per the feedback: timelines are "insanely useful" and
people love them — a chronological narrative beats a chunk dump for any
RCA story.

- repi/retrieval/timeline_view.py builds the timeline from the chunks
  the chat path has already hydrated — no second DB roundtrip. Sorts
  chronologically (ISO strings sort lexically; chat path normalises to
  UTC upstream via _dh.to_iso), then collapses consecutive runs with
  identical (service, level, signature) into one entry carrying
  first_ts / last_ts / repeat_count. The user sees "auth-service ERROR
  x12 14:02–14:04" instead of twelve near-identical lines.
- Collapse key is (service, level, signature). Two ERRORs and a WARNING
  with the same masked template stay separate — INFO setup ≠ ERROR
  fallout, and cross-service hits with the same signature are
  coincidence, not a run.
- Chunks without a timestamp are dropped. Placing them in chronological
  order would require fabricating a position, and "where exactly" is
  precisely what a timeline answers.
- /chat emits a `timeline: [...]` key on the SSE done event alongside
  the existing clusters payload from Stream 2.
- web/components/chat/Timeline.tsx renders a vertical timeline under
  the assistant message: HH:MM:SS on the left (full ISO on hover),
  service + level badges (level color-coded — ERROR red, WARNING amber,
  INFO blue), repeat count when >1, signature in mono. Default-open
  when ≤15 entries, collapsible above that.
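
The sort-then-collapse step can be sketched with `itertools.groupby`, which groups consecutive items only and so matches the "consecutive runs" rule. This is an illustration, not the code in repi/retrieval/timeline_view.py; it assumes each chunk is a dict that already carries hypothetical `service`, `level`, `signature`, and ISO-string `timestamp` keys:

```python
from itertools import groupby
from typing import Any


def build_timeline(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort chunks chronologically and collapse consecutive identical runs."""
    # Chunks without a timestamp or signature cannot be placed, so drop them.
    dated = [c for c in chunks if c.get("timestamp") and c.get("signature")]
    # ISO 8601 strings in one timezone sort correctly as plain text.
    dated.sort(key=lambda c: c["timestamp"])

    def collapse_key(chunk: dict[str, Any]) -> tuple[str, str, str]:
        return (chunk["service"], chunk["level"], chunk["signature"])

    timeline = []
    # groupby only merges adjacent items, so an interleaved entry from
    # another service or level splits a run in two.
    for (service, level, signature), run in groupby(dated, key=collapse_key):
        members = list(run)
        timeline.append({
            "service": service,
            "level": level,
            "signature": signature,
            "first_ts": members[0]["timestamp"],
            "last_ts": members[-1]["timestamp"],
            "repeat_count": len(members),
        })
    return timeline
```

Because the key includes service and level, an INFO line and an ERROR line with the same masked template never merge, and neither do hits from two different services.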
Addresses PR #72 review.

- Rename cluster_view._extract_signature → extract_signature. The moment
  timeline_view imported it, the leading underscore was no longer telling
  the truth — it's a shared primitive across two modules. Linters and
  readers were both being misled by the name.
- timeline_view: import the renamed symbol; add a debug-log tally for
  chunks dropped because they lack a signature. A spike in that count
  signals dual-source state (external imports, pre-ingestor data) and
  matches the warning cluster_view already emits at extraction time.

Stream 4 of the drift-alignment plan. Closes the polish gap so /chat
feels like a product, not a debug surface. Token streaming is the only
plan item explicitly deferred — making it production-quality across
five provider adapters with partial-stream error handling deserves its
own PR rather than ride along here.

- /chat ChatRequest gains optional `previous_chunk_ids: list[str]`. The
  frontend passes the last assistant turn's cited IDs; the backend reads
  them via the existing vector_store.get_chunks_by_ids (indexed PK
  lookup) and uses them to default-fill service + ±5min time envelope
  when the current intent has no explicit filter. Soft hint — caller
  filters and resolver output always win.
- /chat done payload gains `cited_chunks: [...]` — minimal projection
  (chunk_id, service, level, timestamp, 600-char text window matching
  the LLM prompt) so the new UI evidence panel renders without a
  follow-up roundtrip.
- web/components/chat/CitedChunks.tsx — third inline collapsible under
  the assistant turn, default-closed (it's the debug-grade view; the
  story is in Timeline and Clusters above it). Stacked collapsibles were
  chosen over a tabbed roll-up once the simpler alternative was surfaced.
- Timeline and EventClusters gain optional controlled-mode `open` +
  `onOpenChange` props. Falls through to uncontrolled internal state
  when callers don't pass them — no behavior change for existing call
  sites.
- ChatMessageView gains "Show timeline", "Show clusters", and
  "Investigate deeper" quick-action buttons under the assistant turn.
  The first two open the corresponding panel and scrollIntoView it; the
  third invokes a parent-supplied onInvestigateDeeper(query) callback
  which page.tsx wires to flip the Deep Research toggle and re-run the
  same query through /investigate.
- README and the chat empty state are reframed: repi now leads with the
  observability framing (continuous ingestion, hybrid retrieval, event
  clusters, incident timelines, optional autonomous root-cause
  investigation) instead of "log investigation engine."
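
The "soft hint" behaviour of `previous_chunk_ids` can be illustrated for the time envelope alone. The sketch below is an assumption-heavy illustration: the `start` and `end` filter keys and the function name are hypothetical, and service narrowing is left out:

```python
from datetime import datetime, timedelta
from typing import Any


def fill_time_envelope(
    previous_chunks: list[dict[str, Any]],
    filters: dict[str, Any],
    window_minutes: int = 5,
) -> dict[str, Any]:
    """Default-fill a time range from the previous turn's cited chunks."""
    merged = dict(filters)  # never mutate the caller's filters
    if "start" in merged or "end" in merged:
        return merged  # an explicit filter always wins over the hint

    stamps = [
        datetime.fromisoformat(c["timestamp"])
        for c in previous_chunks
        if c.get("timestamp")
    ]
    if not stamps:
        return merged  # nothing to infer a window from

    pad = timedelta(minutes=window_minutes)
    merged["start"] = (min(stamps) - pad).isoformat()
    merged["end"] = (max(stamps) + pad).isoformat()
    return merged
```

The early return is the important part: the hint only fills a gap and never overrides what the caller or the resolver already decided.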
…tings knob

Addresses PR #73 review.

- ChatRequest.previous_chunk_ids now Field(default_factory=list, max_length=50).
  Bounds the indexed-PK fetch and rejects malformed payloads. The legitimate
  caller only ever sends the last assistant turn's citations (<=10 in
  practice); 50 is generous headroom.
- Service-narrowing now gates on dominance: pin the previous turn's top
  service only when its count is at least SERVICE_DOMINANCE_RATIO (2x) times
  the runner-up's.
  Below that ratio the previous turn straddled services — a cross-service
  incident — and pinning one would hide the other half on the followup.
  Both branches (pin / skip) log at debug so sanity-checking on real
  conversations is one tail away.
- Hardcoded `timedelta(minutes=5)` → `Settings.FOLLOWUP_BIAS_WINDOW_MINUTES`
  (default 5). Same conceptual dial as TIME_WINDOW_INITIAL_MINUTES, kept
  separate so operators tune them independently.
- ChatRequest docstring corrected: was "neither in filters nor from the
  resolver" (logical inverse), now matches the actual "either missing"
  behaviour the code implements.
- Inline imports of Counter and timedelta moved to module-top so the
  dependency graph stays visible to tooling.
- CHUNK_TEXT_WINDOW = 600 constant extracted and used at both sites (the
  LLM evidence block and the SSE cited_chunks payload). Now they can't
  drift apart.
- web/app/page.tsx: comment pinning the no-race invariant on
  onInvestigateDeeper. handleSend reads its second arg, not React state,
  so the setDeepResearch call is purely UI sync — order doesn't matter,
  setState's asynchrony can't reroute the request.
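
The dominance gate on service narrowing might be sketched as follows. `SERVICE_DOMINANCE_RATIO` and its 2x value come from the commit message; the function name, the chunk shape, and the choice to pin when only one service appears are assumptions:

```python
from collections import Counter
from typing import Any, Optional

SERVICE_DOMINANCE_RATIO = 2.0


def dominant_service(
    previous_chunks: list[dict[str, Any]],
    ratio: float = SERVICE_DOMINANCE_RATIO,
) -> Optional[str]:
    """Return the service to pin on a follow-up, or None to skip pinning."""
    counts = Counter(c["service"] for c in previous_chunks if c.get("service"))
    if not counts:
        return None
    ranked = counts.most_common(2)
    top_service, top_count = ranked[0]
    if len(ranked) == 1:
        return top_service  # assumption: a lone service counts as dominant
    runner_up_count = ranked[1][1]
    # Pin only when the leader clearly outweighs the runner-up; otherwise the
    # previous turn looks like a cross-service incident and stays unpinned.
    if top_count >= ratio * runner_up_count:
        return top_service
    return None
```

With four `auth` citations against two `payments` citations the gate pins `auth`; at three against two it returns None and the follow-up searches both services.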
feat(chat): event clusters surfaced in /chat + UI panel
@vercel

vercel Bot commented Jun 9, 2026

Project: repi | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Jun 11, 2026 11:06am

- log_parser: support log4j comma-millis, syslog (year inferred, never
  future), nginx/apache access logs; normalise all timestamps to naive UTC
  at one point; syslog error:/fatal: body tokens map to real levels
- LogIngestor.ingest returns IngestStats (chunk_count, lines_total,
  lines_with_timestamp, level_counts); warns when zero timestamps parse
- POST /ingest surfaces parse-quality fields and refreshes known_services
  so a freshly ingested service is immediately visible to the resolver
- verified on LogHub OpenSSH + Zookeeper: 0/2000 -> 2000/2000 timestamps,
  level counts match file truth (1318 WARN + 13 ERROR)
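
Syslog timestamps carry no year, so "year inferred, never future" needs a rule. One possible rule, sketched here as an assumption and not taken from log_parser, is to pick the most recent year in which the stamp is a valid date that does not lie after a reference time. The function name is hypothetical and month names are assumed to be English:

```python
from datetime import datetime


def parse_syslog_ts(stamp: str, now: datetime) -> datetime:
    """Parse a yearless syslog stamp such as 'Dec 10 06:55:46'."""
    # Try the current year first, then walk back; four extra years is
    # enough to reach a leap year for a 'Feb 29' stamp.
    for year in range(now.year, now.year - 5, -1):
        try:
            parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # e.g. 'Feb 29' in a non-leap year
        if parsed <= now:
            return parsed  # most recent candidate that is not in the future
    raise ValueError(f"cannot place syslog timestamp: {stamp!r}")
```

A `Dec 10` line read in early January therefore lands in the previous year, which is the case that a naive "assume current year" parser gets wrong.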
Settings read a cwd-relative path while the CLI anchored to the repo root —
'repi serve' from any other directory silently booted with class defaults
(openai provider, no key). Resolve by walking up from cwd, then the
package-anchored path; fall back to the cwd-relative default so a fresh
PUT /config can still create the file.
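
The lookup order described above (walk up from cwd, then the package-anchored path, then the cwd-relative default) can be sketched like this. The `.repi/config.json` location is mentioned elsewhere in this PR; the function name and parameters are hypothetical:

```python
from pathlib import Path

CONFIG_RELATIVE = Path(".repi") / "config.json"


def resolve_config_path(cwd: Path, package_root: Path) -> Path:
    """Find the config file regardless of where the CLI was started."""
    # 1. Walk up from the working directory; the nearest match wins.
    for directory in (cwd, *cwd.parents):
        candidate = directory / CONFIG_RELATIVE
        if candidate.is_file():
            return candidate
    # 2. Fall back to the path anchored at the installed package's repo root.
    anchored = package_root / CONFIG_RELATIVE
    if anchored.is_file():
        return anchored
    # 3. Nothing exists yet: return the cwd-relative default so a first
    #    write can still create the file there.
    return cwd / CONFIG_RELATIVE
```

Returning a path in step 3, instead of raising, is what keeps a fresh setup working: the caller can create the file at the returned location.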
…e SSE stream

The documented POST-then-poll flow never executes, because the loop runs only
while a client is attached to /investigations/{id}/stream, and the documented
stream URL was wrong. Document the two-step flow in README and CLAUDE.md.

uv sync --frozen + pytest with uv caching; also gitignore local
tmp-ui-tests/ and assets/ scratch folders.

test_doctor_* implicitly depended on the dev machine's real
.repi/config.json existing — first CI run on a fresh checkout exposed it.
Point REPO_ROOT/CONFIG_DIR/CONFIG_FILE at a tmp config in both tests.
@VarunGitGood VarunGitGood merged commit f132a08 into main Jun 11, 2026
4 checks passed
@VarunGitGood VarunGitGood deleted the feat/drift-alignment branch June 14, 2026 11:59
VarunGitGood added a commit that referenced this pull request Jun 14, 2026