Launch polish: project scoping, timeline landing, structured answers + SSE hardening #79

Merged
VarunGitGood merged 7 commits into main from feat/launch-polish on Jun 13, 2026
@VarunGitGood

Owner

Summary

Launch-prep work bringing repi to v0.2.0 — UX redesign, ingestion/parser coverage, LLM efficiency, and final-answer / SSE fixes.

Highlights

UX redesign (P1/P2)

  • Project-centric scoping: every conversation is scoped to a project; retrieval and all ReAct tool calls carry project_id.
  • Timeline-first landing — event feed, project overview, and guided actions before the user types.
  • Live thinking indicator during investigations.

Investigation / answers

  • Compiled answers render as a structured card (root cause, confidence, affected services, trigger, propagation chain, ruled-out hypotheses, gaps) instead of raw JSON. Non-JSON answers fall back to plain text.
  • ReAct loop gathers evidence; a separate compile-LLM call produces the validated InvestigationAnswer.

SSE hardening

  • Live clarifications now emit clarification_request instead of a meaningless done (the question was previously lost until a page reload).
  • json.dumps in the stream uses default=str so a stray non-serializable observation can't kill the stream.
  • Client drops a single unparseable frame instead of dropping the connection, and tolerates EventSource auto-reconnect on transient disconnects.
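
The `default=str` guard can be illustrated with a minimal sketch. The helper name `sse_frame`, the tool name, and the payload shape are assumptions for illustration, not the repo's actual code:

```python
import json
from datetime import datetime, timezone
from uuid import UUID


def sse_frame(event: str, payload: dict) -> str:
    """Format one Server-Sent Events frame (hypothetical helper)."""
    # default=str converts values json cannot encode (datetime, UUID, ...)
    # to their string form instead of raising TypeError mid-stream.
    data = json.dumps(payload, default=str)
    return f"event: {event}\ndata: {data}\n\n"


# An observation carrying values that plain json.dumps would reject.
observation = {
    "tool": "search_logs",  # hypothetical tool name
    "at": datetime(2026, 6, 13, 16, 4, tzinfo=timezone.utc),
    "chunk_id": UUID("12345678-1234-5678-1234-567812345678"),
}
frame = sse_frame("step", observation)
```

Without `default=str`, the same call would raise `TypeError` on the `datetime` value, and an exception raised inside a streaming generator ends the response.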

Ingestion

  • Real-log parser coverage + readable signatures; preserve API version segments (e.g. /api/v1/) in signatures.
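
A rough sketch of the masking behaviour described here. The regexes, the `<NUM>` placeholder, and the "4+ digits means high-cardinality" threshold are assumptions of this sketch; the parser's real rules may differ:

```python
import re

# Four dot-separated octets; "1.1" in "HTTP/1.1" does not match.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
# Standalone integers of 4+ digits (pids, request ids). The lookarounds keep
# digits that are glued to identifiers, paths or versions (jk2_init, /api/v1/).
LONG_NUM = re.compile(r"(?<![\w./])\d{4,}(?![\w.])")


def mask_signature(message: str) -> str:
    """Collapse high-cardinality tokens so similar log lines share a signature."""
    masked = IPV4.sub("<IP>", message)
    masked = LONG_NUM.sub("<NUM>", masked)
    return masked


sig_http = mask_signature("GET /api/v1/users from 10.0.0.12 HTTP/1.1 404 pid 48213")
sig_apache = mask_signature("jk2_init() Found child 6725 in scoreboard slot 10")
```

In this sketch the API version segment, the protocol version, the status code `404`, and the digit inside `jk2_init` survive, while the IP address and the long numeric ids are masked.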

LLM

  • Cut token usage per turn; wait out 429s across every provider.
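
One way a Retry-After-aware wait could look, as a sketch only. The signature, the `send` callable, and the `(status, headers, body)` response tuple are assumptions made so the example runs without a network; this is not the repo's `_post_with_429_retry`:

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def post_with_429_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            break
        # Retry-After is read as seconds; a missing or non-numeric value
        # (for example an HTTP-date) falls back to default_wait.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(max(wait, 0.0))
    return status, headers, body
```

Injecting `sleep` keeps the helper testable: a test can pass `waits.append` and assert on the recorded waits instead of actually sleeping. A plain dict is case-sensitive, so a real HTTP client's case-insensitive header mapping would be the better input.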

Verification

  • pytest tests/ → 254 passed
  • npx tsc --noEmit + npm run build → clean
  • docker compose build app → image builds clean; docker compose config valid
  • Live SSE checked end-to-end: structured-answer path and live-clarification path both confirmed.

Launch note

The structured-answer fix ships in the web bundle, so the published image needs a rebuild + push to deliver it.

- projects table (settings JSONB: default_timeline_window, auto_load_timeline,
  max_events) + project_id on log_chunks/watcher_configs/conversations/
  investigations; real signature column on log_chunks (un-defers Path B from
  cluster_view) with idempotent backfill; Default project seed absorbs all
  pre-project rows
- /projects CRUD + /projects/{id}/services; resolve_project shared name-or-id
  resolver (name get-or-create, uuid must exist, blank -> Default)
- ingest stamps signature + project_id (API form field, worker via
  watcher_configs.project_id); /watchers accepts project_id
- scoping: RetrievalFilters.project_id applies to both vector + FTS arms;
  every ReAct tool gains project_id injected via container closures (LLM
  never sees it; cache keys include it); chat + investigate accept/inherit
  project_id; known_services resolved per project
- conversations list/detail return project_id + name for the sidebar
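
The resolver rules (name get-or-create, uuid must exist, blank -> Default) can be sketched against an in-memory store. `ProjectStore` and its methods are hypothetical stand-ins for the projects table, and the choice of `LookupError` is an assumption:

```python
import uuid
from typing import Dict, Optional

DEFAULT_NAME = "Default"


class ProjectStore:
    """In-memory stand-in for the projects table (illustrative only)."""

    def __init__(self) -> None:
        self.by_id: Dict[str, str] = {}
        self.create(DEFAULT_NAME)  # seed the Default project

    def create(self, name: str) -> str:
        project_id = str(uuid.uuid4())
        self.by_id[project_id] = name
        return project_id

    def find_by_name(self, name: str) -> Optional[str]:
        for project_id, existing in self.by_id.items():
            if existing == name:
                return project_id
        return None


def resolve_project(store: ProjectStore, ref: Optional[str]) -> str:
    """Resolve a name-or-id reference to a project id."""
    if ref is None or not ref.strip():
        default_id = store.find_by_name(DEFAULT_NAME)
        assert default_id is not None  # seeded in __init__
        return default_id
    ref = ref.strip()
    try:
        candidate = str(uuid.UUID(ref))
    except ValueError:
        # Not a uuid, so treat it as a name: get or create.
        return store.find_by_name(ref) or store.create(ref)
    if candidate not in store.by_id:
        raise LookupError(f"unknown project id: {candidate}")
    return candidate
```

The asymmetry is deliberate in this reading: a mistyped name creates a new project, but a mistyped uuid fails rather than silently creating one.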

Verified live: zk ingested into 'Infra', ssh into Default; Infra-scoped chat
cites only Infra chunks; asking Infra about ssh-server clarifies with zero
cross-project leakage.
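
A minimal sketch of closure-based injection with project-aware cache keys. `search_logs`, `bind_project`, and the cache layout are hypothetical names for illustration; the sketch also assumes tool arguments are hashable:

```python
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, str, Tuple[Tuple[str, Any], ...]]


def search_logs(query: str, project_id: str) -> str:
    """Hypothetical tool; a real one would query the scoped index."""
    return f"[{project_id}] results for {query!r}"


def bind_project(
    tool: Callable[..., str], project_id: str, cache: Dict[CacheKey, str]
) -> Callable[..., str]:
    """Return a wrapper that injects project_id; the LLM only supplies the rest."""

    def scoped(**llm_args: Any) -> str:
        # project_id is part of the key, so identical queries issued from
        # different projects never share a cached result.
        key = (tool.__name__, project_id, tuple(sorted(llm_args.items())))
        if key not in cache:
            cache[key] = tool(project_id=project_id, **llm_args)
        return cache[key]

    return scoped


cache: Dict[CacheKey, str] = {}
infra_search = bind_project(search_logs, "infra-id", cache)
default_search = bind_project(search_logs, "default-id", cache)
infra_hit = infra_search(query="timeout")
default_hit = default_search(query="timeout")
```

Because the tool schema shown to the model would omit `project_id`, the model cannot widen its own scope by passing a different value.
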
…, guided actions (UX P2)

- event_feed.py: deterministic event rules over per-(service, signature)
  time buckets — begins / spike / subsides / new_pattern / health
  transitions. Pure rule engine, no LLM cost per load; 15 unit tests.
- GET /projects/{id}/overview: events + corpus-wide signature clusters
  (the un-deferred Path B) + services + derived suggested actions.
  Window anchors to now and falls back to the project's latest data so
  the landing page always tells the most recent story available.
- UI: ProjectPicker (0 projects -> create, 1 -> auto-select, 2+ -> cards)
  replaces the empty-chat hero; ProjectOverview renders the timeline,
  clusters, services and suggested-action chips as the landing panel;
  chips route to Deep Research (grounded query: signature + service +
  time range pre-filled) or /chat; sidebar shows per-conversation
  project badges; chat + investigate carry the conversation's project.
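
A sketch of what deterministic rules over time buckets might look like for one (service, signature) series. The thresholds and the function name are assumptions, and `new_pattern` and health transitions are left out because they need cross-series state:

```python
from typing import List


def classify_buckets(counts: List[int], spike_factor: float = 3.0) -> List[str]:
    """Label bucket-to-bucket transitions for one (service, signature) series."""
    events: List[str] = []
    for i in range(1, len(counts)):
        prev, cur = counts[i - 1], counts[i]
        if prev == 0 and cur > 0:
            events.append(f"bucket {i}: begins")
        elif prev > 0 and cur == 0:
            events.append(f"bucket {i}: subsides")
        elif prev > 0 and cur >= spike_factor * prev:
            events.append(f"bucket {i}: spike")
    return events


# Occurrence counts per time bucket for a single signature.
timeline = classify_buckets([0, 2, 2, 9, 0])
```

Rules like these are pure functions of the counts, which is what keeps a page load free of LLM cost and makes the engine easy to unit test.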

Verified live on real LogHub data: Infra overview shows 'zookeeper enters
degraded state -> recovers' with clusters and 5 action chips; clicking an
Investigate chip ran a scoped DR investigation end-to-end. 6/6 Puppeteer
flows pass against the new landing flow.

- parse Apache error-log format ([Sun Dec 04 ...] [error] ...) with level mapping
- tag syslog pam/sshd auth-failure bodies WARNING so error-scans surface them
- signature masking: collapse IPv4 to <IP>; preserve HTTP status codes,
  protocol versions, and mid-identifier digits (jk2_init) — high-cardinality
  tokens still mask
- sources.md records public datasets used for real-log testing

- _compact_observation clips tool results fed back to the LLM (lists to 10
  items, text to 300 chars, valid-JSON fallbacks) — full results still
  persisted to DB/ledger; sweep context no longer pretty-printed
- shared _post_with_429_retry: Retry-After-aware 429 waits for OpenAI,
  Anthropic, Gemini (previously Mistral-only)
- loop retries honor retry_after and never retry LLMBadRequestError
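
The clipping limits above can be sketched as follows. This is not the repo's `_compact_observation`; the omission marker and the reading of "valid-JSON fallbacks" (clipped lists stay parseable JSON) are assumptions:

```python
import json
from typing import Any

MAX_ITEMS = 10   # list limit stated in the commit message
MAX_CHARS = 300  # text limit stated in the commit message


def compact_observation(value: Any) -> str:
    """Clip a tool result before it is fed back to the LLM."""
    if isinstance(value, list):
        clipped = value[:MAX_ITEMS]  # slicing copies; the input is untouched
        if len(value) > MAX_ITEMS:
            clipped.append(f"... {len(value) - MAX_ITEMS} more items omitted")
        # Serialize after clipping so the output is still valid JSON.
        return json.dumps(clipped, default=str)
    if isinstance(value, str):
        if len(value) <= MAX_CHARS:
            return value
        return value[:MAX_CHARS] + "..."
    return json.dumps(value, default=str)
```

Only the copy sent to the model is clipped; as the commit notes, the full result is still persisted, so nothing is lost for later inspection.
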
Steps only stream after a tool observation completes, leaving 10-30s of
dead air. ThinkingIndicator fills the gap with a contextual status line
(per-tool wording, reflection, compiling) rotating through generic
thinking words every 3s. Hidden on done/error/clarification.

The compiled InvestigationAnswer is persisted as json.dumps(...), but the
UI rendered it as plain text — so the final card showed raw JSON. Add a
CompiledAnswer component that parses the answer and renders root cause,
confidence, affected services, trigger, propagation chain, ruled-out
hypotheses, assumptions and gaps; non-JSON answers (clarification text,
legacy prose) fall back to plain text.
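
The parse-or-fall-back decision can be sketched in a few lines. The real component lives in the web UI; this Python version only illustrates the logic, and the `root_cause` key used to recognise a structured answer is an assumed field name:

```python
import json
from typing import Any, Dict, Union


def parse_compiled_answer(raw: str) -> Union[Dict[str, Any], str]:
    """Return the structured answer if raw is one, else the original text."""
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return raw
    # Valid JSON that is not an answer object (a bare string, a number,
    # an unrelated dict) is still rendered as plain text.
    if isinstance(parsed, dict) and "root_cause" in parsed:
        return parsed
    return raw


structured = parse_compiled_answer('{"root_cause": "disk full", "confidence": 0.8}')
plain = parse_compiled_answer("Which service do you mean?")
```

Checking for an expected key, not just for valid JSON, keeps clarification text that happens to parse as JSON from being rendered as an empty card.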

SSE robustness:
- Live clarifications now emit `clarification_request` instead of a
  meaningless `done` with "Awaiting clarification..." — previously the
  question was lost until a page reload hit the replay path.
- json.dumps in the stream generator uses default=str so a stray
  non-serializable observation can't kill the stream.
- Client drops a single unparseable frame instead of tearing down the
  connection, and tolerates EventSource auto-reconnect on transient
  disconnects rather than surfacing a hard error.
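
The "drop one frame, keep the connection" behaviour, sketched in Python for illustration. The actual client runs in the browser on EventSource; the function name and frame layout here are assumptions:

```python
import json
from typing import Dict, Iterator, List


def parse_frames(stream_text: str) -> Iterator[Dict[str, object]]:
    """Yield parsed SSE frames, skipping any whose data is not valid JSON."""
    for frame in stream_text.split("\n\n"):
        event = "message"  # SSE default when no event field is present
        data_lines: List[str] = []
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if not data_lines:
            continue
        try:
            payload = json.loads("\n".join(data_lines))
        except ValueError:
            continue  # drop only this frame; later frames still arrive
        yield {"event": event, "data": payload}


stream = (
    'event: step\ndata: {"n": 1}\n\n'
    "event: step\ndata: {broken\n\n"
    'event: clarification_request\ndata: {"question": "Which service?"}\n\n'
)
frames = list(parse_frames(stream))
```

Here the malformed middle frame is skipped and the `clarification_request` frame after it is still delivered, which is the failure mode the hardening targets.
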
@vercel

vercel Bot commented Jun 13, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| repi | Ready | Preview, Comment | Jun 13, 2026 4:04pm |

@VarunGitGood VarunGitGood merged commit 9f44891 into main Jun 13, 2026
4 checks passed
@VarunGitGood VarunGitGood deleted the feat/launch-polish branch June 14, 2026 11:59
VarunGitGood added a commit that referenced this pull request Jun 14, 2026
Launch polish: project scoping, timeline landing, structured answers + SSE hardening