diff --git a/.agents/skills/legreffier-consolidate/SKILL.md b/.agents/skills/legreffier-consolidate/SKILL.md index 279ccbb8d..c2c983f0a 100644 --- a/.agents/skills/legreffier-consolidate/SKILL.md +++ b/.agents/skills/legreffier-consolidate/SKILL.md @@ -1,296 +1,275 @@ --- name: legreffier-consolidate -description: 'Consolidate diary entries by proposing and reviewing entry relations using server-side clustering. Optionally creates tiles when entries are too granular for direct compilation.' +description: 'Consolidate diary entries by intentionally proposing entry relations through bounded agent review. Use when packs are noisy, when incidents and decisions need linking, when contradictions should be surfaced, or when the user asks to consolidate diaries, propose relations, or run a dream-like memory pass.' --- # LeGreffier Consolidate Skill -Structure diary entries by creating **entry relations** (supports, elaborates, -contradicts, derived_from) through server-side clustering + agent review. -Optionally creates **tiles** when raw entries are too granular for good packs. +Consolidation is **editorial graph curation**, not bulk clustering. -This is the **Consolidate** stage of the context flywheel (after scan, before -compile/load/eval). +Use this skill to propose typed `entry_relations` between diary entries through +an intentional, auditable review workflow. The server may help with retrieval or +candidate discovery, but the agent performs the semantic judgment. -## Prerequisites - -- Diary entries to consolidate (scan entries, commits, decisions, incidents) -- LeGreffier MCP tools available (`diaries_consolidate`, `entries_search`, - `entries_create`, etc.) -- Agent identity active (`moltnet_whoami` returns valid identity) -- Know your `DIARY_ID` -- **CRITICAL: The diary MUST have `moltnet` visibility (not `private`).** Private - diaries do not index entries for vector search. +Default rule: **create proposals, do not auto-accept**. 
-### Internal references +## Goals -- `references/consolidation-approach.md` — design rationale, - merge group identification algorithm, tile quality gate. +- improve retrieval by linking related entries intentionally +- surface contradictions and stale diagnoses without mutating source entries +- connect cross-type evidence such as incident -> fix commit -> decision +- leave a reviewable trail in relation metadata ---- +## Non-goals -## Configuration +- rewriting diary entries +- blanket all-vs-all clustering +- auto-accepting relations by default +- replacing context packs with a second compilation artifact -``` -DIARY_ID = "" -``` - -Optional scoping (narrow what gets consolidated): +## Prerequisites -``` -SCOPE_TAGS = ["source:scan", "scan-session:"] -EXCLUDE_TAGS = ["scan-category:summary", "learn:trace"] -``` +- LeGreffier identity active +- Diary resolved as `DIARY_ID` +- MCP or CLI transport available +- Diary has enough signal to consolidate: incidents, decisions, commits, scan + entries, or repeated retrieval noise ---- +Read `consolidation-approach.md` for rationale when needed. ## Transport detection -After resolving AGENT_NAME and DIARY_ID, detect available transport: +After resolving `AGENT_NAME` and `DIARY_ID`, pick one transport for the entire +session: -1. If MCP tools are available (`moltnet_whoami` responds): use MCP for all operations. -2. If MCP unavailable or errors with "Auth required" / connection failures: use CLI via `npx @themoltnet/cli` for all operations. -3. **Do not mix transports within a session.** Pick one at activation and stick with it. +1. MCP when MoltNet tools respond +2. CLI fallback with `npx @themoltnet/cli` +3. 
Do not mix transports inside one consolidation run CLI credentials: `.moltnet//moltnet.json` -CLI global flags: `--credentials ".moltnet//moltnet.json"` -### CLI equivalents +## When to trigger -| MCP Tool | CLI Command | -| ------------------ | ------------------------------------------------------------------------------------- | -| `relations_create` | `moltnet relations create --entry-id --target-id --relation ` | -| `relations_list` | `moltnet relations list --entry-id ` | -| `relations_update` | `moltnet relations update --relation-id --status ` | -| `entries_list` | `moltnet entry list --diary-id [--tags "..." --entry-type --limit ]` | -| `entries_search` | `moltnet entry search --query "..."` | -| `diaries_compile` | `moltnet diary compile --token-budget [--task-prompt "..."]` | +- compile packs are noisy or unstable across similar prompts +- incidents, decisions, and commits exist but are not linked +- the same mistake appears repeatedly with no graph structure +- the user asks for diary consolidation or relation proposals +- periodic maintenance on an active diary -> **Note:** `diaries_consolidate` (server-side clustering) requires MCP. When using CLI fallback, skip the clustering step and work with manual entry selection instead. +## Operator preflight ---- +Before proposing any relations, state the intended scope: -## Phase 1: Server-side clustering + relation proposals +- objective: what retrieval or memory problem is being fixed +- working set: recent window, branch, scope tags, or incident family +- relation focus: `contradicts`, `supports`, `references`, `caused_by`, + `elaborates`, `supersedes` +- acceptance policy: default `proposed` only -Call the consolidation endpoint to cluster entries and propose relations: +If the user does not specify scope, infer a bounded scope and record that +choice in the final summary. 
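The operator preflight above can be sketched as a small data shape. This is a minimal illustration, not part of the skill: `PreflightScope`, `resolvePreflight`, the `scopeInferred` flag, and the default relation focus are all hypothetical names and choices. Only the four preflight fields and the `proposed` default come from the text.

```typescript
// Relation kinds listed under "relation focus" in the preflight.
type RelationKind =
  | "contradicts"
  | "supports"
  | "references"
  | "caused_by"
  | "elaborates"
  | "supersedes";

// Hypothetical record of the scope stated before any relation is proposed.
interface PreflightScope {
  objective: string;
  workingSet: string;
  relationFocus: RelationKind[];
  acceptancePolicy: "proposed"; // default policy from the skill: no auto-accept
  scopeInferred: boolean; // true when the agent had to choose the working set
}

// Assumed default focus, used only when the user names no relation kinds.
const DEFAULT_FOCUS: RelationKind[] = ["references", "contradicts"];

// Hypothetical helper: fills gaps in the requested scope with a bounded
// fallback and flags the inference so it can be recorded in the final summary.
function resolvePreflight(
  requested: { objective: string; workingSet?: string; relationFocus?: RelationKind[] },
  inferredWorkingSet: string,
): PreflightScope {
  const given = (requested.workingSet ?? "").trim();
  const focus = requested.relationFocus ?? [];
  return {
    objective: requested.objective,
    workingSet: given !== "" ? given : inferredWorkingSet,
    relationFocus: focus.length > 0 ? focus : DEFAULT_FOCUS,
    acceptancePolicy: "proposed",
    scopeInferred: given === "",
  };
}
```

Keeping `scopeInferred` on the record is one way to make the "record that choice in the final summary" step hard to forget.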
-``` -diaries_consolidate({ - diary_id: "", - tags: , - exclude_tags: , - strategy: "hybrid", - threshold: 0.85 -}) -``` +## Workflow -### Threshold guidance +### Phase 1: Build a bounded working set -- `0.85-0.88` for scan entries (same-repo entries are all similar — lower - thresholds collapse everything) -- `0.70-0.80` for mixed entries (commits + decisions + incidents) -- `0.60-0.70` for cross-topic or cross-diary consolidation +Never consolidate the whole diary by default. -**Known limitation**: within-topic entries (e.g. all `scope:database` or all -`incident` entries) are too semantically similar for embedding-based clustering -to separate meaningfully. A single giant cluster with blanket `supports` -relations is common. Server clustering is most useful for **cross-topic** -separation. For within-topic structure, agent judgment (Phase 2) is the -primary mechanism. +Start with one bounded slice: -### Output +- last `20-50` entries +- one `scope:*` family +- one branch +- one recurring incident family +- one representative compile/search prompt and its retrieved entries -The response includes: +Prefer entries with: -- `clusters` — groups of related entries with suggestedAction -- Proposed `entry_relations` are created in the DB (`status: proposed`) - but NOT returned in the response — use `relations_list` to read them +- high importance +- repeated retrieval +- incident/decision/procedural cross-links that appear missing +- unresolved `contradicts` or `proposed` relations nearby -**Important**: proposed relations from single-cluster runs are typically -low quality (blanket `supports` between all members). The agent MUST review -and reject noise rather than accepting server proposals blindly. 
+Useful retrieval patterns: ---- +- `entries_list` with `tags`, `entry_type`, `limit`, `offset` +- `entries_search` for repeated questions or subsystem names +- `relations_list` for entries that already have open proposals -## Phase 2: Review proposed relations (agent judgment) +### Phase 2: Generate candidate pairs intentionally -This is where the agent adds real value. Server-side clustering uses embedding -similarity; the agent applies semantic judgment. +Do not perform blanket pair generation. -### Review flow +Generate candidate pairs from these signals: -For each proposed relation: +- same `scope:` tag, different `entryType` +- overlapping refs in metadata or content +- temporal adjacency during one incident/fix sequence +- repeated symptom language across incidents +- a decision followed by a procedural implementation entry +- a false diagnosis followed by a corrected root-cause entry -1. Read both entries (source and target) -2. Evaluate the relation kind: - - `supports` — does the target genuinely support the source's claim? - - `elaborates` — does the target add meaningful detail to the source? - - `contradicts` — is there a real tension, or just different phrasing? - - `derived_from` — is the target actually derived from the source? -3. Accept or reject: - ``` - entries_relations_update({ - relation_id: "", - status: "accepted" // or "rejected" - }) - ``` +Server clustering may be used only as a weak candidate source. Treat every +cluster suggestion as untrusted until reviewed. -### Review criteria +### Phase 3: Judge one pair at a time -**Accept** when: +For each candidate pair, read both entries and decide: -- The relation captures a real semantic connection -- Following the edge would help an agent find related context -- The relation kind accurately describes the relationship +- propose one relation +- skip +- mark as probable duplicate / supersession candidate -**Reject** when: +The judgment must be relation-specific, not “these feel related”. 
-- Entries are similar in topic but unrelated in substance -- The relation kind is wrong (e.g. "supports" when it's really "elaborates") -- The entries are duplicates (use `superseded_by` instead of a relation) +#### Relation criteria -### Agent-proposed cross-type relations +`supports` -The most valuable relations are **cross-type** — they connect entries that -the server's embedding similarity would never group. Use `relations_create`: +- same claim or pattern +- target adds confirming evidence +- target does not merely repeat wording -``` -relations_create({ - entry_id: "", - target_id: "", - relation: "caused_by", - status: "accepted" -}) -``` +`elaborates` -**Proven high-value patterns** (from real MoltNet diary analysis): +- same subject +- target adds operational detail, nuance, or constraints +- target is not a replacement -| Source type | Relation | Target type | Example | -| -------------------------- | ------------- | ---------------------------- | ----------------------------------------------------------- | -| episodic (bug) | `caused_by` | episodic (earlier bug) | contentHash bug caused by diary_search missing columns | -| episodic (false diagnosis) | `contradicts` | episodic (real root cause) | "needs API key" contradicts real OAuth 401 causes | -| episodic (same bug, later) | `supports` | episodic (same bug, earlier) | Drizzle migration v2 supports v1 (proves pattern) | -| semantic (decision) | `references` | procedural (commit) | DBOS WorkflowQueue decision → commit implementing it | -| semantic (decision) | `references` | procedural (commit) | created_by provenance decision → schema migration | -| episodic (incident) | `references` | procedural (fix commit) | Auth bypass → relation routes commit that fixes the pattern | +`contradicts` -**What NOT to connect**: +- same subject or diagnosis +- claims cannot both be treated as current truth +- contradiction is substantive, not different emphasis -- Two incidents about unrelated subsystems (auth 
bypass ≠ Drizzle migration) -- Blanket `supports` between everything in the same cluster -- Entries that are similar in embedding space but not causally connected +`caused_by` ---- +- source problem plausibly follows from target condition or earlier event +- causal link is evidenced, not just temporally adjacent -## Phase 3: Optional tile creation +`references` -Tiles are **not always needed**. Check first: does `diaries_compile` with -the raw entries produce a good pack? +- explicit file, symbol, endpoint, commit, or implementation linkage +- target helps an agent navigate from one entry to the concrete artifact -``` -diaries_compile({ - diary_id: "", - token_budget: 4000, - task_prompt: "", - include_tags: , - exclude_tags: ["learn:trace"], - lambda: 0.7, - w_importance: 0.5 -}) +`supersedes` + +- source is the new active version of target +- target should no longer be preferred in active retrieval +- use stricter evidence than for `contradicts` + +### Phase 4: Create proposal metadata + +Every created relation must include review metadata in `metadata`. + +Required fields: + +```json +{ + "confidence": 0.0, + "evidenceRefs": ["scope:libs/database", "libs/database/src/schema.ts"], + "proposalMethod": "skill:legreffier-consolidate", + "rationale": "One or two sentences explaining the relation.", + "reviewedAt": "2026-04-01T00:00:00Z", + "reviewedBy": "", + "workingSet": "recent:scope:database" +} ``` -**If the pack is good** (relevant entries, no noise, right ranking): skip -tile creation. The raw entries + accepted relations are sufficient. +Optional fields: -**If the pack is noisy** (too many similar entries, important constraints -diluted): create tiles to compress related entries into single units. 
+- `contradictionKind`: `false-diagnosis`, `stale-assumption`, `policy-conflict` +- `causeSignals`: short list of causal clues +- `scopeSnapshot`: tags or branch snapshot used during review +- `workflowId`: batch identifier for the run -### When to create tiles +### Phase 5: Persist as proposed relations -- 3+ scan entries about the same subsystem that all get pulled into packs -- Entries that individually are too small to be useful but together form - coherent knowledge -- When the compile budget is tight and you need higher token density +Create relations with `status: proposed` unless the user explicitly asks for +acceptance review in the same session. -### Tile format +Examples: -``` -tile_id: / -applies_to: +- incident `references` fix commit +- semantic decision `references` procedural implementation +- corrected incident `contradicts` earlier false diagnosis +- repeated incident `supports` earlier incident -## +### Phase 6: Review packet -[Synthesized content from merged entries] +At the end of the run, report: -### Constraints -- MUST: -- NEVER: +- working-set definition +- candidate count +- proposals created by relation type +- skipped candidates and why +- open questions or low-confidence areas -### When this matters -[1-2 sentence trigger] +If useful, create a `reflection` entry that summarizes the consolidation run. +Do not store the reflection as a substitute for the relations themselves. -Sources: [entry short IDs] -``` +## Dream pass -### Tile tags +A “dream” is a bounded background consolidation pass, not autonomous memory +rewrite. -``` -["source:tile", "tile-session:", "tile-scope:", "tile-id:/"] -``` +Use it only on a small window: -See `references/consolidation-approach.md` for merge rules and quality gate. +- recent `20-40` entries +- one subsystem or one retrieval problem +- unresolved incidents plus nearby decisions/commits ---- +Dream pass loop: -## Phase 4: Verify with compile +1. load a bounded working set +2. 
identify missing or weak graph structure +3. propose relations with rationale and confidence +4. stop after one pass +5. leave all new edges in `proposed` -After consolidation (relations accepted, optional tiles created), verify -the improvement: +Never let a dream pass: -``` -diaries_compile({ - diary_id: "", - token_budget: 4000, - task_prompt: "", - lambda: 0.7, - w_importance: 0.5 -}) -``` +- auto-accept relations +- edit source entries +- process the entire diary at once +- collapse contradictions into one rewritten summary -Compare with the pre-consolidation pack: +## High-value patterns -- Are the right entries selected? -- Is the ranking sensible? -- Are related entries grouped together? +- `episodic` incident `references` `procedural` fix commit +- `semantic` decision `references` `procedural` implementation +- `episodic` corrected diagnosis `contradicts` earlier misdiagnosis +- repeated incidents `supports` each other +- follow-up semantic rule `elaborates` earlier semantic constraint +- replacement decision or rule `supersedes` stale signed entry -See [CONTEXT_PACK_GUIDE.md](../../../docs/CONTEXT_PACK_GUIDE.md) for compile -recipes and parameter tuning. +## Anti-patterns ---- +- blanket `supports` edges for every entry in a cluster +- using `contradicts` for entries that merely differ in detail +- using `supersedes` when the target is still valid context +- creating causal edges from temporal order alone +- proposing cross-subsystem relations with no concrete evidence -## When to trigger consolidation +## Verification -- After a scan session completes (30+ new scan entries) -- After a feature branch merges (10+ commit entries on one topic) -- When compile packs feel noisy (too many loosely related entries) -- When the same question keeps pulling different entries on each compile -- Periodically (e.g. 
weekly) for active diaries with >100 entries +After consolidation, test retrieval quality with `diaries_compile` or +`entries_search` using the same task prompt as before. ---- +Look for: -## Recovery after context compression +- better ranking stability +- clearer path from incidents to fixes +- contradictions surfaced instead of silently merged +- less irrelevant retrieval from same-topic but unrelated entries -1. Read this skill file -2. Read `references/consolidation-approach.md` for methodology -3. Query completed work: - - `entries_relations_list({ diary_id, status: "accepted" })` — see what's done - - `entries_search({ tags: ["source:tile", "tile-session:"] })` — find tiles -4. Resume from where relations are still `proposed` +## Recovery after context compression ---- +1. Read this file +2. Read `consolidation-approach.md` if you need methodology rationale +3. Inspect existing proposals with `relations_list` +4. Resume from the current working set instead of restarting whole-diary review ## Permissions -This skill needs LeGreffier MCP tools (diaries_consolidate, entries_search, -entries_create, entries_relations_list, entries_relations_update) and diary -write access. +This skill needs diary read/write access and relation CRUD access. diff --git a/.agents/skills/legreffier-consolidate/consolidation-approach.md b/.agents/skills/legreffier-consolidate/consolidation-approach.md index 26f6d1ea2..a8da244f2 100644 --- a/.agents/skills/legreffier-consolidate/consolidation-approach.md +++ b/.agents/skills/legreffier-consolidate/consolidation-approach.md @@ -1,138 +1,120 @@ # Consolidation Approach — Methodology Reference -This document explains the _reasoning framework_ behind the legreffier-consolidate -skill. It is repo-agnostic. Read it for the "why" behind tile merging and -quality gates. The SKILL.md prescribes the execution steps; this doc explains -the design choices. +This document explains the design logic behind `legreffier-consolidate`. 
+`SKILL.md` is the operating procedure. This file explains why the workflow is +structured the way it is. ---- +## Core position -## Context tiles +Consolidation is a **graph curation** task. -### What a tile is +It should improve the structure around source entries, not rewrite source +entries into a new canonical layer. Context packs remain the compiled runtime +artifact. Entry relations remain the memory-structure layer. -A tile is a self-contained knowledge unit that answers one question well: +## Why agent-side instead of server-side -> "What do I need to know about X to work on Y correctly?" +Embedding clusters are useful for rough candidate discovery, but they are weak +at the exact distinctions that matter most: -Tiles are NOT documentation rewrites. They synthesize multiple scan entries, -deduplicate overlapping information, and focus on what an agent needs at task time. +- false diagnosis vs real root cause +- implementation reference vs same-topic similarity +- replacement vs elaboration +- causal chain vs temporal adjacency -### Design principles +Those are editorial judgments. They need an agent to read entries +intentionally and record why the edge exists. -1. **Minimal over comprehensive** — fewer tokens, higher density -2. **Concrete over abstract** — commands, paths, patterns, not prose -3. **Non-redundant with project docs** — don't restate what CLAUDE.md already says -4. **Scoped** — each tile has a clear `applies_to` boundary -5. **Synthesis, not summary** — combine info from multiple entries into something - no single source provides +## Trust model -### How to identify merge groups +The process is trustable when it has these properties: -Scan entries often overlap — docs-derived entries (Phase 1) and code-derived entries -(Phase 2) frequently describe the same subsystem from different angles. The -consolidation must merge these, not just list them side by side. +1. 
**Bounded scope** + One branch, one subsystem, one incident family, or one retrieval problem at + a time. +2. **Pairwise judgment** + Every proposal is based on reading the involved entries, not cluster shape. +3. **Typed relations** + The workflow chooses a specific relation because that relation’s criteria are + met, not because “related” felt good enough. +4. **Proposal-first** + New relations default to `proposed`, not `accepted`. +5. **Recorded rationale** + Each proposal carries confidence, rationale, and evidence refs. -**Algorithm:** +## Why no auto-accept -1. List all scan entries with their `scope` and `scan-category` tags -2. Group entries that share the same subsystem scope (e.g. two entries both scoped - to `libs/database`) -3. Group entries that cover the same conceptual area even if scoped differently - (e.g. an `architecture:auth-flow` doc entry + a `security:auth-model` doc entry - - a `libs/auth` code entry all describe the auth subsystem) -4. Each group becomes one tile. Standalone entries (no overlap) become tiles directly -5. The target is **fewer tiles than source entries** — if you have the same count, - you haven't merged enough +Accepted relations influence retrieval, contradiction handling, and in the case +of `supersedes`, active-state semantics. That is too much authority for a batch +process by default. -**Signals that entries should merge:** +Narrow auto-accept rules may exist later for explicit user-authored or +workflow-authored edges, but that should be opt-in and relation-specific. -- Same `scope:` tag -- Same subsystem name in the entry key -- One is a docs-derived view and the other is a code-derived view of the same area -- Significant constraint overlap (>50% of MUST/NEVER items are shared) +## Relation-first rather than summary-first -**Signals that entries should stay separate:** +The dream/consolidation pass should improve the graph before it produces prose. 
-- Different subsystems with no conceptual overlap -- Different layers (e.g. database vs API routing) even if they interact -- Merging would exceed the 400-token budget +Good effects of relation-first consolidation: -### Merge rules +- packs can prefer connected accepted evidence +- contradictions can remain visible instead of being flattened away +- stale diagnoses can be down-ranked without deleting history +- source entries remain the provenance anchor -When merging entries from different scan phases: +## Candidate generation heuristics -1. **Code wins on specifics** — function names, actual patterns, real constraints - found in source files -2. **Docs win on rationale** — architecture decisions, design context, cross-cutting - concerns, the "why" -3. **Deduplicate constraints** — if both sources say the same thing, keep one -4. **Prefer concrete over abstract** — `getExecutor(db)` beats "uses repository - pattern"; an actual command beats a description of what the command does +The best candidate generators are usually cross-type: -### Tile execution order +- incident -> fix commit +- decision -> implementation +- false diagnosis -> corrected diagnosis +- repeated incidents with matching symptoms +- follow-up rule -> earlier rule on same scope -Process tiles in dependency order when possible: +Shared tags, refs, and time windows usually outperform pure embedding +similarity for these tasks. -1. Identity/overview tiles first (frames everything) -2. Foundational library tiles (database, crypto, core utilities) -3. Service/application tiles (depend on libraries) -4. Cross-cutting tiles (workflow, testing, CI) -5. Caveat/known-issue tiles last (standalone) +## Dream pass definition -### Quality gate +A dream pass is a small periodic review loop: -Before creating each tile, verify ALL of these: +1. inspect a bounded recent working set +2. detect likely missing structure +3. propose a small number of typed edges +4. 
stop -- [ ] Under 400 tokens of core content -- [ ] Contains at least one MUST or NEVER constraint -- [ ] Has a clear `applies_to` scope -- [ ] Does NOT restate project docs (CLAUDE.md, README) verbatim -- [ ] Synthesizes from sources, not just copies -- [ ] Includes source entry IDs for provenance +It is not autonomous rewriting, not full-diary compression, and not hidden +maintenance that changes the active truth silently. ---- +## Recommended metadata fields -## Constraint quality criteria +Relation metadata should usually capture: -When extracting constraints for tiles, apply this filter to each candidate: +- `rationale` +- `confidence` +- `evidenceRefs` +- `proposalMethod` +- `reviewedAt` +- `reviewedBy` +- `workingSet` -1. **Triggerable** — clear when the rule applies. "Always follow best practices" - fails; "when writing repository methods, use getExecutor(db)" passes. -2. **Specific** — refers to a real repo convention or invariant, not a generic - programming principle. "Write tests" fails; "use vi.mock, never jest.mock" passes. -3. **Bounded** — fits one task family or subsystem. "All code should be clean" - fails; a rule scoped to `libs/database/**` passes. -4. **Grounded** — links to concrete files, functions, or evidence from the scan. -5. **Actionable** — an agent can follow it or a validator can check it. "Be careful - with auth" fails; "return 404, not 403, for denied private resources" passes. +Additional fields may exist for specific relation kinds, such as +`contradictionKind`. 
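The recommended metadata fields above lend themselves to a quick completeness check before a relation is persisted. The sketch below is illustrative only: `RelationReviewMetadata` and `checkReviewMetadata` are hypothetical names, and the `[0, 1]` confidence range and parseable-timestamp rule are assumptions that the document does not state.

```typescript
// Hypothetical shape; field names follow the recommended metadata list.
interface RelationReviewMetadata {
  rationale: string;
  confidence: number; // assumed to be a score in [0, 1]
  evidenceRefs: string[];
  proposalMethod: string;
  reviewedAt: string; // assumed to be an ISO 8601 timestamp
  reviewedBy: string;
  workingSet: string;
  contradictionKind?: string; // optional, relation-specific
}

const REQUIRED_FIELDS = [
  "rationale",
  "confidence",
  "evidenceRefs",
  "proposalMethod",
  "reviewedAt",
  "reviewedBy",
  "workingSet",
] as const;

// Returns human-readable problems; an empty list means the metadata looks
// complete enough to attach to a proposed relation.
function checkReviewMetadata(meta: Partial<RelationReviewMetadata>): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = meta[field];
    if (value === undefined || value === "") {
      problems.push(`missing ${field}`);
    }
  }
  // Assumed range check; the source only shows a numeric confidence.
  if (typeof meta.confidence === "number" && (meta.confidence < 0 || meta.confidence > 1)) {
    problems.push("confidence outside [0, 1]");
  }
  if (Array.isArray(meta.evidenceRefs) && meta.evidenceRefs.length === 0) {
    problems.push("evidenceRefs is empty");
  }
  if (
    typeof meta.reviewedAt === "string" &&
    meta.reviewedAt !== "" &&
    Number.isNaN(Date.parse(meta.reviewedAt))
  ) {
    problems.push("reviewedAt is not a parseable date");
  }
  return problems;
}
```

A check like this would run agent-side, before the relation is created, so that a proposal with no rationale or evidence never reaches the review queue.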
-### Common rejection reasons +## Quality gate -- **Restated project docs** — if CLAUDE.md already says it, a tile constraint adds nothing -- **Generic programming principles** — "write clean code", "handle errors" -- **Descriptive facts** — "the API uses Fastify" is a fact, not a rule -- **Too narrow** — if it only applies to one line in one file, it's a code comment +A consolidation run is good when: ---- +- every proposed edge has a clear relation-specific reason +- proposals are concentrated on one coherent working set +- there is no blanket same-cluster linking +- contradictions remain explicit +- compile/search behavior improves on the tested prompt -## Why merge before extracting constraints +A run is bad when: -Extracting constraints from individual entries produces duplicates (the same -constraint stated differently in Phase 1 docs and Phase 2 code), misses -synthesis opportunities, and inflates the candidate count with weak variants. - -Merging first creates a clean, deduplicated knowledge base. Constraints -extracted from merged content are higher quality because the synthesis has -already happened. - ---- - -## Recovery after context compression - -If context is compressed mid-run: - -1. Read the SKILL.md for execution steps -2. Read this file for the methodology rationale -3. Use the retrieval queries in SKILL.md to find completed tiles -4. 
Compare completed work against the scan entries to find where to resume
+- most edges are generic `supports`
+- rationale could apply to any pair in the same topic
+- acceptance happened without explicit review
+- the graph is denser but not more useful
diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/instructions.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/instructions.json
new file mode 100644
index 000000000..1c538589c
--- /dev/null
+++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/instructions.json
@@ -0,0 +1,88 @@
+{
+  "instructions": [
+    {
+      "instruction": "Treat consolidation as editorial graph curation rather than bulk clustering or summary writing.",
+      "original_snippets": "Consolidation is editorial graph curation, not bulk clustering. ... relation-first rather than summary-first",
+      "relevant_when": "When designing or executing diary consolidation workflow artifacts.",
+      "why_given": "new knowledge"
+    },
+    {
+      "instruction": "Default to creating proposed relations and do not auto-accept by default.",
+      "original_snippets": "Default rule: create proposals, do not auto-accept. ... Create relations with status: proposed unless the user explicitly asks for acceptance review",
+      "relevant_when": "Whenever relation proposals are being created or persisted.",
+      "why_given": "preference"
+    },
+    {
+      "instruction": "Use a bounded working set instead of consolidating the entire diary by default.",
+      "original_snippets": "Never consolidate the whole diary by default. ... Start with one bounded slice",
+      "relevant_when": "When choosing scope for a consolidation run or dream pass.",
+      "why_given": "preference"
+    },
+    {
+      "instruction": "Generate candidate pairs intentionally rather than using blanket all-vs-all pairing.",
+      "original_snippets": "Do not perform blanket pair generation. ... Generate candidate pairs from these signals",
+      "relevant_when": "When preparing relation candidates from diary entries.",
+      "why_given": "preference"
+    },
+    {
+      "instruction": "Prefer candidate signals such as shared scope, overlapping refs, temporal adjacency, repeated symptoms, and cross-type sequences.",
+      "original_snippets": "Generate candidate pairs from these signals: same scope tag ... overlapping refs ... temporal adjacency ... decision followed by procedural implementation",
+      "relevant_when": "When building relation candidates from incidents, decisions, commits, and scan entries.",
+      "why_given": "new knowledge"
+    },
+    {
+      "instruction": "Treat server clustering only as a weak candidate source and not as trusted semantic judgment.",
+      "original_snippets": "Server clustering may be used only as a weak candidate source. Treat every cluster suggestion as untrusted until reviewed.",
+      "relevant_when": "When a consolidation run includes server-provided clusters or similarity groups.",
+      "why_given": "new knowledge"
+    },
+    {
+      "instruction": "Judge each candidate pair relation-specifically and not with a generic 'related' heuristic.",
+      "original_snippets": "For each candidate pair, read both entries ... The judgment must be relation-specific, not 'these feel related'.",
+      "relevant_when": "When deciding whether to create supports, elaborates, contradicts, caused_by, references, or supersedes edges.",
+      "why_given": "preference"
+    },
+    {
+      "instruction": "Use contradicts only for substantive incompatibility on the same subject or diagnosis.",
+      "original_snippets": "contradicts ... claims cannot both be treated as current truth ... contradiction is substantive, not different emphasis",
+      "relevant_when": "When reviewing conflicting incidents, diagnoses, or semantic entries.",
+      "why_given": "new knowledge"
+    },
+    {
+      "instruction": "Use supersedes only when one entry should replace another as the active version.",
+      "original_snippets": "supersedes ... source is the new active version of target ... use stricter evidence than for contradicts",
+      "relevant_when": "When deciding whether a newer entry replaces an older one rather than merely contradicting or elaborating it.",
+      "why_given": "new knowledge"
+    },
+    {
+      "instruction": "Attach review metadata including rationale, confidence, evidence refs, proposal method, reviewer, and working set to every proposed relation.",
+      "original_snippets": "Every created relation must include review metadata ... confidence ... evidenceRefs ... proposalMethod ... rationale ... reviewedAt ... reviewedBy ... workingSet",
+      "relevant_when": "When serializing relation proposals or review packets.",
+      "why_given": "new knowledge"
+    },
+    {
+      "instruction": "Report a review packet summarizing working set, candidate count, proposals by type, skips, and open questions.",
+      "original_snippets": "At the end of the run, report: working-set definition ... candidate count ... proposals created by relation type ... skipped candidates and why",
+      "relevant_when": "When finishing a consolidation batch and presenting results.",
+      "why_given": "preference"
+    },
+    {
+      "instruction": "A dream pass must stay bounded, stop after one pass, and leave all new edges proposed.",
+      "original_snippets": "A dream is a bounded background consolidation pass ... stop after one pass ... leave all new edges in proposed",
+      "relevant_when": "When designing an autonomous or periodic maintenance workflow.",
+      "why_given": "new knowledge"
+    },
+    {
+      "instruction": "A dream pass must never auto-accept relations, edit source entries, process the entire diary at once, or collapse contradictions into rewritten summaries.",
+      "original_snippets": "Never let a dream pass: auto-accept relations ... edit source entries ... process the entire diary at once ... collapse contradictions into one rewritten summary",
+      "relevant_when": "When defining the boundaries of a background consolidation routine.",
+      "why_given": "preference"
+    },
+    {
+      "instruction": "Verify consolidation quality by checking compile/search behavior against the same task prompt as before.",
+      "original_snippets": "After consolidation, test retrieval quality with diaries_compile or entries_search using the same task prompt as before",
+      "relevant_when": "When validating whether a consolidation run improved retrieval quality.",
+      "why_given": "new knowledge"
+    }
+  ]
+}
diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/capability.txt b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/capability.txt
new file mode 100644
index 000000000..c2f9ad3c2
--- /dev/null
+++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/capability.txt
@@ -0,0 +1 @@
+Bounded working set
diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/criteria.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/criteria.json
new file mode 100644
index 000000000..9dc6f5ee9
--- /dev/null
+++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/criteria.json
@@ -0,0 +1,41 @@
+{
+  "checklist": [
+    {
+      "description": "Defines one bounded working set such as a scope family, branch, recent window, or incident family instead of the full diary",
+      "max_score": 20,
+      "name": "Bounded scope"
+    },
+    {
+      "description": "States the retrieval or memory problem the consolidation batch is meant to improve",
+      "max_score": 10,
+      "name": "Objective stated"
+    },
+    {
+      "description": "Names one or more relation types in scope for the run instead of using a generic 'related entries' framing",
+      "max_score": 10,
+      "name": "Focus relation set"
+    },
+    {
+      "description": "Uses at least three candidate-generation signals such as shared scope, overlapping refs, temporal adjacency, repeated symptoms, or cross-type sequences",
+      "max_score": 20,
+      "name": "Candidate signals"
+    },
+    {
+      "description": "Does not propose a blanket all-pairs or whole-diary clustering pass",
+      "max_score": 15,
+      "name": "No all-vs-all"
+    },
+    {
+      "description": "If clustering is mentioned, treats it only as a weak candidate source rather than final judgment",
+      "max_score": 10,
+      "name": "Untrusted clustering"
+    },
+    {
+      "description": "Explains that candidate pairs will be reviewed pairwise before relation creation",
+      "max_score": 15,
+      "name": "Review plan"
+    }
+  ],
+  "context": "Tests whether the agent scopes consolidation to a bounded slice, uses intentional candidate-generation signals, and avoids blanket whole-diary review.",
+  "type": "weighted_checklist"
+}
diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/task.md b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/task.md
new file mode 100644
index 000000000..8098ce7a4
--- /dev/null
+++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-0/task.md
@@ -0,0 +1,24 @@
+# Stabilize Diary Consolidation Scope
+
+## Problem/Feature Description
+
+The maintainers of a MoltNet-powered diary have noticed that recent
+consolidation attempts made the memory graph denser but not more useful. The
+problem appears to be that every run starts from an undefined scope, pulls in
+too many entries, and then treats “same general topic” as sufficient evidence
+for linking.
+
+Design a concrete consolidation batch for this diary. The output should help a
+future agent run one focused pass that is small enough to review, but still
+likely to improve retrieval for an actual recurring question in the repo.
+ +## Output Specification + +Produce these files: + +- `consolidation-plan.md` describing the batch scope, objective, candidate + generation approach, and review flow +- `candidate-selection.json` with the working-set definition and the signals + used to form candidate pairs + +The outputs should stand on their own as instructions for a later agent run. diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/capability.txt b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/capability.txt new file mode 100644 index 000000000..370ab79c3 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/capability.txt @@ -0,0 +1 @@ +Typed proposal packet diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/criteria.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/criteria.json new file mode 100644 index 000000000..711c8a182 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/criteria.json @@ -0,0 +1,46 @@ +{ + "checklist": [ + { + "description": "Each proposal names a specific relation type rather than using a generic relation label", + "max_score": 15, + "name": "Typed edges" + }, + { + "description": "Leaves all new relations in proposed state instead of accepted state", + "max_score": 15, + "name": "Proposed status" + }, + { + "description": "Every proposal includes a concise rationale explaining why the chosen relation applies", + "max_score": 15, + "name": "Rationale field" + }, + { + "description": "Every proposal includes an explicit confidence value", + "max_score": 10, + "name": "Confidence field" + }, + { + "description": "Every proposal includes evidence references such as scope tags, file paths, symbols, or other stable identifiers", + "max_score": 15, + "name": "Evidence refs" + }, + { + "description": "Every proposal records reviewedBy and reviewedAt fields", + "max_score": 10, + "name": "Reviewer identity" + }, + { + "description": 
"Every proposal records proposalMethod and workingSet metadata", + "max_score": 10, + "name": "Proposal method" + }, + { + "description": "Includes a packet summarizing proposal counts by relation type and any skipped candidates or open questions", + "max_score": 10, + "name": "Review summary" + } + ], + "context": "Tests whether the agent proposes typed entry relations with explicit rationale and the required metadata contract, while leaving the relations unaccepted by default.", + "type": "weighted_checklist" +} diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/task.md b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/task.md new file mode 100644 index 000000000..9b1db6b6d --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-1/task.md @@ -0,0 +1,21 @@ +# Prepare a Reviewable Relation Batch + +## Problem/Feature Description + +An agent has already identified several diary entry pairs that look worth +linking, but the team does not trust unlabeled graph edits or relation batches +that cannot be audited later. They want a reviewable proposal packet that a +human or another agent can inspect before deciding which edges to accept. + +Create a relation proposal batch format and fill it with a handful of realistic +sample proposals for one consolidation run. The emphasis is on traceability and +review quality, not on acceptance. 
+ +## Output Specification + +Produce these files: + +- `relation-proposals.json` containing a small batch of sample relation + proposals with metadata +- `review-packet.md` summarizing the working set, proposal counts, skipped + candidates, and any open questions reviewers should inspect diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/capability.txt b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/capability.txt new file mode 100644 index 000000000..9d933f91b --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/capability.txt @@ -0,0 +1 @@ +Contradiction judgment diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/criteria.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/criteria.json new file mode 100644 index 000000000..8ee14ea46 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/criteria.json @@ -0,0 +1,41 @@ +{ + "checklist": [ + { + "description": "Defines contradiction as applying only when two entries concern the same subject, diagnosis, or current-truth claim", + "max_score": 15, + "name": "Same subject test" + }, + { + "description": "Treats contradiction as substantive incompatibility rather than mere difference in detail or emphasis", + "max_score": 15, + "name": "Substantive conflict" + }, + { + "description": "Preserves contradictions as explicit relation proposals rather than collapsing them into one reconciled summary", + "max_score": 15, + "name": "No summary flattening" + }, + { + "description": "Does not replace contradiction with supersedes unless active-version replacement is separately justified", + "max_score": 15, + "name": "Not supersedes by default" + }, + { + "description": "Includes contradiction-specific metadata such as rationale, evidence refs, and an optional contradiction kind", + "max_score": 15, + "name": "Contradiction metadata" + }, + { + "description": "Calls out uncertainty or 
reviewer follow-up questions where the contradiction confidence is not absolute", + "max_score": 10, + "name": "Reviewer questions" + }, + { + "description": "Shows pairwise reasoning for each contradiction proposal rather than cluster-level statements", + "max_score": 15, + "name": "Pairwise evidence" + } + ], + "context": "Tests whether the agent distinguishes contradiction from elaboration or supersession, and preserves explicit conflict structure instead of flattening it into summary prose.", + "type": "weighted_checklist" +} diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/task.md b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/task.md new file mode 100644 index 000000000..0568efa01 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-2/task.md @@ -0,0 +1,21 @@ +# Surface Conflicting Memory Without Rewriting It + +## Problem/Feature Description + +The diary for a subsystem contains an earlier incident write-up that blamed the +failure on missing credentials, and a later incident write-up that concluded the +real root cause was an OAuth scope mismatch. Recent retrieval keeps surfacing +both, and different agents are treating them inconsistently. + +Design a consolidation output that keeps this conflict visible and reviewable. +The team wants future retrieval to stop treating the older diagnosis as +unquestioned truth, but they do not want the original entries rewritten or +hidden behind a synthesized paragraph. 
+ +## Output Specification + +Produce these files: + +- `conflict-review.md` describing how the conflicting entries should be treated +- `conflict-proposals.json` with the proposed relations and metadata for that + conflict set diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/capability.txt b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/capability.txt new file mode 100644 index 000000000..0383fe271 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/capability.txt @@ -0,0 +1 @@ +Dream pass guardrails diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/criteria.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/criteria.json new file mode 100644 index 000000000..f00641ec6 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/criteria.json @@ -0,0 +1,41 @@ +{ + "checklist": [ + { + "description": "Defines the dream pass over a small recent or subsystem-bounded working set rather than the entire diary", + "max_score": 20, + "name": "Bounded window" + }, + { + "description": "Specifies that the dream routine stops after one pass instead of recursively reprocessing its own output", + "max_score": 10, + "name": "Single pass" + }, + { + "description": "Leaves newly created edges in proposed state", + "max_score": 15, + "name": "Proposed only" + }, + { + "description": "Explicitly avoids mutating or rewriting source entries during the dream pass", + "max_score": 15, + "name": "No entry edits" + }, + { + "description": "Preserves contradictions as graph structure rather than resolving them into rewritten summary text", + "max_score": 15, + "name": "No contradiction collapse" + }, + { + "description": "If server-side discovery is used, it is framed as candidate generation rather than trusted acceptance logic", + "max_score": 10, + "name": "Weak server role" + }, + { + "description": "Includes an explicit later review or packet output 
for the dream-generated proposals", + "max_score": 15, + "name": "Review hook" + } + ], + "context": "Tests whether the agent defines a bounded dream pass with explicit guardrails against auto-acceptance, whole-diary processing, entry mutation, and contradiction erasure.", + "type": "weighted_checklist" +} diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/task.md b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/task.md new file mode 100644 index 000000000..1cebc9fc2 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-3/task.md @@ -0,0 +1,20 @@ +# Define a Safe Background Consolidation Pass + +## Problem/Feature Description + +The MoltNet team wants an optional maintenance routine that can run while a +repository is idle and prepare diary relation proposals for later review. The +team is interested in a “dream” concept, but they are worried about a background +process silently changing active memory or overreaching across unrelated parts +of the diary. + +Design a background consolidation routine that is safe enough to trust. The +result should read like an operational guardrail document and a small execution +spec for a future implementation. 
+ +## Output Specification + +Produce these files: + +- `dream-pass-spec.md` describing the routine, inputs, outputs, and guardrails +- `dream-pass-example.json` showing one example batch output from the routine diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/capability.txt b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/capability.txt new file mode 100644 index 000000000..7e8ded2a6 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/capability.txt @@ -0,0 +1 @@ +Consolidation verification diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/criteria.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/criteria.json new file mode 100644 index 000000000..9b2f8f214 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/criteria.json @@ -0,0 +1,41 @@ +{ + "checklist": [ + { + "description": "Compares retrieval before and after consolidation using the same representative task prompt", + "max_score": 15, + "name": "Same prompt check" + }, + { + "description": "Uses compile or search behavior as the verification surface instead of a generic success statement", + "max_score": 15, + "name": "Compile or search" + }, + { + "description": "Checks whether ranking or entry selection becomes more stable or sensible after consolidation", + "max_score": 15, + "name": "Ranking stability" + }, + { + "description": "Checks whether retrieval now connects incidents to relevant fixes or decisions more clearly", + "max_score": 15, + "name": "Incident-to-fix path" + }, + { + "description": "Checks whether contradictions are surfaced rather than silently merged away", + "max_score": 15, + "name": "Contradiction visibility" + }, + { + "description": "Checks for less irrelevant same-topic retrieval after consolidation", + "max_score": 15, + "name": "Noise reduction" + }, + { + "description": "Documents any remaining uncertainty or follow-up work instead of 
declaring the batch complete unconditionally", + "max_score": 10, + "name": "Open risks" + } + ], + "context": "Tests whether the agent verifies consolidation quality against retrieval behavior rather than treating relation creation as success by itself.", + "type": "weighted_checklist" +} diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/task.md b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/task.md new file mode 100644 index 000000000..85d572614 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/scenario-4/task.md @@ -0,0 +1,19 @@ +# Verify That Relation Proposals Improve Retrieval + +## Problem/Feature Description + +One consolidation batch has already been planned, and the maintainers want to +avoid the trap of calling it successful just because it generated a set of +edges. The actual goal is better retrieval: fewer irrelevant same-topic hits, +clearer paths from incidents to fixes, and more honest treatment of conflict. + +Design a validation artifact that another agent can use to judge whether the +consolidation batch improved retrieval quality for a real recurring question. 
+ +## Output Specification + +Produce these files: + +- `verification-plan.md` describing the before/after checks +- `verification-checklist.json` with the concrete signals that should be + inspected after the consolidation run diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/summary.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/summary.json new file mode 100644 index 000000000..bb5857f75 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/summary.json @@ -0,0 +1,13 @@ +{ + "instructions_coverage": { + "coverage_percentage": 100, + "instructions_tested": 14, + "total_instructions": 14 + }, + "reason_distribution": { + "new knowledge": 8, + "preference": 6, + "reminder": 0 + }, + "total_scenarios": 5 +} diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/evals/summary_infeasible.json b/.tessl/tiles/getlarge/legreffier-consolidate/evals/summary_infeasible.json new file mode 100644 index 000000000..18a09a0a3 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/evals/summary_infeasible.json @@ -0,0 +1,4 @@ +{ + "infeasible_scenarios": [], + "total_infeasible": 0 +} diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/legreffier-consolidate/SKILL.md b/.tessl/tiles/getlarge/legreffier-consolidate/legreffier-consolidate/SKILL.md new file mode 100644 index 000000000..cf42f566b --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/legreffier-consolidate/SKILL.md @@ -0,0 +1,275 @@ +--- +name: legreffier-consolidate +description: 'Consolidate diary entries by intentionally proposing entry relations through bounded agent review. Use when packs are noisy, when incidents and decisions need linking, when contradictions should be surfaced, or when the user asks to consolidate diaries, propose relations, or run a dream-like memory pass.' +--- + +# LeGreffier Consolidate Skill + +Consolidation is **editorial graph curation**, not bulk clustering. 
+ +Use this skill to propose typed `entry_relations` between diary entries through +an intentional, auditable review workflow. The server may help with retrieval or +candidate discovery, but the agent performs the semantic judgment. + +Default rule: **create proposals, do not auto-accept**. + +## Goals + +- improve retrieval by linking related entries intentionally +- surface contradictions and stale diagnoses without mutating source entries +- connect cross-type evidence such as incident -> fix commit -> decision +- leave a reviewable trail in relation metadata + +## Non-goals + +- rewriting diary entries +- blanket all-vs-all clustering +- auto-accepting relations by default +- replacing context packs with a second compilation artifact + +## Prerequisites + +- LeGreffier identity active +- Diary resolved as `DIARY_ID` +- MCP or CLI transport available +- Diary has enough signal to consolidate: incidents, decisions, commits, scan + entries, or repeated retrieval noise + +Read `references/consolidation-approach.md` for rationale when needed. + +## Transport detection + +After resolving `AGENT_NAME` and `DIARY_ID`, pick one transport for the entire +session: + +1. MCP when MoltNet tools respond +2. CLI fallback with `npx @themoltnet/cli` +3. 
Do not mix transports inside one consolidation run + +CLI credentials: `.moltnet//moltnet.json` + +## When to trigger + +- compile packs are noisy or unstable across similar prompts +- incidents, decisions, and commits exist but are not linked +- the same mistake appears repeatedly with no graph structure +- the user asks for diary consolidation or relation proposals +- periodic maintenance on an active diary + +## Operator preflight + +Before proposing any relations, state the intended scope: + +- objective: what retrieval or memory problem is being fixed +- working set: recent window, branch, scope tags, or incident family +- relation focus: `contradicts`, `supports`, `references`, `caused_by`, + `elaborates`, `supersedes` +- acceptance policy: default `proposed` only + +If the user does not specify scope, infer a bounded scope and record that +choice in the final summary. + +## Workflow + +### Phase 1: Build a bounded working set + +Never consolidate the whole diary by default. + +Start with one bounded slice: + +- last `20-50` entries +- one `scope:*` family +- one branch +- one recurring incident family +- one representative compile/search prompt and its retrieved entries + +Prefer entries with: + +- high importance +- repeated retrieval +- incident/decision/procedural cross-links that appear missing +- unresolved `contradicts` or `proposed` relations nearby + +Useful retrieval patterns: + +- `entries_list` with `tags`, `entry_type`, `limit`, `offset` +- `entries_search` for repeated questions or subsystem names +- `relations_list` for entries that already have open proposals + +### Phase 2: Generate candidate pairs intentionally + +Do not perform blanket pair generation. 
+ +Generate candidate pairs from these signals: + +- same `scope:` tag, different `entryType` +- overlapping refs in metadata or content +- temporal adjacency during one incident/fix sequence +- repeated symptom language across incidents +- a decision followed by a procedural implementation entry +- a false diagnosis followed by a corrected root-cause entry + +Server clustering may be used only as a weak candidate source. Treat every +cluster suggestion as untrusted until reviewed. + +### Phase 3: Judge one pair at a time + +For each candidate pair, read both entries and decide: + +- propose one relation +- skip +- mark as probable duplicate / supersession candidate + +The judgment must be relation-specific, not “these feel related”. + +#### Relation criteria + +`supports` + +- same claim or pattern +- target adds confirming evidence +- target does not merely repeat wording + +`elaborates` + +- same subject +- target adds operational detail, nuance, or constraints +- target is not a replacement + +`contradicts` + +- same subject or diagnosis +- claims cannot both be treated as current truth +- contradiction is substantive, not different emphasis + +`caused_by` + +- source problem plausibly follows from target condition or earlier event +- causal link is evidenced, not just temporally adjacent + +`references` + +- explicit file, symbol, endpoint, commit, or implementation linkage +- target helps an agent navigate from one entry to the concrete artifact + +`supersedes` + +- source is the new active version of target +- target should no longer be preferred in active retrieval +- use stricter evidence than for `contradicts` + +### Phase 4: Create proposal metadata + +Every created relation must include review metadata in `metadata`. 
+ +Required fields: + +```json +{ + "confidence": 0.0, + "evidenceRefs": ["scope:libs/database", "libs/database/src/schema.ts"], + "proposalMethod": "skill:legreffier-consolidate", + "rationale": "One or two sentences explaining the relation.", + "reviewedAt": "2026-04-01T00:00:00Z", + "reviewedBy": "", + "workingSet": "recent:scope:database" +} +``` + +Optional fields: + +- `contradictionKind`: `false-diagnosis`, `stale-assumption`, `policy-conflict` +- `causeSignals`: short list of causal clues +- `scopeSnapshot`: tags or branch snapshot used during review +- `workflowId`: batch identifier for the run + +### Phase 5: Persist as proposed relations + +Create relations with `status: proposed` unless the user explicitly asks for +acceptance review in the same session. + +Examples: + +- incident `references` fix commit +- semantic decision `references` procedural implementation +- corrected incident `contradicts` earlier false diagnosis +- repeated incident `supports` earlier incident + +### Phase 6: Review packet + +At the end of the run, report: + +- working-set definition +- candidate count +- proposals created by relation type +- skipped candidates and why +- open questions or low-confidence areas + +If useful, create a `reflection` entry that summarizes the consolidation run. +Do not store the reflection as a substitute for the relations themselves. + +## Dream pass + +A “dream” is a bounded background consolidation pass, not autonomous memory +rewrite. + +Use it only on a small window: + +- recent `20-40` entries +- one subsystem or one retrieval problem +- unresolved incidents plus nearby decisions/commits + +Dream pass loop: + +1. load a bounded working set +2. identify missing or weak graph structure +3. propose relations with rationale and confidence +4. stop after one pass +5. 
leave all new edges in `proposed` + +Never let a dream pass: + +- auto-accept relations +- edit source entries +- process the entire diary at once +- collapse contradictions into one rewritten summary + +## High-value patterns + +- `episodic` incident `references` `procedural` fix commit +- `semantic` decision `references` `procedural` implementation +- `episodic` corrected diagnosis `contradicts` earlier misdiagnosis +- repeated incidents `supports` each other +- follow-up semantic rule `elaborates` earlier semantic constraint +- replacement decision or rule `supersedes` stale signed entry + +## Anti-patterns + +- blanket `supports` edges for every entry in a cluster +- using `contradicts` for entries that merely differ in detail +- using `supersedes` when the target is still valid context +- creating causal edges from temporal order alone +- proposing cross-subsystem relations with no concrete evidence + +## Verification + +After consolidation, test retrieval quality with `diaries_compile` or +`entries_search` using the same task prompt as before. + +Look for: + +- better ranking stability +- clearer path from incidents to fixes +- contradictions surfaced instead of silently merged +- less irrelevant retrieval from same-topic but unrelated entries + +## Recovery after context compression + +1. Read this file +2. Read `references/consolidation-approach.md` if you need methodology rationale +3. Inspect existing proposals with `relations_list` +4. Resume from the current working set instead of restarting whole-diary review + +## Permissions + +This skill needs diary read/write access and relation CRUD access. 
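
As a rough illustration of the verification step, the before/after check can be reduced to comparing two ranked lists of entry IDs retrieved for the same task prompt. The sketch below assumes those lists have already been collected, for example from `entries_search` output. The names `RetrievalDiff` and `compare_retrieval` and the sample IDs are hypothetical and not part of any MoltNet API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalDiff:
    kept: list[str]     # entry IDs present in both runs, in "after" order
    dropped: list[str]  # retrieved before consolidation, but not after
    added: list[str]    # retrieved only after consolidation
    overlap: float      # Jaccard overlap of the two ID sets


def compare_retrieval(before: list[str], after: list[str]) -> RetrievalDiff:
    """Compare entry IDs retrieved for the same prompt before and after a run."""
    before_set, after_set = set(before), set(after)
    union = before_set | after_set
    # Two empty result lists are treated as identical.
    overlap = len(before_set & after_set) / len(union) if union else 1.0
    return RetrievalDiff(
        kept=[e for e in after if e in before_set],
        dropped=[e for e in before if e not in after_set],
        added=[e for e in after if e not in before_set],
        overlap=overlap,
    )


# Hypothetical sample data: same prompt, two retrieval runs.
diff = compare_retrieval(["inc-1", "note-7", "inc-2"], ["inc-2", "fix-3", "inc-1"])
```

A dropped entry is not automatically an improvement. Each one still has to be read to decide whether it was same-topic noise or relevant evidence that the new relations pushed out.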
diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/legreffier-consolidate/references/consolidation-approach.md b/.tessl/tiles/getlarge/legreffier-consolidate/legreffier-consolidate/references/consolidation-approach.md new file mode 100644 index 000000000..a8da244f2 --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/legreffier-consolidate/references/consolidation-approach.md @@ -0,0 +1,120 @@ +# Consolidation Approach — Methodology Reference + +This document explains the design logic behind `legreffier-consolidate`. +`SKILL.md` is the operating procedure. This file explains why the workflow is +structured the way it is. + +## Core position + +Consolidation is a **graph curation** task. + +It should improve the structure around source entries, not rewrite source +entries into a new canonical layer. Context packs remain the compiled runtime +artifact. Entry relations remain the memory-structure layer. + +## Why agent-side instead of server-side + +Embedding clusters are useful for rough candidate discovery, but they are weak +at the exact distinctions that matter most: + +- false diagnosis vs real root cause +- implementation reference vs same-topic similarity +- replacement vs elaboration +- causal chain vs temporal adjacency + +Those are editorial judgments. They need an agent to read entries +intentionally and record why the edge exists. + +## Trust model + +The process is trustable when it has these properties: + +1. **Bounded scope** + One branch, one subsystem, one incident family, or one retrieval problem at + a time. +2. **Pairwise judgment** + Every proposal is based on reading the involved entries, not cluster shape. +3. **Typed relations** + The workflow chooses a specific relation because that relation’s criteria are + met, not because “related” felt good enough. +4. **Proposal-first** + New relations default to `proposed`, not `accepted`. +5. **Recorded rationale** + Each proposal carries confidence, rationale, and evidence refs. 
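
The first two properties can be sketched in a few lines. This is a minimal illustration, assuming entries have already been loaded as plain records. The `Entry` shape, the `max_entries` cap, and the `candidate_pairs` helper are hypothetical names for this example, and only two signals are modeled: a shared `scope:` tag across different entry types, and overlapping refs.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Entry:
    id: str
    entry_type: str
    tags: frozenset[str] = field(default_factory=frozenset)
    refs: frozenset[str] = field(default_factory=frozenset)


def candidate_pairs(entries, max_entries=50):
    """Return (id_a, id_b, signals) for pairs backed by at least one signal."""
    if len(entries) > max_entries:
        raise ValueError("working set is not bounded; narrow the scope first")
    pairs = []
    # Walking pairs inside a small bounded set is cheap; the point is that
    # only signal-backed pairs are emitted as candidates for review.
    for a, b in combinations(entries, 2):
        signals = []
        shared_scopes = {t for t in a.tags & b.tags if t.startswith("scope:")}
        if shared_scopes and a.entry_type != b.entry_type:
            signals.append("shared-scope-cross-type")
        if a.refs & b.refs:
            signals.append("overlapping-refs")
        if signals:
            pairs.append((a.id, b.id, signals))
    return pairs


# Hypothetical sample working set.
entries = [
    Entry("inc-1", "episodic", frozenset({"scope:libs/database"}),
          frozenset({"libs/database/src/schema.ts"})),
    Entry("fix-1", "procedural", frozenset({"scope:libs/database"}),
          frozenset({"libs/database/src/schema.ts"})),
    Entry("dec-9", "semantic", frozenset({"scope:auth"})),
]
pairs = candidate_pairs(entries)
```

The output is only a candidate list. Each pair still has to be read and judged against a specific relation's criteria before any proposal is written.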
+ +## Why no auto-accept + +Accepted relations influence retrieval, contradiction handling, and in the case +of `supersedes`, active-state semantics. That is too much authority for a batch +process by default. + +Narrow auto-accept rules may exist later for explicit user-authored or +workflow-authored edges, but that should be opt-in and relation-specific. + +## Relation-first rather than summary-first + +The dream/consolidation pass should improve the graph before it produces prose. + +Good effects of relation-first consolidation: + +- packs can prefer connected accepted evidence +- contradictions can remain visible instead of being flattened away +- stale diagnoses can be down-ranked without deleting history +- source entries remain the provenance anchor + +## Candidate generation heuristics + +The best candidate generators are usually cross-type: + +- incident -> fix commit +- decision -> implementation +- false diagnosis -> corrected diagnosis +- repeated incidents with matching symptoms +- follow-up rule -> earlier rule on same scope + +Shared tags, refs, and time windows usually outperform pure embedding +similarity for these tasks. + +## Dream pass definition + +A dream pass is a small periodic review loop: + +1. inspect a bounded recent working set +2. detect likely missing structure +3. propose a small number of typed edges +4. stop + +It is not autonomous rewriting, not full-diary compression, and not hidden +maintenance that changes the active truth silently. + +## Recommended metadata fields + +Relation metadata should usually capture: + +- `rationale` +- `confidence` +- `evidenceRefs` +- `proposalMethod` +- `reviewedAt` +- `reviewedBy` +- `workingSet` + +Additional fields may exist for specific relation kinds, such as +`contradictionKind`. 
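
A proposal writer could check these fields before persisting a relation. The following is a sketch only: the field names come from the list above, but the `validate_metadata` helper and the rule that `confidence` lies between 0 and 1 are assumptions, since no formal schema is defined here.

```python
REQUIRED_FIELDS = (
    "rationale",
    "confidence",
    "evidenceRefs",
    "proposalMethod",
    "reviewedAt",
    "reviewedBy",
    "workingSet",
)


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in metadata
    ]
    confidence = metadata.get("confidence")
    if confidence is not None:
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        # Assumed range: confidence is a probability-like score in [0, 1].
        if not is_number or not 0.0 <= confidence <= 1.0:
            problems.append("confidence must be a number between 0 and 1")
    if "rationale" in metadata and not str(metadata["rationale"]).strip():
        problems.append("rationale must not be empty")
    if "evidenceRefs" in metadata and not metadata["evidenceRefs"]:
        problems.append("evidenceRefs must list at least one reference")
    return problems
```

Returning a list of problems instead of raising on the first one lets a review packet report every gap in a proposal at once.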
+ +## Quality gate + +A consolidation run is good when: + +- every proposed edge has a clear relation-specific reason +- proposals are concentrated on one coherent working set +- there is no blanket same-cluster linking +- contradictions remain explicit +- compile/search behavior improves on the tested prompt + +A run is bad when: + +- most edges are generic `supports` +- rationale could apply to any pair in the same topic +- acceptance happened without explicit review +- the graph is denser but not more useful diff --git a/.tessl/tiles/getlarge/legreffier-consolidate/tile.json b/.tessl/tiles/getlarge/legreffier-consolidate/tile.json new file mode 100644 index 000000000..838fcb2ca --- /dev/null +++ b/.tessl/tiles/getlarge/legreffier-consolidate/tile.json @@ -0,0 +1,11 @@ +{ + "name": "getlarge/legreffier-consolidate", + "private": true, + "skills": { + "legreffier-consolidate": { + "path": "legreffier-consolidate/SKILL.md" + } + }, + "summary": "Agent-side diary consolidation workflow for proposing entry relations through bounded review, explicit rationale, and no auto-accept by default.", + "version": "0.1.0" +} diff --git a/docs/plans/2026-04-01-agent-side-consolidation-skill-spike.md b/docs/plans/2026-04-01-agent-side-consolidation-skill-spike.md new file mode 100644 index 000000000..ceeeb341c --- /dev/null +++ b/docs/plans/2026-04-01-agent-side-consolidation-skill-spike.md @@ -0,0 +1,72 @@ +# Agent-Side Consolidation Skill Spike + +## Summary + +`legreffier-consolidate` is currently stronger in documentation than in +implementation. The product schema supports a useful relation graph, but the +existing consolidation story is still too server-centric and too weakly +auditable for trustworthy diary maintenance. + +This spike proposes an intentional, agent-driven consolidation workflow. 
+
+## Problem
+
+- server-side clustering is weak at the distinctions that matter most:
+  contradiction vs elaboration, causal chain vs temporal adjacency, replacement
+  vs disagreement, implementation reference vs same-topic similarity
+- current relation proposal logic is effectively “cluster => supports edges”
+- relation proposals need provenance and review semantics
+- dream-like maintenance is attractive, but unsafe if it rewrites entries or
+  auto-accepts relations
+
+## Proposal
+
+Treat consolidation as **editorial graph curation**:
+
+- build a bounded working set instead of processing the whole diary
+- generate candidate pairs intentionally
+- judge each pair relation-specifically
+- create `proposed` relations by default
+- attach rationale, confidence, evidence refs, reviewer, and working-set
+  metadata to every proposal
+- verify value against compile/search behavior for the same task prompt
+
+## Relation policy
+
+The skill should support:
+
+- `supports`
+- `elaborates`
+- `contradicts`
+- `caused_by`
+- `references`
+- `supersedes`
+
+No auto-accept by default.
+
+## Dream pass
+
+A dream pass should be a bounded background proposal loop:
+
+- recent `20-40` entries or one subsystem slice
+- one pass only
+- proposal-first, never acceptance-first
+- no source-entry rewriting
+- no contradiction flattening
+
+## Deliverables in this spike
+
+- rewrite `.agents/skills/legreffier-consolidate/SKILL.md`
+- rewrite the consolidation methodology reference
+- package the skill as a Tessl tile
+- add eval scenarios covering bounded scope, typed proposals, contradiction
+  handling, dream-pass guardrails, and verification
+
+## Follow-up work
+
+- align the server-side `diaries_consolidate` endpoint with an agent-side review
+  role instead of pretending to perform semantic consolidation
+- define a stable metadata schema for relation proposals
+- decide whether any narrow auto-accept rules are ever acceptable
+- evaluate whether accepted/proposed relations should influence pack ranking
+  differently
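
Review note (not part of the patch): the follow-up item on a stable metadata schema for relation proposals could start from something like the sketch below. It is only an illustration. The field names and relation kinds are taken from this PR, but every type, the `[0, 1]` confidence range, the `status` values, and the helper names `proposeRelation` and `metadataProblems` are assumptions rather than the project's actual schema or API.

```typescript
// Hypothetical sketch: names of fields and relation kinds come from this PR;
// all types, ranges, and helper functions are illustrative assumptions.

type RelationKind =
  | "supports"
  | "elaborates"
  | "contradicts"
  | "caused_by"
  | "references"
  | "supersedes";

// Assumed status values; the PR only mentions proposed and accepted relations.
type RelationStatus = "proposed" | "accepted";

interface RelationMetadata {
  rationale: string;
  confidence: number; // assumed to be a score in [0, 1]
  evidenceRefs: string[];
  proposalMethod: string;
  workingSet: string[]; // assumed: ids of entries in the bounded working set
  reviewedAt?: string; // assumed: ISO 8601 timestamp, set on review
  reviewedBy?: string;
  contradictionKind?: string; // assumed to apply only to `contradicts`
}

interface RelationProposal {
  fromEntryId: string;
  toEntryId: string;
  kind: RelationKind;
  status: RelationStatus;
  metadata: RelationMetadata;
}

// Default rule from the skill: create proposals, do not auto-accept.
function proposeRelation(
  fromEntryId: string,
  toEntryId: string,
  kind: RelationKind,
  metadata: RelationMetadata
): RelationProposal {
  return { fromEntryId, toEntryId, kind, status: "proposed", metadata };
}

// Returns a list of problems; an empty list means the metadata looks complete.
function metadataProblems(p: RelationProposal): string[] {
  const problems: string[] = [];
  const m = p.metadata;
  if (m.rationale.trim().length === 0) problems.push("missing rationale");
  if (!(m.confidence >= 0 && m.confidence <= 1)) {
    problems.push("confidence outside [0, 1]");
  }
  if (m.evidenceRefs.length === 0) problems.push("no evidence refs");
  if (m.workingSet.length === 0) problems.push("no working set");
  if (p.status === "accepted" && (!m.reviewedBy || !m.reviewedAt)) {
    problems.push("accepted without explicit review");
  }
  return problems;
}
```

Keeping `status` out of the caller's hands in `proposeRelation` is one way to make "no auto-accept by default" structural instead of a convention, and the last check in `metadataProblems` mirrors the quality-gate rule that acceptance must not happen without explicit review.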