393 changes: 186 additions & 207 deletions .agents/skills/legreffier-consolidate/SKILL.md

Large diffs are not rendered by default.

186 changes: 84 additions & 102 deletions .agents/skills/legreffier-consolidate/consolidation-approach.md
@@ -1,138 +1,120 @@
# Consolidation Approach — Methodology Reference

This document explains the _reasoning framework_ behind the legreffier-consolidate
skill. It is repo-agnostic. Read it for the "why" behind tile merging and
quality gates. The SKILL.md prescribes the execution steps; this doc explains
the design choices.
This document explains the design logic behind `legreffier-consolidate`.
`SKILL.md` is the operating procedure. This file explains why the workflow is
structured the way it is.

---
## Core position

## Context tiles
Consolidation is a **graph curation** task.

### What a tile is
It should improve the structure around source entries, not rewrite source
entries into a new canonical layer. Context packs remain the compiled runtime
artifact. Entry relations remain the memory-structure layer.

A tile is a self-contained knowledge unit that answers one question well:
## Why agent-side instead of server-side

> "What do I need to know about X to work on Y correctly?"
Embedding clusters are useful for rough candidate discovery, but they are weak
at the exact distinctions that matter most:

Tiles are NOT documentation rewrites. They synthesize multiple scan entries,
deduplicate overlapping information, and focus on what an agent needs at task time.
- false diagnosis vs real root cause
- implementation reference vs same-topic similarity
- replacement vs elaboration
- causal chain vs temporal adjacency

### Design principles
Those are editorial judgments. They need an agent to read entries
intentionally and record why the edge exists.

1. **Minimal over comprehensive** — fewer tokens, higher density
2. **Concrete over abstract** — commands, paths, patterns, not prose
3. **Non-redundant with project docs** — don't restate what CLAUDE.md already says
4. **Scoped** — each tile has a clear `applies_to` boundary
5. **Synthesis, not summary** — combine info from multiple entries into something
no single source provides
## Trust model

### How to identify merge groups
The process is trustworthy when it has these properties:

Scan entries often overlap — docs-derived entries (Phase 1) and code-derived entries
(Phase 2) frequently describe the same subsystem from different angles. The
consolidation must merge these, not just list them side by side.
1. **Bounded scope**
   One branch, one subsystem, one incident family, or one retrieval problem at
   a time.
2. **Pairwise judgment**
   Every proposal is based on reading the involved entries, not cluster shape.
3. **Typed relations**
   The workflow chooses a specific relation because that relation’s criteria are
   met, not because “related” felt good enough.
4. **Proposal-first**
   New relations default to `proposed`, not `accepted`.
5. **Recorded rationale**
   Each proposal carries confidence, rationale, and evidence refs.
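
The typed, proposal-first properties above could be enforced at the point where a relation is created. The following is a minimal sketch, not the skill's actual implementation: `RelationProposal` and `propose_relation` are hypothetical names, the relation kinds are the ones named elsewhere in this change, and the `[0, 1]` confidence range is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    """Relation kinds named in this change; the enum itself is illustrative."""
    SUPPORTS = "supports"
    ELABORATES = "elaborates"
    CONTRADICTS = "contradicts"
    CAUSED_BY = "caused_by"
    REFERENCES = "references"
    SUPERSEDES = "supersedes"


@dataclass(frozen=True)
class RelationProposal:
    source_id: str
    target_id: str
    kind: RelationKind
    rationale: str
    confidence: float  # assumed to be a value in [0, 1]
    evidence_refs: tuple
    status: str = "proposed"  # proposal-first: never "accepted" at creation


def propose_relation(source_id, target_id, kind, rationale, confidence, evidence_refs):
    """Build a proposal, rejecting untyped or unjustified edges."""
    if not isinstance(kind, RelationKind):
        raise ValueError("kind must be a specific RelationKind, not a generic 'related'")
    if not rationale.strip():
        raise ValueError("rationale is required")
    if not evidence_refs:
        raise ValueError("at least one evidence ref is required")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return RelationProposal(
        source_id, target_id, kind, rationale, confidence, tuple(evidence_refs)
    )
```

The constructor never takes a status argument, so acceptance has to happen in a separate, explicit review step.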

**Algorithm:**
## Why no auto-accept

1. List all scan entries with their `scope` and `scan-category` tags
2. Group entries that share the same subsystem scope (e.g. two entries both scoped
to `libs/database`)
3. Group entries that cover the same conceptual area even if scoped differently
(e.g. an `architecture:auth-flow` doc entry, a `security:auth-model` doc entry,
and a `libs/auth` code entry all describe the auth subsystem)
4. Each group becomes one tile. Standalone entries (no overlap) become tiles directly
5. The target is **fewer tiles than source entries** — if you have the same count,
you haven't merged enough
Accepted relations influence retrieval, contradiction handling, and, in the
case of `supersedes`, active-state semantics. That is too much authority for a
batch process by default.

**Signals that entries should merge:**
Narrow auto-accept rules may exist later for explicit user-authored or
workflow-authored edges, but that should be opt-in and relation-specific.

- Same `scope:` tag
- Same subsystem name in the entry key
- One is a docs-derived view and the other is a code-derived view of the same area
- Significant constraint overlap (>50% of MUST/NEVER items are shared)
## Relation-first rather than summary-first

**Signals that entries should stay separate:**
The dream/consolidation pass should improve the graph before it produces prose.

- Different subsystems with no conceptual overlap
- Different layers (e.g. database vs API routing) even if they interact
- Merging would exceed the 400-token budget
Good effects of relation-first consolidation:

### Merge rules
- packs can prefer connected accepted evidence
- contradictions can remain visible instead of being flattened away
- stale diagnoses can be down-ranked without deleting history
- source entries remain the provenance anchor

When merging entries from different scan phases:
## Candidate generation heuristics

1. **Code wins on specifics** — function names, actual patterns, real constraints
found in source files
2. **Docs win on rationale** — architecture decisions, design context, cross-cutting
concerns, the "why"
3. **Deduplicate constraints** — if both sources say the same thing, keep one
4. **Prefer concrete over abstract** — `getExecutor(db)` beats "uses repository
pattern"; an actual command beats a description of what the command does
The best candidate generators are usually cross-type:

### Tile execution order
- incident -> fix commit
- decision -> implementation
- false diagnosis -> corrected diagnosis
- repeated incidents with matching symptoms
- follow-up rule -> earlier rule on same scope

Process tiles in dependency order when possible:
Shared tags, refs, and time windows usually outperform pure embedding
similarity for these tasks.
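
One way to turn shared tags, refs, and time windows into candidate pairs is sketched below. This is an illustration under assumptions, not the skill's actual generator: the `Entry` shape, the signal names, and the `min_signals` threshold are all hypothetical. Pairs are only formed inside shared tag or ref buckets, so the sketch never enumerates every pair in the working set.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry shape; the real diary schema may differ."""
    id: str
    tags: frozenset
    refs: frozenset
    created_at: datetime


def candidate_pairs(entries, window=timedelta(days=7), min_signals=2):
    """Return (id_a, id_b, signals) for pairs that share enough cheap signals."""
    by_id = {e.id: e for e in entries}

    # Bucket entry ids by each tag and ref they carry.
    buckets = defaultdict(set)
    for e in entries:
        for tag in e.tags:
            buckets[("shared_tag", tag)].add(e.id)
        for ref in e.refs:
            buckets[("shared_ref", ref)].add(e.id)

    # Only entries that land in the same bucket ever become a pair.
    signals = defaultdict(set)
    for (signal, _value), ids in buckets.items():
        for a, b in combinations(sorted(ids), 2):
            signals[(a, b)].add(signal)

    result = []
    for (a, b), found in signals.items():
        # Temporal adjacency only strengthens an existing pair.
        if abs(by_id[a].created_at - by_id[b].created_at) <= window:
            found = found | {"time_window"}
        if len(found) >= min_signals:
            result.append((a, b, sorted(found)))
    return sorted(result)
```

The output is a list of candidates only. Each pair still has to be read and judged before any relation is proposed.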

1. Identity/overview tiles first (frames everything)
2. Foundational library tiles (database, crypto, core utilities)
3. Service/application tiles (depend on libraries)
4. Cross-cutting tiles (workflow, testing, CI)
5. Caveat/known-issue tiles last (standalone)
## Dream pass definition

### Quality gate
A dream pass is a small periodic review loop:

Before creating each tile, verify ALL of these:
1. inspect a bounded recent working set
2. detect likely missing structure
3. propose a small number of typed edges
4. stop

- [ ] Under 400 tokens of core content
- [ ] Contains at least one MUST or NEVER constraint
- [ ] Has a clear `applies_to` scope
- [ ] Does NOT restate project docs (CLAUDE.md, README) verbatim
- [ ] Synthesizes from sources, not just copies
- [ ] Includes source entry IDs for provenance
It is not autonomous rewriting, not full-diary compression, and not hidden
maintenance that changes the active truth silently.
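
The four steps of a dream pass can be sketched as a single bounded function. This is a hedged illustration: `dream_pass`, its callback parameters, and the default cap of five proposals are assumptions made for the example, not values taken from `SKILL.md`.

```python
def dream_pass(entries, in_working_set, detect_missing_edges, max_proposals=5):
    """Run one bounded review pass and return proposed edges only.

    in_working_set: predicate selecting the bounded recent slice (step 1).
    detect_missing_edges: callable yielding candidate edge dicts (step 2).
    """
    # Step 1: inspect a bounded working set, never the whole diary.
    working_set = [e for e in entries if in_working_set(e)]

    # Steps 2 and 3: propose a small number of typed edges.
    proposals = []
    for candidate in detect_missing_edges(working_set):
        if len(proposals) >= max_proposals:
            break
        # Every new edge stays in "proposed"; nothing is accepted here.
        proposals.append({**candidate, "status": "proposed"})

    # Step 4: stop. No source entries are edited and no second pass runs.
    return proposals
```

The function returns data for review and has no side effects on entries, which keeps the pass from silently changing the active truth.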

---
## Recommended metadata fields

## Constraint quality criteria
Relation metadata should usually capture:

When extracting constraints for tiles, apply this filter to each candidate:
- `rationale`
- `confidence`
- `evidenceRefs`
- `proposalMethod`
- `reviewedAt`
- `reviewedBy`
- `workingSet`

1. **Triggerable** — clear when the rule applies. "Always follow best practices"
fails; "when writing repository methods, use getExecutor(db)" passes.
2. **Specific** — refers to a real repo convention or invariant, not a generic
programming principle. "Write tests" fails; "use vi.mock, never jest.mock" passes.
3. **Bounded** — fits one task family or subsystem. "All code should be clean"
fails; a rule scoped to `libs/database/**` passes.
4. **Grounded** — links to concrete files, functions, or evidence from the scan.
5. **Actionable** — an agent can follow it or a validator can check it. "Be careful
with auth" fails; "return 404, not 403, for denied private resources" passes.
Additional fields may exist for specific relation kinds, such as
`contradictionKind`.
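
A completeness check over these fields might look like the sketch below. The field names come from the list above; the helper name `missing_metadata` and the rule that empty strings and empty lists count as missing are assumptions for illustration.

```python
# Field names as listed in this document.
EXPECTED_FIELDS = (
    "rationale",
    "confidence",
    "evidenceRefs",
    "proposalMethod",
    "reviewedAt",
    "reviewedBy",
    "workingSet",
)


def missing_metadata(metadata):
    """Return the expected fields that are absent or empty, in listed order."""
    # A confidence of 0.0 is a real value, so only None, "" and [] count as empty.
    return [name for name in EXPECTED_FIELDS if metadata.get(name) in (None, "", [])]
```

For example, a proposal that has not been reviewed yet would report `reviewedAt` and `reviewedBy` as missing, which a review packet could surface as open work.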

### Common rejection reasons
## Quality gate

- **Restated project docs** — if CLAUDE.md already says it, a tile constraint adds nothing
- **Generic programming principles** — "write clean code", "handle errors"
- **Descriptive facts** — "the API uses Fastify" is a fact, not a rule
- **Too narrow** — if it only applies to one line in one file, it's a code comment
A consolidation run is good when:

---
- every proposed edge has a clear relation-specific reason
- proposals are concentrated on one coherent working set
- there is no blanket same-cluster linking
- contradictions remain explicit
- compile/search behavior improves on the tested prompt

## Why merge before extracting constraints
A run is bad when:

Extracting constraints from individual entries produces duplicates (the same
constraint stated differently in Phase 1 docs and Phase 2 code), misses
synthesis opportunities, and inflates the candidate count with weak variants.

Merging first creates a clean, deduplicated knowledge base. Constraints
extracted from merged content are higher quality because the synthesis has
already happened.

---

## Recovery after context compression

If context is compressed mid-run:

1. Read the SKILL.md for execution steps
2. Read this file for the methodology rationale
3. Use the retrieval queries in SKILL.md to find completed tiles
4. Compare completed work against the scan entries to find where to resume
- most edges are generic `supports`
- rationale could apply to any pair in the same topic
- acceptance happened without explicit review
- the graph is denser but not more useful
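
Some of the bad-run signals above can be approximated mechanically. The sketch below is a rough proxy under stated assumptions: edges are plain dicts with `kind`, `rationale`, `status`, and optionally `reviewedBy`; "most" is read as more than half; and a reused rationale string stands in for "could apply to any pair". Whether the graph became more useful still needs a retrieval test on a real prompt.

```python
from collections import Counter


def run_warnings(edges, generic_share=0.5):
    """Return warning strings for a finished consolidation run."""
    warnings = []
    if not edges:
        return warnings

    # Signal: most edges are generic `supports`.
    kinds = Counter(edge["kind"] for edge in edges)
    if kinds["supports"] / len(edges) > generic_share:
        warnings.append("most edges are generic supports")

    # Proxy signal: the same rationale text was reused for several pairs.
    rationales = Counter(edge["rationale"].strip().lower() for edge in edges)
    if any(count > 1 for count in rationales.values()):
        warnings.append("rationale reused across pairs")

    # Signal: an edge was accepted with no recorded reviewer.
    if any(e["status"] == "accepted" and not e.get("reviewedBy") for e in edges):
        warnings.append("accepted without explicit review")

    return warnings
```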
@@ -0,0 +1,88 @@
{
"instructions": [
{
"instruction": "Treat consolidation as editorial graph curation rather than bulk clustering or summary writing.",
"original_snippets": "Consolidation is editorial graph curation, not bulk clustering. ... relation-first rather than summary-first",
"relevant_when": "When designing or executing diary consolidation workflow artifacts.",
"why_given": "new knowledge"
},
{
"instruction": "Default to creating proposed relations and do not auto-accept by default.",
"original_snippets": "Default rule: create proposals, do not auto-accept. ... Create relations with status: proposed unless the user explicitly asks for acceptance review",
"relevant_when": "Whenever relation proposals are being created or persisted.",
"why_given": "preference"
},
{
"instruction": "Use a bounded working set instead of consolidating the entire diary by default.",
"original_snippets": "Never consolidate the whole diary by default. ... Start with one bounded slice",
"relevant_when": "When choosing scope for a consolidation run or dream pass.",
"why_given": "preference"
},
{
"instruction": "Generate candidate pairs intentionally rather than using blanket all-vs-all pairing.",
"original_snippets": "Do not perform blanket pair generation. ... Generate candidate pairs from these signals",
"relevant_when": "When preparing relation candidates from diary entries.",
"why_given": "preference"
},
{
"instruction": "Prefer candidate signals such as shared scope, overlapping refs, temporal adjacency, repeated symptoms, and cross-type sequences.",
"original_snippets": "Generate candidate pairs from these signals: same scope tag ... overlapping refs ... temporal adjacency ... decision followed by procedural implementation",
"relevant_when": "When building relation candidates from incidents, decisions, commits, and scan entries.",
"why_given": "new knowledge"
},
{
"instruction": "Treat server clustering only as a weak candidate source and not as trusted semantic judgment.",
"original_snippets": "Server clustering may be used only as a weak candidate source. Treat every cluster suggestion as untrusted until reviewed.",
"relevant_when": "When a consolidation run includes server-provided clusters or similarity groups.",
"why_given": "new knowledge"
},
{
"instruction": "Judge each candidate pair relation-specifically and not with a generic 'related' heuristic.",
"original_snippets": "For each candidate pair, read both entries ... The judgment must be relation-specific, not 'these feel related'.",
"relevant_when": "When deciding whether to create supports, elaborates, contradicts, caused_by, references, or supersedes edges.",
"why_given": "preference"
},
{
"instruction": "Use contradicts only for substantive incompatibility on the same subject or diagnosis.",
"original_snippets": "contradicts ... claims cannot both be treated as current truth ... contradiction is substantive, not different emphasis",
"relevant_when": "When reviewing conflicting incidents, diagnoses, or semantic entries.",
"why_given": "new knowledge"
},
{
"instruction": "Use supersedes only when one entry should replace another as the active version.",
"original_snippets": "supersedes ... source is the new active version of target ... use stricter evidence than for contradicts",
"relevant_when": "When deciding whether a newer entry replaces an older one rather than merely contradicting or elaborating it.",
"why_given": "new knowledge"
},
{
"instruction": "Attach review metadata including rationale, confidence, evidence refs, proposal method, reviewer, and working set to every proposed relation.",
"original_snippets": "Every created relation must include review metadata ... confidence ... evidenceRefs ... proposalMethod ... rationale ... reviewedAt ... reviewedBy ... workingSet",
"relevant_when": "When serializing relation proposals or review packets.",
"why_given": "new knowledge"
},
{
"instruction": "Report a review packet summarizing working set, candidate count, proposals by type, skips, and open questions.",
"original_snippets": "At the end of the run, report: working-set definition ... candidate count ... proposals created by relation type ... skipped candidates and why",
"relevant_when": "When finishing a consolidation batch and presenting results.",
"why_given": "preference"
},
{
"instruction": "A dream pass must stay bounded, stop after one pass, and leave all new edges proposed.",
"original_snippets": "A dream is a bounded background consolidation pass ... stop after one pass ... leave all new edges in proposed",
"relevant_when": "When designing an autonomous or periodic maintenance workflow.",
"why_given": "new knowledge"
},
{
"instruction": "A dream pass must never auto-accept relations, edit source entries, process the entire diary at once, or collapse contradictions into rewritten summaries.",
"original_snippets": "Never let a dream pass: auto-accept relations ... edit source entries ... process the entire diary at once ... collapse contradictions into one rewritten summary",
"relevant_when": "When defining the boundaries of a background consolidation routine.",
"why_given": "preference"
},
{
"instruction": "Verify consolidation quality by checking compile/search behavior against the same task prompt as before.",
"original_snippets": "After consolidation, test retrieval quality with diaries_compile or entries_search using the same task prompt as before",
"relevant_when": "When validating whether a consolidation run improved retrieval quality.",
"why_given": "new knowledge"
}
]
}
@@ -0,0 +1 @@
Bounded working set
@@ -0,0 +1,41 @@
{
"checklist": [
{
"description": "Defines one bounded working set such as a scope family, branch, recent window, or incident family instead of the full diary",
"max_score": 20,
"name": "Bounded scope"
},
{
"description": "States the retrieval or memory problem the consolidation batch is meant to improve",
"max_score": 10,
"name": "Objective stated"
},
{
"description": "Names one or more relation types in scope for the run instead of using a generic 'related entries' framing",
"max_score": 10,
"name": "Focus relation set"
},
{
"description": "Uses at least three candidate-generation signals such as shared scope, overlapping refs, temporal adjacency, repeated symptoms, or cross-type sequences",
"max_score": 20,
"name": "Candidate signals"
},
{
"description": "Does not propose a blanket all-pairs or whole-diary clustering pass",
"max_score": 15,
"name": "No all-vs-all"
},
{
"description": "If clustering is mentioned, treats it only as a weak candidate source rather than final judgment",
"max_score": 10,
"name": "Untrusted clustering"
},
{
"description": "Explains that candidate pairs will be reviewed pairwise before relation creation",
"max_score": 15,
"name": "Review plan"
}
],
"context": "Tests whether the agent scopes consolidation to a bounded slice, uses intentional candidate-generation signals, and avoids blanket whole-diary review.",
"type": "weighted_checklist"
}
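
The seven `max_score` values in this checklist sum to 100 (20 + 10 + 10 + 20 + 15 + 10 + 15). How the rubric is consumed is not shown in this change, so the following is only a sketch of one plausible scorer: `score_checklist` and the `awarded` mapping are hypothetical.

```python
# Item names and max scores copied from the checklist above.
CHECKLIST = [
    {"name": "Bounded scope", "max_score": 20},
    {"name": "Objective stated", "max_score": 10},
    {"name": "Focus relation set", "max_score": 10},
    {"name": "Candidate signals", "max_score": 20},
    {"name": "No all-vs-all", "max_score": 15},
    {"name": "Untrusted clustering", "max_score": 10},
    {"name": "Review plan", "max_score": 15},
]


def score_checklist(checklist, awarded):
    """Return (total, maximum); items missing from `awarded` score zero."""
    total = 0
    maximum = 0
    for item in checklist:
        points = awarded.get(item["name"], 0)
        # Reject scores outside the item's allowed range.
        if not 0 <= points <= item["max_score"]:
            raise ValueError(f"score out of range for {item['name']!r}")
        total += points
        maximum += item["max_score"]
    return total, maximum
```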
@@ -0,0 +1,24 @@
# Stabilize Diary Consolidation Scope

## Problem/Feature Description

The maintainers of a MoltNet-powered diary have noticed that recent
consolidation attempts made the memory graph denser but not more useful. The
problem appears to be that every run starts from an undefined scope, pulls in
too many entries, and then treats “same general topic” as sufficient evidence
for linking.

Design a concrete consolidation batch for this diary. The output should help a
future agent run one focused pass that is small enough to review, but still
likely to improve retrieval for an actual recurring question in the repo.

## Output Specification

Produce these files:

- `consolidation-plan.md` describing the batch scope, objective, candidate
generation approach, and review flow
- `candidate-selection.json` with the working-set definition and the signals
used to form candidate pairs

The outputs should stand on their own as instructions for a later agent run.
@@ -0,0 +1 @@
Typed proposal packet