Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,24 @@
"clean-code",
"refactoring"
]
},
{
"name": "researcher",
"source": "./plugins/researcher",
"description": "Source-grounded research assistant: an interactive skill gathers a research brief, then launches a bundled Dynamic Workflow that fans out firecrawl retrieval (WebSearch fallback) into cited findings, gates rounds on coverage and contradictions, and renders a self-contained HTML report where every claim traces to a numbered source. The report evolves across follow-up runs.",
"version": "0.0.0",
"author": {
"name": "Mateusz Gostański (grixu)",
"email": "mateusz.gostanski@gmail.com"
},
"category": "productivity",
"tags": [
"research",
"web",
"firecrawl",
"citations",
"workflow"
]
}
]
}
17 changes: 17 additions & 0 deletions plugins/researcher/.claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
{
"name": "researcher",
"version": "0.0.0",
"description": "Source-grounded research assistant: an interactive skill gathers a research brief, then launches a bundled Dynamic Workflow that fans out firecrawl retrieval (WebSearch fallback) into cited findings, gates rounds on coverage and contradictions, and renders a self-contained HTML report where every claim traces to a numbered source. The report evolves across follow-up runs.",
"author": {
"name": "Mateusz Gostański (grixu)",
"email": "mateusz.gostanski@gmail.com"
},
"repository": "https://github.com/grixu/cc-toolkit",
"keywords": [
"research",
"web",
"firecrawl",
"citations",
"workflow"
]
}
53 changes: 53 additions & 0 deletions plugins/researcher/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Changelog

All notable changes to the **researcher** plugin will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- `research` orchestrator skill — gathers a brief (infer-first; one consolidated prompt over only the dimensions it
can't infer: depth, recency, sources, audience), resolves the report folder (slugified goal; no index file — the
per-report `state.json` files are the registry), launches the bundled Dynamic Workflow, presents the manifest +
path, and runs the follow-up checkpoint as a structured **`AskUserQuestion`** (multiSelect — each proposed follow-up
is a selectable option, full text in the description) that extends the evolving report.
- `workflows/research.js` — the bundled Dynamic Workflow:
- **Setup** — creates the report dir and, when extending, loads only the prior report's `state.json` HEAD (with a
`schemaVersion` guard) — no read-back swarm; the prior findings stay on disk and the Synthesizer reads them later.
Tool availability (`mmdc`) is detected by the skill up front and passed in args, not preflighted here.
- **Plan** — derives distinct sub-query angles so parallel retrievers don't converge on the same hit; when the goal
hinges on a named product/vendor/standard's capability, it dedicates a primary/official-docs angle so "does X
support Y" is confirmed at the source rather than inferred from third-party threads.
- **Assessor-gated round loop** — parallel firecrawl retrievers (WebSearch fallback) emit findings with typed,
verbatim evidence spans and per-source trust tiers; deterministic dedup-by-URL assigns append-only source ids; a
**Conflict-scout** (every round) and, on a deep brief, an adversarial **Verifier** feed the **Assessor**, the
loop's single gate. Bounded by a per-depth round cap (quick=1 / standard=2 / deep=3), not a token budget. The
Assessor treats a central capability claim that rests only on secondary/community sources or inference as a
**material, resolvable gap** — it sends the next round to that actor's own primary docs instead of green-lighting
on inference, and a deep brief holds a high bar — so a single round can't settle "does X support Y" by guessing.
- **Synthesizer** — runs once on green light; reconciles findings, surfaces residual contradictions with
attribution, composes an audience-neutral cited draft. On a follow-up it reads the prior findings shards
directly (`Read`, batched in one turn) and rebuilds the whole answer from all findings holistically — never
seeded by the prior prose (`answer.md` is a write-only export).
- **Editor** — the sole audience-aware stage; re-cuts for concision and the brief's audience tier and marks
earn-their-place visuals.
- **Persist** — runs before the render: snapshots the prior `output.html`, then writes state append-only as a
small `state.json` HEAD plus bounded `findings/NNN.json` shards (≤20 findings each) and `answer.md`, each in its
own `Write` so no single agent turn ever emits the whole corpus — the failure mode that aborted large/deep runs.
On extend, this run's new findings are appended as **new** shards at indices continuing from the prior
`shardCount` (prior shards are never cleared or rewritten; the HEAD counts accumulate); `findings/` is cleared
only on a fresh run.
- **Composer** — renders an HTML report (semantic HTML against a fixed class vocabulary), copies the shipped
assets, compiles diagrams with a **global `mmdc`** (it never reads `report.css`, explores the dir, or downloads a
renderer — going straight to the render so it can't look hung), and renders Chart.js charts (state is already
persisted). Returns only the path + a compact manifest.
- Shipped assets — a version-pinned `chart.umd.js` (Chart.js 4.5.0) and a `report.css` (system fonts, light/dark,
~70ch column, sticky ToC sidebar, print styles), copied into each report so it stays offline and self-contained.
- Single **evolving** HTML report per topic with sidecar `diagrams/`, `assets/`, `snapshots/`, and `findings/` (sharded
state) folders; source ids, findings shards, and content artifacts are all written append-only — follow-up runs add
new shards rather than rewriting prior ones, so older snapshots keep resolving and existing citations never break.

[Unreleased]: https://github.com/grixu/cc-toolkit
193 changes: 193 additions & 0 deletions plugins/researcher/CONTEXT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
# Researcher

A Claude Code plugin that answers a research question with a **source-grounded, cited report**.
An interactive front-end skill gathers the brief, then launches a bundled Dynamic Workflow that
fans out firecrawl-backed retrieval subagents (WebSearch fallback) into findings, gates rounds on
coverage and contradictions, then synthesises a cited answer where every claim is traceable to a
numbered source — rendered as an HTML report.

## Language

**Research brief**:
The structured spec the front-end skill produces from the user's question and answers — the goal
plus scope, depth, recency, source, and **audience** constraints. It is what drives a workflow run.
The **audience** is one of four coarse expertise tiers — `lay`, `informed`, `practitioner`, or
`expert` (`practitioner` = in the field but junior: knows the basics, not advanced terms or
abbreviations) — optionally refined by a one-line free-form descriptor (e.g. "a PM evaluating vendors"). It calibrates the
**Editor** *only* (no other stage reads it); the skill infers the tier from the question and available
context (e.g. a global CLAUDE.md), and only asks the user outright when it cannot.
_Avoid_: query, prompt

**Source**:
A web resource (URL) discovered during retrieval, deduplicated by URL and assigned a stable
**numeric id** used for citation, plus a coarse **trust tier** — `primary/official` >
`reputable-secondary` > `community/unverified` — set by the **Retriever** at fetch. The tier lets the
**Assessor** dismiss noise conflicts (official docs vs a stale blog) without a round, the
**Synthesizer** weigh sources when reconciling, and the **Report's** Sources list show the reader the
basis. Corroboration (how many independent sources back a finding) is a further, emergent signal — not
a substitute for the tier. Ids are append-only across the evolving **Report**: follow-up runs add new
sources with new ids and never renumber existing ones, so inline citations stay valid.
_Avoid_: link, reference

**Finding**:
A discrete claim extracted from one or more sources, tagged with the source ids it came from *and* a
typed **evidence span**. Each span has a `kind`: a `quote` (verbatim source text — the default, and the
only kind a cheap deterministic string-check against the scrape can confirm), an `image_region` (url +
alt/caption, for charts and infographics), or a `locator` (page/timestamp + the retriever's paraphrase,
for paywalled or non-text sources). Non-text kinds are explicitly **non-verbatim**, so the fidelity
guarantee degrades *visibly* rather than silently — a `quote` is audited by construction; anything else
announces that it isn't.
_Avoid_: result, fact

**Report**:
The deliverable — an HTML document backed by two sidecar folders, `diagrams/` (`.mmd` sources +
compiled SVGs) and `assets/` (the vendored chart library, the shipped `report.css`, and any
irreplaceable images), carrying the stated **goal**, a numbered list of **sources**, and an **answer** whose claims cite source ids inline
(e.g. "… happened in 2024 [2][5]"). Its body is kept readable by humans *and* agents — it is routinely
consumed as documentation, so heavy artifacts go to those sidecars and the body keeps only their
references and semantic pointers. It evolves across follow-up runs rather than spawning a new file each time: every
run merges its new **Findings** and the **Editor** re-cuts the whole **answer**, so the document stays
concise as it grows. The prior file is snapshotted before each overwrite.
_Avoid_: summary, output

**Research round**:
One pass of plan → parallel retrieval → extract → dedup (by URL, assign append-only ids) →
**conflict-scout** (a *deep* brief inserts a **Verifier** pass next). Merge is *not* a per-round step —
it is deferred to the **Synthesizer**, which runs once after the **Assessor** green-lights coverage.
Rounds run in two tiers: *within* a workflow run the
**assessor** gates them autonomously; *between* runs the user steers via follow-up questions. Bounded
by a hard round cap per depth — *not* a token budget (Claude Code does not reliably expose one). Both a
**Conflict-scout** conflict and a **Verifier** refutation that needs fresh evidence become gaps filled
by the *same* round retrieval — there is one information-pulling mechanism, not several.
_Avoid_: iteration

**Retriever**:
A workflow subagent that fetches from an external source. Today: **firecrawl** (search/scrape),
with **WebSearch** as fallback. Designed to admit more retrievers later (Perplexity, Gemini deep
research) without changing the report contract. Alongside text, a retriever records **candidate
image URLs** from each page (so the Composer can later fetch the ones worth including) and assigns each
**Source** its coarse **trust tier** at fetch.
_Avoid_: scraper, crawler, fetcher

**Assessor** (coverage assessor):
The loop's single gate. A subagent that, after a round, judges whether more research is needed —
weighing subject complexity, accumulated context size, explicit user intent (a "deep research" brief
biases toward more rounds), the **Conflict-scout's** `conflicts[]`, and — on a deep brief — the
**Verifier's** unresolved refutations. It judges each conflict's/refutation's **materiality** against
the brief's goal and planned sub-questions (a material, resolvable one is itself a gap). It green-lights
only when coverage is sufficient *and* no material, resolvable conflict or refutation remains;
otherwise it emits the gaps plus proposed follow-up questions.
_Avoid_: evaluator, critic

**Conflict-scout**:
A subagent that runs each round after dedup, *before* the **Assessor**: it diffs the accumulated
**Findings** for contradictions and emits `conflicts[]`, each tagging the clashing finding/source ids
with a *hint* at whether it is resolvable by more retrieval. **Materiality is not the scout's call** —
the **Assessor** judges it against the brief's goal and planned sub-questions (no drafted answer exists
yet). The scout only detects — it neither gates the loop (the **Assessor** does) nor writes prose (the
**Synthesizer** does), and it compares claims against *each other*, not ground truth, so it is no
fact-checker.
_Avoid_: verifier, fact-checker, referee

**Verifier**:
A depth-gated adversarial subagent — runs only on a *deep* brief, each round, after the
**Conflict-scout**. It tries to *refute* the material **Findings**, reasoning over the already-gathered
corpus (other findings, source **trust tiers**, internal logic) — it does **not** fetch itself. A
finding it can refute outright is dropped; one it cannot settle without fresh counter-evidence becomes
a gap the **Assessor** acts on (filled by an ordinary round, the same mechanism conflicts use). It
judges *truth/reliability*, where the **Conflict-scout** judges only mutual *consistency*. Quick and
standard briefs skip it.
_Avoid_: fact-checker, skeptic, critic

**Synthesizer**:
The reasoning core. A subagent that runs **once**, only after the **Assessor** green-lights coverage:
it reads the full deduplicated **Findings** and **Sources**, reconciles findings where they can be
reconciled, surfaces with attribution the residual contradictions the **Conflict-scout** flagged as
irreducible (it never hides them), and composes the structured draft **answer** with inline `[id]`
citations. It composes **audience-neutral** — the full, faithful argument with all its nuance and
caveats, never pre-trimmed for a reader; adapting it to the brief's audience is the **Editor's** job.
Turning gathered claims into a coherent, cited argument is its work — distinct from the **Editor**,
which adapts that draft to the reader and cuts and clarifies it afterward. On a follow-up run it
re-synthesizes the *whole* answer from the accumulated findings (holistic, per ADR-0005), not just the
new material.
_Avoid_: writer, merger, aggregator

**Editor**:
A subagent that adapts the draft **answer** to its reader before it is rendered — the **sole
audience-aware stage**. Guided by the brief's audience tier (`lay` / `informed` / `practitioner` / `expert`) it sets how
much jargon to define, how much prior knowledge to assume, and how dense to write: for `lay` it defines
terms inline, leads with intuition, and cuts expert-only nuance; for `informed` it assumes general
literacy but defines field-specific terms; for `practitioner` it assumes the basics yet still defines
advanced terms and expands abbreviations on first use; for `expert` it assumes the terminology (jargon
and abbreviations), trims background, and foregrounds caveats and edge cases. Throughout it adversarially cuts
redundancy and filler — without flattening the **Findings'** accuracy. It also marks where a visual (a
Mermaid diagram, table, or chart) would carry an idea better than prose — leaning on more of them for a
`lay` reader, fewer for an `expert` — and specifies what it should show, so the **Composer** renders
only visuals that earn their place. Independent of
whoever drafted the answer, so the cutting is a second pair of eyes, not self-grading.
_Avoid_: proofreader, summarizer

**Composer**:
The final stage of a workflow run — a subagent that renders the editor-approved **answer** as the
HTML **Report**: it builds the linear document (table of contents + anchors) and keeps the HTML body
**readable by humans and agents alike** — heavy artifacts live in the sidecar `diagrams/` and `assets/`
folders and the body carries only their references, since the Report is itself read as documentation.
It does not design: the body is semantic HTML against a fixed class vocabulary, styled by a shipped
`report.css` (copied into `assets/`, linked relatively), so every Report and snapshot shares one
identity and the head carries no bespoke `<style>`.
Mermaid diagrams the Editor called for are compiled to SVG with mmdc into `diagrams/` (beside their
`.mmd` sources) and embedded as `<img>`, each preceded by an HTML comment pointing to its `.mmd` source
so a reading agent gets the diagram's meaning without parsing path data. Quantitative charts use **Chart.js** — a version-pinned
copy vendored into `assets/` (not a CDN link, so the Report stays offline and self-contained): a
`<canvas>` plus a compact, readable `data` config, with a `<noscript>` `<table>` of the same numbers so
the data survives without JS. It reconstructs visuals from the **Findings** by default, downloading a
source image into `assets/` only when the visual is irreplaceable — always captioned with its
**Source** id. Writes the artifact to disk and returns only the path plus a short manifest, so the
verbose HTML never enters the main context.
_Avoid_: renderer, formatter, post-processor

**Follow-up question**:
A ready-to-use candidate sub-question the assessor proposes after a run. At the human checkpoint the
user selects any of these and/or adds their own; the selection becomes the next run's brief.
_Avoid_: suggestion

## Relationships

- A **Research brief** drives one or more **Research rounds**
- A **Research round** runs many **Retrievers** in parallel, each yielding **Sources**
- A **Finding** cites one or more **Sources** by numeric id
- A **Report** = the **goal** + the **Sources** + an **answer** assembled from **Findings**
- Each round the **Conflict-scout** diffs **Findings** for contradictions and feeds the **Assessor**;
on a deep brief the **Verifier** then tries to refute material **Findings** and also feeds the gate
- Every **Finding** carries a typed **evidence span** (a verbatim `quote` where the source allows), so
its citations are auditable
- The **Assessor** green-lights coverage; only then the **Synthesizer** reconciles the **Findings**
and composes the draft **answer**, which the **Editor** trims and the **Composer** renders

## Example dialogue

> **Dev:** "The same article came back from two retrievers — is that two sources or one?"
> **Domain expert:** "One **Source** — dedup by URL, one numeric id. Both retrievers surfacing it
> doesn't double-count; it just raises confidence in the **Findings** that cite it."

## Flagged ambiguities

- **"source" vs "citation"**: a **Source** is the resource (carrying a numeric id); a *citation* is a
reference to that id inside the answer. Kept distinct.
- **"round" vs "loop"**: a **Research round** is one pass; the *loop* is the control structure that
decides whether to run another round.
- **Assessor vs Synthesizer vs Editor**: three sequential single-responsibility stages on different
objects. The **Assessor** judges *coverage* (run another round?) and gates the loop; the
**Synthesizer** *composes* the cited draft and reconciles contradictions; the **Editor** judges
*prose* (concise and readable?). Gate → compose → polish.
- **Where audience lives**: only in the **Editor**. The **Synthesizer** composes *audience-neutral*
(the full, faithful argument); the **Editor** is the sole stage that adapts it to the brief's tier
(`lay`/`informed`/`practitioner`/`expert`) — so the same synthesis can be re-edited for a different reader without
re-synthesis. The **Composer** is audience-neutral too: it only renders what the Editor marked.
- **"cross-check" is split**: the **Conflict-scout** *detects* contradictions among findings (per
round); the **Assessor** *decides* which warrant another round; the **Synthesizer** *surfaces* what
survives. No single stage "does cross-check" end-to-end.
- **Conflict-scout vs Verifier**: the scout checks findings against *each other* (internal
consistency, every round); the **Verifier** checks a finding against independent *ground truth*
(only on deep briefs). Different objects, different trigger. Fidelity (does the cited source say
it?) is handled by neither — it is enforced structurally by the **Finding's** evidence span.
Loading
Loading