Expert code review for data science and research codebases — the kinds of projects where silent data bugs, untracked analytical decisions, and "it ran once on my laptop" are the real failure modes.
Most research code is written by one person and reviewed by nobody before it's cited in a paper. This system closes that gap by running three independent expert reviewers over your codebase in parallel, each with a different lens tailored to what you're building (e.g., data-pipeline lenses for Python, package-design lenses for R — see below). A synthesis agent then aggregates findings, flags where reviewers disagree, and walks you through triage one decision at a time.
It's designed for the projects data scientists actually work on: ETL-and-analysis pipelines, R packages maintained by small teams, pre-publication audits, handoffs to a collaborator who wasn't there for the first 80% of the work. The goal is to catch the things that don't show up in a linter or a type checker — wrong merges, hardcoded thresholds, schema drift, silent NaN propagation, utility modules turning into junk drawers.
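For a concrete sense of what "silent" means here, consider a hypothetical pandas snippet of the kind these reviewers are tuned to catch (illustrative only, not drawn from any reviewed codebase) — it runs without error, but quietly inflates the data:

```python
import pandas as pd

# Hypothetical example: `patients` has one row per patient_id,
# but `visits` has many rows per patient_id.
patients = pd.DataFrame({"patient_id": [1, 2], "age": [34, 51]})
visits = pd.DataFrame({"patient_id": [1, 1, 2, 3], "cost": [100, 250, 80, 40]})

# Silent bug: the merge fans out to one row per visit and, as an inner join,
# drops patient 3 entirely -- no error, no warning.
merged = patients.merge(visits, on="patient_id")

# Patient 1's age is now double-counted; every downstream mean shifts.
print(merged["age"].sum())  # 119, not 85

# The defensive version states both decisions and fails loudly if violated.
checked = patients.merge(visits, on="patient_id", how="left", validate="one_to_many")
assert checked["cost"].notna().all(), "unmatched patients introduced NaNs"
```

A linter passes both versions; only a reviewer thinking about merge cardinality and join direction flags the first.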
```
Human + Interview agent (conversational)
  └── Produces: project_context.md
        │
        ▼
Orchestrator (automated)
  ├── Spawns reviewers in parallel (Claude Code Agent tool)
  │     ├── Perspective A ──┐
  │     ├── Perspective B ──┤  (independent, no cross-talk)
  │     └── Perspective C ──┘
  │           │
  ├── Collects all findings
  │           │
  └── Runs synthesis (dedup, flag disagreements, prioritize)
        │
        ▼
  review_report.md
        │
        ▼
Human triages decisions → Claude implements
```
- Interview (optional): Converse with the interview agent to produce `project_context.md`. This is the only step that requires human input — the agent needs your intent, constraints, and context that can't be inferred from code.
- Launch: Tell the orchestrator to run. ("Run the conclave", "Review this project", etc.)
- Wait: The orchestrator spawns three reviewer agents in parallel, collects findings, and runs synthesis. You'll get status updates as each step completes.
- Triage: The orchestrator presents prioritized findings one at a time, or you can edit `review_report.md` directly to batch your decisions.
- Implement: Tell Claude to implement the "now" items from the action plan.
LLM-to-LLM debate suffers from conformity cascade — agents shift toward consensus rather than maintaining genuinely independent perspectives (Wynn et al., 2025; arxiv.org/abs/2509.05396). By keeping reviewers independent and only aggregating at the synthesis step, we preserve the value of divergent expertise.
Reviewers produce better findings when they understand intent, constraints, and stage. The interview agent explores the codebase and asks the developer targeted questions to produce a project context document that all reviewers receive. This prevents reviewers from flagging intentional decisions as bugs, and helps them calibrate severity to the project's actual priorities.
The interview is optional — you can skip it and let reviewers work from the codebase alone. But it significantly improves review quality, especially for projects with workarounds, temporary scripts, or domain-specific constraints that aren't obvious from the code.
Without it, the human has to manually spawn three agents with the right prompts, wait for each to finish, copy findings into a synthesis step, and manage the whole flow. The orchestrator handles that mechanical work so the human can focus on the two parts that actually need human judgment: the interview and the triage.
This repo contains review configurations for different ecosystems. Each shares the same fan-out/fan-in architecture and interview agent, but the perspectives and base prompts are tailored to the domain.
Focused on reviewing R packages. Three perspectives:
| Perspective | File | Core lens |
|---|---|---|
| Tidyverse API & Package Design | `r/personas/tidyverse_api.md` | API design, function composability, DESCRIPTION hygiene |
| Usability & Developer Experience | `r/personas/developer_experience.md` | Error messages, onboarding, documentation |
| Data Validation & Pipeline Contracts | `r/personas/data_validation.md` | Codebook schema, column contracts, assertions |
See r/README.md for full details.
Focused on reviewing data science pipelines (ETL, EDA, analysis). Three perspectives:
| Perspective | File | Core lens |
|---|---|---|
| Pipeline Integrity | `python/personas/pipeline_integrity.md` | Data correctness, merge safety, silent failures |
| Reproducible Research | `python/personas/reproducible_research.md` | Audit trails, config-driven decisions, re-runnability |
| Pipeline Architecture | `python/personas/pipeline_architecture.md` | Modularity, extensibility, abstraction calibration |
See python/README.md for full details.
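As a rough illustration of the Reproducible Research lens (hypothetical code, not taken from the persona files): an analytical decision hardcoded in the pipeline versus the config-driven form the reviewer would push toward:

```python
import pandas as pd
import yaml

df = pd.DataFrame({"income": [40_000, 95_000, 310_000]})

# Hypothetical "before": an analytical decision buried in code.
# Why 250k? Decided once in a meeting, recorded nowhere auditable.
trimmed = df[df["income"] < 250_000]

# "After": the threshold lives in a versioned config (a config.yml in
# practice; inlined here so the sketch runs standalone), so the decision
# is documented, diff-able, and easy to re-run with a different value.
config = yaml.safe_load("income_outlier_cap: 250000")
trimmed = df[df["income"] < config["income_outlier_cap"]]
```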
```
code-conclave/
  README.md                    <- you are here
  orchestrator.md              <- spawns reviewers, collects findings, runs synthesis
  project_interview.md         <- shared: repo context interview agent
  base_prompt.md               <- shared rules, output format, severity levels
  synthesis_prompt.md          <- shared synthesis logic, convergence rule, triage structure
  project_context_template.md  <- stubbed example of interview output
  r/
    README.md                  <- R-specific usage guide
    base_prompt.md             <- R-specific extension: scope, categories
    synthesis_prompt.md        <- R-specific extension: reviewer names, architecture prompts
    personas/
      tidyverse_api.md
      developer_experience.md
      data_validation.md
  python/
    README.md                  <- Python-specific usage guide
    base_prompt.md             <- Python-specific extension: scope, categories
    synthesis_prompt.md        <- Python-specific extension: reviewer names, architecture prompts
    personas/
      pipeline_integrity.md
      reproducible_research.md
      pipeline_architecture.md
```
The root-level base_prompt.md and synthesis_prompt.md contain everything that's genuinely shared across domains. The language subdirectories contain thin extensions that supply scope, category enums, reviewer names, and domain-specific architecture prompts. When you run a review, the orchestrator feeds both the shared file and the language extension to each reviewer. This keeps the rules in one place so they can't drift across languages.
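The composition is simple concatenation in spirit. A rough sketch of the idea (the actual orchestrator is a markdown prompt, not Python; the path and helper name below are illustrative):

```python
from pathlib import Path

ROOT = Path("~/code/code-conclave").expanduser()  # adjust to wherever the repo lives

def reviewer_prompt(lang: str, persona: str) -> str:
    """Assemble one reviewer's instructions: shared rules + language extension + persona."""
    parts = [
        ROOT / "base_prompt.md",             # shared rules, output format, severity levels
        ROOT / lang / "base_prompt.md",      # domain-specific scope and categories
        ROOT / lang / "personas" / persona,  # the specific reviewing lens
    ]
    return "\n\n".join(p.read_text() for p in parts)

# e.g. the Pipeline Integrity reviewer for a Python project:
prompt = reviewer_prompt("python", "pipeline_integrity.md")
```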
Open a Claude Code session in the project you want to review and paste one of the following:
```
Read the orchestrator instructions at ~/code/code-conclave/orchestrator.md.
Review this codebase following those instructions. It's a [Python data science pipeline / R package].
```

```
Read the project interview instructions at ~/code/code-conclave/project_interview.md.
Interview me about this project to produce a project_context.md, then read
~/code/code-conclave/orchestrator.md and run the full review.
```
The interview agent will ask you questions one at a time about the project's purpose, data, architecture, and priorities. Once it has enough context, it writes project_context.md and hands off to the orchestrator, which spawns reviewers, collects findings, synthesizes, and walks you through triage.
After synthesis, the orchestrator will walk you through prioritization one question at a time. Alternatively, you can edit review_report.md directly to batch your decisions:
- Replace `_pending_` in the Decision column with `now`, `next sprint`, `backlog`, or `wontfix`
- Add rationale in the Rationale column
- Disagreements list named options with a recommended label
Editing the file directly is useful when you want to:
- See all decisions at once and batch your answers
- Come back to it later if you need to think
- Keep a permanent decision record
- Hand off to Claude for implementation while you context-switch
- Structured output format: All reviewers use the same schema so synthesis can deduplicate programmatically
- Confidence scores: Each finding includes a confidence level — the synthesis agent uses this to weigh disagreements
- Explicit scope boundaries: Each reviewer is told what to ignore, not just what to focus on, to prevent overlap
- No cross-talk: Reviewers never see each other's output. Only the synthesis agent sees all findings.
- Project context: The interview agent produces a document that prevents reviewers from working blind
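The actual finding schema is defined in `base_prompt.md`; as a rough sketch of the shape such a record might take (field names here are hypothetical, not the real format):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """Illustrative shape of a single reviewer finding -- field names are hypothetical."""
    reviewer: str        # which perspective produced it, e.g. "Pipeline Integrity"
    file: str            # path the finding refers to
    category: str        # drawn from the domain's category enum
    severity: str        # shared severity levels defined in base_prompt.md
    confidence: float    # 0-1; synthesis uses this to weigh disagreements
    summary: str         # one-line description of the issue
    recommendation: str  # suggested fix or decision for triage

# Because every reviewer emits the same shape, synthesis can deduplicate
# findings that point at the same file and issue, and flag the ones where
# reviewers disagree on severity or recommendation.
```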
To add a new language/domain:
- Create a new subdirectory (e.g., `julia/`, `sql/`)
- Write a small `base_prompt.md` extension — it should only contain what's genuinely domain-specific: a one-paragraph review-domain description, the scope inclusion/exclusion lists, the category enum, domain-sharpened severity notes, and the file path format. Everything shared (rules, output format, severity levels, end-of-review template) lives in the root-level `base_prompt.md` and should not be duplicated.
- Identify 3 perspectives that cover the domain's highest-stakes review concerns
- Write persona files following the established format (see existing personas for structure — "Informed by" sources, expertise, focus areas, what to deprioritize, example findings)
- Write a small `synthesis_prompt.md` extension — reviewer names and abbreviations, report header text, file path format, and domain-specific architecture prompts. The shared convergence rule, triage logic, and report template live in the root.
- Add the new language to the orchestrator's Step 1 file list
- `project_interview.md` is language-agnostic and shared — no new copy needed
I recommend using these agents iteratively. Once you've addressed changes from a review, set off another review cycle in a fresh chat. The fresh chat matters: it prevents the new reviewers from being anchored by the prior review's framing, and it keeps each round independent in the same way the three parallel reviewers are independent.
This system references real people's published expertise to define review perspectives, not to create synthetic personas or digital twins. The distinction matters:
- We are reviewing code through the lens of publicly documented principles — books, talks, blog posts, and open-source work.
- We are NOT simulating these individuals, generating quotes attributed to them, or claiming their endorsement.
- Findings should be framed as "this conflicts with the principles in Column Names as Contracts" — not "Emily Riederer would say..."
- The named references are shorthand for well-defined schools of thought. If any of these individuals expressed discomfort with this use, the names should be replaced with descriptive labels while keeping the review criteria intact.
The project interview agent's approach — relentless one-question-at-a-time interviewing that walks down the decision tree, provides a recommended answer for each question, and explores the codebase before asking — is adapted from Matt Pocock's grill-me skill (MIT licensed). The Code Conclave interview extends that pattern with a structured output schema (project_context.md) tailored to handing off context to downstream reviewer agents, but the core interviewing discipline is Matt's.
- Du et al. (2023) "Improving Factuality and Reasoning through Multiagent Debate" — arxiv.org/abs/2305.14325
- Wynn et al. (2025) "Talk Isn't Always Cheap: Failure Modes in Multi-Agent Debate" — arxiv.org/abs/2509.05396
- Riederer, E. "Column Names as Contracts" — emilyriederer.netlify.app/post/column-name-contracts/
- Ball, P. & HRDAG "The Task Is a Quantum of Workflow" — hrdag.org/2016/06/14/the-task-is-a-quantum-of-workflow/
- Wilson et al. (2017) "Good Enough Practices in Scientific Computing" — doi.org/10.1371/journal.pcbi.1005510
- Wickham, H. & Bryan, J. R Packages (2e) — r-pkgs.org
- Bryan, J. "What They Forgot to Teach You About R" — rstats.wtf