Aragora

Aragora is an auditable execution control plane for AI-assisted decisions: multi-model review in, a verifiable Decision Receipt out.

It coordinates heterogeneous models to adversarially review a change or a decision, preserves the dissent and provenance, stops truthfully when evidence is thin, and emits a portable receipt anyone can verify offline with the standalone verifier. PyPI publishing for the verifier is pending.

New here? The Quickstart gets you a working debate in under a minute. Auditors should start with the Cold Reviewer Guide.

I want to…	Command
Run the standalone debate engine	`pip install aragora-debate`
Verify a Decision Receipt with the standalone verifier	`PYTHONPATH=src python -m aragora_verify <receipt>` from `aragora-verify/`; PyPI publish pending
Call the Aragora API from Python	`pip install aragora-sdk`
Self-host the full platform	`docker compose -f deploy/demo/docker-compose.yml up`

The problem

Individual LLMs are unreliable. Their personas shift with context, their confidence does not correlate with accuracy, and they often optimize for plausible agreement instead of truth. Aragora treats that as a systems problem: it makes consequential AI-assisted decisions inspectable and verifiable instead of asking you to trust one model's say-so.

Disagreement becomes evidence. Heterogeneous models challenge each other before work advances; dissent is preserved, not averaged away.
Every decision has a receipt. Verdict, the reviewing models and their independence, dissent, confidence, and provenance stay inspectable.
It stops truthfully. When the quorum can't be formed or evidence is thin, the receipt says so — it never fabricates a consensus.
Receipts are portable and verifiable. A receipt is a schema-conformant artifact (the Open Decision Receipt) that aragora-verify checks offline, with no dependency on Aragora.
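
The "stops truthfully" principle above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration (the `Review` record, the `Outcome` states, the `settle` function, and the two-provider minimum); it is not Aragora's actual API. The point is only that an unformed quorum yields an explicit terminal state rather than a forced verdict, and that minority reviews are kept rather than discarded.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


@dataclass(frozen=True)
class Review:
    """Hypothetical record of one model's review."""
    model: str
    provider: str
    approve: bool
    rationale: str


def settle(reviews: list[Review], min_providers: int = 2) -> dict:
    """Return a verdict plus preserved dissent, or an explicit stop state."""
    providers = {r.provider for r in reviews}
    if len(providers) < min_providers:
        # Quorum cannot be formed: say so instead of inventing a consensus.
        return {
            "outcome": Outcome.INSUFFICIENT_EVIDENCE,
            "dissent": [],
            "reason": f"only {len(providers)} independent provider(s)",
        }
    approvals = sum(r.approve for r in reviews)
    # A tie is treated as "not approved" in this sketch.
    majority_approves = approvals * 2 > len(reviews)
    outcome = Outcome.APPROVED if majority_approves else Outcome.REJECTED
    # Dissent is kept verbatim, not averaged away.
    dissent = [r for r in reviews if r.approve != majority_approves]
    return {"outcome": outcome, "dissent": dissent, "reason": "quorum formed"}
```

Counting distinct providers rather than distinct models is a design choice of the sketch: it approximates the independence requirement described above.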

The wedge: a governance gate for AI-written code

Drop Aragora into CI. A multi-model quorum reviews each PR and posts a grounded PR comment — your second opinion, with the same review surface that feeds the Decision Receipt path.

# .github/workflows/aragora-review.yml
name: Aragora Review
on:
  pull_request:
    types: [opened, synchronize, reopened]
permissions:
  contents: read
  pull-requests: write
  issues: write
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: synaptent/aragora@1e3ce85ae66489753cace8b60551a99fada9749c
        with:
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          post-comment: 'true'

The action posts a PR review comment and uploads the machine-readable review artifact. When a Decision Receipt artifact exists, anyone — a teammate, an auditor, a customer — can verify it independently with the standalone aragora-verify verifier (no Aragora dependency):

# PyPI publish pending; today it lives in this repo under aragora-verify/:
cd aragora-verify
PYTHONPATH=src python -m aragora_verify ../decision-receipt.odr.json

# After PyPI publish:
# aragora-verify decision-receipt.odr.json

Try it now

pip install aragora
aragora demo --offline              # zero-key debate, writes a local receipt

export ANTHROPIC_API_KEY=...        # provider credential for live model review
aragora review-pr 123               # multi-agent review of a GitHub PR
aragora receipt export <id> --format odr -o receipt.odr.json   # portable receipt

Core workflows

AI code review — heterogeneous-model review of a diff or PR, with severity-tagged findings and a receipt. See docs/CLI_REFERENCE.md.
Gauntlet — adversarial stress-testing of a claim or spec; attack/defend cycles produce a cryptographic receipt.
Structured debates — multi-round debate with consensus detection and convergence tracking (aragora ask).

The load-bearing core

Aragora is large, but five modules carry the product. Start here:

Module	Responsibility
`aragora/debate/`	The Arena orchestrator — runs rounds, detects consensus/convergence.
`aragora/agents/`	Agent implementations (API + CLI), heterogeneous model transport, fallback.
`aragora/gauntlet/`	Decision Receipts: the native record, the portable ODR, export and signing.
`aragora/swarm/`	The merge-quorum gate — collects model-review evidence and tiers settlement.
`aragora/server/`	The HTTP/WebSocket API and handlers.

aragora-verify/ is a separate verifier project with no Aragora dependency: the public verifier for receipts. Everything else under aragora/ is supporting or experimental surface — treat it as such until it's documented here.

Product boundary

Aragora is a governance and review layer, not an execution runtime. It is not a replacement for worker runtimes like Codex, Claude Code, or OpenCode. Use it when review, provenance, and a verifiable decision record matter; keep your existing runtimes when raw speed is all you need.

We do not sell lights-out autonomy as the default story.
We do not advance work without evidence, review, and clear terminal states.
Consequential effectors are denied by default unless an admin-scoped approval artifact exists; sandboxed backends are mandatory for browser/host effectors.
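
As a rough illustration of the deny-by-default rule above, here is a small sketch. The `ApprovalArtifact` shape, the `"admin"` scope string, and the `may_run` function are hypothetical names chosen for this example; the real approval artifact format is not described in this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalArtifact:
    """Hypothetical approval record; field names are illustrative only."""
    effector: str
    approver_scope: str


# Effector kinds that must run in a sandbox, per the rule above.
SANDBOX_REQUIRED = {"browser", "host"}


def may_run(effector: str, approval: Optional[ApprovalArtifact], sandboxed: bool) -> bool:
    """Deny unless an admin-scoped approval for this exact effector exists."""
    if approval is None:
        return False  # default is denial
    if approval.effector != effector or approval.approver_scope != "admin":
        return False  # wrong target or insufficient scope
    if effector in SANDBOX_REQUIRED and not sandboxed:
        return False  # approval alone does not waive the sandbox
    return True
```

Note that every branch returns `False` until all conditions are met, so a missing or malformed input can never widen access.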

See Boundaries and Scope for the full non-goals ledger.

From the wedge to the full vision

The CI review gate is the first rung of a deliberate ladder — each step reuses the same receipt, memory, and quorum substrate rather than bolting on a new system:

CI review gate ─▶ portable Decision Receipt (ODR) ─▶ crux + calibration in the receipt
   ─▶ bounded-backlog foreman (unattended work, receipts + stop conditions)
   ─▶ unified idea→goal→action→orchestration DAG ─▶ organization substrate

The narrow product you can adopt today and the long-horizon thesis are the same system at different stages. The complete scope — every capability, the roadmap, and the agent-civilization frontier — is in the Full Vision postscript, annotated by status and gate.

Anatomy of a Decision Receipt

The receipt is the unit that carries through every stage. One review produces:

decision ──▶ model jury (heterogeneous, independent)
               ├─ proposals / critiques / revisions
               ├─ dissent trail (+ cruxes when a CruxReceipt is supplied — ODR-4)
               └─ per-agent ELO + Brier calibration when provenance present — ODR-5
          ──▶ verdict + confidence
          ──▶ human attestation (optional; EU AI Act Art. 14)   🔄 #8230
          ──▶ Ed25519 signature                                 🔄 #8225
          ──▶ portable ODR JSON  ──▶  aragora-verify <receipt>  ✅ schema+digest+quorum, offline
          ──▶ Rekor public anchoring (tamper-evidence)          🔄 #8231

Today's receipts verify on schema, digest, and quorum consistency offline; per the ODR spec, cruxes and calibration carry explicit absent markers unless their source is supplied (ODR-4/5), and public-key signing and anchoring are in-flight. See the proof ladder.
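
To make the three offline checks concrete (schema, digest, quorum consistency), here is a minimal sketch using only the Python standard library. It is not the `aragora-verify` implementation and does not follow the real ODR schema: the field names (`verdict`, `reviewers`, `quorum`, `digest`) are assumptions, and sorted-key `json.dumps` stands in for the JCS canonicalization that ODR-1 specifies.

```python
import hashlib
import json

# Hypothetical minimal field set; the real ODR schema is defined in the spec.
REQUIRED_FIELDS = ("verdict", "reviewers", "quorum", "digest")


def canonical_digest(receipt: dict) -> str:
    """SHA-256 over a deterministic encoding of everything except the digest."""
    body = {k: v for k, v in receipt.items() if k != "digest"}
    encoded = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def verify(receipt: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in receipt]
    if problems:
        return problems  # schema check failed; later checks would be meaningless
    if receipt["digest"] != canonical_digest(receipt):
        problems.append("digest mismatch")
    if len(receipt["reviewers"]) < receipt["quorum"]:
        problems.append("quorum not met")
    return problems
```

Because the function needs no network access and no Aragora import, it mirrors the property that matters here: a stranger holding only the receipt file can run the checks.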

Find your path

Developer — Quickstart → aragora review-pr → CLI Reference · SDK Guide
Auditor / reviewer — Cold Reviewer Guide → Open Decision Receipt spec → aragora-verify
Founder / operator — the wedge above → proof ladder → Full Vision
Compliance buyer — Enterprise features → EU AI Act / SOC 2 status in honest current state
Agent / tool builder — the ODR as the external contract → MCP tools → API Reference

Documentation

Quickstart · Cold Reviewer Guide · CLI Reference
Open Decision Receipt spec · SDK Guide · API Reference
Feature status · Enterprise features · Architecture deep-dive
Inspiration and credits

Security

Secrets load from AWS Secrets Manager in production (never standing env keys); local development uses a gitignored .env. See the security overview, compliance overview, and deployment guide.

Contributing & License

Contributions welcome — see CONTRIBUTING.md. MIT licensed (see LICENSE).

Postscript — The Full Vision

The sections above are what Aragora is today and how to use it. What follows is the complete intended scope — the thesis, the whole capability surface, the roadmap, and the long-horizon vision — recorded in one place so that anyone (human or agent) who reads to the end sees exactly how large this project means to become. It is deliberately dense.

Read this honestly. Markers: ✅ shipped/working · 🟡 built but not productized · 🔄 in-flight · 🔮 designed or aspirational. Per our commercial discipline, external claims stay narrower than this roadmap and tied to measured proof. Canonical metrics live in docs/METRICS.md; the candid current-state ledger is docs/HONEST_ASSESSMENT.md. Where a number is contested across docs it is rounded here on purpose. Every major claim below carries a proof link, a status marker, or an explicit aspirational label — start with the proof ladder.

Proof ladder — how to verify every claim here

Claim class	Canonical source / gate
Scale & metrics	`docs/METRICS.md` — `python scripts/regenerate_metrics.py --check` fails CI on >0.5% drift
What's real vs aspirational	`docs/HONEST_ASSESSMENT.md`
Receipt format (the external contract)	Open Decision Receipt spec
Decision-semantics roadmap	ODR spine epic #8223; ODR-1..7 → #8224/#8225/#8226/#8227/#8229/#8230/#8231
Autonomy truth	B0 benchmark `docs/status/B0_BENCHMARK_TRUTH_STATUS.md`
Enterprise / compliance	GA checklist (SOC 2 / pentest gate) · Enterprise features
Frontier work is bounded	the proof-first Foreman gate + capability checkpoints CP-1..5 (below)
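
The metrics gate in the first row of the table fails CI when a documented number drifts more than 0.5% from the measured one. The sketch below shows one way such a check could work; it is not the contents of `scripts/regenerate_metrics.py`, and the `drifted` function and metric names are hypothetical.

```python
def drifted(
    documented: dict[str, float],
    measured: dict[str, float],
    tolerance: float = 0.005,
) -> list[str]:
    """Return names of metrics whose documented value is off by more than tolerance."""
    failures = []
    for name, doc_value in documented.items():
        actual = measured.get(name)
        if actual is None:
            failures.append(name)  # a documented metric nobody measured is a failure
            continue
        baseline = max(abs(actual), 1e-12)  # avoid dividing by zero
        if abs(doc_value - actual) / baseline > tolerance:
            failures.append(name)
    return failures
```

A CI job would exit non-zero whenever the returned list is non-empty, which turns a stale number in the docs into a build failure.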

The thesis — Decision Integrity

Aragora orchestrates heterogeneous, adversarial multi-model debate to vet, challenge, and audit consequential decisions before they ship — and emits cryptographic proof that the decision was rigorously examined. This is a distinct category from cooperative agent orchestration (LangGraph, CrewAI, AutoGen), from single-provider agent SDKs, and from post-hoc AI observability: those coordinate graphs or monitor behavior after the fact; Aragora improves decision quality before commit and produces the audit trail as a byproduct. (docs/WHY_ARAGORA.md, docs/COMPARISON_MATRIX.md)

Why a single model isn't enough. LLMs exhibit correlated failures (shared training data and RLHF biases), sycophantic agreement (confidence uncorrelated with accuracy), and persona instability (minor prompt changes flip answers). Treat each model as an unreliable witness and extract signal from where independent witnesses disagree. Clinical triage, financial risk, legal review, and architecture decisions cannot rest on "probably right." The moat: no funded competitor combines structured adversarial multi-agent debate with portable, verifiable decision receipts. (docs/WHY_ARAGORA.md, docs/HONEST_ASSESSMENT.md)

Why this is not generic orchestration:

	Cooperative orchestration (LangGraph / CrewAI / AutoGen)	Aragora
Agent disagreement	a bug to resolve	the signal — preserved as dissent + cruxes
Primary output	the completed task	a verifiable Decision Receipt
Model independence	often single-provider	heterogeneous providers by design
When evidence is thin	proceeds	stops truthfully; the receipt says so
Delegation	open-ended	bounded; approval artifacts + sandboxed effectors

(docs/strategy/BOUNDARIES_AND_SCOPE.md, docs/COMPARISON_MATRIX.md)

The Five Pillars (product framing — docs/EXTENDED_README.md)

SMB-ready, enterprise-grade. Works for a 5-person startup on day one; scales to regulated enterprise without rearchitecting. SSO/MFA, encryption, RBAC, and compliance frameworks are built-in defaults, not premium tiers.
Leading-edge memory & context. 4-tier Continuum Memory (fast/medium/slow/glacial) + Knowledge Mound give every debate institutional history and evidence provenance; RLM (Recursive Language Models) sustains coherence over long sessions and large corpora where single models degrade.
Extensible & modular. 12+ model providers, broad connectors, Python + TypeScript SDKs, a large REST/WebSocket API surface, and a workflow-template library.
Multi-agent robustness. Claude, GPT, Gemini, Grok, Mistral, and DeepSeek run in structured Propose/Critique/Revise debates with multiple consensus modes; ELO and Brier calibration track per-domain agent quality; the Trickster flags hollow consensus. The output is intended to be higher-quality and less biased than any single model's, with a complete dissent trail.
Self-healing & self-extending. The Nomic Loop lets the platform debate, design, implement, and verify improvements to itself, with human approval gates and automatic rollback.
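
The pillars above mention ELO ratings and Brier calibration for tracking agent quality. Both are standard formulas, sketched below in their textbook form. How Aragora parameterizes them (K-factor, per-domain buckets, what counts as a win) is not stated in this README, so the `k=32.0` default and function names are assumptions.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 is perfect, and always answering 0.5 scores 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Standard Elo update; score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

The two measures answer different questions: Elo ranks agents relative to each other, while the Brier score checks whether an agent's stated confidence matches how often it is right.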

The long arc — Tool → Organization Substrate

Aragora is designed to climb five stages: Tool → Teammate → Foreman → Chief of Staff → Organization Substrate, moving from bounded useful results today to a cross-org agentic operating system. The near-term focus (next 60 days) narrows hard to two of the eight foundational pillars listed below: reliable autonomous execution on bounded backlogs (②), and cryptographic receipts & auditability (⑤). (docs/CANONICAL_GOALS.md)

Tool ──▶ Teammate ──▶ Foreman ──▶ Chief of Staff ──▶ Organization Substrate
 one      assists      runs bounded   plans & routes      agents + humans as
 review   a human      backlogs       across backlogs     co-equal consumers
 +receipt              unattended     with receipts       on one runtime truth
(today's wedge) ───────────────────────────────────────▶ (long-horizon thesis)

The eight foundational pillars the substrate is organized around: ① adversarial heterogeneous consensus + crux-finding · ② reliable autonomous execution (contracts, preflight, repair, fail-closed escalation) · ③ a unified DAG (ideas → goals → actions → orchestration) with optional interactivity · ④ permissioned, portable, attributable memory across repos/docs/APIs/chat/inbox/telemetry · ⑤ cryptographic receipts & auditability with eventual proof-carrying code · ⑥ SMB operator leverage (intent → action in <10 min) · ⑦ self-improvement on the same substrate as user-facing work · ⑧ agents and humans as co-equal consumers with parity surfaces backed by one runtime truth. (docs/CANONICAL_GOALS.md)

The complete capability surface

Scale (canonical counts in docs/METRICS.md, rounded): ~4,200 Python files · ~1.9M LOC · 140+ top-level modules · 200,000+ test functions across ~5,400 files · 3,297 API operations across 2,870 paths · 35+ allowlisted agent types across 12+ providers · 41 Knowledge Mound adapter specs (46 files) · 360+ RBAC permissions · Python + TypeScript SDKs · v2.9.0. (Practical real-time debate uses 2–6 agents; the value is heterogeneity, not raw count — see docs/HONEST_ASSESSMENT.md.)

Core debate (✅). Arena engine orchestrates Propose/Critique/Revise/Vote phases, extended multi-round debates, semantic convergence detection, ELO-based team selection with continuous calibration. Consensus modes: majority, unanimous, weighted/judge, byzantine, and Prover-Estimator. Enhancements: Trickster (hollow-consensus detection), Rhetorical Observer, Security Barrier (telemetry redaction), Calibration Tracker, Performance Monitor, Graph/Matrix topologies, breakpoints (pause/resume).
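
Three of the consensus modes named above (majority, unanimous, weighted) are simple enough to sketch. This is an illustrative reading of those terms, not the Arena's implementation; the function names, the vote representation, and the idea of passing per-agent weights as a plain dict are all assumptions. Each function returns `None` when no consensus exists.

```python
from collections import Counter
from typing import Optional


def majority(votes: dict[str, str]) -> Optional[str]:
    """Choice backed by strictly more than half of the agents, else None."""
    if not votes:
        return None
    choice, count = Counter(votes.values()).most_common(1)[0]
    return choice if count * 2 > len(votes) else None


def unanimous(votes: dict[str, str]) -> Optional[str]:
    """Choice shared by every agent, else None."""
    choices = set(votes.values())
    return choices.pop() if len(choices) == 1 else None


def weighted(
    votes: dict[str, str], weights: dict[str, float], threshold: float = 0.5
) -> Optional[str]:
    """Choice whose share of total weight exceeds threshold, else None.

    Agents missing from `weights` count with weight 1.0.
    """
    if not votes:
        return None
    totals: dict[str, float] = {}
    for agent, choice in votes.items():
        totals[choice] = totals.get(choice, 0.0) + weights.get(agent, 1.0)
    best = max(totals, key=lambda c: totals[c])
    return best if totals[best] / sum(totals.values()) > threshold else None
```

The same set of votes can settle differently under each mode, which is why the mode belongs in the receipt alongside the verdict.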

Agents (✅). API providers: Anthropic, OpenAI, Google Gemini, Mistral (Large/Codestral), xAI Grok, OpenRouter (DeepSeek/Llama/Qwen/Yi/Kimi), local Ollama/LM Studio. CLI agents: Claude, Codex, Gemini, Grok. Resilience: Airlock circuit breaker, automatic OpenRouter fallback on 429, session circuit-breaker (auth-state pinning), per-provider rate limiting, unified error hierarchy.
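
The resilience pattern described above (a circuit breaker plus fallback when a provider returns 429) can be sketched generically. This is not the Airlock implementation; `RateLimited`, `CircuitBreaker`, `call_with_fallback`, and the threshold and cooldown defaults are hypothetical names and values for illustration.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Raised by a provider call when the upstream API answers HTTP 429."""


class CircuitBreaker:
    """Opens after consecutive failures; allows a probe once a cooldown passes."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: only allow a probe after the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    breaker: CircuitBreaker,
) -> str:
    """Try the primary provider unless the breaker is open; else use the fallback."""
    if breaker.allow():
        try:
            result = primary(prompt)
        except RateLimited:
            breaker.record_failure()
        else:
            breaker.record_success()
            return result
    return fallback(prompt)
```

Injecting the clock keeps the breaker testable without sleeping, and while the breaker is open the primary provider is not called at all, which is what relieves pressure on a rate-limited API.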

Memory & learning (✅). Continuum Memory (4 tiers, atomic cross-system writes via Memory Coordinator); Unified Memory Gateway fanning out across Continuum, Knowledge Mound, Supermemory, and claude-mem with a surprise-driven RetentionGate (Titans/MIRAS) and SHA-256 + Jaccard dedup; RLM context-as-REPL-variables (not prompt compression); ELO ratings, tournaments, leaderboards, performance-based selection.

Knowledge management (✅, Phase A2). Knowledge Mound: semantic vector search, graph store with lineage, domain taxonomy, visibility tiers (private→workspace→org→public→system), time-bounded access grants, cross-workspace federation. Auto-curation: dedup, contradiction detection, confidence decay, RBAC governance, ResilientPostgres store, SLO alerting. 40+ registered adapters auto-built from Arena subsystems; bridges (MetaLearner, Evidence, Pattern); belief networks with claim provenance and cruxes.

Enterprise & security (✅, production-ready). OIDC/SAML SSO, TOTP/HOTP MFA, SCIM 2.0 (Okta/Azure/OneLogin), scoped API keys; RBAC v2 (360+ permissions, 7 roles, hierarchy, decorators, middleware, audit); multi-tenancy (SQL auto-filtering, quotas, metering); AES-256-GCM field encryption with Cloud KMS + key rotation, PII anonymization, GDPR erasure; circuit breakers, rate limiting, SSRF protection, security headers; incremental backup/DR with drills.

Compliance & governance (mixed). EU AI Act artifact generation for Articles 9/12/13/14/15 with risk classification (🔄, bundle ~90/100 complete; enforcement deadline Aug 2, 2026); SOC 2 Type II controls implemented (🔄 ~98%; blocker: external penetration test not yet commissioned, so not certified); GDPR DSAR/erasure/consent/retention (✅); HIPAA field encryption, Safe Harbor de-identification, breach-notification workflows (✅ controls); SOX-oriented audit profiles (✅). (docs/HONEST_ASSESSMENT.md, docs/GA_CHECKLIST.md)

Integrations & connectors (✅, 50+). Chat: Slack, Discord, Teams, Google Chat, Telegram, WhatsApp (with TTS/voice). Streaming: Kafka, RabbitMQ (DLQ, bidirectional). Enterprise data: SharePoint, Confluence, Notion, PostgreSQL/MongoDB/MySQL/SQL Server/Snowflake (CDC), Salesforce, HubSpot, Zendesk, Jira. Sources: GitHub, ArXiv, Wikipedia, SEC filings, HackerNews, Reddit, Twitter/X. Email/voice: Gmail + Outlook sync, Twilio phone debates, SMTP. Healthcare: HL7v2, FHIR. Bidirectional routing returns results to the originating platform.

Observability & control plane (✅). Prometheus custom metrics, Grafana dashboards, OpenTelemetry tracing (OTLP, multiple backends), structured JSON logs with PII redaction and correlation IDs; agent registry with heartbeats, priority scheduler, liveness/readiness probes; policy governance (conflict detection, Redis cache, background sync, versioned rollback). 1,500+ control-plane tests.

Advanced capabilities. Pulse trending-topic ingestion (✅); Gauntlet adversarial red-team + cryptographic receipts (✅); Swarm orchestration — bounded work orders, worker launcher into managed worktrees, reconciler, leases, salvage queue (✅); Inbox Trust Wedge — Gmail → debate → signed receipt → approval → execute (✅ CLI, 🔮 web GUI for retest); Live Explainability, structural argument verification, outcome-feedback loop (✅); Workflow Engine — DAG automation with 50+ templates (✅); Prompt Engine — vague prompt → validated spec via debate (✅); Computer-Use bridge + OpenClaw compatibility (🔄).

SDK & ecosystem (✅/🔄). aragora-sdk (blessed Python client), full aragora package, @aragora/sdk (TypeScript); MCP server exposing a large tool surface for Claude integration with reasoning-trace capture; extensible plugin/marketplace architecture (🔮 no public marketplace endpoint yet).

Deployment & HA (✅). Docker Compose (dev/prod), Kubernetes Helm chart with multi-region values (US/EU/APAC data residency), HPA/PDB/network policies, External Secrets Operator, cert-manager; Terraform IaC; SQLite (dev) / PostgreSQL (prod) with pooling; unified TTL cache + Redis cluster failover; incremental backup to local/S3/GCS, offline/air-gapped mode.

Built but not productized (🟡). Real code in-tree, little or no public surface — tracked openly, not hidden (docs/FEATURE_GAP_LIST.md dormant table):

Crux detector — engine + operator CLI complete; no public API/SDK or crux-set-in-receipt yet (#8227, ODR-4).
Pareto provider router — optimizer + pricing DB shipped; the loop doesn't yet route by decision stakes or record rationale (#8233).
Tamper-evident audit trail — trail verifies checksums; external-witness append-only anchoring is building (TET spec; Rekor #8231).
ERC-8004 / blockchain — registries + handlers in-tree, deployed to no network; de-scoped by the steering-leverage filter (the anchoring need is served by Rekor instead).
Skills marketplace — code + seed catalog exist; no public endpoint or third-party content.
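
The tamper-evident audit trail item above says the trail verifies checksums today. One common way to build such a trail is a hash chain, sketched below. Whether Aragora's trail is structured this way is not stated here, so treat the entry layout (`event`, `prev`, `hash`), the genesis value, and both function names as assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


def append_entry(trail: list[dict], event: dict) -> None:
    """Append an event whose hash commits to the previous entry's hash."""
    prev = trail[-1]["hash"] if trail else GENESIS
    trail.append({"event": event, "prev": prev, "hash": _entry_hash(prev, event)})


def verify_trail(trail: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in trail:
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this detects edits only for someone who trusts the latest hash, which is the gap that external-witness anchoring (the Rekor work referenced above) is meant to close.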

Vertical specialists (docs/verticals/, 🔄 guides exist; packaged offerings 🔮)

Healthcare — FHIR R4 (Epic/Cerner via SMART-on-FHIR), drug-interaction and contraindication analysis, treatment-pathway debate, HIPAA-compliant receipts.
Financial services — credit underwriting, fraud investigation, investment-committee review, stress-testing, audit-grade dissent trails (SOX/SOC 2 workflows).
Legal — clause-by-clause adversarial contract analysis, counterparty modeling, missing-clause detection, M&A due diligence (6 concurrent review streams), litigation risk.
Accounting — loan risk, equity valuation, deal-structure analysis, materiality assessment.

Self-improvement — the Nomic Loop & the flywheel

The Nomic Loop (✅, 233+ tests). A five-phase autonomous self-improvement cycle: Context → Debate → Design → Implement → Verify. Heterogeneous agents propose and argue improvements, architect a solution, generate code in isolated worktrees, then gate on automated tests + cross-agent review + merge arbiter + Knowledge Mound learning. Infrastructure: TaskDecomposer, HardenedOrchestrator (default since Feb 2026), BranchCoordinator, MetaPlanner, AutonomousOrchestrator. Safety: prompt-injection scanning, canary tokens, sandbox execution, review-gate scoring, automatic rollback, protected-file checksums, human approval gates. CLI: aragora self-improve "<goal>", scripts/self_develop.py, scripts/nomic_staged.py. (CLAUDE.md, docs/plans/SELF_IMPROVING_ARAGORA.md)

The flywheel. Aragora uses Aragora to improve Aragora: debates over tradeoffs → receipts capturing the reasoning → Knowledge Mound learning → better-calibrated next debate. As the system fixes its own bugs and ships its own features, the verifiable corpus and the agent-calibration data grow — and that calibration data is the moat. The strategy-mission cadence that produced this very README gated its own merge through Aragora's quorum→receipt machinery — the flywheel, demonstrated.

The frontier — Agent Civilization Substrate (🔮 designed; gated, see below)

Beyond human-operated orgs, Aragora is designed as a substrate for consequential multi-agent coordination where reputation is earned against external ground truth, not closed-loop agreement. Six tracks (AGT-01..06): activate the CruxDetector in live debates; an A2A consumer surface where agents register/discover/transact/consume receipts; Manifold Markets integration with rolling Brier scoring; synthetic GitHub markets (predict PR merges / issue closures with verifiable resolution); ERC-8004 reputation flow (claims → stakes → resolution → reputation deltas → dispatch eligibility — contracts written, no mainnet; de-scoped June 2026 in favor of Sigstore Rekor anchoring); and a Verifiable-Improvements-per-Agent-Hour (VIAH) self-justification metric. (docs/plans/ agent-civilization designs)

Crux Finder (✅ MVP / 🔮 shaping). Consensus mode crux_finder surfaces the 3–5 load-bearing disagreements where flipping a belief flips the conclusion; signed CruxReceipts, aragora crux "<question>". Crux-shaping prompts and per-round claim targeting are deferred pending dogfood runs.
Epistemic CI / Decision Integrity Core (🔮 DIC-13..22). Extend receipts beyond debates to code and organizational claims: executable claims (evidence + freshness SLAs + verification contracts), proof-carrying code units that fail closed when assumptions decay, epistemic decay signals proposing bounded repair, and a read-only organizational truth map. Initial shape is manifest-based and read-only.
Trust-Compound plan (🔄 TCP-1..7). Make the large surface legible without deletion: a canonical-metrics manifest verified in CI (so a claim like "46 adapters" passes or fails the build), packaging clarity, hotspot-file splits, wire/showcase/shelve classification per subsystem, generated artifacts as build outputs, this README rewrite, and public CruxSets at aragora.ai/cruxes.
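
The Crux Finder item above defines a crux as a disagreement where flipping a belief flips the conclusion. A toy version of that definition, over boolean beliefs, is sketched below. It is not the CruxDetector: real debates involve graded, natural-language claims, and the `find_cruxes` name and the boolean belief model are assumptions made to keep the idea small.

```python
from typing import Callable


def find_cruxes(
    beliefs: dict[str, bool],
    conclude: Callable[[dict[str, bool]], bool],
) -> list[str]:
    """Return the beliefs whose individual flip changes the conclusion."""
    baseline = conclude(beliefs)
    cruxes = []
    for name, value in beliefs.items():
        flipped = {**beliefs, name: not value}  # flip exactly one belief
        if conclude(flipped) != baseline:
            cruxes.append(name)
    return cruxes
```

For example, with beliefs `{"tests_pass": True, "has_migration": False, "style_ok": True}` and a conclusion rule of "merge if tests pass and there is no migration", the cruxes are `tests_pass` and `has_migration`; `style_ok` is not load-bearing because flipping it leaves the conclusion unchanged.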

Discipline — capability checkpoints (CP-1..5)

The vision is bounded, not open-ended. Booster-stage investment in the frontier must graduate through checkpoints, each ~4 weeks apart: CP-1 stable self-healing soaks → CP-2 CruxDetector driving real follow-ups → CP-3 stable prediction-calibration curves → CP-4 a reputation delta changing real dispatch → CP-5 positive VIAH trend without an operator-rescue spike. Failing a checkpoint downscales the next investment; it does not kill the vision. Frontier (AGT-/DIC-) work never carries boss-ready until the proof-first gate explicitly opens the upper tranche. (docs/plans/ trust-compound + checkpoints)

Roadmap — priority-gated (docs/FEATURE_GAP_LIST.md, docs/CANONICAL_GOALS.md)

Current execution spine — Open Decision Receipt (epic #8223; supersedes the P0/P1 ordering below): ODR-1 vendor-neutral receipt profile (JSON Schema + JCS) #8224 → ODR-2 Ed25519 public-key signing #8225 → ODR-3 aragora-verify standalone offline verifier + /api/receipts/verify #8226 → ODR-4 expose the crux finder #8227 → ODR-5 calibration report + calibrated confidence #8229 → ODR-6 human-oversight attestation + EU AI Act Art. 14 pack #8230 → ODR-7 Sigstore Rekor public anchoring #8231. (1–3 are the spine: a receipt a stranger can verify; 4–6 enrich the payload; 7 makes anchoring public.)

P0 — PMF blockers: truthful live founder loop (✅ 5/5 proved); smart provider routing (✅ optimizer + runtime; 🔄 decision-stakes routing); complete repeatable user journey (🔄); KM reads enrich debate context (🔄 live proof pending).
P1 — value-prop proof (Q2 2026): OpenClaw end-to-end (🔄); 5 functional frontend paths (🔄); 10+ agent coordinated debates (🔄 scale testing); agent-first beta via REST (✅ 12-runner fleet); GitHub Actions pre-merge gate (🔄); public demo at aragora.ai/demo (🔄); EU AI Act bundle (🔄 ~90/100); <10-min onboarding (🔄).
P2 — hardening & enterprise (post-PMF): external penetration test (🔮 vendor shortlisted); Decision-Integrity UI Workbench (🔄 partial); SOC 2 Type II audit (🔄 ~98% controls); Enterprise Communication Hub (✅ shipped, 🔄 trigger validation).
P3 — scale & revenue (Q3–Q4 2026, de-scoped until PMF): cloud marketplace listings; vertical packages; Skills Marketplace pilot; on-prem productization; data residency / international.
P4 — strategic evolution (2026+): ✅ Prover-Estimator, cross-verification, truth-ratio weighting, anti-sycophancy, prompt-to-spec, Obsidian sync; 🔮 Dialectical Runtime synthesis (DIC-23..28), market-resolution mechanism, meta-improver for protocols.
P5 — federation (🔮): distributed debates across orgs, cross-org knowledge sync.

Focus strategy — depth over breadth (docs/FOCUS.md)

The codebase is explicitly tiered for investment:

Tier 1, defensible core (~17% of files, ~100% of unique value: debate engine, Gauntlet, Knowledge Mound, ELO/calibration, Continuum memory, belief networks, verification, explainability): invest, harden, make receipts the primary output.
Tier 2, essential infrastructure (agents, API, storage, CLI): maintain, prune the oversized server surface.
Tier 3, enterprise (RBAC, audit, billing, compliance): keep, don't differentiate.
Tier 4, connectors (commodity): sufficient as-is.
Tier 5, scope creep (some workflow/RLM/blockchain/computer-use/canvas): move to contrib/ or shelve until customer demand.

The standalone aragora-debate library extracts Tier 1 so anyone can run an adversarial debate in ~10 lines with zero infra dependencies.

Honest current state (docs/HONEST_ASSESSMENT.md, docs/GA_CHECKLIST.md)

Real and working: the debate engine (genuine multi-agent debates against live LLM APIs), multiple consensus modes, hollow-consensus detection, cryptographic receipts with multi-format export, ELO rankings, Continuum memory, the fully-wired self-improvement infrastructure, enterprise auth/encryption/key-rotation, and a very large test suite. GA readiness is tracked at ~98% (58/59 checklist items).

Honest qualifications: the B0 benchmark reports 100% verified-truth on the strict set, with a separate, lower legacy full-corpus metric tracked alongside it (see docs/status/B0_BENCHMARK_TRUTH_STATUS.md); SOC 2 Type II is not certified (the one open GA blocker is the external penetration test); semantic convergence degrades to TF-IDF/Jaccard without the optional sentence-transformers dependency; "blockchain" receipts are SHA-256 hashing, not an on-chain immutable ledger; and practical real-time parallelism is 2–6 agents, not the larger allowlisted count — the value is heterogeneity. External positioning should remain narrower than this roadmap and anchored to measured proof.
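
The qualification above notes that convergence detection falls back to TF-IDF/Jaccard when sentence-transformers is absent. The Jaccard half of that fallback is easy to illustrate. The tokenizer, the `0.8` threshold, and the function names below are assumptions for this sketch, not values taken from the codebase.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Token-set overlap: |A ∩ B| / |A ∪ B|, in the range 0.0 to 1.0."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are trivially identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def converged(previous_round: str, current_round: str, threshold: float = 0.8) -> bool:
    """Treat a position as stable when consecutive rounds share most tokens."""
    return jaccard(previous_round, current_round) >= threshold
```

This also shows why the fallback is a degradation: a paraphrase with few shared words scores low here even when the meaning is unchanged, which an embedding-based measure would catch.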

North stars (docs/CANONICAL_GOALS.md)

A vague request becomes a reviewable, executable spec in minutes.
A bounded backlog runs unattended with clear receipts, stop conditions, and minimal rescue.
Any decision is inspectable from one-line summary down to evidence and provenance.
Shared memory improves future work without collapsing trust boundaries.
Important claims and cruxes stay linked to evidence, receipts, freshness, and delayed settlement.
Aragora evolves tool → teammate → foreman → chief of staff → org substrate on one coherent runtime.
Agents and humans participate as co-equal consumers with portable reputation tied to external truth oracles.
