A harness is for one agent. Rigging is for the fleet.
The typed, trust-bearing, schema-mediated coupling layer that composes heterogeneous harnessed agents into a single coherent system.
🌊 Live site & interactive demo 🧾 Cheatsheet 📖 Long-form essay 📐 Spec 📊 Benchmarks ❓ FAQ 📕 Glossary
Note
🌊 Want to see it work first? The live site has three interactive demos: a blame-chain explorer (pick a failure, watch the runtime extract the proximate cause), a contract negotiation animation (six steps; press ▶), and a cost-attribution simulator (drag the sliders; watch where the overrun lands).
First-time setup: the site auto-deploys via pages.yml, but a maintainer has to trigger the workflow once. See DEPLOY.md; it's one command.
Tip
Curated by Betty Guo (Dongxin Guo) — PhD candidate in Computer Science at The University of Hong Kong, advised by Prof. Siu-Ming Yiu. Rigging is the reference implementation of a concept she has been developing through her PhD. See § Curator and citation for how to cite.
The interesting systems in 2026 do not have an agent. They have a planner, two coders that disagree, a reviewer that gates merges, a test-runner that is older and more boring than any of them, and a verifier whose entire job is to reject plans that try to do too much. Each is its own agent. Each has its own harness. And they have to behave like a single system.
That layer — the typed, signed, opinionated runtime that turns ad-hoc multi-agent glue into an auditable substrate — is rigging. This repository is its first reference implementation.
🪢 You can have a great harness on every agent and still have terrible rigging. If you are not the model, and you are not the harness, you are the rigging.
git clone https://github.com/bettyguo/rigging
cd rigging
# Install the workspace (Python 3.12+)
python -m pip install -e .
# Generate an Ed25519 identity (writes rig.key + rig.key.did)
RIG_PASS=hunter2 rig identity create --passphrase-env RIG_PASS
# Run the smallest example — planner delegates to worker, verifier audits
rig run 01-two-agent-handoff
# Inspect the resulting trace + blame chain in your terminal
rig trace inspect ./trace.json
No API keys. No network. Every example runs offline.
💡 Prefer to look first? The live site has an interactive blame-chain explorer and a contract-negotiation animation. Or pick up the one-page cheatsheet (also printable).
A rig refuses to live without exactly three things:
| | What it is | What it refuses |
|---|---|---|
| ① Signed agent cards | An Ed25519-signed JSON document declaring an agent's capabilities, input/output schemas, and cost model. | Routing against an unsigned, malformed, or schema-mismatched card. |
| ② Delegation contracts | A typed, signed bill of lading exchanged between two agents before any work crosses their boundary. Pins capability, budget, verifier, expiry. | Issuing contracts whose capabilities are undeclared, whose budgets are unbounded, or whose verifier is unreachable. |
| ③ Blame chains | An ordered DAG of signed envelopes recovered from any trace. Walk it backwards to find the proximate cause of any failure. | Adjudicating fault itself; it only makes the question mechanically answerable. |
Cards. Contracts. Blame. Everything else in a rig exists to keep those three primitives honest.
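To make the first primitive concrete, here is a minimal sketch of signing and verifying a card. It is an illustration under stated assumptions, not the project's implementation: the card fields are hypothetical, sorted-key JSON only approximates JCS canonicalization, and HMAC-SHA256 stands in for Ed25519 so the example runs on the standard library alone.

```python
import hashlib
import hmac
import json


def canonical_bytes(card: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes.

    Sorted-key compact JSON is a rough approximation of JCS, not JCS itself.
    """
    return json.dumps(card, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_card(card: dict, secret: bytes) -> dict:
    """Attach a tag over the canonical body (HMAC stand-in for Ed25519)."""
    tag = hmac.new(secret, canonical_bytes(card), hashlib.sha256).hexdigest()
    return {"card": card, "sig": tag}


def verify_card(signed: object, secret: bytes) -> bool:
    """Return False for unsigned, malformed, or tampered cards."""
    if not isinstance(signed, dict):
        return False
    card, sig = signed.get("card"), signed.get("sig")
    if not isinstance(card, dict) or not isinstance(sig, str):
        return False
    expected = hmac.new(secret, canonical_bytes(card), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)


# Hypothetical card; the field names are illustrative, not the v0 schema.
secret = b"demo-only-secret"
signed = sign_card(
    {"did": "did:example:worker", "capabilities": ["translate_pdf"]}, secret
)
# Same signature, different body: a tampered card.
tampered = {
    "card": {**signed["card"], "capabilities": ["delete_repo"]},
    "sig": signed["sig"],
}
```

With an asymmetric scheme such as Ed25519 the verifier would hold only a public key; the shared secret here is purely a simplification for the sketch.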
Today, every production multi-agent stack ships a function that looks roughly like this:
# the function every team writes, differently, incorrectly
trace_id = uuid()
contract = {"caller": "planner", "callee": callee}
try:
    out = await callee.run(req)
except Exception:
    out = await fallback.run(req)  # silently swap identity
cost[caller] += out.cost  # ¯\_(ツ)_/¯
return out
With Rigging:
from rigging.runtime import Rig
from rigging.adapters import LocalPythonAdapter
from rigging.identity import KeyPair
rig = Rig(name="my-system")
rig.register(planner, keypair=planner_key)
rig.register(worker, keypair=worker_key)
rig.register(quality, keypair=quality_key)
result = await rig.call(
    caller=planner,
    callee_did=worker.did,
    capability="translate_pdf",
    input={"uri": "s3://docs/contract.pdf", "target_language": "fr"},
    cost_budget=("usd", "0.50"),
    verifier=quality.did,
)
What you get behind that one call:
- A signed contract from planner to worker.
- A schema check that the input matches the worker's declared `translate_pdf` shape.
- A per-contract budget the worker cannot exceed (and that does not leak into other contracts).
- A verifier sub-contract that audits the worker's output and signs its verdict.
- An OpenTelemetry-compatible trace with `rig.*` attributes.
- A typed exception (`VerifierRejected`, `BudgetOverrun`, `CalleeUnreachable`, `SignatureInvalid`, `ContractExpired`, …) if anything goes wrong, with a blame chain that names the responsible agent.
🛑 No silent retries. No transparent fallback. No tribal knowledge.
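Before any of that happens, the runtime has to decide whether a contract may be issued at all. The sketch below shows one way such pre-flight checks could look. Everything in it is an assumption for illustration: the `Contract` fields, the `check_contract` helper, and the local exception classes are stand-ins and not the `rigging` API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Optional, Set


class ContractRefused(Exception):
    """Stand-in error: the contract should never have been issued."""


class ContractExpired(Exception):
    """Local stand-in that borrows the name of the error listed above."""


@dataclass(frozen=True)
class Contract:
    capability: str
    budget_usd: Optional[Decimal]  # None models an unbounded budget
    verifier: str
    expires_at: datetime


def check_contract(
    contract: Contract,
    declared: Set[str],
    reachable_verifiers: Set[str],
    now: datetime,
) -> Contract:
    """Refuse undeclared capabilities, unbounded budgets, unreachable verifiers."""
    if contract.capability not in declared:
        raise ContractRefused(f"capability not declared: {contract.capability}")
    if contract.budget_usd is None or contract.budget_usd <= 0:
        raise ContractRefused("budget must be bounded and positive")
    if contract.verifier not in reachable_verifiers:
        raise ContractRefused(f"verifier unreachable: {contract.verifier}")
    if now >= contract.expires_at:
        raise ContractExpired("contract expired before work started")
    return contract


# Hypothetical values for the demonstration.
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
good = Contract(
    capability="translate_pdf",
    budget_usd=Decimal("0.50"),
    verifier="did:example:quality",
    expires_at=now + timedelta(minutes=5),
)
unbounded = Contract("translate_pdf", None, "did:example:quality", good.expires_at)
```

Raising distinct exception types, instead of returning a boolean, keeps the refusal reason available to whatever records the trace.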
When a multi-agent run fails, the trace contains an ordered chain of signed envelopes. Walk it backwards. The first envelope whose contents, if replaced by ground truth, would have prevented the failure — that envelope's signing key is the proximate cause.
$ rig trace inspect ./trace.json --highlight=blame
trace 01HXQK3Z… 14 signed envelopes
└── contract 01HXQK3Z (planner → worker · translate_pdf · budget=usd 0.50)
├── propose sig ✓
├── accept sig ✓
├── execute ← proximate cause sig ✓
│ output: {pages:0, language:"??"} schema_violation
├── verify (sub-contract)
│ verdict: reject · reason: schema_violation sig ✓
└── void reason: verifier_rejected
blame ▶ did:rig:9rT…qN2 (worker)
🌊 Try it interactively on the live site → Pick a failure mode (adversarial output, budget overrun, expired contract, forged signature) and watch the runtime produce the chain step by step.
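The backward walk described above can be sketched in a few lines. This is a toy model under stated assumptions, not the extractor in `rigging-trace`: the `Envelope` shape, the `ground_truth` mapping, and the `run_fails` predicate are hypothetical, and signatures are omitted so the counterfactual logic stays visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Envelope:
    step: str      # e.g. "propose", "accept", "execute", "verify"
    signer: str    # identity that signed this envelope
    payload: dict


def proximate_cause(
    chain: List[Envelope],
    ground_truth: Dict[str, dict],
    fails: Callable[[List[Envelope]], bool],
) -> Optional[str]:
    """Walk the chain backwards; return the signer of the first envelope
    whose replacement by ground truth would have prevented the failure."""
    for i in range(len(chain) - 1, -1, -1):
        env = chain[i]
        if env.step not in ground_truth:
            continue  # no counterfactual available for this step
        patched = list(chain)
        patched[i] = Envelope(env.step, env.signer, ground_truth[env.step])
        if not fails(patched):
            return env.signer
    return None


# Hypothetical identities and a trace shaped like the one shown above.
PLANNER, WORKER, VERIFIER = (
    "did:example:planner", "did:example:worker", "did:example:quality",
)
chain = [
    Envelope("propose", PLANNER, {"capability": "translate_pdf"}),
    Envelope("accept", WORKER, {"accepted": True}),
    Envelope("execute", WORKER, {"pages": 0, "language": "??"}),
    Envelope("verify", VERIFIER, {"verdict": "reject"}),
]
# The verifier's rejection was correct, so its ground truth is unchanged.
ground_truth = {
    "execute": {"pages": 3, "language": "fr"},
    "verify": {"verdict": "reject"},
}


def run_fails(envelopes: List[Envelope]) -> bool:
    """The run fails when the executed output has no pages."""
    return any(
        e.step == "execute" and e.payload.get("pages", 0) < 1 for e in envelopes
    )


blamed = proximate_cause(chain, ground_truth, run_fails)
```

In this toy trace the verifier's envelope is visited first but patching it changes nothing, so the walk continues to the `execute` envelope and names the worker.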
Every layer in the agent stack was open-coded by every team before it had a name. Each named layer is two years older than the one above it. The next layer — typed coupling across trust domains — is due.
The model was named first. Then the tool surface. Then the agent loop. Then the agent-to-agent wire. The layer that comes next — the typed coupling across trust domains — needs a name too. That layer is rigging.
Synthesised from real practitioner conversations. Detail in docs/case-studies.md.
| Before | With rigging |
|---|---|
| A planner's "resilience" fallback silently spawned 52 sibling agents on the same query. $8,400 in token spend before alerting fired. | The retry is a new signed contract; the sibling's budget is carved from the parent's allocation. |
| Four reviewers from three vendors auto-merged a regression. The post-mortem took 6 engineers × 8 hours to attribute. | One trace, one blame chain. |
| "Prove which agent made each decision under what budget" took 2 weeks of log-joining. | Every decision is a signed contract, every output a signed envelope. The audit is a database query. |
The rig is, in this sense, the insurance product of an agentic stack: it does not prevent the storm, but it makes "what was damaged and who is responsible" answerable. The premium is the discipline of typed contracts and signed envelopes. The payout is every incident that used to take a day to attribute.
A rig is not a wire format. It uses wire formats. A rig is not a harness. It composes harnesses. A rig is not a supervisor. It sits one floor up.
| | MCP | A2A | Harness | Supervisor (LangGraph, CrewAI) | Rigging |
|---|---|---|---|---|---|
| Tool wire format | ✓ | — | — | — | ↗ reuses |
| Agent-to-agent wire | — | ✓ | — | — | ↗ reuses |
| Single agent loop | — | — | ✓ | partial | — |
| Multi-agent routing | — | — | — | ✓ | ✓ typed |
| Signed capability advertisement | — | partial | — | — | ✓ |
| Typed delegation contract | — | — | — | — | ✓ |
| Per-contract budget enforcement | — | — | per agent | — | ✓ recursive |
| Verifier as first-class participant | — | — | — | convention | ✓ |
| Cross-agent blame extraction | — | — | — | — | ✓ |
| Refuses silent retries | n/a | n/a | policy | policy | ✓ structural |
A longer survey is at docs/related-work.md, with an entry per project (A2A, MCP, ACP, OASF, KYA, OpenHarness, LangGraph, CrewAI, AutoGen, loom-agent, Teradata loom).
Each runs offline. No API keys, no network. Each has its own README.md.
| | Example | What it demonstrates |
|---|---|---|
| 01 | 01_two_agent_handoff | The minimum viable rig: planner delegates to worker, verifier audits. |
| 02 | 02_three_vendor_rig | Heterogeneous composition: three vendors, one rig. |
| 03 | 03_adversarial_subagent | Compositional reliability: verifier catches the bad worker; blame chain names it. |
| 04 | 04_cost_attribution | A → B → C with explicit sub-budgets; C overruns; A's budget is inviolable. |
| 05 | 05_vote_ensemble | Three verifiers; majority rules. Disagreement is a composition problem, not a runtime problem. |
| 06 | 06_recursive_verification | A verifier's verdict is itself audited by a meta-verifier. Recursion is bounded by verification_recursion_cap. |
rig run 01-two-agent-handoff # short
rig run 02-three-vendor-rig
rig run 03-adversarial-subagent
rig run 04-cost-attribution
rig run 05-vote-ensemble
rig run 06-recursive-verification # new in v0.2
An annotated walkthrough of all six with sequence diagrams and "the invariant exercised" callouts is at docs/EXAMPLES.md.
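As a rough illustration of the "majority rules" idea behind the vote-ensemble example, the sketch below combines verdicts from several verifiers. The function name and the verdict strings are assumptions; the example's own README describes what the shipped code actually does.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_verdict(verdicts: Sequence[str]) -> Optional[str]:
    """Return the verdict held by a strict majority, or None when there is none."""
    if not verdicts:
        return None
    verdict, count = Counter(verdicts).most_common(1)[0]
    return verdict if count * 2 > len(verdicts) else None


# Three hypothetical verifiers, two of which accept the output.
outcome = majority_verdict(["accept", "accept", "reject"])
```

Returning `None` on a tie leaves the decision to the composing caller, which matches the framing of disagreement as a composition problem.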
Cost is a property of a contract, not of an agent. B may subcontract to C only by carving a sub-budget from B's own allocation. C's overruns hit B's ledger; A's budget is inviolable.
The naïve "original caller pays for everything in the call graph" is the choice every prototype makes and every production system regrets. Once agent B can subcontract to C without A's awareness, A is on the hook for arbitrary downstream spending. This is the agentic version of letting a subcontractor put arbitrary charges on the general contractor's credit card. Rigging refuses it structurally. See ADR-0006.
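One possible reading of the sub-budget rule, sketched as a small ledger. The `ContractBudget` class and its method names are hypothetical, and the local `BudgetOverrun` only borrows the name of the error mentioned earlier; ADR-0006 is the authoritative description.

```python
from decimal import Decimal
from typing import Optional


class BudgetOverrun(Exception):
    """Local stand-in for the runtime's budget error."""


class ContractBudget:
    """Budget attached to one contract; sub-budgets are carved from it."""

    def __init__(self, holder: str, limit: Decimal,
                 parent: Optional["ContractBudget"] = None) -> None:
        self.holder = holder
        self.limit = limit
        self.parent = parent
        self.spent = Decimal("0")   # charged directly to this contract
        self.carved = Decimal("0")  # reserved for sub-contracts

    def remaining(self) -> Decimal:
        return self.limit - self.spent - self.carved

    def carve(self, holder: str, amount: Decimal) -> "ContractBudget":
        """Reserve part of this allocation for a subcontractor."""
        if amount > self.remaining():
            raise BudgetOverrun(f"{self.holder} cannot carve {amount}")
        self.carved += amount
        return ContractBudget(holder, amount, parent=self)

    def charge(self, amount: Decimal) -> None:
        """Record spend; refuse anything beyond this contract's own limit."""
        if amount > self.remaining():
            raise BudgetOverrun(f"{self.holder} exceeded {self.limit}")
        self.spent += amount


# A -> B -> C with explicit sub-budgets (amounts are illustrative).
a = ContractBudget("A", Decimal("1.00"))
b = a.carve("B", Decimal("0.50"))
c = b.carve("C", Decimal("0.20"))
c.charge(Decimal("0.15"))
try:
    c.charge(Decimal("0.10"))  # would take C to 0.25 against a 0.20 limit
    overrun_refused = False
except BudgetOverrun:
    overrun_refused = True
```

Because C's allocation was already debited from B when it was carved, C's attempted overrun cannot reach A: A's remaining balance stays at 0.50 throughout.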
The art of this layer is in what it refuses, not in what it provides.
1. refuses to route against unsigned cards.
2. refuses to issue contracts whose capabilities are undeclared.
3. refuses to retry silently.
4. refuses to attribute cost to anyone other than the contract holder.
5. refuses to admit unverified output as verified.
6. refuses to be a marketplace, a scheduler, a router, or a harness.
Five small packages, one CLI, one-direction dependency graph. Provider-agnostic core. All LLM/MCP code lives only in adapters.
graph LR
CORE[rigging-core<br/>schemas · protocols · errors]
IDENT[rigging-identity<br/>Ed25519 · JCS · JWS · DIDs]
TRACE[rigging-trace<br/>OTel processor · blame extractor]
ADAPT[rigging-adapters<br/>LiteLLM · MCP · Local]
RUN[rigging-runtime<br/>Rig orchestrator · state machine]
CLI[rig CLI<br/>identity · run · trace · bench · doctor · card · contract]
IDENT --> CORE
TRACE --> CORE
RUN --> CORE
RUN --> IDENT
RUN --> TRACE
ADAPT --> CORE
ADAPT --> IDENT
CLI --> RUN
The full architecture, sequence diagram, state machine, and trust-boundary discussion is at docs/architecture.md.
Terminal states are fulfilled, rejected, and voided. There is no edge labelled "silently retry" — by design. See ADR-0009.
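A minimal sketch of a contract state machine with those three terminal states. The intermediate states and the edges between them are assumptions inferred from the trace output shown earlier; docs/architecture.md and ADR-0009 define the real machine.

```python
from enum import Enum


class State(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    EXECUTING = "executing"
    VERIFYING = "verifying"
    FULFILLED = "fulfilled"  # terminal
    REJECTED = "rejected"    # terminal
    VOIDED = "voided"        # terminal


# Assumed edges. Terminal states have no outgoing edges, so nothing can
# loop back for a silent retry.
TRANSITIONS = {
    State.PROPOSED: {State.ACCEPTED, State.REJECTED, State.VOIDED},
    State.ACCEPTED: {State.EXECUTING, State.VOIDED},
    State.EXECUTING: {State.VERIFYING, State.VOIDED},
    State.VERIFYING: {State.FULFILLED, State.VOIDED},
    State.FULFILLED: set(),
    State.REJECTED: set(),
    State.VOIDED: set(),
}


def advance(current: State, target: State) -> State:
    """Move to target, or raise if the edge is not in the table."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_terminal(state: State) -> bool:
    return not TRANSITIONS[state]
```

Encoding the edges as data makes the "no retry edge" property checkable: a test can assert that every terminal state maps to an empty set.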
A five-axis benchmark the project is scored against — honestly. We do not claim 100% across the board, and we name the gaps.
| Axis | Score | Notes |
|---|---|---|
| Capability-advertisement fidelity | 0.50 | Floor is structural — half the probes go to a dishonest agent. |
| Delegation-contract expressiveness | 1.00 | Handoff, voting ensemble, recursive subcontracting, conditional delegation: all expressible. |
| Identity propagation | 0.85 | Spoofing, tampering, wrong-key covered. Revocation is v1. |
| Cost-attribution accuracy | 1.00 | Zero L1 error on the synthetic chain. |
| Blame-resolution correctness | 0.70 | Leaf attribution solid. Planner-misroutes and verifier-itself-wrong are v1. |
| Overall | 0.81 | We do not claim higher than the suite honestly supports. |
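The Overall row is consistent with an unweighted mean of the five axis scores. The check below assumes equal weighting, which is an inference from the numbers in the table; the methodology document remains the authoritative source.

```python
from decimal import Decimal

# Axis scores copied from the table above.
AXIS_SCORES = {
    "capability_advertisement_fidelity": Decimal("0.50"),
    "delegation_contract_expressiveness": Decimal("1.00"),
    "identity_propagation": Decimal("0.85"),
    "cost_attribution_accuracy": Decimal("1.00"),
    "blame_resolution_correctness": Decimal("0.70"),
}

# Unweighted mean, rounded to two decimal places.
overall = (sum(AXIS_SCORES.values()) / len(AXIS_SCORES)).quantize(Decimal("0.01"))
```

The scores sum to 4.05, and 4.05 / 5 = 0.81, matching the reported figure.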
rig bench # smoke (under a minute)
rig bench --full # comprehensive
Full report: benchmarks/results/v0-reference.md.
Methodology: docs/benchmarks/rigging-completeness-matrix.md.
A single entry point. Every subcommand is read-only or local-only — nothing it does requires network or credentials.
rig --help # discoverable surface
rig doctor # audit env: Python, deps, packages, repo
rig examples # list all 6 built-in examples
rig version # show installed rigging version
rig identity create --passphrase-env RIG_PASS
rig identity show ./rig.key --passphrase-env RIG_PASS
rig identity verify ./agent.json
rig card show ./agent-card.json # pretty-print + verify signature
rig contract show ./contract.json # pretty-print a signed contract
rig spec validate ./agent-card.json # validate against the v0 schema
rig run 06-recursive-verification # run any of the six examples
rig trace inspect ./trace.json # pretty trace + blame chain
rig bench # smoke benchmark (< 1 min)
rig bench --full # full Rigging-Bench v0
rig doctor is the friendliest place to start: a single read-only command that surfaces every health check (Python ≥ 3.12, every dependency version, every rig package importable, every expected repo file present) and exits with the number of failures.
rigging/
├── CONCEPT.md # The seminal essay (~2k words)
├── README.md # You are here
├── site/ # GitHub Pages source (live demo + cheatsheet)
├── assets/ # SVG hero, diagrams, brand
├── docs/
│ ├── architecture.md # Package graph + per-call sequence + state machine
│ ├── related-work.md # MCP · A2A · ACP · OASF · LangGraph · CrewAI · …
│ ├── EXAMPLES.md # Annotated walkthrough of the six examples
│ ├── case-studies.md # Three real-world failure modes
│ ├── glossary.md # The vocabulary
│ ├── FAQ.md # The questions we get every week
│ ├── roadmap.md # What's in v0, what's in v1
│ ├── spec/ # v0 specs: identity, agent-card, contract, trace
│ ├── adr/ # 10 architecture decision records
│ └── benchmarks/ # Methodology of the Rigging Completeness Matrix
├── packages/
│ ├── rigging-core/ # Schemas · protocols · errors
│ ├── rigging-identity/ # Ed25519 · JCS · JWS · signed cards
│ ├── rigging-trace/ # OTel processor · blame-chain extractor
│ ├── rigging-adapters/ # Local · LiteLLM · MCP
│ └── rigging-runtime/ # The Rig orchestrator + CLI
├── examples/ # 01..06 runnable examples (offline, no API keys)
├── benchmarks/rig_bench/ # Rigging-Bench v0
└── tests/ # 97 tests · unit · integration · property (hypothesis)
The dependency graph between packages is one-direction. Adapters never import from runtime.
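The one-direction property can be checked mechanically. The sketch below transcribes the edges from the package graph shown earlier and uses the standard library's `graphlib` to confirm the graph is acyclic; the `rig-cli` label and the `transitive_deps` helper are illustrative names, not part of the repository.

```python
from graphlib import TopologicalSorter

# package -> packages it depends on, transcribed from the package graph.
DEPENDS_ON = {
    "rigging-core": set(),
    "rigging-identity": {"rigging-core"},
    "rigging-trace": {"rigging-core"},
    "rigging-adapters": {"rigging-core", "rigging-identity"},
    "rigging-runtime": {"rigging-core", "rigging-identity", "rigging-trace"},
    "rig-cli": {"rigging-runtime"},
}


def transitive_deps(package: str) -> set:
    """Everything a package reaches by following dependency edges."""
    seen = set()
    stack = list(DEPENDS_ON[package])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON[dep])
    return seen


# static_order() raises graphlib.CycleError if any cycle exists.
build_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

A check like `"rigging-runtime" not in transitive_deps("rigging-adapters")` then expresses the adapters rule directly.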
v0 (this release): three primitives (cards, contracts, blame), four specs, ten ADRs, six runnable examples, the five-axis benchmark, the rig CLI (incl. doctor, card show, contract show), the interactive live site, and the printable cheatsheet.
v1 (the immediate horizon):
- Mid-chain blame attribution (planner-misroutes, verifier-itself-wrong, recursive verification).
- Card revocation (without forcing key rotation).
- KMS-backed signing.
- A real `rigging-viz` web visualizer (separate package).
- One real-world harness adapter (LangGraph, AutoGen, or Goose, picked based on community pull).
- TLA+ model of the contract-negotiation protocol; liveness and safety checked.
- A2A-native transport for cross-process rigs.
Full roadmap: docs/roadmap.md. Issues tagged v1 are open.
The short version is below; the full FAQ lives at docs/FAQ.md. The vocabulary is at docs/glossary.md.
Is this a new wire protocol?
No. Rigging sits above MCP and A2A. Tool calls inside a harness flow over MCP; the contract flows over A2A (or any equivalent); the trace flows over OpenTelemetry. A rig that invents a new wire format is, by our definition, doing it wrong.
Is the verifier privileged?
No, and we tried that first. A verifier is just an agent whose card declares a `verify` capability. The runtime invariants apply to it uniformly. Disagreement becomes a composition problem (vote, recurse), not a runtime problem. See ADR-0007.
Why refuse silent retries?
A retry against a fresh agent produces output signed by an identity the caller did not address. The trace shows A → B but the work was done by B′. Blame analysis terminates in a contradiction. Retries are first-class events, with their own contracts and their own identifiers.
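To illustrate what "first-class" could mean here, the sketch below models a retry as a new contract record with its own identifier and an explicit link to the attempt it replaces. The `ContractRecord` type and the `retry_of` field are hypothetical and do not reflect the v0 contract schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ContractRecord:
    caller: str
    callee: str
    # Every contract gets its own identifier, retries included.
    contract_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    retry_of: Optional[str] = None


def retry(failed: ContractRecord, new_callee: str) -> ContractRecord:
    """Issue a fresh contract that names the new callee explicitly."""
    return ContractRecord(
        caller=failed.caller,
        callee=new_callee,
        retry_of=failed.contract_id,
    )


# Hypothetical identities: the caller addressed B, then explicitly retried with B'.
first = ContractRecord("did:example:planner", "did:example:worker-b")
second = retry(first, "did:example:worker-b-prime")
```

Because the second record names B′ as its callee, a trace built from these records never attributes B′'s output to B.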
Why Ed25519 and not OIDC/OAuth?
v0 only needs to answer "is this card real and unchanged?" Long-lived per-agent Ed25519 keys solve that with no infrastructure. OAuth/OIDC is a v1 conversation.
Does the rig know about LLMs?
No. `rigging-core` and `rigging-runtime` contain zero LLM-specific code. All provider concerns live under `rigging-adapters`.
How do you stop one caller from putting charges on another caller's card?
Cost is a property of a contract, not of an agent. B may subcontract to C only by carving a sub-budget from its own allocation. C's overruns hit B's ledger; A's budget is inviolable. See ADR-0006.
Will you ship a web dashboard?
Not in v0. The TUI is sufficient, and the live site is the visual demo. A dedicated `rigging-viz` is on the v1 roadmap.
This repository is curated by Betty Guo (Dongxin Guo), PhD candidate in Computer Science at The University of Hong Kong, advised by Prof. Siu-Ming Yiu. Her research interests sit at the intersection of trust-bearing infrastructure for multi-agent systems, applied cryptography, and the systems substrate beneath modern AI agents. Rigging is the reference implementation of a concept she has been developing through her PhD.
Cite this work (BibTeX):
@software{guo2026rigging,
author = {Guo, Dongxin},
title = {Rigging: typed, trust-bearing coupling for harnessed agents},
year = {2026},
url = {https://github.com/bettyguo/rigging},
note = {Reference implementation, v0}
}
If you use Rigging in academic work, link the live site (bettyguo.github.io/rigging) and the v0 reference benchmarks (benchmarks/results/v0-reference.md) for reproducibility. The four normative specs under docs/spec/ are the citable artefacts; the implementation in packages/ is the worked example.
| Reach out | Link |
|---|---|
| Homepage | https://bettyguo.github.io |
| GitHub | @bettyguo |
| Affiliation | HKU, Department of Computer Science |
| Advisor | Prof. Siu-Ming Yiu |
We welcome PRs. Especially welcome: adversarial scenarios for the benchmark, real-world harness adapters, and corrections to the spec.
- First time? Read `CONTRIBUTING.md`. It is short.
- Code of conduct: `CODE_OF_CONDUCT.md` (Contributor Covenant v2.1).
- Reporting vulnerabilities: `SECURITY.md`. Do not file public issues for security reports.
- Disagree with a design call? Write an ADR-style counter-proposal and PR it under `docs/adr/`. We will engage.
# Hack on the runtime
python -m pip install -e ".[dev]"
pytest tests/ -q # 97 tests, ~3 seconds
ruff check . && mypy packages/ # lint + types
rig doctor # quick local environment audit
The site at site/ auto-deploys to GitHub Pages on every push to main via .github/workflows/pages.yml. The workflow uses enablement: true so it auto-creates the Pages site on its first successful run, with no Settings step required.
First time: run the pages workflow once. Either way works:
# via the GitHub CLI
gh workflow run pages.yml --repo bettyguo/rigging
# OR via the GitHub UI: Repo → Actions → "pages" → Run workflow
Then wait ~60–90 seconds. The site comes up at https://bettyguo.github.io/rigging/.
Local preview (pure static — no build):
cd site && python -m http.server 8080 # then open http://localhost:8080
Auditing every internal link (also runs in CI):
python scripts/audit_links.py --strict
Full troubleshooting at DEPLOY.md.
The English word rigged has fraudulent connotations; none are intended here. Throughout this project, rigging refers to its maritime sense: the load-bearing web of ropes, blocks, and lines on a sailing ship. The sails do not move the ship. The hull does not move the ship. The rigging does.
A rig refuses to route against unsigned cards. A rig refuses to issue contracts whose capabilities are undeclared. A rig refuses to retry silently. A rig refuses to attribute cost to anyone other than the contract holder. A rig refuses to admit unverified output as verified. A rig refuses to be a marketplace, a scheduler, a router, or a harness.
The art of this layer is in what it refuses, not in what it provides.
If this project moves the conversation forward, please star the repo. It is the single highest-signal vote a researcher or maintainer can cast.
Apache 2.0 · © The Rigging Authors · Built for skeptical practitioners and ICLR/NeurIPS reviewers alike.