rigging

A harness is for one agent. Rigging is for the fleet.

The typed, trust-bearing, schema-mediated coupling layer that composes heterogeneous harnessed agents into a single coherent system.

🌊 Live site & interactive demo 🧾 Cheatsheet 📖 Long-form essay 📐 Spec 📊 Benchmarks ❓ FAQ 📕 Glossary

Note

🌊 Want to see it work first? The live site has three interactive demos: a blame-chain explorer (pick a failure, watch the runtime extract the proximate cause), a contract negotiation animation (six steps; press ▶), and a cost-attribution simulator (drag the sliders; watch where the overrun lands).

First-time setup: the site auto-deploys via pages.yml, but a maintainer has to trigger the workflow once. See DEPLOY.md — it's one command.

Tip

Curated by Betty Guo (Dongxin Guo) — PhD candidate in Computer Science at The University of Hong Kong, advised by Prof. Siu-Ming Yiu. Rigging is the reference implementation of a concept she has been developing through her PhD. See § Curator and citation for how to cite.

🚀 The thirty-second pitch	🔍 Walk a real blame chain
⚡ 60-second quickstart	📅 Why now?
① Cards · ② Contracts · ③ Blame	💼 Use cases
🧑‍💻 What it looks like	⚖️ Rigging vs.
📚 Six examples	🏗 Architecture
💰 Cost attribution	📊 Rigging-Bench v0
🚫 The six refusals	⌨️ The `rig` CLI
📁 Repo layout	🗺 Roadmap
❓ FAQ	🎓 Curator and citation
🤝 Contributing	🚢 Deploy live site

The thirty-second pitch

The interesting systems in 2026 do not have an agent. They have a planner, two coders that disagree, a reviewer that gates merges, a test-runner that is older and more boring than any of them, and a verifier whose entire job is to reject plans that try to do too much. Each is its own agent. Each has its own harness. And they have to behave like a single system.

That layer — the typed, signed, opinionated runtime that turns ad-hoc multi-agent glue into an auditable substrate — is rigging. This repository is its first reference implementation.

🪢 You can have a great harness on every agent and still have terrible rigging. If you are not the model, and you are not the harness, you are the rigging.

60-second quickstart

git clone https://github.com/bettyguo/rigging
cd rigging

# Install the workspace (Python 3.12+)
python -m pip install -e .

# Generate an Ed25519 identity (writes rig.key + rig.key.did)
RIG_PASS=hunter2 rig identity create --passphrase-env RIG_PASS

# Run the smallest example — planner delegates to worker, verifier audits
rig run 01-two-agent-handoff

# Inspect the resulting trace + blame chain in your terminal
rig trace inspect ./trace.json

No API keys. No network. Every example runs offline.

💡 Prefer to look first? The live site has an interactive blame-chain explorer and a contract-negotiation animation. Or pick up the one-page cheatsheet (also printable).

The three primitives

A rig refuses to live without exactly three things:

	What it is	What it refuses
① Signed agent cards	An Ed25519-signed JSON document declaring an agent's capabilities, input/output schemas, and cost model.	Routing against an unsigned, malformed, or schema-mismatched card.
② Delegation contracts	A typed, signed bill of lading exchanged between two agents before any work crosses their boundary. Pins capability, budget, verifier, expiry.	Issuing contracts whose capabilities are undeclared, whose budgets are unbounded, or whose verifier is unreachable.
③ Blame chains	An ordered DAG of signed envelopes recovered from any trace. Walk it backwards to find the proximate cause of any failure.	Adjudicating fault itself; it only makes the question mechanically answerable.

Cards. Contracts. Blame. Everything else in a rig exists to keep those three primitives honest.
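As a rough illustration of the "what it refuses" column, the checks performed before a contract is issued might be sketched like this. The names `Card`, `ContractRefused`, and `issue_contract` are hypothetical and are not the rigging API; the `signature_ok` flag stands in for a real Ed25519 verification, and expiry is left out for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


class ContractRefused(Exception):
    """Hypothetical error: the rig declines to issue this contract."""


@dataclass(frozen=True)
class Card:
    did: str
    capabilities: frozenset
    signature_ok: bool  # stand-in for a real Ed25519 signature check


def issue_contract(card: Card, capability: str, budget: Optional[Decimal],
                   verifier_did: str, reachable: set) -> dict:
    """Return a contract dict, or raise ContractRefused (illustrative only)."""
    if not card.signature_ok:
        raise ContractRefused("card is unsigned or its signature is invalid")
    if capability not in card.capabilities:
        raise ContractRefused(f"capability not declared: {capability}")
    if budget is None or budget <= 0:
        raise ContractRefused("budget must be a positive, bounded amount")
    if verifier_did not in reachable:
        raise ContractRefused(f"verifier unreachable: {verifier_did}")
    return {"callee": card.did, "capability": capability,
            "budget": budget, "verifier": verifier_did}
```

Each refusal is a raised exception rather than a warning, which mirrors the idea that a rig fails loudly before any work crosses the boundary.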

What it looks like in practice

Today, every production multi-agent stack ships a function that looks roughly like this:

# the function every team writes, differently, incorrectly
trace_id = uuid()
contract = {"caller": "planner", "callee": callee}
try:
    out = await callee.run(req)
except Exception:
    out = await fallback.run(req)        # silently swap identity
cost[caller] += out.cost                 # ¯\_(ツ)_/¯
return out

With Rigging:

from rigging.runtime import Rig
from rigging.adapters import LocalPythonAdapter
from rigging.identity import KeyPair

rig = Rig(name="my-system")
rig.register(planner, keypair=planner_key)
rig.register(worker,  keypair=worker_key)
rig.register(quality, keypair=quality_key)

result = await rig.call(
    caller=planner,
    callee_did=worker.did,
    capability="translate_pdf",
    input={"uri": "s3://docs/contract.pdf", "target_language": "fr"},
    cost_budget=("usd", "0.50"),
    verifier=quality.did,
)

What you get behind that one call:

A signed contract from planner to worker.
A schema check that the input matches the worker's declared translate_pdf shape.
A per-contract budget the worker cannot exceed (and that does not leak into other contracts).
A verifier sub-contract that audits the worker's output and signs its verdict.
An OpenTelemetry-compatible trace with rig.* attributes.
A typed exception — VerifierRejected, BudgetOverrun, CalleeUnreachable, SignatureInvalid, ContractExpired, … — if anything goes wrong, with a blame chain that names the responsible agent.

🛑 No silent retries. No transparent fallback. No tribal knowledge.

Walk a real blame chain

When a multi-agent run fails, the trace contains an ordered chain of signed envelopes. Walk it backwards. The first envelope whose contents, if replaced by ground truth, would have prevented the failure — that envelope's signing key is the proximate cause.

$ rig trace inspect ./trace.json --highlight=blame

trace 01HXQK3Z…                                14 signed envelopes
└── contract 01HXQK3Z (planner → worker · translate_pdf · budget=usd 0.50)
    ├── propose                                                    sig ✓
    ├── accept                                                     sig ✓
    ├── execute  ← proximate cause                                 sig ✓
    │   output: {pages:0, language:"??"}    schema_violation
    ├── verify (sub-contract)
    │   verdict: reject · reason: schema_violation                 sig ✓
    └── void   reason: verifier_rejected

blame ▶ did:rig:9rT…qN2    (worker)
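The backwards walk described above can be sketched in a few lines. This is a simplified model, not the extractor in rigging-trace: the `Envelope` type, the `deviates` flag (assumed to be set by a schema check or a verifier verdict), and the `did:example:` identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Envelope:
    step: str
    signer: str     # DID of the key that signed this envelope
    deviates: bool  # assumption: True if the contents differ from ground truth


def proximate_cause(chain: List[Envelope]) -> Optional[Envelope]:
    """Walk the chain from the failure backwards; return the first deviating envelope."""
    for envelope in reversed(chain):
        if envelope.deviates:
            return envelope
    return None


# The chain from the trace above, with placeholder DIDs.
chain = [
    Envelope("propose", "did:example:planner", False),
    Envelope("accept", "did:example:worker", False),
    Envelope("execute", "did:example:worker", True),   # schema_violation
    Envelope("verify", "did:example:quality", False),  # correct reject verdict
    Envelope("void", "did:example:planner", False),
]

culprit = proximate_cause(chain)
print(culprit.step, culprit.signer)
```

The verifier's reject and the final void are consequences of the failure, not causes, so the walk passes over them and stops at `execute`.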

🌊 Try it interactively on the live site → Pick a failure mode (adversarial output, budget overrun, expired contract, forged signature) and watch the runtime produce the chain step by step.

Why now?

Every layer in the agent stack was open-coded by every team before it had a name. Each named layer is two years older than the one above it. The next layer — typed coupling across trust domains — is due.

The model was named first. Then the tool surface. Then the agent loop. Then the agent-to-agent wire. The layer that comes next — the typed coupling across trust domains — needs a name too. That layer is rigging.

Use cases — where a rig saves your weekend

Synthesised from real practitioner conversations. Detail in docs/case-studies.md.

💸 The runaway subcontractor

Before: a planner's "resilience" fallback silently spawned 52 sibling agents on the same query. $8,400 in token spend before alerting fired.

With rigging: the retry is a new signed contract; the sibling's budget is carved from the parent's allocation. BudgetOverrun hits at $0.50.

🌙 3 AM "which agent broke it"

Before: four reviewers from three vendors auto-merged a regression. The post-mortem took 6 engineers × 8 hours to attribute.

With rigging: one trace, one blame chain. rig trace inspect names the reviewer whose verdict was wrong, with its signed envelope as the proof.

📜 The compliance audit

Before: "prove which agent made each decision under what budget" took 2 weeks of log-joining.

With rigging: every decision is a signed contract, every output a signed envelope. The audit is a database query.

The rig is, in this sense, the insurance product of an agentic stack: it does not prevent the storm, but it makes "what was damaged and who is responsible" answerable. The premium is the discipline of typed contracts and signed envelopes. The payout is every incident that used to take a day to attribute.

Rigging vs MCP, A2A, harnesses, supervisors

A rig is not a wire format. It uses wire formats. A rig is not a harness. It composes harnesses. A rig is not a supervisor. It sits one floor up.

	MCP	A2A	Harness	Supervisor (LangGraph, CrewAI)	Rigging
Tool wire format	✓	—	—	—	↗ reuses
Agent-to-agent wire	—	✓	—	—	↗ reuses
Single agent loop	—	—	✓	partial	—
Multi-agent routing	—	—	—	✓	✓ typed
Signed capability advertisement	—	partial	—	—	✓
Typed delegation contract	—	—	—	—	✓
Per-contract budget enforcement	—	—	per agent	—	✓ recursive
Verifier as first-class participant	—	—	—	convention	✓
Cross-agent blame extraction	—	—	—	—	✓
Refuses silent retries	n/a	n/a	policy	policy	✓ structural

A longer survey is at docs/related-work.md, with an entry per project (A2A, MCP, ACP, OASF, KYA, OpenHarness, LangGraph, CrewAI, AutoGen, loom-agent, Teradata loom).

The six runnable examples

Each runs offline. No API keys, no network. Each has its own README.md.

	Example	What it demonstrates
`01`	`01_two_agent_handoff`	The minimum viable rig — planner delegates to worker, verifier audits.
`02`	`02_three_vendor_rig`	Heterogeneous composition — three vendors, one rig.
`03`	`03_adversarial_subagent`	Compositional reliability — verifier catches the bad worker; blame chain names it.
`04`	`04_cost_attribution`	A → B → C with explicit sub-budgets; C overruns; A's budget is inviolable.
`05`	`05_vote_ensemble`	Three verifiers; majority rules. Disagreement is a composition problem, not a runtime problem.
`06`	`06_recursive_verification`	A verifier's verdict is itself audited by a meta-verifier. Recursion is bounded by `verification_recursion_cap`.

rig run 01-two-agent-handoff       # short
rig run 02-three-vendor-rig
rig run 03-adversarial-subagent
rig run 04-cost-attribution
rig run 05-vote-ensemble
rig run 06-recursive-verification  # new in v0.2

An annotated walkthrough of all six with sequence diagrams and "the invariant exercised" callouts is at docs/EXAMPLES.md.
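To make the "majority rules" idea from example 05 concrete, here is a minimal sketch of combining verdicts outside the runtime. The function name, the verdict strings, and the fail-closed tie rule are assumptions for illustration; the example's actual policy may differ.

```python
from collections import Counter
from typing import Dict


def majority_verdict(verdicts: Dict[str, str]) -> str:
    """Combine per-verifier verdicts (keyed by verifier DID) by strict majority."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    verdict, votes = Counter(verdicts.values()).most_common(1)[0]
    # Assumption: without a strict majority the ensemble fails closed.
    if votes * 2 <= len(verdicts):
        return "reject"
    return verdict


print(majority_verdict({
    "did:example:v1": "accept",
    "did:example:v2": "accept",
    "did:example:v3": "reject",
}))
```

Because the vote is computed by an ordinary caller over signed verdicts, the runtime itself needs no notion of ensembles.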

Cost attribution that actually attributes

Cost is a property of a contract, not of an agent. B may subcontract to C only by carving a sub-budget from B's own allocation. C's overruns hit B's ledger; A's budget is inviolable.

The naïve rule, "original caller pays for everything in the call graph", is the choice every prototype makes and every production system regrets. Once agent B can subcontract to C without A's awareness, A is on the hook for arbitrary downstream spending. This is the agentic version of letting a subcontractor put arbitrary charges on the general contractor's credit card. Rigging refuses it structurally. See ADR-0006.
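A small ledger sketch may help show what "carving a sub-budget" means in practice. The `ContractBudget` class and its methods are hypothetical, and the `BudgetOverrun` class here is a local stand-in that only borrows the name of the exception mentioned earlier.

```python
from decimal import Decimal


class BudgetOverrun(Exception):
    """Local stand-in for the typed exception named in this README."""


class ContractBudget:
    """Budget attached to one contract, not to an agent (illustrative only)."""

    def __init__(self, holder: str, limit: str) -> None:
        self.holder = holder
        self.limit = Decimal(limit)
        self.spent = Decimal("0")
        self.reserved = Decimal("0")  # carved out for sub-contracts

    def remaining(self) -> Decimal:
        return self.limit - self.spent - self.reserved

    def charge(self, amount: str) -> None:
        value = Decimal(amount)
        if value > self.remaining():
            raise BudgetOverrun(f"{self.holder} exceeded {self.limit}")
        self.spent += value

    def carve(self, holder: str, limit: str) -> "ContractBudget":
        """Create a sub-contract budget funded only from this contract."""
        value = Decimal(limit)
        if value > self.remaining():
            raise BudgetOverrun(f"cannot carve {value} for {holder}")
        self.reserved += value
        return ContractBudget(holder, limit)


a_to_b = ContractBudget("B", "0.50")  # A's contract with B
b_to_c = a_to_b.carve("C", "0.20")    # B funds C from its own allocation
b_to_c.charge("0.15")
print(a_to_b.remaining(), b_to_c.remaining())
```

In this model C can never spend more than the 0.20 that B set aside, and nothing C does can reach past the 0.50 ceiling that A agreed to.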

The six refusals

The art of this layer is in what it refuses, not in what it provides.

1. refuses to route against unsigned cards.
2. refuses to issue contracts whose capabilities are undeclared.
3. refuses to retry silently.
4. refuses to attribute cost to anyone other than the contract holder.
5. refuses to admit unverified output as verified.
6. refuses to be a marketplace, a scheduler, a router, or a harness.

Architecture

Five small packages, one CLI, one-direction dependency graph. Provider-agnostic core. All LLM/MCP code lives only in adapters.

graph LR
    CORE[rigging-core<br/>schemas · protocols · errors]
    IDENT[rigging-identity<br/>Ed25519 · JCS · JWS · DIDs]
    TRACE[rigging-trace<br/>OTel processor · blame extractor]
    ADAPT[rigging-adapters<br/>LiteLLM · MCP · Local]
    RUN[rigging-runtime<br/>Rig orchestrator · state machine]
    CLI[rig CLI<br/>identity · run · trace · bench · doctor · card · contract]

    IDENT --> CORE
    TRACE --> CORE
    RUN --> CORE
    RUN --> IDENT
    RUN --> TRACE
    ADAPT --> CORE
    ADAPT --> IDENT
    CLI --> RUN

The full architecture, sequence diagram, state machine, and trust-boundary discussion is at docs/architecture.md.
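The one-direction claim can be checked mechanically. The sketch below transcribes the edges from the graph above into a dictionary and uses the standard-library `graphlib` module; the `rig-cli` node name and the helper functions are made up for illustration and are not part of the repository.

```python
from graphlib import TopologicalSorter

# package -> packages it depends on (edges copied from the diagram above)
DEPENDS_ON = {
    "rigging-core": set(),
    "rigging-identity": {"rigging-core"},
    "rigging-trace": {"rigging-core"},
    "rigging-adapters": {"rigging-core", "rigging-identity"},
    "rigging-runtime": {"rigging-core", "rigging-identity", "rigging-trace"},
    "rig-cli": {"rigging-runtime"},
}


def build_order(graph: dict) -> list:
    """Dependencies first; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


def transitive_deps(graph: dict, start: str) -> set:
    """Every package reachable from `start` by following dependency edges."""
    seen, stack = set(), [start]
    while stack:
        for dep in graph[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


order = build_order(DEPENDS_ON)
print(order[0])
print("rigging-runtime" in transitive_deps(DEPENDS_ON, "rigging-adapters"))
```

A successful topological sort shows the graph is acyclic, and the reachability check confirms that adapters never depend on the runtime, even indirectly.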

The contract state machine

Terminal states are fulfilled, rejected, and voided. There is no edge labelled "silently retry" — by design. See ADR-0009.
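One way to picture a state machine with no retry edge is as a plain transition table. Only the three terminal states come from this README; the non-terminal state names and the edges between them are assumptions loosely based on the propose, accept, execute, and verify steps shown in the trace earlier.

```python
TERMINAL = frozenset({"fulfilled", "rejected", "voided"})

# Assumed non-terminal states and edges; the real machine is in ADR-0009.
TRANSITIONS = {
    "proposed": {"accepted", "rejected"},
    "accepted": {"executing", "voided"},
    "executing": {"verifying", "voided"},
    "verifying": {"fulfilled", "voided"},
}


class IllegalTransition(Exception):
    """Hypothetical error for a move the table does not allow."""


def advance(state: str, target: str) -> str:
    """Return the new state, or raise if the move is not in the table."""
    if state in TERMINAL:
        raise IllegalTransition(f"{state} is terminal")
    if target not in TRANSITIONS.get(state, set()):
        raise IllegalTransition(f"{state} -> {target} is not allowed")
    return target


state = "proposed"
for step in ("accepted", "executing", "verifying", "voided"):
    state = advance(state, step)
print(state)
```

Because terminal states have no outgoing edges, the only way to try again is to open a new contract, which is the behaviour the prose describes.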

Rigging-Bench v0

A five-axis benchmark the project is scored against — honestly. We do not claim 100% across the board, and we name the gaps.

Axis	Score	Notes
Capability-advertisement fidelity	0.50	Floor is structural — half the probes go to a dishonest agent.
Delegation-contract expressiveness	1.00	Handoff, voting ensemble, recursive subcontracting, conditional delegation: all expressible.
Identity propagation	0.85	Spoofing, tampering, wrong-key covered. Revocation is v1.
Cost-attribution accuracy	1.00	Zero L1 error on the synthetic chain.
Blame-resolution correctness	0.70	Leaf attribution solid. Planner-misroutes and verifier-itself-wrong are v1.
Overall	0.81	We do not claim higher than the suite honestly supports.
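The Overall figure is consistent with an unweighted mean of the five axis scores. The snippet below only checks that arithmetic; the equal weighting is an inference from the numbers in the table, not something taken from the methodology document.

```python
from decimal import Decimal

# Axis scores copied from the table above.
AXES = {
    "capability_advertisement_fidelity": Decimal("0.50"),
    "delegation_contract_expressiveness": Decimal("1.00"),
    "identity_propagation": Decimal("0.85"),
    "cost_attribution_accuracy": Decimal("1.00"),
    "blame_resolution_correctness": Decimal("0.70"),
}

# Assumption: every axis carries equal weight.
overall = sum(AXES.values(), Decimal("0")) / len(AXES)
print(overall)  # 0.81
```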

rig bench           # smoke (under a minute)
rig bench --full    # comprehensive

Full report: benchmarks/results/v0-reference.md. Methodology: docs/benchmarks/rigging-completeness-matrix.md.

The `rig` CLI

A single entry point. Every subcommand is read-only or local-only — nothing it does requires network or credentials.

rig --help                                # discoverable surface
rig doctor                                # audit env: Python, deps, packages, repo
rig examples                              # list all 6 built-in examples
rig version                               # show installed rigging version

rig identity create --passphrase-env RIG_PASS
rig identity show ./rig.key --passphrase-env RIG_PASS
rig identity verify ./agent.json

rig card show ./agent-card.json           # pretty-print + verify signature
rig contract show ./contract.json         # pretty-print a signed contract
rig spec validate ./agent-card.json       # validate against the v0 schema

rig run 06-recursive-verification         # run any of the six examples
rig trace inspect ./trace.json            # pretty trace + blame chain
rig bench                                 # smoke benchmark (< 1 min)
rig bench --full                          # full Rigging-Bench v0

rig doctor is the friendliest place to start: a single read-only command that surfaces every health check (Python ≥ 3.12, every dependency version, every rig package importable, every expected repo file present) and exits with the number of failures.
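For readers curious what a check of this kind involves, here is a minimal sketch of a read-only audit that counts failures. It is not the implementation of rig doctor; the function name, its parameters, and the choice of checks are hypothetical.

```python
import importlib.util
import sys
from typing import List, Optional, Sequence, Tuple


def run_checks(min_python: Tuple[int, int] = (3, 12),
               modules: Sequence[str] = ("json",),
               version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return a list of human-readable failures; an empty list means healthy."""
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    failures = []
    if current < tuple(min_python):
        failures.append(f"Python {current[0]}.{current[1]} is older than "
                        f"{min_python[0]}.{min_python[1]}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            failures.append(f"module not importable: {name}")
    return failures


# A CLI wrapper could print each failure and call sys.exit(len(failures)).
print(run_checks(version=(3, 12)))
```

Using the failure count as the exit status means a healthy environment exits with 0, so the command composes with shell scripts and CI steps.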

Repository layout

rigging/
├── CONCEPT.md                # The seminal essay (~2k words)
├── README.md                 # You are here
├── site/                     # GitHub Pages source (live demo + cheatsheet)
├── assets/                   # SVG hero, diagrams, brand
├── docs/
│   ├── architecture.md       # Package graph + per-call sequence + state machine
│   ├── related-work.md       # MCP · A2A · ACP · OASF · LangGraph · CrewAI · …
│   ├── EXAMPLES.md           # Annotated walkthrough of the six examples
│   ├── case-studies.md       # Three real-world failure modes
│   ├── glossary.md           # The vocabulary
│   ├── FAQ.md                # The questions we get every week
│   ├── roadmap.md            # What's in v0, what's in v1
│   ├── spec/                 # v0 specs: identity, agent-card, contract, trace
│   ├── adr/                  # 10 architecture decision records
│   └── benchmarks/           # Methodology of the Rigging Completeness Matrix
├── packages/
│   ├── rigging-core/         # Schemas · protocols · errors
│   ├── rigging-identity/     # Ed25519 · JCS · JWS · signed cards
│   ├── rigging-trace/        # OTel processor · blame-chain extractor
│   ├── rigging-adapters/     # Local · LiteLLM · MCP
│   └── rigging-runtime/      # The Rig orchestrator + CLI
├── examples/                 # 01..06 runnable examples (offline, no API keys)
├── benchmarks/rig_bench/     # Rigging-Bench v0
└── tests/                    # 97 tests · unit · integration · property (hypothesis)

The dependency graph between packages is one-direction. Adapters never import from runtime.

Roadmap

v0 (this release): three primitives (cards, contracts, blame), four specs, ten ADRs, six runnable examples, the five-axis benchmark, the rig CLI (incl. doctor, card show, contract show), the interactive live site, and the printable cheatsheet.

v1 (the immediate horizon):

Mid-chain blame attribution (planner-misroutes, verifier-itself-wrong, recursive verification).
Card revocation (without forcing key rotation).
KMS-backed signing.
A real rigging-viz web visualizer (separate package).
One real-world harness adapter (LangGraph or AutoGen or Goose — picking based on community pull).
TLA+ model of the contract-negotiation protocol; liveness and safety checked.
A2A-native transport for cross-process rigs.

Full roadmap: docs/roadmap.md. Issues tagged v1 are open.

FAQ

The short version is below; the full FAQ lives at docs/FAQ.md. The vocabulary is at docs/glossary.md.

Is this a new wire protocol?

No. Rigging sits above MCP and A2A. Tool calls inside a harness flow over MCP; the contract flows over A2A (or any equivalent); the trace flows over OpenTelemetry. A rig that invents a new wire format is, by our definition, doing it wrong.

Is the verifier privileged?

No — and we tried that first. A verifier is just an agent whose card declares a verify capability. The runtime invariants apply to it uniformly. Disagreement becomes a composition problem (vote, recurse), not a runtime problem. See ADR-0007.

Why refuse silent retries?

A retry against a fresh agent produces output signed by an identity the caller did not address. The trace shows A → B but the work was done by B′, so blame analysis terminates in a contradiction. Rigging therefore treats retries as first-class events, each with its own contract and its own identifier.
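A sketch of what a retry with its own contract and identifier could look like as data. The `ContractRecord` type, the `retry_of` field, and the use of `uuid4` for identifiers are assumptions made for illustration; the real contract schema is defined in the specs under docs/spec/.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ContractRecord:
    caller: str
    callee: str
    capability: str
    retry_of: Optional[str] = None  # id of the contract this one retries
    contract_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def retry(failed: ContractRecord, new_callee: str) -> ContractRecord:
    """Open a new contract that explicitly names the callee and the failed attempt."""
    return ContractRecord(
        caller=failed.caller,
        callee=new_callee,
        capability=failed.capability,
        retry_of=failed.contract_id,
    )


first = ContractRecord("did:example:planner", "did:example:worker-b", "translate_pdf")
second = retry(first, "did:example:worker-b-prime")
print(second.retry_of == first.contract_id)
```

With the second attempt recorded under its own identifier, a trace can show both A → B and A → B′ instead of hiding the swap.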

Why Ed25519 and not OIDC/OAuth?

v0 only needs to answer "is this card real and unchanged?" Long-lived per-agent Ed25519 keys solve that with no infrastructure. OAuth/OIDC is a v1 conversation.

Does the rig know about LLMs?

No. rigging-core and rigging-runtime contain zero LLM-specific code. All provider concerns live under rigging-adapters.

How do you stop one caller from putting charges on another caller's card?

Cost is a property of a contract, not of an agent. B may subcontract to C only by carving a sub-budget from its own allocation. C's overruns hit B's ledger; A's budget is inviolable. See ADR-0006.

Will you ship a web dashboard?

Not in v0. The TUI is sufficient, and the live site is the visual demo. A dedicated rigging-viz is on the v1 roadmap.

Curator and citation

This repository is curated by Betty Guo (Dongxin Guo), PhD candidate in Computer Science at The University of Hong Kong, advised by Prof. Siu-Ming Yiu. Her research interests sit at the intersection of trust-bearing infrastructure for multi-agent systems, applied cryptography, and the systems substrate beneath modern AI agents. Rigging is the reference implementation of a concept she has been developing through her PhD.

Cite this work (BibTeX):

@software{guo2026rigging,
  author = {Guo, Dongxin},
  title  = {Rigging: typed, trust-bearing coupling for harnessed agents},
  year   = {2026},
  url    = {https://github.com/bettyguo/rigging},
  note   = {Reference implementation, v0}
}

If you use Rigging in academic work, link the live site (bettyguo.github.io/rigging) and the v0 reference benchmarks (benchmarks/results/v0-reference.md) for reproducibility. The four normative specs under docs/spec/ are the citable artefacts; the implementation in packages/ is the worked example.

Reach out	Link
Homepage	https://bettyguo.github.io
GitHub	@bettyguo
Affiliation	HKU, Department of Computer Science
Advisor	Prof. Siu-Ming Yiu

Contributing

We welcome PRs. Especially welcome: adversarial scenarios for the benchmark, real-world harness adapters, and corrections to the spec.

First time? Read CONTRIBUTING.md. It is short.
Code of conduct: CODE_OF_CONDUCT.md (Contributor Covenant v2.1).
Reporting vulnerabilities: SECURITY.md. Do not file public issues for security reports.
Disagree with a design call? Write an ADR-style counter-proposal and PR it under docs/adr/. We will engage.

# Hack on the runtime
python -m pip install -e ".[dev]"
pytest tests/ -q                # 97 tests, ~3 seconds
ruff check . && mypy packages/  # lint + types
rig doctor                      # quick local environment audit

Deploying the live site

The site at site/ auto-deploys to GitHub Pages on every push to main via .github/workflows/pages.yml. The workflow uses enablement: true so it auto-creates the Pages site on its first successful run — no Settings step required.

First time: run the pages workflow once. Either way works:

# via the GitHub CLI
gh workflow run pages.yml --repo bettyguo/rigging

# OR via the GitHub UI:  Repo → Actions → "pages" → Run workflow

Then wait ~60–90 seconds. The site comes up at https://bettyguo.github.io/rigging/.

Local preview (pure static — no build):

cd site && python -m http.server 8080  # then open http://localhost:8080

Auditing every internal link (also runs in CI):

python scripts/audit_links.py --strict

Full troubleshooting at DEPLOY.md.

A note on the name

The English word rigged has fraudulent connotations. We do not. Throughout this project, rigging refers to its maritime sense: the load-bearing web of ropes, blocks, and lines on a sailing ship. The sails do not move the ship. The hull does not move the ship. The rigging does.

A rig refuses to route against unsigned cards. A rig refuses to issue contracts whose capabilities are undeclared. A rig refuses to retry silently. A rig refuses to attribute cost to anyone other than the contract holder. A rig refuses to admit unverified output as verified. A rig refuses to be a marketplace, a scheduler, a router, or a harness.

The art of this layer is in what it refuses, not in what it provides.

⭐ Star history

If this project moves the conversation forward, please star the repo. It is the single highest-signal vote a researcher or maintainer can cast.

Curated by Betty Guo (Dongxin Guo) · PhD candidate, HKU Department of Computer Science · advised by Prof. Siu-Ming Yiu.

🌊 Live site · 🧾 Cheatsheet · 📖 CONCEPT.md · 📐 Spec · 📕 Glossary · 📊 Benchmarks · Issues
