AgentKeys has 15+ specification documents but zero code. The implementation is broken into 8 stages (Stage 0–7) in v2/plans/development-stages.md with 105 tests + 6 E2E flows, estimated at 25–34 human-days. The user wants to execute this using the Anthropic harness pattern (initializer → coding loop → evaluator) combined with OMC orchestration tools (ralph for persistence loops, team for parallel stages, ultraqa for E2E cycling).
Key Anthropic principles applied:
- progress.json + features.json + init.sh + stage-N-done.sh as machine-readable handoff artifacts
- Git commits per deliverable + stage-completion tags for resumability
- Generator-evaluator separation (the agent writes code, stage-N-done.sh evaluates it; no self-grading)
- One feature group per ralph story, committed atomically
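An evaluator gate in this style could be as small as the following minimal sketch. The artifact names come from the principles above. The function name stage_gate and the convention of passing the test command as arguments are hypothetical, not the contents of the real stage-N-done.sh. The gate only inspects and runs tests and never edits code, which is what keeps the coding agent from grading itself.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an evaluator gate in the style of harness/stage-N-done.sh.
# It only checks; it never modifies the repository.
set -uo pipefail

# stage_gate ROOT CMD [ARGS...]
#   ROOT: repository root to inspect
#   CMD:  the test command to run from ROOT (for a real stage, e.g. cargo test)
stage_gate() {
  local root="$1"; shift
  local missing=0 artifact

  # Handoff artifacts named in the harness principles.
  for artifact in harness/progress.json harness/features.json harness/init.sh; do
    if [ ! -f "$root/$artifact" ]; then
      echo "missing artifact: $artifact" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  # Run the test command from the repo root; its exit status is the verdict.
  if ! (cd "$root" && "$@"); then
    echo "test command failed: $*" >&2
    return 1
  fi
  echo "gate passed"
}

# Run as a script only when arguments are given, e.g.:
#   bash stage-gate.sh . cargo test --workspace
if [ "${BASH_SOURCE[0]:-$0}" = "$0" ] && [ "$#" -gt 0 ]; then
  stage_gate "$@"
fi
```

A non-zero exit from the gate is the only signal the harness needs to refuse advancing a stage.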
Tool mapping:
- /ralph = the coding loop (persists until the PRD passes)
- /team = parallel agents for independent stages
- /ultraqa = QA cycling for the final E2E flows
```bash
mkdir -p ~/Projects/agentkeys
cd ~/Projects/agentkeys
git init
# Copy spec docs so agents have them without leaving the repo
mkdir -p docs/spec docs/plan docs/research/aiosandbox
cp ~/Projects/project-life/projects/idea/agentkeys/v2/*.md docs/spec/
cp ~/Projects/project-life/projects/idea/agentkeys/v2/plans/*.md docs/plan/
cp ~/Projects/project-life/projects/idea/agentkeys/v2/aiosandbox/*.md docs/research/aiosandbox/
git add -A && git commit -m "docs: seed spec documents from project-life"
```

Then create CLAUDE.md in the repo root encoding the harness workflow (read progress.json → run init.sh → pick feature → implement → test → commit → update progress). The plan agent produced the full content for this file.
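The "run init.sh" step of that workflow could begin with a fail-fast preflight so a fresh agent session notices a broken toolchain before writing any code. This is a hypothetical sketch. The helper name require_tools is illustrative, and the real init.sh is produced in Stage 0.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper in the spirit of harness/init.sh.
set -uo pipefail

# require_tools TOOL...: report each missing tool on stderr;
# return 0 if all are on PATH, 1 otherwise.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In an init.sh this might be called as `require_tools cargo node || exit 1`. That tool list is an assumption based on the Rust and TypeScript stages, not a requirement stated in the spec.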
Invoke:
/oh-my-claudecode:ralph "Implement Stage 0 per docs/archived/development-stages-v2-2026-04.md: create Cargo workspace skeleton (7 crates), harness artifacts (init.sh, progress.json, features.json, stage-0-done.sh), agentkeys-types crate (all types from docs/spec/credential-backend-interface.md), agentkeys-core crate (CredentialBackend trait with 15 methods, PaymentRail trait, canonical CBOR serialization, OTP derivation, test vectors). 8 tests must pass. Tag stage-0-done when done."
Deliverables: Cargo workspace compiles, 8 tests pass, harness artifacts exist, bash harness/stage-0-done.sh exits 0.
Advance: bash harness/advance-stage.sh 0 1
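The advance step can be sketched as a guard that refuses to move forward unless the previous stage's evaluator passes. This is a minimal sketch under stated assumptions. The real advance-stage.sh presumably updates progress.json, but to avoid guessing that file's schema, this version records the current stage in a hypothetical plain-text marker, harness/current-stage.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of harness/advance-stage.sh FROM TO [ROOT].
set -uo pipefail

advance_stage() {
  local from="$1" to="$2" root="${3:-.}"
  # Only allow moving forward by exactly one stage.
  if [ "$to" -ne $((from + 1)) ]; then
    echo "refusing: $from -> $to is not a single-step advance" >&2
    return 2
  fi
  # The evaluator, not the coding agent, decides whether the stage is done.
  if ! bash "$root/harness/stage-$from-done.sh"; then
    echo "stage $from gate failed; not advancing" >&2
    return 1
  fi
  # Hypothetical marker file standing in for the progress.json update.
  echo "$to" > "$root/harness/current-stage"
  echo "advanced: stage $from -> stage $to"
}

if [ "${BASH_SOURCE[0]:-$0}" = "$0" ] && [ "$#" -gt 0 ]; then
  advance_stage "$@"
fi
```

The single-step check still allows the `3 4` advance used after the parallel Stages 2 and 3, because both stages are complete by then.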
Stage 1 is the largest stage: 37 tests, 10 stories. Ralph loops through them one story at a time.
Invoke:
/oh-my-claudecode:ralph "Implement Stage 1 per docs/archived/development-stages-v2-2026-04.md: agentkeys-mock-server (axum + rusqlite) with 7 SQLite tables, 15 REST endpoints implementing every CredentialBackend method, identity linking, master key custody, TTL/single-use enforcement, MockHttpClient connection. 37 tests must pass. See docs/archived/eng-review-test-plan.md for the full test matrix including property tests (pair-code collision, nonce uniqueness) and integrity tests (tamper detection, OTP replay). Tag stage-1-done when done."
Deliverables: Mock server starts on port 8090, all 37 tests pass, curl smoke test works, bash harness/stage-1-done.sh exits 0.
Advance: bash harness/advance-stage.sh 1 2
Stages 2 and 3 are the one parallelization opportunity. Stage 2 (CLI, 14 tests) and Stage 3 (Daemon + MCP, 13 tests) touch entirely different crates, so they should not produce merge conflicts.
Invoke:
/oh-my-claudecode:team 2:executor "Two parallel stages for AgentKeys. AGENT 1: Implement Stage 2 (CLI Core) per docs/archived/development-stages-v2-2026-04.md — 10 CLI commands in agentkeys-cli, 14 tests, keyring session storage, error messaging spec, --help with examples. AGENT 2: Implement Stage 3 (Daemon + MCP) per docs/archived/development-stages-v2-2026-04.md — agentkeys-daemon binary with MCP tools (get_credential, list_credentials), kernel hardening (memfd_secret, seccomp, caps), 13 tests. Use AGENTKEYS_SESSION env var as test seam (NOT the production bootstrap). Both agents: read harness/progress.json first, commit per deliverable, tag stage-N-done when complete."
Deliverables: Both stage-2-done.sh and stage-3-done.sh exit 0. cargo test --workspace passes all 72 tests (8+37+14+13).
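One way to check the combined count is to sum the per-binary summaries from cargo test. The libtest harness prints one "test result:" line per test binary, including doc-tests. This is a minimal sketch, and the function names sum_passed and expect_passed are hypothetical.

```bash
#!/usr/bin/env bash
# Sum the "N passed" counts from libtest summary lines such as:
#   test result: ok. 14 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
set -uo pipefail

sum_passed() {
  local re='^test result: [A-Za-z]+\. ([0-9]+) passed;' line total=0
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      total=$((total + BASH_REMATCH[1]))
    fi
  done
  echo "$total"
}

# expect_passed N: succeed only if stdin reports exactly N passing tests in total.
expect_passed() {
  local want="$1" got
  got="$(sum_passed)"
  echo "passed: $got (expected $want)"
  [ "$got" -eq "$want" ]
}
```

Usage might look like `cargo test --workspace --no-fail-fast 2>&1 | expect_passed 72`. Passing `--no-fail-fast` keeps cargo running the remaining test binaries after one fails, so the total is not silently truncated. Ignored tests are not counted as passed.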
Advance: bash harness/advance-stage.sh 3 4
Stage 4 is the cross-component integration stage. It modifies both the daemon (pair-on-startup) and the CLI (agentkeys approve).
Invoke:
/oh-my-claudecode:ralph "Implement Stage 4 per docs/archived/development-stages-v2-2026-04.md: child-initiates rendezvous pairing (daemon generates keypair → open_auth_request → register_rendezvous → display pair code → long-poll), CLI approve command (fetch_auth_request → display OTP → user confirms → approve_auth_request), recovery flow (--recover with AgentIdentity resolution via identity graph). 11 tests must pass. Two-terminal pair E2E must work. Tag stage-4-done."
Deliverables: Pair flow works across two terminals, recovery preserves credentials, 11 tests pass.
Advance: bash harness/advance-stage.sh 4 5
Stage 5 is a mixed Rust + TypeScript stage that uses Playwright browser automation for the OpenRouter signup.
Invoke:
/oh-my-claudecode:ralph "Implement Stage 5 per docs/archived/development-stages-v2-2026-04.md: agentkeys-provisioner Rust orchestrator (spawn TS subprocess, IPC via stdin/stdout JSON, encrypt API key to shielding key, store_credential), provisioner-scripts/lib/email.ts (Gmail IMAP plus-addressing for verification codes), provisioner-scripts/scrapers/openrouter.ts (Playwright signup flow using email.ts). MCP tool: agentkeys.provision(service). 9 tests must pass. Tag stage-5-done."
Deliverables: Orchestrator IPC tests pass, email client tests pass, live OpenRouter provision works (manual verification by human).
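The orchestrator/scraper boundary can be illustrated with a shell stand-in for the JSON-over-stdin/stdout exchange. This is a hypothetical sketch. It assumes newline-delimited JSON framing, and the message fields ("cmd", "service", "ok", "api_key") are illustrative, not the spec's protocol. In the real design, the Rust orchestrator spawns the TypeScript subprocess and then encrypts the returned key to the shielding key.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of line-delimited JSON IPC over stdin/stdout.
set -uo pipefail

# Mock child process: one JSON request per input line, one JSON response per
# output line. A real scraper would drive Playwright instead of pattern-matching.
mock_scraper() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *'"cmd":"provision"'*) printf '%s\n' '{"ok":true,"api_key":"sk-mock"}' ;;
      *) printf '%s\n' '{"ok":false,"error":"unknown cmd"}' ;;
    esac
  done
}

# Orchestrator side: write one request to the child's stdin and
# return the first line of its stdout.
ipc_request() {
  printf '%s\n' "$1" | mock_scraper | head -n 1
}
```

Because the contract is only "JSON in, JSON out", IPC tests can swap the real scraper for a mock subprocess. That is why they keep passing when selectors on the live site break.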
Advance: bash harness/advance-stage.sh 5 6
Stage 6 covers packaging and documentation polish.
Invoke:
/oh-my-claudecode:ralph "Implement Stage 6 per docs/archived/development-stages-v2-2026-04.md: @agentkeys/daemon npm package with postinstall binary selection (linux-x64, linux-arm64, darwin-x64, darwin-arm64), install.sh curl script, README with quickstart, docs/how-it-works.md, docs/security-model.md, CHANGELOG, LICENSE (MIT OR Apache-2.0), per-subcommand --help with examples. 7 tests must pass. Tag stage-6-done."
Advance: bash harness/advance-stage.sh 6 7
Stage 7 is pure integration testing: no new features, only cross-cutting E2E verification and bug fixes.
Invoke:
/oh-my-claudecode:ultraqa --custom "bash harness/stage-7-done.sh"
UltraQA runs the 6 E2E flows (full lifecycle, multi-agent isolation, pair+MCP+revoke, recovery, MCP auth demo, revocation latency), diagnoses failures, fixes them, and repeats for up to 5 cycles.
Done when: all 6 E2E flows pass and the stage is tagged (git tag stage-7-done). AgentKeys v0 is then demo-ready.
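The bounded cycle can be pictured as a retry loop around the evaluator. This is a hypothetical sketch. The name qa_cycles and the fix hook are illustrative, and in UltraQA the fix step is the agent's own diagnosis and repair rather than a shell command. The check and fix arguments are command or function names, so a multi-word command such as bash harness/stage-7-done.sh would be wrapped in a small function first.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of bounded QA cycling: run the check; on failure run a
# fix step and retry; give up after MAX cycles so a human can take over.
set -uo pipefail

# qa_cycles MAX CHECK FIX  (CHECK and FIX are command or function names)
qa_cycles() {
  local max="$1" check="$2" fix="$3" cycle
  for ((cycle = 1; cycle <= max; cycle++)); do
    if "$check"; then
      echo "passed on cycle $cycle"
      return 0
    fi
    echo "cycle $cycle failed; running fix step" >&2
    "$fix"
  done
  echo "still failing after $max cycles; escalate to a human" >&2
  return 1
}
```

For example, `check_e2e() { bash harness/stage-7-done.sh; }` followed by `qa_cycles 5 check_e2e your_fix_step`, where your_fix_step is a placeholder. The hard cap mirrors the recovery table's rule that a human diagnoses anything still failing after 5 cycles.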
```bash
# Human runs after each stage completes:
bash harness/stage-N-done.sh          # must exit 0
git tag stage-N-done                  # tag the completion
cat harness/progress.json             # verify the stage is marked complete
bash harness/advance-stage.sh N N+1   # advance to the next stage
```

| Failure | Recovery |
|---|---|
| Ralph session dies mid-story | Re-invoke /ralph with same PRD. Ralph reads progress.json + git log and resumes from last completed story. |
| One team agent fails, other succeeds | Invoke /ralph individually for the failed stage. The successful stage's work is already committed. |
| A test seems to encode a spec bug | Resolve conflicts by source priority: credential-backend-interface.md > development-stages-v2-2026-04.md > eng-review-test-plan.md. Fix the spec only if it is genuinely wrong, then re-run. |
| Playwright breaks on live site | Update selectors in openrouter.ts. Rust IPC tests still pass (mock subprocess). |
| Stage 7 E2E keeps failing after 5 ultraqa cycles | Human diagnoses root cause. Likely a cross-component integration issue that needs manual architectural judgment. |
Estimated timeline:

```text
Day 0   : Repo setup + Stage 0 (ralph)           → stage-0-done
Day 1-2 : Stage 1 (ralph, largest stage)         → stage-1-done
Day 2-3 : Stages 2+3 in parallel (team)          → stage-2-done + stage-3-done
Day 3-4 : Stage 4 (ralph)                        → stage-4-done
Day 4-5 : Stage 5 (ralph)                        → stage-5-done
Day 5-6 : Stage 6 (ralph)                        → stage-6-done
Day 6-7 : Stage 7 (ultraqa)                      → stage-7-done
Day 7   : DEMO READY ✓
```
Estimated total: ~7-10 days with agent execution, versus 25-34 human-days solo per the spec. Running Stages 2 and 3 in parallel saves ~4 days, and agent speed compresses each stage by an estimated ~60%.
Key specification files:
- v2/plans/development-stages.md: the 8-stage implementation contract (759 lines)
- v2/credential-backend-interface.md: the CredentialBackend trait (454 lines)
- v2/architecture.md: Cargo workspace layout and component inventory (355 lines)
- v2/plans/eng-review-test-plan.md: the full test matrix (122 lines)
- v2/plans/ceo-plan.md: product scope decisions constraining v0 (490 lines)
After Stage 7 completes:

```bash
cd ~/Projects/agentkeys
cargo test --workspace                 # 105 tests pass
bash harness/stage-7-done.sh           # 6 E2E flows pass
git tag -l 'stage-*'                   # 8 tags: stage-0-done through stage-7-done
jq .stages harness/progress.json       # all 8 stages: "complete"
# Every feature has implemented: true (assumes feature objects carry an "implemented" field)
jq '[.. | objects | select(has("implemented")) | .implemented] | all' harness/features.json
```

The system is then demo-ready for the 4-demo meetup talk (multi-agent isolation, recovery, provisioning-in-action, cost transparency).