agentic-ops

A pattern reference for governed multi-agent engineering delivery: scoped work packages, handoff queues, isolated execution workspaces, review gates, and durable merged state.

Status

This repository ships the pattern reference: documentation, templates, and one synthetic worked walkthrough. Executable orchestration is not in scope at v1.

What this repo IS

A pattern reference for a governed multi-agent engineering-delivery workflow.
Documentation (architecture, operating model, lifecycle, vocabulary, private-vs-public separation, expanded long-form architecture under docs/architecture/) plus four reusable templates plus one synthetic end-to-end walkthrough.
A supporting repository in a four-repo portfolio. The three flagship repositories named under Adjacent repositories carry their own technical depth.

What this repo is NOT

Not the exact internal operating system used in any private work. The repo is a public-safe re-derivation of the pattern, not a snapshot of any real run-state.
Not a deployed platform or runtime. No customer deployment claim.
Not a managed service or proprietary SaaS.
Not a generalized multi-tenant agent platform.
Not a workshop, training course, adoption field-kit, or facilitator artifact.
Not a flagship proof repository with captured-run evidence.

How to read this repo

This README is the orientation document. The longer architecture coverage and component-level documentation live under docs/.

docs/architecture/overview.md — long-form architecture entry point: the seven major surfaces and how they fit together.
docs/architecture/state-machine.md — the work-package lifecycle, legal state transitions, and the runner's enforcement points.
docs/architecture/runner-modes.md — CLI and API entry points; the DryRun / Shadow / Canary / Live promotion ladder.
docs/architecture/handoff-queue.md — append-only transport semantics, queue-entry shape, return contract, queue discipline rules.
docs/architecture/candidate-writes-and-canon-gate.md — the candidate-to-canon path, Canon Gate policy boundary, commissioning, rollback.
docs/architecture/substrate.md — transactional store + vector retrieval + typed graph engine + local-model evaluator layer at the architectural level.
docs/architecture/audit-and-replay.md — append-only audit chain, replay determinism, anti-replay, rollback, override.
docs/architecture/identity-and-security.md — principal model, capability bundles, tenant boundaries, fail-closed defaults, secrets hygiene.
docs/architecture/human-ownership.md — the five seats of human ownership; why local models recommend rather than commission.
docs/architecture/cloud-portable-shape.md — local-first execution; cloud-portable target; what changes, what stays the same.
docs/architecture/glossary.md — every architecture term defined in plain workflow language.
docs/diagrams/ — professional Mermaid diagram sources used across the architecture documents.
docs/operating-model.md, docs/lifecycle-and-state.md, docs/vocabulary.md, docs/private-vs-public.md — the v1 framing documents, preserved.
templates/ — the v1 templates for work packages, handoff messages, return artifacts, and review responses.
examples/oss-library-maintenance/ — the v1 synthetic worked walkthrough.

If you only have time to read a few files, start with docs/architecture/overview.md, then read the North Star and Architecture in prose sections of this README.

North Star

agentic-ops is not a Markdown-first workflow with scripts attached later. The pattern's finish line is a code-driven state machine with Markdown and YAML control surfaces around it.

The substrate the pattern describes is a governed agentic operating layer: a way to run serious agent work with durable state, scoped work packages, isolated execution, governed memory writes, review gates, audit trails, and human ownership. Agents do not live only in chat. They work through a system that knows who they are, what they are allowed to read, what they are allowed to write, which state they are in, what proof they owe, and who must review the result before anything becomes canon.

The durable model:

flowchart TB
    humanOwner["Human owner / review authority"]
    queue["Handoff queue: work package + onboarding"]
    identity["Scoped agent identity: role + tools + allowed surfaces"]
    workspace["Isolated execution workspace: branch + target repo"]
    sm["Code-driven state machine"]
    cli["CLI path"]
    api["API path"]
    outputs["Candidate writes / proof artifacts / review requests"]
    canonGate["Canon Gate: policy + redaction + provenance"]
    model["Local model evaluator: scoring + comparison + commissioning support"]
    reviewGate["Review gate authority"]
    canonState["Commissioned canon: accepted state only"]

    humanOwner --> queue --> identity --> workspace --> sm
    sm --> cli
    sm --> api
    cli --> outputs
    api --> outputs
    outputs --> canonGate
    canonGate --> model
    model --> reviewGate
    reviewGate --> canonState

    style humanOwner fill:#0f172a,stroke:#38bdf8
    style queue fill:#0f172a,stroke:#38bdf8
    style identity fill:#0f172a,stroke:#38bdf8
    style workspace fill:#172554,stroke:#60a5fa
    style sm fill:#172554,stroke:#60a5fa
    style cli fill:#172554,stroke:#60a5fa
    style api fill:#172554,stroke:#60a5fa
    style outputs fill:#172554,stroke:#60a5fa
    style canonGate fill:#3f1d1d,stroke:#f97316
    style model fill:#3f1d1d,stroke:#f97316
    style reviewGate fill:#3f1d1d,stroke:#f97316
    style canonState fill:#123524,stroke:#22c55e

The control files matter. Markdown and YAML make the work inspectable, governable, and human-readable. But they are not the runtime engine. Code owns the execution state machine, bootstrap, harness, substrate writes, API behavior, retry and stop behavior, audit, and validation.

Architecture in prose

The architecture starts with a simple premise: serious agent work needs an operating surface, not just a chat window. Chat is useful for instruction and collaboration, but it is a weak system of record. The pattern in this repository describes the operating substrate around the work: every agent gets a work package, a role, a handoff queue entry, a scoped execution workspace, an authorized read/write surface, a proof obligation, and a review path. The human operator and the review authority remain responsible for direction and acceptance. The agents do bounded work. The state machine makes that work repeatable.

The control plane in this pattern lives in a coordination summary repository: the place where the review authority books work, tracks status, records review verdicts, keeps the work-package registry, updates the backlog, and maintains the project board. The control plane does not own implementation code. It owns coordination, evidence pointers, accepted summaries, and state transitions. This separation matters because mixing implementation payload into the control surface produces an oversized PR where governance and implementation compete for the same review attention. Keeping the coordination repository summary-only prevents that drift.

The handoff queue is the transport layer between the review authority and the agents. It is not chat, email, instant messaging, or a vague inbox. A handoff queue entry tells an agent who it is, which work package it owns, which files define the assignment, what state the entry is in, where the expected return belongs, and what the next gate is. A correct agent does not improvise from memory when the queue entry exists. It pulls the queue, reads the entry, opens the work-package and onboarding files, checks the current state, and either returns scope confirmation, executes under a review-gate go, handles a revision, files a status hold, or stops. This is the core anti-drift mechanism for multi-agent work.
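As a minimal sketch of the queue discipline described above, a queue entry can be modeled as an immutable record, with the agent's next action derived from the entry's state alone. The field names, state names, and the `next_action` helper are illustrative assumptions, not this repository's schema; the entry shape itself is covered in docs/architecture/handoff-queue.md.

```python
from dataclasses import dataclass
from enum import Enum


class EntryState(Enum):
    # Hypothetical state names, chosen to mirror the actions listed above.
    AWAITING_SCOPE_CONFIRMATION = "awaiting_scope_confirmation"
    GO = "go"
    REVISION_REQUIRED = "revision_required"
    ON_HOLD = "on_hold"
    CLOSED = "closed"


@dataclass(frozen=True)
class QueueEntry:
    agent_role: str            # who the agent is
    work_package_id: str       # which work package it owns
    assignment_files: tuple    # files that define the assignment
    state: EntryState          # current state of the entry
    return_path: str           # where the expected return belongs
    next_gate: str             # the next gate after this step


def next_action(entry: QueueEntry) -> str:
    """Derive the agent's action from the entry, never from memory or chat."""
    actions = {
        EntryState.AWAITING_SCOPE_CONFIRMATION: "return_scope_confirmation",
        EntryState.GO: "execute",
        EntryState.REVISION_REQUIRED: "handle_revision",
        EntryState.ON_HOLD: "file_status_hold",
    }
    # Any state without an explicit action means the agent stops.
    return actions.get(entry.state, "stop")


entry = QueueEntry(
    agent_role="worker",
    work_package_id="WP-001",
    assignment_files=("work-package.md", "onboarding.md"),
    state=EntryState.AWAITING_SCOPE_CONFIRMATION,
    return_path="returns/WP-001.md",
    next_gate="review-gate",
)
print(next_action(entry))
```

Defaulting unknown states to "stop" is a design choice in this sketch: an agent that cannot map the entry to a named action does nothing rather than improvising.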

The state machine is the executable center. It enforces what a human or review authority would otherwise have to enforce manually: queue-entry discovery, agent-identity selection, workspace provisioning, work-package loading, state-transition validation, stop-condition handling, review-request routing, and audit capture. The state machine is what turns a CLI-driven operating model into a runnable system. Without it, the system depends on humans repeatedly saying "check the queue" and agents interpreting that correctly. With it, the review authority can book work and the runner can trigger the correct scoped agent under the correct state.
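State-transition validation with audit capture could look roughly like the sketch below. The state names and the transition table are hypothetical and exist only for illustration; the pattern's legal transitions are described in docs/architecture/state-machine.md and are not reproduced here.

```python
# Hypothetical lifecycle states and legal transitions, for illustration only.
LEGAL_TRANSITIONS = {
    "booked": {"scope_confirmed", "stopped"},
    "scope_confirmed": {"executing", "stopped"},
    "executing": {"in_review", "stopped"},
    "in_review": {"accepted", "revision_required", "rejected"},
    "revision_required": {"executing", "stopped"},
}


class IllegalTransition(Exception):
    """Raised when a requested state change is not in the transition table."""


def transition(current: str, target: str, audit_log: list) -> str:
    """Validate a state change and record it; refuse anything not listed."""
    allowed = LEGAL_TRANSITIONS.get(current, set())
    if target not in allowed:
        audit_log.append(("rejected", current, target))
        raise IllegalTransition(f"{current} -> {target} is not a legal transition")
    audit_log.append(("applied", current, target))
    return target


audit_log = []
state = transition("booked", "scope_confirmed", audit_log)
try:
    # Skipping review is refused, and the refusal still leaves an audit record.
    transition("executing", "accepted", audit_log)
except IllegalTransition as exc:
    print(exc)
print(audit_log)
```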

The worker agent is intentionally constrained. An agent is not an autonomous actor with blanket repo access. It receives a role, a work package, an execution workspace, a branch, owned surfaces, forbidden surfaces, expected evidence tier, validation commands, and a stop gate. The agent may implement, research, migrate, or review only inside that boundary. If the work package says scope confirmation first, the agent stops after scope confirmation. If a review-gate go lands, the agent executes. If review returns revision required, the agent fixes only the named findings. This discipline lets multiple agents run in parallel without turning the repo into an uncontrolled shared scratchpad.
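One way to picture the owned and forbidden surfaces is a path check that runs before any write. This is a sketch under stated assumptions: the glob patterns, the example paths, and the `may_write` helper are hypothetical, and the rule that a forbidden match overrides an owned match is a design choice made here, not something the repository specifies.

```python
from fnmatch import fnmatch


def may_write(path: str, owned: list, forbidden: list) -> bool:
    """Allow a write only inside owned surfaces and never inside forbidden ones."""
    if any(fnmatch(path, pattern) for pattern in forbidden):
        return False  # a forbidden match wins over any owned match
    return any(fnmatch(path, pattern) for pattern in owned)


# Hypothetical surface lists, as they might appear in a work package.
owned = ["src/parser/*", "tests/parser/*"]
forbidden = ["src/parser/generated/*", ".github/*"]

print(may_write("src/parser/lexer.py", owned, forbidden))
print(may_write("src/parser/generated/tables.py", owned, forbidden))
print(may_write("docs/index.md", owned, forbidden))
```

A path that matches neither list is refused, so the default is deny rather than allow.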

The implementation plane is the adopter's target repository — the home for code, proof trees, research roots, runtime evidence, and detailed return artifacts. The implementation plane should be allowed to be detailed and technical. It should contain enough evidence that a future reviewer can understand what was built, what was proven, what was not proven, and what must happen next. The control plane should receive links and commissioned summaries, not the full payload.

The CLI and API are two entry points into the same operating model. The CLI is the local operator surface: dispatching, checking the queue, applying verdicts, running substrate checks, and reviewing work. The API path is the programmatic entry point for the same actions. They must not become two different systems. A command that starts a runner, reads a queue entry, writes a candidate, or requests review should call the same underlying state-machine logic whether invoked from the CLI or through an API.
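The parity requirement can be illustrated with two thin entry points that delegate to one function. Everything named here (`request_review`, the `--as` flag, the payload keys, the program name) is a hypothetical stand-in; this repository ships no CLI or API at v1.

```python
import argparse


def request_review(work_package_id: str, requested_by: str) -> dict:
    """Single implementation of the operation; both entry points call this."""
    return {
        "operation": "request_review",
        "work_package_id": work_package_id,
        "requested_by": requested_by,
        "status": "review_requested",
    }


def cli_main(argv: list) -> dict:
    """CLI entry point: parse arguments, then delegate."""
    parser = argparse.ArgumentParser(prog="agentic-ops-sketch")
    parser.add_argument("work_package_id")
    parser.add_argument("--as", dest="requested_by", required=True)
    args = parser.parse_args(argv)
    return request_review(args.work_package_id, args.requested_by)


def api_handler(payload: dict) -> dict:
    """API entry point: unpack the request body, then delegate."""
    return request_review(payload["work_package_id"], payload["requested_by"])


from_cli = cli_main(["WP-001", "--as", "worker-a"])
from_api = api_handler({"work_package_id": "WP-001", "requested_by": "worker-a"})
print(from_cli == from_api)
```

Because neither entry point contains logic of its own, a behavioral difference between them can only come from argument handling, which keeps parity easy to test.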

The substrate is the memory and data plane. A transactional store owns transactional truth: candidate envelopes, run metadata, review status, policy versions, audit pointers, decisions, and consistent state. A vector store owns semantic retrieval over approved or candidate-indexed material, with tenant and policy filters. A first-party typed graph engine owns relationships, traversal, provenance, graph audit, and graph algorithms. These stores work together because agent work is not a single blob of text. It has records, semantic meaning, relationships, owners, provenance, review status, and time.

The candidate-write path is the safe bridge between agent work and durable memory. Agents write candidate records as they work, but those candidates are not canon. A candidate carries the principal, source, work package, policy version, evidence, redaction state, graph and vector projections, and review status. Local models and evaluators can help classify, score, compare, summarize, and explain candidate records. They can help commission, but they do not get to silently promote. Canon promotion belongs behind the Canon Gate and the accepted human review path.
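A candidate record and the "annotate but never silently promote" rule might be sketched as follows. The field names, the status strings, and both helper functions are assumptions made for illustration; the actual candidate shape is described in docs/architecture/candidate-writes-and-canon-gate.md.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    # Field names are illustrative, not the repository's schema.
    principal: str
    source: str
    work_package_id: str
    policy_version: str
    evidence: tuple
    redaction_passed: bool
    review_status: str = "candidate"          # a new record is never canon
    evaluator_score: Optional[float] = None   # advisory only


def attach_score(candidate: Candidate, score: float) -> Candidate:
    """An evaluator may annotate a candidate; review_status is left untouched."""
    return replace(candidate, evaluator_score=score)


def promote(candidate: Candidate, gate_passed: bool, human_verdict: str) -> Candidate:
    """Promotion needs both a passing gate and an accepted human verdict."""
    if gate_passed and human_verdict == "accepted":
        return replace(candidate, review_status="canon")
    return candidate


draft = Candidate(
    principal="worker-a",
    source="examples/notes.md",
    work_package_id="WP-001",
    policy_version="v1",
    evidence=("local-command-log",),
    redaction_passed=True,
)
scored = attach_score(draft, 0.97)
print(scored.review_status)
print(promote(scored, gate_passed=True, human_verdict="pending").review_status)
print(promote(scored, gate_passed=True, human_verdict="accepted").review_status)
```

Even a high evaluator score leaves the record a candidate; only the combination of gate result and human verdict changes its status.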

The Canon Gate is the policy boundary. It should fail closed when identity is missing, scope is wrong, evidence is insufficient, redaction fails, replay is detected, tenant boundaries are unclear, policy versions do not match, local-model confidence is insufficient, or human review is required. The Canon Gate is not just a function call; it is the system's answer to the question "when is an agent-produced thing allowed to become trusted state?" If the answer is unclear, the candidate stays a candidate.
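The fail-closed behavior can be sketched as a function that denies promotion unless every check passes and that treats missing information as a failure. The dictionary keys, the `min_confidence` threshold of 0.8, and the reason strings are assumptions chosen for this example, not values taken from the pattern.

```python
def canon_gate(candidate: dict, current_policy_version: str, min_confidence: float = 0.8):
    """Return (allowed, reasons). Any missing or failing check denies promotion."""
    reasons = []
    if not candidate.get("principal"):
        reasons.append("identity missing")
    if not candidate.get("in_scope", False):
        reasons.append("scope wrong")
    if not candidate.get("evidence"):
        reasons.append("evidence insufficient")
    if not candidate.get("redaction_passed", False):
        reasons.append("redaction failed")
    if candidate.get("replayed", True):
        # An absent replay check counts as a detected replay.
        reasons.append("replay detected or not checked")
    if not candidate.get("tenant"):
        reasons.append("tenant boundary unclear")
    if candidate.get("policy_version") != current_policy_version:
        reasons.append("policy version mismatch")
    if candidate.get("confidence", 0.0) < min_confidence:
        reasons.append("local-model confidence insufficient")
    if candidate.get("human_verdict") != "accepted":
        reasons.append("human review required")
    return (not reasons, reasons)


good = {
    "principal": "worker-a", "in_scope": True, "evidence": ["log-1"],
    "redaction_passed": True, "replayed": False, "tenant": "tenant-a",
    "policy_version": "v1", "confidence": 0.93, "human_verdict": "accepted",
}
print(canon_gate(good, "v1"))
print(canon_gate({}, "v1")[0])
```

Returning the full list of reasons, rather than stopping at the first failure, gives the reviewer a complete picture of why a candidate stayed a candidate.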

The local-model layer exists because memory tools that only append Markdown are not enough. Machine learning can help evaluate and commission memory: detecting duplicates, scoring evidence, comparing candidate facts, identifying disagreement, summarizing context, and surfacing risk. Local models should support judgment and review. They should not erase human ownership. The right pattern is model-assisted review, not model-owned truth.

The first-party graph engine is part of the pattern's core, not a placeholder. It should be built for agentic workloads from the beginning: typed nodes and edges, audit-aware writes, policy-aware traversal, replayable changes, bounded algorithms, tenant and principal checks, and configurable graph-analysis suites. The goal is not merely to store relationships. The goal is to make graph reasoning safe enough for agents to use during real work, while still preserving reviewability, provenance, and human accountability.
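To make "typed nodes and edges, audit-aware writes, policy-aware traversal, bounded algorithms" concrete, here is a deliberately small in-memory sketch. The `TypedGraph` class, its method names, and the node and edge types are hypothetical; it illustrates the properties listed above and says nothing about how a real engine would be built.

```python
from collections import deque


class TypedGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> {"type": str, "tenant": str}
        self.edges = {}   # node_id -> list of (edge_type, target_id)
        self.audit = []   # append-only record of writes

    def add_node(self, node_id, node_type, tenant, principal):
        self.nodes[node_id] = {"type": node_type, "tenant": tenant}
        self.edges.setdefault(node_id, [])
        self.audit.append(("add_node", principal, node_id, node_type))

    def add_edge(self, source, edge_type, target, principal):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before an edge is written")
        self.edges[source].append((edge_type, target))
        self.audit.append(("add_edge", principal, source, edge_type, target))

    def reachable(self, start, tenant, max_depth):
        """Breadth-first traversal bounded by depth and filtered by tenant."""
        if self.nodes.get(start, {}).get("tenant") != tenant:
            return []  # fail closed: unknown or cross-tenant start node
        seen, order = {start}, [start]
        frontier = deque([(start, 0)])
        while frontier:
            node_id, depth = frontier.popleft()
            if depth == max_depth:
                continue  # depth bound reached; do not expand further
            for _edge_type, target in self.edges[node_id]:
                if target in seen or self.nodes[target]["tenant"] != tenant:
                    continue
                seen.add(target)
                order.append(target)
                frontier.append((target, depth + 1))
        return order


g = TypedGraph()
g.add_node("wp-1", "work_package", "tenant-a", "worker-a")
g.add_node("cand-1", "candidate", "tenant-a", "worker-a")
g.add_node("cand-2", "candidate", "tenant-b", "worker-b")
g.add_node("ev-1", "evidence", "tenant-a", "worker-a")
g.add_edge("wp-1", "produced", "cand-1", "worker-a")
g.add_edge("wp-1", "produced", "cand-2", "worker-a")
g.add_edge("cand-1", "supported_by", "ev-1", "worker-a")
print(g.reachable("wp-1", "tenant-a", max_depth=1))
print(g.reachable("wp-1", "tenant-a", max_depth=2))
```

The traversal never returns `cand-2`, because it belongs to a different tenant, and every write leaves an audit record that names the principal.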

The review system is what keeps speed from becoming recklessness. Agents can work in parallel, but the review authority decides whether the result is accepted, rejected, blocked, or sent back for revision. Review artifacts must name what changed, what was validated, what was not touched, what evidence supports the claim, what risks remain, and what the next gate is. A review verdict is part of the system state, not a chat opinion.

The public surface — this repository — is a reflection of the pattern, not the source of truth. Public material explains the pattern safely; it does not leak private workflow details, customer or employer terms, raw discovery content, or unreviewed run state. The public material describes the architecture in a credible, reusable way.

The enterprise direction is the same pattern at larger scale. A person or team should be able to work with agents through a handoff queue, scoped work packages, governed memory, safe automation, review queues, and durable audit. A department should be able to capture workflow knowledge through natural conversation, sanitize it, identify safe automation opportunities, distinguish augmentation from replacement, and preserve human ownership. An organization should be able to introduce agents without losing control of identity, data boundaries, policy, auditability, or accountability. That is why this pattern is bigger than a local script runner.

The result the pattern aims for: a system where agents make humans faster and more accurate without pretending humans are optional. Agents prepare the work, surface context, check consistency, draft artifacts, run proofs, and write candidates. Humans decide, approve, correct, escalate, and own the outcome. The architecture exists to make that relationship durable.

Operating rule

The pattern carries a structural operating rule that adopters honor across every work package: implementation merges land in the implementation repository first, before the coordination summary repository records the lane as complete.

The coordination summary repository receives only:

work-package summaries
review verdicts
links to implementation PRs and merge commits
tracker, backlog, and project-board state
commissioned documentation summaries

It does not receive full implementation code, proof trees, captured runtime artifacts, or research dumps.

This rule exists because mixing implementation payload into the coordination surface produces a control-plane PR that becomes too large and too mixed to review. Keeping coordination summary-only is the operating discipline that lets the control plane stay readable as the system scales.
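The operating rule lends itself to two mechanical checks, sketched below: a lane may be recorded complete only once the implementation PR has merged, and a coordination summary PR may carry only the allowed kinds of content. The dictionary keys and the kind labels are hypothetical names invented for this example.

```python
# Hypothetical labels for the five kinds of content the summary repo receives.
ALLOWED_SUMMARY_KINDS = {
    "work_package_summary",
    "review_verdict",
    "implementation_link",
    "tracker_state",
    "documentation_summary",
}


def can_record_lane_complete(lane: dict) -> bool:
    """A lane is complete only after the implementation PR merged with a recorded SHA."""
    impl = lane.get("implementation_pr", {})
    return bool(impl.get("merged")) and bool(impl.get("merge_sha"))


def summary_pr_is_clean(changed_kinds: list) -> bool:
    """Reject a coordination PR that carries anything outside the allowed kinds."""
    return all(kind in ALLOWED_SUMMARY_KINDS for kind in changed_kinds)


merged_lane = {"implementation_pr": {"merged": True, "merge_sha": "abc1234"}}
open_lane = {"implementation_pr": {"merged": False, "merge_sha": None}}

print(can_record_lane_complete(merged_lane))
print(can_record_lane_complete(open_lane))
print(summary_pr_is_clean(["review_verdict", "implementation_link"]))
print(summary_pr_is_clean(["review_verdict", "proof_tree"]))
```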

What this pattern enables (full scope)

The pattern is not a narrow work-package runner. It is a governed agentic substrate that can start locally, support real agent work, and later map to cloud, team, and enterprise operation without changing the core operating model.

Workstream	Required outcome
Repo-driven self-configuration	Agents read the repo, identify their role, queue entry, work package, allowed surfaces, stop conditions, and required proof shape without relying on chat as authority.
Bootstrap and harness	A repeatable startup path provisions the local runtime, loads configuration, checks dependencies, creates and validates execution workspaces, and enters the state machine.
CLI / API parity	The same governed operations are available through CLI and API. The CLI does not become a separate shadow system with different behavior.
Handoff-queue watcher and runner	The review authority books a queue entry and the system triggers the correct scoped agent using the right identity and credentials without manual "check the queue" prompts.
Local substrate	Local services are started, checked, stopped, and validated as a real substrate rather than a loose collection of files.
Data stores	A transactional store for source-of-truth records, a vector store for semantic retrieval, and a first-party graph engine for typed relationships, traversal, and provenance.
First-party graph engine	A purpose-built graph engine for agentic workloads: secure from inception, audit-aware, replayable, policy-aware, designed for graph algorithms that support agent work.
Candidate memory / write path	Agents write candidate records first; candidate records carry provenance, principal, policy version, evidence, local-model evaluator context, and review state.
Canon Gate	No candidate becomes commissioned canon unless it passes policy, evidence, redaction, identity, anti-replay, and human-review controls.
Local models and commissioning	Local-model evaluator support helps score, compare, classify, summarize, or commission candidates — but does not bypass review authority.
Human-in-the-loop ownership	Humans own judgment, approval, escalation, and accountability. Agents surface, prepare, compare, and execute bounded work.
Security and identity	Tenant boundaries, principal binding, least privilege, redaction, anti-replay, audit chain, override logging, and fail-closed defaults are first-class.
Testing and validation	Local command proof, runtime proof, stress tests, abuse tests, replay tests, rollback tests, redaction tests, and human-review escalation tests all matter.
Cloud portability	Local-first does not mean local-only. The architecture maps to cloud execution without rewriting the operating model.
Public-safe reflection	Public surface material comes after the private implementation is real and sanitized. Public-facing material does not leak private names, private workflows, or raw run state.

Operating architecture

The pattern preserves a clear separation of responsibilities across four planes.

flowchart LR
    subgraph controlPlane["Control plane: coordination summary repo"]
        humanOwner["Human owner / review authority"]
        queue["Handoff queue"]
        registry["Work-package registry"]
        tracker["Tracker / backlog / board"]
    end

    subgraph runtimePlane["Execution plane: runtime"]
        runner["Agent runner"]
        machine["State machine"]
        agent["Scoped worker agent"]
        workspace["Isolated execution workspace"]
    end

    subgraph implPlane["Implementation plane: implementation repo"]
        code["Implementation code"]
        proofs["Proof trees"]
        research["Research roots"]
        handoff["Return artifact"]
    end

    subgraph reviewPlane["Review and integration"]
        implpr["Implementation PR"]
        verdict["Queue review verdict"]
        coordpr["Coordination summary PR"]
        fold["Tracker / backlog / registry / board fold"]
    end

    humanOwner --> queue
    humanOwner --> registry
    humanOwner --> tracker
    queue --> runner
    registry --> runner
    runner --> machine
    machine --> agent
    agent --> workspace
    workspace --> code
    workspace --> proofs
    workspace --> research
    code --> implpr
    proofs --> implpr
    research --> implpr
    handoff --> coordpr
    implpr --> verdict
    coordpr --> verdict
    verdict --> fold
    fold --> tracker

    style humanOwner fill:#0f172a,stroke:#38bdf8
    style queue fill:#0f172a,stroke:#38bdf8
    style registry fill:#0f172a,stroke:#38bdf8
    style tracker fill:#0f172a,stroke:#38bdf8
    style runner fill:#172554,stroke:#60a5fa
    style machine fill:#172554,stroke:#60a5fa
    style agent fill:#172554,stroke:#60a5fa
    style workspace fill:#172554,stroke:#60a5fa
    style code fill:#312e81,stroke:#a5b4fc
    style proofs fill:#312e81,stroke:#a5b4fc
    style research fill:#312e81,stroke:#a5b4fc
    style handoff fill:#312e81,stroke:#a5b4fc
    style implpr fill:#3f1d1d,stroke:#f97316
    style verdict fill:#3f1d1d,stroke:#f97316
    style coordpr fill:#3f1d1d,stroke:#f97316
    style fold fill:#3f1d1d,stroke:#f97316

The state machine is the center of gravity. The handoff queue is transport. The work package is scope. The execution workspace is isolation. The PR is review and merge. The substrate is runtime memory and evidence. The review authority is the acceptance gate.

No worker infers permission from chat when the queue entry says otherwise. No implementation lane claims completion because a coordination summary PR merged. No candidate memory becomes canon because an agent wrote it. No public surface becomes the dumping ground for private implementation state.

The full state machine and the per-component depth live under docs/architecture/.

Repository shape

Path	Purpose today
`README.md`	This file — pattern anchor + documentation index
`docs/architecture/`	Long-form per-component architecture: overview, state machine, runner modes, handoff queue, candidate writes and Canon Gate, substrate, audit and replay, identity and security, human ownership, cloud-portable shape, glossary
`docs/diagrams/`	Mermaid sources for the architecture diagrams used across `docs/architecture/` and this README
`docs/architecture.md`	The v1 seven-component workflow model and lifecycle diagram
`docs/operating-model.md`	The four-role table, scopes, and handoff rules
`docs/lifecycle-and-state.md`	The state-transition graph and per-state required artifacts
`docs/vocabulary.md`	Vocabulary used across the pattern, plus extension guidance for adopters
`docs/private-vs-public.md`	What stays in an adopter's coordination repository vs what becomes adopter-public
`templates/work-package-template.md`	A bounded unit-of-work template with the eight required sections
`templates/handoff-message-template.md`	The handoff-queue message shape
`templates/return-artifact-template.md`	The worker return-artifact shape
`templates/review-response-template.md`	The review-response shape and verdict vocabulary
`examples/oss-library-maintenance/`	One synthetic end-to-end walkthrough applying the pattern to a fictional open-source library maintenance scenario
`ROADMAP.md`	What v1 ships, what does NOT ship at v1, and v1.1+ candidates

Evidence ladder

The pattern uses an honest evidence ladder. Adopters describe what they have proven using a defined classification, not by overclaiming.

Evidence class	Meaning	Acceptable claim
Source trace	Repo files, work-package specs, prior accepted artifacts, and PR state were inspected.	"The current repo says X."
Static proof	A diff, inventory, collision map, or structural check proves a bounded property.	"This migration plan is additions-only and names the collision."
Local command proof	A local command ran and logs were captured.	"This harness passed locally under these inputs."
Runtime proof	Services actually started or a substrate path executed.	"The transactional store / vector store / graph path ran for this scenario."
Stress / abuse proof	Negative paths, replay, duplicate write, tenant boundary, rollback, and redaction cases ran.	"This path fails closed under the tested abuse cases."
Merge proof	The implementation PR merged and the merge SHA is recorded.	"This implementation is integrated into the target repo."
Not proven	The lane names the gap and does not overclaim.	"Cloud runtime is not proven in this lane."

These categories do not collapse. A static plan is not runtime proof. A local stub is not a local model. A coordination summary merge is not target-repo code integration. A candidate write is not canon.
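The "categories do not collapse" rule can be expressed by modeling the evidence classes as an unordered enumeration and requiring an exact class match for each claim. This is a sketch; the `EvidenceClass` enum mirrors the table above, while the `supports` helper and its exact-match rule are an illustrative interpretation rather than a documented mechanism.

```python
from enum import Enum


class EvidenceClass(Enum):
    SOURCE_TRACE = "source trace"
    STATIC_PROOF = "static proof"
    LOCAL_COMMAND_PROOF = "local command proof"
    RUNTIME_PROOF = "runtime proof"
    STRESS_ABUSE_PROOF = "stress / abuse proof"
    MERGE_PROOF = "merge proof"
    NOT_PROVEN = "not proven"


def supports(required: EvidenceClass, held: set) -> bool:
    """A claim is supported only by evidence of exactly the required class."""
    if required is EvidenceClass.NOT_PROVEN:
        return False  # "not proven" names a gap; it never supports a claim
    return required in held


held = {EvidenceClass.SOURCE_TRACE, EvidenceClass.STATIC_PROOF}

# A static plan does not stand in for runtime proof.
print(supports(EvidenceClass.STATIC_PROOF, held))
print(supports(EvidenceClass.RUNTIME_PROOF, held))
```

Using set membership instead of an ordering means that holding a "stronger" class never implies a different one: merge proof, for example, does not imply stress or abuse proof.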

Writing standard for reviewer-complete artifacts

The pattern asks adopters to write major PR bodies, READMEs, work-package returns, and proof summaries to a reviewer-complete bar. The standard is closer to a technical advisory or architecture review than a normal short changelog.

A reviewer-complete artifact is one where a reviewer can read the artifact and answer every load-bearing question without needing chat context, side-channel files, or a separate explanation. If the reviewer has to ask "what did this actually do?", "where is the evidence for that claim?", "what's not proven here?", or "what's the next step?", the artifact is not reviewer-complete. The body answers all of those upfront.

The standard expects:

Long-form prose. Not bullet-only changelogs. Reviewers need orientation, not just a list of touched files. The executive judgment section is prose; the architecture-in-prose section is prose; the non-claims section is prose. State tables and evidence registers complement the prose; they do not replace it.
State tables. A status table at the top, a runtime / source anchor table, a current-state truth table, a risk and residual register. These give a reviewer the structured anchors they need to verify drift between PR body, queue, and repo state.
Professional diagrams. When architecture, data flow, state flow, graph flow, queue flow, or review flow is central to the artifact, include a GitHub-renderable Mermaid diagram. Use conservative syntax that renders reliably; visually clever but brittle diagrams that fail to render are worse than no diagram. Follow the diagram with prose explaining what each component owns.
Evidence registers with specific pointers. "Tests pass" is not evidence. Name the command, the captured log, the expected output, and the interpretation. Pointers to commands, logs, fixtures, PR URLs, and merge SHAs are what let a reviewer verify rather than trust.
Explicit non-claims. State what the artifact does not prove. Non-claims are not defensive writing; they are how the system avoids drift. A reviewer who sees an explicit "this does not authorize unreviewed canon promotion" knows the lane has been thought through.
Public / private hygiene. State what must not move to public surfaces. Run hygiene scans against blocked external-model literals, private-source identifiers, customer or employer terms, internal control-plane vocabulary, AI-attribution trailers, and IP-sensitive filenames.
Reviewer-complete context. Everything a reviewer needs to evaluate the lane in isolation is in the artifact. Chat-context dependencies are findings; the artifact stands on its own.
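The hygiene scan named in the list above could be as simple as a set of regular expressions run over an artifact before it moves to a public surface. The two patterns below are placeholders invented for this sketch; a real blocklist would come from the adopter's own policy and would cover every category in the hygiene item.

```python
import re

# Hypothetical blocklist; real patterns would come from the adopter's policy.
BLOCKED_PATTERNS = {
    "ai_attribution_trailer": re.compile(
        r"^Co-authored-by:.*\bbot\b", re.IGNORECASE | re.MULTILINE
    ),
    "private_identifier": re.compile(r"\bINTERNAL-[A-Z0-9]+\b"),
}


def hygiene_scan(text: str) -> list:
    """Return (rule_name, line_number) for every blocked match in the text."""
    findings = []
    for name, pattern in BLOCKED_PATTERNS.items():
        for match in pattern.finditer(text):
            # Line number is one more than the count of newlines before the match.
            line_number = text.count("\n", 0, match.start()) + 1
            findings.append((name, line_number))
    return sorted(findings, key=lambda finding: finding[1])


artifact = (
    "Summary of change\n"
    "See INTERNAL-42 for details\n"
    "Co-authored-by: helper bot <x@example.com>\n"
)
print(hygiene_scan(artifact))
print(hygiene_scan("Nothing sensitive here.\n"))
```

Reporting the rule name and line number, rather than a bare pass or fail, gives the reviewer the specific pointer the evidence-register item asks for.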

The standard scales with the lane. A minor doc fix does not need 600 lines of body. A substrate change, a Canon Gate change, a graph-engine change, a security boundary change, a major migration, a validation tier landing — these all need the full long-form treatment.

Reproduce

There is nothing to run. To use the pattern: read docs/operating-model.md and docs/lifecycle-and-state.md for the v1 framing, then walk through the long-form docs/architecture/ tree for component depth, then read examples/oss-library-maintenance/ for a complete one-cycle illustration.

License

Apache-2.0.

Adjacent repositories

The three sibling canonical repositories are out of scope here. Cross-references are descriptive only; this repository does not import or deploy them.

production-rag-eval-harness — retrieval-quality evaluation (the home for retrieval implementation depth).
agent-runtime-observability — governed agent runtime (the home for agent-runtime implementation depth).
aws-bedrock-iac-reference — AWS / Bedrock infrastructure-as-code reference (the home for cloud-side architecture depth).
