6 changes: 3 additions & 3 deletions calibration/d1-d4-rubric.md
@@ -7,7 +7,7 @@ weighted dimensions, each scored 0-25:
|-----------------|----------------------------------------------------|-----|
| **D1 Correctness** | Did the agent produce correct output on first attempt? | 25 |
| **D2 Observability** | Did the agent emit enough telemetry, logs, and intermediate state to verify what it did? | 25 |
-| **D3 Compliance** | Did the agent follow the rules of the role: scope, approvals, manifest use? | 25 |
+| **D3 Policy** | Did the agent follow the rules of the role: scope, approvals, manifest use? | 25 |
| **D4 Recurrence** | Did the agent repeat a prior mistake from its own failure library? | 25 |
| **Total** | | 100 |

@@ -38,7 +38,7 @@ Acceptable:
```
D1 Correctness: 25 7/7 acceptance criteria met on first QA attempt, no rework
D2 Observability: 22 bulletin entries at all 4 phase transitions, one missing handoff log
-D3 Compliance: 25 pre-spawn protocol followed; manifest matches ACs
+D3 Policy: 25 pre-spawn protocol followed; manifest matches ACs
D4 Recurrence: 25 no known pattern repeated; novel task class
Total: 97 → STANDARD (confidence band: LOW, n=6)
```
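The score block above adds four dimension scores, each capped at 25, into a total out of 100. A minimal sketch of that arithmetic follows. The `total_score` helper and the `D1`..`D4` dictionary keys are assumptions made for illustration and are not part of the rubric. Tier and confidence-band assignment are left out because their thresholds are not shown in this diff.

```python
# Hypothetical helper: sums the four rubric dimensions after range-checking them.
DIMENSIONS = ("D1", "D2", "D3", "D4")
MAX_PER_DIMENSION = 25  # each dimension is scored 0-25 per the rubric


def total_score(scores: dict) -> int:
    """Return the 0-100 total for a mapping of dimension name to score."""
    missing = [dim for dim in DIMENSIONS if dim not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dim in DIMENSIONS:
        value = scores[dim]
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{dim} must be in 0..{MAX_PER_DIMENSION}, got {value}")
    return sum(scores[dim] for dim in DIMENSIONS)


# The scores from the example block: 25 + 22 + 25 + 25.
example = {"D1": 25, "D2": 22, "D3": 25, "D4": 25}
print(total_score(example))  # 97
```

Range-checking before summing keeps a mistyped score (for example 52 instead of 25) from silently inflating the total.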
@@ -152,7 +152,7 @@ being trustworthy.

---

-## D3 Compliance
+## D3 Policy

**What it measures:** did the agent follow the rules of its role?

127 changes: 127 additions & 0 deletions docs/competitive-landscape.md
@@ -0,0 +1,127 @@
# Competitive Landscape

A short note on where AWF sits relative to four systems it is most
often confused with. The pattern across all four: each operates at a
different layer of the agent stack and the relationship to AWF is
more often complementary than competing.

For the layering model that grounds this document, see
[`docs/execution-substrates.md`](execution-substrates.md).

---

## Maggy

| Field | Value |
|---|---|
| **Category** | AI engineering command centre and execution substrate. |
| **Relationship to AWF** | Execution layer beneath the authority layer. AWF authorizes over Maggy the same way it authorizes over Claude Code, Codex or Cursor. |
| **AWF distinction** | Maggy answers "how is this work driven". AWF answers "who is allowed to do this work". |

A buyer can adopt Maggy for engineering execution workflow and AWF for
cross-runtime authorization. The two compose: Maggy drives the work,
AWF decides whether Maggy is allowed to drive it on behalf of a given
trust subject for a given task class.

Maggy is on the roster of substrates AWF has designed adapters for.
The Maggy adapter itself ships after Sprint 3, on pilot demand; Sprint 3
covers Maggy at the design-document level only.

---

## Microsoft AGT

| Field | Value |
|---|---|
| **Category** | Runtime security enforcement layer. |
| **Relationship to AWF** | Different layer. AGT is permission-check infrastructure at the moment of action. AWF is authority-record infrastructure across sessions and substrates. |
| **AWF distinction** | AGT decides whether the next tool call is permitted, now, deterministically. AWF decides whether the trust subject behind the agent has earned authority over this class of work, over time. |

AGT and AWF are complementary, not substitutes. AGT enforces a static
policy at the moment of execution. AWF enforces a trust trajectory
across sessions: a trust subject can be permitted by AGT to run a
query yet `BLOCKED` by AWF because that trust subject has not
accumulated enough D1 to D4 evidence on `db_migration` to earn the
required tier.
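The two-layer outcome described above can be sketched as two independent checks that must both pass. This is an illustrative model only: `agt_permits`, `awf_decides`, `AuthorityRecord`, the numeric `earned_tier`, and the allow-list policy are all hypothetical names and simplifications, not the API of AGT or AWF.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOWED = "ALLOWED"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class AuthorityRecord:
    """Hypothetical record of what a trust subject has earned for one task class."""
    trust_subject: str
    task_class: str
    earned_tier: int  # assumption: tiers modelled as integers, higher means more authority


def agt_permits(tool_call: str, static_policy: frozenset) -> bool:
    """Stand-in for an action-time check: is this tool call on a static allow-list?"""
    return tool_call in static_policy


def awf_decides(record: AuthorityRecord, required_tier: int) -> Decision:
    """Stand-in for an authority check: has the subject earned the required tier?"""
    if record.earned_tier >= required_tier:
        return Decision.ALLOWED
    return Decision.BLOCKED


def may_proceed(tool_call: str, static_policy: frozenset,
                record: AuthorityRecord, required_tier: int) -> bool:
    """Both layers must agree; neither check substitutes for the other."""
    return (agt_permits(tool_call, static_policy)
            and awf_decides(record, required_tier) is Decision.ALLOWED)


policy = frozenset({"run_query"})
record = AuthorityRecord("agent-a", "db_migration", earned_tier=1)

print(agt_permits("run_query", policy))             # True
print(awf_decides(record, required_tier=2))         # Decision.BLOCKED
print(may_proceed("run_query", policy, record, 2))  # False
```

The example keeps the two checks as separate functions on purpose: the static policy knows nothing about the trust subject's history, and the authority record knows nothing about the individual tool call.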

A production deployment with both layers gets the strengths of each:
deterministic permission checks at action time from AGT and earned
authority across sessions from AWF. Neither layer's job is the other's.

For more on how AGT fits the broader governance stack, see
[`docs/architecture/three-layer-stack.md`](architecture/three-layer-stack.md).

---

## Superlog

| Field | Value |
|---|---|
| **Category** | Application observability for AI applications. |
| **Relationship to AWF** | Different domain. Superlog instruments the application; AWF governs authority over the agents that produced the application's behaviour. |
| **AWF distinction** | Superlog answers "what did the application do at runtime". AWF answers "was the agent that built this allowed to build it". |

Superlog occupies the same conceptual slot for AI applications that
Datadog or Honeycomb occupy for traditional services. It is consumed
*after* an agent has shipped code. AWF is consumed *before* the agent
runs, with the feedback loop closing only when telemetry from the
running application feeds back into the agent's trust signal via
Process Intelligence (Sprint 4).

The two systems can compose. Superlog's runtime signal becomes input
to AWF's trust-update logic. The signals flow in one direction, from
runtime back to authority. There is no overlap at the authority layer
itself.

---

## Pentagon

| Field | Value |
|---|---|
| **Category** | Agent team workspace and execution layer. |
| **Relationship to AWF** | Execution layer, similar to Maggy. AWF authorizes over Pentagon, not against it. |
| **AWF distinction** | Pentagon answers "where does an agent team coordinate, share state and hand off work". AWF answers "is this team's chosen runtime authorized for this class of work". |

Pentagon is a workspace product. Multiple agents and their human
operators live inside it and pass work back and forth. The runtime
authorization question still applies: when a Pentagon-resident agent
attempts a task class, *that agent's trust subject* is what AWF scores
authority against. Pentagon is the surface. AWF is the authority
record behind the surface.

A buyer can adopt Pentagon for cross-agent collaboration and AWF for
the authority layer that decides what those agents are allowed to do.
This is the same complementary pattern as with Maggy, AGT and Superlog.

---

## The general shape

The four systems above are confused with AWF for the same reason. Each
operates in or near the agent stack. Each speaks the language of
governance or accountability. Each ships an artifact that *looks* like
an authority decision from a distance.

Up close they are not. The test is the question each system answers.

| System | Question it answers |
|---|---|
| Maggy, Pentagon, Claude Code, Codex, Cursor | Can this work be done? And if so, how? |
| Microsoft AGT | Is this specific tool call permitted, right now? |
| Superlog | What did the application do at runtime? |
| AWF | Has the agent's trust subject earned authority over this class of work, on this substrate? |

Different question, different system. AWF composes with all of them.
It replaces none.

---

## Related

- `docs/execution-substrates.md`: the layering model that places each
  of the systems above on its appropriate layer.
- `docs/architecture/three-layer-stack.md`: runtime governance vs
  scheduled automation vs behavioural accountability. The framing AGT
  fits inside.
- AWF Sprint Plan v4.4.2, Competitive Landscape: Maggy.