From cdb3dd63d4e5b4bb8ad7337aeb1e422c2a71ed53 Mon Sep 17 00:00:00 2001
From: Valdis Pornieks <pornieks@gmail.com>
Date: Tue, 9 Jun 2026 12:45:52 +0300
Subject: [PATCH 1/4] chore: TDR: openAI compatible adapter with agent loop

---
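Note (illustrative, not part of the patch): the Tool Dispatch steps in the
diagram (resolve by name, parse JSON args, execute handler, append tool result)
and the error-as-tool-result behaviour could be sketched roughly as below. This
is a minimal sketch under assumptions, not the adapter's code: `ToolCall`,
`ToolMessage`, `ToolHandler`, `dispatchToolCall`, and `terminalSubtype` are
hypothetical names, the error message wording is invented, and handlers are
synchronous only to keep the example short.

```typescript
// Hypothetical shapes; the adapter's real types are not shown in this patch.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}
type ToolHandler = (args: unknown) => string;

// Resolve by name, parse JSON args, execute the handler.
// Every failure becomes a tool message instead of a thrown error.
function dispatchToolCall(call: ToolCall, tools: Map<string, ToolHandler>): ToolMessage {
  const reply = (content: string): ToolMessage => ({
    role: "tool",
    tool_call_id: call.id,
    content,
  });

  const handler = tools.get(call.function.name);
  if (!handler) return reply(`Error: unknown tool "${call.function.name}"`);

  let args: unknown;
  try {
    args = JSON.parse(call.function.arguments);
  } catch {
    return reply("Error: arguments are not valid JSON");
  }

  try {
    return reply(handler(args));
  } catch (err) {
    return reply(`Error: ${err instanceof Error ? err.message : String(err)}`);
  }
}

// Map a terminal finish_reason to the error subtypes named in the diagram.
function terminalSubtype(finishReason: string): string | undefined {
  if (finishReason === "length") return "error_max_tokens";
  if (finishReason === "content_filter") return "error_content_filter";
  return undefined; // "stop" and "tool_calls" are not error conditions
}
```

Because each failure is returned as a `tool` message, the calling loop can
append the result to history unconditionally and let the model decide how to
recover on the next turn.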
 ...4-openai-compat-adapter-with-agent-loop.md | 176 ++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
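Note (illustrative, not part of the patch): the retry behaviour labelled
`fetchWithRetry()` in the diagram (backoff on 429/5xx, immediate failure on
401) might look something like the sketch below. The names `HttpResult`,
`isRetryable`, `backoffDelayMs`, and `sendWithRetry`, the 500 ms base delay,
and the injected `send`/`sleep` functions are all assumptions made so the
example is self-contained; the real helper's signature is not shown here.

```typescript
interface HttpResult {
  status: number;
  body: string;
}

// 429 and 5xx are retried; anything else (including 401) is returned at once.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff: base, 2x base, 4x base, ... (base delay is an assumption).
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Call `send` until it returns a non-retryable status or retries are exhausted.
async function sendWithRetry(
  send: () => Promise<HttpResult>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<HttpResult> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (!isRetryable(res.status) || attempt >= maxRetries) return res;
    await sleep(backoffDelayMs(attempt));
  }
}
```

Injecting `sleep` keeps the backoff schedule testable without real delays; a
caller would normally rely on the default timer.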

diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
new file mode 100644
index 0000000..c823da4
--- /dev/null
+++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
@@ -0,0 +1,176 @@
+# TDR 0004 — OpenAI-Compatible Agentic Adapter
+
+**Status:** Accepted — 2026-06-09
+
+## Context
+
+QualOps' agentic review mode depends on two provider-specific SDKs: `@anthropic-ai/claude-agent-sdk`
+for Anthropic and `@openai/agents` for OpenAI. Both SDKs manage their own tool dispatch,
+context handling, and multi-turn orchestration internally. This works well for the providers they
+were designed for, but it prevents using any other model that speaks the OpenAI chat completions
+wire format — Groq, Mistral, local models via Ollama or LM Studio, Gemini via OpenRouter,
+DeepSeek on Fireworks, and others.
+
+These providers all implement the same `/v1/chat/completions` endpoint with tool calling
+(`tool_calls` in the assistant message, `tool`-role results in the following turn). The only thing missing
+is a client-side loop that drives the conversation, dispatches tool calls, and manages the context
+window without relying on a provider-specific SDK.
+
+## Decision
+
+Introduce an `openai-compat` provider that runs a self-contained agentic harness using raw HTTP
+calls to any OpenAI-compatible chat completions endpoint. No agent SDK is involved. The harness
+owns the reasoning loop, tool dispatch, and context window management, making it compatible with
+any endpoint that speaks the standard wire format.
+
+## Architecture
+
+```
+  Caller (AgenticExecutor)
+         │
+         │  run(params)
+         ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  OpenAICompatAdapter                                            │
+│                                                                 │
+│  ┌──────────────────────────────────────────────────────────┐   │
+│  │  Session state: ChatMessage[]                            │   │
+│  │  [ system ] [ user ] [ assistant ] [ tool ] [ ... ]      │   │
+│  └───────────────────────┬──────────────────────────────────┘   │
+│                          │                                      │
+│          ┌───── turn loop (1..maxTurns) ──────────────┐         │
+│          │                                            │         │
+│          │  ┌─────────────────────────┐               │         │
+│          │  │  ContextManager         │               │         │
+│          │  │  maybeSummarize()       │               │         │
+│          │  │  · estimate tokens      │               │         │
+│          │  │  · if > 60% window:     │               │         │
+│          │  │    summarise oldest     │               │         │
+│          │  │    exchange → [summary] │               │         │
+│          │  │    fallback: truncate   │               │         │
+│          │  └────────────┬────────────┘               │         │
+│          │               │ compressed history         │         │
+│          │               ▼                            │         │
+│          │  ┌─────────────────────────┐               │         │
+│          │  │  fetchWithRetry()       │               │         │
+│          │  │  POST /chat/completions │◄──────────────┼────────►│ LLM endpoint
+│          │  │  · 429/5xx: backoff×3   │               │         │
+│          │  │  · 401: immediate fail  │               │         │
+│          │  └────────────┬────────────┘               │         │
+│          │               │ ChatCompletionResponse     │         │
+│          │               ▼                            │         │
+│          │       finish_reason?                       │         │
+│          │       ├─ stop ──────────────────► return output      │
+│          │       ├─ length ────────────────► error_max_tokens   │
+│          │       ├─ content_filter ────────► error_content_filter
+│          │       └─ tool_calls                        │         │
+│          │               │                            │         │
+│          │               ▼                            │         │
+│          │  ┌─────────────────────────┐               │         │
+│          │  │  Tool Dispatch          │               │         │
+│          │  │  (sequential)           │               │         │
+│          │  │  · resolve by name      │               │         │
+│          │  │  · parse JSON args      │               │         │
+│          │  │  · execute handler      │               │         │
+│          │  │  · append tool result   │               │         │
+│          │  └─────────────────────────┘               │         │
+│          │                                            │         │
+│          └────────────────────────────────────────────┘         │
+│                          │                                      │
+│               maxTurns exceeded → error_max_turns               │
+│                                                                 │
+│  finally: toolSet.dispose()                                     │
+└─────────────────────────────────────────────────────────────────┘
+         │
+         │  AgentAdapterResult { output, inputTokens, outputTokens, errorSubtype? }
+         ▼
+  Caller (AgenticExecutor)
+```
+
+The harness is composed of three layers:
+
+### Adapter
+
+The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It receives
+a system prompt, a user prompt, a set of tool definitions, and configuration — model name, endpoint
+URL, API key, and a turn budget. It builds an initial message history and enters a loop.
+
+On each turn it sends the current history to the endpoint and waits for a reply. If the model
+signals it is done (`finish_reason: stop`) the loop exits and the assistant's final text is
+returned. If the model requests tool calls, the adapter executes each one in sequence, appends the
+results to the history, and continues to the next turn. Execution is sequential rather than
+parallel because some tools — specifically the bash session tool — are stateful and cannot be
+interleaved safely.
+
+The adapter handles HTTP-level errors with exponential backoff for rate limiting (429) and server
+errors (5xx), and maps each terminal condition to a stable error subtype string that callers can
+inspect without parsing error messages. Tool errors — unknown tool names, malformed arguments,
+execution failures — are returned to the model as tool result messages rather than thrown. This
+lets the model observe the error and recover without restarting the session.
+
+The adapter is stateless between `run()` calls; all session state lives in the message history
+that it builds up turn by turn. Subagent orchestration is not supported — the model reasons flat
+using the provided tools.
+
+### Context Manager
+
+The context manager (`ContextManager`) is called at the start of each turn, before the request is
+sent. It estimates the token count of the current history and compares it against the known context
+limit for the model.
+
+If the history exceeds 60% of the context window, the context manager compresses the oldest tool
+exchange in the history. It does this by sending a one-shot request to the same endpoint, asking
+it to summarise what was done and found in that exchange. The summary replaces the original
+exchange in place — an assistant message with tool calls and the corresponding tool results become
+a single system message prefixed `[summarized]`.
+
+The 60% threshold is intentional. It ensures there is always room in the remaining window for both
+the summary response itself and the model's next substantive reply. Triggering compression at a
+higher threshold risks running out of context mid-summarisation.
+
+If the summary call fails for any reason, the context manager falls back to hard truncation:
+dropping the oldest tool exchange entirely rather than replacing it with a summary. The system and
+user messages at the start of the history are always preserved.
+
+### Tool Dispatch
+
+Tool definitions are Zod schemas. Before the first request, the adapter converts each schema to
+JSON Schema (draft-7 target) using the existing `schemaToJsonSchema` utility. Draft-7 is chosen
+over draft-2020-12 because some vendor endpoints have narrow JSON Schema parsers that reject the
+newer `$schema` markers.
+
+## Configuration
+
+The `openai-compat` provider is configured inside the standard `ai.reviewStage` block:
+
+```json
+{
+  "ai": {
+    "reviewStage": {
+      "provider": "openai-compat",
+      "model": "mistral-small-latest",
+      "baseURL": "https://api.mistral.ai/v1",
+      "apiKeyEnvVar": "MISTRAL_API_KEY",
+      "inputPerMillion": 0.1,
+      "outputPerMillion": 0.3
+    }
+  }
+}
+```
+
+`baseURL` identifies the chat completions base URL. It falls back to the `OPENAI_BASE_URL`
+environment variable, then to the standard OpenAI endpoint. `apiKeyEnvVar` is the name of the
+environment variable that holds the API key; it falls back to `OPENAI_API_KEY`, and if neither is
+set the key is empty — which is valid for local endpoints like Ollama that require no
+authentication.
+
+Structured output capability is derived automatically from the model name via the existing litellm
+capability catalog. No manual configuration is required.
+
+## What This Does Not Do (V1)
+
+- **Subagent orchestration.** The model reasons flat using the tools it is given. Multi-agent
+  handoff requires SDK-level orchestration that is out of scope here.
+- **Budget enforcement.** `maxBudgetUsd` is accepted and passed through but not enforced.
+- **Streaming.** Responses are collected in full before processing.
+- **Parallel tool execution.** Tool calls are always sequential.

From 7d9644d6be21506877c874cef989b80c91369345 Mon Sep 17 00:00:00 2001
From: Valdis Pornieks <pornieks@gmail.com>
Date: Tue, 9 Jun 2026 14:59:28 +0300
Subject: [PATCH 2/4] chore: updated TDR with considered options

---
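Note (illustrative, not part of the patch): the context manager rules this
revision documents (60% proactive trigger, tool exchange atomicity, and the
preservation of indices 0 and 1) could be sketched as below. This is a sketch
under assumptions, not the shipped `maybeSummarize`: the helper names, the
`ChatMessage` shape, and the 4-characters-per-token estimate are hypothetical,
and the summary text is passed in rather than fetched from the endpoint.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: { id: string }[];
  tool_call_id?: string;
}

// Rough estimate: about 4 characters per token (an assumption).
function estimateTokens(history: ChatMessage[]): number {
  return Math.ceil(JSON.stringify(history).length / 4);
}

// Proactive trigger: compress once the estimate passes 60% of the window.
function needsCompression(history: ChatMessage[], contextWindow: number): boolean {
  return estimateTokens(history) > 0.6 * contextWindow;
}

// Oldest assistant message with tool_calls plus its trailing tool results,
// as a half-open index range. Indices 0 and 1 are never candidates.
function oldestToolExchange(history: ChatMessage[]): [number, number] | undefined {
  for (let start = 2; start < history.length; start++) {
    const msg = history[start];
    if (msg.role !== "assistant" || !msg.tool_calls?.length) continue;
    let end = start + 1;
    while (end < history.length && history[end].role === "tool") end++;
    return [start, end];
  }
  return undefined;
}

// Replace the whole exchange with one summary message, or drop it entirely
// when no summary is available (the hard-truncation fallback).
function compressOldest(history: ChatMessage[], summary?: string): ChatMessage[] {
  const range = oldestToolExchange(history);
  if (!range) return history;
  const [start, end] = range;
  const replacement: ChatMessage[] =
    summary === undefined ? [] : [{ role: "system", content: `[summarized] ${summary}` }];
  return [...history.slice(0, start), ...replacement, ...history.slice(end)];
}
```

Treating the exchange as a single index range is what keeps a tool call and
its results together: they are either both replaced or both dropped.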
 ...4-openai-compat-adapter-with-agent-loop.md | 242 +++++++++++++++---
 1 file changed, 200 insertions(+), 42 deletions(-)

diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
index c823da4..9754ae7 100644
--- a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
+++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
@@ -1,6 +1,6 @@
 # TDR 0004 — OpenAI-Compatible Agentic Adapter
 
-**Status:** Accepted — 2026-06-09
+**Status:** Proposed — 2026-06-09
 
 ## Context
 
@@ -16,12 +16,130 @@ These providers all implement the same `/v1/chat/completions` endpoint with tool
 is a client-side loop that drives the conversation, dispatches tool calls, and manages the context
 window without relying on a provider-specific SDK.
 
-## Decision
+This TDR evaluates whether to build that loop inside QualOps or delegate it to an external
+library, and records the design decisions made in the chosen approach.
 
-Introduce an `openai-compat` provider that runs a self-contained agentic harness using raw HTTP
-calls to any OpenAI-compatible chat completions endpoint. No agent SDK is involved. The harness
-owns the reasoning loop, tool dispatch, and context window management, making it compatible with
-any endpoint that speaks the standard wire format.
+## Options Considered
+
+### Option A — Hand-rolled harness (build internally)
+
+A self-contained TypeScript module (~270 lines) driving a `while` loop directly against the
+OpenAI chat completions wire format using `fetch`. No runtime dependency added.
+
+**Pros:**
+- Zero added dependencies; no version drift risk in a critical path
+- Full control over context window management — proactive summarisation with hard-truncate
+  fallback, tuned for the large tool outputs typical in code review workloads
+- Sequential tool dispatch is explicit, auditable, and stateful-safe
+- Deterministic error codes (`errorSubtype`) that callers match without parsing text
+- Already implemented, tested, and passing
+
+**Cons:**
+- Maintenance burden sits entirely on the QualOps team
+- Does not benefit from improvements in the wider ecosystem
+
+---
+
+### Option B — Vercel AI SDK (`ai` package, v6)
+
+`generateText` with a `stopWhen` step limit (which replaced `maxSteps` in v5) drives the tool-calling
+loop internally. Provider-agnostic via `@ai-sdk/openai-compatible` with configurable `baseURL`.
+
+**Community:** 24,700 GitHub stars · active core team · last commit Jun 2026 · v6 stable.
+**Size:** Modular packages; `ai` core ~67 kB gzipped.
+
+**Pros:**
+- Mature, well-maintained, large community
+- Built-in loop (bounded via `stopWhen`), `prepareStep` hook between turns, `onStepFinish` callback
+- OpenAI-compatible via `createOpenAICompatible({ baseURL, apiKey })`
+- Tool errors in streaming surfaced as `tool-error` parts fed back to model
+
+**Cons:**
+- **Context window management not built-in.** `pruneMessages` utility exists but the caller
+  decides strategy; no summarisation primitive. Replacing our context manager would require
+  adding a second dependency (e.g. tokenlens) and re-implementing the logic anyway.
+- Tool errors in `generateText` (non-streaming) are thrown, not fed back to the model — the
+  error-as-tool-result pattern must be re-implemented on top.
+- Web-first design bias (Next.js); CLI use is supported but is not the primary use case.
+
+---
+
+### Option C — puristajs/harness (v1.0.0)
+
+TypeScript-native harness with typed agent loop, pluggable provider adapters (OpenAI, Anthropic,
+Bedrock, Azure), and workflow primitives (approval gates, parallelisation).
+
+**Community:** 1 GitHub star · single maintainer · released May 2026 (< 5 weeks old at time of
+writing).
+
+**Pros:**
+- Designed precisely for embedded TypeScript harnesses
+- Explicit workflow primitives (approval gates, parallel agents) useful for future multi-agent work
+- Provider-agnostic with adapters for major providers
+
+**Cons:**
+- **Nascent community** — 1 star, single contributor, no visible production users.
+- Context window management strategy undocumented.
+- API stability unknown at v1.0, released < 5 weeks ago.
+- A single-maintainer project at this maturity level is an unacceptable maintenance risk.
+
+---
+
+### Option D — eggai-tech/EggAI
+
+Async-first multi-agent meta-framework using agent-to-agent message passing over Kafka channels.
+
+**Community:** 47 GitHub stars · last commit Mar 2026.
+
+**Pros:**
+- Multi-language (Python, JS, Go, etc.) via shared Kafka transport.
+- Vendor-agnostic via LiteLLM integration.
+
+**Cons:**
+- **Architectural mismatch.** Distributed Kafka message-passing model vs. an embedded
+  synchronous CLI loop. Not designed as a single-agent conversation harness.
+- No built-in conversation loop for tool calling.
+- No context window management.
+- Kafka client dependency is heavyweight for a CLI that does synchronous code review.
+
+---
+
+### Comparison
+
+| Criterion                  | A — Hand-rolled   | B — Vercel AI SDK  | C — puristajs      | D — EggAI         |
+|----------------------------|-------------------|--------------------|--------------------|-------------------|
+| TypeScript-native          | ✅                | ✅                 | ✅                 | ⚠️ unclear        |
+| Built-in agentic loop      | ✅                | ✅                 | ✅                 | ❌                |
+| OpenAI wire format         | ✅                | ✅                 | ✅                 | ✅ via LiteLLM    |
+| Context window management  | ✅ built-in       | ❌ caller-owned    | ❓ undocumented    | ❌                |
+| Error-as-tool-result       | ✅                | ⚠️ streaming only  | ❓                 | ❌                |
+| GitHub stars               | —                 | 24,700             | 1                  | 47                |
+| Zero added dependencies    | ✅                | ❌                 | ❌                 | ❌                |
+
+## Proposed decision: Option A — hand-rolled harness
+
+The only credible external alternative is the Vercel AI SDK (Option B). It has strong community
+health but lacks the one feature that makes this harness non-trivial: proactive context window
+management with summarisation. Adopting it would require adding a second dependency for token
+counting and re-implementing the context manager anyway — more moving parts, not fewer.
+
+puristajs/harness (Option C) is the closest architectural match, but at 1 GitHub star and less
+than five weeks old it carries unacceptable maintenance risk for a production tool. A dependency
+on a single-maintainer library this early in its lifecycle is hard to justify.
+
+eggai-tech/EggAI (Option D) is architecturally mismatched. Its distributed Kafka model is the
+wrong abstraction for an embedded synchronous loop.
+
+The hand-rolled approach keeps QualOps self-contained, avoids version drift in a critical path,
+and allows the context manager to be tuned specifically for code review workloads where tool
+output can be unusually large.
+
+**This decision should be revisited if:**
+- The Vercel AI SDK adds built-in context window management with summarisation support.
+- A harness library with >1k stars, multiple contributors, and built-in context management
+  reaches maturity.
+- QualOps needs multi-agent orchestration (e.g. a planner agent delegating to specialist
+  agents) that the current flat single-agent loop cannot serve.
 
 ## Architecture
 
@@ -91,46 +209,66 @@ The harness is composed of three layers:
 
 ### Adapter
 
-The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It receives
-a system prompt, a user prompt, a set of tool definitions, and configuration — model name, endpoint
-URL, API key, and a turn budget. It builds an initial message history and enters a loop.
+The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It
+implements an **imperative while loop** — the same pattern used by the OpenAI Agents SDK and
+smolagents. The alternative (a declarative graph such as LangGraph's `StateGraph`) is more
+expressive for multi-agent topologies but adds conceptual overhead that is unnecessary for
+flat single-agent reasoning.
+
+The adapter receives a system prompt, a user prompt, a set of tool definitions, and configuration
+— model name, endpoint URL, API key, and a turn budget. It builds an initial message history and
+enters the loop.
 
 On each turn it sends the current history to the endpoint and waits for a reply. If the model
 signals it is done (`finish_reason: stop`) the loop exits and the assistant's final text is
-returned. If the model requests tool calls, the adapter executes each one in sequence, appends the
-results to the history, and continues to the next turn. Execution is sequential rather than
-parallel because some tools — specifically the bash session tool — are stateful and cannot be
-interleaved safely.
+returned. If the model requests tool calls, the adapter dispatches each one, appends the results,
+and continues to the next turn.
 
 The adapter handles HTTP-level errors with exponential backoff for rate limiting (429) and server
-errors (5xx), and maps each terminal condition to a stable error subtype string that callers can
-inspect without parsing error messages. Tool errors — unknown tool names, malformed arguments,
-execution failures — are returned to the model as tool result messages rather than thrown. This
-lets the model observe the error and recover without restarting the session.
+errors (5xx), and maps each terminal condition to a stable `errorSubtype` string that callers can
+match without parsing error messages. This gives callers a machine-readable signal for every
+failure mode.
+
+Tool errors — unknown tool names, malformed arguments, execution failures — are handled using the
+**error-as-tool-result pattern**: the error is formatted as a `tool` role message and appended to
+history, letting the model observe what went wrong and recover in the next turn. This is the same
+pattern used by the OpenAI Agents SDK's `ToolErrorFormatter`. The alternative (throwing the
+error to the caller) would abort the session for what are often recoverable conditions.
 
-The adapter is stateless between `run()` calls; all session state lives in the message history
-that it builds up turn by turn. Subagent orchestration is not supported — the model reasons flat
-using the provided tools.
+The adapter is stateless between `run()` calls; all session state lives in the message history.
+Subagent orchestration is not supported — the model reasons flat using the provided tools.
 
 ### Context Manager
 
-The context manager (`ContextManager`) is called at the start of each turn, before the request is
-sent. It estimates the token count of the current history and compares it against the known context
-limit for the model.
+The context manager (`maybeSummarize`) is called at the start of each turn, before the request
+is sent. It estimates the token count of the current history and compares it against the known
+context limit for the model.
+
+**Why 60%, not 80%?** Letta/MemGPT triggers compression at 80% because it works *reactively*:
+the response has already been received and is in hand. This harness compresses *proactively*,
+before sending the next request. The remaining 40% of the window must budget for two things: the
+summarisation call response and the model's next substantive reply. A 60% trigger leaves that
+headroom for a proactive strategy; 80% would risk running out of context mid-summarisation.
+
+When compression is needed, the context manager applies **tool exchange atomicity**: the assistant
+message containing tool calls and all of its corresponding tool results are treated as an
+indivisible unit. They are summarised or dropped together, never split. (smolagents enforces the
+same principle via its `ActionStep` abstraction; Letta has a `group_id` field but enforces it
+weakly.) Splitting a tool call from its result would leave the model with an inconsistent view of
+what happened.
 
-If the history exceeds 60% of the context window, the context manager compresses the oldest tool
-exchange in the history. It does this by sending a one-shot request to the same endpoint, asking
-it to summarise what was done and found in that exchange. The summary replaces the original
-exchange in place — an assistant message with tool calls and the corresponding tool results become
-a single system message prefixed `[summarized]`.
+The **preservation contract** is explicit: the system message at index 0 and the original user
+task at index 1 are architecturally protected and are never compressed or dropped, regardless of
+context pressure. This mirrors Letta's pinned `in_context_messages[0]` and smolagents'
+immutable `SystemPromptStep`.
 
-The 60% threshold is intentional. It ensures there is always room in the remaining window for both
-the summary response itself and the model's next substantive reply. Triggering compression at a
-higher threshold risks running out of context mid-summarisation.
+Token counting uses an approximation (`estimateTokens` on `JSON.stringify(history)`) rather than
+exact per-token counting (e.g. tiktoken). This is a deliberate performance trade-off. The
+consequence of a false positive is an extra summarisation call; the consequence of a false
+negative would be a context overflow. The approximation errs toward early compression.
 
-If the summary call fails for any reason, the context manager falls back to hard truncation:
-dropping the oldest tool exchange entirely rather than replacing it with a summary. The system and
-user messages at the start of the history are always preserved.
+If the summary call fails, the context manager falls back to hard truncation: dropping the oldest
+tool exchange entirely. System and user messages are always preserved.
 
 ### Tool Dispatch
 
@@ -139,6 +277,13 @@ JSON Schema (draft-7 target) using the existing `schemaToJsonSchema` utility. Dr
 over draft-2020-12 because some vendor endpoints have narrow JSON Schema parsers that reject the
 newer `$schema` markers.
 
+Tool calls are dispatched **sequentially**. This is not an arbitrary constraint — LangGraph,
+the OpenAI Agents SDK, and smolagents all default to sequential execution for the same reason:
+stateful tools cannot be safely interleaved. The bash session tool maintains shell state between
+calls (working directory, environment variables, running processes). Parallel dispatch would
+produce non-deterministic results. LangGraph's `Send` API enables parallel execution, but only
+for tools that are stateless and idempotent — a prerequisite that does not hold here.
+
 ## Configuration
 
 The `openai-compat` provider is configured inside the standard `ai.reviewStage` block:
@@ -160,17 +305,30 @@ The `openai-compat` provider is configured inside the standard `ai.reviewStage`
 
 `baseURL` identifies the chat completions base URL. It falls back to the `OPENAI_BASE_URL`
 environment variable, then to the standard OpenAI endpoint. `apiKeyEnvVar` is the name of the
-environment variable that holds the API key; it falls back to `OPENAI_API_KEY`, and if neither is
-set the key is empty — which is valid for local endpoints like Ollama that require no
-authentication.
+environment variable that holds the API key; it falls back to `OPENAI_API_KEY`, and if neither
+is set the key is empty — valid for local endpoints like Ollama that require no authentication.
 
-Structured output capability is derived automatically from the model name via the existing litellm
-capability catalog. No manual configuration is required.
+Structured output capability is derived automatically from the model name via the existing
+litellm capability catalog. No manual configuration is required.
 
 ## What This Does Not Do (V1)
 
 - **Subagent orchestration.** The model reasons flat using the tools it is given. Multi-agent
-  handoff requires SDK-level orchestration that is out of scope here.
+  delegation (e.g. a planner handing off to specialist agents) requires a graph-based
+  orchestration layer such as LangGraph's `StateGraph` and is out of scope here.
+
 - **Budget enforcement.** `maxBudgetUsd` is accepted and passed through but not enforced.
-- **Streaming.** Responses are collected in full before processing.
-- **Parallel tool execution.** Tool calls are always sequential.
+
+- **Streaming.** Responses are collected in full before processing. Streaming would require
+  assembling partial `tool_calls` deltas before dispatch and is a non-trivial addition.
+
+- **Parallel tool execution.** Tool calls are always sequential. LangGraph's `Send` API is the
+  industry model for parallel dispatch, but it requires all tools to be stateless and idempotent.
+  The bash session tool is stateful, so this prerequisite cannot be met without architectural
+  changes to the tool layer.
+
+- **Reactive overflow recovery.** If a single tool result exceeds the remaining context budget
+  after proactive compression (e.g. a file read returning a megabyte of output), the endpoint
+  returns an HTTP 400. The harness surfaces this as an unhandled error. Proactive compression
+  reduces the probability but does not eliminate it; a per-tool result size limit would be the
+  correct fix.

From 8cf9da6368a624af745143181ec41334bfc4522b Mon Sep 17 00:00:00 2001
From: Valdis Pornieks <pornieks@gmail.com>
Date: Tue, 9 Jun 2026 16:47:13 +0300
Subject: [PATCH 3/4] chore: add option to create our own harness package

---
 ...4-openai-compat-adapter-with-agent-loop.md | 85 +++++++++++--------
 1 file changed, 51 insertions(+), 34 deletions(-)

diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
index 9754ae7..62c15a6 100644
--- a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
+++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
@@ -104,42 +104,59 @@ Async-first multi-agent meta framework using agent-to-agent message passing over
 
 ---
 
+### Option E — Extract to `@eggai/harness` (new eggai-tech repo)
+
+Create a standalone npm package under the `eggai-tech` GitHub organisation covering the same
+scope as the hand-rolled implementation: agentic loop, context manager, tool dispatch. QualOps
+would become a consumer of that package rather than owning the code directly.
+
+**Pros:**
+- Reusable across other eggai-tech projects and by the community
+- Forces a clean, well-documented public API boundary between harness concerns and
+  QualOps-specific concerns (code review prompts, tool definitions, skip patterns)
+- Community contributions improve the harness without QualOps being the sole maintainer
+- Validates the design against other use cases, surfacing hidden QualOps-specific assumptions
+
+**Cons:**
+- Requires creating and maintaining a new open-source repo: CI, releases, semver, changelog,
+  documentation, issue triage
+- QualOps acquires a runtime dependency on a package the same team owns — version drift is
+  self-inflicted but still real (breaking changes require coordinated releases)
+- The harness design needs to stabilise first; extracting too early locks in an API that may
+  still need to change as QualOps workloads reveal new requirements
+- Until a second eggai-tech project needs the same harness, the overhead of a separate repo
+  is not justified by reuse benefit
+
+---
+
 ### Comparison
 
-| Criterion                  | A — Hand-rolled   | B — Vercel AI SDK  | C — puristajs      | D — EggAI         |
-|----------------------------|-------------------|--------------------|--------------------|-------------------|
-| TypeScript-native          | ✅                | ✅                 | ✅                 | ⚠️ unclear        |
-| Built-in agentic loop      | ✅                | ✅                 | ✅                 | ❌                |
-| OpenAI wire format         | ✅                | ✅                 | ✅                 | ✅ via LiteLLM    |
-| Context window management  | ✅ built-in       | ❌ caller-owned    | ❓ undocumented    | ❌                |
-| Error-as-tool-result       | ✅                | ⚠️ streaming only  | ❓                 | ❌                |
-| GitHub stars               | —                 | 24,700             | 1                  | 47                |
-| Zero added dependencies    | ✅                | ❌                 | ❌                 | ❌                |
-
-## Proposed decision: Option A — hand-rolled harness
-
-The only credible external alternative is the Vercel AI SDK (Option B). It has strong community
-health but lacks the one feature that makes this harness non-trivial: proactive context window
-management with summarisation. Adopting it would require adding a second dependency for token
-counting and re-implementing the context manager anyway — more moving parts, not fewer.
-
-puristajs/harness (Option C) is the closest architectural match, but at 1 GitHub star and less
-than five weeks old it carries unacceptable maintenance risk for a production tool. A dependency
-on a single-maintainer library this early in its lifecycle is hard to justify.
-
-eggai-tech/EggAI (Option D) is architecturally mismatched. Its distributed Kafka model is the
-wrong abstraction for an embedded synchronous loop.
-
-The hand-rolled approach keeps QualOps self-contained, avoids version drift in a critical path,
-and allows the context manager to be tuned specifically for code review workloads where tool
-output can be unusually large.
-
-**This decision should be revisited if:**
-- The Vercel AI SDK adds built-in context window management with summarisation support.
-- A harness library with >1k stars, multiple contributors, and built-in context management
-  reaches maturity.
-- QualOps needs multi-agent orchestration (e.g. a planner agent delegating to specialist
-  agents) that the current flat single-agent loop cannot serve.
+| Criterion                  | A — Hand-rolled   | B — Vercel AI SDK  | C — puristajs      | D — EggAI          | E — @eggai/harness  |
+|----------------------------|-------------------|--------------------|--------------------|--------------------|---------------------|
+| TypeScript-native          | ✅                | ✅                 | ✅                 | ⚠️ unclear         | ✅                  |
+| Built-in agentic loop      | ✅                | ✅                 | ✅                 | ❌                 | ✅ (planned)        |
+| OpenAI wire format         | ✅                | ✅                 | ✅                 | ✅ via LiteLLM     | ✅ (planned)        |
+| Context window management  | ✅ built-in       | ❌ caller-owned    | ❓ undocumented    | ❌                 | ✅ (planned)        |
+| Error-as-tool-result       | ✅                | ⚠️ streaming only  | ❓                 | ❌                 | ✅ (planned)        |
+| GitHub stars               | —                 | 24,700             | 1                  | 47                 | — (new)             |
+| Zero added dependencies    | ✅                | ❌                 | ❌                 | ❌                 | ❌                  |
+| Reusable outside QualOps   | ❌                | ✅                 | ✅                 | ✅                 | ✅                  |
+
+## Proposed decision: Option A — build inside QualOps
+
+Option A is the recommended starting point. Build the harness inside QualOps, let the design
+stabilise against real code review workloads, then revisit once there is evidence that the
+approach should change.
+
+**Revisit when:**
+- A second eggai-tech project needs an agentic loop — Option E (`@eggai/harness`) becomes
+  viable and the reuse benefit justifies a separate repo
+- The Vercel AI SDK (Option B) adds built-in summarisation-based context management —
+  the main gap that currently rules it out
+- A community harness library with >1k stars and multi-contributor context management
+  reaches maturity — revisit the external dependency options
+- QualOps needs multi-agent orchestration (e.g. a planner delegating to specialist agents)
+  that the current flat loop cannot serve — revisit graph-based options or Option E
 
 ## Architecture
 

From 8bf62fe6a26340ac96982696b18932edf08a48b0 Mon Sep 17 00:00:00 2001
From: Valdis Pornieks <pornieks@gmail.com>
Date: Wed, 10 Jun 2026 15:48:20 +0300
Subject: [PATCH 4/4] chore: add additional considerations for integrating with
 configurable-agent

---
 ...4-openai-compat-adapter-with-agent-loop.md | 384 ++++++++++--------
 1 file changed, 218 insertions(+), 166 deletions(-)

diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
index 62c15a6..06b4fd2 100644
--- a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
+++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md
@@ -85,78 +85,152 @@ writing).
 
 ---
 
-### Option D — eggai-tech/EggAI
+### Option D — configurable-agent (eggai-tech in-house harness)
 
-Async-first multi-agent meta framework using agent-to-agent message passing over Kafka channels.
+A standalone TypeScript service (Node 22, ESM) built and maintained by the eggai-tech team,
+already in production use for other agents in the organisation. It runs an agentic for-loop
+(`runAgent()`) built on the **Vercel AI SDK (`ai` v5)** with:
+- Streaming each step via `streamText()`
+- `ToolSet` injection via `RunAgentOptions.tools` — QualOps passes its own tools without MCP
+- Structured `AgentEvent` callbacks (`tool_call`, `tool_result`, `content_delta`, `final`, `error`)
+- Context compaction (LLM-based summarisation at 100k tokens) and tool output summarisation (>4k tokens)
+- OpenAI-compatible endpoints via `@ai-sdk/openai-compatible` (currently only through the `ollama` provider, with a configurable `baseUrl`)
 
-**Community:** 47 GitHub stars · last commit Mar 2026.
+**Current state:** `runAgent()` exists and is tested, but is not yet exposed as a public library
+entry point. A one-time extraction is needed before QualOps can depend on it.
 
 **Pros:**
-- Multi-language (Python, JS, Go, etc.) via shared Kafka transport.
-- Vendor-agnostic via LiteLLM integration.
+- Loop, context compaction, and tool output summarisation already built and tested in production
+- Vercel AI SDK foundation — OpenAI-compatible endpoints supported natively
+- `ToolSet` injection path already exists — no MCP server required
+- `AgentEvent` emitter pattern is clean and observable
+- Maintained by the same organisation — no external dependency risk, aligned direction
+- Removes ~300 lines of hand-rolled loop from QualOps
 
 **Cons:**
-- **Architectural mismatch.** Distributed Kafka message-passing model vs. an embedded
-  synchronous CLI loop. Not designed as a single-agent conversation harness.
-- No built-in conversation loop for tool calling.
-- No context window management.
-- Kafka client dependency is heavyweight for a CLI that does synchronous code review.
+- `runAgent()` is not yet a public library API — requires a one-time library entry point extraction in the configurable-agent repo
+- QualOps tools (Zod schema + execute function) must be adapted to Vercel AI SDK `ToolSet` shape
+- `AgentConfig` is currently YAML/file-driven — QualOps must construct it programmatically
+- Streaming-first design: QualOps must accumulate `content_delta` events until `final` to get the output string
+- Replaces our tested context manager with configurable-agent's compaction (different threshold: 100k tokens vs 60% proactive — needs validation)
+- `provider` enum covers `anthropic | openai | google | ollama` — `openai-compatible` may need to be added or `ollama` repurposed as the generic path
 
 ---
 
-### Option E — Extract to `@eggai/harness` (new eggai-tech repo)
+### Comparison
 
-Create a standalone npm package under the `eggai-tech` GitHub organisation covering the same
-scope as the hand-rolled implementation: agentic loop, context manager, tool dispatch. QualOps
-would become a consumer of that package rather than owning the code directly.
+| Criterion                  | A — Hand-rolled   | B — Vercel AI SDK  | C — puristajs      | D — configurable-agent |
+|----------------------------|-------------------|--------------------|--------------------|-----------------------|
+| TypeScript-native          | ✅                | ✅                 | ✅                 | ✅                    |
+| Built-in agentic loop      | ✅                | ✅                 | ✅                 | ✅                    |
+| OpenAI wire format         | ✅                | ✅                 | ✅                 | ✅ via Vercel AI SDK  |
+| Context window management  | ✅ built-in       | ❌ caller-owned    | ❓ undocumented    | ✅ built-in           |
+| Error-as-tool-result       | ✅                | ⚠️ streaming only  | ❓                 | ✅ via AgentEvent     |
+| GitHub stars               | —                 | 24,700             | 1                  | — (internal)          |
+| Zero added dependencies    | ✅                | ❌                 | ❌                 | ❌                    |
+| Reusable outside QualOps   | ❌                | ✅                 | ✅                 | ✅ (by design)        |
+| Same-org ownership         | ✅                | ❌                 | ❌                 | ✅                    |
+
+## Decision: Option D — configurable-agent
+
+Option D is the recommended approach. The configurable-agent harness is built by the same
+eggai-tech team, already production-tested, and removes the maintenance burden of a hand-rolled
+loop from QualOps. The `tools` injection path (`RunAgentOptions.tools`) means QualOps does not
+need to run MCP servers — it constructs its own `ToolSet` at call time. The Vercel AI SDK
+already handles OpenAI-compatible endpoints, satisfying the core requirement.
+
+Beyond QualOps, standardising on configurable-agent's `AgentEvent` API creates shared
+infrastructure for the wider eggai-tech agent portfolio: the structured event stream is the
+natural integration point for external observability tools (Langfuse), and a config-driven
+`AgentConfig` supports low-code deployment patterns that reduce the audit surface clients
+must review before approving production use. These benefits compound as more agents in the
+organisation adopt the same harness.
+
+## Technical tasks
+
+### Phase 1 — Prerequisite refactors
+
+These changes should be made before any integration work begins. They make both repositories
+cleaner independently of each other and remove the need for workarounds in the adapter.
+
+#### configurable-agent
+
+**1. Add `openai-compatible` as a named provider**
+
+`model.ts` currently handles OpenAI-compatible endpoints only via the `ollama` case, which calls
+`createOpenAICompatible({ name: 'ollama', baseURL })` without passing an API key. This silently
+breaks any authenticated endpoint (Mistral, Groq, DeepSeek, etc.).
+
+Add `openai-compatible` to the `ModelProvider` enum, add `apiKey?: string` to the `model` object
+in `AgentConfigSchema`, and add a corresponding case in `buildModel`:
+
+```typescript
+case 'openai-compatible': {
+  const baseURL = cfg.baseUrl ?? process.env.OPENAI_BASE_URL ?? 'https://api.openai.com/v1';
+  const apiKey  = cfg.apiKey  ?? process.env.OPENAI_API_KEY  ?? '';
+  const compat  = createOpenAICompatible({ name: 'openai-compatible', baseURL, apiKey });
+  return compat(cfg.name);
+}
+```
 
-**Pros:**
-- Reusable across other eggai-tech projects and by the community
-- Forces a clean, well-documented public API boundary between harness concerns and
-  QualOps-specific concerns (code review prompts, tool definitions, skip patterns)
-- Community contributions improve the harness without QualOps being the sole maintainer
-- Validates the design against other use cases, surfacing hidden QualOps-specific assumptions
+**Why not reuse `ollama`?** Injecting a Mistral API key via `OLLAMA_BASE_URL` / `OLLAMA_API_KEY`
+is misleading and fragile. A named provider is clearer and avoids confusion in logs and telemetry.
 
-**Cons:**
-- Requires creating and maintaining a new open-source repo: CI, releases, semver, changelog,
-  documentation, issue triage
-- QualOps acquires a runtime dependency on a package the same team owns — version drift is
-  self-inflicted but still real (breaking changes require coordinated releases)
-- The harness design needs to stabilise first; extracting too early locks in an API that may
-  still need to change as QualOps workloads reveal new requirements
-- Until a second eggai-tech project needs the same harness, the overhead of a separate repo
-  is not justified by reuse benefit
+---
+
+**2. Expose `runAgent` and types as a library entry point**
+
+`runAgent`, `AgentConfig`, `AgentEmitter`, `AgentEvent`, and `RunAgentOptions` are defined and
+tested but not reachable from outside the package. Add a library entry point in `package.json`:
+
+```json
+{
+  "exports": {
+    ".": "./dist/index.js",
+    "./lib": "./dist/lib/index.js"
+  }
+}
+```
+
+```typescript
+// lib/index.ts
+export { runAgent, prepareMessages } from './agent/loop.js';
+export type { RunAgentOptions }      from './agent/loop.js';
+export type { AgentConfig }          from './config/schema.js';
+export type { AgentEmitter, AgentEvent, ToolResult } from './agent/events.js';
+```
+
+#### QualOps
+
+**3. Remove `OpenAICompatAdapter` and `context-manager.ts`**
+
+The hand-rolled agentic adapter (`src/stages/review/agentic/adapters/openai-compat-adapter.ts`)
+and context manager (`src/stages/review/agentic/adapters/context-manager.ts`) are replaced by
+the configurable-agent integration. Removing them before writing the new adapter avoids confusion
+about which loop is active and eliminates dead code in the test suite.
 
 ---
 
-### Comparison
+### Phase 2 — Integration
+
+Once the Phase 1 refactors are merged in both repos:
+
+1. **Add dependency** on configurable-agent (local path dep initially, then published to npm).
 
-| Criterion                  | A — Hand-rolled   | B — Vercel AI SDK  | C — puristajs      | D — EggAI          | E — @eggai/harness  |
-|----------------------------|-------------------|--------------------|--------------------|--------------------|---------------------|
-| TypeScript-native          | ✅                | ✅                 | ✅                 | ⚠️ unclear         | ✅                  |
-| Built-in agentic loop      | ✅                | ✅                 | ✅                 | ❌                 | ✅ (planned)        |
-| OpenAI wire format         | ✅                | ✅                 | ✅                 | ✅ via LiteLLM     | ✅ (planned)        |
-| Context window management  | ✅ built-in       | ❌ caller-owned    | ❓ undocumented    | ❌                 | ✅ (planned)        |
-| Error-as-tool-result       | ✅                | ⚠️ streaming only  | ❓                 | ❌                 | ✅ (planned)        |
-| GitHub stars               | —                 | 24,700             | 1                  | 47                 | — (new)             |
-| Zero added dependencies    | ✅                | ❌                 | ❌                 | ❌                 | ❌                  |
-| Reusable outside QualOps   | ❌                | ✅                 | ✅                 | ✅                 | ✅                  |
-
-## Proposed decision: Option A — build inside QualOps
-
-Option A is the recommended starting point. Build the harness inside QualOps, let the design
-stabilise against real code review workloads, then revisit once there is evidence that the
-approach should change.
-
-**Revisit when:**
-- A second eggai-tech project needs an agentic loop — Option E (`@eggai/harness`) becomes
-  viable and the reuse benefit justifies a separate repo
-- The Vercel AI SDK (Option B) adds built-in summarisation-based context management —
-  the main gap that currently rules it out
-- A community harness library with >1k stars and multi-contributor context management
-  reaches maturity — revisit the external dependency options
-- QualOps needs multi-agent orchestration (e.g. a planner delegating to specialist agents)
-  that the current flat loop cannot serve — revisit graph-based options or Option E
+2. **New adapter** `ConfigurableAgentAdapter` implementing `AgentAdapter`:
+   - Constructs `AgentConfig` programmatically from `AgentAdapterParams`
+     (systemPrompt, model, maxTurns → maxSteps, maxOutputTokens, baseUrl, apiKey)
+   - Converts QualOps `ToolDefinition[]` to Vercel AI SDK `ToolSet` format
+   - Calls `runAgent(config, [userMessage], emitter, undefined, { tools })`
+   - Accumulates `content_delta` events into an output string; maps `error` event codes to
+     `errorSubtype` values; extracts token usage from the `final` event
+
+3. **Register adapter** — wire `ConfigurableAgentAdapter` in
+   `src/stages/review/agentic/adapters/index.ts` for both `anthropic` and `openai-compatible`
+   providers (replacing `AnthropicAdapter`, `OpenAIAdapter`, and `OpenAICompatAdapter`).
+
+The new adapter itself is purely additive; the only changes to existing code are the Phase 1
+removals and the registration swap in step 3.
+
+---
 
 ## Architecture
 
@@ -166,55 +240,58 @@ approach should change.
          │  run(params)
          ▼
 ┌─────────────────────────────────────────────────────────────────┐
-│  OpenAICompatAdapter                                            │
+│  ConfigurableAgentAdapter (QualOps)                             │
+│                                                                 │
+│  · Constructs AgentConfig from AgentAdapterParams               │
+│  · Converts ToolDefinition[] → Vercel AI SDK ToolSet            │
+│  · Accumulates AgentEvent stream into AgentAdapterResult        │
+└──────────────────────────┬──────────────────────────────────────┘
+                           │  runAgent(config, messages, emit, options)
+                           ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  configurable-agent: runAgent()                                 │
 │                                                                 │
 │  ┌──────────────────────────────────────────────────────────┐   │
-│  │  Session state: ChatMessage[]                            │   │
+│  │  Message history: CoreMessage[]                          │   │
 │  │  [ system ] [ user ] [ assistant ] [ tool ] [ ... ]      │   │
 │  └───────────────────────┬──────────────────────────────────┘   │
 │                          │                                      │
-│          ┌───── turn loop (1..maxTurns) ──────────────┐         │
+│          ┌───── step loop (1..maxSteps) ──────────────┐         │
 │          │                                            │         │
 │          │  ┌─────────────────────────┐               │         │
-│          │  │  ContextManager         │               │         │
-│          │  │  maybeSummarize()       │               │         │
-│          │  │  · estimate tokens      │               │         │
-│          │  │  · if > 60% window:     │               │         │
-│          │  │    summarise oldest     │               │         │
-│          │  │    exchange → [summary] │               │         │
-│          │  │    fallback: truncate   │               │         │
+│          │  │  Context Compaction     │               │         │
+│          │  │  · trigger at 100k tok  │               │         │
+│          │  │  · LLM-based summary    │               │         │
+│          │  │  · tool output >4k tok  │               │         │
+│          │  │    summarised inline    │               │         │
 │          │  └────────────┬────────────┘               │         │
-│          │               │ compressed history         │         │
+│          │               │ compacted history          │         │
 │          │               ▼                            │         │
 │          │  ┌─────────────────────────┐               │         │
-│          │  │  fetchWithRetry()       │               │         │
-│          │  │  POST /chat/completions │◄──────────────┼────────►│ LLM endpoint
-│          │  │  · 429/5xx: backoff×3   │               │         │
-│          │  │  · 401: immediate fail  │               │         │
+│          │  │  streamText()           │               │         │
+│          │  │  (Vercel AI SDK)        │◄──────────────┼────────►│ LLM endpoint
+│          │  │  · @ai-sdk/openai-      │               │         │
+│          │  │    compatible           │               │         │
 │          │  └────────────┬────────────┘               │         │
-│          │               │ ChatCompletionResponse     │         │
+│          │               │ StreamTextResult           │         │
 │          │               ▼                            │         │
-│          │       finish_reason?                       │         │
-│          │       ├─ stop ──────────────────► return output      │
-│          │       ├─ length ────────────────► error_max_tokens   │
-│          │       ├─ content_filter ────────► error_content_filter
-│          │       └─ tool_calls                        │         │
+│          │       stopReason?                          │         │
+│          │       ├─ stop ──── emit final ──────────── ┼── ✓     │
+│          │       ├─ length ── emit error(code) ─────── ┼── ✗    │
+│          │       └─ tool-calls                        │         │
 │          │               │                            │         │
 │          │               ▼                            │         │
 │          │  ┌─────────────────────────┐               │         │
 │          │  │  Tool Dispatch          │               │         │
 │          │  │  (sequential)           │               │         │
-│          │  │  · resolve by name      │               │         │
-│          │  │  · parse JSON args      │               │         │
-│          │  │  · execute handler      │               │         │
-│          │  │  · append tool result   │               │         │
+│          │  │  · execute ToolSet fn   │               │         │
+│          │  │  · emit tool_call       │               │         │
+│          │  │  · emit tool_result     │               │         │
 │          │  └─────────────────────────┘               │         │
 │          │                                            │         │
 │          └────────────────────────────────────────────┘         │
 │                          │                                      │
-│               maxTurns exceeded → error_max_turns               │
-│                                                                 │
-│  finally: toolSet.dispose()                                     │
+│               maxSteps exceeded → emit error(max_steps)         │
 └─────────────────────────────────────────────────────────────────┘
          │
          │  AgentAdapterResult { output, inputTokens, outputTokens, errorSubtype? }
@@ -222,94 +299,63 @@ approach should change.
   Caller (AgenticExecutor)
 ```
 
-The harness is composed of three layers:
-
-### Adapter
-
-The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It
-implements an **imperative while loop** — the same pattern used by the OpenAI Agents SDK and
-smolagents. The alternative (a declarative graph such as LangGraph's `StateGraph`) is more
-expressive for multi-agent topologies but adds conceptual overhead that is unnecessary for
-flat single-agent reasoning.
-
-The adapter receives a system prompt, a user prompt, a set of tool definitions, and configuration
-— model name, endpoint URL, API key, and a turn budget. It builds an initial message history and
-enters the loop.
-
-On each turn it sends the current history to the endpoint and waits for a reply. If the model
-signals it is done (`finish_reason: stop`) the loop exits and the assistant's final text is
-returned. If the model requests tool calls, the adapter dispatches each one, appends the results,
-and continues to the next turn.
-
-The adapter handles HTTP-level errors with exponential backoff for rate limiting (429) and server
-errors (5xx), and maps each terminal condition to a stable `errorSubtype` string that callers can
-match without parsing error messages. This gives callers a machine-readable signal for every
-failure mode.
-
-Tool errors — unknown tool names, malformed arguments, execution failures — are handled using the
-**error-as-tool-result pattern**: the error is formatted as a `tool` role message and appended to
-history, letting the model observe what went wrong and recover in the next turn. This is the same
-pattern used by the OpenAI Agents SDK's `ToolErrorFormatter`. The alternative (throwing the
-error to the caller) would abort the session for what are often recoverable conditions.
-
-The adapter is stateless between `run()` calls; all session state lives in the message history.
-Subagent orchestration is not supported — the model reasons flat using the provided tools.
-
-### Context Manager
-
-The context manager (`maybeSummarize`) is called at the start of each turn, before the request
-is sent. It estimates the token count of the current history and compares it against the known
-context limit for the model.
-
-**Why 60%, not 80%?** Letta/MemGPT triggers compression at 80% because it works *reactively*:
-the response has already been received and is in hand. This harness compresses *proactively*,
-before sending the next request. The remaining 40% of the window must budget for two things: the
-summarisation call response and the model's next substantive reply. 60% is the correct trigger
-for a proactive strategy; 80% would risk running out of context mid-summarisation.
-
-When compression is needed, the context manager applies **tool exchange atomicity**: the assistant
-message containing tool calls and all of its corresponding tool results are treated as an
-indivisible unit. They are summarised or dropped together, never split. (smolagents enforces the
-same principle via its `ActionStep` abstraction; Letta has a `group_id` field but enforces it
-weakly.) Splitting a tool call from its result would leave the model with an inconsistent view of
-what happened.
-
-The **preservation contract** is explicit: the system message at index 0 and the original user
-task at index 1 are architecturally protected and are never compressed or dropped, regardless of
-context pressure. This mirrors Letta's pinned `in_context_messages[0]` and smolagents'
-immutable `SystemPromptStep`.
-
-Token counting uses an approximation (`estimateTokens` on `JSON.stringify(history)`) rather than
-exact per-token counting (e.g. tiktoken). This is a deliberate performance trade-off. The
-consequence of a false positive is an extra summarisation call; the consequence of a false
-negative would be a context overflow. The approximation errs toward early compression.
-
-If the summary call fails, the context manager falls back to hard truncation: dropping the oldest
-tool exchange entirely. System and user messages are always preserved.
-
-### Tool Dispatch
-
-Tool definitions are Zod schemas. Before the first request, the adapter converts each schema to
-JSON Schema (draft-7 target) using the existing `schemaToJsonSchema` utility. Draft-7 is chosen
-over draft-2020-12 because some vendor endpoints have narrow JSON Schema parsers that reject the
-newer `$schema` markers.
-
-Tool calls are dispatched **sequentially**. This is not an arbitrary constraint — LangGraph,
-the OpenAI Agents SDK, and smolagents all default to sequential execution for the same reason:
-stateful tools cannot be safely interleaved. The bash session tool maintains shell state between
-calls (working directory, environment variables, running processes). Parallel dispatch would
-produce non-deterministic results. LangGraph's `Send` API enables parallel execution, but only
-for tools that are stateless and idempotent — a prerequisite that does not hold here.
+The integration is composed of two layers:
+
+### QualOps Adapter
+
+The QualOps adapter (`ConfigurableAgentAdapter`) bridges `AgentAdapterParams` and
+`configurable-agent`'s `runAgent()`. It is responsible for:
+
+1. **Config construction** — building an `AgentConfig` programmatically from `AgentAdapterParams`
+   (model name, endpoint URL, API key, maxSteps, maxOutputTokens) rather than loading a YAML file.
+
+2. **Tool conversion** — mapping QualOps `ToolDefinition[]` (Zod schema + execute function) to
+   Vercel AI SDK `ToolSet` format (`{ description, inputSchema: z.ZodType, execute }`; `ai` v5
+   names the schema field `inputSchema` rather than `parameters`). This is
+   the same shape QualOps tools already have; the conversion is mechanical.
+
+3. **Event accumulation** — subscribing to `AgentEmitter` callbacks and:
+   - Concatenating `content_delta` events into an output string
+   - Mapping `error` event codes to `errorSubtype` values that callers can match
+   - Extracting token usage from the `final` event to populate `AgentAdapterResult`
+
+The adapter is stateless between `run()` calls. All session state lives inside `runAgent()`.
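+
+The event accumulation in step 3 could look roughly like the sketch below. Only the event
+type names (`content_delta`, `tool_call`, `tool_result`, `final`, `error`) come from this TDR;
+the payload fields, the `createAccumulator` helper, the `length` error code, and the
+`error_unknown` fallback are assumptions made for illustration and may not match
+configurable-agent's real types.
+
+```typescript
+// Hypothetical event payloads: only the `type` names are taken from the TDR.
+type AgentEvent =
+  | { type: "content_delta"; text: string }
+  | { type: "tool_call"; name: string }
+  | { type: "tool_result"; name: string }
+  | { type: "final"; usage: { inputTokens: number; outputTokens: number } }
+  | { type: "error"; code: string };
+
+interface AgentAdapterResult {
+  output: string;
+  inputTokens: number;
+  outputTokens: number;
+  errorSubtype?: string;
+}
+
+// Assumed mapping from harness error codes to QualOps errorSubtype strings.
+const ERROR_SUBTYPES: Record<string, string> = {
+  max_steps: "error_max_turns",
+  length: "error_max_tokens",
+};
+
+// Returns an emitter callback plus the result object it fills in.
+function createAccumulator() {
+  const result: AgentAdapterResult = { output: "", inputTokens: 0, outputTokens: 0 };
+  return {
+    emit(event: AgentEvent): void {
+      switch (event.type) {
+        case "content_delta":
+          result.output += event.text; // concatenate streamed text
+          break;
+        case "final":
+          result.inputTokens = event.usage.inputTokens;
+          result.outputTokens = event.usage.outputTokens;
+          break;
+        case "error":
+          result.errorSubtype = ERROR_SUBTYPES[event.code] ?? "error_unknown";
+          break;
+        default:
+          break; // tool_call and tool_result do not change the result
+      }
+    },
+    result,
+  };
+}
+```
+
+Keeping the accumulator as a plain closure means each `run()` call can create a fresh one,
+which is consistent with the adapter holding no state between calls.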
+
+### configurable-agent Loop
+
+`runAgent()` implements an **imperative for-loop** over `streamText()` steps — the same pattern
+used by the OpenAI Agents SDK and smolagents. The alternative (a declarative graph such as
+LangGraph's `StateGraph`) is more expressive for multi-agent topologies but adds conceptual
+overhead that is unnecessary for flat single-agent reasoning.
+
+**Context compaction** triggers at 100k tokens (reactive, after receiving a response). When
+the threshold is exceeded, older messages are summarised by an LLM call and replaced with a
+`[COMPACTED CONTEXT]` system message; the 6 most recent messages are always kept verbatim.
+Both thresholds are hardcoded in `ConfigurableAgentAdapter` and are not currently surfaced
+as `.qualopsrc.json` configuration.
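+
+A minimal sketch of the compaction step described above, under stated assumptions: the
+`Message` type, the `compactIfNeeded` and `estimateHistoryTokens` names, and the
+four-characters-per-token estimate are all hypothetical, and the summariser is injected as a
+plain function rather than a real LLM call. The sketch also ignores tool-exchange boundaries,
+which a real implementation would need to respect.
+
+```typescript
+type Role = "system" | "user" | "assistant" | "tool";
+interface Message {
+  role: Role;
+  content: string;
+}
+
+const COMPACTION_THRESHOLD_TOKENS = 100_000; // trigger from the TDR
+const KEEP_RECENT = 6; // most recent messages kept verbatim
+
+// Assumption: roughly four characters per token.
+function estimateHistoryTokens(messages: Message[]): number {
+  return Math.ceil(JSON.stringify(messages).length / 4);
+}
+
+async function compactIfNeeded(
+  messages: Message[],
+  summarise: (older: Message[]) => Promise<string>,
+): Promise<Message[]> {
+  // Below the threshold, or nothing older than the retained tail: leave untouched.
+  if (estimateHistoryTokens(messages) <= COMPACTION_THRESHOLD_TOKENS) return messages;
+  if (messages.length <= KEEP_RECENT) return messages;
+
+  const older = messages.slice(0, -KEEP_RECENT);
+  const recent = messages.slice(-KEEP_RECENT);
+  const summary = await summarise(older);
+
+  // Replace everything older with a single compacted system message.
+  return [{ role: "system", content: `[COMPACTED CONTEXT]\n${summary}` }, ...recent];
+}
+```
+
+Injecting `summarise` keeps the sketch testable without a model endpoint; in practice it
+would wrap an LLM call.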
+
+**Tool output truncation** fires before compaction: any tool result exceeding 4k tokens is
+trimmed to head 500 + tail 500 characters inline, keeping history manageable before the
+compaction threshold is ever reached.
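+
+The head-plus-tail trimming above might be sketched as follows. The constants mirror the
+numbers in this section; the `truncateToolResult` and `estimateTokens` names, the
+four-characters-per-token estimate, and the wording of the omission marker are assumptions
+for illustration only.
+
+```typescript
+const MAX_TOOL_RESULT_TOKENS = 4_000;
+const HEAD_CHARS = 500;
+const TAIL_CHARS = 500;
+
+// Assumption: roughly four characters per token.
+function estimateTokens(text: string): number {
+  return Math.ceil(text.length / 4);
+}
+
+function truncateToolResult(text: string): string {
+  // Results at or under the limit pass through unchanged.
+  if (estimateTokens(text) <= MAX_TOOL_RESULT_TOKENS) return text;
+
+  const omitted = text.length - HEAD_CHARS - TAIL_CHARS;
+  const head = text.slice(0, HEAD_CHARS);
+  const tail = text.slice(-TAIL_CHARS);
+  // Keep both ends and tell the model how much was dropped in between.
+  return `${head}\n[... ${omitted} characters omitted ...]\n${tail}`;
+}
+```
+
+Keeping the tail as well as the head matters for command output, where the exit status or
+final error message usually appears at the end.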
+
+**Tool exchange atomicity** is maintained: the Vercel AI SDK treats each step's tool calls and
+results as a unit.
+
+**Tool dispatch is sequential**. This is not an arbitrary constraint — LangGraph, the OpenAI
+Agents SDK, and smolagents all default to sequential execution for the same reason: stateful tools
+cannot be safely interleaved. The bash session tool maintains shell state between calls (working
+directory, environment variables, running processes). Parallel dispatch would produce
+non-deterministic results.
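+
+To illustrate why ordering matters, here is a small sketch of sequential dispatch. The
+`ToolCall`, `ToolResult`, and `dispatchSequentially` names are hypothetical, and returning
+failures as results instead of throwing is an assumption about how errors would reach the
+model, not a description of configurable-agent's internals.
+
+```typescript
+interface ToolCall {
+  id: string;
+  name: string;
+  input: string;
+}
+interface ToolResult {
+  id: string;
+  output: string;
+  isError: boolean;
+}
+type ToolFn = (input: string) => Promise<string>;
+
+async function dispatchSequentially(
+  calls: ToolCall[],
+  tools: Record<string, ToolFn>,
+): Promise<ToolResult[]> {
+  const results: ToolResult[] = [];
+  // Await each call before starting the next, so stateful tools see a
+  // consistent order of effects.
+  for (const call of calls) {
+    const fn = tools[call.name];
+    try {
+      if (!fn) throw new Error(`unknown tool: ${call.name}`);
+      results.push({ id: call.id, output: await fn(call.input), isError: false });
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      results.push({ id: call.id, output: `Error: ${message}`, isError: true });
+    }
+  }
+  return results;
+}
+```
+
+With `Promise.all` in place of the loop, a `cd` followed by a `pwd` against a shared shell
+session could run in either order, which is the non-determinism described above.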
 
 ## Configuration
 
-The `openai-compat` provider is configured inside the standard `ai.reviewStage` block:
+The `openai-compatible` provider is configured inside the standard `ai.reviewStage` block:
 
 ```json
 {
   "ai": {
     "reviewStage": {
-      "provider": "openai-compat",
+      "provider": "openai-compatible",
       "model": "mistral-small-latest",
       "baseURL": "https://api.mistral.ai/v1",
       "apiKeyEnvVar": "MISTRAL_API_KEY",
@@ -336,8 +382,9 @@ litellm capability catalog. No manual configuration is required.
 
 - **Budget enforcement.** `maxBudgetUsd` is accepted and passed through but not enforced.
 
-- **Streaming.** Responses are collected in full before processing. Streaming would require
-  assembling partial `tool_calls` deltas before dispatch and is a non-trivial addition.
+- **Streaming to the caller.** `runAgent()` streams internally (Vercel AI SDK `streamText`),
+  but QualOps collects the full output via `AgentEvent` accumulation before returning
+  `AgentAdapterResult`. Surfacing incremental output to the CLI is not yet implemented.
 
 - **Parallel tool execution.** Tool calls are always sequential. LangGraph's `Send` API is the
   industry model for parallel dispatch, but it requires all tools to be stateless and idempotent.
@@ -349,3 +396,8 @@ litellm capability catalog. No manual configuration is required.
   returns an HTTP 400. The harness surfaces this as an unhandled error. Proactive compression
   reduces the probability but does not eliminate it; a per-tool result size limit would be the
   correct fix.
+
+- **On-behalf-of (OBO) auth flows.** The harness passes a static API key per session.
+  Delegated identity flows — where the agent acts on behalf of an authenticated end-user and
+  token acquisition is tied to that user's session — are not yet supported by configurable-agent
+  and are out of scope for V1.