From cdb3dd63d4e5b4bb8ad7337aeb1e422c2a71ed53 Mon Sep 17 00:00:00 2001 From: Valdis Pornieks Date: Tue, 9 Jun 2026 12:45:52 +0300 Subject: [PATCH 1/4] chore: TDR: openAI compatible adapter with agent loop --- ...4-openai-compat-adapter-with-agent-loop.md | 176 ++++++++++++++++++ 1 file changed, 176 insertions(+) create mode 100644 docs/tdr/0004-openai-compat-adapter-with-agent-loop.md diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md new file mode 100644 index 0000000..c823da4 --- /dev/null +++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md @@ -0,0 +1,176 @@ +# TDR 0004 — OpenAI-Compatible Agentic Adapter + +**Status:** Accepted — 2026-06-09 + +## Context + +QualOps' agentic review mode is gated on two proprietary SDKs: `@anthropic-ai/claude-agent-sdk` +for Anthropic and `@openai/agents` for OpenAI. Both SDKs manage their own tool dispatch, +context handling, and multi-turn orchestration internally. This works well for the providers they +were designed for, but it prevents using any other model that speaks the OpenAI chat completions +wire format — Groq, Mistral, local models via Ollama or LM Studio, Gemini via OpenRouter, +DeepSeek on Fireworks, and others. + +These providers all implement the same `/v1/chat/completions` endpoint with tool calling +(`tool_calls` in the assistant message, `tool`-role result messages in the next request). The only thing missing +is a client-side loop that drives the conversation, dispatches tool calls, and manages the context +window without relying on a provider-specific SDK. + +## Decision + +Introduce an `openai-compat` provider that runs a self-contained agentic harness using raw HTTP +calls to any OpenAI-compatible chat completions endpoint. No agent SDK is involved. The harness +owns the reasoning loop, tool dispatch, and context window management, making it compatible with +any endpoint that speaks the standard wire format. 
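The client-side loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the adapter's actual code: `runLoop`, `Complete`, and `ToolHandler` are hypothetical names, `complete` stands in for the raw HTTP call to `/chat/completions`, and the `Error: ...` text format for failed tool calls is invented for the example. The error subtype strings are the ones shown in this TDR's architecture diagram.

```typescript
// Simplified message shapes for the chat completions wire format (illustrative, not exhaustive).
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type AssistantMessage = { role: "assistant"; content: string | null; tool_calls?: ToolCall[] };
type ChatMessage =
  | { role: "system" | "user"; content: string }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

// Hypothetical seams: `Complete` stands in for the POST to /chat/completions.
type Completion = { finish_reason: string; message: AssistantMessage };
type Complete = (history: ChatMessage[]) => Promise<Completion>;
type ToolHandler = (args: unknown) => Promise<string>;

async function runLoop(
  complete: Complete,
  tools: Map<string, ToolHandler>,
  history: ChatMessage[],
  maxTurns: number,
): Promise<{ output: string; errorSubtype?: string }> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const { finish_reason, message } = await complete(history);
    history.push(message);

    if (finish_reason === "stop") return { output: message.content ?? "" };
    if (finish_reason === "length") return { output: "", errorSubtype: "error_max_tokens" };
    if (finish_reason === "content_filter") return { output: "", errorSubtype: "error_content_filter" };
    if (finish_reason !== "tool_calls") throw new Error(`unexpected finish_reason: ${finish_reason}`);

    // Sequential dispatch: each call finishes before the next one starts.
    for (const call of message.tool_calls ?? []) {
      let content: string;
      try {
        const handler = tools.get(call.function.name);
        if (!handler) throw new Error(`unknown tool: ${call.function.name}`);
        content = await handler(JSON.parse(call.function.arguments));
      } catch (err) {
        // Tool failures go back to the model as a tool result instead of aborting the session.
        content = `Error: ${err instanceof Error ? err.message : String(err)}`;
      }
      history.push({ role: "tool", tool_call_id: call.id, content });
    }
  }
  return { output: "", errorSubtype: "error_max_turns" };
}
```

Injecting `complete` keeps the sketch independent of any particular endpoint; a real adapter would also apply retry and context compression before each request.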
+ +## Architecture + +``` + Caller (AgenticExecutor) + │ + │ run(params) + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ OpenAICompatAdapter │ +│ │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ Session state: ChatMessage[] │ │ +│ │ [ system ] [ user ] [ assistant ] [ tool ] [ ... ] │ │ +│ └───────────────────────┬──────────────────────────────────┘ │ +│ │ │ +│ ┌───── turn loop (1..maxTurns) ──────────────┐ │ +│ │ │ │ +│ │ ┌─────────────────────────┐ │ │ +│ │ │ ContextManager │ │ │ +│ │ │ maybeSummarize() │ │ │ +│ │ │ · estimate tokens │ │ │ +│ │ │ · if > 60% window: │ │ │ +│ │ │ summarise oldest │ │ │ +│ │ │ exchange → [summary] │ │ │ +│ │ │ fallback: truncate │ │ │ +│ │ └────────────┬────────────┘ │ │ +│ │ │ compressed history │ │ +│ │ ▼ │ │ +│ │ ┌─────────────────────────┐ │ │ +│ │ │ fetchWithRetry() │ │ │ +│ │ │ POST /chat/completions │◄──────────────┼────────►│ LLM endpoint +│ │ │ · 429/5xx: backoff×3 │ │ │ +│ │ │ · 401: immediate fail │ │ │ +│ │ └────────────┬────────────┘ │ │ +│ │ │ ChatCompletionResponse │ │ +│ │ ▼ │ │ +│ │ finish_reason? │ │ +│ │ ├─ stop ──────────────────► return output │ +│ │ ├─ length ────────────────► error_max_tokens │ +│ │ ├─ content_filter ────────► error_content_filter +│ │ └─ tool_calls │ │ +│ │ │ │ │ +│ │ ▼ │ │ +│ │ ┌─────────────────────────┐ │ │ +│ │ │ Tool Dispatch │ │ │ +│ │ │ (sequential) │ │ │ +│ │ │ · resolve by name │ │ │ +│ │ │ · parse JSON args │ │ │ +│ │ │ · execute handler │ │ │ +│ │ │ · append tool result │ │ │ +│ │ └─────────────────────────┘ │ │ +│ │ │ │ +│ └────────────────────────────────────────────┘ │ +│ │ │ +│ maxTurns exceeded → error_max_turns │ +│ │ +│ finally: toolSet.dispose() │ +└─────────────────────────────────────────────────────────────────┘ + │ + │ AgentAdapterResult { output, inputTokens, outputTokens, errorSubtype? 
} + ▼ + Caller (AgenticExecutor) +``` + +The harness is composed of three layers: + +### Adapter + +The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It receives +a system prompt, a user prompt, a set of tool definitions, and configuration — model name, endpoint +URL, API key, and a turn budget. It builds an initial message history and enters a loop. + +On each turn it sends the current history to the endpoint and waits for a reply. If the model +signals it is done (`finish_reason: stop`) the loop exits and the assistant's final text is +returned. If the model requests tool calls, the adapter executes each one in sequence, appends the +results to the history, and continues to the next turn. Execution is sequential rather than +parallel because some tools — specifically the bash session tool — are stateful and cannot be +interleaved safely. + +The adapter handles HTTP-level errors with exponential backoff for rate limiting (429) and server +errors (5xx), and maps each terminal condition to a stable error subtype string that callers can +inspect without parsing error messages. Tool errors — unknown tool names, malformed arguments, +execution failures — are returned to the model as tool result messages rather than thrown. This +lets the model observe the error and recover without restarting the session. + +The adapter is stateless between `run()` calls; all session state lives in the message history +that it builds up turn by turn. Subagent orchestration is not supported — the model reasons flat +using the provided tools. + +### Context Manager + +The context manager (`ContextManager`) is called at the start of each turn, before the request is +sent. It estimates the token count of the current history and compares it against the known context +limit for the model. + +If the history exceeds 60% of the context window, the context manager compresses the oldest tool +exchange in the history. 
It does this by sending a one-shot request to the same endpoint, asking +it to summarise what was done and found in that exchange. The summary replaces the original +exchange in place — an assistant message with tool calls and the corresponding tool results become +a single system message prefixed `[summarized]`. + +The 60% threshold is intentional. It ensures there is always room in the remaining window for both +the summary response itself and the model's next substantive reply. Triggering compression at a +higher threshold risks running out of context mid-summarisation. + +If the summary call fails for any reason, the context manager falls back to hard truncation: +dropping the oldest tool exchange entirely rather than replacing it with a summary. The system and +user messages at the start of the history are always preserved. + +### Tool Dispatch + +Tool definitions are Zod schemas. Before the first request, the adapter converts each schema to +JSON Schema (draft-7 target) using the existing `schemaToJsonSchema` utility. Draft-7 is chosen +over draft-2020-12 because some vendor endpoints have narrow JSON Schema parsers that reject the +newer `$schema` markers. + +## Configuration + +The `openai-compat` provider is configured inside the standard `ai.reviewStage` block: + +```json +{ + "ai": { + "reviewStage": { + "provider": "openai-compat", + "model": "mistral-small-latest", + "baseURL": "https://api.mistral.ai/v1", + "apiKeyEnvVar": "MISTRAL_API_KEY", + "inputPerMillion": 0.1, + "outputPerMillion": 0.3 + } + } +} +``` + +`baseURL` identifies the chat completions base URL. It falls back to the `OPENAI_BASE_URL` +environment variable, then to the standard OpenAI endpoint. `apiKeyEnvVar` is the name of the +environment variable that holds the API key; it falls back to `OPENAI_API_KEY`, and if neither is +set the key is empty — which is valid for local endpoints like Ollama that require no +authentication. 
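The fallback order described above can be sketched as a small resolver. This is an illustration under assumptions rather than the provider's real code: `ReviewStageConfig`, `resolveEndpoint`, and `DEFAULT_BASE_URL` are hypothetical names, the default URL is assumed because the text only says "the standard OpenAI endpoint", and the key fallback is read as "use the named variable if it is set, otherwise `OPENAI_API_KEY`, otherwise an empty string".

```typescript
// Hypothetical config shape; only the field names come from the JSON example above.
interface ReviewStageConfig {
  provider: "openai-compat";
  model: string;
  baseURL?: string;
  apiKeyEnvVar?: string;
}

type Env = Record<string, string | undefined>;

// Assumed value for "the standard OpenAI endpoint".
const DEFAULT_BASE_URL = "https://api.openai.com/v1";

function resolveEndpoint(
  config: ReviewStageConfig,
  env: Env,
): { baseURL: string; apiKey: string } {
  // Explicit config wins, then OPENAI_BASE_URL, then the assumed default.
  const baseURL = config.baseURL ?? env["OPENAI_BASE_URL"] ?? DEFAULT_BASE_URL;

  // Look up the named variable first, then OPENAI_API_KEY; an empty key is
  // allowed for local endpoints that need no authentication.
  const named = config.apiKeyEnvVar ? env[config.apiKeyEnvVar] : undefined;
  const apiKey = named ?? env["OPENAI_API_KEY"] ?? "";

  return { baseURL, apiKey };
}
```

Passing the environment in as an argument (for example `resolveEndpoint(config, process.env)`) keeps the function pure and easy to test.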
+ +Structured output capability is derived automatically from the model name via the existing litellm +capability catalog. No manual configuration is required. + +## What This Does Not Do (V1) + +- **Subagent orchestration.** The model reasons flat using the tools it is given. Multi-agent + handoff requires SDK-level orchestration that is out of scope here. +- **Budget enforcement.** `maxBudgetUsd` is accepted and passed through but not enforced. +- **Streaming.** Responses are collected in full before processing. +- **Parallel tool execution.** Tool calls are always sequential. From 7d9644d6be21506877c874cef989b80c91369345 Mon Sep 17 00:00:00 2001 From: Valdis Pornieks Date: Tue, 9 Jun 2026 14:59:28 +0300 Subject: [PATCH 2/4] chore: updated TDR with considered options --- ...4-openai-compat-adapter-with-agent-loop.md | 242 +++++++++++++++--- 1 file changed, 200 insertions(+), 42 deletions(-) diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md index c823da4..9754ae7 100644 --- a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md +++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md @@ -1,6 +1,6 @@ # TDR 0004 — OpenAI-Compatible Agentic Adapter -**Status:** Accepted — 2026-06-09 +**Status:** Proposed — 2026-06-09 ## Context @@ -16,12 +16,130 @@ These providers all implement the same `/v1/chat/completions` endpoint with tool is a client-side loop that drives the conversation, dispatches tool calls, and manages the context window without relying on a provider-specific SDK. -## Decision +This TDR evaluates whether to build that loop inside QualOps or delegate it to an external +library, and records the design decisions made in the chosen approach. -Introduce an `openai-compat` provider that runs a self-contained agentic harness using raw HTTP -calls to any OpenAI-compatible chat completions endpoint. No agent SDK is involved. 
The harness -owns the reasoning loop, tool dispatch, and context window management, making it compatible with -any endpoint that speaks the standard wire format. +## Options Considered + +### Option A — Hand-rolled harness (build internally) + +A self-contained TypeScript module (~270 lines) driving a `while` loop directly against the +OpenAI chat completions wire format using `fetch`. No runtime dependency added. + +**Pros:** +- Zero added dependencies; no version drift risk in a critical path +- Full control over context window management — proactive summarisation with hard-truncate + fallback, tuned for the large tool outputs typical in code review workloads +- Sequential tool dispatch is explicit, auditable, and stateful-safe +- Deterministic error codes (`errorSubtype`) that callers match without parsing text +- Already implemented, tested, and passing + +**Cons:** +- Maintenance burden sits entirely on the QualOps team +- Does not benefit from improvements in the wider ecosystem + +--- + +### Option B — Vercel AI SDK (`ai` package, v6) + +`generateText` with `maxSteps` drives the tool-calling loop internally. Provider-agnostic via +`@ai-sdk/openai-compatible` with configurable `baseURL`. + +**Community:** 24,700 GitHub stars · active core team · last commit Jun 2026 · v6 stable. +**Size:** Modular packages; `ai` core ~67 kB gzipped. + +**Pros:** +- Mature, well-maintained, large community +- Built-in loop (`maxSteps`), `prepareStep` hook between turns, `onStepFinish` callback +- OpenAI-compatible via `createOpenAICompatible({ baseURL, apiKey })` +- Tool errors in streaming surfaced as `tool-error` parts fed back to model + +**Cons:** +- **Context window management not built-in.** `pruneMessages` utility exists but the caller + decides strategy; no summarisation primitive. Replacing our context manager would require + adding a second dependency (e.g. tokenlens) and re-implementing the logic anyway. 
+- Tool errors in `generateText` (non-streaming) are thrown, not fed back to the model — the + error-as-tool-result pattern must be re-implemented on top. +- Web-first design bias (Next.js); CLI is supported but not the primary persona. + +--- + +### Option C — puristajs/harness (v1.0.0) + +TypeScript-native harness with typed agent loop, pluggable provider adapters (OpenAI, Anthropic, +Bedrock, Azure), and workflow primitives (approval gates, parallelisation). + +**Community:** 1 GitHub star · single maintainer · released May 2026 (< 5 weeks old at time of +writing). + +**Pros:** +- Designed precisely for embedded TypeScript harnesses +- Explicit workflow primitives (approval gates, parallel agents) useful for future multi-agent work +- Provider-agnostic with adapters for major providers + +**Cons:** +- **Nascent community** — 1 star, single contributor, no visible production users. +- Context window management strategy undocumented. +- API stability unknown at v1.0, released < 5 weeks ago. +- Unacceptable maintenance risk for a single-maintainer project at this maturity level. + +--- + +### Option D — eggai-tech/EggAI + +Async-first multi-agent meta framework using agent-to-agent message passing over Kafka channels. + +**Community:** 47 GitHub stars · last commit Mar 2026. + +**Pros:** +- Multi-language (Python, JS, Go, etc.) via shared Kafka transport. +- Vendor-agnostic via LiteLLM integration. + +**Cons:** +- **Architectural mismatch.** Distributed Kafka message-passing model vs. an embedded + synchronous CLI loop. Not designed as a single-agent conversation harness. +- No built-in conversation loop for tool calling. +- No context window management. +- Kafka client dependency is heavyweight for a CLI that does synchronous code review. 
+ +--- + +### Comparison + +| Criterion | A — Hand-rolled | B — Vercel AI SDK | C — puristajs | D — EggAI | +|----------------------------|-------------------|--------------------|--------------------|-------------------| +| TypeScript-native | ✅ | ✅ | ✅ | ⚠️ unclear | +| Built-in agentic loop | ✅ | ✅ | ✅ | ❌ | +| OpenAI wire format | ✅ | ✅ | ✅ | ✅ via LiteLLM | +| Context window management | ✅ built-in | ❌ caller-owned | ❓ undocumented | ❌ | +| Error-as-tool-result | ✅ | ⚠️ streaming only | ❓ | ❌ | +| GitHub stars | — | 24,700 | 1 | 47 | +| Zero added dependencies | ✅ | ❌ | ❌ | ❌ | + +## Proposed decision: Option A — hand-rolled harness + +The only credible external alternative is the Vercel AI SDK (Option B). It has strong community +health but lacks the one feature that makes this harness non-trivial: proactive context window +management with summarisation. Adopting it would require adding a second dependency for token +counting and re-implementing the context manager anyway — more moving parts, not fewer. + +puristajs/harness (Option C) is the closest architectural match, but at 1 GitHub star and less +than five weeks old it carries unacceptable maintenance risk for a production tool. A dependency +on a single-maintainer library this early in its lifecycle is hard to justify. + +eggai-tech/EggAI (Option D) is architecturally mismatched. Its distributed Kafka model is the +wrong abstraction for an embedded synchronous loop. + +The hand-rolled approach keeps QualOps self-contained, avoids version drift in a critical path, +and allows the context manager to be tuned specifically for code review workloads where tool +output can be unusually large. + +**This decision should be revisited if:** +- The Vercel AI SDK adds built-in context window management with summarisation support. +- A harness library with >1k stars, multiple contributors, and built-in context management + reaches maturity. +- QualOps needs multi-agent orchestration (e.g. 
a planner agent delegating to specialist + agents) that the current flat single-agent loop cannot serve. ## Architecture @@ -91,46 +209,66 @@ The harness is composed of three layers: ### Adapter -The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It receives -a system prompt, a user prompt, a set of tool definitions, and configuration — model name, endpoint -URL, API key, and a turn budget. It builds an initial message history and enters a loop. +The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It +implements an **imperative while loop** — the same pattern used by the OpenAI Agents SDK and +smolagents. The alternative (a declarative graph such as LangGraph's `StateGraph`) is more +expressive for multi-agent topologies but adds conceptual overhead that is unnecessary for +flat single-agent reasoning. + +The adapter receives a system prompt, a user prompt, a set of tool definitions, and configuration +— model name, endpoint URL, API key, and a turn budget. It builds an initial message history and +enters the loop. On each turn it sends the current history to the endpoint and waits for a reply. If the model signals it is done (`finish_reason: stop`) the loop exits and the assistant's final text is -returned. If the model requests tool calls, the adapter executes each one in sequence, appends the -results to the history, and continues to the next turn. Execution is sequential rather than -parallel because some tools — specifically the bash session tool — are stateful and cannot be -interleaved safely. +returned. If the model requests tool calls, the adapter dispatches each one, appends the results, +and continues to the next turn. The adapter handles HTTP-level errors with exponential backoff for rate limiting (429) and server -errors (5xx), and maps each terminal condition to a stable error subtype string that callers can -inspect without parsing error messages. 
Tool errors — unknown tool names, malformed arguments, -execution failures — are returned to the model as tool result messages rather than thrown. This -lets the model observe the error and recover without restarting the session. +errors (5xx), and maps each terminal condition to a stable `errorSubtype` string that callers can +match without parsing error messages. This gives callers a machine-readable signal for every +failure mode. + +Tool errors — unknown tool names, malformed arguments, execution failures — are handled using the +**error-as-tool-result pattern**: the error is formatted as a `tool` role message and appended to +history, letting the model observe what went wrong and recover in the next turn. This is the same +pattern used by the OpenAI Agents SDK's `ToolErrorFormatter`. The alternative (throwing the +error to the caller) would abort the session for what are often recoverable conditions. -The adapter is stateless between `run()` calls; all session state lives in the message history -that it builds up turn by turn. Subagent orchestration is not supported — the model reasons flat -using the provided tools. +The adapter is stateless between `run()` calls; all session state lives in the message history. +Subagent orchestration is not supported — the model reasons flat using the provided tools. ### Context Manager -The context manager (`ContextManager`) is called at the start of each turn, before the request is -sent. It estimates the token count of the current history and compares it against the known context -limit for the model. +The context manager (`maybeSummarize`) is called at the start of each turn, before the request +is sent. It estimates the token count of the current history and compares it against the known +context limit for the model. + +**Why 60%, not 80%?** Letta/MemGPT triggers compression at 80% because it works *reactively*: +the response has already been received and is in hand. 
This harness compresses *proactively*, +before sending the next request. The remaining 40% of the window must budget for two things: the +summarisation call response and the model's next substantive reply. 60% is the correct trigger +for a proactive strategy; 80% would risk running out of context mid-summarisation. + +When compression is needed, the context manager applies **tool exchange atomicity**: the assistant +message containing tool calls and all of its corresponding tool results are treated as an +indivisible unit. They are summarised or dropped together, never split. (smolagents enforces the +same principle via its `ActionStep` abstraction; Letta has a `group_id` field but enforces it +weakly.) Splitting a tool call from its result would leave the model with an inconsistent view of +what happened. -If the history exceeds 60% of the context window, the context manager compresses the oldest tool -exchange in the history. It does this by sending a one-shot request to the same endpoint, asking -it to summarise what was done and found in that exchange. The summary replaces the original -exchange in place — an assistant message with tool calls and the corresponding tool results become -a single system message prefixed `[summarized]`. +The **preservation contract** is explicit: the system message at index 0 and the original user +task at index 1 are architecturally protected and are never compressed or dropped, regardless of +context pressure. This mirrors Letta's pinned `in_context_messages[0]` and smolagents' +immutable `SystemPromptStep`. -The 60% threshold is intentional. It ensures there is always room in the remaining window for both -the summary response itself and the model's next substantive reply. Triggering compression at a -higher threshold risks running out of context mid-summarisation. +Token counting uses an approximation (`estimateTokens` on `JSON.stringify(history)`) rather than +exact per-token counting (e.g. tiktoken). 
This is a deliberate performance trade-off. The +consequence of a false positive is an extra summarisation call; the consequence of a false +negative would be a context overflow. The approximation errs toward early compression. -If the summary call fails for any reason, the context manager falls back to hard truncation: -dropping the oldest tool exchange entirely rather than replacing it with a summary. The system and -user messages at the start of the history are always preserved. +If the summary call fails, the context manager falls back to hard truncation: dropping the oldest +tool exchange entirely. System and user messages are always preserved. ### Tool Dispatch @@ -139,6 +277,13 @@ JSON Schema (draft-7 target) using the existing `schemaToJsonSchema` utility. Dr over draft-2020-12 because some vendor endpoints have narrow JSON Schema parsers that reject the newer `$schema` markers. +Tool calls are dispatched **sequentially**. This is not an arbitrary constraint — LangGraph, +the OpenAI Agents SDK, and smolagents all default to sequential execution for the same reason: +stateful tools cannot be safely interleaved. The bash session tool maintains shell state between +calls (working directory, environment variables, running processes). Parallel dispatch would +produce non-deterministic results. LangGraph's `Send` API enables parallel execution, but only +for tools that are stateless and idempotent — a prerequisite that does not hold here. + ## Configuration The `openai-compat` provider is configured inside the standard `ai.reviewStage` block: @@ -160,17 +305,30 @@ The `openai-compat` provider is configured inside the standard `ai.reviewStage` `baseURL` identifies the chat completions base URL. It falls back to the `OPENAI_BASE_URL` environment variable, then to the standard OpenAI endpoint. 
`apiKeyEnvVar` is the name of the -environment variable that holds the API key; it falls back to `OPENAI_API_KEY`, and if neither is -set the key is empty — which is valid for local endpoints like Ollama that require no -authentication. +environment variable that holds the API key; it falls back to `OPENAI_API_KEY`, and if neither +is set the key is empty — valid for local endpoints like Ollama that require no authentication. -Structured output capability is derived automatically from the model name via the existing litellm -capability catalog. No manual configuration is required. +Structured output capability is derived automatically from the model name via the existing +litellm capability catalog. No manual configuration is required. ## What This Does Not Do (V1) - **Subagent orchestration.** The model reasons flat using the tools it is given. Multi-agent - handoff requires SDK-level orchestration that is out of scope here. + delegation (e.g. a planner handing off to specialist agents) requires a graph-based + orchestration layer such as LangGraph's `StateGraph` and is out of scope here. + - **Budget enforcement.** `maxBudgetUsd` is accepted and passed through but not enforced. -- **Streaming.** Responses are collected in full before processing. -- **Parallel tool execution.** Tool calls are always sequential. + +- **Streaming.** Responses are collected in full before processing. Streaming would require + assembling partial `tool_calls` deltas before dispatch and is a non-trivial addition. + +- **Parallel tool execution.** Tool calls are always sequential. LangGraph's `Send` API is the + industry model for parallel dispatch, but it requires all tools to be stateless and idempotent. + The bash session tool is stateful, so this prerequisite cannot be met without architectural + changes to the tool layer. + +- **Reactive overflow recovery.** If a single tool result exceeds the remaining context budget + after proactive compression (e.g. 
a file read returning a megabyte of output), the endpoint + returns an HTTP 400. The harness surfaces this as an unhandled error. Proactive compression + reduces the probability but does not eliminate it; a per-tool result size limit would be the + correct fix. From 8cf9da6368a624af745143181ec41334bfc4522b Mon Sep 17 00:00:00 2001 From: Valdis Pornieks Date: Tue, 9 Jun 2026 16:47:13 +0300 Subject: [PATCH 3/4] chore: add option to create our own harness package --- ...4-openai-compat-adapter-with-agent-loop.md | 85 +++++++++++-------- 1 file changed, 51 insertions(+), 34 deletions(-) diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md index 9754ae7..62c15a6 100644 --- a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md +++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md @@ -104,42 +104,59 @@ Async-first multi-agent meta framework using agent-to-agent message passing over --- +### Option E — Extract to `@eggai/harness` (new eggai-tech repo) + +Create a standalone npm package under the `eggai-tech` GitHub organisation covering the same +scope as the hand-rolled implementation: agentic loop, context manager, tool dispatch. QualOps +would become a consumer of that package rather than owning the code directly. 
+ +**Pros:** +- Reusable across other eggai-tech projects and by the community +- Forces a clean, well-documented public API boundary between harness concerns and + QualOps-specific concerns (code review prompts, tool definitions, skip patterns) +- Community contributions improve the harness without QualOps being the sole maintainer +- Validates the design against other use cases, surfacing hidden QualOps-specific assumptions + +**Cons:** +- Requires creating and maintaining a new open-source repo: CI, releases, semver, changelog, + documentation, issue triage +- QualOps acquires a runtime dependency on a package the same team owns — version drift is + self-inflicted but still real (breaking changes require coordinated releases) +- The harness design needs to stabilise first; extracting too early locks in an API that may + still need to change as QualOps workloads reveal new requirements +- Until a second eggai-tech project needs the same harness, the overhead of a separate repo + is not justified by reuse benefit + +--- + ### Comparison -| Criterion | A — Hand-rolled | B — Vercel AI SDK | C — puristajs | D — EggAI | -|----------------------------|-------------------|--------------------|--------------------|-------------------| -| TypeScript-native | ✅ | ✅ | ✅ | ⚠️ unclear | -| Built-in agentic loop | ✅ | ✅ | ✅ | ❌ | -| OpenAI wire format | ✅ | ✅ | ✅ | ✅ via LiteLLM | -| Context window management | ✅ built-in | ❌ caller-owned | ❓ undocumented | ❌ | -| Error-as-tool-result | ✅ | ⚠️ streaming only | ❓ | ❌ | -| GitHub stars | — | 24,700 | 1 | 47 | -| Zero added dependencies | ✅ | ❌ | ❌ | ❌ | - -## Proposed decision: Option A — hand-rolled harness - -The only credible external alternative is the Vercel AI SDK (Option B). It has strong community -health but lacks the one feature that makes this harness non-trivial: proactive context window -management with summarisation. 
Adopting it would require adding a second dependency for token -counting and re-implementing the context manager anyway — more moving parts, not fewer. - -puristajs/harness (Option C) is the closest architectural match, but at 1 GitHub star and less -than five weeks old it carries unacceptable maintenance risk for a production tool. A dependency -on a single-maintainer library this early in its lifecycle is hard to justify. - -eggai-tech/EggAI (Option D) is architecturally mismatched. Its distributed Kafka model is the -wrong abstraction for an embedded synchronous loop. - -The hand-rolled approach keeps QualOps self-contained, avoids version drift in a critical path, -and allows the context manager to be tuned specifically for code review workloads where tool -output can be unusually large. - -**This decision should be revisited if:** -- The Vercel AI SDK adds built-in context window management with summarisation support. -- A harness library with >1k stars, multiple contributors, and built-in context management - reaches maturity. -- QualOps needs multi-agent orchestration (e.g. a planner agent delegating to specialist - agents) that the current flat single-agent loop cannot serve. 
+| Criterion | A — Hand-rolled | B — Vercel AI SDK | C — puristajs | D — EggAI | E — @eggai/harness | +|----------------------------|-------------------|--------------------|--------------------|--------------------|---------------------| +| TypeScript-native | ✅ | ✅ | ✅ | ⚠️ unclear | ✅ | +| Built-in agentic loop | ✅ | ✅ | ✅ | ❌ | ✅ (planned) | +| OpenAI wire format | ✅ | ✅ | ✅ | ✅ via LiteLLM | ✅ (planned) | +| Context window management | ✅ built-in | ❌ caller-owned | ❓ undocumented | ❌ | ✅ (planned) | +| Error-as-tool-result | ✅ | ⚠️ streaming only | ❓ | ❌ | ✅ (planned) | +| GitHub stars | — | 24,700 | 1 | 47 | — (new) | +| Zero added dependencies | ✅ | ❌ | ❌ | ❌ | ❌ | +| Reusable outside QualOps | ❌ | ✅ | ✅ | ✅ | ✅ | + +## Proposed decision: Option A — build inside QualOps + +Option A is the recommended starting point. Build the harness inside QualOps, let the design +stabilise against real code review workloads, then revisit once there is evidence that the +approach should change. + +**Revisit when:** +- A second eggai-tech project needs an agentic loop — Option E (`@eggai/harness`) becomes + viable and the reuse benefit justifies a separate repo +- The Vercel AI SDK (Option B) adds built-in summarisation-based context management — + the main gap that currently rules it out +- A community harness library with >1k stars and multi-contributor context management + reaches maturity — revisit the external dependency options +- QualOps needs multi-agent orchestration (e.g. 
a planner delegating to specialist agents) + that the current flat loop cannot serve — revisit graph-based options or Option E ## Architecture From 8bf62fe6a26340ac96982696b18932edf08a48b0 Mon Sep 17 00:00:00 2001 From: Valdis Pornieks Date: Wed, 10 Jun 2026 15:48:20 +0300 Subject: [PATCH 4/4] chore: add additional considerations for integrating with configurable-agent --- ...4-openai-compat-adapter-with-agent-loop.md | 384 ++++++++++-------- 1 file changed, 218 insertions(+), 166 deletions(-) diff --git a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md index 62c15a6..06b4fd2 100644 --- a/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md +++ b/docs/tdr/0004-openai-compat-adapter-with-agent-loop.md @@ -85,78 +85,152 @@ writing). --- -### Option D — eggai-tech/EggAI +### Option D — configurable-agent (eggai-tech in-house harness) -Async-first multi-agent meta framework using agent-to-agent message passing over Kafka channels. +A standalone TypeScript service (Node 22, ESM) built and maintained by the eggai-tech team, +already in production use for other agents in the organisation. It runs an agentic for-loop +(`runAgent()`) built on the **Vercel AI SDK (`ai` v5)** with: +- Streaming each step via `streamText()` +- `ToolSet` injection via `RunAgentOptions.tools` — QualOps passes its own tools without MCP +- Structured `AgentEvent` callbacks (`tool_call`, `tool_result`, `content_delta`, `final`, `error`) +- Context compaction (LLM-based summarisation at 100k tokens) and tool output summarisation (>4k tokens) +- OpenAI-compatible endpoints via `@ai-sdk/openai-compatible` (provider: `ollama` or `openai-compatible`, configurable `baseUrl`) -**Community:** 47 GitHub stars · last commit Mar 2026. +**Current state:** `runAgent()` exists and is tested, but is not yet exposed as a public library +entry point. A one-time extraction is needed before QualOps can depend on it. 
 **Pros:**
-- Multi-language (Python, JS, Go, etc.) via shared Kafka transport.
-- Vendor-agnostic via LiteLLM integration.
+- Loop, context compaction, and tool output summarisation already built and tested in production
+- Vercel AI SDK foundation — OpenAI-compatible endpoints supported natively
+- `ToolSet` injection path already exists — no MCP server required
+- `AgentEvent` emitter pattern is clean and observable
+- Maintained by the same organisation — no external dependency risk, aligned direction
+- Removes ~300 lines of hand-rolled loop from QualOps

 **Cons:**
-- **Architectural mismatch.** Distributed Kafka message-passing model vs. an embedded
-  synchronous CLI loop. Not designed as a single-agent conversation harness.
-- No built-in conversation loop for tool calling.
-- No context window management.
-- Kafka client dependency is heavyweight for a CLI that does synchronous code review.
+- `runAgent()` is not yet a public library API — requires a one-time library entry point extraction in the configurable-agent repo
+- QualOps tools (Zod schema + execute function) must be adapted to Vercel AI SDK `ToolSet` shape
+- `AgentConfig` is currently YAML/file-driven — QualOps must construct it programmatically
+- Streaming-first design: QualOps must accumulate `content_delta` events until `final` to get output string
+- Replaces our tested context manager with configurable-agent's compaction (different threshold: 100k tokens vs 60% proactive — needs validation)
+- `provider` enum covers `anthropic | openai | google | ollama` — `openai-compatible` may need to be added or `ollama` repurposed as the generic path

 ---

-### Option E — Extract to `@eggai/harness` (new eggai-tech repo)
+### Comparison

-Create a standalone npm package under the `eggai-tech` GitHub organisation covering the same
-scope as the hand-rolled implementation: agentic loop, context manager, tool dispatch. QualOps
-would become a consumer of that package rather than owning the code directly.
+| Criterion | A — Hand-rolled | B — Vercel AI SDK | C — puristajs | D — configurable-agent |
+|----------------------------|-------------------|--------------------|--------------------|-----------------------|
+| TypeScript-native | ✅ | ✅ | ✅ | ✅ |
+| Built-in agentic loop | ✅ | ✅ | ✅ | ✅ |
+| OpenAI wire format | ✅ | ✅ | ✅ | ✅ via Vercel AI SDK |
+| Context window management | ✅ built-in | ❌ caller-owned | ❓ undocumented | ✅ built-in |
+| Error-as-tool-result | ✅ | ⚠️ streaming only | ❓ | ✅ via AgentEvent |
+| GitHub stars | — | 24,700 | 1 | — (internal) |
+| Zero added dependencies | ✅ | ❌ | ❌ | ❌ |
+| Reusable outside QualOps | ❌ | ✅ | ✅ | ✅ (by design) |
+| Same-org ownership | ✅ | ❌ | ❌ | ✅ |
+
+## Decision: Option D — configurable-agent
+
+Option D is the recommended approach. The configurable-agent harness is built by the same
+eggai-tech team, already production-tested, and removes the maintenance burden of a hand-rolled
+loop from QualOps. The `tools` injection path (`RunAgentOptions.tools`) means QualOps does not
+need to run MCP servers — it constructs its own `ToolSet` at call time. The Vercel AI SDK
+already handles OpenAI-compatible endpoints, satisfying the core requirement.
+
+Beyond QualOps, standardising on configurable-agent's `AgentEvent` API creates shared
+infrastructure for the wider eggai-tech agent portfolio: the structured event stream is the
+natural integration point for external observability tools (Langfuse), and a config-driven
+`AgentConfig` supports low-code deployment patterns that reduce the audit surface clients
+must review before approving production use. These benefits compound as more agents in the
+organisation adopt the same harness.
+
+## Technical tasks
+
+### Phase 1 — Prerequisite refactors
+
+These changes should be made before any integration work begins. They make both repositories
+cleaner independently of each other and remove the need for workarounds in the adapter.
+
+#### configurable-agent
+
+**1.
Add `openai-compatible` as a named provider**
+
+`model.ts` currently handles OpenAI-compatible endpoints only via the `ollama` case, which calls
+`createOpenAICompatible({ name: 'ollama', baseURL })` without passing an API key. This silently
+breaks any authenticated endpoint (Mistral, Groq, DeepSeek, etc.).
+
+Add `openai-compatible` to the `ModelProvider` enum, add `apiKey?: string` to the `model` object
+in `AgentConfigSchema`, and add a corresponding case in `buildModel`:
+
+```typescript
+case 'openai-compatible': {
+  const baseURL = cfg.baseUrl ?? process.env.OPENAI_BASE_URL ?? 'https://api.openai.com/v1';
+  const apiKey = cfg.apiKey ?? process.env.OPENAI_API_KEY ?? '';
+  const compat = createOpenAICompatible({ name: 'openai-compatible', baseURL, apiKey });
+  return compat(cfg.name);
+}
+```

-**Pros:**
-- Reusable across other eggai-tech projects and by the community
-- Forces a clean, well-documented public API boundary between harness concerns and
-  QualOps-specific concerns (code review prompts, tool definitions, skip patterns)
-- Community contributions improve the harness without QualOps being the sole maintainer
-- Validates the design against other use cases, surfacing hidden QualOps-specific assumptions
+**Why not reuse `ollama`?** Injecting a Mistral API key via `OLLAMA_BASE_URL` / `OLLAMA_API_KEY`
+is misleading and fragile. A named provider is clearer and avoids confusion in logs and telemetry.
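
To make the difference between the two provider cases concrete, the resolution order (explicit config, then environment, then default) can be sketched without the AI SDK. This is a hedged illustration: `ModelConfig`, `ResolvedEndpoint`, and `resolveCompatEndpoint` are hypothetical names, the real `AgentConfigSchema` may differ, and the Ollama default URL is an assumption rather than something taken from `model.ts`. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical shapes for this sketch; the real `AgentConfigSchema` may differ.
type ModelProvider = "anthropic" | "openai" | "google" | "ollama" | "openai-compatible";

interface ModelConfig {
  provider: ModelProvider;
  name: string;
  baseUrl?: string;
  apiKey?: string;
}

interface ResolvedEndpoint {
  providerName: string;
  baseURL: string;
  apiKey?: string;
}

type Env = Record<string, string | undefined>;

function resolveCompatEndpoint(cfg: ModelConfig, env: Env): ResolvedEndpoint {
  switch (cfg.provider) {
    case "ollama":
      // Mirrors the current behaviour described above: no API key is passed,
      // so authenticated endpoints cannot work through this case.
      return {
        providerName: "ollama",
        // Assumed default for a local Ollama server.
        baseURL: cfg.baseUrl ?? env.OLLAMA_BASE_URL ?? "http://localhost:11434/v1",
      };
    case "openai-compatible":
      // Proposed case: explicit config wins, then environment, then a default.
      return {
        providerName: "openai-compatible",
        baseURL: cfg.baseUrl ?? env.OPENAI_BASE_URL ?? "https://api.openai.com/v1",
        apiKey: cfg.apiKey ?? env.OPENAI_API_KEY ?? "",
      };
    default:
      throw new Error(`not an OpenAI-compatible provider: ${cfg.provider}`);
  }
}
```

In the real `buildModel`, the second argument would be `process.env`, and the resolved values would be handed to `createOpenAICompatible` as in the case shown earlier.
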
-**Cons:**
-- Requires creating and maintaining a new open-source repo: CI, releases, semver, changelog,
-  documentation, issue triage
-- QualOps acquires a runtime dependency on a package the same team owns — version drift is
-  self-inflicted but still real (breaking changes require coordinated releases)
-- The harness design needs to stabilise first; extracting too early locks in an API that may
-  still need to change as QualOps workloads reveal new requirements
-- Until a second eggai-tech project needs the same harness, the overhead of a separate repo
-  is not justified by reuse benefit
+---
+
+**2. Expose `runAgent` and types as a library entry point**
+
+`runAgent`, `AgentConfig`, `AgentEmitter`, `AgentEvent`, and `RunAgentOptions` are defined and
+tested but not reachable from outside the package. Add a library entry point:
+
+```json
+// package.json
+"exports": {
+  ".": "./dist/index.js",
+  "./lib": "./dist/lib/index.js"
+}
+```
+
+```typescript
+// lib/index.ts
+export { runAgent, prepareMessages } from './agent/loop.js';
+export type { RunAgentOptions } from './agent/loop.js';
+export type { AgentConfig } from './config/schema.js';
+export type { AgentEmitter, AgentEvent, ToolResult } from './agent/events.js';
+```
+
+#### QualOps
+
+**4. Remove `OpenAICompatAdapter` and `context-manager.ts`**
+
+The hand-rolled agentic adapter (`src/stages/review/agentic/adapters/openai-compat-adapter.ts`)
+and context manager (`src/stages/review/agentic/adapters/context-manager.ts`) are replaced by
+the configurable-agent integration. Removing them before writing the new adapter avoids confusion
+about which loop is active and eliminates dead code in the test suite.

 ---

-### Comparison
+### Phase 2 — Integration
+
+Once the Phase 1 refactors are merged in both repos:
+
+1. **Add dependency** on configurable-agent (local path dep initially, then published to npm).
-| Criterion | A — Hand-rolled | B — Vercel AI SDK | C — puristajs | D — EggAI | E — @eggai/harness |
-|----------------------------|-------------------|--------------------|--------------------|--------------------|---------------------|
-| TypeScript-native | ✅ | ✅ | ✅ | ⚠️ unclear | ✅ |
-| Built-in agentic loop | ✅ | ✅ | ✅ | ❌ | ✅ (planned) |
-| OpenAI wire format | ✅ | ✅ | ✅ | ✅ via LiteLLM | ✅ (planned) |
-| Context window management | ✅ built-in | ❌ caller-owned | ❓ undocumented | ❌ | ✅ (planned) |
-| Error-as-tool-result | ✅ | ⚠️ streaming only | ❓ | ❌ | ✅ (planned) |
-| GitHub stars | — | 24,700 | 1 | 47 | — (new) |
-| Zero added dependencies | ✅ | ❌ | ❌ | ❌ | ❌ |
-| Reusable outside QualOps | ❌ | ✅ | ✅ | ✅ | ✅ |
-
-## Proposed decision: Option A — build inside QualOps
-
-Option A is the recommended starting point. Build the harness inside QualOps, let the design
-stabilise against real code review workloads, then revisit once there is evidence that the
-approach should change.
-
-**Revisit when:**
-- A second eggai-tech project needs an agentic loop — Option E (`@eggai/harness`) becomes
-  viable and the reuse benefit justifies a separate repo
-- The Vercel AI SDK (Option B) adds built-in summarisation-based context management —
-  the main gap that currently rules it out
-- A community harness library with >1k stars and multi-contributor context management
-  reaches maturity — revisit the external dependency options
-- QualOps needs multi-agent orchestration (e.g. a planner delegating to specialist agents)
-  that the current flat loop cannot serve — revisit graph-based options or Option E
+2.
**New adapter** `ConfigurableAgentAdapter` implementing `AgentAdapter`: + - Constructs `AgentConfig` programmatically from `AgentAdapterParams` + (systemPrompt, model, maxTurns → maxSteps, maxOutputTokens, baseUrl, apiKey) + - Converts QualOps `ToolDefinition[]` to Vercel AI SDK `ToolSet` format + - Calls `runAgent(config, [userMessage], emitter, undefined, { tools })` + - Accumulates `content_delta` events into an output string; maps `error` event codes to + `errorSubtype` values; extracts token usage from the `final` event + +3. **Register adapter** — wire `ConfigurableAgentAdapter` in + `src/stages/review/agentic/adapters/index.ts` for both `anthropic` and `openai-compatible` + providers (replacing `AnthropicAdapter`, `OpenAIAdapter`, and `OpenAICompatibleAdapter`). + +No existing code changes — purely additive. + +--- ## Architecture @@ -166,55 +240,58 @@ approach should change. │ run(params) ▼ ┌─────────────────────────────────────────────────────────────────┐ -│ OpenAICompatAdapter │ +│ ConfigurableAgentAdapter (QualOps) │ +│ │ +│ · Constructs AgentConfig from AgentAdapterParams │ +│ · Converts ToolDefinition[] → Vercel AI SDK ToolSet │ +│ · Accumulates AgentEvent stream into AgentAdapterResult │ +└──────────────────────────┬──────────────────────────────────────┘ + │ runAgent(config, messages, emit, options) + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ configurable-agent: runAgent() │ │ │ │ ┌──────────────────────────────────────────────────────────┐ │ -│ │ Session state: ChatMessage[] │ │ +│ │ Message history: CoreMessage[] │ │ │ │ [ system ] [ user ] [ assistant ] [ tool ] [ ... 
] │ │ │ └───────────────────────┬──────────────────────────────────┘ │ │ │ │ -│ ┌───── turn loop (1..maxTurns) ──────────────┐ │ +│ ┌───── step loop (1..maxSteps) ──────────────┐ │ │ │ │ │ │ │ ┌─────────────────────────┐ │ │ -│ │ │ ContextManager │ │ │ -│ │ │ maybeSummarize() │ │ │ -│ │ │ · estimate tokens │ │ │ -│ │ │ · if > 60% window: │ │ │ -│ │ │ summarise oldest │ │ │ -│ │ │ exchange → [summary] │ │ │ -│ │ │ fallback: truncate │ │ │ +│ │ │ Context Compaction │ │ │ +│ │ │ · trigger at 100k tok │ │ │ +│ │ │ · LLM-based summary │ │ │ +│ │ │ · tool output >4k tok │ │ │ +│ │ │ summarised inline │ │ │ │ │ └────────────┬────────────┘ │ │ -│ │ │ compressed history │ │ +│ │ │ compacted history │ │ │ │ ▼ │ │ │ │ ┌─────────────────────────┐ │ │ -│ │ │ fetchWithRetry() │ │ │ -│ │ │ POST /chat/completions │◄──────────────┼────────►│ LLM endpoint -│ │ │ · 429/5xx: backoff×3 │ │ │ -│ │ │ · 401: immediate fail │ │ │ +│ │ │ streamText() │ │ │ +│ │ │ (Vercel AI SDK) │◄──────────────┼────────►│ LLM endpoint +│ │ │ · @ai-sdk/openai- │ │ │ +│ │ │ compatible │ │ │ │ │ └────────────┬────────────┘ │ │ -│ │ │ ChatCompletionResponse │ │ +│ │ │ StreamTextResult │ │ │ │ ▼ │ │ -│ │ finish_reason? │ │ -│ │ ├─ stop ──────────────────► return output │ -│ │ ├─ length ────────────────► error_max_tokens │ -│ │ ├─ content_filter ────────► error_content_filter -│ │ └─ tool_calls │ │ +│ │ stopReason? 
│ │ +│ │ ├─ stop ──── emit final ──────────── ┼── ✓ │ +│ │ ├─ length ── emit error(code) ─────── ┼── ✗ │ +│ │ └─ tool-calls │ │ │ │ │ │ │ │ │ ▼ │ │ │ │ ┌─────────────────────────┐ │ │ │ │ │ Tool Dispatch │ │ │ │ │ │ (sequential) │ │ │ -│ │ │ · resolve by name │ │ │ -│ │ │ · parse JSON args │ │ │ -│ │ │ · execute handler │ │ │ -│ │ │ · append tool result │ │ │ +│ │ │ · execute ToolSet fn │ │ │ +│ │ │ · emit tool_call │ │ │ +│ │ │ · emit tool_result │ │ │ │ │ └─────────────────────────┘ │ │ │ │ │ │ │ └────────────────────────────────────────────┘ │ │ │ │ -│ maxTurns exceeded → error_max_turns │ -│ │ -│ finally: toolSet.dispose() │ +│ maxSteps exceeded → emit error(max_steps) │ └─────────────────────────────────────────────────────────────────┘ │ │ AgentAdapterResult { output, inputTokens, outputTokens, errorSubtype? } @@ -222,94 +299,63 @@ approach should change. Caller (AgenticExecutor) ``` -The harness is composed of three layers: - -### Adapter - -The adapter (`OpenAICompatAdapter`) is the entry point and owns the session lifecycle. It -implements an **imperative while loop** — the same pattern used by the OpenAI Agents SDK and -smolagents. The alternative (a declarative graph such as LangGraph's `StateGraph`) is more -expressive for multi-agent topologies but adds conceptual overhead that is unnecessary for -flat single-agent reasoning. - -The adapter receives a system prompt, a user prompt, a set of tool definitions, and configuration -— model name, endpoint URL, API key, and a turn budget. It builds an initial message history and -enters the loop. - -On each turn it sends the current history to the endpoint and waits for a reply. If the model -signals it is done (`finish_reason: stop`) the loop exits and the assistant's final text is -returned. If the model requests tool calls, the adapter dispatches each one, appends the results, -and continues to the next turn. 
- -The adapter handles HTTP-level errors with exponential backoff for rate limiting (429) and server -errors (5xx), and maps each terminal condition to a stable `errorSubtype` string that callers can -match without parsing error messages. This gives callers a machine-readable signal for every -failure mode. - -Tool errors — unknown tool names, malformed arguments, execution failures — are handled using the -**error-as-tool-result pattern**: the error is formatted as a `tool` role message and appended to -history, letting the model observe what went wrong and recover in the next turn. This is the same -pattern used by the OpenAI Agents SDK's `ToolErrorFormatter`. The alternative (throwing the -error to the caller) would abort the session for what are often recoverable conditions. - -The adapter is stateless between `run()` calls; all session state lives in the message history. -Subagent orchestration is not supported — the model reasons flat using the provided tools. - -### Context Manager - -The context manager (`maybeSummarize`) is called at the start of each turn, before the request -is sent. It estimates the token count of the current history and compares it against the known -context limit for the model. - -**Why 60%, not 80%?** Letta/MemGPT triggers compression at 80% because it works *reactively*: -the response has already been received and is in hand. This harness compresses *proactively*, -before sending the next request. The remaining 40% of the window must budget for two things: the -summarisation call response and the model's next substantive reply. 60% is the correct trigger -for a proactive strategy; 80% would risk running out of context mid-summarisation. - -When compression is needed, the context manager applies **tool exchange atomicity**: the assistant -message containing tool calls and all of its corresponding tool results are treated as an -indivisible unit. They are summarised or dropped together, never split. 
(smolagents enforces the -same principle via its `ActionStep` abstraction; Letta has a `group_id` field but enforces it -weakly.) Splitting a tool call from its result would leave the model with an inconsistent view of -what happened. - -The **preservation contract** is explicit: the system message at index 0 and the original user -task at index 1 are architecturally protected and are never compressed or dropped, regardless of -context pressure. This mirrors Letta's pinned `in_context_messages[0]` and smolagents' -immutable `SystemPromptStep`. - -Token counting uses an approximation (`estimateTokens` on `JSON.stringify(history)`) rather than -exact per-token counting (e.g. tiktoken). This is a deliberate performance trade-off. The -consequence of a false positive is an extra summarisation call; the consequence of a false -negative would be a context overflow. The approximation errs toward early compression. - -If the summary call fails, the context manager falls back to hard truncation: dropping the oldest -tool exchange entirely. System and user messages are always preserved. - -### Tool Dispatch - -Tool definitions are Zod schemas. Before the first request, the adapter converts each schema to -JSON Schema (draft-7 target) using the existing `schemaToJsonSchema` utility. Draft-7 is chosen -over draft-2020-12 because some vendor endpoints have narrow JSON Schema parsers that reject the -newer `$schema` markers. - -Tool calls are dispatched **sequentially**. This is not an arbitrary constraint — LangGraph, -the OpenAI Agents SDK, and smolagents all default to sequential execution for the same reason: -stateful tools cannot be safely interleaved. The bash session tool maintains shell state between -calls (working directory, environment variables, running processes). Parallel dispatch would -produce non-deterministic results. 
LangGraph's `Send` API enables parallel execution, but only
-for tools that are stateless and idempotent — a prerequisite that does not hold here.
+The integration is composed of two layers:
+
+### QualOps Adapter
+
+The QualOps adapter (`ConfigurableAgentAdapter`) bridges `AgentAdapterParams` and
+`configurable-agent`'s `runAgent()`. It is responsible for:
+
+1. **Config construction** — building an `AgentConfig` programmatically from `AgentAdapterParams`
+   (model name, endpoint URL, API key, maxSteps, maxOutputTokens) rather than loading a YAML file.
+
+2. **Tool conversion** — mapping QualOps `ToolDefinition[]` (Zod schema + execute function) to
+   Vercel AI SDK `ToolSet` format (`{ description, parameters: z.ZodType, execute }`). This is
+   the same shape QualOps tools already have; the conversion is mechanical.
+
+3. **Event accumulation** — subscribing to `AgentEmitter` callbacks and:
+   - Concatenating `content_delta` events into an output string
+   - Mapping `error` event codes to `errorSubtype` values that callers can match
+   - Extracting token usage from the `final` event to populate `AgentAdapterResult`
+
+The adapter is stateless between `run()` calls. All session state lives inside `runAgent()`.
+
+### configurable-agent Loop
+
+`runAgent()` implements an **imperative for-loop** over `streamText()` steps — the same pattern
+used by the OpenAI Agents SDK and smolagents. The alternative (a declarative graph such as
+LangGraph's `StateGraph`) is more expressive for multi-agent topologies but adds conceptual
+overhead that is unnecessary for flat single-agent reasoning.
+
+**Context compaction** triggers at 100k tokens (reactive, after receiving a response). When
+the threshold is exceeded, older messages are summarised by an LLM call and replaced with a
+`[COMPACTED CONTEXT]` system message; the 6 most recent messages are always kept verbatim.
+Both thresholds are hardcoded in `ConfigurableAgentAdapter` and are not currently surfaced
+as `.qualopsrc.json` configuration.
+
+**Tool output truncation** fires before compaction: any tool result exceeding 4k tokens is
+trimmed to head 500 + tail 500 characters inline, keeping history manageable before the
+compaction threshold is ever reached.
+
+**Tool exchange atomicity** is maintained: the Vercel AI SDK treats each step's tool calls and
+results as a unit. Tool output summarisation (>4k tokens per result) reduces history bloat before
+it reaches the compaction threshold.
+
+**Tool dispatch is sequential**. This is not an arbitrary constraint — LangGraph, the OpenAI
+Agents SDK, and smolagents all default to sequential execution for the same reason: stateful tools
+cannot be safely interleaved. The bash session tool maintains shell state between calls (working
+directory, environment variables, running processes). Parallel dispatch would produce
+non-deterministic results.

 ## Configuration

-The `openai-compat` provider is configured inside the standard `ai.reviewStage` block:
+The `openai-compatible` provider is configured inside the standard `ai.reviewStage` block:

 ```json
 {
   "ai": {
     "reviewStage": {
-      "provider": "openai-compat",
+      "provider": "openai-compatible",
       "model": "mistral-small-latest",
       "baseURL": "https://api.mistral.ai/v1",
       "apiKeyEnvVar": "MISTRAL_API_KEY",
@@ -336,8 +382,9 @@ litellm capability catalog. No manual configuration is required.

 - **Budget enforcement.** `maxBudgetUsd` is accepted and passed through but not enforced.

-- **Streaming.** Responses are collected in full before processing. Streaming would require
-  assembling partial `tool_calls` deltas before dispatch and is a non-trivial addition.
+- **Streaming to the caller.** `runAgent()` streams internally (Vercel AI SDK `streamText`),
+  but QualOps collects the full output via `AgentEvent` accumulation before returning
+  `AgentAdapterResult`.
Surfacing incremental output to the CLI is not yet implemented.

 - **Parallel tool execution.** Tool calls are always sequential. LangGraph's `Send` API is the
   industry model for parallel dispatch, but it requires all tools to be stateless and idempotent.
@@ -349,3 +396,8 @@ litellm capability catalog. No manual configuration is required.
   returns an HTTP 400. The harness surfaces this as an unhandled error. Proactive compression
   reduces the probability but does not eliminate it; a per-tool result size limit would be the
   correct fix.
+
+- **On-behalf-of (OBO) auth flows.** The harness passes a static API key per session.
+  Delegated identity flows — where the agent acts on behalf of an authenticated end-user and
+  token acquisition is tied to that user's session — are not yet supported by configurable-agent
+  and are out of scope for V1.
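
The event accumulation that `ConfigurableAgentAdapter` performs, which is also why streaming to the caller appears in the limitations above, might look roughly like the sketch below. The TDR names the event types (`content_delta`, `tool_call`, `tool_result`, `final`, `error`) and the `AgentAdapterResult` fields; everything else here is assumed for illustration: the per-event field names, the `length` error code, the `error_unknown` fallback, and the `createAccumulator` helper are hypothetical and not taken from configurable-agent.

```typescript
// Assumed event shapes. Only the `type` values come from the TDR.
type AgentEvent =
  | { type: "content_delta"; text: string }
  | { type: "tool_call"; name: string }
  | { type: "tool_result"; name: string }
  | { type: "final"; usage: { inputTokens: number; outputTokens: number } }
  | { type: "error"; code: string; message: string };

interface AgentAdapterResult {
  output: string;
  inputTokens: number;
  outputTokens: number;
  errorSubtype?: string;
}

// Hypothetical mapping from event error codes to the errorSubtype values
// that QualOps callers already match on.
const ERROR_SUBTYPES: Record<string, string> = {
  max_steps: "error_max_turns",
  length: "error_max_tokens",
};

function createAccumulator() {
  let output = "";
  let inputTokens = 0;
  let outputTokens = 0;
  let errorSubtype: string | undefined;

  return {
    // Passed to runAgent() as the emitter callback.
    emit(event: AgentEvent): void {
      switch (event.type) {
        case "content_delta":
          output += event.text;
          break;
        case "final":
          inputTokens = event.usage.inputTokens;
          outputTokens = event.usage.outputTokens;
          break;
        case "error":
          errorSubtype = ERROR_SUBTYPES[event.code] ?? "error_unknown";
          break;
        default:
          // tool_call and tool_result are not needed for the result object.
          break;
      }
    },
    // Called once runAgent() has returned.
    result(): AgentAdapterResult {
      const base = { output, inputTokens, outputTokens };
      return errorSubtype === undefined ? base : { ...base, errorSubtype };
    },
  };
}
```

Because the result is only assembled after `runAgent()` returns, nothing reaches the CLI incrementally; forwarding `content_delta` events to the caller as they arrive would be the natural place to lift that limitation.
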