[arch] Design discussion: shared pluggable service for cross-agent state (feedback, sessions, traces)

## Background

The v0.12.0 enterprise feature track added several stateful surfaces to BaseAgent's server layer:

- `POST/GET /v1/sessions` — conversation persistence
- `GET /v1/traces` — trace inspection
- `POST/GET/PATCH /v1/feedback` — user feedback collection
- `GET /metrics` — Prometheus

All four follow the same shape: BaseAgent owns a pluggable store (Null / SQLite / Postgres), the server layer exposes REST endpoints, and the gateway-template proxies through. This was the right call when most deployments had one or two agents.
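The sketch below shows the per-agent pluggable store pattern described above. `FeedbackStore` is the name used later in this issue. The `add`/`query` methods, the `Feedback` record fields, and the SQLite schema are hypothetical and are not BaseAgent's actual interface. Each agent process builds its own store instance, and that is the part that multiplies into N pools and N migrations once there are 10 agents.

```python
"""Hypothetical sketch of the per-agent pluggable store pattern.

Method names and the record shape are assumptions for illustration,
not BaseAgent's actual FeedbackStore interface.
"""
from __future__ import annotations

import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    # Hypothetical record: which agent answered, in which session, and the rating.
    agent_id: str
    session_id: str
    rating: int  # e.g. +1 thumbs-up, -1 thumbs-down


class FeedbackStore(ABC):
    @abstractmethod
    def add(self, fb: Feedback) -> None: ...

    @abstractmethod
    def query(self, rating: int | None = None) -> list[Feedback]: ...


class NullFeedbackStore(FeedbackStore):
    """Discards everything; the default when persistence is off."""

    def add(self, fb: Feedback) -> None:
        pass

    def query(self, rating: int | None = None) -> list[Feedback]:
        return []


class SQLiteFeedbackStore(FeedbackStore):
    """One database per agent process: the shape that duplicates at N agents."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS feedback "
            "(agent_id TEXT, session_id TEXT, rating INTEGER)"
        )

    def add(self, fb: Feedback) -> None:
        with self._conn:  # commits on success
            self._conn.execute(
                "INSERT INTO feedback VALUES (?, ?, ?)",
                (fb.agent_id, fb.session_id, fb.rating),
            )

    def query(self, rating: int | None = None) -> list[Feedback]:
        sql = "SELECT agent_id, session_id, rating FROM feedback"
        params: tuple = ()
        if rating is not None:
            sql += " WHERE rating = ?"
            params = (rating,)
        return [Feedback(*row) for row in self._conn.execute(sql, params)]
```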

In multi-agent deployments (e.g., 10 agents fronted by a single gateway and UI), per-agent ownership starts to chafe:

- **Duplication** — 10 Postgres pools, 10 schema migrations, 10 housekeeping loops, all writing the same tables
- **Fan-out for cross-agent queries** — "show me all thumbs-down feedback this week" requires the dashboard to hit 10 endpoints and merge client-side, OR query the shared Postgres directly out-of-band (the schema becomes a de-facto API)
- **Schema becomes a contract** — once N agents write the same table, schema changes need coordinated rollouts
- **No auth boundary** between agents sharing storage
- **Sessions are conceptually cross-agent** — a user talks to "the system," not to agent #4. Today there's no clean way to follow a conversation that gets routed to different agents
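The sketch below is a hypothetical dashboard-side merge that shows what the fan-out costs in practice. It assumes each agent's `GET /v1/feedback` call is wrapped in an async fetcher that already applies the date window. It also assumes each returned row has `rating` and `created_at` fields, and both field names are assumptions. The client has to decide what to do when one agent is unreachable.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# Hypothetical: each fetcher wraps one agent's GET /v1/feedback call and
# returns that agent's rows for the requested window.
Row = Dict[str, Any]
Fetcher = Callable[[], Awaitable[List[Row]]]


async def merge_thumbs_down(fetchers: Dict[str, Fetcher]) -> Tuple[List[Row], List[str]]:
    """Fan out to every agent, keep negative ratings, merge newest-first."""
    # return_exceptions=True so one dead agent doesn't fail the whole query.
    results = await asyncio.gather(
        *(fetch() for fetch in fetchers.values()), return_exceptions=True
    )
    merged: List[Row] = []
    failed: List[str] = []
    for agent_id, result in zip(fetchers, results):
        if isinstance(result, BaseException):
            # The answer is now partial; the dashboard must surface this.
            failed.append(agent_id)
            continue
        # Tag rows with their source, since per-agent endpoints may not.
        merged.extend({**row, "agent_id": agent_id} for row in result if row["rating"] < 0)
    merged.sort(key=lambda row: row["created_at"], reverse=True)
    return merged, failed
```

Every cross-agent view would need some variant of this merge. Options 2 and 3 below would remove that duplication.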

## What to design

Open question: what should a shared 'agent platform' service look like, and which surfaces move there?

Initial options to discuss (not a decision, a starting point):

1. **Status quo + documentation.** Document that multi-agent deployments should point all agents at the same Postgres and treat the shared schema as a stable join point. Cheapest, but the rough edges remain.

2. **Full extraction.** A new FastAPI service (working name: `fipsagents-platform`) owns sessions + traces + feedback. BaseAgent becomes a thin client. Gateway routes `/v1/sessions`, `/v1/traces`, `/v1/feedback` to the platform service rather than fanning out to per-agent endpoints. One Postgres pool, one REST surface, one dashboard backend.

3. **Partial extraction.** Move feedback + sessions (genuinely cross-agent) but leave traces in BaseAgent, shipping to an OTel collector (the industry-standard answer, already partially done via `OTELTraceStore`). Fewer moving parts, and it addresses the highest-value duplication.

4. **Something else.** Maybe BaseAgent keeps everything but grows a 'remote store' adapter for each — same code, configurable backend (in-process vs HTTP). Lets a deployer choose per-feature without forcing a topology.
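The sketch below shows one way option 4 could look, assuming the existing ABC is kept and only the backend changes. `InProcessFeedbackStore`, `RemoteFeedbackStore`, `make_feedback_store`, and the `feedback_backend`/`feedback_url` config keys are hypothetical names. The HTTP transport uses stdlib `urllib` and is injectable, so it can be swapped for a real client or a test double. `RemoteFeedbackStore` is also roughly what the "thin client" in option 2 would reduce to.

```python
from __future__ import annotations

import json
import urllib.request
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, JSON body or None) -> decoded JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Any]


class FeedbackStore(ABC):
    @abstractmethod
    def add(self, record: dict[str, Any]) -> None: ...

    @abstractmethod
    def query(self) -> list[dict[str, Any]]: ...


class InProcessFeedbackStore(FeedbackStore):
    """Today's shape: state lives inside the agent process."""

    def __init__(self) -> None:
        self._rows: list[dict[str, Any]] = []

    def add(self, record: dict[str, Any]) -> None:
        self._rows.append(dict(record))

    def query(self) -> list[dict[str, Any]]:
        return list(self._rows)


def urllib_transport(base_url: str) -> Transport:
    """Minimal stdlib HTTP transport; a real one would add auth and timeouts."""

    def call(method: str, path: str, body: Optional[Dict[str, Any]]) -> Any:
        data = None if body is None else json.dumps(body).encode()
        req = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read() or b"null")

    return call


class RemoteFeedbackStore(FeedbackStore):
    """Same interface, but state lives in a shared service behind HTTP."""

    def __init__(self, transport: Transport) -> None:
        self._call = transport

    def add(self, record: dict[str, Any]) -> None:
        self._call("POST", "/v1/feedback", record)

    def query(self) -> list[dict[str, Any]]:
        return self._call("GET", "/v1/feedback", None)


def make_feedback_store(config: dict[str, str], transport: Transport | None = None) -> FeedbackStore:
    """Pick the backend per feature from config; calling code is unchanged."""
    backend = config.get("feedback_backend", "in_process")
    if backend == "in_process":
        return InProcessFeedbackStore()
    if backend == "remote":
        return RemoteFeedbackStore(transport or urllib_transport(config["feedback_url"]))
    raise ValueError(f"unknown feedback backend: {backend!r}")
```

With this shape, a deployer flips one config key per feature, and agent code that calls `store.add(...)` stays the same either way.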

## Things to think about during the discussion

- **Migration story.** The longer we wait, the more deployments depend on the per-agent endpoints. Cheap to do now, while there's effectively one production user; later it becomes a migration that existing deployments will notice
- **Memory** is intentionally NOT in this list — `self.memory` is per-agent by design, and MemoryHub already provides the centralized option
- **Metrics** is also separate: Prometheus scrape targets are inherently per-pod, and that's fine
- **Auth** — if multiple agents share a backend, who's allowed to write what? Today there's no model for this
- **Deployment friction** — every service we extract is another Helm chart, another readiness probe, another thing for ops to think about. Worth it iff the cross-agent benefits land
- **Pluggability shape** — same `FeedbackStore`/`SessionStore`/`TraceStore` ABCs we have today, just running in a different process? Or a different abstraction entirely?
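The sketch below gives the auth question one concrete possible answer to react to. It is not a proposal of record. Under this policy, every agent can read a whole session, so a conversation routed from one agent to another stays followable. Writes, however, are scoped to the authenticated agent's own identity. `SharedSessionStore`, `AgentScopedSessions`, `Forbidden`, the turn shape, and the policy itself are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class Forbidden(Exception):
    """Raised when an agent tries to write as a different agent."""


@dataclass
class SharedSessionStore:
    """Hypothetical shared store: each session is a list of turns tagged by agent."""

    turns: dict[str, list[dict]] = field(default_factory=dict)

    def append(self, session_id: str, turn: dict) -> None:
        self.turns.setdefault(session_id, []).append(turn)

    def history(self, session_id: str) -> list[dict]:
        return list(self.turns.get(session_id, []))


class AgentScopedSessions:
    """One authenticated agent's view of the shared store.

    Assumed policy: reads are open across agents; writes must be attributed
    to the caller's own agent_id.
    """

    def __init__(self, store: SharedSessionStore, agent_id: str) -> None:
        self._store = store
        # In a real service this would come from a verified credential,
        # never from the request body.
        self._agent_id = agent_id

    def append(self, session_id: str, turn: dict) -> None:
        claimed = turn.get("agent_id", self._agent_id)
        if claimed != self._agent_id:
            raise Forbidden(f"{self._agent_id} cannot write as {claimed}")
        self._store.append(session_id, {**turn, "agent_id": self._agent_id})

    def history(self, session_id: str) -> list[dict]:
        return self._store.history(session_id)
```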

## Out of scope for this issue

This is a **design discussion** issue, not an implementation. The goal is to come out with a written architecture decision (in `docs/architecture.md` or similar) that we can point at when implementing.

Captured during the v0.12.0 feedback feature track. Conversation context: the per-agent ownership felt fine for one or two agents but the smell got louder once we considered the 10-agent case.

[arch] Design discussion: shared pluggable service for cross-agent state (feedback, sessions, traces) #112
