AI-SPM v1.0.0 — AI Security Posture Management #23
For an AI-SPM platform, the most valuable next control would be posture drift detection across the full agent deployment lifecycle. Useful drift dimensions:
Each drift event should be classified as expected change, review-required change, or release-blocking change. The evidence record should include previous value, new value, approver, policy version, and whether regression tests were rerun. That gives the platform a posture-management story beyond runtime blocking: it can show whether the deployed agent is still the agent that was originally reviewed.
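The classification and evidence record described above could be modelled roughly as follows. This is only a sketch: `DriftClass`, `DriftEvent`, `classify`, the dimension labels, and the rule that maps a change to a class are all hypothetical and not part of AI-SPM.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class DriftClass(Enum):
    EXPECTED = "expected"
    REVIEW_REQUIRED = "review-required"
    RELEASE_BLOCKING = "release-blocking"


@dataclass(frozen=True)
class DriftEvent:
    dimension: str            # hypothetical label, e.g. "identity_scopes"
    previous_value: Any
    new_value: Any
    approver: Optional[str]   # None when nobody signed off on the change
    policy_version: str
    regression_tests_rerun: bool


# Hypothetical rule: dimensions that must never change without sign-off.
SENSITIVE_DIMENSIONS = {"identity_scopes", "mcp_tool_manifest"}


def classify(event: DriftEvent) -> DriftClass:
    if event.previous_value == event.new_value:
        return DriftClass.EXPECTED  # nothing actually drifted
    if event.approver is None and event.dimension in SENSITIVE_DIMENSIONS:
        return DriftClass.RELEASE_BLOCKING
    if event.approver is None or not event.regression_tests_rerun:
        return DriftClass.REVIEW_REQUIRED
    return DriftClass.EXPECTED
```

An unapproved change to identity scopes would block a release under this rule, while an approved provider-route change with regression tests rerun would count as expected.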
I kept the scope focused on posture drift rather than a broad feature request: baseline-vs-current comparison for agent definitions, MCP tool manifests, identity scopes, runtime boundaries, memory/RAG sources, guardrail decision rates, and provider routes. That should make it easier to review alongside the current security hardening work.
AI-SPM v1.0.0 — AI Security Posture Management
Release date: 2026-04-25
Codename: "MCP"
Highlights
with attached per-agent policies enforced on every turn.
agent.py, the platform validates it, mints per-agent tokens, spawns a sandboxed
Docker container, and routes traffic through Kafka. No custom image
required for the five example agent shapes we ship.
Ollama (both OpenAI-compatible and native modes); operators switch
providers in the UI without restarts or code changes.
call emits a lineage event that lands in
session_events and tails in the per-agent Activity tab in the admin UI within 5 seconds.
bundle from the controller at boot — no platform secrets in the
agent's container env.
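As a rough illustration of the boot step in the last highlight, the sketch below builds the request an agent container might send to the GET /api/spm/agents/{id}/bootstrap endpoint using only the three env vars the release notes name (AGENT_ID, MCP_TOKEN, CONTROLLER_URL). The function name and the bearer-token header are assumptions; the notes do not say how the token is presented.

```python
import os
import urllib.request
from typing import Mapping

REQUIRED_ENV = ("AGENT_ID", "MCP_TOKEN", "CONTROLLER_URL")


def build_bootstrap_request(env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build (but do not send) the bootstrap request from the container env."""
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing env vars: {', '.join(missing)}")
    base = env["CONTROLLER_URL"].rstrip("/")
    url = f"{base}/api/spm/agents/{env['AGENT_ID']}/bootstrap"
    # Assumption: the per-agent token is sent as a bearer token.
    headers = {"Authorization": f"Bearer {env['MCP_TOKEN']}"}
    return urllib.request.Request(url, headers=headers)
```

Everything else the agent needs would then come from the response body, which keeps platform secrets out of the container environment.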
What's new
Agent runtime control plane
POST /api/spm/agents: upload agent.py (multipart) with deploy_after=true. Validates syntax, top-level async def main, and dry-import; mints per-agent mcp_token + llm_api_key; creates the per-agent Kafka topics; spawns the runtime container; polls for the SDK's aispm.ready() handshake.
POST /api/spm/agents/{id}/start | /stop: idempotent kick; UI surfaces a persistent "working…" spinner until the polled runtime_state actually changes.
DELETE /api/spm/agents/{id}: stops the container, drops the topics, deletes the row.
POST /api/spm/agents/{id}/chat: full pipeline, SSE response.
GET /api/spm/agents/{id}/bootstrap: DB-backed SDK boot. The agent's container only needs three env vars (AGENT_ID, MCP_TOKEN, CONTROLLER_URL); everything else is fetched here.
GET /api/spm/agents/{id}/policies + PUT: atomic-replace attach/detach. The chat handler reads linked_policies per turn and forwards them to OPA so policies can scope evaluation.
GET /api/spm/agents/{id}/activity: unified timeline (chat turns + AgentToolCall + AgentLLMCall), newest-first, capped at 200 rows. Polled by the Activity tab.
Agent-side SDK (agent_runtime/aispm)
aispm.ready(): lifecycle handshake.
aispm.chat.subscribe() / reply(): Kafka I/O. Consumer uses auto_offset_reset="earliest" so the very first message after deploy is never silently dropped during consumer-group join.
aispm.chat.history(session_id, limit): replay persisted turns; example agents use this for conversation memory across turns.
aispm.mcp.call("web_fetch", ...): JSON-RPC over HTTP to the MCP server; web_fetch is Tavily-backed.
aispm.llm.complete(messages=, model=…): OpenAI-compatible call through spm-llm-proxy; the SDK no longer pins a default model so the operator's chosen provider model wins.
aispm.get_secret(name): per-agent secret store.
aispm.log("step", trace=…): structured lineage line on stdout.
Provider dispatch (spm-llm-proxy)
| connector_type | Endpoint | Auth headers | Model |
| --- | --- | --- | --- |
| anthropic | {base_url}/v1/messages | x-api-key + anthropic-version: 2023-06-01 | model (payload model honoured only when it starts with claude) |
| ollama (/v1) | {base_url}/chat/completions (OpenAI-compatible) | | model > integration model > llama3.1:8b fallback |
| ollama (other) | {base_url}/api/chat (native) | | model > integration model > llama3.1:8b fallback |

Switching provider is a UI dropdown change on the AI-SPM Agent Runtime Control Plane (MCP) integration row, with no restart and no agent re-deploy.
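The dispatch rules above can be sketched as a single routing function. This is an illustration under stated assumptions, not the spm-llm-proxy code: `resolve_route` is a hypothetical name, the /v1 mode is assumed to be detected from a base URL ending in /v1, and the model precedence is read as payload model, then integration model, then the llama3.1:8b fallback.

```python
from typing import Optional

FALLBACK_OLLAMA_MODEL = "llama3.1:8b"


def resolve_route(connector_type: str, base_url: str, api_key: str = "",
                  payload_model: Optional[str] = None,
                  integration_model: Optional[str] = None) -> dict:
    base = base_url.rstrip("/")
    if connector_type == "anthropic":
        # Payload model is honoured only when it names a Claude model.
        use_payload = bool(payload_model) and payload_model.startswith("claude")
        return {
            "url": f"{base}/v1/messages",
            "headers": {"x-api-key": api_key, "anthropic-version": "2023-06-01"},
            "model": payload_model if use_payload else integration_model,
        }
    if connector_type == "ollama":
        # Assumption: a base URL ending in /v1 selects the OpenAI-compatible mode.
        path = "/chat/completions" if base.endswith("/v1") else "/api/chat"
        return {
            "url": f"{base}{path}",
            "headers": {},
            "model": payload_model or integration_model or FALLBACK_OLLAMA_MODEL,
        }
    raise ValueError(f"unsupported connector_type: {connector_type}")
```

Resolving the route on every call, rather than caching it at startup, is what would let a dropdown change take effect without a restart.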
Observability (AgentToolCallEvent, AgentLLMCallEvent)
spm-mcp emits AgentToolCallEvent after every web_fetch, capturing tool name, args, ok/error, and duration_ms.
spm-llm-proxy emits AgentLLMCallEvent after every chat-completion call (Anthropic and Ollama paths), capturing model, prompt and completion token counts, and ok/error.
cpm.global.lineage_events. The existing lineage_consumer persists them into session_events automatically.
serving path. A lineage_producer.send failed warning is the only signal when Kafka is unreachable; chat keeps working.
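The "chat keeps working" behaviour amounts to best-effort emission: a failed publish is logged and swallowed instead of failing the tool call. A minimal synchronous sketch, assuming a generic `send(topic, event)` callable in place of the real Kafka producer; the function names and the event field names other than duration_ms are hypothetical.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("lineage")


def emit_best_effort(send: Callable[[str, dict], None], topic: str, event: dict) -> bool:
    """Try to publish one lineage event; never raise into the caller."""
    try:
        send(topic, event)
        return True
    except Exception as exc:
        log.warning("lineage_producer.send failed: %s", exc)
        return False


def timed_tool_call(send, tool: str, args: dict, fn: Callable[..., object]):
    """Run a tool, then emit an event describing the call."""
    start = time.monotonic()
    ok, result, error = True, None, None
    try:
        result = fn(**args)
    except Exception as exc:
        ok, error = False, exc
    event = {
        "event_type": "AgentToolCallEvent",
        "tool": tool, "args": args, "ok": ok,
        "duration_ms": int((time.monotonic() - start) * 1000),
    }
    emit_best_effort(send, "cpm.global.lineage_events", event)
    if error is not None:
        raise error  # tool failures still propagate; emit failures do not
    return result
```

With a `send` that raises, `timed_tool_call` still returns the tool's result and only a warning is logged.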
Admin UI
with a runtime-state pip and risk tint.
Run/Stop toggle, Open Chat, View Detail, and Delete asset
actions.
Open Chat. Composer pinned to bottom (min-h-0 + max-h(100vh-120px) so it can never be pushed off-screen by long chat history).
View Detail button. Five tabs: Overview, Configure, Activity
(live tail, polls every 5s), Sessions, Lineage.
agent without leaving the panel.
enum_integration fields render as real dropdowns of existing integrations (no more pasting UUIDs).
observes the actual runtime-state change.
Examples
A new top-level Example agents/ folder ships five ready-to-deploy agents, one per agent_type enum value:

| File | agent_type | Notes |
| --- | --- | --- |
| custom_agent.py | custom | aispm.chat.history() conversation memory and a strong web-search prompt. |
| langchain_agent.py | langchain | AgentExecutor + @tool calling our MCP / LLM proxies. |
| llamaindex_agent.py | llamaindex | aispm.llm, with a hand-rolled retrieval fallback. |
| autogpt_agent.py | autogpt | |
| openai_assistant_agent.py | openai_assistant | |

The runtime image now has langchain==0.3.*, langchain-openai==0.2.*, llama-index-core==0.11.*, and llama-index-llms-openai-like==0.2.* baked in, so langchain_agent.py and llamaindex_agent.py deploy cleanly without bringing your own image.
Bug fixes
paused agent immediately after deploy. The upload route's _wait_for_ready was reading a stale identity-mapped Agent row from its own SQLAlchemy session and timing out, then overwriting the (correctly running) row to crashed. Fixed with db.expire_all() on every poll iteration.
consumer joined the group with the default auto_offset_reset="latest", so any message produced between aispm.ready() flipping the row to running and the consumer registering with the broker was skipped. Fixed by switching to earliest.
Prompt blocked by safety guard. (S2) on the literal word "yes". Three different code sites (two adapters and one module-level function injected via guard_fn=) had the same anti-pattern that forced verdict=block whenever any S1–S15 category appeared, even when the guard's own verdict was allow. Replaced with a length-based bypass for inputs under GUARD_MIN_TEXT_LEN=8 chars and a score-threshold (GUARD_BLOCK_SCORE=0.6) gate on the category-escalation path.
Load failed on chat. The agent_chat.py SSE handler was importing aiokafka lazily but the package wasn't in spm-api's requirements. Added the dep.
ModuleNotFoundError: No module named 'services.spm_api' in both spm-llm-proxy and spm-mcp. Both fell back to a brittle cross-service import. Inlined _decode_secret and dropped the cross-service registry lookup so each service is self-contained.
POST /v1/chat/completions returning 500. The proxy was hardcoded to Ollama's /api/chat shape; pointing Default LLM at Anthropic produced a 404 from api.anthropic.com. Now branches on connector_type and translates request + response shape per provider.
web_fetch 404 on http://spm-mcp:8500/mcp. The MCP server registered tools with FastMCP but never mounted FastMCP's HTTP transport. Added an explicit POST /mcp JSON-RPC handler.
spm-mcp crashing at startup with TypeError: issubclass() arg 1 must be a class. from __future__ import annotations in tools/web_fetch.py made FastMCP's annotation introspection blow up. Removed.
agents.code_blob self-heal. Operators who rm'd the bind-mount source previously broke the agent permanently. The runtime image now rewrites the file from agents.code_blob (DB-stored agent.py source) on every spawn.
Integrations page blank after creating the agent-runtime row. OwnerAvatar crashed with Cannot read properties of null (reading 'split') when the new row had no owner. Defensive null-name guard.
null is not an object (evaluating 'n.id') UI crash. The mergeAgents / mergedAllAssets / adaptLiveAgent chain had no null filtering, so a poll race during a failing fetch could leave a null entry in the array. All three layers now filter falsy entries.
SchemaForm didn't handle type: "enum_integration" and fell back to a text input. Now renders a dropdown of existing integrations filtered by the connector's declared options_provider.
AgentChatPanel composer pushed below the fold. Missing min-h-0 on the flex column meant the message list grew unbounded.
Operator changes
Runtime Control Plane (MCP), under Integrations → AI Providers.
Set its Default LLM field to your existing AI Provider integration
(e.g. Anthropic, Ollama) and Tavily Integration to your Tavily
row. The proxy and MCP server resolve through this row at every call,
so changing the upstream provider is a UI dropdown change with no
restart.
batteries:
docker compose -f docker-compose.yml -f docker-compose.auth.yml \
  --profile build-only build --no-cache agent-runtime-build
containers should be force-removed before the next deploy:
| Variable | Service | Default | Notes |
| --- | --- | --- | --- |
| GUARD_BLOCK_SCORE | api, spm-api | 0.6 | allow + category to block. |
| GUARD_MIN_TEXT_LEN | api, spm-api | 8 | |
| AGENT_READY_TIMEOUT_S | spm-api | 30 | ready() handshake. |
| AGENT_CHAT_REPLY_TIMEOUT_S | spm-api | 120 | chat.out. |
| AGENT_CONTROLLER_URL | spm-api | http://spm-api:8092 | CONTROLLER_URL into spawned agent containers. |
| KAFKA_BOOTSTRAP_SERVERS | spm-mcp, spm-llm-proxy | kafka-broker:9092 | |

Database migrations
Three new Alembic revisions auto-apply on spm-api startup:
005_agent_runtime_control_plane: agents, agent_chat_sessions, agent_chat_messages tables; agent_type enum.
006_agent_policies: join table agent_policies(agent_id, policy_id, attached_at, attached_by) with cascade-on-agent-delete.
007_agent_code_blob: adds agents.code_blob TEXT so the runtime can self-heal a deleted host file.
All migrations are forward-only; no data loss on upgrade. 005 is idempotent against duplicate-enum-create errors so re-applying is safe.
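To illustrate what cascade-on-agent-delete means for the agent_policies join table, here is a small sketch that uses in-memory SQLite as a stand-in for the real database. The column types, the composite primary key, and the policies table are assumptions; only the four agent_policies column names come from the migration description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE agents (id TEXT PRIMARY KEY, code_blob TEXT);
CREATE TABLE policies (id TEXT PRIMARY KEY);
CREATE TABLE agent_policies (
    agent_id    TEXT NOT NULL REFERENCES agents(id) ON DELETE CASCADE,
    policy_id   TEXT NOT NULL REFERENCES policies(id),
    attached_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    attached_by TEXT,
    PRIMARY KEY (agent_id, policy_id)
);
""")

conn.execute("INSERT INTO agents (id) VALUES ('a1')")
conn.execute("INSERT INTO policies (id) VALUES ('p1')")
conn.execute(
    "INSERT INTO agent_policies (agent_id, policy_id, attached_by) VALUES ('a1', 'p1', 'admin')"
)

# Deleting the agent removes its policy links; the policy itself is untouched.
conn.execute("DELETE FROM agents WHERE id = 'a1'")
remaining_links = conn.execute("SELECT COUNT(*) FROM agent_policies").fetchone()[0]
remaining_policies = conn.execute("SELECT COUNT(*) FROM policies").fetchone()[0]
```

The cascade lives on the agent side only, so deleting an agent never leaves orphaned link rows behind.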
Tests
tests/e2e/ (skip cleanly without docker-compose; run when the stack is up).
TestChatWithPolicySmoke covers the full Phase 4 path: register agent → wait for running → attach policy → POST /chat SSE → assert reply text + activity timeline contains both user and agent turns → detach policy → delete.
Documentation
README.md sections: Deploying an agent, Adding a new integration (LLM, Tavily, etc.), Adding an LLM
specifically — minimum setup, Adding a new asset type.
docs/agents/operator-quickstart.md extended with the Phase 4
table, gotchas table, and env-knob reference.
Example agents/README.md covers the
runtime image, and how to read the Activity tab for debugging.
docs/superpowers/plans/2026-04-25-agent-runtime-control-plane-phase-{1..4}-*.md.
docs/superpowers/specs/2026-04-25-agent-runtime-control-plane-mcp-design.md.
Upgrade notes
For an existing AI-SPM stack, this is a backwards-compatible feature
release — no breaking changes to existing connectors, policies, or
chat behaviour. To get the new agent runtime online:
Then in the UI:
integration; pick the LLM provider and Tavily on it.
Example agents/custom_agent.py → type custom → Register & Deploy.
Known limitations
requirements.txt isn't supported yet; agents that need packages outside the baked-in set (LangChain, LlamaIndex, the
SDK transport deps) require forking the runtime Dockerfile. Planned
for V2.
AgentDeployedEvent / AgentStartedEvent / AgentStoppedEvent dataclasses exist in platform_shared/lineage_events.py but aren't yet emitted from agent_controller. Only AgentChatMessageEvent, AgentToolCallEvent, and AgentLLMCallEvent flow through lineage_consumer today.
aispm.chat.stream) is a stub that raises NotImplementedError on write(). The SSE shape supports per-token frames, but the agent → Kafka path is one-message-per-reply. Planned for V1.5.
(the list endpoint scopes by tenant_id); other surfaces are effectively single-tenant. V2 enforces strict isolation everywhere.
mcp_token / llm_api_key at rest. The columns are admin-only and never returned in API responses, but V2 will encrypt with the existing Fernet key.
the existing spm/prompt/allow and spm/output/allow rules; rules that reference input.linked_policies need to be authored.
What's next (V1.1 / V2 roadmap candidates)
requirements.txt builds.
AgentLifecycle* event emission from agent_controller.
web_fetch.
Acknowledgements
This release was built end-to-end across Phase 1 (backend), Phase 2
(SDK), Phase 3 (UI), Phase 4 (chat pipeline), and Phase 4.5
(observability) over a single working day, against a live AI-SPM stack
with iterative feedback at each step.
Thanks to the operators who hammered the chat with "yes" until the guard-model false positive surfaced.
This discussion was created from the release AI-SPM v1.0.0 — AI Security Posture Management.