feat(orchestrator): agent transparency + Direct Line + forced delegation (#332) #335
Merged
… forced delegation (#332)

Address orchestrator "gatekeeper drift": make a requested sub-agent delegation observable and non-suppressible across every channel (incl. MS Teams), while keeping the orchestrator in the loop and aware.

L1 (Transparency): project the deterministic `runTrace.agentInvocations` into a curated `SemanticAnswer.agentsConsulted` field (+ `DelegatedAnswer` contract + a plain-text footer fallback). Teams/Telegram, which never see the raw `runTrace`, now show which specialist actually ran, sourced from the choke-point trace rather than the LLM's prose. A fabricated "I asked X" with no real invocation shows nothing.

L2 (Direct Line): a core-parsed `#<specialist>` directive binds the sub-agent's input to the user's verbatim payload via the deterministic choke point and delivers its verbatim answer as a harness-owned, attributed `delegatedAnswer` the orchestrator can neither suppress nor reword. Two modes: strict passthrough (default) or a guarded additive note (never a redaction). Wired into BOTH the non-streaming (`chat()`/Teams) and streaming (`chatStream`/web-ui) paths; unknown/ambiguous tokens disambiguate instead of silently routing; sub-agent errors are delivered faithfully.

L3 (Forced delegation): `ChatTurnInput.expectedDomainTool` ports OB-31 to the orchestrator loop. It forces `tool_choice` + a synthetic reminder when the turn would otherwise end without the required consult.

Tests: `test/orchestrator/directLine.test.ts` (15 new, all green).
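The L1 projection described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the field names on `AgentInvocation` and `AgentConsulted`, and the helper names `projectAgentsConsulted` and `footerText`, are assumptions made for the example. The only details taken from the PR are that the list is derived from the trace (never from the LLM's prose) and the shape of the footer string.

```typescript
// Hypothetical shapes: the PR names `runTrace.agentInvocations` and
// `agentsConsulted`, but does not show their fields.
interface AgentInvocation {
  agentName: string;
  ok: boolean;
  steps: number;
}

interface AgentConsulted {
  name: string;
  ok: boolean;
  steps: number;
}

// Build the curated list from the deterministic trace only. The LLM's prose
// is deliberately not an input, so a claimed-but-absent consult cannot appear.
function projectAgentsConsulted(
  invocations: readonly AgentInvocation[]
): AgentConsulted[] {
  return invocations.map((inv) => ({
    name: inv.agentName,
    ok: inv.ok,
    steps: inv.steps,
  }));
}

// Plain-text fallback for channels without rich rendering (assumed format,
// modelled on the footer example quoted in the PR description).
function footerText(consulted: readonly AgentConsulted[]): string {
  if (consulted.length === 0) return "";
  const parts = consulted.map(
    (c) =>
      `${c.name} ${c.ok ? "✓" : "✗"} · ${c.steps} ${c.steps === 1 ? "step" : "steps"}`
  );
  return `🔎 Consulted: ${parts.join(", ")}`;
}
```

With an empty trace both helpers return an empty result, which matches the "fabricated consult shows nothing" behaviour the commit describes.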
…tekeeper-drift-plan
…#332)

Confidence-check follow-ups on the #332 Direct Line:

- Collision rule (Open Q3): a leading `#token` is now a directive ONLY when it resolves to a whitelisted specialist. An UNKNOWN token falls through to the normal LLM turn, so ordinary messages that merely start with `#` (`#urgent …`, `#1 priority …`, hashtags) are no longer hijacked into a "no such agent" reply. Ambiguous tokens (matched ≥2 agents) still disambiguate.
- Awareness/continuity (Pitfall 5): a direct-line turn is now persisted through the same `sessionLogger` as a normal turn (KG continuity + cross-session recall + turnId), instead of relying only on the channel's prior-turn buffer.

Tests: +ambiguous-disambiguation, +session-logger persistence, unknown-token now asserts LLM fall-through (17 green).
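The collision rule above has three outcomes, which a small resolver can make explicit. The sketch below is illustrative only: `resolveTarget`, the `Resolution` type, and the matching strategy (case-insensitive, exact match first, then prefix match) are assumptions, since the commit does not say how a token is matched against agent names.

```typescript
type Resolution =
  | { kind: "directive"; agent: string }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "fallthrough" };

// Assumed matching: case-insensitive; an exact name wins, otherwise prefix match.
function resolveTarget(token: string, whitelist: readonly string[]): Resolution {
  const needle = token.toLowerCase();
  if (needle.length === 0) return { kind: "fallthrough" };

  const exact = whitelist.filter((name) => name.toLowerCase() === needle);
  const matches =
    exact.length > 0
      ? exact
      : whitelist.filter((name) => name.toLowerCase().indexOf(needle) === 0);

  const [first, second] = matches;
  // No whitelisted specialist matches: not a directive, run the normal LLM turn.
  if (first === undefined) return { kind: "fallthrough" };
  // Exactly one match: route the turn over the direct line.
  if (second === undefined) return { kind: "directive", agent: first };
  // Two or more matches: ask the user instead of routing silently.
  return { kind: "ambiguous", candidates: matches };
}
```

Under these assumptions `#urgent` and `#1` fall through because no whitelisted name matches them, while a token that is a prefix of two agent names yields the disambiguation branch.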
…verbatim payload (#332)

Independent Codex/GPT-5.4 review follow-ups:

- Streaming direct-line `onAfterTurn` now carries `turnExternalId` (parity with the normal done branch), so graph-linking observers (#133 E8) fire on direct-line streams too.
- Directive payload now keeps internal/trailing whitespace byte-for-byte; only the leading separator is stripped, so a whitespace-significant payload (fenced code block) reaches the sub-agent truly verbatim. An all-whitespace remainder still collapses to an empty payload.

Tests 18 green (+verbatim-preservation, +all-whitespace-empty).
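The payload rule in the second bullet can be illustrated with a small parser. This is a sketch under stated assumptions, not the PR's `parseDirectLineDirective`: the function name `parseDirective`, the token alphabet, and the choice to treat the separator as exactly one whitespace character are all hypothetical.

```typescript
interface ParsedDirective {
  token: string;
  payload: string;
}

// Assumptions: a token is one or more characters from [A-Za-z0-9_-] directly
// after a leading '#', and the separator is a single whitespace character.
function parseDirective(message: string): ParsedDirective | null {
  const match = /^#([A-Za-z0-9_-]+)/.exec(message);
  if (match === null) return null;
  const token = match[1];
  if (token === undefined) return null;

  // Everything after "#token", untouched so far.
  const rest = message.slice(1 + token.length);

  // An all-whitespace remainder collapses to an empty payload.
  if (rest.trim().length === 0) return { token, payload: "" };

  // The token must be followed by whitespace, otherwise this is not a directive.
  if (!/^\s/.test(rest)) return null;

  // Strip only the one leading separator; internal and trailing whitespace
  // are kept byte-for-byte.
  return { token, payload: rest.slice(1) };
}
```

For example, `parseDirective("#coder \n  x = 1 \n")` keeps the newline, the indentation, and the trailing whitespace of the payload, which is what a fenced code block needs.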
Codex review MEDIUM follow-ups:

- Guarded-additive PII: the note runs an extra `provider.complete` over the verbatim answer that is NOT routed through the privacy interning path. Degrade guarded → strict whenever a privacy guard is active, so un-masked PII is never forwarded to the model provider. The verbatim block is still delivered intact.
- KG-learning parity: a direct-line turn skips chatInContext*, so the knowledge graph never learned from a delegated answer. Fire `factExtractor.extractAndIngest` (fire-and-forget) after the session log lands, mirroring a normal turn.

Tests 20 green (+guarded no-redaction, +guarded degrades-to-strict-under-privacy).
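The degrade rule in the first bullet can be sketched as a pure mode decision plus an additive composition step. Everything named here (`DirectLineMode`, `effectiveMode`, `deliver`, the synchronous `makeNote` callback standing in for the extra provider call) is a hypothetical simplification; the real note generation is asynchronous and lives in the orchestrator.

```typescript
type DirectLineMode = "strict" | "guarded";

// Guarded mode is only honoured when no privacy guard is active.
function effectiveMode(
  requested: DirectLineMode,
  privacyGuardActive: boolean
): DirectLineMode {
  return requested === "guarded" && privacyGuardActive ? "strict" : requested;
}

// `makeNote` stands in for the extra model call over the verbatim answer.
function deliver(
  verbatim: string,
  requested: DirectLineMode,
  privacyGuardActive: boolean,
  makeNote: (answer: string) => string
): string {
  const mode = effectiveMode(requested, privacyGuardActive);
  // Strict: the answer is passed through and never sent to the note generator.
  if (mode === "strict") return verbatim;

  const note = makeNote(verbatim).trim();
  // The note is appended after the verbatim block; the block itself is never edited.
  return note === "" ? verbatim : `${verbatim}\n\n▸ omadia note: ${note}`;
}
```

Because the verbatim string is only ever concatenated, never transformed, the no-redaction property holds by construction in this sketch.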
Why
omadia routes every turn through a single orchestrator, which decides at LLM discretion whether, when, and how to delegate to a named specialist, and then rephrases the result. This produces gatekeeper drift: the orchestrator quietly concentrates competence and mediates everything, so a requested delegation is neither observable nor protected from suppression. That is a blocker for enterprise adoption, most visibly on MS Teams, where the user today has no agent-invocation visibility at all.
This PR keeps the orchestrator in the loop and aware, but removes its destructive powers (suppressing/falsifying a delegation) while preserving its constructive ones, via three independently-shippable layers. Addresses the in-process scope of #332 (the Teams Adaptive-Card rendering lives in the private connector repo and is a follow-up).
What
L1: Tamper-evident transparency (the minimum)

- `SemanticAnswer`: `agentsConsulted` (curated projection of the deterministic `runTrace.agentInvocations`) + `DelegatedAnswer`.
- `toSemanticAnswer` builds `agentsConsulted` from the choke-point trace, not the LLM's prose. The raw `runTrace` stays dropped at the channel boundary. A fabricated "I asked X" with no real invocation yields an empty list.
- `agentsConsultedFooterText()` plain-text fallback (`🔎 Consulted: Strategist ✓ · 2 steps`).

L2: Direct Line (core of the maximum)
- `directLine.ts`: `parseDirectLineDirective` (a leading `#<specialist>` token that survives Teams mention-strip + whitespace-collapse) and `resolveDirectLineTarget` (resolves only against this orchestrator's whitelisted sub-agents).
- Collision rule: a leading `#token` is treated as a directive only when it resolves to a whitelisted specialist. An unknown token falls through to the normal LLM turn, so ordinary messages that merely start with `#` (`#urgent …`, `#1 priority …`, hashtags) are never hijacked. Ambiguous tokens (matched ≥2 agents) disambiguate instead of routing silently (Pitfall 7).
- `Orchestrator.executeDirectLine`: binds the sub-agent's input to the user's verbatim payload via the deterministic choke point, captures the verbatim answer, and delivers it as a harness-owned, attributed `delegatedAnswer` the orchestrator can neither remove nor reword. Routed through the same `privacyHandle.finalize`. Faithful failures, never a cover-up.
- Awareness/continuity: a direct-line turn is persisted through the same `sessionLogger` as a normal turn (KG continuity + cross-session recall + `turnId`), not just the channel's prior-turn buffer.
- Modes: strict passthrough (default) or guarded additive (a `▸ omadia note:` may be appended; the verbatim block stays byte-for-byte intact, so the no-redaction invariant is structural).

L3: Forced delegation obligation
`ChatTurnInput.expectedDomainTool` ports OB-31 to the orchestrator loop: if the turn would end (pure-text stop) without invoking the required sub-agent, the harness escalates once with `tool_choice:{type:'tool',name:X}` + a synthetic reminder. This forces the consult within the iteration budget; an absolute "cannot end without it" guarantee additionally needs the verifier/postcondition layer (out of scope here, per the issue).

MS Teams note (important)
Teams uses `chat()` → `runTurn` → `chatInContextInner` (non-streaming); web-ui uses `chatStream` → `chatStreamInner` (streaming). The issue located L2/L3 only in `chatStream`, which would have left Teams uncovered, so L2 and L3 are wired into BOTH parallel loops.

Files
- `harness-channel-sdk/src/{outgoing,toSemanticAnswer,chatAgent,index}.ts`: L1 contract, projection, fallback, `expectedDomainTool`, `done`-event `delegatedAnswer`.
- `harness-orchestrator/src/directLine.ts` (new): pure parser/resolver/label.
- `harness-orchestrator/src/orchestrator.ts`: `executeDirectLine`, guarded note, session-log persistence, options, obligation port in both loops.
- `harness-orchestrator/src/index.ts`: exports.
- `test/orchestrator/directLine.test.ts` (new): 20 tests.

Validation
… `builderEditRoutes` / runtime-secrets) sometimes appears in the full run but passes in isolation and alongside the new tests; it is unrelated to this PR.

Acceptance criteria
- … `runTrace`, not the LLM; a fabricated consult shows nothing.
- `expectedDomainTool` is a primitive with no production producer yet; the Conductor (feat(conductor): Spec 005 — Omadia Conductor (deterministic engine, designer, human-in-the-loop) — US1–US8 implemented + live-tested #321) is its intended caller.

Independent review
An independent Codex/GPT-5.4 adversarial review was run. Fixes applied from it: streaming `turnExternalId` parity, truly-verbatim payload (only the separator is stripped), guarded-mode PII degrade-to-strict, and KG fact-extraction parity. Confirmed structurally sound: the no-redaction invariant and the directive collision rule.

Follow-ups (not in this PR)
… `agentsConsulted` / `delegatedAnswer`.

Complementary to the Conductor (#321): this PR delivers the per-turn trust primitives; the Conductor composes L3 obligations into multi-step processes.
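For readers who want a concrete picture of the L3 obligation primitive that the Conductor is meant to compose, the escalate-once behaviour can be sketched as a small loop. This is a simplified, synchronous model with hypothetical names (`ModelStep`, `StepRequest`, `runWithObligation`); the real loops are `chatInContextInner` and `chatStreamInner`, and their internals are not shown in this PR description.

```typescript
interface ModelStep {
  toolCalls: string[]; // names of tools the model invoked in this step
  text: string;
}

interface StepRequest {
  forcedTool?: string; // stands in for tool_choice:{type:'tool',name:X}
  reminder?: string; // the synthetic reminder
}

type StepFn = (request: StepRequest) => ModelStep;

function runWithObligation(
  step: StepFn,
  expectedDomainTool: string,
  maxIterations: number
): { consulted: boolean; iterations: number } {
  let consulted = false;
  let escalated = false;
  let request: StepRequest = {};

  for (let i = 1; i <= maxIterations; i++) {
    const result = step(request);
    if (result.toolCalls.indexOf(expectedDomainTool) !== -1) consulted = true;

    // A step with tool calls continues the loop with a normal request.
    if (result.toolCalls.length > 0) {
      request = {};
      continue;
    }

    // Pure-text stop: end if the obligation is met or was already escalated.
    if (consulted || escalated) return { consulted, iterations: i };

    // Escalate exactly once: force the tool and add a reminder.
    escalated = true;
    request = {
      forcedTool: expectedDomainTool,
      reminder: `Consult ${expectedDomainTool} before answering.`,
    };
  }
  return { consulted, iterations: maxIterations };
}
```

In this model a step function that ignores the forced request still ends the turn with `consulted: false`, which mirrors the PR's point that an absolute guarantee needs a separate verifier/postcondition layer.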