feat(aura): add VFS-agnostic MemoryFS for orchestration memory #77
Draft
henryjandrews wants to merge 67 commits into
Squash+rebuild of feature/orchestration-mode (8368ffe) onto main (793ca13). Adds orchestration mode: a coordinator agent decomposes queries into worker tasks, dispatches them to specialized MCP-equipped agents, and synthesizes results.
Key capabilities:
- Coordinator/worker architecture with per-worker MCP tool scoping
- Prompt journal and persistence layer for run artifact tracking
- Evaluation tool for structured worker result scoring
- OrchestratorEvent SSE stream with planning, task, evaluation, synthesis events
- Dual-path streaming in web server (Agent vs Orchestrator)
- Docker Compose test infrastructure (math-mcp service + CI runner)
- Integration tests gated on `integration-orchestration` feature flag
Ref: LOG-21951
Added structs for phased planning work Ref: LOG-23252
Prompts, types, and base loop for execution Ref: LOG-23252
Added SSE events and some minor prompt tweaks Ref: LOG-23252
Three targeted fixes for failure modes identified in E2E validation:
1. Phase continuation prompt (P2): Strengthen decision criteria to prevent spurious replanning. Discovery results showing available tools should always continue. Default to continue unless results genuinely invalidate remaining phases. Eliminates GPT-5.1 replan loop (was 1/2, now 2/2).
2. Fallback synthesis (P1): Error paths for phase-replan exhaustion and task-failure exhaustion now attempt synthesize() on completed tasks before returning Err. Prevents content delivery gap where 3/8 runs produced zero user content.
3. Phase-aware evaluation (P3): Add phases_context to EvaluationVars and evaluation prompt template. Phased plans now include phase execution context (labels of completed phases + guidance that discovery phases are legitimate intermediate results).
E2E results: 8/8 clean (up from 5/8 baseline), 539 tests pass, 0 clippy.
Ref: LOG-23252
Change 3 added phase execution context to the evaluation prompt, which improved scoring for GPT/Sonnet but caused Qwen to score quality=0.0 for correctly reporting "no stddev tool available." This triggered replan→MaxDepthError→zero content, regressing Qwen from 2/2 to 0/2. Reverting restores the 8/8 clean result from Change 1 alone. Phase-aware evaluation will be re-attempted with softer guidance (quality floor for error-free completions, recognition that tool-limitation reports are valid). Changes 1 (phase-continuation prompt) and 2 (fallback synthesis) retained. Ref: LOG-23252
Further reduction of orchestration code in aura's core to prevent bad diffs against main. Enriched the StreamingAgent trait with get_provider_info() and UsageState on stream_with_timeout(), eliminating the dual-type concrete_agent fork in handlers.rs. Ref: LOG-23252
Rig's ReAct loop doesn't stop while planning loops are running; on a slower or local model this causes a feedback loop that leads to timeouts. Ref: LOG-23252
Mock k8s-sre-mcp FastMCP server (17 tools across Kubernetes, Prometheus, and Alertmanager domains) with optional VERBOSE_MODE for realistic cluster simulation. Four Rust integration tests behind integration-orchestration-sre feature flag verify orchestration lifecycle events, domain-specific tool routing, session ID correlation, and synthesis quality scoring. Includes Docker Compose overlays, CI and local TOML configs with three specialized workers (k8s-discovery, prometheus-analyst, monitoring-engineer), and .gitignore updates for Python artifacts. Ref: LOG-22753
Adds the feature in Cargo.toml so that the SRE integration tests are callable. Ref: LOG-22753
Rewrote README to match the main branch style while covering orchestration-specific content.
Key changes:
- Added multi-agent orchestration concepts: coordinator/worker architecture, DAG-based parallel execution, and quality evaluation loop (Plan-Execute-Synthesize-Evaluate).
- Added CLI usage section with basic query, interactive, and verbose mode examples.
- Added Orchestration configuration section with worker isolation, MCP/vector-store filtering, and link to example-workers.toml.
- Added orchestration integration test commands and feature flags.
- Updated project structure to reflect compose/, development/, docs/, and scripts/ directories.
- Expanded Architecture section with prompt routing model and orchestrator component descriptions.
- Consolidated redundant sections and removed stale content.
Ref: LOG-23358
Added a concise open-alpha callout beneath the key capabilities list, noting that APIs and configuration may change between releases and linking to the GitHub issues page for feedback. Ref: LOG-23358 Made-with: Cursor
The orchestration flow diagram is not being included in this release. Removed the reference from the Documentation section to avoid a dangling link. Ref: LOG-23358 Made-with: Cursor
Changed final_result from initialized String::new() to an uninitialized declaration. The value was always overwritten before being read, triggering clippy unused-assignments with -D warnings. Ref: LOG-23358 Made-with: Cursor
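The pattern behind this fix can be sketched as follows. This is a minimal illustration, not the code from the commit: `first_acceptable` and its argument are hypothetical. Rust accepts a `let` without an initializer as long as every path assigns the variable before it is read, which removes the dead initial value.

```rust
/// Hypothetical stand-in for the loop that produces `final_result`.
/// Returns the first non-empty attempt, or "fallback" if there is none.
fn first_acceptable(attempts: &[&str]) -> String {
    // Declared without an initial value: the compiler checks that the
    // variable is assigned on every path before it is read.
    let final_result: String;
    let mut index = 0;
    loop {
        // Past the end of the slice we fall back to a non-empty default,
        // so the loop always terminates.
        let candidate = attempts.get(index).copied().unwrap_or("fallback");
        if !candidate.is_empty() {
            final_result = candidate.to_string();
            break;
        }
        index += 1;
    }
    final_result
}
```

Writing `let mut final_result = String::new();` here instead would assign a value that is never read, which is the situation the commit message describes.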
Convert mcp-openai-bridge from the vendored pattern to an inline mod to simplify the code. Ref: LOG-23293
Add toToml-based rendering so Helm values.yaml sections (llm, agent, mcp, etc.) are converted to valid TOML without hand-written template helpers. Helm's YAML parser turns ints into Go float64 causing toToml to render 8000 as 8000.0 — rather than fixing this in templates, a lenient_int serde module on the Rust side accepts both forms during deserialization, keeping the Helm template a plain toToml pass-through. Ref: LOG-23231
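The idea of accepting both `8000` and `8000.0` can be sketched without serde. The function below is an assumption for illustration only: the actual `lenient_int` module hooks into serde deserialization and is not shown in this excerpt, and `parse_lenient_int` is a hypothetical name.

```rust
/// Hypothetical helper: accept an integer written either as `8000` or as a
/// float with a zero fractional part such as `8000.0`.
fn parse_lenient_int(raw: &str) -> Result<i64, String> {
    let raw = raw.trim();
    // Plain integers parse directly.
    if let Ok(n) = raw.parse::<i64>() {
        return Ok(n);
    }
    let f: f64 = raw
        .parse()
        .map_err(|_| format!("not a number: {raw}"))?;
    // Reject values such as 8000.5, NaN, or infinity that are not whole numbers.
    if !f.is_finite() || f.fract() != 0.0 {
        return Err(format!("expected a whole number, got {raw}"));
    }
    // Guard against values outside the i64 range before casting.
    if f < i64::MIN as f64 || f >= i64::MAX as f64 {
        return Err(format!("out of range: {raw}"));
    }
    Ok(f as i64)
}
```

Keeping the leniency on the parsing side is what lets the Helm template remain a plain `toToml` pass-through, as the commit message notes.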
Remove the old CLI crate; it will be replaced by a more comprehensive standalone CLI/TUI that can be run both remote and embedded. Ref: LOG-23311
SRE integration tests need a make target for deps and a feature flag. Ref: LOG-23252
Missing Ollama references and a few other things; ensure that the new orchestration events section is additive to what's in main. Ref: LOG-22815
Main bumped to Rust edition 2024, which gives us a new set of clippy rules to conform to. Ref: LOG-22753
Reflection prompt was mistakenly given only a char summary of worker events for use in replanning Ref: LOG-23405
Model config updated with different configs for models used in the e2e q1 -> q4 suite Ref: LOG-22815
Smaller quantized models used as workers get stuck in a tool loop. We're optimizing prompts to avoid repeating the exact same tool call while expecting different outputs, and providing an in-code reminder to steer away from repeats. Ref: LOG-23411
Use the full worker preamble with scope, execution steps, and critical rules to better enforce aura tool fields around reasoning and prevent looping. Clearer error handling patterns in the prompt; remove reasoning from the required fields, preventing re-loops with smaller models. Ref: LOG-21951
Move away from DAG style (flat list with ids for deps) to a true nested json structure for more accurate planning. This greatly improves planning accuracy. Ref: LOG-23434
Add optional `steps` field to Plan struct so plan.json shows the original LLM step structure alongside the flattened task array. Add math-orchestration-qwen35-ollama.toml for local Ollama testing. Ref: LOG-23434
Adds a `steps` plan format where tasks are sequential by default — flatten_steps() auto-assigns dependencies from ordering. This fixes Qwen3.5's persistent `task1.deps=[]` failure where the model couldn't declare dependencies in the DAG format. E2E confirmed: Qwen3.5 15/15 (100%), all Q2 plans show task 1: deps=[0]. Ref: LOG-23434
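A minimal sketch of the "sequential by default" rule, assuming a simplified task shape. The `Task` struct and this signature for `flatten_steps` are hypothetical; the real `Plan` types are not shown in this excerpt.

```rust
/// Hypothetical, simplified task shape.
#[derive(Debug, PartialEq)]
struct Task {
    id: usize,
    description: String,
    deps: Vec<usize>,
}

/// Flatten an ordered list of step descriptions into tasks where each task
/// depends on the one immediately before it.
fn flatten_steps(steps: &[&str]) -> Vec<Task> {
    steps
        .iter()
        .enumerate()
        .map(|(id, description)| Task {
            id,
            description: description.to_string(),
            // The first task has no dependency; every later task depends
            // on its predecessor.
            deps: if id == 0 { Vec::new() } else { vec![id - 1] },
        })
        .collect()
}
```

With two steps this yields task 0 with no dependencies and task 1 with `deps = [0]`, so the model never has to declare the dependency itself.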
Adds stream_and_forward() — a streaming wrapper that forwards ReasoningDelta events through event_tx while collecting the final response. Migrates workers, synthesis, and phase continuation from the non-streaming chat_with_timeout path. This makes worker reasoning visible as aura.reasoning SSE events, enabling diagnosis of model-level issues like the Qwen3.5 duplicate tool-call loop (model hallucinates parameter failure despite success). Ref: LOG-23435
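The forward-while-collecting shape can be sketched synchronously with a standard channel. Everything here is an assumption for illustration: the real `stream_and_forward()` is async, and the `StreamItem` enum and the use of `std::sync::mpsc` are hypothetical stand-ins for the actual stream and `event_tx` types.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical event type; the real stream items are not shown here.
#[derive(Debug, Clone, PartialEq)]
enum StreamItem {
    ReasoningDelta(String),
    TextDelta(String),
}

/// Forward reasoning deltas to `event_tx` as they arrive and accumulate the
/// text deltas into the final response.
fn stream_and_forward<I>(stream: I, event_tx: &Sender<String>) -> String
where
    I: IntoIterator<Item = StreamItem>,
{
    let mut response = String::new();
    for item in stream {
        match item {
            StreamItem::ReasoningDelta(delta) => {
                // A closed receiver should not abort the run, so the send
                // error is deliberately ignored in this sketch.
                let _ = event_tx.send(delta);
            }
            StreamItem::TextDelta(delta) => response.push_str(&delta),
        }
    }
    response
}
```

The caller still gets a single final string, while a listener on the other end of the channel sees the reasoning as it is produced.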
Include truncated task results in the evaluation prompt so the evaluator can cross-reference synthesized responses against actual tool outputs, reducing false hallucination accusations. Controlled by AURA_ENRICH_EVALUATION env var (default: true). Also switches to a dedicated evaluation preamble instead of reusing the coordinator preamble. Ref: LOG-23425
A little clippy cleanup after cherry picks Ref: LOG-22924
Collaborator, Author: I have read the CLA Document and I hereby sign the CLA
Pull request overview
Adds a durable, VFS-agnostic “orchestration memory” layer (Markdown-first) that the coordinator can read via new MemoryFS tools and that Aura can write post-run, along with config/docs updates to enable and describe the feature.
Changes:
- Introduces `MemoryFs` (read-only virtual FS + DSL) and `MemoryWriter` (post-run durable Markdown memory + index).
- Adds coordinator-only memory tools (`list_memories`, `read_memory`, `search_memory`, `recent_memory`, `memory_shell`) and updates coordinator prompts/config to encourage consulting memory before routing.
- Adds new `[orchestration.memory]` config (aura + aura-config) and updates docs/examples/configs accordingly.
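The PR summary describes the root resolution rule as: use `memory.root_dir`, fall back to `artifacts.memory_dir`, and fail clearly if memory is enabled with neither. A minimal sketch of that rule follows; the `MemorySettings` struct and this signature for `memory_root` are hypothetical and do not reflect the actual `MemoryConfig` definition.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down view of the two relevant settings.
struct MemorySettings {
    enabled: bool,
    /// `root_dir` under `[orchestration.memory]`
    root_dir: Option<PathBuf>,
    /// `orchestration.artifacts.memory_dir`
    artifacts_memory_dir: Option<PathBuf>,
}

/// Resolve the memory root: `Ok(None)` when memory is disabled, the explicit
/// root if set, otherwise the artifacts directory, otherwise an error.
fn memory_root(settings: &MemorySettings) -> Result<Option<PathBuf>, String> {
    if !settings.enabled {
        return Ok(None);
    }
    settings
        .root_dir
        .clone()
        .or_else(|| settings.artifacts_memory_dir.clone())
        .map(Some)
        .ok_or_else(|| {
            "orchestration.memory is enabled but neither memory.root_dir nor \
             artifacts.memory_dir is set"
                .to_string()
        })
}
```

Returning an error rather than silently disabling memory matches the "fails clearly" wording in the summary.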
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| examples/reference.toml | Documents new [orchestration.memory] settings and how they relate to artifacts.memory_dir. |
| crates/aura/src/prompts/orchestrator_preamble.md | Adds {{memory_guidance}} placeholder to coordinator preamble template. |
| crates/aura/src/orchestration/types.rs | Refactors reflection prompt builder to be mode-driven for tests (reduces env var dependency). |
| crates/aura/src/orchestration/tools/mod.rs | Registers and re-exports new coordinator memory tools module. |
| crates/aura/src/orchestration/tools/memory.rs | Implements coordinator read-only tools over MemoryFs (+ unit tests). |
| crates/aura/src/orchestration/orchestrator.rs | Wires memory config validation, memory guidance in planning prompt, registers memory tools for coordinator, and writes durable memory post-run. |
| crates/aura/src/orchestration/mod.rs | Adds memory_fs / memory_writer modules and exports MemoryConfig. |
| crates/aura/src/orchestration/memory_writer.rs | Implements durable Markdown memory writer and index regeneration (+ tests). |
| crates/aura/src/orchestration/memory_fs.rs | Implements the virtual filesystem + read-only “memory_shell” DSL (+ tests). |
| crates/aura/src/orchestration/config.rs | Adds MemoryConfig, memory_root() resolution, and preamble memory guidance inclusion (+ tests). |
| crates/aura/src/lib.rs | Re-exports MemoryConfig from the aura crate API surface. |
| crates/aura-config/src/config_test.rs | Adds parsing test coverage for [orchestration.memory] in aura-config. |
| crates/aura-config/src/config.rs | Mirrors MemoryConfig into aura-config’s OrchestrationConfig model. |
| crates/aura-config/src/builder.rs | Plumbs aura-config memory settings into aura runtime OrchestrationConfig. |
| configs/mezmo-ops-orchestration.toml | Enables durable memory in an example ops orchestration config. |
| README.md | Documents [orchestration.memory] and memory tool behavior for coordinators. |
| CLAUDE.md | Updates repo dev/testing guidance and structure overview. |
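The PR summary lists the `memory_shell` commands as `pwd`, `ls`, `cat`, `head`, `tail`, `stat`, `find`, `grep`, and `query`, with no subprocess execution and no pipes, redirects, or writes. A minimal sketch of an allowlist front end for such a DSL is shown below. `parse_command` is a hypothetical name, and the sketch does not handle quoting, which a real tokenizer would need (for example, for `grep` patterns containing `|`).

```rust
/// Commands named in the PR summary as the read-only `memory_shell` DSL.
const ALLOWED: [&str; 9] = [
    "pwd", "ls", "cat", "head", "tail", "stat", "find", "grep", "query",
];

/// Hypothetical parser: split a command line into a command and arguments,
/// rejecting pipe/redirect characters and anything outside the allowlist.
fn parse_command(line: &str) -> Result<(String, Vec<String>), String> {
    if line.chars().any(|c| matches!(c, '|' | '>' | '<' | ';')) {
        return Err("pipes, redirects, and command chaining are not supported".to_string());
    }
    let mut words = line.split_whitespace().map(str::to_string);
    let command = words.next().ok_or_else(|| "empty command".to_string())?;
    if !ALLOWED.contains(&command.as_str()) {
        return Err(format!("unsupported command: {command}"));
    }
    // Nothing is executed here; the caller dispatches on the command name.
    Ok((command, words.collect()))
}
```

Because the parser only returns a name and arguments, every command has to be implemented in-process, which is one way to guarantee that nothing is ever handed to a real shell.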
Comment on lines +221 to +231

    async fn atomic_write(path: &Path, content: &str) -> std::io::Result<()> {
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent).await?;
        }
        let timestamp = DateTime::<Utc>::from(std::time::SystemTime::now())
            .timestamp_nanos_opt()
            .unwrap_or_default();
        let tmp = path.with_extension(format!("tmp-{timestamp}"));
        fs::write(&tmp, content).await?;
        fs::rename(tmp, path).await
    }
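One detail worth keeping in mind when reading `atomic_write`: `Path::with_extension` replaces an existing extension rather than appending to it, so `index.md` becomes `index.tmp-<timestamp>`. If keeping the original file name in the temp path is preferred, one possible alternative is sketched below. `temp_sibling` is a hypothetical helper, not part of the PR.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: build a sibling temp path by appending a suffix to
/// the full file name instead of replacing its extension.
fn temp_sibling(path: &Path, suffix: &str) -> PathBuf {
    let mut name: OsString = path
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(".");
    name.push(suffix);
    // `with_file_name` keeps the parent directory, so a later rename stays
    // within the same directory.
    path.with_file_name(name)
}
```

For `memory/index.md` and the suffix `tmp-1`, `with_extension` gives `memory/index.tmp-1` while `temp_sibling` gives `memory/index.md.tmp-1`.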
Comment on lines +1871 to +1878

    // Planning coordinator uses a tight depth budget. Memory-enabled coordinators get
    // extra turns so a memory lookup cannot consume the routing-tool budget.
    // stream_and_collect() provides the primary early-exit guard; this is defense-in-depth.
    let max_depth = PLANNING_COORDINATOR_MAX_DEPTH;
    let max_depth = if self.config.memory.enabled {
        PLANNING_COORDINATOR_MAX_DEPTH + 3
    } else {
        PLANNING_COORDINATOR_MAX_DEPTH
    };
Comment on lines +472 to +507

    async fn walk_collect<F>(
        &self,
        root: &Path,
        virtual_root: &str,
        visitor: &mut F,
    ) -> std::io::Result<()>
    where
        F: FnMut(&Path, &str) -> bool,
    {
        let mut stack = vec![(root.to_path_buf(), virtual_root.to_string())];
        while let Some((path, virtual_path)) = stack.pop() {
            if !visitor(&path, &virtual_path) {
                break;
            }
            if path.is_dir() {
                let mut entries = fs::read_dir(&path).await?;
                let mut children = Vec::new();
                while let Some(entry) = entries.next_entry().await? {
                    let child = entry.path();
                    let name = entry.file_name().to_string_lossy().to_string();
                    let child_virtual = join_virtual(&virtual_path, &name);
                    children.push((child, child_virtual));
                }
                children.sort_by(|a, b| b.1.cmp(&a.1));
                stack.extend(children);
            }
        }
        Ok(())
    }

    fn resolve(&self, path: &str, cwd: &str) -> Result<ResolvedPath, String> {
        let virtual_path = resolve_virtual(path, cwd)?;
        let relative = virtual_path.trim_start_matches('/');
        let real = self.root.join(relative);
        Ok(ResolvedPath { real, virtual_path })
    }
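Since `resolve` joins the result of `resolve_virtual` directly onto `self.root`, containment depends on that helper normalizing `..` segments. Its body is not shown in this excerpt, so the version below is only an assumption about what such a normalizer might look like, not the PR's implementation. It also does not address symlinks on the real filesystem.

```rust
/// Hypothetical normalizer for virtual paths: resolves `path` against `cwd`,
/// collapses `.` and `..`, and refuses to climb above the virtual root `/`.
fn resolve_virtual(path: &str, cwd: &str) -> Result<String, String> {
    // Absolute paths ignore the current directory; relative ones extend it.
    let joined = if path.starts_with('/') {
        path.to_string()
    } else {
        format!("{cwd}/{path}")
    };
    let mut parts: Vec<&str> = Vec::new();
    for part in joined.split('/') {
        match part {
            // Empty segments (from `//` or a leading `/`) and `.` are no-ops.
            "" | "." => {}
            ".." => {
                // Popping past the root means the path tries to escape.
                if parts.pop().is_none() {
                    return Err(format!("path escapes the memory root: {path}"));
                }
            }
            other => parts.push(other),
        }
    }
    Ok(format!("/{}", parts.join("/")))
}
```

With this shape, `../index.md` from `/memory/runs` resolves to `/memory/index.md`, while `../../etc/passwd` from `/memory` is rejected before any real path is built.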
Comment on lines +325 to +362

    async fn query(&self, args: &[String], cwd: &str) -> std::io::Result<MemoryFsOutput> {
        if args.len() != 5 || args[1] != "--field" || args[3] != "--equals" {
            return Ok(MemoryFsOutput::err(
                "query usage: query <path> --field FIELD --equals VALUE",
                cwd.to_string(),
            ));
        }
        let resolved = match self.resolve(&args[0], cwd) {
            Ok(path) => path,
            Err(e) => return Ok(MemoryFsOutput::err(e, cwd.to_string())),
        };
        let content = fs::read_to_string(&resolved.real).await?;
        let mut matches = Vec::new();
        for line in content.lines() {
            let candidate = if content.trim_start().starts_with('{') && content.lines().count() == 1
            {
                content.as_str()
            } else {
                line
            };
            if let Ok(value) = serde_json::from_str::<serde_json::Value>(candidate)
                && json_field_equals(&value, &args[2], &args[4])
            {
                matches.push(candidate.to_string());
            }
            if content.trim_start().starts_with('{') && content.lines().count() == 1 {
                break;
            }
        }
        Ok(MemoryFsOutput::ok(
            if matches.is_empty() {
                String::new()
            } else {
                format!("{}\n", matches.join("\n"))
            },
            cwd.to_string(),
            matches.len() >= self.max_search_results,
        ))
Comment on lines +596 to +601

    async fn is_binary(path: &Path) -> bool {
        let Ok(data) = fs::read(path).await else {
            return true;
        };
        data.iter().take(8192).any(|b| *b == 0)
    }
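The snippet above reads the whole file and then inspects only the first 8192 bytes. A possible alternative that reads just that prefix is sketched below using the synchronous standard library; the PR's code is async, and `is_binary_prefix` is a hypothetical name.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Sketch: treat a file as binary if a NUL byte appears in its first 8192
/// bytes, reading only that prefix instead of the whole file.
fn is_binary_prefix(path: &Path) -> bool {
    let Ok(file) = File::open(path) else {
        // Mirror the snippet above: unreadable files count as binary.
        return true;
    };
    let mut prefix = Vec::with_capacity(8192);
    // `take` caps the reader at 8192 bytes, so large files are not loaded.
    if file.take(8192).read_to_end(&mut prefix).is_err() {
        return true;
    }
    prefix.contains(&0)
}
```

The result is the same as the original check for any readable file, but the cost no longer grows with file size.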
Comment on lines +239 to +249

    let fs = self.config.fs(args.limit);
    let mut matches = Vec::new();
    let mut truncated = false;
    let paths = args.paths.unwrap_or_else(|| vec!["/memory".to_string()]);
    for path in paths {
        let output = fs
            .search_path(&path, &args.query, None, args.case_sensitive, args.regex)
            .await?;
        truncated |= output.truncated;
        matches.extend(output.stdout.lines().map(ToString::to_string));
    }
Comment on lines +195 to +202

    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
        let fs = self.config.fs(None);
        let output = if let Some(tail_n) = args.tail_n {
            fs.execute(&format!("tail -n {tail_n} {}", args.path), None)
                .await?
        } else {
            fs.read_path(&args.path, None).await?
        };
bdf32d4 to 7d587f5
7d587f5 to 1f65c68
Base automatically changed from justingross/LOG-23587-add-aura-cli to feature/orchestration-mode, May 11, 2026 18:12
cac986e to 69fe071
Summary
Adds a VFS-agnostic MemoryFS layered over `orchestration.artifacts.memory_dir`, plus coordinator-only tools and an internal writer so Aura's coordinator can consult durable orchestration memory before routing. Backend-agnostic: works with local disk, Archil, NFS, Docker volumes, or K8s PVCs as a mounted path.

Changes

Config
- `[orchestration.memory]` section (disabled by default) with `enabled`, `root_dir`, `max_read_bytes`, `max_search_results`
- `memory.root_dir`, falls back to `artifacts.memory_dir`; fails clearly if enabled with neither
- `crates/aura-config`

MemoryFS layout (Markdown-first, OpenChronicle-inspired)
- `memory/index.md`, `event-YYYY-MM-DD.md`, `worker-*.md`, `failure-*.md`, etc.

Coordinator-only read tools
- `list_memories`, `read_memory`, `search_memory`, `recent_memory`, `memory_shell`
- `memory_shell` is a read-only DSL (`pwd`, `ls`, `cat`, `head`, `tail`, `stat`, `find`, `grep`, `query`): no subprocess execution, no pipes/redirects/writes

Internal `MemoryWriter`
- `write_run_manifest` when memory is enabled
- `index.md`

Coordinator behavior