feat(aura): add VFS-agnostic MemoryFS for orchestration memory by henryjandrews · Pull Request #77 · mezmo/aura

henryjandrews · 2026-04-27T16:10:56Z

Summary

Adds a VFS-agnostic MemoryFS layered over orchestration.artifacts.memory_dir, plus coordinator-only tools and an internal writer so Aura's coordinator can consult durable orchestration memory before routing. Backend-agnostic: works with local disk, Archil, NFS, Docker volumes, or K8s PVCs as a mounted path.

Changes

Config

New [orchestration.memory] section (disabled by default) with enabled, root_dir, max_read_bytes, max_search_results
Root resolves to memory.root_dir, falls back to artifacts.memory_dir; fails clearly if enabled with neither
Mirrored in crates/aura-config
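
The root resolution rule above can be sketched as follows. This is a minimal illustration, not the PR's code: the struct is a simplified stand-in for `MemoryConfig`, and the signature of `memory_root` (taking the artifacts directory as a separate argument) is an assumption made for the example.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified mirror of the `[orchestration.memory]` settings.
#[derive(Debug, Default, Clone)]
pub struct MemoryConfig {
    pub enabled: bool,
    pub root_dir: Option<PathBuf>,
}

/// Resolve the memory root: `memory.root_dir` first, then `artifacts.memory_dir`.
/// Returns Ok(None) when memory is disabled, and an error when it is enabled
/// but neither path is configured.
pub fn memory_root(
    memory: &MemoryConfig,
    artifacts_memory_dir: Option<&PathBuf>,
) -> Result<Option<PathBuf>, String> {
    if !memory.enabled {
        return Ok(None);
    }
    memory
        .root_dir
        .clone()
        .or_else(|| artifacts_memory_dir.cloned())
        .map(Some)
        .ok_or_else(|| {
            "orchestration.memory is enabled but neither memory.root_dir nor \
             artifacts.memory_dir is set"
                .to_string()
        })
}
```

Returning an error instead of silently disabling memory keeps a misconfigured deployment visible at startup.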

MemoryFS layout (Markdown-first, OpenChronicle-inspired)

memory/index.md, event-YYYY-MM-DD.md, worker-*.md, failure-*.md, etc.
YAML frontmatter, append-only timestamped entries, atomic writes
Run artifacts remain in existing session/run layout

Coordinator-only read tools

list_memories, read_memory, search_memory, recent_memory, memory_shell
Native Rust search (fixed-string, regex, case-insensitive) with result caps and truncation reporting
memory_shell is a read-only DSL (pwd, ls, cat, head, tail, stat, find, grep, query) — no subprocess execution, no pipes/redirects/writes
Virtual paths only; traversal and host-path escapes rejected
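
One way to enforce "virtual paths only" is to normalize every path lexically before it touches the host filesystem. The sketch below assumes a virtual root of `/memory` and a helper named `resolve_virtual`; the PR's real helper of that name is not shown on this page, so the body here is illustrative only. It does not address symlinks inside the mounted directory.

```rust
/// Normalize `path` against `cwd` inside a virtual tree rooted at `/memory`.
/// Rejects anything that would climb above or land outside the virtual root.
pub fn resolve_virtual(path: &str, cwd: &str) -> Result<String, String> {
    const ROOT: &str = "/memory";
    // Absolute paths are taken as-is; relative paths are joined onto cwd.
    let joined = if path.starts_with('/') {
        path.to_string()
    } else {
        format!("{}/{}", cwd.trim_end_matches('/'), path)
    };

    let mut parts: Vec<&str> = Vec::new();
    for segment in joined.split('/') {
        match segment {
            "" | "." => {}
            ".." => {
                // Popping past the top means the path tried to escape.
                if parts.pop().is_none() {
                    return Err(format!("path escapes virtual root: {}", path));
                }
            }
            other => parts.push(other),
        }
    }

    let normalized = format!("/{}", parts.join("/"));
    if normalized == ROOT || normalized.starts_with(&format!("{}/", ROOT)) {
        Ok(normalized)
    } else {
        Err(format!("path is outside {}: {}", ROOT, path))
    }
}
```

For example, `resolve_virtual("../../etc/passwd", "/memory")` is rejected, while `resolve_virtual("./a/../b.md", "/memory/sub/")` normalizes to `/memory/sub/b.md`.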

Internal MemoryWriter

Invoked after write_run_manifest when memory is enabled
Appends timeline + per-worker success/failure entries, regenerates index.md
Failures are warning-only; no LLM classification in v1
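
As a rough illustration of the append-only behaviour, the sketch below appends one timestamped Markdown entry to a file without rewriting earlier content. The function name `append_entry` and the `## <timestamp>` heading format are assumptions for the example; the PR's actual entry layout is not shown here.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Append one timestamped Markdown entry; existing content is never rewritten.
/// `timestamp` is passed in (for example RFC 3339) so the sketch needs no date crate.
pub fn append_entry(path: &Path, timestamp: &str, body: &str) -> std::io::Result<()> {
    // `append(true)` positions every write at the end of the file.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "## {timestamp}\n\n{}\n", body.trim_end())
}
```

A caller that treats failures as warning-only would log the returned error and continue rather than propagate it.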

Coordinator behavior

Preamble + planning prompt updated to consult memory before routing on prior-reference, follow-up, or underspecified queries
Memory tools registered before routing tools; memory calls do not trigger early routing exit
Recalled facts embedded directly into worker task descriptions (workers remain memory-blind in v1)

Squash+rebuild of feature/orchestration-mode (8368ffe) onto main (793ca13). Adds orchestration mode: a coordinator agent decomposes queries into worker tasks, dispatches them to specialized MCP-equipped agents, and synthesizes results.

Key capabilities:
- Coordinator/worker architecture with per-worker MCP tool scoping
- Prompt journal and persistence layer for run artifact tracking
- Evaluation tool for structured worker result scoring
- OrchestratorEvent SSE stream with planning, task, evaluation, synthesis events
- Dual-path streaming in web server (Agent vs Orchestrator)
- Docker Compose test infrastructure (math-mcp service + CI runner)
- Integration tests gated on `integration-orchestration` feature flag

Ref: LOG-21951

Added structs for phased planning work Ref: LOG-23252

Prompts, types, and base loop for execution Ref: LOG-23252

Added sse events and some minor prompt tweaking Ref: LOG-23252

Three targeted fixes for failure modes identified in E2E validation:

1. Phase continuation prompt (P2): Strengthen decision criteria to prevent spurious replanning. Discovery results showing available tools should always continue. Default to continue unless results genuinely invalidate remaining phases. Eliminates GPT-5.1 replan loop (was 1/2, now 2/2).
2. Fallback synthesis (P1): Error paths for phase-replan exhaustion and task-failure exhaustion now attempt synthesize() on completed tasks before returning Err. Prevents content delivery gap where 3/8 runs produced zero user content.
3. Phase-aware evaluation (P3): Add phases_context to EvaluationVars and evaluation prompt template. Phased plans now include phase execution context (labels of completed phases + guidance that discovery phases are legitimate intermediate results).

E2E results: 8/8 clean (up from 5/8 baseline), 539 tests pass, 0 clippy. Ref: LOG-23252

Change 3 added phase execution context to the evaluation prompt, which improved scoring for GPT/Sonnet but caused Qwen to score quality=0.0 for correctly reporting "no stddev tool available." This triggered replan→MaxDepthError→zero content, regressing Qwen from 2/2 to 0/2. Reverting restores the 8/8 clean result from Change 1 alone. Phase-aware evaluation will be re-attempted with softer guidance (quality floor for error-free completions, recognition that tool-limitation reports are valid). Changes 1 (phase-continuation prompt) and 2 (fallback synthesis) retained. Ref: LOG-23252

More reduction of code for orchestration in aura's core to prevent bad diffs from main. Enriched StreamingAgent trait with get_provider_info() and UsageState on stream_with_timeout(), eliminating the dual-type concrete_agent fork in handlers.rs Ref: LOG-23252

Rig's react loop doesn't stop while planning loops are running. On a slower or local model this causes a feedback loop that leads to timeouts. Ref: LOG-23252

Mock k8s-sre-mcp FastMCP server (17 tools across Kubernetes, Prometheus, and Alertmanager domains) with optional VERBOSE_MODE for realistic cluster simulation. Four Rust integration tests behind integration-orchestration-sre feature flag verify orchestration lifecycle events, domain-specific tool routing, session ID correlation, and synthesis quality scoring. Includes Docker Compose overlays, CI and local TOML configs with three specialized workers (k8s-discovery, prometheus-analyst, monitoring-engineer), and .gitignore updates for Python artifacts. Ref: LOG-22753

Adds the feature in Cargo.toml so that the SRE integration tests are callable. Ref: LOG-22753

Rewrote README to match the main branch style while covering orchestration-specific content.

Key changes:
- Added multi-agent orchestration concepts: coordinator/worker architecture, DAG-based parallel execution, and quality evaluation loop (Plan-Execute-Synthesize-Evaluate).
- Added CLI usage section with basic query, interactive, and verbose mode examples.
- Added Orchestration configuration section with worker isolation, MCP/vector-store filtering, and link to example-workers.toml.
- Added orchestration integration test commands and feature flags.
- Updated project structure to reflect compose/, development/, docs/, and scripts/ directories.
- Expanded Architecture section with prompt routing model and orchestrator component descriptions.
- Consolidated redundant sections and removed stale content.

Ref: LOG-23358

Added a concise open-alpha callout beneath the key capabilities list, noting that APIs and configuration may change between releases and linking to the GitHub issues page for feedback. Ref: LOG-23358 Made-with: Cursor

The orchestration flow diagram is not being included in this release. Removed the reference from the Documentation section to avoid a dangling link. Ref: LOG-23358 Made-with: Cursor

Changed final_result from initialized String::new() to an uninitialized declaration. The value was always overwritten before being read, triggering clippy unused-assignments with -D warnings. Ref: LOG-23358 Made-with: Cursor

Remove the mcp-openai-bridge from vendored pattern to inline mod to simplify code Ref: LOG-23293

Add toToml-based rendering so Helm values.yaml sections (llm, agent, mcp, etc.) are converted to valid TOML without hand-written template helpers. Helm's YAML parser turns ints into Go float64 causing toToml to render 8000 as 8000.0 — rather than fixing this in templates, a lenient_int serde module on the Rust side accepts both forms during deserialization, keeping the Helm template a plain toToml pass-through. Ref: LOG-23231

Remove the old CLI crate. It will be replaced by a more comprehensive standalone CLI/TUI that can be run both remote and embedded. Ref: LOG-23311

sre integration tests need a make target for deps and feature flag Ref: LOG-23252

Missing ollama references and a few other things. Ensure that the new orchestration events section is additive to what's in main. Ref: LOG-22815

Main bumped to Rust edition 2024, which gives us a new set of clippy rules to conform to. Ref: LOG-22753

Reflection prompt was mistakenly given only a char summary of worker events for use in replanning Ref: LOG-23405

Model config updated with different configs for models used in the e2e q1 -> q4 suite Ref: LOG-22815

Smaller quant models for workers get stuck in a tool loop. We're optimizing prompts to avoid repeating the exact same tool call while expecting different outputs, and providing an in-code reminder to steer away from repeats. Ref: LOG-23411

Use the full worker preamble with scope, execution steps, and critical rules to better enforce aura tool fields around reasoning and prevent looping. Clearer error handling patterns in the prompt; remove reasoning from the required fields to prevent re-loops with smaller models. Ref: LOG-21951

Move away from DAG style (flat list with ids for deps) to a true nested json structure for more accurate planning. This greatly improves planning accuracy. Ref: LOG-23434

Add optional `steps` field to Plan struct so plan.json shows the original LLM step structure alongside the flattened task array. Add math-orchestration-qwen35-ollama.toml for local Ollama testing. Ref: LOG-23434

Adds a `steps` plan format where tasks are sequential by default — flatten_steps() auto-assigns dependencies from ordering. This fixes Qwen3.5's persistent `task1.deps=[]` failure where the model couldn't declare dependencies in the DAG format. E2E confirmed: Qwen3.5 15/15 (100%), all Q2 plans show task 1: deps=[0]. Ref: LOG-23434
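
The "sequential by default" idea can be shown with a deliberately simplified sketch. The `Task` struct and the flat `&[&str]` input are assumptions for illustration; the PR's `flatten_steps()` presumably works on the real plan types and may handle nesting that this version ignores.

```rust
/// A flattened task with explicit dependencies (indices into the task list).
#[derive(Debug, PartialEq)]
pub struct Task {
    pub id: usize,
    pub description: String,
    pub deps: Vec<usize>,
}

/// Sequential-by-default flattening: each step depends on the one before it,
/// so the model never has to declare dependencies itself.
pub fn flatten_steps(steps: &[&str]) -> Vec<Task> {
    steps
        .iter()
        .enumerate()
        .map(|(id, description)| Task {
            id,
            description: (*description).to_string(),
            deps: if id == 0 { Vec::new() } else { vec![id - 1] },
        })
        .collect()
}
```

With two steps, the second task comes out with `deps == [0]`, which matches the "task 1: deps=[0]" shape the commit message reports.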

Adds stream_and_forward() — a streaming wrapper that forwards ReasoningDelta events through event_tx while collecting the final response. Migrates workers, synthesis, and phase continuation from the non-streaming chat_with_timeout path. This makes worker reasoning visible as aura.reasoning SSE events, enabling diagnosis of model-level issues like the Qwen3.5 duplicate tool-call loop (model hallucinates parameter failure despite success). Ref: LOG-23435

Include truncated task results in the evaluation prompt so the evaluator can cross-reference synthesized responses against actual tool outputs, reducing false hallucination accusations. Controlled by AURA_ENRICH_EVALUATION env var (default: true). Also switches to a dedicated evaluation preamble instead of reusing the coordinator preamble. ref: LOG-23425

A little clippy cleanup after cherry picks Ref: LOG-22924

henryjandrews · 2026-04-27T16:30:34Z

I have read the CLA Document and I hereby sign the CLA

henryjandrews · 2026-04-27T16:30:56Z

recheck

henryjandrews · 2026-04-28T23:45:50Z

recheck

henryjandrews · 2026-04-28T23:48:14Z

I have read the CLA Document and I hereby sign the CLA

Copilot

Pull request overview

Adds a durable, VFS-agnostic “orchestration memory” layer (Markdown-first) that the coordinator can read via new MemoryFS tools and that Aura can write post-run, along with config/docs updates to enable and describe the feature.

Changes:

Introduces MemoryFs (read-only virtual FS + DSL) and MemoryWriter (post-run durable Markdown memory + index).
Adds coordinator-only memory tools (list_memories, read_memory, search_memory, recent_memory, memory_shell) and updates coordinator prompts/config to encourage consulting memory before routing.
Adds new [orchestration.memory] config (aura + aura-config) and updates docs/examples/configs accordingly.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| examples/reference.toml | Documents new `[orchestration.memory]` settings and how they relate to `artifacts.memory_dir`. |
| crates/aura/src/prompts/orchestrator_preamble.md | Adds `{{memory_guidance}}` placeholder to coordinator preamble template. |
| crates/aura/src/orchestration/types.rs | Refactors reflection prompt builder to be mode-driven for tests (reduces env var dependency). |
| crates/aura/src/orchestration/tools/mod.rs | Registers and re-exports new coordinator memory tools module. |
| crates/aura/src/orchestration/tools/memory.rs | Implements coordinator read-only tools over `MemoryFs` (+ unit tests). |
| crates/aura/src/orchestration/orchestrator.rs | Wires memory config validation, memory guidance in planning prompt, registers memory tools for coordinator, and writes durable memory post-run. |
| crates/aura/src/orchestration/mod.rs | Adds `memory_fs` / `memory_writer` modules and exports `MemoryConfig`. |
| crates/aura/src/orchestration/memory_writer.rs | Implements durable Markdown memory writer and index regeneration (+ tests). |
| crates/aura/src/orchestration/memory_fs.rs | Implements the virtual filesystem + read-only “memory_shell” DSL (+ tests). |
| crates/aura/src/orchestration/config.rs | Adds `MemoryConfig`, `memory_root()` resolution, and preamble memory guidance inclusion (+ tests). |
| crates/aura/src/lib.rs | Re-exports `MemoryConfig` from the aura crate API surface. |
| crates/aura-config/src/config_test.rs | Adds parsing test coverage for `[orchestration.memory]` in aura-config. |
| crates/aura-config/src/config.rs | Mirrors `MemoryConfig` into aura-config’s OrchestrationConfig model. |
| crates/aura-config/src/builder.rs | Plumbs aura-config memory settings into aura runtime OrchestrationConfig. |
| configs/mezmo-ops-orchestration.toml | Enables durable memory in an example ops orchestration config. |
| README.md | Documents `[orchestration.memory]` and memory tool behavior for coordinators. |
| CLAUDE.md | Updates repo dev/testing guidance and structure overview. |

+async fn atomic_write(path: &Path, content: &str) -> std::io::Result<()> {
+    if let Some(parent) = path.parent() {
+        fs::create_dir_all(parent).await?;
+    }
+    let timestamp = DateTime::<Utc>::from(std::time::SystemTime::now())
+        .timestamp_nanos_opt()
+        .unwrap_or_default();
+    let tmp = path.with_extension(format!("tmp-{timestamp}"));
+    fs::write(&tmp, content).await?;
+    fs::rename(tmp, path).await
+}
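
For comparison with the hunk above, here is a synchronous, std-only sketch of the same write-then-rename pattern. It is an illustration rather than a proposed patch: it builds the temp name by appending to the full file name (note that `Path::with_extension` replaces the existing extension, so `index.md` and `index.txt` would share a temp prefix), syncs the file before renaming, and removes the temp file if the rename fails. The suffix scheme is an assumption.

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Write `content` to `path` via a sibling temp file plus rename.
pub fn atomic_write(path: &Path, content: &str) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Append to the whole file name so different extensions get distinct temp files.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let mut tmp_name = path.file_name().unwrap_or_default().to_os_string();
    tmp_name.push(format!(".tmp-{}-{}", std::process::id(), nanos));
    let tmp = path.with_file_name(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(content.as_bytes())?;
    file.sync_all()?; // flush data to disk before the rename publishes it
    drop(file);

    // Rename within one directory; clean up the temp file if it fails.
    if let Err(e) = fs::rename(&tmp, path) {
        let _ = fs::remove_file(&tmp);
        return Err(e);
    }
    Ok(())
}
```

Keeping the temp file in the same directory as the target matters because a rename is only atomic within a single filesystem, which is relevant when the memory root is a mounted volume.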


+        // Planning coordinator uses a tight depth budget. Memory-enabled coordinators get
+        // extra turns so a memory lookup cannot consume the routing-tool budget.
        // stream_and_collect() provides the primary early-exit guard; this is defense-in-depth.
-        let max_depth = PLANNING_COORDINATOR_MAX_DEPTH;
+        let max_depth = if self.config.memory.enabled {
+            PLANNING_COORDINATOR_MAX_DEPTH + 3
+        } else {
+            PLANNING_COORDINATOR_MAX_DEPTH
+        };


+    async fn walk_collect<F>(
+        &self,
+        root: &Path,
+        virtual_root: &str,
+        visitor: &mut F,
+    ) -> std::io::Result<()>
+    where
+        F: FnMut(&Path, &str) -> bool,
+    {
+        let mut stack = vec![(root.to_path_buf(), virtual_root.to_string())];
+        while let Some((path, virtual_path)) = stack.pop() {
+            if !visitor(&path, &virtual_path) {
+                break;
+            }
+            if path.is_dir() {
+                let mut entries = fs::read_dir(&path).await?;
+                let mut children = Vec::new();
+                while let Some(entry) = entries.next_entry().await? {
+                    let child = entry.path();
+                    let name = entry.file_name().to_string_lossy().to_string();
+                    let child_virtual = join_virtual(&virtual_path, &name);
+                    children.push((child, child_virtual));
+                }
+                children.sort_by(|a, b| b.1.cmp(&a.1));
+                stack.extend(children);
+            }
+        }
+        Ok(())
+    }
+
+    fn resolve(&self, path: &str, cwd: &str) -> Result<ResolvedPath, String> {
+        let virtual_path = resolve_virtual(path, cwd)?;
+        let relative = virtual_path.trim_start_matches('/');
+        let real = self.root.join(relative);
+        Ok(ResolvedPath { real, virtual_path })
+    }


+    async fn query(&self, args: &[String], cwd: &str) -> std::io::Result<MemoryFsOutput> {
+        if args.len() != 5 || args[1] != "--field" || args[3] != "--equals" {
+            return Ok(MemoryFsOutput::err(
+                "query usage: query <path> --field FIELD --equals VALUE",
+                cwd.to_string(),
+            ));
+        }
+        let resolved = match self.resolve(&args[0], cwd) {
+            Ok(path) => path,
+            Err(e) => return Ok(MemoryFsOutput::err(e, cwd.to_string())),
+        };
+        let content = fs::read_to_string(&resolved.real).await?;
+        let mut matches = Vec::new();
+        for line in content.lines() {
+            let candidate = if content.trim_start().starts_with('{') && content.lines().count() == 1
+            {
+                content.as_str()
+            } else {
+                line
+            };
+            if let Ok(value) = serde_json::from_str::<serde_json::Value>(candidate)
+                && json_field_equals(&value, &args[2], &args[4])
+            {
+                matches.push(candidate.to_string());
+            }
+            if content.trim_start().starts_with('{') && content.lines().count() == 1 {
+                break;
+            }
+        }
+        Ok(MemoryFsOutput::ok(
+            if matches.is_empty() {
+                String::new()
+            } else {
+                format!("{}\n", matches.join("\n"))
+            },
+            cwd.to_string(),
+            matches.len() >= self.max_search_results,
+        ))


+async fn is_binary(path: &Path) -> bool {
+    let Ok(data) = fs::read(path).await else {
+        return true;
+    };
+    data.iter().take(8192).any(|b| *b == 0)
+}
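
The hunk above reads the whole file before inspecting the first 8192 bytes. A bounded variant is sketched below using synchronous std I/O for brevity (the PR itself uses async `fs`); it keeps the same heuristic and the same "unreadable means binary" fallback, but only reads the prefix.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Treat a file as binary if its first 8192 bytes contain a NUL byte.
/// Only the prefix is read, so large files are not loaded into memory.
pub fn is_binary(path: &Path) -> bool {
    let Ok(file) = File::open(path) else {
        return true; // unreadable files are treated as binary, as in the hunk
    };
    let mut prefix = Vec::with_capacity(8192);
    // `take` caps how many bytes `read_to_end` can pull from the file.
    if file.take(8192).read_to_end(&mut prefix).is_err() {
        return true;
    }
    prefix.contains(&0)
}
```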


+        let fs = self.config.fs(args.limit);
+        let mut matches = Vec::new();
+        let mut truncated = false;
+        let paths = args.paths.unwrap_or_else(|| vec!["/memory".to_string()]);
+        for path in paths {
+            let output = fs
+                .search_path(&path, &args.query, None, args.case_sensitive, args.regex)
+                .await?;
+            truncated |= output.truncated;
+            matches.extend(output.stdout.lines().map(ToString::to_string));
+        }
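
In the hunk above, each path is searched separately and the results are concatenated, so the combined list could in principle exceed a single per-call limit. A sketch of one global cap with truncation reporting follows. It works on in-memory `(virtual_path, content)` pairs; `search_capped`, `SearchOutput`, and the grep-like `path:line:text` output format are assumptions for the example, not the PR's API.

```rust
/// Result of a capped search: matching lines plus a flag saying more existed.
#[derive(Debug, PartialEq)]
pub struct SearchOutput {
    pub matches: Vec<String>,
    pub truncated: bool,
}

/// Fixed-string search over `(virtual_path, content)` pairs with one global cap.
pub fn search_capped(
    files: &[(&str, &str)],
    query: &str,
    case_sensitive: bool,
    limit: usize,
) -> SearchOutput {
    let needle = if case_sensitive { query.to_string() } else { query.to_lowercase() };
    let mut matches = Vec::new();
    for (path, content) in files {
        for (idx, line) in content.lines().enumerate() {
            let hit = if case_sensitive {
                line.contains(&needle)
            } else {
                line.to_lowercase().contains(&needle)
            };
            if !hit {
                continue;
            }
            if matches.len() == limit {
                // One more hit than the cap allows: stop and report truncation.
                return SearchOutput { matches, truncated: true };
            }
            matches.push(format!("{path}:{}:{line}", idx + 1));
        }
    }
    SearchOutput { matches, truncated: false }
}
```

Reporting `truncated` only when an extra hit actually exists avoids flagging a result set that happens to contain exactly `limit` matches.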


+    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
+        let fs = self.config.fs(None);
+        let output = if let Some(tail_n) = args.tail_n {
+            fs.execute(&format!("tail -n {tail_n} {}", args.path), None)
+                .await?
+        } else {
+            fs.read_path(&args.path, None).await?
+        };
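
The hunk above builds a `tail -n` DSL command with `format!`, so the path passes through the DSL's argument parsing. If that parser splits on whitespace (an assumption; the tokenizer is not shown here), a path containing spaces would be misread. A sketch of taking the last lines directly from already-read content, with a hypothetical helper name, avoids the round trip through a command string:

```rust
/// Return the last `n` lines of `content`, preserving their order.
pub fn tail_lines(content: &str, n: usize) -> String {
    let lines: Vec<&str> = content.lines().collect();
    // saturating_sub keeps the start index at 0 when n exceeds the line count.
    let start = lines.len().saturating_sub(n);
    lines[start..].join("\n")
}
```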


henryjandrews · 2026-04-28T23:57:01Z

recheck

Shearerbeard and others added 30 commits March 30, 2026 11:28

feat: types for phased planning

3706966

feat: phased planning base execution loop

7a67361

feat: phased planning SSE events

4054f13

fix: rig react loop continues to churn through planning

cb755da

chore: adds sre tests as feature in web Cargo.toml

6d439f2

doc: add open-alpha notice to README

5b8f74a

doc: remove orchestration-flow-diagram from README docs list

848bb26

fix: remove unused initial assignment in orchestration loop

b64a390

chore: remove old vendored sanitization and move to aura

d6241b7

chore: remove old cli

58b477d

chore: add sre test target to makefile

1467aef

doc: add orchestration events to streaming guide, fix stale references

e225b8b

chore: update clippy rules to match main

8c2a418

fix: reflection planning prompt summary fix

16885e5

chore: update model configs for e2e examples

5c87c16

fix: prevent identical tool looping

f58900e

feat: use hierarchical vs dag planning

5728e0d

feat: persist original steps in plan.json and add Ollama config

7d28d59

chore: clippy fmt

b872f24

henryjandrews requested a review from Copilot April 28, 2026 23:46

Copilot started reviewing on behalf of henryjandrews April 28, 2026 23:46 View session

Copilot AI reviewed Apr 28, 2026

justintime4tea force-pushed the justingross/LOG-23587-add-aura-cli branch 18 times, most recently from bdf32d4 to 7d587f5 Compare May 6, 2026 19:47

justintime4tea force-pushed the justingross/LOG-23587-add-aura-cli branch from 7d587f5 to 1f65c68 Compare May 11, 2026 16:35

Base automatically changed from justingross/LOG-23587-add-aura-cli to feature/orchestration-mode May 11, 2026 18:12

Shearerbeard force-pushed the feature/orchestration-mode branch from cac986e to 69fe071 Compare May 14, 2026 00:59

Base automatically changed from feature/orchestration-mode to main May 14, 2026 02:04

henryjandrews wants to merge 67 commits into main from henryjandrews/aura-memoryfs


6 participants
