feat(aura): add VFS-agnostic MemoryFS for orchestration memory#77

Draft
henryjandrews wants to merge 67 commits into main from henryjandrews/aura-memoryfs

Conversation

@henryjandrews
Collaborator

Summary

Adds a VFS-agnostic MemoryFS layered over orchestration.artifacts.memory_dir, plus coordinator-only tools and an internal writer so Aura's coordinator can consult durable orchestration memory before routing. Backend-agnostic: works with local disk, Archil, NFS, Docker volumes, or K8s PVCs as a mounted path.

Changes

Config

  • New [orchestration.memory] section (disabled by default) with enabled, root_dir, max_read_bytes, max_search_results
  • Root resolves to memory.root_dir, falls back to artifacts.memory_dir; fails clearly if enabled with neither
  • Mirrored in crates/aura-config
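
The root-resolution rule above can be sketched as follows. This is a minimal illustration, not the PR's `memory_root()` implementation: the struct names `MemorySettings` and `ArtifactSettings`, the function name `resolve_memory_root`, and the error text are all assumptions made for the example.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down view of `[orchestration.memory]`.
#[derive(Debug, Default)]
pub struct MemorySettings {
    pub enabled: bool,
    pub root_dir: Option<PathBuf>,
}

/// Hypothetical, trimmed-down view of the artifacts section.
#[derive(Debug, Default)]
pub struct ArtifactSettings {
    pub memory_dir: Option<PathBuf>,
}

/// Returns `Ok(None)` when memory is disabled, the resolved root when it is
/// enabled, and an error when it is enabled without any usable directory.
pub fn resolve_memory_root(
    memory: &MemorySettings,
    artifacts: &ArtifactSettings,
) -> Result<Option<PathBuf>, String> {
    if !memory.enabled {
        return Ok(None);
    }
    memory
        .root_dir
        .clone()
        // Fall back to the artifacts directory only when no explicit root is set.
        .or_else(|| artifacts.memory_dir.clone())
        .map(Some)
        .ok_or_else(|| {
            "orchestration.memory is enabled but neither memory.root_dir nor \
             artifacts.memory_dir is set"
                .to_string()
        })
}
```

Returning `Ok(None)` for the disabled case keeps "memory is off" distinct from "memory is misconfigured", so only the second surfaces as a startup error.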

MemoryFS layout (Markdown-first, OpenChronicle-inspired)

  • memory/index.md, event-YYYY-MM-DD.md, worker-*.md, failure-*.md, etc.
  • YAML frontmatter, append-only timestamped entries, atomic writes
  • Run artifacts remain in existing session/run layout

Coordinator-only read tools

  • list_memories, read_memory, search_memory, recent_memory, memory_shell
  • Native Rust search (fixed-string, regex, case-insensitive) with result caps and truncation reporting
  • memory_shell is a read-only DSL (pwd, ls, cat, head, tail, stat, find, grep, query) — no subprocess execution, no pipes/redirects/writes
  • Virtual paths only; traversal and host-path escapes rejected
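
One way to reject traversal is to normalize every path lexically inside the virtual root before it is ever joined onto a host directory. The sketch below illustrates that idea only; the function name `normalize_virtual_path` and its error messages are assumptions, and it does not cover symlinks inside the mounted directory, which would need a separate check.

```rust
/// Normalizes `path` against `cwd` inside a virtual root ("/").
/// Absolute inputs ignore `cwd`; `.` and empty segments are dropped;
/// any `..` that would climb above the root is rejected.
pub fn normalize_virtual_path(path: &str, cwd: &str) -> Result<String, String> {
    if path.contains('\0') {
        return Err("path contains a NUL byte".to_string());
    }
    let joined = if path.starts_with('/') {
        path.to_string()
    } else {
        format!("{cwd}/{path}")
    };
    let mut parts: Vec<&str> = Vec::new();
    for segment in joined.split('/') {
        match segment {
            "" | "." => {}
            ".." => {
                // Popping from an empty stack means the path left the root.
                if parts.pop().is_none() {
                    return Err(format!("path escapes the virtual root: {path}"));
                }
            }
            other => parts.push(other),
        }
    }
    Ok(format!("/{}", parts.join("/")))
}
```

Because the result always starts with `/` and never contains `..`, stripping the leading slash and joining it onto the configured root cannot produce a path above that root by lexical means.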

Internal MemoryWriter

  • Invoked after write_run_manifest when memory is enabled
  • Appends timeline + per-worker success/failure entries, regenerates index.md
  • Failures are warning-only; no LLM classification in v1

Coordinator behavior

  • Preamble + planning prompt updated to consult memory before routing on prior-reference, follow-up, or underspecified queries
  • Memory tools registered before routing tools; memory calls do not trigger early routing exit
  • Recalled facts embedded directly into worker task descriptions (workers remain memory-blind in v1)

Shearerbeard and others added 30 commits March 30, 2026 11:28
Squash+rebuild of feature/orchestration-mode
(8368ffe) onto main (793ca13).

Adds orchestration mode: a coordinator agent decomposes
queries into worker tasks, dispatches them to specialized
MCP-equipped agents, and synthesizes results.
Key capabilities:

- Coordinator/worker architecture with
per-worker MCP tool scoping
- Prompt journal and persistence layer for
run artifact tracking
- Evaluation tool for structured worker result scoring
- OrchestratorEvent SSE stream with planning,
task, evaluation, synthesis events
- Dual-path streaming in web server
(Agent vs Orchestrator)
- Docker Compose test infrastructure
(math-mcp service + CI runner)
- Integration tests gated on `integration-orchestration`
feature flag

Ref: LOG-21951
Added structs for phased planning work

Ref: LOG-23252
Prompts, types, and base loop for execution

Ref: LOG-23252
Added sse events and some minor prompt tweaking

Ref: LOG-23252
Three targeted fixes for failure modes
identified in E2E validation:

1. Phase continuation prompt (P2):
   Strengthen decision criteria to prevent
   spurious replanning. Discovery results
   showing available tools should always
   continue. Default to continue unless results
   genuinely invalidate remaining phases.
   Eliminates GPT-5.1 replan loop (was 1/2, now 2/2).

2. Fallback synthesis (P1): Error paths
   for phase-replan exhaustion and
   task-failure exhaustion now attempt
   synthesize() on completed tasks
   before returning Err. Prevents content
   delivery gap where 3/8 runs
   produced zero user content.

3. Phase-aware evaluation (P3): Add
   phases_context to EvaluationVars
   and evaluation prompt template.
   Phased plans now include phase
   execution context (labels of completed
   phases + guidance that discovery phases
   are legitimate intermediate results).

E2E results: 8/8 clean (up from 5/8 baseline),
  539 tests pass, 0 clippy.

Ref: LOG-23252
Change 3 added phase execution context
to the evaluation prompt, which
improved scoring for GPT/Sonnet but
caused Qwen to score quality=0.0
for correctly reporting "no stddev
tool available." This triggered
replan→MaxDepthError→zero content,
regressing Qwen from 2/2 to 0/2.

Reverting restores the 8/8 clean
result from Change 1 alone. Phase-aware
evaluation will be re-attempted with
softer guidance (quality floor for
error-free completions, recognition
that tool-limitation reports are valid).

Changes 1 (phase-continuation prompt)
and 2 (fallback synthesis) retained.

Ref: LOG-23252
More reduction of code for orchestration in aura's
core to prevent bad diffs from main.

Enriched StreamingAgent trait with get_provider_info() and UsageState on
stream_with_timeout(), eliminating the dual-type concrete_agent
fork in handlers.rs

Ref: LOG-23252
Rig's ReAct loop doesn't stop while planning loops
are running. On a slower or local model this
causes a feedback loop that leads to timeouts.

Ref: LOG-23252
Mock k8s-sre-mcp FastMCP server (17 tools across Kubernetes,
Prometheus, and Alertmanager domains) with optional VERBOSE_MODE
for realistic cluster simulation.

Four Rust integration tests behind integration-orchestration-sre
feature flag verify orchestration lifecycle events, domain-specific
tool routing, session ID correlation, and synthesis quality scoring.

Includes Docker Compose overlays, CI and local TOML configs with
three specialized workers (k8s-discovery, prometheus-analyst,
monitoring-engineer), and .gitignore updates for Python artifacts.

Ref: LOG-22753
Adds the feature in Cargo.toml so that the SRE
integration tests are callable

Ref: LOG-22753
Rewrote README to match the main branch style while covering
orchestration-specific content. Key changes:

- Added multi-agent orchestration concepts: coordinator/worker
  architecture, DAG-based parallel execution, and quality evaluation
  loop (Plan-Execute-Synthesize-Evaluate).
- Added CLI usage section with basic query, interactive, and verbose
  mode examples.
- Added Orchestration configuration section with worker isolation,
  MCP/vector-store filtering, and link to example-workers.toml.
- Added orchestration integration test commands and feature flags.
- Updated project structure to reflect compose/, development/, docs/,
  and scripts/ directories.
- Expanded Architecture section with prompt routing model and
  orchestrator component descriptions.
- Consolidated redundant sections and removed stale content.

Ref: LOG-23358
Added a concise open-alpha callout beneath the key capabilities list,
noting that APIs and configuration may change between releases and
linking to the GitHub issues page for feedback.

Ref: LOG-23358
Made-with: Cursor
The orchestration flow diagram is not being included in this release.
Removed the reference from the Documentation section to avoid a
dangling link.

Ref: LOG-23358
Made-with: Cursor
Changed final_result from initialized String::new() to an
uninitialized declaration. The value was always overwritten before
being read, triggering clippy unused-assignments with -D warnings.

Ref: LOG-23358
Made-with: Cursor
Remove the mcp-openai-bridge from vendored pattern to
inline mod to simplify code

Ref: LOG-23293
Add toToml-based rendering so Helm values.yaml sections (llm, agent,
mcp, etc.) are converted to valid TOML without hand-written template
helpers. Helm's YAML parser turns ints into Go float64 causing toToml
to render 8000 as 8000.0 — rather than fixing this in templates, a
lenient_int serde module on the Rust side accepts both forms during
deserialization, keeping the Helm template a plain toToml pass-through.

Ref: LOG-23231
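
The acceptance rule described in this commit (take `8000` and `8000.0`, treat both as the integer 8000) can be illustrated without serde. The sketch below is a standard-library-only approximation of that rule, not the `lenient_int` serde module itself; the string-based signature and the rejection of fractional values are assumptions.

```rust
/// Accepts an integer written either as `8000` or as a whole-number float
/// such as `8000.0`. Fractional, non-finite, or very large floats are rejected.
pub fn lenient_int(raw: &str) -> Option<i64> {
    let raw = raw.trim();
    if let Ok(n) = raw.parse::<i64>() {
        return Some(n);
    }
    let f = raw.parse::<f64>().ok()?;
    // Stay below 2^53 so the float-to-int conversion is exact.
    if f.is_finite() && f.fract() == 0.0 && f.abs() < 9.0e15 {
        Some(f as i64)
    } else {
        None
    }
}
```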
Remove old CLI crate; this will be replaced by
a more comprehensive standalone CLI/TUI
that can be run both remote and embedded

Ref: LOG-23311
sre integration tests need a make target for
deps and feature flag

Ref: LOG-23252
Missing Ollama references and a few other things; ensure that
the new orchestration events section is additive to what's
in main

Ref: LOG-22815
Main bumped to Rust edition 2024, which gives us
a new set of clippy rules to conform to

Ref: LOG-22753
Reflection prompt was mistakenly given only a
char summary of worker events for use in replanning

Ref: LOG-23405
Model config updated with different configs for
models used in the e2e q1 -> q4 suite

Ref: LOG-22815
Smaller quant models for workers get stuck in a
tool loop. We're optimizing prompts to avoid sticking
with the same exact tool call and expecting different
outputs, and providing an in-code reminder to steer away from
repeats

Ref: LOG-23411
Use the full worker preamble with scope, execution steps, and
critical rules to better enforce aura tool fields around
reasoning and prevent looping. Clearer error handling
patterns in prompt; remove reasoning from required, preventing
re-loops with smaller models.

Ref: LOG-21951
Move away from DAG style (flat list with ids for deps)
to a true nested json structure for more accurate planning.
This greatly improves planning accuracy.

Ref: LOG-23434
Add optional `steps` field to Plan struct so plan.json shows the
original LLM step structure alongside the flattened task array.
Add math-orchestration-qwen35-ollama.toml for local Ollama testing.

Ref: LOG-23434
Adds a `steps` plan format where tasks are sequential by default —
flatten_steps() auto-assigns dependencies from ordering. This fixes
Qwen3.5's persistent `task1.deps=[]` failure where the model couldn't
declare dependencies in the DAG format.

E2E confirmed: Qwen3.5 15/15 (100%), all Q2 plans show task 1: deps=[0].

Ref: LOG-23434
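
The "sequential by default" rule from this commit can be sketched as below: each task depends on the one before it, so the second task (id 1) ends up with `deps=[0]` without the model declaring anything. The `Task` shape and the name `flatten_sequential` are assumptions for illustration; the real `flatten_steps()` may handle nested or parallel steps that this sketch ignores.

```rust
/// Hypothetical flattened task record.
#[derive(Debug, PartialEq)]
pub struct Task {
    pub id: usize,
    pub description: String,
    pub deps: Vec<usize>,
}

/// Turns an ordered list of step descriptions into tasks whose dependencies
/// are derived purely from position: task N depends on task N-1.
pub fn flatten_sequential(steps: &[&str]) -> Vec<Task> {
    steps
        .iter()
        .enumerate()
        .map(|(id, description)| Task {
            id,
            description: description.to_string(),
            // The first task has nothing to wait for.
            deps: if id == 0 { Vec::new() } else { vec![id - 1] },
        })
        .collect()
}
```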
Adds stream_and_forward() — a streaming wrapper that forwards
ReasoningDelta events through event_tx while collecting the final
response. Migrates workers, synthesis, and phase continuation from
the non-streaming chat_with_timeout path.

This makes worker reasoning visible as aura.reasoning SSE events,
enabling diagnosis of model-level issues like the Qwen3.5 duplicate
tool-call loop (model hallucinates parameter failure despite success).

Ref: LOG-23435
Include truncated task results in the evaluation prompt so the
evaluator can cross-reference synthesized responses against actual
tool outputs, reducing false hallucination accusations. Controlled
by AURA_ENRICH_EVALUATION env var (default: true).

Also switches to a dedicated evaluation preamble instead of reusing
the coordinator preamble.

ref: LOG-23425
A little clippy cleanup after cherry picks

Ref: LOG-22924
@henryjandrews
Collaborator Author

I have read the CLA Document and I hereby sign the CLA

@henryjandrews
Collaborator Author

recheck

@henryjandrews
Collaborator Author

recheck

@henryjandrews
Collaborator Author

I have read the CLA Document and I hereby sign the CLA


Copilot AI left a comment

Pull request overview

Adds a durable, VFS-agnostic “orchestration memory” layer (Markdown-first) that the coordinator can read via new MemoryFS tools and that Aura can write post-run, along with config/docs updates to enable and describe the feature.

Changes:

  • Introduces MemoryFs (read-only virtual FS + DSL) and MemoryWriter (post-run durable Markdown memory + index).
  • Adds coordinator-only memory tools (list_memories, read_memory, search_memory, recent_memory, memory_shell) and updates coordinator prompts/config to encourage consulting memory before routing.
  • Adds new [orchestration.memory] config (aura + aura-config) and updates docs/examples/configs accordingly.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

Summary per file

| File | Description |
| --- | --- |
| examples/reference.toml | Documents new [orchestration.memory] settings and how they relate to artifacts.memory_dir. |
| crates/aura/src/prompts/orchestrator_preamble.md | Adds {{memory_guidance}} placeholder to coordinator preamble template. |
| crates/aura/src/orchestration/types.rs | Refactors reflection prompt builder to be mode-driven for tests (reduces env var dependency). |
| crates/aura/src/orchestration/tools/mod.rs | Registers and re-exports new coordinator memory tools module. |
| crates/aura/src/orchestration/tools/memory.rs | Implements coordinator read-only tools over MemoryFs (+ unit tests). |
| crates/aura/src/orchestration/orchestrator.rs | Wires memory config validation, memory guidance in planning prompt, registers memory tools for coordinator, and writes durable memory post-run. |
| crates/aura/src/orchestration/mod.rs | Adds memory_fs / memory_writer modules and exports MemoryConfig. |
| crates/aura/src/orchestration/memory_writer.rs | Implements durable Markdown memory writer and index regeneration (+ tests). |
| crates/aura/src/orchestration/memory_fs.rs | Implements the virtual filesystem + read-only “memory_shell” DSL (+ tests). |
| crates/aura/src/orchestration/config.rs | Adds MemoryConfig, memory_root() resolution, and preamble memory guidance inclusion (+ tests). |
| crates/aura/src/lib.rs | Re-exports MemoryConfig from the aura crate API surface. |
| crates/aura-config/src/config_test.rs | Adds parsing test coverage for [orchestration.memory] in aura-config. |
| crates/aura-config/src/config.rs | Mirrors MemoryConfig into aura-config’s OrchestrationConfig model. |
| crates/aura-config/src/builder.rs | Plumbs aura-config memory settings into aura runtime OrchestrationConfig. |
| configs/mezmo-ops-orchestration.toml | Enables durable memory in an example ops orchestration config. |
| README.md | Documents [orchestration.memory] and memory tool behavior for coordinators. |
| CLAUDE.md | Updates repo dev/testing guidance and structure overview. |

Comment on lines +221 to +231
async fn atomic_write(path: &Path, content: &str) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).await?;
    }
    let timestamp = DateTime::<Utc>::from(std::time::SystemTime::now())
        .timestamp_nanos_opt()
        .unwrap_or_default();
    let tmp = path.with_extension(format!("tmp-{timestamp}"));
    fs::write(&tmp, content).await?;
    fs::rename(tmp, path).await
}
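
For comparison, the same write-then-rename technique can be sketched with only the standard library. This is a synchronous variation on `atomic_write` above, not the PR's code: the name `atomic_write_sync`, the process-id component in the temporary name, appending the suffix to the full file name (so `index.md` becomes `index.md.tmp-…` rather than replacing the extension), and the cleanup on a failed rename are all choices made for this example.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Writes `content` to a sibling temporary file, then renames it over `path`.
/// Readers see either the old file or the new one, never a partial write,
/// provided both paths are on the same filesystem.
pub fn atomic_write_sync(path: &Path, content: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or_default();
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    // Keep the original name as a prefix so unrelated files cannot collide.
    tmp_name.push(format!(".tmp-{}-{nanos}", std::process::id()));
    let tmp = path.with_file_name(tmp_name);
    fs::write(&tmp, content)?;
    fs::rename(&tmp, path).map_err(|e| {
        // Best-effort cleanup so a failed rename does not leave debris behind.
        let _ = fs::remove_file(&tmp);
        e
    })
}
```

The temporary file lives in the same directory as the target so that the rename stays on one filesystem, which is what makes the replacement atomic on typical POSIX mounts.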
Comment on lines +1871 to +1878
// Planning coordinator uses a tight depth budget. Memory-enabled coordinators get
// extra turns so a memory lookup cannot consume the routing-tool budget.
// stream_and_collect() provides the primary early-exit guard; this is defense-in-depth.
let max_depth = if self.config.memory.enabled {
    PLANNING_COORDINATOR_MAX_DEPTH + 3
} else {
    PLANNING_COORDINATOR_MAX_DEPTH
};
Comment on lines +472 to +507
async fn walk_collect<F>(
    &self,
    root: &Path,
    virtual_root: &str,
    visitor: &mut F,
) -> std::io::Result<()>
where
    F: FnMut(&Path, &str) -> bool,
{
    let mut stack = vec![(root.to_path_buf(), virtual_root.to_string())];
    while let Some((path, virtual_path)) = stack.pop() {
        if !visitor(&path, &virtual_path) {
            break;
        }
        if path.is_dir() {
            let mut entries = fs::read_dir(&path).await?;
            let mut children = Vec::new();
            while let Some(entry) = entries.next_entry().await? {
                let child = entry.path();
                let name = entry.file_name().to_string_lossy().to_string();
                let child_virtual = join_virtual(&virtual_path, &name);
                children.push((child, child_virtual));
            }
            children.sort_by(|a, b| b.1.cmp(&a.1));
            stack.extend(children);
        }
    }
    Ok(())
}

fn resolve(&self, path: &str, cwd: &str) -> Result<ResolvedPath, String> {
    let virtual_path = resolve_virtual(path, cwd)?;
    let relative = virtual_path.trim_start_matches('/');
    let real = self.root.join(relative);
    Ok(ResolvedPath { real, virtual_path })
}
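
The traversal order produced by `walk_collect` above depends on one detail: children are sorted in descending order before being pushed, so popping from the end of the stack visits them in ascending order. The sketch below reproduces that technique over an in-memory tree so it can be run without a filesystem; the `Tree` type and the name `walk_preorder` are assumptions for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory directory tree: directory path -> child names.
pub type Tree = BTreeMap<String, Vec<String>>;

/// Iterative depth-first, pre-order walk that visits siblings in ascending order.
pub fn walk_preorder(tree: &Tree, root: &str) -> Vec<String> {
    let mut visited = Vec::new();
    let mut stack = vec![root.to_string()];
    while let Some(path) = stack.pop() {
        visited.push(path.clone());
        if let Some(children) = tree.get(&path) {
            let mut child_paths: Vec<String> = children
                .iter()
                .map(|name| format!("{}/{}", path.trim_end_matches('/'), name))
                .collect();
            // Descending sort: the smallest name ends up last and is popped first.
            child_paths.sort_by(|a, b| b.cmp(a));
            stack.extend(child_paths);
        }
    }
    visited
}
```

An explicit stack avoids recursion, which also sidesteps the boxing that recursive `async fn` calls would otherwise require in the real implementation.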
Comment on lines +325 to +362
async fn query(&self, args: &[String], cwd: &str) -> std::io::Result<MemoryFsOutput> {
    if args.len() != 5 || args[1] != "--field" || args[3] != "--equals" {
        return Ok(MemoryFsOutput::err(
            "query usage: query <path> --field FIELD --equals VALUE",
            cwd.to_string(),
        ));
    }
    let resolved = match self.resolve(&args[0], cwd) {
        Ok(path) => path,
        Err(e) => return Ok(MemoryFsOutput::err(e, cwd.to_string())),
    };
    let content = fs::read_to_string(&resolved.real).await?;
    let mut matches = Vec::new();
    for line in content.lines() {
        let candidate = if content.trim_start().starts_with('{') && content.lines().count() == 1
        {
            content.as_str()
        } else {
            line
        };
        if let Ok(value) = serde_json::from_str::<serde_json::Value>(candidate)
            && json_field_equals(&value, &args[2], &args[4])
        {
            matches.push(candidate.to_string());
        }
        if content.trim_start().starts_with('{') && content.lines().count() == 1 {
            break;
        }
    }
    Ok(MemoryFsOutput::ok(
        if matches.is_empty() {
            String::new()
        } else {
            format!("{}\n", matches.join("\n"))
        },
        cwd.to_string(),
        matches.len() >= self.max_search_results,
    ))
Comment on lines +596 to +601
async fn is_binary(path: &Path) -> bool {
    let Ok(data) = fs::read(path).await else {
        return true;
    };
    data.iter().take(8192).any(|b| *b == 0)
}
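
`is_binary` above reads the whole file but inspects only its first 8192 bytes. A bounded read gives the same answer without loading large files into memory. The sketch below is a synchronous, standard-library illustration of that idea; the name `looks_binary` and the constant `SNIFF_BYTES` are assumptions, and an async version would need the equivalent bounded read from the async runtime in use.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Number of leading bytes inspected for a NUL byte.
const SNIFF_BYTES: u64 = 8192;

/// Returns true when the first `SNIFF_BYTES` bytes contain a NUL byte.
/// Unreadable files are treated as binary, matching the snippet above.
pub fn looks_binary(path: &Path) -> bool {
    let mut prefix = Vec::with_capacity(SNIFF_BYTES as usize);
    let read = File::open(path)
        // `take` caps the read so at most SNIFF_BYTES bytes are buffered.
        .and_then(|file| file.take(SNIFF_BYTES).read_to_end(&mut prefix));
    match read {
        Ok(_) => prefix.contains(&0),
        Err(_) => true,
    }
}
```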
Comment on lines +239 to +249
let fs = self.config.fs(args.limit);
let mut matches = Vec::new();
let mut truncated = false;
let paths = args.paths.unwrap_or_else(|| vec!["/memory".to_string()]);
for path in paths {
    let output = fs
        .search_path(&path, &args.query, None, args.case_sensitive, args.regex)
        .await?;
    truncated |= output.truncated;
    matches.extend(output.stdout.lines().map(ToString::to_string));
}
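
In the loop above, each path is searched separately and the results are concatenated. If the result limit is meant to apply to the combined output (an assumption; the snippet does not say), the merge step needs its own cap. The sketch below shows one way to do that; `SearchOutput` and `merge_with_cap` are hypothetical names, not part of the PR.

```rust
/// Hypothetical per-path search result.
#[derive(Debug, PartialEq)]
pub struct SearchOutput {
    pub lines: Vec<String>,
    pub truncated: bool,
}

/// Merges per-path results, keeping at most `cap` lines overall.
/// The merged result is marked truncated if any input was truncated
/// or if lines had to be dropped to respect the cap.
pub fn merge_with_cap(outputs: impl IntoIterator<Item = SearchOutput>, cap: usize) -> SearchOutput {
    let mut merged = SearchOutput { lines: Vec::new(), truncated: false };
    for output in outputs {
        merged.truncated |= output.truncated;
        for line in output.lines {
            if merged.lines.len() == cap {
                // A line exists that cannot be kept: report truncation and stop.
                merged.truncated = true;
                return merged;
            }
            merged.lines.push(line);
        }
    }
    merged
}
```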
Comment on lines +195 to +202
async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
    let fs = self.config.fs(None);
    let output = if let Some(tail_n) = args.tail_n {
        fs.execute(&format!("tail -n {tail_n} {}", args.path), None)
            .await?
    } else {
        fs.read_path(&args.path, None).await?
    };
@henryjandrews
Collaborator Author

recheck

justintime4tea force-pushed the justingross/LOG-23587-add-aura-cli branch 18 times, most recently from bdf32d4 to 7d587f5 on May 6, 2026 19:47
justintime4tea force-pushed the justingross/LOG-23587-add-aura-cli branch from 7d587f5 to 1f65c68 on May 11, 2026 16:35
Base automatically changed from justingross/LOG-23587-add-aura-cli to feature/orchestration-mode on May 11, 2026 18:12
Shearerbeard force-pushed the feature/orchestration-mode branch from cac986e to 69fe071 on May 14, 2026 00:59
Base automatically changed from feature/orchestration-mode to main on May 14, 2026 02:04
6 participants