feat: add multi-agent test generation pipeline by brandon-shelton-mezmo · Pull Request #78 · mezmo/aura

brandon-shelton-mezmo · 2026-04-27T21:34:57Z

Summary

Add make test-generate, an on-demand Makefile target that uses Aura agents to generate and review Rust unit tests — dogfooding the platform to test itself
Add filesystem_write config option to enable the WriteFileTool in agent configurations
Generate 460 unit tests across 6 modules (tools, vector_dynamic, vector_store, builder, rag_tools, string_utils) covering filesystem security, path validation, vector search helpers, and builder pure functions
Harden generator and reviewer agent prompts to enforce content assertions, security path testing, DRY consolidation, and upstream style compliance

Components

Rust changes: filesystem_write field in ToolsConfig (4 files), pub(crate) visibility for testable helpers, VectorStoreManager::new_stub() test constructor
Agent configs: 5 TOML configs in configs/test-agents/ (1 generator + 4 review lenses: correctness, coverage, robustness, style)
Orchestration: scripts/test-generate-orchestrate.sh with generate → cargo test → parallel review → feedback loop (up to 3 rounds)
Tests: 6 _tests.rs files with 460 passing tests, zero warnings
Docs: Strategy doc and README usage guide
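
For readers who want the control flow at a glance, the generate → cargo test → parallel review → feedback loop can be sketched roughly as below. This is an illustrative model only, not the contents of scripts/test-generate-orchestrate.sh: the `Outcome` type, the `run_rounds` function, the closure parameters, and the stop condition (accept when tests pass and reviewers report no issues) are all assumptions made for this sketch.

```rust
/// Hypothetical result of one orchestration run.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Tests passed and reviewers raised no issues in round `rounds`.
    Accepted { rounds: u32 },
    /// The round budget ran out with feedback still open.
    GaveUp { rounds: u32, open_issues: usize },
}

/// Runs generate -> test -> review for at most `max_rounds` rounds.
/// `generate` receives the previous round's feedback (empty on round 1),
/// `tests_pass` stands in for `cargo test`, and `review` stands in for the
/// combined findings of the review agents.
fn run_rounds<G, T, R>(max_rounds: u32, mut generate: G, mut tests_pass: T, mut review: R) -> Outcome
where
    G: FnMut(&[String]),
    T: FnMut() -> bool,
    R: FnMut() -> Vec<String>,
{
    let mut feedback: Vec<String> = Vec::new();
    for round in 1..=max_rounds {
        generate(&feedback);
        if !tests_pass() {
            // Assumption: a failing test run is fed back and skips review.
            feedback = vec!["cargo test failed".to_string()];
            continue;
        }
        feedback = review();
        if feedback.is_empty() {
            return Outcome::Accepted { rounds: round };
        }
    }
    Outcome::GaveUp { rounds: max_rounds, open_issues: feedback.len() }
}
```

Passing the three stages as closures keeps the loop itself testable without starting any agent processes.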

Test plan

cargo test --workspace --lib — 743 tests pass, 0 failures, 0 warnings
make test-generate FILES="crates/aura/src/string_utils.rs" — full pipeline runs end-to-end
Security tests verify all 8 sensitive path patterns (/etc, /proc, /sys, /dev, /var/log, /.ssh, /.aws, /.config)
Jenkins commitlint validation

Ref: LOG-00000

Add make test-generate, an on-demand Makefile target that uses Aura agents to generate and review Rust unit tests. Includes filesystem_write config support, 5 agent TOML configs, an orchestration script, and 468 generated tests across 5 modules. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Document make test-generate in the Testing section with usage examples, prerequisites, and pipeline overview so developers know how to generate and review tests with Aura agents. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Address assertion poverty, DRY violations, missing security tests, fragile assertions, and weak naming found during test quality review. Generator now rejects is_ok()-only assertions and requires content verification. Reviewers enforce security path coverage, mutation resistance, and duplication detection. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Regenerate string_utils and tools tests using improved generator prompt that enforces content assertions, security path testing, and DRY consolidation. Adds crate-specific guidance to prevent compilation errors (TempDir pattern, import visibility, write file pre-creation for canonicalize workaround). Key improvements:
- Zero assert!(is_ok()) without content verification
- Security tests for all 8 sensitive path patterns
- Path traversal and base_dir escape tests
- Consolidated duplicate tests (46 -> 35 for string_utils)
Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Regenerate vector_dynamic, vector_store, builder, and rag_tools tests using the improved generator prompt with content assertion enforcement, DRY consolidation, and crate-specific guidance. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

github-actions · 2026-04-27T21:35:08Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

brandon-shelton-mezmo · 2026-04-27T21:56:56Z

I have read the CLA Document and I hereby sign the CLA

Copilot

Pull request overview

Adds an on-demand, multi-agent “test generation + review” workflow to dogfood Aura for Rust unit-test generation, alongside a new filesystem_write toggle that enables agent-driven file writes.

Changes:

Introduces make test-generate and a bash orchestrator that runs generator → cargo test → 4 parallel reviewers with a feedback loop.
Adds filesystem_write to tools configuration and conditionally registers WriteFileTool.
Adds large generated unit-test modules and documents the self-testing strategy and usage.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.


File	Description
scripts/test-generate-orchestrate.sh	Orchestrates multi-agent generation/review loop, manages agent processes, registers test modules, runs `cargo test`.
docs/aura-self-testing-strategy.md	Strategy/architecture document for Aura self-testing (API agent, generator pipeline, chaos agent).
crates/aura/src/vector_store_tests.rs	Generated tests for vector store manager helpers and formatting/search behavior.
crates/aura/src/vector_store.rs	Adds `VectorStoreManager::new_stub()` for test construction.
crates/aura/src/vector_dynamic_tests.rs	Generated tests for dynamic vector search tool schema/serde and call paths.
crates/aura/src/vector_dynamic.rs	Exposes helper fns (`parse_value_str`, `default_limit`) to sibling test modules.
crates/aura/src/string_utils_tests.rs	Generated tests for UTF-8-safe truncation behavior across many inputs.
crates/aura/src/rag_tools_tests.rs	Generated tests for RAG tool schemas, ingest flows, and JSON auto-ingest helpers.
crates/aura/src/lib.rs	Registers new sibling test modules under `#[cfg(test)]`.
crates/aura/src/config.rs	Adds `filesystem_write` field to Aura’s `ToolsConfig`.
crates/aura/src/builder_tests.rs	Adds extensive tests around builder helpers/config construction.
crates/aura/src/builder.rs	Adds conditional `WriteFileTool` registration controlled by `filesystem_write`; exposes helper fns for testing.
crates/aura-config/src/config.rs	Adds `filesystem_write` with serde default to config crate’s `ToolsConfig`.
crates/aura-config/src/builder.rs	Plumbs `filesystem_write` through into Aura runtime config.
configs/test-agents/generator.toml	Generator agent config with hardened prompt guidance and write enabled.
configs/test-agents/review-correctness.toml	Correctness review lens agent config (writes structured JSON review).
configs/test-agents/review-coverage.toml	Coverage review lens agent config (writes structured JSON review).
configs/test-agents/review-robustness.toml	Robustness/flakiness review lens agent config (writes structured JSON review).
configs/test-agents/review-style.toml	Style/DRY review lens agent config (writes structured JSON review).
README.md	Documents test running and AI-assisted test generation usage/prereqs.
Makefile	Adds `test-generate` target calling the orchestrator.
Cargo.lock	Updates workspace crate versions and a `rig-core` git rev.


Copilot · 2026-04-28T23:50:42Z

+        issue_count=$(echo "$review_content" | jq '.issues | length' 2>/dev/null || echo "?")
+        total_issues=$((total_issues + ${issue_count:-0}))


When issue_count cannot be parsed, this assigns a non-numeric value (e.g. "?") and then performs arithmetic expansion, which will terminate the script under set -euo pipefail. Make issue_count default to 0 on parse errors (e.g., jq -r '(.issues|length) // 0' with a numeric fallback) before adding to total_issues.

Suggested change

-        issue_count=$(echo "$review_content" | jq '.issues | length' 2>/dev/null || echo "?")
-        total_issues=$((total_issues + ${issue_count:-0}))
+        issue_count=$(echo "$review_content" | jq -r '(.issues | length) // 0' 2>/dev/null || echo "0")
+        if ! [[ "$issue_count" =~ ^[0-9]+$ ]]; then
+          issue_count=0
+        fi
+        total_issues=$((total_issues + issue_count))

Copilot · 2026-04-28T23:50:43Z

+      | grep -v '/tests/' \
+      | grep -v '_test\.rs' \
+      | grep -v '_generated_tests\.rs' \


The changed-file auto-detection excludes _test.rs but not the generated *_tests.rs files used by this pipeline. That can cause the script to treat generated test modules as inputs on subsequent runs. Update the filters to exclude *_tests.rs (and any other generated-test naming used here) so only source modules are targeted.

Suggested change

-      | grep -v '/tests/' \
-      | grep -v '_test\.rs' \
-      | grep -v '_generated_tests\.rs' \
+      | grep -Ev '(/tests/|_generated_tests?\.rs$|_tests?\.rs$)' \

Copilot · 2026-04-28T23:50:43Z

+#   ./scripts/test-generate-orchestrate.sh --files "crates/aura/src/foo.rs"
+#   ./scripts/test-generate-orchestrate.sh  # auto-detect changed files vs main
+#
+# Requires: cargo, curl, jq, OPENAI_API_KEY


The script comment says it requires OPENAI_API_KEY, but the runtime check accepts either AWS credentials or OpenAI. Update the header comment to match the actual supported credential sources so users aren’t misled.

Suggested change

-# Requires: cargo, curl, jq, OPENAI_API_KEY
+# Requires: cargo, curl, jq, and either AWS credentials or OPENAI_API_KEY

Copilot · 2026-04-28T23:50:43Z

+            let write_enabled = tools_config.filesystem_write;
+            tracing::info!("Adding filesystem tools (write_access={})", write_enabled);
            let fs_tool = FilesystemTool::new()
-                .with_write_access(false)
+                .with_write_access(write_enabled)
                .with_max_file_size(1_048_576);

            builder_state = builder_state.add_tool(ReadFileTool(fs_tool.clone()));
-            builder_state = builder_state.add_tool(ListDirTool(fs_tool));
+            builder_state = builder_state.add_tool(ListDirTool(fs_tool.clone()));
+            if write_enabled {
+                builder_state = builder_state.add_tool(WriteFileTool(fs_tool));
+            }


Enabling filesystem_write adds WriteFileTool while FilesystemTool has no base_dir restriction by default, meaning an agent can overwrite arbitrary paths outside a small sensitive-pattern denylist (e.g., home dotfiles like ~/.bashrc). Consider requiring a base_dir when write is enabled, or defaulting base_dir to a safe project directory for write operations to reduce accidental or prompt-injected damage.
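
One way the base_dir restriction suggested above could look is sketched here. It is a minimal illustration under stated assumptions, not FilesystemTool's actual implementation: `resolve_within_base` is a hypothetical helper, and the check is purely lexical after canonicalizing the base, so it does not follow symlinks that live inside the base directory.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: resolve `requested` against `base_dir` and reject any
/// path that would land outside it. Only `base_dir` must exist on disk.
fn resolve_within_base(base_dir: &Path, requested: &Path) -> io::Result<PathBuf> {
    let base = base_dir.canonicalize()?;
    // `Path::join` replaces the base when `requested` is absolute, so absolute
    // paths are checked against the base exactly as given.
    let joined = base.join(requested);

    // Lexically fold `.` and `..` so that `a/../../x` cannot climb out.
    let mut normalized = PathBuf::new();
    for component in joined.components() {
        match component {
            Component::ParentDir => {
                normalized.pop();
            }
            Component::CurDir => {}
            other => normalized.push(other.as_os_str()),
        }
    }

    if normalized.starts_with(&base) {
        Ok(normalized)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} escapes base_dir {}", requested.display(), base.display()),
        ))
    }
}
```

A write tool built this way would call the helper before every write and refuse to proceed on `Err`, which complements the sensitive-pattern denylist rather than replacing it.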

Copilot · 2026-04-28T23:50:43Z

+    #[tokio::test]
+    async fn test_dynamic_vector_search_tool_call_with_limit() {
+        let store = create_test_vector_store("test", None);
+        let tool = DynamicVectorSearchTool::new(store, "test".to_string());
+        let args = VectorSearchArgs {
+            query: "test".to_string(),
+            limit: 10,
+            min_score: 0.0,
+            label_filters: vec![],
+        };
+        let result = tool.call(args).await;
+
+        assert!(result.is_ok());
+    }


This test only asserts result.is_ok() and doesn’t validate any behavior of the tool (output content, formatting, limit handling, etc.). To make it mutation-resistant, unwrap the response and assert on fields that should change with the input (e.g., query, formatted_results, and any limit/min_score enforcement).
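
To make the difference concrete, the following sketch contrasts with an is_ok()-only check by asserting on the returned content. `SearchResponse` and `search` are hypothetical stand-ins written for this example; they are not the real DynamicVectorSearchTool API, and the substring matching is only there to give the assertions something to verify.

```rust
/// Hypothetical response type standing in for the tool's real output.
#[derive(Debug, PartialEq)]
struct SearchResponse {
    query: String,
    results: Vec<String>,
}

/// Hypothetical stand-in for the tool call: substring match, capped at `limit`.
fn search(corpus: &[&str], query: &str, limit: usize) -> Result<SearchResponse, String> {
    if query.is_empty() {
        return Err("query must not be empty".to_string());
    }
    let results = corpus
        .iter()
        .filter(|doc| doc.contains(query))
        .take(limit)
        .map(|doc| doc.to_string())
        .collect();
    Ok(SearchResponse { query: query.to_string(), results })
}

#[test]
fn search_echoes_query_and_honors_limit() {
    let corpus = ["alpha test", "beta", "gamma test", "delta test"];
    let response = search(&corpus, "test", 2).expect("search should succeed");
    // These assertions fail if the limit is ignored or the query is dropped,
    // which an `assert!(result.is_ok())` check would not notice.
    assert_eq!(response.query, "test");
    assert_eq!(response.results, vec!["alpha test", "gamma test"]);
}
```

A mutation that removed the `.take(limit)` call would still return `Ok`, so only the content assertions catch it.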

Copilot · 2026-04-28T23:50:44Z

+    #[test]
+    fn test_is_reasoning_model_does_not_match_substring() {
+        assert!(!is_reasoning_model("model-o1"));
+        assert!(!is_reasoning_model("my-gpt-5"));
+        assert!(is_reasoning_model("o1x"));
+    }


test_is_reasoning_model_does_not_match_substring is confusing: it asserts that embedded substrings like model-o1 don’t match, but also asserts that an arbitrary string like o1x does match. Either rename the test to reflect the intended "prefix match" behavior, or tighten is_reasoning_model to match only known model name patterns and update the assertions accordingly.
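
The two options can be compared side by side in a small sketch. The prefix list and both function names below are assumptions for illustration; the real is_reasoning_model may use different patterns. The first function reproduces the prefix behavior the existing assertions imply, and the second shows one possible tightening that accepts only an exact name or a name followed by a `-` suffix.

```rust
/// Assumed prefixes for this sketch; the real list may differ.
const REASONING_PREFIXES: &[&str] = &["o1", "gpt-5"];

/// Prefix match: any name that starts with a known prefix counts.
fn is_reasoning_model_prefix(model: &str) -> bool {
    REASONING_PREFIXES.iter().any(|p| model.starts_with(*p))
}

/// Stricter variant: the prefix must be the whole name or be followed by '-'.
fn is_reasoning_model_strict(model: &str) -> bool {
    REASONING_PREFIXES.iter().any(|p| match model.strip_prefix(*p) {
        Some(rest) => rest.is_empty() || rest.starts_with('-'),
        None => false,
    })
}

#[test]
fn prefix_match_ignores_embedded_names_but_accepts_any_suffix() {
    assert!(!is_reasoning_model_prefix("model-o1"));
    assert!(!is_reasoning_model_prefix("my-gpt-5"));
    assert!(is_reasoning_model_prefix("o1x"));
}

#[test]
fn strict_match_rejects_arbitrary_suffixes() {
    assert!(is_reasoning_model_strict("o1"));
    assert!(is_reasoning_model_strict("o1-mini"));
    assert!(!is_reasoning_model_strict("o1x"));
    assert!(!is_reasoning_model_strict("model-o1"));
}
```

Naming each test after the rule it pins down avoids the mixed message in the current test name.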

Copilot · 2026-04-28T23:50:44Z

+stop_agent_on_port() {
+  local port=$1
+  # Find and kill the process listening on this port
+  local pid
+  pid=$(lsof -ti :"$port" 2>/dev/null || true)
+  if [ -n "$pid" ]; then
+    kill "$pid" 2>/dev/null || true
+    wait "$pid" 2>/dev/null || true
+    # Remove from PIDS array
+    local new_pids=()
+    for p in "${PIDS[@]+"${PIDS[@]}"}"; do
+      if [ "$p" != "$pid" ]; then
+        new_pids+=("$p")
+      fi
+    done
+    PIDS=("${new_pids[@]+"${new_pids[@]}"}")
+  fi


stop_agent_on_port relies on lsof to find the PID. If lsof is missing (common on some CI images) or returns multiple PIDs, the generator/review servers may not be stopped correctly and later rounds can fail to bind the port. Prefer tracking the PID returned by start_agent (e.g., an assoc array port→pid) and killing that, and/or add lsof to dependency checks with a clear error.

promptless · 2026-05-14T17:38:08Z

Promptless prepared a documentation update related to this change.

Triggered by PR #78

The new filesystem_write config option introduced in this PR wasn't documented in the configuration reference (examples/reference.toml). I've added documentation for it in the [tools] section.

Review: Document filesystem_write config option

brandon-shelton-mezmo added 5 commits April 27, 2026 12:15

brandon-shelton-mezmo requested a review from a team April 27, 2026 21:34

henryjandrews requested a review from Copilot April 28, 2026 23:45

Copilot started reviewing on behalf of henryjandrews April 28, 2026 23:46 View session


brandon-shelton-mezmo wants to merge 5 commits into main from feature/aura-self-testing


