feat: add multi-agent test generation pipeline #78

Open
brandon-shelton-mezmo wants to merge 5 commits into main from feature/aura-self-testing

@brandon-shelton-mezmo

Summary

  • Add make test-generate, an on-demand Makefile target that uses Aura agents to generate and review Rust unit tests — dogfooding the platform to test itself
  • Add filesystem_write config option to enable the WriteFileTool in agent configurations
  • Generate 460 unit tests across 6 modules (tools, vector_dynamic, vector_store, builder, rag_tools, string_utils) covering filesystem security, path validation, vector search helpers, and builder pure functions
  • Harden generator and reviewer agent prompts to enforce content assertions, security path testing, DRY consolidation, and upstream style compliance

Components

  • Rust changes: filesystem_write field in ToolsConfig (4 files), pub(crate) visibility for testable helpers, VectorStoreManager::new_stub() test constructor
  • Agent configs: 5 TOML configs in configs/test-agents/ (1 generator + 4 review lenses: correctness, coverage, robustness, style)
  • Orchestration: scripts/test-generate-orchestrate.sh with generate → cargo test → parallel review → feedback loop (up to 3 rounds)
  • Tests: 6 _tests.rs files with 460 passing tests, zero warnings
  • Docs: Strategy doc and README usage guide

Test plan

  • cargo test --workspace --lib — 743 tests pass, 0 failures, 0 warnings
  • make test-generate FILES="crates/aura/src/string_utils.rs" — full pipeline runs end-to-end
  • Security tests verify all 8 sensitive path patterns (/etc, /proc, /sys, /dev, /var/log, /.ssh, /.aws, /.config)
  • Jenkins commitlint validation
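
The sensitive-path coverage in the test plan can be pictured as a table-driven check. The following is a minimal sketch: only the eight patterns come from the list above, while `is_sensitive_path` and its substring matching are assumptions made for illustration, not the actual FilesystemTool logic.

```rust
/// The eight sensitive path patterns named in the test plan.
const SENSITIVE_PATTERNS: [&str; 8] = [
    "/etc", "/proc", "/sys", "/dev", "/var/log", "/.ssh", "/.aws", "/.config",
];

/// Hypothetical stand-in for the real check (assumption: plain substring match).
/// Returns true if `path` contains any of the sensitive patterns.
fn is_sensitive_path(path: &str) -> bool {
    SENSITIVE_PATTERNS.iter().any(|pattern| path.contains(pattern))
}

#[test]
fn every_sensitive_pattern_is_rejected() {
    // One case per pattern, so removing any entry from the list fails the test.
    for pattern in SENSITIVE_PATTERNS.iter() {
        let path = format!("{}/example", pattern);
        assert!(is_sensitive_path(&path), "expected {} to be flagged", path);
    }
    // A normal project path must stay allowed.
    assert!(!is_sensitive_path("/workspace/crates/aura/src/lib.rs"));
}
```

Driving the assertions from the pattern list keeps one case per pattern, which is the property the test plan item describes.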

Ref: LOG-00000

Add make test-generate, an on-demand Makefile target that uses
Aura agents to generate and review Rust unit tests. Includes
filesystem_write config support, 5 agent TOML configs, an
orchestration script, and 468 generated tests across 5 modules.

Ref: LOG-00000
Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Document make test-generate in the Testing section with usage
examples, prerequisites, and pipeline overview so developers
know how to generate and review tests with Aura agents.

Ref: LOG-00000
Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Address assertion poverty, DRY violations, missing security
tests, fragile assertions, and weak naming found during test
quality review. Generator now rejects is_ok()-only assertions
and requires content verification. Reviewers enforce security
path coverage, mutation resistance, and duplication detection.

Ref: LOG-00000
Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Regenerate string_utils and tools tests using improved generator
prompt that enforces content assertions, security path testing,
and DRY consolidation. Adds crate-specific guidance to prevent
compilation errors (TempDir pattern, import visibility, write
file pre-creation for canonicalize workaround).

Key improvements:
- Zero assert!(is_ok()) without content verification
- Security tests for all 8 sensitive path patterns
- Path traversal and base_dir escape tests
- Consolidated duplicate tests (46 -> 35 for string_utils)

Ref: LOG-00000
Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

Regenerate vector_dynamic, vector_store, builder, and rag_tools
tests using the improved generator prompt with content assertion
enforcement, DRY consolidation, and crate-specific guidance.

Ref: LOG-00000
Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

brandon-shelton-mezmo requested a review from a team on April 27, 2026, 21:34


github-actions Bot commented Apr 27, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@brandon-shelton-mezmo (Author)

I have read the CLA Document and I hereby sign the CLA

Copilot AI left a comment

Pull request overview

Adds an on-demand, multi-agent “test generation + review” workflow to dogfood Aura for Rust unit-test generation, alongside a new filesystem_write toggle that enables agent-driven file writes.

Changes:

  • Introduces make test-generate and a bash orchestrator that runs generator → cargo test → 4 parallel reviewers with a feedback loop.
  • Adds filesystem_write to tools configuration and conditionally registers WriteFileTool.
  • Adds large generated unit-test modules and documents the self-testing strategy and usage.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| scripts/test-generate-orchestrate.sh | Orchestrates multi-agent generation/review loop, manages agent processes, registers test modules, runs cargo test. |
| docs/aura-self-testing-strategy.md | Strategy/architecture document for Aura self-testing (API agent, generator pipeline, chaos agent). |
| crates/aura/src/vector_store_tests.rs | Generated tests for vector store manager helpers and formatting/search behavior. |
| crates/aura/src/vector_store.rs | Adds VectorStoreManager::new_stub() for test construction. |
| crates/aura/src/vector_dynamic_tests.rs | Generated tests for dynamic vector search tool schema/serde and call paths. |
| crates/aura/src/vector_dynamic.rs | Exposes helper fns (parse_value_str, default_limit) to sibling test modules. |
| crates/aura/src/string_utils_tests.rs | Generated tests for UTF-8-safe truncation behavior across many inputs. |
| crates/aura/src/rag_tools_tests.rs | Generated tests for RAG tool schemas, ingest flows, and JSON auto-ingest helpers. |
| crates/aura/src/lib.rs | Registers new sibling test modules under #[cfg(test)]. |
| crates/aura/src/config.rs | Adds filesystem_write field to Aura’s ToolsConfig. |
| crates/aura/src/builder_tests.rs | Adds extensive tests around builder helpers/config construction. |
| crates/aura/src/builder.rs | Adds conditional WriteFileTool registration controlled by filesystem_write; exposes helper fns for testing. |
| crates/aura-config/src/config.rs | Adds filesystem_write with serde default to config crate’s ToolsConfig. |
| crates/aura-config/src/builder.rs | Plumbs filesystem_write through into Aura runtime config. |
| configs/test-agents/generator.toml | Generator agent config with hardened prompt guidance and write enabled. |
| configs/test-agents/review-correctness.toml | Correctness review lens agent config (writes structured JSON review). |
| configs/test-agents/review-coverage.toml | Coverage review lens agent config (writes structured JSON review). |
| configs/test-agents/review-robustness.toml | Robustness/flakiness review lens agent config (writes structured JSON review). |
| configs/test-agents/review-style.toml | Style/DRY review lens agent config (writes structured JSON review). |
| README.md | Documents test running and AI-assisted test generation usage/prereqs. |
| Makefile | Adds test-generate target calling the orchestrator. |
| Cargo.lock | Updates workspace crate versions and a rig-core git rev. |

Comment on lines +446 to +447
issue_count=$(echo "$review_content" | jq '.issues | length' 2>/dev/null || echo "?")
total_issues=$((total_issues + ${issue_count:-0}))

Copilot AI Apr 28, 2026

When issue_count cannot be parsed, this assigns a non-numeric value (e.g. "?") and then performs arithmetic expansion, which will terminate the script under set -euo pipefail. Make issue_count default to 0 on parse errors (e.g., jq -r '(.issues|length) // 0' with a numeric fallback) before adding to total_issues.

Suggested change
- issue_count=$(echo "$review_content" | jq '.issues | length' 2>/dev/null || echo "?")
- total_issues=$((total_issues + ${issue_count:-0}))
+ issue_count=$(echo "$review_content" | jq -r '(.issues | length) // 0' 2>/dev/null || echo "0")
+ if ! [[ "$issue_count" =~ ^[0-9]+$ ]]; then
+   issue_count=0
+ fi
+ total_issues=$((total_issues + issue_count))

Copilot uses AI. Check for mistakes.
Comment on lines +237 to +239
| grep -v '/tests/' \
| grep -v '_test\.rs' \
| grep -v '_generated_tests\.rs' \

Copilot AI Apr 28, 2026

The changed-file auto-detection excludes _test.rs but not the generated *_tests.rs files used by this pipeline. That can cause the script to treat generated test modules as inputs on subsequent runs. Update the filters to exclude *_tests.rs (and any other generated-test naming used here) so only source modules are targeted.

Suggested change
- | grep -v '/tests/' \
- | grep -v '_test\.rs' \
- | grep -v '_generated_tests\.rs' \
+ | grep -Ev '(/tests/|_generated_tests?\.rs$|_tests?\.rs$)' \
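
The script applies this rule with grep. As a rough illustration of what the suggested pattern accepts and rejects, here is the same rule as a Rust predicate; `is_generation_target` is a hypothetical name used only for this sketch and does not exist in the PR.

```rust
/// Hypothetical helper mirroring the suggested grep filter.
/// A path is a generation target unless it lives under `/tests/` or ends in
/// `_test.rs` / `_tests.rs` (which also covers `_generated_tests.rs`).
fn is_generation_target(path: &str) -> bool {
    let is_test_file = path.ends_with("_test.rs") || path.ends_with("_tests.rs");
    !path.contains("/tests/") && !is_test_file
}

#[test]
fn generated_test_modules_are_not_targets() {
    assert!(is_generation_target("crates/aura/src/string_utils.rs"));
    assert!(!is_generation_target("crates/aura/src/string_utils_tests.rs"));
    assert!(!is_generation_target("crates/aura/tests/integration.rs"));
}
```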

# ./scripts/test-generate-orchestrate.sh --files "crates/aura/src/foo.rs"
# ./scripts/test-generate-orchestrate.sh # auto-detect changed files vs main
#
# Requires: cargo, curl, jq, OPENAI_API_KEY

Copilot AI Apr 28, 2026

The script comment says it requires OPENAI_API_KEY, but the runtime check accepts either AWS credentials or OpenAI. Update the header comment to match the actual supported credential sources so users aren’t misled.

Suggested change
- # Requires: cargo, curl, jq, OPENAI_API_KEY
+ # Requires: cargo, curl, jq, and either AWS credentials or OPENAI_API_KEY

Comment on lines +446 to +456
+ let write_enabled = tools_config.filesystem_write;
+ tracing::info!("Adding filesystem tools (write_access={})", write_enabled);
  let fs_tool = FilesystemTool::new()
-     .with_write_access(false)
+     .with_write_access(write_enabled)
      .with_max_file_size(1_048_576);

  builder_state = builder_state.add_tool(ReadFileTool(fs_tool.clone()));
- builder_state = builder_state.add_tool(ListDirTool(fs_tool));
+ builder_state = builder_state.add_tool(ListDirTool(fs_tool.clone()));
+ if write_enabled {
+     builder_state = builder_state.add_tool(WriteFileTool(fs_tool));
+ }

Copilot AI Apr 28, 2026

Enabling filesystem_write adds WriteFileTool while FilesystemTool has no base_dir restriction by default, meaning an agent can overwrite arbitrary paths outside a small sensitive-pattern denylist (e.g., home dotfiles like ~/.bashrc). Consider requiring a base_dir when write is enabled, or defaulting base_dir to a safe project directory for write operations to reduce accidental or prompt-injected damage.
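
One way to picture the base_dir restriction is a guard that resolves every write path against a base directory and rejects anything that escapes it. The sketch below is an assumption-laden illustration, not the FilesystemTool API: `normalize` and `resolve_within` are hypothetical helpers, and the check is purely lexical, so it does not follow symlinks (a real implementation would likely need canonicalization as well).

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalise `path`, resolving `.` and `..` without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `..` drops the last pushed component (a no-op at the root).
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Hypothetical guard: resolve `requested` against `base_dir` and return the
/// resolved path only if it stays inside `base_dir`.
fn resolve_within(base_dir: &Path, requested: &str) -> Option<PathBuf> {
    let base = normalize(base_dir);
    // Joining an absolute `requested` replaces the base, so it is rejected below.
    let candidate = normalize(&base.join(requested));
    if candidate.starts_with(&base) {
        Some(candidate)
    } else {
        None
    }
}
```

With a base of `/work/project`, a relative path such as `src/lib.rs` resolves inside the base, while `../../etc/passwd` or an absolute path like `/home/user/.bashrc` is rejected. `Path::starts_with` compares whole components, so a sibling such as `/work/project-evil` does not count as being inside the base.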

Comment on lines +239 to +252
#[tokio::test]
async fn test_dynamic_vector_search_tool_call_with_limit() {
    let store = create_test_vector_store("test", None);
    let tool = DynamicVectorSearchTool::new(store, "test".to_string());
    let args = VectorSearchArgs {
        query: "test".to_string(),
        limit: 10,
        min_score: 0.0,
        label_filters: vec![],
    };
    let result = tool.call(args).await;

    assert!(result.is_ok());
}

Copilot AI Apr 28, 2026

This test only asserts result.is_ok() and doesn’t validate any behavior of the tool (output content, formatting, limit handling, etc.). To make it mutation-resistant, unwrap the response and assert on fields that should change with the input (e.g., query, formatted_results, and any limit/min_score enforcement).
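
To illustrate the difference between an is_ok()-only assertion and a content assertion, here is a self-contained sketch. `SearchResponse` and `search` are hypothetical stand-ins, synchronous and in-memory, and the real tool's output type and fields may differ. The point is only that the assertions depend on the query and the limit.

```rust
/// Hypothetical response type; the real tool's output fields may differ.
#[derive(Debug, PartialEq)]
struct SearchResponse {
    query: String,
    results: Vec<String>,
}

/// Hypothetical stand-in for the tool call: returns at most `limit` documents
/// containing `query`, in input order.
fn search(docs: &[&str], query: &str, limit: usize) -> Result<SearchResponse, String> {
    if query.is_empty() {
        return Err("query must not be empty".to_string());
    }
    let results = docs
        .iter()
        .filter(|doc| doc.contains(query))
        .take(limit)
        .map(|doc| doc.to_string())
        .collect();
    Ok(SearchResponse { query: query.to_string(), results })
}

#[test]
fn search_echoes_query_and_respects_limit() {
    let docs = ["test one", "test two", "other", "test three"];
    // Unwrap the response and assert on content, not just on success.
    let response = search(&docs, "test", 2).expect("search should succeed");
    assert_eq!(response.query, "test");
    assert_eq!(response.results, vec!["test one", "test two"]);
}
```

If the limit handling were removed or the filter inverted, these assertions would fail, whereas `assert!(result.is_ok())` would still pass.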

Comment on lines +1536 to +1541
#[test]
fn test_is_reasoning_model_does_not_match_substring() {
    assert!(!is_reasoning_model("model-o1"));
    assert!(!is_reasoning_model("my-gpt-5"));
    assert!(is_reasoning_model("o1x"));
}

Copilot AI Apr 28, 2026

test_is_reasoning_model_does_not_match_substring is confusing: it asserts that embedded substrings like model-o1 don’t match, but also asserts that an arbitrary string like o1x does match. Either rename the test to reflect the intended "prefix match" behavior, or tighten is_reasoning_model to match only known model name patterns and update the assertions accordingly.
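
For the second option, a tightened matcher might look like the sketch below. The family list (`o1`, `gpt-5`) is inferred from the names in the test and is an assumption, as are the function name `is_reasoning_model_strict` and the rule that a family may only be extended by a `-` suffix. The actual `is_reasoning_model` implementation is not shown in this PR.

```rust
/// Hypothetical family list, inferred from the names used in the test above.
const REASONING_FAMILIES: [&str; 2] = ["o1", "gpt-5"];

/// Tightened matcher (assumption): `name` must equal a family name exactly,
/// or extend it with a `-` separated suffix such as `o1-mini`.
fn is_reasoning_model_strict(name: &str) -> bool {
    REASONING_FAMILIES.iter().any(|family| {
        name == *family
            || name
                .strip_prefix(*family)
                .map_or(false, |rest| rest.starts_with('-'))
    })
}

#[test]
fn strict_matcher_rejects_arbitrary_prefix_extensions() {
    assert!(is_reasoning_model_strict("o1"));
    assert!(is_reasoning_model_strict("o1-mini"));
    assert!(!is_reasoning_model_strict("o1x"));
    assert!(!is_reasoning_model_strict("model-o1"));
    assert!(!is_reasoning_model_strict("my-gpt-5"));
}
```

Under this rule `o1x` no longer matches, so the test name and its assertions would agree.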

Comment on lines +124 to +140
stop_agent_on_port() {
  local port=$1
  # Find and kill the process listening on this port
  local pid
  pid=$(lsof -ti :"$port" 2>/dev/null || true)
  if [ -n "$pid" ]; then
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
    # Remove from PIDS array
    local new_pids=()
    for p in "${PIDS[@]+"${PIDS[@]}"}"; do
      if [ "$p" != "$pid" ]; then
        new_pids+=("$p")
      fi
    done
    PIDS=("${new_pids[@]+"${new_pids[@]}"}")
  fi

Copilot AI Apr 28, 2026

stop_agent_on_port relies on lsof to find the PID. If lsof is missing (common on some CI images) or returns multiple PIDs, the generator/review servers may not be stopped correctly and later rounds can fail to bind the port. Prefer tracking the PID returned by start_agent (e.g., an assoc array port→pid) and killing that, and/or add lsof to dependency checks with a clear error.

promptless Bot commented May 14, 2026

Promptless prepared a documentation update related to this change.

Triggered by PR #78

The new filesystem_write config option introduced in this PR wasn't documented in the configuration reference (examples/reference.toml). I've added documentation for it in the [tools] section.

Review: Document filesystem_write config option
