feat: add multi-agent test generation pipeline #78
brandon-shelton-mezmo wants to merge 5 commits
Add make test-generate, an on-demand Makefile target that uses Aura agents to generate and review Rust unit tests. Includes filesystem_write config support, 5 agent TOML configs, an orchestration script, and 468 generated tests across 5 modules. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>
Document make test-generate in the Testing section with usage examples, prerequisites, and pipeline overview so developers know how to generate and review tests with Aura agents. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>
Address assertion poverty, DRY violations, missing security tests, fragile assertions, and weak naming found during test quality review. Generator now rejects is_ok()-only assertions and requires content verification. Reviewers enforce security path coverage, mutation resistance, and duplication detection. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>
Regenerate string_utils and tools tests using improved generator prompt that enforces content assertions, security path testing, and DRY consolidation. Adds crate-specific guidance to prevent compilation errors (TempDir pattern, import visibility, write file pre-creation for canonicalize workaround).
Key improvements:
- Zero assert!(is_ok()) without content verification
- Security tests for all 8 sensitive path patterns
- Path traversal and base_dir escape tests
- Consolidated duplicate tests (46 -> 35 for string_utils)
Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>
Regenerate vector_dynamic, vector_store, builder, and rag_tools tests using the improved generator prompt with content assertion enforcement, DRY consolidation, and crate-specific guidance. Ref: LOG-00000 Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>
Pull request overview
Adds an on-demand, multi-agent “test generation + review” workflow to dogfood Aura for Rust unit-test generation, alongside a new filesystem_write toggle that enables agent-driven file writes.
Changes:
- Introduces make test-generate and a bash orchestrator that runs generator → cargo test → 4 parallel reviewers with a feedback loop.
- Adds filesystem_write to tools configuration and conditionally registers WriteFileTool.
- Adds large generated unit-test modules and documents the self-testing strategy and usage.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| scripts/test-generate-orchestrate.sh | Orchestrates multi-agent generation/review loop, manages agent processes, registers test modules, runs cargo test. |
| docs/aura-self-testing-strategy.md | Strategy/architecture document for Aura self-testing (API agent, generator pipeline, chaos agent). |
| crates/aura/src/vector_store_tests.rs | Generated tests for vector store manager helpers and formatting/search behavior. |
| crates/aura/src/vector_store.rs | Adds VectorStoreManager::new_stub() for test construction. |
| crates/aura/src/vector_dynamic_tests.rs | Generated tests for dynamic vector search tool schema/serde and call paths. |
| crates/aura/src/vector_dynamic.rs | Exposes helper fns (parse_value_str, default_limit) to sibling test modules. |
| crates/aura/src/string_utils_tests.rs | Generated tests for UTF-8-safe truncation behavior across many inputs. |
| crates/aura/src/rag_tools_tests.rs | Generated tests for RAG tool schemas, ingest flows, and JSON auto-ingest helpers. |
| crates/aura/src/lib.rs | Registers new sibling test modules under #[cfg(test)]. |
| crates/aura/src/config.rs | Adds filesystem_write field to Aura’s ToolsConfig. |
| crates/aura/src/builder_tests.rs | Adds extensive tests around builder helpers/config construction. |
| crates/aura/src/builder.rs | Adds conditional WriteFileTool registration controlled by filesystem_write; exposes helper fns for testing. |
| crates/aura-config/src/config.rs | Adds filesystem_write with serde default to config crate’s ToolsConfig. |
| crates/aura-config/src/builder.rs | Plumbs filesystem_write through into Aura runtime config. |
| configs/test-agents/generator.toml | Generator agent config with hardened prompt guidance and write enabled. |
| configs/test-agents/review-correctness.toml | Correctness review lens agent config (writes structured JSON review). |
| configs/test-agents/review-coverage.toml | Coverage review lens agent config (writes structured JSON review). |
| configs/test-agents/review-robustness.toml | Robustness/flakiness review lens agent config (writes structured JSON review). |
| configs/test-agents/review-style.toml | Style/DRY review lens agent config (writes structured JSON review). |
| README.md | Documents test running and AI-assisted test generation usage/prereqs. |
| Makefile | Adds test-generate target calling the orchestrator. |
| Cargo.lock | Updates workspace crate versions and a rig-core git rev. |
    issue_count=$(echo "$review_content" | jq '.issues | length' 2>/dev/null || echo "?")
    total_issues=$((total_issues + ${issue_count:-0}))

When issue_count cannot be parsed, this assigns a non-numeric value (e.g. "?") and then performs arithmetic expansion, which will terminate the script under set -euo pipefail. Make issue_count default to 0 on parse errors (e.g., jq -r '(.issues|length) // 0' with a numeric fallback) before adding to total_issues.

Suggested change:

    - issue_count=$(echo "$review_content" | jq '.issues | length' 2>/dev/null || echo "?")
    - total_issues=$((total_issues + ${issue_count:-0}))
    + issue_count=$(echo "$review_content" | jq -r '(.issues | length) // 0' 2>/dev/null || echo "0")
    + if ! [[ "$issue_count" =~ ^[0-9]+$ ]]; then
    +   issue_count=0
    + fi
    + total_issues=$((total_issues + issue_count))
    | grep -v '/tests/' \
    | grep -v '_test\.rs' \
    | grep -v '_generated_tests\.rs' \

The changed-file auto-detection excludes _test.rs but not the generated *_tests.rs files used by this pipeline. That can cause the script to treat generated test modules as inputs on subsequent runs. Update the filters to exclude *_tests.rs (and any other generated-test naming used here) so only source modules are targeted.

Suggested change:

    - | grep -v '/tests/' \
    - | grep -v '_test\.rs' \
    - | grep -v '_generated_tests\.rs' \
    + | grep -Ev '(/tests/|_generated_tests?\.rs$|_tests?\.rs$)' \
    # ./scripts/test-generate-orchestrate.sh --files "crates/aura/src/foo.rs"
    # ./scripts/test-generate-orchestrate.sh # auto-detect changed files vs main
    #
    # Requires: cargo, curl, jq, OPENAI_API_KEY

The script comment says it requires OPENAI_API_KEY, but the runtime check accepts either AWS credentials or OpenAI. Update the header comment to match the actual supported credential sources so users aren’t misled.

Suggested change:

    - # Requires: cargo, curl, jq, OPENAI_API_KEY
    + # Requires: cargo, curl, jq, and either AWS credentials or OPENAI_API_KEY
    + let write_enabled = tools_config.filesystem_write;
    + tracing::info!("Adding filesystem tools (write_access={})", write_enabled);
      let fs_tool = FilesystemTool::new()
    -     .with_write_access(false)
    +     .with_write_access(write_enabled)
          .with_max_file_size(1_048_576);

      builder_state = builder_state.add_tool(ReadFileTool(fs_tool.clone()));
    - builder_state = builder_state.add_tool(ListDirTool(fs_tool));
    + builder_state = builder_state.add_tool(ListDirTool(fs_tool.clone()));
    + if write_enabled {
    +     builder_state = builder_state.add_tool(WriteFileTool(fs_tool));
    + }

Enabling filesystem_write adds WriteFileTool while FilesystemTool has no base_dir restriction by default, meaning an agent can overwrite arbitrary paths outside a small sensitive-pattern denylist (e.g., home dotfiles like ~/.bashrc). Consider requiring a base_dir when write is enabled, or defaulting base_dir to a safe project directory for write operations to reduce accidental or prompt-injected damage.
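
One way to picture the base_dir restriction suggested above is a containment check that runs before any write. The following is a minimal sketch using only the standard library; `normalize` and `resolve_for_write` are hypothetical names and are not part of FilesystemTool, whose actual API is not shown in this PR. The check is purely lexical, so it does not follow symlinks; a real implementation would likely also need to canonicalize the path.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: lexically normalize a path by dropping `.` and
/// resolving `..` components, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` is a no-op at the root, so `..` cannot climb above it.
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Hypothetical guard: return the normalized target only if it stays inside
/// `base_dir`. An absolute `requested` path replaces the base in `join`, so
/// it is rejected unless it already points into `base_dir`.
fn resolve_for_write(base_dir: &Path, requested: &Path) -> Option<PathBuf> {
    let base = normalize(base_dir);
    let candidate = normalize(&base_dir.join(requested));
    // `Path::starts_with` compares whole components, so a sibling such as
    // `/work/project-evil` does not count as being inside `/work/project`.
    if candidate.starts_with(&base) {
        Some(candidate)
    } else {
        None
    }
}
```

Under this sketch, a request for `src/lib.rs` against a base of `/work/project` resolves normally, while `../../home/user/.bashrc` or `/etc/passwd` is refused before any write is attempted.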
    #[tokio::test]
    async fn test_dynamic_vector_search_tool_call_with_limit() {
        let store = create_test_vector_store("test", None);
        let tool = DynamicVectorSearchTool::new(store, "test".to_string());
        let args = VectorSearchArgs {
            query: "test".to_string(),
            limit: 10,
            min_score: 0.0,
            label_filters: vec![],
        };
        let result = tool.call(args).await;

        assert!(result.is_ok());
    }

This test only asserts result.is_ok() and doesn’t validate any behavior of the tool (output content, formatting, limit handling, etc.). To make it mutation-resistant, unwrap the response and assert on fields that should change with the input (e.g., query, formatted_results, and any limit/min_score enforcement).
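
To illustrate the difference between an is_ok()-only check and a mutation-resistant one, here is a small self-contained sketch. It does not use the crate's DynamicVectorSearchTool or VectorSearchArgs; `SearchHit`, `SearchResponse`, `search`, and `sample_corpus` are hypothetical stand-ins, written synchronously so the example needs no async runtime. The point is only the shape of the assertions: each one would fail if the limit or min_score handling were removed.

```rust
/// Hypothetical stand-in for a single search result.
#[derive(Debug, Clone, PartialEq)]
struct SearchHit {
    text: String,
    score: f32,
}

/// Hypothetical stand-in for the tool's response.
#[derive(Debug)]
struct SearchResponse {
    query: String,
    hits: Vec<SearchHit>,
}

/// Hypothetical search: echoes the query, filters by `min_score`,
/// and returns at most `limit` hits.
fn search(
    corpus: &[SearchHit],
    query: &str,
    limit: usize,
    min_score: f32,
) -> Result<SearchResponse, String> {
    if query.is_empty() {
        return Err("query must not be empty".to_string());
    }
    let hits: Vec<SearchHit> = corpus
        .iter()
        .filter(|hit| hit.score >= min_score)
        .take(limit)
        .cloned()
        .collect();
    Ok(SearchResponse { query: query.to_string(), hits })
}

/// Three hits with descending scores, used as fixed test data.
fn sample_corpus() -> Vec<SearchHit> {
    vec![
        SearchHit { text: "alpha".to_string(), score: 0.9 },
        SearchHit { text: "beta".to_string(), score: 0.5 },
        SearchHit { text: "gamma".to_string(), score: 0.1 },
    ]
}

#[test]
fn search_respects_limit_and_min_score() {
    let corpus = sample_corpus();

    // Unwrap and inspect the response instead of stopping at `is_ok()`.
    let limited = search(&corpus, "rust", 2, 0.0).expect("search should succeed");
    assert_eq!(limited.query, "rust");
    assert_eq!(limited.hits.len(), 2);
    assert_eq!(limited.hits[0].text, "alpha");

    // Every returned hit must meet the score threshold.
    let filtered = search(&corpus, "rust", 10, 0.4).expect("search should succeed");
    assert!(filtered.hits.iter().all(|hit| hit.score >= 0.4));
    assert_eq!(filtered.hits.len(), 2);
}
```

A mutation that ignored `limit` or dropped the `min_score` filter would still return `Ok`, so the original test would keep passing; the length and content assertions above would not.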
    #[test]
    fn test_is_reasoning_model_does_not_match_substring() {
        assert!(!is_reasoning_model("model-o1"));
        assert!(!is_reasoning_model("my-gpt-5"));
        assert!(is_reasoning_model("o1x"));
    }

test_is_reasoning_model_does_not_match_substring is confusing: it asserts that embedded substrings like model-o1 don’t match, but also asserts that an arbitrary string like o1x does match. Either rename the test to reflect the intended "prefix match" behavior, or tighten is_reasoning_model to match only known model name patterns and update the assertions accordingly.
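
The second option in the comment above, tightening the match, could look roughly like the sketch below. The implementation of is_reasoning_model is not shown in this PR, so this is an assumption-laden illustration: `is_reasoning_model_strict` is a hypothetical name, and the prefix list (`o1`, `gpt-5`) is inferred only from the strings used in the test. Here a prefix counts only when it is the whole name or is followed by a `-` delimiter.

```rust
/// Assumed prefixes, inferred from the test strings in this PR; the real
/// list used by `is_reasoning_model` is not shown in the source.
const REASONING_PREFIXES: &[&str] = &["o1", "gpt-5"];

/// Hypothetical stricter variant: the model name must start with a known
/// prefix, and the prefix must be followed by nothing or by a `-`.
fn is_reasoning_model_strict(model: &str) -> bool {
    REASONING_PREFIXES.iter().any(|&prefix| {
        match model.strip_prefix(prefix) {
            // Accept the bare prefix ("o1") or a delimited suffix ("o1-mini").
            Some(rest) => rest.is_empty() || rest.starts_with('-'),
            None => false,
        }
    })
}
```

With this variant, `o1` and `o1-mini` match, while `o1x`, `model-o1`, and `my-gpt-5` do not. If the looser prefix behavior is intended instead, renaming the test to something like test_is_reasoning_model_matches_prefix_only would describe the existing assertions more accurately.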
    stop_agent_on_port() {
      local port=$1
      # Find and kill the process listening on this port
      local pid
      pid=$(lsof -ti :"$port" 2>/dev/null || true)
      if [ -n "$pid" ]; then
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
        # Remove from PIDS array
        local new_pids=()
        for p in "${PIDS[@]+"${PIDS[@]}"}"; do
          if [ "$p" != "$pid" ]; then
            new_pids+=("$p")
          fi
        done
        PIDS=("${new_pids[@]+"${new_pids[@]}"}")
      fi

stop_agent_on_port relies on lsof to find the PID. If lsof is missing (common on some CI images) or returns multiple PIDs, the generator/review servers may not be stopped correctly and later rounds can fail to bind the port. Prefer tracking the PID returned by start_agent (e.g., an assoc array port→pid) and killing that, and/or add lsof to dependency checks with a clear error.
Promptless prepared a documentation update related to this change. Triggered by PR #78.
Summary
- make test-generate, an on-demand Makefile target that uses Aura agents to generate and review Rust unit tests, dogfooding the platform to test itself
- filesystem_write config option to enable the WriteFileTool in agent configurations

Components
- filesystem_write field in ToolsConfig (4 files), pub(crate) visibility for testable helpers, VectorStoreManager::new_stub() test constructor
- configs/test-agents/ (1 generator + 4 review lenses: correctness, coverage, robustness, style)
- scripts/test-generate-orchestrate.sh with generate → cargo test → parallel review → feedback loop (up to 3 rounds)
- _tests.rs files with 460 passing tests, zero warnings

Test plan
- cargo test --workspace --lib: 743 tests pass, 0 failures, 0 warnings
- make test-generate FILES="crates/aura/src/string_utils.rs": full pipeline runs end-to-end

Ref: LOG-00000