
gtskevin/test-audit

Test Audit: Autonomous test judgment for AI coding agents

Claude Code Skill · License: MIT · Agent Agnostic


Note

Who is this for? You build things with AI coding agents. You're not sure when code needs tests, what kind of tests, or how many. Test-audit makes those decisions for you, then writes and runs the tests.

Highlights

| Feature | Why it matters |
|---|---|
| 🧠 Makes test decisions for you | Reads your diff, decides which files need tests, what type, and how deep. Zero input required. |
| 🐛 Finds real bugs, not just coverage gaps | Found 3 production bugs in its first project (route ordering, missing table, missing column). |
| 🔧 Respects your project | Reads your conftest, reuses your helpers, follows your patterns. Doesn't add new dependencies to a project that already has a test setup. |
| 🎯 Converges, doesn't retry | A 2-retry limit forces understanding over blind repetition. |

The Gap This Fills

There are plenty of testing tools. But they all assume you already know what to test:

| Tool | What it does | What it assumes |
|---|---|---|
| TDD skills | Enforces "write test first" process | You know WHAT to test and HOW DEEP |
| Coverage tools | Measures % of lines covered | You interpret the number yourself |
| "Write tests for X" | Generates tests for a specific file | You chose the file and scope |
| Copilot test gen | Suggests tests for your current file | You know which file matters |

Test-audit is different. It answers the question you can't: given these code changes, what actually needs testing?

A senior developer looks at your diff and thinks:
  "Type definitions? → Skip."
  "Auth routes? → High risk. Test every path."
  "Helper function? → Happy path + one edge case."
  "Database query? → Test with isolation."

Test-audit encodes this judgment.

Installation

Tip

The easiest way: just tell your AI agent to install it.

Open your agent and say:

Install the test-audit skill from https://github.com/gtskevin/test-audit

Your agent will handle the rest: cloning the repo and putting the file in the right place.

Manual installation (if you prefer terminal)

Claude Code:

mkdir -p ~/.claude/skills/test-audit
curl -sL https://raw.githubusercontent.com/gtskevin/test-audit/main/skill.md -o ~/.claude/skills/test-audit/skill.md

Codex CLI:

mkdir -p ~/.codex/skills/test-audit
curl -sL https://raw.githubusercontent.com/gtskevin/test-audit/main/skill.md -o ~/.codex/skills/test-audit/SKILL.md

Gemini CLI / Cursor / Windsurf: Copy the skill.md content into your agent's instruction file (GEMINI.md, .cursor/rules/test-audit.mdc, or .windsurfrules).

Quick Start

  1. Install the skill (above)
  2. Open your agent in any project with code changes
  3. Say test-audit (or /test-audit in Claude Code)
  4. Watch the output:
📁 Files reviewed: 5
   → 2 files skipped (type definitions, constants)
   → 3 files need testing
🧪 Tests added: 12 (3 new test files)
✅ Result: ALL PASS
🐛 Bugs found: 1 (route ordering in analyses.py)
   → Scanned similar routes: 2 more issues found
→ Can commit? Fix the bugs first, then YES

Example Prompts

Once installed, trigger it in your agent. Here are real scenarios:

| Scenario | What to say | What happens |
|---|---|---|
| After AI writes a feature | test-audit | Scans the diff, tests only what changed |
| Before committing | test-audit | Safety net: catches untested changes |
| After a big refactor | test-audit | Re-evaluates which tests need updating |
| "I just don't know what to test" | test-audit | Makes all test decisions for you |

Tip

Test-audit works best right after code changes, before committing. It reads git diff to find exactly what changed.
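As a rough illustration of that first step, the sketch below parses the output of `git diff --name-status HEAD` (a standard git command) into a list of files to review. The function names are hypothetical and this is not test-audit's own code; in practice the agent simply runs git and reads the output itself.

```python
import subprocess
from typing import List, Tuple


def parse_name_status(output: str) -> List[Tuple[str, str]]:
    """Parse `git diff --name-status` output into (status, path) pairs.

    Each line is tab-separated: a status code (M, A, D, R100, ...) followed by
    one path, or two paths (old, new) for renames and copies.
    """
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0][0]  # "R100" -> "R"; keep only the kind of change
        path = fields[-1]      # for renames/copies, the last field is the new path
        changes.append((status, path))
    return changes


def files_to_review(changes: List[Tuple[str, str]]) -> List[str]:
    """Deleted files have nothing left to test, so drop them."""
    return [path for status, path in changes if status != "D"]


def changed_files(base: str = "HEAD") -> List[str]:
    """Hypothetical helper: tracked files changed relative to `base`."""
    result = subprocess.run(
        ["git", "diff", "--name-status", base],
        capture_output=True, text=True, check=True,
    )
    return files_to_review(parse_name_status(result.stdout))
```

Note that `git diff` only reports tracked files, so a brand-new file shows up only after it has been staged (or marked with `git add -N`).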

How It Works

flowchart TD
    A["/test-audit"] --> B["Locate Changes<br/>git diff"]
    B --> C["Judge Each File<br/>Skip? Unit? Integration? Deep?"]
    C --> D["Read the Room<br/>conftest, patterns, DB setup"]
    D --> E["Write Tests<br/>Reuse project helpers"]
    E --> F["Run & Converge<br/>Max 2 retries"]
    F --> G{Bug found?}
    G -->|Yes| H["Record + Scan for similar"]
    G -->|No| I["Report Results"]
    H --> I
    I --> J["Can I commit? ✅"]

The Judgment Engine (Core Value)

For each changed file, test-audit makes three decisions that experienced developers make instinctively:

1. Is this file worth testing?

Type definitions, constants, pure CSS     → Not worth it
Helper functions, simple data transforms   → Probably worth it
API endpoints, database operations, auth   → Definitely worth it
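One way to picture decision 1 is as a first-pass filter on file paths. The patterns below are assumptions picked to mirror the three tiers above, and `worth_testing` is a hypothetical helper; the skill itself reads the code and applies judgment rather than relying on a fixed lookup.

```python
import re

# Hypothetical path patterns mirroring the three tiers above.
SKIP_PATTERNS = [
    r"\.d\.ts$",                        # TypeScript declaration files
    r"(^|/)types?\.(py|ts)$",           # type-definition modules
    r"\.css$",                          # pure styling
    r"(^|/)constants?\.(py|ts|js)$",    # constants
]
HIGH_RISK_PATTERNS = [
    r"(^|/)(routes?|api|endpoints?)/",  # API endpoints
    r"(^|/)auth",                       # authentication / permissions
    r"(^|/)(db|models?|migrations?)/",  # database layer
    r"quer(y|ies)",                     # query modules
]


def worth_testing(path: str) -> str:
    """Return "skip", "probably", or "definitely" for a changed file path."""
    p = path.replace("\\", "/").lower()  # normalize Windows separators
    if any(re.search(pat, p) for pat in SKIP_PATTERNS):
        return "skip"
    if any(re.search(pat, p) for pat in HIGH_RISK_PATTERNS):
        return "definitely"
    return "probably"  # helpers, transforms, everything in between
```

A path-only filter misclassifies easily (a `utils.py` might contain auth logic), so at most it could be a starting point for actually reading the diff.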

2. What type of test?

Pure logic function    → Unit test
API route with auth    → Integration test via TestClient
React component        → Test the logic, skip the rendering if deps are heavy
Database query         → Test with real DB, not mocks
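To make "real DB, not mocks" concrete, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The query `get_user_projects` and its `projects` table are hypothetical; the point is that the actual SQL runs, and each test gets a fresh database so data can't leak between tests.

```python
import sqlite3


def get_user_projects(conn: sqlite3.Connection, user_id: int) -> list:
    """Hypothetical query under test: a user's projects, isolated by owner."""
    rows = conn.execute(
        "SELECT name FROM projects WHERE owner_id = ? ORDER BY name", (user_id,)
    )
    return [name for (name,) in rows]


def make_test_db() -> sqlite3.Connection:
    """Fresh in-memory database per test, so tests can't see each other's data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects (id INTEGER PRIMARY KEY, owner_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO projects (owner_id, name) VALUES (?, ?)",
        [(1, "alpha"), (1, "beta"), (2, "gamma")],
    )
    return conn


def test_user_sees_only_own_projects():
    conn = make_test_db()
    # The real SQL runs here: a wrong WHERE clause would fail this test,
    # whereas a mocked connection would return whatever it was told to.
    assert get_user_projects(conn, 1) == ["alpha", "beta"]
    assert get_user_projects(conn, 2) == ["gamma"]
    assert get_user_projects(conn, 3) == []
```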

3. How deep?

Auth/permissions/data isolation   → Every path: happy + unauthorized + edge cases
Simple helper                     → Happy path + one boundary value
Complex state machine             → State transitions table
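For the "state transitions table" case, the test can itself be a table: every legal transition, plus the illegal ones that matter. The job states and events below are invented for illustration.

```python
# Hypothetical state machine for a background job, used only for illustration.
TRANSITIONS = {
    ("queued", "start"): "running",
    ("running", "finish"): "done",
    ("running", "fail"): "failed",
    ("failed", "retry"): "queued",
}


def next_state(state: str, event: str) -> str:
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state!r}") from None


# Each row: (current state, event, expected next state or None if illegal).
CASES = [
    ("queued", "start", "running"),
    ("running", "finish", "done"),
    ("running", "fail", "failed"),
    ("failed", "retry", "queued"),
    ("done", "start", None),     # finished jobs must not restart
    ("queued", "finish", None),  # a job that never ran can't finish
]


def test_transitions():
    for state, event, expected in CASES:
        if expected is None:
            try:
                next_state(state, event)
            except ValueError:
                continue  # rejected, as it should be
            raise AssertionError(f"{event!r} from {state!r} should be rejected")
        assert next_state(state, event) == expected, (state, event)
```

With pytest, the same `CASES` list would typically feed `@pytest.mark.parametrize` so each row is reported as its own test.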

These aren't hard rules; they're guidelines the AI applies with judgment based on your actual code.

Bug Discovery

While writing tests, test-audit often discovers bugs in the code being tested:

| Bug found | Root cause | How test-audit caught it |
|---|---|---|
| 422 response on a valid endpoint | Fixed path matched by an earlier {param} route | Test got an unexpected 422 → investigated route ordering |
| no such table | Table created only in a migration, not in schema init | Test DB creation failed → checked _ensure_schema |
| no such column | Column missing from schema init | Test assertion failed → checked _ensure_table_columns |
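The last two rows share a cause: the schema a fresh test database gets from its init code drifts from what the application actually queries. A minimal sketch of catching that drift with SQLite follows; `ensure_schema` is a hypothetical stand-in for a project's `_ensure_schema`, and the `EXPECTED` map (assumed here to be derived from migrations) is invented for the example.

```python
import sqlite3

# Hypothetical schema init, standing in for a project's _ensure_schema.
SCHEMA_INIT = """
CREATE TABLE IF NOT EXISTS analyses (id INTEGER PRIMARY KEY, title TEXT);
"""

# What the application code actually uses (assumed to come from migrations).
EXPECTED = {
    "analyses": {"id", "title", "status"},  # "status" was only added by a migration
    "analysis_results": {"id", "analysis_id", "payload"},  # table only in a migration
}


def ensure_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA_INIT)


def schema_gaps(conn: sqlite3.Connection, expected: dict) -> list:
    """List tables/columns the app expects but a freshly initialized DB lacks."""
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    gaps = []
    for table, columns in sorted(expected.items()):
        if table not in tables:
            gaps.append(f"no such table: {table}")
            continue
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        # Table names come from our own trusted dict, so the f-string is safe here.
        actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        for column in sorted(columns - actual):
            gaps.append(f"no such column: {table}.{column}")
    return gaps
```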

When one bug is found, it scans for similar issues (one discovery, batch investigation).
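For the route-ordering bug, "scan for similar issues" could mean checking every route against the ones registered before it. Frameworks like FastAPI evaluate path operations in declaration order, so a fixed path such as `/analyses/summary` declared after `/analyses/{analysis_id}` is captured by the parameterized route; if the parameter is typed (for example as an `int`), validation fails and the client gets a 422. The scanner below is a hypothetical sketch assuming a simple first-match router where `{name}` matches exactly one path segment; it doesn't model type converters.

```python
from typing import List, Tuple


def _segments(path: str) -> List[str]:
    return [s for s in path.strip("/").split("/") if s]


def shadows(earlier: str, later: str) -> bool:
    """True if `earlier` (checked first) would also match every request for `later`."""
    a, b = _segments(earlier), _segments(later)
    if len(a) != len(b) or a == b:
        return False
    for x, y in zip(a, b):
        if x.startswith("{") and x.endswith("}"):
            continue  # a parameter matches any single segment
        if x != y:
            return False
    return True


def find_shadowed(routes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Scan (method, path) routes in registration order for unreachable ones.

    Returns (shadowed route, route that captures it) pairs.
    """
    found = []
    for i, (method, path) in enumerate(routes):
        for earlier_method, earlier_path in routes[:i]:
            if earlier_method == method and shadows(earlier_path, path):
                found.append((f"{method} {path}", f"{earlier_method} {earlier_path}"))
                break
    return found
```

One confirmed bug becomes a batch check: the same scan flags every other fixed path sitting behind a parameterized sibling.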

The 2-Retry Rule

1st failure → Read error, locate cause, fix
2nd failure → STOP. Re-understand the problem.
              Test infrastructure misunderstood? → Re-read conftest
              Production code has a bug?         → Record it, move on

Same error twice means your mental model is wrong, not that you need another try.
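The rule amounts to a bounded loop. Below is a hypothetical encoding: `run_test` and `apply_fix` are placeholder callbacks (in practice the agent plays both roles), and the loop returns after the second failure instead of trying again, flagging whether the same error repeated.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# run_test returns (passed, error_message); apply_fix receives the error.
RunTest = Callable[[], Tuple[bool, Optional[str]]]


@dataclass
class Outcome:
    status: str                                   # "pass" or "stop"
    errors: List[str] = field(default_factory=list)
    repeated: bool = False                        # same error twice: rethink


def converge(run_test: RunTest, apply_fix: Callable[[str], None],
             max_failures: int = 2) -> Outcome:
    errors: List[str] = []
    while True:
        passed, error = run_test()
        if passed:
            return Outcome("pass", errors)
        errors.append(error or "unknown error")
        if len(errors) >= max_failures:
            # 2nd failure: stop and re-understand instead of trying again.
            repeated = len(errors) >= 2 and errors[-1] == errors[-2]
            return Outcome("stop", errors, repeated)
        # 1st failure: read the error, locate the cause, fix, and rerun once.
        apply_fix(errors[-1])
```

On `stop`, the caller decides between the two branches above: re-read the test infrastructure, or record a production bug and move on.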

Who Uses This

The vibe coder: You build with AI agents but aren't a professional developer. You don't have the "test intuition" that comes from years of shipping. Test-audit gives you that intuition.

The solo developer shipping fast: You know testing matters but don't have time to think about coverage strategy. Run /test-audit before each commit as a safety net.

The team lead reviewing PRs: Use test-audit to quickly assess whether a PR has adequate test coverage without reading every line.

How It's Different

vs. TDD Skills

TDD says "write the test first." But which test? Testing an auth route needs per-path coverage. Testing a helper function needs one happy path. How do you know? Experience.

Test-audit has that experience built in. It reads the code and makes the judgment call, so you don't need years of practice to know what "good testing" looks like.

vs. "Write tests for X"

When you say "write tests for auth.py," you've already made the decision that auth.py needs tests. But what if the real risk is in a helper that auth.py calls? Or what if auth.py already has great coverage, but a new migration file broke the schema?

Test-audit scans your entire diff, not just one file you pointed at.

vs. Coverage Reports

Coverage says "87% of lines are executed." It doesn't tell you:

  • Whether the 13% gap matters
  • Whether the 87% tests are actually asserting anything meaningful
  • Whether a file with 100% coverage is testing the right things

Test-audit makes qualitative judgments, not just quantitative ones.

Cross-Agent Compatibility

The core skill (judgment about what to test, workflow, and pitfall patterns) is agent-agnostic. It works with any AI coding agent that can read instructions, run shell commands, and edit files.

| Agent | How to install | How to trigger | Notes |
|---|---|---|---|
| Claude Code | ~/.claude/skills/test-audit/skill.md | /test-audit | Native skill system. Best experience. |
| Codex CLI | ~/.codex/skills/test-audit/SKILL.md | "run test-audit" | Native skill system. Full compatibility. |
| Gemini CLI | Paste into GEMINI.md | "run test-audit" | No skill system; paste as instructions. |
| Cursor | .cursor/rules/test-audit.mdc | Auto or @test-audit | Use rule activation. Works with .mdc frontmatter. |
| Windsurf | Paste into .windsurfrules | Manual trigger | No skill system. 12K character limit; trim if needed. |

Note

What's agent-specific? Very little. The skill references in the "Compatibility with Other Skills" section (e.g., everything-claude-code:browser-qa) are Claude Code-specific, but the skill automatically adapts its instructions: it tells your agent to use whatever browser/E2E/security tools are available to it.

Real Results

From 7 rounds of test-audit on a production FastAPI + React project:

| Metric | Result |
|---|---|
| Files correctly skipped | 3 (type definitions, CSS, constants) |
| New tests added | 35 across 3 test files |
| Production bugs found | 3 (route ordering, missing table, missing column) |
| False positives | 0 (every finding was a real issue) |

FAQ

I'm a senior developer. Do I need this?

Maybe not for judgment β€” you already know what to test. But you might still find value in: (1) the automated diff scanning, (2) bug discovery during test writing, and (3) the convergence discipline that prevents AI agents from infinite retry loops.

Does it work with my test framework?

Yes. Test-audit reads your existing test files and conftest to understand your framework (pytest, vitest, jest, Go testing, etc.) before writing any tests. It adapts to whatever you're using.
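As a rough sketch of that detection step, a first pass might look at a few marker files. The marker list and precedence below are assumptions for illustration, not taken from the skill; a real project can have several of these at once, which is why the agent still reads the existing tests and conftest.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Files whose presence or contents hint at the test runner (assumed list).
MARKERS = ["go.mod", "conftest.py", "tests/conftest.py", "pytest.ini",
           "pyproject.toml", "package.json"]


def detect_from_files(files: Dict[str, str]) -> Optional[str]:
    """Guess the runner from {relative_path: contents} of the marker files found."""
    if "go.mod" in files:
        return "go test"
    if any(name in files for name in ("conftest.py", "tests/conftest.py", "pytest.ini")):
        return "pytest"
    if "[tool.pytest" in files.get("pyproject.toml", ""):
        return "pytest"
    if "package.json" in files:
        data = json.loads(files["package.json"])
        deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
        for name in ("vitest", "jest"):
            if name in deps:
                return name
    return None  # no test setup found: infrastructure must be created first


def detect_test_framework(root: Path) -> Optional[str]:
    """Read whichever marker files exist under `root` and classify them."""
    found = {m: (root / m).read_text() for m in MARKERS if (root / m).is_file()}
    return detect_from_files(found)
```

Keeping the classification in a pure function over file contents makes the heuristic itself easy to test without touching the disk.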

What if my project has no tests at all?

Test-audit will automatically set up the test infrastructure β€” install dependencies, create configuration, and verify the environment β€” before writing the first test. No manual setup needed.

How is this different from Copilot's "generate tests"?

Copilot generates tests for the file you have open. You chose the file. You decided it needs tests. Test-audit scans your entire diff, decides which files matter, and determines the right depth, then writes the tests. It's the difference between having a typist and having a test architect.

Can I use it with non-Python projects?

Yes. The judgment engine (is it worth testing, what type, how deep) is language-agnostic. The workflow adapts to any test framework.

What if the AI agent keeps retrying failing tests?

Test-audit enforces a strict 2-retry limit. After two failures on the same test, it stops and re-evaluates the root cause β€” either the test infrastructure was misunderstood, or the production code has a bug. No infinite loops.

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.



Built with 🧪 by @gtskevin. Test judgment for everyone.

