Agentic Quality Engineering Lab

Controlled agentic QA workflow for QA Automation Engineers and SDETs: requirement analysis, product-risk analysis, risk-based test design, deterministic quality gate execution, failure analysis, bug report generation, release-decision support, guardrails, human approval simulation, and CI evaluation thresholds.

This is a portfolio-grade lab for AI-assisted quality engineering. It shows that agents can support QA reasoning, but only when they are constrained by structured outputs, safe tool allowlists, deterministic evaluation, human-reviewed policies, and auditable traces.

What This Repo Does

  • Reads fictional BugBank requirements from local Markdown files.
  • Uses deterministic FakeAgentProvider outputs by default.
  • Produces Zod-validated agent outputs for every workflow step (see the schema sketch after this list).
  • Runs only allowlisted local tools.
  • Simulates quality gates against seeded BugBank bugs.
  • Generates local JSON and Markdown reports.
  • Evaluates golden scenarios with CI thresholds.
  • Records agent traces, tool calls, guardrail decisions, and approval decisions.
  • Captures invalid agent outputs and tool errors as auditable trace failures.
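
As a sketch of the structured-output idea, here is what a Zod schema for one step's output could look like. The schema and field names are illustrative, not the repo's actual definitions:

```ts
import { z } from "zod";

// Hypothetical schema for the risk-analyst step; the repo's real schemas may differ.
const RiskAnalysisOutput = z.object({
  risks: z.array(
    z.object({
      id: z.string(), // e.g. "RISK-001", referenced later by designed test cases
      severity: z.enum(["LOW", "MEDIUM", "HIGH", "CRITICAL"]),
      rationale: z.string(),
      recommendedTestLayers: z.array(z.enum(["unit", "api", "manual"])),
    })
  ),
});

type RiskAnalysis = z.infer<typeof RiskAnalysisOutput>;
```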

Why It Matters For QA/SDET

AI agents should not be trusted blindly. This lab demonstrates a guarded workflow where QA expertise remains central: requirements are analyzed, risks are made explicit, tests are designed from risk, failures are tied to evidence, and release support is gated by measurable checks.

The point is not to replace QA engineers. The point is to show how controlled agentic QA workflows can make quality reasoning more visible, repeatable, and reviewable.

What Is Deterministic

  • FakeAgentProvider is the default and only provider used by tests, evals, CLI commands, and CI (see the provider sketch after this list).
  • No real OpenAI, Anthropic, Gemini, or other external AI API is called.
  • No secrets, database, Docker, paid service, browser UI, vector database, or network access is required.
  • .env.example is documentation-only, contains no secrets, and is safe to commit.
  • Reports avoid random IDs, real timestamps, absolute local paths, and nondeterministic ordering.
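
A minimal sketch of the provider seam, assuming a simple interface (the names are illustrative, not the repo's actual code):

```ts
// Hypothetical provider contract; the repo's actual interface may differ.
interface AgentProvider {
  complete(step: string, input: string): Promise<string>;
}

// Deterministic by construction: canned outputs keyed by step name,
// so tests, evals, and CI never depend on a live model or the network.
class FakeAgentProvider implements AgentProvider {
  constructor(private readonly cannedOutputs: Record<string, string>) {}

  async complete(step: string, _input: string): Promise<string> {
    const output = this.cannedOutputs[step];
    if (output === undefined) {
      throw new Error(`No canned output for step: ${step}`);
    }
    return output;
  }
}
```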

What Guardrails Block

  • Arbitrary shell execution.
  • Network requests.
  • Source code modification by agents.
  • External GitHub/Jira issue creation.
  • Environment secret reads.
  • Path traversal such as ../../package.json.
  • Writes outside reports/ (see the path check sketched after this list).
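
To illustrate the path-boundary rule, a minimal check could look like this (the function name and layout are hypothetical):

```ts
import path from "node:path";

// Hypothetical guardrail: resolve the requested path and reject anything
// that escapes the reports/ directory, including ../ traversal.
function assertInsideReports(requestedPath: string, root = process.cwd()): string {
  const reportsDir = path.resolve(root, "reports");
  const resolved = path.resolve(reportsDir, requestedPath);
  if (resolved !== reportsDir && !resolved.startsWith(reportsDir + path.sep)) {
    throw new Error(`Blocked: path escapes reports/: ${requestedPath}`);
  }
  return resolved;
}

// assertInsideReports("agent-run-report.json") -> allowed
// assertInsideReports("../../package.json")    -> throws (path traversal)
```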

What CI Evaluates

CI runs typecheck, lint, format check, unit tests, agent tests, evaluation tests, and golden-scenario evaluation thresholds. Any case that scores below its minimum, misses an expected risk or expected bug report, returns the wrong release decision, violates a guardrail, or hallucinates evidence can fail the gate.

Tech Stack

  • TypeScript
  • Node.js 20+
  • Vitest
  • Zod
  • ESLint
  • Prettier
  • tsx
  • GitHub Actions

Architecture

```mermaid
flowchart LR
  R["Requirements"] --> RA["Requirement Analyst"]
  RA --> RK["Risk Analyst"]
  RK --> TD["Test Designer"]
  TD --> AA["Automation Advisor"]
  AA --> TR["Test Runner"]
  TR --> FA["Failure Analyst"]
  FA --> BR["Bug Reporter"]
  BR --> RD["Release Decision"]

  Tools["Allowlisted Tools"] --> TR
  Guardrails["Guardrails"] --> Tools
  Approval["Human Approval Simulation"] --> Tools
  Data["Golden Scenarios + Seeded Bugs"] --> TR
  RD --> Reports["JSON + Markdown Reports"]
  Reports --> CI["CI Evaluation Thresholds"]
```

Agent Workflow

  1. Requirement analyst extracts business rules, acceptance criteria, unknowns, and assumptions.
  2. Risk analyst identifies severity, rationale, and recommended test layers.
  3. Test designer creates test cases tied to risk IDs.
  4. Automation advisor recommends API/unit/manual coverage.
  5. Test runner calls only the simulated quality gate tool.
  6. Failure analyst links failed checks to seeded bugs and requirements.
  7. Bug reporter creates local structured reports.
  8. Release decision agent returns GO, NO_GO, or NEEDS_REVIEW.

Every step emits a trace entry with input summary, output summary, tool calls, guardrail decisions, approval decisions, and pass/fail status.

Invalid agent output stops the workflow and is recorded as a structured failed trace entry. The workflow does not continue downstream with invalid data.
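
A minimal sketch of that validation gate, assuming Zod's safeParse and an illustrative trace shape (the repo's real trace fields are richer):

```ts
import { z } from "zod";

// Hypothetical trace entry shape.
type TraceEntry = {
  step: string;
  status: "PASS" | "FAIL";
  reason?: string;
  issues?: string[];
};

// Returns false when the raw agent output fails its schema, after
// recording a structured failed trace entry; the caller stops the workflow.
function gateStepOutput(
  step: string,
  schema: z.ZodTypeAny,
  rawOutput: unknown,
  trace: TraceEntry[]
): boolean {
  const parsed = schema.safeParse(rawOutput);
  if (!parsed.success) {
    trace.push({
      step,
      status: "FAIL",
      reason: "Agent output failed schema validation",
      issues: parsed.error.issues.map((issue) => issue.message),
    });
    return false;
  }
  trace.push({ step, status: "PASS" });
  return true;
}
```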

Tool Safety And Guardrails

Allowed tools:

  • readRequirement
  • inspectProductModel
  • runSimulatedQualityGate
  • parseTestResults
  • writeLocalReport
  • createLocalBugReport

Each tool call records a deterministic sequence number, tool name, allowed/blocked result, reason, input summary, and output summary. The registry validates tool names, Zod input schemas, path boundaries, URL absence, and approval policy before execution.

Allowed tool calls that throw are still audit-logged with deterministic sequence numbers and ERROR status.
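
A minimal sketch of those pre-execution checks, in the order described above (the decision shape and check details are illustrative):

```ts
import { z } from "zod";

// The six allowlisted tools, as documented above.
const ALLOWED_TOOLS = new Set([
  "readRequirement",
  "inspectProductModel",
  "runSimulatedQualityGate",
  "parseTestResults",
  "writeLocalReport",
  "createLocalBugReport",
]);

type ToolDecision = { allowed: boolean; reason: string };

// Hypothetical registry gate: allowlist, then Zod input schema, then a URL
// scan; path-boundary and approval-policy checks (sketched elsewhere) would
// run next before the tool executes.
function validateToolCall(
  name: string,
  input: unknown,
  inputSchema: z.ZodTypeAny
): ToolDecision {
  if (!ALLOWED_TOOLS.has(name)) {
    return { allowed: false, reason: "tool not in allowlist" };
  }
  if (!inputSchema.safeParse(input).success) {
    return { allowed: false, reason: "input failed Zod schema validation" };
  }
  if (/https?:\/\//.test(JSON.stringify(input))) {
    return { allowed: false, reason: "URL found in tool input" };
  }
  return { allowed: true, reason: "passed allowlist, schema, and URL checks" };
}
```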

Human Approval Simulation

Auto-allowed in V1:

  • Read requirements.
  • Inspect the product model.
  • Run simulated quality gates.
  • Parse test results.
  • Write local reports under reports/.

Blocked in V1:

  • Source modification.
  • Arbitrary shell.
  • Network calls.
  • Secret reads.
  • External issue creation.

Would require human approval in a real system:

  • External issue filing.
  • Production release decisions.
  • Code changes.
  • Sensitive data access.

Source modification and external issue creation are blocked because this lab demonstrates human-reviewed release support, not uncontrolled changes to real systems.
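
A minimal sketch of the V1 policy table (the blocked-action names are hypothetical stand-ins for the attempts described above, not real registry entries):

```ts
type Policy = "AUTO_ALLOW" | "BLOCK" | "REQUIRE_APPROVAL";

// Hypothetical mapping; in a real system the BLOCK entries noted below
// would instead be routed to a human as REQUIRE_APPROVAL.
const approvalPolicy: Record<string, Policy> = {
  readRequirement: "AUTO_ALLOW",
  inspectProductModel: "AUTO_ALLOW",
  runSimulatedQualityGate: "AUTO_ALLOW",
  parseTestResults: "AUTO_ALLOW",
  writeLocalReport: "AUTO_ALLOW", // only under reports/
  createLocalBugReport: "AUTO_ALLOW",
  modifySource: "BLOCK", // code changes would need human approval
  runShell: "BLOCK",
  fetchUrl: "BLOCK",
  readSecret: "BLOCK", // sensitive data access would need approval
  createExternalIssue: "BLOCK", // external issue filing would need approval
};
```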

Evaluation Methodology

Golden scenarios score the full workflow across:

  • requirement understanding
  • risk coverage
  • test design coverage
  • tool usage correctness
  • guardrail compliance
  • failure analysis correctness
  • bug report quality
  • release decision correctness
  • no hallucinated evidence
  • human approval compliance

Default thresholds:

  • AGENT_EVAL_MIN_SCORE=0.80
  • AGENT_EVAL_MIN_RISK_COVERAGE=0.80
  • AGENT_EVAL_REQUIRE_GUARDRAILS=true
  • AGENT_EVAL_REQUIRE_NO_HALLUCINATED_EVIDENCE=true

Scenario results include expected risks found/missing, expected bug reports found/missing, expected release decision vs actual, guardrail violations, hallucinated evidence, final score, and minimum score.
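
A minimal sketch of how those thresholds could gate a scenario result (the environment variable names match the documented defaults; the result shape is illustrative):

```ts
// Defaults mirror the documented thresholds.
const minScore = Number(process.env.AGENT_EVAL_MIN_SCORE ?? "0.80");
const minRiskCoverage = Number(process.env.AGENT_EVAL_MIN_RISK_COVERAGE ?? "0.80");
const requireGuardrails =
  (process.env.AGENT_EVAL_REQUIRE_GUARDRAILS ?? "true") === "true";
const requireNoHallucinatedEvidence =
  (process.env.AGENT_EVAL_REQUIRE_NO_HALLUCINATED_EVIDENCE ?? "true") === "true";

// Hypothetical per-scenario result shape.
type ScenarioResult = {
  finalScore: number;
  riskCoverage: number;
  guardrailViolations: number;
  hallucinatedEvidenceCount: number;
};

function scenarioPasses(result: ScenarioResult): boolean {
  if (result.finalScore < minScore) return false;
  if (result.riskCoverage < minRiskCoverage) return false;
  if (requireGuardrails && result.guardrailViolations > 0) return false;
  if (requireNoHallucinatedEvidence && result.hallucinatedEvidenceCount > 0) return false;
  return true;
}
```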

The golden dataset includes an unsafe tool-call scenario that proves shell, network, secret-read, source-modification, and external-issue attempts are blocked and reported.

Quality Gate Matrix

| Area | Gate |
| --- | --- |
| Type safety | npm run typecheck |
| Linting | npm run lint |
| Formatting | npm run format:check |
| Unit/agent/evaluation tests | npm run test |
| Golden scenario CI thresholds | npm run agent:evaluate:ci |
| Full local quality gate | npm run quality |

Commands

```bash
npm ci
npm run typecheck
npm run lint
npm run format:check
npm run test
npm run agent:run
npm run agent:evaluate
npm run agent:evaluate:ci
npm run quality
```

Reports

npm run agent:run writes:

  • reports/agent-run-report.json
  • reports/agent-run-report.md

npm run agent:evaluate writes:

  • reports/agent-evaluation-report.json
  • reports/agent-evaluation-report.md

Markdown evaluation reports include a compact agent workflow trace summary.

Generated JSON and Markdown reports are ignored by git except reports/.gitkeep.

CI

GitHub Actions runs on push and pull request to main or master. The workflow installs with npm ci, runs npm run quality, and uploads reports/ as an artifact with if: always(). No secrets are required.

OpenAI Provider Example

src/providers/openAiProvider.example.ts is intentionally disabled. It is not imported by default, not used by tests/evals/CI, does not require environment variables, and throws immediately if constructed. The default flow uses FakeAgentProvider only.
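
A minimal sketch of how such a deliberately disabled example can be written, reusing the hypothetical AgentProvider interface sketched earlier (the real file may differ):

```ts
// Documentation-only example: constructing it always throws, so it can
// never be wired into tests, evals, or CI by accident.
export class OpenAiProviderExample implements AgentProvider {
  constructor() {
    throw new Error(
      "OpenAiProviderExample is intentionally disabled. " +
        "Use FakeAgentProvider for all tests, evals, and CI runs."
    );
  }

  async complete(_step: string, _input: string): Promise<string> {
    throw new Error("unreachable: the constructor always throws");
  }
}
```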

Known Limitations

  • This is a portfolio lab, not a production QA platform.
  • It does not replace QA engineers.
  • Deterministic fake agents demonstrate workflow evaluation, but cannot prove how a real LLM would behave.
  • Real model integration would require additional eval datasets, monitoring, human review, and provider-specific safety checks.
  • No real external systems are modified.
  • No source code is modified by agents.
  • Human approval is simulated.

Future Improvements

  • Add larger golden datasets with more ambiguous and adversarial requirements.
  • Add optional real-provider adapters behind explicit opt-in flags.
  • Add mutation-style product-model defects for deeper failure analysis.
  • Add richer report visualizations while keeping CI deterministic.
  • Add provider-specific safety and monitoring docs for real model experiments.

Interview-Ready Explanation

This repo demonstrates how I would design a guarded AI-assisted quality workflow as an SDET: start from requirements, make risks explicit, derive test coverage from those risks, run only safe local tools, connect failures to evidence, produce auditable bug reports, and make release support conditional on measurable evaluation gates. The important part is not the fake agent output. The important part is the control system around the agents.
