Shell-native test automation powered by GitHub Copilot CLI and Microsoft Playwright CLI.
No JavaScript. No .mjs files. No npx playwright. Just pure shell scripts orchestrating two CLI tools.
Not just for web apps. Works for pure REST APIs, GraphQL endpoints, or full web applications. Give it a URL or a file of curl commands — it figures out the rest.
┌─────────────────────────────────────────────────────────────────┐
│                         orchestrate.sh                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐       │
│  │   Phase 1    │  │   Phase 2    │  │     Phase 3      │       │
│  │  Discovery   │──│  Generation  │──│    Execution     │       │
│  │    (LLM)     │  │    (LLM)     │  │    (Zero-LLM)    │       │
│  └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘       │
│         │                 │                   │                 │
│  ┌──────┴─────────────────┴───────────────────┴─────────────┐   │
│  │                   Shell Utility Layer                    │   │
│  │   copilot-cli.sh │ playwright-cli.sh │ generate-dsl.sh   │   │
│  └──────────────────────────────────────────────────────────┘   │
│                             │                                   │
│  ┌──────────────────────────┴───────────────────────────────┐   │
│  │              CLI Tools (installed globally)              │   │
│  │  copilot -p "..." --model X --yolo --silent              │   │
│  │  playwright-cli -s=session <command>                     │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
Tests are generated in QSpec (.qspec), a pipe-delimited DSL that maps directly to playwright-cli commands:
# Scenario: Login with valid credentials
# Tags: @smoke @auth
GOTO | http://localhost:8080/#/login
FOCUS | input[placeholder="you@example.com"]
TYPE | demo@taskpilot.com
PRESS | Tab
TYPE | Demo1234!
PRESS | Enter
WAIT | 500
ASSERT_URL | #/
ASSERT_TEXT | h1 | Dashboard
Why QSpec over Gherkin?
- Each line maps 1:1 to a playwright-cli command, so no LLM is needed to execute it
- Uses CSS selectors (never ephemeral ref IDs like e14)
- DSL vocabulary is auto-generated from playwright-cli --help, so it adapts to the installed CLI version
Phase 3 (execution) makes zero LLM calls for passing tests:
- QSpec lines are deterministically translated to playwright-cli commands
- LLM is invoked only on failure, to heal broken selectors from a fresh snapshot
- This makes execution fast, deterministic, and cost-effective
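The 1:1 translation described above can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the project's actual executor: the function names (pw, trim, qspec_run_line), the DRY_RUN switch, and the verb mapping (GOTO to open, FOCUS to eval with querySelector) are hypothetical, and only four verbs are covered.

```shell
#!/bin/sh
# Sketch only: the verb mapping below is an assumption, not the project's code.

# Run playwright-cli, or just print the command when DRY_RUN=1.
pw() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "playwright-cli $*"
  else
    playwright-cli "$@"
  fi
}

# Strip leading and trailing whitespace from one field.
trim() {
  printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Usage: qspec_run_line <session> <qspec-line>
# Exits 0 for comments and blank lines, 2 for verbs this sketch does not cover.
qspec_run_line() (
  session=$1
  line=$2
  case $line in
    ''|'#'*) exit 0 ;;
  esac
  verb=$(trim "${line%%|*}")
  case $line in
    *'|'*) arg=$(trim "${line#*|}") ;;
    *)     arg='' ;;
  esac
  case $verb in
    GOTO)  pw "-s=$session" open "$arg" ;;
    FOCUS)
      # Escape single quotes so the selector stays inside the JS string literal.
      sel=$(printf '%s' "$arg" | sed "s/'/\\\\'/g")
      pw "-s=$session" eval "document.querySelector('$sel').focus()" ;;
    TYPE)  pw "-s=$session" type "$arg" ;;
    PRESS) pw "-s=$session" press "$arg" ;;
    *)     printf 'unsupported verb: %s\n' "$verb" >&2; exit 2 ;;
  esac
)
```

With DRY_RUN=1 the function only prints the command it would run, which makes the mapping easy to inspect without opening a browser. No model call appears anywhere on this path.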
The DSL spec (specs/qspec-dsl.md) is not hardcoded. It is regenerated at Phase 2 start by scripts/utils/generate-dsl.sh, which queries playwright-cli --help to discover available commands. If playwright-cli adds new commands, the DSL automatically picks them up.
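As a rough sketch of that discovery step, the helper below pulls command names out of help text. It is not the contents of generate-dsl.sh: the name extract_commands is hypothetical, and the parser assumes each command sits on its own indented line with the name first, which may not match the real --help layout.

```shell
#!/bin/sh
# Hypothetical helper: read help text on stdin, print one command name per line.
# Assumption: commands are listed on indented lines, name first, lowercase.
# Option lines such as "  -h, --help" are skipped because they start with "-".
extract_commands() {
  awk '/^[ \t]+[a-z][a-z0-9-]*([ \t]|$)/ { print $1 }' | sort -u
}

# Intended use (not executed here):
#   playwright-cli --help | extract_commands
```

Sorting with -u keeps the output stable between runs, so a regenerated vocabulary only changes when the CLI's command set changes.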
| Tool | Purpose | Install |
|---|---|---|
| GitHub Copilot CLI | AI analysis, test generation, healing on failure | npm i -g @github/copilot |
| Playwright CLI | Browser automation, snapshots, network capture | npm i -g @playwright/cli@latest |
copilot -p "your prompt here" --model gpt-5-mini --yolo --silent --add-dir ./context
Key flags:
- -p "prompt": programmatic (non-interactive) mode
- --model <model>: select the model (gpt-5-mini, claude-haiku-4.5, gpt-4.1, etc.)
- --yolo: execute without confirmation
- --silent: suppress interactive output
- --add-dir <dir>: add directory files as context
- @filepath: reference a specific file in the prompt
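A wrapper in the spirit of copilot-cli.sh might assemble these flags in one place. The sketch below is an assumption about how such a wrapper could look: the argument order, the optional context directory, and the DRY_RUN switch are hypothetical, and only flags listed above are used.

```shell
#!/bin/sh
# Hypothetical wrapper around the copilot invocation.
# Usage: copilot_prompt <model> <prompt> [context-dir]
copilot_prompt() {
  cp_model=$1
  cp_prompt=$2
  cp_dir=${3:-}
  # Rebuild the positional parameters as the copilot argument list.
  set -- -p "$cp_prompt" --model "$cp_model" --yolo --silent
  if [ -n "$cp_dir" ]; then
    set -- "$@" --add-dir "$cp_dir"
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print each argument in brackets so word boundaries stay visible.
    printf 'copilot'
    printf ' [%s]' "$@"
    printf '\n'
  else
    copilot "$@"
  fi
}
```

Keeping the flags in a single function means a phase script passes only a model and a prompt, and a change in CLI syntax touches one file.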
playwright-cli -s=mysession open https://example.com
playwright-cli -s=mysession snapshot --filename=page.yml
playwright-cli -s=mysession eval "document.querySelector('input[type=email]').focus()"
playwright-cli -s=mysession type "test@example.com"
playwright-cli -s=mysession press Enter
playwright-cli -s=mysession screenshot --filename=capture.png
playwright-cli -s=mysession close
Key concepts:
- -s=name: persistent named session
- eval + type + press: interact with elements via CSS selectors (no ref IDs)
- Snapshots: YAML accessibility tree of the page
- run-code: execute arbitrary Playwright JS in the browser context
./scripts/setup.sh
./scripts/main.sh
# Asks what you need, reasons via Copilot CLI, outputs exact commands
./scripts/orchestrate.sh https://example.com --mode=web
# Phase by phase with model selection
./scripts/orchestrate.sh http://localhost:8080/ --mode=web --phase=1 --scope=ui --skip-a11y --skip-api 2>&1
./scripts/orchestrate.sh http://localhost:8080/ --mode=web --phase=2 --scope=ui --skip-a11y --skip-api --resume 2>&1
./scripts/orchestrate.sh http://localhost:8080/ --mode=web --phase=3 --scope=ui --skip-a11y --skip-api --resume 2>&1
# From curl commands file
./scripts/orchestrate.sh --mode=api --input-file=./curls.txt
# From base URL
./scripts/orchestrate.sh https://api.example.com --mode=api
./scripts/quick-discover.sh https://example.com # Phase 1 only
./scripts/quick-test-url.sh https://example.com # Full UI pipeline
./scripts/quick-test-api.sh https://api.example.com # Full API pipeline
./scripts/quick-test-api.sh --file=./curls.txt # API from file
qopilot/
├── .env.example # Environment variable template
├── .gitignore
├── README.md
│
├── config/
│ └── config.sh # Central configuration
│
├── specs/
│ └── qspec-dsl.md # Auto-generated DSL vocabulary (from playwright-cli --help)
│
├── scripts/
│ ├── main.sh # Intelligent entry point (AI advisor)
│ ├── orchestrate.sh # Master pipeline orchestrator
│ ├── setup.sh # Dependency installer
│ ├── generate-report.sh # Final report generator
│ ├── quick-discover.sh # Quick: Phase 1 only
│ ├── quick-test-url.sh # Quick: Full UI pipeline
│ ├── quick-test-api.sh # Quick: Full API pipeline
│ │
│ ├── utils/
│ │ ├── common.sh # Shared functions, init, prerequisites
│ │ ├── logger.sh # Logging with colors & AI audit
│ │ ├── copilot-cli.sh # Copilot CLI wrapper functions
│ │ ├── playwright-cli.sh # Playwright CLI wrapper functions
│ │ └── generate-dsl.sh # Dynamic DSL generator (queries playwright-cli --help)
│ │
│ └── phases/
│ ├── phase1-discover-ui.sh
│ ├── phase1-discover-api.sh
│ ├── phase1-discover-api-from-file.sh
│ ├── phase1-discover-a11y-perf.sh
│ ├── phase2-generate-ui-tests.sh
│ ├── phase2-generate-api-tests.sh
│ ├── phase2-generate-crosscutting-tests.sh
│ ├── phase3-execute-ui-tests.sh
│ ├── phase3-execute-api-tests.sh
│ └── phase3-execute-a11y-perf.sh
│
└── output/ # Generated at runtime
├── discovery/
│ ├── pages/
│ ├── api/
│ ├── accessibility/
│ └── performance/
├── generation/
│ ├── ui/ (.qspec files)
│ ├── api/ (.qspec files)
│ ├── e2e/ (.qspec files)
│ └── edge-cases/ (.qspec files)
├── execution/
│ ├── ui-results/
│ ├── api-results/
│ └── a11y-perf-results/
└── final-report.md
| Script | What It Does | Model |
|---|---|---|
| phase1-discover-ui.sh | Opens URL, takes snapshots at 4 viewports, captures network/console, crawls pages, maps navigation, identifies auth flows | MODEL_DISCOVERY |
| phase1-discover-api.sh | Captures network traffic, classifies API endpoints, tests each via run-code, documents security surface | MODEL_API_ANALYSIS |
| phase1-discover-a11y-perf.sh | Evaluates WCAG checks via eval, captures performance timing, analyzes responsive behavior | MODEL_DISCOVERY |
| Script | What It Does | Model |
|---|---|---|
| phase2-generate-ui-tests.sh | Generates .qspec files for UI pages, forms, auth, navigation. Loads DSL from specs/qspec-dsl.md. | MODEL_GENERATION |
| phase2-generate-api-tests.sh | Generates .qspec for REST endpoints, GraphQL, security testing | MODEL_GENERATION |
| phase2-generate-crosscutting-tests.sh | Generates .qspec for accessibility, performance, responsive, E2E journeys | MODEL_GENERATION |
| Script | What It Does | LLM Usage |
|---|---|---|
| phase3-execute-ui-tests.sh | Reads .qspec → deterministic playwright-cli commands. Zero LLM for passing tests. On failure: snapshot → MODEL_HEAL → retry. | MODEL_HEAL (failure only) |
| phase3-execute-api-tests.sh | Converts .qspec API scenarios → run-code fetch() commands, executes, validates responses | MODEL_EXECUTION |
| phase3-execute-a11y-perf.sh | Re-runs a11y & perf checks via eval, compares against Phase 1 baselines, flags regressions | MODEL_EXECUTION |
Phase 3A UI execution uses a heal-only-on-failure approach:
- Execute QSpec line deterministically via playwright-cli
- If the command passes → next line (zero LLM)
- If the command fails → take fresh snapshot of current page
- Send snapshot + failed command + original CSS selector to Copilot CLI (MODEL_HEAL)
- Copilot returns a corrected command
- Retry with the fix → ADAPTED (healed) or TRUE FAILURE
QSpec line → translate → playwright-cli → PASS ✓ (no LLM)
                                        → FAIL → snapshot → MODEL_HEAL → retry → ADAPTED ✓
                                                                               → TRUE FAILURE ✗
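The flow above can be sketched as a retry loop. This is a minimal illustration, not the code in phase3-execute-ui-tests.sh: run_cmd and heal_cmd are hypothetical hooks the caller must define, standing in for "execute one translated step" and "snapshot, ask MODEL_HEAL, print the corrected command".

```shell
#!/bin/sh
# Sketch of heal-only-on-failure. Hypothetical hooks supplied by the caller:
#   run_cmd  <command>   executes one step; exit status signals pass or fail
#   heal_cmd <command>   prints a corrected command; non-zero means it gave up
MAX_RETRIES=${MAX_RETRIES:-3}

# Usage: run_step_with_heal <command>
# Prints PASS, ADAPTED, or TRUE FAILURE; returns non-zero only for TRUE FAILURE.
run_step_with_heal() {
  rs_step=$1
  if run_cmd "$rs_step"; then
    echo PASS                      # happy path: the healer is never called
    return 0
  fi
  rs_attempt=1
  while [ "$rs_attempt" -le "$MAX_RETRIES" ]; do
    rs_step=$(heal_cmd "$rs_step") || break
    if run_cmd "$rs_step"; then
      echo ADAPTED                 # passed only after a healed command
      return 0
    fi
    rs_attempt=$((rs_attempt + 1))
  done
  echo 'TRUE FAILURE'
  return 1
}
```

Because heal_cmd is reached only after run_cmd fails, a fully passing suite makes no model calls, and MAX_RETRIES bounds the cost of a step that cannot be healed.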
Edit .env (from .env.example) or config/config.sh:
| Variable | Default | Description |
|---|---|---|
| TARGET_URL | (required) | URL to test |
| MODEL_DISCOVERY | gpt-5-mini | Model for Phase 1 (lightweight) |
| MODEL_GENERATION | claude-haiku-4.5 | Model for Phase 2 (test generation) |
| MODEL_HEAL | gpt-4.1 | Model for Phase 3 failure healing |
| MODEL_REPORTING | gpt-4.1 | Model for report synthesis |
| MAX_DISCOVERY_PAGES | 20 | Max pages in discovery |
| MAX_RETRIES | 3 | Max heal retries per failed step |
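One way to apply these defaults in shell is the assign-if-unset expansion. The function below is a sketch, not the contents of config/config.sh; the name load_config_defaults is hypothetical, and the values are copied from the table.

```shell
#!/bin/sh
# Hypothetical defaults loader. Values already present in the environment
# (for example exported from .env) are kept; only unset or empty ones are filled.
load_config_defaults() {
  if [ -z "${TARGET_URL:-}" ]; then
    echo 'TARGET_URL is required' >&2
    return 1
  fi
  : "${MODEL_DISCOVERY:=gpt-5-mini}"
  : "${MODEL_GENERATION:=claude-haiku-4.5}"
  : "${MODEL_HEAL:=gpt-4.1}"
  : "${MODEL_REPORTING:=gpt-4.1}"
  : "${MAX_DISCOVERY_PAGES:=20}"
  : "${MAX_RETRIES:=3}"
}
```

Calling it after TARGET_URL is set leaves any overridden model untouched, so a single run can swap one model without editing the file.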
| Principle | Implementation |
|---|---|
| Single Responsibility | Each phase script does ONE thing. Phase 2 generates; Phase 3 executes. No cross-phase leakage. |
| Open/Closed | Add new phase scripts without modifying existing ones. DSL extends via generate-dsl.sh. |
| Liskov Substitution | All pw_* functions have consistent signatures. Swap implementations without breaking callers. |
| Interface Segregation | copilot-cli.sh and playwright-cli.sh are separate — scripts source only what they need. |
| Dependency Inversion | Phase scripts depend on abstractions (pw_snapshot, copilot_prompt) not concrete CLI syntax. Shared DSL loaded from spec file, not hardcoded. |
Copyright (C) 2026 Naveen Kumar Dommaraju
Licensed under the Server Side Public License, Version 1 (SSPL v1).
- ✅ Free to use, modify, and self-host
- ✅ Open source — read and learn from the code
- ❌ Cannot offer this software as a commercial service without open sourcing your entire stack
- ❌ Cannot resell or rebrand as a proprietary product
See LICENSE for full terms. For commercial licensing enquiries, contact the author.