diff --git a/AGENTS.md b/AGENTS.md
index de1edc63..9385713b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -198,10 +198,23 @@ Run locally:
 uv run --with pytest --with requests pytest -q tests/
 ```

+## OpenHands/extensions — No Release Tags
+
+`OpenHands/extensions` does **not** publish versioned release tags (no `v1`, `v2`, etc.).
+All GitHub Action references to plugins in that repo must use `@main`:
+
+```yaml
+uses: OpenHands/extensions/plugins/qa-changes@main
+uses: OpenHands/extensions/plugins/pr-review@main
+```
+
+Do **not** suggest pinning to `@v1` or any other tag — they don't exist and the workflow will fail.
+
 ## Related repos (source-of-truth)

 - OpenHands Agent SDK: https://github.com/OpenHands/software-agent-sdk
 - OpenHands CLI: https://github.com/OpenHands/OpenHands-CLI
 - OpenHands (Web/App): https://github.com/OpenHands/OpenHands
+- OpenHands Extensions: https://github.com/OpenHands/extensions (plugins, skills, actions — **no release tags**)

 When updating SDK features or examples, expect to update this repo too (especially under `sdk/`).
diff --git a/docs.json b/docs.json
index 548487e1..c1d31682 100644
--- a/docs.json
+++ b/docs.json
@@ -198,6 +198,7 @@
             "pages": [
               "openhands/usage/use-cases/vulnerability-remediation",
               "openhands/usage/use-cases/code-review",
+              "openhands/usage/use-cases/qa-changes",
               "openhands/usage/use-cases/incident-triage",
               "openhands/usage/use-cases/cobol-modernization",
               "openhands/usage/use-cases/dependency-upgrades",
diff --git a/openhands/usage/automations/overview.mdx b/openhands/usage/automations/overview.mdx
index 5c76f31e..cb268bcd 100644
--- a/openhands/usage/automations/overview.mdx
+++ b/openhands/usage/automations/overview.mdx
@@ -115,6 +115,13 @@ Each use case has a ready-to-use automation prompt. Click a card to see the full
 > Monitor API health, analyze errors, and alert your team automatically.
+
+ Functionally test PR changes by exercising the software as a real user would.
+ Set up automated PR reviews to maintain code quality and catch bugs early.
+
+ Validate PR changes by actually running the software as a real user would.
+
-
+ Functionally test PR changes by exercising the software as a real user would.
+---
+
+ Check out the complete QA changes plugin with ready-to-use code and configuration.
+
+Automated QA testing goes beyond code review and CI: instead of reading diffs or running the test suite, the QA agent actually **runs the software** and verifies that changes work as claimed. It sets up the environment, exercises changed behavior as a real user would (browser, CLI, API requests), and posts a structured report with evidence.
+
+This is Layer 2 of the [Verification Stack](https://www.openhands.dev/blog/verification-stack), complementing the [code review agent](/openhands/usage/use-cases/code-review).
+
+## Overview
+
+The QA agent follows a four-phase methodology:
+
+1. **Understand** — Reads the PR diff, title, and description. Classifies changes (new feature, bug fix, refactor, config) and identifies entry points (CLI commands, API endpoints, UI pages).
+2. **Setup** — Bootstraps the repository: installs dependencies, builds the project, notes CI status.
+3. **Exercise** — The core phase: spins up servers, opens browsers, runs CLI commands, makes HTTP requests — testing the changed behavior as a real user would. For bug fixes, it reproduces the bug on the base branch and verifies the fix on the PR branch.
+4. **Report** — Posts a structured QA report as a PR comment, with evidence (commands run, outputs, screenshots) and a verdict (PASS / FAIL / PARTIAL).
+
+The QA agent knows when to give up: after exhausting multiple approaches without progress, it reports what it tried and stops instead of spinning endlessly.
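The verdict and give-up behavior described above can be modeled as a small decision rule. The Python below is a hypothetical sketch, not the qa-changes plugin's implementation: `Outcome`, `Scenario`, `exercise`, `verdict`, and the default budget of three approaches are illustrative names and values, and it assumes FAIL is reported when no scenario passes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    UNTESTED = "untested"  # e.g. blocked by a missing env var or external service


@dataclass
class Scenario:
    name: str
    outcome: Outcome
    tried: list[str] = field(default_factory=list)  # approaches attempted, for the report


# Hypothetical: an approach returns a conclusive Outcome, or None if it made no progress.
Approach = Callable[[], Optional[Outcome]]


def exercise(name: str, approaches: list[tuple[str, Approach]], max_attempts: int = 3) -> Scenario:
    """Try approaches in order; stop at the first conclusive result or when the budget runs out."""
    tried: list[str] = []
    for label, run in approaches[:max_attempts]:
        tried.append(label)
        result = run()
        if result is not None:
            return Scenario(name, result, tried)
    # Budget exhausted without a conclusive observation: record what was tried and give up.
    return Scenario(name, Outcome.UNTESTED, tried)


def verdict(scenarios: list[Scenario]) -> str:
    """Assumed rule: PASS if all scenarios passed, FAIL if none did, otherwise PARTIAL."""
    passed = sum(s.outcome is Outcome.PASSED for s in scenarios)
    if scenarios and passed == len(scenarios):
        return "PASS"
    if passed == 0:
        return "FAIL"
    return "PARTIAL"


if __name__ == "__main__":
    scenarios = [
        exercise("GET /api/health returns 200", [("curl against dev server", lambda: Outcome.PASSED)]),
        exercise("dashboard renders", [("npm run dev", lambda: None), ("docker compose up", lambda: None)]),
    ]
    print(verdict(scenarios))  # PARTIAL: one scenario passed, one could not be tested
```

Keeping UNTESTED separate from FAILED lets the report distinguish behavior that was observed to be wrong from behavior the agent could not reach, even though both yield PARTIAL when other scenarios pass.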
+
+## What It Does (and Doesn't)
+
+**What it does:**
+
+- Run the actual application and interact with it
+- Make real HTTP requests, run real CLI commands
+- Open browsers and verify UI changes
+- Reproduce bugs and verify fixes end-to-end
+- Report with evidence (commands, outputs, screenshots)
+
+**What it doesn't do:**
+
+- Run the test suite (that's CI's job)
+- Analyze code for style or structure (that's code review's job)
+- Run linters, formatters, or type checkers
+- Substitute `--help` or `--dry-run` for real execution
+
+## Quick Start
+
+### GitHub Actions
+
+Create `.github/workflows/qa-changes.yml` in your repository:
+
+```yaml
+name: QA Changes
+
+on:
+  pull_request:
+    types: [opened, ready_for_review, labeled]
+
+permissions:
+  contents: read
+  pull-requests: write
+  issues: write
+
+jobs:
+  qa:
+    if: |
+      (github.event.action == 'opened' && github.event.pull_request.draft == false) ||
+      github.event.action == 'ready_for_review' ||
+      github.event.label.name == 'qa-this'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Run QA Changes
+        uses: OpenHands/extensions/plugins/qa-changes@main
+        with:
+          llm-model: anthropic/claude-sonnet-4-20250514
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+```
+
+Add your `LLM_API_KEY` to your repository's **Settings → Secrets and variables → Actions**.
+
+### In a Conversation
+
+You can also trigger QA manually in any OpenHands conversation. First, install the skill:
+
+```
+/add-skill https://github.com/OpenHands/extensions/tree/main/skills/qa-changes
+```
+
+Then invoke it:
+
+```
+/qa-changes
+```
+
+The agent will ask for the PR to test, or you can provide context directly:
+
+```
+/qa-changes — Please QA PR #42 on the my-org/my-repo repository.
+Focus on the new dashboard page and verify it renders correctly.
+```
+
+## QA Report Format
+
+The QA agent posts a structured report as a PR comment:
+
+```
+## QA Report
+
+**Status: PASS** ✅
+
+### Changes Tested
+- New `/api/health` endpoint returns 200 with version info
+- Dashboard page renders at `/dashboard` with correct data
+
+### Evidence
+1. Started server with `npm run dev`
+2. `curl http://localhost:3000/api/health` → 200 OK, body: {"status":"ok","version":"1.2.0"}
+3. Navigated to http://localhost:3000/dashboard — page renders correctly
+   [screenshot attached]
+
+### Edge Cases
+- Empty database state: dashboard shows "No data" placeholder ✅
+- Invalid auth token: returns 401 as expected ✅
+```
+
+## Customization
+
+### Change Types
+
+The QA agent adapts its approach based on the type of change:
+
+| Change Type | QA Approach |
+|-------------|-------------|
+| **Frontend / UI** | Starts dev server, opens browser, verifies visual changes, tests interactions |
+| **CLI** | Runs commands with realistic arguments, verifies output, tests edge cases |
+| **API / Backend** | Starts server, makes HTTP requests, verifies responses and side effects |
+| **Bug fix** | Reproduces bug on base branch, verifies fix on PR branch (before/after) |
+| **Library / SDK** | Writes and runs a short script that imports and calls changed functions |
+
+### Repository-Specific QA Guidelines
+
+Add repo-specific QA instructions by creating `.agents/skills/qa-guide.md`:
+
+```markdown
+---
+name: qa-guide
+description: Project-specific QA guidelines
+triggers:
+- /qa-changes
+---
+
+# QA Guidelines for [Your Project]
+
+## Environment Setup
+- Run `make setup` to initialize the development environment
+- The dev server runs on port 8080
+
+## Key Test Scenarios
+- Always verify the admin dashboard at /admin after backend changes
+- For API changes, test with both authenticated and unauthenticated requests
+
+## Known Limitations
+- The payment module requires a Stripe test key — skip payment flow testing
+```
+
+## Integration with the Verification Stack
+
+The QA agent works best alongside the [code review agent](/openhands/usage/use-cases/code-review) and the [iterate skill](https://github.com/OpenHands/extensions/tree/main/skills/iterate) as part of the full [Verification Stack](https://www.openhands.dev/blog/verification-stack):
+
+1. **Code review** catches issues by reading the diff (style, security, data structures)
+2. **QA** catches issues by running the software (behavioral regressions, UI bugs)
+3. **Iterate** orchestrates the loop, fixing issues flagged by either verifier and re-polling until the PR is clean
+
+## Troubleshooting
+
+ Ensure your repository's setup instructions are documented in `README.md` or `AGENTS.md`. The agent follows these to bootstrap the environment. If setup requires special steps, add them to a custom QA guide.
+
+ PARTIAL means some scenarios passed and others failed or couldn't be tested. Read the report details; they explain what worked and what didn't. Common causes: missing environment variables, external service dependencies, or insufficient permissions.
+
+ For large PRs with many changed entry points, the agent may need more time. Consider splitting large PRs into smaller, focused changes. You can also add a custom QA guide that prioritizes the most important scenarios.
+
+## Automate This
+
+You can run QA automatically on every PR using [OpenHands Automations](/openhands/usage/automations/overview).
+Copy this prompt into a new conversation to set one up:
+
+```
+Create an automation called "Automated QA" that triggers on pull_request.opened
+and pull_request.labeled (with label "qa-this") for my repositories.
+
+It should use the qa-changes plugin from github:OpenHands/extensions to:
+1. Check out the PR branch
+2. Run the QA agent to exercise the changed behavior as a real user would
+3. Post a structured QA report as a PR comment with evidence (commands run, outputs, screenshots)
+
+Learn more at https://docs.openhands.dev/openhands/usage/use-cases/qa-changes
+```
+
+To run QA in CI rather than through Automations, use the [qa-changes plugin](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes) as a GitHub Action instead, as shown in Quick Start.
+
+## Related Resources
+
+- [QA Changes Plugin](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes) — GitHub Actions plugin
+- [QA Changes Skill](https://github.com/OpenHands/extensions/tree/main/skills/qa-changes) — Detailed skill methodology
+- [Verification Stack](https://www.openhands.dev/blog/verification-stack) — How QA fits into the full verification pipeline
+- [Automated Code Review](/openhands/usage/use-cases/code-review) — The complementary code review agent