Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 68 additions & 0 deletions .github/workflows/provider-dialect-smoke.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
name: Provider Dialect Smoke

on:
workflow_dispatch:
inputs:
providers:
description: 'Comma-separated provider list (anthropic,openai,bedrock,github). Defaults to all.'
required: false
default: ''
model:
description: 'Optional model override applied to every provider.'
required: false
default: ''
schedule:
# Nightly at 03:17 UTC. Off-peak; staggered minute keeps us out of the top-of-hour herd.
- cron: '17 3 * * *'

permissions:
contents: read

concurrency:
group: provider-dialect-smoke
cancel-in-progress: false

jobs:
smoke:
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6

- name: Setup Node.js
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6
with:
node-version: 20.x
cache: npm

- name: Install dependencies
run: npm ci

- name: Run provider-dialect smoke matrix
env:
# Secret names mirror env-var names; runtime reads these via src/config/env.ts.
# Missing secrets cause that provider to be skipped (warn), not failed.
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}
GITHUB_API_KEY: ${{ secrets.GITHUB_API_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
PROVIDERS_INPUT: ${{ inputs.providers }}
MODEL_INPUT: ${{ inputs.model }}
run: |
ARGS=()

Check warning on line 56 in .github/workflows/provider-dialect-smoke.yml

View workflow job for this annotation

GitHub Actions / QualOps Code Quality

MEDIUM: bug

GitHub Actions workflow inputs `providers` and `model` are silently ignored — the test spec does not parse CLI arguments
if [ -n "$PROVIDERS_INPUT" ]; then ARGS+=(--providers="$PROVIDERS_INPUT"); fi
if [ -n "$MODEL_INPUT" ]; then ARGS+=(--model="$MODEL_INPUT"); fi
npm run test:smoke -- "${ARGS[@]}"

- name: Upload run log
if: always()
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: smoke-run-log-${{ github.run_id }}
path: evals/logs/smoke_*.json
if-no-files-found: warn
retention-days: 30
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,9 @@ evals/logs/
evals/datasets/crb/benchmark_data.json
evals/datasets/crb/repos/

# Provider-dialect smoke harness scratch dir (per-run temp .qualopsrc.*.json files)
tests/smoke/.tmp/

# Logs
*.log
npm-debug.log*
Expand Down
45 changes: 19 additions & 26 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,18 +11,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `skipPatterns` config field is now fully functional as a pre-filter: excluded files never reach the review pipeline in file-by-file mode, and agentic tool calls (`read_file`, `grep_files`, `glob_files`) enforce patterns at the handler layer for both OpenAI and Anthropic providers.
- Anthropic agentic mode now uses MCP tools for file access instead of SDK built-ins, ensuring `skipPatterns` enforcement is consistent across providers.
- `globFiles` tool upgraded from `find`-based to `glob` npm package for proper `**` glob support.

### Changed
- Default `skipPatterns` in `ConfigService` changed from infrastructure dirs to empty (`[]`) — patterns are project-specific and should be set per project. qualops's own `.qualopsrc.json` now lists its TS-specific patterns.
- Removed `file-exclusions.ts` (dead code — `applyPenalty()` was never called).
- Provider-dialect smoke spec: `npm run test:smoke` runs the 4 AI caller stages migrated in PR #145 (`file-reviewer`, `validation-resolver`, `dedup-resolver`, `root-cause-extract`) against each real provider (`anthropic`, `openai`, `bedrock`, `github`) using a slice fixture as input. Validates that the structured-output dialect path returns a zod-validated response without throwing. Implemented as a Jest spec under `tests/smoke/` with its own `jest.smoke.config.ts` — not picked up by default `npm test` (whose `roots` are limited to `tests/unit/`). Provider config comes from `ConfigService` + the existing `PROVIDER_DEFAULTS` table, not a duplicated table. Providers with missing credentials are `describe.skip()`-ed; providers with malformed credentials fail loudly via the provider class's own `validateApiKey()`. Input is a slice fixture under `evals/datasets/inbox/smoke-sql-injection/`, loosely following TDR 0002. Nightly + manual CI workflow at `.github/workflows/provider-dialect-smoke.yml`. Automates the unchecked manual smoke item from PR #145's test plan; distinct from the deferred per-stage golden-evals item which validates output quality.

## [0.2.3] - 2026-05-28

### Changed
- CRB evals are now self-contained slice directories (`evals/datasets/crb/<id>/slice.json` + `repo/`) — no external repo cloning required. Replaced `fetch-crb-dataset.ts` with `check-crb-staleness.ts` which validates local slices against upstream CRB PR URLs.
- Neutralize language-specific wording in built-in prompts where the underlying tooling is genuinely language-agnostic, so review output is no longer TypeScript-flavored when qualops is pointed at a non-TS repo.

### Changed
- Bump `@anthropic-ai/claude-agent-sdk` from 0.2.139 to 0.3.144.
- Bump `@anthropic-ai/claude-agent-sdk-linux-x64` from 0.2.139 to 0.3.144.
- Bump `@opentelemetry/sdk-node` from 0.217.0 to 0.218.0.
Expand All @@ -40,6 +37,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- New `BaseAIProvider` consolidating shared token accounting + cost computation while preserving exact per-provider semantics (OpenAI `prompt_tokens` incl. cached, Anthropic/Bedrock `input_tokens` excl. cached; Bedrock log policy unchanged).
- New `ProviderCapabilities` descriptor that routes `(provider, model)` to the right structured-output dialect, replacing model-name string sniffing.
- Reusable zod schemas in `src/ai/shared/schemas/` for review issues, validation results, dedup indices, search/replace fixes, and root-cause classifications.
- Agentic mode now supports OpenAI and Azure OpenAI providers via `@openai/agents`. Set `provider: "openai"` in your stage config to use the OpenAI adapter; set `OPENAI_BASE_URL` to an Azure endpoint and the correct Azure client is used automatically.
- You can now specify a model and provider together in stage config using `model: { provider: "openai", name: "gpt-4o" }` instead of relying on a separate top-level `provider` field.
- OpenTelemetry observability instrumentation across the full review pipeline (file-by-file, agentic, and eval runs), with auto-detection for Langfuse and generic OTLP backends. All span attributes are sanitized to prevent credential leakage.
- Agentic jobs now support a `prompt` field for file-based prompt instructions, combined with the existing inline `systemPrompt`
- GitHub Models AI provider (`provider: "github"`) via `https://models.github.ai/inference`
- Zod-based runtime validation for `.qualopsrc.json` with deprecation warnings for legacy fields
- JSON Schema generated from Zod schemas (`npm run generate:schema`) replacing hand-maintained schema
- Eval `--severity` filter to run only CRB cases with matching golden comment severity
- Report on eval flakiness for Code Review Benchmark `npm run eval:recall-report` with filtering options `-- --severity=critical`
- `init-claude` now scaffolds a validated default config, quality prompt, and supports `--provider` flag
- New `Promote to Stable` workflow (`workflow_dispatch`) for promoting a beta release to a clean stable version
- New `update-beta-ref` and `update-stable-ref` jobs in the npm publish workflow that force-move the `beta` / `stable` lightweight git tags after each release
- `docs/tdr/` folder for Technical Design Records, with TDR 0001 documenting the release process
- New `Releases` page on the docs site explaining the two-tier model to consumers

### Changed
- `AIProvider.complete` is now overloaded: `complete<S extends z.ZodType>(opts & { schema: S })` returns `AIResponse<z.infer<S>>` (schema-typed); plain `complete(opts)` still returns `AIResponse<string>`.
Expand All @@ -52,6 +63,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Release failure issues now include the failing stages and release kind (beta vs stable)
- Normalize `uses: eggai-tech/qualops@v1` examples across the README, docs, and example workflows to `@stable`
- Refactor agentic tools: `tools/index.ts` is now a provider-agnostic registry (`createToolSet`); Anthropic and OpenAI SDK wiring stays inside their respective adapters
- AI provider types/factory now include `github` and use stricter provider typing
- Environment config and test setup now include `GITHUB_API_KEY`
- Update documentation to reference the new JSON Schema and provide configuration examples
- Added eval suite

### Removed
- Deleted `JsonParser` class and the duplicated private `fixMalformedJson` (last production callers migrated).
Expand All @@ -70,28 +85,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Release version validation now allows only the prerelease labels the publish workflow recognises (`rc`, `alpha`, `beta`); unrecognised labels like `0.3.0-preview.1` are rejected up-front instead of silently publishing to `latest`
- `Promote to Stable` workflow now asserts that `stable_version` equals `beta_version`'s base (e.g., `0.4.0-beta.1` can only promote to `0.4.0`)

### Added
- Agentic mode now supports OpenAI and Azure OpenAI providers via `@openai/agents`. Set `provider: "openai"` in your stage config to use the OpenAI adapter; set `OPENAI_BASE_URL` to an Azure endpoint and the correct Azure client is used automatically.
- You can now specify a model and provider together in stage config using `model: { provider: "openai", name: "gpt-4o" }` instead of relying on a separate top-level `provider` field.
- OpenTelemetry observability instrumentation across the full review pipeline (file-by-file, agentic, and eval runs), with auto-detection for Langfuse and generic OTLP backends. All span attributes are sanitized to prevent credential leakage.
- Agentic jobs now support a `prompt` field for file-based prompt instructions, combined with the existing inline `systemPrompt`
- GitHub Models AI provider (`provider: "github"`) via `https://models.github.ai/inference`
- Zod-based runtime validation for `.qualopsrc.json` with deprecation warnings for legacy fields
- JSON Schema generated from Zod schemas (`npm run generate:schema`) replacing hand-maintained schema
- Eval `--severity` filter to run only CRB cases with matching golden comment severity
- Report on eval flakiness for Code Review Benchmark `npm run eval:recall-report` with filtering options `-- --severity=critical`
- `init-claude` now scaffolds a validated default config, quality prompt, and supports `--provider` flag
- New `Promote to Stable` workflow (`workflow_dispatch`) for promoting a beta release to a clean stable version
- New `update-beta-ref` and `update-stable-ref` jobs in the npm publish workflow that force-move the `beta` / `stable` lightweight git tags after each release
- `docs/tdr/` folder for Technical Design Records, with TDR 0001 documenting the release process
- New `Releases` page on the docs site explaining the two-tier model to consumers

### Changed
- AI provider types/factory now include `github` and use stricter provider typing
- Environment config and test setup now include `GITHUB_API_KEY`
- Update documentation to reference the new JSON Schema and provide configuration examples
- Added eval suite

## [0.2.1] - 2026-03-14

### Changed
Expand Down
18 changes: 18 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -182,6 +182,24 @@ Reference in `.qualopsrc.json`:
}
```

## Testing

### Unit tests

```bash
npm test
```

### Provider-dialect smoke tests

Real-API tests that exercise the 4 AI caller stages (`file-reviewer`, `validation-resolver`, `dedup-resolver`, `root-cause-extract`) against each supported provider. Validates that the structured-output dialect path returns a zod-validated response without throwing. Providers without credentials are skipped automatically.

```bash
npm run test:smoke
```

See [`tests/smoke/README.md`](./tests/smoke/README.md) for details on env vars and CI setup.

## License

MIT
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
import { Request, Response } from 'express';
import { db } from '../db';

export async function getUser(req: Request, res: Response) {
const userId = req.params.id;
const result = await db.query(`SELECT * FROM users WHERE id = '${userId}'`);
res.json(result.rows[0]);
}
20 changes: 20 additions & 0 deletions evals/datasets/inbox/smoke-sql-injection/slice.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
{
"id": "smoke-sql-injection",
"language": "typescript",
"filePath": "src/api/users.ts",
"diff": "@@ -10,6 +10,12 @@\n import { db } from '../db';\n \n+export async function getUser(req: Request, res: Response) {\n+ const userId = req.params.id;\n+ const result = await db.query(`SELECT * FROM users WHERE id = '${userId}'`);\n+ res.json(result.rows[0]);\n+}\n+",
"purpose": "smoke",
"capturedAt": "2026-05-21",
"capturedBy": "provider-dialect-smoke-harness",
"note": "Synthetic input for the provider-dialect smoke harness. Not a captured real-world miss. Loosely follows TDR 0002 slice layout (slice.json + repo/ tree) so future smoke fixtures can be migrated to the full inbox eval format if the slice harness lands.",
"expected": [
{
"file": "src/api/users.ts",
"line": 6,
"lineEnd": 6,
"type": "security",
"severity": "critical",
"description": "SQL injection via string interpolation in query"
}
]
}
27 changes: 27 additions & 0 deletions jest.smoke.config.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
export default {
displayName: 'qualops-smoke',
preset: './jest.preset.js',
testEnvironment: 'node',
setupFilesAfterEnv: ['<rootDir>/tests/setup/smoke.setup.ts'],
roots: ['<rootDir>/tests/smoke'],
globals: {},
testMatch: ['<rootDir>/tests/smoke/**/*.spec.ts'],
transform: {
'^.+\\.(ts|mjs|js)$': [
'ts-jest',
{
tsconfig: '<rootDir>/tsconfig.spec.json',
useESM: true,
},
],
},
moduleFileExtensions: ['ts', 'js', 'mjs'],
extensionsToTreatAsEsm: ['.ts'],
moduleNameMapper: {
'^@/(.*)$': '<rootDir>/src/$1',
'^@tests/(.*)$': '<rootDir>/tests/$1',
'^(\\.{1,2}/.*)\\.js$': '$1',
},
transformIgnorePatterns: ['node_modules/(?!.*\\.mjs$)'],
maxWorkers: 1,
};
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,7 @@
"eval:upload:qualops": "npx tsx evals/src/upload-datasets.ts --source=qualops",
"eval:upload:crb:all": "npx tsx evals/src/upload-datasets.ts --source=crb",
"eval:recall-report": "npx tsx evals/src/recall-report.ts",
"test:smoke": "jest --config jest.smoke.config.ts",
"generate:schema": "ts-node --transpile-only --project tsconfig.lib.json scripts/generate-config-schema.ts"
},
"dependencies": {
Expand Down
9 changes: 9 additions & 0 deletions tests/setup/smoke.setup.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
import { config as dotenvConfig } from 'dotenv';

// Load .env before any module that reads process.env (e.g. envConfig singleton).
// This must happen in setupFilesAfterEnv, which runs before the spec is imported.
dotenvConfig();

// Per-test timeout for real-API calls. Long enough to absorb provider retries on
// transient 5xx/429s without parking the runner indefinitely.
jest.setTimeout(120_000);
71 changes: 71 additions & 0 deletions tests/smoke/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# Provider-dialect smoke

A real-API Jest spec for the 4 AI caller stages migrated in PR #145
(`file-reviewer`, `validation-resolver`, `dedup-resolver`,
`root-cause-extract`). Runs each stage through each real provider
(`anthropic`, `openai`, `bedrock`, `github`) using a slice fixture as input.
Validates plumbing only — the structured-output dialect path returns a
zod-validated response without throwing. Output quality is out of scope and
covered by the deferred per-stage golden-evals follow-up.

This spec is **not** part of the default `npm test` run. The base
`jest.config.js` constrains `roots` to `tests/unit/`, so this file is
unreachable from `npm test`. It runs under its own config,
`jest.smoke.config.ts`, via `npm run test:smoke`.

## Architecture

- **Test runner**: Jest (own config; not picked up by unit or integration lanes).
- **Provider configuration**: per-provider temp `.qualopsrc.json` written to
`tests/smoke/.tmp/` and loaded via `ConfigService.setConfigPath()`. Pricing
+ model defaults come from `PROVIDER_DEFAULTS` in `src/config/config.ts`
(with one inline default for GitHub Models, which is not in that table).
Stage classes are obtained via `AIFactory.createForStage('review')` — same
path that production code uses; no direct provider instantiation.
- **Input**: slice fixture at
`evals/datasets/inbox/smoke-sql-injection/` (slice.json + repo/ tree),
loosely following [TDR 0002](../../docs/tdr/0002-evals-from-real-prs.md).
- **Skip vs fail**: a provider whose credential env var is missing is marked
`describe.skip` at module load — the entire 4-stage block is statically
skipped in the test report. A provider with present-but-malformed
credentials is attempted; the provider class's own `validateApiKey()` /
`validateConfiguration()` throws, surfacing as a failed test with a real
error.

## Run

```bash
npm run test:smoke
```

The CI workflow exports `--json --outputFile=smoke-result.json` to capture
the test results as an artifact.

## Env vars

| Provider | Env vars |
|---|---|
| `anthropic` | `ANTHROPIC_API_KEY` |
| `openai` | `OPENAI_API_KEY` (+ optional `OPENAI_BASE_URL` for Azure / proxies) |
| `bedrock` | `AWS_REGION` + `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` |
| `github` | `GITHUB_API_KEY` (a `ghp_…`, `github_pat_…`, etc. PAT — **not** `GITHUB_TOKEN`) |

In CI, every entry above corresponds to a GitHub Actions repo secret of the
same name (e.g. `secrets.ANTHROPIC_API_KEY`). The `ANTHROPIC_API_KEY` secret
already exists in the repo (used by `ci.yml`); the others must be added
before their providers contribute non-skip coverage in the nightly run.

## CI

`.github/workflows/provider-dialect-smoke.yml` — manual `workflow_dispatch`
and nightly cron at 03:17 UTC. Gated on API-key repository secrets. **Not**
part of PR-blocking CI.

## Notes on `root-cause-extract`

The stage swallows provider errors internally and returns synthetic
`{rootCause: 'other', confidence: 0}` classifications for every input issue.
A naïve "did the function throw" assertion would always pass even when the
API call silently failed. The spec cross-checks
`AIFactory.createForStage('review').getTokenStats()` and the classification
distribution to detect this case and surface it as a failure.
Loading