Provider-dialect smoke harness for AI caller stages by sebastianwessel · Pull Request #188 · eggai-tech/qualops

sebastianwessel · 2026-05-21T11:21:09Z

Summary

Automates the unchecked manual smoke item from PR #145's test plan: exercises each of the 4 AI caller stages migrated to native structured output (file-reviewer, validation-resolver, dedup-resolver, root-cause-extract) against each real provider (anthropic, openai, bedrock, github) using a slice fixture as input. Validates plumbing only — the provider-specific dialect path returns a zod-validated response without throwing. Output quality stays scoped to the deferred per-stage golden-evals follow-up.

Why

PR #145 introduced six provider-dialect paths (OpenAI strict json_schema, OpenAI json_object fallback, Anthropic output_config, Anthropic tool_use fallback, Bedrock forced tool_use, GitHub Models via OpenAI-compatible) and four zod schemas. Unit tests cover each path with mocked SDKs; nothing exercises a full stage call end-to-end against a real provider. The risk surface is the stage × dialect matrix.
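The matrix is small enough to enumerate explicitly. A minimal sketch, assuming the harness walks every stage for every provider; the stage and provider names are taken from the summary above, while `SmokeCase` and `buildMatrix` are hypothetical names used only for illustration, not identifiers from the PR:

```typescript
// Stage and provider names as listed in the PR summary.
const STAGES = [
  "file-reviewer",
  "validation-resolver",
  "dedup-resolver",
  "root-cause-extract",
] as const;
const PROVIDERS = ["anthropic", "openai", "bedrock", "github"] as const;

type Stage = (typeof STAGES)[number];
type Provider = (typeof PROVIDERS)[number];

// Hypothetical shape for one cell of the stage x provider matrix.
interface SmokeCase {
  provider: Provider;
  stage: Stage;
}

// Enumerate every provider/stage combination, grouped by provider.
function buildMatrix(): SmokeCase[] {
  const cases: SmokeCase[] = [];
  for (const provider of PROVIDERS) {
    for (const stage of STAGES) {
      cases.push({ provider, stage });
    }
  }
  return cases;
}
```

With four stages and four providers this yields 16 cases, which matches the 16 skipped tests reported when no credentials are present.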

Approach

Jest spec at tests/smoke/provider-dialect-smoke.spec.ts with its own jest.smoke.config.ts. The base jest.config.js constrains roots to tests/unit/, so this file is unreachable from default npm test; npm run test:smoke uses the smoke config.
Provider configuration via ConfigService: per-provider temp .qualopsrc.json written under tests/smoke/.tmp/, loaded via ConfigService.setConfigPath(). Pricing + model defaults come from PROVIDER_DEFAULTS in src/config/config.ts (with one inline default for GitHub Models, which is not in that table). Stage classes are obtained via AIFactory.createForStage('review') — same path production code uses.
Skip vs fail: per-provider credential presence is checked at module load and missing-credential providers are statically marked describe.skip(), so the entire 4-stage block shows up as Skipped in the test report rather than Pass. Providers with present-but-malformed credentials are attempted and fail loudly via the provider class's own validateApiKey().
Input: slice fixture at evals/datasets/inbox/smoke-sql-injection/ (slice.json + repo/ tree), loosely following TDR 0002. The inbox dataset infrastructure from PR #152 (chore(evals): add eval case for PR 144 bash tool security findings) hasn't landed yet, so this fixture is self-contained; it slots into the new format if/when the slice harness lands.
root-cause-extract swallows provider errors internally and returns synthetic {rootCause: 'other', confidence: 0} classifications. The spec cross-checks AIFactory.createForStage('review').getTokenStats() and the classification distribution to surface silent failures as test failures.
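The skip-vs-fail rule above can be sketched as a pure function over the environment. This is an illustration under stated assumptions, not the spec's actual code: the env-var names for Anthropic, OpenAI and GitHub follow the secret names mentioned in the commit message, the two `AWS_*` names are an assumption (the PR only says `AWS_*`), and `REQUIRED_ENV`, `hasCredentials` and `planProviders` are hypothetical helpers. Treating a whitespace-only value as missing is also an assumption.

```typescript
type Provider = "anthropic" | "openai" | "bedrock" | "github";
type Env = Record<string, string | undefined>;

// ASSUMPTION: the exact variables checked per provider are not shown in the PR.
const REQUIRED_ENV: Record<Provider, readonly string[]> = {
  anthropic: ["ANTHROPIC_API_KEY"],
  openai: ["OPENAI_API_KEY"],
  bedrock: ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
  github: ["GITHUB_API_KEY"],
};

// Presence check only: a malformed key still counts as "present", so the
// provider is attempted and fails loudly instead of being skipped.
function hasCredentials(provider: Provider, env: Env): boolean {
  return REQUIRED_ENV[provider].every((name) => (env[name] ?? "").trim() !== "");
}

// Split providers into those to attempt and those to mark as skipped.
function planProviders(env: Env): { attempt: Provider[]; skip: Provider[] } {
  const attempt: Provider[] = [];
  const skip: Provider[] = [];
  for (const provider of Object.keys(REQUIRED_ENV) as Provider[]) {
    (hasCredentials(provider, env) ? attempt : skip).push(provider);
  }
  return { attempt, skip };
}

// In a Jest spec the result would typically pick the block type at module load:
//   const block = hasCredentials("anthropic", process.env) ? describe : describe.skip;
//   block("anthropic", () => { /* four stage tests */ });
```

Choosing `describe.skip` statically, rather than returning early inside each test, is what makes the reporter count the block as skipped instead of passed.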

CI

Nightly + manual workflow. Workflow file is currently at the repo root pending a workflow-scoped push that moves it under .github/workflows/.

Test plan

npm run lint clean
npm run test:smoke with no credentials → 16 skipped, 0 failed (all 4 describes skipped)
npm run test:smoke with a malformed Anthropic key → 4 failed (3 with 401 from anthropic.completeStructured wrapError, 1 from the root-cause-extract silent-failure assertion), 12 skipped
No artifacts left behind after a smoke run (.qualops/prompts/_smoke-*.md, tests/smoke/.tmp/, .qualops/reports/.smoke-* all cleaned by afterAll)
CI workflow dry-run after the workflow file is moved into .github/workflows/
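The root-cause-extract silent-failure assertion mentioned in the test plan can be sketched as follows. This is a hedged illustration: the synthetic fallback value `{rootCause: 'other', confidence: 0}` comes from the description above, but the `TokenStats` shape is an assumption (the return type of `getTokenStats()` is not shown in this PR), and `findSilentFailure` is a hypothetical helper name.

```typescript
interface Classification {
  rootCause: string;
  confidence: number;
}

// ASSUMPTION: the real getTokenStats() return shape is not shown in the PR.
interface TokenStats {
  inputTokens: number;
  outputTokens: number;
}

// The synthetic classification the stage returns when it swallows an error.
function isSyntheticFallback(c: Classification): boolean {
  return c.rootCause === "other" && c.confidence === 0;
}

// Returns a failure reason, or null when the run looks like a real provider call.
function findSilentFailure(results: Classification[], stats: TokenStats): string | null {
  if (stats.inputTokens + stats.outputTokens === 0) {
    return "no tokens recorded: the provider call probably never succeeded";
  }
  if (results.length > 0 && results.every(isSyntheticFallback)) {
    return "every classification is the synthetic fallback";
  }
  return null;
}
```

In a spec, a non-null result would be turned into a test failure, for example by asserting that `findSilentFailure(results, stats)` is `null`. Checking both signals guards against a provider that bills tokens but still returns only fallback values.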

github-actions · 2026-05-21T11:23:25Z

QualOps Code Quality Analysis

Status: ⚠️ WARNINGS - Medium severity issues found

Summary

Total Issues: 1
Critical: 0 🔴
High: 0 🟠
Medium: 1 🟡
Low: 0 🟢
Files Analyzed: 11

🟡 Medium Issues (1)

.github/workflows/provider-dialect-smoke.yml:56 - bug
GitHub Actions workflow inputs providers and model are silently ignored — the test spec does not parse CLI arguments
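One common way to honour such workflow inputs without CLI parsing is to pass them as environment variables and validate them against an allow-list inside the spec. The sketch below is an assumption about how that could look, not the fix applied in this PR: the `SMOKE_PROVIDERS` variable name and the `parseProviderFilter` helper are hypothetical.

```typescript
const KNOWN_PROVIDERS = ["anthropic", "openai", "bedrock", "github"] as const;
type Provider = (typeof KNOWN_PROVIDERS)[number];

// Parse a comma-separated provider list, e.g. the value of a hypothetical
// SMOKE_PROVIDERS environment variable set from the workflow input.
// An unset or empty value selects every provider.
function parseProviderFilter(raw: string | undefined): Provider[] {
  if (raw === undefined || raw.trim() === "") {
    return [...KNOWN_PROVIDERS];
  }
  const requested = raw
    .split(",")
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part !== "");
  const unknown = requested.filter(
    (part) => !(KNOWN_PROVIDERS as readonly string[]).includes(part),
  );
  if (unknown.length > 0) {
    // Reject anything outside the allow-list instead of silently ignoring it.
    throw new Error(`Unknown provider(s): ${unknown.join(", ")}`);
  }
  // Preserve the canonical order and drop duplicates.
  return KNOWN_PROVIDERS.filter((provider) => requested.includes(provider));
}
```

A spec could call `parseProviderFilter(process.env.SMOKE_PROVIDERS)` at module load. Because the value is only ever compared against a fixed list, it is never interpolated into a shell command.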

📊 Full Report

View detailed report

Powered by QualOps

…OPS-45)

Automates the unchecked manual smoke item from PR #145's test plan: exercises the 4 AI caller stages migrated to native structured-output (file-reviewer, validation-resolver, dedup-resolver, root-cause-extract) against each real provider (anthropic, openai, bedrock, github) using one eval dataset entry as input. Validates plumbing only: that the provider-specific dialect path returns a zod-validated response without throwing. Output quality remains scoped to the deferred per-stage golden-evals follow-up.

Why: PR #145 introduced six provider-dialect paths (OpenAI strict json_schema, OpenAI json_object fallback, Anthropic output_config, Anthropic tool_use fallback, Bedrock forced tool_use, GitHub Models via OpenAI-compatible) and four zod schemas. Unit tests cover each path with mocked SDKs; nothing exercises a full stage call end-to-end against a real provider. The risk surface is the stage × dialect matrix.

Design:
- Standalone tsx script at tests/smoke/provider-dialect-smoke.ts. Not a Jest spec: paid API calls must never enter the default npm test run.
- Reuses evals/src/run-log.js for run-log shape + error classification.
- Per-provider env-var presence determines skip vs attempt; the provider classes' own validateApiKey()/validateConfiguration() handle format validation, so a malformed CI secret surfaces as a real failure (classified errorCode) rather than a silent skip.
- root-cause-extract uses AIFactory.createForStage('review') internally and swallows provider errors, so the harness writes a per-provider temp .qualopsrc.*.json, swaps ConfigService.setConfigPath(), and cross-checks token stats + classification distribution post-call to surface silent failures.
- 4 stages × 4 providers = 16 calls per full run. Exit 0 if every attempted combination passed (or was skipped for missing credentials), 1 otherwise. Run log uploaded as CI artifact.

CI lane: .github/workflows/provider-dialect-smoke.yml, with manual workflow_dispatch + nightly cron at 03:17 UTC. Secret names mirror env-var names (secrets.ANTHROPIC_API_KEY, secrets.OPENAI_API_KEY, secrets.GITHUB_API_KEY, AWS_*) matching what src/config/env.ts reads at runtime. Concurrency-gated; not part of PR-blocking CI.

Verified locally:
- npm run lint clean
- npm run test:smoke (no credentials) → 16 skips, exit 0
- npm run test:smoke with a malformed Anthropic key → 4 attempts, 4 fails (3 AUTH_FAILED + 1 UNKNOWN for the silent-fallback stage), exit 1
- Cleanup leaves no prompt files, no tmp configs, no leftover session

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses review comments on the smoke harness:

1. Provider/model configuration now flows through ConfigService instead of a hardcoded PROVIDER_DEFAULTS table local to the smoke harness. The spec writes a per-provider temp .qualopsrc.json under tests/smoke/.tmp/, calls ConfigService.setConfigPath(), and obtains the AIProvider via AIFactory.createForStage('review'), the same path production code uses. Pricing + model defaults come from PROVIDER_DEFAULTS in src/config/config.ts (with one inline default for GitHub Models, which is not in that table).
2. Standalone tsx script replaced with a Jest spec at tests/smoke/provider-dialect-smoke.spec.ts running under its own jest.smoke.config.ts. The base jest.config.js already constrains roots to tests/unit/, so this file is unreachable from the default `npm test` run; no testPathIgnorePatterns entry needed. `npm run test:smoke` uses the smoke config. Per-provider credential presence is checked at module load and missing-credential providers are statically marked describe.skip() so the entire 4-stage block shows up as Skipped in the test report rather than Pass.
3. Input is now a slice fixture under evals/datasets/inbox/smoke-sql-injection/ (slice.json + repo/ tree), loosely following TDR 0002 (docs/tdr/0002-evals-from-real-prs.md). The inbox dataset infrastructure from PR #152 has not landed yet, so this fixture is a self-contained smoke input; it slots into the new format if/when the slice harness lands.

Workflow file is left in its current repo-root location for now; a follow-up with workflow-scoped credentials will move it back under .github/workflows/.

Verified locally:
- npm run lint clean
- npm run test:smoke (no credentials) → 16 skipped, 0 failed
- npm run test:smoke with malformed Anthropic key → 4 failed (3 with 401 from anthropic.completeStructured wrapError, 1 root-cause-extract caught by the token-stats silent-failure assertion), 12 skipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lication

- Load .env via dotenv in smoke.setup.ts before envConfig singleton initialises, so npm run test:smoke works without pre-exporting env vars in the shell
- Remove the exists-guard in setupPrompts (files are always written and always cleaned up in afterAll, so the guard added complexity with no benefit)
- Remove the separate system prompt fallback string; PROJECT_ROOT-relative readFile of the bundled quality.md is sufficient (file always present in source tree)

sebastianwessel requested a review from valdis May 21, 2026 11:23

valdis reviewed May 21, 2026

Comment thread tests/smoke/provider-dialect-smoke.ts Outdated

Comment thread tests/smoke/provider-dialect-smoke.ts Outdated

Comment thread tests/smoke/provider-dialect-smoke.ts Outdated

Comment thread tests/smoke/provider-dialect-smoke.ts Outdated

sebastianwessel changed the title from "Feat/qualops 45 provider dialect smoke tests" to "Provider-dialect smoke harness for AI caller stages" on May 21, 2026

valdis force-pushed the feat/qualops-45-provider-dialect-smoke-tests branch 2 times, most recently from 8f94b19 to d1337fb, on May 29, 2026 11:43

sebastianwessel and others added 9 commits June 4, 2026 16:27

chore: temp move workflow file into root

0b75708

chore: cleanup changelog

d134a9c

refactor(test/smoke): move setup file to tests/setup/ to match project convention

cce8070

docs(evals): remove smoke cross-reference from evals/README - smoke is not an eval

1b3a3cb

docs: add smoke test section to root README

80cc83b

fix(ci): prevent script injection from workflow_dispatch inputs via env indirection

81e677f

valdis force-pushed the feat/qualops-45-provider-dialect-smoke-tests branch from d1337fb to 0086bbc on June 4, 2026 13:28

valdis merged commit 0aa3a97 into main Jun 4, 2026
7 checks passed

valdis deleted the feat/qualops-45-provider-dialect-smoke-tests branch June 4, 2026 13:32
