Skip to content

SELL-362 dogfood: doctor accuracy (R2-B1/B2/B5) + test-order model correction#55

Merged
next-devin merged 4 commits into
mainfrom
fix/sell-362-r2-b1-doctor-demo-ref-provenance
Jun 1, 2026
Merged

SELL-362 dogfood: doctor accuracy (R2-B1/B2/B5) + test-order model correction#55
next-devin merged 4 commits into
mainfrom
fix/sell-362-r2-b1-doctor-demo-ref-provenance

Conversation

@next-devin
Copy link
Copy Markdown
Contributor

@next-devin next-devin commented Jun 1, 2026

Addresses the campaigns-os-side findings from the SELL-362 Round 2 dogfood (result comment), plus a corrected model for test-order gating that surfaced during review.

Part 1 — doctor accuracy (the warning-noise findings)

R2-B1 — stop flagging spec-declared refs as starter demo refs

Doctor flagged 21 real low-integer API ref_ids as starter demo refs (57% of warnings were noise). specDeclaredCommerceRefs() now treats a ref matching a package/offer/shipping_method the spec declares as real; dangling refs still flag.

R2-B2 — defer routing-meta + currency to built output

Doctor kept warning on spec/source state the build had already fixed. Routing-meta now defers to the authoritative built-output check; currency partitions source vs. built (warns only when built output has HTML and is dirty).

R2-B5 — under-configured store profile (new readiness warning)

Placeholder store_url and empty available_payment_methods now warn — framed as real-shopper readiness (a real customer can't transact), decoupled from test-order proof.

Part 2 — test-order model correction

The prior model treated typed-card test orders as order-mutating, merchant-risk, approval-gated events requiring --allow-test-orders + --sandbox-test-card-confirmed + packet policy + a domain allowlist + an approved order count. Per the admin-api testing guide, that's wrong: global test cards bypass the gateway, create no transactions, need no merchant setup, and are safe on live stores. The only real control is depth; the only artifact is one reusable customer.

  • Gating removed: --test-order <mode> is sufficient — no permission flags, no packet-policy gate, no doctor "test orders not enabled / avoid mutations" warning, no orchestration block. (allowed_domains_confirmed is untouched — it governs SDK origin loading, a separate real concern. qa policy set + packet qa.* booleans stay as informational metadata.)
  • Stable test customer: both code paths minted a fresh undeletable customer every run. Now resolve to ONE stable address: --test-email > CAMPAIGNS_OS_QA_TEST_EMAIL > stable synthetic fallback. The real deliverable inbox stays in the internal skill (the public package can't hardcode that domain — private-string guard — and shouldn't).
  • common preset (new default): checkout + first-upsell accept/decline + one deeper mixed path on 2+ offers (3-5 shapes). full stays the explicit every-permutation opt-in. --max-test-orders reframed as an accidental-flood guard.
  • Docs/skills/agents: rewrote the QA doc, bundled skills, and all four agent-instruction flavors (claude/codex/cursor/copilot) to the corrected model.

Tests

npm run check green: 30 tests + fixtures / private-string / template-doctrine gates.
New: doctor-demo-ref (5), doctor-built-output (4), doctor-store-profile (5), stable-email + common-preset coverage in qa-browser.test.

Follow-ups (separate, not this repo)

  • Internal next-campaigns-ops QA skill/README still carry the old approval/allowlist phrasing and the real --test-email inbox — needs the same model correction (separate PR; that's where the deliverable address lives).
  • R2-B1 export-side (ref-level provenance on Map export) → Map Builder / nc-campaigns-proxy.
  • R2-B3 (lint-sdk.mjs olympus-only) → target page-kit / meridian.
  • R2-B4 (candidate→source handoff) → process / SELL-361.

🤖 Generated with Claude Code

next-devin and others added 2 commits June 1, 2026 20:42
…2-B1)

SELL-362 dogfood (Sasa) found doctor flagging 21 real low-integer
Campaigns-API ref_ids ("1"/"2") as "starter-looking demo refs" — 57% of
warnings were noise on a valid spec, training operators to ignore the very
warning meant to catch real placeholders.

Root cause: collectDemoRefHits suppressed a hit only when an ancestor carried
`_provenance.api`, but the Map exporter does not stamp provenance down to the
ref level, so valid refs whose value happened to match a starter demoOnlyValue
were flagged.

Fix: add a second escape hatch — suppress a hit when its value matches a ref
the spec itself declares as a real commerce entity (package / offer /
shipping_method), via a new specDeclaredCommerceRefs() built on the existing
specPackageRefs/specShippingRefs helpers. A ref that points at a declared
entity is legitimate regardless of ref-level provenance stamping; refs that
point at nothing the spec defines are still flagged, so real signal is kept.

- src/cli.mjs: specDeclaredCommerceRefs(); collectDemoRefHits() suppression +
  export for testing
- src/doctor-demo-ref.test.mjs: 5 unit tests (declared package/shipping
  suppressed; dangling ref still flagged; provenance escape intact; empty
  vocab) wired into `npm run check`

The export-side half of R2-B1 (stamp ref_id-level provenance on Map export)
remains a separate Map Builder / proxy task.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-B2)

SELL-362 dogfood (Sasa): after a correct build, doctor kept emitting
"fix before QA" warnings computed from the spec/source even though the
build had already fixed them in the output — browser QA later proved the
deployed output was right. Same noise class as R2-B1.

Two emitters fixed:

- Routing meta: validateSpecRoutingMetaTags flagged the spec's unrooted
  meta-tag *hint* literals, but the page-kit build roots them and
  validateBuiltSdkMetaTags checks the rendered _site/<slug>/ authoritatively.
  Now defers to that built-output check once assembly is complete and the
  rendered site exists.

- Hardcoded currency: validateMarketSensitiveCopy scanned source+target
  lumped together, so raw $ in the source input still warned after the build
  tokenized it. Now partitions by surface: when the built campaign output
  contains HTML and is currency-clean, source-only residue is downgraded to
  an info note; warnings on the built output still fire. Guards on the target
  actually containing built HTML (an empty output dir means the build has not
  run, so source warnings stand) — caught by the existing market-copy fixture.

- src/doctor-built-output.test.mjs: 4 unit tests (routing defers w/ built
  site, warns without; currency source-only info vs. built-output warn) wired
  into `npm run check`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@next-devin next-devin changed the title fix(doctor): stop flagging spec-declared refs as starter demo refs (R2-B1) fix(doctor): reduce dogfood warning noise — R2-B1 + R2-B2 Jun 1, 2026
@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot Bot commented Jun 1, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 0

Review Details

Incremental review of PR #55 (commit 5629e34) covering the test-order model corrections: removal of approval gates, stable test customer email, and common default mode. All changes are additive improvements building on the previously reviewed R2-B1/R2-B2/R2-B5 fixes.

Test-Order Model Changes: The common preset correctly maps to a 3-5 shape sample (checkout baseline + first-upsell accept/decline + one deeper mixed path when depth >= 2). The --test-order boolean-true case correctly falls through to common. Permission flag removal is consistent across browser and legacy API paths.

Stable Test Email: testEmail correctly resolves to a single stable address with precedence: explicit flag > env var > default fallback. No per-run unique emails are generated.

R2-B5 (validateSpecStoreProfile): No regressions from the new commit. Function signature correctly includes warnings parameter at all call sites.

R2-B1/R2-B2: No regressions. Declared commerce refs exclusion and built-output deferral logic remain intact.

Files Reviewed (10 files)
  • src/cli.mjs - 0 issues
  • src/qa-browser.mjs - 0 issues
  • src/qa-browser.test.mjs - 0 issues
  • src/qa-node.mjs - 0 issues
  • src/doctor-store-profile.test.mjs - 0 issues
  • src/doctor-demo-ref.test.mjs - 0 issues
  • src/doctor-built-output.test.mjs - 0 issues
  • skills/next-campaigns-os/SKILL.md - 0 issues
  • skills/next-campaigns-qa/SKILL.md - 0 issues
  • docs/qa-and-test-orders.md - 0 issues

Reviewed by minimax-m2.7 · 1,063,683 tokens

SELL-362 dogfood: the test store had store_url=https://localhost:3000/ and
available_payment_methods: [] — a store that passes doctor's required-field
check but cannot complete a real order. Neither doctor nor browser QA
surfaced it, so it would only fail at typed-card proof time, for store-config
reasons rather than campaign reasons.

Extend validateSpecStoreProfile with two readiness *warnings* (not blockers,
so routing/visual-only runs are unaffected):

- spec.store_profile.placeholder_store_url: store_url is a local/dev host or a
  reserved test/example domain (conservative match; real merchant domains do
  not trip it).
- spec.store_profile.no_payment_methods: available_payment_methods is an
  explicit empty array. Absent != empty, so an unconfigured field does not
  warn.

This makes store-config risk visible *before* order-proof approval — directly
relevant to the typed-card allowlist decision.

- src/doctor-store-profile.test.mjs: 5 unit tests, wired into `npm run check`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@next-devin next-devin changed the title fix(doctor): reduce dogfood warning noise — R2-B1 + R2-B2 doctor: accuracy fixes from SELL-362 dogfood (R2-B1, R2-B2, R2-B5) Jun 1, 2026
… customer, `common` default

The prior model treated typed-card test orders as order-mutating, merchant-risk,
approval-gated events. They are not: global test cards bypass the payment gateway,
create no transactions, need no merchant setup, and are safe on live stores
(developer-docs admin-api testing guide). The only real control is depth, and the
only artifact is one reusable customer.

Execution gating removed:
- qa-node maybeRunTestOrders / maybeRunLegacyApiTestOrders no longer require
  --allow-test-orders / --sandbox-test-card-confirmed or packet qa.* policy
  booleans. `--test-order <mode>` is sufficient intent.
- doctor no longer emits the "test orders not enabled / avoid mutations" warning
  or the sandbox-confirmation error; buildNextStep no longer blocks the
  test-orders stage on those codes. (allowed_domains_confirmed is untouched — it
  governs SDK origin loading, a separate real concern.)
- The packet qa.* booleans and `qa policy set` remain as informational metadata
  (back-compat, fixtures), but no longer gate anything.

Stable test customer (the per-run-unique bug):
- Both code paths minted a fresh, undeletable customer every run
  (qa+campaigns-os-<runId>-<ts>@example.com). Now resolve to ONE stable address:
  --test-email > CAMPAIGNS_OS_QA_TEST_EMAIL > stable synthetic fallback. The real
  deliverable inbox stays in the internal next-campaigns-ops skill (the public
  package can't hardcode it — private-string guard bans that domain — and
  shouldn't).

Depth as the control:
- New `common` preset (also bare `--test-order`): checkout baseline + first-upsell
  accept/decline + one deeper mixed path when 2+ offers (3-5 shapes, under the
  flood cap). `full` stays the explicit every-permutation opt-in. --max-test-orders
  is reframed as an accidental-flood guard, not a permission gate.

Also reframes R2-B5 store-profile warnings to the real-shopper angle (decoupled
from test-order proof), and rewrites the docs, bundled skills, and all four
agent-instruction flavors (claude/codex/cursor/copilot) to the corrected model.

Tests: stable-email + `common`-preset coverage added; help-string fixture updated.
`npm run check` green (30 tests + all gates).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@next-devin next-devin changed the title doctor: accuracy fixes from SELL-362 dogfood (R2-B1, R2-B2, R2-B5) SELL-362 dogfood: doctor accuracy (R2-B1/B2/B5) + test-order model correction Jun 1, 2026
@next-devin next-devin merged commit 0bb703f into main Jun 1, 2026
2 checks passed
@next-devin next-devin deleted the fix/sell-362-r2-b1-doctor-demo-ref-provenance branch June 1, 2026 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant