verify: G5 smoke test (DO NOT MERGE)#1

Closed
TheDave94 wants to merge 1 commit into main from verify/g5-smoke-test

Conversation

@TheDave94
Owner

DO NOT MERGE. This PR is a deliberate end-to-end verification of .github/workflows/bake-gates.yml (G5).

What this changes

PrimaeNative/Resources/Letters/Regular/A/strokes.json — shifts cps[15:25] of A s0 (left diagonal) by 0.010 perpendicular to the stroke direction.
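A minimal sketch of the kind of perturbation described above, not the script actually used for this PR. It assumes the control points are 2-D coordinates in a normalized space and that "stroke direction" means the chord from the first to the last control point; the layout of `strokes.json`, the helper name `shift_perpendicular`, and the sample diagonal are all hypothetical.

```python
import numpy as np

def shift_perpendicular(cps, start, stop, offset):
    """Return a copy of cps with cps[start:stop] moved by `offset`
    along the unit normal of the stroke's end-to-end direction."""
    pts = np.asarray(cps, dtype=float).copy()
    direction = pts[-1] - pts[0]
    direction /= np.linalg.norm(direction)
    # Rotate the unit direction by +90 degrees to get a unit normal.
    # The sign of the normal is an arbitrary choice in this sketch.
    normal = np.array([-direction[1], direction[0]])
    pts[start:stop] += offset * normal
    return pts

# Hypothetical straight diagonal with 40 control points.
cps = np.linspace([0.2, 0.9], [0.5, 0.1], 40)
shifted = shift_perpendicular(cps, 15, 25, 0.010)

# Per-point displacement: 0.010 inside the slice, 0 elsewhere.
moved = np.linalg.norm(shifted - cps, axis=1)
```

Only the ten points in `cps[15:25]` move, which produces a localized bump on an otherwise straight stroke.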

Expected workflow behavior

Locally-verified prediction:

| Gate | Result | Detail |
|---|---|---|
| G1 (asymmetry drift) | PASS | A s0 Pearson 0.86 ≥ 0.2005 threshold |
| G3 (perpendicular deviation) | FAIL | A s0 deviation 2.69 px > 2.05 px threshold |
| G4 (junction kink drift) | PASS | A apex drift 0.004° ≤ 4.43° |

Workflow should:

  • Fire on PR open (path filter matches strokes.json change)
  • Run all three gate steps sequentially
  • Exit non-zero overall because of G3 failure
  • Upload bake-gate-results artifact with per-gate JSON
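The expected behavior above (run every gate, keep per-gate JSON, fail overall if any gate fails) can be sketched as follows. This is an illustration under assumptions, not the contents of `run_gates.py`: the function `run_all_gates`, the result-dict shape, and the file names are hypothetical, and the stand-in gates simply replay the predicted values from this PR description.

```python
import json
import tempfile
from pathlib import Path

def run_all_gates(gates, out_dir):
    """Run every gate in order, write one JSON file per gate, and
    return a process exit code (0 if all passed, 1 otherwise)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for name, gate in gates:
        result = gate()  # each gate returns a dict with a boolean "passed"
        (out / f"{name}.json").write_text(json.dumps(result, indent=2))
        if not result["passed"]:
            failed.append(name)  # record the failure but keep running
    return 1 if failed else 0

# Stand-in gates replaying the predicted values (hypothetical shape).
gates = [
    ("g1", lambda: {"passed": 0.86 >= 0.2005, "pearson": 0.86}),
    ("g3", lambda: {"passed": 2.69 <= 2.05, "deviation_px": 2.69}),
    ("g4", lambda: {"passed": 0.004 <= 4.43, "drift_deg": 0.004}),
]

with tempfile.TemporaryDirectory() as tmp:
    exit_code = run_all_gates(gates, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
```

Collecting failures instead of returning at the first one is what lets G4 still run and report after G3 fails; a caller would pass the return value to `sys.exit`.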

Verification plan

After CI runs:

  1. Confirm workflow fired
  2. Confirm G3 step reported A s0 failure with the expected deviation magnitude
  3. Confirm G1 and G4 steps passed
  4. Confirm the workflow exited non-zero
  5. Confirm JSON artifacts were uploaded
  6. Close this PR without merging; delete the branch

TheDave94 pushed a commit that referenced this pull request May 24, 2026
G5's bake-gates.yml workflow was missing scikit-image, which
scripts/generate_strokes_auto.py imports unconditionally at
module load time. The gates don't use skimage directly, but
run_gates.py imports generate_strokes_auto for rasterize() and
bbox_from_mask(), which triggers the load-time import.

Caught by G5 verification PR #1 (deliberate gate violation;
expected G3 failure but workflow exited at G1's module load
with ModuleNotFoundError: No module named 'skimage').

Matches ios-build.yml stroke_audit job's dep list (plus scipy
which audit_invariants.py needs for distance_transform_edt).

Dep audit confirmed scikit-image is the only missing entry;
all other top-level imports (pillow, numpy, scipy, fonttools)
were already present.

Future cleanup (out of scope): generate_strokes_auto.py should
move the skimage import inside the bake-pipeline functions
that actually use it, decoupling gate code from bake-code
dependencies. Flagged for future refactor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
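The "future cleanup" idea in the commit message above, moving the scikit-image import into the functions that need it, might look roughly like this. The function bodies are stand-ins: the real `bbox_from_mask()` signature and the bake-side use of scikit-image are not shown in this PR, so `skeleton_from_mask` and the return format here are assumptions.

```python
import numpy as np

def bbox_from_mask(mask):
    """Gate-side helper (stand-in): bounding box of the True pixels
    as (x_min, y_min, x_max, y_max). Uses numpy only."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])

def skeleton_from_mask(mask):
    """Bake-side helper (hypothetical): requires scikit-image."""
    # Deferred import: scikit-image is only needed when this function
    # runs, so importing the module no longer fails without it.
    from skimage.morphology import skeletonize
    return skeletonize(mask)

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True
box = bbox_from_mask(mask)  # works even if scikit-image is absent
```

With this layout, gate code that only imports `bbox_from_mask` would not trigger a `ModuleNotFoundError` for `skimage` at module load.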
TheDave94 force-pushed the verify/g5-smoke-test branch from d382ac2 to 91f156c on May 24, 2026 18:33
TheDave94 pushed a commit that referenced this pull request May 24, 2026
         classifier — catches smooth long curves wrongly
         admitted by max+p95 alone

Caught during G5 verification (PR #1, 2026-05-24): three
strokes in the full 59-letter corpus (Y s0, Y s1, g s1)
classified as STRAIGHT under the existing max+p95 criterion
but had perpendicular deviations of 24-70 px. Visual
rendering confirmed these are correctly-drawn smooth long
curves, not bake artifacts. The classifier hole: smooth
curves at N=100 resample have per-segment angles ~0.03 rad
(below max threshold) and p95 ~0.08 (below p95 threshold),
but accumulate substantial net direction change.

Fix: add a third criterion to is_straight requiring
|signed_cumulative_angle| < π/12 (15°). Empirically derived
from full-corpus diagnostic; sits in the 22.9°-wide gap
between the last well-behaved STRAIGHT stroke (ä s1 at 4.7°)
and the first offender (Y s1 at 27.6°).
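A rough sketch of a three-part straightness check of the kind this fix describes. Only the π/12 bound comes from the commit message; `MAX_ANGLE_RAD` and `P95_ANGLE_RAD` are placeholder values, and the function bodies are an assumption about how such statistics could be computed, not the code in `audit_invariants.py`.

```python
import numpy as np

SIGNED_CUM_MAX_RAD = np.pi / 12  # 15 degrees, from the commit message
MAX_ANGLE_RAD = 0.10             # placeholder, not the project's value
P95_ANGLE_RAD = 0.10             # placeholder, not the project's value

def stroke_angle_stats(points):
    """Return (max, p95, signed_cum) of the turning angles between
    consecutive segments of a polyline."""
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)
    heading = np.arctan2(seg[:, 1], seg[:, 0])
    # Signed turn between consecutive segments, wrapped to [-pi, pi).
    turn = (np.diff(heading) + np.pi) % (2 * np.pi) - np.pi
    mag = np.abs(turn)
    return mag.max(), np.percentile(mag, 95), turn.sum()

def is_straight(points):
    max_a, p95_a, signed_cum = stroke_angle_stats(points)
    return bool(
        max_a < MAX_ANGLE_RAD
        and p95_a < P95_ANGLE_RAD
        and abs(signed_cum) < SIGNED_CUM_MAX_RAD
    )

# A smooth 30-degree arc sampled at N=100: every per-segment turn is
# tiny, but the turns all share a sign and add up.
theta = np.linspace(0.0, np.pi / 6, 100)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
line = np.linspace([0.0, 0.0], [1.0, 0.5], 100)

arc_max, arc_p95, arc_cum = stroke_angle_stats(arc)
arc_is_straight = is_straight(arc)
line_is_straight = is_straight(line)
```

The arc passes the max and p95 checks on their own, which is the hole described above; only the signed-cumulative term rejects it, while the straight line still passes all three.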

Properties:
- N-invariant (zero-mean noise cancels at any N; unlike
  unsigned cumulative)
- Preserves all 8 calibration corpus STRAIGHT strokes
  (max |signed_cum| in corpus: 0.023 rad / 1.3°)
- Correctly filters Y s0 (27.6°), Y s1 (30.2°), g s1 (116°)
- Threshold of record (2.05 px) unchanged — Y/g were
  previously wrongly admitted; they're now vacuous

Methodology note: fifth instance of design-prediction-meets-
data in Phase 2b Track B. Predicted classifier was adequate
for full 59-letter corpus; verification PR's deployment to
all letters falsified the prediction; refined criterion
derived from data. The "predict explicitly, verify
empirically, refine when data falsifies" methodology now has
five trail markers.

Files in this commit:
- scripts/audit_invariants.py: G3_STRAIGHTNESS_SIGNED_CUM_RAD
  constant (=π/12, N-invariance documented); _stroke_angle_stats
  returns (max, p95, signed_cum); gate_g3_per_stroke straightness
  check uses three-part AND; result dict carries signed_cum_ref
- scripts/tests/test_gate_g3.py: two new tests (smooth-long-curve
  vacuous via signed_cum; truly-straight-with-zero-mean-noise
  still passes). 13 G3 tests total (was 11).
- research_data/phase2b_gates/g3_design.md: "Refinement caught
  during G5 verification" subsection within G3.1 caveat — full
  diagnostic data, N-invariance rationale, 8-corner classification
  matrix
- research_data/phase2b_gates/g3_calibration_run.md: "Post-
  deployment refinement" section — calibration corpus check
  (all 8 strokes preserved), threshold-of-record unchanged
- docs/BAKE_INVARIANTS.md: Threshold 3 criterion expanded to
  three-part AND with cross-reference

All 50 tests pass (13 G1 + 9 G2 + 13 G3 + 15 G4). Smoke test:
G3 on Y/g now correctly vacuous; G3 sweep over all 59 letters
on main: 59/59 pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DO NOT MERGE. This branch is a throwaway end-to-end CI
verification for .github/workflows/bake-gates.yml (G5).

Modifies cps[15:25] of A s0 (left diagonal) by 0.010 perpendicular
to the stroke direction. Locally-verified expected behavior:

  G1: PASS (Pearson 0.86 ≥ 0.2005)
  G3: FAIL (deviation 2.69 px > 2.05 px threshold)
  G4: PASS (junction kink drift 0.004° ≤ 4.43°)

Workflow should fire on PR-open (path filter matches), all three
gates should run, only G3 should fail, exit code should be
non-zero, JSON artifacts should upload. PR will be closed without
merging once verification is captured.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
TheDave94 force-pushed the verify/g5-smoke-test branch from 91f156c to 6131f98 on May 24, 2026 22:32
@TheDave94
Owner Author

Verification complete. Workflow confirmed: fires on PR open, runs G1/G3/G4, catches the deliberate A violation in G3 (dev=2.68 px > 2.05 px threshold), exits non-zero, uploads JSON artifacts. Two sidebar findings landed on main during verification: scikit-image dep fix (commit 987b0bc) and G3 classifier refinement with signed-cumulative criterion (commit c4c143b). Closing without merging.

TheDave94 closed this on May 24, 2026
TheDave94 deleted the verify/g5-smoke-test branch on May 24, 2026 22:33
TheDave94 pushed a commit that referenced this pull request May 24, 2026
PR #1 (verify/g5-smoke-test) closed without merging after
end-to-end verification of .github/workflows/bake-gates.yml.

Workflow confirmed operational:
- Fires on PR-open when strokes.json or gate code changes
- Runs G1/G3/G4 sequentially via run_gates.py
- Catches deliberate violations (A s0 dev=2.68 px caught by G3)
- Exits non-zero on gate failure (merge-blocker semantics)
- Uploads bake-gate-results JSON artifact

Two sidebar fixes landed on main during verification:
- 987b0bc: scikit-image added to CI deps (was missing)
- c4c143b: G3 classifier extended with signed-cumulative
  criterion to filter smooth-long-curves (Y, g) the
  max+p95 alone admitted

g5_verification.md documents the full verification trail
including methodology-chapter content: fifth instance of
design-prediction-meets-data in Phase 2b Track B (G3
classifier hole surfaced by G5 deployment to all 59 letters,
not just the 13-letter calibration corpus).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>