refactor(test): derive test_seo sample from prerender + harden deploy smoke test by miquelmatoses · Pull Request #37 · Cercol/cercol

miquelmatoses · 2026-05-28T20:44:05Z

Summary

Two reliability follow-ups from the 2026-05-26 audit.

1. `test_seo.py` sample derived from the prerender (no hardcoded slugs)

The audit found that `SAMPLED_ROUTES` contained three blog slugs that no longer existed. CI never caught this because the tests are marked `skipif` when `dist/` is absent, and the backend CI job does not run `build:full`, so they were silently skipped there. PR #36 swapped the dead slugs by hand, but the fragile pattern remained.

Now the sample is derived from the prerendered dist/ at import time:

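def _discover_routes() -> list[str]:
    # Nothing prerendered: return an empty sample so the skipif marker fires.
    if not _has_prerendered():
        return []
    # Keep only the top-level candidates that were actually prerendered.
    routes = [r for r in TOP_LEVEL_CANDIDATES if _html_path(r).is_file()]
    for lang in LANGS:
        # "" denotes the unprefixed blog at dist/blog.
        blog_dir = DIST / "blog" if lang == "" else DIST / lang / "blog"
        # Sort so the sampled article is the same on every machine.
        slugs = sorted(p.name for p in blog_dir.iterdir()
                       if p.is_dir() and (p / "index.html").is_file())
        if slugs:
            prefix = "blog" if lang == "" else f"{lang}/blog"
            routes.append(f"{prefix}/{slugs[0]}")
    return routes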

Deterministic (alphabetical, first article per language), so it fails identically for everyone.
Degrades gracefully: the sample is empty when dist/ is not built (skipif still fires), and individual top-level pages are skipped if missing instead of raising a hard FileNotFoundError.
Coverage grows from 7 to 13 routes (one blog article per language present).

Robustness gate (verified locally): removing the currently-sampled article from dist/ keeps the suite green, because the sample re-derives to the next existing article.

$ mv dist/blog/anonymity-...-why-it-matters /tmp/   # simulate vanished slug
$ pytest -q api/tests/test_seo.py
57 passed   # sample auto-advanced to big-five-personality-across-cultures-...
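
The robustness gate can also be reproduced in isolation. The following is a minimal sketch, not the project's test code: `discover_blog_samples`, `make_article`, the throwaway `dist/` tree, and the slug names are all hypothetical. It only mirrors the "first slug alphabetically per language" rule described above, and it additionally assumes that a language without a blog directory should be skipped.

```python
import shutil
import tempfile
from pathlib import Path


def discover_blog_samples(dist: Path, langs: list[str]) -> list[str]:
    """Return the alphabetically first prerendered article per language.

    Hypothetical stand-in for the blog part of _discover_routes();
    "" means the unprefixed blog at dist/blog.
    """
    routes = []
    for lang in langs:
        blog_dir = dist / "blog" if lang == "" else dist / lang / "blog"
        if not blog_dir.is_dir():  # assumption: unbuilt language is skipped
            continue
        slugs = sorted(
            p.name for p in blog_dir.iterdir()
            if p.is_dir() and (p / "index.html").is_file()
        )
        if slugs:
            prefix = "blog" if lang == "" else f"{lang}/blog"
            routes.append(f"{prefix}/{slugs[0]}")
    return routes


def make_article(dist: Path, route: str) -> None:
    """Create <route>/index.html under dist, like a prerendered page."""
    page = dist / route / "index.html"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text("<html></html>", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp) / "dist"
    for route in ("blog/alpha-post", "blog/beta-post", "es/blog/gamma-post"):
        make_article(dist, route)

    before = discover_blog_samples(dist, ["", "es", "fr"])
    # Simulate a vanished slug, as the mv command above does.
    shutil.rmtree(dist / "blog" / "alpha-post")
    after = discover_blog_samples(dist, ["", "es", "fr"])

print(before)  # ['blog/alpha-post', 'es/blog/gamma-post']
print(after)   # ['blog/beta-post', 'es/blog/gamma-post']
```

Because the sample is recomputed from whatever exists on disk, removing the sampled article moves the sample to the next slug rather than leaving a dangling route.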

2. Drift inventory (FASE 2)

| Location | Finding | Action |
| --- | --- | --- |
| `api/tests/test_seo.py` | 3 dead hardcoded blog slugs | Fixed: dynamic derivation |
| `api/jobs/pagespeed_ingest.py` `SEED_URLS` | 2 dead fallback blog slugs (PSI would measure 404 pages) | Fixed: replaced with live articles + rot-risk comment |
| `scripts/update_blog_article_{2,3}.py` | body cross-link to `/blog/big-five-vs-disc-vs-belbin` (dead) | Documented, not fixed: run-once historical generators; editing them does not change already-published content. This is content debt (a broken internal link in published articles), tracked separately, not test fragility |
| `api/seo_mcp/`, `src/` | no hardcoded article slugs found | none |
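
A sweep like this one can be partly automated by checking each hardcoded URL against the prerendered output. This is a minimal sketch under assumptions: `find_dead_routes` is a hypothetical helper, the example URLs are invented rather than taken from `SEED_URLS`, and it assumes every live route is prerendered as `dist/<route>/index.html`.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def find_dead_routes(urls: list[str], dist: Path) -> list[str]:
    """Return the URLs whose path has no prerendered index.html in dist."""
    dead = []
    for url in urls:
        route = urlparse(url).path.strip("/")
        if not (dist / route / "index.html").is_file():
            dead.append(url)
    return dead


with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp) / "dist"
    live = dist / "blog" / "live-post"
    live.mkdir(parents=True)
    (live / "index.html").write_text("<html></html>", encoding="utf-8")

    seed_urls = [  # invented examples, not the real SEED_URLS
        "https://example.com/blog/live-post",
        "https://example.com/blog/renamed-post",
    ]
    dead = find_dead_routes(seed_urls, dist)

print(dead)  # ['https://example.com/blog/renamed-post']
```

A check of this kind only covers slugs that appear as full URLs or routes in code; links embedded in published article bodies, like the third row of the table, would need a separate content crawl.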

3. Backend deploy smoke test hardened

PR #36's backend deploy succeeded on the server (Caddy validated, service active, api.cercol.team/blog returning 200, crawl_logs flowing), but the Action failed: the smoke test waited only ~10s (5 x 2s), while uvicorn took ~31s to boot two workers on the load-saturated shared VPS (the YELLOW 6 audit finding). That failure was a false negative. The window is now 20 x 3s = up to 60s.
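
The widened window amounts to a bounded polling loop. The following is a sketch only: the actual smoke test lives in the deploy workflow and is not shown in this description, so `wait_until_healthy` and the injected `probe` and `sleep` callables are hypothetical. Only the attempt counts and intervals (5 x 2s before, 20 x 3s now) come from the text above.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 20,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() up to `attempts` times, sleeping between failures.

    Returns True on the first success and False once the window is spent,
    so a real outage still fails the deploy.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:  # no point sleeping after the last try
            sleep(interval_s)
    return False


# Simulated slow boot: the service first answers on the 11th probe.
calls = {"n": 0}
waited: list[float] = []


def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 11


# New window (20 x 3s): succeeds after ten 3-second waits.
ok_new = wait_until_healthy(fake_probe, sleep=waited.append)

# Old window (5 x 2s): gives up before the service is ready.
calls["n"] = 0
ok_old = wait_until_healthy(
    fake_probe, attempts=5, interval_s=2.0, sleep=lambda s: None
)

print(ok_new, sum(waited), ok_old)  # True 30.0 False
```

Injecting `sleep` keeps the example instant to run; in a real check the default `time.sleep` would be used and `probe` would issue the HTTP request.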

Test plan

pytest -q api/tests/test_seo.py (full dist) — 57 passed
Robustness gate — remove sampled article, still 57 passed
pytest -q api/ — 176 passed
vitest run — 204 passed
ruff — covered by CI (not installed locally)
CI frontend job runs build:full + test_seo against the complete dist (authoritative gate)

No production behavior change. Test + CI-config only; the auto-triggered backend deploy should now pass its (longer) smoke test.

🤖 Generated with Claude Code

…smoke test

test_seo.py no longer hardcodes blog slugs that rot when content is renamed (the audit found three dead slugs that CI never caught because it skips when dist/ is absent). SAMPLED_ROUTES is now derived from the prerendered dist/ at import time: every existing top-level page plus the first blog article alphabetically for each language present. The selection is deterministic and degrades gracefully (empty when dist/ is not built, individual pages skipped if missing), so a vanished article can never break the suite again. Coverage went from 7 to 13 routes (one article per language).

Robustness gate: removing the sampled article from dist/ keeps the suite green because the sample re-derives from what remains.

FASE 2 drift sweep also fixed pagespeed_ingest.py SEED_URLS, whose two fallback blog slugs were dead (PSI would have measured 404 pages); replaced with live articles and a comment on the rot risk. Remaining known drift documented in the PR (historical update_blog_article_*.py scripts cross-link a non-existent slug; that is content debt, not test fragility, and editing the run-once scripts would not change published content).

Also hardens the backend deploy smoke test: the 2026-05-28 deploy of PR #36 went green on the server (Caddy valid, service active, /blog 200) but the Action failed because the smoke test only waited ~10s while uvicorn took ~31s to boot two workers on the load-saturated shared VPS. The window is now 20 x 3s = up to 60s, which tolerates the cold start without masking a real outage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

miquelmatoses merged commit 3c477a5 into main May 28, 2026
10 of 11 checks passed

miquelmatoses deleted the refactor/test-seo-dynamic-sampling branch May 28, 2026 20:46

miquelmatoses merged 1 commit into main from refactor/test-seo-dynamic-sampling
