fix(ci): clear base security and smoke blockers #8461

Open
scarmani wants to merge 15 commits into main from codex/base-ci-8454-security-smoke

@scarmani

Collaborator

Summary

  • refresh locked cryptography and starlette versions so the project pip-audit gate clears current vulnerabilities
  • raise the CI install cryptography security floor to the fixed family
  • make tests/routing/test_domain_matcher_root.py self-contained when optional anthropic is not installed

Validation

  • uv lock --check
  • python scripts/run_pip_audit_gate.py
  • forced missing-anthropic import simulation for tests.routing.test_domain_matcher_root
  • python scripts/run_test_baseline.py --no-clean-check tests/routing/test_domain_matcher_root.py
  • python -m pytest -p no:rerunfailures tests/routing/test_domain_matcher_root.py -q
  • python -m ruff check tests/routing/test_domain_matcher_root.py
  • bash -n scripts/ci_install_project.sh
  • git diff --check
  • bash scripts/automation_pr_preflight.sh origin/main HEAD

Notes: a full local run of bash scripts/test_tiers.sh smoke got past the earlier collection/import blocker, but I interrupted it manually once collection and runtime exceeded a useful local validation window.

@scarmani

Collaborator Author

Codex independent semantic review on head 1f3fefe

Reviewer harness: codex
Model family: openai
Model id: GPT-5 Codex
Receipt artifact: /private/tmp/aragora-8461-evidence-artifacts/codex-openai-review.md

Verdict: PASS

I reviewed only the exact-head diff for PR #8461:

  • scripts/ci_install_project.sh
  • tests/routing/test_domain_matcher_root.py
  • uv.lock

Findings: no blocking correctness or security issue found in the exact-head diff. The cryptography floor moves past the cited security floor, the lock resolves to cryptography 49.0.0, the local pip-audit gate reports no known vulnerabilities, and the TextBlock fallback is scoped to the routing test module without overwriting a real anthropic install.

Focused adversarial dogfood: I checked the two risk paths this PR is meant to cover. First, the local security gate passes with No known vulnerabilities found. Second, tests/routing/test_domain_matcher_root.py passes without relying on an installed anthropic package, so the collection/import fallback addresses the smoke/baseline failure mode without changing product routing behavior.

Local validation:

  • /Users/armand/.pyenv/versions/3.11.11/bin/python -m pytest tests/routing/test_domain_matcher_root.py -q -> 37 passed.
  • /Users/armand/.pyenv/versions/3.11.11/bin/python scripts/run_pip_audit_gate.py -> No known vulnerabilities found.

@scarmani

Collaborator Author

Claude independent semantic review on head 1f3fefe

Reviewer harness: claude
Model family: claude
Model id: Claude Code 2.1.178 using sonnet
Receipt artifact: /private/tmp/aragora-8461-evidence-artifacts/claude-review.md

Verdict: PASS

Claude reviewed only the exact-head patch for PR #8461, focusing on security dependency drift, CI install compatibility, and the anthropic.types.TextBlock fallback. Claude did not request changes.

Findings: no blocking issue was identified. Claude noted non-blocking compatibility cautions around cryptography platform wheel coverage, partial anthropic installs where anthropic.types exists without TextBlock, Python 3.8 or 32-bit Windows source-build fallback, and the starlette minor bump. These cautions do not overturn the PASS verdict for this PR's intended CI/security repair.

Focused adversarial dogfood: Claude specifically stress-tested Intel macOS wheel fallback, sys.modules stub persistence when anthropic is absent, and partial anthropic install behavior. The local receipt pairs that semantic review with exact-head validation: the routing test file passed 37 tests and the pip-audit gate reported no known vulnerabilities.

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
@scarmani

Collaborator Author

OpenAI focused dogfood evidence

Reviewer: openai (codex) - focused current-head dogfood run by Codex/OpenAI on the exact PR head.
PR: #8461
Head: 49d8562
Model family: openai
Evidence type: focused dogfood
Verdict: PASS

dogfood: yes

Focused validation run from detached exact-head worktree /private/tmp/aragora-8461-evidence-49d.nRyqLS:

  • bash -n scripts/ci_install_project.sh passed, proving the CI install helper remains syntactically valid.
  • python3 -m pytest -q tests/routing/test_domain_matcher_root.py passed: 37 passed in 2.24s, proving the routing root behavior and import fallback stay healthy without requiring the Anthropic package for this test root.
  • Static assertion passed that scripts/ci_install_project.sh contains cryptography>=48.0.1,<50.0 and no longer contains cryptography>=46.0.7,<48.0, proving the CI install security floor repair is present at this head.
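
The static assertion in the last bullet could look roughly like the following. This is a minimal sketch, not the check that was actually run: the names `has_fixed_cryptography_floor` and `check_script` are hypothetical, and the check is a plain substring match on the two specifiers quoted above.

```python
from pathlib import Path

# Specifiers quoted in the evidence above.
FIXED_SPEC = "cryptography>=48.0.1,<50.0"
STALE_SPEC = "cryptography>=46.0.7,<48.0"


def has_fixed_cryptography_floor(script_text: str) -> bool:
    """Return True when the fixed specifier is present and the stale one is gone."""
    return FIXED_SPEC in script_text and STALE_SPEC not in script_text


def check_script(path: str) -> bool:
    """Hypothetical convenience wrapper: read the script from disk and check it."""
    return has_fixed_cryptography_floor(Path(path).read_text(encoding="utf-8"))
```

From the repository root, `check_script("scripts/ci_install_project.sh")` would then be expected to return True at this head.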

No blocker found.

@scarmani

Collaborator Author

Claude independent model review

Reviewer: claude (anthropic) - independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Head: 49d8562 (49d8562), committed 2026-06-17T00:12:49Z.
PR: #8461.
Model family: claude

Verdict: PASS

  • No blocking issues. Both changes are minimal and well-scoped.
  • scripts/ci_install_project.sh:24 - the >=48.0.1,<50.0 floor matches the documented constraint in pyproject.toml:98-115 for the GHSA-537c-gmf6-5ccf fix. Aligned.
  • tests/routing/test_domain_matcher_root.py:15-29 - stub registration uses sys.modules.setdefault, so a real anthropic install wins. The stub's dataclass supplies both .type and .text and is registered as anthropic.types.TextBlock, so production's lazy from anthropic.types import TextBlock plus isinstance(first_block, TextBlock) check resolves to the same class object.
  • [P3] tests/routing/test_domain_matcher_root.py:17 - except ModuleNotFoundError will not catch ImportError if anthropic.types exists but TextBlock is absent. Broadening to ImportError would be more defensive, but this is non-blocking for the intended CI dependency-absence repair.
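
The [P3] note can be reproduced without the real SDK. The sketch below registers a hypothetical package, `hypothetical_partial_sdk`, whose `types` submodule exists but has no `TextBlock`, which mimics the partial-install case. It is an illustration only and does not copy the guard in the test module.

```python
import sys
import types

# Hypothetical package, registered by hand so the example needs no real SDK.
# The `types` submodule exists but deliberately lacks TextBlock.
_pkg = types.ModuleType("hypothetical_partial_sdk")
_pkg.__path__ = []  # an (empty) __path__ marks the module as a package
_sub = types.ModuleType("hypothetical_partial_sdk.types")
_pkg.types = _sub
sys.modules["hypothetical_partial_sdk"] = _pkg
sys.modules["hypothetical_partial_sdk.types"] = _sub


def narrow_guard() -> str:
    """Guard that only handles a completely missing module."""
    try:
        from hypothetical_partial_sdk.types import TextBlock  # noqa: F401
    except ModuleNotFoundError:
        return "fallback"
    return "real"


def broad_guard() -> str:
    """Guard that also handles a missing name inside an existing module."""
    try:
        from hypothetical_partial_sdk.types import TextBlock  # noqa: F401
    except ImportError:
        return "fallback"
    return "real"
```

Calling `broad_guard()` returns "fallback", while `narrow_guard()` lets a plain ImportError escape, because ModuleNotFoundError is a subclass of ImportError and not the other way around.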

dogfood: yes

@scarmani

Collaborator Author

OpenAI independent model review

Reviewer: openai (openai) - independent adversarial model review via Codex CLI OpenAI harness, grounded on the exact PR head.
Head: 49d8562 (49d8562), committed 2026-06-17T00:12:49Z.
PR: #8461.
Model family: openai

Verdict: PASS

  • No blocking issues found.
  • [P3] tests/routing/test_domain_matcher_root.py:13 - the fallback only catches ModuleNotFoundError; if anthropic is installed but anthropic.types.TextBlock has moved or been renamed, the import would still fail. This is acceptable for the current CI dependency-absence scope, but catching ImportError would be a slightly more robust guard.

dogfood: yes

scarmani marked this pull request as ready for review June 17, 2026 03:11
scarmani requested a review from an0mium as a code owner June 17, 2026 03:11
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 17, 2026

Contributor

Aragora Code Review

Advisory-only review. Findings are surfaced for follow-up and do not fail this workflow.

Ranked High-Level Tasks

  1. Remove unconditional result.status = "completed" and result.final_answer = synthesis overwrite in aragora/golden.py:debate() to preserve consensus-phase final answers — Verify: pytest tests/debate/test_consensus.py -v
  2. Fix non-numeric Content-Length handling in aragora/server/handlers/base.py:_read_json_body_value() by wrapping int(content_length_header or 0) in try/except ValueError — Verify: pytest tests/server/test_handlers_base.py -v
  3. Eliminate circular-import risk from _restore_root_golden_review_export() calls in aragora/review/protocol.py, aragora/review/provider_slots.py, aragora/review/reviewer_output.py — Verify: pytest tests/review/test_protocol.py -v
  4. Restore if not result.final_answer: guard in aragora/debate/phases/synthesis_generator.py:generate_mandatory_synthesis() to avoid clobbering existing answers — Verify: pytest tests/debate/test_synthesis_generator.py -v
  5. Replace metaclass __getattribute__ hot-path in aragora/__init__.py:_AragoraModule with __getattr__ to avoid per-attribute O(n) sys.modules scan — Verify: pytest tests/test_init.py -v
  6. Guard result.status mutation in aragora/golden.py:debate() against missing/None result attribute — Verify: pytest tests/test_golden.py -v
  7. Verify supabase init reorder in aragora/persistence/supabase_client.py:__init__() does not leak partial state when _ensure_supabase() fails after URL/key set — Verify: pytest tests/persistence/test_supabase_client.py -v
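
Task 2 can be illustrated with a small parser. This is a sketch under stated assumptions rather than the code in aragora/server/handlers/base.py: the function names are hypothetical, and rejecting negative lengths is an extra assumption that the task text does not mention.

```python
from typing import Optional


def parse_content_length(header_value: Optional[str]) -> Optional[int]:
    """Parse a Content-Length header; return None when it is malformed.

    A missing or empty header is treated as 0, matching `int(header or 0)`.
    """
    try:
        length = int(header_value or 0)
    except ValueError:
        return None
    # Assumption: a negative length is also treated as malformed.
    return length if length >= 0 else None


def status_for(header_value: Optional[str]) -> int:
    """Hypothetical helper: map the parse result to an HTTP status code."""
    return 400 if parse_content_length(header_value) is None else 200
```

With this shape, `Content-Length: abc` maps to a 400 response instead of an unhandled ValueError.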

Suggested Subtasks

  • Revert result.status = "completed" line in aragora/golden.py:debate() and add regression test asserting status reflects real arena outcome — Verify: pytest tests/test_golden.py::test_status_not_forced -v
  • Wrap int(content_length_header or 0) in try/except returning HTTP 400 in aragora/server/handlers/base.py — Verify: pytest tests/server/test_handlers_base.py::test_invalid_content_length -v
  • Add if not result.final_answer: guard at second synthesis write in aragora/debate/phases/synthesis_generator.py line ~228 — Verify: pytest tests/debate/test_synthesis_generator.py::test_consensus_answer_preserved -v
  • Convert _AragoraModule.__getattribute__ to module-level __getattr__ in aragora/__init__.py — Verify: pytest tests/test_init.py::test_golden_collision_lazy -v
  • Add idempotency/recursion guard to _restore_root_golden_review_export() in aragora/review/__init__.py — Verify: pytest tests/review/test_protocol.py::test_no_circular_import -v
  • Cache getattr(psycopg2, "Error", Exception) once in aragora/storage/backends.py instead of per-connection — Verify: pytest tests/storage/test_backends.py -v
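
The __getattribute__ to __getattr__ subtask relies on the fact that a module-level __getattr__ (PEP 562) is consulted only after normal attribute lookup fails, so ordinary attribute reads never reach the hook. A minimal sketch, assuming a hypothetical module name and loader table; it builds a throwaway module and does not reflect the actual contents of aragora/__init__.py:

```python
import types
from typing import Any, Callable, Dict, List


def make_lazy_module(name: str, loaders: Dict[str, Callable[[], Any]]) -> types.ModuleType:
    """Build a module whose listed attributes are resolved on first access."""
    module = types.ModuleType(name)
    lookups: List[str] = []  # records every call into the fallback hook

    def _lazy_getattr(attr: str) -> Any:
        lookups.append(attr)
        try:
            loader = loaders[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = loader()
        setattr(module, attr, value)  # cache, so later reads skip the hook
        return value

    # Stored in the module's __dict__, which is where PEP 562 looks for it.
    module.__getattr__ = _lazy_getattr
    module.lazy_lookups = lookups
    return module
```

After the first read of a lazy attribute the value lives in the module namespace, so repeated access costs a normal dictionary lookup.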

Owner module / file paths

  • aragora/golden.py
  • aragora/server/handlers/base.py
  • aragora/review/__init__.py
  • aragora/review/protocol.py
  • aragora/review/provider_slots.py
  • aragora/review/reviewer_output.py
  • aragora/debate/phases/synthesis_generator.py
  • aragora/__init__.py
  • aragora/persistence/supabase_client.py
  • aragora/storage/backends.py

Test Plan

  • pytest tests/test_golden.py -v --cov=aragora/golden --cov-fail-under=85
  • pytest tests/server/test_handlers_base.py -v --cov=aragora/server/handlers/base --cov-fail-under=80
  • pytest tests/debate/test_synthesis_generator.py -v --cov=aragora/debate/phases/synthesis_generator --cov-fail-under=85
  • pytest tests/test_init.py tests/review/test_protocol.py -v
  • pytest tests/persistence/test_supabase_client.py tests/storage/test_backends.py -v
  • Assert: malformed Content-Length: abc returns HTTP 400, not 500
  • Assert: consensus-phase final_answer is not overwritten by synthesis when already set
  • Coverage threshold: >= 85% on all modified source files

Rollback Plan

  • Trigger: If pytest tests/debate/ tests/test_golden.py -v fails or coverage drops below 85%
  • Action: git revert HEAD to undo the synthesis/golden changes commit
  • Trigger: If circular-import errors surface during python -c "import aragora.review.protocol" in CI
  • Action: git revert HEAD removing the _restore_root_golden_review_export() import-time calls
  • Trigger: If p95 import time of import aragora exceeds 500ms after metaclass change
  • Action: Revert aragora/__init__.py via git revert HEAD && make test

Gate Criteria

  • Test coverage >= 85% on all modified files
  • Malformed Content-Length handling: 0 unhandled ValueError exceptions (500 responses) on fuzz input
  • import aragora cold-import time p95 <= 500ms
  • 0 new lint errors (ruff check passes clean, 0 errors)
  • 0 circular-import failures across aragora.review.* submodules
  • All 7 required section headers present (== 7)
  • Synthesis regression suite passes 100% (0 failures) for consensus-answer-preservation tests

JSON Payload

{
  "tasks": 7,
  "subtasks": 6,
  "owner_files": 10,
  "coverage_threshold": 85,
  "import_time_threshold_ms": 500,
  "unhandled_error_threshold": 0,
  "circular_import_threshold": 0,
  "lint_error_threshold": 0,
  "required_headers": 7
}

1 finding(s) across the diff

[CRITICAL] Finding

Finding


Generated by Aragora multi-agent code review

scarmani and others added 2 commits June 17, 2026 02:12
Keep the fallback TextBlock shim local to the mocked LLM tests so an absent Anthropic SDK does not leave a hollow anthropic package in sys.modules for later tests.

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
@scarmani

Collaborator Author

Claude independent model review

Reviewer: claude (anthropic) — independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Head: 231d49d (231d49d), committed 2026-06-17T07:15:31Z.
PR: #8461.
Model family: claude

Verdict: PASS

  • No blocking issues.
  • [P3] scripts/ci_install_project.sh:24 — bump aligns with pyproject.toml:115 (cryptography>=48.0.1); pyproject's note explicitly confirms no downstream pin caps the bump, so resolver risk is low. Upper bound widening to <50.0 is consistent with project policy and forward-compatible.
  • [P3] tests/routing/test_domain_matcher_root.py:308-327 — fixture correctly uses monkeypatch.setitem so the stub auto-clears after each test; the new test_anthropic_fallback_is_not_registered_globally guard validates the leak fix. Stub TextBlock(type, text) matches the production isinstance check at aragora/routing/domain_matcher.py:478 because the stub module is what gets imported in that runtime path during the test — consistent identity.
  • [P3] Minor: NEEDS_ANTHROPIC_TYPES_STUB is decided once at import time; harmless given the test fixture is a no-op when anthropic is real.
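
The scoping behaviour described above can be sketched with the standard library alone. This example uses unittest.mock.patch.dict as a stand-in for pytest's monkeypatch.setitem, since both restore sys.modules when the scope ends. The package name `hypothetical_sdk` and the names `StubTextBlock` and `scoped_types_stub` are assumptions for illustration; the stub's fields follow the TextBlock(type, text) shape mentioned in the review.

```python
import sys
import types
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator
from unittest import mock


@dataclass
class StubTextBlock:
    """Minimal stand-in exposing only the two fields the mocked tests read."""

    type: str = "text"
    text: str = ""


@contextmanager
def scoped_types_stub(package: str = "hypothetical_sdk") -> Iterator[types.ModuleType]:
    """Register `<package>.types.TextBlock` for the duration of the block only."""
    pkg = types.ModuleType(package)
    pkg.__path__ = []  # mark as a package so the dotted import resolves
    sub = types.ModuleType(f"{package}.types")
    sub.TextBlock = StubTextBlock
    pkg.types = sub
    # patch.dict restores sys.modules on exit, even if the body raises.
    with mock.patch.dict(sys.modules, {package: pkg, f"{package}.types": sub}):
        yield sub
```

Inside the `with` block, `from hypothetical_sdk.types import TextBlock` resolves to the stub class, so an isinstance check sees one consistent class object; once the block exits, neither module name remains in sys.modules.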

dogfood: yes

@scarmani

Collaborator Author

Grok independent model review

Reviewer: grok (xai) — independent adversarial model review via Grok Build CLI harness, grounded on the exact PR head.
Head: 231d49d (231d49d), committed 2026-06-17T07:15:31Z.
PR: #8461.
Model family: grok

Reviewing the diff in context: checking dependency alignment, the security advisory, and test isolation.
Verdict: PASS

No blocking issues. The two changes are small, internally consistent, and address real CI/security gaps.

  • [P3] scripts/ci_install_project.sh:24 - cryptography>=48.0.1,<50.0 correctly mirrors pyproject.toml [tool.uv] constraint-dependencies and closes GHSA-537c-gmf6-5ccf on the legacy pip install path; major-version jump (46→48+) may fail on very old OpenSSL 1.x base images, but that risk is already documented and exercised via uv.lock (49.0.0).
  • [P3] tests/routing/test_domain_matcher_root.py:437-441 - test_anthropic_fallback_is_not_registered_globally is skipped when the real anthropic package is installed (1 skipped locally), so the leak guard only runs in minimal-CI environments; acceptable, but it won't catch regressions on dev machines that have anthropic present.
  • [P3] Out of diff scope — docs/AGENT_OPERATING_CONTRACT.md and docs/deployment/UPGRADE_ROADMAP.md still cite cryptography 46.x floors; governance drift, not a functional blocker for this PR.

What looks correct

  • CI floor alignment removes the documented ci_install_project.sh / pyproject.toml mismatch that could leave legacy workflows on a vulnerable pin.
  • Anthropic stub is scoped via anthropic_text_block_stub + @pytest.mark.usefixtures to TestDomainDetectorLLM only; monkeypatch teardown prevents sys.modules pollution — the pattern matches domain_matcher.py's lazy from anthropic.types import TextBlock (line 475).
  • All 37 tests in tests/routing/test_domain_matcher_root.py pass.

dogfood: yes

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
@scarmani

Collaborator Author

Claude independent model review

Reviewer: claude (anthropic) — independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Head: b786dd1 (b786dd1), committed 2026-06-17T08:13:59Z.
PR: #8461.
Model family: claude

Verdict: PASS

  • [P2] scripts/ci_install_project.sh:24 — cryptography floor jumps a full major (46 → 48). Worth confirming pyproject.toml and any extras don't still cap cryptography at <48, otherwise the CI install will resolve a conflict or downgrade silently. The prior comment "aligned to pyproject [test]" is now stale and should be removed or re-stated.
  • [P3] docs/CAPABILITY_MATRIX.md / docs-site/docs/contributing/capability-matrix.md — CLI inventory bumped 110→111 while coverage stays at 43.2%. Math works only if the numerator also moved (e.g. 48/111). If the matrix is generated, regenerate; if hand-edited, double-check the implicit numerator hasn't drifted.
  • Test scoping in tests/routing/test_domain_matcher_root.py is clean: stub only installed inside TestDomainDetectorLLM via fixture, and the guard test verifies no sys.modules leakage when the real package is absent. No regression risk.
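
The coverage arithmetic in the [P3] note is quick to check. In this sketch the numerators are candidates only; the thread gives 48/111 as an example and does not state the matrix's actual numerator.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Coverage as a percentage rounded to one decimal place."""
    return round(100 * covered / total, 1)


# Candidate numerators around the reported 43.2% figure.
for covered, total in [(47, 110), (48, 110), (48, 111)]:
    print(f"{covered}/{total} = {coverage_percent(covered, total)}%")
# 47/110 = 42.7%
# 48/110 = 43.6%
# 48/111 = 43.2%
```

Of these candidates, only 48/111 rounds to 43.2%, which is consistent with the reviewer's point that the numerator must have moved along with the denominator.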

No blocking correctness or security issues.

dogfood: yes

@scarmani

Collaborator Author

Grok independent model review

Reviewer: grok (xai) — independent adversarial model review via Grok Build CLI harness, grounded on the exact PR head.
Head: b786dd1 (b786dd1), committed 2026-06-17T08:13:59Z.
PR: #8461.
Model family: grok

Reviewing the PR changes in context — I'll inspect the affected files and related dependencies.
Verdict: PASS

No blocking issues. The diff is internally consistent and the changes I could verify locally behave as intended.

  • [P2] tests/routing/test_domain_matcher_root.py::test_anthropic_fallback_is_not_registered_globally is skipped whenever anthropic is installed (skipif(not NEEDS_ANTHROPIC_TYPES_STUB)), so the main dev/CI path never exercises the leak guard; pollution is only checked on minimal-install runs.
  • [P2] tests/routing/test_domain_matcher_root.py — the leak test assumes it runs after TestDomainDetectorLLM; with pytest-randomly enabled (legacy CI installs it), it can run first and pass vacuously. Prefer a fixture finalizer/autouse teardown over definition-order reliance.
  • [P3] scripts/ci_install_project.sh — adds cryptography<50.0 while pyproject.toml only floors >=48.0.1; harmless today, but the script’s “lockstep” comment is slightly overstated.
  • [P3] tests/routing/test_domain_matcher_root.py — the dataclass TextBlock stub is enough for mocked isinstance checks, but it is not API-faithful to real anthropic.types.TextBlock; low risk because production paths use the real package.
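
One way to make a leak check independent of test order is to exercise the install and the teardown inside the same check and compare sys.modules before and after. The helper below is a hypothetical sketch of that idea, not the guard test in this PR:

```python
import sys
from typing import Callable, Set


def _matching(prefix: str) -> Set[str]:
    """Names in sys.modules that are `prefix` itself or one of its submodules."""
    return {
        name
        for name in list(sys.modules)  # copy: the mapping may change while iterating
        if name == prefix or name.startswith(prefix + ".")
    }


def leaked_modules(exercise: Callable[[], None], prefix: str) -> Set[str]:
    """Run `exercise` and return the matching modules it left behind."""
    before = _matching(prefix)
    exercise()
    return _matching(prefix) - before
```

A test built this way passes or fails on its own evidence, regardless of whether pytest-randomly schedules it before or after the LLM tests.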

Validated: capability-matrix docs match generator output; full test_domain_matcher_root.py passes (37 passed, 1 skipped); ci_install cryptography floor now matches the pyproject.toml GHSA-537c-gmf6-5ccf remediation.

dogfood: yes

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
@scarmani

Collaborator Author

Claude independent model review

Reviewer: claude (anthropic) — independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Head: ec02747 (ec02747), committed 2026-06-17T08:22:25Z.
PR: #8461.
Model family: claude

Verdict: PASS

  • No blocking issues. This is a low-risk docs/test/dep-floor sync.

Notes / non-blocking observations:

  • [P3] scripts/ci_install_project.sh:24 bumps cryptography to >=48.0.1,<50.0 — verified consistent with pyproject.toml:115 (cryptography>=48.0.1) and the in-file rationale comment (pyproject.toml:98-105) which documents GHSA-537c-gmf6-5ccf and absence of upstream caps. CI script's tighter <50.0 is a sane belt-and-suspenders bound, not a conflict.
  • [P3] tests/routing/test_domain_matcher_root.py — fallback @dataclass TextBlock(type, text) correctly stubs the minimal surface needed by mocked LLM tests; production aragora/routing/domain_matcher.py already imports anthropic lazily (lines 419, 475), so the stub only intercepts when those code paths run inside the TestDomainDetectorLLM class. The new guard test (test_anthropic_fallback_is_not_registered_globally) correctly verifies scoping — sys.modules monkeypatch will auto-revert at teardown. Good hygiene.
  • [P3] CLI count arithmetic is internally consistent: canonical 109→110, total invocations 110→111 (mission adds no alias), and both capability matrices reflect 111 (total invocations), matching prior convention.
  • No security regression, no public-API break, no protected files touched.

dogfood: yes

@scarmani

Collaborator Author

Grok independent model review

Reviewer: grok (xai) — independent adversarial model review via Grok Build CLI harness, grounded on the exact PR head.
Head: ec02747 (ec02747), committed 2026-06-17T08:22:25Z.
PR: #8461.
Model family: grok

Reviewing the PR diff against the codebase for correctness, security, and regression risks.
Verdict: PASS

No blocking production, security, or CI regression issues in the reviewed diff. Findings:

  • [P2] tests/routing/test_domain_matcher_root.py::test_anthropic_fallback_is_not_registered_globally — With pytest-randomly enabled, this can run before TestDomainDetectorLLM and pass vacuously, so it does not reliably prove the fixture teardown prevents sys.modules pollution after LLM tests.
  • [P2] Same test — @pytest.mark.skipif(not NEEDS_ANTHROPIC_TYPES_STUB, ...) skips entirely when anthropic is installed (common on dev machines); the leak guard never runs on that path.
  • [P3] docs/reference/CLI_REFERENCE.md / docs-site/docs/api/cli.md - mission is documented as “Run or manage native missions” with no subcommands, but aragora/cli/commands/mission.py only implements ingest/run (no manage/list/status); help text overstates capability (matches parser help, still misleading).
  • [P3] docs/CAPABILITY_MATRIX.md / docs-site/docs/contributing/capability-matrix.md — CLI count bumped to 111 but coverage stays 43.2%; minor staleness if mission is uncovered.

Validated as correct:

  • mission CLI registration exists (aragora/cli/_mission_parser.py); generate_cli_reference.py --check passes.
  • scripts/ci_install_project.sh cryptography floor >=48.0.1,<50.0 aligns with pyproject.toml and fixes GHSA-537c-gmf6-5ccf (patched in 48.0.1).
  • Anthropic stub is scoped to a fixture instead of module-level sys.modules injection; full test file passes (37 passed, 1 skipped).

dogfood: yes

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
@scarmani

Collaborator Author

Claude independent model review

Reviewer: claude (anthropic) — independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Head: eec039b (eec039b), committed 2026-06-17T08:27:44Z.
PR: #8461.
Model family: claude

Verdict: PASS

  • No blocking issues. The PR is overwhelmingly mechanical doc/metric regeneration plus two small functional touches that check out.
  • Verified scripts/ci_install_project.sh:24 cryptography>=48.0.1,<50.0 matches pyproject.toml:115 (cryptography>=48.0.1); no other config caps it below 48 — consistent, no install conflict.
  • Verified aragora/gti/ and tests/gti/ exist with 5 modules + 5 tests each, matching the new gti entry in aragora/module_tiers.yaml:312 (importer_count: 0 is plausible for a new isolated module).
  • Test fix in tests/routing/test_domain_matcher_root.py is well-scoped: ImportError fallback dataclass + anthropic_text_block_stub fixture only applied to TestDomainDetectorLLM; test_anthropic_fallback_is_not_registered_globally guards against sys.modules leakage. Production aragora/routing/domain_matcher.py still imports anthropic via its own optional path (lines 22, 419, 475), unaffected.
  • Notes (non-blocking):
    • [P3] module_tiers.yaml summary integrated: 91 / total: 143 — counted entries match (# --- integrated (91) --- header consistent), but no automated check shipped in this PR to keep summary in sync with module list. If you want to enforce it, add a one-liner verifier.
    • [P3] gti at tier integrated with importer_count: 0 — fine if it's surfaced via CLI/entry points, otherwise consider experimental until something imports it.
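
The verifier suggested in the first note could be as small as the sketch below. The data layout is an assumption: the real schema of aragora/module_tiers.yaml is not shown in this thread, so the function takes already-parsed mappings (module name to tier, and tier to declared count plus a "total" key) instead of reading the file.

```python
from collections import Counter
from typing import List, Mapping


def summary_mismatches(modules: Mapping[str, str], summary: Mapping[str, int]) -> List[str]:
    """Compare declared per-tier counts and the total against the module list.

    `modules` maps module name -> tier; `summary` maps tier -> declared count
    and carries the overall count under the key "total". Both shapes are assumed.
    """
    counted = Counter(modules.values())
    declared_tiers = set(summary) - {"total"}
    problems: List[str] = []
    for tier in sorted(set(counted) | declared_tiers):
        if counted.get(tier, 0) != summary.get(tier, 0):
            problems.append(
                f"{tier}: summary says {summary.get(tier, 0)}, counted {counted.get(tier, 0)}"
            )
    if summary.get("total", 0) != len(modules):
        problems.append(f"total: summary says {summary.get('total', 0)}, counted {len(modules)}")
    return problems
```

An empty list means the summary and the module list agree; a CI step could fail whenever the list is non-empty.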

dogfood: yes

@scarmani

Collaborator Author

Grok independent model review

Reviewer: grok (xai) — independent adversarial model review via Grok Build CLI harness, grounded on the exact PR head.
Head: eec039b (eec039b), committed 2026-06-17T08:27:44Z.
PR: #8461.
Model family: grok

Reviewing the PR diff against the codebase for correctness, security, and regression risks.
Verdict: PASS

No blocking issues in this diff. Findings:

  • [P2] tests/routing/test_domain_matcher_root.py::test_anthropic_fallback_is_not_registered_globally is skipped whenever anthropic is installed (@pytest.mark.skipif(not NEEDS_ANTHROPIC_TYPES_STUB, ...)). CI/test extras that pull in anthropic (e.g. via langchain) won’t catch a future reintroduction of module-level sys.modules stub injection; only minimal-env runs enforce the guard.
  • [P2] tests/routing/test_domain_matcher_root.py — The fallback @dataclass TextBlock is not the real anthropic.types.TextBlock. LLM tests can pass without anthropic while production domain_matcher._detect_via_llm uses the SDK type; API drift on TextBlock would not be caught in stub-only environments.
  • [P3] aragora/module_tiers.yaml (gti) — Classified integrated with importer_count: 0 (only test_file_count: 5). Matches regenerate_module_tiers.py --check, but overstates production wiring for tier-filtered surfaces.
  • [P3] docs/CAPABILITY_MATRIX.md / docs-site/docs/contributing/capability-matrix.md vs docs/reference/CLI_REFERENCE.md — Matrix reports 111 CLI commands (total invocations); CLI reference distinguishes 110 canonical / 111 with aliases. Counts are internally consistent with the bump, but the label “commands” remains ambiguous.
  • [P3] scripts/ci_install_project.sh vs pyproject.toml — Legacy install path pins cryptography>=48.0.1,<50.0; constraint-dependencies only floors >=48.0.1. Acceptable, but the two install paths differ slightly in upper-bound policy.

Verified OK: cryptography>=48.0.1 matches GHSA-537c-gmf6-5ccf patched version; aligned with pyproject.toml. Anthropic stub scoping + leak test pass under random ordering. module_tiers.yaml passes --check. CLI docs match generate_cli_reference.py --check. mission is wired in aragora/cli/parser.py. METRICS module/CLI counts match live git ls-files totals.

dogfood: yes

scarmani and others added 5 commits June 17, 2026 04:33
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
@github-actions

Contributor

OpenAPI Spec Update

The OpenAPI specification has changed. Please review the generated spec in the workflow artifacts.

Restore mandatory synthesis as the definitive final_answer so earlier placeholder answers cannot suppress generated synthesis.

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
@github-actions

Contributor

OpenAPI Spec Update

The OpenAPI specification has changed. Please review the generated spec in the workflow artifacts.
