chore(deps): update audit-vulnerable Python locks by scarmani · Pull Request #8460 · synaptent/aragora

scarmani · 2026-06-16T02:09:39Z

Summary

Update locked cryptography from 46.0.7 to 48.0.1.
Update locked starlette from 1.0.1 to 1.3.1.
Keep the repair lockfile-only to address the Python Security Scan blocker observed on PR fix(review-queue): block changes-requested evidence dissent #8454 without changing application code.

Validation

uv lock --check
uv run --with pip-audit python scripts/run_pip_audit_gate.py
bash scripts/automation_pr_preflight.sh origin/main HEAD

Notes

Direct python3 scripts/run_pip_audit_gate.py is not available in this local shell because the default Python lacks pip_audit; the same repo helper passes through an ephemeral uv run --with pip-audit environment.

Update locked cryptography and starlette versions to satisfy the Python Security Scan audit gate. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

github-actions · 2026-06-16T03:27:24Z

Aragora Code Review

Advisory-only review. Findings are surfaced for follow-up and do not fail this workflow.

Security Review: PR Analysis (Revised — Round 2, Skeptical Pass)

The Round 2 critique found "no significant errors" and offered one optional enhancement (acknowledging the duck-typing relaxation). That's fine, but a clean bill of health is exactly when a skeptic should push hardest. In this pass I re-examine my own prior findings adversarially — and I'm downgrading or qualifying several of them, because on closer inspection some rest on assumptions I cannot actually support from the diff. I'd rather correct my own overreach than carry forward findings that look rigorous but aren't.

Self-Challenge: Which of My Findings Actually Hold?

Finding #1 (`None` vs `[]`) — HOLDS, but my severity framing was inflated

I claimed a caller doing for domain, score in self._detect_with_llm(...) would raise TypeError. Let me challenge my own claim. I have never seen the caller. The only observable caller in this diff is detect(), exercised by the new tests, and those tests pass with return None (the empty-content test asserts keyword fallback succeeds). That is direct evidence that the actual production caller already tolerates None — otherwise the new test would raise, not fall back.

So my "TypeError waiting to happen" framing is speculative, conditioned on a hypothetical second caller I cannot demonstrate exists. The real, demonstrable defect is narrower and purely a contract/typing issue:

Category: Quality (type-contract); Security relevance is defense-in-depth at best
Severity: LOW (revised down from MEDIUM)
Issue: The signature declares -> list[tuple[str, float]]; two branches now return None. This is a genuine type-contract violation and may trip the mypy baseline. But the claim that it will cause a runtime crash is unsupported — the shipped tests prove the one visible caller handles None gracefully. The danger is purely latent: a future caller, or a caller in code I cannot see.
Why I downgraded: MEDIUM implied a likely runtime bug. The evidence (passing fallback tests) actively argues against an imminent runtime bug. Honest severity tracks demonstrable impact, and the demonstrable impact here is "type hint is now wrong," which is LOW. I was letting the category (security/robustness) inflate the severity.
Suggestion: Still revert both branches to return [] — it's a one-character-class fix that restores the contract at zero cost and keeps the genuinely good guards (if not response.content, duck-typed block check). Cheap to do, so worth doing, but not a blocker.

Finding #4 (starlette assertion) — HOLDS as a flag, but I overstated the failure risk

I said "if the lockfile change isn't already merged, this test will fail at CI." Challenging myself: the more likely reality is that the lockfile bump landed in a prior merged PR, and this assertion is simply catching up. Why? Because a competent author would not ship a self-failing test, and the PR also bumps cryptography — suggesting a coordinated dependency-hygiene pass where the lock was already updated upstream. My "will fail" framing presented the worst branch as if it were equiprobable with the benign one. It isn't; the benign reading is more probable. I retain the finding as a verify-the-coupling item (LOW), not a likely-breakage warning.

Finding #5 (`gti` zero importers) — WEAKER than I implied

Skeptical look: importer_count: 0 at integrated tier is genuinely odd, but I have no evidence of dead code, and I listed three possibilities without being able to discriminate among them. The presence of tests/gti/__init__.py (a scaffolded test package) is mild evidence against dead code. This is barely a finding — it's a "note to maintainer." I'm keeping it as INFORMATIONAL, not LOW, to stop dressing a question up as a defect.

Finding #6 (file-count math) — I should retire this

This is the one I'm most skeptical of in my own work. I built an arithmetic "residual" (+17 reported vs ~+13 visible) and implied possible manifest drift. Challenging myself: module_tiers.yaml plainly does not enumerate every module's py_files delta in a single PR view, and METRICS.md counts all aragora/ files including those in modules untouched-in-tier but still gaining files. A residual is the expected state, not an anomaly. Treating expected non-equality as a finding is a false-positive pattern. I'm retiring #6 to INFORMATIONAL with no action — chasing a 4-file residual across a manifest I can't regenerate is not a defensible use of reviewer attention.

Addressing the Round 2 Suggestion (duck-typing relaxation)

Accepted and incorporated. The critic is right that I never explicitly noted the type-guard relaxation. Worth stating plainly:

The change from isinstance(first_block, TextBlock) to getattr(first_block, "type", None) != "text" or not hasattr(first_block, "text") trades a strict nominal type check for a structural (duck-typed) one. This is a real tradeoff, not pure improvement:

Upside: removes a hard runtime import of anthropic.types.TextBlock, so the code no longer breaks if the SDK relocates/renames that symbol — and it adds the empty-content guard. Genuinely more robust against SDK churn.
Downside (skeptical note): duck-typing accepts any object presenting type == "text" and a .text attribute, even one that isn't a real TextBlock. In this LLM-response context that's harmless (the response object is trusted SDK output, not attacker-controlled), so it does not create an injection or type-confusion vulnerability. But it is a looser guard, and the looseness is now untested (see chore(deps): Bump the testing group in /aragora/live with 3 updates #3). Net: safe here, slightly weaker invariant.

Confirmed Findings (after skeptical pruning)

#2 — Cryptography floor bump. Unchanged and the strongest positive in the PR. The bump from cryptography>=46.0.7,<48.0 to >=48.0.1 is justified: the prior floor covered only the buffer-overflow CVE, while DNS-name-constraint and SECT-curve subgroup-validation issues require a higher floor. The removal of the upper bound is a deliberate-tradeoff judgment call (auto-receive patches vs. guard against breaking majors), not a defect. Optional: >=48.0.1,<49.0.

#3 — Missing test for the non-text-block branch. Holds firmly. The empty-content path is now tested; the duck-typed branch is not. Concrete fixture (already given last round) feeds SimpleNamespace(type="tool_use"), which lacks .text and thus exercises both arms of the or. This is the most actionable item in the PR because it locks in the new (and now looser) guard. LOW severity but high value-per-effort.

Revised Summary Table

#	Category	Severity	Item	Change this round
1	Quality (type-contract)	LOW ↓	`None` violates `list[...]` hint; runtime-crash claim is unsupported by visible tests	Downgraded MEDIUM→LOW
2	Security	INFORMATIONAL	cryptography bump justified; `<49.0` cap optional	Unchanged
3	Quality	LOW	Missing test for duck-typed non-text branch	Unchanged (top actionable)
4	Quality/Supply-chain	LOW	starlette assertion — verify lock coupling (benign reading more likely)	Softened failure framing
5	Quality	INFORMATIONAL ↓	`gti` zero-importer — a question, not a defect	Downgraded LOW→INFO
6	—	RETIRED	file-count residual is expected, not drift	Retired
—	Quality	INFORMATIONAL	duck-typing relaxes guard; safe here, slightly looser invariant	Added per critique

What I Changed and Why

Downgraded chore(deps): Bump the react group in /aragora/live with 4 updates #1 from MEDIUM to LOW. My own MEDIUM rested on a hypothetical caller. The shipped fallback tests are direct evidence that the visible caller tolerates None, which undercuts the "imminent runtime crash" framing. The defensible defect is the type-hint violation (LOW). I was inflating severity via category; corrected. The fix recommendation (revert to []) is unchanged because it's cheap and correct regardless.
Softened chore(deps): Update black requirement from <25.0,>=23.0 to >=23.0,<26.0 #4's failure framing. I had presented "will fail at CI" and "landed elsewhere" as equiprobable. The benign reading (lock already bumped in a prior PR) is more probable given a coordinated dependency pass. Kept as a verify-coupling note.
Downgraded chore(deps): Update redis requirement from <6.0,>=5.0.0 to >=5.0.0,<8.0 #5 to INFORMATIONAL. Zero importers is a question I can't resolve from the diff; the test scaffold mildly argues against dead code. Stop presenting an open question as a graded finding.
Retired chore(deps): Update elevenlabs requirement from <2.0,>=1.0 to >=1.0,<3.0 #6 entirely. A nonzero residual between METRICS.md's repo-wide file count and a non-exhaustive per-module YAML is the expected condition, not evidence of drift. This was a false-positive pattern in my own review and I'm removing it.
Added the duck-typing-relaxation note (Round 2 critique), and went further than the critic by stating why it's safe here (trusted SDK output, not attacker-controlled) rather than just asserting safety.
Held firm on chore(deps): Update pytest-asyncio requirement from <1.0,>=0.21 to >=0.21,<2.0 #2 and chore(deps): Bump the testing group in /aragora/live with 3 updates #3, which survive skeptical scrutiny: the crypto bump is a verified net positive, and the missing branch test is the single highest value-per-effort item in the PR.

Net effect of this skeptical pass: the PR is lower-risk than my Round 1 review implied. There is no demonstrable runtime defect — only one cheap type-hint fix (#1) and one cheap test addition (#3) worth making before merge, plus two items to verify (#4 lock coupling, #5 gti intent). I'd previously let finding category and inter-reviewer convergence nudge severities upward; convergence between two reviewers on a point does not make that point more true, only more agreed-upon, and I corrected for that bias here.

2 finding(s) across the diff

[CRITICAL] Finding

Finding

[CRITICAL] Finding

Finding

Generated by Aragora multi-agent code review

Regenerate metrics and module tier docs after the GTI additions, and package the new GTI tests so pytest collection does not collide with heterogeneity test modules. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

Use duck-typed text blocks in domain detection so mocked LLM responses do not require the optional Anthropic package during CI baseline collection. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

…deps-20260615

Regenerate docs/METRICS.md against the current origin/main merge tree so the PR metrics drift check matches GitHub's synthetic merge commit. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

Commit the generated docs-site contributing status update required by the Build Documentation check after syncing docs. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

…deps-20260615

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

…deps-20260615 # Conflicts: # uv.lock

Restore the current execution plan mirror, keep cryptography on the current 49.x lock resolution, and make domain matcher LLM fallback handle empty/non-text content. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

chore(deps): update audit-vulnerable python locks

2894a47

Update locked cryptography and starlette versions to satisfy the Python Security Scan audit gate. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

scarmani marked this pull request as ready for review June 16, 2026 02:19

scarmani requested a review from an0mium as a code owner June 16, 2026 02:19

fix(deps): align audit lock contract

9d9a5e7

scarmani and others added 5 commits June 15, 2026 22:45

fix(tests): refresh security audit generated drift

bc3cb80

Regenerate metrics and module tier docs after the GTI additions, and package the new GTI tests so pytest collection does not collide with heterogeneity test modules. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

fix(tests): avoid required anthropic import in domain routing baseline

b29aa48

Use duck-typed text blocks in domain detection so mocked LLM responses do not require the optional Anthropic package during CI baseline collection. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into codex/security-audit-…

b29102b

…deps-20260615

docs(metrics): refresh after main merge

13af3bb

Regenerate docs/METRICS.md against the current origin/main merge tree so the PR metrics drift check matches GitHub's synthetic merge commit. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

docs(site): sync contributing status

d974212

Commit the generated docs-site contributing status update required by the Build Documentation check after syncing docs. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

an0mium mentioned this pull request Jun 16, 2026

[automation] Founder Brief #6106

Open

scarmani and others added 5 commits June 16, 2026 10:17

fix(tests): package heterogeneity receipt tests

cadae14

Merge remote-tracking branch 'origin/main' into codex/security-audit-…

b40c4ec

…deps-20260615

docs(site): sync security audit docs after main merge

0597d35

Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into codex/security-audit-…

686f78e

…deps-20260615 # Conflicts: # uv.lock

fix(security): address audit deps review blockers

4e1d3b1

Restore the current execution plan mirror, keep cryptography on the current 49.x lock resolution, and make domain matcher LLM fallback handle empty/non-text content. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(deps): update audit-vulnerable Python locks#8460

chore(deps): update audit-vulnerable Python locks#8460
scarmani wants to merge 12 commits into
mainfrom
codex/security-audit-deps-20260615

scarmani commented Jun 16, 2026

Uh oh!

github-actions Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

scarmani commented Jun 16, 2026

Summary

Validation

Notes

Uh oh!

github-actions Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Aragora Code Review

Security Review: PR Analysis (Revised — Round 2, Skeptical Pass)

Self-Challenge: Which of My Findings Actually Hold?

Finding #1 (None vs []) — HOLDS, but my severity framing was inflated

Finding #4 (starlette assertion) — HOLDS as a flag, but I overstated the failure risk

Finding #5 (gti zero importers) — WEAKER than I implied

Finding #6 (file-count math) — I should retire this

Addressing the Round 2 Suggestion (duck-typing relaxation)

Confirmed Findings (after skeptical pruning)

Revised Summary Table

What I Changed and Why

[CRITICAL] Finding

[CRITICAL] Finding

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 16, 2026 •

edited

Loading

Finding #1 (`None` vs `[]`) — HOLDS, but my severity framing was inflated

Finding #5 (`gti` zero importers) — WEAKER than I implied