Skip to content

sec(enterprise): SECURITY.md + auth-leak fix + telemetry opt-out + stable Flask key#40

Open
dgokeeffe wants to merge 1 commit into
mainfrom
fix/enterprise-security-quick-wins
Open

sec(enterprise): SECURITY.md + auth-leak fix + telemetry opt-out + stable Flask key#40
dgokeeffe wants to merge 1 commit into
mainfrom
fix/enterprise-security-quick-wins

Conversation

@dgokeeffe
Copy link
Copy Markdown
Collaborator

Summary

Bundle of four enterprise-readiness improvements from an independent security review framed against NAB (APRA / CPS 234) and Coles Group (PCI-DSS / ISO 27001) procurement requirements. All four are low-risk and surgical; default behaviour is unchanged for non-enterprise deployments.

Together they close the document-review phase of any large-enterprise vendor security assessment. They do NOT close every finding from the review — the rest are tracked at the bottom of this PR description.

E-2 — Add .github/SECURITY.md (P0 compliance blocker)

Vendor security assessments (SIG / CAIQ) ask "do you have a documented vulnerability disclosure process?" The answer was no.

Now we have:

  • Two private disclosure channels (GitHub Security Advisories + email)
  • Acknowledgement / triage / fix-plan / disclosure timeline
  • Severity-based patch SLA (Critical: 7d, High: 14d, Medium: 30d)
  • Scope boundaries
  • Supply-chain controls summary for procurement reviewers
  • Pointer to docs/enterprise.md § known limits for the trade-offs already documented

E-4 — Remove /api/app-state from auth-exempt list (P1)

app.py:808 had /api/app-state in the same exempt list as /health. Unauthenticated callers could fetch app_owner email + last_rotation_iso + owner_resolved_at — enough to fingerprint session timing and identify the workspace owner.

The endpoint has no pre-auth use case (the polling endpoints the UI needs before PAT setup are /api/setup-status and /api/pat-status, both still exempt). Removed from the list, two regression tests added:

  • test_app_state_denied_for_non_owner
  • test_app_state_allowed_for_owner

E-9 — Telemetry opt-out via CODA_TELEMETRY_DISABLED (P2)

log_telemetry() now short-circuits when CODA_TELEMETRY_DISABLED is truthy. Default behaviour unchanged (telemetry still on by default).

Why this matters for procurement: regulated customers (banks, retailers with PCI-DSS) must inventory every outbound data flow from their workspace boundary. CoDA's User-Agent-header telemetry to Databricks servers was previously undisclosed and unsuppressible. Opt-out gives those customers a clean answer for their third-party-risk register.

app.yaml gets a commented CODA_TELEMETRY_DISABLED: \"true\" example. 5 new unit tests cover truthy/falsy parsing and that the background thread doesn't spawn when disabled.

E-11 — Stable Flask secret_key via FLASK_SECRET_KEY (P2)

app.py:56 used to be app.secret_key = os.urandom(24) — regenerated on every worker restart, invalidating Flask session cookies. With single-worker config today this is mostly cosmetic but it's a flagged finding in any key-management audit and would actively break sessions on multi-worker.

Now reads FLASK_SECRET_KEY from env (typically wired to a Databricks secret in app.yaml). Falls back to os.urandom for local dev with a WARNING log line so operators see when production is using an ephemeral key.

Test plan

  • uv run pytest tests/ — 244/244 pass
  • uv run pytest tests/test_auth_enforcement.py tests/test_telemetry_opt_out.py tests/test_app_state.py — 42/42 pass
  • Default behaviour unchanged (no new env vars set = exactly current behaviour)
  • Reviewer: confirm curl -i https://<app>/api/app-state from an unauth context now returns 403

What's still open from the review

Tracked for follow-up PRs / triage, not in this bundle:

Finding Severity Notes
E-1 SBOM on releases P0 Separate PR (#41 — being authored alongside this one)
E-3 Rate limit on /api/configure-pat P1 Needs slowapi wire-up
E-5 curl | bash Claude installer (no checksum) P1 Enterprise mode (#38) partially addresses; default path still unverified
E-6 requests = git+https://... PyPI proxy bypass P1 Blocked by Databricks internal PyPI proxy availability
E-7 MCP DEEPWIKI/EXA disable on main P1 Closes when #38 merges
E-8 Session-level command audit trail P1 Medium effort, structured logging
E-10 PAT blast radius P2 Architectural; mitigation is documentation
E-12 Hermes git pin on main P2 Closes when #39 merges
E-13 Content-filter proxy log retention P3 Add RotatingFileHandler
E-14 CSRF on /api/input P3 Medium effort, Flask-WTF or custom header
E-15 Data residency disclosure P3 Documentation

This pull request and its description were written by Isaac.

…able Flask key

Bundle of four enterprise-readiness improvements from an independent
security review against NAB (APRA / CPS 234) and Coles (PCI-DSS / ISO
27001) procurement requirements.

E-2: Add .github/SECURITY.md (P0 compliance blocker)
================================================================
Vendor security assessments (SIG / CAIQ) ask "do you have a documented
vulnerability disclosure process?" — answer was no. Now we have:
  - Two private disclosure channels (GitHub Security Advisories + email)
  - Acknowledgement / triage / fix-plan / disclosure timeline
  - Severity-based patch SLA (Critical: 7d, High: 14d, Medium: 30d)
  - Scope boundaries (in/out of scope)
  - Supply-chain controls summary for procurement reviewers
  - Pointer to docs/enterprise.md § known limits for the trade-offs
    we've deliberately accepted

E-4: Remove /api/app-state from auth-exempt list (P1)
================================================================
app.py:808 — /api/app-state was in the same exempt list as /health.
Unauthenticated callers could fetch app_owner email + last_rotation_iso
+ owner_resolved_at, which is enough to fingerprint session timing and
identify the workspace owner.

The endpoint has no pre-auth use case in the UI (the polling endpoints
the UI needs before PAT setup are /api/setup-status and /api/pat-status,
both still exempt). Removed from the list.

Tests:
  - test_app_state_denied_for_non_owner — non-owners get 403
  - test_app_state_allowed_for_owner — owner still gets 200

E-9: Telemetry opt-out via CODA_TELEMETRY_DISABLED (P2)
================================================================
telemetry.py — log_telemetry() now short-circuits when
CODA_TELEMETRY_DISABLED is truthy. Default behaviour unchanged (telemetry
still on by default).

Why this matters for procurement: regulated customers (banks, retailers
with PCI-DSS) must inventory every outbound data flow from their
workspace boundary. CoDA's User-Agent-header telemetry to Databricks
servers was previously undisclosed and unsuppressible. Opt-out gives
those customers a clean answer for their third-party-risk register.

app.yaml gets a commented `CODA_TELEMETRY_DISABLED: "true"` example so
operators see the knob. 5 new unit tests cover truthy/falsy parsing and
that the background thread does NOT spawn when disabled.

E-11: Stable Flask secret_key via FLASK_SECRET_KEY (P2)
================================================================
app.py:56 used to be `app.secret_key = os.urandom(24)` — regenerated on
every worker restart, invalidating all Flask session cookies. With
single-worker config today this is mostly cosmetic (sessions get torn
down anyway when PTYs die with the worker), but it's a flagged finding
in any key-management audit and would actively break sessions if we
ever scaled to multi-worker.

Now reads FLASK_SECRET_KEY from env (typically wired to a Databricks
secret in app.yaml). Falls back to os.urandom for local dev with a
WARNING log line so operators see when production is using an ephemeral
key. app.yaml gets a commented example.

Test results
================================================================
- 244/244 unit tests pass
- 42/42 in the auth-enforcement + telemetry-opt-out + app-state suites
- Default behaviour unchanged when no new env vars are set

Out of scope (separate PRs / future work)
================================================================
- SBOM generation in the release workflow — PR #41
- Rate limiting on /api/configure-pat (E-3) — needs slowapi wire-up
- Session-level command audit trail (E-8) — medium effort, structured logs
- CSRF protection on /api/input (E-14) — medium effort
- `requests = git+https://...` PyPI proxy bypass (E-6) — blocked by
  Databricks internal PyPI proxy availability

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant