
Suppress KeywordDetector false positives (0.5.1)#76

Open
debu-sinha wants to merge 2 commits into main from fix/keyword-detector-false-positives

@debu-sinha

Owner

Summary

Second false-positive pass over the top-50 MCP ecosystem scan, focused on the
credential scanner's detect-secrets KeywordDetector results. These produced ~73
medium "Secret Keyword" findings that were not secrets. Ships as 0.5.1.

Result on the same corpus: medium "Secret Keyword" 73 -> 22 (-70%). The
remaining matches are genuinely ambiguous (telemetry keys, datastore config,
config-schema prose) where further suppression would risk missing real secrets.

Changes

All fixes are scoped to KeywordDetector, so provider-key detectors (AWS, Private
Key, OpenAI, etc.) are never affected. Each has a regression test.

  • Suppress descriptive identifier values: enum/const strings and OAuth method
    names such as SecurityApiKey, clientSecretBasic, security.apiKey,
    CryptoService, password_auth. Detection splits the value on separators and
    camelCase boundaries and checks that every piece is a real word;
    high-entropy keys decompose into two-character fragments, fail that check,
    and are still reported.
  • Suppress validation-regex values (AIza[0-9A-Za-z-_]{35}, sk-[a-zA-Z0-9]{20})
    and CI templating expressions (${{ secrets.GITHUB_TOKEN }}).
  • Treat localization / i18n files (locales/, i18n/, *.lang.ts) as
    low-confidence context.
  • Regenerate the dashboard (grade A, 93/100).
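The identifier check in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: the function names, the separator set, and the "real word" rule (every token alphabetic and at least three characters long, standing in for whatever word test the scanner really applies) are all assumptions.

```python
import re

# Assumed separator set: dots, underscores, hyphens, slashes, colons, whitespace.
_SEPARATORS = re.compile(r"[._\-/:\s]+")
# camelCase / PascalCase tokenizer: acronym runs ("HTTP"), capitalized or
# lowercase words ("Secret", "client"), and digit runs ("2").
_CAMEL_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def word_tokens(part: str) -> list[str]:
    """Split one separator-free piece on camelCase boundaries."""
    return _CAMEL_TOKEN.findall(part)


def looks_like_identifier(value: str, min_word_len: int = 3) -> bool:
    """Hypothetical heuristic: True if `value` decomposes entirely into
    word-like tokens (alphabetic, >= min_word_len chars), as descriptive
    names such as "clientSecretBasic" do. Random keys break into short
    fragments and digit runs, so they return False and keep being reported."""
    parts = [p for p in _SEPARATORS.split(value) if p]
    if not parts:
        return False
    for part in parts:
        tokens = word_tokens(part)
        # Any character the tokenizer could not account for (e.g. "+", "=")
        # means this is not a plain identifier.
        if "".join(tokens) != part:
            return False
        if not all(t.isalpha() and len(t) >= min_word_len for t in tokens):
            return False
    return True
```

Under these assumptions, `looks_like_identifier("clientSecretBasic")` and `looks_like_identifier("password_auth")` are true, while a random-looking value such as `"aZ9kQ2mXp7LwR4tB"` splits into fragments like `a`, `Z`, `9`, `Xp` and is still reported. A fixed length threshold is the crudest possible word test; a dictionary lookup would be stricter at the cost of shipping a word list.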

Testing Performed

Local Unit

  • pytest tests/ -q -> 658 passed, 2 skipped, 3 xfailed (5 new tests).
    Guard tests confirm real high-entropy custom secrets, AWS keys, and private
    keys still flag.
  • ruff check src/ tests/ -> clean.

Read-only Smoke Test

  • Top-50 ecosystem rerun: medium Secret Keyword 73 -> 22; each remaining match
    spot-checked as ambiguous or legitimate, not a clear FP.

Risk & Rollback

Low. KeywordDetector-scoped suppression only; provider detectors untouched.
Guard tests prevent over-suppression. Rollback is a branch revert.

Docs

  • CHANGELOG 0.5.1 entry; version bumped in pyproject, __init__, and CITATION.

Deeper pass over the top-50 MCP scan surfaced ~73 medium "Secret Keyword"
findings that were not secrets. KeywordDetector captures the value after a
password=/secret=/token= keyword, which is often a descriptive identifier, a
validation regex, a CI expression, or translated UI text.
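To make the failure mode concrete, here is a deliberately simplified, hypothetical stand-in for keyword-style capture. It is not detect-secrets' actual KeywordDetector logic, which is considerably more elaborate; the pattern and the `captured_value` name are assumptions for illustration only.

```python
import re
from typing import Optional

# Simplified, hypothetical keyword pattern: a key name containing a secret-ish
# keyword, an "=" or ":" assignment, then a quoted or bare value.
_KEYWORD_VALUE = re.compile(
    r"""(?ix)
    (?:password|passwd|secret|token|api_?key)   # keyword somewhere in the key name
    [\w.-]*                                     # rest of the key name, if any
    \s*[:=]\s*                                  # "=" or YAML/JSON-style ":"
    (?:"([^"]*)"|'([^']*)'|([^\s,;]+))          # double-quoted, single-quoted, or bare value
    """
)


def captured_value(line: str) -> Optional[str]:
    """Return the value a keyword-style detector would flag on `line`, if any."""
    match = _KEYWORD_VALUE.search(line)
    if match is None:
        return None
    # Exactly one of the three value groups participates in a match.
    return next(g for g in match.groups() if g is not None)


for line in (
    'client_secret: "clientSecretBasic"',         # descriptive identifier
    'token = "${{ secrets.GITHUB_TOKEN }}"',      # CI templating expression
    "api_key_pattern = 'AIza[0-9A-Za-z-_]{35}'",  # validation regex
):
    print(captured_value(line))
```

Each captured value is a non-secret of exactly the kind the fixes target: the keyword sits in the key name, but the value is an identifier, an expression, or a pattern.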

Fixes (scoped to KeywordDetector so provider keys are never affected):
- Identifier values like SecurityApiKey, clientSecretBasic, security.apiKey,
  CryptoService, password_auth (split on separators + camelCase into real
  words; high-entropy keys decompose into 2-char fragments and are kept).
- Validation regex values like AIza[0-9A-Za-z-_]{35} and sk-[a-zA-Z0-9]{20}.
- CI templating expressions like ${{ secrets.GITHUB_TOKEN }}.
- Localization files (locales/, i18n/, *.lang.ts) treated as low confidence.
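The regex, CI-expression, and localization fixes, together with the KeywordDetector-only scoping, could look something like this sketch. Everything here is an assumption for illustration: the `Finding` record, the `triage` function and its three outcomes, and the exact patterns. The only value taken from the PR is the "Secret Keyword" finding type.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# KeywordDetector's finding type; every other type (AWS, Private Key, ...) is left alone.
KEYWORD_SECRET_TYPE = "Secret Keyword"

# A whole value of the form ${{ ... }} (GitHub Actions-style expression).
_CI_EXPRESSION = re.compile(r"\$\{\{.*\}\}", re.DOTALL)
# A character class followed by a quantifier, e.g. [0-9A-Za-z-_]{35} or [a-z]+.
_CLASS_WITH_QUANTIFIER = re.compile(r"\[[^\]]+\](?:\{\d+(?:,\d*)?\}|[+*?])")
_I18N_DIRS = {"locales", "i18n"}


@dataclass(frozen=True)
class Finding:
    secret_type: str
    filename: str  # assumed to use forward slashes
    value: str


def is_ci_expression(value: str) -> bool:
    return _CI_EXPRESSION.fullmatch(value.strip()) is not None


def is_validation_regex(value: str) -> bool:
    return _CLASS_WITH_QUANTIFIER.search(value) is not None


def is_localization_file(filename: str) -> bool:
    path = PurePosixPath(filename)
    in_i18n_dir = any(part in _I18N_DIRS for part in path.parts[:-1])
    return in_i18n_dir or path.name.endswith(".lang.ts")


def triage(finding: Finding) -> str:
    """Return "report", "suppress", or "low" (low-confidence) for a finding."""
    if finding.secret_type != KEYWORD_SECRET_TYPE:
        return "report"  # provider-key detectors are never touched
    if is_validation_regex(finding.value) or is_ci_expression(finding.value):
        return "suppress"
    if is_localization_file(finding.filename):
        return "low"
    return "report"
```

Returning a three-way verdict rather than a boolean keeps localization matches visible at low confidence instead of dropping them, and the early return on `secret_type` is what keeps AWS, Private Key, and other provider findings out of reach of every suppression rule.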

Adds regression tests with guards proving real high-entropy custom secrets,
AWS keys, and private keys still flag. Bumps to 0.5.1.
debu-sinha force-pushed the fix/keyword-detector-false-positives branch from 0ded821 to 1e1e7ac on June 16, 2026 04:59
@debu-sinha

Owner Author

Dropped the dashboard regeneration from this PR: the weekly MCP scan updated the dashboard on main in the meantime, causing conflicts on generated files. The dashboard will refresh with these 0.5.1 fixes on the next weekly run. This PR is now code-only (the credential scanner fixes + tests + version bump).
