Detect whitespace padding used to hide prompt-injection instructions (P9) #24

Open
korjavin wants to merge 12 commits into NVIDIA:main from korjavin:feat/detect-whitespace-padding-injection

@korjavin

Adds rule P9 "Whitespace Padding" under Prompt Injection, for issue #20. It detects padding that pushes injected instructions out of a reviewer's view while the agent still reads them.

P6 through P8 were taken by System Prompt Leakage, so this uses P9. One id covers all three signals; confidence carries the weighting.

Signals (all reported as P9):

  1. Vertical: 20+ consecutive blank or whitespace-only lines. MEDIUM, raised to HIGH when content follows a gap of 40+ lines. Confidence 0.8 with trailing content, 0.6 without.
  2. Horizontal: 80+ consecutive whitespace characters in a line, including leading indentation. MEDIUM, confidence 0.7.
  3. Ratio: a contiguous whitespace block over 2 KB, or whitespace over 90% of a file larger than 4 KB. LOW, confidence 0.4.
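
The three signals above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the constant names, the `Finding` dataclass, and the function names are hypothetical, `str.isspace()` stands in for the Unicode-aware classifier, and sizes are counted in characters rather than bytes.

```python
from dataclasses import dataclass

# Thresholds come from the signal list above; the names are hypothetical.
VERTICAL_MIN_LINES = 20
VERTICAL_HIGH_LINES = 40
HORIZONTAL_MIN_CHARS = 80
BLOCK_MAX_CHARS = 2 * 1024       # assumption: "2 KB" counted in characters
RATIO_MIN_FILE_CHARS = 4 * 1024  # assumption: "4 KB" counted in characters
RATIO_THRESHOLD = 0.9


@dataclass
class Finding:
    signal: str
    line: int          # 1-based line where the padding starts
    severity: str
    confidence: float


def is_blank(line):
    # Simplification: str.strip() stands in for the Unicode classifier.
    return line.strip() == ""


def vertical_findings(text):
    """Report runs of 20+ blank or whitespace-only lines."""
    lines = text.splitlines()
    findings = []
    i = 0
    while i < len(lines):
        if not is_blank(lines[i]):
            i += 1
            continue
        start = i
        while i < len(lines) and is_blank(lines[i]):
            i += 1
        run = i - start
        if run < VERTICAL_MIN_LINES:
            continue
        has_trailing = i < len(lines)  # something non-blank follows the gap
        high = has_trailing and run >= VERTICAL_HIGH_LINES
        findings.append(Finding("vertical", start + 1,
                                "HIGH" if high else "MEDIUM",
                                0.8 if has_trailing else 0.6))
    return findings


def horizontal_findings(text):
    """Report lines holding 80+ consecutive whitespace characters."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        run = 0
        for ch in line:
            run = run + 1 if ch.isspace() else 0
            if run == HORIZONTAL_MIN_CHARS:
                findings.append(Finding("horizontal", lineno, "MEDIUM", 0.7))
                break  # one finding per line is enough for this sketch
    return findings


def ratio_signal(text):
    """True for a whitespace block over 2 KB, or over 90% whitespace in a file over 4 KB."""
    longest = run = total = 0
    for ch in text:
        if ch.isspace():
            run += 1
            total += 1
            longest = max(longest, run)
        else:
            run = 0
    mostly_ws = len(text) > RATIO_MIN_FILE_CHARS and total / len(text) > RATIO_THRESHOLD
    return longest > BLOCK_MAX_CHARS or mostly_ws
```

Keeping severity and confidence next to each threshold makes the weighting easy to retune without touching the scanning loops.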

Whitespace is classified by Unicode category rather than by ASCII space/tab alone: controls (\t \n \r \v \f), categories Zs/Zl/Zp (U+00A0, U+2028, U+2029, U+3000, and so on), and the zero-width family (U+200B/C/D, U+2060, U+FEFF). The zero-width set is now a single shared constant (ZERO_WIDTH_CHARS) used by both P2's regex and the mcp_tool_poisoning zero-width check, so the two cannot drift; as a result, the MCP check now also catches U+2060 and U+FEFF.
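
As a rough illustration of this classification, the sketch below uses Python's standard `unicodedata` module. Only the name `ZERO_WIDTH_CHARS` comes from this description; its type, the other constants, and `is_padding_char` are assumptions made for the example.

```python
import re
import unicodedata

# Shared zero-width set; the frozenset type is an assumption of this sketch.
ZERO_WIDTH_CHARS = frozenset("\u200b\u200c\u200d\u2060\ufeff")

# Hypothetical helper constants.
CONTROL_WHITESPACE = frozenset("\t\n\r\v\f")
SEPARATOR_CATEGORIES = frozenset({"Zs", "Zl", "Zp"})

# One regex built from the shared constant, so every consumer sees the same set.
ZERO_WIDTH_RE = re.compile("[" + "".join(sorted(ZERO_WIDTH_CHARS)) + "]")


def is_padding_char(ch):
    """True for control whitespace, Unicode separators, and zero-width characters."""
    return (
        ch in CONTROL_WHITESPACE
        or ch in ZERO_WIDTH_CHARS
        or unicodedata.category(ch) in SEPARATOR_CATEGORIES
    )
```

The zero-width characters need the explicit set because Unicode assigns them to the format category (Cf) rather than to a separator category, and `str.isspace()` returns False for them.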

Each finding points at the line where padding starts and includes a visible snippet of what was hidden (for example U+00A0 x82 or \n x80).
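
One way such a snippet could be rendered is a run-length encoding of the padding. This is a sketch only: `describe_padding` and `visible_name` are hypothetical names, and the handling of mixed runs (joined with spaces) is an assumption, since the description shows only single-character examples.

```python
from itertools import groupby

# Control characters are shown as escapes; everything else as a U+XXXX code point.
_ESCAPES = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\v": "\\v", "\f": "\\f"}


def visible_name(ch):
    """Return a printable label for an otherwise invisible character."""
    return _ESCAPES.get(ch, f"U+{ord(ch):04X}")


def describe_padding(padding):
    """Run-length encode a padding string into a visible snippet."""
    return " ".join(
        f"{visible_name(ch)} x{len(list(group))}"
        for ch, group in groupby(padding)
    )


print(describe_padding("\u00a0" * 82))  # U+00A0 x82
print(describe_padding("\n" * 80))      # \n x80
```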

False-positive guards: the horizontal signal skips markdown fenced code; vendored files are skipped (*.min.js, *.min.css, *.lock, package-lock.json, yarn.lock, *.svg, *.map); the detector bails out on binary-ish content (anything containing U+FFFD); and the ratio signal never rises above LOW. Eval-dataset prose and files over 1 MB are already skipped upstream.
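
The guards could look roughly like the sketch below. The function and constant names are hypothetical, and the fence detection is deliberately naive (any line starting with three backticks or tildes toggles the fence state), which may differ from what the PR does.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns are from the description above; the constant name is hypothetical.
VENDORED_PATTERNS = (
    "*.min.js", "*.min.css", "*.lock",
    "package-lock.json", "yarn.lock", "*.svg", "*.map",
)

# Fence markers are built programmatically to keep this example easy to embed.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def should_skip(path, text):
    """Skip vendored files and binary-ish content (anything containing U+FFFD)."""
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, pattern) for pattern in VENDORED_PATTERNS):
        return True
    return "\ufffd" in text


def lines_outside_fences(text):
    """Yield (line_number, line) pairs that are not inside a markdown fence."""
    in_fence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
            continue
        if not in_fence:
            yield lineno, line
```

In this arrangement only the horizontal signal would iterate over `lines_outside_fences`, while the vertical signal would still see the whole file.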

MCP manifest description fields are covered by the same detector (horizontal and block signals; the per-file ratio signal is skipped since fields are short).

Tests cover all three signals at their thresholds, the full Unicode evasion set, the false-positive guards, and the MCP path. Thresholds are named constants, so they are easy to tune against a real corpus. Happy to adjust the signals or thresholds before merge.
