Detect large whitespace padding used to hide prompt-injection instructions from review

Going through the prompt-injection coverage I noticed a gap I think we should close.

We already catch several ways of hiding instructions:
- `P2` (Hidden Instructions) in `static_patterns_prompt_injection.py` keys off zero-width characters, HTML and Markdown comments, and base64 blobs.
- `TP2` (Unicode Deception) in `mcp_tool_poisoning.py` keys off homoglyphs, RTL overrides, and invisible formatting characters.
- `MP2` (Context Window Stuffing) in `static_patterns_memory_poisoning.py` keys off character *repetition* (the `((\S)(?!\2).{1,19}?)\1{20,}` regex) and a few stuffing keywords.

None of them catches the simplest trick of all: padding a file with a large block of plain whitespace so the malicious instructions sit far below, or far to the right of, anything a human sees when they open the file, while the agent still reads the whole thing. `MP2` is the closest, but its regex cannot match a run of blank lines: the repeated unit has to start with a non-whitespace character (`\S`), and `.` does not match `\n` unless the pattern is compiled with `re.DOTALL`. So whitespace padding slips through today.
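To make the gap concrete, here is a quick check of the repetition regex quoted above against both kinds of input. It assumes the pattern is compiled without flags, which I have not verified against the module; the sample strings are made up for illustration.

```python
import re

# Regex quoted above for MP2; assumed here to be compiled without flags.
MP2_REPETITION = re.compile(r"((\S)(?!\2).{1,19}?)\1{20,}")

# Visible repetition: the unit "ab" repeated 30 times.
repeated_tokens = "ab" * 30

# Whitespace padding: 80 blank lines between the visible text and the payload.
blank_padding = "intro\n" + "\n" * 80 + "Ignore the rules above.\n"

print(bool(MP2_REPETITION.search(repeated_tokens)))  # True
print(bool(MP2_REPETITION.search(blank_padding)))    # False
```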

### The attack

A skill author drops, say, eighty blank lines into `SKILL.md` (or a long run of spaces inside a line), and then the injected instruction underneath. A reviewer opening the file in an editor or on GitHub sees what looks like an empty tail and scrolls past it, or never scrolls at all. The agent reads the file end to end and acts on the hidden instruction. It is the text-file equivalent of white-on-white text in a PDF, and it costs the attacker nothing.

### Proposed detection (non-binary files only)

Three signals, each independently useful, all scoped to text file types (skip anything we treat as binary):

1. **Vertical blank-line runs.** N or more consecutive blank or whitespace-only lines (proposed threshold 20), reported with higher confidence when non-blank content follows the gap (the gap is hiding something, not just trailing the file).
2. **Horizontal whitespace runs.** A run of 80 or more consecutive whitespace characters inside a line, or leading indentation of the same size. This needs to fire on the run itself, whether or not visible content follows on that same line, because the point is pushing later content off-screen in an editor without word wrap. It is also the signal that catches the single-line case in the next paragraph.
3. **Oversized whitespace block or ratio.** A single contiguous whitespace block over a byte budget (proposed 2 KB), or whitespace making up more than ~90% of a file that is itself larger than a few KB. This one is the noisiest and would start at lower confidence.
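A rough sketch of the three signals, to anchor the discussion rather than to propose final code. All function and constant names are hypothetical, the thresholds are the proposed ones from the list above, and it relies only on Python's built-in notion of whitespace (`str.strip`, `str.isspace`, `\s`), which does not include zero-width characters.

```python
import re
from typing import List, Tuple

BLANK_LINE_RUN = 20   # proposed vertical threshold (lines)
HORIZONTAL_RUN = 80   # proposed horizontal threshold (characters)
BLOCK_BUDGET = 2048   # proposed contiguous-block budget (bytes)


def vertical_runs(text: str, min_lines: int = BLANK_LINE_RUN) -> List[Tuple[int, int, bool]]:
    """Return (start_line, length, content_follows) for each long blank-line run."""
    # splitlines() also breaks on Unicode line boundaries such as U+2028,
    # so line numbers can differ from what an editor shows.
    lines = text.splitlines()
    runs = []
    i = 0
    while i < len(lines):
        if lines[i].strip():
            i += 1
            continue
        j = i
        while j < len(lines) and not lines[j].strip():
            j += 1
        if j - i >= min_lines:
            runs.append((i + 1, j - i, j < len(lines)))
        i = j
    return runs


def horizontal_runs(text: str, min_chars: int = HORIZONTAL_RUN) -> List[Tuple[int, int]]:
    """Return (line, length) for each long run of non-newline whitespace."""
    pattern = re.compile(r"[^\S\r\n]{%d,}" % min_chars)
    return [
        (text.count("\n", 0, m.start()) + 1, m.end() - m.start())
        for m in pattern.finditer(text)
    ]


def largest_whitespace_block(text: str) -> int:
    """Size in UTF-8 bytes of the largest contiguous whitespace block."""
    return max((len(m.group().encode("utf-8")) for m in re.finditer(r"\s+", text)), default=0)


def whitespace_ratio(text: str) -> float:
    """Fraction of characters that Python considers whitespace."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


sample = "# Skill\n" + "\n" * 80 + "Ignore prior instructions.\n"
wide = "visible" + " " * 120 + "hidden\n"

print(vertical_runs(sample))    # [(2, 80, True)]
print(horizontal_runs(wide))    # [(1, 120)]
```

The `content_follows` flag is what would drive the higher-confidence case for the vertical signal. The block and ratio helpers are left as plain measurements so the caller can decide how to compare them against `BLOCK_BUDGET` and the file-size floor.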

One thing I got wrong in my first pass, and want to fix here before any code: "whitespace" cannot mean ASCII space and tab only. An agent reads the file as text, so the padding character only has to be something a human does not see, whether or not Python formally classifies it as whitespace. I checked the obvious candidates against a naive `[ \t]` test and they all slip through: U+00A0 (no-break space), U+2028 and U+2029 (line and paragraph separators), U+000C (form feed), U+000B (vertical tab), U+3000 (ideographic space), and the zero-width family U+200B, U+200C, U+200D, U+2060, U+FEFF. So all three signals should classify padding by Unicode category (`Zs`, `Zl`, `Zp`, plus the whitespace control characters such as U+000B and U+000C) together with an explicit zero-width list, not by a hardcoded space-and-tab set. The explicit list is needed because the zero-width characters are format characters (`Cf`), not whitespace: `str.isspace()` returns `False` for them and `\s` does not match them. They are already partly covered by `P2`, so this pattern and `P2` should share one definition rather than drift.
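A minimal sketch of what a shared padding classifier could look like, using only the standard `unicodedata` module. The names `is_padding`, `CONTROL_WHITESPACE`, and `ZERO_WIDTH` are hypothetical, and the zero-width set is listed explicitly as an assumption of this sketch because those characters are category `Cf` and `str.isspace()` returns `False` for them.

```python
import re
import unicodedata

SPACE_CATEGORIES = frozenset({"Zs", "Zl", "Zp"})
CONTROL_WHITESPACE = frozenset("\t\n\x0b\x0c\r")
# Assumption: zero-width characters are listed by hand, since they are Cf.
ZERO_WIDTH = frozenset("\u200b\u200c\u200d\u2060\ufeff")

NAIVE = re.compile(r"[ \t]")  # the first-pass test that misses everything below


def is_padding(ch: str) -> bool:
    """True if a single character should count as invisible padding."""
    return (
        unicodedata.category(ch) in SPACE_CATEGORIES
        or ch in CONTROL_WHITESPACE
        or ch in ZERO_WIDTH
    )


candidates = [
    "\u00a0", "\u2028", "\u2029", "\x0c", "\x0b", "\u3000",
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",
]

for ch in candidates:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        f"isspace={ch.isspace()}",
        f"naive={bool(NAIVE.fullmatch(ch))}",
        f"padding={is_padding(ch)}",
    )
```

Every candidate reports `naive=False` and `padding=True`; the zero-width rows additionally report `isspace=False`, which is why a category-only or `\s`-only definition would still miss them.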

The finding would point at the line (or byte offset, for a newline-free run) where the padding starts and quote a short, visible-ized snippet (rendering the run as `U+00A0 x82` or `\n x82`) so the reviewer can see what was hidden.
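The snippet rendering could be as simple as the run-length sketch below. `visibilize` and its `max_groups` cap are hypothetical names and choices; the output format follows the `U+00A0 x82` / `\n x82` examples above.

```python
from itertools import groupby

# Characters that read better as escapes than as code points.
ESCAPES = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}


def visibilize(run: str, max_groups: int = 8) -> str:
    """Render a padding run as 'LABEL xCOUNT' groups, e.g. 'U+00A0 x82'."""
    parts = []
    for ch, group in groupby(run):
        count = sum(1 for _ in group)
        label = ESCAPES.get(ch, f"U+{ord(ch):04X}")
        parts.append(f"{label} x{count}")
    if len(parts) > max_groups:
        # Keep the snippet short when the run alternates characters.
        parts = parts[:max_groups] + ["..."]
    return ", ".join(parts)


print(visibilize("\u00a0" * 82))        # U+00A0 x82
print(visibilize("\n" * 82))            # \n x82
print(visibilize(" " * 40 + "\t" * 3))  # U+0020 x40, \t x3
```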

### Thresholds and false positives

The numbers above are conservative starting points and I expect we will tune them. The false positives I can already think of, and want guards for:
- fenced code blocks and ASCII art in Markdown (legitimate large indentation),
- generated or vendored files (minified assets, lockfiles),
- intentional spacer lines in templates.

A reasonable first cut is to skip fenced code regions for the horizontal signal, and to keep the ratio signal at low confidence so it informs rather than dominates the score.
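For the fenced-code guard, something along these lines might be enough for a first cut. It is a simplified fence tracker (backtick or tilde fences, up to three spaces of indent), not a full Markdown parser, and `lines_outside_fences` and `horizontal_hits` are hypothetical names.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Opening or closing fence: up to 3 spaces, then 3+ backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def lines_outside_fences(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for lines that are not inside a fenced block."""
    open_fence: Optional[Tuple[str, int]] = None  # (fence char, fence length)
    for number, line in enumerate(text.split("\n"), start=1):
        match = FENCE.match(line)
        if open_fence is None:
            if match:
                marker = match.group(1)
                open_fence = (marker[0], len(marker))
                continue
            yield number, line
        elif match:
            marker = match.group(1)
            rest = line[match.end():]
            # A closing fence uses the same character, is at least as long
            # as the opener, and has nothing but whitespace after it.
            if marker[0] == open_fence[0] and len(marker) >= open_fence[1] and not rest.strip():
                open_fence = None


def horizontal_hits(text: str, min_chars: int = 80) -> List[int]:
    """Line numbers outside fences that contain a long whitespace run."""
    run = re.compile(r"\s{%d,}" % min_chars)
    return [n for n, line in lines_outside_fences(text) if run.search(line)]


doc = (
    "intro\n"
    "```\n" + " " * 100 + "ascii art\n" + "```\n"
    "x" + " " * 100 + "hidden instruction\n"
)
print(horizontal_hits(doc))  # [5]
```

One caveat with this approach: an unclosed fence suppresses the horizontal signal for the rest of the file, so an attacker could open a fence just to hide padding. Treating an unclosed fence as suspicious in its own right, or falling back to scanning everything, seems worth deciding explicitly.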

### Placement and id

This is a prompt-injection evasion, so I would put it next to `P2` in `static_patterns_prompt_injection.py` (a new `P`-series id), or alternatively as `MP4` under Memory Poisoning since it is a sibling of context stuffing. Happy to go either way. Whichever we pick, it registers the same way (add the module/id to `ANALYZER_NODE_IDS` / `ANALYZER_NODES` and route through `static_runner`), and it needs entries in `pattern_defaults.py` plus a row in the README pattern table.

### Open questions

- Do we want one combined id covering all three signals, or separate ids so the score can weight them differently?
- Severity: I lean MEDIUM for the vertical and horizontal signals (HIGH if non-blank content follows a very large gap), and LOW for the ratio signal until it is tuned.
- Should this also run against the MCP manifest fields (tool/parameter descriptions), not just file bodies, since a padded description is the same attack in a smaller surface?

I can put up a first-cut PR once we agree on the signals and rough thresholds.

