Going through the prompt-injection coverage I noticed a gap I think we should close.
We already catch several ways of hiding instructions:
P2 (Hidden Instructions) in static_patterns_prompt_injection.py keys off zero-width characters, HTML and Markdown comments, and base64 blobs.
TP2 (Unicode Deception) in mcp_tool_poisoning.py keys off homoglyphs, RTL overrides, and invisible formatting characters.
MP2 (Context Window Stuffing) in static_patterns_memory_poisoning.py keys off character repetition (the ((\S)(?!\2).{1,19}?)\1{20,} regex) and a few stuffing keywords.
None of them catch the simplest trick of all: padding a file with a large block of plain whitespace so the malicious instructions sit far below, or far to the right of, anything a human sees when they open the file, while the agent still reads the whole thing. MP2 is the closest, but a run of blank lines is not character repetition in the sense its regex matches (a blank line is \n against \n, not a visible \S against itself), so whitespace padding slips through today.
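A quick check of the MP2 regex quoted above illustrates the gap. This is only a sketch: the pattern is copied from this issue as written, and the two sample strings (including the payload text) are made up for illustration.

```python
import re

# Pattern as quoted in this issue for MP2 (Context Window Stuffing).
MP2_REPETITION = re.compile(r"((\S)(?!\2).{1,19}?)\1{20,}")

# Visible repetition: a short unit repeated many times.
stuffed = "ab" * 25

# Whitespace padding: eighty blank lines, then a hypothetical payload.
padded = "# My skill\n" + "\n" * 80 + "ignore previous instructions\n"

mp2_hits_stuffed = MP2_REPETITION.search(stuffed) is not None
mp2_hits_padded = MP2_REPETITION.search(padded) is not None

print("stuffed text matched:", mp2_hits_stuffed)
print("padded text matched:", mp2_hits_padded)
```

The repeated unit has to start with a `\S` character, so a run made only of newlines or spaces never gives the regex a starting point.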
The attack
A skill author drops, say, eighty blank lines into SKILL.md (or a long run of spaces inside a line), and then the injected instruction underneath. A reviewer opening the file in an editor or on GitHub sees what looks like an empty tail and scrolls past it, or never scrolls at all. The agent reads the file end to end and acts on the hidden instruction. It is the text-file equivalent of white-on-white text in a PDF, and it costs the attacker nothing.
Proposed detection (non-binary files only)
Three signals, each independently useful, all scoped to text file types (skip anything we treat as binary):
- Vertical blank-line runs. N or more consecutive blank or whitespace-only lines (proposed threshold 20), reported with higher confidence when non-blank content follows the gap (the gap is hiding something, not just trailing the file).
- Horizontal whitespace runs. A run of 80 or more consecutive whitespace characters inside a line, or leading indentation of the same size. This needs to fire on the run itself, whether or not visible content follows on that same line, because the point is pushing later content off-screen in an editor without word wrap. It is also the signal that catches the single-line (newline-free) case discussed below, where the padding never produces a \n at all.
- Oversized whitespace ratio. A single contiguous whitespace block over a byte budget (proposed 2 KB), or whitespace over ~90% of a file that is itself over a few KB. This one is the noisiest and would start at lower confidence.
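The first two signals could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Finding` tuple, the function names, and the stand-in `is_padding` helper are hypothetical and not existing code in the scanner. Only the thresholds (20 and 80) come from the proposal. The ratio signal is left out.

```python
from typing import NamedTuple

VERTICAL_THRESHOLD = 20    # blank lines, as proposed above
HORIZONTAL_THRESHOLD = 80  # padding characters in one line, as proposed above
ZERO_WIDTH = frozenset("\u200b\u200c\u200d\u2060\ufeff")


class Finding(NamedTuple):
    signal: str
    line: int              # 1-based line where the padding starts
    length: int            # lines (vertical) or characters (horizontal)
    content_follows: bool


def is_padding(ch: str) -> bool:
    # Stand-in definition; the real one should be shared with P2.
    return ch.isspace() or ch in ZERO_WIDTH


def is_blank(line: str) -> bool:
    return all(is_padding(ch) for ch in line)


def split_lines(text: str) -> list[str]:
    # Split on "\n" only so line numbers match what an editor shows.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start another line
    return lines


def vertical_runs(lines: list[str]) -> list[Finding]:
    findings, start = [], None
    for i, line in enumerate(lines):
        if is_blank(line):
            if start is None:
                start = i
            continue
        if start is not None and i - start >= VERTICAL_THRESHOLD:
            findings.append(Finding("vertical", start + 1, i - start, True))
        start = None
    if start is not None and len(lines) - start >= VERTICAL_THRESHOLD:
        # The gap runs to end of file: report it, but nothing follows it.
        findings.append(Finding("vertical", start + 1, len(lines) - start, False))
    return findings


def horizontal_runs(lines: list[str]) -> list[Finding]:
    findings = []
    for number, line in enumerate(lines, start=1):
        run = best = best_end = 0
        for pos, ch in enumerate(line):
            run = run + 1 if is_padding(ch) else 0
            if run > best:
                best, best_end = run, pos + 1
        if best >= HORIZONTAL_THRESHOLD:
            follows = any(not is_padding(ch) for ch in line[best_end:])
            findings.append(Finding("horizontal", number, best, follows))
    return findings


sample = ("# Skill\n" + "\n" * 25 + "do the hidden thing\n"
          + "visible" + " " * 100 + "hidden\n")
lines = split_lines(sample)
print(vertical_runs(lines))
print(horizontal_runs(lines))
```

Both functions report the run even when nothing follows it and record `content_follows` separately, so confidence can be raised later without changing what fires.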
One thing I got wrong in my first pass, and want to fix here before any code: "whitespace" cannot mean ASCII space and tab only. An agent reads the file as text, so the padding character only has to be something a human does not see, whether or not Python formally classifies it as whitespace. I checked the obvious candidates against a naive [ \t] test and they all slip through: U+00A0 (no-break space), U+2028 and U+2029 (line and paragraph separators), U+000C (form feed), U+000B (vertical tab), U+3000 (ideographic space), and the zero-width family U+200B, U+200C, U+200D, U+2060, U+FEFF. The two groups classify differently: the first six count as whitespace for Python's str.isspace(), but the zero-width family is general category Cf (format), so str.isspace() returns False for it and a Zs/Zl/Zp check alone misses it too. So all three signals should classify padding by Unicode category (Zs, Zl, Zp, the relevant Cc control characters such as U+000B and U+000C, and the Cf zero-width characters listed above), not by a hardcoded space-and-tab set. The zero-width characters are already partly covered by P2, so this pattern and P2 should share one definition rather than drift.
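The check described above can be reproduced with the standard library. This is a sketch, not the shared definition itself: `is_padding`, `CONTROL_WHITESPACE`, and `ZERO_WIDTH` are hypothetical names, and the exact set of control characters to include is an assumption. Note that the zero-width characters have general category Cf, so neither `str.isspace()` nor a Zs/Zl/Zp test flags them and they need an explicit entry.

```python
import re
import unicodedata

# Candidates listed in this issue.
CANDIDATES = [
    "\u00a0", "\u2028", "\u2029", "\u000c", "\u000b", "\u3000",
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",
]

NAIVE = re.compile(r"[ \t]")

SEPARATOR_CATEGORIES = frozenset({"Zs", "Zl", "Zp"})
# Assumption: which Cc characters count as padding is a judgment call.
CONTROL_WHITESPACE = frozenset("\t\n\x0b\x0c\r")
# Category Cf: invisible, but not whitespace to str.isspace().
ZERO_WIDTH = frozenset("\u200b\u200c\u200d\u2060\ufeff")


def is_padding(ch: str) -> bool:
    """Return True if a single character should count as padding."""
    return (
        unicodedata.category(ch) in SEPARATOR_CATEGORIES
        or ch in CONTROL_WHITESPACE
        or ch in ZERO_WIDTH
    )


missed_by_naive = [c for c in CANDIDATES if not NAIVE.fullmatch(c)]
missed_by_isspace = [c for c in CANDIDATES if not c.isspace()]
missed_by_is_padding = [c for c in CANDIDATES if not is_padding(c)]

for ch in CANDIDATES:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isspace(), is_padding(ch))
```

Keeping the zero-width set as explicit data rather than a broad "any Cf" rule avoids flagging legitimate format characters, at the cost of having to maintain the list alongside P2.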
The finding would point at the line (or byte offset, for a newline-free run) where the padding starts and quote a short snippet with the padding made visible (rendering the run as U+00A0 x82 or \n x82) so the reviewer can see what was hidden.
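One possible way to build that snippet is to collapse each run of a single padding character into a code-point label with a count. The function name `render_visible`, the `min_run` cutoff, and the square brackets around each label are assumptions added for readability. `is_padding` is the same stand-in used for illustration elsewhere, not the final shared definition.

```python
from itertools import groupby

ZERO_WIDTH = frozenset("\u200b\u200c\u200d\u2060\ufeff")


def is_padding(ch: str) -> bool:
    # Stand-in definition; the real one should be shared with P2.
    return ch.isspace() or ch in ZERO_WIDTH


def render_visible(text: str, min_run: int = 4) -> str:
    """Replace long runs of one padding character with a visible label."""
    parts = []
    for ch, group in groupby(text):
        count = len(list(group))
        if is_padding(ch) and count >= min_run:
            label = "\\n" if ch == "\n" else f"U+{ord(ch):04X}"
            parts.append(f"[{label} x{count}]")
        else:
            parts.append(ch * count)
    return "".join(parts)


print(render_visible("intro" + "\u00a0" * 82 + "payload"))
print(render_visible("intro" + "\n" * 82 + "payload"))
```

Because `groupby` only merges identical neighbours, padding that alternates between two characters is not collapsed by this sketch and would need a second pass.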
Thresholds and false positives
The numbers above are conservative starting points and I expect we will tune them. The false positives I can already think of, and want guards for:
- fenced code blocks and ASCII art in Markdown (legitimate large indentation),
- generated or vendored files (minified assets, lockfiles),
- intentional spacer lines in templates.
A reasonable first cut is to skip fenced code regions for the horizontal signal, and to keep the ratio signal at low confidence so it informs rather than dominates the score.
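Skipping fenced regions for the horizontal signal might look like the sketch below. The helper names are hypothetical and the fence handling is deliberately simplified: it recognises only three-character backtick or tilde fences indented by at most three spaces, and does not track longer fences or info strings.

```python
from typing import Iterator, Optional

# Built with multiplication so this example contains no literal fence line.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def fence_marker(line: str) -> Optional[str]:
    """Return the fence marker a line opens or closes, if any."""
    stripped = line.lstrip(" ")
    if len(line) - len(stripped) > 3:
        return None  # indented too far to count as a fence
    for marker in FENCE_MARKERS:
        if stripped.startswith(marker):
            return marker
    return None


def lines_outside_fences(text: str) -> Iterator[tuple[int, str]]:
    """Yield (line_number, line) for lines that are not inside a fence."""
    open_fence = None
    for number, line in enumerate(text.split("\n"), start=1):
        marker = fence_marker(line)
        if open_fence is None:
            if marker is not None:
                open_fence = marker
                continue
            yield number, line
        elif marker == open_fence:
            open_fence = None


fence = FENCE_MARKERS[0]
doc = "\n".join([
    "intro",
    fence,
    " " * 100 + "ascii art",        # legitimate wide indentation
    fence,
    "x" + " " * 100 + "hidden",     # still checked
])
checked = [number for number, _ in lines_outside_fences(doc)]
print(checked)
```

One caveat with this approach: a fence that is opened and never closed would exempt the rest of the file, so an unterminated fence should probably be treated as unfenced or reported on its own.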
Placement and id
This is a prompt-injection evasion, so I would put it next to P2 in static_patterns_prompt_injection.py (a new P-series id), or alternatively as MP4 under Memory Poisoning since it is a sibling of context stuffing. Happy to go either way. Whichever we pick, it registers the same way (add the module/id to ANALYZER_NODE_IDS / ANALYZER_NODES and route through static_runner), and it needs entries in pattern_defaults.py plus a row in the README pattern table.
Open questions
- Do we want one combined id covering all three signals, or separate ids so the score can weight them differently?
- Severity: I lean MEDIUM for the vertical and horizontal signals (HIGH if non-blank content follows a very large gap), and LOW for the ratio signal until it is tuned.
- Should this also run against the MCP manifest fields (tool/parameter descriptions), not just file bodies, since a padded description is the same attack in a smaller surface?
I can put up a first-cut PR once we agree on the signals and rough thresholds.