
OWASP Java HTML Sanitizer: gaps and inconsistencies (v20260102.1) #380

@ayanda327

Description


The following six issues were discovered while integrating owasp-java-html-sanitizer into a
production HTML sanitization pipeline for user-authored rich content. All findings were verified
empirically against the published jar using reflection probes and live sanitize() roundtrips.
They range from missing CSS primitives (fr unit, conic-gradient) to API contract gaps that
make it impossible to override DEFAULT schema entries without reflection or full schema rebuilds.

Environment

  • owasp-java-html-sanitizer: 20260102.1
  • Java: 21
  • Context: sanitizing user-authored HTML/CSS in form entries (rich content fields)

Issues

1. CssTokens.UNIT_TRIE missing fr unit

fr (the CSS Grid fractional unit, defined in CSS Grid Layout Module Level 1 and shipped in all
major browsers since 2017) is not recognized by CssTokens.isWellKnownUnit. The UNIT_TRIE static
initializer covers:

em, ex, ch, rem, vh, vw, vmin, vmax, px, mm, cm, in, pt, pc,
deg, rad, grad, turn, s, ms, hz, khz, dpi, dpcm, dppx

fr is absent. As a result, tokens like 1fr lex as BAD_DIMENSION and the schema rejects
them, silently dropping entire grid track definitions.

Reproduction:

PolicyFactory p = new HtmlPolicyBuilder()
    .allowElements("div").allowAttributes("style").globally()
    .allowStyling()
    .toFactory();

String input = "<div style=\"display:grid;grid-template-columns:1fr 280px\">x</div>";
System.out.println(p.sanitize(input));
// Actual:   <div style="display:grid;grid-template-columns:280px">x</div>
// Expected: <div style="display:grid;grid-template-columns:1fr 280px">x</div>

1fr is dropped; 280px survives because px is a known unit. The grid collapses to a
single-column layout with no sanitizer warning.

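Suggested fix: Add fr to the UNIT_TRIE static initializer in CssTokens.java. The spec defines fr
as its own <flex> type rather than a <length>, but it lexes as an ordinary dimension (a number
followed by a unit). Like %, it expresses a share of available space, and negative fr values are
invalid per spec.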
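To make the suggested one-entry change concrete, here is a minimal sketch of a unit check. It is a model under stated assumptions, not the library's implementation: it uses a plain Set and a simplified number regex instead of CssTokens' Trie and lexer, and the class and method names (UnitCheckSketch, isWellKnownDimension) are hypothetical.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical model of a "well-known unit" check. It is NOT the library's CssTokens code;
 * it only reuses the unit list quoted above and adds the proposed "fr".
 */
final class UnitCheckSketch {
  static final Set<String> KNOWN_UNITS = Set.of(
      "em", "ex", "ch", "rem", "vh", "vw", "vmin", "vmax",
      "px", "mm", "cm", "in", "pt", "pc",
      "deg", "rad", "grad", "turn", "s", "ms", "hz", "khz",
      "dpi", "dpcm", "dppx",
      "fr"); // proposed addition

  // Simplified number grammar (optional sign, digits, optional fraction; no exponents)
  // followed by an alphabetic unit. Sign validity is left to a later schema check.
  private static final Pattern DIMENSION =
      Pattern.compile("[+-]?(?:\\d+\\.?\\d*|\\.\\d+)([a-zA-Z]+)");

  /** True if the token is a number followed by a known unit, e.g. "1fr" or "2.5em". */
  static boolean isWellKnownDimension(String token) {
    Matcher m = DIMENSION.matcher(token);
    return m.matches() && KNOWN_UNITS.contains(m.group(1).toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    for (String token : new String[] {"1fr", "280px", "1fx"}) {
      System.out.println(token + " -> " + isWellKnownDimension(token));
    }
  }
}
```

With "fr" removed from KNOWN_UNITS, the same check rejects 1fr but still accepts 280px, which mirrors the behavior reported above.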


2. background-image DEFAULT literals omit inherit

The DEFAULT CssSchema entry for background-image has literals {",", "none"}; the keyword
inherit is missing. Among the properties compared below, only background-image and content lack
it, while list-style-image and cursor, which also accept URL values, include it:

| Property         | inherit in literals? |
|------------------|----------------------|
| background-color | yes                  |
| color            | yes                  |
| display          | yes                  |
| list-style-image | yes                  |
| cursor           | yes                  |
| background-image | no                   |
| content          | no                   |

Reproduction:

PolicyFactory p = new HtmlPolicyBuilder()
    .allowElements("div").allowAttributes("style").globally()
    .allowStyling()
    .toFactory();

// Stripped — background-image has no "inherit" literal:
System.out.println(p.sanitize("<div style=\"background-image:inherit\">x</div>"));
// Actual:   <div>x</div>
// Expected: <div style="background-image:inherit">x</div>

// Survives — list-style-image does have it:
System.out.println(p.sanitize("<div style=\"list-style-image:inherit\">x</div>"));
// Actual:   <div style="list-style-image:inherit">x</div>

Suggested fix: Add "inherit" to background-image's literal set in
CssSchema.DEFINITIONS. This is a one-line change consistent with the pattern already
established by neighboring properties.
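For a keyword-only value, the literal check amounts to set membership. The sketch below models it with hypothetical names (LiteralCheckSketch, allowsKeyword); the background-image set is the one quoted above, while the list-style-image set is an illustrative subset, not a copy of CssSchema.DEFINITIONS.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of a keyword-literal check. Only the background-image set is taken
 * from the issue; the list-style-image set is an illustrative subset.
 */
final class LiteralCheckSketch {
  static Map<String, Set<String>> literalsBeforeFix() {
    Map<String, Set<String>> literals = new HashMap<>();
    literals.put("background-image", Set.of(",", "none"));       // as reported for DEFAULT
    literals.put("list-style-image", Set.of("none", "inherit")); // illustrative subset
    return literals;
  }

  /** The proposed one-line change: add "inherit" to background-image. */
  static Map<String, Set<String>> literalsAfterFix() {
    Map<String, Set<String>> literals = literalsBeforeFix();
    Set<String> backgroundImage = new HashSet<>(literals.get("background-image"));
    backgroundImage.add("inherit");
    literals.put("background-image", Set.copyOf(backgroundImage));
    return literals;
  }

  /** A keyword-only value survives only if it is in the property's literal set. */
  static boolean allowsKeyword(Map<String, Set<String>> literals, String property, String keyword) {
    return literals.getOrDefault(property, Set.of()).contains(keyword);
  }

  public static void main(String[] args) {
    System.out.println(allowsKeyword(literalsBeforeFix(), "background-image", "inherit")); // false
    System.out.println(allowsKeyword(literalsBeforeFix(), "list-style-image", "inherit")); // true
    System.out.println(allowsKeyword(literalsAfterFix(), "background-image", "inherit"));  // true
  }
}
```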


3. allowUrlsInStyles(...) silently no-ops without allowUrlProtocols(...)

Calling allowUrlsInStyles(lambda) registers a styleUrlPolicy on the builder. When
compilePolicies() assembles the StylingPolicy, the URL rewriter chains the lambda with a
FilterUrlByProtocolAttributePolicy built from the builder's global allowedProtocols set. If
allowUrlProtocols(...) was never called at the builder level, that set is empty, and the
protocol filter rejects every URL — including URLs the lambda explicitly approved.

The result is that allowUrlsInStyles(IDENTITY_ATTRIBUTE_POLICY) appears to be a no-op when
allowUrlProtocols is absent. There is no warning, no exception, and no diagnostic output.

Reproduction:

PolicyFactory p = new HtmlPolicyBuilder()
    .allowElements("div").allowAttributes("style").globally()
    .allowStyling()
    .allowUrlsInStyles(AttributePolicy.IDENTITY_ATTRIBUTE_POLICY)
    // Without .allowUrlProtocols("https"), the line above has no effect.
    .toFactory();

String input = "<div style=\"background:url('https://example.com/img.png')\">x</div>";
System.out.println(p.sanitize(input));
// Actual:   <div>x</div>
// Expected: <div style="background:url('https://example.com/img.png')">x</div>

// Adding .allowUrlProtocols("https") makes it work as expected.

The Javadoc for allowUrlsInStyles does not mention this interaction.

Suggested fixes (any one of):

  1. Add a Javadoc note to allowUrlsInStyles that it requires at least one allowUrlProtocols
    call to take effect.
  2. Throw IllegalStateException at toFactory() time if allowUrlsInStyles was called but
    allowedProtocols is empty.
  3. When the caller supplies an explicit lambda to allowUrlsInStyles, skip the protocol-filter
    chain and treat the lambda as the sole arbiter of URL acceptance.
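A minimal model of the reported chaining, together with suggested fix 2, follows. It assumes the URL rewriter behaves like a logical AND of the caller's policy and a protocol allowlist, as described above; java.util.function.Predicate stands in for AttributePolicy, and all class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical model of a caller-supplied style URL policy combined with a protocol
 * allowlist. Not the library's code; all names are illustrative.
 */
final class StyleUrlChainSketch {

  /** Behavior as reported: a URL survives only if BOTH checks accept it. */
  static Predicate<String> chained(Predicate<String> callerPolicy, Set<String> allowedProtocols) {
    Objects.requireNonNull(callerPolicy);
    return url -> callerPolicy.test(url) && protocolAllowed(url, allowedProtocols);
  }

  /** Simplified protocol check; URLs without a scheme are rejected here, which may differ from the library. */
  static boolean protocolAllowed(String url, Set<String> allowedProtocols) {
    int colon = url.indexOf(':');
    return colon > 0
        && allowedProtocols.contains(url.substring(0, colon).toLowerCase(Locale.ROOT));
  }

  /** Suggested fix 2: fail fast when a style URL policy is set but no protocol is allowed. */
  static Predicate<String> buildOrThrow(Predicate<String> callerPolicy, Set<String> allowedProtocols) {
    if (allowedProtocols.isEmpty()) {
      throw new IllegalStateException(
          "allowUrlsInStyles(...) cannot accept any URL until allowUrlProtocols(...) is called");
    }
    return chained(callerPolicy, allowedProtocols);
  }

  public static void main(String[] args) {
    Predicate<String> acceptAll = url -> true; // stands in for IDENTITY_ATTRIBUTE_POLICY
    String url = "https://example.com/img.png";
    System.out.println(chained(acceptAll, Set.of()).test(url));        // false
    System.out.println(chained(acceptAll, Set.of("https")).test(url)); // true
    try {
      buildOrThrow(acceptAll, Set.of());
    } catch (IllegalStateException e) {
      System.out.println("rejected at build time: " + e.getMessage());
    }
  }
}
```

Option 2 trades a silent no-op for an early, explicit error; option 3 would instead drop the AND with the protocol filter whenever the caller supplies a policy.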

4. CssSchema.union() cannot extend a DEFAULT property

CssSchema.union(CssSchema...) throws "Duplicate irreconcilable definitions for <property>" when
two of the schemas define the same property key with differing bits, literals, or fn sets. This
makes it impossible to override a DEFAULT property without rebuilding the entire schema from
scratch via withProperties().

Common use cases that hit this wall include adding inherit to background-image (issue #2) and
letting background-image reference conic-gradient( (issue #6).

Reproduction:

import java.util.Map;
import java.util.Set;
import org.owasp.html.CssSchema;

// Attempt to add "inherit" to background-image's literals.
// Assumes the public CssSchema.Property(int bits, Set<String> literals,
// Map<String, String> fnKeys) constructor; bits = 0 and no fn keys keep it minimal.
CssSchema.Property backgroundImageWithInherit = new CssSchema.Property(
    0, Set.of(",", "none", "inherit"), Map.of());

Map<String, CssSchema.Property> overrides =
    Map.of("background-image", backgroundImageWithInherit);

// This throws: "Duplicate irreconcilable definitions for background-image"
CssSchema extended = CssSchema.union(CssSchema.DEFAULT, CssSchema.withProperties(overrides));

There is no unionWithOverride variant and no merge-strategy parameter.

Suggested fix: Add CssSchema.unionWithOverride(CssSchema base, CssSchema overrides) that
applies overrides entries on top of base entries when a conflict is detected, rather than
throwing. Alternatively, accept a MergeStrategy enum (THROW, PREFER_FIRST, PREFER_SECOND,
MERGE_LITERALS).
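The proposed merge strategies can be sketched over plain maps. This is a hypothetical illustration: Prop stands in for CssSchema.Property, and union(...) with a MergeStrategy is the proposal above, not an existing library API. MERGE_LITERALS is assumed here to OR the bit sets and union the literal sets and fn-key maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed MergeStrategy semantics. Prop stands in for
 * CssSchema.Property; none of these types or methods exist in the library.
 */
final class SchemaMergeSketch {
  record Prop(int bits, Set<String> literals, Map<String, String> fnKeys) {}

  enum MergeStrategy { THROW, PREFER_FIRST, PREFER_SECOND, MERGE_LITERALS }

  static Map<String, Prop> union(
      Map<String, Prop> first, Map<String, Prop> second, MergeStrategy strategy) {
    Map<String, Prop> out = new LinkedHashMap<>(first);
    for (Map.Entry<String, Prop> e : second.entrySet()) {
      String key = e.getKey();
      Prop existing = out.get(key);
      Prop incoming = e.getValue();
      if (existing == null || existing.equals(incoming)) {
        out.put(key, incoming); // no conflict
        continue;
      }
      switch (strategy) {
        case THROW -> throw new IllegalArgumentException(
            "Duplicate irreconcilable definitions for " + key); // mirrors today's union()
        case PREFER_FIRST -> { } // keep the existing entry
        case PREFER_SECOND -> out.put(key, incoming);
        case MERGE_LITERALS -> out.put(key, merge(existing, incoming));
      }
    }
    return out;
  }

  /** OR the bits, union the literals, and union the fn keys (the second map wins on a clash). */
  static Prop merge(Prop a, Prop b) {
    Set<String> literals = new HashSet<>(a.literals());
    literals.addAll(b.literals());
    Map<String, String> fnKeys = new HashMap<>(a.fnKeys());
    fnKeys.putAll(b.fnKeys());
    return new Prop(a.bits() | b.bits(), Set.copyOf(literals), Map.copyOf(fnKeys));
  }

  public static void main(String[] args) {
    Map<String, Prop> base =
        Map.of("background-image", new Prop(0, Set.of(",", "none"), Map.of()));
    Map<String, Prop> overrides =
        Map.of("background-image", new Prop(0, Set.of("inherit"), Map.of()));
    Prop merged = union(base, overrides, MergeStrategy.MERGE_LITERALS).get("background-image");
    System.out.println(merged.literals().equals(Set.of(",", "none", "inherit"))); // true
  }
}
```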


5. CssSchema.withProperties(Map) requires fn self-containment

withProperties(Map<String, Property>) throws "Property map is not self contained" if any
Property's fnKeys references a key not also present in the input map. As a result, a single
custom property that uses CSS functions (for example, a background-image Property that allows
linear-gradient()) cannot be built in isolation: the input map must also contain an entry for
every function key the property references (here, "linear-gradient(").

Combined with the union() constraint in issue #4, there is no clean API path to:

  1. Take the DEFAULT schema as a base.
  2. Override a single property that uses functions.
  3. Add new function entries alongside it.

Every attempt either throws on the union() call or requires the caller to reconstruct the
complete reachable closure of every property and function key.

Reproduction:

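import java.util.Map;
import java.util.Set;
import org.owasp.html.CssSchema;

// Attempting a minimal override: background-image that allows linear-gradient
// (assumes the public CssSchema.Property(int, Set, Map) constructor).
CssSchema.Property backgroundImage = new CssSchema.Property(
    0, Set.of(",", "none"), Map.of("linear-gradient(", "linear-gradient("));
// This throws "Property map is not self contained" because "linear-gradient(" is not
// a key in the map, even though it exists in CssSchema.DEFAULT.
CssSchema.withProperties(Map.of("background-image", backgroundImage));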

Suggested fix: Relax the self-containment check to allow fn keys that are already present in
a referenced base schema, or document the exact workaround (supply a complete map of all
properties reachable transitively via fn keys, mirroring whatever DEFAULT already provides).
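The relaxed check suggested above could resolve each fn key against either the override map or a base schema. Below is a minimal sketch with hypothetical types (SelfContainmentSketch, Prop, missingFnKeys), assuming the base schema is already self-contained so that only the overrides' direct references need checking.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a relaxed self-containment check. Prop stands in for
 * CssSchema.Property; the maps stand in for schemas. Not a library API.
 */
final class SelfContainmentSketch {
  record Prop(Set<String> literals, Map<String, String> fnKeys) {}

  /**
   * Returns fn-key targets that neither the overrides nor the base define. Assumes the base
   * is already self-contained, so only the overrides' own references need checking.
   */
  static Set<String> missingFnKeys(Map<String, Prop> overrides, Map<String, Prop> base) {
    Set<String> missing = new TreeSet<>();
    for (Prop p : overrides.values()) {
      for (String target : p.fnKeys().values()) {
        if (!overrides.containsKey(target) && !base.containsKey(target)) {
          missing.add(target);
        }
      }
    }
    return missing;
  }

  /** The strict rule reported for withProperties: every target must be in the map itself. */
  static Set<String> missingFnKeysStrict(Map<String, Prop> overrides) {
    return missingFnKeys(overrides, Map.of());
  }

  public static void main(String[] args) {
    // Illustrative base entry; the real DEFAULT contents are not reproduced here.
    Map<String, Prop> base = Map.of("linear-gradient(", new Prop(Set.of(), Map.of()));
    Map<String, Prop> overrides = Map.of("background-image",
        new Prop(Set.of(",", "none", "inherit"), Map.of("linear-gradient(", "linear-gradient(")));
    System.out.println(missingFnKeysStrict(overrides)); // [linear-gradient(]
    System.out.println(missingFnKeys(overrides, base)); // []
  }
}
```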


6. Modern CSS function gaps

The following CSS functions are valid per current specifications but are not present in the
DEFAULT allowed-function list:

| Function         | Specification             | Typical use     |
|------------------|---------------------------|-----------------|
| conic-gradient() | CSS Images Level 4 (2019) | Backgrounds     |
| repeat()         | CSS Grid Layout Level 1   | grid-template-* |
| minmax()         | CSS Grid Layout Level 1   | grid-template-* |

linear-gradient and radial-gradient are present in DEFAULT. Their conic-gradient
counterpart is not, making the gradient support asymmetric. repeat() and minmax() are the
two primary value-building functions for CSS Grid tracks; without them, authors cannot express
responsive grid templates even after the fr unit gap (issue #1) is addressed.

Reproduction:

PolicyFactory p = new HtmlPolicyBuilder()
    .allowElements("div").allowAttributes("style").globally()
    .allowStyling()
    .toFactory();

// conic-gradient stripped:
System.out.println(p.sanitize("<div style=\"background:conic-gradient(red, blue)\">x</div>"));
// Actual:   <div>x</div>
// Expected: <div style="background:conic-gradient(red, blue)">x</div>

// repeat() stripped — grid template collapses:
System.out.println(p.sanitize(
    "<div style=\"grid-template-columns:repeat(3, 1fr)\">x</div>"));
// Actual:   <div>x</div>
// Expected: <div style="grid-template-columns:repeat(3, 1fr)">x</div>

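Suggested fix: Add conic-gradient(, repeat(, and minmax( to the function definitions in
CssSchema.DEFINITIONS alongside linear-gradient( and radial-gradient(, and reference them from
the properties that use them (background and background-image for conic-gradient(;
grid-template-* for repeat( and minmax().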
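One rough way to see which functions a value depends on is to collect its function tokens and compare them with an allowlist. The sketch below is hypothetical: the regex is a simplification of CSS tokenization, and the CURRENT and PROPOSED sets are illustrative rather than read from CssSchema.DEFINITIONS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical model: list the function tokens in a CSS value and check them against an
 * allowlist. The sets below are illustrative, not CssSchema's actual function set.
 */
final class FunctionAllowlistSketch {
  // An identifier immediately followed by "(" is treated as a function token, e.g. "repeat(".
  private static final Pattern FUNCTION_TOKEN =
      Pattern.compile("(-?[A-Za-z_][A-Za-z0-9_-]*)\\(");

  static final Set<String> CURRENT = Set.of("linear-gradient(", "radial-gradient(");
  static final Set<String> PROPOSED = Set.of(
      "linear-gradient(", "radial-gradient(", "conic-gradient(", "repeat(", "minmax(");

  /** Function tokens in order of appearance, lower-cased, each ending in "(". */
  static List<String> functionTokens(String cssValue) {
    List<String> tokens = new ArrayList<>();
    Matcher m = FUNCTION_TOKEN.matcher(cssValue);
    while (m.find()) {
      tokens.add(m.group(1).toLowerCase(Locale.ROOT) + "(");
    }
    return tokens;
  }

  static boolean allFunctionsAllowed(String cssValue, Set<String> allowed) {
    return allowed.containsAll(functionTokens(cssValue));
  }

  public static void main(String[] args) {
    String grid = "repeat(3, minmax(100px, 1fr))";
    System.out.println(functionTokens(grid));               // [repeat(, minmax(]
    System.out.println(allFunctionsAllowed(grid, CURRENT));  // false
    System.out.println(allFunctionsAllowed(grid, PROPOSED)); // true
  }
}
```

This only checks function names; a value such as repeat(3, 1fr) would additionally need the fr unit from issue #1.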


Why these matter

Modern CSS is increasingly common in user-authored content, particularly in rich-text editors and
form-based authoring tools. CSS Grid (fr, repeat(), minmax()) has been supported in all major
browsers since 2017 and appears regularly in authored HTML. conic-gradient has been widely
supported since 2021. inherit is a valid value for background-image under CSS 2.1.

The API ergonomics issues (#3, #4, #5) compound the data gaps (#1, #2, #6): when downstream
users discover a missing value or unit, the natural response is to extend the DEFAULT schema. The
current API makes that extension path either silent-failing (#3) or structurally blocked (#4, #5).
Addressing the ergonomics issues would allow the ecosystem to self-correct without waiting for
upstream releases.
