HTML API: Auto-escape JavaScript and JSON script tag contents when necessary #10635

sirreal · 2025-12-15T19:13:52Z

The HTML API currently rejects script tag contents that may be dangerous. This is a proposal to detect JavaScript and JSON script tags and automatically escape contents when necessary.

JSON and JavaScript script tags may be detected according to the HTML standard.
Script tag contents are escaped only when <script or </script (case-insensitive) is found.
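
The trigger condition above can be sketched as a single case-insensitive test. This is an illustration in JavaScript only (the PR itself targets the PHP HTML API), and the helper name `needsScriptEscaping` is hypothetical rather than part of any API:

```javascript
// Hypothetical helper: reports whether script contents contain the
// case-insensitive sequences "<script" or "</script".
function needsScriptEscaping(contents) {
    return /<\/?script/i.test(contents);
}

needsScriptEscaping('console.log("hello");');    // false
needsScriptEscaping('var s = "</SCRIPT>";');     // true
needsScriptEscaping('if (a<script) { run(); }'); // true
```

Contents that never mention either sequence pass through untouched under this check, which is why most scripts would be unaffected.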

In JSON

< is replaced with \u003C. This eliminates the problematic strings and aligns with the approach described in #63851 and applied in r60681.

This is proposed as a simple character replacement with strtr. This should be highly performant. A less invasive replacement could be done to only replace < in <script or </script where it's really necessary. This would preserve more of the JSON string, but likely at the cost of performance. It would require a regular expression with case-insensitive matching (see JavaScript example).
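
As a rough illustration of the whole-contents replacement, here is a JavaScript sketch (the proposal itself uses PHP's strtr; the helper name `escapeJsonForScript` is made up for this example). It relies on the fact that in valid JSON a < can only occur inside a string, where \u003C decodes to the same character:

```javascript
// Hypothetical helper: replace every "<" with its JSON Unicode escape.
function escapeJsonForScript(json) {
    return json.replace(/</g, '\\u003C');
}

const original = { html: '</script><script>alert(1)</script>' };
const escapedJson = escapeJsonForScript(JSON.stringify(original));

escapedJson.includes('<');                       // false
JSON.parse(escapedJson).html === original.html;  // true
```

The escaped text no longer contains either problematic sequence, and parsing it yields the original value.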

In JavaScript

In <script and </script (when followed by a necessary tag termination character, one of \t\n\r\f/>), the s is replaced with its Unicode escape. The result should remain valid in all contexts where the text can appear and maintain identical behavior in all except a few edge cases (see ticket or quoted section below for full explanation and caveats).
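
A minimal sketch of this replacement, written in JavaScript for illustration and not taken from the PR. The helper name `escapeJavaScriptForScript` is hypothetical, and the terminator set used here (the characters listed above plus a space, which the HTML tokenizer also treats as ending a tag name) is an assumption about the intended behavior:

```javascript
// Hypothetical helper. Replaces the "s" (or "S") of "<script" / "</script"
// with a Unicode escape when a tag-terminating character follows.
// Assumption: terminators are space, \t, \n, \r, \f, "/" and ">".
function escapeJavaScriptForScript(js) {
    return js.replace(
        /(<\/?)(s)(?=cript[ \t\n\r\f\/>])/gi,
        (match, prefix, s) =>
            prefix + '\\u' + s.charCodeAt(0).toString(16).padStart(4, '0')
    );
}

const source = 'const closer = "</script>"; const opener = "<SCRIPT type=x>";';
const safe = escapeJavaScriptForScript(source);
// safe: const closer = "</\u0073cript>"; const opener = "<\u0053CRIPT type=x>";
```

The case of the replaced letter is preserved by escaping whichever of s or S was found, so string values stay identical after the engine processes the escape.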

From the ticket:

The HTML API prevents setting SCRIPT tag that could modify the tree either by closing the SCRIPT element prematurely, or by preventing the SCRIPT element from closing at the expected close tag.

This is handled by rejecting any script tag contents that are potentially dangerous, which is safe. There are some improvements that could be made.

If the contents are found to be unsafe and the type of the script tag is JSON or JavaScript (this is well specified in the HTML standard), it should be possible to apply a syntactic transformation to the contents in such a way that the script contents become safe without semantically altering the script.

If the HTML API can safely and automatically escape the majority of SCRIPT tag contents, it can then be used for SCRIPT tag creation and has the potential to eliminate the class of problem from #40737, #62797, and #63851. It also has the potential to address part of #51159 where SCRIPT tag escaping becomes less of an issue.

JSON

In JSON SCRIPT tags, the transformation is a simple replacement of < with its Unicode escape sequence \u003C. This can be applied to the entire contents of the script or specifically in case-insensitive matches for <script and </script.
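
The narrower option, escaping < only where it begins a case-insensitive <script or </script, might look like the following JavaScript sketch. The helper name `escapeJsonScriptSequences` is hypothetical and this is not the code from the PR:

```javascript
// Hypothetical helper: escape "<" only where it starts "<script" or "</script".
function escapeJsonScriptSequences(json) {
    return json.replace(/<(?=\/?script)/gi, '\\u003C');
}

const mixedJson = JSON.stringify({ a: '1 < 2', b: '</Script>' });
const targeted = escapeJsonScriptSequences(mixedJson);
// targeted: {"a":"1 < 2","b":"\u003C/Script>"}
```

Other uses of < in the JSON text, such as the comparison in the first value, are left as written.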

JavaScript

JavaScript SCRIPT tags are more difficult because the language has vastly more syntax. Fortunately, there is prior art described in this 2022 blog post from React team member Sophie Alpert. It's the same JavaScript SCRIPT tag contents escaping strategy that React continues to employ today. In summary, the problematic text <script and </script syntactically appear in places where Unicode escape sequences can be used in the script part (Strings, Identifiers, and RegExp literals). React takes the approach of replacing the s character, resulting in <\u0073cript or </\u0073cript, which is completely safe in a SCRIPT tag.
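
To make the three contexts concrete, this small sketch shows the escaped spelling behaving the same as the plain one in a string literal, an identifier, and a RegExp literal. The variable names are illustrative only:

```javascript
// String literal: the escape is processed, so both spellings are the same string.
const sameString = '</script>' === '</\u0073cript>';

// Identifier: \u0073cript is another spelling of the identifier `script`.
const script = 10;
const a = 5;
const plainComparison = a<script;
const escapedComparison = a<\u0073cript;

// RegExp literal: both patterns match the same text.
const plainMatch = /<\/script>/.test('</script>');
const escapedMatch = /<\/\u0073cript>/.test('</script>');
```

Each pair evaluates to the same result, which is the property the escaping strategy depends on.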

There are a few notable exceptions where the transformed JavaScript has observably different runtime behavior. These are the only examples I'm aware of. They're more esoteric parts of the language, and the low likelihood of them being used in inline JavaScript together with the problematic text sequences makes this seem an acceptable tradeoff to me to enable cheap, automatic JavaScript escaping.

String.raw does not process escape sequences.
'<script>' === '<\u0073cript>'; // true
String.raw`<script>` === String.raw`<\u0073cript>`; // false
Tagged templates can also access the raw strings, another form in which escape sequences are not processed.
function taggedCooked( strings ) {
    return strings[0];
}
taggedCooked`<script>` === taggedCooked`<\u0073cript>`; // true

function taggedRaw( strings ) {
    return strings.raw[0];
}
taggedRaw`<script>` === taggedRaw`<\u0073cript>`; // false
The source property of RegExp contains a string representation of the pattern. JavaScript RegExp literals support Unicode escape sequences, but the Unicode escape sequence is not transformed in the source.
const rPlain = /<script>/;
const rEscaped = /<\u0073cript>/;

rPlain.test('<script>'); // true
rEscaped.test('<script>'); // true

rPlain.source === rEscaped.source; // false
rPlain.source; // '<script>'
rEscaped.source; // '<\\u0073cript>'
Any better JavaScript escaping would likely require a complete JavaScript parser and much more invasive changes. It would be much more costly to perform. Even then, I'm not sure that the escaping could be done faithfully.

String.raw() could be split and joined:
String.raw`<script>` === String.raw`<s` + String.raw`cript>`; // true ✅
Tagged template raw and RegExp source seem much more challenging.

Trac ticket: https://core.trac.wordpress.org/ticket/64419

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions · 2025-12-15T19:34:26Z

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.