Fix: .md URLs with non-ASCII slugs (å/ä/ö) never resolve #39

Open
TBarregren wants to merge 1 commit into ProgressPlanner:main from Kntnt:fix/non-ascii-md-slugs
@TBarregren

Summary

Appending .md to a permalink whose slug contains non-ASCII characters (e.g. Swedish å, ä, ö) never returns Markdown — the request falls through to the normal HTML response or a 404. Plain ASCII slugs work fine.

Example (live):

  • https://safeteam.se/kunskapsmagasin/5-%c3%a5tgarder-mot-retail-st%c3%b6ld.md → HTML/404
  • https://safeteam.se/kunskapsmagasin/%c3%b6ka-tryggheten-f%c3%b6r-butikspersonal.md → HTML/404

Root cause

RewriteHandler::parse_markdown_url() reads the request path through sanitize_text_field():

$request_uri = isset($_SERVER['REQUEST_URI'])
    ? sanitize_text_field(wp_unslash($_SERVER['REQUEST_URI']))
    : '';
$path = wp_parse_url($request_uri, PHP_URL_PATH);

sanitize_text_field() (via core's _sanitize_text_fields()) deliberately strips every percent-encoded octet:

// Remove percent-encoded characters.
while ( preg_match( '/%[a-f0-9]{2}/i', $filtered, $match ) ) {
    $filtered = str_replace( $match[0], '', $filtered );
    ...
}

Non-ASCII URL characters are percent-encoded octets (å = %c3%a5, ä = %c3%a4, ö = %c3%b6), so they are silently deleted:

/kunskapsmagasin/5-%c3%a5tgarder-mot-retail-st%c3%b6ld.md
        → /kunskapsmagasin/5-tgarder-mot-retail-stld.md   (octets stripped)
        → slug "kunskapsmagasin/5-tgarder-mot-retail-stld"
        → url_to_postid(...) === 0   (no such post)
        → return;  // no markdown served
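
The trace above can be reproduced without WordPress. The following is a minimal sketch, assuming a hypothetical helper `strip_percent_octets()` that copies only the percent-octet loop quoted from `_sanitize_text_fields()`; it is not the full sanitizer and is not part of the plugin.

```php
<?php
// Hypothetical stand-in for the percent-octet removal quoted above from
// core's _sanitize_text_fields(). It reproduces only that loop.
function strip_percent_octets(string $value): string
{
    // Delete every %XX octet, repeating until no match remains.
    while (preg_match('/%[a-f0-9]{2}/i', $value, $match)) {
        $value = str_replace($match[0], '', $value);
    }
    return $value;
}

$path = '/kunskapsmagasin/5-%c3%a5tgarder-mot-retail-st%c3%b6ld.md';

// Octets removed: the slug no longer corresponds to the published post.
echo strip_percent_octets($path), PHP_EOL;
// prints "/kunskapsmagasin/5-tgarder-mot-retail-stld.md"

// Octets kept: decoding recovers the intended UTF-8 slug.
echo rawurldecode($path), PHP_EOL;
// prints "/kunskapsmagasin/5-åtgarder-mot-retail-stöld.md"
```

The second call is there only to show that the information needed to find the post is still present as long as the octets survive.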

The same sanitize_text_field() call is also used in handle_markdown_request(), where it corrupts the target of the /<slug>.md/ → /<slug>.md trailing-slash redirect for the same reason.
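
To illustrate the redirect problem, here is a rough sketch in plain PHP. The helper `markdown_redirect_target()` is hypothetical and only approximates what a trailing-slash redirect has to compute; the plugin's actual logic in handle_markdown_request() may differ. The stripping step uses a single `preg_replace()` pass as a simplified stand-in for the loop in core.

```php
<?php
// Hypothetical helper: compute the target of the "/<slug>.md/" to
// "/<slug>.md" redirect from a request path. Returns null when the path
// does not end in ".md/" and therefore needs no redirect.
function markdown_redirect_target(string $path): ?string
{
    if (substr($path, -4) !== '.md/') {
        return null;
    }
    // Drop exactly one trailing slash.
    return substr($path, 0, -1);
}

$raw = '/kunskapsmagasin/%c3%b6ka-tryggheten-f%c3%b6r-butikspersonal.md/';

// Simplified stand-in for the octet stripping (one pass, not core's loop).
$stripped = preg_replace('/%[a-f0-9]{2}/i', '', $raw);

// Target built from the stripped path points at a slug that does not exist.
echo markdown_redirect_target($stripped), PHP_EOL;
// prints "/kunskapsmagasin/ka-tryggheten-fr-butikspersonal.md"

// Target built from the untouched path keeps the encoded slug intact.
echo markdown_redirect_target($raw), PHP_EOL;
// prints "/kunskapsmagasin/%c3%b6ka-tryggheten-f%c3%b6r-butikspersonal.md"
```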

This is purely the plugin's own path-parsing. WordPress core resolves the corresponding HTML permalink correctly, because the post's post_name is stored percent-encoded (utf8_uri_encode()) and get_page_by_path() re-normalizes via rawurlencode(urldecode()) against a case-insensitive collation — a chain that works as long as the octets are still present when the lookup runs.

Fix

Use esc_url_raw() instead of sanitize_text_field() for the request URI in both spots. esc_url_raw() is the appropriate sanitizer for a URL/path and preserves valid percent-encoding, so the encoded slug survives to url_to_postid(), which then resolves it correctly. ASCII slugs are unaffected (no behavioral change for them).

How to verify

  1. On a site with pretty permalinks, publish a post/page whose slug contains å, ä, or ö (so the permalink contains %c3%a5 etc.).
  2. Before: request that permalink with .md appended → HTML or 404.
  3. After: the same .md URL returns Content-Type: text/markdown with the rendered Markdown.
  4. ASCII .md URLs continue to work unchanged.
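
Step 3 can also be checked from a script. This is only a sketch: `is_markdown_response()` is a hypothetical helper, and the URL in the usage comment is a placeholder, not a real site. It inspects headers in the shape returned by PHP's `get_headers($url, true)`.

```php
<?php
// Hypothetical check for step 3: do the response headers (in the
// associative form returned by get_headers($url, true)) declare
// Content-Type: text/markdown?
function is_markdown_response(array $headers): bool
{
    foreach ($headers as $name => $value) {
        // Skip the numeric status-line entries and unrelated headers.
        if (!is_string($name) || strcasecmp($name, 'Content-Type') !== 0) {
            continue;
        }
        // After redirects there may be one value per hop; use the last one.
        $values = is_array($value) ? $value : [$value];
        $last = (string) end($values);
        return stripos($last, 'text/markdown') === 0;
    }
    return false;
}

// Usage against a test site (performs a network request; placeholder URL):
// $headers = get_headers('https://example.test/%c3%a5-slug.md', true);
// var_dump($headers !== false && is_markdown_response($headers));
```

Taking the last Content-Type value matters for the trailing-slash case, where the first hop is a redirect and only the final response carries the Markdown.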

Notes

  • Scope is intentionally minimal (2 lines, one file). I've left the version bump and readme.txt changelog entry to you so it fits your release flow — happy to add a changelog line if you'd prefer it in the PR.
  • Happy to add a regression test if you point me at the test harness (I didn't find a tests/ directory in the repo).

🤖 Generated with Claude Code

RewriteHandler read the request path through sanitize_text_field(),
which deliberately strips every percent-encoded octet (%[a-f0-9]{2}).
Non-ASCII URL characters are percent-encoded octets (å=%c3%a5,
ä=%c3%a4, ö=%c3%b6), so they were silently deleted, mangling the slug
so url_to_postid() could never match and no Markdown was served.

Use esc_url_raw() instead, which preserves valid percent-encoding, in
both parse_markdown_url() and handle_markdown_request() (the latter
builds the /<slug>.md/ -> /<slug>.md trailing-slash redirect). ASCII
slugs are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TBarregren added a commit to Kntnt/markdown-alternate that referenced this pull request Jun 2, 2026
TBarregren added a commit to Kntnt/markdown-alternate that referenced this pull request Jun 2, 2026
Repoint update source, Plugin/Update URI, readme links and dev clone URL from
TBarregren/markdown-alternate to Kntnt/markdown-alternate after transferring the
repository to the Kntnt org. The contributor credit keeps Thomas Barregren's
personal profile link, since it attributes his upstream PRs (ProgressPlanner#39, ProgressPlanner#31).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TBarregren
Author

Heads-up to avoid confusion when verifying this PR: the two "Example (live)" URLs in the description now return HTTP 200 with Content-Type: text/markdown. That is not because main is fixed, but because I've since deployed a fork of this plugin (with exactly this patch) on safeteam.se. Those live URLs now exercise the patched code path, so they no longer reproduce the bug.

The bug itself is unchanged on upstream main: RewriteHandler::parse_markdown_url() (and handle_markdown_request()) still run the request URI through sanitize_text_field(), which strips percent-encoded octets, so any non-ASCII slug (å/ä/ö) still 404s on a stock build. To reproduce on main, append .md to any permalink whose slug contains such a character.

In other words, this is still a genuine bug — the live examples just no longer demonstrate it on that one site. Happy to add a regression test if you can point me at a test harness.
