Fix: .md URLs with non-ASCII slugs (å/ä/ö) never resolve #39
RewriteHandler read the request path through sanitize_text_field(),
which deliberately strips every percent-encoded octet (%[a-f0-9]{2}).
Non-ASCII URL characters are percent-encoded octets (å=%c3%a5,
ä=%c3%a4, ö=%c3%b6), so they were silently deleted, mangling the slug
so url_to_postid() could never match and no Markdown was served.
Use esc_url_raw() instead, which preserves valid percent-encoding, in
both parse_markdown_url() and handle_markdown_request() (the latter
builds the /<slug>.md/ -> /<slug>.md trailing-slash redirect). ASCII
slugs are unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ported from upstream PR ProgressPlanner#39 (squashed).
Repoint update source, Plugin/Update URI, readme links and dev clone URL from TBarregren/markdown-alternate to Kntnt/markdown-alternate after transferring the repository to the Kntnt org. The contributor credit keeps Thomas Barregren's personal profile link, since it attributes his upstream PRs (ProgressPlanner#39, ProgressPlanner#31).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Heads-up to avoid confusion when verifying this PR: the two "Example (live)" URLs in the description now return … The bug itself is unchanged on upstream … In other words, this is still a genuine bug; the live examples just no longer demonstrate it on that one site. Happy to add a regression test if you can point me at a test harness.
Summary
Appending .md to a permalink whose slug contains non-ASCII characters (e.g. Swedish å, ä, ö) never returns Markdown: the request falls through to the normal HTML response or a 404. Plain ASCII slugs work fine.

Example (live):
- https://safeteam.se/kunskapsmagasin/5-%c3%a5tgarder-mot-retail-st%c3%b6ld.md → HTML/404
- https://safeteam.se/kunskapsmagasin/%c3%b6ka-tryggheten-f%c3%b6r-butikspersonal.md → HTML/404

Root cause
RewriteHandler::parse_markdown_url() reads the request path through sanitize_text_field(). sanitize_text_field() (via core's _sanitize_text_fields()) deliberately strips every percent-encoded octet (%[a-f0-9]{2}).

Non-ASCII URL characters are percent-encoded octets (å = %c3%a5, ä = %c3%a4, ö = %c3%b6), so they are silently deleted, mangling the slug so that url_to_postid() can never match.

The same sanitize_text_field() call is also used in handle_markdown_request(), where it corrupts the target of the /<slug>.md/ → /<slug>.md trailing-slash redirect for the same reason.

This is purely the plugin's own path parsing. WordPress core resolves the corresponding HTML permalink correctly, because the post's post_name is stored percent-encoded (utf8_uri_encode()) and get_page_by_path() re-normalizes via rawurlencode(urldecode()) against a case-insensitive collation, a chain that works as long as the octets are still present when the lookup runs.

Fix
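Why the octets have to survive can be reproduced outside WordPress. The following is a minimal Python model of the octet-stripping step only, not the plugin's PHP: the helper name strip_percent_octets is hypothetical, the case-insensitive match and the repeat-until-stable loop are assumptions, and the real sanitize_text_field() performs other cleanup that is not modeled here.

```python
import re
from urllib.parse import unquote

# Pattern quoted in the commit message: %[a-f0-9]{2}.
# Matching case-insensitively is an assumption of this sketch.
PERCENT_OCTET = re.compile(r"%[a-f0-9]{2}", re.IGNORECASE)


def strip_percent_octets(path: str) -> str:
    """Hypothetical stand-in for the octet-stripping step only."""
    # Repeat until stable so that removing one octet cannot expose another.
    while PERCENT_OCTET.search(path):
        path = PERCENT_OCTET.sub("", path)
    return path


encoded = "/kunskapsmagasin/5-%c3%a5tgarder-mot-retail-st%c3%b6ld.md"

# Octets stripped: the slug is mangled and no longer names the post.
print(strip_percent_octets(encoded))

# Octets preserved: the path still decodes to the intended slug.
print(unquote(encoded))
```

The first line printed is /kunskapsmagasin/5-tgarder-mot-retail-stld.md, a slug that does not exist; the second is /kunskapsmagasin/5-åtgarder-mot-retail-stöld.md. A path with no percent-encoded octets passes through strip_percent_octets() unchanged, which matches the observation that ASCII slugs were never affected.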
Use esc_url_raw() instead of sanitize_text_field() for the request URI in both spots. esc_url_raw() is the appropriate sanitizer for a URL/path and preserves valid percent-encoding, so the encoded slug survives to url_to_postid(), which then resolves it correctly. ASCII slugs are unaffected (no behavioral change for them).

How to verify
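1. Create a post whose slug contains å, ä, or ö (so the permalink contains %c3%a5 etc.).
2. Before the fix: request the permalink with .md appended → HTML or 404.
3. After the fix: the same .md URL returns Content-Type: text/markdown with the rendered Markdown.
4. ASCII-slug .md URLs continue to work unchanged.

Notes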
- I left the readme.txt changelog entry to you so it fits your release flow; happy to add a changelog line if you'd prefer it in the PR.
- No automated test is included (there is no tests/ directory in the repo).

🤖 Generated with Claude Code