Skip to content

Move first-party HX push URL decisions to client-side to protect cacheability#11

Open
DmitryMatv wants to merge 4 commits intomainfrom
t3code/remove-push-url-cache-split
Open

Move first-party HX push URL decisions to client-side to protect cacheability#11
DmitryMatv wants to merge 4 commits intomainfrom
t3code/remove-push-url-cache-split

Conversation

@DmitryMatv
Copy link
Owner

@DmitryMatv DmitryMatv commented Mar 18, 2026

Summary

  • Moves first-party history/push URL targeting from server inference to client-side hx-push-url updates in classifier.ts.
  • Keeps fragment requests cache-safe by removing push_url from loader/form params and avoiding HX-Current-URL-driven server behavior.
  • Adds explicit hidden track_usage=true to the classifier form and preserves history-disabled initial loader behavior.
  • Refactors URL/history helpers in app/web.py (build_clean_classifier_url, resolve_push_url_enabled, resolve_page_title_enabled) and aligns title OOB behavior with explicit flags.
  • Expands frontend and backend test coverage for canonical params, push URL behavior, title swaps, and rendered page contracts.
  • Documents cache/history gotchas in AGENTS.md to prevent regressions.

Testing

  • Added/updated unit tests in app/assets/ts/classifier.test.ts for canonical form params, loader params, and dynamic hx-push-url behavior.
  • Added backend fragment history contract checks in tests/test_web_fragment_history.py for omitted/explicit push URL and title OOB behavior.
  • Added page rendering contract checks in tests/test_web_helpers.py for form/loader attributes and canonical hx-vals.
  • Not run in this PR-writing context.

Summary by CodeRabbit

  • New Features

    • Added a dedicated search endpoint and server-driven redirects to enforce canonical classifier page URLs.
  • Improvements

    • Canonical URL generation with Unicode/whitespace normalization and lossy-query preservation for stable clean URLs.
    • Form behavior updated to normalize inputs client-side, disable fragment push-to-history by default, and include explicit tracking on submissions.
  • Documentation

    • Clarified cache/first-party fragment behavior and guidance to avoid client-side URL prediction.
  • Tests

    • Expanded tests for normalization, canonicalization, fragment history, and navigation behavior.

- Keep fragment params canonical by removing `push_url` from loader/form payloads
- Set `hx-push-url="false"` in template and compute clean first-party history targets in `classifier.ts`
- Stop inferring history updates from `HX-Current-URL` server-side; honor explicit flags only
- Expand TS/Python tests to lock in cache-safe history/title behavior
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56a2445f8c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- stop client-side slug/push-url prediction and navigate submits to `/{classifier}/search`
- add server-side query/version/top_k normalization plus canonical 301/303 redirects
- update TS/Python tests for NFC + whitespace normalization and canonical URL behavior
- Add `q` fallback for long or lossy search text
- Keep initial loader text and redirects aligned with the preserved query
- Extend slug length coverage in helper tests
@coderabbitai
Copy link

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Client-side form submit interception and Unicode-aware query normalization were added; the classifier form now performs standard GET navigation to a new /.../search endpoint that 303-redirects to canonical slugged page URLs. HTMX fragment requests are history-disabled and canonicalization/lossy-query handling moved server-side.

Changes

Cohort / File(s) Summary
TypeScript Classifier Implementation
app/assets/ts/classifier.ts, app/assets/ts/classifier.test.ts
Added setupSearchNavigation() to intercept form[hx-get] submits (capture), normalize product_description (NFC + collapse/trim), construct navigation URL, and delegate to window.__classifierNavigate or location.assign. Introduced allowFragmentSubmit to allow HTMX-originated submits. Tests updated to use production-like form, hidden track_usage=true, loader hx-vals canonical params, and adjusted animation timing assertions. New tests assert canonical fragment params, loader history-disabled behavior, and server navigation for real submits/version changes.
TypeScript Globals & Test Setup
app/assets/ts/types/globals.ts, app/assets/ts/test/setup.ts
Added Window.__classifierNavigate?: (url: string) => void typing and clear it in test beforeEach.
HTML Template
app/templates/classifier_page.html
Changed form to standard GET (action="/.../search", method="get") while retaining hx-get/hx-target; added hx-push-url="false" and hidden track_usage=true; removed push_url from initial loader hx-vals.
Backend Web Logic
app/web.py
Added Unicode-aware normalization utilities (normalize_query_text, normalize_version, normalize_page_top_k) and canonical builders (build_clean_page_url, build_clean_classifier_url). Introduced SLUG_MAX_LENGTH and LOSSY_QUERY_PARAM (q). Added GET /{classifier_type}/search handler that normalizes inputs and 303-redirects to canonical page URL; page route now accepts q and 301-redirects non-canonical requests. Fragment route normalized inputs, removed server inference of HX-Push-Url from HX-Current-URL, and uses resolve_push_url_enabled/resolve_page_title_enabled. Normalized values used for rendering and cache-key inputs.
Tests & Helpers
tests/test_web_fragment_history.py, tests/test_web_helpers.py, tests/test_homepage_headers.py
Expanded test helpers to parse hx-vals and detect OOB <title>. Fragment-history tests accept request headers (e.g., HX-Current-URL) and assert presence/absence of HX-Push-Url and OOB title depending on track_usage. Added extensive normalization/slug/canonicalization tests (including NFC, whitespace collapse, Arabic preservation, 1000-char slug cap), redirect expectations for /.../search (303) and page canonicalization (301), and updated homepage header canonical assertion (removed trailing slash).
Documentation
AGENTS.md
Added constraints: require hidden track_usage=true, disallow server-side inference of HX-Push-Url from HX-Current-URL, forbid mirroring slugify() in frontend to predict URLs; instruct keeping fragment requests history-disabled and relying on server redirects for canonical URLs.

Sequence Diagram

sequenceDiagram
    actor User
    participant Form as Form (HTML)
    participant JS as JS (classifier.ts)
    participant Browser as Browser
    participant Server as Server (app/web.py)

    User->>Form: submit search
    Form->>JS: submit event (capture)
    JS->>JS: normalize product_description (NFC + collapse/trim)
    JS->>JS: build URL from FormData (exclude track_usage)
    JS->>Browser: navigate via window.__classifierNavigate or location.assign
    Browser->>Server: GET /{classifier_type}/search with params
    Server->>Server: normalize inputs, build canonical URL (may set q)
    Server->>Browser: 303 Redirect -> canonical page URL
    Browser->>Server: GET canonical page URL
    Server->>Server: render page with initial-loader hx-vals (canonical params, history-disabled)
    Server->>Browser: HTML response
    Browser->>Form: HTMX loads/swaps fragment (history unchanged)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hop and tidy queries bright,

NFC and spaces folded tight.
Forms now steer where servers reign,
Slugs stay true — no frontend feign.
Clean URLs bloom in morning light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective of the PR: moving first-party HX push URL decisions from server-side inference to client-side to protect cacheability.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch t3code/remove-push-url-cache-split
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch t3code/remove-push-url-cache-split

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@DmitryMatv
Copy link
Owner Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f1b67946e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 556 to +559
"url_params": {
"search": decoded_search_query,
"version": version if version and version != first_version else "",
"top_k": top_k,
"version": (
normalized_version if normalized_version != first_version else ""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep form query aligned with canonical lossy query

When a canonical URL carries the original text in q (for characters that cannot round-trip through the slug), this context still seeds url_params.search from decoded_search_query instead of the already-computed effective_query. That means pages like /NAICS/bolt_nut_washer?q=bolt+%26+nut+%2F+washer render results for bolt & nut / washer but populate the textarea with bolt nut washer, and the next top-k/version interaction resubmits the mutated text and drops q, changing the query unexpectedly.

Useful? React with 👍 / 👎.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/test_web_fragment_history.py (1)

75-79: Match the title element, not just any OOB swap.

These helpers currently pass on any hx-swap-oob="true" marker in the fragment, so an unrelated OOB partial could hide a title regression. Tightening the matcher to the <title> node would keep the assertion aligned with the helper name.

♻️ Tighten the helper to the title node
 def assert_title_oob_present(self, response: httpx.Response) -> None:
-        self.assertIn('hx-swap-oob="true"', response.text)
+        self.assertRegex(response.text, r'<title[^>]*hx-swap-oob="true"')

 def assert_title_oob_absent(self, response: httpx.Response) -> None:
-        self.assertNotIn('hx-swap-oob="true"', response.text)
+        self.assertNotRegex(response.text, r'<title[^>]*hx-swap-oob="true"')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_fragment_history.py` around lines 75 - 79, The helpers
assert_title_oob_present and assert_title_oob_absent should match an out-of-band
swap specifically on the <title> element rather than any occurrence of
hx-swap-oob="true"; update both to search for the title node with that attribute
(e.g., look for '<title hx-swap-oob="true"' or use an HTML parser to find a
<title> element with attribute 'hx-swap-oob' set to "true") and invert the
assertion in assert_title_oob_absent accordingly so the presence/absence check
targets the <title> element only.
tests/test_web_helpers.py (1)

20-46: Make the HTML extraction helpers attribute-order agnostic.

Line 22 and Line 32 hard-code incidental markup details like attribute ordering and single-quoted hx-vals. That will make a harmless template rewrite fail a large chunk of this suite without changing behavior. Parsing attributes semantically instead of regexing raw tags would keep these contract tests much less brittle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_helpers.py` around lines 20 - 46, The helper functions
extract_initial_loader_vals, extract_classifier_form_tag, and
extract_initial_loader_tag are brittle because they assume attribute order and
single quotes; replace the regex-with-order approach by parsing the markup with
an HTML parser (e.g., html.parser or BeautifulSoup), locate the element by
attribute name (find the <div> with data-initial-results-loader="true", find the
<form> that has an hx-get attribute), then read attributes semantically
(accepting single or double quotes) to return the tag string or the hx-vals
value (unescape and json.loads for extract_initial_loader_vals) so attribute
ordering or quoting changes won’t break tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_web_fragment_history.py`:
- Around line 75-79: The helpers assert_title_oob_present and
assert_title_oob_absent should match an out-of-band swap specifically on the
<title> element rather than any occurrence of hx-swap-oob="true"; update both to
search for the title node with that attribute (e.g., look for '<title
hx-swap-oob="true"' or use an HTML parser to find a <title> element with
attribute 'hx-swap-oob' set to "true") and invert the assertion in
assert_title_oob_absent accordingly so the presence/absence check targets the
<title> element only.

In `@tests/test_web_helpers.py`:
- Around line 20-46: The helper functions extract_initial_loader_vals,
extract_classifier_form_tag, and extract_initial_loader_tag are brittle because
they assume attribute order and single quotes; replace the regex-with-order
approach by parsing the markup with an HTML parser (e.g., html.parser or
BeautifulSoup), locate the element by attribute name (find the <div> with
data-initial-results-loader="true", find the <form> that has an hx-get
attribute), then read attributes semantically (accepting single or double
quotes) to return the tag string or the hx-vals value (unescape and json.loads
for extract_initial_loader_vals) so attribute ordering or quoting changes won’t
break tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72cd3665-e261-4a90-9d48-d91a94838da8

📥 Commits

Reviewing files that changed from the base of the PR and between 3a82bc7 and 9f1b679.

📒 Files selected for processing (10)
  • AGENTS.md
  • app/assets/ts/classifier.test.ts
  • app/assets/ts/classifier.ts
  • app/assets/ts/test/setup.ts
  • app/assets/ts/types/globals.ts
  • app/templates/classifier_page.html
  • app/web.py
  • tests/test_homepage_headers.py
  • tests/test_web_fragment_history.py
  • tests/test_web_helpers.py

- Stop varying fragment URLs on browser-local push state
- Tighten HTML-based tests for form and loader attributes
- Cap slug normalization input at 1000 characters
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
app/web.py (1)

496-500: Consider behavior with unexpected query parameters.

The canonical URL comparison strips unknown query parameters. If a request has unexpected params (e.g., UTM tracking), they'll trigger a redirect that removes them. This is likely intentional for cache consistency, but worth documenting if partners use tracking params.

The current behavior (stripping unknown params via redirect) is reasonable for cache consistency. If you need to preserve certain tracking parameters, consider explicitly handling them in build_clean_page_url.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/web.py` around lines 496 - 500, The redirect logic compares
requested_path_and_query (built from request.url.path and request.url.query)
against canonical_page_url and therefore strips any unexpected query params
(e.g., UTM) via the 301 redirect; update behavior by either documenting this in
the code/comments or modifying build_clean_page_url to accept a whitelist of
tracking params and include those allowed params when constructing
canonical_page_url so the comparison preserves them, and ensure the redirect
logic (where requested_path_and_query is compared to canonical_page_url) uses
the same canonicalization rules/whitelist to avoid dropping permitted tracking
parameters.
tests/test_web_helpers.py (2)

403-413: Potential URL encoding mismatch in long query assertion.

The assertion at line 413 checks for ?q={long_query} literally, but the redirect location header will have the query URL-encoded. Since long_query is just "x" * 1200, this works because x doesn't need encoding. However, this makes the test brittle if the test data changes.

💡 Optional: Use URL-encoded comparison for robustness
     async def test_search_redirect_preserves_long_query_beyond_slug_limit(self) -> None:
         long_query = "x" * 1200
         response = await self._request(
             "/NAICS/search",
             params={"product_description": long_query},
         )

         self.assertEqual(response.status_code, 303)
         location = response.headers["location"]
         self.assertTrue(location.startswith("/NAICS/" + ("x" * 1000)))
-        self.assertIn(f"?q={long_query}", location)
+        self.assertIn(f"q={long_query}", location)  # 'x' chars don't need encoding
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_helpers.py` around lines 403 - 413, The test
test_search_redirect_preserves_long_query_beyond_slug_limit assumes the raw
long_query appears verbatim in the Location header but real redirects URL-encode
query params; update the assertion to handle encoding by either encoding
long_query with urllib.parse.quote (or quote_plus) before checking presence in
location, or better parse the redirect URL with urllib.parse.urlparse and
urllib.parse.parse_qs and assert the parsed "q" parameter equals long_query;
reference variables/function names:
test_search_redirect_preserves_long_query_beyond_slug_limit, long_query,
response, location.

47-58: Consider adding context to JSON parse errors.

If hx-vals contains malformed JSON, json.loads() will raise a generic JSONDecodeError. In test helpers, wrapping this with additional context could improve debuggability.

💡 Optional: Add context to JSON parse errors
     hx_vals = attrs.get("hx-vals")
     if hx_vals is None:
         raise AssertionError("Initial results loader hx-vals not found")
-    return json.loads(hx_vals)
+    try:
+        return json.loads(hx_vals)
+    except json.JSONDecodeError as e:
+        raise AssertionError(f"Invalid JSON in hx-vals: {hx_vals!r}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_helpers.py` around lines 47 - 58, The helper
extract_initial_loader_vals currently calls json.loads(hx_vals) which will raise
a generic JSONDecodeError on malformed input; update extract_initial_loader_vals
to catch json.JSONDecodeError (import json if not already) around the json.loads
call and raise an AssertionError with additional context that includes the
failing hx_vals (or a truncated preview) and the original error message to make
failures from find_tag_attrs/attrs.get("hx-vals") easier to debug.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@app/web.py`:
- Around line 496-500: The redirect logic compares requested_path_and_query
(built from request.url.path and request.url.query) against canonical_page_url
and therefore strips any unexpected query params (e.g., UTM) via the 301
redirect; update behavior by either documenting this in the code/comments or
modifying build_clean_page_url to accept a whitelist of tracking params and
include those allowed params when constructing canonical_page_url so the
comparison preserves them, and ensure the redirect logic (where
requested_path_and_query is compared to canonical_page_url) uses the same
canonicalization rules/whitelist to avoid dropping permitted tracking
parameters.

In `@tests/test_web_helpers.py`:
- Around line 403-413: The test
test_search_redirect_preserves_long_query_beyond_slug_limit assumes the raw
long_query appears verbatim in the Location header but real redirects URL-encode
query params; update the assertion to handle encoding by either encoding
long_query with urllib.parse.quote (or quote_plus) before checking presence in
location, or better parse the redirect URL with urllib.parse.urlparse and
urllib.parse.parse_qs and assert the parsed "q" parameter equals long_query;
reference variables/function names:
test_search_redirect_preserves_long_query_beyond_slug_limit, long_query,
response, location.
- Around line 47-58: The helper extract_initial_loader_vals currently calls
json.loads(hx_vals) which will raise a generic JSONDecodeError on malformed
input; update extract_initial_loader_vals to catch json.JSONDecodeError (import
json if not already) around the json.loads call and raise an AssertionError with
additional context that includes the failing hx_vals (or a truncated preview)
and the original error message to make failures from
find_tag_attrs/attrs.get("hx-vals") easier to debug.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 539974d7-0783-4291-ab99-519bcb7f1c8c

📥 Commits

Reviewing files that changed from the base of the PR and between 9f1b679 and fefdb8e.

📒 Files selected for processing (3)
  • app/web.py
  • tests/test_web_fragment_history.py
  • tests/test_web_helpers.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant