Move first-party HX push URL decisions to client-side to protect cacheability by DmitryMatv · Pull Request #11 · DmitryMatv/classifast

DmitryMatv · 2026-03-18T13:31:42Z

Summary

Moves first-party history/push URL targeting from server inference to client-side hx-push-url updates in classifier.ts.
Keeps fragment requests cache-safe by removing push_url from loader/form params and avoiding HX-Current-URL-driven server behavior.
Adds explicit hidden track_usage=true to the classifier form and preserves history-disabled initial loader behavior.
Refactors URL/history helpers in app/web.py (build_clean_classifier_url, resolve_push_url_enabled, resolve_page_title_enabled) and aligns title OOB behavior with explicit flags.
Expands frontend and backend test coverage for canonical params, push URL behavior, title swaps, and rendered page contracts.
Documents cache/history gotchas in AGENTS.md to prevent regressions.

Testing

Added/updated unit tests in app/assets/ts/classifier.test.ts for canonical form params, loader params, and dynamic hx-push-url behavior.
Added backend fragment history contract checks in tests/test_web_fragment_history.py for omitted/explicit push URL and title OOB behavior.
Added page rendering contract checks in tests/test_web_helpers.py for form/loader attributes and canonical hx-vals.
Not run in this PR-writing context.

Summary by CodeRabbit

New Features
- Added a dedicated search endpoint and server-driven redirects to enforce canonical classifier page URLs.
Improvements
- Canonical URL generation with Unicode/whitespace normalization and lossy-query preservation for stable clean URLs.
- Form behavior updated to normalize inputs client-side, disable fragment push-to-history by default, and include explicit tracking on submissions.
Documentation
- Clarified cache/first-party fragment behavior and guidance to avoid client-side URL prediction.
Tests
- Expanded tests for normalization, canonicalization, fragment history, and navigation behavior.

- Keep fragment params canonical by removing `push_url` from loader/form payloads - Set `hx-push-url="false"` in template and compute clean first-party history targets in `classifier.ts` - Stop inferring history updates from `HX-Current-URL` server-side; honor explicit flags only - Expand TS/Python tests to lock in cache-safe history/title behavior

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56a2445f8c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

app/assets/ts/classifier.ts

- stop client-side slug/push-url prediction and navigate submits to `/{classifier}/search` - add server-side query/version/top_k normalization plus canonical 301/303 redirects - update TS/Python tests for NFC + whitespace normalization and canonical URL behavior

- Add `q` fallback for long or lossy search text - Keep initial loader text and redirects aligned with the preserved query - Extend slug length coverage in helper tests

coderabbitai · 2026-03-24T12:26:49Z

📝 Walkthrough

Walkthrough

Client-side form submit interception and Unicode-aware query normalization were added; the classifier form now performs standard GET navigation to a new /.../search endpoint that 303-redirects to canonical slugged page URLs. HTMX fragment requests are history-disabled and canonicalization/lossy-query handling moved server-side.

Changes

Cohort / File(s)	Summary
TypeScript Classifier Implementation `app/assets/ts/classifier.ts`, `app/assets/ts/classifier.test.ts`	Added `setupSearchNavigation()` to intercept `form[hx-get]` submits (capture), normalize `product_description` (NFC + collapse/trim), construct navigation URL, and delegate to `window.__classifierNavigate` or `location.assign`. Introduced `allowFragmentSubmit` to allow HTMX-originated submits. Tests updated to use production-like form, hidden `track_usage=true`, loader `hx-vals` canonical params, and adjusted animation timing assertions. New tests assert canonical fragment params, loader history-disabled behavior, and server navigation for real submits/version changes.
TypeScript Globals & Test Setup `app/assets/ts/types/globals.ts`, `app/assets/ts/test/setup.ts`	Added `Window.__classifierNavigate?: (url: string) => void` typing and clear it in test `beforeEach`.
HTML Template `app/templates/classifier_page.html`	Changed form to standard GET (`action="/.../search"`, `method="get"`) while retaining `hx-get`/`hx-target`; added `hx-push-url="false"` and hidden `track_usage=true`; removed `push_url` from initial loader `hx-vals`.
Backend Web Logic `app/web.py`	Added Unicode-aware normalization utilities (`normalize_query_text`, `normalize_version`, `normalize_page_top_k`) and canonical builders (`build_clean_page_url`, `build_clean_classifier_url`). Introduced `SLUG_MAX_LENGTH` and `LOSSY_QUERY_PARAM` (`q`). Added `GET /{classifier_type}/search` handler that normalizes inputs and 303-redirects to canonical page URL; page route now accepts `q` and 301-redirects non-canonical requests. Fragment route normalized inputs, removed server inference of `HX-Push-Url` from `HX-Current-URL`, and uses `resolve_push_url_enabled`/`resolve_page_title_enabled`. Normalized values used for rendering and cache-key inputs.
Tests & Helpers `tests/test_web_fragment_history.py`, `tests/test_web_helpers.py`, `tests/test_homepage_headers.py`	Expanded test helpers to parse `hx-vals` and detect OOB `<title>`. Fragment-history tests accept request headers (e.g., `HX-Current-URL`) and assert presence/absence of `HX-Push-Url` and OOB title depending on `track_usage`. Added extensive normalization/slug/canonicalization tests (including NFC, whitespace collapse, Arabic preservation, 1000-char slug cap), redirect expectations for `/.../search` (303) and page canonicalization (301), and updated homepage header canonical assertion (removed trailing slash).
Documentation `AGENTS.md`	Added constraints: require hidden `track_usage=true`, disallow server-side inference of `HX-Push-Url` from `HX-Current-URL`, forbid mirroring `slugify()` in frontend to predict URLs; instruct keeping fragment requests history-disabled and relying on server redirects for canonical URLs.

Sequence Diagram

sequenceDiagram
    actor User
    participant Form as Form (HTML)
    participant JS as JS (classifier.ts)
    participant Browser as Browser
    participant Server as Server (app/web.py)

    User->>Form: submit search
    Form->>JS: submit event (capture)
    JS->>JS: normalize product_description (NFC + collapse/trim)
    JS->>JS: build URL from FormData (exclude track_usage)
    JS->>Browser: navigate via window.__classifierNavigate or location.assign
    Browser->>Server: GET /{classifier_type}/search with params
    Server->>Server: normalize inputs, build canonical URL (may set q)
    Server->>Browser: 303 Redirect -> canonical page URL
    Browser->>Server: GET canonical page URL
    Server->>Server: render page with initial-loader hx-vals (canonical params, history-disabled)
    Server->>Browser: HTML response
    Browser->>Form: HTMX loads/swaps fragment (history unchanged)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hop and tidy queries bright,

NFC and spaces folded tight.
Forms now steer where servers reign,
Slugs stay true — no frontend feign.
Clean URLs bloom in morning light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.11% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main objective of the PR: moving first-party HX push URL decisions from server-side inference to client-side to protect cacheability.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch t3code/remove-push-url-cache-split

⚔️ Resolve merge conflicts

Resolve merge conflict in branch t3code/remove-push-url-cache-split

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

DmitryMatv · 2026-03-24T12:27:52Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f1b67946e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-24T12:34:07Z

app/web.py

            "url_params": {
                "search": decoded_search_query,
-                "version": version if version and version != first_version else "",
-                "top_k": top_k,
+                "version": (
+                    normalized_version if normalized_version != first_version else ""


Keep form query aligned with canonical lossy query

When a canonical URL carries the original text in q (for characters that cannot round-trip through the slug), this context still seeds url_params.search from decoded_search_query instead of the already-computed effective_query. That means pages like /NAICS/bolt_nut_washer?q=bolt+%26+nut+%2F+washer render results for bolt & nut / washer but populate the textarea with bolt nut washer, and the next top-k/version interaction resubmits the mutated text and drops q, changing the query unexpectedly.

Useful? React with 👍 / 👎.

app/web.py

coderabbitai

🧹 Nitpick comments (2)

tests/test_web_fragment_history.py (1)

75-79: Match the title element, not just any OOB swap.

These helpers currently pass on any hx-swap-oob="true" marker in the fragment, so an unrelated OOB partial could hide a title regression. Tightening the matcher to the <title> node would keep the assertion aligned with the helper name.

♻️ Tighten the helper to the title node

 def assert_title_oob_present(self, response: httpx.Response) -> None:
-        self.assertIn('hx-swap-oob="true"', response.text)
+        self.assertRegex(response.text, r'<title[^>]*hx-swap-oob="true"')

 def assert_title_oob_absent(self, response: httpx.Response) -> None:
-        self.assertNotIn('hx-swap-oob="true"', response.text)
+        self.assertNotRegex(response.text, r'<title[^>]*hx-swap-oob="true"')

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_fragment_history.py` around lines 75 - 79, The helpers
assert_title_oob_present and assert_title_oob_absent should match an out-of-band
swap specifically on the <title> element rather than any occurrence of
hx-swap-oob="true"; update both to search for the title node with that attribute
(e.g., look for '<title hx-swap-oob="true"' or use an HTML parser to find a
<title> element with attribute 'hx-swap-oob' set to "true") and invert the
assertion in assert_title_oob_absent accordingly so the presence/absence check
targets the <title> element only.

tests/test_web_helpers.py (1)

20-46: Make the HTML extraction helpers attribute-order agnostic.

Line 22 and Line 32 hard-code incidental markup details like attribute ordering and single-quoted hx-vals. That will make a harmless template rewrite fail a large chunk of this suite without changing behavior. Parsing attributes semantically instead of regexing raw tags would keep these contract tests much less brittle.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_helpers.py` around lines 20 - 46, The helper functions
extract_initial_loader_vals, extract_classifier_form_tag, and
extract_initial_loader_tag are brittle because they assume attribute order and
single quotes; replace the regex-with-order approach by parsing the markup with
an HTML parser (e.g., html.parser or BeautifulSoup), locate the element by
attribute name (find the <div> with data-initial-results-loader="true", find the
<form> that has an hx-get attribute), then read attributes semantically
(accepting single or double quotes) to return the tag string or the hx-vals
value (unescape and json.loads for extract_initial_loader_vals) so attribute
ordering or quoting changes won’t break tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_web_fragment_history.py`:
- Around line 75-79: The helpers assert_title_oob_present and
assert_title_oob_absent should match an out-of-band swap specifically on the
<title> element rather than any occurrence of hx-swap-oob="true"; update both to
search for the title node with that attribute (e.g., look for '<title
hx-swap-oob="true"' or use an HTML parser to find a <title> element with
attribute 'hx-swap-oob' set to "true") and invert the assertion in
assert_title_oob_absent accordingly so the presence/absence check targets the
<title> element only.

In `@tests/test_web_helpers.py`:
- Around line 20-46: The helper functions extract_initial_loader_vals,
extract_classifier_form_tag, and extract_initial_loader_tag are brittle because
they assume attribute order and single quotes; replace the regex-with-order
approach by parsing the markup with an HTML parser (e.g., html.parser or
BeautifulSoup), locate the element by attribute name (find the <div> with
data-initial-results-loader="true", find the <form> that has an hx-get
attribute), then read attributes semantically (accepting single or double
quotes) to return the tag string or the hx-vals value (unescape and json.loads
for extract_initial_loader_vals) so attribute ordering or quoting changes won’t
break tests.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 72cd3665-e261-4a90-9d48-d91a94838da8

📥 Commits

Reviewing files that changed from the base of the PR and between 3a82bc7 and 9f1b679.

📒 Files selected for processing (10)

AGENTS.md
app/assets/ts/classifier.test.ts
app/assets/ts/classifier.ts
app/assets/ts/test/setup.ts
app/assets/ts/types/globals.ts
app/templates/classifier_page.html
app/web.py
tests/test_homepage_headers.py
tests/test_web_fragment_history.py
tests/test_web_helpers.py

- Stop varying fragment URLs on browser-local push state - Tighten HTML-based tests for form and loader attributes - Cap slug normalization input at 1000 characters

coderabbitai

🧹 Nitpick comments (3)

app/web.py (1)

496-500: Consider behavior with unexpected query parameters.

The canonical URL comparison strips unknown query parameters. If a request has unexpected params (e.g., UTM tracking), they'll trigger a redirect that removes them. This is likely intentional for cache consistency, but worth documenting if partners use tracking params.

The current behavior (stripping unknown params via redirect) is reasonable for cache consistency. If you need to preserve certain tracking parameters, consider explicitly handling them in build_clean_page_url.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/web.py` around lines 496 - 500, The redirect logic compares
requested_path_and_query (built from request.url.path and request.url.query)
against canonical_page_url and therefore strips any unexpected query params
(e.g., UTM) via the 301 redirect; update behavior by either documenting this in
the code/comments or modifying build_clean_page_url to accept a whitelist of
tracking params and include those allowed params when constructing
canonical_page_url so the comparison preserves them, and ensure the redirect
logic (where requested_path_and_query is compared to canonical_page_url) uses
the same canonicalization rules/whitelist to avoid dropping permitted tracking
parameters.

tests/test_web_helpers.py (2)

403-413: Potential URL encoding mismatch in long query assertion.

The assertion at line 413 checks for ?q={long_query} literally, but the redirect location header will have the query URL-encoded. Since long_query is just "x" * 1200, this works because x doesn't need encoding. However, this makes the test brittle if the test data changes.

💡 Optional: Use URL-encoded comparison for robustness

     async def test_search_redirect_preserves_long_query_beyond_slug_limit(self) -> None:
         long_query = "x" * 1200
         response = await self._request(
             "/NAICS/search",
             params={"product_description": long_query},
         )

         self.assertEqual(response.status_code, 303)
         location = response.headers["location"]
         self.assertTrue(location.startswith("/NAICS/" + ("x" * 1000)))
-        self.assertIn(f"?q={long_query}", location)
+        self.assertIn(f"q={long_query}", location)  # 'x' chars don't need encoding

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_helpers.py` around lines 403 - 413, The test
test_search_redirect_preserves_long_query_beyond_slug_limit assumes the raw
long_query appears verbatim in the Location header but real redirects URL-encode
query params; update the assertion to handle encoding by either encoding
long_query with urllib.parse.quote (or quote_plus) before checking presence in
location, or better parse the redirect URL with urllib.parse.urlparse and
urllib.parse.parse_qs and assert the parsed "q" parameter equals long_query;
reference variables/function names:
test_search_redirect_preserves_long_query_beyond_slug_limit, long_query,
response, location.

47-58: Consider adding context to JSON parse errors.

If hx-vals contains malformed JSON, json.loads() will raise a generic JSONDecodeError. In test helpers, wrapping this with additional context could improve debuggability.

💡 Optional: Add context to JSON parse errors

     hx_vals = attrs.get("hx-vals")
     if hx_vals is None:
         raise AssertionError("Initial results loader hx-vals not found")
-    return json.loads(hx_vals)
+    try:
+        return json.loads(hx_vals)
+    except json.JSONDecodeError as e:
+        raise AssertionError(f"Invalid JSON in hx-vals: {hx_vals!r}") from e

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_web_helpers.py` around lines 47 - 58, The helper
extract_initial_loader_vals currently calls json.loads(hx_vals) which will raise
a generic JSONDecodeError on malformed input; update extract_initial_loader_vals
to catch json.JSONDecodeError (import json if not already) around the json.loads
call and raise an AssertionError with additional context that includes the
failing hx_vals (or a truncated preview) and the original error message to make
failures from find_tag_attrs/attrs.get("hx-vals") easier to debug.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@app/web.py`:
- Around line 496-500: The redirect logic compares requested_path_and_query
(built from request.url.path and request.url.query) against canonical_page_url
and therefore strips any unexpected query params (e.g., UTM) via the 301
redirect; update behavior by either documenting this in the code/comments or
modifying build_clean_page_url to accept a whitelist of tracking params and
include those allowed params when constructing canonical_page_url so the
comparison preserves them, and ensure the redirect logic (where
requested_path_and_query is compared to canonical_page_url) uses the same
canonicalization rules/whitelist to avoid dropping permitted tracking
parameters.

In `@tests/test_web_helpers.py`:
- Around line 403-413: The test
test_search_redirect_preserves_long_query_beyond_slug_limit assumes the raw
long_query appears verbatim in the Location header but real redirects URL-encode
query params; update the assertion to handle encoding by either encoding
long_query with urllib.parse.quote (or quote_plus) before checking presence in
location, or better parse the redirect URL with urllib.parse.urlparse and
urllib.parse.parse_qs and assert the parsed "q" parameter equals long_query;
reference variables/function names:
test_search_redirect_preserves_long_query_beyond_slug_limit, long_query,
response, location.
- Around line 47-58: The helper extract_initial_loader_vals currently calls
json.loads(hx_vals) which will raise a generic JSONDecodeError on malformed
input; update extract_initial_loader_vals to catch json.JSONDecodeError (import
json if not already) around the json.loads call and raise an AssertionError with
additional context that includes the failing hx_vals (or a truncated preview)
and the original error message to make failures from
find_tag_attrs/attrs.get("hx-vals") easier to debug.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 539974d7-0783-4291-ab99-519bcb7f1c8c

📥 Commits

Reviewing files that changed from the base of the PR and between 9f1b679 and fefdb8e.

📒 Files selected for processing (3)

app/web.py
tests/test_web_fragment_history.py
tests/test_web_helpers.py

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

app/assets/ts/classifier.ts Outdated Show resolved Hide resolved

DmitryMatv added 2 commits March 18, 2026 20:37

Preserve lossy queries in canonical page URLs

9f1b679

- Add `q` fallback for long or lossy search text - Keep initial loader text and redirects aligned with the preserved query - Extend slug length coverage in helper tests

chatgpt-codex-connector bot reviewed Mar 24, 2026

View reviewed changes

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

Remove push-url cache split from classifier pages

fefdb8e

- Stop varying fragment URLs on browser-local push state - Tighten HTML-based tests for form and loader attributes - Cap slug normalization input at 1000 characters

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move first-party HX push URL decisions to client-side to protect cacheability#11

Move first-party HX push URL decisions to client-side to protect cacheability#11
DmitryMatv wants to merge 4 commits intomainfrom
t3code/remove-push-url-cache-split

DmitryMatv commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

coderabbitai bot commented Mar 24, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

DmitryMatv commented Mar 24, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DmitryMatv commented Mar 18, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

coderabbitai bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

DmitryMatv commented Mar 24, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DmitryMatv commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 24, 2026 •

edited

Loading