Move first-party HX push URL decisions to client-side to protect cacheability#11
Move first-party HX push URL decisions to client-side to protect cacheability#11DmitryMatv wants to merge 4 commits intomainfrom
Conversation
- Keep fragment params canonical by removing `push_url` from loader/form payloads - Set `hx-push-url="false"` in template and compute clean first-party history targets in `classifier.ts` - Stop inferring history updates from `HX-Current-URL` server-side; honor explicit flags only - Expand TS/Python tests to lock in cache-safe history/title behavior
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56a2445f8c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- stop client-side slug/push-url prediction and navigate submits to `/{classifier}/search`
- add server-side query/version/top_k normalization plus canonical 301/303 redirects
- update TS/Python tests for NFC + whitespace normalization and canonical URL behavior
- Add `q` fallback for long or lossy search text - Keep initial loader text and redirects aligned with the preserved query - Extend slug length coverage in helper tests
📝 WalkthroughWalkthroughClient-side form submit interception and Unicode-aware query normalization were added; the classifier form now performs standard GET navigation to a new Changes
Sequence DiagramsequenceDiagram
actor User
participant Form as Form (HTML)
participant JS as JS (classifier.ts)
participant Browser as Browser
participant Server as Server (app/web.py)
User->>Form: submit search
Form->>JS: submit event (capture)
JS->>JS: normalize product_description (NFC + collapse/trim)
JS->>JS: build URL from FormData (exclude track_usage)
JS->>Browser: navigate via window.__classifierNavigate or location.assign
Browser->>Server: GET /{classifier_type}/search with params
Server->>Server: normalize inputs, build canonical URL (may set q)
Server->>Browser: 303 Redirect -> canonical page URL
Browser->>Server: GET canonical page URL
Server->>Server: render page with initial-loader hx-vals (canonical params, history-disabled)
Server->>Browser: HTML response
Browser->>Form: HTMX loads/swaps fragment (history unchanged)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f1b67946e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "url_params": { | ||
| "search": decoded_search_query, | ||
| "version": version if version and version != first_version else "", | ||
| "top_k": top_k, | ||
| "version": ( | ||
| normalized_version if normalized_version != first_version else "" |
There was a problem hiding this comment.
Keep form query aligned with canonical lossy query
When a canonical URL carries the original text in q (for characters that cannot round-trip through the slug), this context still seeds url_params.search from decoded_search_query instead of the already-computed effective_query. That means pages like /NAICS/bolt_nut_washer?q=bolt+%26+nut+%2F+washer render results for bolt & nut / washer but populate the textarea with bolt nut washer, and the next top-k/version interaction resubmits the mutated text and drops q, changing the query unexpectedly.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tests/test_web_fragment_history.py (1)
75-79: Match the title element, not just any OOB swap.These helpers currently pass on any
hx-swap-oob="true"marker in the fragment, so an unrelated OOB partial could hide a title regression. Tightening the matcher to the<title>node would keep the assertion aligned with the helper name.♻️ Tighten the helper to the title node
def assert_title_oob_present(self, response: httpx.Response) -> None: - self.assertIn('hx-swap-oob="true"', response.text) + self.assertRegex(response.text, r'<title[^>]*hx-swap-oob="true"') def assert_title_oob_absent(self, response: httpx.Response) -> None: - self.assertNotIn('hx-swap-oob="true"', response.text) + self.assertNotRegex(response.text, r'<title[^>]*hx-swap-oob="true"')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_web_fragment_history.py` around lines 75 - 79, The helpers assert_title_oob_present and assert_title_oob_absent should match an out-of-band swap specifically on the <title> element rather than any occurrence of hx-swap-oob="true"; update both to search for the title node with that attribute (e.g., look for '<title hx-swap-oob="true"' or use an HTML parser to find a <title> element with attribute 'hx-swap-oob' set to "true") and invert the assertion in assert_title_oob_absent accordingly so the presence/absence check targets the <title> element only.tests/test_web_helpers.py (1)
20-46: Make the HTML extraction helpers attribute-order agnostic.Line 22 and Line 32 hard-code incidental markup details like attribute ordering and single-quoted
hx-vals. That will make a harmless template rewrite fail a large chunk of this suite without changing behavior. Parsing attributes semantically instead of regexing raw tags would keep these contract tests much less brittle.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_web_helpers.py` around lines 20 - 46, The helper functions extract_initial_loader_vals, extract_classifier_form_tag, and extract_initial_loader_tag are brittle because they assume attribute order and single quotes; replace the regex-with-order approach by parsing the markup with an HTML parser (e.g., html.parser or BeautifulSoup), locate the element by attribute name (find the <div> with data-initial-results-loader="true", find the <form> that has an hx-get attribute), then read attributes semantically (accepting single or double quotes) to return the tag string or the hx-vals value (unescape and json.loads for extract_initial_loader_vals) so attribute ordering or quoting changes won’t break tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_web_fragment_history.py`:
- Around line 75-79: The helpers assert_title_oob_present and
assert_title_oob_absent should match an out-of-band swap specifically on the
<title> element rather than any occurrence of hx-swap-oob="true"; update both to
search for the title node with that attribute (e.g., look for '<title
hx-swap-oob="true"' or use an HTML parser to find a <title> element with
attribute 'hx-swap-oob' set to "true") and invert the assertion in
assert_title_oob_absent accordingly so the presence/absence check targets the
<title> element only.
In `@tests/test_web_helpers.py`:
- Around line 20-46: The helper functions extract_initial_loader_vals,
extract_classifier_form_tag, and extract_initial_loader_tag are brittle because
they assume attribute order and single quotes; replace the regex-with-order
approach by parsing the markup with an HTML parser (e.g., html.parser or
BeautifulSoup), locate the element by attribute name (find the <div> with
data-initial-results-loader="true", find the <form> that has an hx-get
attribute), then read attributes semantically (accepting single or double
quotes) to return the tag string or the hx-vals value (unescape and json.loads
for extract_initial_loader_vals) so attribute ordering or quoting changes won’t
break tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 72cd3665-e261-4a90-9d48-d91a94838da8
📒 Files selected for processing (10)
AGENTS.mdapp/assets/ts/classifier.test.tsapp/assets/ts/classifier.tsapp/assets/ts/test/setup.tsapp/assets/ts/types/globals.tsapp/templates/classifier_page.htmlapp/web.pytests/test_homepage_headers.pytests/test_web_fragment_history.pytests/test_web_helpers.py
- Stop varying fragment URLs on browser-local push state - Tighten HTML-based tests for form and loader attributes - Cap slug normalization input at 1000 characters
There was a problem hiding this comment.
🧹 Nitpick comments (3)
app/web.py (1)
496-500: Consider behavior with unexpected query parameters.The canonical URL comparison strips unknown query parameters. If a request has unexpected params (e.g., UTM tracking), they'll trigger a redirect that removes them. This is likely intentional for cache consistency, but worth documenting if partners use tracking params.
The current behavior (stripping unknown params via redirect) is reasonable for cache consistency. If you need to preserve certain tracking parameters, consider explicitly handling them in
build_clean_page_url.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/web.py` around lines 496 - 500, The redirect logic compares requested_path_and_query (built from request.url.path and request.url.query) against canonical_page_url and therefore strips any unexpected query params (e.g., UTM) via the 301 redirect; update behavior by either documenting this in the code/comments or modifying build_clean_page_url to accept a whitelist of tracking params and include those allowed params when constructing canonical_page_url so the comparison preserves them, and ensure the redirect logic (where requested_path_and_query is compared to canonical_page_url) uses the same canonicalization rules/whitelist to avoid dropping permitted tracking parameters.tests/test_web_helpers.py (2)
403-413: Potential URL encoding mismatch in long query assertion.The assertion at line 413 checks for
?q={long_query}literally, but the redirect location header will have the query URL-encoded. Sincelong_queryis just"x" * 1200, this works becausexdoesn't need encoding. However, this makes the test brittle if the test data changes.💡 Optional: Use URL-encoded comparison for robustness
async def test_search_redirect_preserves_long_query_beyond_slug_limit(self) -> None: long_query = "x" * 1200 response = await self._request( "/NAICS/search", params={"product_description": long_query}, ) self.assertEqual(response.status_code, 303) location = response.headers["location"] self.assertTrue(location.startswith("/NAICS/" + ("x" * 1000))) - self.assertIn(f"?q={long_query}", location) + self.assertIn(f"q={long_query}", location) # 'x' chars don't need encoding🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_web_helpers.py` around lines 403 - 413, The test test_search_redirect_preserves_long_query_beyond_slug_limit assumes the raw long_query appears verbatim in the Location header but real redirects URL-encode query params; update the assertion to handle encoding by either encoding long_query with urllib.parse.quote (or quote_plus) before checking presence in location, or better parse the redirect URL with urllib.parse.urlparse and urllib.parse.parse_qs and assert the parsed "q" parameter equals long_query; reference variables/function names: test_search_redirect_preserves_long_query_beyond_slug_limit, long_query, response, location.
47-58: Consider adding context to JSON parse errors.If
hx-valscontains malformed JSON,json.loads()will raise a genericJSONDecodeError. In test helpers, wrapping this with additional context could improve debuggability.💡 Optional: Add context to JSON parse errors
hx_vals = attrs.get("hx-vals") if hx_vals is None: raise AssertionError("Initial results loader hx-vals not found") - return json.loads(hx_vals) + try: + return json.loads(hx_vals) + except json.JSONDecodeError as e: + raise AssertionError(f"Invalid JSON in hx-vals: {hx_vals!r}") from e🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_web_helpers.py` around lines 47 - 58, The helper extract_initial_loader_vals currently calls json.loads(hx_vals) which will raise a generic JSONDecodeError on malformed input; update extract_initial_loader_vals to catch json.JSONDecodeError (import json if not already) around the json.loads call and raise an AssertionError with additional context that includes the failing hx_vals (or a truncated preview) and the original error message to make failures from find_tag_attrs/attrs.get("hx-vals") easier to debug.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@app/web.py`:
- Around line 496-500: The redirect logic compares requested_path_and_query
(built from request.url.path and request.url.query) against canonical_page_url
and therefore strips any unexpected query params (e.g., UTM) via the 301
redirect; update behavior by either documenting this in the code/comments or
modifying build_clean_page_url to accept a whitelist of tracking params and
include those allowed params when constructing canonical_page_url so the
comparison preserves them, and ensure the redirect logic (where
requested_path_and_query is compared to canonical_page_url) uses the same
canonicalization rules/whitelist to avoid dropping permitted tracking
parameters.
In `@tests/test_web_helpers.py`:
- Around line 403-413: The test
test_search_redirect_preserves_long_query_beyond_slug_limit assumes the raw
long_query appears verbatim in the Location header but real redirects URL-encode
query params; update the assertion to handle encoding by either encoding
long_query with urllib.parse.quote (or quote_plus) before checking presence in
location, or better parse the redirect URL with urllib.parse.urlparse and
urllib.parse.parse_qs and assert the parsed "q" parameter equals long_query;
reference variables/function names:
test_search_redirect_preserves_long_query_beyond_slug_limit, long_query,
response, location.
- Around line 47-58: The helper extract_initial_loader_vals currently calls
json.loads(hx_vals) which will raise a generic JSONDecodeError on malformed
input; update extract_initial_loader_vals to catch json.JSONDecodeError (import
json if not already) around the json.loads call and raise an AssertionError with
additional context that includes the failing hx_vals (or a truncated preview)
and the original error message to make failures from
find_tag_attrs/attrs.get("hx-vals") easier to debug.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 539974d7-0783-4291-ab99-519bcb7f1c8c
📒 Files selected for processing (3)
app/web.pytests/test_web_fragment_history.pytests/test_web_helpers.py
Summary
hx-push-urlupdates inclassifier.ts.push_urlfrom loader/form params and avoidingHX-Current-URL-driven server behavior.track_usage=trueto the classifier form and preserves history-disabled initial loader behavior.app/web.py(build_clean_classifier_url,resolve_push_url_enabled,resolve_page_title_enabled) and aligns title OOB behavior with explicit flags.AGENTS.mdto prevent regressions.Testing
app/assets/ts/classifier.test.tsfor canonical form params, loader params, and dynamichx-push-urlbehavior.tests/test_web_fragment_history.pyfor omitted/explicit push URL and title OOB behavior.tests/test_web_helpers.pyfor form/loader attributes and canonicalhx-vals.Summary by CodeRabbit
New Features
Improvements
Documentation
Tests