SEO: Crawler-only reverse proxy for state legislative tracker #703

Merged
anth-volk merged 4 commits into main from tracker-reverse-proxy
Feb 18, 2026
@PavelMakarchuk
Contributor

Summary

Closes #702
Companion to PolicyEngine/state-legislative-tracker#107 (already merged and deployed to Modal).

Search engines and social media crawlers hitting /us/state-legislative-tracker/* now receive pre-rendered HTML directly from Modal with:

  • Canonical URLs on policyengine.org (not modal.run)
  • Per-bill OG tags (unique title, description per state/bill)
  • JSON-LD structured data
  • Noscript content with bill provisions and revenue impacts

Regular users see no change — they still get the iframe version with the full policyengine.org nav and footer.

Changes

vercel.json — Add /_tracker/:path* rewrite to proxy tracker assets from Modal. These are referenced by the pre-rendered HTML that crawlers receive. Placed before the /(.*) → website.html catch-all so it takes priority.
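The ordering point can be illustrated with a small model of first-match-wins rewrite resolution. This is only a sketch: `Rewrite`, `resolveRewrite`, the regular expressions standing in for Vercel's path patterns, and the `MODAL_ORIGIN` placeholder are all assumptions for illustration. The real Modal URL and the actual vercel.json entries are not shown in this PR.

```typescript
// Hypothetical model of ordered rewrite rules: the first matching rule wins.
interface Rewrite {
  source: RegExp;      // simplified stand-in for Vercel's path pattern syntax
  destination: string; // "$1" is replaced with the captured remainder of the path
}

// Placeholder origin; the real Modal URL is not given in this PR.
const MODAL_ORIGIN = "https://example.modal.run";

const rewrites: Rewrite[] = [
  // Tracker assets go to Modal. This rule must come first.
  { source: /^\/_tracker\/(.*)$/, destination: `${MODAL_ORIGIN}/_tracker/$1` },
  // Catch-all: everything else is served the app shell.
  { source: /^\/(.*)$/, destination: "/website.html" },
];

// Return the destination of the first rule whose source matches, or null.
function resolveRewrite(path: string, rules: Rewrite[]): string | null {
  for (const rule of rules) {
    if (rule.source.test(path)) {
      return path.replace(rule.source, rule.destination);
    }
  }
  return null;
}
```

With the rules reversed, the catch-all would match `/_tracker/...` paths first and the asset rule would never be reached, which is why the placement matters.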

app/middleware.ts — Add early interception for /us/state-legislative-tracker/* routes:

  • Social crawlers (Facebook, Twitter, LinkedIn, etc.) and search engines (Googlebot, Bingbot, etc.) → fetch pre-rendered HTML from Modal and return it
  • Everyone else → fall through to catch-all rewrite → website.html → iframe (unchanged)

Search engine bots are detected separately from social crawlers to avoid affecting OG tag behavior on other routes.
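A minimal sketch of the routing decision described above, assuming user-agent matching by regular expression. The pattern lists and the names `SOCIAL_CRAWLERS`, `SEARCH_BOTS`, `isTrackerPath`, and `shouldProxyToModal` are hypothetical; the actual lists and helpers in middleware.ts are not shown in this PR.

```typescript
// Illustrative user-agent patterns only; the real lists are not shown in this PR.
const SOCIAL_CRAWLERS = /facebookexternalhit|twitterbot|linkedinbot/i;
// Kept separate so search bots only change behavior on tracker routes.
const SEARCH_BOTS = /googlebot|bingbot/i;

const TRACKER_PREFIX = "/us/state-legislative-tracker";

// True for the tracker root and any path beneath it.
function isTrackerPath(pathname: string): boolean {
  return pathname === TRACKER_PREFIX || pathname.startsWith(`${TRACKER_PREFIX}/`);
}

// Crawlers on tracker routes get pre-rendered HTML; everyone else falls through.
function shouldProxyToModal(pathname: string, userAgent: string): boolean {
  if (!isTrackerPath(pathname)) {
    return false;
  }
  return SOCIAL_CRAWLERS.test(userAgent) || SEARCH_BOTS.test(userAgent);
}
```

Keeping the two patterns apart means a route that only cares about OG tags can test `SOCIAL_CRAWLERS` alone, without search engine bots changing its behavior.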

Architecture

Crawlers:  policyengine.org/us/state-legislative-tracker/GA
           → middleware detects bot → fetches modal.run/GA → returns HTML

Users:     policyengine.org/us/state-legislative-tracker/GA
           → middleware skips → rewrite → website.html → iframe (unchanged)

Assets:    policyengine.org/_tracker/index-*.js
           → Vercel rewrite → modal.run/_tracker/index-*.js

Test plan

  • curl -A "Googlebot" https://policyengine.org/us/state-legislative-tracker/GA → returns tracker HTML with <link rel="canonical" href="https://www.policyengine.org/us/state-legislative-tracker/GA">
  • curl -A "facebookexternalhit" https://policyengine.org/us/state-legislative-tracker/GA → returns tracker HTML with OG tags
  • curl -A "Mozilla/5.0" https://policyengine.org/us/state-legislative-tracker → returns website.html (iframe version)
  • Browser visit to /us/state-legislative-tracker → iframe with PE nav/footer (unchanged)
  • /_tracker/ assets load via Vercel rewrite (check network tab when crawled page renders)
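The canonical-URL check in the first item could also be scripted. The following is a rough sketch, assuming the `rel` attribute precedes `href` as in the example tag above; `extractCanonical` is a hypothetical helper, not part of this PR.

```typescript
// Pull the canonical URL out of an HTML string, or return null if absent.
// Assumes rel="canonical" appears before href, as in the test plan's example.
function extractCanonical(html: string): string | null {
  const match = html.match(/<link\s+rel="canonical"\s+href="([^"]+)"/i);
  return match?.[1] ?? null;
}
```

A response body from the Googlebot curl command could be passed to this helper and the result compared against the expected policyengine.org URL.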

🤖 Generated with Claude Code

Proxy search engine and social media crawler requests for
/us/state-legislative-tracker/* to Modal, serving pre-rendered HTML
with canonical policyengine.org URLs, structured data, and sitemap.
Regular users continue to see the iframe version with the full
policyengine.org nav/footer.

- vercel.json: Add /_tracker/* rewrite for proxied asset loading
- middleware.ts: Intercept crawler/bot requests for tracker routes
  and fetch pre-rendered HTML from Modal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Feb 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project                  Deployment  Actions           Updated (UTC)
policyengine-app-v2      Ready       Preview, Comment  Feb 18, 2026 8:22pm
policyengine-calculator  Ready       Preview, Comment  Feb 18, 2026 8:22pm

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
If Modal is down or returns an error, fall through gracefully to the
app shell instead of crashing the middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
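The graceful fallback described in this commit might look roughly like the sketch below. It assumes the fetch function is injected, so the example stays self-contained. `FetchLike` and `fetchPrerenderedOrNull` are hypothetical names, and the actual error handling in middleware.ts is not shown in this PR.

```typescript
// Minimal shape of the fetch response this sketch relies on (hypothetical type).
type FetchLike = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Return the pre-rendered HTML, or null when the upstream is down or errors.
// A null result signals the caller to fall through to the app shell.
async function fetchPrerenderedOrNull(
  url: string,
  fetchImpl: FetchLike,
): Promise<string | null> {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) {
      return null; // upstream answered with an error status
    }
    return await res.text();
  } catch {
    return null; // network failure or upstream unreachable
  }
}
```

Returning null rather than throwing keeps the failure local: the middleware can treat "no pre-rendered HTML" the same way as "not a crawler" and let the request continue to website.html.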
@anth-volk
Collaborator

Will review tomorrow

@anth-volk
Collaborator

@PavelMakarchuk currently reviewing. Having some challenges running locally, but will keep you updated.

Collaborator

@anth-volk left a comment

These changes don't currently work on the preview branch. I did some cursory research, and it's unclear to me whether Vite-built SPAs without server-side rendering can even use Vercel edge functions. At the very least, I believe you need a middleware.ts file at the project root. Happy to chat about how to proceed on this if you hit a wall; the lesson may be that we want an SSR platform like Next.js and/or partial pre-hydration of some sort from React (but I've never worked with this before).

Vercel Edge Middleware requires middleware.ts at the project root
(same level as package.json). The Vercel project has Root Directory
set to ".", so app/middleware.ts was never being executed — neither
the existing OG tag middleware nor the new tracker proxy.

- Move app/middleware.ts → middleware.ts
- Update imports to ./app/src/data/...
- Update tsconfig include path
- Update test import paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@PavelMakarchuk
Contributor Author

Good catch — you were right that it wasn't working. The root cause is that middleware.ts was inside app/, but our Vercel project has Root Directory set to . (repo root). Vercel Edge Middleware needs to be at the project root (same level as package.json), so app/middleware.ts was never being picked up.

This also means the existing OG tag middleware on main has never been running in production — confirmed by testing with curl -A "facebookexternalhit" against a blog post URL and getting generic OG tags instead of post-specific ones.

I've pushed a fix (1eb72c8) that moves middleware.ts to the repo root and updates all import paths. This should fix both the new tracker proxy and the existing OG tags. Tests all pass.

Note: the preview still can't be tested with curl due to Vercel's deployment protection (SSO auth wall intercepts before middleware runs), but once merged to production we can verify with:

curl -s -A "facebookexternalhit" https://policyengine.org/us/research/some-post | head -5
curl -s -A "Googlebot" https://policyengine.org/us/state-legislative-tracker/GA | head -5

Re: your Vite question — Edge Middleware is a Vercel platform feature, not framework-specific. It works with any project type (Vite, Next.js, static, etc.). The file just needs to be at the right location.

@anth-volk
Collaborator

Turns out that you can verify this works on a test deployment by removing deployment protection temporarily. Did so and confirmed that it works on the preview deploy link. Thanks for this, @PavelMakarchuk!

@anth-volk merged commit f253e88 into main on Feb 18, 2026
8 checks passed
@anth-volk deleted the tracker-reverse-proxy branch on February 18, 2026 22:12
Development

Successfully merging this pull request may close these issues.

SEO: Reverse proxy state legislative tracker for crawlers
