SEO: Crawler-only reverse proxy for state legislative tracker#703
Proxy search engine and social media crawler requests for /us/state-legislative-tracker/* to Modal, serving pre-rendered HTML with canonical policyengine.org URLs, structured data, and a sitemap. Regular users continue to see the iframe version with the full policyengine.org nav/footer.

- vercel.json: Add /_tracker/* rewrite for proxied asset loading
- middleware.ts: Intercept crawler/bot requests for tracker routes and fetch pre-rendered HTML from Modal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
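A minimal TypeScript sketch of the routing decision described above. The user-agent patterns, the `classifyUserAgent`/`routeTrackerRequest` names, and the route labels are hypothetical; the real detection lists in `middleware.ts` may differ. It keeps search bots and social crawlers as separate categories, mirroring the PR's choice to detect them separately so OG-tag handling on other routes is unaffected.

```typescript
// Hypothetical UA patterns for illustration; the PR's actual lists may differ.
const SEARCH_BOT_PATTERN = /googlebot|bingbot|duckduckbot|yandexbot|baiduspider|applebot/i;
const SOCIAL_CRAWLER_PATTERN = /facebookexternalhit|twitterbot|linkedinbot|slackbot|discordbot|whatsapp/i;

const TRACKER_PREFIX = "/us/state-legislative-tracker";

type Visitor = "search-bot" | "social-crawler" | "user";
type TrackerRoute = "modal-prerender" | "iframe-shell" | "not-tracker";

// Matches the tracker root and anything below it, but not e.g. "/us/state-legislative-tracker-foo".
function isTrackerPath(pathname: string): boolean {
  return pathname === TRACKER_PREFIX || pathname.startsWith(TRACKER_PREFIX + "/");
}

// Search bots are checked first and kept distinct from social crawlers.
function classifyUserAgent(ua: string): Visitor {
  if (SEARCH_BOT_PATTERN.test(ua)) return "search-bot";
  if (SOCIAL_CRAWLER_PATTERN.test(ua)) return "social-crawler";
  return "user";
}

// Crawlers on tracker routes get Modal's pre-rendered HTML; people keep the iframe shell.
function routeTrackerRequest(pathname: string, ua: string): TrackerRoute {
  if (!isTrackerPath(pathname)) return "not-tracker";
  return classifyUserAgent(ua) === "user" ? "iframe-shell" : "modal-prerender";
}

console.log(
  routeTrackerRequest(
    "/us/state-legislative-tracker/GA",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  ),
); // → "modal-prerender"
```

Non-tracker paths return `"not-tracker"` so the existing OG-tag logic can run unchanged for them.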
If Modal is down or returns an error, fall through gracefully to the app shell instead of crashing the middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
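One way the graceful fall-through could look, as a sketch: any network error, timeout, or non-2xx status from Modal yields `undefined`, which the caller treats as "serve the app shell". `MODAL_ORIGIN`, the 5-second timeout, and the injectable `fetcher` parameter are assumptions for illustration, not values from the PR.

```typescript
// Hypothetical upstream origin; the real Modal URL is not given in this PR.
const MODAL_ORIGIN = "https://example--tracker.modal.run";

type Fetcher = (input: string, init?: RequestInit) => Promise<Response>;

// Returns pre-rendered HTML on success, or undefined so the caller can fall
// through to the app shell instead of surfacing an error to the crawler.
async function fetchPrerenderedOrFallThrough(
  pathname: string,
  fetcher: Fetcher = fetch,
  timeoutMs = 5000, // assumed timeout, not from the PR
): Promise<Response | undefined> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const upstream = await fetcher(MODAL_ORIGIN + pathname, { signal: controller.signal });
    if (!upstream.ok) return undefined; // Modal returned 4xx/5xx: fall through
    const html = await upstream.text();
    return new Response(html, {
      status: 200,
      headers: { "content-type": "text/html; charset=utf-8" },
    });
  } catch {
    return undefined; // network error or timeout: fall through
  } finally {
    clearTimeout(timer); // don't leave the abort timer running
  }
}
```

Injecting `fetcher` keeps the failure paths testable without a live Modal deployment.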
Will review tomorrow
@PavelMakarchuk currently reviewing. Having some challenges running locally, but will keep you updated.
anth-volk left a comment
These changes don't currently work on the preview branch. I did some cursory research, and it's unclear to me whether SPAs built with Vite (without server-side rendering) can use Vercel edge functions at all. At the very least, I believe you need a `middleware.ts` file at the project root. Happy to chat about how to proceed if you hit a wall; the lesson may be that we want an SSR platform like Next.js and/or some form of partial pre-hydration from React (though I've never worked with this before).
Vercel Edge Middleware requires middleware.ts at the project root (same level as package.json). The Vercel project has Root Directory set to ".", so app/middleware.ts was never being executed: neither the existing OG tag middleware nor the new tracker proxy.

- Move app/middleware.ts → middleware.ts
- Update imports to ./app/src/data/...
- Update tsconfig include path
- Update test import paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Good catch, you were right that it wasn't working. The root cause is that Vercel Edge Middleware only runs from a `middleware.ts` at the project root, and ours lived at `app/middleware.ts`. This also means the existing OG tag middleware was never running either. I've pushed a fix (1eb72c8) that moves `app/middleware.ts` to the project root.

Note: the preview still can't be tested with […].

Re: your Vite question: Edge Middleware is a Vercel platform feature, not framework-specific. It works with any project type (Vite, Next.js, static, etc.). The file just needs to be in the right location.
Turns out you can verify this on a test deployment by temporarily removing deployment protection. I did so and confirmed that it works on the preview deploy link. Thanks for this, @PavelMakarchuk!
Summary
Closes #702
Companion to PolicyEngine/state-legislative-tracker#107 (already merged and deployed to Modal).
Search engines and social media crawlers hitting `/us/state-legislative-tracker/*` now receive pre-rendered HTML directly from Modal with:

- canonical URLs on `policyengine.org` (not `modal.run`)
- structured data
- a sitemap

Regular users see no change: they still get the iframe version with the full policyengine.org nav and footer.
Changes
- `vercel.json`: Add a `/_tracker/:path*` rewrite to proxy tracker assets from Modal. These assets are referenced by the pre-rendered HTML that crawlers receive. The rule is placed before the `/(.*) → website.html` catch-all so it takes priority.
- `app/middleware.ts` (moved to `middleware.ts` at the project root in 1eb72c8): Add early interception for `/us/state-legislative-tracker/*` routes:
  - Search engine bots and social media crawlers → pre-rendered HTML fetched from Modal
  - Regular users → `website.html` → iframe (unchanged)

Search engine bots are detected separately from social crawlers to avoid affecting OG tag behavior on other routes.
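To illustrate why the `/_tracker/` rule must sit before the catch-all, here is a simplified TypeScript model of ordered rewrites, assuming first-match-wins evaluation as the description implies. It uses plain regular expressions rather than Vercel's path-matching syntax, and `MODAL_ORIGIN` plus the destination path shape are hypothetical.

```typescript
// Simplified model of ordered rewrite rules: the first matching rule wins.
interface RewriteRule {
  source: RegExp;
  destination: (match: RegExpMatchArray) => string;
}

// Hypothetical Modal origin and destination shape; the real values live in vercel.json.
const MODAL_ORIGIN = "https://example--tracker.modal.run";

const rules: RewriteRule[] = [
  // Asset proxy: /_tracker/<anything> → Modal
  { source: /^\/_tracker\/(.*)$/, destination: (m) => `${MODAL_ORIGIN}/_tracker/${m[1]}` },
  // SPA catch-all: every other path → website.html
  { source: /^\/(.*)$/, destination: () => "/website.html" },
];

function resolveRewrite(pathname: string, ordered: RewriteRule[]): string {
  for (const rule of ordered) {
    const m = pathname.match(rule.source);
    if (m) return rule.destination(m);
  }
  return pathname; // no rule matched: serve the path as-is
}

console.log(resolveRewrite("/_tracker/assets/index.js", rules));
// → "https://example--tracker.modal.run/_tracker/assets/index.js"
console.log(resolveRewrite("/_tracker/assets/index.js", [...rules].reverse()));
// → "/website.html"
```

Reversing the two rules sends asset requests to `website.html`, which is the failure mode the ordering avoids.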
Architecture
Test plan
- `curl -A "Googlebot" https://policyengine.org/us/state-legislative-tracker/GA` → returns tracker HTML with `<link rel="canonical" href="https://www.policyengine.org/us/state-legislative-tracker/GA">`
- `curl -A "facebookexternalhit" https://policyengine.org/us/state-legislative-tracker/GA` → returns tracker HTML with OG tags
- `curl -A "Mozilla/5.0" https://policyengine.org/us/state-legislative-tracker` → returns `website.html` (iframe version)
- `/us/state-legislative-tracker` → iframe with PE nav/footer (unchanged)
- `/_tracker/` assets load via Vercel rewrite (check network tab when crawled page renders)

🤖 Generated with Claude Code
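The first two checks in the test plan could be scripted rather than eyeballed. A hedged sketch: `findCanonical`, `checkCrawlerHtml`, and `runSmokeTest` are hypothetical helpers, and the canonical-link regex assumes the `rel`-then-`href` attribute order shown in the expected output, so a real HTML parser would be more robust.

```typescript
const CANONICAL_BASE = "https://www.policyengine.org/us/state-legislative-tracker";

// Extracts the canonical href; assumes rel="canonical" precedes href, as in the test plan.
function findCanonical(html: string): string | undefined {
  const m = html.match(/<link\s+rel="canonical"\s+href="([^"]+)"/i);
  return m ? m[1] : undefined;
}

// Returns a list of problems; an empty list means the crawler HTML looks right.
function checkCrawlerHtml(html: string, state: string, requireOgTags: boolean): string[] {
  const problems: string[] = [];
  const expected = `${CANONICAL_BASE}/${state}`;
  const canonical = findCanonical(html);
  if (canonical !== expected) {
    problems.push(`canonical is ${canonical ?? "missing"}, expected ${expected}`);
  }
  if (requireOgTags && !/<meta[^>]+property="og:/i.test(html)) {
    problems.push("no og: meta tags found");
  }
  return problems;
}

// Usage against production (requires network access; not executed here).
async function runSmokeTest(userAgent: string, state: string, requireOgTags: boolean): Promise<string[]> {
  const res = await fetch(`https://policyengine.org/us/state-legislative-tracker/${state}`, {
    headers: { "user-agent": userAgent },
  });
  return checkCrawlerHtml(await res.text(), state, requireOgTags);
}
```

For example, `runSmokeTest("facebookexternalhit", "GA", true)` would cover the second check; the `Mozilla/5.0` and browser checks still need a manual look at the iframe shell.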