SEO: Reverse proxy state legislative tracker for crawlers #702

@PavelMakarchuk

Problem

The state legislative tracker is embedded in policyengine.org via an iframe that loads from Modal (policyengine--state-legislative-tracker.modal.run). This causes SEO problems:

  • Canonical URLs point to Modal, not policyengine.org — Google indexes the wrong domain
  • Social sharing cards (Twitter, Facebook, LinkedIn) show generic app-v2 OG tags instead of per-bill metadata (title, description, revenue impact)
  • Structured data (JSON-LD) and sitemap live on Modal and reference Modal URLs, so Google doesn't associate the tracker content with policyengine.org
  • iframe content is not reliably indexed by all search engines

The tracker has rich pre-rendered HTML for every bill page (19 bills across 16 states) with unique titles, descriptions, and noscript content — but crawlers hitting policyengine.org never see it.

Solution

Add a crawler-only reverse proxy in Vercel middleware. When a search engine or social media bot requests /us/state-legislative-tracker/*, the middleware fetches the pre-rendered HTML from Modal and returns it directly. This HTML contains:

  • Canonical URLs on policyengine.org
  • Per-bill OG tags (title, description, image)
  • JSON-LD structured data
  • Noscript content with bill provisions and revenue impacts
  • Sitemap and robots.txt referencing policyengine.org URLs

Regular users continue to see the existing iframe version with the full policyengine.org nav and footer — no visual change for humans.
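
The bot check is the piece that decides which of the two paths a request takes. A minimal sketch of such a check, assuming a simple User-Agent substring match; the pattern list and the name `isCrawler` are illustrative and not taken from the actual middleware:

```typescript
// Hypothetical list of crawler User-Agent tokens (search engines and
// social-card fetchers). The real middleware may match a different set.
const BOT_UA_PATTERN =
  /googlebot|bingbot|duckduckbot|twitterbot|facebookexternalhit|linkedinbot|slackbot/i;

// Returns true when the User-Agent header looks like a known crawler.
// A missing header is treated as a regular user.
function isCrawler(userAgent: string | null | undefined): boolean {
  return typeof userAgent === "string" && BOT_UA_PATTERN.test(userAgent);
}
```

Matching on User-Agent is easy to spoof, which is acceptable here because both paths serve public content; the check only chooses between two representations of the same page.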

How it works

Crawlers:  policyengine.org/us/state-legislative-tracker/GA
           → middleware detects bot UA → fetches modal.run/GA
           → returns pre-rendered HTML with policyengine.org canonical URLs

Users:     policyengine.org/us/state-legislative-tracker/GA
           → middleware does not intercept the request → catch-all rewrite → website.html → iframe

Assets:    policyengine.org/_tracker/index-*.js  (loaded by crawler-rendered pages)
           → Vercel rewrite → modal.run/_tracker/index-*.js
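
The three flows above can be modeled as one pure routing function. This is a sketch for illustration only: the Modal host comes from the Problem section, but `routeTrackerRequest`, the `Route` type, and the assumption that the path suffix maps directly onto the Modal origin are hypothetical. In the proposal, asset requests are handled by a Vercel rewrite rather than by middleware code; they are included here only to show the full picture.

```typescript
// Assumed constants: the host is quoted from the issue; the prefixes mirror the diagram.
const MODAL_ORIGIN = "https://policyengine--state-legislative-tracker.modal.run";
const TRACKER_PREFIX = "/us/state-legislative-tracker";
const ASSET_PREFIX = "/_tracker/";

type Route =
  | { kind: "proxy"; upstream: string } // crawler: serve pre-rendered HTML from Modal
  | { kind: "asset"; upstream: string } // JS/CSS requested by the pre-rendered page
  | { kind: "passthrough" }; // regular user: fall through to the iframe page

function routeTrackerRequest(pathname: string, isBot: boolean): Route {
  // Assets keep their path and are served from the Modal origin.
  if (pathname.startsWith(ASSET_PREFIX)) {
    return { kind: "asset", upstream: MODAL_ORIGIN + pathname };
  }
  const underTracker =
    pathname === TRACKER_PREFIX || pathname.startsWith(TRACKER_PREFIX + "/");
  if (underTracker && isBot) {
    // Strip the site prefix so /us/state-legislative-tracker/GA maps to /GA.
    const rest = pathname.slice(TRACKER_PREFIX.length) || "/";
    return { kind: "proxy", upstream: MODAL_ORIGIN + rest };
  }
  // Everything else is left to the existing catch-all rewrite.
  return { kind: "passthrough" };
}
```

Keeping the decision separate from the network call makes it straightforward to unit test each flow without a running deployment.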

Companion PR

Tracker-side changes (already merged): PolicyEngine/state-legislative-tracker#107

  • Renamed asset directory from assets/ to _tracker/ to avoid path collisions
  • Updated prerender to emit canonical URLs on policyengine.org
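
One way to confirm the prerender change is to check the canonical link in the HTML that crawlers receive. The helpers below are a hypothetical sketch, not part of either repository; they use a simple regex rather than a full HTML parser and assume the canonical host is exactly `policyengine.org`:

```typescript
// Hypothetical helper: returns the href of the first <link rel="canonical"> tag, if any.
function canonicalHref(html: string): string | null {
  const tag = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*>/i);
  const tagText = tag?.[0];
  if (!tagText) return null;
  const href = tagText.match(/\bhref=["']([^"']+)["']/i);
  return href?.[1] ?? null;
}

// True when the page declares an absolute canonical URL on policyengine.org.
function hasPolicyEngineCanonical(html: string): boolean {
  const href = canonicalHref(html);
  if (href === null) return false;
  // Extract the host from an absolute http(s) URL; relative URLs do not match.
  const host = href.match(/^https?:\/\/([^\/?#:]+)/i);
  return (host?.[1] ?? "").toLowerCase() === "policyengine.org";
}
```

A check like this could run against each pre-rendered bill page after deployment to catch any page that still points its canonical URL at Modal.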
