Add unaccent to trigram GIN indexes for accent-insensitive search #4

Merged
jakebromberg merged 1 commit into main from feat/unaccent-indexes on Feb 14, 2026

@jakebromberg

Summary

  • Adds accent-insensitive fuzzy matching so queries for "Bjork" match "Björk" at the database level
  • Creates an immutable f_unaccent() SQL wrapper (required because PostgreSQL's built-in unaccent() is STABLE, not IMMUTABLE, so it can't be used in index expressions directly)
  • Updates all 4 trigram GIN index expressions from lower(col) to lower(f_unaccent(col))
  • Updates the dedup, pipeline orchestrator, and copy-to-target code paths to create the function before indexes
  • Adds integration tests for the unaccent extension, the f_unaccent function, and index definitions
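
The changes listed above could be sketched roughly as follows. This is not the SQL from the PR: the wrapper body, the assumption that the unaccent extension lives in the `public` schema, the index naming scheme, and the `release`/`title` names in the test are all hypothetical. It only illustrates the ordering constraint (extensions, then function, then indexes) and the shape of the new index expression.

```python
# Sketch only: assumed SQL, not copied from schema/*.sql in this PR.

CREATE_EXTENSIONS_SQL = """
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS unaccent;
"""

# unaccent() is STABLE because its result depends on the dictionary in use.
# Pinning a schema-qualified dictionary in a wrapper lets us declare the
# wrapper IMMUTABLE, which index expressions require.
# Assumption: the extension is installed in the "public" schema.
CREATE_F_UNACCENT_SQL = """
CREATE OR REPLACE FUNCTION f_unaccent(text) RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$$ SELECT public.unaccent('public.unaccent', $1) $$;
"""


def trgm_index_sql(table: str, column: str) -> str:
    """Build DDL for an accent-insensitive trigram GIN index.

    The index name pattern is hypothetical.
    """
    name = f"idx_{table}_{column}_trgm"
    return (
        f"CREATE INDEX IF NOT EXISTS {name} ON {table} "
        f"USING gin (lower(f_unaccent({column})) gin_trgm_ops);"
    )


def setup_statements(targets):
    """Return statements in dependency order for (table, column) pairs.

    The function must exist before any index that references it.
    """
    statements = [CREATE_EXTENSIONS_SQL, CREATE_F_UNACCENT_SQL]
    statements += [trgm_index_sql(table, column) for table, column in targets]
    return statements
```

Any code path that rebuilds indexes (dedup, pipeline, copy-to-target) would need to run the function statement first, which is why the PR touches all three.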

Test plan

  • Unit tests pass (198/198)
  • Integration tests (pytest -m postgres) -- verify schema, indexes, dedup, copy-to on Postgres
  • E2E tests (pytest -m e2e) -- full pipeline run
  • Manual: SELECT indexdef FROM pg_indexes WHERE indexname LIKE '%trgm%' shows f_unaccent
  • Manual: SELECT f_unaccent('Björk') returns Bjork
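
The manual `pg_indexes` check could also be automated. The helper below is a minimal sketch, not part of the PR: it assumes the caller has already fetched `(indexname, indexdef)` rows with the query shown above, and the function name and sample rows are hypothetical.

```python
def indexes_missing_unaccent(rows):
    """Return names of indexes whose definition does not call f_unaccent().

    rows: iterable of (indexname, indexdef) tuples, e.g. the result of
    SELECT indexname, indexdef FROM pg_indexes WHERE indexname LIKE '%trgm%'
    """
    return [name for name, indexdef in rows if "f_unaccent(" not in indexdef]


# Hypothetical rows for illustration; real definitions come from pg_indexes.
sample_rows = [
    ("idx_a_trgm", "CREATE INDEX idx_a_trgm ON a USING gin (lower(f_unaccent(x)) gin_trgm_ops)"),
    ("idx_b_trgm", "CREATE INDEX idx_b_trgm ON b USING gin (lower(y) gin_trgm_ops)"),
]
stale = indexes_missing_unaccent(sample_rows)
```

An empty result would mean every trigram index has picked up the new expression.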

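Note: consumer queries in request-parser's cache_service.py will need a separate change. PostgreSQL uses an expression index only when the query contains the same expression, so WHERE clauses must compare lower(f_unaccent(col)) against lower(f_unaccent($1)) to match the new index expressions.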
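
A consumer-side query might look something like the sketch below. The real queries in cache_service.py are not shown in this PR, so the function name, the use of the pg_trgm `%` similarity operator, and the table and column names are assumptions. The point is only that the column side repeats the index expression exactly and the parameter gets the same normalization.

```python
import re

# Conservative identifier check, since table and column names are
# interpolated into the SQL text (only the search term is a bind parameter).
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def fuzzy_match_sql(table: str, column: str) -> str:
    """Build a trigram similarity query that can use the new index.

    $1 is the search term, bound by the driver at execution time.
    """
    for ident in (table, column):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return (
        f"SELECT * FROM {table} "
        f"WHERE lower(f_unaccent({column})) % lower(f_unaccent($1))"
    )
```

If the WHERE clause kept the old lower(col) form, the query would still run, but it could no longer use the rebuilt indexes and accented rows would not match unaccented input.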

Queries for "Bjork" now match "Björk" at the database level. Adds an
immutable f_unaccent() wrapper (required because the built-in unaccent()
is STABLE) and updates all 4 trigram index expressions from lower(col)
to lower(f_unaccent(col)).

- schema/create_database.sql: enable unaccent extension
- schema/create_functions.sql: immutable f_unaccent() wrapper
- schema/create_indexes.sql: accent-insensitive index expressions
- scripts/dedup_releases.py: update hardcoded post-dedup indexes
- scripts/run_pipeline.py: run create_functions.sql after schema
- scripts/verify_cache.py: run create_functions.sql on target DB

jakebromberg merged commit 5b2ae07 into main on Feb 14, 2026
3 checks passed