Add generic sqlite provider for snomed, rxnorm, and loinc #112
Open
jmandel wants to merge 9 commits into HealthIntersections:main from
PR Report: Generic SQLite Provider for SNOMED, LOINC, and RxNorm
Date: 2026-02-13
Branch: generic-sqlite-provider
Base: upstream/main

1. Purpose
This branch builds a single SQLite-backed terminology path for SNOMED, LOINC, and RxNorm, while keeping worker compatibility with existing server abstractions.
The intent is not to introduce a parallel server architecture. The intent is to make the existing worker pipeline operate against one generic, metadata-driven provider/runtime for major vocabularies.
2. Scope and Test Configuration
All current validation in this report uses:
- /r4
- tx/fixtures/sample-all-sqlite-v0.yml
- tx/fixtures/test-cases-setup-all-sqlite-v0.json

Configured sources:
- sqlite-v0!:sct_intl_20250201.v0.db
- sqlite-v0:sct_us_20250301.v0.db
- sqlite-v0:loinc_281_full.v0.db
- sqlite-v0:rxnorm_02022026.v0.db

The ! marks the default source for that code system in multi-version setups.

3. Architecture Implemented
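The source entries listed in section 2 follow a loader[!]:file pattern. The Python sketch below shows one way a loader could parse such entries and pick the default among several sources for one code system. The SourceSpec type, both function names, and the fall-back-to-first policy are hypothetical; the branch's actual parsing code is not shown in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    loader: str       # loader type, e.g. "sqlite-v0"
    is_default: bool  # True when the loader name ends with "!"
    path: str         # database file for this source


def parse_source(entry: str) -> SourceSpec:
    """Parse an entry such as 'sqlite-v0!:sct_intl_20250201.v0.db'.

    Assumes the loader name and file are split on the first ':' and that a
    trailing '!' on the loader name marks the default source.
    """
    loader, sep, path = entry.partition(":")
    if not sep or not loader.rstrip("!") or not path:
        raise ValueError(f"malformed source entry: {entry!r}")
    return SourceSpec(loader.rstrip("!"), loader.endswith("!"), path)


def pick_default(specs: list[SourceSpec]) -> SourceSpec:
    """Return the '!'-marked source, or the first one if none is marked.

    The fall-back-to-first rule is a hypothetical policy.
    """
    if not specs:
        raise ValueError("no sources configured")
    marked = [s for s in specs if s.is_default]
    if len(marked) > 1:
        raise ValueError("more than one source marked as default")
    return marked[0] if marked else specs[0]


# Usage: the two SNOMED CT editions from the test configuration.
snomed = [
    parse_source("sqlite-v0!:sct_intl_20250201.v0.db"),
    parse_source("sqlite-v0:sct_us_20250301.v0.db"),
]
print(pick_default(snomed).path)  # sct_intl_20250201.v0.db
```

Rejecting two '!' markers for the same code system keeps the default unambiguous instead of silently depending on list order.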
3.1 Unified loader/runtime
A single sqlite-v0 loader/runtime serves snomed/loinc/rxnorm, configured from cs_config metadata (runtime.* keys).

3.2 Metadata-driven behavior
Behavior is driven by metadata for:
Specialization is still possible, but only through metadata tags and a registry, not through hardcoding in the loader.
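One way to keep specialization out of the loader is a registry keyed by metadata tags, with the tag read from cs_config. The Python sketch below assumes cs_config is a simple key/value table and invents a runtime.code_normalizer key and two tags (identity, upper) purely for illustration; the branch's actual keys and hooks are not listed in this report.

```python
import sqlite3
from typing import Callable

# Hypothetical registry mapping a metadata tag to a specialization hook.
# Tag names and the hook signature are illustrative, not the branch's.
SPECIALIZATIONS: dict[str, Callable[[str], str]] = {}


def register(tag: str):
    """Decorator that records a hook under a metadata tag."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        SPECIALIZATIONS[tag] = fn
        return fn
    return wrap


@register("identity")
def keep_code(code: str) -> str:
    return code


@register("upper")
def upper_code(code: str) -> str:
    return code.upper()


def load_runtime_config(conn: sqlite3.Connection) -> dict[str, str]:
    """Read runtime.* keys from a key/value cs_config table (assumed layout)."""
    rows = conn.execute(
        "SELECT key, value FROM cs_config WHERE key LIKE 'runtime.%'"
    ).fetchall()
    return {key[len("runtime."):]: value for key, value in rows}


def code_normalizer(cfg: dict[str, str]) -> Callable[[str], str]:
    """Pick the hook named by runtime.code_normalizer (a hypothetical key)."""
    tag = cfg.get("code_normalizer", "identity")
    if tag not in SPECIALIZATIONS:
        raise ValueError(f"no specialization registered for tag {tag!r}")
    return SPECIALIZATIONS[tag]


# Usage with an in-memory database standing in for one vocabulary file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cs_config (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO cs_config VALUES (?, ?)",
    [("runtime.code_normalizer", "upper"), ("system", "http://example.org/cs")],
)
cfg = load_runtime_config(conn)
print(cfg)                          # {'code_normalizer': 'upper'}
print(code_normalizer(cfg)("abc"))  # ABC
```

An unknown tag raises instead of falling back, so a typo in metadata surfaces at load time rather than as silently different behavior.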
Current state:
/vs/* URL semantics.

3.3 SQLite schema and indexing strategy
Unified schema includes (high level):
- concept
- designation
- property_def
- concept_literal
- concept_link
- closure
- cs_config

Indexing/model choices:
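As an illustration of why a closure table is useful in this schema, the Python sketch below builds a heavily simplified concept and closure pair in an in-memory SQLite database and answers subsumption and descendant queries with indexed lookups. The column names, the WITHOUT ROWID choice, and the reverse index are assumptions for illustration, not the branch's DDL.

```python
import sqlite3

# Simplified, hypothetical DDL: only two of the listed tables, with
# illustrative columns. The branch's real columns and indexes may differ.
DDL = """
CREATE TABLE concept (
    id      INTEGER PRIMARY KEY,
    code    TEXT NOT NULL UNIQUE,
    display TEXT,
    active  INTEGER NOT NULL DEFAULT 1
);
-- One row per (ancestor, descendant) pair in the transitive is-a graph,
-- including a self pair, so subsumption is a single primary-key lookup.
CREATE TABLE closure (
    ancestor_id   INTEGER NOT NULL REFERENCES concept(id),
    descendant_id INTEGER NOT NULL REFERENCES concept(id),
    PRIMARY KEY (ancestor_id, descendant_id)
) WITHOUT ROWID;
-- Reverse index for "what are the ancestors of X" queries.
CREATE INDEX closure_by_descendant ON closure (descendant_id, ancestor_id);
"""


def subsumes(conn: sqlite3.Connection, ancestor: str, descendant: str) -> bool:
    row = conn.execute(
        """SELECT 1 FROM closure cl
           JOIN concept a ON a.id = cl.ancestor_id
           JOIN concept d ON d.id = cl.descendant_id
           WHERE a.code = ? AND d.code = ?""",
        (ancestor, descendant),
    ).fetchone()
    return row is not None


def descendants(conn: sqlite3.Connection, code: str) -> list[str]:
    """All descendants of code, excluding the code itself."""
    return [r[0] for r in conn.execute(
        """SELECT d.code FROM closure cl
           JOIN concept a ON a.id = cl.ancestor_id
           JOIN concept d ON d.id = cl.descendant_id
           WHERE a.code = ? AND d.id <> a.id
           ORDER BY d.code""",
        (code,),
    )]


# Usage: a three-level chain A > B > C.
conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO concept (id, code) VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO closure VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (1, 2), (2, 3), (1, 3)])
print(subsumes(conn, "A", "C"), subsumes(conn, "C", "A"))  # True False
print(descendants(conn, "A"))                              # ['B', 'C']
```

The trade-off is storage: a precomputed closure grows with the number of ancestor/descendant pairs, in exchange for avoiding recursive queries at request time.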
3.4 Worker compatibility and optional capabilities
Existing worker contracts remain usable. We added optional capabilities that fall back safely:
- filterPage(...) (batched filter path)
- locateMany(...) (batched concept locate)

If a provider does not support these, workers continue to use the legacy per-item paths.
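The fallback rule can be expressed as a capability check at the call site. A minimal Python sketch, assuming a provider may expose locateMany(codes) returning a mapping of the codes it found, and otherwise only a per-item locate(code); both provider classes, the locate name, and the return shapes are hypothetical stand-ins for the real providers.

```python
from typing import Optional


class LegacyProvider:
    """Stand-in provider with only a per-item locate (hypothetical name)."""

    def __init__(self, data: dict[str, str]):
        self.data = data
        self.calls = 0  # counts provider round trips for illustration

    def locate(self, code: str) -> Optional[str]:
        self.calls += 1
        return self.data.get(code)


class BatchedProvider(LegacyProvider):
    """Adds the optional batched capability; the return shape is assumed."""

    def locateMany(self, codes: list[str]) -> dict[str, str]:
        self.calls += 1
        return {c: self.data[c] for c in codes if c in self.data}


def locate_all(provider, codes: list[str]) -> dict[str, Optional[str]]:
    """Use locateMany when the provider offers it, else fall back per item."""
    batched = getattr(provider, "locateMany", None)
    if callable(batched):
        found = batched(codes)
        # Normalize: every requested code gets an entry, None when not found.
        return {c: found.get(c) for c in codes}
    return {c: provider.locate(c) for c in codes}


data = {"1": "one", "2": "two"}
legacy, batched = LegacyProvider(data), BatchedProvider(data)
codes = ["1", "2", "3"]
assert locate_all(legacy, codes) == locate_all(batched, codes)
print(legacy.calls, batched.calls)  # 3 1
```

Normalizing the batched result to the same shape as the per-item path is what lets workers treat the capability as a pure optimization.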
4. Codebase Changes
4.1 Legacy removal
Removed legacy terminology classes and legacy importer modules for SNOMED/LOINC/RxNorm so that the SQLite runtime is the primary path on this branch.
4.2 Importer corrections and runtime hardening
Key fixes applied while validating against official and sampled traffic:
- searchFilter(...) argument order in the expand worker path
- ValueSet/$validate-code crash path (messages propagation)
- RXCUI+TTY pairs
- fallbackRecursive=false)

5. Data Artifacts and Size
Current SQLite outputs in this branch:
- sct_intl_20250201.v0.db
- sct_us_20250301.v0.db
- loinc_281_full.v0.db
- rxnorm_02022026.v0.db

Available baseline artifacts in the mainline cache path (FHIRsmith/data/terminology-cache) for comparable versions:

| Mainline baseline | Branch artifact |
|---|---|
| sct_intl_20250201.cache | sct_intl_20250201.v0.db |
| loinc-2.81.db | loinc_281_full.v0.db |
| rxnorm_02022026.db | rxnorm_02022026.v0.db |

Notes:

- SNOMED US 20250301 does not have a direct same-version baseline artifact in the mainline cache directory.

6. Correctness Results
6.1 Official terminology mini-runner (R4)
Artifact:
captured/official-term-mini-results-r4.all-sqlitev0-20260213-prrefresh.json

The 2 non-xfail failures are both SNOMED tests pinned to xsct20250814, which is outside the configured set of loaded versions.

6.2 Sampled replay (R4-focused sampled NDJSON)
Artifacts:
- captured/snomed-replay-allsqlite-v0-20260213-prrefresh.json
- captured/loinc-replay-allsqlite-v0-20260213-prrefresh.json
- captured/rxnorm-replay-allsqlite-v0-20260213-prrefresh.json

Results:
Mismatch classification (latest classified artifacts):
- displayLanguage=english (3), prod/dev disagreement (2), captured-body defects (2), other-needs-triage (2)
- displayLanguage=english (5), prod/dev disagreement (4), captured-body defects (1)
- 422 VALUESET_TOO_COSTLY replacing sampled 500 (3)

Classified artifacts:
- captured/snomed-replay-allsqlite-v0-20260213-prrefresh.classified.json
- captured/loinc-replay-allsqlite-v0-20260213-prrefresh.classified.json
- captured/rxnorm-replay-allsqlite-v0-20260213-prrefresh.classified.json

6.3 Concrete behavior improvement example
RxNorm query:
ValueSet/$expand with a filter of property=TTY, op='=', value=SBD, and text filter tylenol

Observed:
- main: 500 Invalid search filter
- branch: 200 with expansion.total=13 (active Tylenol SBD concepts)

This comes from:
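For orientation only, the Python sketch below shows the general shape of such a query against a simplified schema: a property filter on concept_literal combined with a case-insensitive text match on the display, with every value passed as a bound parameter and LIKE wildcards escaped. The column names, the RX1 to RX4 placeholder codes, and the sample rows are invented; this is not the branch's SQL and does not reproduce the 13-concept result.

```python
import sqlite3


def like_escape(text: str) -> str:
    """Escape LIKE wildcards so user text is matched literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def expand_by_property(conn, prop: str, value: str, text: str) -> list[str]:
    """Active concepts with a literal property value whose display contains text.

    All inputs are bound parameters; LIKE is ASCII case-insensitive in SQLite.
    """
    rows = conn.execute(
        """SELECT c.code FROM concept c
           JOIN concept_literal l ON l.concept_id = c.id
           WHERE c.active = 1 AND l.property = ? AND l.value = ?
             AND c.display LIKE ? ESCAPE '\\'
           ORDER BY c.code""",
        (prop, value, f"%{like_escape(text)}%"),
    )
    return [code for (code,) in rows]


# Usage with placeholder data (not real RxNorm content).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (id INTEGER PRIMARY KEY, code TEXT, display TEXT, active INTEGER);
CREATE TABLE concept_literal (concept_id INTEGER, property TEXT, value TEXT);
""")
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
    (1, "RX1", "acetaminophen 325 MG Oral Tablet [Tylenol]", 1),
    (2, "RX2", "acetaminophen 500 MG Oral Tablet [Tylenol]", 0),  # inactive
    (3, "RX3", "acetaminophen 325 MG Oral Tablet", 1),
    (4, "RX4", "Tylenol", 1),
])
conn.executemany("INSERT INTO concept_literal VALUES (?, ?, ?)", [
    (1, "TTY", "SBD"), (2, "TTY", "SBD"), (3, "TTY", "SCD"), (4, "TTY", "BN"),
])
print(expand_by_property(conn, "TTY", "SBD", "tylenol"))  # ['RX1']
```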
7. Performance Results
Performance artifacts used:
- captured/perf-snomed-main-vs-generic-20260213h.json
- captured/perf-loinc-main-vs-generic-20260213h.json
- captured/perf-rxnorm-main-vs-generic-20260213h.json

Run shape:
- 6, warmup 1
- cache on and off

7.1 Overall timings
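Section 7.2 reports the uncached p50 (median) delta per operation as branch - main. A minimal Python sketch of that computation, assuming each series begins with the single warmup run from the run shape above and that the remaining samples are wall-clock milliseconds; the sample values are made up.

```python
from statistics import median


def p50_delta(main_ms: list[float], branch_ms: list[float], warmup: int = 1) -> float:
    """Median(branch) - median(main) after dropping the first `warmup` samples.

    A positive result means the branch is slower for this operation.
    """
    main, branch = main_ms[warmup:], branch_ms[warmup:]
    if not main or not branch:
        raise ValueError("no samples left after discarding warmup runs")
    return median(branch) - median(main)


# Made-up samples in milliseconds: 1 warmup run followed by 6 measured runs.
main_ms = [50, 10, 12, 11, 13, 12, 14]
branch_ms = [40, 9, 9, 10, 11, 10, 12]
print(p50_delta(main_ms, branch_ms))  # -2.0
```

Dropping the warmup sample matters here: the first run in each made-up series is several times slower than the rest and would otherwise skew small samples.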
7.2 Operation-level uncached p50 median delta (branch - main)
- ValueSet/$validate-code
- CodeSystem/$validate-code
- ValueSet/$expand
- ValueSet/$batch-validate-code
- ValueSet/$validate-code
- CodeSystem/$validate-code
- ValueSet/$expand
- CodeSystem/$validate-code
- ValueSet/$validate-code
- ValueSet/$expand

Interpretation:
- _incomplete/large expand paths.
- _incomplete expand remains the main lag pattern.

8. Non-DB Pipeline Findings and Fixes
This effort also surfaced and fixed issues outside pure DB schema/import concerns:
- filterPage, locateMany) with strict fallback behavior

These are generic pipeline improvements, not tied to any one terminology.
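On the provider side, a batched locate can replace N single-code lookups with a few IN queries. The sketch below is a hypothetical Python analogue of locateMany over a simplified concept table. Chunking keeps each query under SQLite's bound-parameter limit (SQLITE_MAX_VARIABLE_NUMBER, 999 by default before SQLite 3.32.0 and 32766 after); the chunk size of 500 is an arbitrary safe choice, not a value taken from the branch.

```python
import sqlite3
from typing import Optional


def locate_many(conn: sqlite3.Connection, codes: list[str],
                chunk_size: int = 500) -> dict[str, Optional[str]]:
    """Look up many codes with one IN query per chunk instead of one per code."""
    unique = list(dict.fromkeys(codes))  # de-duplicate, keep order
    found: dict[str, str] = {}
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # "?,?,...,?"
        sql = f"SELECT code, display FROM concept WHERE code IN ({placeholders})"
        for code, display in conn.execute(sql, chunk):
            found[code] = display
    # Every requested code gets an entry; None marks a code that was not found.
    return {c: found.get(c) for c in unique}


# Usage: 1201 distinct codes are resolved with 3 queries (500 + 500 + 201).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (code TEXT PRIMARY KEY, display TEXT)")
conn.executemany("INSERT INTO concept VALUES (?, ?)",
                 [(f"C{i}", f"display {i}") for i in range(1200)])
conn.commit()
statements: list[str] = []
conn.set_trace_callback(statements.append)  # record each executed statement
result = locate_many(conn, [f"C{i}" for i in range(1200)] + ["missing"])
print(len(statements), result["C7"], result["missing"])  # 3 display 7 None
```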
9. Trade-offs and Remaining Gaps
- xsct20250814.
- _incomplete expands.

10. Repro Commands
Official mini-runner:
Sample replay:
Perf comparison:
11. Summary
The branch demonstrates that SNOMED, LOINC, and RxNorm can run through one generic SQLite provider/runtime with metadata-driven behavior and minimal specialization, while staying compatible with existing worker abstractions.
Results are mixed but clear: