Labels: enhancement (New feature or request)
Description
Summary
Compare precision/recall of different entity extraction configurations to quantify accuracy/performance tradeoffs.
Details
Compare three configurations on the 18 evaluation cases:
- 49-term curated thesaurus (current snomed_thesaurus.json)
- 1.4M-pattern full UMLS automaton (current umls_automata.bin.zst)
- Expanded curated set (500+ terms, from expanded thesaurus work)
Metrics to capture:
- Precision (correct extractions / total extractions)
- Recall (correct extractions / expected extractions)
- F1 score
- Extraction latency (ms per case)
- Memory usage per configuration
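The accuracy metrics above could be computed roughly as follows. This is a minimal sketch, not the project's harness: it assumes each case yields a set of extracted entity identifiers and a set of expected ones, and that counts are pooled across cases (micro-averaging) before computing the ratios. The names `score_case`, `aggregate`, and `Scores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_case(extracted: set, expected: set) -> tuple:
    """Return (correct, total_extracted, total_expected) for one case."""
    correct = len(extracted & expected)
    return correct, len(extracted), len(expected)


def aggregate(counts) -> Scores:
    """Pool per-case counts (micro-average; an assumption) into P/R/F1."""
    counts = list(counts)
    correct = sum(c for c, _, _ in counts)
    n_extracted = sum(e for _, e, _ in counts)
    n_expected = sum(x for _, _, x in counts)
    # Guard against empty denominators so an empty run scores 0.0.
    precision = correct / n_extracted if n_extracted else 0.0
    recall = correct / n_expected if n_expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Toy example: 2 of 3 extractions are correct, 4 were expected.
example = aggregate([score_case({"a", "b", "c"}, {"a", "b", "d", "e"})])
```

Pooling counts weights every extraction equally; averaging per-case scores instead (macro-averaging) would weight every case equally, which may matter with only 18 cases.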
Acceptance Criteria
- Benchmark harness with reproducible results
- Comparison table across all three configurations
- Latency and memory measurements
- Recommendations for production configuration
- Results documented in repo
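One possible shape for the latency and memory measurements is sketched below. It assumes each configuration can be wrapped in a hypothetical `load()` callable that returns an `extract(text)` function; neither name comes from the repo. Note that `tracemalloc` only sees allocations made through Python's allocator, so a natively loaded automaton would need a process-level measurement (for example, resident set size) instead.

```python
import time
import tracemalloc
from statistics import mean


def benchmark(load, cases, repeats=5):
    """Time extraction per case and record peak Python memory during load.

    `load` is a hypothetical zero-argument callable returning an
    `extract(text)` function for one configuration.
    """
    tracemalloc.start()
    extract = load()
    _, peak_load_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    per_case_ms = []
    for text in cases:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            extract(text)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            best = min(best, elapsed_ms)  # best-of-N reduces timing noise
        per_case_ms.append(best)

    return {
        "mean_ms_per_case": mean(per_case_ms),
        "max_ms_per_case": max(per_case_ms),
        "peak_load_bytes": peak_load_bytes,
    }


# Toy stand-in for a real configuration: a tiny term set matched by substring.
def load_toy():
    terms = {"asthma", "diabetes"}
    return lambda text: {t for t in terms if t in text.lower()}


result = benchmark(load_toy, ["Patient has asthma.", "No findings."])
```

Running `benchmark` once per configuration and collecting the returned dictionaries gives the rows for the comparison table.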
Priority: P3