Problem
Phase 1 (per-WSG extract + break pre-computation) runs sequentially across WSGs. For province-wide runs (246 WSGs), this is the dominant cost since Phase 2 (species classification) is already parallelized via furrr.
Proposed Solution
Wrap the Phase 1 WSG loop in furrr::future_map. Each WSG writes to its own tables (working.streams_{wsg}, working.breaks_access_{wsg}_{thr}) so there are no conflicts. Each worker opens its own DB connection.
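A minimal sketch of the proposed change, assuming hypothetical helper names (`extract_streams`, `precompute_breaks`, `connect_fn`) for the per-WSG steps the issue describes; the real function names will differ:

```r
library(furrr)
library(DBI)

plan(multisession, workers = 4)

run_phase1_parallel <- function(wsgs, thresholds, connect_fn) {
  furrr::future_map(wsgs, function(wsg) {
    # DB connections can't be serialized across processes,
    # so each worker opens (and closes) its own.
    con <- connect_fn()
    on.exit(DBI::dbDisconnect(con), add = TRUE)

    # Each WSG writes only to its own tables, so workers don't conflict:
    extract_streams(con, wsg)            # -> working.streams_{wsg}
    for (thr in thresholds) {
      precompute_breaks(con, wsg, thr)   # -> working.breaks_access_{wsg}_{thr}
    }
    wsg
  }, .options = furrr_options(seed = TRUE))
}
```

One design note: opening the connection inside the mapped function (rather than passing one in) is required, since external pointers like DBI connections are invalid after being copied to a worker process.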
Current benchmarks (BULK, 32K segments, 7 species)
- Break pre-computation: ~230s (sequential, fixed)
- Species phase: 259s with 4 workers
- Total: 487s
For multi-WSG, Phase 1 time scales linearly with WSG count. Parallelizing across WSGs removes that bottleneck.
Depends on #63 (local Docker) for safe testing with many concurrent connections.