
Parallelize Phase 1 break pre-computation across WSGs #64

@NewGraphEnvironment

Problem

Phase 1 (per-WSG stream extraction and break pre-computation) runs sequentially across WSGs. For province-wide runs (246 WSGs), it is the dominant cost, because Phase 2 (species classification) is already parallelized via furrr.

Proposed Solution

Wrap the Phase 1 WSG loop in furrr::future_map. Each WSG writes to its own tables (working.streams_{wsg}, working.breaks_access_{wsg}_{thr}), so workers never write to the same table. Each worker opens its own DB connection, since a connection object cannot be shared across worker processes.
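A minimal sketch of the proposed change, assuming Phase 1 can be factored into a per-WSG function. `phase1_wsg()`, `run_phase1()`, the `connect`/`disconnect` arguments, the second WSG code and the threshold values are all hypothetical; only `furrr::future_map()` and `future::plan()` are real APIs. The two points it illustrates are that the connection is created inside the worker and that every output table name carries the WSG suffix.

```r
library(future)
library(furrr)

# Hypothetical Phase 1 worker for a single WSG (name and arguments are
# illustrative, not the package's API). It opens its OWN connection:
# DBI connections hold external pointers that cannot be serialized to
# multisession workers, so each worker must create one itself.
phase1_wsg <- function(wsg, thresholds, connect, disconnect) {
  conn <- connect()
  on.exit(disconnect(conn), add = TRUE)

  # Every output table is suffixed with the WSG code, so concurrent
  # workers never write to the same table. Lower-casing mirrors how
  # PostgreSQL folds unquoted identifiers; the {thr} format is assumed.
  code <- tolower(wsg)
  streams_tbl <- paste0("working.streams_", code)
  breaks_tbls <- paste0("working.breaks_access_", code, "_", thresholds)

  # ... extract streams into streams_tbl and pre-compute access breaks
  # into breaks_tbls using `conn` (real SQL omitted) ...

  list(wsg = wsg, tables = c(streams_tbl, breaks_tbls), pid = Sys.getpid())
}

# Run Phase 1 for all WSGs in parallel, restoring the caller's plan on exit.
run_phase1 <- function(wsgs, thresholds, connect, disconnect, workers = 4) {
  old_plan <- plan(multisession, workers = workers)
  on.exit(plan(old_plan), add = TRUE)
  future_map(wsgs, phase1_wsg,
             thresholds = thresholds,
             connect = connect, disconnect = disconnect)
}

# Demo with stub connection functions so the sketch runs without a
# database; for a real run, pass something like
# function() DBI::dbConnect(RPostgres::Postgres(), ...) and DBI::dbDisconnect.
res <- run_phase1(
  wsgs = c("BULK", "MORR"),
  thresholds = c(15, 20),
  connect = function() list(),
  disconnect = function(conn) invisible(NULL),
  workers = 2
)
```

Passing a connection factory rather than an open connection is the design choice that makes this work: the factory is an ordinary function that serializes cleanly to each worker, while the connection it returns only ever lives inside one worker process.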

Current benchmarks (BULK, 32K segments, 7 species)

  • Break pre-computation: ~230s (sequential; fixed regardless of worker count)
  • Species phase: 259s with 4 workers
  • Total: 487s

For multi-WSG runs, Phase 1 time scales linearly with the number of WSGs. Parallelizing across WSGs would remove that bottleneck.
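The scaling argument can be stated with a simple idealized model (it ignores worker start-up, database contention and uneven chunking). With N WSGs, per-WSG Phase 1 times t_i and W workers:

```math
T_{\text{seq}} = \sum_{i=1}^{N} t_i,
\qquad
T_{\text{par}} \ge \max\!\left( \max_{i} t_i,\ \frac{1}{W}\sum_{i=1}^{N} t_i \right)
```

The sequential cost grows with N, while the parallel cost is bounded below by the slowest single WSG and by the total work divided by W. Because every worker talks to the same database server, the achievable speedup in practice also depends on how many concurrent connections that server can handle.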

Depends on #63 (local Docker) for safe testing with many concurrent connections.
