Skip to content

Optimize classify: 3,700x speedup via ltree enrichment #75

Merged
NewGraphEnvironment merged 2 commits into main from optimize-classify
Apr 5, 2026

@NewGraphEnvironment (Owner)

Summary

  • Enrich the breaks table with wscode_ltree and localcode_ltree from the FWA base network, once, after all break sources are combined
  • Restructure the classify query: split a single NOT EXISTS (... OR ...) into two separate NOT EXISTS clauses joined by AND, so PostgreSQL can use an index for each independently
  • Eliminates the join to fwa_stream_networks_sp (4.9M rows) at classify time
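A minimal sketch of the one-time enrichment in the first bullet, under loud assumptions: it uses SQLite from the Python standard library so it runs without PostgreSQL, the tables and columns (streams, breaks, blk, dnstr, upstr, wscode, localcode) are hypothetical stand-ins for fwa_stream_networks_sp and the breaks table, and the codes are plain dot-separated text rather than ltree values. Each break picks up the codes of the segment whose measure range contains it; in PostgreSQL an UPDATE ... FROM join would express the same lookup.

```python
import sqlite3

# Toy, hypothetical schema. `streams` stands in for the FWA base network
# (fwa_stream_networks_sp); codes are dot-separated text, not real ltree values.
SETUP = """
CREATE TABLE streams (blk INTEGER, dnstr REAL, upstr REAL, wscode TEXT, localcode TEXT);
INSERT INTO streams VALUES
  (1, 0,   100, '100',     '100'),
  (1, 100, 200, '100',     '100.500'),
  (2, 0,   50,  '100.500', '100.500');
-- Breaks combined from every source (falls, crossings, ...), not yet enriched.
CREATE TABLE breaks (blk INTEGER, measure REAL);
INSERT INTO breaks VALUES (1, 150), (2, 10);
"""

# One-time enrichment after all breaks are combined: each break copies the
# codes of the stream segment it falls on (same blk, measure inside the range).
ENRICH = """
ALTER TABLE breaks ADD COLUMN wscode TEXT;
ALTER TABLE breaks ADD COLUMN localcode TEXT;
UPDATE breaks SET
  wscode = (SELECT s.wscode FROM streams s
            WHERE s.blk = breaks.blk
              AND breaks.measure >= s.dnstr AND breaks.measure < s.upstr),
  localcode = (SELECT s.localcode FROM streams s
               WHERE s.blk = breaks.blk
                 AND breaks.measure >= s.dnstr AND breaks.measure < s.upstr);
"""

con = sqlite3.connect(":memory:")
con.executescript(SETUP)
con.executescript(ENRICH)
# From here on, classification can read codes straight off `breaks`
# without joining back to `streams`.
enriched = con.execute(
    "SELECT blk, measure, wscode, localcode FROM breaks ORDER BY blk"
).fetchall()
con.close()
print(enriched)
```

The design point is that the large network table is read once, when breaks are built, rather than on every classify call.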

Performance

| Metric | Before | After | Speedup |
|---|---|---|---|
| Classify (single species) | 742s | 0.2s | 3,700x |
| Full pipeline, ADMS (5 species) | ~88s | 90s | same (classify was not the only step) |
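The headline speedup is the ratio of the two classify timings in the table:

$$
\text{speedup} = \frac{742\ \mathrm{s}}{0.2\ \mathrm{s}} = 3710 \approx 3{,}700\times
$$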

The classify step went from the dominant bottleneck to negligible. The remaining time is spent in frs_col_generate (gradient recompute) and in frs_classify for habitat attribute ranges.

Test plan

  • devtools::test(): 528 pass, 0 fail
  • Classify results identical: 1,060 of 11,520 accessible (9.2%)
  • Full pipeline: 5 species, 90s total
  • Benchmark logs in scripts/habitat/logs/

Fixes #72
Relates to NewGraphEnvironment/sred-2025-2026#16

🤖 Generated with Claude Code

NewGraphEnvironment and others added 2 commits April 4, 2026 13:45
Two changes eliminate the classify bottleneck:

1. Enrich the breaks table with wscode_ltree + localcode_ltree from the FWA
   base network (a one-time join after all breaks are combined). This
   eliminates the 4.9M-row fwa_stream_networks_sp join at classify time.

2. Split the NOT EXISTS ... OR into two separate NOT EXISTS clauses joined by AND:
   - Same BLK: pure measure comparison (btree index)
   - Cross BLK: direct ltree comparison (GIST index)
   PostgreSQL can now use an index for each independently.
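Item 2 rests on the identity NOT EXISTS (A OR B) = NOT EXISTS (A) AND NOT EXISTS (B): no break matches either condition exactly when no break matches the first and no break matches the second. A minimal sketch that checks the two query shapes agree on toy data, again using SQLite with hypothetical table and column names. The cross-BLK test is a LIKE prefix match standing in for an ltree descendant test such as s.wscode_ltree <@ b.wscode_ltree (which, unlike this LIKE, also matches equal codes), and localcode is ignored. SQLite does not reproduce the btree/GIST index choices; the sketch only shows that the rewrite preserves results.

```python
import sqlite3

# Hypothetical toy schema: `segments` to classify and combined barrier `breaks`,
# both already carrying a dot-separated wscode (a stand-in for wscode_ltree).
SETUP = """
CREATE TABLE segments (id INTEGER PRIMARY KEY, blk INTEGER, measure REAL, wscode TEXT);
CREATE TABLE breaks (blk INTEGER, measure REAL, wscode TEXT);
INSERT INTO breaks VALUES (1, 50, '100');
INSERT INTO segments VALUES
  (1, 1, 10, '100'),      -- same BLK, below the break
  (2, 1, 80, '100'),      -- same BLK, above the break
  (3, 2, 5,  '100.200'),  -- other BLK, code descends from the break's code
  (4, 3, 5,  '300');      -- unrelated stream
"""

# Before: one NOT EXISTS whose WHERE is an OR of two different predicate shapes.
BEFORE = """
SELECT s.id FROM segments s
WHERE NOT EXISTS (
  SELECT 1 FROM breaks b
  WHERE (b.blk = s.blk AND b.measure < s.measure)
     OR (b.blk <> s.blk AND s.wscode LIKE b.wscode || '.%')
)
ORDER BY s.id
"""

# After: two NOT EXISTS joined by AND. The first is a plain blk + measure
# comparison; the second is the path (ltree-style) comparison on its own.
AFTER = """
SELECT s.id FROM segments s
WHERE NOT EXISTS (
  SELECT 1 FROM breaks b
  WHERE b.blk = s.blk AND b.measure < s.measure
)
AND NOT EXISTS (
  SELECT 1 FROM breaks b
  WHERE b.blk <> s.blk AND s.wscode LIKE b.wscode || '.%'
)
ORDER BY s.id
"""


def accessible_ids(sql: str) -> list[int]:
    """Run one classify variant on a fresh in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    ids = [row[0] for row in con.execute(sql)]
    con.close()
    return ids


before_ids = accessible_ids(BEFORE)
after_ids = accessible_ids(AFTER)
print(before_ids, after_ids)  # [1, 4] [1, 4]
```

Measures are assumed to increase upstream, as FWA route measures do, so a break with a smaller measure on the same BLK sits downstream of the segment and blocks it.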

Full pipeline ADMS (5 species, falls + crossings): 90s total.
Classify step alone: 0.2s (was 742s with indexes, ~840s without).

Fixes #72

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
32GB shared_buffers, 96GB effective_cache_size, 2GB work_mem,
8 parallel workers per gather. Add tuning.md with scaling formulas
for other machines.
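The commit does not spell out the scaling formulas in tuning.md. As a hedged illustration only: 32GB and 96GB are 25% and 75% of a 128 GB machine, which matches widely cited starting points (shared_buffers around a quarter of RAM, effective_cache_size up to about three quarters). The 128 GB machine size, the function names, and the mapping of "8 parallel workers per gather" to max_parallel_workers_per_gather are assumptions. A sketch that emits postgresql.conf lines from total RAM:

```python
# Hypothetical sketch: NOT the formulas in tuning.md, just common rules of thumb.
def suggest_settings(ram_gb: int, work_mem_gb: int = 2, workers: int = 8) -> dict[str, str]:
    """Derive PostgreSQL memory settings from total RAM (in GB).

    shared_buffers ~ 25% of RAM and effective_cache_size ~ 75% of RAM are
    widely cited starting points; work_mem and the per-gather worker count
    are passed through unchanged (defaults taken from the commit message).
    """
    return {
        "shared_buffers": f"{ram_gb // 4}GB",
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",
        "work_mem": f"{work_mem_gb}GB",
        # Assumed mapping for "8 parallel workers per gather".
        "max_parallel_workers_per_gather": str(workers),
    }


def to_postgresql_conf(settings: dict[str, str]) -> str:
    """Render settings as postgresql.conf lines (name = value)."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


print(to_postgresql_conf(suggest_settings(128)))
```

work_mem is allocated per sort or hash operation, and each parallel worker can use its own, so 2GB with 8 workers can add up quickly. shared_buffers only takes effect after a server restart, and max_parallel_workers_per_gather is further capped by max_parallel_workers and max_worker_processes.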

Relates to #72

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NewGraphEnvironment merged commit bb0a8a4 into main on Apr 5, 2026
1 check passed
NewGraphEnvironment deleted the optimize-classify branch on April 5, 2026 02:06

Development

Successfully merging this pull request may close these issues.

Profile and optimize Phase 2 species classification performance
