feat: ft call-peaks subcommand and FIRE fiber-level filter refactor by mrvollger · Pull Request #94 · fiberseq/fibertools-rs

mrvollger · 2025-11-06T15:29:09Z

Summary

This branch lands native peak calling in Rust, ergonomic improvements to ft pileup, a Train-FIRE Snakemake pipeline, and a handful of correctness fixes uncovered along the way.

New subcommands

ft call-peaks — full peak-calling pipeline in Rust. Builds an FDR table per chromosome, finds local maxima, identifies natural peak start/end from underlying FIRE elements, merges adjacent peaks, and emits a BED with the expected columns. Includes lookup_fdr, reciprocal-overlap merging, and FDR builder logic with unit tests.
ft mock-fire — synthesizes a mock BAM with FIRE elements from a BED file. Each interval becomes a FIRE element; the 4th column groups intervals into the same mock fiber. Useful for end-to-end peak-calling tests.
ft benchmark (hidden) — micro-benchmarks the fiberseq iterator for profiling.

`ft pileup` improvements

Accept multiple regions on the command line, or a BED of regions (multi-fetch instead of collect).
Major internal refactor for code reuse with the new peak caller.
Default frac-fibers filter; cleaner logging.

FIRE fiber-level filters → global

--skip-no-m6a, --min-msp, --min-ave-msp-size, and the convenience --fire-filter are now global flags on FiberFilters, applied inside FiberseqRecords::next(). Every subcommand inherits them; per-subcommand duplicates are removed.
--fire-filter defaults all three to the FIRE pipeline's values (skip-no-m6a + 10 + 10); explicit flags override (require_equals so --skip-no-m6a=false doesn't swallow the positional BAM arg).

Correctness fixes

fa:Z extras pairing on overlapping peaks — when fibertig peaks share a forward-strand start (e.g. fs:B:I,0,0,12,159), the aligned-coordinate sort no longer reduces to "reverse the input order," so the post-hoc ann_vals.reverse() from PR fix: pair fa:Z extras with correct annotations on reverse-strand reads #103 mispaired extras. Carry extras through the flip + sort inside FiberAnnotations::new_with_extras so pairing survives any permutation. Validated on Anna's surjected CHM13 CRAM and locked down with a regression test using her exact peak shape.
--skip-no-m6a=false override was silently ignored when combined with any other fire-filter, contradicting the CLI help. The empty-m6a reject in passes_fire_filter was hardcoded; now gated on resolved_skip_no_m6a(). The n_msps == 0 guard is kept (still required to avoid a divide-by-zero in the average-MSP-size calc).

Train-FIRE Snakemake pipeline

New Train-FIRE/ directory with a complete Snakemake workflow for training FIRE models (rules for features, regions, apply, trackhub, train), pixi env, default profile, test data, and example mixed-positives resources (DHS/CTCF/hotspots/peaks).

Build / dev-loop

Thin-LTO and tuned test profile (opt-level = 0 for the dev compile loop, opt-level = 2 restored where ML inference tests need speed).
Optimize a handful of test deps so tests stay fast.
Clippy clean (cargo clippy --tests -- -D warnings).

Tests added

tests/call_peaks_test.rs — peak-calling integration coverage (FDR lookup, reciprocal overlap, FDR builder).
tests/fibertig_test.rs — fa-pairing tests for forward, reverse-strand multi-peak, and reverse-strand shared-start cases.
src/utils/input_bam.rs — 7 unit tests for the new fire-filter resolution and passes_fire_filter semantics (each threshold, --fire-filter combo defaults, explicit overrides).

Test plan

cargo test — 70+ tests across unit + integration suites
cargo clippy --tests -- -D warnings — clean
ft pileup smoke check (baseline 13192, --fire-filter → 12967, --min-msp=1000 → 1)
ft fire smoke check
ft call-peaks smoke check
ft pg-inject --extract on Anna's surjected CHM13 CRAM

…ng, also need to turn that into a real peak file

…r peak calling

…ng, also need to turn that into a real peak file

…r peak calling

into peak_calling

…s and incoperating the underlying FIRE elements so I can find the natural peak start and end

…merging

…tput columns

…seq iterator

Add new `mock-fire` CLI subcommand that generates mock BAM files with FIRE elements from a BED file. Each interval becomes a FIRE element, with the 4th column grouping intervals into the same mock read. Bug fixes: - Fix score calculation overflow when quality=255 by capping at 253 and limiting max score to 100 - Fix peak merging to select representative peak by highest score instead of lowest FDR, ensuring the peak_max reflects the position with most FIRE coverage - Add hidden --min-fire-coverage parameter to call-peaks for testing with low-coverage data New files: - src/cli/mock_fire_opts.rs - src/subcommands/mock_fire.rs

Share the three FIRE fiber-level filters (--skip-no-m6a, --min-msp, --min-ave-msp-size) across ft fire, ft pileup, and ft call-peaks via a new FireFiberFilters struct in utils/fire.rs. A fiber that fails any filter contributes nothing to downstream counts (coverage denominator included), so pileup output reflects what FIRE peak-calling actually sees. ft call-peaks defaults to the FIRE peak-calling pipeline values (skip_no_m6a=true, min_msp=10, min_ave_msp_size=10). ft pileup keeps filters off by default but adds --fire-filter as a convenience that bundles all three + --fiber-coverage; individual flags still override. ft fire's CLI defaults are unchanged; only its internal filter logic now goes through the shared struct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mirror ft pileup: filters default off, --fire-filter is the one knob that turns on the FIRE peak-calling pipeline values. Individual flags still override when both are set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The --fire-filter / --skip-no-m6a / --min-msp / --min-ave-msp-size flags now live on InputBam's FiberFilters and are applied inside FiberseqRecords::next(). Every subcommand that reads fibers gets the same filter semantics for free, matching the precedent set by --filter and --ftx. Drops the per-subcommand duplicate flag definitions and the redundant .passes() calls in fire.rs, pileup.rs, and call_peaks. Also drops the short form on --use-5mc to unblock a pre-existing debug-assert collision with --uncompressed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…starts PR #103 reversed `ann_vals` post-sort to pair fa extras with reverse-strand peaks, but that only works when the aligned-coordinate sort is equivalent to reversing the input order. Overlapping peaks with shared forward starts (e.g. `fs:B:I,0,0,12,159`) produce non-monotonic aligned starts, so the sort permutation is no longer a simple reversal and extras get scrambled. Carry extras through `FiberAnnotations::new` so the stable `sort_by_key` moves them alongside their annotation. Added regression test mirroring Anna's four-peak surjection case; validated on the reported CRAM. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Follow-up cleanup to the fa-extras pairing fix; no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Measured on this repo with hyperfine (2-3 runs each): - Warm incremental release (touch one .rs): 32.2s -> 15.2s (-53%) - Clean test --no-run: 103.0s -> 79.2s (-23%) - Clean release got slower with thin-LTO (88s -> 140s), so dist keeps full LTO + codegen-units=1 for shipped binaries. Also add explicit rerun-if-changed for the ONNX files in build.rs so the heavy burn-import codegen only reruns when a model file actually changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Fixes the Formatting CI step; no code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Drop stale `//use polars::prelude::*;` import (polars dep already removed) - Hide `--min-frac-accessible` option from help (defined but not yet wired) - Drop duplicate `println!`s in `ft benchmark`; use the parallel `log::info!`s Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Previous commit dropped [profile.test] opt-level from 2 to 0 to speed up test compile, but this made the m6a prediction tests (burn/candle inference) run for 60+ seconds each. Use a package-glob override so workspace crate stays at opt-level 0 (fast compile), while all dependencies build at opt-level 2. m6a_prediction_test runtime: ~240s -> 10.3s. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…uilder Extract two pure helpers from `Peak` so they can be tested without constructing a `FiberseqPileup`: - `fdr::lookup_fdr(score, &fdr_table)` (was `Peak::lookup_fdr`) - `peaks::reciprocal_overlap_raw(a_chrom, a_start, a_end, ...)` — `Peak::reciprocal_overlap` is now a thin wrapper. Add `tests/call_peaks_test.rs` (21 cases) covering: - `lookup_fdr`: empty, below/above all, exact match, between thresholds, monotonicity - `reciprocal_overlap_raw`: different chroms, disjoint, touching boundaries, identical, containment, partial overlap, symmetry - `IncrementalFdrBuilder`: single-chrom expected table, error when no FDR < max, multi-chrom additivity, sort invariant - `read_fdr_table` / `write_fdr_table`: tempfile round trip, malformed-input error, sort-on-read All 21 pass in <10ms; tests use only the public surface (no `#[cfg(test)]` blocks added to source files). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Use `if let Some(...) = bam.records().next()` instead of a `for` with an unconditional `break` after the first record. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The package override only optimized non-workspace deps, so the workspace crate (m6a calling code) still compiled at opt-level=0 and made cargo test unusably slow. Reverting to the original test profile keeps m6a inference fast under tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…filter The hardcoded `m6a.is_empty() || n_msps == 0` early-reject ignored the resolved value of `skip_no_m6a`, so `--skip-no-m6a=false` was silently overridden whenever any other fire-filter was active (e.g. when used alongside `--fire-filter`). The CLI help text explicitly documents this override, so honor it: split the n_msps==0 guard (still needed for the average-size divide-by-zero) from the no-m6a rejection, and gate the latter on `resolved_skip_no_m6a()`. Adds unit tests covering the resolution semantics, the override combinations, and each individual threshold's reject/pass boundary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…released changelog Cargo.toml v0.8.2 -> v0.9.0 to flag a pending release. The version banner used VERGEN_GIT_DESCRIBE, which only walks annotated tags by default; v0.7.0-v0.8.2 are lightweight tags so describe was falling back to v0.6.4-N-gSHA. Switch to a 7-char VERGEN_GIT_SHA so the banner is meaningful regardless of tag annotation state. Add an Unreleased section to CHANGELOG.md summarizing peak_calling work: ft call-peaks / ft mock-fire / ft benchmark, ft pileup multi-region and default frac-fibers filter, FIRE filter hoist to global FiberFilters, Train-FIRE pipeline, fa:Z shared-start pairing fix (affects v0.8.0-v0.8.2), and added tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Mitchell R. Vollger and others added 30 commits October 29, 2025 15:14

first try at peak calling all in rust, fdr table done but needs testi…

4b4757c

…ng, also need to turn that into a real peak file

first try at peak calling all in rust, fdr table done but needs testi…

1559db7

…ng, also need to turn that into a real peak file

clippy

705a87b

Major refactor of pileup to allow more code to be reusable in prep fo…

55c140a

…r peak calling

first try at peak calling all in rust, fdr table done but needs testi…

9f394cf

…ng, also need to turn that into a real peak file

first try at peak calling all in rust, fdr table done but needs testi…

ff7f2b1

…ng, also need to turn that into a real peak file

clippy

dbdcc90

Major refactor of pileup to allow more code to be reusable in prep fo…

247dd8d

…r peak calling

Merge branch 'peak_calling' of https://github.com/fiberseq/fibertools-rs

112afb7

into peak_calling

First peak calling works! Now I need to work on merging adjacent peak…

414e10d

…s and incoperating the underlying FIRE elements so I can find the natural peak start and end

Finding the natural start and end of local maximum, I now need to do …

7ae3922

…merging

Merging is working I think

c5f681c

Added calling by percent accessible and added most of the expected ou…

032d87c

…tput columns

build fdr one chrom at a time

60aab5d

build fdr one chrom at a time

e0ebdac

build fdr one chrom at a time

5331299

multi fetch instead of collecting

9ea6806

remove default filtering

d8d7d33

add a deafult frac fibers filter

365c17e

better profiling options

55fceb0

add a hidden benchmakr command to help me test the speed of the fiber…

0350adc

…seq iterator

clean up logging in the new peak calling

e88b264

feat: allow pileup to accept multiple regions or a bed file of regions

099b8f4

fix: better error msg

80e633f

Fix merged peak start and ends using all fire elements median positions

a5717f6

move a FIRE training snakemake into fibertools

83b28c2

move a FIRE training snakemake into fibertools

02c872b

move a FIRE training snakemake into fibertools

c4ecc60

move a FIRE training snakemake into fibertools

0390a39

Mitchell R. Vollger and others added 26 commits April 22, 2026 11:24

update

7a519da

update

bc3eccc

update

1910dc5

update

d7c9f41

update

6da85eb

update

df28544

update

637de22

update

a7fde23

update

20170f3

update

e3a196f

update

06f8b1d

update

1c29bef

Merge remote-tracking branch 'origin/main' into peak_calling

d004bbe

style: use in-scope ann_tag instead of re-unwrapping annotation_tag

afc0e74

Follow-up cleanup to the fa-extras pairing fix; no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

style: cargo fmt

83a1285

Fixes the Formatting CI step; no code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

style: fix clippy never_loop in coordinate_lift_test

2047f8a

Use `if let Some(...) = bam.records().next()` instead of a `for` with an unconditional `break` after the first record. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mrvollger marked this pull request as ready for review April 27, 2026 14:32

mrvollger changed the title ~~Add peak calling for FIRE into ft~~ feat: ft call-peaks subcommand and FIRE fiber-level filter refactor Apr 27, 2026

mrvollger merged commit 98b9b02 into main Apr 27, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: ft call-peaks subcommand and FIRE fiber-level filter refactor#94

feat: ft call-peaks subcommand and FIRE fiber-level filter refactor#94
mrvollger merged 86 commits intomainfrom
peak_calling

mrvollger commented Nov 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mrvollger commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New subcommands

ft pileup improvements

FIRE fiber-level filters → global

Correctness fixes

Train-FIRE Snakemake pipeline

Build / dev-loop

Tests added

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mrvollger commented Nov 6, 2025 •

edited

Loading

`ft pileup` improvements