Skip to content

feat: ft call-peaks subcommand and FIRE fiber-level filter refactor#94

Merged
mrvollger merged 86 commits intomainfrom
peak_calling
Apr 27, 2026
Merged

feat: ft call-peaks subcommand and FIRE fiber-level filter refactor#94
mrvollger merged 86 commits intomainfrom
peak_calling

Conversation

@mrvollger
Copy link
Copy Markdown
Member

@mrvollger mrvollger commented Nov 6, 2025

Summary

This branch lands native peak calling in Rust, ergonomic improvements to ft pileup, a Train-FIRE Snakemake pipeline, and a handful of correctness fixes uncovered along the way.

New subcommands

  • ft call-peaks — full peak-calling pipeline in Rust. Builds an FDR table per chromosome, finds local maxima, identifies natural peak start/end from underlying FIRE elements, merges adjacent peaks, and emits a BED with the expected columns. Includes lookup_fdr, reciprocal-overlap merging, and FDR builder logic with unit tests.
  • ft mock-fire — synthesizes a mock BAM with FIRE elements from a BED file. Each interval becomes a FIRE element; the 4th column groups intervals into the same mock fiber. Useful for end-to-end peak-calling tests.
  • ft benchmark (hidden) — micro-benchmarks the fiberseq iterator for profiling.

ft pileup improvements

  • Accept multiple regions on the command line, or a BED of regions (multi-fetch instead of collect).
  • Major internal refactor for code reuse with the new peak caller.
  • Default frac-fibers filter; cleaner logging.

FIRE fiber-level filters → global

  • --skip-no-m6a, --min-msp, --min-ave-msp-size, and the convenience --fire-filter are now global flags on FiberFilters, applied inside FiberseqRecords::next(). Every subcommand inherits them; per-subcommand duplicates are removed.
  • --fire-filter defaults all three to the FIRE pipeline's values (skip-no-m6a + 10 + 10); explicit flags override (require_equals so --skip-no-m6a=false doesn't swallow the positional BAM arg).

Correctness fixes

  • fa:Z extras pairing on overlapping peaks — when fibertig peaks share a forward-strand start (e.g. fs:B:I,0,0,12,159), the aligned-coordinate sort no longer reduces to "reverse the input order," so the post-hoc ann_vals.reverse() from PR fix: pair fa:Z extras with correct annotations on reverse-strand reads #103 mispaired extras. Carry extras through the flip + sort inside FiberAnnotations::new_with_extras so pairing survives any permutation. Validated on Anna's surjected CHM13 CRAM and locked down with a regression test using her exact peak shape.
  • --skip-no-m6a=false override was silently ignored when combined with any other fire-filter, contradicting the CLI help. The empty-m6a reject in passes_fire_filter was hardcoded; now gated on resolved_skip_no_m6a(). The n_msps == 0 guard is kept (still required to avoid a divide-by-zero in the average-MSP-size calc).

Train-FIRE Snakemake pipeline

  • New Train-FIRE/ directory with a complete Snakemake workflow for training FIRE models (rules for features, regions, apply, trackhub, train), pixi env, default profile, test data, and example mixed-positives resources (DHS/CTCF/hotspots/peaks).

Build / dev-loop

  • Thin-LTO and tuned test profile (opt-level = 0 for the dev compile loop, opt-level = 2 restored where ML inference tests need speed).
  • Optimize a handful of test deps so tests stay fast.
  • Clippy clean (cargo clippy --tests -- -D warnings).

Tests added

  • tests/call_peaks_test.rs — peak-calling integration coverage (FDR lookup, reciprocal overlap, FDR builder).
  • tests/fibertig_test.rs — fa-pairing tests for forward, reverse-strand multi-peak, and reverse-strand shared-start cases.
  • src/utils/input_bam.rs — 7 unit tests for the new fire-filter resolution and passes_fire_filter semantics (each threshold, --fire-filter combo defaults, explicit overrides).

Test plan

  • cargo test — 70+ tests across unit + integration suites
  • cargo clippy --tests -- -D warnings — clean
  • ft pileup smoke check (baseline 13192, --fire-filter → 12967, --min-msp=1000 → 1)
  • ft fire smoke check
  • ft call-peaks smoke check
  • ft pg-inject --extract on Anna's surjected CHM13 CRAM

Mitchell R. Vollger and others added 30 commits October 29, 2025 15:14
…ng, also need to turn that into a real peak file
…ng, also need to turn that into a real peak file
…ng, also need to turn that into a real peak file
…ng, also need to turn that into a real peak file
…s and incoperating the underlying FIRE elements so I can find the natural peak start and end
Add new `mock-fire` CLI subcommand that generates mock BAM files with
FIRE elements from a BED file. Each interval becomes a FIRE element,
with the 4th column grouping intervals into the same mock read.

Bug fixes:
- Fix score calculation overflow when quality=255 by capping at 253
  and limiting max score to 100
- Fix peak merging to select representative peak by highest score
  instead of lowest FDR, ensuring the peak_max reflects the position
  with most FIRE coverage
- Add hidden --min-fire-coverage parameter to call-peaks for testing
  with low-coverage data

New files:
- src/cli/mock_fire_opts.rs
- src/subcommands/mock_fire.rs
Mitchell R. Vollger and others added 26 commits April 22, 2026 11:24
Share the three FIRE fiber-level filters (--skip-no-m6a, --min-msp,
--min-ave-msp-size) across ft fire, ft pileup, and ft call-peaks via a
new FireFiberFilters struct in utils/fire.rs. A fiber that fails any
filter contributes nothing to downstream counts (coverage denominator
included), so pileup output reflects what FIRE peak-calling actually
sees.

ft call-peaks defaults to the FIRE peak-calling pipeline values
(skip_no_m6a=true, min_msp=10, min_ave_msp_size=10). ft pileup keeps
filters off by default but adds --fire-filter as a convenience that
bundles all three + --fiber-coverage; individual flags still override.
ft fire's CLI defaults are unchanged; only its internal filter logic
now goes through the shared struct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirror ft pileup: filters default off, --fire-filter is the one knob
that turns on the FIRE peak-calling pipeline values. Individual flags
still override when both are set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --fire-filter / --skip-no-m6a / --min-msp / --min-ave-msp-size
flags now live on InputBam's FiberFilters and are applied inside
FiberseqRecords::next(). Every subcommand that reads fibers gets
the same filter semantics for free, matching the precedent set
by --filter and --ftx. Drops the per-subcommand duplicate flag
definitions and the redundant .passes() calls in fire.rs, pileup.rs,
and call_peaks.

Also drops the short form on --use-5mc to unblock a pre-existing
debug-assert collision with --uncompressed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…starts

PR #103 reversed `ann_vals` post-sort to pair fa extras with reverse-strand
peaks, but that only works when the aligned-coordinate sort is equivalent
to reversing the input order. Overlapping peaks with shared forward starts
(e.g. `fs:B:I,0,0,12,159`) produce non-monotonic aligned starts, so the
sort permutation is no longer a simple reversal and extras get scrambled.

Carry extras through `FiberAnnotations::new` so the stable `sort_by_key`
moves them alongside their annotation. Added regression test mirroring
Anna's four-peak surjection case; validated on the reported CRAM.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up cleanup to the fa-extras pairing fix; no behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Measured on this repo with hyperfine (2-3 runs each):
- Warm incremental release (touch one .rs): 32.2s -> 15.2s (-53%)
- Clean test --no-run: 103.0s -> 79.2s (-23%)
- Clean release got slower with thin-LTO (88s -> 140s), so dist keeps
  full LTO + codegen-units=1 for shipped binaries.

Also add explicit rerun-if-changed for the ONNX files in build.rs so the
heavy burn-import codegen only reruns when a model file actually changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes the Formatting CI step; no code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Drop stale `//use polars::prelude::*;` import (polars dep already removed)
- Hide `--min-frac-accessible` option from help (defined but not yet wired)
- Drop duplicate `println!`s in `ft benchmark`; use the parallel `log::info!`s

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous commit dropped [profile.test] opt-level from 2 to 0 to speed up
test compile, but this made the m6a prediction tests (burn/candle inference)
run for 60+ seconds each. Use a package-glob override so workspace crate
stays at opt-level 0 (fast compile), while all dependencies build at opt-level 2.

m6a_prediction_test runtime: ~240s -> 10.3s.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…uilder

Extract two pure helpers from `Peak` so they can be tested without
constructing a `FiberseqPileup`:
- `fdr::lookup_fdr(score, &fdr_table)` (was `Peak::lookup_fdr`)
- `peaks::reciprocal_overlap_raw(a_chrom, a_start, a_end, ...)` — `Peak::reciprocal_overlap` is now a thin wrapper.

Add `tests/call_peaks_test.rs` (21 cases) covering:
- `lookup_fdr`: empty, below/above all, exact match, between thresholds, monotonicity
- `reciprocal_overlap_raw`: different chroms, disjoint, touching boundaries, identical, containment, partial overlap, symmetry
- `IncrementalFdrBuilder`: single-chrom expected table, error when no FDR < max, multi-chrom additivity, sort invariant
- `read_fdr_table` / `write_fdr_table`: tempfile round trip, malformed-input error, sort-on-read

All 21 pass in <10ms; tests use only the public surface (no `#[cfg(test)]` blocks added to source files).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Use `if let Some(...) = bam.records().next()` instead of a `for` with
an unconditional `break` after the first record.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The package override only optimized non-workspace deps, so the workspace
crate (m6a calling code) still compiled at opt-level=0 and made cargo test
unusably slow. Reverting to the original test profile keeps m6a inference
fast under tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…filter

The hardcoded `m6a.is_empty() || n_msps == 0` early-reject ignored the
resolved value of `skip_no_m6a`, so `--skip-no-m6a=false` was silently
overridden whenever any other fire-filter was active (e.g. when used
alongside `--fire-filter`). The CLI help text explicitly documents this
override, so honor it: split the n_msps==0 guard (still needed for the
average-size divide-by-zero) from the no-m6a rejection, and gate the
latter on `resolved_skip_no_m6a()`.

Adds unit tests covering the resolution semantics, the override
combinations, and each individual threshold's reject/pass boundary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mrvollger mrvollger marked this pull request as ready for review April 27, 2026 14:32
@mrvollger mrvollger changed the title Add peak calling for FIRE into ft feat: ft call-peaks subcommand and FIRE fiber-level filter refactor Apr 27, 2026
…released changelog

Cargo.toml v0.8.2 -> v0.9.0 to flag a pending release.

The version banner used VERGEN_GIT_DESCRIBE, which only walks annotated
tags by default; v0.7.0-v0.8.2 are lightweight tags so describe was
falling back to v0.6.4-N-gSHA. Switch to a 7-char VERGEN_GIT_SHA so the
banner is meaningful regardless of tag annotation state.

Add an Unreleased section to CHANGELOG.md summarizing peak_calling work:
ft call-peaks / ft mock-fire / ft benchmark, ft pileup multi-region and
default frac-fibers filter, FIRE filter hoist to global FiberFilters,
Train-FIRE pipeline, fa:Z shared-start pairing fix (affects v0.8.0-v0.8.2),
and added tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mrvollger mrvollger merged commit 98b9b02 into main Apr 27, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant