feat: ft call-peaks subcommand and FIRE fiber-level filter refactor#94
Merged
feat: ft call-peaks subcommand and FIRE fiber-level filter refactor#94
Conversation
…ng, also need to turn that into a real peak file
…ng, also need to turn that into a real peak file
…ng, also need to turn that into a real peak file
…ng, also need to turn that into a real peak file
…s and incoperating the underlying FIRE elements so I can find the natural peak start and end
Add new `mock-fire` CLI subcommand that generates mock BAM files with FIRE elements from a BED file. Each interval becomes a FIRE element, with the 4th column grouping intervals into the same mock read. Bug fixes: - Fix score calculation overflow when quality=255 by capping at 253 and limiting max score to 100 - Fix peak merging to select representative peak by highest score instead of lowest FDR, ensuring the peak_max reflects the position with most FIRE coverage - Add hidden --min-fire-coverage parameter to call-peaks for testing with low-coverage data New files: - src/cli/mock_fire_opts.rs - src/subcommands/mock_fire.rs
Share the three FIRE fiber-level filters (--skip-no-m6a, --min-msp, --min-ave-msp-size) across ft fire, ft pileup, and ft call-peaks via a new FireFiberFilters struct in utils/fire.rs. A fiber that fails any filter contributes nothing to downstream counts (coverage denominator included), so pileup output reflects what FIRE peak-calling actually sees. ft call-peaks defaults to the FIRE peak-calling pipeline values (skip_no_m6a=true, min_msp=10, min_ave_msp_size=10). ft pileup keeps filters off by default but adds --fire-filter as a convenience that bundles all three + --fiber-coverage; individual flags still override. ft fire's CLI defaults are unchanged; only its internal filter logic now goes through the shared struct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirror ft pileup: filters default off, --fire-filter is the one knob that turns on the FIRE peak-calling pipeline values. Individual flags still override when both are set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --fire-filter / --skip-no-m6a / --min-msp / --min-ave-msp-size flags now live on InputBam's FiberFilters and are applied inside FiberseqRecords::next(). Every subcommand that reads fibers gets the same filter semantics for free, matching the precedent set by --filter and --ftx. Drops the per-subcommand duplicate flag definitions and the redundant .passes() calls in fire.rs, pileup.rs, and call_peaks. Also drops the short form on --use-5mc to unblock a pre-existing debug-assert collision with --uncompressed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…starts PR #103 reversed `ann_vals` post-sort to pair fa extras with reverse-strand peaks, but that only works when the aligned-coordinate sort is equivalent to reversing the input order. Overlapping peaks with shared forward starts (e.g. `fs:B:I,0,0,12,159`) produce non-monotonic aligned starts, so the sort permutation is no longer a simple reversal and extras get scrambled. Carry extras through `FiberAnnotations::new` so the stable `sort_by_key` moves them alongside their annotation. Added regression test mirroring Anna's four-peak surjection case; validated on the reported CRAM. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up cleanup to the fa-extras pairing fix; no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Measured on this repo with hyperfine (2-3 runs each): - Warm incremental release (touch one .rs): 32.2s -> 15.2s (-53%) - Clean test --no-run: 103.0s -> 79.2s (-23%) - Clean release got slower with thin-LTO (88s -> 140s), so dist keeps full LTO + codegen-units=1 for shipped binaries. Also add explicit rerun-if-changed for the ONNX files in build.rs so the heavy burn-import codegen only reruns when a model file actually changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes the Formatting CI step; no code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Drop stale `//use polars::prelude::*;` import (polars dep already removed) - Hide `--min-frac-accessible` option from help (defined but not yet wired) - Drop duplicate `println!`s in `ft benchmark`; use the parallel `log::info!`s Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous commit dropped [profile.test] opt-level from 2 to 0 to speed up test compile, but this made the m6a prediction tests (burn/candle inference) run for 60+ seconds each. Use a package-glob override so workspace crate stays at opt-level 0 (fast compile), while all dependencies build at opt-level 2. m6a_prediction_test runtime: ~240s -> 10.3s. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…uilder Extract two pure helpers from `Peak` so they can be tested without constructing a `FiberseqPileup`: - `fdr::lookup_fdr(score, &fdr_table)` (was `Peak::lookup_fdr`) - `peaks::reciprocal_overlap_raw(a_chrom, a_start, a_end, ...)` — `Peak::reciprocal_overlap` is now a thin wrapper. Add `tests/call_peaks_test.rs` (21 cases) covering: - `lookup_fdr`: empty, below/above all, exact match, between thresholds, monotonicity - `reciprocal_overlap_raw`: different chroms, disjoint, touching boundaries, identical, containment, partial overlap, symmetry - `IncrementalFdrBuilder`: single-chrom expected table, error when no FDR < max, multi-chrom additivity, sort invariant - `read_fdr_table` / `write_fdr_table`: tempfile round trip, malformed-input error, sort-on-read All 21 pass in <10ms; tests use only the public surface (no `#[cfg(test)]` blocks added to source files). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Use `if let Some(...) = bam.records().next()` instead of a `for` with an unconditional `break` after the first record. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The package override only optimized non-workspace deps, so the workspace crate (m6a calling code) still compiled at opt-level=0 and made cargo test unusably slow. Reverting to the original test profile keeps m6a inference fast under tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…filter The hardcoded `m6a.is_empty() || n_msps == 0` early-reject ignored the resolved value of `skip_no_m6a`, so `--skip-no-m6a=false` was silently overridden whenever any other fire-filter was active (e.g. when used alongside `--fire-filter`). The CLI help text explicitly documents this override, so honor it: split the n_msps==0 guard (still needed for the average-size divide-by-zero) from the no-m6a rejection, and gate the latter on `resolved_skip_no_m6a()`. Adds unit tests covering the resolution semantics, the override combinations, and each individual threshold's reject/pass boundary. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…released changelog Cargo.toml v0.8.2 -> v0.9.0 to flag a pending release. The version banner used VERGEN_GIT_DESCRIBE, which only walks annotated tags by default; v0.7.0-v0.8.2 are lightweight tags so describe was falling back to v0.6.4-N-gSHA. Switch to a 7-char VERGEN_GIT_SHA so the banner is meaningful regardless of tag annotation state. Add an Unreleased section to CHANGELOG.md summarizing peak_calling work: ft call-peaks / ft mock-fire / ft benchmark, ft pileup multi-region and default frac-fibers filter, FIRE filter hoist to global FiberFilters, Train-FIRE pipeline, fa:Z shared-start pairing fix (affects v0.8.0-v0.8.2), and added tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This branch lands native peak calling in Rust, ergonomic improvements to
ft pileup, a Train-FIRE Snakemake pipeline, and a handful of correctness fixes uncovered along the way.New subcommands
ft call-peaks— full peak-calling pipeline in Rust. Builds an FDR table per chromosome, finds local maxima, identifies natural peak start/end from underlying FIRE elements, merges adjacent peaks, and emits a BED with the expected columns. Includeslookup_fdr, reciprocal-overlap merging, and FDR builder logic with unit tests.ft mock-fire— synthesizes a mock BAM with FIRE elements from a BED file. Each interval becomes a FIRE element; the 4th column groups intervals into the same mock fiber. Useful for end-to-end peak-calling tests.ft benchmark(hidden) — micro-benchmarks the fiberseq iterator for profiling.ft pileupimprovementsFIRE fiber-level filters → global
--skip-no-m6a,--min-msp,--min-ave-msp-size, and the convenience--fire-filterare now global flags onFiberFilters, applied insideFiberseqRecords::next(). Every subcommand inherits them; per-subcommand duplicates are removed.--fire-filterdefaults all three to the FIRE pipeline's values (skip-no-m6a + 10 + 10); explicit flags override (require_equalsso--skip-no-m6a=falsedoesn't swallow the positional BAM arg).Correctness fixes
fs:B:I,0,0,12,159), the aligned-coordinate sort no longer reduces to "reverse the input order," so the post-hocann_vals.reverse()from PR fix: pair fa:Z extras with correct annotations on reverse-strand reads #103 mispaired extras. Carry extras through the flip + sort insideFiberAnnotations::new_with_extrasso pairing survives any permutation. Validated on Anna's surjected CHM13 CRAM and locked down with a regression test using her exact peak shape.--skip-no-m6a=falseoverride was silently ignored when combined with any other fire-filter, contradicting the CLI help. The empty-m6a reject inpasses_fire_filterwas hardcoded; now gated onresolved_skip_no_m6a(). Then_msps == 0guard is kept (still required to avoid a divide-by-zero in the average-MSP-size calc).Train-FIRE Snakemake pipeline
Train-FIRE/directory with a complete Snakemake workflow for training FIRE models (rules for features, regions, apply, trackhub, train), pixi env, default profile, test data, and example mixed-positives resources (DHS/CTCF/hotspots/peaks).Build / dev-loop
opt-level = 0for the dev compile loop,opt-level = 2restored where ML inference tests need speed).cargo clippy --tests -- -D warnings).Tests added
tests/call_peaks_test.rs— peak-calling integration coverage (FDR lookup, reciprocal overlap, FDR builder).tests/fibertig_test.rs— fa-pairing tests for forward, reverse-strand multi-peak, and reverse-strand shared-start cases.src/utils/input_bam.rs— 7 unit tests for the new fire-filter resolution andpasses_fire_filtersemantics (each threshold,--fire-filtercombo defaults, explicit overrides).Test plan
cargo test— 70+ tests across unit + integration suitescargo clippy --tests -- -D warnings— cleanft pileupsmoke check (baseline 13192,--fire-filter→ 12967,--min-msp=1000→ 1)ft firesmoke checkft call-peakssmoke checkft pg-inject --extracton Anna's surjected CHM13 CRAM