Feat/calibration stage scoring by jspaezp · Pull Request #54 · TalusBio/timsbuktoolkit

jspaezp · 2026-04-10T05:46:46Z

... and A LOT more

Delete `build_narrow_context`, `process_query`, and `process_batch` — the old uncalibrated pipeline that is no longer called. The CLI exclusively uses `prescore_batch` + `score_calibrated_batch`. Also remove unused `info` import and clean up the stale module-level doc comment.

Delete ToleranceHierarchy struct and inline its prescore/secondary fields directly onto Scorer as broad_tolerance and secondary_tolerance. The tertiary_tolerance() method is inlined at its single call site. Update all re-exports, consumer function signatures, and construction sites.

…ult types

Ports SearchResultBuilder → ScoredCandidateBuilder in results.rs using the new ScoringFields field names and [f32; N] arrays for per-ion data. Updates finalize_results in pipeline.rs to use the new builder and return Result<ScoredCandidate, DataProcessingError>. Callers (Task 5) still expect IonSearchResults; those mismatches are intentional intermediate state.

Replace all IonSearchResults references with ScoredCandidate in the accumulator, score_calibrated_extraction, score_calibrated_batch, and FullQueryResult. The pipeline now produces ScoredCandidate throughout; CLI callers in processing.rs will be updated in Task 6.

…→ FinalResult)

…erive Delete search_results.rs entirely and remove all references to IonSearchResults and SearchResultBuilder across the codebase. All consumers now use the new ScoredCandidate/CompetedCandidate/FinalResult types. Drop the parquet_derive dependency since the derive macro is no longer used.

- ScoreTraces → ElutionTraces; fields main_score → apex_profile, ms2_cosine_ref_sim → cosine_trace
- ScoringContext → Extraction; field query_values → chromatograms
- ApexLocation/ApexScore::raising_cycles → rising_cycles
- PeptideMetadata::ref_rt_seconds → query_rt_seconds
- build_candidate_context → build_broad_extraction in pipeline.rs
- build_calibrated_context → build_calibrated_extraction in pipeline.rs
- main_loop → execute_pipeline in processing.rs
- process_speclib → run_pipeline in processing.rs / main.rs
- Move SCRIBE_FLOOR from scribe.rs to apex_features.rs; delete scribe.rs
- Update iter_scores() string literals: "main_score" → "apex_profile", "ms2_cosine_ref_sim" → "cosine_trace"
- Update viewer files (computed_state.rs, plot_renderer.rs) for all renames

…emove secondary_tolerance Add get_spectral_tolerance() and get_isotope_tolerance() to CalibrationResult, thread them as explicit parameters through execute_secondary_query and its callers, and delete the secondary_tolerance field from Scorer.

- Add `lookback: usize` parameter to `find_optimal_path` (calibrt), removing the hardcoded `let lookback = 30`
- Add `lookback: usize` parameter to `calibrate_with_ranges`; update the `calibrate` convenience wrapper to pass 30
- Pass `config.dp_lookback` from `CalibrationConfig` through `calibrate_from_phase1` in processing.rs
- Pass lookback=10 in the identity fallback in rt_calibration.rs (2-point curve needs no large window)
- Remove unused `lowess_frac` field from `CalibrationConfig`
- Update calibrt integration tests to supply the new lookback argument
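To illustrate what a `lookback` window does in a path-finding DP, here is a minimal sketch. It is not the calibrt implementation: the `Node` layout, the function name `best_monotonic_path`, and the additive node-weight score are all assumptions made for the example. Each node only considers the previous `lookback` nodes as predecessors, which bounds the work per node.

```rust
/// Hypothetical grid node: a cell centre plus its accumulated weight.
#[derive(Debug, Clone, Copy)]
pub struct Node {
    pub x: f64,
    pub y: f64,
    pub weight: f64,
}

/// Returns the indices of the highest-weight path that is non-decreasing in `y`.
/// `nodes` is assumed to be sorted by strictly increasing `x`.
/// Only the previous `lookback` nodes are tried as predecessors of each node.
pub fn best_monotonic_path(nodes: &[Node], lookback: usize) -> Vec<usize> {
    let n = nodes.len();
    if n == 0 {
        return Vec::new();
    }
    let mut best = vec![0.0f64; n];
    let mut prev: Vec<Option<usize>> = vec![None; n];
    for i in 0..n {
        // A path may always start at node i.
        best[i] = nodes[i].weight;
        let start = i.saturating_sub(lookback);
        for j in start..i {
            if nodes[j].y <= nodes[i].y {
                let candidate = best[j] + nodes[i].weight;
                if candidate > best[i] {
                    best[i] = candidate;
                    prev[i] = Some(j);
                }
            }
        }
    }
    // Backtrack from the best-scoring end node.
    let mut end = 0;
    for i in 1..n {
        if best[i] > best[end] {
            end = i;
        }
    }
    let mut path = vec![end];
    while let Some(p) = prev[*path.last().unwrap()] {
        path.push(p);
    }
    path.reverse();
    path
}
```

With a small window the DP cannot jump over an outlier that sits more than `lookback` nodes back, so the window trades path quality for speed; a 2-point curve has nothing to look back over, which is consistent with the smaller value used in the identity fallback.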

…ult to ViewerResult Update viewer-facing API to use clearer names: FullQueryResult → ViewerResult with fields traces/longitudinal_apex_profile/chromatograms/scored, and the method process_query_full → score_for_viewer on Scorer.

Separate CLI output into three streams for clearer user experience:
- stdout: brief phase milestones + 1% FDR result summary
- log file: full tracing record at configured level (default: {output_dir}/timsseek.log)
- stderr: progress bars (TTY only) + warn/error tracing messages

Replace -v/-q verbosity flags with --log-path and --log-level options. Progress bars auto-hide when stderr is not a terminal.
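The three-stream split can be sketched with the standard library alone. This is an illustration, not the timsseek code (which uses tracing): `OutputStreams`, `Level`, `milestone`, `event`, and `progress_bars_enabled` are hypothetical names. The writers are generic so the routing can be exercised against in-memory buffers; `std::io::IsTerminal` (Rust 1.70+) provides the TTY check.

```rust
use std::io::{self, IsTerminal, Write};

/// Severity levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

/// Hypothetical router: `out` = stdout, `err` = stderr, `log` = log file.
pub struct OutputStreams<O: Write, E: Write, L: Write> {
    pub out: O,
    pub err: E,
    pub log: L,
    pub log_level: Level,
}

impl<O: Write, E: Write, L: Write> OutputStreams<O, E, L> {
    /// Brief phase milestone: shown on stdout and recorded in the log.
    pub fn milestone(&mut self, msg: &str) -> io::Result<()> {
        writeln!(self.out, "{}", msg)?;
        self.event(Level::Info, msg)
    }

    /// Diagnostic event: logged at or above the configured level,
    /// and mirrored to stderr only for warnings and errors.
    pub fn event(&mut self, level: Level, msg: &str) -> io::Result<()> {
        if level >= self.log_level {
            writeln!(self.log, "{:?} {}", level, msg)?;
        }
        if level >= Level::Warn {
            writeln!(self.err, "{:?} {}", level, msg)?;
        }
        Ok(())
    }
}

/// Progress bars are only worth drawing when stderr is a terminal.
pub fn progress_bars_enabled() -> bool {
    io::stderr().is_terminal()
}
```

In a real binary the three writers would be `io::stdout()`, `io::stderr()`, and a `File` opened at the `--log-path` location.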

… debug-level for expected failures Replace StorageProvider probe in sniff_cached_index with a direct Path::exists() check for local paths, eliminating spurious ERROR/INFO lines on the normal "not a cached index" code path. Cloud paths keep the probe but log at debug! instead of error!. Also demote cache-miss error! to debug! in try_load_from_cache, simplify the load_index_auto detection log, and drop noisy info! calls in timscentroid storage.rs to trace!/debug!.

…g, max-qvalue
- Upgrade forust-ml 0.4.8 → 0.5.0
- GBM early stopping at 100 rounds (PrecomputedFeatures row-major matrix)
- Load speclib/calib_lib once in main(), pass by &Speclib reference
- Add RunReport (per-invocation) with speclib/index loading timings
- Add --max-qvalue CLI arg (default 0.5, filters Parquet output)
- Fix duplicate NUM_MS2_IONS/NUM_MS1_IONS constants (import from mod.rs)
- Re-export FileReport, RunReport from scoring mod

Change find_optimal_path signature to accept &mut Vec<f64> and &mut Vec<Option<usize>> buffers instead of allocating them internally. Caller in calibrate_with_ranges creates temporary buffers for now; Task 3 will move ownership to CalibrationState for reuse across calls.

Introduces CalibrationState, a reusable struct that owns Grid, DP buffers, and path indices to enable incremental calibration without repeated allocation. Makes CalibrationCurve::new pub(crate) so fit() can construct it within the crate.
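The buffer-reuse idea can be sketched as follows. This is a toy under stated assumptions, not the calibrt code: `ScratchState` and `run_dp` are hypothetical names, and the "DP" is a simple running sum. The point is only that `clear()` followed by `resize()` keeps the existing allocation, so repeated fits do not reallocate once the buffers have grown.

```rust
/// Hypothetical state struct that owns its DP scratch buffers.
pub struct ScratchState {
    pub best: Vec<f64>,
    pub prev: Vec<Option<usize>>,
}

impl ScratchState {
    pub fn new() -> Self {
        ScratchState { best: Vec::new(), prev: Vec::new() }
    }

    /// Runs the toy DP using the buffers owned by `self`.
    pub fn fit(&mut self, weights: &[f64]) -> f64 {
        run_dp(weights, &mut self.best, &mut self.prev)
    }
}

/// Toy DP over caller-provided buffers: `best[i]` is the cumulative
/// weight of the chain ending at `i`, `prev[i]` its predecessor.
pub fn run_dp(
    weights: &[f64],
    best: &mut Vec<f64>,
    prev: &mut Vec<Option<usize>>,
) -> f64 {
    // clear() keeps capacity; resize() only allocates if it must grow.
    best.clear();
    prev.clear();
    best.resize(weights.len(), 0.0);
    prev.resize(weights.len(), None);
    for i in 0..weights.len() {
        let (base, from) = if i == 0 {
            (0.0, None)
        } else {
            (best[i - 1], Some(i - 1))
        };
        best[i] = base + weights[i];
        prev[i] = from;
    }
    best.last().copied().unwrap_or(0.0)
}
```

Passing `&mut Vec<_>` keeps the function usable with temporary buffers, while a long-lived owner such as the state struct gets allocation reuse across calls.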

…ibrantHeap::iter()

…ces + cached profiles

…_apex_location as wrappers

Move the core extraction logic from Scorer::build_broad_extraction into a standalone generic function that works with any KeyLike type (IonAnnot or String), enabling reuse by both the CLI scorer and the viewer.

Introduce the calibration state machine and background scoring thread infrastructure for the viewer's live RT calibration panel:
- Create calibration.rs with ViewerCalibrationState (Idle/Running/Paused/Done), background thread using AtomicU8 control + thread::park for pause, bounded sync_channel for CalibrationMessage snapshots, and CalibrantHeap accumulation with periodic snapshot sends.
- Add Pane::Calibration variant to the dock layout.
- Wrap ElutionGroupData in Arc<> for sharing with the background thread.
- Add calibration.poll() to the update loop with request_repaint.
- Make build_extraction() pub (was pub(crate)) so the viewer can call it.
- Add calibrt dependency to the viewer.
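A minimal sketch of the control pattern described above: an `AtomicU8` flag, `thread::park` for pausing, and a bounded `sync_channel` for snapshots. `Worker`, `Message`, and the counting loop are hypothetical stand-ins for the viewer's types and scoring work. As a simplification, this sketch uses `try_send` for every message so the worker can never block on a full channel, and `stop()` drops the receiver before joining.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, TrySendError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

const RUN: u8 = 0;
const PAUSE: u8 = 1;
const STOP: u8 = 2;

#[derive(Debug, PartialEq)]
pub enum Message {
    Snapshot(usize),
    Done(usize),
}

pub struct Worker {
    control: Arc<AtomicU8>,
    rx: Option<Receiver<Message>>,
    handle: Option<JoinHandle<()>>,
}

impl Worker {
    pub fn spawn(total: usize, snapshot_every: usize) -> Worker {
        let control = Arc::new(AtomicU8::new(RUN));
        let (tx, rx) = sync_channel(4);
        let ctl = Arc::clone(&control);
        let every = snapshot_every.max(1);
        let handle = thread::spawn(move || {
            let mut scored = 0usize;
            while scored < total {
                match ctl.load(Ordering::Acquire) {
                    STOP => break,
                    // park() may wake spuriously, so loop and re-check.
                    PAUSE => {
                        thread::park();
                        continue;
                    }
                    _ => {}
                }
                scored += 1; // stand-in for scoring one item
                if scored % every == 0 {
                    // A full channel just drops this snapshot.
                    if let Err(TrySendError::Disconnected(_)) =
                        tx.try_send(Message::Snapshot(scored))
                    {
                        break;
                    }
                }
            }
            let _ = tx.try_send(Message::Done(scored));
        });
        Worker { control, rx: Some(rx), handle: Some(handle) }
    }

    fn signal(&self, state: u8) {
        self.control.store(state, Ordering::Release);
        if let Some(h) = &self.handle {
            h.thread().unpark();
        }
    }

    pub fn pause(&self) {
        self.signal(PAUSE);
    }

    pub fn resume(&self) {
        self.signal(RUN);
    }

    /// Stops the worker: the receiver is dropped before joining.
    pub fn stop(&mut self) {
        self.signal(STOP);
        drop(self.rx.take());
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
    }

    /// Waits for the worker to finish, then drains what it sent.
    pub fn finish(mut self) -> Vec<Message> {
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
        match self.rx.take() {
            Some(rx) => rx.try_iter().collect(),
            None => Vec::new(),
        }
    }
}
```

A UI loop would call a non-blocking `try_recv` each frame instead of `finish`; storing the new state before `unpark()` ensures a parked worker observes it as soon as it wakes.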

Add render_panel() to ViewerCalibrationState with:
- Context-sensitive control buttons (Start/Pause/Resume/Stop/Reset)
- Progress counters (scored / total, calibrants / capacity)
- egui_plot scatter showing suppressed, retained, and path grid cells
- Fitted calibration curve as a cyan line sampled at 200 points
- WRMSE display and RT tolerance suggestion with Apply button

Also adds CalibrationCurve::points() public accessor to calibrt.

…ge) reference lines

- CalibrationState::measure_ridge_width() expands from path cells into adjacent cells above a weight threshold fraction
- Returns RidgeMeasurement { x, half_width, total_weight } per column
- Viewer: weighted-average half-width as global tolerance (heavy columns count more), replaces non-suppressed cell residual approach
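The ridge expansion and the weighted average can be sketched on a single grid column. This is an illustration under assumptions, not the calibrt code: `measure_column` and `weighted_half_width` are hypothetical names, the half-width is expressed in cell units, and the threshold is taken relative to the path cell's own weight.

```rust
/// Simplified per-column measurement (the real struct also carries a position).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RidgeMeasurement {
    pub half_width: f64,
    pub total_weight: f64,
}

/// Expands from `path_row` up and down while neighbouring cells hold at
/// least `threshold_frac` of the path cell's weight.
/// Assumes `path_row < column.len()`.
pub fn measure_column(
    column: &[f64],
    path_row: usize,
    threshold_frac: f64,
) -> RidgeMeasurement {
    let cutoff = column[path_row] * threshold_frac;
    let mut lo = path_row;
    while lo > 0 && column[lo - 1] >= cutoff {
        lo -= 1;
    }
    let mut hi = path_row;
    while hi + 1 < column.len() && column[hi + 1] >= cutoff {
        hi += 1;
    }
    let total_weight: f64 = column[lo..=hi].iter().sum();
    RidgeMeasurement {
        half_width: (hi - lo + 1) as f64 / 2.0,
        total_weight,
    }
}

/// Weight-averaged half-width, so heavy columns count more.
/// Returns `None` when there is no weight to average over.
pub fn weighted_half_width(ms: &[RidgeMeasurement]) -> Option<f64> {
    let total: f64 = ms.iter().map(|m| m.total_weight).sum();
    if total <= 0.0 {
        return None;
    }
    let acc: f64 = ms.iter().map(|m| m.half_width * m.total_weight).sum();
    Some(acc / total)
}
```

Weighting by column mass keeps sparsely populated columns, whose ridge estimate is noisy, from dominating the global tolerance.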

- CalibrationResult stores ridge widths and interpolates at query RT
- get_tolerance() now returns position-dependent RT tolerance (wider at edges, tighter in middle) based on the calibration grid ridge width
- CLI's calibrate_from_phase1 switched to CalibrationState API for ridge measurement after curve fitting
- get_tolerance receives library RT (ridge widths indexed by library RT)
- Fallback to uniform tolerance when no ridge data available
- RIDGE_WIDTH_MULTIPLIER (1.0) and MIN_RT_TOLERANCE_MINUTES (0.5) tunable
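A position-dependent tolerance lookup along these lines might look like the sketch below. The two constants and their values come from the commit message; everything else is assumed for illustration: the function name `tolerance_at`, the `(library_rt, half_width)` pair representation, linear interpolation with clamping at the ends, and half-widths already expressed in minutes.

```rust
pub const RIDGE_WIDTH_MULTIPLIER: f64 = 1.0;
pub const MIN_RT_TOLERANCE_MINUTES: f64 = 0.5;

/// `ridge` holds `(library_rt, half_width)` pairs sorted by strictly
/// increasing `library_rt`. Returns `fallback` (a uniform tolerance)
/// when there is no ridge data or the query is not finite.
pub fn tolerance_at(ridge: &[(f64, f64)], library_rt: f64, fallback: f64) -> f64 {
    if !library_rt.is_finite() {
        return fallback;
    }
    let interpolated = match ridge {
        [] => return fallback,
        [only] => only.1,
        _ => {
            let first = ridge[0];
            let last = ridge[ridge.len() - 1];
            if library_rt <= first.0 {
                first.1 // clamp before the first measurement
            } else if library_rt >= last.0 {
                last.1 // clamp after the last measurement
            } else {
                // First index whose position is strictly greater than the query.
                let hi = ridge.partition_point(|p| p.0 <= library_rt);
                let (x0, w0) = ridge[hi - 1];
                let (x1, w1) = ridge[hi];
                let t = (library_rt - x0) / (x1 - x0);
                w0 + t * (w1 - w0)
            }
        }
    };
    (interpolated * RIDGE_WIDTH_MULTIPLIER).max(MIN_RT_TOLERANCE_MINUTES)
}
```

The floor keeps a very tight ridge from producing a tolerance so narrow that small calibration errors exclude true peaks.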

…d polygon

…rances)

Move Array2D and ArrayElement from timsquery into a new rust/array2d workspace crate with its own Array2DError type. timsquery re-exports from array2d and bridges errors via From<Array2DError> for DataProcessingError. calibrt gains array2d as a direct dependency.
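The error bridging described here follows the usual `From` pattern, which lets `?` convert the lower-level error automatically. In this sketch only the two type names and the `From` impl come from the commit message; the enum variants and the helper functions `checked_buffer` and `process` are hypothetical.

```rust
use std::fmt;

/// Error type owned by the array crate (variant is hypothetical).
#[derive(Debug, PartialEq)]
pub enum Array2DError {
    DimensionMismatch { expected: usize, got: usize },
}

impl fmt::Display for Array2DError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Array2DError::DimensionMismatch { expected, got } => {
                write!(f, "expected {} values, got {}", expected, got)
            }
        }
    }
}

/// Error type of the consuming crate (variant is hypothetical).
#[derive(Debug, PartialEq)]
pub enum DataProcessingError {
    Array(Array2DError),
}

/// The bridge: lets `?` turn an Array2DError into a DataProcessingError.
impl From<Array2DError> for DataProcessingError {
    fn from(e: Array2DError) -> Self {
        DataProcessingError::Array(e)
    }
}

/// Stand-in for an array constructor that validates its dimensions.
pub fn checked_buffer(
    values: Vec<f32>,
    rows: usize,
    cols: usize,
) -> Result<Vec<f32>, Array2DError> {
    if values.len() != rows * cols {
        return Err(Array2DError::DimensionMismatch {
            expected: rows * cols,
            got: values.len(),
        });
    }
    Ok(values)
}

/// Consumer code: `?` applies the From impl, no explicit map_err needed.
pub fn process(values: Vec<f32>, rows: usize, cols: usize) -> Result<usize, DataProcessingError> {
    let buffer = checked_buffer(values, rows, cols)?;
    Ok(buffer.len())
}
```

Keeping the conversion in the consuming crate means the array crate does not need to know about its callers' error types.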

Makes the semantic meaning explicit: library RT on the x-axis and observed RT on the y-axis. Also renames RidgeMeasurement.x to .library.
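The same axis convention can be enforced by the type system, which is what the `LibraryRT<T>` and `ObservedRTSeconds<T>` newtype commits in this PR are named for. A minimal sketch follows: the two newtype names and the `library`/`observed` field names come from the commit titles, while `LinearCurve`, `predict`, and `residual` are hypothetical.

```rust
/// Retention time as stated in the spectral library (x-axis).
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct LibraryRT<T>(pub T);

/// Retention time observed in the run, in seconds (y-axis).
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct ObservedRTSeconds<T>(pub T);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub library: LibraryRT<f64>,
    pub observed: ObservedRTSeconds<f64>,
    pub weight: f64,
}

/// Hypothetical two-parameter curve, used only to show the typed boundary.
pub struct LinearCurve {
    pub slope: f64,
    pub intercept: f64,
}

impl LinearCurve {
    /// Maps a library RT to a predicted observed RT; the signature makes
    /// it a compile error to pass the two axes in the wrong order.
    pub fn predict(&self, rt: LibraryRT<f64>) -> ObservedRTSeconds<f64> {
        ObservedRTSeconds(self.slope * rt.0 + self.intercept)
    }

    /// Observed minus predicted, in seconds.
    pub fn residual(&self, p: &Point) -> f64 {
        p.observed.0 - self.predict(p.library).0
    }
}
```

The wrappers are zero-cost at runtime; their value is that a swapped argument fails to compile instead of silently producing a wrong calibration.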

…types

…dge measurement

…ation viz Pathfinding: use geometric mean (sqrt weights) for edge weight formula to reduce bias against sparse regions. After DP pass, greedily extend path backward/forward through monotonic non-suppressed cells to cover full RT range. Viewer: draw extrapolated prediction as dashed red line beyond curve bounds, clamped to grid y-range to prevent runaway extrapolation.

Print training fold progress and scoring time to stderr so users can tell the rescore phase isn't frozen. Uses Duration debug format for automatic unit selection. Also fix: score() was calling assign_scores() redundantly after fit() already assigned them — removed the double scoring pass.

…tric Add column_weight to RidgeMeasurement (total weight in full column) alongside ridge_weight (weight inside ridge bounds). Compute in-ridge ratio in RidgeWidthSummary and display as percentage in both CLI calibration summary and viewer tolerance panel.

- Viewer deadlock: drop channel receiver before joining background thread in reset() and Drop, use try_send for Done messages so the background thread never blocks on a full channel.
- RT fields: replace ambiguous query_rt_seconds/delta_rt/sq_delta_rt/recalibrated_rt with explicit library_rt, calibrated_rt_seconds, obs_rt_seconds, and calibrated_sq_delta_rt computed from calibrated residuals. ML features updated accordingly.
- Batch error handling: replace .unwrap() on run_pipeline with proper error propagation; abort batch on I/O errors (disk full, permissions) instead of retrying every remaining file.

- Grid::reset() preserves bin center geometry instead of zeroing nodes; add Grid::reconfigure() for changing dimensions.
- CalibrationState::update returns Result: it rejects NaN/Inf coordinates and weights at the grid boundary instead of silently accumulating them. Propagated as error in CLI, logged as warning in viewer.
- Replace bare .unwrap() on partial_cmp with descriptive .expect() messages documenting the invariant that NaN scores should not reach the sort phase.
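The two defensive patterns in this commit can be sketched together: validate at the boundary and return a `Result`, then document the resulting invariant where `partial_cmp` is unwrapped. `Accumulator`, `UpdateError`, and `sort_descending` are hypothetical names; the real `update` lives on `CalibrationState` and feeds a grid.

```rust
#[derive(Debug, PartialEq)]
pub enum UpdateError {
    /// Names the offending input.
    NonFinite(&'static str),
}

/// Hypothetical accumulator standing in for the calibration grid.
#[derive(Debug, Default)]
pub struct Accumulator {
    pub total_weight: f64,
    pub n_points: usize,
}

impl Accumulator {
    /// Rejects NaN/Inf inputs instead of silently accumulating them.
    pub fn update(&mut self, x: f64, y: f64, weight: f64) -> Result<(), UpdateError> {
        for &(name, value) in [("x", x), ("y", y), ("weight", weight)].iter() {
            if !value.is_finite() {
                return Err(UpdateError::NonFinite(name));
            }
        }
        self.total_weight += weight;
        self.n_points += 1;
        Ok(())
    }
}

/// Sorts scores from highest to lowest. The expect message records the
/// invariant: non-finite scores must be filtered out before this point.
pub fn sort_descending(scores: &mut [f64]) {
    scores.sort_by(|a, b| {
        b.partial_cmp(a)
            .expect("NaN score reached the sort phase; filter non-finite scores earlier")
    });
}
```

A caller can then choose the policy per context, for example propagating the error with `?` in a batch tool or logging it and skipping the point in an interactive one.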

- I4: Document count_falling_steps convention (apex counts as 1)
- I5: get_frag_range returns Result instead of panicking on non-DIA files
- I6: n_scored in calibration JSON now reflects Phase 1 library size, not calibrant count (which was redundant with n_calibrants)
- I7: rt_range_seconds in calibration JSON now uses raw file RT range from the cycle mapping, not the calibrant subset
- I8: Fold progress uses atomic eprintln! lines instead of split eprint!/eprintln! to avoid interleaving with progress bars
- I9: CalibrantCandidate Ord uses f32::total_cmp for sound ordering
- I10: Parquet writer returns Result from add/flush/close instead of panicking on write errors; propagated as TimsSeekError::Io
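Item I9 can be illustrated with a small sketch of an `Ord` implementation built on `f32::total_cmp` (stable since Rust 1.62), used in a bounded top-k heap. `Candidate`, its `id` tie-breaker, and `top_k` are hypothetical; the real `CalibrantCandidate` carries more fields.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

#[derive(Debug, Clone, Copy)]
pub struct Candidate {
    pub score: f32,
    pub id: u32,
}

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp is a total order over all f32 values, NaN included,
        // so the Ord contract holds for every input.
        self.score
            .total_cmp(&other.score)
            .then_with(|| self.id.cmp(&other.id))
    }
}

impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Candidate {}

/// Keeps the `k` highest-scoring candidates using a min-heap of size k,
/// returned from best to worst.
pub fn top_k<I: IntoIterator<Item = Candidate>>(items: I, k: usize) -> Vec<Candidate> {
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for c in items {
        heap.push(Reverse(c));
        if heap.len() > k {
            heap.pop(); // evicts the current minimum
        }
    }
    let mut out: Vec<Candidate> = heap.into_iter().map(|Reverse(c)| c).collect();
    out.sort_by(|a, b| b.cmp(a));
    out
}
```

Note that a total order makes the heap well-behaved with NaN scores but does not make them meaningful, so filtering non-finite scores upstream is still worthwhile.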

- Extract save_calibration_dialog helper to deduplicate Paused/Done save button logic in the viewer
- Fix redundant sqrt() call in apex_finding compute_pass_1
- Mark Parquet columns as non-nullable (no data is ever null, saves validity bitmap overhead)

prescore() only needs ApexLocation — the metadata (including a cloned digest String per peptide) was built and immediately discarded. Call build_extraction directly instead of build_broad_extraction to avoid the unnecessary allocation in the Phase 1 hot loop.

- Make rayon an optional dependency (default on). Serial mode is now --no-default-features instead of the dead serial_scoring feature.
- Gate parallel code behind #[cfg(feature = "rayon")], serial behind #[cfg(not(feature = "rayon"))].
- Add PrescoreTimings struct with extraction/scoring breakdown and n_passed_filter/n_scored counters. Aggregated via fold/reduce in both parallel and serial paths.
- Rename thread-summed timing fields to *_thread_ms to distinguish from wall-clock phase timings.
- Remove dead serial_scoring feature, dead rayon re-export in cv.rs.
- Remove per-item filter timing (measured clock overhead, not work).
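The gating and aggregation pattern can be sketched as below. The struct name and counter names follow the commit message; the `Duration` field types, the `merge` helper, and the `aggregate` function are assumptions for illustration. Built without the `rayon` feature, only the serial path compiles; the parallel path assumes the crate declares an optional `rayon` dependency.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct PrescoreTimings {
    pub extraction: Duration,
    pub scoring: Duration,
    pub n_passed_filter: usize,
    pub n_scored: usize,
}

impl PrescoreTimings {
    /// Combines two partial results; associative, so it is safe to use
    /// as the operator of either a serial fold or a parallel reduce.
    pub fn merge(self, other: Self) -> Self {
        PrescoreTimings {
            extraction: self.extraction + other.extraction,
            scoring: self.scoring + other.scoring,
            n_passed_filter: self.n_passed_filter + other.n_passed_filter,
            n_scored: self.n_scored + other.n_scored,
        }
    }
}

/// Serial path: compiled when the `rayon` feature is off.
#[cfg(not(feature = "rayon"))]
pub fn aggregate(items: &[PrescoreTimings]) -> PrescoreTimings {
    items
        .iter()
        .copied()
        .fold(PrescoreTimings::default(), PrescoreTimings::merge)
}

/// Parallel path: compiled only when the `rayon` feature is on.
#[cfg(feature = "rayon")]
pub fn aggregate(items: &[PrescoreTimings]) -> PrescoreTimings {
    use rayon::prelude::*;
    items
        .par_iter()
        .copied()
        .reduce(PrescoreTimings::default, PrescoreTimings::merge)
}
```

Because durations summed this way add up time spent on every worker thread, the total can exceed wall-clock time in the parallel build, which is the distinction the `*_thread_ms` rename makes explicit.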

Replace manual Instant::now()/elapsed()/println!() boilerplate with two primitives: timed! for hot-path Duration accumulation (pipeline.rs) and TimedStep for progressive CLI output with auto dot-padding and tracing spans (main.rs, processing.rs, cv.rs). Duration display uses {:?} for automatic unit selection everywhere.
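A rough sketch of the two primitives, under assumptions: the `timed!` macro below accumulates elapsed time into a caller-owned `Duration`, and `format_step` is a hypothetical helper showing dot-padding plus the `{:?}` duration format. The real `TimedStep` also opens tracing spans, which is omitted here.

```rust
use std::time::Duration;

/// Evaluates `$body`, adds its elapsed time to `$acc`, and yields its value.
macro_rules! timed {
    ($acc:expr, $body:expr) => {{
        let start = ::std::time::Instant::now();
        let out = $body;
        $acc += start.elapsed();
        out
    }};
}

/// Pads `label` with dots to `width` columns, then appends the duration.
/// `{:?}` on a Duration picks a suitable unit (ns, µs, ms, s) automatically.
pub fn format_step(label: &str, elapsed: Duration, width: usize) -> String {
    format!("{:.<width$} {:?}", label, elapsed, width = width)
}

/// Example hot loop: times each square and returns (sum, accumulated time).
pub fn sum_of_squares_timed(values: &[u64]) -> (u64, Duration) {
    let mut spent = Duration::default();
    let mut sum = 0u64;
    for &v in values {
        sum += timed!(spent, v * v);
    }
    (sum, spent)
}
```

For example, `format_step("Load", Duration::from_millis(1500), 10)` yields `Load...... 1.5s`. Accumulating into a `Duration` rather than printing keeps the hot path free of I/O.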

jspaezp added 30 commits March 28, 2026 13:30

feat: pyaccess

1d37d1a

feat: add ScoringFields, ScoredCandidate, CompetedCandidate, FinalRes…

6eb710a

…ult types

refactor: typed pipeline stages (ScoredCandidate → CompetedCandidate …

c4aaea7

…→ FinalResult)

feat: manual Parquet writer with exhaustive destructure

0ff4ca7

feat: PipelineReport with full timing (phases 4-6) and q-value counts

6d06381

chore: remove timsseek_rescore and timsseek_rts_receiver packages

93550bb

feat(calibrt): make Node pub, add Grid::reset() and grid_cells()

840d466

feat(timsseek): add library_rt_seconds to CalibrantCandidate, add Cal…

66d3aca

…ibrantHeap::iter()

refactor(timsseek): rename ApexFinder -> TraceScorer, add compute_tra…

ba8cf65

…ces + cached profiles

feat(timsseek): add suggest_apex and score_at, rewrite find_apex/find…

739df17

…_apex_location as wrappers

refactor(timsseek): extract shared build_extraction() function

39cbcf3

refactor(viewer): migrate to TraceScorer (rename only, same behavior)

f2ca852

refactor: remove ApexFinder alias, all code uses TraceScorer

3affdc2

feat(viewer): add save/load calibration as JSON v1

b000f40

jspaezp added 29 commits April 10, 2026 12:15

feat(viewer): show separate Library RT (blue) and Calibrated RT (oran…

fb22e3e

…ge) reference lines

feat(viewer): persist calibration across restarts via app state snapshot

392c135

feat(viewer): parallelize calibration scoring via Rayon chunked par_iter

cf5b456

feat(viewer): display ridge tolerance envelope on calibration heatmap

4b0cf62

fix(viewer): ridge envelope as dashed boundary lines instead of fille…

e4a5ce0

…d polygon

feat(cli): print calibration summary after Phase 2 (ridge width, tole…

050559c

…rances)

feat(calibrt): add LibraryRT<T> and ObservedRTSeconds<T> newtypes

f795c71

refactor(calibrt): rename Point fields x/y to library/observed

ac47f11

feat(calibrt): type API boundary with LibraryRT/ObservedRTSeconds new…

971700e

…types

feat(calibrt): ping-pong weight buffers with 3x3 gaussian blur for ri…

5079c23

…dge measurement

feat(timsseek): use RT newtypes in CalibrationResult API

fedcb60

feat: propagate RT newtypes to CalibrantCandidate and all callers

c04a218

chore: update deps

446793a

chore: update uv lock

8c7a1f9

chore: update uv lock

a822c50

chore: bump versions

8954ee0

chore: minor review suggestions

1e908b9

jspaezp merged commit 23ea962 into main Apr 13, 2026
2 checks passed

jspaezp merged 64 commits into main from feat/calibration_stage_scoring

jspaezp commented Apr 10, 2026