Feat/calibration stage scoring #54

Merged: jspaezp merged 64 commits into main from feat/calibration_stage_scoring on Apr 13, 2026.

jspaezp (Collaborator) commented on Apr 10, 2026:

... and A LOT more

jspaezp added 30 commits March 28, 2026 13:30
Delete `build_narrow_context`, `process_query`, and `process_batch` — the old
uncalibrated pipeline that is no longer called. The CLI exclusively uses
`prescore_batch` + `score_calibrated_batch`. Also remove unused `info` import
and clean up the stale module-level doc comment.
Delete ToleranceHierarchy struct and inline its prescore/secondary fields
directly onto Scorer as broad_tolerance and secondary_tolerance. The
tertiary_tolerance() method is inlined at its single call site.
Update all re-exports, consumer function signatures, and construction sites.
Ports SearchResultBuilder → ScoredCandidateBuilder in results.rs using
the new ScoringFields field names and [f32; N] arrays for per-ion data.
Updates finalize_results in pipeline.rs to use the new builder and return
Result<ScoredCandidate, DataProcessingError>. Callers (Task 5) still
expect IonSearchResults; those mismatches are intentional intermediate state.
Replace all IonSearchResults references with ScoredCandidate in the
accumulator, score_calibrated_extraction, score_calibrated_batch, and
FullQueryResult. The pipeline now produces ScoredCandidate throughout;
CLI callers in processing.rs will be updated in Task 6.
…erive

Delete search_results.rs entirely and remove all references to IonSearchResults
and SearchResultBuilder across the codebase. All consumers now use the new
ScoredCandidate/CompetedCandidate/FinalResult types. Drop the parquet_derive
dependency since the derive macro is no longer used.
- ScoreTraces → ElutionTraces; fields main_score → apex_profile, ms2_cosine_ref_sim → cosine_trace
- ScoringContext → Extraction; field query_values → chromatograms
- ApexLocation/ApexScore::raising_cycles → rising_cycles
- PeptideMetadata::ref_rt_seconds → query_rt_seconds
- build_candidate_context → build_broad_extraction in pipeline.rs
- build_calibrated_context → build_calibrated_extraction in pipeline.rs
- main_loop → execute_pipeline in processing.rs
- process_speclib → run_pipeline in processing.rs / main.rs
- Move SCRIBE_FLOOR from scribe.rs to apex_features.rs; delete scribe.rs
- Update iter_scores() string literals: "main_score" → "apex_profile", "ms2_cosine_ref_sim" → "cosine_trace"
- Update viewer files (computed_state.rs, plot_renderer.rs) for all renames
…emove secondary_tolerance

Add get_spectral_tolerance() and get_isotope_tolerance() to CalibrationResult, thread them
as explicit parameters through execute_secondary_query and its callers, and delete the
secondary_tolerance field from Scorer.
- Add `lookback: usize` parameter to `find_optimal_path` (calibrt), removing the hardcoded `let lookback = 30`
- Add `lookback: usize` parameter to `calibrate_with_ranges`; update the `calibrate` convenience wrapper to pass 30
- Pass `config.dp_lookback` from `CalibrationConfig` through `calibrate_from_phase1` in processing.rs
- Pass lookback=10 in the identity fallback in rt_calibration.rs (2-point curve needs no large window)
- Remove unused `lowess_frac` field from `CalibrationConfig`
- Update calibrt integration tests to supply the new lookback argument
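The sketch below shows what a lookback-bounded path DP can look like. It assumes calibrant points are `(library_rt, observed_rt, weight)` tuples sorted by library RT, and that a path is scored by summing node weights. It is not calibrt's grid-based `find_optimal_path`. It only illustrates how `lookback` caps the inner loop, and why a 2-point identity curve needs only a tiny window.

```rust
/// Finds the highest-weight chain of points that strictly increases in both x and y.
///
/// `points` are `(library_rt, observed_rt, weight)` tuples sorted by library RT
/// (an assumption of this sketch). A point may only link back to predecessors at
/// most `lookback` positions earlier, so the work is O(n * lookback).
pub fn find_optimal_path(points: &[(f64, f64, f64)], lookback: usize) -> Vec<usize> {
    let n = points.len();
    if n == 0 {
        return Vec::new();
    }
    // best[i]: score of the best chain ending at i; prev[i]: its predecessor.
    let mut best = vec![0.0_f64; n];
    let mut prev: Vec<Option<usize>> = vec![None; n];
    for i in 0..n {
        let (xi, yi, wi) = points[i];
        best[i] = wi;
        // Only the last `lookback` points are candidate predecessors.
        for j in i.saturating_sub(lookback)..i {
            let (xj, yj, _) = points[j];
            if xj < xi && yj < yi && best[j] + wi > best[i] {
                best[i] = best[j] + wi;
                prev[i] = Some(j);
            }
        }
    }
    // Backtrack from the highest-scoring endpoint.
    let mut cur = (0..n).fold(0, |b, i| if best[i] > best[b] { i } else { b });
    let mut path = vec![cur];
    while let Some(p) = prev[cur] {
        path.push(p);
        cur = p;
    }
    path.reverse();
    path
}
```

A small window is cheap, but it can cut chains across sparse stretches. With `lookback = 1`, a point whose immediate predecessor lies above it cannot reach further back, so it starts a new chain.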
…ult to ViewerResult

Update viewer-facing API to use clearer names: FullQueryResult → ViewerResult with
fields traces/longitudinal_apex_profile/chromatograms/scored, and the method
process_query_full → score_for_viewer on Scorer.
Separate CLI output into three streams for clearer user experience:
- stdout: brief phase milestones + 1% FDR result summary
- log file: full tracing record at configured level (default: {output_dir}/timsseek.log)
- stderr: progress bars (TTY only) + warn/error tracing messages

Replace -v/-q verbosity flags with --log-path and --log-level options.
Progress bars auto-hide when stderr is not a terminal.
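The three-stream split reduces to a small routing rule. This sketch uses a hypothetical `Level` enum and a hypothetical `route` function rather than the tracing subscriber setup the CLI actually configures. The only real library API here is `std::io::IsTerminal`, stable since Rust 1.70. Stdout milestones are ordinary prints and are not routed.

```rust
use std::io::IsTerminal;

/// Severity levels, ordered from least to most severe (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Which non-stdout sinks an event reaches.
#[derive(Debug, PartialEq, Eq)]
pub struct Sinks {
    pub log_file: bool,
    pub stderr: bool,
}

/// The log file records everything at or above `--log-level`;
/// stderr only shows warnings and errors.
pub fn route(event: Level, log_level: Level) -> Sinks {
    Sinks {
        log_file: event >= log_level,
        stderr: event >= Level::Warn,
    }
}

/// Progress bars are drawn only when stderr is an interactive terminal.
pub fn progress_bars_enabled() -> bool {
    std::io::stderr().is_terminal()
}
```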
… debug-level for expected failures

Replace StorageProvider probe in sniff_cached_index with a direct
Path::exists() check for local paths, eliminating spurious ERROR/INFO
lines on the normal "not a cached index" code path. Cloud paths keep
the probe but log at debug! instead of error!. Also demote cache-miss
error! to debug! in try_load_from_cache, simplify the load_index_auto
detection log, and drop noisy info! calls in timscentroid storage.rs
to trace!/debug!.
…g, max-qvalue

- Upgrade forust-ml 0.4.8 → 0.5.0
- GBM early stopping at 100 rounds (PrecomputedFeatures row-major matrix)
- Load speclib/calib_lib once in main(), pass by &Speclib reference
- Add RunReport (per-invocation) with speclib/index loading timings
- Add --max-qvalue CLI arg (default 0.5, filters Parquet output)
- Fix duplicate NUM_MS2_IONS/NUM_MS1_IONS constants (import from mod.rs)
- Re-export FileReport, RunReport from scoring mod
Change find_optimal_path signature to accept &mut Vec<f64> and
&mut Vec<Option<usize>> buffers instead of allocating them internally.
Caller in calibrate_with_ranges creates temporary buffers for now;
Task 3 will move ownership to CalibrationState for reuse across calls.
Introduces CalibrationState, a reusable struct that owns Grid, DP buffers,
and path indices to enable incremental calibration without repeated allocation.
Makes CalibrationCurve::new pub(crate) so fit() can construct it within the crate.
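Here is a sketch of the buffer-ownership idea. It assumes a hypothetical `CalibrationState` that holds only the DP buffers; the real struct also owns the `Grid`. `Vec::clear` keeps the existing allocation, so repeated calibrations allocate only when a run needs more cells than any earlier one.

```rust
/// Owns DP scratch buffers so repeated calibrations reuse one allocation.
#[derive(Debug, Default)]
pub struct CalibrationState {
    scores: Vec<f64>,
    prev: Vec<Option<usize>>,
    path: Vec<usize>,
}

impl CalibrationState {
    pub fn new() -> Self {
        Self::default()
    }

    /// Sizes the buffers for `n_cells` cells. `clear()` keeps capacity, so a
    /// run no larger than a previous one does not allocate.
    pub fn prepare(&mut self, n_cells: usize) {
        self.scores.clear();
        self.scores.resize(n_cells, 0.0);
        self.prev.clear();
        self.prev.resize(n_cells, None);
        self.path.clear();
    }

    /// Mutable views to hand to a path-finding function that takes `&mut Vec` buffers.
    pub fn buffers(&mut self) -> (&mut Vec<f64>, &mut Vec<Option<usize>>, &mut Vec<usize>) {
        (&mut self.scores, &mut self.prev, &mut self.path)
    }

    pub fn scores_capacity(&self) -> usize {
        self.scores.capacity()
    }
}
```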
Move the core extraction logic from Scorer::build_broad_extraction into
a standalone generic function that works with any KeyLike type (IonAnnot
or String), enabling reuse by both the CLI scorer and the viewer.
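The sketch below shows the generic-key pattern. It uses a hypothetical `KeyLike` bound (`Clone + Eq + Hash + Debug`) and a toy `build_extraction` that sums each trace over a cycle window. The real function extracts chromatograms from the index. The point here is only that one generic signature serves both a structured ion key and a plain `String`.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::Hash;
use std::ops::Range;

/// Hypothetical stand-in for the crate's key bound; any type meeting it works.
pub trait KeyLike: Clone + Eq + Hash + Debug {}
impl<T: Clone + Eq + Hash + Debug> KeyLike for T {}

/// Hypothetical structured ion key, as the CLI scorer might use.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct IonAnnot {
    pub series: char,
    pub ordinal: u8,
    pub charge: u8,
}

/// Toy extraction: sums each key's trace over `cycles`, clamped to the trace length.
pub fn build_extraction<K: KeyLike>(traces: &[(K, Vec<f32>)], cycles: Range<usize>) -> HashMap<K, f32> {
    traces
        .iter()
        .map(|(key, trace)| {
            let end = cycles.end.min(trace.len());
            let start = cycles.start.min(end);
            let total: f32 = trace[start..end].iter().sum();
            (key.clone(), total)
        })
        .collect()
}
```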
Introduce the calibration state machine and background scoring thread
infrastructure for the viewer's live RT calibration panel:

- Create calibration.rs with ViewerCalibrationState (Idle/Running/Paused/Done),
  background thread using AtomicU8 control + thread::park for pause,
  bounded sync_channel for CalibrationMessage snapshots, and CalibrantHeap
  accumulation with periodic snapshot sends.
- Add Pane::Calibration variant to the dock layout.
- Wrap ElutionGroupData in Arc<> for sharing with the background thread.
- Add calibration.poll() to the update loop with request_repaint.
- Make build_extraction() pub (was pub(crate)) so the viewer can call it.
- Add calibrt dependency to the viewer.
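This is a std-only sketch of the pause/resume mechanism described above. The `u8` state encoding, the channel capacity of 4, and the names `Background` and `worker` are assumptions, and a counter stands in for scoring. The order inside `resume` matters: it stores the new state before calling `unpark()`. Because `park` may also return spuriously, the worker re-reads the atomic in a loop.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

// Hypothetical control-state encoding stored in the AtomicU8.
pub const RUNNING: u8 = 0;
pub const PAUSED: u8 = 1;
pub const STOPPED: u8 = 2;

#[derive(Debug, PartialEq)]
pub enum CalibrationMessage {
    Snapshot { scored: usize },
    Done { scored: usize },
}

pub struct Background {
    control: Arc<AtomicU8>,
    pub rx: Receiver<CalibrationMessage>,
    handle: JoinHandle<usize>,
}

impl Background {
    pub fn spawn(n_queries: usize, snapshot_every: usize, initial: u8) -> Self {
        let control = Arc::new(AtomicU8::new(initial));
        // Bounded channel: the UI only needs recent snapshots.
        let (tx, rx) = sync_channel(4);
        let ctl = Arc::clone(&control);
        let handle = thread::spawn(move || worker(n_queries, snapshot_every, &ctl, &tx));
        Self { control, rx, handle }
    }

    pub fn pause(&self) {
        self.control.store(PAUSED, Ordering::SeqCst);
    }

    /// Store the new state before unparking so the woken thread sees it.
    pub fn resume(&self) {
        self.control.store(RUNNING, Ordering::SeqCst);
        self.handle.thread().unpark();
    }

    pub fn stop(&self) {
        self.control.store(STOPPED, Ordering::SeqCst);
        self.handle.thread().unpark();
    }

    /// Drops the receiver first, then waits for the worker to finish.
    pub fn join(self) -> usize {
        drop(self.rx);
        self.handle.join().expect("calibration thread panicked")
    }
}

fn worker(n: usize, every: usize, ctl: &AtomicU8, tx: &SyncSender<CalibrationMessage>) -> usize {
    let mut scored = 0;
    for _ in 0..n {
        // `park` can wake spuriously, so re-check the state until it is RUNNING.
        loop {
            match ctl.load(Ordering::SeqCst) {
                RUNNING => break,
                PAUSED => thread::park(),
                _ => return scored,
            }
        }
        scored += 1; // stand-in for scoring one query
        if every > 0 && scored % every == 0 {
            // try_send never blocks: a full channel just drops this snapshot.
            let _ = tx.try_send(CalibrationMessage::Snapshot { scored });
        }
    }
    let _ = tx.try_send(CalibrationMessage::Done { scored });
    scored
}
```

A resume cannot be lost. If `unpark()` runs before the worker reaches `park()`, the thread's wake-up token makes the next `park()` return immediately. The worker also never waits on a UI that has stopped reading, because it only uses `try_send`.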
Add render_panel() to ViewerCalibrationState with:
- Context-sensitive control buttons (Start/Pause/Resume/Stop/Reset)
- Progress counters (scored / total, calibrants / capacity)
- egui_plot scatter showing suppressed, retained, and path grid cells
- Fitted calibration curve as a cyan line sampled at 200 points
- WRMSE display and RT tolerance suggestion with Apply button

Also adds CalibrationCurve::points() public accessor to calibrt.
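WRMSE is presumably the weighted root-mean-square residual of the calibrants around the fitted curve. The sketch assumes the standard definition `sqrt(Σ wᵢ·rᵢ² / Σ wᵢ)`, where each residual `rᵢ` is the observed RT minus the curve's prediction at the library RT. The function name and signature are illustrative.

```rust
/// Weighted RMSE of residuals (observed RT minus the curve's prediction).
/// Returns `None` when the total weight is zero.
pub fn wrmse(residuals: &[f64], weights: &[f64]) -> Option<f64> {
    assert_eq!(residuals.len(), weights.len(), "need one weight per residual");
    let (weighted_sq, total_w) = residuals
        .iter()
        .zip(weights)
        .fold((0.0_f64, 0.0_f64), |(num, den), (r, w)| (num + w * r * r, den + w));
    (total_w > 0.0).then(|| (weighted_sq / total_w).sqrt())
}
```

Heavily weighted calibrants dominate the result, so a few strong, well-fitted points can offset many weak outliers.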
jspaezp added 29 commits April 10, 2026 12:15
- CalibrationState::measure_ridge_width() expands from path cells into
  adjacent cells above a weight threshold fraction
- Returns RidgeMeasurement { x, half_width, total_weight } per column
- Viewer: weighted-average half-width as global tolerance (heavy columns
  count more), replaces non-suppressed cell residual approach
- CalibrationResult stores ridge widths and interpolates at query RT
- get_tolerance() now returns position-dependent RT tolerance (wider at
  edges, tighter in middle) based on the calibration grid ridge width
- CLI's calibrate_from_phase1 switched to CalibrationState API for
  ridge measurement after curve fitting
- get_tolerance receives library RT (ridge widths indexed by library RT)
- Fallback to uniform tolerance when no ridge data available
- RIDGE_WIDTH_MULTIPLIER (1.0) and MIN_RT_TOLERANCE_MINUTES (0.5) are tunable constants
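This sketch shows a position-dependent tolerance lookup. It assumes ridge half-widths are stored as `(library_rt, half_width)` pairs in minutes, sorted by library RT. Between columns they are linearly interpolated; outside the measured range they are clamped to the end values. The two constant values come from the commit message. The interpolation scheme and this `get_tolerance` signature are assumptions.

```rust
pub const RIDGE_WIDTH_MULTIPLIER: f64 = 1.0;
pub const MIN_RT_TOLERANCE_MINUTES: f64 = 0.5;

/// `ridge`: `(library_rt, half_width_minutes)` pairs sorted by library RT.
/// Falls back to `uniform` when no ridge data is available.
pub fn get_tolerance(ridge: &[(f64, f64)], library_rt: f64, uniform: f64) -> f64 {
    let half_width = match ridge {
        [] => return uniform,
        [(_, w)] => *w,
        [first, .., last] => {
            if library_rt <= first.0 {
                first.1
            } else if library_rt >= last.0 {
                last.1
            } else {
                // Index of the first column strictly to the right of `library_rt`.
                let i = ridge.partition_point(|&(x, _)| x <= library_rt);
                let (x0, w0) = ridge[i - 1];
                let (x1, w1) = ridge[i];
                w0 + (library_rt - x0) / (x1 - x0) * (w1 - w0)
            }
        }
    };
    // Scale, then enforce the floor so the window never collapses.
    (half_width * RIDGE_WIDTH_MULTIPLIER).max(MIN_RT_TOLERANCE_MINUTES)
}
```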
Move Array2D and ArrayElement from timsquery into a new rust/array2d
workspace crate with its own Array2DError type. timsquery re-exports
from array2d and bridges errors via From<Array2DError> for
DataProcessingError. calibrt gains array2d as a direct dependency.
Makes the semantic meaning explicit: library RT on the x-axis and
observed RT on the y-axis. Also renames RidgeMeasurement.x to .library.
…ation viz

Pathfinding: use geometric mean (sqrt weights) for edge weight formula to
reduce bias against sparse regions. After DP pass, greedily extend path
backward/forward through monotonic non-suppressed cells to cover full RT range.

Viewer: draw extrapolated prediction as dashed red line beyond curve bounds,
clamped to grid y-range to prevent runaway extrapolation.
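Here is a sketch of the greedy extension step. It makes four assumptions:
- the grid is stored as `weights[col][row]` with a parallel `suppressed` mask;
- the DP path is a list of `(col, row)` cells sorted by column;
- "monotonic" means rows never decrease as the column increases;
- the geometric-mean edge weighting is not shown.

For each column beyond either end of the path, the sketch takes the heaviest usable cell that keeps the path monotonic, and skips columns that have none.

```rust
/// Greedily extends `path` (cells as `(col, row)`, sorted by column) forward and
/// backward through non-suppressed, positive-weight cells, keeping rows monotonic.
pub fn extend_path(weights: &[Vec<f64>], suppressed: &[Vec<bool>], path: &mut Vec<(usize, usize)>) {
    let (Some(&(first_col, first_row)), Some(&(last_col, last_row))) = (path.first(), path.last()) else {
        return;
    };
    // Forward: later columns may only use rows at or above the current end row.
    let mut row = last_row;
    for col in last_col + 1..weights.len() {
        if let Some(r) = best_cell(&weights[col], &suppressed[col], row, usize::MAX) {
            path.push((col, r));
            row = r;
        }
    }
    // Backward: earlier columns may only use rows at or below the current start row.
    let mut prefix = Vec::new();
    let mut row = first_row;
    for col in (0..first_col).rev() {
        if let Some(r) = best_cell(&weights[col], &suppressed[col], 0, row + 1) {
            prefix.push((col, r));
            row = r;
        }
    }
    prefix.reverse();
    prefix.extend(path.iter().copied());
    *path = prefix;
}

/// Highest-weight usable cell with `lo <= row < hi` (hi clamped to the column length).
fn best_cell(weights: &[f64], suppressed: &[bool], lo: usize, hi: usize) -> Option<usize> {
    (lo..hi.min(weights.len()))
        .filter(|&r| !suppressed[r] && weights[r] > 0.0)
        .max_by(|&a, &b| weights[a].total_cmp(&weights[b]))
}
```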
Print training fold progress and scoring time to stderr so users
can tell the rescore phase isn't frozen. Uses Duration debug format
for automatic unit selection.

Also fix: score() was calling assign_scores() redundantly after fit()
already assigned them — removed the double scoring pass.
…tric

Add column_weight to RidgeMeasurement (total weight in full column)
alongside ridge_weight (weight inside ridge bounds). Compute in-ridge
ratio in RidgeWidthSummary and display as percentage in both CLI
calibration summary and viewer tolerance panel.
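This sketch covers the per-column ridge measurement and two summary numbers: the in-ridge ratio and a weight-averaged half-width. It assumes the threshold is a fraction of the path cell's own weight, and that half-widths are counted in grid cells rather than minutes. The field names follow the commit messages. `measure_column`, `in_ridge_ratio`, and `weighted_half_width` are hypothetical helpers.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct RidgeMeasurement {
    pub library: f64,       // library RT of this column
    pub half_width: usize,  // in grid cells (no unit conversion in this sketch)
    pub ridge_weight: f64,  // weight inside the ridge bounds
    pub column_weight: f64, // weight in the whole column
}

/// Expands from the path cell while neighbours stay at or above `threshold_frac`
/// times the path cell's weight.
pub fn measure_column(library: f64, column: &[f64], path_row: usize, threshold_frac: f64) -> RidgeMeasurement {
    let threshold = column[path_row] * threshold_frac;
    let mut lo = path_row;
    while lo > 0 && column[lo - 1] >= threshold {
        lo -= 1;
    }
    let mut hi = path_row;
    while hi + 1 < column.len() && column[hi + 1] >= threshold {
        hi += 1;
    }
    RidgeMeasurement {
        library,
        half_width: (path_row - lo).max(hi - path_row),
        ridge_weight: column[lo..=hi].iter().sum(),
        column_weight: column.iter().sum(),
    }
}

/// Fraction of all column weight that falls inside the ridges.
pub fn in_ridge_ratio(ms: &[RidgeMeasurement]) -> Option<f64> {
    let total: f64 = ms.iter().map(|m| m.column_weight).sum();
    let ridge: f64 = ms.iter().map(|m| m.ridge_weight).sum();
    (total > 0.0).then(|| ridge / total)
}

/// Half-width averaged with column weight, so heavy columns count more.
pub fn weighted_half_width(ms: &[RidgeMeasurement]) -> Option<f64> {
    let total: f64 = ms.iter().map(|m| m.column_weight).sum();
    let acc: f64 = ms.iter().map(|m| m.column_weight * m.half_width as f64).sum();
    (total > 0.0).then(|| acc / total)
}
```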
- Viewer deadlock: drop channel receiver before joining background
  thread in reset() and Drop, use try_send for Done messages so the
  background thread never blocks on a full channel.

- RT fields: replace ambiguous query_rt_seconds/delta_rt/sq_delta_rt/
  recalibrated_rt with explicit library_rt, calibrated_rt_seconds,
  obs_rt_seconds, and calibrated_sq_delta_rt computed from calibrated
  residuals. ML features updated accordingly.

- Batch error handling: replace .unwrap() on run_pipeline with proper
  error propagation; abort batch on I/O errors (disk full, permissions)
  instead of retrying every remaining file.
- Grid::reset() preserves bin center geometry instead of zeroing nodes;
  add Grid::reconfigure() for changing dimensions.

- CalibrationState::update returns Result — rejects NaN/Inf coordinates
  and weights at the grid boundary instead of silently accumulating them.
  Propagated as error in CLI, logged as warning in viewer.

- Replace bare .unwrap() on partial_cmp with descriptive .expect()
  messages documenting the invariant that NaN scores should not reach
  the sort phase.
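This sketch validates input at the grid boundary, using a hypothetical `Grid` that derives bin geometry from its axis ranges:
- `update` rejects NaN and Inf before touching any cell.
- `reset` zeroes only the accumulated weights, so bin centers are the same afterwards.
- Finite points outside the grid return `Ok(false)` rather than an error; this is an assumption.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub struct NonFiniteInput {
    pub x: f64,
    pub y: f64,
    pub weight: f64,
}

impl fmt::Display for NonFiniteInput {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "non-finite calibration point: x={}, y={}, weight={}", self.x, self.y, self.weight)
    }
}
impl std::error::Error for NonFiniteInput {}

pub struct Grid {
    x_range: (f64, f64),
    y_range: (f64, f64),
    nx: usize,
    ny: usize,
    weights: Vec<f64>, // index = ix * ny + iy
}

impl Grid {
    pub fn new(nx: usize, ny: usize, x_range: (f64, f64), y_range: (f64, f64)) -> Self {
        Self { x_range, y_range, nx, ny, weights: vec![0.0; nx * ny] }
    }

    /// Errors on NaN/Inf before accumulating; `Ok(false)` for finite points off the grid.
    pub fn update(&mut self, x: f64, y: f64, weight: f64) -> Result<bool, NonFiniteInput> {
        if !(x.is_finite() && y.is_finite() && weight.is_finite()) {
            return Err(NonFiniteInput { x, y, weight });
        }
        let (Some(ix), Some(iy)) = (bin(x, self.x_range, self.nx), bin(y, self.y_range, self.ny)) else {
            return Ok(false);
        };
        self.weights[ix * self.ny + iy] += weight;
        Ok(true)
    }

    /// Clears accumulated weight; ranges (and therefore bin centers) are untouched.
    pub fn reset(&mut self) {
        self.weights.fill(0.0);
    }

    pub fn x_center(&self, ix: usize) -> f64 {
        let (lo, hi) = self.x_range;
        lo + (ix as f64 + 0.5) * (hi - lo) / self.nx as f64
    }

    pub fn total_weight(&self) -> f64 {
        self.weights.iter().sum()
    }
}

/// Maps `v` in `[lo, hi)` to one of `n` equal-width bins.
fn bin(v: f64, (lo, hi): (f64, f64), n: usize) -> Option<usize> {
    if n == 0 || !(lo..hi).contains(&v) {
        return None;
    }
    let i = ((v - lo) / (hi - lo) * n as f64) as usize;
    Some(i.min(n - 1))
}
```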
- I4: Document count_falling_steps convention (apex counts as 1)
- I5: get_frag_range returns Result instead of panicking on non-DIA files
- I6: n_scored in calibration JSON now reflects Phase 1 library size,
  not calibrant count (which was redundant with n_calibrants)
- I7: rt_range_seconds in calibration JSON now uses raw file RT range
  from the cycle mapping, not the calibrant subset
- I8: Fold progress uses atomic eprintln! lines instead of split
  eprint!/eprintln! to avoid interleaving with progress bars
- I9: CalibrantCandidate Ord uses f32::total_cmp for sound ordering
- I10: Parquet writer returns Result from add/flush/close instead of
  panicking on write errors; propagated as TimsSeekError::Io
- Extract save_calibration_dialog helper to deduplicate Paused/Done
  save button logic in the viewer
- Fix redundant sqrt() call in apex_finding compute_pass_1
- Mark Parquet columns as non-nullable (no data is ever null, saves
  validity bitmap overhead)
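The sketch below pairs a total-order `Ord` with a bounded top-K calibrant heap. `f32::total_cmp` orders every value, including NaN, and a positive NaN sorts above `+∞`. As a result, the heap invariant cannot be broken and no comparison panics. Fields other than `score`, and the `CalibrantHeap` methods shown, are assumptions.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

#[derive(Debug, Clone, Copy)]
pub struct CalibrantCandidate {
    pub score: f32,
    pub library_rt: f32,
    pub observed_rt: f32,
}

// Equality and ordering both look only at `score` via `total_cmp`, so they agree.
impl Ord for CalibrantCandidate {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score.total_cmp(&other.score)
    }
}
impl PartialOrd for CalibrantCandidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for CalibrantCandidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for CalibrantCandidate {}

/// Keeps the `capacity` best candidates. `Reverse` turns the max-heap into a
/// min-heap, so the weakest retained candidate sits at the top.
pub struct CalibrantHeap {
    capacity: usize,
    heap: BinaryHeap<Reverse<CalibrantCandidate>>,
}

impl CalibrantHeap {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, heap: BinaryHeap::with_capacity(capacity) }
    }

    pub fn push(&mut self, candidate: CalibrantCandidate) {
        if self.heap.len() < self.capacity {
            self.heap.push(Reverse(candidate));
        } else if self.heap.peek().is_some_and(|Reverse(worst)| candidate > *worst) {
            self.heap.pop();
            self.heap.push(Reverse(candidate));
        }
    }

    pub fn len(&self) -> usize {
        self.heap.len()
    }

    pub fn is_empty(&self) -> bool {
        self.heap.is_empty()
    }

    /// Consumes the heap, returning candidates from best to worst.
    pub fn into_sorted_desc(self) -> Vec<CalibrantCandidate> {
        let mut out: Vec<_> = self.heap.into_iter().map(|Reverse(c)| c).collect();
        out.sort_by(|a, b| b.cmp(a));
        out
    }
}
```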
prescore() only needs ApexLocation — the metadata (including a cloned
digest String per peptide) was built and immediately discarded. Call
build_extraction directly instead of build_broad_extraction to avoid
the unnecessary allocation in the Phase 1 hot loop.
- Make rayon an optional dependency (default on). Serial mode is now
  --no-default-features instead of the dead serial_scoring feature.
- Gate parallel code behind #[cfg(feature = "rayon")], serial behind
  #[cfg(not(feature = "rayon"))].
- Add PrescoreTimings struct with extraction/scoring breakdown and
  n_passed_filter/n_scored counters. Aggregated via fold/reduce in
  both parallel and serial paths.
- Rename thread-summed timing fields to *_thread_ms to distinguish
  from wall-clock phase timings.
- Remove dead serial_scoring feature, dead rayon re-export in cv.rs.
- Remove per-item filter timing (measured clock overhead, not work).
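This sketch shows the feature gating and the fold/reduce aggregation of per-thread timings. The `PrescoreTimings` field names mirror the commit message. `prescore_one` is a hypothetical stand-in that returns fixed numbers instead of measuring anything. The Cargo side is shown only as a comment. With the feature off, only the serial branch is compiled.

```rust
// Cargo.toml (assumed):
//   [dependencies]
//   rayon = { version = "1", optional = true }
//   [features]
//   default = ["rayon"]
// Serial build: cargo build --no-default-features

/// Thread-summed timings and counters for the prescore phase.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct PrescoreTimings {
    pub extraction_thread_ms: f64,
    pub scoring_thread_ms: f64,
    pub n_passed_filter: usize,
    pub n_scored: usize,
}

impl PrescoreTimings {
    /// Combines two partial results; used for both the fold and the reduce step.
    pub fn merge(self, other: Self) -> Self {
        Self {
            extraction_thread_ms: self.extraction_thread_ms + other.extraction_thread_ms,
            scoring_thread_ms: self.scoring_thread_ms + other.scoring_thread_ms,
            n_passed_filter: self.n_passed_filter + other.n_passed_filter,
            n_scored: self.n_scored + other.n_scored,
        }
    }
}

/// Hypothetical per-query work with fixed stand-in timings: every query is
/// scored, and queries with a positive value pass the filter.
fn prescore_one(query: &f64) -> PrescoreTimings {
    PrescoreTimings {
        extraction_thread_ms: 1.0,
        scoring_thread_ms: 2.0,
        n_passed_filter: (*query > 0.0) as usize,
        n_scored: 1,
    }
}

#[cfg(feature = "rayon")]
pub fn prescore_all(queries: &[f64]) -> PrescoreTimings {
    use rayon::prelude::*;
    // Each rayon split folds into its own accumulator; the partials are then reduced.
    queries
        .par_iter()
        .fold(PrescoreTimings::default, |acc, q| acc.merge(prescore_one(q)))
        .reduce(PrescoreTimings::default, PrescoreTimings::merge)
}

#[cfg(not(feature = "rayon"))]
pub fn prescore_all(queries: &[f64]) -> PrescoreTimings {
    queries
        .iter()
        .fold(PrescoreTimings::default(), |acc, q| acc.merge(prescore_one(q)))
}
```

Because `merge` is associative and `Default` is its identity, both paths produce the same totals. The summed times count CPU time across threads, which is why they need a name distinct from wall-clock phase timings.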
Replace manual Instant::now()/elapsed()/println!() boilerplate with two
primitives: timed! for hot-path Duration accumulation (pipeline.rs) and
TimedStep for progressive CLI output with auto dot-padding and tracing
spans (main.rs, processing.rs, cv.rs). Duration display uses {:?} for
automatic unit selection everywhere.
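This sketch shows the two primitives, with two assumptions: `timed!` adds elapsed wall time to a `Duration` accumulator, and `TimedStep` prints a dot-padded label followed by the duration. The tracing-span part is left out so the sketch stays std-only. The padding relies on a `.` fill character in the format spec (`{:.<width$}`). `{:?}` on a `Duration` picks the unit automatically, for example `1.5s` or `250µs`.

```rust
use std::time::{Duration, Instant};

/// Runs `$body`, adds its wall-clock time to the `Duration` place `$acc`,
/// and evaluates to the body's value.
macro_rules! timed {
    ($acc:expr, $body:expr) => {{
        let start = ::std::time::Instant::now();
        let out = $body;
        $acc += start.elapsed();
        out
    }};
}

/// Hypothetical progressive CLI step that prints `label..... <duration>` when finished.
pub struct TimedStep {
    label: String,
    width: usize,
    start: Instant,
}

impl TimedStep {
    pub fn begin(label: &str, width: usize) -> Self {
        Self { label: label.to_string(), width, start: Instant::now() }
    }

    /// Pads the label with dots to `width` columns, then appends the Debug-formatted duration.
    pub fn line_for(&self, elapsed: Duration) -> String {
        format!("{:.<width$} {:?}", self.label, elapsed, width = self.width)
    }

    /// Prints the finished line and returns the elapsed time.
    pub fn finish(self) -> Duration {
        let elapsed = self.start.elapsed();
        println!("{}", self.line_for(elapsed));
        elapsed
    }
}
```

For example, `TimedStep::begin("Loading speclib", 20)` finishing after 1.5 s prints `Loading speclib..... 1.5s`.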
jspaezp merged commit 23ea962 into main on Apr 13, 2026. 2 checks passed.