DRAFT: finalise metrics by hspitzer · Pull Request #40 · simonmfr/cellseg-benchmark

hspitzer · 2026-01-26T09:25:57Z

Draft PR for finalising metrics.
Basic structure for every metric: a compute_ and a plot_ function in cellseg_benchmark/metrics, plus a script in scripts/metrics. The script creates a CSV file with one row per sample and method, and plots figures using the plot_ function. All scripts can be run on a specific set of methods or on all methods at once, and have an overwrite flag. Finally, there is a notebook for every metric to interactively compute metrics and plots.
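
The per-metric script layout described above could look roughly like the following sketch. The metric itself, the placeholder sample and method lists, and the flag names (`--methods`, `--overwrite`, `--out`) are assumptions for illustration, not the actual scripts in scripts/metrics. The plot_ step is left out so the example has no plotting dependency.

```python
import argparse
import csv
from pathlib import Path

# Placeholder names; the real sample and method lists come from the benchmark data.
ALL_SAMPLES = ["sample_1", "sample_2"]
ALL_METHODS = ["method_a", "method_b"]


def compute_example_metric(sample, method):
    """Hypothetical stand-in for a compute_ function; returns a dummy value."""
    return float(len(sample) + len(method))


def write_metric_csv(samples, methods, out_csv, overwrite=False):
    """Write one row per (sample, method). Skip if the CSV exists and overwrite is off."""
    out_csv = Path(out_csv)
    if out_csv.exists() and not overwrite:
        return False
    with out_csv.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sample", "method", "value"])
        for sample in samples:
            for method in methods:
                writer.writerow([sample, method, compute_example_metric(sample, method)])
    return True


def main(argv=None):
    parser = argparse.ArgumentParser(description="Compute one metric per sample and method.")
    parser.add_argument("--methods", nargs="*", default=None,
                        help="Subset of methods; omit to run all methods.")
    parser.add_argument("--overwrite", action="store_true",
                        help="Recompute even if the CSV already exists.")
    parser.add_argument("--out", type=Path, default=Path("example_metric.csv"))
    args = parser.parse_args(argv)
    methods = args.methods or ALL_METHODS
    return write_metric_csv(ALL_SAMPLES, methods, args.out, args.overwrite)
```

Calling `main(["--methods", "method_a"])` would compute only that method, and a second call without `--overwrite` leaves the existing CSV untouched.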

Metrics to implement

Clean up:

remove unused notebooks.
remove unused functions in cellseg_benchmark/metrics

hspitzer · 2026-01-26T09:37:45Z

@simonmfr please check the implementation of the clustering scores. I compute Leiden clustering per sample. "leiden" is not in adata.obs (using adata_integrated).

simonmfr · 2026-02-19T19:50:25Z

The reason why leiden clustering labels don't appear in adata_integrated is the following: clustering is currently computed during the cell type annotation step, per sample, with resolution=10 (here). The resulting cluster labels are written to adata_obs_annotated.csv, but they are not transferred into the downstream master_sdata object (see here).

I think a clustering resolution of 10 is not ideal for metric evaluation, so I would suggest keeping the current behavior unchanged. This means running the clustering step twice, with different resolutions:

leiden with res=10 for cell type annotation
leiden with res=1 (or similar) for clustering metric evaluation

hspitzer · 2026-03-06T11:43:47Z

I just added scripts for general stats extraction & plotting (all present in adata obs). I noticed that intensities (DAPI / PolyT) were not computed for the following methods (aging cohort): vpt_2D_DAPI_nuclei, vpt_3D_DAPI_PolyT_nuclei, vpt_3D_DAPI_nuclei, Cellpose_1_Merlin, vpt_2D_DAPI_PolyT, vpt_2D_DAPI_PolyT_nuclei, vpt_3D_DAPI_PolyT

simonmfr · 2026-03-06T17:12:17Z

> I just added scripts for general stats extraction & plotting (all present in adata obs). I noticed that intensities (DAPI / PolyT) were not computed for the following methods (aging cohort): vpt_2D_DAPI_nuclei, vpt_3D_DAPI_PolyT_nuclei, vpt_3D_DAPI_nuclei, Cellpose_1_Merlin, vpt_2D_DAPI_PolyT, vpt_2D_DAPI_PolyT_nuclei, vpt_3D_DAPI_PolyT

Thanks for highlighting. These are all our VPT/Merlin methods, which contain a separate DAPI/PolyT quantification. We need to format these correctly; I've added it to our to-dos as #43.

hspitzer · 2026-03-31T14:24:21Z

Re cleanup for this PR: I propose to remove the following notebooks:
metrics_cell_type_based, metrics_cell_types, metrics_general_old, metrics_morphology
All of these are re-implemented by my functions.

There might also be old metrics functions. Once the notebooks are removed, we can find them by checking which code in the repo still references them. Anything that isn't used at all can go, imo.
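
One way to do that reference check is sketched below, using only the standard library. The helper names are hypothetical and the check is purely name-based: it scans `.py` files only (not notebooks), misses dynamic lookups such as `getattr`, and treats any identically named identifier elsewhere as a use, so its output is a list of candidates to review rather than a safe delete list.

```python
import ast
from pathlib import Path


def defined_functions(py_file):
    """Names of top-level functions defined in one Python file."""
    tree = ast.parse(Path(py_file).read_text())
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


def referenced_names(py_file):
    """Identifiers, attribute names, and imported names used in one Python file."""
    tree = ast.parse(Path(py_file).read_text())
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            names.update(alias.name for alias in node.names)
    return names


def unused_functions(metrics_dir, repo_root):
    """Functions defined under metrics_dir that no .py file under repo_root references."""
    defined = set()
    for py_file in Path(metrics_dir).rglob("*.py"):
        defined |= defined_functions(py_file)
    used = set()
    for py_file in Path(repo_root).rglob("*.py"):
        used |= referenced_names(py_file)
    return sorted(defined - used)
```

A call such as `unused_functions("cellseg_benchmark/metrics", ".")` from the repository root would then list the candidates.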

simonmfr · 2026-03-31T16:35:41Z

I double checked this and removed the notebooks you mentioned: metrics_cell_type_based, metrics_cell_types, metrics_general_old, metrics_morphology
Also removed the outdated metric function files: metrics/wasserstein.py and metrics/specificity.py
Other outdated metric functions were defined in the removed notebooks

hspitzer · 2026-04-02T09:15:45Z

hspitzer · 2026-04-02T12:56:46Z

I updated the MECR and marker F1 computation to include samples. The plots now aggregate per gene, then per cell type, showing samples as points, ordered by mean sample value. I realised that for MECR lower is better, so we'll have to multiply it by -1 for the summary table.
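
The aggregation order described above can be read as follows; this is a sketch of one possible interpretation, not the code in the PR. The record layout, function names, and the `higher_is_better` switch are assumptions. Values are averaged over genes within each cell type, then over cell types, giving one point per sample; methods are then ordered by their mean over samples, with the sign flipped for lower-is-better metrics such as MECR.

```python
from collections import defaultdict
from statistics import mean


def sample_points(records):
    """records: iterable of (method, sample, cell_type, gene, value).

    Returns {method: {sample: value}}, averaging over genes first,
    then over cell types.
    """
    per_cell_type = defaultdict(list)
    for method, sample, cell_type, _gene, value in records:
        per_cell_type[(method, sample, cell_type)].append(value)

    per_sample = defaultdict(list)
    for (method, sample, _cell_type), values in per_cell_type.items():
        per_sample[(method, sample)].append(mean(values))

    points = defaultdict(dict)
    for (method, sample), values in per_sample.items():
        points[method][sample] = mean(values)
    return dict(points)


def order_methods(points, higher_is_better=True):
    """Order methods from best to worst by their mean over samples."""
    sign = 1.0 if higher_is_better else -1.0
    return sorted(points, key=lambda m: sign * mean(points[m].values()), reverse=True)
```

For MECR, `order_methods(points, higher_is_better=False)` puts the method with the lowest mean first.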

simonmfr · 2026-04-09T03:15:31Z

I added the marker F1 sample mean to the table.
Regarding metric normalization:
- The total score is now computed as the mean of absolute metrics (all are in the range [0, 1] without any normalization, except for the silhouette score in [-1, 1], which is rescaled to [0, 1]). This way, the total score stays comparable between datasets.
- For plotting, individual metrics (but not the total score) are min-max normalized within the dataset, as before, to highlight differences between methods; otherwise differences appear very small.
- scib uses the same approach: metrics are in the range [0, 1] based on the theoretical min/max, which is then used for computing scores. On the plots, metrics are most likely min-max normalized (all range from [0, 1]), but I haven't found that in the plotting code yet.
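
A minimal sketch of the two normalization steps above, with made-up values for two placeholder methods. The function names and the table layout are assumptions, and returning 0.0 for a metric that is constant across methods is an arbitrary choice for the sketch.

```python
from statistics import mean


def rescale_silhouette(score):
    """Map a silhouette score from its theoretical range [-1, 1] to [0, 1]."""
    return (score + 1.0) / 2.0


def total_score(absolute_metrics):
    """Mean of metrics that are already on an absolute [0, 1] scale."""
    return mean(absolute_metrics.values())


def min_max_per_metric(table):
    """Scale each metric to [0, 1] across methods (for plotting only).

    table: {method: {metric: value}}; returns the same shape.
    """
    metrics = list(next(iter(table.values())))
    scaled = {method: {} for method in table}
    for metric in metrics:
        values = [table[method][metric] for method in table]
        lo, span = min(values), max(values) - min(values)
        for method in table:
            # A metric that is constant across methods is mapped to 0.0.
            scaled[method][metric] = (table[method][metric] - lo) / span if span else 0.0
    return scaled


# Made-up example values, not benchmark results.
raw = {
    "method_a": {"marker_f1": 0.80, "silhouette": 0.20},
    "method_b": {"marker_f1": 0.70, "silhouette": -0.20},
}
absolute = {
    method: {**vals, "silhouette": rescale_silhouette(vals["silhouette"])}
    for method, vals in raw.items()
}
totals = {method: total_score(vals) for method, vals in absolute.items()}
plot_values = min_max_per_metric(absolute)
```

The totals are computed from `absolute`, so they do not change when methods are added or removed, whereas `plot_values` depend on which methods are in the dataset.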

implement scripts for cell type and clustering metrics

565f900

Hannah Spitzer and others added 3 commits January 26, 2026 12:32

implement MECR score

5284cc1

upplotecr scores

058a6c5

add marker f1 score and plotting functions

c8098b3

Hannah Spitzer and others added 3 commits February 20, 2026 10:34

add negative marker purity scripts

bbcbb0c

functions to compute positive markers

acf0765

fix positive marker computation, add general stats

d708f07

implement assigned transcript quantification

79e3ffe

simonmfr force-pushed the metrics branch from cc769c0 to 79e3ffe on March 6, 2026 16:54

hspitzer commented Mar 12, 2026

Comment thread cellseg_benchmark/metrics/utils.py Outdated

simonmfr and others added 2 commits March 13, 2026 15:33

fix assigned_transcript plotting

276c497

memory and time metric (untested)

b9682b6

hspitzer and others added 4 commits March 31, 2026 16:25

Merge branch 'main' into metrics

96a518b

finalise positive marker dict

c6b8680

Merge branch 'metrics' of github.com:simonmfr/st-bsb into metrics

dd0a79c

remove outdated metric code

3a10029

add **kwargs to each metric_func

5b3b4f6

Hannah Spitzer added 2 commits April 2, 2026 15:00

update marker-based metrics to include sample

aaa23ba

Merge branch 'metrics' of github.com:simonmfr/st-bsb into metrics

27c17c9

simonmfr and others added 2 commits April 9, 2026 05:22

revise final table

46b8904

calculate marker based metrics

3981ea4

simonmfr requested changes Apr 21, 2026

simonmfr reviewed Apr 21, 2026

Comment thread scripts/metrics/compute_f1_samples.py

simonmfr requested changes Apr 21, 2026

jonas2612 and others added 27 commits April 22, 2026 08:31

review of Simon

677129b

adapt for compute_ficture_f1_parallel

6312766

Added fail-safe if Ficture output not available

5409965

Proseg bugfix

5aa9d8f

Set cell_id for Proseg boundaries directly

f04c5c9

convert p_id for CP1_Merlin to str

85f637f

Remove from … import …

b6b6d59

fix: import mpl_toolkits.axes_grid1 subpackage explicitly in ovrl.py

13f3b43

Agent-Logs-Url: https://github.com/simonmfr/cellseg-benchmark/sessions/99ed110c-3933-4f0e-9681-cc0245530075 Co-authored-by: simonmfr <70199914+simonmfr@users.noreply.github.com>

fix: Remove xarray dependency and unused imports

ab41f6b

Fix: specify output dtype

b5d37de

allow for module import for cellseg_benchmark

d922e9e

Fix: imports

7923bac

Prevention of runtime error

5ff203e

Update __init__.py

a1970c2

Fix: switch to only relative imports and remove imports with from

c2a856f

Fix: Convert index to str for Cellpose_1_Merlin

48d0704

Copilot recommendations

9c7d456

Merge pull request #48 from simonmfr/debug_ficture_f1

f2d5586

Debugged Ficture F1 score; set up the __init__ file; relative imports of cellseg_benchmark functions

Fix import of compute_f1

9c20d51

Bugfixes

169f8d1

fix typo

8367c46

fix typo

eaef8f8

allow tsv-parser to skip bad lines

643f43d

add new segmentation methods to method_colors

34a25c4

change plotting function

893d0ec

Update jobname parser

be3d621

Update argparse

37988b2

4 participants