Benchmarks

genomewalker edited this page Feb 27, 2026 · 9 revisions

All benchmarks use the KapK ancient sediment dataset: a Holocene lake sediment metagenome from the Kap København Formation, Greenland, containing reads from both ancient (damaged, ~86 bp modal length) and modern (undamaged, ~177 bp) DNA populations. The 119 time-point samples were co-assembled with MEGAHIT (~2.8 Gbp total, ~280,000 contigs ≥ 2,500 bp). Bin quality was assessed with CheckM2 v1.0.2 [1].

MIMAG thresholds [2]: HQ = completeness ≥ 90%, contamination < 5%; MQ = completeness ≥ 50%, contamination < 10%.
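As a minimal sketch, the two thresholds above can be applied to a bin's CheckM2 completeness and contamination values as follows. The function name `mimag_tier` and the `"LQ"` label for bins that meet neither tier are illustrative assumptions, not part of AMBER or CheckM2. Only the two criteria stated on this page are checked.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Classify a bin from completeness and contamination (both in percent).

    Uses only the two criteria stated on this page. "LQ" is a hypothetical
    label for bins that meet neither the HQ nor the MQ thresholds.
    """
    if completeness >= 90.0 and contamination < 5.0:
        return "HQ"
    if completeness >= 50.0 and contamination < 10.0:
        return "MQ"
    return "LQ"


# bin_47 from the AMBER per-bin table: 90.1% complete, 2.62% contaminated
print(mimag_tier(90.1, 2.62))
```

Note that the contamination bounds are strict: a bin at exactly 5% contamination falls into MQ, not HQ.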


Tool comparison

All tools ran on the same assembly and BAM. The AMBER result is the amber resolve consensus of 3 independent runs (3 encoder restarts × 25 Leiden seeds each). COMEBin and SemiBin2 results are summarized across independent replicate runs as mean and range.

| Tool | HQ bins | MQ bins | Reps | Notes |
|---|---|---|---|---|
| SemiBin2 [3] | 5.7 (range 5–6) | 17 | 3 | Self-supervised contrastive, no aDNA features |
| COMEBin [4] | 7.6 (range 6–9) | 16–17 | 5 | Standard self-supervised InfoNCE, no aDNA features |
| AMBER (this work) | 11 | 20 | 3 → resolve | Damage-aware InfoNCE + quality-guided Leiden + co-binning consensus |

AMBER recovers 11 HQ bins: 2 more than COMEBin's best replicate (9) and 5 more than SemiBin2's best replicate (6). SemiBin2 is stable across runs but limited to 5–6 HQ bins; COMEBin varies between 6 and 9 HQ bins across 5 runs. AMBER is fully stable (11/11/11).
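The mean and range figures can be recomputed from per-replicate HQ counts, as in the sketch below. The AMBER counts (11/11/11) are stated on this page. The SemiBin2 counts are not listed individually; `[5, 6, 6]` is inferred here as the only set of three integer counts consistent with a mean of 5.7 and a range of 5–6, and the run order is unknown. COMEBin is omitted because its five per-replicate counts cannot be determined from the summary alone. The helper name `summarize` is hypothetical.

```python
# Per-replicate HQ bin counts.
hq_counts = {
    "AMBER": [11, 11, 11],   # stated on this page
    "SemiBin2": [5, 6, 6],   # inferred from mean 5.7, range 5-6, 3 reps (order unknown)
}


def summarize(counts):
    """Return (mean rounded to 1 decimal, min, max) for a list of counts."""
    avg = sum(counts) / len(counts)
    return round(avg, 1), min(counts), max(counts)


for tool, counts in hq_counts.items():
    avg, lo, hi = summarize(counts)
    print(f"{tool}: mean {avg}, range {lo}-{hi}")
```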


AMBER — per-bin quality (resolve consensus)

All 11 HQ bins. Genome sizes in Mbp.

| Bin | Completeness | Contamination | Size (Mbp) |
|---|---|---|---|
| bin_28 | 100.0% | 4.45% | 3.63 |
| bin_77 | 99.9% | 0.69% | 0.84 |
| bin_53 | 99.9% | 1.86% | 3.30 |
| bin_42 | 98.2% | 0.05% | 1.12 |
| bin_24 | 97.7% | 0.23% | 3.19 |
| bin_26 | 96.5% | 2.09% | 0.77 |
| bin_38 | 95.0% | 2.65% | 3.34 |
| bin_35 | 94.6% | 0.90% | 2.45 |
| bin_4 | 92.3% | 1.90% | 1.91 |
| bin_8 | 90.6% | 2.29% | 2.76 |
| bin_47 | 90.1% | 2.62% | 3.58 |

SemiBin2 — per-bin quality (best replicate, 6 HQ)

Best of 3 SemiBin2 self-supervised runs. All 6 HQ bins shown; SemiBin2 produces 17 MQ bins per run.

| Bin | Completeness | Contamination | Size (Mbp) | Tier |
|---|---|---|---|---|
| SemiBin_13 | 100.0% | 4.36% | 3.69 | HQ |
| SemiBin_146 | 99.9% | 0.64% | 0.83 | HQ |
| SemiBin_37 | 99.1% | 2.07% | 3.44 | HQ |
| SemiBin_11 | 99.1% | 0.37% | 3.35 | HQ |
| SemiBin_21 | 97.7% | 0.04% | 1.10 | HQ |
| SemiBin_12 | 97.5% | 2.15% | 0.76 | HQ |

COMEBin — per-bin quality (best replicate, 9 HQ)

Best of 5 COMEBin runs (rep5). All 9 HQ bins shown; COMEBin recovers 6–9 HQ and 16–17 MQ bins per run.

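| Bin | Completeness | Contamination | Size (Mbp) | Tier |
|---|---|---|---|---|
| 27966 | 100.0% | 2.15% | 3.54 | HQ |
| 25795 | 100.0% | 4.31% | 3.56 | HQ |
| 28106 | 99.8% | 0.66% | 0.96 | HQ |
| 25284 | 99.2% | 0.41% | 3.33 | HQ |
| 27724 | 97.5% | 0.05% | 1.14 | HQ |
| 27394 | 93.3% | 1.67% | 3.53 | HQ |
| 26333 | 93.2% | 2.10% | 1.65 | HQ |
| 25942 | 91.9% | 0.94% | 2.72 | HQ |
| 23134 | 91.3% | 2.04% | 2.41 | HQ |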

Comparison highlights

Top-quality bins (≥ 97% completeness, < 5% contamination): AMBER and COMEBin recover 5 such bins; SemiBin2 recovers 6. Genome sizes are consistent (~0.8–3.7 Mbp), suggesting a shared core of high-completeness genomes that any reasonable binner recovers. The differences emerge below 97% completeness.
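The counts in this paragraph can be checked directly against the per-bin tables. The sketch below copies the (completeness, contamination) pairs of the listed HQ bins and counts those at or above 97% completeness; the names `hq_bins` and `count_top_quality` are illustrative only.

```python
# (completeness %, contamination %) of the HQ bins listed in the per-bin tables
hq_bins = {
    "AMBER": [
        (100.0, 4.45), (99.9, 0.69), (99.9, 1.86), (98.2, 0.05),
        (97.7, 0.23), (96.5, 2.09), (95.0, 2.65), (94.6, 0.90),
        (92.3, 1.90), (90.6, 2.29), (90.1, 2.62),
    ],
    "SemiBin2 (best rep)": [
        (100.0, 4.36), (99.9, 0.64), (99.1, 2.07),
        (99.1, 0.37), (97.7, 0.04), (97.5, 2.15),
    ],
    "COMEBin (best rep)": [
        (100.0, 2.15), (100.0, 4.31), (99.8, 0.66), (99.2, 0.41),
        (97.5, 0.05), (93.3, 1.67), (93.2, 2.10), (91.9, 0.94),
        (91.3, 2.04),
    ],
}


def count_top_quality(bins, min_completeness=97.0, max_contamination=5.0):
    """Count bins with completeness >= min and contamination < max."""
    return sum(
        1 for completeness, contamination in bins
        if completeness >= min_completeness and contamination < max_contamination
    )


top_counts = {tool: count_top_quality(bins) for tool, bins in hq_bins.items()}
print(top_counts)
```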

Bins unique to AMBER: bin_8 (90.6% / 2.29%, 2.76 Mbp) and bin_47 (90.1% / 2.62%, 3.58 Mbp) cross the HQ threshold in AMBER but are not recovered as HQ by either COMEBin or SemiBin2. These are likely genomes where aDNA damage features provide signal to separate them from neighbouring bins.

Contamination control: AMBER's median contamination across 11 HQ bins is 1.90%; COMEBin best rep median is 1.67%; SemiBin2 best rep median is 1.35% — all comparable. AMBER's worst HQ bin is bin_28 at 4.45%.
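The medians quoted here follow from the contamination columns of the per-bin tables. A short check using Python's standard `statistics.median` is sketched below; the variable names are illustrative. With six bins, the SemiBin2 median is the average of the two middle values, (0.64 + 2.07) / 2 = 1.355, which the text reports as 1.35%.

```python
from statistics import median

# Contamination (%) of the HQ bins, copied from the per-bin tables
contamination = {
    "AMBER": [4.45, 0.69, 1.86, 0.05, 0.23, 2.09, 2.65, 0.90, 1.90, 2.29, 2.62],
    "COMEBin (best rep)": [2.15, 4.31, 0.66, 0.41, 0.05, 1.67, 2.10, 0.94, 2.04],
    "SemiBin2 (best rep)": [4.36, 0.64, 2.07, 0.37, 0.04, 2.15],
}

medians = {tool: median(values) for tool, values in contamination.items()}
worst_amber = max(contamination["AMBER"])  # highest contamination among AMBER HQ bins

for tool, value in medians.items():
    print(f"{tool}: median contamination {value:.3f}%")
print(f"AMBER worst HQ bin: {worst_amber:.2f}%")
```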

Reproducibility: AMBER produces exactly 11 HQ bins in every replicate run. SemiBin2 is stable (5–6 HQ) but recovers fewer genomes. COMEBin varies between 6 and 9 HQ across 5 runs.


References

  1. Chklovski A et al. (2023) CheckM2: a rapid, scalable, and accurate tool for assessing microbial genome quality using machine learning. Nature Methods 20:1203–1212.
  2. Bowers RM et al. (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG). Nature Biotechnology 35:725–731.
  3. Pan S, Zhao X-M, Coelho LP (2023) SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing. Bioinformatics 39(Suppl 1):i21–i29.
  4. Wang Z et al. (2024) COMEBin allows effective binning of metagenomic contigs using coverage multi-view encoder. Nature Communications 15:1119.
