Benchmarks

All benchmarks use the KapK ancient sediment dataset: a Holocene lake sediment metagenome from the Kap København Formation, Greenland, containing reads from both ancient (damaged, ~86 bp modal length) and modern (undamaged, ~177 bp) DNA populations. The 119 time-point samples were co-assembled with MEGAHIT (~2.8 Gbp total, ~280,000 contigs ≥ 2,500 bp). Bin quality was assessed with CheckM2 v1.0.2 [1].

MIMAG thresholds [2]: HQ = completeness ≥ 90%, contamination < 5%; MQ = completeness ≥ 50%, contamination < 10%.
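The tier rule can be written as a small function. This is a minimal sketch that applies only the two criteria listed on this page (completeness and contamination, both in percent); the function name and the "LQ" label for bins below the MQ cut-off are assumptions made for illustration.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Classify a bin as 'HQ', 'MQ', or 'LQ' from CheckM2-style percentages.

    Only the completeness/contamination thresholds stated above are applied.
    'LQ' is a hypothetical label for anything that fails the MQ thresholds.
    """
    if completeness >= 90.0 and contamination < 5.0:
        return "HQ"
    if completeness >= 50.0 and contamination < 10.0:
        return "MQ"
    return "LQ"


# bin_47 from the AMBER table sits just above the HQ completeness cut-off.
print(mimag_tier(90.1, 2.62))
```

Note that the contamination bounds are strict: a bin with exactly 5% contamination falls into MQ under this rule.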

Tool comparison

All tools ran on the same assembly and BAM file. The AMBER result is the amber resolve consensus of 3 independent runs (3 encoder restarts × 25 Leiden seeds each). COMEBin and SemiBin2 results are summarised across independent replicate runs; mean and range are shown.
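To illustrate what a consensus over several binning runs can look like, here is a generic co-association sketch: contigs are grouped when they share a bin in at least a minimum number of runs. This is not the amber resolve implementation, which is not described on this page; the function name, the input format, and the `min_support` parameter are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def consensus_bins(runs, min_support):
    """Group contigs that are co-binned in at least `min_support` runs.

    runs: list of dicts mapping contig id -> bin label (one dict per run).
    Returns a list of sets of contig ids; ungrouped contigs are dropped.
    """
    # Count, for every contig pair, how many runs place both in the same bin.
    # Pairwise counting is quadratic per bin, so this is only a toy approach.
    pair_votes = Counter()
    for assignment in runs:
        members = {}
        for contig, label in assignment.items():
            members.setdefault(label, []).append(contig)
        for contigs in members.values():
            for a, b in combinations(sorted(contigs), 2):
                pair_votes[(a, b)] += 1

    # Union-find: merge contigs joined by a sufficiently supported pair.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), votes in pair_votes.items():
        if votes >= min_support:
            parent[find(a)] = find(b)

    groups = {}
    for contig in list(parent):
        groups.setdefault(find(contig), set()).add(contig)
    return [group for group in groups.values() if len(group) > 1]


# Three toy runs over four hypothetical contigs.
runs = [
    {"c1": "A", "c2": "A", "c3": "B", "c4": "B"},
    {"c1": "X", "c2": "X", "c3": "X", "c4": "Y"},
    {"c1": "P", "c2": "P", "c3": "Q", "c4": "Q"},
]
result = sorted(sorted(group) for group in consensus_bins(runs, min_support=2))
print(result)
```

In the toy input, c1 and c2 are co-binned in all three runs and c3 and c4 in two, so a majority threshold of 2 yields two consensus groups, while the single run that merges c1, c2, and c3 is outvoted.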

Tool	HQ bins	MQ bins	Reps	Notes
SemiBin2 [3]	5.7 (range 5–6)	17	3	Self-supervised contrastive, no aDNA features
COMEBin [4]	7.6 (range 6–9)	16–17	5	Standard self-supervised InfoNCE, no aDNA features
AMBER (this work)	11	20	3 → resolve	Damage-aware InfoNCE + quality-guided Leiden + co-binning consensus

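AMBER recovers 2 more HQ bins than the best COMEBin replicate (11 vs 9) and 5 more than the best SemiBin2 replicate (11 vs 6). SemiBin2 is stable across runs but limited to 5–6 HQ bins; COMEBin varies between 6 and 9 HQ bins across its 5 runs. AMBER is fully stable, with 11 HQ bins in each of its 3 runs (11/11/11).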
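The gain in HQ bins follows directly from the counts in the table. The short sketch below recomputes it against each competitor's best and worst replicate; the variable names are illustrative, and the counts are taken from the ranges reported above.

```python
amber_hq = 11

# HQ counts from the ranges in the tool comparison table.
best_rep_hq = {"SemiBin2": 6, "COMEBin": 9}
worst_rep_hq = {"SemiBin2": 5, "COMEBin": 6}

gain_vs_best = {tool: amber_hq - n for tool, n in best_rep_hq.items()}
gain_vs_worst = {tool: amber_hq - n for tool, n in worst_rep_hq.items()}

print(gain_vs_best)   # {'SemiBin2': 5, 'COMEBin': 2}
print(gain_vs_worst)  # {'SemiBin2': 6, 'COMEBin': 5}
```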

AMBER — per-bin quality (resolve consensus)

All 11 HQ bins. Genome sizes in Mbp.

Bin	Completeness	Contamination	Size (Mbp)
bin_28	100.0%	4.45%	3.63
bin_77	99.9%	0.69%	0.84
bin_53	99.9%	1.86%	3.30
bin_42	98.2%	0.05%	1.12
bin_24	97.7%	0.23%	3.19
bin_26	96.5%	2.09%	0.77
bin_38	95.0%	2.65%	3.34
bin_35	94.6%	0.90%	2.45
bin_4	92.3%	1.90%	1.91
bin_8	90.6%	2.29%	2.76
bin_47	90.1%	2.62%	3.58

SemiBin2 — per-bin quality (best replicate, 6 HQ)

Best of 3 SemiBin2 self-supervised runs. All 6 HQ bins shown; SemiBin2 produces 17 MQ bins per run.

Bin	Completeness	Contamination	Size (Mbp)	Tier
SemiBin_13	100.0%	4.36%	3.69	HQ
SemiBin_146	99.9%	0.64%	0.83	HQ
SemiBin_37	99.1%	2.07%	3.44	HQ
SemiBin_11	99.1%	0.37%	3.35	HQ
SemiBin_21	97.7%	0.04%	1.10	HQ
SemiBin_12	97.5%	2.15%	0.76	HQ

COMEBin — per-bin quality (best replicate, 9 HQ)

Best of 5 COMEBin runs (rep5). All 9 HQ bins shown; COMEBin recovers 6–9 HQ and 16–17 MQ bins per run.

Bin	Completeness	Contamination	Size (Mbp)	Tier
27966	100.0%	2.15%	3.54	HQ
25795	100.0%	4.31%	3.56	HQ
28106	99.8%	0.66%	0.96	HQ
25284	99.2%	0.41%	3.33	HQ
27724	97.5%	0.05%	1.14	HQ
27394	93.3%	1.67%	3.53	HQ
26333	93.2%	2.10%	1.65	HQ
25942	91.9%	0.94%	2.72	HQ
23134	91.3%	2.04%	2.41	HQ

Comparison highlights

Top-quality bins (≥ 97% completeness, < 5% contamination): AMBER and COMEBin recover 5 such bins; SemiBin2 recovers 6. Genome sizes are consistent (~0.8–3.7 Mbp), suggesting a shared core of high-completeness genomes that any reasonable binner recovers. The differences emerge below 97% completeness.
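The top-quality counts can be reproduced from the per-bin tables above. The sketch below copies the completeness and contamination values of each tool's HQ bins and counts those meeting the ≥ 97% / < 5% criterion; the helper name and data layout are illustrative choices.

```python
# (completeness %, contamination %) for each HQ bin, copied from the tables above.
hq_bins = {
    "AMBER": [
        (100.0, 4.45), (99.9, 0.69), (99.9, 1.86), (98.2, 0.05),
        (97.7, 0.23), (96.5, 2.09), (95.0, 2.65), (94.6, 0.90),
        (92.3, 1.90), (90.6, 2.29), (90.1, 2.62),
    ],
    "SemiBin2": [
        (100.0, 4.36), (99.9, 0.64), (99.1, 2.07),
        (99.1, 0.37), (97.7, 0.04), (97.5, 2.15),
    ],
    "COMEBin": [
        (100.0, 2.15), (100.0, 4.31), (99.8, 0.66), (99.2, 0.41),
        (97.5, 0.05), (93.3, 1.67), (93.2, 2.10), (91.9, 0.94),
        (91.3, 2.04),
    ],
}


def count_top_quality(bins, min_completeness=97.0, max_contamination=5.0):
    """Count bins with completeness >= threshold and contamination < threshold."""
    return sum(
        1 for completeness, contamination in bins
        if completeness >= min_completeness and contamination < max_contamination
    )


top_counts = {tool: count_top_quality(bins) for tool, bins in hq_bins.items()}
below_97 = {tool: len(bins) - top_counts[tool] for tool, bins in hq_bins.items()}

print(top_counts)  # {'AMBER': 5, 'SemiBin2': 6, 'COMEBin': 5}
print(below_97)    # {'AMBER': 6, 'SemiBin2': 0, 'COMEBin': 4}
```

The second dictionary shows where the tools diverge: the HQ bins below 97% completeness number 6 for AMBER, 4 for COMEBin, and none for SemiBin2.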

Bins unique to AMBER: bin_8 (90.6% / 2.29%, 2.76 Mbp) and bin_47 (90.1% / 2.62%, 3.58 Mbp) cross the HQ threshold in AMBER but are not recovered as HQ by either COMEBin or SemiBin2. These are likely genomes where aDNA damage features provide signal to separate them from neighbouring bins.

Contamination control: AMBER's median contamination across its 11 HQ bins is 1.90%, compared with 1.67% for the best COMEBin replicate (9 HQ bins) and 1.35% for the best SemiBin2 replicate (6 HQ bins); the three are comparable. AMBER's most contaminated HQ bin is bin_28 at 4.45%.
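The medians can be recomputed from the contamination columns of the per-bin tables. This is a small sketch using Python's standard `statistics.median`; the dictionary layout is an illustrative choice. With 6 bins, the SemiBin2 median is the mean of the two middle values (0.64% and 2.07%), i.e. about 1.355%.

```python
from statistics import median

# Contamination (%) of each HQ bin, copied from the per-bin tables above.
hq_contamination = {
    "AMBER": [4.45, 0.69, 1.86, 0.05, 0.23, 2.09, 2.65, 0.90, 1.90, 2.29, 2.62],
    "SemiBin2": [4.36, 0.64, 2.07, 0.37, 0.04, 2.15],
    "COMEBin": [2.15, 4.31, 0.66, 0.41, 0.05, 1.67, 2.10, 0.94, 2.04],
}

medians = {tool: median(values) for tool, values in hq_contamination.items()}
worst = {tool: max(values) for tool, values in hq_contamination.items()}

for tool in hq_contamination:
    print(f"{tool}: median {medians[tool]:.3f}%, worst {worst[tool]:.2f}%")
```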

Reproducibility: AMBER produces exactly 11 HQ bins in every replicate run. SemiBin2 is stable (5–6 HQ) but recovers fewer genomes. COMEBin varies between 6 and 9 HQ across 5 runs.

References

[1] Chklovski A et al. (2023) CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nature Methods 20:1203–1212.
[2] Bowers RM et al. (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology 35:725–731.
[3] Pan S, Zhao X-M, Coelho LP (2023) SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing. Bioinformatics 39(Suppl 1):i21–i29.
[4] Wang Z et al. (2024) Effective binning of metagenomic contigs using contrastive multi-view representation learning. Nature Communications 15:585.
