cff-version: 1.2.0
title: "HarmonizePy: Pure-Python batch-effect harmonization with structural missingness handling"
message: "If you use HarmonizePy in your research, please cite it using the metadata below."
type: software
repository-code: "https://github.com/LangeLab/HarmonizePy"
version: 0.3.2
date-released: "2026-05-29"
license: GPL-3.0
abstract: >
  HarmonizePy is a pure-Python batch-effect correction toolkit for omics
  data with structural missingness. Standard ComBat and limma require
  complete matrices; HarmonizePy handles the dissection, adjustment, and
  reassembly so they work on real-world incomplete data without an R
  runtime.

  It provides all four modes of ComBat (Johnson, Li & Rabinovic, 2007):
  parametric and non-parametric, with and without scale correction. It
  also implements limma-style batch removal via sum-to-zero contrasts
  (Ritchie et al., 2015). The pipeline targets HarmonizR v1.10.0 feature
  parity: per-feature batch-presence detection groups the matrix into
  compatible subsets, each adjusted independently and then reassembled
  into the original shape without imputation. Missing observations remain
  missing in the output. Three sort strategies reorder batches so similar
  ones become neighbours before blocking; batch blocking groups
  consecutive batches into blocks to reduce sub-matrix count and memory
  use; unique-combination removal rescues singletons by cropping their
  affiliation to the nearest shared pattern before splitting.

  The engine layer is pure NumPy with no compiled extensions. Both a
  programmatic API and a full CLI are available, supporting config files
  (TOML, JSON, YAML), dry-run validation, JSON run summaries, and output
  in TSV, CSV, or Parquet. The implementation is verified against R
  reference outputs for `sva::ComBat`, `limma::removeBatchEffect`, and
  HarmonizR workflows across synthetic and real-data scenarios.
authors:
- family-names: Ergin
  given-names: Enes Kemal
  orcid: "https://orcid.org/0000-0001-9810-7399"
  affiliation: Lange Lab
  email: eneskemalergin@gmail.com
keywords:
- batch-effect
- harmonization
- ComBat
- limma
- proteomics
- transcriptomics
- structural-missingness
- batch-sorting
- batch-blocking
- unique-removal
- empirical-Bayes
- data-integration
- Python
- command-line-tool
references:
- type: article
  title: "HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values"
  authors:
  - family-names: Voß
    given-names: H.
  - family-names: Schlumbohm
    given-names: S.
  - family-names: Barwikowski
    given-names: P.
  - family-names: Wurlitzer
    given-names: M.
  - family-names: Dottermusch
    given-names: M.
  - family-names: Neumann
    given-names: P.
  - family-names: Schlüter
    given-names: H.
  - family-names: Neumann
    given-names: J. E.
  - family-names: Krisp
    given-names: C.
  journal: Nature Communications
  volume: 13
  article-number: 3523
  year: 2022
  doi: 10.1038/s41467-022-31007-x
  url: "https://doi.org/10.1038/s41467-022-31007-x"
- type: article
  title: "Adjusting batch effects in microarray expression data using empirical Bayes methods"
  authors:
  - family-names: Johnson
    given-names: W. Evan
  - family-names: Li
    given-names: Cheng
  - family-names: Rabinovic
    given-names: Ariel
  journal: Biostatistics
  volume: 8
  number: 1
  start: 118
  end: 127
  year: 2007
  doi: 10.1093/biostatistics/kxj037
- type: article
  title: "limma powers differential expression analyses for RNA-sequencing and microarray studies"
  authors:
  - family-names: Ritchie
    given-names: Matthew E.
  - family-names: Phipson
    given-names: Belinda
  - family-names: Wu
    given-names: Di
  - family-names: Hu
    given-names: Yifang
  - family-names: Law
    given-names: Charity W.
  - family-names: Shi
    given-names: Wei
  - family-names: Smyth
    given-names: Gordon K.
  journal: Nucleic Acids Research
  volume: 43
  number: 7
  article-number: e47
  year: 2015
- type: article
  title: "HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation"
  authors:
  - family-names: Schlumbohm
    given-names: Simon
  - family-names: Neumann
    given-names: Julia E.
  - family-names: Neumann
    given-names: Philipp
  journal: BMC Bioinformatics
  volume: 26
  year: 2025
  doi: 10.1186/s12859-025-06073-9
- type: software
  title: "HarmonizR"
  repository-code: "https://github.com/HSU-HPC/HarmonizR"
  abstract: "Original R implementation (Schlumbohm, Neumann & Neumann)."
  authors:
  - family-names: Schlumbohm
    given-names: Simon
  - family-names: Neumann
    given-names: Julia E.
  - family-names: Neumann
    given-names: Philipp
- type: website
  title: "HarmonizR Bioconductor Package"
  url: "https://www.bioconductor.org/packages/release/bioc/html/HarmonizR.html"