Merged
48 commits
ce31ec7
replace upload with ena-webin-handler
KateSakharova Mar 12, 2026
952ab91
additions
KateSakharova Mar 12, 2026
e2709e1
add condition for manifest parsing
KateSakharova Mar 12, 2026
0453f2e
rename
KateSakharova Mar 12, 2026
7514d0a
Merge remote-tracking branch 'origin/dev' into add_tests
ochkalova Mar 16, 2026
931fc86
refactor and add check to only download CAT_db if local db is not pro…
ochkalova Mar 17, 2026
f72c32e
add possible fna extension to schemas because we can handle it
ochkalova Mar 17, 2026
c599e65
patch genome_evaluation to only download db if there are genomes to a…
ochkalova Mar 17, 2026
8f52a6c
patch fasta_classify_catpack to only download db if there are genomes…
ochkalova Mar 17, 2026
81dbeb2
add FASTAVALIDATOR to check fasta file formatting in GENOMESUBMIT wor…
ochkalova Mar 17, 2026
76498c2
include FASTAVALIDATOR step to genomesubmit
ochkalova Mar 18, 2026
6264ee5
add more published results to modules.conf
ochkalova Mar 18, 2026
f6eb677
add tests for genomesubmit in --mode mag
ochkalova Mar 18, 2026
2ae8f95
update methods.md
ochkalova Mar 18, 2026
d422d96
update container for webin-cli-wrapper
ochkalova Mar 18, 2026
201d0ff
fix typo
ochkalova Mar 18, 2026
1ca99b3
add output for accession TSV and update meta description to webin-cli…
ochkalova Mar 18, 2026
b612148
add test with invalid assembly
ochkalova Mar 18, 2026
6a38731
append mag_ to all samplesheets for genomesubmit testing for convenience
ochkalova Mar 18, 2026
9e4f980
add tests for assemblysubmit workflow
ochkalova Mar 18, 2026
75de363
create output dir for metadata file if it doesn't exist
ochkalova Mar 19, 2026
f9bd3d4
push test data and samplesheets to nf-core/test-datasets
ochkalova Mar 19, 2026
57f22b8
add mode-specific test tags, update .nftignore to exclude results fil…
ochkalova Mar 19, 2026
d0b47ea
update usage doc
ochkalova Mar 19, 2026
c3757d8
add more tests
ochkalova Mar 19, 2026
ef79beb
add snapshots for tests
ochkalova Mar 20, 2026
4fc0b10
pdate container for webin-cli-wrapper
ochkalova Mar 20, 2026
59f14ab
add test profiles import to nextflow.config
ochkalova Mar 20, 2026
ea385d7
replace webin-cli with webin_cli_wrapper in assemblysubmit workflow
ochkalova Mar 20, 2026
b6dfc78
remove echoed credentials
ochkalova Mar 20, 2026
a47409a
update docs
ochkalova Mar 20, 2026
707becf
remove webin-cli from modules.config, add publishDirs for databases
ochkalova Mar 20, 2026
30bdba5
enable MULTIQC in assemblysubmit and genomesubmit workflows
ochkalova Mar 20, 2026
f6ab40d
update schema
ochkalova Mar 20, 2026
fe7a8cc
remove commented code
ochkalova Mar 20, 2026
f917165
remove docker.enabled = true from test profiles
ochkalova Mar 25, 2026
2df682c
downgrade MULTIQC container version to 1.25.1 to prevent segfaults in…
ochkalova Mar 25, 2026
f7adda8
Merge branch 'dev' into add_tests
ochkalova Mar 25, 2026
797e019
fixes in tests, snapshot updates
ochkalova Mar 25, 2026
87b473a
add notes to readme to ensure expected order of columns in the sample…
ochkalova Mar 26, 2026
05919f2
use splitCsv for coverage file parsing
ochkalova Mar 26, 2026
25b3d8d
apply vangelis linter
ochkalova Mar 26, 2026
962acb0
add fallback publishDir and explicitly disable publishing where needed
ochkalova Mar 27, 2026
e881c0a
update snapshots
ochkalova Mar 27, 2026
9a17eca
add secrets export in github actions
ochkalova Mar 27, 2026
56fd07f
reduce memory allocation in conf files to 12GB for github actions
ochkalova Mar 27, 2026
861b92a
clean up tests, make test_mag_multiple_bins_missing_metadata.nf defau…
ochkalova Mar 27, 2026
b100a7d
delete slow and greedy nf-test that runs CheckM2
ochkalova Mar 30, 2026
6 changes: 6 additions & 0 deletions .github/actions/nf-test/action.yml
@@ -56,6 +56,12 @@ runs:
channel-priority: strict
conda-remove-defaults: true

- name: Configure Nextflow secrets
shell: bash
run: |
nextflow secrets set ENA_WEBIN "$WEBIN_ACCOUNT"
nextflow secrets set ENA_WEBIN_PASSWORD "$WEBIN_PASSWORD"

- name: Run nf-test
shell: bash
env:
34 changes: 34 additions & 0 deletions CITATIONS.md
@@ -14,6 +14,40 @@

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [CoverM](https://github.com/wwood/CoverM)

> Aroney ST, Newell RJ, Nissen JN, Camargo AP, Tyson GW, Woodcroft BJ. CoverM: Read alignment statistics for metagenomics. Bioinformatics. 2025;41(4):btaf147. doi: 10.1093/bioinformatics/btaf147. PubMed PMID: 40193404; PubMed Central PMCID: PMC11993303.

- [CheckM2](https://github.com/chklovski/CheckM2)

> Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods. 2023;20(8):1203-1212. doi: 10.1038/s41592-023-01940-w. PubMed PMID: 37500759; PubMed Central PMCID: not available.

- [CAT and BAT](https://doi.org/10.1186/s13059-019-1817-x)

> von Meijenfeldt FAB, Arkhipova K, Cambuy DD, Coutinho FH, Dutilh BE. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol. 2019;20(1):217. doi: 10.1186/s13059-019-1817-x. PubMed PMID: 31640809; PubMed Central PMCID: PMC6805573.

- [tRNAscan-SE 2.0](https://doi.org/10.1093/nar/gkab688)

> Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: Improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49(16):9077-9096. doi: 10.1093/nar/gkab688. PubMed PMID: 34417604; PubMed Central PMCID: PMC8450103.

- [barrnap](https://github.com/tseemann/barrnap)

> Seemann T. Barrnap: rapid ribosomal RNA prediction. GitHub repository. https://github.com/tseemann/barrnap

## Submission and helper tools

- [ENA Webin-CLI](https://github.com/enasequence/webin-cli)

> European Nucleotide Archive. Webin command line submission interface (Webin-CLI). GitHub repository. https://github.com/enasequence/webin-cli

- [assembly_uploader](https://github.com/EBI-Metagenomics/assembly_uploader)

> EBI Metagenomics. ENA Metagenome Assembly uploader. GitHub repository. https://github.com/EBI-Metagenomics/assembly_uploader

- [genome_uploader](https://github.com/EBI-Metagenomics/genome_uploader)

> EBI Metagenomics. ENA public Bins and MAGs uploader. GitHub repository. https://github.com/EBI-Metagenomics/genome_uploader

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
44 changes: 30 additions & 14 deletions README.md
@@ -38,9 +38,9 @@ Currently, the pipeline supports three submission modes, each routed to a dedica

Setup your environment secrets before running the pipeline:

`nextflow secrets set WEBIN_ACCOUNT "Webin-XXX"`
`nextflow secrets set ENA_WEBIN "Webin-XXX"`

`nextflow secrets set WEBIN_PASSWORD "XXX"`
`nextflow secrets set ENA_WEBIN_PASSWORD "XXX"`

Make sure you update commands above with your authorised credentials.

@@ -55,43 +55,52 @@ The input must follow `assets/schema_input_genome.json`.
Required columns:

- `sample`
- `fasta` (must end with `.fa.gz` or `.fasta.gz`)
- `fasta` (must end with `.fa.gz`, `.fasta.gz`, or `.fna.gz`)
- `accession`
- `assembly_software`
- `binning_software`
- `binning_parameters`
- `stats_generation_software`
- `metagenome`
- `environmental_medium`
- `broad_environment`
- `local_environment`
- `co-assembly`

Columns that required for now, but will be optional in the nearest future:
At least one of the following must be provided per row:

- reads (`fastq_1`, optional `fastq_2` for paired-end)
- `genome_coverage`

Additional supported columns:

- `stats_generation_software`
- `completeness`
- `contamination`
- `genome_coverage`
- `RNA_presence`
- `NCBI_lineage`

Those fields are metadata required for [genome_uploader](https://github.com/EBI-Metagenomics/genome_uploader) package.
If `genome_coverage`, `stats_generation_software`, `completeness`, `contamination`, `RNA_presence`, or `NCBI_lineage` are missing, the workflow can calculate or infer them when the required inputs are available.

Those fields are metadata required for the [genome_uploader](https://github.com/EBI-Metagenomics/genome_uploader) package.

Example `samplesheet_genome.csv`:
Example `samplesheet_genomes.csv`:

```csv
sample,fasta,accession,assembly_software,binning_software,binning_parameters,stats_generation_software,completeness,contamination,genome_coverage,metagenome,co-assembly,broad_environment,local_environment,environmental_medium,RNA_presence,NCBI_lineage
lachnospira_eligens,data/bin_lachnospira_eligens.fa.gz,SRR24458089,spades_v3.15.5,metabat2_v2.6,default,CheckM2_v1.0.1,61.0,0.21,32.07,sediment metagenome,No,marine,cable_bacteria,marine_sediment,No,d__Bacteria;p__Proteobacteria;s_unclassified_Proteobacteria
sample,fasta,accession,fastq_1,fastq_2,assembly_software,binning_software,binning_parameters,stats_generation_software,completeness,contamination,genome_coverage,metagenome,co-assembly,broad_environment,local_environment,environmental_medium,RNA_presence,NCBI_lineage
lachnospira_eligens,data/bin_lachnospira_eligens.fa.gz,SRR24458089,,,spades_v3.15.5,metabat2_v2.6,default,CheckM2_v1.0.1,61.0,0.21,32.07,sediment metagenome,No,marine,cable_bacteria,marine_sediment,No,d__Bacteria;p__Proteobacteria;s__unclassified_Proteobacteria
```

> [!IMPORTANT]
> **Samplesheet column requirements**: All columns shown in the example above must be present in your samplesheet, even if some values are empty. Columns must be in exactly the same order as shown.

### `metagenomic_assemblies` mode (`ASSEMBLYSUBMIT`)

The input must follow `assets/schema_input_assembly.json`.

Required columns:

- `sample`
- `fasta` (must end with `.fa.gz` or `.fasta.gz`)
- `fasta` (must end with `.fa.gz`, `.fasta.gz`, or `.fna.gz`)
- `run_accession`
- `assembler`
- `assembler_version`
@@ -111,6 +120,9 @@ assembly_1,data/contigs_1.fasta.gz,data/reads_1.fastq.gz,data/reads_2.fastq.gz,,
assembly_2,data/contigs_2.fasta.gz,,,42.7,ERR011323,MEGAHIT,1.2.9
```

> [!IMPORTANT]
> **Samplesheet column requirements**: All columns shown in the example above must be present in your samplesheet, even if some values are empty. Columns must be in exactly the same order as shown.

## Usage

> [!NOTE]
@@ -122,6 +134,10 @@ All data submitted through this pipeline must be associated with an ENA study (p

See the [usage documentation](docs/usage.md#submission-study) for more details.

### Database setup (`CheckM2` and `CAT_pack`)

The `mags`/`bins` workflow requires databases for completeness/contamination estimation and taxonomy assignment. See [Usage documentation](usage.md) for details.

### Required parameters:

| Parameter | Description |
@@ -137,7 +153,7 @@ See the [usage documentation](docs/usage.md#submission-study) for more details.
| Parameter | Description |
| ------------------- | ---------------------------------------------------------------------------------------- |
| `--upload_tpa` | Flag to control the type of assembly study (third party assembly or not). Default: false |
| `--test_upload` | Upload to TEST ENA server instead of LIVE. Default: false |
| `--test_upload` | Upload to TEST ENA server instead of LIVE. Default: true |
| `--webincli_submit` | If set to false, submissions will be validated, but not submitted. Default: true |

General command template:
@@ -202,8 +218,8 @@ For more details and further functionality, please refer to the [usage documenta

Key output locations in `--outdir`:

- `upload/manifests/`: generated manifest files for submission
- `upload/webin_cli/`: ENA Webin CLI reports
- `mags/` or `bins/`: genome metadata, manifests, and per-sample submission support files
- `metagenomic_assemblies/`: assembly metadata CSVs and per-sample coverage files
- `multiqc/`: MultiQC summary report
- `pipeline_info/`: execution reports, trace, DAG, and software versions

6 changes: 3 additions & 3 deletions assets/samplesheet_genomes.csv
@@ -1,3 +1,3 @@
sample,fasta,accession,fastq_1,fastq_2,assembly_software,binning_software,binning_parameters,stats_generation_software,completeness,contamination,genome_coverage,metagenome,co-assembly,broad_environment,local_environment,environmental_medium,rRNA_presence,NCBI_lineage
lachnospira_eligens,https://github.com/nf-core/test-datasets/raw/seqsubmit/test_data/bins/bin_lachnospira_eligens.fa.gz,SRR24458089,spades_v3.15.5,mags_v1,default,CheckM2_v1.0.1,61.0,0.21,32.07,sediment metagenome,False,marine,cable bacteria,marine sediment,False,d__Bacteria;p__Proteobacteria;c__Deltaproteobacteria;o__Desulfobacterales;f__Desulfobulbaceae;g__Candidatus Electrothrix;s__
lachnospiraceae,https://github.com/nf-core/test-datasets/raw/seqsubmit/test_data/bins/bin_lachnospiraceae.fa.gz,SRR24458087,spades_v3.15.5,mags_v1,default,CheckM2_v1.0.1,92.81,1.09,66.04,sediment metagenome,False,marine,cable bacteria,marine sediment,False,d__Bacteria;p__Proteobacteria;c__Deltaproteobacteria;o__Desulfobacterales;f__Desulfobulbaceae;g__Candidatus Electrothrix;s__Candidatus Electrothrix marina
sample,fasta,accession,fastq_1,fastq_2,assembly_software,binning_software,binning_parameters,stats_generation_software,completeness,contamination,genome_coverage,metagenome,co-assembly,broad_environment,local_environment,environmental_medium,RNA_presence,NCBI_lineage
lachnospira_eligens,https://github.com/nf-core/test-datasets/raw/seqsubmit/test_data/bins/bin_lachnospira_eligens.fa.gz,SRR24458089,,,spades_v3.15.5,mags_v1,default,CheckM2_v1.0.1,61.0,0.21,32.07,sediment metagenome,No,marine,cable bacteria,marine sediment,No,d__Bacteria;p__Proteobacteria;c__Deltaproteobacteria;o__Desulfobacterales;f__Desulfobulbaceae;g__Candidatus Electrothrix;s__
lachnospiraceae,https://github.com/nf-core/test-datasets/raw/seqsubmit/test_data/bins/bin_lachnospiraceae.fa.gz,SRR24458087,,,spades_v3.15.5,mags_v1,default,CheckM2_v1.0.1,92.81,1.09,66.04,sediment metagenome,No,marine,cable bacteria,marine sediment,No,d__Bacteria;p__Proteobacteria;c__Deltaproteobacteria;o__Desulfobacterales;f__Desulfobulbaceae;g__Candidatus Electrothrix;s__Candidatus Electrothrix marina
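Reviewer note: the README change in this PR states that all samplesheet columns must be present in the order shown, and that each row needs reads or `genome_coverage`. The rule can be sketched in Python as below. This is only an illustration under stated assumptions, not the pipeline's actual validation (which relies on `assets/schema_input_genome.json`): the function name `check_samplesheet` is hypothetical, the header is copied from the new `assets/samplesheet_genomes.csv`, and "reads" is taken to mean a non-empty `fastq_1`.

```python
import csv
import io

# Header copied from the new assets/samplesheet_genomes.csv shown in this diff.
EXPECTED_HEADER = (
    "sample,fasta,accession,fastq_1,fastq_2,assembly_software,binning_software,"
    "binning_parameters,stats_generation_software,completeness,contamination,"
    "genome_coverage,metagenome,co-assembly,broad_environment,local_environment,"
    "environmental_medium,RNA_presence,NCBI_lineage"
).split(",")


def check_samplesheet(text):
    """Hypothetical helper: return a list of problems, empty if none were found."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["samplesheet is empty"]
    header, data = rows[0], rows[1:]
    # All columns must be present, in exactly the documented order.
    if header != EXPECTED_HEADER:
        return ["header does not match the expected columns in the expected order"]
    problems = []
    for line_no, row in enumerate(data, start=2):
        # Empty values are fine, but every column still needs its own field.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        record = dict(zip(header, row))
        # Assumption: "reads" means fastq_1 is filled in; fastq_2 stays optional.
        if not record["fastq_1"].strip() and not record["genome_coverage"].strip():
            problems.append(f"line {line_no}: provide fastq_1 or genome_coverage")
    return problems
```

A check of this kind would flag the old data rows in this file, which carried 17 values under a 19-column header because the two empty `fastq_1`/`fastq_2` fields were missing.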
4 changes: 2 additions & 2 deletions assets/schema_input_assembly.json
@@ -17,8 +17,8 @@
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.f(ast)?a\\.gz$",
"errorMessage": "FASTA file must be provided and have extension '.fa', '.fasta', '.fas', '.fna' (optionally gzipped)",
"pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.(fa|fasta|fna)\\.gz$",
"errorMessage": "FASTA file must be provided and have extension '.fa.gz', '.fasta.gz', '.fna.gz'",
"description": "Metagenomic assembly FASTA file"
},
"fastq_1": {
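Reviewer note: the effect of the `pattern` change can be checked with a short Python sketch. Both regular expressions are copied from the diff with the JSON string escaping removed; the file names and the helper names `accepted` and `compare` are hypothetical, chosen only to exercise each case. Python's `re` engine is assumed to behave like the schema validator's regex engine for these simple patterns.

```python
import re

# Patterns from the schema diff, with JSON escaping (\\ -> \) removed.
OLD_PATTERN = re.compile(r"^([\S\s]*\/)?[^\s\/]+\.f(ast)?a\.gz$")
NEW_PATTERN = re.compile(r"^([\S\s]*\/)?[^\s\/]+\.(fa|fasta|fna)\.gz$")

# Hypothetical file names used only for illustration.
CANDIDATES = [
    "data/bin_1.fa.gz",
    "data/bin_1.fasta.gz",
    "data/bin_1.fna.gz",   # newly accepted extension
    "data/bin_1.fa",       # not gzipped
    "data/bin 1.fa.gz",    # whitespace in the file name
]


def accepted(pattern, path):
    """Return True if the path satisfies the given schema pattern."""
    return pattern.search(path) is not None


def compare(paths):
    """Map each path to an (old_result, new_result) tuple."""
    return {p: (accepted(OLD_PATTERN, p), accepted(NEW_PATTERN, p)) for p in paths}


if __name__ == "__main__":
    for path, (old_ok, new_ok) in compare(CANDIDATES).items():
        print(f"{path!r}: old={old_ok} new={new_ok}")
```

Under these assumptions only the `.fna.gz` case differs between the two patterns. Uncompressed files and file names containing whitespace are rejected by both, which matches the updated `errorMessage` rather than the old one that mentioned `.fas`, `.fna`, and optional gzipping.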
5 changes: 3 additions & 2 deletions assets/schema_input_genome.json
@@ -17,8 +17,8 @@
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.f(ast)?a\\.gz$",
"errorMessage": "FASTA file for sequences 1 must be provided, cannot contain spaces and must have extension '.fa.gz' or '.fasta.gz'",
"pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.(fa|fasta|fna)\\.gz$",
"errorMessage": "FASTA file for sequences 1 must be provided, cannot contain spaces and must have extension '.fa.gz', '.fasta.gz', or '.fna.gz'",
"description": "MAG/bin sequence file"
},
"accession": {
@@ -117,6 +117,7 @@
"required": [
"sample",
"fasta",
"accession",
"assembly_software",
"co-assembly",
"binning_software",
3 changes: 0 additions & 3 deletions conf/base.config
@@ -10,7 +10,6 @@

process {

// TODO nf-core: Check the defaults for all processes
cpus = { 1 * task.attempt }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
@@ -24,8 +23,6 @@ process {
// These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
// If possible, it would be nice to keep the same label naming convention when
// adding in your local modules too.
// TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:process_single {
cpus = { 1 }
memory = { 6.GB * task.attempt }