Add modules for CSV/TSV metadata generation #44

@ochkalova

Description of feature

Currently, the metadata files are generated directly in the workflow body, which is fragile and error-prone (especially when resuming a run):

    // --------- Combine metadata into TSV
    genome_metadata_csv = fasta_updated_with_taxonomy
        .map { meta, fasta ->
            [
                meta.id,
                fasta.getName(),
                meta.accession,
                meta.assembly_software,
                meta.binning_software,
                meta.binning_parameters,
                meta.stats_generation_software,
                meta.completeness,
                meta.contamination,
                meta.genome_coverage,
                meta.metagenome,
                meta.co_assembly == "Yes" ? "True" : "False",
                meta.broad_environment,
                meta.local_environment,
                meta.environmental_medium,
                meta.RNA_presence == "Yes" ? "True" : "False",
                meta.NCBI_lineage
            ].join('\t')
        }
        .collectFile(
            name: 'genomes_metadata.csv',
            storeDir: "${params.outdir}/${params.mode}",
            seed: [
                'genome_name',
                'genome_path',
                'accessions',
                'assembly_software',
                'binning_software',
                'binning_parameters',
                'stats_generation_software',
                'completeness',
                'contamination',
                'genome_coverage',
                'metagenome',
                'co-assembly',
                'broad_environment',
                'local_environment',
                'environmental_medium',
                'rRNA_presence',
                'NCBI_lineage'
            ].join('\t'),
            newLine: true
        )

The task is to replace these blocks with small, dedicated modules that generate the files.
Something like this (AI-generated):

    process CREATE_ASSEMBLY_METADATA {
        tag "$meta.id"
        publishDir "${params.outdir}/${params.mode}", mode: 'copy'

        input:
        tuple val(meta), path(fasta)

        output:
        tuple val(meta), path("${meta.id}_assembly_metadata.csv")

        script:
        def header = 'Runs,Coverage,Assembler,Version,Filepath,Sample'
        def row = [
            meta.run_accession ?: '',
            meta.coverage ?: '',
            meta.assembler ?: '',
            meta.assembler_version ?: '',
            fasta.name,
            ''
        ].join(',')
        """
        cat <<-END_CSV > ${meta.id}_assembly_metadata.csv
        ${header}
        ${row}
        END_CSV
        """
    }

    // Then in workflow:
    assembly_metadata_csv = CREATE_ASSEMBLY_METADATA(assemblies_with_coverage)
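
Applied to the genomes metadata table from the first snippet, the same idea could look roughly like the sketch below. It is only an illustration under a few assumptions: the process name `CREATE_GENOMES_METADATA`, the `rows` input, the `tsv` emit name and the `genomes_metadata.tsv` file name are hypothetical and not existing pipeline code. The sketch uses a native `exec:` block so that metadata values never pass through a shell, and it sorts rows by genome name so that the file content does not depend on channel ordering.

```nextflow
// Hypothetical module: writes one TSV covering all genomes in a single task.
process CREATE_GENOMES_METADATA {
    publishDir "${params.outdir}/${params.mode}", mode: 'copy'

    input:
    val rows   // list of rows; each row is a list of column values for one genome

    output:
    path 'genomes_metadata.tsv', emit: tsv

    exec:
    def header = [
        'genome_name', 'genome_path', 'accessions', 'assembly_software',
        'binning_software', 'binning_parameters', 'stats_generation_software',
        'completeness', 'contamination', 'genome_coverage', 'metagenome',
        'co-assembly', 'broad_environment', 'local_environment',
        'environmental_medium', 'rRNA_presence', 'NCBI_lineage'
    ]
    // Sort by genome name so the output is stable across runs.
    def sorted = rows.toSorted { row -> row[0].toString() }
    // Missing values become empty fields instead of the string "null".
    def lines = [header.join('\t')] + sorted.collect { row ->
        row.collect { it == null ? '' : it.toString() }.join('\t')
    }
    task.workDir.resolve('genomes_metadata.tsv').text = lines.join('\n') + '\n'
}

// Then in workflow (column order must match the header above):
genome_rows = fasta_updated_with_taxonomy
    .map { meta, fasta ->
        [
            meta.id,
            fasta.getName(),
            meta.accession,
            meta.assembly_software,
            meta.binning_software,
            meta.binning_parameters,
            meta.stats_generation_software,
            meta.completeness,
            meta.contamination,
            meta.genome_coverage,
            meta.metagenome,
            meta.co_assembly == "Yes" ? "True" : "False",
            meta.broad_environment,
            meta.local_environment,
            meta.environmental_medium,
            meta.RNA_presence == "Yes" ? "True" : "False",
            meta.NCBI_lineage
        ]
    }
    .collect(flat: false)   // keep each row as its own list

CREATE_GENOMES_METADATA(genome_rows)
genome_metadata_tsv = CREATE_GENOMES_METADATA.out.tsv
```

Because the file is now a declared process output, it is produced inside a cached task rather than by `collectFile` with `storeDir`, which should make the behaviour on `-resume` easier to reason about.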

Labels: enhancement (New feature or request)
