Background:
The MiCall pipeline currently processes reads on a per-real-sample basis and outputs an assembled consensus sequence for each sample. Each run relies on its SampleSheet.csv file for input and output details. A feature to merge samples, ideally across different runs, would simplify downstream analysis.
Feature Description:
Introduce a merger tool that takes a .csv mapping file and generates a merged SampleSheet.csv, a RunInfo.xml, and a copy of the input .csv for traceability. The mapping file maps each sample_name and run_folder pair to an output_name, which defines the merging plan.
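To make the mapping file concrete, here is a minimal sketch of how it could be read into a merge plan. The column names sample_name, run_folder, and output_name come from the description above; the column order, the example paths, and the helper name read_merge_plan are assumptions made for illustration only.

```python
import csv
import io


def read_merge_plan(mapping_file):
    """Group (run_folder, sample_name) pairs by output_name, in file order.

    Hypothetical helper: only the three column names are taken from the issue.
    """
    plan = {}
    for row in csv.DictReader(mapping_file):
        source = (row["run_folder"].strip(), row["sample_name"].strip())
        plan.setdefault(row["output_name"].strip(), []).append(source)
    return plan


# Made-up example content; real run folders and sample names will differ.
example = io.StringIO(
    "sample_name,run_folder,output_name\n"
    "S1,/runs/run_A,merged_1\n"
    "S7,/runs/run_B,merged_1\n"
    "S2,/runs/run_A,merged_2\n"
)
plan = read_merge_plan(example)
```

Here merged_1 collects one sample from each of two run folders, while merged_2 simply renames a single sample.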
Feature Objectives:
- Facilitate efficient sample mergers across different run folders.
- Ensure consistency and traceability for merged samples.
- Handle default values and conflicts in input .csv files.
Functional Requirements:
- Input to the tool:
  - Path to the mapping .csv file.
  - Path to the output folder.
- Outputs of the tool:
  - SampleSheet.csv with merged output_name records.
  - RunInfo.xml copied from the first associated run_folder.
  - Input .csv file to trace origins of merged data.
- Conflict resolution strategy, with a strict mode option (--strict flag).
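The inputs and the two copied outputs listed above could look roughly like the following sketch. Only the --strict flag and the file names SampleSheet.csv and RunInfo.xml come from this issue; the positional argument names, the function names, and the choice to keep the mapping file's original name in the output folder are assumptions.

```python
import argparse
import shutil
from pathlib import Path


def build_parser():
    """Command line for the hypothetical merger tool."""
    parser = argparse.ArgumentParser(
        description="Merge samples across run folders (sketch).")
    parser.add_argument("mapping_csv", type=Path,
                        help="path to the mapping .csv file")
    parser.add_argument("output_folder", type=Path,
                        help="folder that receives the merged files")
    parser.add_argument("--strict", action="store_true",
                        help="enable strict conflict resolution")
    return parser


def copy_reference_files(mapping_csv, first_run_folder, output_folder):
    """Copy RunInfo.xml from the first run folder, plus the mapping file.

    Assumption: the mapping file keeps its original name in the output folder.
    """
    output_folder.mkdir(parents=True, exist_ok=True)
    shutil.copy(first_run_folder / "RunInfo.xml", output_folder / "RunInfo.xml")
    shutil.copy(mapping_csv, output_folder / mapping_csv.name)
```

Writing the merged SampleSheet.csv is left out here because it depends on the conflict resolution rules.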
Conflict Resolution Rules:
- project_name header field to follow the $current_date.merged pattern.
- date header field to reflect the actual merge date.
- All other fields should use the first observed value unless --strict is enabled.
- Fields index and index2 should default to XXXXX.
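The rules above can be sketched as two small functions. Several details are assumptions rather than decisions made in this issue: headers are modelled as plain dicts, the date is written in ISO format, --strict is taken to mean "raise an error on conflicting values", and "default to XXXXX" is read as "fill in when the field is missing or empty". The names resolve_header, fill_index_defaults, and MergeConflictError are hypothetical.

```python
from datetime import date

PLACEHOLDER_INDEX = "XXXXX"


class MergeConflictError(ValueError):
    """Raised in strict mode when two sample sheets disagree on a field."""


def resolve_header(headers, strict=False, today=None):
    """Merge header dicts: first observed value wins unless strict is set."""
    today = today or date.today()
    merged = {}
    for header in headers:
        for field, value in header.items():
            if field in ("project_name", "date"):
                continue  # always overwritten below
            if field not in merged:
                merged[field] = value
            elif merged[field] != value and strict:
                raise MergeConflictError(
                    f"{field}: {merged[field]!r} != {value!r}")
    stamp = today.isoformat()  # assumed date format
    merged["project_name"] = f"{stamp}.merged"
    merged["date"] = stamp
    return merged


def fill_index_defaults(row):
    """Return a copy of a sample row with index/index2 filled in if empty."""
    filled = dict(row)
    for field in ("index", "index2"):
        if not filled.get(field):
            filled[field] = PLACEHOLDER_INDEX
    return filled
```

Passing today explicitly keeps the function easy to test; in normal use it falls back to the current date.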
Implementation Tasks:
- Parse the mapping .csv and handle row defaults.
- Generate the merged SampleSheet.csv and RunInfo.xml.
- Produce the traceability .csv from the mapping file.
- Implement --non-strict mode for conflict resolution, with it becoming the default.