Skip to content

feat: add optional library_id, lane, flowcell samplesheet columns#147

Open
znorgaard wants to merge 21 commits intodevfrom
feature/samplesheet-metadata
Open

feat: add optional library_id, lane, flowcell samplesheet columns#147
znorgaard wants to merge 21 commits intodevfrom
feature/samplesheet-metadata

Conversation

@znorgaard
Copy link
Copy Markdown
Collaborator

@znorgaard znorgaard commented Mar 26, 2026

Summary

Changes

  • Schema: sample now maps to meta.sample; meta.id is computed as library_id (when provided) or sample
  • Pre-merge output naming: Files include lane (and optionally flowcell) in prefix via ext.prefix (e.g., SAMPLE.1.html, SAMPLE.HXXYYBBXX.L001.html)
  • FastqToBam: --sample ${meta.sample} --library ${meta.id} (was both ${meta.id})
  • Merge grouping: Strips lane/flowcell from meta before groupKey so lanes merge correctly
  • Validation: All-or-nothing rules for library_id, lane, flowcell; uniqueness of (flowcell, lane) pairs; global uniqueness of library_id across samples

Test plan

  • 5 validation failure tests (library_id partial, library_id cross-sample, lane partial, flowcell partial, duplicate flowcell+lane)
  • Positive integration test with explicit flowcell and lane metadata
  • All 16 existing + new tests passing
  • Snapshot updated for .{lane} suffix in pre-merge output filenames
  • Manual review of multi-lane output directory structure

Closes #146

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 26, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit e6f9035

+| ✅ 206 tests passed       |+
#| ❔  10 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗   1 tests had warnings |!
Details

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: conf/modules/modules.config
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/nf-core-fastquorum_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-fastquorum_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • template_strings - template_strings

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.2
  • Run at 2026-04-02 22:50:48

@znorgaard znorgaard mentioned this pull request Mar 27, 2026
11 tasks
@znorgaard znorgaard force-pushed the feature/samplesheet-metadata branch from c2f31dc to a00b2c8 Compare March 30, 2026 16:40
znorgaard and others added 2 commits March 30, 2026 10:38
…rary_id and nonrandom multi-lane workflows

Add ext.prefix to CORRECTUMIS config for lane-aware output naming, fix
GroupReadsByUmi stub missing grouped-read-metrics.txt, update existing
nonrandom/mixed UMI tests for lane-aware filenames, and add new tests
covering library_id as processing unit and nonrandom UMIs with multiple
lanes (both auto-assigned and explicit metadata).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ext.args = '--quiet'
ext.prefix = { meta.flowcell ? "${meta.id}.${meta.flowcell}.${meta.lane}" : "${meta.id}.${meta.lane}" }
publishDir = [
path: { "${params.outdir}/preprocessing/fastqc/${meta.id}" },
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The publish path will currently be assigned to library_id if provided, not sample.

I'm open to changing this, I'm not sure which organization will intuitively make more sense/limit clutter/make it easier to find desired results.

@znorgaard znorgaard force-pushed the feature/samplesheet-metadata branch from 6408d30 to d4478f5 Compare March 30, 2026 18:51
@znorgaard znorgaard force-pushed the feature/samplesheet-metadata branch from d4478f5 to d00ee1f Compare March 30, 2026 20:35
- Fix umi_file error message to use `id` instead of `metas[0].id`
- Normalize flowcell values to null early for consistent representation
- Guard ext.prefix closures for null meta.lane (module-level tests)
- Add comment documenting shared_meta safety invariant
- Add positive test for library_id with multi-lane merge path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
"""
touch ${prefix}.grouped.bam
touch ${prefix}.grouped-family-sizes.txt
touch ${prefix}.grouped-read-metrics.txt
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Script was updated previously to output the read metrics, but missed the stub.

@znorgaard znorgaard force-pushed the feature/samplesheet-metadata branch from 499aef2 to f54ec0f Compare April 1, 2026 21:44
@znorgaard znorgaard marked this pull request as ready for review April 1, 2026 21:44
@znorgaard znorgaard requested review from clintval and nh13 April 1, 2026 21:44
},
"lane": {
"type": ["string", "integer"],
"pattern": "^[A-Za-z0-9]+$",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question:
pattern only validates strings, should we enforce integers > 0?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question:
why not just have the type be a string?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nf-schema work around. We want this to be a string so users can use things like "L001". If you leave the type as only string an integer value will throw an error (even if you quote it).

--input (samplesheet.csv): Validation of file failed:
	-> Entry 1: Error for field 'lane' (1): Value is [integer] but should be [string]

We could force it to be greater than 0 but I don't see a specific reason to do so. If someone wanted to use a lane=0 it wouldn't break anything. It could even be a useful re-code. Say you have data from a sequencing run where the lanes were not split and one where they were. You could do something like:

sample,flowcell,lane
control,cell_a,0
control,cell_b,1
control,cell_b,2

znorgaard and others added 3 commits April 2, 2026 09:32
Move integration test samplesheets to nf-core/test-datasets and convert
stub tests to full integration tests with trace counts, software version
snapshots, content snapshots, and output file existence checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@znorgaard znorgaard requested a review from nh13 April 2, 2026 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants