Add HLA-HD v1.7.1 module and BAM-input subworkflow #241

Open
johnoooh wants to merge 6 commits into develop from feature/hlahd

@johnoooh
Collaborator

Add HLA-HD v1.7.1 module and BAM-input subworkflow

Summary

  • Adds modules/msk/hlahd — nf-core-style module for HLA-HD v1.7.1 (high-resolution HLA typing from paired FASTQ)
  • Adds subworkflows/msk/hlahd_from_bam — end-to-end BAM-to-HLA-typing workflow
  • Adds test data entries to tests/config/test_data.config (data on hlahd branch of test-datasets repo)

Module: modules/msk/hlahd

Container   mskcc.jfrog.io/omicswf-docker-dev-local/mskcc-omics-workflows/hlahd:1.7.1
Input       [ meta, fastq_1, fastq_2 ]
Output      result (final allele calls), result_per_locus (per-gene .est.txt files), versions
  • Min-read threshold configurable via ext.args2 (default: 100)
  • Includes stub for pipeline dry-runs

Note: The container currently points to the dev registry. It will be switched to the prod registry with the next containers release.

Subworkflow: subworkflows/msk/hlahd_from_bam

Chains four modules to go from coordinate-sorted BAM to HLA allele calls:

[ meta, bam, bai ]
        |
  SAMTOOLS_VIEW         extract HLA region (configured via ext.args)
        |
  GATK4_REVERTSAM       optional BQSR reversion (skip_revert_sam param)
        |
  SAMTOOLS_FASTQ        BAM -> paired FASTQ
        |
  HLAHD                 HLA allele calling
        |
[ result, result_per_locus, versions ]

The skip_revert_sam parameter controls whether GATK4 RevertSam runs. Set it to true to skip RevertSam when the input BAM has not had BQSR applied.
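
For reviewers who want the branching spelled out, here is a minimal Python sketch of which stages run for each value of skip_revert_sam. It only mirrors the diagram above; the function name planned_stages is hypothetical and is not part of the subworkflow.

```python
def planned_stages(skip_revert_sam):
    """Return the ordered process names from the diagram for a given flag value.

    Illustrative only: the real branching lives in the Nextflow subworkflow.
    """
    stages = ["SAMTOOLS_VIEW"]
    if not skip_revert_sam:
        # RevertSam runs only when the flag is false.
        stages.append("GATK4_REVERTSAM")
    stages.extend(["SAMTOOLS_FASTQ", "HLAHD"])
    return stages


if __name__ == "__main__":
    print(planned_stages(skip_revert_sam=False))
    print(planned_stages(skip_revert_sam=True))
```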

Test data

Region    Coordinates (GRCh37)
HLA-A     6:29910247-29913661
HLA-B     6:31321649-31324989
HLA-C     6:31236526-31239913

~21k reads, ~3.3 MB across 4 files (BAM + BAI + paired FASTQ).
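
As a rough illustration of how these coordinates could be turned into a region argument for samtools view, here is a small Python sketch. The coordinates are copied from the table above. The helper names are hypothetical, and whether the subworkflow's ext.args is built this way is an assumption.

```python
# GRCh37 coordinates copied from the test-data table (1-based, inclusive).
HLA_REGIONS = {
    "HLA-A": ("6", 29910247, 29913661),
    "HLA-B": ("6", 31321649, 31324989),
    "HLA-C": ("6", 31236526, 31239913),
}


def region_string(contig, start, end):
    """Format one region as contig:start-end, the form samtools accepts."""
    return f"{contig}:{start}-{end}"


def region_length(start, end):
    """Length in bases of an inclusive 1-based interval."""
    return end - start + 1


def samtools_region_args(regions=HLA_REGIONS):
    """Join all regions into one space-separated argument string."""
    return " ".join(region_string(*coords) for coords in regions.values())


if __name__ == "__main__":
    print(samtools_region_args())
    total = sum(region_length(s, e) for _, s, e in HLA_REGIONS.values())
    print(f"total span: {total} bp")
```

The three intervals cover 10,144 bp in total, which is consistent with the small size of the test files.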

Example output (test_sample_final.result.txt)

A       HLA-A*01:01:01  HLA-A*29:02:01
B       HLA-B*08:01:01  HLA-B*44:46
C       HLA-C*07:01:01  HLA-C*16:26
DRB1    Not typed       Not typed

Class II loci are "Not typed" as expected — only class I regions are included in the test data.
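
For downstream checks, the final result file can be read into a mapping from locus to called alleles. The sketch below is one way to do that in Python. It assumes columns are separated by tabs or by runs of two or more spaces, as in the excerpt above, and it treats "Not typed" as no call. The function name parse_final_result is hypothetical.

```python
import re

# Excerpt copied from the example output in this PR description.
EXAMPLE = """\
A       HLA-A*01:01:01  HLA-A*29:02:01
B       HLA-B*08:01:01  HLA-B*44:46
C       HLA-C*07:01:01  HLA-C*16:26
DRB1    Not typed       Not typed
"""


def parse_final_result(text):
    """Map each locus to its list of called alleles, dropping 'Not typed'."""
    calls = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on tabs or on runs of 2+ whitespace so 'Not typed' stays whole.
        fields = re.split(r"\t+|\s{2,}", line)
        locus, alleles = fields[0], fields[1:]
        calls[locus] = [a for a in alleles if a != "Not typed"]
    return calls


if __name__ == "__main__":
    for locus, alleles in parse_final_result(EXAMPLE).items():
        print(locus, alleles or "not typed")
```

With this mapping, an empty list marks a locus that HLA-HD did not type, as with DRB1 in the test data.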

Tests

All 5 nf-test tests pass with deterministic snapshot matching:

Module tests (2):

  • hlahd - fastq pair - result txt — real HLA-HD run, verifies final result md5
  • hlahd - fastq pair - stub — stub run, verifies versions output

Subworkflow tests (3):

  • hlahd_from_bam - bam - with revert sam - result — full pipeline with GATK4 RevertSam
  • hlahd_from_bam - bam - skip revert sam - result — pipeline skipping RevertSam
  • hlahd_from_bam - bam - stub — stub run

Both revert/skip-revert paths produce identical final calls (md5: 6f83fc8ac5bd3b9f56853b583595e2a0).
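
The nf-test snapshots already do this comparison. For a manual sanity check outside nf-test, a sketch like the following could confirm that two result files hash to the same md5. The function names are hypothetical, and any file paths you pass in are your own.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_calls(path_a, path_b):
    """True when both result files are byte-identical by md5."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling same_calls on the final result files from the revert and skip-revert runs should return True if the statement above holds for your run.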

Checklist

  • Module follows nf-core conventions (meta map, ext.args, versions.yml)
  • All 5 nf-test tests passing
  • Snapshot files committed
  • Test data on hlahd branch in test-datasets repo
  • meta.yml complete for both module and subworkflow
  • Container URL switched to prod registry (pending next containers release)

johnoooh and others added 6 commits March 5, 2026 11:33
Module runs HLA-HD for HLA typing from paired-end FASTQ input.
Container-only (not available on conda/bioconda).
Private container built from JFrog-hosted binary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stub test and real test using HLA-region FASTQ from test-datasets hlahd branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Composes samtools/view, gatk4/revertsam (optional), samtools/fastq,
and hlahd modules into a BAM-to-HLA-typing pipeline.
Tests cover both skip_revert_sam paths plus stub test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also add the nf-test snapshot file that was missing from prior commits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pshots

- Fix output globs: results are at <prefix>/result/, not <prefix>/
  - result: ${prefix}/result/${prefix}_final.result.txt
  - result_per_locus: ${prefix}/result/${prefix}_*.est.txt
- Switch container URL to dev registry while awaiting next prod release
- Add nextflow.config for subworkflow tests (ext.prefix per process to
  avoid GATK4_REVERTSAM input/output name collision)
- Regenerate all snapshots against new HLA class I test data that
  produces actual allele calls (A*01:01:01, B*08:01:01, C*07:01:01)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@johnoooh johnoooh requested a review from a team as a code owner March 12, 2026 20:18
@johnoooh johnoooh requested a review from price0416 March 12, 2026 20:18