This documentation provides a comprehensive design guide for building a scalable, secure, and reproducible High-Performance Computing (HPC) environment on Amazon Web Services (AWS), tailored to genomics Nextflow workflows.
The core infrastructure is built on AWS ParallelCluster and managed as Infrastructure as Code (IaC) with Terraform.
Published documentation: https://naratech-platforms.gitbook.io/genomics-nf-hpc-on-aws-parallelcluster/
- Production-grade genomic variant discovery pipeline
- AWS ParallelCluster with SLURM scheduler
- CPU and GPU partitions for optimized workload execution
- Nextflow workflow orchestration
- Spack + Lmod for software management
- FSx for Lustre and EFS for high-performance storage
- Wazuh security monitoring on ECS Fargate
- Prometheus/Grafana observability stack
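
The CPU and GPU partitions above map naturally onto Nextflow process queues. A minimal `nextflow.config` sketch for this kind of setup is shown below; the queue names (`cpu`, `gpu`), resource values, and the FSx mount path are illustrative assumptions, not values taken from this repository:

```groovy
// nextflow.config -- illustrative sketch; queue names, resources,
// and paths are assumptions, adjust to the deployed cluster.
process {
    executor = 'slurm'

    // CPU-bound stages: alignment, QC, pre-processing
    withLabel: 'cpu' {
        queue  = 'cpu'            // SLURM CPU partition (c6a/c7i)
        cpus   = 16
        memory = '32 GB'
    }

    // GPU-accelerated stages, e.g. DeepVariant
    withLabel: 'gpu' {
        queue          = 'gpu'    // SLURM GPU partition (g5/g6)
        clusterOptions = '--gres=gpu:1'
    }
}

// Stage intermediate work onto the Lustre scratch file system
workDir = '/fsx/scratch/work'
```

Processes tagged `label 'cpu'` or `label 'gpu'` in the pipeline would then be dispatched to the matching SLURM partition automatically.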
```mermaid
flowchart TD
    subgraph "Public Internet"
        User((User))
    end
    subgraph "AWS Cloud - VPC"
        subgraph "Public Subnet"
            Bastion[Bastion Host<br/>or SSM / VPN Gateway]
        end
        subgraph "Private Subnet"
            subgraph "Management Layer"
                HeadNode[Head Node<br/>SLURM Controller<br/>Nextflow Engine<br/>Spack/Lmod]
                WazuhMgr[Wazuh Manager<br/>Security Monitoring]
                PromGraf[Prometheus & Grafana<br/>Observability Stack]
            end
            subgraph "Compute Layer (Autoscaling)"
                subgraph "CPU Partition"
                    CPUNodes[c6a/c7i Instances<br/>Alignment, QC, Pre-processing]
                end
                subgraph "GPU Partition"
                    GPUNodes[g5/g6 Instances<br/>DeepVariant, GPU Acceleration]
                end
            end
            subgraph "Storage Layer"
                EFS[(Amazon EFS<br/>Home Dirs, Scripts)]
                FSx[(FSx for Lustre<br/>Scratch, High-perf I/O)]
                S3[(Amazon S3<br/>Raw Data, Long-term Storage)]
            end
        end
    end
    User -- SSH / SSM --> Bastion
    Bastion --> HeadNode
    HeadNode --> CPUNodes
    HeadNode --> GPUNodes
    CPUNodes --- FSx
    GPUNodes --- FSx
    HeadNode --- EFS
    CPUNodes --- EFS
    GPUNodes --- EFS
    FSx <--> S3
    WazuhMgr --> HeadNode
    WazuhMgr -.-> CPUNodes
    WazuhMgr -.-> GPUNodes
    PromGraf -.-> HeadNode
    PromGraf -.-> CPUNodes
    PromGraf -.-> GPUNodes
    linkStyle default stroke-width:2px,fill:none,stroke:#F472B6,stroke-dasharray: 5 5
    classDef default fill:#1F2937,stroke:#22D3EE,color:#E5E7EB,stroke-width:2px;
    classDef storage fill:#1F2937,stroke:#22D3EE,color:#E5E7EB,stroke-width:2px,stroke-dasharray: 5 5;
    class EFS,FSx,S3 storage;
```
| Section | Description |
|---|---|
| Project Overview | Objectives, design principles, and target audience |
| System Architecture | Component breakdown and deployment model |
| Technology Stack | Compute, storage, and software decisions |
| Terraform Provisioning | Infrastructure as Code setup |
| Workflow Design | Nextflow pipeline execution flow |
| Security & Observability | Wazuh, Prometheus, and Grafana integration |
| Cost Optimization | Strategies for minimizing TCO |
| Conclusion | Summary and future enhancements |
| References | External resources and citations |
| Developer Guidance | SSM access, SLURM, and GPU management |
| Troubleshooting | GPU and CUDA issue resolution |
| Validation Checklist | Post-deployment verification steps |
| Ansible Post-Provisioning | Post-provision configuration and Lmod setup |
- Review the Project Overview to understand the goals
- Study the System Architecture to understand the components
- Follow the Terraform Provisioning guide to deploy the infrastructure
- Use the Validation Checklist to verify the deployment
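
Under typical assumptions about the repository layout (the `terraform/` and `ansible/` directories from the structure below), the deployment flow usually looks like the following; the variable handling, inventory, and playbook names (`inventory`, `site.yml`) are illustrative guesses, not files confirmed in this repository:

```shell
# Illustrative deployment flow -- file and playbook names are assumptions.
cd terraform/

terraform init                 # download providers and modules
terraform plan -out=tfplan     # review the proposed changes
terraform apply tfplan         # provision VPC, cluster, and storage

# Post-provision configuration (Spack, Lmod, monitoring agents)
cd ../ansible/
ansible-playbook -i inventory site.yml
```

Reviewing the saved plan before `apply` keeps the provisioning step reproducible and auditable, in line with the IaC approach described above.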
```
hpc-genomics-nf/
├── README.md      # Project overview (this file)
├── docs/          # GitBook documentation
├── ansible/       # Post-provision configuration
├── terraform/     # Infrastructure as Code
├── nextflow/      # Pipeline definitions
└── modules/       # Reusable components
```