gbcms

Complete orientation-aware counting system for genomic variants

Requires Python 3.10 or later.

Features

  • 🚀 High Performance: Rust-powered core engine with multi-threading
  • 🧬 Complete Variant Support: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)
  • 📊 Orientation-Aware: Forward and reverse strand analysis with fragment counting
  • 🔬 Statistical Analysis: Fisher's exact test for strand bias
  • 📁 Flexible I/O: VCF and MAF input/output formats
  • 🎯 Quality Filters: 8 configurable read and quality filtering options

Installation

Quick install:

pip install gbcms

From source (requires Rust):

git clone https://github.com/msk-access/gbcms.git
cd gbcms
pip install .

Docker:

docker pull ghcr.io/msk-access/gbcms:X.Y.Z  # Replace X.Y.Z with the latest release version

💡 Find the latest version on PyPI or GHCR.

📖 Full documentation: https://msk-access.github.io/gbcms/


Usage

gbcms can be used in two ways:

🔧 Option 1: Standalone CLI (1-10 samples)

Best for: Quick analysis, local processing, direct control

gbcms run \
    --variants variants.vcf \
    --bam sample1.bam \
    --fasta reference.fa \
    --output-dir results/

Output: results/sample1.vcf


🔄 Option 2: Nextflow Workflow (10+ samples, HPC)

Best for: Many samples, HPC clusters (SLURM), reproducible pipelines

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fa \
    -profile slurm

Features:

  • ✅ Automatic parallelization across samples
  • ✅ SLURM/HPC integration
  • ✅ Container support (Docker/Singularity)
  • ✅ Resume failed runs


Which Should I Use?

| Scenario                        | Recommendation |
|---------------------------------|----------------|
| 1-10 samples, local machine     | CLI            |
| 10+ samples, HPC cluster        | Nextflow       |
| Quick ad-hoc analysis           | CLI            |
| Production pipeline             | Nextflow       |
| Need auto-parallelization       | Nextflow       |
| Full manual control             | CLI            |

Quick Examples

CLI: Single Sample

gbcms run \
    --variants variants.vcf \
    --bam tumor.bam \
    --fasta hg19.fa \
    --output-dir results/ \
    --threads 4

CLI: Multiple Samples (Sequential)

gbcms run \
    --variants variants.vcf \
    --bam-list samples.txt \
    --fasta hg19.fa \
    --output-dir results/
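
The layout of samples.txt is not shown here. As a rough alternative, the same sequential behavior can be approximated by calling the single-sample command once per BAM. The sketch below uses only the flags that appear in the examples above; the function name run_each_bam and the GBCMS_CMD override (useful for dry runs) are hypothetical and not part of gbcms.

```shell
# Hypothetical wrapper: run gbcms once per BAM, stopping at the first failure.
# GBCMS_CMD is an assumption added for dry runs; it defaults to the real CLI.
run_each_bam() {
    local variants="$1" fasta="$2" outdir="$3"
    shift 3
    local bam
    for bam in "$@"; do
        # Only flags shown elsewhere in this README are used here.
        ${GBCMS_CMD:-gbcms} run \
            --variants "$variants" \
            --bam "$bam" \
            --fasta "$fasta" \
            --output-dir "$outdir" || return 1
    done
}
```

For a dry run that only prints the commands instead of executing them: GBCMS_CMD=echo run_each_bam variants.vcf hg19.fa results/ tumor1.bam tumor2.bam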

Nextflow: Many Samples (Parallel)

# samplesheet.csv:
# sample,bam,bai
# tumor1,/path/to/tumor1.bam,
# tumor2,/path/to/tumor2.bam,

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta hg19.fa \
    --outdir results \
    -profile slurm
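
With many BAM files, the samplesheet can be generated rather than typed by hand. The following is a minimal sketch that writes the sample,bam,bai header shown above and one row per *.bam file in a directory. The function name make_samplesheet is hypothetical, and the sketch assumes the sample name is the BAM file name without its .bam extension. The bai column is left empty, as in the example above.

```shell
# Hypothetical helper: write a samplesheet with one row per BAM in a directory.
# Assumes sample name = BAM file name minus ".bam"; the bai column stays empty.
make_samplesheet() {
    local bam_dir="$1" out="$2"
    printf 'sample,bam,bai\n' > "$out"
    local bam
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue   # skip the literal pattern when nothing matches
        printf '%s,%s,\n' "$(basename "$bam" .bam)" "$bam" >> "$out"
    done
}
```

Example call: make_samplesheet /path/to/bams samplesheet.csv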

Documentation

📚 Full Documentation: https://msk-access.github.io/gbcms/


Contributing

See CONTRIBUTING.md for development guidelines.

To contribute to documentation, see the gh-pages branch.


Citation

If you use gbcms in your research, please cite:

Shah, R. et al. (2025). gbcms: A high-performance orientation-aware genotype counting system for genomic variants. Available at: https://github.com/msk-access/gbcms

BibTeX:

@software{pygbcms,
  author       = {Shah, Ronak and contributors},
  title        = {gbcms: A high-performance orientation-aware genotype counting system for genomic variants},
  year         = {2025},
  url          = {https://github.com/msk-access/gbcms},
  note         = {GitHub repository}
}

License

AGPL-3.0 - see LICENSE for details.

