ngs-bits - Short-read and long-read sequencing tools for diagnostics

Installation

Binaries of ngs-bits are available via Bioconda:

Binaries for Linux/macOS

Alternatively, ngs-bits can be built from sources. Use git to clone the most recent release (the source code package from GitHub does not contain the required sub-modules):

> git clone --recursive https://github.com/imgag/ngs-bits.git
> cd ngs-bits
> git checkout 2025_12
> git submodule update --recursive --init

Depending on your operating system, building instructions vary slightly:

Building from sources for Linux
Building from sources for MacOS
Building from sources for Windows

The GSvar app requires a running server. Instructions on how to deploy it on a Linux machine can be found here.

Support

Please report any issues or questions to the ngs-bits issue tracker.

Documentation

The documentation of individual tools is linked in the tools list below.
For some tools the documentation pages contain only the command-line help, for other tools they contain more information.

If you want to contribute, check the development documentation.

License

ngs-bits is provided under the MIT license, but it is based on other software components with different licenses:

Qt is our base framework for the graphical user interface, platform abstraction, data structures and much more.
htslib for HTS data format support (BAM, VCF, ...)
SimpleCrypt for weak encryption
QR-Code-generator for QR code generation

ChangeLog

Change log is available on the releases page.

Citing

You can cite ngs-bits using Zenodo DOIs:

2025_12:
2025_09:

A list of all releases/DOIs can be found here.

Tools list

ngs-bits contains a lot of tools that are used for NGS-based diagnostics in our institute.

Some of the tools need the NGSD, a database that contains for example gene, transcript and exon data.
Installation instructions for the NGSD can be found here.

Main tools

SeqPurge - A highly-sensitive adapter trimmer for paired-end short-read data.
SampleSimilarity - Calculates pairwise sample similarity metrics from VCF/BAM files.
SampleIdentity - Tries to identify datasets that are from the same patient based on BAM/CRAM files of WGS/WES/lrGS/RNA sequencing.
SampleGender - Determines sample gender based on a BAM file.
SampleAncestry - Estimates the ancestry of a sample based on variants.
CnvHunter - CNV detection from targeted resequencing data using non-matched control samples.
RohHunter - ROH detection based on a variant list annotated with AF values.
UpdHunter - UPD detection from trio variant data.

QC tools

The default output format of the quality control tools is qcML, an XML-based format for -omics quality control. It consists of an XML schema, which defines the overall structure of the format, and an ontology, which defines the QC metrics that can be used.
You can open a qcML file in Firefox to show a human-readable version of the XML content. This does not work in other browsers, since no other browser supports XSLT embedded in an XML file.
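Because qcML is plain XML, the metrics can also be read with any XML parser. The following is a minimal sketch, not part of ngs-bits. It assumes that each metric is stored as a `qualityParameter` element with `name` and `value` attributes; the embedded sample document and the helper names `local_name` and `read_metrics` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened qcML-like document (all values are made up).
SAMPLE = """
<qcML xmlns="http://www.prime-xs.eu/ms/qcml">
  <runQuality ID="rq0001">
    <qualityParameter ID="qp0001" name="read count" value="1000"/>
    <qualityParameter ID="qp0002" name="Q20 read percentage" value="98.50"/>
  </runQuality>
</qcML>
"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_metrics(xml_text):
    """Return a dict mapping metric name to its value (kept as a string)."""
    root = ET.fromstring(xml_text)
    return {
        element.get("name"): element.get("value")
        for element in root.iter()
        if local_name(element.tag) == "qualityParameter"
    }


if __name__ == "__main__":
    for name, value in read_metrics(SAMPLE).items():
        print(f"{name}\t{value}")
```

Matching on the local element name keeps the sketch independent of the exact namespace URI used by a given file. For converting real qcML files to a table, QcToTsv is the supported route.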

ReadQC - Quality control tool for FASTQ files.
MappingQC - Quality control tool for a BAM file.
VariantQC - Quality control tool for a VCF file.
SomaticQC - Quality control tool for tumor-normal pairs (paper).
TrioMaternalContamination - Detects maternal contamination of a child using SNPs from parents.
TrioMendelianErrors - Determines the Mendelian error rate from a trio VCF file.
RnaQC - Calculates QC metrics for RNA samples.
QcToTsv - Converts qcML files to a TSV file.
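To illustrate what a Mendelian error rate measures, here is a minimal sketch for diploid, autosomal genotypes. It is not the implementation used by TrioMendelianErrors, whose filtering and counting rules are not described here; the function names and the genotype representation (tuples of allele indices) are assumptions made for this example.

```python
def is_mendelian_consistent(child, mother, father):
    """True if one child allele can come from the mother and the other from the father."""
    first, second = child
    return (first in mother and second in father) or (second in mother and first in father)


def mendelian_error_rate(trios):
    """Fraction of (child, mother, father) genotype triples that violate Mendelian inheritance."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(1 for child, mother, father in trios if not is_mendelian_consistent(*(child, mother, father)))
    return errors / len(trios)


if __name__ == "__main__":
    # Genotypes are allele indices as in the VCF GT field (0 = reference allele).
    sites = [
        ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
        ((1, 1), (0, 0), (0, 1)),  # error: mother cannot contribute allele 1
    ]
    print(mendelian_error_rate(sites))
```

Sex chromosomes, missing genotypes and de novo variants would need extra handling that this sketch leaves out.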

BAM tools

BamClipOverlap - (Soft-)Clips paired-end reads that overlap.
BamDownsample - Downsamples a BAM file to the given percentage of reads.
BamExtract - Extract reads from BAM/CRAM by read name.
BamFilter - Filters a BAM file by multiple criteria.
BamHighCoverage - Determines high-coverage regions in a BAM file.
BamInfo - Basic BAM information.
BamToFastq - Converts a coordinate-sorted BAM file to FASTQ files.
FastaFromBam - Download the reference genome FASTA file for a BAM/CRAM file.

BED tools

BedAdd - Merges regions from several BED files.
BedAnnotateFromBed - Annotates BED file regions with information from a second BED file.
BedAnnotateGC - Annotates the regions in a BED file with GC content.
BedAnnotateGenes - Annotates BED file regions with gene names (needs NGSD).
BedChunk - Splits regions in a BED file to chunks of a desired size.
BedCoverage - Annotates the regions in a BED file with the average coverage in one or several BAM files.
BedExtend - Extends the regions in a BED file by n bases.
BedGeneOverlap - Calculates how much of each overlapping gene is covered (needs NGSD).
BedHighCoverage - Detects high-coverage regions from a BAM file.
BedInfo - Prints summary information about a BED file.
BedIntersect - Intersects two BED files.
BedLiftOver - Lift-over of regions in a BED file to a different genome build.
BedLowCoverage - Calculates regions of low coverage based on an input BED and BAM file.
BedMerge - Merges overlapping regions in a BED file.
BedReadCount - Annotates the regions in a BED file with the read count from a BAM file.
BedShrink - Shrinks the regions in a BED file by n bases.
BedSort - Sorts the regions in a BED file.
BedSubtract - Subtracts one BED file from another BED file.
BedToFasta - Converts BED file to a FASTA file (based on the reference genome).
CnvReferenceCohort - Create a reference cohort for CNV calling from a list of coverage profiles.
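Several of the BED tools above boil down to simple interval arithmetic. As an illustration of the idea behind merging overlapping regions, here is a minimal sketch, not the BedMerge implementation. It assumes regions are given as `(chromosome, start, end)` tuples with 0-based, half-open coordinates as in the BED format; the `merge_adjacent` parameter is a hypothetical switch introduced for this example.

```python
def merge_regions(regions, merge_adjacent=False):
    """Merge overlapping regions; optionally also merge regions that only touch."""
    merged = []
    # Sorting by (chromosome, start, end) guarantees that candidates for
    # merging are always next to each other.
    for chrom, start, end in sorted(regions):
        if merged:
            last_chrom, last_start, last_end = merged[-1]
            touches = start <= last_end if merge_adjacent else start < last_end
            if chrom == last_chrom and touches:
                merged[-1] = (last_chrom, last_start, max(last_end, end))
                continue
        merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    regions = [
        ("chr1", 150, 300),
        ("chr1", 100, 200),
        ("chr1", 300, 400),
        ("chr2", 10, 20),
    ]
    for region in merge_regions(regions):
        print(*region, sep="\t")
```

Chromosomes are sorted as plain strings here, so `chr10` comes before `chr2`; a real tool would typically use a natural chromosome order.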

FASTQ tools

FastqAddBarcode - Adds sequences from separate FASTQ as barcodes to read IDs.
FastqConvert - Converts the quality scores from Illumina 1.5 offset to Sanger/Illumina 1.8 offset.
FastqConcat - Concatenates several FASTQ files into one output FASTQ file.
FastqDownsample - Downsamples paired-end FASTQ files.
FastqExtract - Extracts reads from a FASTQ file according to an ID list.
FastqExtractBarcode - Moves molecular barcodes of reads to a separate file.
FastqExtractUMI - Moves the unique molecular identifier from the read sequence to the read ID.
FastqFormat - Determines the quality score offset of a FASTQ file.
FastqList - Lists read IDs and base counts.
FastqMidParser - Counts the number of occurrences of each MID/index/barcode in a FASTQ file.
FastqToFasta - Converts FASTQ to FASTA format.
FastqTrim - Trims start/end bases from the reads in a FASTQ file.
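The quality score offsets mentioned for FastqConvert and FastqFormat refer to how Phred scores are encoded as ASCII characters: Illumina 1.5 adds 64 to the score, Sanger/Illumina 1.8 adds 33. A minimal sketch of the conversion of a single quality string follows. It is not the FastqConvert implementation, and the function name is made up for this example.

```python
ILLUMINA_15_OFFSET = 64  # Phred+64 encoding
SANGER_OFFSET = 33       # Phred+33 encoding (Sanger / Illumina 1.8)


def illumina15_to_sanger(quality):
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for char in quality:
        score = ord(char) - ILLUMINA_15_OFFSET
        if score < 0:
            raise ValueError(f"character {char!r} is not valid in a Phred+64 quality string")
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


if __name__ == "__main__":
    # '@' encodes Q0, 'B' encodes Q2 and 'h' encodes Q40 with offset 64.
    print(illumina15_to_sanger("@Bh"))
```

The check for negative scores catches the common mistake of converting a file that already uses offset 33.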

VCF tools (small variants)

VcfAdd - Merges several VCF files into one VCF by appending one to the other.
VcfAnnotateConsequence - Adds transcript-specific consequence predictions to a VCF file (similar to Ensembl VEP).
VcfAnnotateFromBed - Annotates the INFO column of a VCF with data from a BED file.
VcfAnnotateFromBigWig - Annotates the INFO column of a VCF with data from a bigWig file.
VcfAnnotateFromVcf - Annotates a VCF file with data from one or more source VCF files.
VcfAnnotateHexplorer - Annotates a VCF with Hexplorer and HBond scores.
VcfAnnotateMaxEntScan - Annotates a VCF file with MaxEntScan scores.
VcfBreakMulti - Breaks multi-allelic variants into several lines, making sure that allele-specific INFO/SAMPLE fields are still valid.
VcfCalculatePRS - Calculates the Polygenic Risk Score(s) for a sample.
VcfCheck - Checks a VCF file for errors.
VcfExtractSamples - Extract one or several samples from a VCF file. Can also be used to re-order sample columns.
VcfFilter - Filters a VCF based on the given criteria.
VcfLeftNormalize - Normalizes all variants and shifts indels to the left in a VCF file.
VcfReplaceSamples - Replaces sample identifiers in the VCF header.
VcfSort - Sorts variant lists according to chromosomal position.
VcfSplit - Splits a VCF into several chunks.
VcfStrip - Removes unwanted information from a VCF file.
VcfStreamSort - Sorts entries of a VCF file according to genomic position using a stream.
VcfSubtract - Subtracts the variants in a VCF from a second VCF.
VcfToBed - Converts a VCF file to a BED file.
VcfToBedpe - Converts a VCF file containing structural variants to BEDPE format.
VcfToTsv - Converts a VCF file to a tab-separated text file.
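The remark about allele-specific fields in the VcfBreakMulti description can be illustrated with a small sketch. It is not the VcfBreakMulti implementation: it works on an already parsed record, ignores the SAMPLE columns, and the function name, the record layout and the `per_allele_keys` parameter are assumptions made for this example.

```python
def break_multi(chrom, pos, ref, alts, info, per_allele_keys=("AF",)):
    """Split one multi-allelic record into one record per alternative allele.

    INFO entries listed in per_allele_keys hold one value per alternative
    allele and are split accordingly; all other entries are copied unchanged.
    """
    records = []
    for index, alt in enumerate(alts):
        new_info = {}
        for key, value in info.items():
            if key in per_allele_keys:
                new_info[key] = value[index]
            else:
                new_info[key] = value
        records.append((chrom, pos, ref, alt, new_info))
    return records


if __name__ == "__main__":
    # Made-up record: depth is shared, allele frequency is given per ALT allele.
    for record in break_multi("chr1", 100, "A", ["C", "T"], {"DP": 30, "AF": [0.2, 0.1]}):
        print(record)
```

In a real VCF file, the header declares which INFO fields carry one value per alternative allele, so the list of keys would be derived from the header instead of being passed in.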

BEDPE tools (structural variants)

BedpeAnnotateFromBed - Annotates a BEDPE file with information from a BED file.
BedpeFilter - Filters a BEDPE file by region.
BedpeGeneAnnotation - Annotates a BEDPE file with gene information from the NGSD (needs NGSD).
BedpeSort - Sort a BEDPE file according to chromosomal position.
BedpeToBed - Converts a BEDPE file into a BED file.
SvFilterAnnotations - Filter a structural variant list in BEDPE format based on variant annotations.

Gene handling tools

GenePrioritization - Performs gene prioritization based on a list of known disease genes and a PPI graph (see also GraphStringDb).
GraphStringDb - Creates a simple representation of the String-DB interaction graph.
GenesToApproved - Replaces gene symbols by approved symbols using the HGNC database (needs NGSD).
GenesToBed - Converts a text file with gene names to a BED file (needs NGSD).
GenesToTranscripts - Converts a text file with gene names to transcript names (needs NGSD).
NGSDExportGenes - Lists genes from NGSD (needs NGSD).
TranscriptsToBed - Converts a text file with transcript names to a BED file (needs NGSD).

Phenotype handling tools

PhenotypesToGenes - Converts a phenotype list to a list of matching genes (needs NGSD).
PhenotypeSubtree - Returns all sub-phenotypes of a given phenotype (needs NGSD).

Misc tools

FastqFromBam - Download the reference genome FASTA file for a BAM/CRAM file.
FastaChecksumUpdate - Fixes MD5 checksums in FASTA sequence headers.
FastaInfo - Basic info on a FASTA file containing DNA sequences.
FastaMask - Mask regions in a FASTA file with N bases.
HgvsToVcf - Transforms a TSV file with transcript ID and HGVS.c change into a VCF file (needs NGSD).
VariantRanking - Ranks small variants in the context of a patient's phenotype using an evidence-based model (needs NGSD).
