BioFastq-A

High-performance FASTQ/FASTA quality analysis — written in Rust

No Java. No Python. No internet required.

Benchmark

Real Illumina data — SRR38033288 (43.5M reads · 6.08 Gbp · ~14 GB on disk · 4 threads · cold cache · WSL2)

Tool	Time	What it does
BioFastq-A	33s	overrep seqs · k-mers · dup · per-tile · N50/N90 · HTML report
fastp	58s	adapter trimming · QC · k-mers
FastQC	168s	similar analysis depth to BioFastq-A

Quick start

# Build (one-time, ~20s)
cargo build --release

# Interactive TUI — live dashboard while processing
./target/release/biofastq-a sample.fastq

# Headless — for scripts and CI
./target/release/biofastq-a sample.fastq --headless --output-dir ./reports

# Trim adapters + analyse
./target/release/biofastq-a reads.fastq --trim --output-dir ./qc

# Open the report
xdg-open ./reports/sample_report.html   # Linux
open ./reports/sample_report.html       # macOS
explorer.exe ./reports/sample_report.html  # WSL

Installation

Build from source (recommended)

Requires Rust ≥ 1.80.

git clone https://github.com/DilaDeniz/BioFastq-a.git
cd BioFastq-a
cargo build --release
# binary → target/release/biofastq-a

Enable native CPU optimisations (AVX2/SSE4 — recommended):

mkdir -p .cargo
echo '[build]' > .cargo/config.toml
echo 'rustflags = ["-C", "target-cpu=native"]' >> .cargo/config.toml
cargo build --release

Install system-wide:

bash install.sh          # → /usr/local/bin (may need sudo)
bash install.sh ~/bin    # → ~/bin (no sudo)

Docker

docker build -t biofastq-a .

# Run (mount current directory as /data)
docker run --rm -v "$PWD":/data biofastq-a sample.fastq --headless
docker run --rm -v "$PWD":/data biofastq-a *.fastq.gz --trim --output-dir /data/qc

Homebrew (macOS / Linux)

brew tap DilaDeniz/biofastq-a
brew install biofastq-a

Usage

biofastq-a [OPTIONS] <file> [<file2> ...]

OPTIONS:
  --headless             No TUI — for scripts and CI
  --output-dir <dir>     Where to write reports (default: current directory)
  --trim                 Trim adapters; write <stem>_trimmed.fastq.gz
  --min-length <N>       Drop trimmed reads shorter than N bp (default: 20)
  --adapter <seq>        Additional adapter sequence to screen/trim (repeatable)
  --quality-trim <Q>     Trim 3' bases with Phred quality below Q (default: off)
  --threads <N>          Number of CPU threads (default: all cores)
  --strict               Abort on first malformed record (default: skip & warn)
  --paired-end <R2>      Paired-end mode: provide R2 file path
  --version, -V          Print version
  --help, -h             Show help
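
For readers who want to see what `--quality-trim` and `--min-length` mean in practice, here is a minimal sketch. It assumes Phred+33 encoding and a plain "strip trailing bases below Q" rule. The function names `quality_trim_3prime` and `trim_read` are hypothetical and are not taken from the BioFastq-A source, which may use a different algorithm (for example a sliding window).

```rust
/// Phred+33 offset used by modern Illumina FASTQ files (assumed here).
const PHRED_OFFSET: u8 = 33;

/// Returns how many leading bases to keep after removing trailing bases
/// whose Phred score is below `min_q`. Hypothetical helper for illustration.
fn quality_trim_3prime(qual: &[u8], min_q: u8) -> usize {
    let mut keep = qual.len();
    // Walk in from the 3' end until a base meets the threshold.
    while keep > 0 && qual[keep - 1].saturating_sub(PHRED_OFFSET) < min_q {
        keep -= 1;
    }
    keep
}

/// Applies the 3' quality trim, then the minimum-length filter.
/// Returns `None` when the trimmed read is shorter than `min_len`.
fn trim_read<'a>(
    seq: &'a [u8],
    qual: &'a [u8],
    min_q: u8,
    min_len: usize,
) -> Option<(&'a [u8], &'a [u8])> {
    let keep = quality_trim_3prime(qual, min_q).min(seq.len());
    if keep < min_len {
        return None;
    }
    Some((&seq[..keep], &qual[..keep]))
}

fn main() {
    let seq = b"ACGTACGTAC";
    let qual = b"IIIIIIII#!"; // 'I' = Q40, '#' = Q2, '!' = Q0
    match trim_read(seq, qual, 20, 5) {
        Some((s, _)) => println!("kept {} bp: {}", s.len(), String::from_utf8_lossy(s)),
        None => println!("read dropped"),
    }
}
```

With the sample read above, the two low-quality bases at the 3' end are removed and the remaining 8 bp pass the length filter.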

Features

QC Modules

Module	Details
Per-base quality	Phred per position up to 500 bp · Q20/Q28/Q30 zone shading
Per-sequence quality	Read-level mean Phred distribution
Base composition	A/C/G/T/N % per position
GC content	Overall + FastQC-style pass/warn/fail
N content	N % per position
Sequence length	Distribution chart · N50 · N90
Duplication	Fingerprint-hashes first 200k reads · deterministic
Overrepresented seqs	Top sequences by frequency · adapter source detection
Adapter content	7 built-in sequences + custom via `--adapter`
Per-tile quality	Illumina CASAVA 1.8+ tile IDs · bar chart per tile
K-mer analysis	Parallel 4-mer counting · top enriched k-mers

Each module shows a FastQC-style traffic light (Pass / Warn / Fail).
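
The N50 and N90 values in the sequence length module follow the usual definition: sort read lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all bases. A minimal sketch of that definition follows. The function name `nx` is hypothetical, and BioFastq-A itself may compute the value from a length histogram rather than a full sort.

```rust
/// Returns the Nx length: walking reads from longest to shortest, the length
/// of the read at which the running base total first reaches x% of all bases.
/// Hypothetical helper for illustration.
fn nx(lengths: &[u64], x: u64) -> Option<u64> {
    if lengths.is_empty() {
        return None;
    }
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    let total: u64 = sorted.iter().sum();
    let mut running = 0u64;
    for len in sorted {
        running += len;
        // Integer form of: running / total >= x / 100
        if running * 100 >= total * x {
            return Some(len);
        }
    }
    None
}

fn main() {
    let lengths = [2, 3, 4, 5, 6, 10]; // 30 bases in total
    println!("N50 = {:?}", nx(&lengths, 50));
    println!("N90 = {:?}", nx(&lengths, 90));
}
```

For the six example reads, the running total reaches 15 bases at the 6 bp read and 27 bases at the 3 bp read, so N50 is 6 and N90 is 3.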

Output

<stem>_report.html      — self-contained HTML report (offline, no CDN)
<stem>_report.json      — machine-readable JSON for pipelines
<stem>_trimmed.fastq.gz — trimmed reads (only with --trim)

With multiple input files, each file gets its own report, plus a batch_report.html summary.

Adapters detected

Name	Sequence (prefix matched)
TruSeq Read 1	`AGATCGGAAGAGCACACGTCT`
TruSeq Read 2	`AGATCGGAAGAGCGTCGTGTA`
Nextera Read 1/2	`CTGTCTCTTATACACATCT`
Small RNA 3′	`TGGAATTCTCGGGTGCCAAGG`
Poly-A	`AAAAAAAAAAAAAAAAAAAAAA`
Poly-T	`TTTTTTTTTTTTTTTTTTTTTT`

Add custom adapters with --adapter SEQUENCE (repeatable).
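
As a rough illustration of prefix matching, the sketch below looks for an adapter inside a read and also accepts a partial adapter that runs off the 3' end. It assumes exact matching with a minimum overlap. The function `find_adapter` and its `min_overlap` parameter are hypothetical, and the real matcher may tolerate mismatches or use different thresholds.

```rust
/// Finds the position where `adapter` starts in `read`. A partial adapter
/// that runs off the 3' end is accepted if at least `min_overlap` bases match.
/// Exact matching only. Hypothetical helper for illustration.
fn find_adapter(read: &[u8], adapter: &[u8], min_overlap: usize) -> Option<usize> {
    if adapter.is_empty() || min_overlap == 0 {
        return None;
    }
    for start in 0..read.len() {
        // Compare as much of the adapter as fits before the read ends.
        let overlap = adapter.len().min(read.len() - start);
        if overlap < min_overlap {
            break; // later positions can only overlap less
        }
        if read[start..start + overlap] == adapter[..overlap] {
            return Some(start);
        }
    }
    None
}

fn main() {
    let adapter = b"AGATCGGAAGAGCACACGTCT"; // TruSeq Read 1, from the table above
    let read = b"ACGTTGCAACGTAGATCGGAAGAG"; // 12 bp insert + first 12 adapter bases
    match find_adapter(read, adapter, 8) {
        Some(pos) => println!("adapter at {}, keep {}", pos, String::from_utf8_lossy(&read[..pos])),
        None => println!("no adapter found"),
    }
}
```

Here the adapter prefix is found at position 12, so trimming would keep the first 12 bases of the read.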

How it's fast

mmap zero-copy reader — sequence data never copied to heap
RecordRange descriptors — byte offsets into shared mmap, no allocations in hot path
crossbeam I/O pipeline — reader thread and rayon workers run in parallel
BASE_LUT — 256-entry lookup table replaces 5-way branch per base
AVX2 quality loops — phred sum, Q20, Q30 as separate vectorised passes (32 bytes/cycle)
K-mer sampling — capped at first 200k reads, not the full file
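
The lookup-table idea from the list above can be sketched as follows. Only the name `BASE_LUT` comes from this README. The slot layout (A, C, G, T, then one shared slot for N and any other byte) and the helper `count_bases` are assumptions made for illustration.

```rust
/// Builds a 256-entry table mapping every byte to a counter slot:
/// A=0, C=1, G=2, T=3, anything else (including N) = 4.
/// The layout is an assumption, not BioFastq-A's actual table.
const fn build_base_lut() -> [u8; 256] {
    let mut lut = [4u8; 256];
    lut[b'A' as usize] = 0;
    lut[b'a' as usize] = 0;
    lut[b'C' as usize] = 1;
    lut[b'c' as usize] = 1;
    lut[b'G' as usize] = 2;
    lut[b'g' as usize] = 2;
    lut[b'T' as usize] = 3;
    lut[b't' as usize] = 3;
    lut
}

/// Computed at compile time, so the hot loop only does an indexed load.
static BASE_LUT: [u8; 256] = build_base_lut();

/// Counts bases with one table lookup per byte instead of a 5-way branch.
fn count_bases(seq: &[u8]) -> [u64; 5] {
    let mut counts = [0u64; 5];
    for &b in seq {
        counts[BASE_LUT[b as usize] as usize] += 1;
    }
    counts
}

fn main() {
    let c = count_bases(b"ACGTNNAC");
    println!("A={} C={} G={} T={} N/other={}", c[0], c[1], c[2], c[3], c[4]);
}
```

Because every byte value has an entry, the loop body has no data-dependent branches, which avoids branch mispredictions on mixed-base input.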

Comparison

vs FastQC

Feature	BioFastq-A	FastQC
Language	Rust	Java
Speed (real data)	33s / 6.08 Gbp	168s / 6.08 Gbp
Interactive TUI	Yes	No
Adapter trimming	Yes	No
N50 / N90	Yes	No
Long-read support	Yes	Limited
Offline / no deps	Yes	Requires JVM
HTML report	Yes	Yes
Per-tile quality	Yes	Yes
Duplication estimate	Yes	Yes

vs fastp

Feature	BioFastq-A	fastp
Language	Rust	C++
Speed (real data)	33s / 6.08 Gbp	58s / 6.08 Gbp
Interactive TUI	Yes	No
N50 / N90	Yes	No
Per-tile quality	Yes	No
Overrepresented seqs	Yes	No
FastQC traffic lights	Yes	No
Multi-file batch	Yes	No
Paired-end support	Yes	Yes (default)
Auto adapter detection	No	Yes
Poly-G tail trim	No	Yes

Pipeline integration

Snakemake

rule fastq_qc:
    input:  "data/{sample}.fastq.gz"
    output:
        html = "qc/{sample}_report.html",
        json = "qc/{sample}_report.json"
    shell:
        "biofastq-a {input} --headless --output-dir qc/"

Nextflow

process BIOFASTQA {
    input:  path fastq
    output: path "*_report.{html,json}"
    script:
    """
    biofastq-a ${fastq} --headless --output-dir .
    """
}

Support

If you find this project useful, consider sending a small tip. Due to age restrictions I'm unable to use traditional payment platforms — crypto is the only way I can receive support. Thank you!

Network	Address
Solana (SOL)	`AY5SwVxbvTHL16SUGj6kJBqMk4USniZmbqdXxH8xVrTa`
Ethereum (ETH)	`0x5176d005DD096aFa145B3ffff308b72ed76f1554`

License

MIT
