High-performance FASTQ/FASTA quality analysis — written in Rust
No Java. No Python. No internet required.
Real Illumina data — SRR38033288 (43.5M reads · 6.08 Gbp · ~14 GB on disk · 4 threads · cold cache · WSL2)
| Tool | Time | What it does |
|---|---|---|
| BioFastq-A | 33s | overrep seqs · k-mers · dup · per-tile · N50/N90 · HTML report |
| fastp | 58s | adapter trimming · QC · k-mers |
| FastQC | 168s | similar analysis depth to BioFastq-A |
# Build (one-time, ~20s)
cargo build --release
# Interactive TUI — live dashboard while processing
./target/release/biofastq-a sample.fastq
# Headless — for scripts and CI
./target/release/biofastq-a sample.fastq --headless --output-dir ./reports
# Trim adapters + analyse
./target/release/biofastq-a reads.fastq --trim --output-dir ./qc
# Open the report
xdg-open ./reports/sample_report.html # Linux
open ./reports/sample_report.html # macOS
explorer.exe ./reports/sample_report.html # WSL
Build from source (recommended)
Requires Rust ≥ 1.80.
git clone https://github.com/DilaDeniz/BioFastq-a.git
cd BioFastq-a
cargo build --release
# binary → target/release/biofastq-a
Enable native CPU optimisations (AVX2/SSE4, recommended):
mkdir -p .cargo
echo '[build]' > .cargo/config.toml
echo 'rustflags = ["-C", "target-cpu=native"]' >> .cargo/config.toml
cargo build --release
Install system-wide:
bash install.sh # → /usr/local/bin (may need sudo)
bash install.sh ~/bin # → ~/bin (no sudo)
Docker
docker build -t biofastq-a .
# Run (mount current directory as /data)
docker run --rm -v "$PWD":/data biofastq-a sample.fastq --headless
docker run --rm -v "$PWD":/data biofastq-a *.fastq.gz --trim --output-dir /data/qc
Homebrew (macOS / Linux)
brew tap DilaDeniz/biofastq-a
brew install biofastq-a
biofastq-a [OPTIONS] <file> [<file2> ...]
OPTIONS:
--headless No TUI — for scripts and CI
--output-dir <dir> Where to write reports (default: current directory)
--trim Trim adapters; write <stem>_trimmed.fastq.gz
--min-length <N> Drop trimmed reads shorter than N bp (default: 20)
--adapter <seq> Additional adapter sequence to screen/trim (repeatable)
--quality-trim <Q> Trim 3' bases with Phred quality below Q (default: off)
--threads <N> Number of CPU threads (default: all cores)
--strict Abort on first malformed record (default: skip & warn)
--paired-end <R2> Paired-end mode: provide R2 file path
--version, -V Print version
--help, -h Show help
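As an illustration of what `--quality-trim <Q>` describes, here is a minimal sketch of 3' quality trimming. It assumes Phred+33 encoded qualities and the simplest possible rule (drop trailing bases until one meets the threshold); the function names are hypothetical and the tool itself may use a different rule, such as a sliding window.

```rust
/// Number of bases to keep after dropping low-quality bases from the 3' end.
/// `qual` is the raw FASTQ quality line (assumed Phred+33); `min_q` is the threshold Q.
fn quality_trim_len(qual: &[u8], min_q: u8) -> usize {
    let mut keep = qual.len();
    // Walk back from the 3' end while the last kept base is below the threshold.
    while keep > 0 && qual[keep - 1].saturating_sub(33) < min_q {
        keep -= 1;
    }
    keep
}

/// Returns the trimmed sequence and quality as sub-slices (no copying).
fn quality_trim<'a>(seq: &'a [u8], qual: &'a [u8], min_q: u8) -> (&'a [u8], &'a [u8]) {
    let keep = quality_trim_len(qual, min_q).min(seq.len());
    (&seq[..keep], &qual[..keep])
}
```

With `min_q = 20`, a read whose quality line is `IIII##` keeps its first four bases. A read trimmed below the `--min-length` threshold would then be dropped in a separate step.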
QC Modules
| Module | Details |
|---|---|
| Per-base quality | Phred per position up to 500 bp · Q20/Q28/Q30 zone shading |
| Per-sequence quality | Read-level mean Phred distribution |
| Base composition | A/C/G/T/N % per position |
| GC content | Overall + FastQC-style pass/warn/fail |
| N content | N % per position |
| Sequence length | Distribution chart · N50 · N90 |
| Duplication | Fingerprint-hashes first 200k reads · deterministic |
| Overrepresented seqs | Top sequences by frequency · adapter source detection |
| Adapter content | 7 built-in sequences + custom via --adapter |
| Per-tile quality | Illumina CASAVA 1.8+ tile IDs · bar chart per tile |
| K-mer analysis | Parallel 4-mer counting · top enriched k-mers |
Each module shows a FastQC-style traffic light (Pass / Warn / Fail).
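For readers unfamiliar with the N50 / N90 figures in the sequence length module, the sketch below shows the standard definition: sort read lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all bases. The function name `nx` is hypothetical and this is not necessarily how BioFastq-A computes it internally.

```rust
/// Returns the Nx length: the length of the read at which the running sum of
/// lengths, taken longest first, first reaches x percent of all bases.
/// Returns None for empty input or if x percent is never reached (x > 100).
fn nx(lengths: &[u64], x: u64) -> Option<u64> {
    let total: u64 = lengths.iter().sum();
    if total == 0 {
        return None;
    }
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    let mut acc: u64 = 0;
    for len in sorted {
        acc += len;
        // Integer form of acc / total >= x / 100; u128 avoids overflow on large totals.
        if (acc as u128) * 100 >= (total as u128) * (x as u128) {
            return Some(len);
        }
    }
    None
}
```

For lengths 2 through 10 (54 bases in total), `nx(&lengths, 50)` gives 8 because 10 + 9 + 8 = 27 is exactly half of the bases, and `nx(&lengths, 90)` gives 4.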
Output
<stem>_report.html — self-contained HTML report (offline, no CDN)
<stem>_report.json — machine-readable JSON for pipelines
<stem>_trimmed.fastq.gz — trimmed reads (only with --trim)
For multiple input files: one report per file + batch_report.html summary.
Adapters detected
| Name | Sequence (prefix matched) |
|---|---|
| TruSeq Read 1 | AGATCGGAAGAGCACACGTCT |
| TruSeq Read 2 | AGATCGGAAGAGCGTCGTGTA |
| Nextera Read 1/2 | CTGTCTCTTATACACATCT |
| Small RNA 3′ | TGGAATTCTCGGGTGCCAAGG |
| Poly-A | AAAAAAAAAAAAAAAAAAAAAA |
| Poly-T | TTTTTTTTTTTTTTTTTTTTTT |
Add custom adapters with --adapter SEQUENCE (repeatable).
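The "prefix matched" note in the table can be read as follows: an adapter counts as present when the read contains the whole adapter, or when the read ends with a sufficiently long prefix of it. The sketch below implements that reading with exact matching only. The function name and the `min_overlap` parameter are assumptions for illustration; the tool's actual matching rule (for example, mismatch tolerance) is not documented here.

```rust
/// Returns the position in `read` where `adapter` begins, if any.
/// A hit is either the full adapter inside the read, or a prefix of the adapter
/// running to the 3' end of the read that is at least `min_overlap` bases long.
/// Matching is exact; no mismatches are tolerated in this sketch.
fn find_adapter(read: &[u8], adapter: &[u8], min_overlap: usize) -> Option<usize> {
    if adapter.is_empty() || min_overlap == 0 {
        return None;
    }
    for start in 0..read.len() {
        let tail = &read[start..];
        let n = tail.len().min(adapter.len());
        if n < min_overlap {
            break; // the remaining tail is too short to count as a match
        }
        if tail[..n] == adapter[..n] {
            return Some(start);
        }
    }
    None
}
```

Trimming would then keep `read[..pos]` for a returned `Some(pos)`. For instance, with the TruSeq Read 1 sequence from the table and `min_overlap = 8`, the read `ACGTACGTAGATCGGAAGAGC` yields position 8.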
- mmap zero-copy reader — sequence data never copied to heap
- RecordRange descriptors — byte offsets into shared mmap, no allocations in hot path
- crossbeam I/O pipeline — reader thread and rayon workers run in parallel
- BASE_LUT — 256-entry lookup table replaces 5-way branch per base
- AVX2 quality loops — phred sum, Q20, Q30 as separate vectorised passes (32 bytes/cycle)
- K-mer sampling — capped at first 200k reads, not the full file
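To make the BASE_LUT item concrete, here is a minimal sketch of a 256-entry lookup table that maps each input byte to a base class, so counting needs one table load per base instead of a chain of comparisons. The index layout (A=0, C=1, G=2, T=3, everything else including N = 4), the lowercase handling, and the helper names are assumptions for this example, not the project's actual code.

```rust
/// Builds the table at compile time. Assumed layout: A=0, C=1, G=2, T=3,
/// and every other byte (including N) maps to 4.
const fn build_base_lut() -> [u8; 256] {
    let mut lut = [4u8; 256];
    lut[b'A' as usize] = 0;
    lut[b'a' as usize] = 0;
    lut[b'C' as usize] = 1;
    lut[b'c' as usize] = 1;
    lut[b'G' as usize] = 2;
    lut[b'g' as usize] = 2;
    lut[b'T' as usize] = 3;
    lut[b't' as usize] = 3;
    lut
}

const BASE_LUT: [u8; 256] = build_base_lut();

/// Counts bases per class with a single table lookup per byte.
fn count_bases(seq: &[u8]) -> [u64; 5] {
    let mut counts = [0u64; 5];
    for &b in seq {
        counts[BASE_LUT[b as usize] as usize] += 1;
    }
    counts
}
```

Because the table is indexed by a `u8`, every lookup is in bounds by construction, and the inner loop has no data-dependent branches.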
vs FastQC
| | BioFastq-A | FastQC |
|---|---|---|
| Language | Rust | Java |
| Speed (real data) | 33s / 6.08 Gbp | 168s / 6.08 Gbp |
| Interactive TUI | Yes | No |
| Adapter trimming | Yes | No |
| N50 / N90 | Yes | No |
| Long-read support | Yes | Limited |
| Offline / no deps | Yes | Requires JVM |
| HTML report | Yes | Yes |
| Per-tile quality | Yes | Yes |
| Duplication estimate | Yes | Yes |
vs fastp
| | BioFastq-A | fastp |
|---|---|---|
| Language | Rust | C++ |
| Speed (real data) | 33s / 6.08 Gbp | 58s / 6.08 Gbp |
| Interactive TUI | Yes | No |
| N50 / N90 | Yes | No |
| Per-tile quality | Yes | No |
| Overrepresented seqs | Yes | No |
| FastQC traffic lights | Yes | No |
| Multi-file batch | Yes | No |
| Paired-end support | Yes | Yes (default) |
| Auto adapter detection | No | Yes |
| Poly-G tail trim | No | Yes |
Snakemake
rule fastq_qc:
    input: "data/{sample}.fastq.gz"
    output:
        html = "qc/{sample}_report.html",
        json = "qc/{sample}_report.json"
    shell:
        "biofastq-a {input} --headless --output-dir qc/"
Nextflow
process BIOFASTQA {
    input: path fastq
    output: path "*_report.{html,json}"
    script:
    """
    biofastq-a ${fastq} --headless --output-dir .
    """
}
If you find this project useful, consider sending a small tip. Due to age restrictions I'm unable to use traditional payment platforms, so crypto is the only way I can receive support. Thank you!
| Network | Address |
|---|---|
| Solana (SOL) | AY5SwVxbvTHL16SUGj6kJBqMk4USniZmbqdXxH8xVrTa |
| Ethereum (ETH) | 0x5176d005DD096aFa145B3ffff308b72ed76f1554 |
MIT