DNA Null Framework

A per-sequence empirical null-distribution framework for detecting composition-independent long-range correlations in DNA sequences. Combines six parallel correlation methods (DFA, power spectrum, wavelet multifractal, recurrence quantification, excess entropy, reverse-complement symmetry) with dual null calibration: standard mononucleotide shuffle and Altschul–Erickson dinucleotide-preserving Euler shuffle.
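The README does not define how the reverse-complement symmetry metric is computed. As a hypothetical illustration only, the sketch below scores single-strand k-mer parity in the spirit of Chargaff's second parity rule. It sums |n(w) - n(rc(w))| over k-mer/reverse-complement pairs and divides by the total count in those pairs, so 0 means perfect parity and 1 means complete asymmetry. The names `rc_asymmetry` and `revcomp` and the normalisation are assumptions, not the repository's code.

```python
from collections import Counter
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return s.translate(_COMP)[::-1]


def rc_asymmetry(seq: str, k: int = 3) -> float:
    """Aggregate k-mer reverse-complement asymmetry in [0, 1]; 0 = perfect parity.

    Hypothetical metric for illustration. k-mers containing non-ACGT symbols
    are ignored because only ACGT k-mers are enumerated.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    num = den = 0
    seen = set()
    for w in map("".join, product("ACGT", repeat=k)):
        rc = revcomp(w)
        if w in seen or w == rc:  # skip pairs already scored and RC-palindromes
            continue
        seen.update((w, rc))
        num += abs(counts[w] - counts[rc])
        den += counts[w] + counts[rc]
    return num / den if den else 0.0
```

For example, `rc_asymmetry("AAAA", k=1)` is 1.0, while any sequence concatenated with its own reverse complement scores 0.0 at k=1.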

Status: Research-grade, work in progress. The methods have been validated on π-derived null sequences and on classical references (intergenic DNA, telomeric repeats). Pilot applications to neurodevelopmental disorder gene panels are exploratory.

Preprint: Forthcoming on bioRxiv — DOI will appear here when posted.

What this does

Many DNA sequence statistics — Hurst exponents, 1/f^β spectra, multifractal widths, k-mer asymmetries — can flag "long-range correlations" that on closer inspection are driven by local nucleotide composition (GC bias, CpG depletion, codon usage). This framework separates the two:

Mono null shuffles preserve only single-base composition. Signals that survive are independent of base frequencies.
Di null uses an Altschul–Erickson Euler shuffle to preserve all 16 dinucleotide frequencies exactly. Signals that survive are independent of 2-mer composition, including CpG depletion.
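Both null modes can be sketched in standard-library Python. This is an illustrative reimplementation, not the code in `src/dna_unified_v11.py`, and the function names are hypothetical. `mono_shuffle` is a plain random permutation. `dinuc_shuffle` follows the Altschul–Erickson construction: pick a random last exit edge for every base except the final one so that these edges form a tree pointing at the final base, shuffle the remaining edges out of each base, then walk the resulting Eulerian path. Resampling the last-edge choice until it forms a tree is a simple, not maximally efficient, way to meet that constraint.

```python
import random
from collections import Counter, defaultdict


def mono_shuffle(seq: str, rng: random.Random) -> str:
    """Mono null: random permutation; preserves single-base counts only."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def _last_edges_form_tree(last_edge: dict, root: str) -> bool:
    """True if following last-exit edges from every vertex reaches `root` without a cycle."""
    for start in last_edge:
        v, seen = start, set()
        while v != root:
            if v in seen:
                return False
            seen.add(v)
            v = last_edge[v]
    return True


def dinuc_shuffle(seq: str, rng: random.Random) -> str:
    """Di null: Altschul-Erickson Euler shuffle.

    Preserves all dinucleotide counts exactly, and hence the first and last base.
    """
    if len(seq) <= 2:
        return seq
    succ = defaultdict(list)  # vertex -> multiset of outgoing edges (successor bases)
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)
    root = seq[-1]
    # 1. Random last exit edge per vertex (except the end vertex), resampled
    #    until these edges form a spanning tree directed at `root`.
    while True:
        last_edge = {v: rng.choice(out) for v, out in succ.items() if v != root}
        if _last_edges_form_tree(last_edge, root):
            break
    # 2. Shuffle the other edges out of each vertex; the chosen last edge goes last.
    order = {}
    for v, out in succ.items():
        rest = list(out)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        order[v] = rest + ([last_edge[v]] if v in last_edge else [])
    # 3. Walk the Eulerian path starting from the original first base.
    pos = defaultdict(int)
    out_seq, v = [seq[0]], seq[0]
    for _ in range(len(seq) - 1):
        v_next = order[v][pos[v]]
        pos[v] += 1
        out_seq.append(v_next)
        v = v_next
    return "".join(out_seq)
```

Because the CpG count is one of the 16 preserved dinucleotide counts, CpG depletion in the input carries over to every di-null shuffle.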

For each input sequence, the framework computes Z-scores against N=100 shuffled controls under each null mode. Saturated values (|Z|≥20) are flagged separately: with N=100 the empirical null standard deviation is estimated too imprecisely for differences among very large |Z| values to be meaningful.
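A minimal sketch of the per-metric Z-score and saturation flag, assuming the Z-score uses the mean and sample standard deviation (ddof=1) of the N shuffled values; the README does not specify the estimator. `empirical_z` and `Z_SATURATION` are hypothetical names; the threshold of 20 is the one stated above.

```python
import numpy as np

Z_SATURATION = 20.0  # |Z| at or above this is reported as saturated


def empirical_z(observed: float, null_values, z_sat: float = Z_SATURATION):
    """Z-score of `observed` against an empirical null (e.g. N=100 shuffles).

    Returns (z, saturated). Uses the sample standard deviation (ddof=1).
    A degenerate null (zero spread) yields 0 or +/-inf.
    """
    null = np.asarray(null_values, dtype=float)
    mu, sd = null.mean(), null.std(ddof=1)
    if sd == 0:
        z = 0.0 if observed == mu else float(np.copysign(np.inf, observed - mu))
    else:
        z = float((observed - mu) / sd)
    return z, abs(z) >= z_sat
```

In use, `null_values` would hold one metric evaluated on each shuffle from a single null mode, so every sequence receives one Z-score per metric per mode.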

Validation

The framework is validated on a sequence expected to carry no correlation structure: the first 10⁶ digits of π, with digits {0,1,2,3} mapped to {A,C,G,T}. Under both null modes, all twelve metrics return |Z|<3 on this sequence, confirming that the framework classifies a numerically generated, null-like sequence as null.
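The README does not say whether "digits" means the base-4 expansion of π or decimal digits filtered to 0-3. The sketch below assumes the base-4 expansion (fractional digits only) and computes it with Machin's formula in fixed-point integer arithmetic, so it needs only the standard library. The function names are hypothetical. This approach is fine for thousands of digits but slow for 10⁶, where an arbitrary-precision library such as mpmath (already in the requirements) is the practical choice.

```python
def _arctan_inv(x: int, one: int) -> int:
    """arctan(1/x) in fixed point (scaled by `one`) via its Taylor series."""
    power = one // x
    total = power
    k = 1
    while power:
        power //= x * x
        term = power // (2 * k + 1)
        total += -term if k % 2 else term  # alternating series
        k += 1
    return total


def pi_base4_dna(n: int, guard_bits: int = 64) -> str:
    """First n base-4 fractional digits of pi, mapped 0,1,2,3 -> A,C,G,T."""
    one = 1 << (2 * n + guard_bits)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_fixed = 16 * _arctan_inv(5, one) - 4 * _arctan_inv(239, one)
    bits = bin(pi_fixed >> guard_bits)[2:]  # "11" (integer part 3) + 2n fractional bits
    assert len(bits) == 2 * n + 2 and bits[:2] == "11"
    frac = bits[2:]
    # Each pair of bits is one base-4 digit.
    return "".join("ACGT"[int(frac[i:i + 2], 2)] for i in range(0, 2 * n, 2))
```

Since π = 3.021003331222... in base 4, `pi_base4_dna(12)` returns `"AGCAATTTCGGG"`.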

The classical long-range correlations reported by Peng et al. (1992) in intergenic DNA are reproduced under both null modes, anchoring the framework against established findings.
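Peng et al. (1992) analysed DNA as a purine-pyrimidine walk (+1 for A/G, -1 for C/T). A minimal first-order detrended fluctuation analysis (DFA) on that walk is sketched below as one way to estimate the scaling exponent α. Here α ≈ 0.5 indicates no long-range correlation and α > 0.5 indicates persistent correlations. The walk rule, the linear (order-1) detrending, and the function names are assumptions; the README does not state how `dna_unified_v8.py` implements its DFA method.

```python
import numpy as np


def purine_pyrimidine_walk_steps(seq: str) -> np.ndarray:
    """+1 for purines (A, G), -1 for pyrimidines (C, T); other symbols dropped."""
    s = seq.upper()
    return np.array([1.0 if b in "AG" else -1.0 for b in s if b in "ACGT"])


def dfa1(steps: np.ndarray, scales) -> tuple[np.ndarray, float]:
    """First-order DFA: fluctuation F(s) per window size s and scaling exponent alpha."""
    profile = np.cumsum(steps - steps.mean())  # integrated walk
    fluct = []
    for s in scales:
        n_win = len(profile) // s
        windows = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s)
        slope, intercept = np.polyfit(t, windows.T, 1)  # one linear fit per window
        trend = slope[:, None] * t + intercept[:, None]
        fluct.append(np.sqrt(np.mean((windows - trend) ** 2)))
    fluct = np.asarray(fluct)
    # alpha is the slope of log F(s) against log s
    alpha = np.polyfit(np.log(scales), np.log(fluct), 1)[0]
    return fluct, float(alpha)
```

For an uncorrelated random sequence of 2¹⁶ bases with window sizes from 16 to 1024, the fitted α comes out close to 0.5.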

Quick start

Requires Python 3.9+, numpy, mpmath, matplotlib.

git clone https://github.com/MarcusFFFFFF/dna-null-framework.git
cd dna-null-framework
pip install -r requirements.txt

# Download reference sequences from NCBI RefSeq
bash scripts/download_sequences.sh

# Run analysis with both null modes
python3 src/dna_unified_v11.py 100 --null-mode mono
python3 src/dna_unified_v11.py 100 --null-mode di

# Diagnostic pass: repeat-screening, FDR, power analysis
python3 src/dna_unified_v12.py

# Generate publication figures
python3 scripts/generate_figures_v2.py

Outputs are written to ~/dna_analysis/ by default. See docs/methodology.md for details.
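The diagnostic pass reports FDR, but the README does not name the correction procedure. Assuming the Benjamini–Hochberg step-up procedure, applied for example to empirical p-values from the shuffle nulls, a minimal sketch is shown below. The function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals, q: float = 0.05) -> np.ndarray:
    """Boolean mask of hypotheses rejected at FDR level q (Benjamini-Hochberg step-up)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # q*i/m for ranks i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank with p_(i) <= q*i/m
        reject[order[: k + 1]] = True   # step-up: reject all hypotheses up to that rank
    return reject
```

If empirical p-values are estimated as (r + 1)/(N + 1), then with N=100 shuffles no p-value can fall below 1/101 ≈ 0.0099, which limits how many tests can survive correction.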

What's in the repository

src/
  dna_unified_v8.py       Core analysis engine (six methods)
  dna_unified_v11.py      Empirical null wrapper, dual-mode
  dna_unified_v12.py      Diagnostics: repeats, FDR, power, length
scripts/
  download_sequences.sh   NCBI RefSeq fetch
  generate_figures_v2.py  Publication figures
data/
  accessions.txt          NCBI accession list (sequences fetched at runtime)
results/                  JSON outputs from runs
figures/                  PNG figures
docs/
  methodology.md          Detailed methods
  preprint_outline.md     Manuscript skeleton

Honest limitations

Pilot panel sizes (n=6 per group) yield only 40–50% power to detect medium effect sizes. Findings are suggestive, not confirmatory.
TBP in the housekeeping panel contains a polyQ (CAG/CAA) repeat that confounds RQA-based comparisons. Repeat-screening (dna_unified_v12.py) identifies this and similar issues.
Length-matching across functional groups is imperfect. Several metrics have known length-dependent finite-size biases.
mRNA-only analysis omits intronic and regulatory regions where structural signals may concentrate.

These limitations are flagged in the preprint Discussion. They are not failures of the framework — they are properties of any pilot study.
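The README does not describe how `dna_unified_v12.py` screens for repeats. As a hypothetical illustration of a check that would flag the TBP polyQ tract, the sketch below finds the longest tandem run of CAG/CAA codons in any reading frame. The function names and the 10-codon threshold are assumptions chosen for illustration.

```python
def longest_codon_run(seq: str, codons=frozenset({"CAG", "CAA"})) -> tuple[int, int]:
    """Longest tandem run of the given codons in any of the three frames.

    Returns (start, n_codons); n_codons == 0 if no codon from the set occurs.
    """
    seq = seq.upper()
    best_start, best_len = 0, 0
    for frame in range(3):
        run_start, run_len = frame, 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in codons:
                if run_len == 0:
                    run_start = i
                run_len += 1
                if run_len > best_len:
                    best_start, best_len = run_start, run_len
            else:
                run_len = 0  # run broken by a non-matching codon
    return best_start, best_len


def flag_polyq(seq: str, min_codons: int = 10) -> bool:
    """Hypothetical screen: flag sequences with >= min_codons tandem CAG/CAA codons."""
    return longest_codon_run(seq)[1] >= min_codons
```

A flagged sequence could then be excluded from, or analysed separately in, RQA-based comparisons.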

How to cite

If you use this framework, please cite the archived release via Zenodo:

Frenell, M. (2026). DNA Null Framework (v0.1.0). Zenodo.
https://doi.org/10.5281/zenodo.20283245

BibTeX:

@software{frenell_dna_null_2026,
  author    = {Frenell, Marcus},
  title     = {{DNA Null Framework}},
  month     = may,
  year      = 2026,
  publisher = {Zenodo},
  version   = {v0.1.0},
  doi       = {10.5281/zenodo.20283245},
  url       = {https://doi.org/10.5281/zenodo.20283245}
}

A preprint citation will be added once the bioRxiv version is posted.

License

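MIT License; see the LICENSE file. You are free to use, modify, and distribute the code, including for commercial purposes. Citation is requested but not legally required; the MIT License does require that its copyright and permission notice be kept in all copies or substantial portions of the software.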

Contributing

Issues and pull requests welcome. The framework is research-grade and there are many opportunities to extend:

Higher-order shuffle (Markov-order-k preserving) for trinucleotide and higher null models
Length-matched control generation
Larger reference panels for power
Integration with established bioinformatics packages
Performance optimization for genome-scale analysis

If you would like to collaborate on a specific biological application, open an issue describing the target system and we can discuss.

Contact

Marcus Frenell — marcusfrenell@gmail.com

Independent research, Stockholm, Sweden.
