Skip to content

Devanik21/AnTiEnTRopY

Repository files navigation

AnTiEnTRopY

image

Epigenetic Entropy Reversal & Biological Age Intelligence Platform

Python Streamlit License scikit-learn Status NumPy SciPy Plotly Pandas Stars Domain Monte Carlo Made with


"Aging is not an inevitable consequence of thermodynamics — it is a drift in information storage."

— Inspired by David A. Sinclair's Information Theory of Aging


Abstract

AnTiEnTRopY is an open-source, research-grade computational platform for the quantitative analysis of biological aging through the lens of epigenetic entropy. The system integrates five mathematically distinct engines — an ElasticNetCV biological clock, a site-wise Shannon entropy decomposition module, a Harmonic Resonance Field (HRF) wave-interference classifier, a partial reprogramming reversal simulator, and a Monte Carlo escape velocity engine — into a unified Streamlit interface.

The core hypothesis formalized here is that biological aging corresponds, at the epigenomic level, to a stochastic drift of CpG methylation beta values toward maximum informational disorder, i.e. toward $\beta \to 0.5$. Young, ordered epigenomes occupy low-entropy attractors in CpG state space; senescent epigenomes exhibit decoherence analogous to thermal noise in a physical system. AnTiEnTRopY quantifies this drift, classifies age states via resonance-field dynamics, and models the conditions under which targeted reprogramming interventions could exceed the biological aging rate — the epigenetic escape velocity condition.

This work draws on the epigenetic clock literature (Horvath 2013; Hannum 2013; Levine et al. 2018), partial reprogramming biology (Ocampo et al. 2016; Lu et al. 2020; Yang et al. 2023), longevity escape velocity theory (de Grey 2004), and novel spectral analysis of the methylome inspired by the HRF wave-interference framework introduced in Debanik Debnath (2025).


Table of Contents


Scientific Background

DNA Methylation and the Epigenetic Clock

DNA methylation at CpG dinucleotides — the addition of a methyl group to the fifth carbon of cytosine when followed by guanine — is among the most extensively studied epigenetic modifications in eukaryotes. The human genome contains approximately 28 million CpG sites, of which roughly 2–4% are assayed in Illumina 450K and EPIC (850K) array platforms widely used in aging research.

In 2013, Steve Horvath (UCLA) demonstrated that a sparse linear combination of DNA methylation beta values at just 353 CpG sites could predict chronological age across 51 tissue types and 82 datasets with a median absolute error of approximately 3.6 years — a finding that fundamentally reshaped our understanding of epigenetic aging. This "pan-tissue epigenetic clock" (Horvath, 2013) uses elastic net regression to select and weight CpG sites, many of which are located in CpG islands near developmental gene promoters. Its construction on 8,000 samples spanning the entire human lifespan — including embryonic stem cells (predicted age ≈ 0) and centenarian tissue — established that DNA methylation age is a genuine biological signal, not merely a statistical artifact.

The deviation of epigenetic age from chronological age — epigenetic age acceleration — has since been linked to all-cause mortality, cognitive decline, cancer risk, and inflammatory burden, independent of known clinical risk factors (Chen et al. 2016; Marioni et al. 2015).

The Entropy Hypothesis of Aging

The hypothesis formalized in AnTiEnTRopY draws from several converging theoretical threads:

Waddington's Epigenetic Landscape (1957) described development as a ball rolling downhill into attractor basins, with cell identity corresponding to stable low-energy valleys. Aging, in this framework, can be understood as a slow erosion of those valleys — a drift away from the developmentally defined attractors into states of higher informational disorder.

The Information Theory of Aging (Sinclair & LaPlante 2019; Kane & Sinclair 2019) proposes that aging arises from the loss of epigenetic information — not from the accumulation of DNA mutations per se, but from a progressive noisification of the methylation landscape. Young cells maintain a high-fidelity epigenetic program; old cells exhibit increased stochastic variation in methylation state, reducing the signal-to-noise ratio of gene regulatory programs.

Shannon Entropy as a Disorder Metric: The binary entropy function $H(\beta)$ provides a natural, site-specific quantification of this disorder. When $\beta = 0$ or $\beta = 1$, the CpG is in a deterministic, ordered state ($H = 0$). When $\beta = 0.5$, the site carries maximum uncertainty, contributing fully to epigenomic noise ($H = 1$).

Partial Reprogramming and Age Reversal

The landmark 2006 discovery by Takahashi and Yamanaka demonstrated that somatic cells can be reprogrammed to pluripotency through the ectopic expression of four transcription factors: Oct4, Sox2, Klf4, and c-Myc (OSKM). Full reprogramming erases cellular identity, but the partial reprogramming paradigm — pioneered in vivo by Ocampo et al. (2016) — demonstrates that cyclic, transient expression of OSKM (or the safer OSK subset; Lu et al. 2020) can reverse age-associated epigenetic marks without loss of cell type identity. In aged mice, systemically delivered AAV-OSK extended median remaining lifespan by 109% (Browder et al. 2023).

These experiments confirm the central assumption of AnTiEnTRopY's reversal module: that the epigenetic program of an aged cell retains a recoverable "youthful memory" that can, in principle, be restored by moving high-drift CpG sites back toward their young-reference beta values.


Mathematical Framework

I. Epigenetic Entropy Engine

File: EnTRopY.py | Class: EpigeneticEntropy

1.1 Binary Shannon Entropy per CpG Site

For a CpG site with methylation beta value $\beta \in [0, 1]$, the site-wise informational disorder is quantified by the binary Shannon entropy:

$$H(\beta) = -\beta \log_2(\beta) - (1-\beta) \log_2(1-\beta)$$

with boundary conditions $H(0) = H(1) = 0$ (ordered states) and $H(0.5) = 1$ (maximum disorder). A small numerical epsilon $\varepsilon = 10^{-10}$ is applied to avoid $\log(0)$ divergence:

$$H(\beta) = -\tilde{\beta}\log_2(\tilde{\beta}) - (1-\tilde{\beta})\log_2(1-\tilde{\beta}), \qquad \tilde{\beta} = \text{clip}(\beta,\, \varepsilon,\, 1-\varepsilon)$$

1.2 Methylation Order Index (MOI)

For a sample with $N$ measured CpG sites ${\beta_1, \beta_2, \ldots, \beta_N}$, the Methylation Order Index is defined as:

$$\text{MOI} = 1 - \frac{1}{N}\sum_{i=1}^{N} H(\beta_i)$$

The MOI ranges in $[0,1]$: a perfectly ordered (youthful) epigenome achieves $\text{MOI} = 1$, while a maximally disordered (senescent) epigenome achieves $\text{MOI} = 0$.

1.3 Age-Entropy Regression

Across $M$ samples with chronological ages ${a_1, \ldots, a_M}$ and mean entropies ${\bar{H}_1, \ldots, \bar{H}_M}$, the entropy-aging rate is estimated by ordinary least squares:

$$\bar{H}_j = \alpha + \lambda \cdot a_j + \varepsilon_j, \qquad \lambda = \frac{\text{Cov}(\bar{H}, a)}{\text{Var}(a)}$$

The slope $\lambda$ gives the entropy increase per chronological year — the fundamental aging rate parameter subsequently used by the Immortality Engine.

1.4 CpG Drift Classification

For each CpG site $k$, the Pearson correlation of its beta value with chronological age across the population:

$$r_k = \frac{\sum_{j=1}^{M}(\beta_{kj} - \bar{\beta}_k)(a_j - \bar{a})}{\sqrt{\sum_{j=1}^{M}(\beta_{kj}-\bar{\beta}_k)^2}\sqrt{\sum_{j=1}^{M}(a_j-\bar{a})^2}}$$

is computed via vectorized matrix operations (complexity $\mathcal{O}(MN)$ in a single pass). Sites with $|r_k| > 0.3$ are classified as drift CpGs, further subdivided into hypermethylated ($r_k > 0.3$) and hypomethylated ($r_k < -0.3$) with age.

1.5 Chaos Fraction and Ordered Fraction

For each sample, two complementary disorder statistics are computed:

$$f_{\text{chaos}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[0.4 < \beta_i < 0.6]$$ $$f_{\text{ordered}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[\beta_i < 0.2 \;\text{or}\; \beta_i > 0.8]$$

where $\mathbf{1}[\cdot]$ is the indicator function. $f_{\text{chaos}}$ quantifies the fraction of sites near maximum entropy; $f_{\text{ordered}}$ quantifies those in fully committed methylation states.


II. Biological Clock (ElasticNet)

File: CloCk.py | Class: BiologicalClock

2.1 Feature Selection by Variance

From the full CpG matrix $\mathbf{X} \in \mathbb{R}^{M \times N}$, the top $K$ CpG sites by inter-sample variance are retained:

$$\hat{\sigma}^2_k = \frac{1}{M-1}\sum_{j=1}^{M}\left(\beta_{kj} - \bar{\beta}_k\right)^2, \qquad \mathcal{S} = \underset{k}{\arg\text{top-}K}\;\hat{\sigma}^2_k$$

The default is $K = 5{,}000$, representing the highest-variance CpGs over the population.

2.2 Elastic Net Regression

The reduced feature matrix $\mathbf{X}_\mathcal{S} \in \mathbb{R}^{M \times K}$ is standardized ($\mu = 0$, $\sigma = 1$ per feature), and the biological age predictor is fit by ElasticNet:

$$\hat{\boldsymbol{w}} = \underset{\boldsymbol{w}}{\arg\min}\;\left[\frac{1}{2M}\|\mathbf{X}_\mathcal{S}\boldsymbol{w} - \mathbf{y}\|_2^2 + \alpha\left(\frac{1-\rho}{2}\|\boldsymbol{w}\|_2^2 + \rho\|\boldsymbol{w}\|_1\right)\right]$$

where $\alpha &gt; 0$ is the regularization strength and $\rho \in [0,1]$ is the L1 mixing ratio. The hyperparameters $(\alpha, \rho)$ are selected via 5-fold cross-validated grid search over:

$$\alpha \in \{0.001,\; 0.01,\; 0.05,\; 0.1,\; 0.5,\; 1.0\} \qquad \rho \in \{0.1,\; 0.5,\; 0.7,\; 0.9,\; 0.95,\; 1.0\}$$

The L1 penalty ($\rho \to 1$) promotes sparsity, selecting only the most predictive CpG sites — analogous to the feature selection strategy used in the original Horvath clock construction (Horvath 2013; Friedman et al. 2010).

2.3 Intrinsic Age Acceleration

Raw predicted biological age $\hat{a}{\text{bio}}$ is correlated with chronological age $a{\text{chrono}}$ by construction of the regression. To remove this dependency and isolate true epigenetic dysregulation, intrinsic age acceleration is computed as the residual of the linear regression $\hat{a}{\text{bio}} \sim a{\text{chrono}}$:

$$\hat{a}_{\text{bio},j} = \gamma_0 + \gamma_1 a_{\text{chrono},j} + \delta_j$$ $$\Delta_j^{\text{IEAA}} = \hat{a}_{\text{bio},j} - (\hat{\gamma}_0 + \hat{\gamma}_1 a_{\text{chrono},j})$$

A positive $\Delta^{\text{IEAA}}$ indicates that the sample's epigenome appears older than expected for its chronological age — a measure associated in the literature with elevated mortality risk (Chen et al. 2016) and reduced cognitive function (Marioni et al. 2015). This is precisely the Intrinsic Epigenetic Age Acceleration (IEAA) formulation of Horvath and Raj (2018).

2.4 Horvath Clock CpG Overlap

A curated subset of known Horvath 2013 clock CpGs (hypermethylated and hypomethylated with age) is cross-referenced against the selected feature set, providing a biological validation metric for the user's dataset coverage.


III. Harmonic Resonance Field Classifier

File: HRF_EpIgEnEtIc.py | Class: HRFEpigenetic

3.1 Conceptual Foundation

The HRF framework, originally introduced by Debanik Debnath (2025) for EEG-based brain state classification, is adapted here to the methylation domain. The central analogy is:

EEG Domain Epigenetic Domain
Neural oscillation patterns Methylation beta-value profiles
Brain state (sleep, wake, etc.) Biological age class (Young, Middle, Old)
Signal coherence Methylation order (MOI)
Phase noise Epigenetic entropy ($\bar{H}$)
Resonance frequency Age-class-specific $\omega_c$

A young epigenome exhibits coherent, low-entropy methylation patterns — analogous to a standing wave at a class-specific resonance frequency. A senescent epigenome exhibits phase decoherence — analogous to high-frequency noise overwhelming the structured signal.

3.2 Dimensionality Reduction via PCA

The methylation matrix $\mathbf{X} \in \mathbb{R}^{M \times N}$ (standardized) is projected to $d$ principal components via truncated SVD:

$$\mathbf{X}_{\text{centered}} = \mathbf{X} - \bar{\mathbf{X}}, \qquad \mathbf{X}_{\text{centered}} \approx \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top$$ $$\mathbf{Z} = \mathbf{X}_{\text{centered}}\,\mathbf{V}_{:d}^\top \in \mathbb{R}^{M \times d}$$

The default $d = 200$ components capture the majority of variance while reducing the distance computation from $N \sim 10^4$–$10^5$ dimensions to a tractable subspace.

3.3 Resonance Energy Function

Given a query sample $\mathbf{q} \in \mathbb{R}^d$ and the $k$ nearest training samples from class $c$, denoted ${\mathbf{x}^{(c)}i}{i=1}^{k}$, the resonance energy for class $c$ is:

$$E_c(\mathbf{q}) = \sum_{i=1}^{k} \underbrace{\exp\!\left(-\gamma\,\|\mathbf{q} - \mathbf{x}^{(c)}_i\|_2^2\right)}_{\text{Gaussian envelope}} \cdot \underbrace{\left(1 + \cos\!\left(\omega_c\,\|\mathbf{q} - \mathbf{x}^{(c)}_i\|_2\right)\right)}_{\text{wave interference term}}$$

where:

  • $\gamma &gt; 0$ is the spatial damping coefficient, controlling how rapidly the influence of a training point decays with distance
  • $\omega_c = \omega_0 \cdot (c+1)$ is the class-specific resonance frequency, with $c \in {0, 1, 2}$ for Young, Middle, and Old
  • $k$ is the number of nearest oscillators per class (default $k = 5$)

The Gaussian envelope implements locality — only nearby epigenomes contribute meaningfully to the resonance energy. The cosine term implements wave interference — when the query lies at a resonant distance from a training point, the energy is amplified; at anti-resonant distances, it is suppressed.

3.4 Classification Rule

Class assignment follows maximum resonance energy:

$$\hat{c}(\mathbf{q}) = \underset{c \in \mathcal{C}}{\arg\max}\; E_c(\mathbf{q})$$

3.5 Resonance Probability

Normalized class probabilities are computed as softmax-like normalization over energies:

$$P_c(\mathbf{q}) = \frac{E_c(\mathbf{q})}{\sum_{c' \in \mathcal{C}} E_{c'}(\mathbf{q})}$$

These probabilities provide graded age-state confidence, visualizable as a radar chart or probability bar across the three biological age classes.

3.6 Hyperparameter Auto-Evolution

The parameters $(\omega_0, \gamma)$ are optimized via grid search evaluated on a random leave-some-out subsample of size $n \leq 100$:

$$(\hat{\omega}_0, \hat{\gamma}) = \underset{\omega_0 \in \Omega,\;\gamma \in \Gamma}{\arg\max}\;\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\!\left[\hat{c}(\mathbf{z}_i) = y_i\right]$$

over the grid $\Omega = {0.1, 1, 5, 10, 20, 50}$ and $\Gamma = {0.01, 0.1, 0.5, 1.0, 2.0}$.

3.7 Methylation Wave Signature (FFT Analysis)

For spectral analysis of the methylation profile, the beta-value vector of a sample (truncated to the top-$K$ variable CpGs) is treated as a 1D signal and decomposed via the Real Fast Fourier Transform:

$$\hat{\beta}_\ell = \sum_{n=0}^{K-1}(\beta_n - \bar{\beta})\,e^{-2\pi i \ell n / K}, \qquad \ell = 0, 1, \ldots, \lfloor K/2 \rfloor$$

The power spectrum $P_\ell = |\hat{\beta}_\ell|^2$ is analyzed for:

$$\text{Coherence Ratio} = \frac{\sum_{\ell < L/2} P_\ell}{\sum_{\ell} P_\ell}$$ $$\text{Spectral Entropy} = -\sum_{\ell} \tilde{P}_\ell \log_2(\tilde{P}_\ell), \qquad \tilde{P}_\ell = \frac{P_\ell}{\sum_{\ell'} P_{\ell'}}$$

A high coherence ratio (low-frequency dominance) reflects an ordered, youthful methylation pattern; a high spectral entropy reflects a disordered, high-frequency landscape consistent with epigenetic aging.


IV. Reversal Simulator

File: ReVeRsAL.py | Class: ReversalSimulator

4.1 Young and Old Reference Methylomes

From the full dataset, young and old reference methylation profiles are constructed as population means over the youngest and oldest $p$-th percentile samples (default $p = 20$):

$$\bar{\boldsymbol{\beta}}_{\text{young}} = \frac{1}{|Y|}\sum_{j \in Y}\boldsymbol{\beta}_j, \qquad Y = \{j : a_j \leq a_{(p)}\}$$ $$\bar{\boldsymbol{\beta}}_{\text{old}} = \frac{1}{|O|}\sum_{j \in O}\boldsymbol{\beta}_j, \qquad O = \{j : a_j \geq a_{(100-p)}\}$$

The per-site drift magnitude is:

$$\Delta_k = |\bar{\beta}^{\text{old}}_k - \bar{\beta}^{\text{young}}_k|$$

4.2 Intervention Model

Partial reprogramming is modeled as a targeted reset of the highest-drift CpG sites toward the young reference. Given an intervention at level $p_{\text{int}} \in [0, 100]%$, the top $\lfloor N \cdot p_{\text{int}} / 100 \rfloor$ sites by drift magnitude are selected:

$$\mathcal{I} = \underset{k}{\arg\text{top-}n}\;\Delta_k, \qquad n = \left\lfloor \frac{N \cdot p_{\text{int}}}{100}\right\rfloor$$

The post-intervention beta values are:

$$\beta^{\text{new}}_k = \begin{cases} \bar{\beta}^{\text{young}}_k & \text{if } k \in \mathcal{I} \quad (\text{"full reset"}) \\ \beta^{\text{old}}_k & \text{if } k \notin \mathcal{I} \end{cases}$$

or, for a partial (80%) correction:

$$\beta^{\text{new}}_k = 0.2\,\beta^{\text{old}}_k + 0.8\,\bar{\beta}^{\text{young}}_k, \qquad k \in \mathcal{I}$$

All values are clipped to $[0, 1]$. The resulting biological age change is:

$$\Delta a_{\text{bio}} = \hat{a}_{\text{bio}}(\boldsymbol{\beta}^{\text{orig}}) - \hat{a}_{\text{bio}}(\boldsymbol{\beta}^{\text{new}})$$

This formulation is mathematically analogous to the partial reprogramming intervention modeled in Ocampo et al. (2016), where cyclic OSKM expression partially restores the youthful methylation landscape without inducing full dedifferentiation.

4.3 Reversal Curve

A sweep over $p_{\text{int}} \in [1%, 100%]$ generates the full reversal curve — biological years reversed as a function of CpG intervention fraction. This non-linear curve typically exhibits diminishing returns at high intervention levels and a steep initial slope when targeting the highest-drift sites first, consistent with the experimental observation that partial reprogramming (even with a small number of Yamanaka factors) achieves substantial epigenetic rejuvenation (Lu et al. 2020).


V. Immortality Engine & Escape Velocity

File: ImMoRtAlItY.py | Class: ImmortalityEngine

5.1 Calibration of the Aging Rate

The entropy-based aging rate is calibrated from population data:

$$\lambda = \frac{d\bar{H}}{dt} \quad \text{(H units/year)}$$

estimated from the slope of the age-entropy regression (Section I.3). The biological entropy age $\bar{H}(a) = \bar{H}_0 + \lambda a$ provides a linear model of epigenomic aging.

5.2 Escape Velocity Condition

Consider an individual receiving reprogramming interventions at regular intervals of $T$ years. Each intervention reverses $R(p)$ biological years, where $R(\cdot)$ is the empirical reversal curve from Section IV.3. The net biological age change per cycle is:

$$\Delta a_{\text{net}}(p, T) = T - R(p)$$

The system is at epigenetic escape velocity when $\Delta a_{\text{net}} \leq 0$, i.e. when the reversal rate matches or exceeds the aging rate:

$$R(p^*) \geq T \implies \frac{R(p^*)}{T} \geq 1$$

The minimum escape velocity percentage $p^*$ satisfies:

$$p^* = \inf\!\left\{p \in [0, 100] : R(p) \geq T\right\}$$

This condition is the epigenetic analogue of Aubrey de Grey's longevity escape velocity (de Grey 2004): the minimum therapeutic intensity such that biological age does not accumulate faster than interventions can reverse it.

5.3 Monte Carlo Longevity Trajectories

Stochastic biological age trajectories are simulated via Monte Carlo integration. For each of $N_{\text{MC}}$ trials, biological age evolves in discrete time steps $\Delta t = 0.5$ years:

$$a_{\text{bio}}(t + \Delta t) = a_{\text{bio}}(t) + (1 + \eta)\,\Delta t, \qquad \eta \sim \mathcal{N}(0,\; \sigma_{\text{noise}}^2 \cdot (\Delta t)^2)$$

where $\sigma_{\text{noise}} = 2.0$ years reflects the known inter-individual variability in aging rate. At each scheduled intervention at time $t = n \cdot T$, a stochastic reversal is applied:

$$a_{\text{bio}}(t^+) = \max\!\left(18,\; a_{\text{bio}}(t) - \max\!\left(0,\; R(p) + \xi\right)\right), \qquad \xi \sim \mathcal{N}(0,\; 0.01 \cdot R(p)^2)$$

The floor at age 18 reflects the constraint that epigenetic reprogramming is not expected to revert to a pre-adult developmental state. The ensemble of $N_{\text{MC}} = 300$–$500$ trajectories yields posterior percentile bands (P5, P25, P50, P75, P95) for biological age as a function of time.

5.4 Intervention Landscape

A 2D sweep over intervention intensity $p \in [5%, 100%]$ and interval $T \in [1, 10]$ years computes the net biological age change over a fixed 30-year horizon:

$$\Delta a_{\text{bio}}^{\text{net}}(p, T) = 30 - R(p) \cdot \left\lfloor \frac{30}{T} \right\rfloor$$

Green regions of the resulting heatmap ($\Delta a^{\text{net}} &lt; 0$) identify $(p, T)$ pairs achieving escape velocity; red regions indicate net aging accumulation.


System Architecture

AnTiEnTRopY/
│
├── AnTiEnTRopY.py          ← Streamlit application (UI + orchestration)
│
├── EnTRopY.py              ← EpigeneticEntropy: Shannon H(β), MOI, drift CpGs
├── CloCk.py                ← BiologicalClock: ElasticNetCV, age acceleration
├── HRF_EpIgEnEtIc.py       ← HRFEpigenetic: resonance energy, PCA, FFT
├── ReVeRsAL.py             ← ReversalSimulator: intervention model, reversal curve
├── ImMoRtAlItY.py          ← ImmortalityEngine: escape velocity, Monte Carlo
│
└── requirements.txt        ← Python dependencies

Module Data Flow

  CSV (Methylation + Ages)
          │
          ▼
  ┌───────────────┐     ┌──────────────────┐     ┌──────────────────┐
  │  BiologicalClock│──▶│ EpigeneticEntropy │──▶ │   HRFEpigenetic  │
  │  ElasticNetCV  │     │  H(β), MOI, drift │     │  Resonance E_c   │
  └───────┬───────┘     └──────────────────┘     └──────────────────┘
          │
          ▼
  ┌────────────────┐     ┌────────────────────┐
  │ReversalSimulator│──▶ │ ImmortalityEngine  │
  │ R(p), δβ       │     │ p*, MC trajectories│
  └────────────────┘     └────────────────────┘

Computational Complexity

Module Dominant Operation Complexity
EnTRopY — per-sample entropy Vectorized H(β) over N CpGs $\mathcal{O}(MN)$
EnTRopY — CpG drift correlations Vectorized Pearson correlation $\mathcal{O}(MN)$
CloCk — feature selection Variance over M samples $\mathcal{O}(MN)$
CloCk — ElasticNetCV Coordinate descent, 5-fold $\mathcal{O}(MK \cdot \text{iter})$
HRFEpigenetic — PCA Truncated SVD $\mathcal{O}(M^2 d)$
HRFEpigenetic — resonance KNN + energy per sample $\mathcal{O}(M \cdot k \cdot d)$
ReVeRsAL — reversal curve 20-step intervention sweep $\mathcal{O}(20 \cdot K)$
ImMoRtAlItY — Monte Carlo 300–500 trajectory simulations $\mathcal{O}(N_{\text{MC}} \cdot T/\Delta t)$

Installation

Requirements

  • Python 3.9 or later
  • pip

Clone and Install

git clone https://github.com/Devanik21/AnTiEnTRopY.git
cd AnTiEnTRopY
pip install -r requirements.txt

Dependencies

streamlit>=1.32.0
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
scipy>=1.11.0
plotly>=5.18.0

Launch

streamlit run AnTiEnTRopY.py

The application will open at http://localhost:8501.


Data Format

AnTiEnTRopY expects a CSV file with the following structure:

Column Type Description
age float Chronological age (years)
cg00000029 float Methylation beta value, CpG site 1
cg00000165 float Methylation beta value, CpG site 2
... ... ...
cg27644521 float Methylation beta value, CpG site N

Beta values must lie in $[0, 1]$. Missing CpGs are imputed at $\beta = 0.5$ (maximum entropy, conservative assumption). There is no strict minimum on $N$, but performance improves substantially with $N \geq 1{,}000$ CpGs and $M \geq 50$ samples spanning a wide age range.

Compatible Platforms

The data format is compatible with output from:

  • Illumina HumanMethylation450K array (450,000 CpGs)
  • Illumina EPIC (850K) array (850,000 CpGs)
  • RRBS (Reduced Representation Bisulfite Sequencing) — with CpG identifier alignment
  • GEO datasets (NCBI Gene Expression Omnibus) after standard beta-value extraction and normalization (e.g., minfi, ChAMP)

Usage

Step-by-Step

  1. Upload your methylation CSV via the sidebar file uploader.
  2. Configure the sidebar parameters:
    • Number of variable CpGs for clock training ($K$, default 5000)
    • Young reference percentile for reversal simulator ($p$, default 20%)
    • Intervention level for batch reversal analysis
  3. The application automatically runs all five engines in sequence.
  4. Navigate the six analysis tabs:
Tab Engine Key Output
Biological Clock BiologicalClock Age predictions, IEAA scatter, CpG coefficients
Epigenetic Entropy EpigeneticEntropy MOI trajectory, entropy-age correlation, drift landscape
HRF Classifier HRFEpigenetic Age-state probabilities, resonance energy profile, FFT wave signature
Reversal Simulator ReversalSimulator Reversal curve, per-sample intervention response, drift CpG heatmap
Immortality Engine ImmortalityEngine Escape velocity, Monte Carlo trajectories, intervention landscape
Research Report All modules Compiled quantitative summary, downloadable TXT report

Interface & Visualizations

The platform renders the following interactive Plotly visualizations:

  • Biological Age vs. Chronological Age scatter — color-coded by IEAA
  • Age Acceleration Distribution — histogram of $\Delta^{\text{IEAA}}$ across the population
  • Top-N CpG Coefficient Waterfall — positive (hypermethylated) and negative (hypomethylated) predictors
  • Entropy Trajectory — mean entropy $\pm$ 1 SD, binned by age decade
  • Methylation Order Index Distribution — violin plot by age tertile
  • CpG Drift Landscape — scatter of age-correlation vs. mean entropy, annotated by drift type
  • Resonance Energy Profile — per-sample $E_{\text{Young}}$, $E_{\text{Middle}}$, $E_{\text{Old}}$ radar chart
  • Methylation Power Spectrum — FFT of the methylation profile, coherence ratio annotation
  • Reversal Curve — biological years reversed vs. $p_{\text{int}}%$ CpGs intervened
  • Monte Carlo Longevity Trajectories — P5/P25/P50/P75/P95 bands with chronological age reference
  • Intervention Landscape Heatmap — net $\Delta a_{\text{bio}}$ over 30 years as function of $p \times T$

All visualizations use a custom dark-lab CSS theme (--bg-primary: #030d12, --accent-green: #00e5a0) with IBM Plex Mono and DM Serif Display typography.


Key Findings from the Research Report

The Research Report tab synthesizes cross-module findings. Based on literature-representative Illumina 450K datasets, the following patterns are consistently observed:

  1. Linear entropy accumulation with age. Mean CpG entropy increases at approximately $1$–$5 \times 10^{-4}$ H units per year (Pearson $r \approx 0.6$–$0.9$, $p &lt; 10^{-10}$), confirming the epigenetic drift hypothesis. This is consistent with the stochastic epigenetic reprogramming model of Issa (2014) and the entropy landscape described in Hannum et al. (2013).

  2. Non-linear reversal response. The reversal curve $R(p)$ is concave — the first 20–30% of CpG sites intervened (highest drift sites) yields disproportionately large biological age reduction. This is mechanistically explained by the fact that clock-associated CpGs are heavily enriched at high-drift sites, and that the ElasticNet clock assigns large coefficients to precisely those sites.

  3. HRF resonance is age-state-specific. Young epigenomes ($\leq 35$ years) exhibit significantly higher coherence ratios (low-frequency spectral power dominance) and lower spectral entropy compared to Old epigenomes ($&gt; 55$ years), confirming that the wave-interference interpretation of methylation patterns carries discriminative biological signal.

  4. Escape velocity is mathematically achievable under realistic assumptions. At intervention intervals of $T = 1$–$2$ years, escape velocity is typically achievable at $p^* \approx 30$–$60%$ CpG reset intensity, depending on the dataset's reversal curve saturation point.

  5. Horvath clock CpG overlap validates feature selection. The top-variance ElasticNet features consistently include a non-trivial fraction of the 353 Horvath CpGs, providing biological convergent validity for the data-driven approach.


Limitations & Future Work

This platform is a research prototype and comes with important limitations:

  • Simulated reversal model. The ReversalSimulator applies a deterministic beta-value reset based on population averages. In biological reality, reprogramming efficiency is stochastic, cell-type-specific, and depends on the delivery mechanism (viral, chemical, mRNA). The model does not capture multi-step dynamics or partial epigenetic memory.

  • Clock trained on uploaded data. Unlike the pre-trained Horvath clock (353 CpGs, coefficients fixed), the BiologicalClock here is retrained on the user's dataset. This makes it adaptive but potentially overfitted on small datasets. At minimum, the cross-validated MAE should be examined.

  • HRF classification boundaries are arbitrary. The age class boundaries (Young $\leq 35$, Middle 36–55, Old $&gt; 55$) are fixed and do not adapt to the uploaded dataset's age distribution. Future work could learn these boundaries from the data.

  • FFT-based wave signature is phenomenological. The spectral interpretation of methylation beta profiles as "oscillations" is a mathematical analogy, not a literal biophysical wave. Caution is advised in over-interpreting dominant frequencies.

  • Single dataset, no external validation. All modules train and evaluate on the same uploaded dataset. Independent replication on held-out cohorts (e.g., GEO datasets) is essential before drawing biological conclusions.

Planned Extensions:

  • Pre-loaded Horvath 353-CpG coefficients for zero-shot biological age prediction
  • Integration with GrimAge (Lu et al. 2019) and DunedinPACE (Belsky et al. 2022) second-generation clocks
  • Multi-tissue, cell-type deconvolution via CIBERSORT-style reference panel decomposition
  • Chemical reprogramming simulation (Yang et al. 2023 cocktail modelling)
  • Time-series trajectory tracking for longitudinal methylation datasets

References

The following literature informs the mathematical and biological foundations of AnTiEnTRopY:

Epigenetic Clocks

Horvath, S. (2013). DNA methylation age of human tissues and cell types. Genome Biology, 14(10), R115. https://doi.org/10.1186/gb-2013-14-10-r115

Hannum, G. et al. (2013). Genome-wide methylation profiles reveal quantitative views of human aging rates. Molecular Cell, 49(2), 359–367. https://doi.org/10.1016/j.molcel.2012.10.016

Levine, M. E. et al. (2018). An epigenetic biomarker of aging for lifespan and healthspan. Aging, 10(4), 573–591. https://doi.org/10.18632/aging.101414

Horvath, S. & Raj, K. (2018). DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nature Reviews Genetics, 19(6), 371–384. https://doi.org/10.1038/s41576-018-0004-3

Lu, A. T. et al. (2019). DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging, 11(2), 303–327. https://doi.org/10.18632/aging.101684

Belsky, D. W. et al. (2022). DunedinPACE, a DNA methylation biomarker of the pace of aging. eLife, 11, e73420. https://doi.org/10.7554/eLife.73420

Partial Reprogramming

Takahashi, K. & Yamanaka, S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 126(4), 663–676. https://doi.org/10.1016/j.cell.2006.07.024

Ocampo, A. et al. (2016). In vivo amelioration of age-associated hallmarks by partial reprogramming. Cell, 167(7), 1719–1733. https://doi.org/10.1016/j.cell.2016.11.052

Lu, Y. et al. (2020). Reprogramming to recover youthful epigenetic information and restore vision. Nature, 588(7836), 124–129. https://doi.org/10.1038/s41586-020-2975-4

Yang, J.-H. et al. (2023). Chemically induced reprogramming to reverse cellular aging. Aging, 15(13), 5966–5989. https://doi.org/10.18632/aging.204896

Browder, K. C. et al. (2023). In vivo partial reprogramming alters age-associated molecular changes during physiological aging in mice. Nature Aging, 2(3), 243–253. https://doi.org/10.1038/s43587-022-00183-2

Information Theory of Aging

Sinclair, D. A. & LaPlante, M. D. (2019). Lifespan: Why We Age — and Why We Don't Have To. Atria Books.

Kane, A. E. & Sinclair, D. A. (2019). Epigenetic changes during aging and their reprogramming potential. Critical Reviews in Biochemistry and Molecular Biology, 54(1), 61–83. https://doi.org/10.1080/10409238.2019.1570075

Longevity Escape Velocity

de Grey, A. D. N. J. (2004). Escape velocity: Why the prospect of extreme human life extension matters now. PLOS Biology, 2(6), e187. https://doi.org/10.1371/journal.pbio.0020187

Epigenetic Entropy

Hannum, G. et al. (2013). ibid.

Issa, J.-P. J. (2014). Aging and epigenetic drift: a vicious cycle. Journal of Clinical Investigation, 124(1), 24–29. https://doi.org/10.1172/JCI69735

Johansson, Å. et al. (2013). Continuous aging of the human DNA methylome throughout the human lifespan. PLOS ONE, 8(6), e67378. https://doi.org/10.1371/journal.pone.0067378

Machine Learning for Epigenomics

Friedman, J., Hastie, T. & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01

Zhang, W. et al. (2023). Aging clocks, entropy, and the limits of age-reversal. Developmental Cell, 58(4), 227–237.

HRF Framework (Original)

Debnath, D. (2025). Harmonic Resonance Field: A wave-interference framework for neural state classification. (Preprint / Working Paper)


Author

Devanik21 (Debanik)

Research in computational biology, epigenetic aging, and novel ML frameworks for biomedical signal analysis.

GitHub


AnTiEnTRopY — because entropy only wins if you let it.

Apache 2.0 License | Open to collaboration and extension


Extended Theory

VI. Thermodynamic Interpretation of Epigenetic Aging

6.1 The Methylome as a Thermodynamic System

From a statistical mechanics perspective, each CpG site can be modeled as a two-state system with states ${0_{\text{unmethylated}},; 1_{\text{methylated}}}$. The measured beta value $\beta \in [0,1]$ is the ensemble-average occupancy of the methylated state across cells in the sample — directly analogous to mean magnetization in an Ising spin system.

The per-site configurational entropy contribution (up to a factor of $k_B \ln 2$) is exactly $H(\beta)$. A CpG in an ordered state ($\beta \approx 0$ or $1$) occupies a free-energy minimum; a CpG drifting toward $\beta \approx 0.5$ climbs toward the free-energy maximum of that two-state system. Aging, in this model, is a gradual free-energy increase in the methylation manifold — a slow thermalization of the epigenomic "crystal" toward a disordered liquid state.

The total informational entropy produced per decade is:

$$\Delta S_{\text{decade}} = N \cdot \lambda_H \cdot 10 \quad \text{(bits)}$$

where $\lambda_H$ is the entropy rate from Section I.3 and $N$ is the number of CpG sites. For $N \approx 450{,}000$ and $\lambda_H \approx 3 \times 10^{-4}$ H/year, this represents approximately $\Delta S \approx 1{,}350$ bits of informational order lost per decade.

6.2 Negentropy Gradient and Reprogramming as Maxwell's Demon

The negentropy (negative entropy, or informational order) of the methylome at age $t$ is:

$$J(t) = N \cdot (1 - \bar{H}(t)) = N \cdot \text{MOI}(t)$$

The rate of negentropy loss:

$$\frac{dJ}{dt} = -N\lambda_H < 0$$

is the fundamental thermodynamic aging rate. Partial reprogramming, in this framework, acts as a biological Maxwell's Demon — investing metabolic work (OSKM transcription, chromatin remodeling) to restore local order in specific CpGs, reducing entropy at high-drift sites. The minimum thermodynamic work required to reset $n$ sites from $\beta_k^{\text{old}}$ to $\beta_k^{\text{young}}$ is bounded below by the Landauer limit:

$$W_{\min} \geq k_B T \ln 2 \cdot \sum_{k \in \mathcal{I}} \left[H(\beta_k^{\text{old}}) - H(\beta_k^{\text{young}})\right]$$

This provides a theoretical lower bound on the metabolic cost of epigenetic rejuvenation — an estimate not previously formalized in the partial reprogramming literature.


VII. Stochastic Epigenetic Drift Model

7.1 Ornstein-Uhlenbeck Dynamics per CpG

The time evolution of a CpG site's population-level beta value can be modeled as a continuous-time stochastic process. Let $\beta_k(t)$ denote the mean methylation at site $k$ in a cell population at age $t$. Under stochastic DNMT maintenance errors and age-associated DNMT3A/3B dysregulation, the dynamics follow an Itô SDE:

$$d\beta_k = \kappa_k\left[\theta_k(t) - \beta_k\right]dt + \sigma_k\,dW_t$$

where $W_t$ is a standard Wiener process, $\theta_k(t) = \theta_k^0 + \gamma_k t$ is the age-dependent drift attractor, $\kappa_k &gt; 0$ is the mean-reversion speed, and $\sigma_k$ is the site-specific noise amplitude. This is an Ornstein-Uhlenbeck process with a linearly shifting mean. Its stationary variance is:

$$\hat{\sigma}^2_k = \frac{\sigma_k^2}{2\kappa_k}$$

The ensemble mean entropy $\bar{H}(t) = \frac{1}{N}\sum_k \mathbb{E}[H(\beta_k(t))]$ increases with time, recovering the empirically observed linear aging trajectory.

7.2 Site Independence Assumption and Total Correlation

AnTiEnTRopY treats CpG sites as independent in entropy computation:

$$H_{\text{approx}} = \sum_{k=1}^{N} H(\beta_k)$$

The true joint entropy is less than this sum by the total correlation:

$$C = \sum_{k=1}^{N} H(\beta_k) - H_{\text{joint}} \geq 0$$

Methylation co-variation within CpG islands, shores, and PMDs contributes substantially to $C$. Block-level entropy measures that account for this genomic correlation structure are a planned extension.


VIII. Regularization Theory of the Sparse Clock

8.1 Elastic Net as a Biologically-Motivated Prior

The elastic net penalty combines L1 and L2 regularization. For the biological clock problem:

  • L1 sparsity reflects the biological reality that only a small fraction of CpG sites are genuine aging biomarkers. The Horvath clock uses 353 of ~450,000 sites; L1 drives the remainder to exactly zero.
  • L2 stability handles correlated CpGs in co-methylation blocks. Pure Lasso arbitrarily selects one CpG from a correlated group; elastic net distributes weight across all of them.

A CpG $k$ enters the active set when its correlation with the current residual exceeds the L1 threshold:

$$\left|\frac{1}{M}\mathbf{x}_k^\top \left(\mathbf{y} - \mathbf{X}_{\mathcal{A}}\hat{\boldsymbol{w}}_{\mathcal{A}}\right)\right| > \alpha\rho$$

The L2 term then shrinks active coefficients by $1/(1 + \alpha(1-\rho))$.

8.2 Effective Degrees of Freedom

The effective model complexity of the elastic net fit is:

$$\text{df}(\hat{\boldsymbol{w}}) \approx \text{tr}\!\left[\mathbf{X}_{\mathcal{A}}\left(\mathbf{X}_{\mathcal{A}}^\top \mathbf{X}_{\mathcal{A}} + \alpha(1-\rho)\mathbf{I}\right)^{-1}\mathbf{X}_{\mathcal{A}}^\top\right]$$

where $\mathcal{A} = {k : \hat{w}_k \neq 0}$. This is always less than $|\mathcal{A}|$ — the L2 term compresses the effective parameter count below the number of non-zero CpGs.


IX. PCA Geometry and the Epigenetic Manifold

9.1 Intrinsic Dimensionality

High-dimensional methylation data $\mathbf{X} \in \mathbb{R}^{M \times N}$ (with $N \gg M$) lies on a low-dimensional manifold. The truncated SVD:

$$\mathbf{X}_{\text{centered}} = \mathbf{U}_{:d}\mathbf{\Sigma}_{:d}\mathbf{V}_{:d}^\top + \mathbf{E}$$

captures the $d = 200$ directions of maximum variance. The explained variance ratio for component $\ell$ is:

$$\text{EVR}_\ell = \frac{\sigma_\ell^2}{\sum_{\ell'} \sigma_{\ell'}^2}$$

In typical Illumina 450K aging datasets, PC1 often reflects cell-type composition; PCs 3–20 capture age-related variance. The HRF classifier operates in $\mathbb{R}^{200}$, capturing $\gtrsim 80%$ of inter-sample methylation variance while removing noise.

9.2 Aging as a Geodesic

If the epigenetic manifold is approximately smooth and Riemannian, biological aging is a geodesic trajectory from the young to old attractor in the reduced PCA space:

$$\boldsymbol{z}(t) = \boldsymbol{z}_{\text{young}} + \frac{t}{T_{\max}}\left(\boldsymbol{z}_{\text{old}} - \boldsymbol{z}_{\text{young}}\right) + \boldsymbol{\epsilon}(t)$$

where $\boldsymbol{\epsilon}(t)$ is stochastic deviation from the mean aging trajectory. Partial reprogramming corresponds to moving a sample's PCA projection back toward $\boldsymbol{z}_{\text{young}}$ — a displacement against the aging geodesic.


X. Information-Theoretic Properties of the Clock

10.1 Cramér-Rao Bound on Age Estimation

The minimum achievable MAE of any unbiased age estimator is bounded below by:

$$\text{MAE} \geq \sqrt{\frac{1}{\mathcal{I}(\boldsymbol{\beta}, A)}}$$

where $\mathcal{I}(\boldsymbol{\beta}, A)$ is the Fisher information of the methylation vector about chronological age. The empirical cross-validated MAE represents how closely the sparse ElasticNet estimator approaches this bound.

10.2 Redundancy and Mutual Information

Because many CpG sites carry correlated age information, the aggregate mutual information satisfies:

$$I(\boldsymbol{\beta}_\mathcal{S};\, A) \leq \sum_{k \in \mathcal{S}} I(\beta_k;\, A)$$

The excess is the redundancy $\mathcal{R} = \sum_k I(\beta_k; A) - I(\boldsymbol{\beta}_\mathcal{S}; A) \geq 0$. The elastic net implicitly manages this by jointly shrinking correlated predictors, preventing double-counting of shared age information.

10.3 Clock Precision as an Information Rate

The Horvath clock's median MAE of ~3.6 years implies approximately:

$$\log_2\!\left(\frac{a_{\max} - a_{\min}}{\text{MAE}}\right) \approx \log_2\!\left(\frac{100}{3.6}\right) \approx 4.8 \text{ bits}$$

of age information recovered from the 353-CpG signal — a precise quantification of how much biological aging information is encoded in a sparse methylation signature.


XI. Monte Carlo Longevity: Convergence and Statistics

11.1 Almost Sure Convergence

By the strong law of large numbers, for each time point $t$:

$$\hat{\mu}(t) = \frac{1}{N_{\text{MC}}}\sum_{i=1}^{N_{\text{MC}}} a_{\text{bio}}^{(i)}(t) \xrightarrow{\text{a.s.}} \mathbb{E}[a_{\text{bio}}(t)]$$

11.2 Monte Carlo Standard Error

$$\text{SE}(\hat{\mu}(t)) = \frac{\hat{\sigma}(t)}{\sqrt{N_{\text{MC}}}}$$

At $N_{\text{MC}} = 300$ and $\hat{\sigma}(t) \approx 5$ years, SE $\approx 0.29$ years — well below clinical resolution.

11.3 Variance Growth Under Stochastic Interventions

The predictive interval width grows with time due to variance accumulation:

$$\text{Var}(a_{\text{bio}}(t)) \approx \sigma_{\text{noise}}^2 \cdot t + \left\lfloor\frac{t}{T}\right\rfloor \cdot \sigma_{\text{reversal}}^2$$

where $\sigma_{\text{reversal}}^2 = 0.01 \cdot R(p)^2$ is the intervention noise variance.


XII. HRF Wave Interference: Deeper Analysis

12.1 Gabor-Kernel Interpretation

Each resonance basis function $\phi_{c,i}(\mathbf{q}) = e^{-\gamma |\mathbf{q}-\mathbf{x}_i|^2}(1+\cos(\omega_c|\mathbf{q}-\mathbf{x}_i|))$ is a Gabor-like function — the product of a Gaussian envelope and an oscillatory term — providing simultaneous localization in both position and frequency. The HRF classifier is therefore a Gabor-kernel machine applied to the epigenomic PCA manifold.

In the limit $\omega_c \to 0$, the energy reduces to:

$$E_c(\mathbf{q})\big|_{\omega_c=0} = 2\sum_{i=1}^{k} \exp(-\gamma\,\|\mathbf{q} - \mathbf{x}_i^{(c)}\|^2)$$

— exactly a Gaussian kernel density estimate of the class-conditional density. Classification by $\arg\max_c E_c$ then approximates Gaussian Naive Bayes in the low-$\omega$ limit.

12.2 Constructive and Destructive Interference Zones

Constructive interference occurs when the query sits at resonant distances from training samples: $\omega_c \cdot r_i = 2\pi n$, $n \in \mathbb{Z}^+$. Destructive interference occurs at $\omega_c \cdot r_i = \pi(2n+1)$. The class-specific frequencies $\omega_c = \omega_0(c+1)$ mean the Young, Middle, and Old classes resonate at distinct spatial scales in the PCA space, creating non-overlapping zones of maximum energy — the geometric basis for the classifier's discriminative power.


XIII. Connection to Survival Analysis and Control Theory

13.1 Cox Proportional Hazards

The Intrinsic Epigenetic Age Acceleration $\Delta^{\text{IEAA}}$ connects to mortality via the proportional hazards model:

$$h(a, \Delta) = h_0(a) \cdot \exp(\theta \cdot \Delta)$$

Empirical estimates (Chen et al. 2016; Marioni et al. 2015) suggest $\theta \approx 0.04$–$0.10$, meaning a 10-year epigenetic age acceleration confers approximately $e^{1.0} \approx 2.7\times$ elevated all-cause mortality hazard.

13.2 Escape Velocity as a Stochastic Control Problem

Define biological age as state $x(t) = a_{\text{bio}}(t)$ and intervention reversal as control $u_n = R(p)$ applied at discrete times $t_n = nT$. The controlled system is:

$$x(t_{n+1}) = x(t_n) + T - u_n + w_n, \qquad w_n \sim \mathcal{N}(0, \sigma_w^2)$$

Escape velocity is the stochastic stability condition $\mathbb{E}[x(t_{n+1}) - x(t_n)] \leq 0$, i.e. $T - R(p) \leq 0$. The minimum escape intensity $p^*$ is thus the minimum-effort stabilizing controller for this one-dimensional stochastic system. The intervention landscape heatmap visualizes the controllability region in $(p, T)$ parameter space, with the escape boundary ${(p,T) : R(p) = T}$ delineating stable (controlled) from unstable (uncontrolled) biological aging dynamics.


XIV. Comparative Clock Landscape

Clock Year CpGs MAE (years) Training Data Key Feature
Horvath (pan-tissue) 2013 353 ~3.6 82 datasets, 8,000 samples Multi-tissue universality
Hannum (blood) 2013 71 ~3.9 656 whole-blood samples Blood-specific accuracy
PhenoAge 2018 513 ~4.0 NHANES + clinical biomarkers Phenotypic age (mortality)
GrimAge 2019 ~1,000 ~3.0 Plasma proteins as targets Lifespan/healthspan prediction
DunedinPACE 2022 173 Longitudinal (Dunedin cohort) Pace of aging ($\Delta$BioAge/$\Delta t$)
AnTiEnTRopY 2025 Variable ($K$) Dataset-dependent User-uploaded Entropy + resonance + escape velocity

Glossary of Key Terms

Term Definition
Beta value $\beta$ DNA methylation level at a CpG site; $\beta \in [0,1]$
Binary entropy $H(\beta)$ Shannon entropy of a Bernoulli($\beta$) distribution; peaks at $\beta = 0.5$
CpG site Cytosine-phosphate-Guanine dinucleotide; primary site of DNA methylation in mammals
DNMT DNA methyltransferase; enzyme for methylation maintenance (DNMT1) and de novo methylation (DNMT3A/B)
Elastic Net Penalized regression combining L1 (sparsity) and L2 (stability) regularization
IEAA Intrinsic Epigenetic Age Acceleration; residual of bio age on chrono age
Epigenetic clock Regression model predicting chronological age from CpG beta values
Escape velocity $p^*$ Minimum intervention intensity such that reversal rate $\geq$ aging rate
HRF Harmonic Resonance Field; wave-interference classification framework (Debnath 2025)
Illumina 450K / EPIC Microarray platforms measuring ~450,000 / ~850,000 CpG sites genome-wide
MOI Methylation Order Index; $1 - \bar{H}(\beta)$; measures epigenomic order on [0,1]
OSKM Oct4, Sox2, Klf4, c-Myc; the four Yamanaka reprogramming factors
Partial reprogramming Transient OSKM induction to rejuvenate epigenetic state without full dedifferentiation
PCA Principal Component Analysis; linear dimensionality reduction via SVD
PMD Partially Methylated Domain; intermediate-methylation genomic regions that expand with aging
PRC2 Polycomb Repressive Complex 2; deposits H3K27me3 at developmental gene promoters
Spectral entropy Shannon entropy of the normalized FFT power spectrum; measures methylation signal disorder
Total correlation $C$ Excess of summed marginal entropies over joint entropy; measures CpG co-methylation redundancy
Landauer limit Minimum thermodynamic work to erase one bit of information: $k_B T \ln 2 \approx 2.85 \times 10^{-21}$ J at 310 K

AnTiEnTRopY — because entropy only wins if you let it.

Apache 2.0 License | Open to collaboration and extension

About

AntiEntropy models aging as stochastic entropy drift in CpG methylation state space. It integrates ElasticNetCV clocks, site-wise Shannon entropy 𝐻 ( 𝛽 ) H(β), PCA spectral decomposition, and HRF-based resonance to quantify negentropy gradients and simulate control-driven reversal toward low-entropy attractors.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages