Adaptive Pharmacokinetic Model Discovery Engine
What is APMODE? · Capability Status · Quick Start · Formular DSL · Architecture · Governance · Benchmarks · CLI · HTTP API · Citation
APMODE is a governed meta-system that composes five population PK modeling paradigms into a single discovery workflow — so pharmacometricians can evaluate classical, automated, neural, Bayesian, and hybrid approaches under one roof with consistent evidence standards.
Five paradigms, one pipeline. Classical NLME (nlmixr2), automated structural search, hybrid mechanistic-NODE (JAX/Diffrax), agentic LLM model construction (Phase 3), and Bayesian PK (Stan/Torsten via cmdstanpy, Phase 2+) — each dispatched, evaluated, and ranked through a unified governance funnel.
Evidence gates. Every candidate passes Gate 1 (technical validity), Gate 2 (lane admissibility), Gate 2.5 (ICH M15 credibility), and Gate 3 (cross-paradigm ranking) before it can be recommended. The Bayesian backend adds MCMC-specific Gate 1 thresholds (R̂ ≤ 1.01, bulk/tail ESS ≥ 400, divergences = 0, E-BFMI ≥ 0.3, Pareto-k ≤ 0.7) and a Gate 2 prior-justification artifact aligned with FDA's Jan 2026 draft Bayesian methodology guidance (FDA-2025-D-3217). NODE models are never eligible for regulatory submission — this is a hard rule, not a tunable weight.
Reproducibility is the unit of output. Every run emits a versioned JSON bundle — data manifest, search trajectory, gate decisions, candidate lineage DAG, compiled specs — so any result can be audited or replayed.
Formular — a typed PK DSL — is the control surface. Models are specified in Formular, a five-block grammar (Absorption × Distribution × Elimination × Variability × Observation) plus a sixth semantic axis — priors — populated via the SetPrior transform rather than grammar text. Specs compile to a typed AST, are validated against pharmacometric constraints, and lower to backend-specific code (nlmixr2 R, Stan/Torsten, JAX/Diffrax). The agentic LLM backend (Phase 3) operates exclusively through the 10 typed Formular transforms — including SetPrior for Bayesian workflows — it cannot emit raw code.
Status: v0.6.1-rc1 (2026-04-25) — 0.6.1 release candidate. 2564 tests passing (
-m "not live");mypy --strictclean;ruffclean. Supports Python 3.12–3.14. Gate policy schema 0.6.0; profiler policy 2.1.0 (manifest_schema_version = 2). v0.6.1 ships the FastAPI HTTP API surface (apmode serve,POST /runs, cancellation lifecycle), Suite C Phase-1 MLE + Bayesian fixtures with weekly NPE scoring, SOTA absorption preview forms, and the RO-Crate projector hardening pass. Reproducibility bundles continue to carry a_COMPLETEsentinel with a SHA-256 digest;apmode validaterefuses unsealed bundles. See CHANGELOG.md for the full release-candidate changes.
A reader-facing snapshot of what is runnable today vs. what is partial / planned. Project-internal phasing (Phase 1 / 2 / 3 from PRD §8) is preserved in docs/PRD_APMODE_v0.3.md; this table is the canonical user-facing version.
| Component | Status | Notes |
|---|---|---|
| Classical NLME (nlmixr2 SAEM/FOCEi) | ✅ Stable | Submission-lane primary; Suite A + Suite C MLE roster |
| Automated structural search | ✅ Stable | Evidence-driven, deterministic budget (≤ 30–50 candidates/run) |
| Reproducibility bundle + RO-Crate projection | ✅ Stable | Sealed _COMPLETE SHA-256; WRROC v0.5 export + ZIP round-trip |
| Bayesian PK (Stan/Torsten via cmdstanpy) | ✅ Available | MCMC Gate 1 thresholds shipped (R̂/ESS/divergences/E-BFMI/Pareto-k); Suite C Bayesian roster (vancomycin Roberts 2011) ships in v0.6 |
| Hybrid mechanistic-NODE (JAX/Diffrax) | ⚠ Available, oral-only | Pooled-NLL training (no per-subject random effects yet); never submission-eligible (PRD §3 hard rule) |
| Agentic LLM backend | ⚠ Available, opt-in | Off by default (--agentic); discovery / optimization lanes only; ≤ 25 typed-transform iterations per run |
| Optimization lane (LORO-CV Gate 2) | ✅ Available | Gate 2 enforces LORO-CV folds, NPDE mean/variance bands, VPC concordance; policies/optimization.json is a versioned artefact; cross-paradigm scoring-contract ranker shipped in the 0.6 release-candidate line |
HTTP API + apmode serve |
✅ Available | Loopback by default; refuses non-loopback without --allow-public; full endpoint contract under HTTP API. Long-poll status (?wait=N) is a documented future enhancement, not in v0.6.1-rc1 |
All numeric badges + status counts are rewritten from the source tree by scripts/sync_readme.py — see Keeping the README honest.
- Python: 3.12, 3.13, or 3.14
- Package manager: uv
- Optional (classical): R 4.4+ with
nlmixr2,rxode2,jsonlite,lotri,mice,missRanger— mock R subprocess tests work without R - Optional (Bayesian): CmdStan 2.36+ via
cmdstanpy.install_cmdstan()for the Stan/Torsten backend - Optional (agentic): An LLM API key (
ANTHROPIC_API_KEY,OPENAI_API_KEY,GEMINI_API_KEY,OPENROUTER_API_KEY) or a local Ollama install - Optional (HTTP API):
uv sync --extra apipulls FastAPI + Uvicorn + aiosqlite. See the HTTP API section for the endpoint contract, run-store semantics, and cancellation lifecycle.
git clone https://github.com/biostochastics/apmode.git
cd apmode
# Core install
uv sync --all-extras
# Verify toolchain (R/nlmixr2, CmdStan, Python deps, LLM keys)
uv run apmode doctor
# Confirm everything is healthy in one line
uv run apmode doctor --json | jq '.components[] | select(.status!="ok")' # empty output = all green
# Verify CLI entry point
uv run apmode --helpAPMODE ingests NONMEM-style CSV through a Pandera schema. To save first-run debugging:
- Required columns —
ID(subject),TIME(numeric),DV(observed concentration),AMT(dose amount; 0 for observation rows),EVID(0 = observation, 1 = dose, 2 = other event). - Optional event fields —
CMT(compartment),RATE(infusion rate; 0 for bolus),ADDL/II(additional dose count + interdose interval),SS(steady-state flag),MDV(missing-DV flag),BLQ(below-LLOQ indicator),DVID(multi-analyte routing — used by FREM and DVID-multiplexed observations). - Categorical covariates — canonical encoding is
{0, 1};{1, 2},M/F,Yes/No,Pos/Neg,Case/Control, etc. are auto-detected and remapped (full table in Missing-Data Handling). Multi-level categoricals must be one-hot encoded by the analyst. - Units — the NCA estimator is unit-aware for the common
mg → ng/mLconvention (see Phase 2 — root candidates). No explicit unit column is required.
Datasets in the public registry (apmode datasets) ship pre-conformed; use them as schema templates.
This is the canonical "first hour with APMODE" — it exercises the full pipeline front-to-back and lands you at a readable regulatory report.
# 1. Browse the registry (14 datasets; 5 real + 9 simulated ground-truth)
uv run apmode datasets
# 2. Download a single-dose theophylline dataset
uv run apmode datasets theo_sd -o ./data
# 3. Explore before you commit to a full run — profiler + NCA + search-space preview
uv run apmode explore ./data/theo_sd.csv
# 4. Full submission-lane pipeline (classical NLME, deterministic budget)
# theo_sd (12 subjects) ≈ 3-5 min wall time at -j 4
# mavoglurant (120 subjects) ≈ 60-90 min; 1000+ subjects scales to several hours
uv run apmode run ./data/theo_sd.csv --lane submission --parallel-models 4 -o ./runs
# 5. Inspect the best candidate, then trace the whole DAG
BUNDLE=$(uv run apmode ls --sort bic --limit 1 --format path)
uv run apmode inspect "$BUNDLE"
uv run apmode log "$BUNDLE" --top 3
uv run apmode graph "$BUNDLE" --format tree
# 6. Open the regulatory report
uv run apmode report "$BUNDLE" # HTML in browser
uv run apmode report "$BUNDLE" --format md | less # Markdown in pager
# 7. Validate the bundle is complete + replayable
uv run apmode validate "$BUNDLE"# Discovery lane — classical + NODE + agentic LLM admissible
uv run apmode run ./data/theo_sd.csv --lane discovery
# Bayesian backend on Discovery (requires bayesian extras + CmdStan)
uv sync --extra bayesian
uv run python -c "import cmdstanpy; cmdstanpy.install_cmdstan()"
uv run apmode run ./data/theo_sd.csv --lane discovery \
--backend bayesian_stan --bayes-chains 4 --bayes-warmup 1000 --bayes-sampling 1000
# Agentic LLM backend (discovery/optimization lanes only; OFF by default)
uv run apmode run ./data/theo_sd.csv --lane discovery --agentic --provider anthropic
# Optimization lane — adds LORO-CV Gate 2
uv run apmode run ./data/theo_sd.csv --lane optimization--resume-agentic skips the multi-hour classical SAEM stage and restarts the LLM loop from the existing classical_checkpoint.json in the bundle directory. Use this after an API 5xx, quota trip, or KeyboardInterrupt:
uv run apmode run ./data/theo_sd.csv --lane discovery --agentic --resume-agentic \
-o ./runs/run_<timestamp>uv run apmode diff ./runs/run_a ./runs/run_b # side-by-side summary
uv run apmode diff ./runs/run_a ./runs/run_b --json # structured diff for CI
uv run apmode lineage "$BUNDLE" <candidate_id> # transform chain root→candidate
uv run apmode trace "$BUNDLE" # agentic iteration summary
uv run apmode trace "$BUNDLE" --iteration 5 # one iteration in detail
uv run apmode trace "$BUNDLE" --cost # token + $ rollup
uv run apmode trace "$BUNDLE" --json # machine-readableuv run apmode policies # list versions for all lanes
uv run apmode policies submission # inspect submission thresholds
uv run apmode policies --validate # CI-grade schema + constraint check
uv run apmode policies --json | jq # machine-readable policy manifestEvery read command honors --json, emitting a stable
{"ok": bool, ...} envelope on stdout (Rich output suppressed) so
CI / jq pipelines have a single contract:
uv run apmode validate "$BUNDLE" --json | jq '.ok'
uv run apmode inspect "$BUNDLE" --json | jq '.gates.gate3'
uv run apmode log "$BUNDLE" --top 5 --json | jq '.candidates[].rank_entry.bic'
uv run apmode datasets --json | jq '.datasets[] | select(.route=="oral") | .name'
uv run apmode doctor --json | jq '.components[] | select(.status!="ok")'
uv run apmode report "$BUNDLE" --json
uv run apmode trace "$BUNDLE" --json
uv run apmode lineage "$BUNDLE" cand_a3f8 --jsonapmode ls and apmode graph use --format {table,path,json} /
--format {tree,dot,mermaid,json} because they offer multiple text
formats; pick whichever is more ergonomic.
apmode run honors the APMODE_* family of env vars when the matching
flag is omitted (each shows up in apmode run --help under [env var: …]):
export APMODE_LANE=discovery
export APMODE_OUTPUT_DIR=./runs
export APMODE_SEED=753849
export APMODE_TIMEOUT=900
export APMODE_PARALLEL_MODELS=4
export APMODE_BACKEND=nlmixr2 # or bayesian_stan
export APMODE_PROVIDER=anthropic # used only with --agentic
export APMODE_AGENTIC_MAX_ITER=15
export APMODE_POLICY=./policies/custom.json # overrides policies/<lane>.json
uv run apmode run ./data/trial.csv # picks up all of the aboveExisting APMODE_POLICIES_DIR (resolved in src/apmode/paths.py)
continues to point the policy loader at an alternative directory.
uv run pytest tests/ -q # <!-- apmode:AUTO:tests -->2581<!-- apmode:/AUTO:tests --> collected
uv run pytest tests/ -q -m "not live" # <!-- apmode:AUTO:tests_nonlive -->2564<!-- apmode:/AUTO:tests_nonlive --> skip live LLM tests
uv run mypy src/apmode/ --strict # type checking
uv run ruff check src/apmode/ tests/ # linting
uv run python scripts/sync_readme.py --check # README ↔ codebase drift guard
# Bayesian smoke test against Boeckmann 1994 theophylline (~5 min)
uv run python scripts/bayesian_smoke_theophylline.pyFormular is APMODE's typed domain-specific language for specifying population PK models. It is the single source of truth for model structure: every model is expressed as a Formular specification, compiled to a typed AST, validated against pharmacometric constraints, and lowered to backend-specific code (nlmixr2 R, Stan, or JAX/Diffrax).
Full reference:
docs/FORMULAR.md— grammar, compilation pipeline, constraint templates, semantic validation rules, extensibility.
Why a dedicated language? (1) Safety boundary for agentic AI — the Phase 3 LLM backend operates exclusively through Formular transforms, it cannot emit raw code. (2) Backend independence — one spec, multiple targets. (3) Reproducibility — the compiled DSLSpec is the serializable unit in the reproducibility bundle. (4) Constraint enforcement — pharmacometric invariants are checked at the language level before any code generation.
model {
absorption: Transit(n=4, ktr=2.0, ka=1.0)
distribution: TwoCmt(V1=30.0, V2=40.0, Q=5.0)
elimination: ParallelLinearMM(CL=2.0, Vmax=50.0, Km=5.0)
variability: {
IIV(params=[CL, V1, ka], structure=block)
CovariateLink(param=CL, covariate=WT, form=power)
}
observation: Combined(sigma_prop=0.1, sigma_add=0.5)
}
Compilation: Formular text → Lark parser (Earley) → DSLTransformer → DSLSpec (Pydantic AST) → Semantic validator → Backend emitter
Entry point: apmode.dsl.grammar.compile_dsl(text) → DSLSpec
| DSL Axis | Supported Modules |
|---|---|
| Absorption | IVBolus, FirstOrder, ZeroOrder, LaggedFirstOrder, Transit(n), MixedFirstZero, NODE_Absorption |
| Distribution | OneCmt, TwoCmt, ThreeCmt, TMDD_Core, TMDD_QSS |
| Elimination | Linear, MichaelisMenten, ParallelLinearMM, TimeVarying(kdecay, decay_fn), NODE_Elimination |
| Variability | IIV (diagonal/block), IOV (ByStudy/ByVisit/ByDoseEpoch/Custom), CovariateLink (power/exponential/linear/categorical/maturation) |
| Observation | Proportional, Additive, Combined, BLQ_M3, BLQ_M4 (with composable error_model) |
Priors (semantic axis; set via SetPrior, no grammar block) |
Normal, LogNormal, HalfNormal, HalfCauchy, Gamma, InvGamma, Beta, LKJ, Mixture, HistoricalBorrowing (Schmidli 2014 robust MAP) |
NODE constraint templates: monotone_increasing, monotone_decreasing, bounded_positive, saturable, unconstrained_smooth — with lane-dependent dim ceilings (≤8 Discovery, ≤4 Optimization, excluded from Submission).
Formular transforms (agentic LLM admissible operations): 10 typed transforms produce new DSLSpec instances — swap_module, add_covariate_link, adjust_variability, set_transit_n, toggle_lag, replace_with_node, set_prior. set_prior is parameterization-schema validated: only HalfNormal/HalfCauchy/Gamma/InvGamma families are admissible on IIV ω and residual σ targets; only Normal/LogNormal/Mixture/HistoricalBorrowing on log-scale structural params; LKJ only on correlation matrices. Invalid pairs are rejected at compile time so the LLM cannot propose nonsense priors. The transform count badge above is auto-synced from src/apmode/dsl/transforms.py + src/apmode/dsl/prior_transforms.py — if the number drifts, scripts/sync_readme.py --check fails CI.
Candidate generation is APMODE's core algorithmic contribution. It is an evidence-driven, two-phase search with a fixed budget that turns a dataset into 30-50 typed model specifications ready for backend fitting. The budget is deterministic (no tunable knobs, no unbounded iteration), pruned by what the profiler observed in the data, and grown once via warm-started children of the best root candidates.
The Data Profiler emits an EvidenceManifest describing what the dataset supports. SearchSpace.from_manifest() translates each flag into allowed modules and constraints:
| Evidence flag | Effect on search space |
|---|---|
nonlinear_clearance_signature=true |
Adds MichaelisMenten + ParallelLinearMM to elimination set |
absorption_complexity="multi-phase" |
Adds Transit(n) + LaggedFirstOrder to absorption set |
absorption_complexity="lag-signature" |
Adds LaggedFirstOrder |
richness_category="sparse" + absorption_coverage="inadequate" |
Removes NODE from backends (data insufficient) |
identifiability_ceiling="low" |
Caps compartment count at 1 |
identifiability_ceiling="medium" |
Caps at 2 compartments |
identifiability_ceiling="high" |
Allows up to 3 compartments |
covariate_burden > 0 |
Tracked for SCM search and Gate 2.5 credibility checks |
blq_burden > policy.blq_m3_threshold |
Selects BLQ method (M7+ below, M3 above) per MissingDataPolicy |
covariate_missingness.fraction > 0 |
Triggers policy-driven MissingDataDirective: MI-PMM / MI-missRanger / FREM |
time_varying_covariates=true |
Prefers FREM over MI-PMM (Nyberg 2024) |
| Lane = Submission | Hard-blocks NODE backends regardless of data |
This is a disqualifying filter, not a weighted score. If the profiler says the data doesn't support 3-cmt estimation, no 3-cmt candidate is generated — saving hours of wasted fitting on unidentifiable models.
Missing data is resolved by a lane-tiered policy (policies/{submission,discovery,optimization}.json → missing_data) that turns the EvidenceManifest into a binding MissingDataDirective. Two independent paths:
Covariate missingness (src/apmode/data/missing_data.py::resolve_directive):
- No missingness →
exclude(no directive action). - Time-varying covariates → FREM (handled in the NLME likelihood — avoids per-occasion imputation and Ω pooling). Rationale: Nyberg 2024, Jonsson 2024.
fraction > frem_preferred_above→ FREM (MI pooling of Ω becomes the dominant error source at high missingness).fraction > mi_pmm_max_missingness(between thresholds) → FREM (prefer single-fit over large-m MI).- Correlated covariates +
missforest_fallback→ MI-missRanger (nonlinear relations; Bräm 2022 concept,missRanger2.6.x ranger-backed implementation). - Otherwise → MI-PMM with
m = policy.m_imputations(Wijk 2025 DiVA).
BLQ handling:
blq_burden > policy.blq_m3_thresholdorblq_force_m3=true→ M3 (likelihood-based; Beal 2001).- Otherwise → M7+ (impute 0 + inflated additive residual error; Wijk 2025).
Lane tiers:
| Lane | m | adaptive | BLQ M3 threshold | Rank penalty |
|---|---|---|---|---|
| Submission | 20 | → 40 | 5% | 0.0 (hard floor only) |
| Discovery | 5 | off | 15% | 0.25 |
| Optimization | 10 | → 20 | 10% | 0.5 |
Imputation stability (Gate 1). When MI is active, each candidate gets an ImputationStabilityEntry (Rubin-pooled OFV/AIC/BIC, convergence_rate, top-K rank_stability, within/between variance proxy, covariate sign consistency, plus per-parameter Rubin pooling — see below). Gate 1 fails candidates with convergence_rate < 0.5 (hard floor) or rank_stability < 1 − policy.imputation_stability_penalty (soft, lane-driven).
Rubin's rules for per-parameter pooling. apmode.search.stability.rubin_pool(estimates, ses, m_total=...) implements the canonical Rubin (1987) decomposition on a single scalar parameter:
Q̄— pooled point estimate (mean of per-imputation estimates).Ū— within-imputation variance (mean of per-imputationSE²).B— between-imputation variance (sample variance of per-imputation estimates).T = Ū + (1 + 1/m) · B— total variance used for inference.dof— Barnard–Rubin degrees of freedom for interval construction.
ImputationStabilityEntry.pooled_parameters carries the full 5-tuple per parameter so downstream reports can quote Rubin-pooled estimates and confidence intervals with the correct variance decomposition (not just the arithmetic mean of OFV/AIC/BIC). The orchestrator populates the input tuples by having _run_mi_stage forward each refit's (estimate, SE) pairs on PerImputationFit.parameter_estimates. When the backend does not emit SEs (e.g., SAEM without a covariance step), the pool degrades gracefully to a between-imputation-only variance.
Agentic LLM cherry-picking guard. The stability manifest is the only per-imputation artifact the LLM sees. When directive.llm_pooled_only=true (default), diagnostic_summarizer.summarize_stability_for_llm substitutes pooled scores + stability scores for raw per-imputation diagnostics — the LLM cannot observe individual imputation draws, so it cannot select transforms that exploit a single lucky imputation.
Ω-pooling caveats. Whenever MI is used, the credibility report automatically appends three Rubin/log-Cholesky/EBE caveats to the limitations block for Gate 2.5 ingestion (ICH M15 alignment).
Categorical covariate format + auto-detection. APMODE expects binary categorical covariates encoded as canonical {0, 1}. Real-world PK data routinely uses other conventions; apmode.data.categorical_encoding.auto_remap_binary_columns recognises and remaps the following automatically (no per-dataset configuration required):
| Native form | Example datasets | Mapping |
|---|---|---|
{0, 1} |
(canonical) | identity, no remap |
{1, 2} 1-indexed integers |
mavoglurant SEX | 1 → 0, 2 → 1 |
{True, False} booleans |
clinical screening flags | False → 0, True → 1 |
M/F, Male/Female (any case) |
warfarin sex | f/female → 0, m/male → 1 |
Yes/No, Y/N |
adherence flags | no/n → 0, yes/y → 1 |
True/False, Pos/Neg, Positive/Negative |
screening | smaller → 0 |
Absent/Present, Control/Case, Placebo/Active |
trial arms | smaller → 0 |
| Unknown two-level string pair | (any) | alphabetic-order default + UserWarning |
Multi-level categorical covariates (>2 distinct values) are not auto-remapped — they need k-1 one-hot binary indicators chosen and named by the analyst. Mixed-type columns (e.g., ["M", 1] mixing strings and numbers) are rejected with an actionable diagnostic.
Override the polarity when the auto-detected one is wrong:
from apmode.dsl.frem_emitter import summarize_covariates
covs = summarize_covariates(
df, ["SEX"],
transforms={"SEX": "binary"},
binary_encode_overrides={"SEX": {"male": 0, "female": 1}}, # opposite polarity
)Or call auto_remap_binary_columns(df, ["SEX"], overrides={...}) directly before passing the DataFrame into the FREM pipeline.
Provenance: Every encoding decision (column, detected kind, applied remap, source = auto/override/no_remap, rationale) is captured in categorical_encoding_provenance.json in the run bundle so reviewers can trace exactly how each raw categorical value was mapped to 0/1.
R harness. Imputation uses src/apmode/r/impute.R, which dispatches to mice::mice(method="pmm") or missRanger::missRanger(num.trees=100, pmm.k=10) — the ranger-backed fast alternative to missForest (Mayer CRAN 2.6.x). Python-side providers are R_MiceImputer and R_MissRangerImputer (src/apmode/data/imputers.py); both are covered by live tests in tests/unit/test_imputers_live.py that spawn real Rscript against installed mice/missRanger and verify (a) imputed CSVs appear, (b) no residual NaN in imputed columns, (c) between-imputation variance proves MI is functioning.
FREM emitter. The FREM path is implemented in src/apmode/dsl/frem_emitter.py and executed via src/apmode/backends/frem_runner.py::run_frem_fit. Public API:
FREMCovariate(name, mu_init, sigma_init, dvid, epsilon_sd, transform, time_varying)— per-covariate metadata.transform="log"for positive/right-skewed covariates (Yngman 2022 conditioning);transform="binary"for 0/1-coded categorical covariates (additive-normal endpoint, the standard categorical-FREM compromise — multi-level categorical covariates are one-hot encoded upstream into k−1 binary indicators).time_varying=Trueemits per-(subject, TIME) observation rows and leavessig_cov_*estimable so it absorbs within-subject variation while the eta continues to capture between-subject variance.summarize_covariates(df, names, transforms=...)— compute baseline (min TIME) mean + SD per covariate; rejects duplicates and covariates with <2 observed subjects.prepare_frem_data(df, covariates)— pivot to DVID-multiplexed long format; adds one observation row per subject per covariate at the subject's baseline time, rejects DVID collisions with any existing multi-analyte scheme.emit_nlmixr2_frem(spec, covariates, initial_estimates=...)— emits the augmented nlmixr2 model function with a joint Ω block (PK IIV etas + covariate etas), estimable covariate means, fixed covariate residuals (fix(...)) so the eta absorbs all BSV, and one observation endpoint per covariate. Routing is data-driven via theDVIDcolumn: nlmixr2 assigns endpoints DVID 1 (PK), 2, 3, … in declaration order, andprepare_frem_datawrites matching DVIDs. No| DVID==Npipe is emitted — nlmixr2 5.0 rejects conditions on the endpoint RHS (verified live 2026-04-14).run_frem_fit(spec_template, df, covariate_names, runner, ...)— composes the above withNlmixr2Runnerfor a single-call FREM fit.
Scope covers static + time-varying subject-level covariates and continuous + log-transformed + binary-categorical encodings. summarize_covariates auto-detects per-covariate time-varying status by checking within-subject variation and sets time_varying=True accordingly. Live end-to-end tests in tests/unit/test_frem_emitter.py::TestFREMLiveIntegration spawn real Rscript + nlmixr2 and verify (a) the emitted model compiles, (b) FOCE-I actually fits the joint Ω and learns a non-degenerate eta.cov.WT variance on tiny synthetic data.
Estimator requirement: FREM requires FOCE-I. nlmixr2 SAEM treats subject-level covariate observations as dynamic sampling targets and collapses the random-effect variance to zero. The FREM runner must be constructed with Nlmixr2Runner(estimation=["focei"]); Orchestrator._run_frem_stage does this by default when no frem_runner is injected.
Orchestrator execution: missing_data_directive.json is written to every run's reproducibility bundle. When the directive resolves to FREM, Orchestrator._run_frem_stage automatically refits the best healthy classical candidate (filtered to backend="nlmixr2" + finite BIC + not ill_conditioned) via run_frem_fit. When the directive resolves to MI-PMM / MI-missRanger, _run_mi_stage freezes the classical candidate set and refits each spec on m imputed datasets (produced by R_MiceImputer / R_MissRangerImputer by default, or an injected mi_provider); aggregate_stability applies Rubin's rules, emits imputation_stability.json, and feeds per-candidate stability entries into Gate 1. Both stages wrap their work in a narrow (BackendError, RuntimeError, ValueError, NotImplementedError) handler with exc_info=True so classical search results survive a missing-data failure.
generate_root_candidates(space, base_params=nca_estimates) enumerates the cross-product of allowed modules, seeded with parameter estimates from per-subject PKNCA-style NCA (adaptive lambda_z selection by curve-stripping, linear-up/log-down AUC integration per Purves 1992, and QC gates on adj-R², AUC extrapolation fraction, and span ratio; subjects failing QC are excluded from the population median, and when ≥50% are excluded APMODE falls back to RunConfig.fallback_estimates — e.g., a dataset card's published_model.key_estimates — or conservative defaults. Per-subject diagnostics are emitted as nca_diagnostics.jsonl in the reproducibility bundle):
for cmt in space.structural_cmt: # e.g., [1, 2, 3]
for abs_type in space.absorption_types: # [FirstOrder, Transit-3, Lagged]
for elim_type in space.elimination_types: # [Linear, MM, ParallelLinearMM]
for error_type in space.error_types: # [Proportional, Combined]
yield DSLSpec(...)
For a typical dataset with rich sampling, nonlinear CL, and high identifiability, this produces ~20-30 root candidates. NCA-seeded initial estimates dramatically reduce SAEM convergence time vs. fitting from arbitrary values.
The NCA estimator is unit-aware: when raw CL is implausibly small (< 0.5 L/h), the DV magnitude looks like ng/mL territory (> 50), and dose_median ≥ 1 (mg-scale human dosing), it applies a x1000 scaling factor — accommodating the common pharmacometric convention of dose-in-mg with concentration-in-ng/mL without requiring users to specify units explicitly. The dose_median guard prevents over-correction of preclinical data (small-animal studies, biologics) where a legitimate low-CL drug with ng/mL concentrations may superficially match the heuristic. The applied scale factor is recorded as _unit_scale_applied in the initial estimates bundle for auditability.
After all root candidates are fit, the SearchEngine selects the top-3 converged roots by BIC and generates warm-started children from each. For each parent, it produces up to 2 children per error model type (proportional, additive, combined), giving a hard ceiling of 3 parents × 3 error types × 2 children = 18 child candidates. Duplicates (deduplicated by model_id) are skipped, so the actual count is usually lower.
Children warm-start from their parent's fitted parameters rather than NCA estimates. For an ka/CL/V 2-cmt parent that converged to ka=0.8, CL=3.2, V1=45, V2=80, Q=9, those values seed the child's initial estimates, cutting SAEM convergence time by ~3-5x. The child's structural modules can differ (e.g., different IIV structure or error model), but the shared structural parameters inherit the parent's fitted values.
Every parent → child relationship is tracked in the Search DAG (candidate_lineage.json), with edge labels like "warm_start_combined" indicating which error model drove the child.
total_budget = phase1_roots + phase3_children
≤ |cmt| × |absorption| × |elimination| × |error_types| + 18
For a typical rich dataset with nonlinear CL and multi-phase absorption:
| Dimension | Typical size |
|---|---|
| Phase 1 roots | 20-30 (cross-product after _build_spec() filters inadmissible combos) |
| Phase 3 children | up to 18 |
| Total | ~30-50 candidates |
Empirically, the benchmark-results table shows Gate-1-entered counts of 25–41 per dataset — consistent with this envelope. The search does not iterate until convergence, apply transforms beyond warm-start children, or have a user-tunable --max-candidates flag. The budget is fully determined by the Evidence Manifest. This makes wall time estimable as total_budget × avg_fit_seconds / max_concurrency.
The DSL supports typed transforms (add_iiv, change_absorption, add_covariate, etc.) documented in docs/FORMULAR.md. These are used by the agentic LLM backend (Phase 3 of APMODE, see Agentic LLM Backend below), which applies them iteratively based on LLM proposals up to --max-iterations. The automated search engine does not apply transforms iteratively — it only does the Phase 1 cross-product + Phase 3 warm-start children. If you want iterative transform-based exploration, enable the agentic backend with apmode run --agentic --lane discovery.
Running on the 120-subject Novartis mavoglurant dataset (nlmixr2data::mavoglurant), the profiler detects: nonlinear CL, rich sampling, 6 potential covariates, inadequate absorption-phase coverage. The resulting search:
- Phase 1 roots: ~20-30 candidates = (1,2,3)-cmt × (FirstOrder, Transit, Lagged) × (Linear, MM, ParallelLinearMM) × (Proportional, Additive, Combined), minus
_build_spec()-filtered combos - Phase 3 children: top-3 converged roots warm-start up to 18 children with varied error models (some dedup, usually 10-15 actual)
- Time per candidate: 2-5 minutes SAEM on 120 subjects, parallelizable up to
max_concurrency - Total wall time: ~60-90 minutes for a single-threaded submission-lane run
- Output:
candidate_lineage.json(DAG edges),search_trajectory.jsonl(per-candidate BIC/OFV/convergence)
- Evidence-driven pruning — no wasted fits on models the data can't identify (saves 10x+ runtime vs. full grid)
- Deterministic budget — reproducible runtime estimates, no runaway loops, no hyperparameters to tune
- Warm-start efficiency — children inherit converged parent parameters (3-5x faster SAEM)
- Auditability — every candidate has a logged origin: either a root (Phase 1) or a warm-start child of a specific parent (Phase 3). Use
apmode lineage <bundle> <candidate>to trace it. - Typed safety boundary — transforms exist as a finite, validated DSL operation set. The automated search uses a restricted subset (error-model variation via warm-start). The full transform set is available to the agentic LLM backend, which cannot emit raw code.
NONMEM CSV ──→ Ingest + Validate ──→ Canonical PK Schema (Pandera)
│
┌─────────┴──────────┐
↓ ↓
Data Manifest Evidence Manifest
(SHA-256, (richness, route,
covariates) CL linearity, BLQ)
│ │
↓ ↓
NCA Estimator Search Space
(CL, V, ka, (dispatch constraints)
multi-dose) │
│ ↓
Formular text ──→ Lark parser ──→ AST ──→ Search Engine
│ ├─ Candidate generation
Semantic validator ├─ Multi-backend dispatch
│ │ ├─ Classical → nlmixr2 (R subprocess)
split_subjects() │ ├─ Bayesian → Stan/Torsten (cmdstanpy)
(k-fold, LORO) │ ├─ NODE → JAX/Diffrax
│ └─ Agentic LLM → Formular transforms only
└─ Pareto frontier + BIC scoring
│
┌────────────┼────────────┐
↓ ↓ ↓
Gate 1: Gate 2: Gate 2.5:
Technical Lane ICH M15
Validity Admissibility Credibility
(CWRES, (identifiability, (COU,
VPC, shrinkage, data adequacy,
R̂/ESS/ NODE exclusion, limitation-
divergences) LORO-CV) risk map)
└────────────┴────────────┘
│
Gate 3: Ranking
(within-paradigm BIC or
cross-paradigm VPC/NPE
+ AUC/Cmax BE composite)
│
↓
Reproducibility Bundle (JSON/JSONL)
| Component | Path | Role |
|---|---|---|
| PK DSL grammar | src/apmode/dsl/pk_grammar.lark |
Full Lark EBNF for PK model specs |
| AST models | src/apmode/dsl/ast_models.py |
Typed Pydantic nodes for all DSL axes |
| Semantic validator | src/apmode/dsl/validator.py |
Constraint table enforcement (PRD §4.2.5) |
| Transforms | src/apmode/dsl/transforms.py + prior_transforms.py |
The 10 typed FormularTransforms — agentic-safe DSL operations |
| nlmixr2 emitter | src/apmode/dsl/nlmixr2_emitter.py |
DSL AST → R code for nlmixr2/rxode2 |
| Stan emitter | src/apmode/dsl/stan_emitter.py |
DSL AST → Stan program (Phase 2+; Bayesian backend) |
| FREM emitter | src/apmode/dsl/frem_emitter.py |
Joint-Ω FREM code for covariate-missingness workflows |
| Data pipeline | src/apmode/data/ |
Ingestion, profiling, NCA estimates, splitting, imputation |
| Dose expansion | src/apmode/data/dosing.py |
ADDL/II expansion, infusion events, event table builder |
| Classical backend | src/apmode/backends/nlmixr2_runner.py |
Async subprocess runner with file-based IPC |
| Bayesian backend | src/apmode/backends/bayesian_runner.py + src/apmode/bayes/harness.py |
Stan/Torsten via cmdstanpy; R̂/ESS/E-BFMI/Pareto-k Gate 1 integration |
| FREM runner | src/apmode/backends/frem_runner.py |
FOCE-I FREM driver for the orchestrator's missing-data stage |
| NODE backend | src/apmode/backends/node_*.py |
Bräm-style hybrid MLP, Diffrax ODE, Optax training, constraint templates |
| Agentic LLM backend | src/apmode/backends/agentic_runner.py |
Closed-loop LLM model improvement via Formular transforms (≤25 iters) |
| LLM providers | src/apmode/backends/llm_providers.py |
Anthropic, OpenAI, Gemini, Ollama, OpenRouter, litellm |
| Predictive-diagnostics helper | src/apmode/backends/predictive_summary.py |
Canonical VPC / NPE / AUC-Cmax-BE builder (single path) |
| Governance | src/apmode/governance/ |
Gates 1/2/2.5/3, cross-paradigm ranking, policy schemas + validator |
| Search engine | src/apmode/search/engine.py |
Multi-backend dispatch, BIC scoring, Pareto frontier |
| Orchestrator | src/apmode/orchestrator/ |
Full pipeline: ingest → profile → NCA → search → gates → bundle → report |
| Bundle emitter | src/apmode/bundle/ |
All reproducibility bundle artifacts per PRD §5 |
| Report generator | src/apmode/report/ |
HTML + Markdown regulatory report with credibility framing |
| Dataset registry | src/apmode/data/datasets.py |
14 public PK datasets from nlmixr2data with auto-fetch |
| Suite C literature loader | src/apmode/benchmarks/literature_loader.py |
YAML → LiteratureFixture → DSLSpec traversal for the Phase-1 MLE + Bayesian benchmark roster |
| Suite C Phase-1 scoring | src/apmode/benchmarks/suite_c_phase1_scoring.py + suite_c_phase1_cli.py |
Pure-Python score_fixture + aggregate_phase1_scorecard (fraction_beats_literature_median ≥ 60% gate, δ = 0.02 win margin); standalone python -m CLI driver consumed by the weekly .github/workflows/suite_c_phase1.yml |
| HTTP API persistence | src/apmode/api/store.py |
RunStore Protocol + SQLiteRunStore (WAL mode, BEGIN IMMEDIATE writes, idempotent startup-sweep of RUNNING rows) — backbone for the FastAPI surface |
| HTTP API surface | src/apmode/api/{app,routes,runs}.py |
build_app() factory + Starlette lifespan; POST/GET/DELETE /runs + /runs/{id}/{status,bundle,rocrate} endpoints; execute_run background-task wrapper that catches CancelledError and writes RunStatus.CANCELLED |
| Subprocess termination | src/apmode/backends/process_lifecycle.py |
terminate_process_group(proc, grace_seconds=5) shared by Nlmixr2Runner + BayesianRunner — SIGTERM → wait → SIGKILL on the child process group, called from the runners' asyncio.CancelledError paths |
| Path resolver | src/apmode/paths.py |
APMODE_POLICIES_DIR env override + pyproject-walk fallback for CLI/orchestrator |
| CLI | src/apmode/cli.py |
Typer CLI with 16 commands (see CLI Reference) |
APMODE enforces a gated funnel — not a weighted sum. Each gate is disqualifying; only survivors advance.
| Gate | Purpose | Checks |
|---|---|---|
| Gate 1 | Technical Validity | Convergence, parameter plausibility, CWRES normality, PIT calibration (replaces bin-level VPC in 0.4.2), condition number, seed stability, split integrity; Bayesian: R̂, bulk/tail ESS, divergences, E-BFMI, Pareto-k |
| Gate 2 | Lane Admissibility | Interpretability, shrinkage, identifiability (profile-likelihood CI), NODE exclusion (Submission lane), LORO-CV (Optimization lane), prior-justification artifact (Bayesian) |
| Gate 2.5 | Credibility Qualification | ICH M15 context-of-use, data adequacy, ML transparency, limitation-risk mapping, operational qualification |
| Gate 3 | Ranking | Within-paradigm BIC; cross-paradigm VPC concordance + NPE + AUC/Cmax BE composite (Borda or weighted sum) |
Gate thresholds are versioned policy artifacts in policies/*.json (schema v0.6.0) — not hard-coded constants.
The Gate 1 calibration check is PIT / NPDE-lite on the posterior-predictive simulation matrix, replacing the bin-level VPC coverage check used through 0.4.1. The earlier metric was invariant to fit quality on sparse real data — coverage per percentile band quantized at 1/n_bins (typically 8 bins), so submission-lane tolerances admitted only {1.0, 7/8} and any band at 6/8=0.75 auto-failed (see CHANGELOG rc9 follow-up "PIT/NPDE-lite Gate 1 calibration" for the failure mode that triggered the redesign).
Methodology. For each observation j on subject i and each probability level p ∈ {0.05, 0.50, 0.95}, compute the simulated predictive p-quantile q_p(i,j) = nanpercentile({y_sim[s,i,j]}, 100·p) from the (n_sims, n_obs_i) matrix, form the CDF indicator I_p(i,j) = 𝟙[y_obs(i,j) ≤ q_p(i,j)], and aggregate subject-robustly as c_p = mean_i(mean_j I_p(i,j)). Under correct predictive calibration c_p → p. The gate fails if
|c_p − p| > tol(p, n_subjects)
tol(p, n) = max( floor_{tail|med}, z_alpha · sqrt(p(1−p) / n_subjects) )
The tolerance is n-scaled so the SE-coverage stays constant across dataset sizes; the floor prevents large-n vacuous strictness (at n_subj ≈ 10 000 the z · SE term would otherwise demand a ~0.003 tolerance and false-reject perfectly-calibrated models on sampling noise alone). Effective sample size is n_subjects (outer-mean denominator under subject-robust aggregation) rather than n_observations — governance-conservative because within-subject observations are strongly correlated in population-PK, so using n_observations would understate variance.
Tails (p ∈ (0, 0.25] ∪ [0.75, 1)) use a smaller floor because tail miscalibration is the most diagnostic signal of variability-component misspecification (residual-error magnitude, IIV distribution); the sqrt(p(1−p)) factor also gives the tails a tighter natural scale (≈0.22 vs ≈0.50 at the median).
Lane calibration. z_alpha sets the SE multiplier per lane: submission 1.5 (strictest, ~87% per-band CI), optimization 2.0, discovery 2.5 (widest — reflects NODE/agentic variance). Floors track the same ordering: submission {0.03, 0.05}, optimization {0.04, 0.07}, discovery {0.05, 0.10}. VPCSummary is still written to bundles for reports and within-paradigm concordance ranking, but is descriptive — not a Gate 1 gate.
Alignment. This is a trimmed NPDE family metric (Brendel 2006; Comets 2008) with subject-robust aggregation, avoiding the decorrelation cost of full NPDE while preserving the pass/fail semantics at three PIT levels.
APMODE routes work through separate pipelines with different admissible backends and evidence standards:
| Lane | Purpose | Admissible Backends | NODE Eligible? | LORO-CV |
|---|---|---|---|---|
| Submission | Regulatory-grade models | Classical NLME (+ Bayesian for prior-informed fits) | No (hard rule) | Not required |
| Discovery | Broad exploration | Classical + Bayesian + NODE + Agentic LLM | Yes (Gate 2.5 required) | Not required |
| Optimization | LORO-CV prediction (Phase 3) | All backends | Yes | Required (Gate 2) |
Simulated PK datasets with known ground truth (PRD §5). Requires R 4.4+ with rxode2.
Rscript benchmarks/suite_a/simulate_all.R [output_dir]| Scenario | Model | Key Test |
|---|---|---|
| A1 | 1-cmt oral, first-order abs, linear elim | Structure identification |
| A2 | 2-cmt IV, parallel linear+MM elim | Compartment count + nonlinear CL |
| A3 | Transit (n=3), 1-cmt, linear elim | Transit chain detection |
| A4 | 1-cmt oral, MM elimination | Nonlinear clearance detection |
| A5 | TMDD quasi-steady-state (SC mAb) | TMDD vs. 2-cmt confusion |
| A6 | 1-cmt oral + allometric WT + renal covariate | Covariate structure recovery |
| A7 | 2-cmt + NODE saturable absorption | NODE shape recovery + surrogate fidelity |
| A8 | 1-cmt oral + time-varying CL + CRCL covariate | Time-varying clearance + covariate recovery |
| Scenario | Test | Assertion |
|---|---|---|
| B1 | NODE absorption recovery | Mock NODE fit passes Gate 1+2 Discovery |
| B2 | Sparse data + NODE dispatch | Lane Router blocks NODE when data insufficient |
| B3 | Cross-paradigm ranking | Gate 3 correctly ranks mixed nlmixr2 + jax_node candidates |
Suite C anchors APMODE against published, peer-reviewed reference parameterisations and selected ground-truth reference datasets. Each fixture pairs a NONMEM-style CSV (resolved via dataset_id against benchmarks/datasets/registry.yaml) with a DSLSpec JSON and a LiteratureFixture YAML carrying the reference parameter values, citation/DOI when available, and a parameterization mapping that translates published symbol names (e.g. TVCL) into APMODE's DSL-canonical names (e.g. CL).
Phase 1 MLE roster (apmode.benchmarks.literature_loader.PHASE1_MLE_FIXTURE_IDS):
dataset_id |
Route | DSL skeleton | Reference DOI |
|---|---|---|---|
theophylline_boeckmann_1992 |
oral | 1-cmt + FO ka | 10.1002/psp4.12471 (Schoemaker et al. 2019 tutorial grid containing the Theoph fit) |
warfarin_funaki_2018 |
oral | 1-cmt + lagged-FO ka | 10.1002/psp4.12445 |
mavoglurant_wendling_2015 |
oral | 2-cmt + FO ka | 10.1007/s11095-014-1574-1 |
phenobarbital_grasela_1985 |
iv_bolus | 1-cmt | 10.1159/000457062 |
oral_1cpt_acop_2016 |
oral | 1-cmt + FO ka | 10.32614/CRAN.package.nlmixr2data |
gentamicin_germovsek_2017 |
iv_bolus | 1-cmt | 10.1128/AAC.02659-16 |
schoemaker_nlmixr2_tutorial |
iv_bolus | 1-cmt | 10.1002/psp4.12471 |
Phase 1 Bayesian roster — vancomycin Roberts 2011 (10.1128/AAC.01708-10) ships in v0.6 with weakly-informative log-Normal priors on log-CL/log-V centred on the published typical values (4.6 L/h, 105 L) and half-Normal priors on the BSV/residual SDs. Eleveld propofol is deferred per the docs/discovery/eleveld_propofol_coverage.md NO-GO assessment (DSL gaps: derived FFM covariate, Stan-side maturation, age-decay piecewise on CL).
The integration tests (tests/integration/test_suite_c_phase1_mle.py, tests/integration/test_suite_c_bayesian.py) verify each fixture loads, validates against its target lane, lowers cleanly to nlmixr2 R or Stan code, and carries a Crossref-canonical DOI. The full short-fit recovery test (warmup=200, sampling=200, chains=2 vs 8-subject shrink data) is gated behind @pytest.mark.slow for the weekly CI workflow.
The scoring harness — fraction-beats-literature-median ≥ 60% with δ=0.02 win margin and 5-fold subject-level CV — runs in honest mode: the live-fit driver python -m apmode.benchmarks.suite_c_phase1_runner writes a disjoint train/test CSV pair per fold, fits the APMODE side on the train CSV with posterior-predictive sims routed at the held-out fold (Nlmixr2Runner.run(..., test_data_path=test_csv) → rxode2::rxSolve(events=test_df)), and fits the literature side at the published parameter values via est='posthoc' (fixed_parameter=True → harness freezes THETA/OMEGA/SIGMA at the compiled ini() values and only estimates ETAs). The reported NPE on each side is therefore true held-out generalisation; the gate is a methodology-drift detector rather than a goodness-of-fit detector. The weekly CI workflow at .github/workflows/suite_c_phase1.yml consumes the resulting phase1_npe_inputs.json via suite_c_phase1_cli.py.
Ten infrastructure fixes from the v0.6.1 honest-mode bring-up keep this pipeline robust on every canonical fixture: (1) Nlmixr2Runner.run pre-adapts the on-disk CSV through apmode.data.adapters.to_nlmixr2_format (NMID→ID rename + DVID PK-row filter + string-categorical remap + _PK_PARAM_COLLISION_COLUMNS strip for ACOP-style simulator metadata V/CL/KA/DOSE/SD) and points the harness at the adapted copy in the per-fit scratch directory — without it, FOCEI silently entered a "Theta reset (ETA drift)" loop on small folds and rxode2's posterior-predictive sims got hijacked by the simulator's true-param columns; r/harness.R::.normalize_id_column keeps an idempotent NMID→ID rename as defence in depth. (2) NCAEstimator shares the runner's PK_DVID_ALLOWLIST to filter mixed-endpoint datasets (warfarin's DVID="cp"/"pca") before per-subject NCA, and a new data-driven fallback (V = Dose_geo / Cmax_geo, CL = Dose_geo / AUC_obs_geo, ka = 2.5 / Tmax_geo with log-normal floors/caps) precedes the conservative hard-coded defaults — the cascade is nca → dataset_card → data_driven → defaults. (3) The NPE residual scaling is driven by spec.observation (compute_npe(error_model="proportional"|"combined"|"additive")) so an ng/mL-scaled fixture (mavoglurant) and an mg/L-scaled fixture (theo) on the same Proportional model produce comparable NPE values — without this wire the rc8 raw-MedAE path inflated NPE by ~3 OoMs purely from the unit difference. (4) r/harness.R::.simulate_posterior_predictive wraps each per-sim row + the per-subject t_observed/observed_dv vectors in I(...) so jsonlite::toJSON(auto_unbox = TRUE) preserves the array shape on single-observation subjects (sparse fixtures like pheno_sd); a defence-in-depth field_validator on PredictedSimulationsSubject coerces the flat list[float] and bare-scalar shapes if the R-side fix ever regresses. (5) Nlmixr2Runner._spawn_r races a os.kill(pid, 0)-based watchdog against the drain gather so an orphaned gcc/clang grandchild holding inherited stdout/stderr FDs cannot block proc.wait() past the per-fit budget. (6) CanonicalPKSchema.SS accepts the ACOP-style 99 "not applicable" sentinel (in addition to the standard 0/1/2). (7) The harness filters the rxSolve output by evid (lowercase, modern rxode2) AND EVID (legacy uppercase) so EVID=2 reset rows do not inflate the per-sim row count past n_obs. (8) The harness uses the simulator's actual covered TIMES as the per-subject time axis and positional-matches via match() so partial rxSolve coverage (e.g. trailing DV=0 observations rxode2 truncates) does not silently mark every replicate as failed. (9) apmode.data.adapters._PK_PARAM_COLLISION_COLUMNS strips simulator-truth columns (V, CL, KA, Q, V1/V2/V3, KM, VM, MTT, KTR, KE, F1, TLAG, plus ACOP metadata DOSE, SD) — these names are never legitimate NONMEM event-table columns, so stripping is always safe. (10) Undefined CWRES (nlmixr2 fits that converge numerically but produce all-NaN residuals — Suite-B b8_mavoglurant_null_covariates is the canonical case) propagates as Optional[float] = None end-to-end (per ICH M15 §3 and Karlsson 2007): the harness emits JSON null via .finite_or_null, GOFMetrics.{cwres_mean,cwres_sd,outlier_fraction} accept None, Gate 1 fails-closed on the None ("cannot affirm structural validity without a usable diagnostic"), the cross-paradigm ranker's compute_cwres_npe_proxy returns +inf so a diagnostically-incomplete fit ranks worst (not best) under the lower-is-better ordering, LORO-CV / diagnostic_summarizer / report.renderer render the gap as the literal "unavailable" rather than fabricating a misleading "0.0000" entry. The earlier silent 0/1 fallback approach is reverted because it would have let degenerate fits pass Gate 1 ranking as "perfectly diagnosed" — exactly the failure mode regulatory diagnostics protocols explicitly forbid.
End-to-end Phase-1 on the five open nlmixr2data fixtures (theophylline / warfarin / mavoglurant / phenobarbital / Oral_1CPT) returned all 50 fits with status=success in the v0.6.1 reference run. The dimensionless-NPE scorecard:
| Fixture | n_subj | NPE APMODE | NPE Literature | APMODE wins (δ ≥ 0.02) |
|---|---|---|---|---|
theophylline_boeckmann_1992 |
12 | 0.186 | 0.178 | ✗ (-4%) |
warfarin_funaki_2018 |
32 | 0.232 | 0.227 | ✗ (-2%) |
mavoglurant_wendling_2015 |
120 | 0.408 | 0.993 | ✓ (+59%) |
phenobarbital_grasela_1985 |
59 | 0.251 | 0.270 | ✓ (+7%) |
oral_1cpt_acop_2016 |
120 | 0.263 | 0.259 | ✗ (-2%) |
fraction-beats-literature-median = 40% (2/5), below the 60% target. The three losses are all within the δ=0.02 win-margin Monte-Carlo noise band; the mavoglurant win (59% better) is the methodology-improvement signal. Oral_1CPT is a simulated ground-truth-recovery fixture where the literature side is fitting to the simulator's exact typical values, so a near-tie is the design expectation. Raw inputs are committed at benchmarks/suite_c/phase1_npe_inputs.json and the weekly CI workflow re-derives them via python -m apmode.benchmarks.suite_c_phase1_runner.
Full apmode run pipeline on all Suite A fixtures plus three real
public datasets. Driver: scripts/run_full_benchmark.sh; bundles land
in benchmarks/runs/full-<timestamp>/.
Provenance — APMODE <!-- apmode:AUTO:version_tag -->v0.6.1-rc1<!-- apmode:/AUTO:version_tag -->, macOS 14 / Linux x86_64, Python 3.12, R 4.4.1 (nlmixr2 4.0.1, rxode2 4.0.0), CmdStan 2.36 where applicable. Reproduce with:
bash scripts/run_full_benchmark.sh ./benchmarks/runs/full-$(date -u +%Y%m%dT%H%M%SZ)Gate funnel — 10/10 scenarios produced a recommended candidate.
| Dataset | Lane | Gate 1 | Gate 2 | Recommended | Best BIC |
|---|---|---|---|---|---|
| a1_1cmt_oral_linear | submission | 19 / 34 | 19 / 19 | ✓ | 2848.3 |
| a2_2cmt_iv_parallel_mm | discovery | 8 / 41 | 8 / 8 | ✓ | 4396.7 |
| a3_transit_1cmt_linear | submission | 9 / 34 | 9 / 9 | ✓ | 2623.2 |
| a4_1cmt_oral_mm | discovery | 4 / 41 | 4 / 4 | ✓ | 3113.3 |
| a5_tmdd_qss | discovery | 7 / 34 | 7 / 7 | ✓ | 4858.8 |
| a6_1cmt_covariates | submission | 17 / 34 | 17 / 17 | ✓ | −317.5 |
| a7_2cmt_node_absorption | discovery | 22 / 34 | 22 / 22 | ✓ | 807.4 |
| warfarin (nlmixr2data) | submission | 19 / 33 | 19 / 19 | ✓ | 960.1 |
| theo_sd (nlmixr2data) | submission | 10 / 25 | 8 / 10 | ✓ | 396.3 |
| mavoglurant (nlmixr2data) | discovery | 4 / 37 | 4 / 4 | ✓ | 28906.9 |
| Total | 119 / 347 (34%) | 117 / 119 (98%) | 10 / 10 |
Gate 1 rejects non-converged candidates and those failing plausibility, CWRES normality, or trajectory validity. Gate 2 (submission lane) applies shrinkage ≤ 30%, identifiability, and reproducible-estimation checks. Gate 2 now passes 98% of Gate 1 survivors across all scenarios — the log-space plausibility fix and shrinkage unit fix (percentage → fraction) resolved the two systematic false-positive rejection sources.
Every bundle passes apmode validate and emits the v2 structured
nonlinear_clearance_signals record with full provenance (algorithm,
citation, policy_key, threshold, observed value + 90% CI, eligibility
reason, vote). apmode inspect <bundle> renders the per-signal table;
apmode lineage <bundle> <candidate_id> traces the transform DAG.
2581 tests collected (2564 non-live) across multiple strategies — all counts auto-synced by scripts/sync_readme.py:
uv run pytest tests/unit/ -q # unit tests
uv run pytest tests/integration/ -q # integration tests
uv run pytest tests/property/ -q # Hypothesis property-based tests
uv run pytest tests/golden/ -q # syrupy golden master snapshots
uv run pytest tests/ --snapshot-update # update snapshots after emitter changes| Directory | Coverage |
|---|---|
tests/unit/ |
All modules: DSL, data, backends (classical/Bayesian/NODE/agentic), search, governance, routing, bundle, report |
tests/integration/ |
Mock R pipeline, Discovery lane, LLM providers, Suite A/B/C E2E, BLQ flows |
tests/property/ |
Hypothesis: DSL round-trip, transforms, LORO split invariants |
tests/golden/ |
Syrupy snapshots for emitter output |
tests/fixtures/ |
Benchmark Suite A CSVs + stored policies |
- Smith 2000 dose-proportionality power model: Smith BP, Vandenhende FR, DeSante KA, Farid NA, Welch PA, Callaghan JT, Forgue ST (2000). Confidence interval criteria for assessment of dose proportionality. Pharm Res 17(10):1278-1283. doi:10.1023/a:1026451721686
- Huang 2025 λz selector + initial-estimates pipeline: Huang Z, Fidler M, Lan M, Cheng I-L, Kloprogge F, Standing JF (2025). An automated pipeline to generate initial estimates for population pharmacokinetic base models. J Pharmacokinet Pharmacodyn 52:60. doi:10.1007/s10928-025-10000-z
- Wagner-Nelson absorption: Wagner JG, Nelson E (1963). Percent absorbed time plots derived from blood level and/or urinary excretion data. J Pharm Sci 52(6):610-611. doi:10.1002/jps.2600520627
- Richardson 2025 popPK automation: Richardson S, Irurzun-Arana I et al. (2025). A machine learning approach to population pharmacokinetic modelling automation. Commun Med 5:327. doi:10.1038/s43856-025-01054-8
- Pharmpy AMD: Chen X, Nordgren R, Belin S, Hamdan A, Wang S, Huang Z, Hooker AC, Karlsson MO (2024). A fully automatic tool for development of population pharmacokinetic models. CPT Pharmacometrics Syst Pharmacol 13(10):1784-1797. doi:10.1002/psp4.13222
- BLQ M3 likelihood: Beal SL (2001). Ways to fit a PK model with some data below the quantification limit. J Pharmacokinet Pharmacodyn 28(5):481-504. doi:10.1023/a:1012299115260
- BLQ method comparison: Ahn JE, Karlsson MO, Dunne A, Ludden TM (2008). Likelihood-based approaches to handling data below the quantification limit using NONMEM VI. J Pharmacokinet Pharmacodyn 35(4):401-421. doi:10.1007/s10928-008-9094-4
- SCM covariate selection: Wählby U, Jonsson EN, Karlsson MO (2002). Comparison of stepwise covariate model building strategies in population pharmacokinetic-pharmacodynamic analysis. AAPS PharmSci 4(4):E27. doi:10.1208/ps040427
- Covariate model power: Ribbing J, Jonsson EN (2004). Power, selection bias and predictive performance of the population pharmacokinetic covariate model. J Pharmacokinet Pharmacodyn 31(2):109-134. doi:10.1023/B:JOPA.0000034404.86036.72
- TMDD full binding: Mager & Jusko (2001), J Pharmacokinet Pharmacodyn 28:507-532
- TMDD QSS: Gibiansky et al. (2008), J Pharmacokinet Pharmacodyn 35:573-591
- Transit compartments: Savic et al. (2007), J Pharmacokinet Pharmacodyn 34:711-726
- Allometric scaling: Anderson & Holford (2008), Clin Pharmacokinet 47:455-467
- Rank-normalized R̂ + ESS: Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C (2021). Rank-normalization, folding, and localization: an improved R̂ for assessing convergence of MCMC. Bayesian Analysis 16(2):667-718
- FREM: Jonsson EN, Nyberg J (2024). Full Random Effects Models (FREM): A practical usage guide. CPT Pharmacometrics Syst Pharmacol 13(8):1297-1308. doi:10.1002/psp4.13190
- Robust MAP / Historical Borrowing: Schmidli H, Gsteiger S, Roychoudhury S, O'Hagan A, Spiegelhalter D, Neuenschwander B (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics 70(4):1023-1032. doi:10.1111/biom.12242
- NCA: PKNCA-style curve-stripping for terminal lambda_z (
pk.calc.half.life, adjusted R² with most-points tiebreak), linear-up/log-down AUC integration (Purves 1992,pk.calc.auc). Reference: Purves (1992) J Pharmacokin Biopharm 20:211; PKNCA vignettes at https://humanpred.github.io/pknca/
Every threshold used by apmode.data.profiler is stored in a versioned
policy artifact so runs are reproducible without touching Python source:
- Source of truth:
policies/profiler.json - Loader:
src/apmode/data/policy.py—get_policy()returns a typed frozenProfilerPolicydataclass;policy_sha256is embedded in everyEvidenceManifest. - Current version:
profiler/v<!-- apmode:AUTO:policy_profiler -->2.1.0<!-- apmode:/AUTO:policy_profiler -->(manifest_schema_version = <!-- apmode:AUTO:profiler_manifest -->2<!-- apmode:/AUTO:profiler_manifest -->; structurednonlinear_clearance_signalsreplaces flat scalars/masks) - Derivation & citations:
docs/PROFILER_REFINEMENT_PLAN.md— one paragraph per policy group tying each default to a primary source. - Drift guard:
tests/unit/test_profiler_policy_consistency.py— enforces JSON↔dataclass↔constants equality and AST-scans drift-prone heuristic functions for bare numeric literals.
| Group | Parameter | Default | Purpose |
|---|---|---|---|
| Smith 2000 dose-proportionality | theta_low |
0.80 | Lower bound of bioequivalence-style exposure interval |
theta_high |
1.25 | Upper bound | |
min_dose_levels |
3 | Minimum distinct dose levels required for eligibility | |
min_dose_ratio |
3.0 | Minimum ratio of highest to lowest dose | |
| Huang 2025 λz selector | min_points |
3 | Minimum points in the terminal-fit window |
tolerance |
1e-4 | Adjusted-R² tie-break tolerance | |
phoenix_constraint |
true | Restrict window to second half of post-Cmax (falls back to min_points on sparse profiles) |
|
adj_r2_threshold |
0.7 | Minimum acceptable terminal-fit quality | |
| Steady-state check | n_half_lives_required |
3 | Elapsed duration in half-lives for SS (87.5% attainment) |
n_doses_alt |
5 | Alternative dose-count path to SS | |
interval_tolerance |
0.25 | ±25% inter-dose variation | |
dose_tolerance |
0.20 | ±20% dose amount variation | |
min_doses |
3 | Minimum doses required in any SS branch | |
| Nonlinear clearance | mm_curvature_ratio |
1.8 | Early/late slope ratio triggering MM vote |
tmdd_curvature_ratio |
0.3 | Inverse ratio triggering TMDD vote | |
compartmentality_curvature_ratio |
1.3 | Separate threshold for 2-cmt linear (distinct from MM) | |
terminal_monoexp_r2_linear_threshold |
0.85 | Monoexp R² above this is evidence for linear PK | |
| Covariate | correlation_threshold_abs_r |
0.7 | Inter-covariate |
missingness_full_information_cutoff |
0.15 | Fraction above → full-information likelihood | |
| Error model | blq_m3_trigger |
0.10 | BLQ fraction triggering M3 |
dynamic_range_proportional |
50.0 | Cmax_p95/p05 triggering proportional error | |
high_cv_ceiling |
80.0 | Suppresses proportional recommendation at extreme CV% | |
lloq_cmax_combined |
0.05 | LLOQ/Cmax median triggering combined | |
terminal_log_mad_combined |
0.35 | Terminal log-residual MAD triggering combined | |
narrow_range_additive |
5.0 | Dynamic range below this + low CV → additive error plausible | |
low_cv_additive_ceiling |
30.0 | DV CV% ceiling for additive recommendation (biomarker-like) | |
| NODE readiness | dim_budget.discovery |
8 | NODE dim cap for Discovery lane |
dim_budget.optimization |
4 | NODE dim cap for Optimization lane | |
min_subjects.{discovery,optimization} |
20 / 10 | Minimum subjects per lane | |
min_median_samples.{discovery,optimization} |
8 / 4 | Minimum median samples/subject | |
| Flip-flop | ka_lambdaz_ratio_likely |
1.0 | ka < λz → flip-flop "likely" |
ka_lambdaz_ratio_possible |
1.5 | ka < 1.5·λz → "possible" |
|
quality_adj_r2_min |
0.85 | Strict terminal adj-R² required to retain "likely" (Richardson 2025) |
|
quality_min_npts |
4 | Minimum median terminal-window size for the strict-quality check | |
| TAD consistency | in_window_fraction_clean |
0.80 | Fraction of obs within union of per-dose intervals |
| Protocol heterogeneity | obs_per_subject_cv_threshold |
0.5 | Across-study CV of obs/subject above this flags pooled-heterogeneous |
| DVID filter | pk_dvid_allowlist |
["cp", "1", "conc", "concentration"] | Accepted observation DVIDs |
fail_open_when_no_match |
true | Keep all rows and WARN (vs raise ValueError) when allowlist matches nothing | |
| Shape detection | multi_peak_fraction_threshold |
0.3 | Fraction of subjects with ≥2 prominent peaks |
lag_signature_fraction_threshold |
0.5 | Fraction with delayed-onset absorption | |
peak_prominence_range_fraction |
0.10 | Prominence as fraction of dynamic range | |
peak_prominence_cmax_floor |
0.05 | Prominence floor as fraction of Cmax | |
peak_min_distance_intervals |
2.0 | Minimum inter-peak distance in sampling intervals | |
| Subject quality | min_concs_for_profile |
5 | Minimum per-subject observations for shape analysis |
min_subjects_for_median |
4 | Minimum subjects for population-median statistics | |
min_subjects_for_dynamic_range |
10 | Minimum positive-Cmax subjects for p95/p05 ratio | |
min_obs_per_subject_moderate |
4 | Richness band: sparse / moderate boundary | |
min_obs_per_subject_rich |
8 | Richness band: moderate / rich boundary | |
absorption_coverage_min_pre_tmax |
2.0 | Mean pre-Tmax samples for "adequate" coverage | |
elimination_coverage_min_post_tmax |
3.0 | Mean post-Tmax samples for "adequate" coverage |
Profiler fields populate the EvidenceManifest — they are advisory
inputs that shape dispatch (Lane Router) and ranking (Gate 3). None of
them on their own remove a candidate. Hard disqualification lives in
the Gate policies (policies/submission.json, discovery.json,
optimization.json) and is evaluated by governance/gates.py.
The Huang-2025 lambdaz_adj_r2_threshold = 0.7 is used as a
quality floor (advisory) inside _assess_flip_flop_risk: below
this, the profiler returns "unknown" because the terminal fit is
too poor to support any flip-flop call. The stricter
flip_flop.quality_adj_r2_min = 0.85 is required to retain a
"likely" call — a lower-quality-but-still-usable fit downgrades to
"possible".
Gate policies are lane-specific, versioned (currently 0.6.0), and discoverable via
apmode policies <lane>. The Pydantic schema lives in
src/apmode/governance/policy.py
and is validated by the CI hook
src/apmode/governance/validate_policies.py.
| Gate | Parameter | Submission | Discovery | Optimization | Purpose |
|---|---|---|---|---|---|
| Gate 1 — Technical validity | cwres_mean_max |
0.10 | 0.15 | 0.12 | Max magnitude of population-level CWRES mean |
outlier_fraction_max |
0.05 | 0.08 | 0.06 | Max fraction of |CWRES| > 4 outliers | |
pit_z_alpha |
1.5 | 2.5 | 2.0 | SE multiplier for n-scaled PIT tolerance; higher = wider window (more permissive). Submission lane is strictest | |
pit_tol_tail_floor / pit_tol_median_floor |
0.03 / 0.05 | 0.05 / 0.10 | 0.04 / 0.07 | Absolute floor on the n-scaled tolerance: tol = max(floor, z_α·sqrt(p(1-p)/n_subjects)). Binds at large n to prevent vacuous strictness (0.4.2 replaces bin-level VPC) |
|
vpc_coverage_target / _tolerance |
0.90 / 0.10 | 0.90 / 0.15 | 0.90 / 0.12 | Legacy — retained in policy schema for compatibility; no longer consumed by Gate 1 (0.4.2) | |
seed_stability_n |
3 | 3 | 3 | Required seed replicates (top-K candidates only) | |
seed_stability_cv_max |
0.10 | 0.10 | 0.10 | Max CV of OFV across seeds | |
obs_vs_pred_r2_min |
0.30 | 0.30 | 0.30 | Minimum R² for state-trajectory validity | |
cwres_sd_min / _max |
0.50 / 2.0 | 0.50 / 2.0 | 0.50 / 2.0 | CWRES SD should hug 1.0 | |
gradient_norm_max |
100.0 | 100.0 | 100.0 | Convergence gradient norm ceiling | |
Bayesian rhat_max |
1.01 | 1.01 | 1.01 | Vehtari 2021 rank-normalized R̂ | |
Bayesian ess_bulk_min / _tail_min |
400 / 400 | 400 / 400 | 400 / 400 | Vehtari 2021 ESS floor | |
Bayesian n_divergent_max |
0 | 0 | 0 | HMC divergence ceiling | |
Bayesian ebfmi_min |
0.30 | 0.30 | 0.30 | E-BFMI floor (Betancourt 2016) — flags pathological energy mixing | |
Bayesian pareto_k_max |
0.70 | 0.70 | 0.70 | PSIS-LOO Pareto-k ceiling (Vehtari 2017) | |
| Gate 2 — Lane admissibility | interpretable_parameterization |
required | not_required | preferred | Blocks NODE in Submission |
shrinkage_max |
0.30 | null | 0.30 | Max η-shrinkage (null = no threshold) | |
identifiability_required |
true | false | true | Profile-likelihood CI + condition number check | |
node_eligible |
false | true | true | Hard Submission exclusion (PRD §3) | |
loro_required |
false | false | true | LORO-CV required for Optimization | |
LORO loro_npde_mean_max |
— | — | 0.3 | NPE pool-mean ceiling | |
LORO loro_npde_variance_{min,max} |
— | — | 0.5 / 1.5 | NPE pool-variance bounds | |
LORO loro_vpc_coverage_min |
— | — | 0.80 | VPC coverage concordance floor | |
LORO loro_min_folds |
— | — | 3 | Minimum regimen folds | |
| Gate 2.5 — Credibility (ICH M15) | context_of_use_required |
n/a | true | true | COU statement present |
limitation_to_risk_mapping_required |
n/a | false | true | Risk-mapped limitations | |
data_adequacy_required |
n/a | true | true | n_obs / n_params check | |
data_adequacy_ratio_min |
n/a | 5.0 | 5.0 | Minimum obs/params ratio | |
sensitivity_analysis_required |
n/a | false | true | Sensitivity artifact present | |
ai_ml_transparency_required |
n/a | true | true | Statement required for NODE/agentic | |
| Gate 3 — Cross-paradigm composite | gate3.composite_method |
weighted_sum |
borda |
borda |
Aggregation: "weighted_sum" or "borda" (rank-based, scale-invariant) |
gate3.vpc_weight |
0.5 | 0.5 | 0.35 | VPC coverage concordance weight | |
gate3.npe_weight |
0.5 | 0.5 | 0.35 | NPE (simulation if available, else CWRES proxy) weight | |
gate3.bic_weight |
0.0 | 0.0 | 0.0 | BIC off by default (PRD §10 Q2 — likelihoods incomparable cross-paradigm) | |
gate3.auc_cmax_weight |
0.0 | 0.0 | 0.30 | Smith 2000 AUC/Cmax BE fraction weight (Optimization-only until backends ship posterior-predictive sims) | |
gate3.auc_cmax_nca_max_blq_burden |
0.20 | 0.20 | 0.20 | BLQ% above which observed-NCA reference is ineligible pooled (Thway 2018) | |
gate3.auc_cmax_nca_min_eligible |
8 | 8 | 8 | Min per-subject NCA-eligible count to emit auc_cmax_be_score |
|
gate3.auc_cmax_nca_min_eligible_fraction |
0.5 | 0.5 | 0.5 | Min eligible / total fraction — AND-combined with absolute floor | |
gate3.n_posterior_predictive_sims |
500 | 500 | 500 | Backend posterior-predictive draws per candidate (Bergstrand 2011 VPC convention) | |
gate3.vpc_n_bins |
10 | 10 | 10 | Post-hoc time bins for VPC coverage aggregation (no pre-declared grid) | |
gate3.npe_cap |
5.0 | 5.0 | 5.0 | Weighted-sum only: NPE value above this clamps to 1.0 | |
gate3.bic_norm_scale |
1000.0 | 1000.0 | 1000.0 | Weighted-sum only: divisor for BIC normalization | |
vpc_concordance_target |
0.90 | 0.90 | 0.90 | Target coverage for concordance score | |
| Missing data | mi_pmm_max_missingness |
0.20 | 0.30 | 0.25 | Above → FREM preferred |
frem_preferred_above |
0.25 | 0.40 | 0.30 | Explicit FREM trigger | |
m_imputations |
20 | 5 | 10 | Default MI multiplicity | |
adaptive_m |
true | false | true | Escalate m up to m_max on variance |
|
m_max |
40 | 10 | 20 | Adaptive ceiling | |
adaptive_variance_threshold |
0.05 | 0.10 | 0.08 | Between-imputation variance trigger | |
blq_m3_threshold |
0.05 | 0.15 | 0.10 | BLQ burden → M3 (else M7+) | |
imputation_stability_penalty |
0.0 | 0.25 | 0.5 | Rank-stability threshold = 1 − penalty |
All gate thresholds are sourced from primary literature: CWRES conventions (Nguyen et al. 2017, CPT PSP 6:87), VPC coverage (Bergstrand & Karlsson 2009, AAPS J 11:371), R̂/ESS (Vehtari et al. 2021, Bayesian Analysis 16:667), E-BFMI (Betancourt 2016, arXiv:1604.00695), PSIS-LOO Pareto-k (Vehtari et al. 2017, Stat Comput 27:1413), BLQ M3/M7 (Beal 2001, JPKPD 28:481; Wijk M, Wasmann RE, Jacobson KR, Svensson EM, Denti P 2025, CPT:PSP, doi:10.1002/psp4.70015), FREM (Jonsson & Nyberg 2024, CPT:PSP 13(8):1297, doi:10.1002/psp4.13190). Lane-specific tolerances follow the PRD §4.3.1 submission/discovery/optimization risk profile.
To tune for a deployment, edit the relevant policy JSON, re-run
apmode run ..., and inspect the emitted
evidence_manifest.policy_sha256 (profiler) or
policy_file.json (gate) in the reproducibility bundle to confirm
the change propagated.
The APMODE Data Profiler stands on the shoulders of the open-source pharmacometric community. We specifically build on:
- nlmixr2 / nlmixr2autoinit (Standing lab, UCL; Matt Fidler et al.) — the Huang 2025
find_best_lambdaz,is_ss, Wagner-Nelson helpers, and the automated initial-estimates pipeline are replicated faithfully from the R package. https://github.com/ucl-pharmacometrics/nlmixr2autoinit - Pharmpy / AMD tool (Uppsala Pharmacometrics Research Group) — the search-space MFL, structural/IIV/RUV search algorithms, and dispatch-feasibility concepts inform APMODE's Lane Router. https://github.com/pharmpy/pharmpy
- pyDarwin (Certara; Sale, Sherer, Nieforth et al.) — the Bayesian-optimisation + penalty-function framework for popPK structural search motivated APMODE's penalty / plausibility signals. https://github.com/certara/pyDarwin
- rxode2 / nlmixr2est — canonical oral / IV / infusion model code generation.
- PKNCA (Denney) — the reference NCA implementation whose conventions (linear-up/log-down AUC,
pk.calc.half.life) APMODE's profiler follows. https://github.com/humanpred/pknca - Stan / Torsten / cmdstanpy (Metrum Research Group + Stan dev team) — Bayesian backend depends on Torsten's PK-aware ODE integrators and cmdstanpy for CmdStan driving.
- pandas, numpy, scipy, scikit-learn, hypothesis, syrupy — the Python scientific and testing stack.
- Pydantic, Lark, ruff, mypy, uv, pytest — the type-safety, parsing, linting, and test toolchain.
Please cite the individual papers listed under Pharmacometric References when reporting APMODE-based analyses.
| Document | Purpose |
|---|---|
docs/ARCHITECTURE.md |
Technical architecture (v0.3) |
docs/PRD_APMODE_v0.3.md |
Current PRD (v0.3, source of truth) |
docs/FORMULAR.md |
Formular DSL full reference |
docs/PROFILER_REFINEMENT_PLAN.md |
Profiler policy derivation + citations |
policies/*.json |
Gate threshold policy files per lane |
.claude/skills/apmode/SKILL.md |
Claude Skill for LLM assistants (Claude Code, Codex, Gemini, Droid, …) — lane guidance, verified flag defaults, bundle artifacts, gotchas |
16 direct `@app.command` entries are auto-counted from `src/apmode/cli.py`; `apmode --help` also shows the registered `bundle` and `completion` command groups:
| Command | Description |
|---|---|
apmode run <csv> --lane <lane> |
Full pipeline: ingest → profile → NCA → search → gates → bundle → report |
apmode validate <bundle> |
Validate bundle completeness + JSONL integrity |
apmode inspect <bundle> |
Print bundle summary + structured nonlinear-clearance signals panel |
apmode datasets [name] |
Browse or download 14 public PK datasets from nlmixr2data |
apmode explore <name-or-csv> |
Interactive wizard: step-by-step data exploration with optional pipeline launch |
apmode diff <bundle-a> <bundle-b> |
Side-by-side comparison of evidence, rankings, gate pass rates |
apmode log <bundle> --top N |
Show top-N ranked candidates; --failed / --gate gate1 for failure triage |
apmode trace <bundle> |
Agentic iteration traces: summary, --iteration N, --cost, --json |
apmode lineage <bundle> <candidate> |
Transform chain from root to candidate with per-gate status |
apmode report <bundle> |
Open HTML (browser) or --format md regulatory report in pager |
apmode doctor |
Check R/nlmixr2/CmdStan/Python deps + LLM provider keys |
apmode ls [--sort bic|time] [--limit N] [--format {table,path,json}] |
List run bundles under ./runs; --format path emits absolute paths for shell scripting (BUNDLE=$(apmode ls --sort bic --limit 1 --format path)) |
apmode policies [lane] [--validate] |
List/inspect gate policies; --validate runs the CI schema hook |
apmode graph <bundle> |
Search DAG visualization: --format tree/dot/mermaid/json, --converged, --backend |
apmode serve |
HTTP API behind uvicorn; loopback default (refuses non-loopback without --allow-public); 30 s graceful-shutdown budget; full endpoint contract under HTTP API |
apmode bundle ... |
Bundle operations: RO-Crate export/import/publish and CycloneDX SBOM sidecar generation |
apmode completion ... |
Install, show, or uninstall shell completion |
| Flag | Default | Description |
|---|---|---|
--lane |
submission |
Operating lane: submission, discovery, optimization |
--backend |
nlmixr2 |
Estimation backend: nlmixr2 (SAEM/FOCEi) or bayesian_stan (NUTS via cmdstanpy) |
--bayes-chains |
4 |
NUTS chains (Bayesian backend only) |
--bayes-warmup |
1000 |
Warmup iterations per chain (Bayesian backend only) |
--bayes-sampling |
1000 |
Sampling iterations per chain (Bayesian backend only) |
--bayes-adapt-delta |
0.8 |
NUTS target acceptance (Bayesian backend only) |
--bayes-max-treedepth |
12 |
NUTS max treedepth (Bayesian backend only) |
--seed |
753849 |
Root random seed for reproducibility |
--parallel-models N / -j N |
1 |
Max concurrent model evaluations (R subprocesses). Higher values speed up search but use more memory. |
--timeout |
900 |
Per-candidate backend timeout in seconds (SAEM on 50 subj ≈ 10s; 120 subj ≈ 60–120s; 1000+ subj ≈ 300–600s) |
--agentic/--no-agentic |
off | Enable the agentic LLM backend (discovery/optimization lanes). OFF by default because the loop ships aggregated diagnostics to a third-party LLM provider; pass --agentic to opt in. |
--resume-agentic |
off | Skip Stage 5 (classical SAEM search) and load classical_checkpoint.json from the existing bundle directory. Use after an agentic API failure to restart the LLM loop without re-running a multi-hour SAEM search. |
--max-iterations |
10 (cap 25) |
Agentic LLM iteration count (PRD §4.2.6 hard-caps at 25). |
--provider |
anthropic |
LLM provider: anthropic, openai, gemini, ollama, openrouter |
--model |
per-provider default (e.g., claude-sonnet-4-20250514, qwen3:4b) |
Model name passed to the provider SDK |
--policy |
auto | Gate policy JSON file (falls back to policies/<lane>.json) |
-o, --output-dir |
./runs |
Bundle parent directory |
--dry-run |
off | Preview the pipeline without dispatching backends |
| Variable | Purpose |
|---|---|
APMODE_POLICIES_DIR |
Override policy resolution in src/apmode/paths.py (defaults to pyproject-walk) |
ANTHROPIC_API_KEY / OPENAI_API_KEY / GEMINI_API_KEY / GOOGLE_API_KEY / OPENROUTER_API_KEY |
LLM provider auth (agentic backend) |
OLLAMA_HOST |
Non-default local Ollama endpoint (agentic backend) |
Exit codes: 0 success, 1 input/validation error, 2 backend error, 130 user interrupt.
| Dataset | Subjects | Route | Elimination | Covariates |
|---|---|---|---|---|
theo_sd |
12 | oral | linear | WT |
warfarin |
32 | oral | linear | WT, age, sex |
mavoglurant |
120 | oral | unknown | AGE, SEX, WT, HT |
pheno_sd |
59 | IV | linear | WT, APGR |
nimoData |
40 | IV infusion | unknown | WT |
Plus 9 simulated ground-truth datasets (1/2-cmt, oral/IV/infusion, linear/MM): Oral_1CPT, Oral_1CPTMM, Oral_2CPT, Oral_2CPTMM, Bolus_1CPT, Bolus_2CPT, Bolus_2CPTMM, Infusion_1CPT, theo_md.
The agentic backend is a closed-loop model improvement system where an LLM proposes typed PK model transforms based on diagnostic feedback, operating exclusively within the Formular DSL grammar.
Privacy: The agentic backend is off by default (
--agenticto enable). When enabled, aggregated fit diagnostics — but never per-subject data — are sent to the selected LLM provider. The allow-list gate indiagnostic_summarizer.redact_for_llm()is the single enforcement point; unknown fields fail closed.
The orchestrator runs the agentic stage after classical search, in two modes:
| Mode | Starting Spec | What the LLM Does |
|---|---|---|
| Refine | Best classical candidate from search | Targeted improvement — add covariates, swap modules, adjust variability, declare priors |
| Independent | Minimal 1-cmt oral spec | Build from scratch — LLM discovers structure through transforms |
In discovery/optimization lanes, the LLM can also propose replace_with_node transforms to introduce Neural ODE modules, and set_prior transforms for Bayesian fits. All agentic candidates enter the same governance gate funnel as classical candidates.
AgenticRunner.run(initial_spec, data)
│
FOR each iteration (max 25, hard-clamped in AgenticConfig):
│ 1. EVALUATE → inner_runner fits current spec (nlmixr2 / Stan / JAX)
│ 2. SUMMARIZE → convergence, CWRES, shrinkage, VPC (+ R̂/ESS if Bayesian) → markdown
│ 3. PROMPT → system prompt + conversation history → LLM
│ 4. PARSE → JSON extraction → typed FormularTransform objects
│ 5. VALIDATE → precondition checks + post-apply semantic validation
│ 6. APPLY → transforms applied sequentially, lineage DAG tracked
│ 7. TRACE → input/output/meta/cached_response written per iteration
│ 8. FEEDBACK → validation failures fed back to LLM for correction
│
RETURN best BackendResult by BIC across all iterations
The LLM cannot write raw code — it can only propose these 10 typed operations (auto-synced from dsl/transforms.py + dsl/prior_transforms.py):
| Transform | Purpose | Example |
|---|---|---|
swap_module |
Replace absorption/distribution/elimination/observation | Linear → MichaelisMenten elimination |
add_covariate_link |
Add covariate effect (power/exponential/linear/categorical/maturation) | Allometric WT → CL |
adjust_variability |
Modify IIV structure (add/remove/upgrade_to_block) | Remove IIV on param with >30% shrinkage |
set_transit_n |
Change transit compartment count | Increase transit N for delayed absorption |
toggle_lag |
Enable/disable absorption lag time | Add tlag for delayed onset |
replace_with_node |
Swap to Neural ODE (discovery/optimization only) | NODE absorption with dim=4 |
set_prior |
Declare/replace a typed prior (Bayesian workflows) | HalfNormal(0, 0.5) on IIV_CL ω |
| Provider | SDK | Auth | Default model | Cost |
|---|---|---|---|---|
| Anthropic | anthropic.AsyncAnthropic |
ANTHROPIC_API_KEY |
claude-sonnet-4-20250514 |
Per-token |
| OpenAI | openai.AsyncOpenAI |
OPENAI_API_KEY |
gpt-4o |
Per-token |
| Google Gemini | google.genai.Client |
GEMINI_API_KEY / GOOGLE_API_KEY |
gemini-2.5-flash |
Per-token |
| OpenRouter | OpenAI-compatible | OPENROUTER_API_KEY |
anthropic/claude-sonnet-4-20250514 |
Per-token |
| Ollama | ollama.AsyncClient |
None (local) | qwen3:4b |
Free |
| litellm | litellm.acompletion |
Per-provider | caller-specified | Fallback |
Install provider SDKs: uv sync --extra llm
- temperature=0 enforced (non-zero raises
ValueError) - Payload hashing — SHA-256 of every request for audit
- Cached responses —
ReplayClientreplays fromagentic_trace/without API calls - Model-version escrow — deterministic fingerprint when available (Anthropic:
full, others:best-effort) - Conversation history — multi-turn context preserved across iterations
- Checkpoint resume —
--resume-agenticreloadsclassical_checkpoint.jsonso a 5xx / timeout / quota-trip during the LLM loop does not discard multi-hour SAEM work
# Run live provider tests (requires API keys or local Ollama)
uv run pytest tests/integration/test_llm_providers_live.py -m live -v
# Tests skip gracefully on missing keys or billing/quota errorsuv sync --extra api pulls FastAPI + Uvicorn + aiosqlite. The CLI wrapper apmode serve binds the API to loopback by default; the factory apmode.api.app.build_app(runs_dir=..., db_path=...) is also available for direct embedding (custom auth middleware, mounting under a parent ASGI app, etc.).
# Default: loopback + port 8765, runs in ./runs, SQLite registry alongside
uv run apmode serve
# Custom paths and port
uv run apmode serve --port 9000 --runs-dir /scratch/apmode-runs --db-path /scratch/apmode-runs/registry.sqlite3
# Bind to all interfaces (requires API key + dataset root; front with TLS/auth)
export APMODE_API_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
uv run apmode serve --host 0.0.0.0 --allow-public --dataset-root /scratch/apmode-data
# Allow Bayesian backend in POST /runs body (off by default until the runner is fully wired)
uv run apmode serve --allow-bayesianThe canonical scientific workflow — ssh -L 8765:127.0.0.1:8765 lab-gpu-box then apmode serve on the remote — needs no extra flags; the loopback default is the right thing for a tunnelled session. Passing --host 0.0.0.0 (or any non-loopback address) without --allow-public exits with code 2 and a message naming the risk: bundle artefacts may contain patient data. --allow-public itself now requires all three of: APMODE_API_KEY set in the environment (verified per-request via hmac.compare_digest), --dataset-root <DIR> (so a remote caller cannot request dataset_path: /etc/passwd — the path is resolved as joinpath(root, user_input).resolve() and rejected if it escapes the root), and a TLS-terminating reverse proxy in front. /healthz stays open without a key for liveness probes; every other endpoint requires the X-API-Key header.
| Endpoint | Method | Behaviour |
|---|---|---|
/healthz |
GET |
{ "status": "ok", "apmode_version": "<v>" } — readiness probe |
/runs |
POST |
202 Accepted + Retry-After: 5; spawns the run as an asyncio task tracked in app.state.active_tasks |
/runs |
GET |
List run summaries from the SQLite store |
/runs/{id}/status |
GET |
Current RunStatus (PENDING / RUNNING / COMPLETED / CANCELLED / FAILED / INTERRUPTED) |
/runs/{id}/bundle |
GET |
Sealed reproducibility bundle as a ZIP; 425 Too Early while the bundle is unsealed |
/runs/{id}/rocrate |
GET |
Workflow Run RO-Crate projection as a ZIP (provenance-run-crate-0.5) |
/runs/{id} |
DELETE |
Cancellation: SIGTERM the child R/cmdstan process group, 5 s grace, then SIGKILL — terminal-status rows return 409 |
Persistence + lifecycle. The Starlette lifespan calls SQLiteRunStore.initialize(), which idempotently sweeps RUNNING rows from a crashed prior process to INTERRUPTED, and cancels every active task on shutdown. Writes use BEGIN IMMEDIATE in WAL mode for safe concurrent access. Subprocess termination is shared between Nlmixr2Runner and BayesianRunner via apmode.backends.process_lifecycle.terminate_process_group(proc, grace_seconds=5), called from each runner's asyncio.CancelledError path; the SIGTERM → grace → SIGKILL → reap sequence runs inside asyncio.shield so a second cancellation (lifespan shutdown racing a DELETE) cannot orphan the inner wait_for or skip the SIGKILL escalation. The CANCELLED / FAILED status writes and the post-terminal on_complete callback in apmode.api.runs.execute_run are likewise shielded so the 429 capacity slot is always released even under double-cancel. The default --timeout-graceful-shutdown of 30 s covers the 5 s SIGTERM-to-SIGKILL window plus headroom for the cancellation handler to seal the partial bundle.
Hardening guidance. The apmode serve CLI is intended for single-user local + lab-network use. Loopback deployments may run without APMODE_API_KEY; any non-loopback bind requires --allow-public, APMODE_API_KEY, and --dataset-root, and all run-management endpoints require X-API-Key. Treat that static key as a floor, not a full deployment security model: front public deployments with a TLS-terminating reverse proxy that handles user authentication and per-bundle access control.
Future enhancement (not shipped in v0.6.1-rc1): a ?wait=N long-poll on GET /runs/{id}/status would let interactive clients wait up to 60 s for a status transition without the 5 s polling round-trip. The implementation path is a per-run_id asyncio.Event registry on app.state (set by the update_status write path, awaited by the route handler) capped by uvicorn's --limit-concurrency. Deferred until usage telemetry shows polling pain — current scientific clients run minutes-to-hours per fit, so 5 s polling is well within the tolerable budget.
A sealed bundle can be projected onto a Workflow Run RO-Crate — Provenance Run Crate v0.5 for FAIR packaging, WorkflowHub / Zenodo-ready archives, and regulatory crosswalk (FDA PCCP, EU AI Act Article 12). The Pydantic bundle remains producer-side truth; the RO-Crate is a read-only external projection — the source directory is never mutated by export.
# Directory-form crate
uv run apmode bundle rocrate export runs/<run_id> --out runs/<run_id>.crate
# ZIP-form crate (deterministic archive; fixed 1980-01-01 timestamps + sorted entries)
uv run apmode bundle rocrate export runs/<run_id> --out runs/<run_id>.crate.zip
# Validate with roc-validator (REQUIRED severity, provenance-run-crate-0.5)
uv run apmode validate runs/<run_id> --rocrate --crate runs/<run_id>.crate
# Round-trip import — extracts and re-verifies the _COMPLETE SHA-256 digest
uv run apmode bundle import runs/<run_id>.crate.zip --out runs/<run_id>-imported
# Inspect crate summary (mainEntity, action-triad counts, sentinel)
uv run apmode inspect runs/<run_id> --rocrate-view --crate runs/<run_id>.crateWhat the export writes — every file under the bundle directory is copied verbatim into the crate. The ro-crate-metadata.json graph registers the core artifacts (data / split / evidence / seed / backend-versions / initial-estimates / policy / missing-data-directive manifests, compiled specs, backend results, gate decisions, candidate + run lineage, search trajectory / failed candidates / search graph, ranking, credibility reports, report provenance, Bayesian artefacts, agentic-trace iterations, regulatory PCCP files, and the _COMPLETE sentinel) as graph entities. A few diagnostic artefacts (e.g. nca_diagnostics.jsonl, imputation_stability.json, categorical_encoding_provenance.json, loro_cv/, diagnostics/nca_plots/, seed-stability _seed_N_result.json) are present as files in the crate but are not registered as named File entities — consumers can still read them by bundle-relative path.
Graph shape (WRROC action triad) — each candidate fit is a CreateAction whose instrument is the backend-engine SoftwareApplication (one of #engine-nlmixr2 / #engine-bayesian-stan / #engine-node / #engine-agentic, with softwareVersion from backend_versions.json) and whose object carries the candidate DSL spec (apmode:dslSpec → compiled-spec File) plus the data manifest. Each backend that emitted a result registers a #step-backend-<engine> HowToStep on the lane workflow. Each gate decision is a ControlAction (instrument = HowToStep, object = CreateAction, result = gate-decision File, apmode:gateRationale → the same file). The lane run is wrapped in an OrganizeAction whose startTime / endTime are derived from _COMPLETE.sealed_at, so a given bundle yields a stable datePublished regardless of host. The _COMPLETE sentinel is a File with additionalType = "apmode:completeSentinel" and identifier = "sha256:<hex>" for host-independent integrity checks. The root Dataset declares conformsTo against Provenance / Workflow / Process Run Crate v0.5 plus the base workflow-ro-crate/1.0 and ro-crate/1.1 profiles.
Security — ZIP import performs per-entry validation before extraction: entries whose resolved paths escape the staging root, absolute paths, Windows drive-letter prefixes, and non-regular file types (symlinks, sockets, block devices) are rejected. Directory-form import also rejects symlinks encountered anywhere under the source. The synthetic workflow stub at workflows/<lane>-lane.apmode is identified by reading mainEntity.@id from the crate metadata (with a .. / absolute-path / null-byte sanity guard), so user-authored files under workflows/ round-trip unmodified. The digest-exclusion check is case-insensitive, bundle-relative (matched against p.relative_to(bundle).as_posix() rather than the basename so a same-named artefact in a nested subdir is not silently exempted), and stays in lock-step with apmode.bundle.emitter._DIGEST_EXCLUDED_RELATIVE_PATHS (_COMPLETE and bom.cdx.json at the bundle root; the SBC manifest at the full relative path artifacts/sbc/sbc_manifest.json). A _COMPLETE digest mismatch aborts the import with a non-zero exit.
Scope — read-only export, directory and ZIP forms, round-trip import, and apmode validate --rocrate / apmode inspect --rocrate-view wiring. Lane detection reads policy_file.json.lane (Submission, Discovery, Optimization) and the corresponding HowTo / HowToStep graph is emitted per WRROC; the Submission-lane ranking guardrail (NODE/agentic never recommended) is an APMODE governance rule, not an RO-Crate constraint. Agentic-trace iterations are projected by default; PROV-AGENT typing (provagent:AIModelInvocation, canonical class per Souza et al., eScience 2025, arXiv:2508.02866 v3) is opt-in via --include-provagent. apmode:regulatoryContext is no longer auto-stamped to pccp-ai-dsf whenever regulatory/ is present — operators must pass --regulatory-context {research-only|pccp-ai-dsf|mdr|ai-act-article-12} explicitly. Discovery-lane tiering for ≥ 10³-candidate runs and live WorkflowHub / Zenodo uploads are planned follow-ons; the apmode bundle publish CLI ships the argument surface today and validates the bundle path. Design rationale: docs/ARCHITECTURE.md § RO-Crate.
Every CI run and every tagged release ships a CycloneDX Software Bill of Materials (SBOM) covering the Python dependency graph that produced the bundle.
- CI (
securityjob) emitsbom.cdx.jsonviapip-audit --format cyclonedx-jsonand uploads it as a workflow artifact (sbom-cyclonedx-json, 90-day retention). - Releases (
.github/workflows/release.yml) regenerate the SBOM on eachv*tag push and attach it — alongside the built wheel + sdist — to the GitHub release viagh release create. - Per-bundle sidecar — run
uv run apmode bundle sbom runs/<run_id>to dropbom.cdx.jsoninto a sealed bundle. The file is explicitly excluded from the_COMPLETESHA-256 digest (apmode.bundle.emitter._compute_bundle_digest), so adding or regenerating the SBOM does not invalidate the bundle seal. - In the RO-Crate — when the SBOM sidecar is present,
apmode bundle rocrate exportprojects it as aFileentity withencodingFormat="application/vnd.cyclonedx+json"andadditionalType="apmode:sbom", linked from the root Dataset'shasPart. Consumers that walk the graph can dispatch on the media type without opening the file.
This README's numeric claims (version, test count, transform count, CLI-command count, dataset count, policy versions, profiler manifest version) are rewritten from the codebase by scripts/sync_readme.py. Each auto-synced value sits between HTML comment markers like:
<!-- apmode:AUTO:tests -->2581<!-- apmode:/AUTO:tests -->
Running the script:
uv run python scripts/sync_readme.py # update
uv run python scripts/sync_readme.py --check # exit 1 if drift; CI-friendlyOn --check, the script fails if any marker's value disagrees with the resolved authoritative value:
| Key | Authoritative source |
|---|---|
version, version_tag |
Most recent ## [X.Y.Z] header in CHANGELOG.md |
tests, tests_nonlive |
pytest --collect-only -q (non-live: -m "not live") |
policy_gate |
policies/{submission,discovery,optimization}.json — all three must agree |
policy_profiler, profiler_manifest |
policies/profiler.json |
transforms |
BaseModel subclasses in dsl/transforms.py + dsl/prior_transforms.py |
cli_cmds |
@app.command(...) decorators in cli.py |
datasets |
DatasetInfo(...) entries in data/datasets.py |
backends |
Non-dunder .py files under src/apmode/backends/ |
Wire scripts/sync_readme.py --check into pre-commit or CI to block PRs that let the README drift.
User-facing constraints in <!-- apmode:AUTO:version_tag -->v0.6.1-rc1<!-- apmode:/AUTO:version_tag -->. Status descriptors: partial = behaviour available with caveats; planned = scoped follow-on; deferred = explicit ADR-documented out-of-scope. See docs/ARCHITECTURE.md and the docs/adr/ series for the full design rationale.
- Multi-dose steady state (partial) —
ADDL/IIdose expansion is supported across all backends;SS != 0is supported in the nlmixr2 lane only. Stan and NODE backends hard-rejectSS != 0at Gate 1; seedocs/ARCHITECTURE.mdand current validator/backend limitations. - NODE infusions (deferred) — the NODE backend is oral-only; infusion data (
RATE > 0) is rejected withInvalidSpecError. Piecewise-JAX-solver follow-on remains deferred; seedocs/ARCHITECTURE.mdfor backend scope. - NODE random effects (planned) — current NODE training is pooled-population NLL (no per-subject random effects). Laplace approximation on a ≤16×16 block of latent/input-layer parameters (block-diagonal Hessian primary; L-BFGS + ridge fallback) is the next milestone; full-NN-weight RE remains research-branch.
- NODE scaling (planned) — the Python-list subject loop scales to ~50 subjects. The next release replaces it with
jax.lax.mapper subject plus ajax.vmapfastpath for uniform time grids (target memory ceiling on A100-40G ≈ 2000 subjects for a 4-compartment NODE). - NODE posterior-predictive simulation (planned) —
sample_posterior_predictiveis currently an inert stub returningNonewithUserWarning. The Laplace-RE landing draws random effects from the posterior and routes per-subject simulations through the canonicalbuild_predictive_diagnostics. - Stan maturation + IOV (planned) —
NotImplementedErroron Emax maturation and IOV eta back-transform today. BLQ M3/M4 left-censored likelihood lowering is present in the Stan emitter. - TMDD QSS coverage (planned) — Suite A5 now has benchmark coverage and the nlmixr2 emitter supports TMDD QSS, but automated enumeration/profiler support remains limited. Planned: late-slope-steepening (LSS) profiler signal,
tmdd_qssenumerator wiring, default to pure-MM (3 params) withparallel-linear + MMonly when the profiler detects a linear clearance component.KSS = KD + kint / koncanonicalisation remains research-branch. - TimeVaryingElim decay forms: the nlmixr2 emitter supports all three
decay_fn ∈ {exponential, half_life, linear}. Stan-side lowering is exponential-only pending the same Stan-emitter follow-on. - Context of use (partial) — the orchestrator auto-generates a COU statement for Gate 2.5; a
--context-of-use "<str>"CLI override is planned. - Agentic LLM backend: requires funded API keys (Anthropic / OpenAI / Gemini / OpenRouter) or a local Ollama install with a chat-capable model (≥ 4B parameters recommended).
If you use APMODE in your research, please cite:
@software{apmode2026,
title = {APMODE: Adaptive Pharmacokinetic Model Discovery Engine},
author = {Kornilov, Sergey A.},
year = {2026},
url = {https://github.com/biostochastics/apmode},
license = {GPL-2.0-or-later},
version = {0.6.1-rc1}
}GitHub users can also click "Cite this repository" on the repo page, powered by CITATION.cff.
Licensed under GPL-2.0-or-later.
The primary engine is nlmixr2 (R), which is GPL-2 licensed. This license choice is deliberate and affects build structure from Phase 1.
Capability Status • Quick Start • Formular DSL • Architecture • Governance • Benchmarks • CLI • HTTP API