OmniBioAI ModelHub

OmniBioAI ModelHub is a production-grade experiment tracking and model lifecycle management system for AI/ML models within the OmniBioAI ecosystem — purpose-built for biomedical AI plugins.

It provides:

Experiment tracking — log params, metrics, tags from training runs
Step-indexed metric history with sparkline visualization
Immutable model versioning (write-once)
Cryptographic integrity verification (SHA256)
Staged promotion workflows (latest → staging → production)
Alias management with full audit trail
MySQL-backed run storage with filesystem fallback
Plugin-first design — PluginRunClient for TES container environments
Local-first, cloud-ready storage abstraction
REST API (FastAPI) + CLI (omr) + Python SDK

The registry is implemented as a standalone Python library (package name: omnibioai-model-registry, CLI entrypoint: omr) and ships a self-contained FastAPI service.

Status

✅ Experiment tracking (RunLogger, PluginRunClient)
✅ MySQL-backed metric + param storage
✅ Immutable and verifiable model storage
✅ Audit-ready promotion workflow
✅ 18 REST endpoints (tracking + registry + governance)
✅ 11 CLI commands
✅ ModelHub UI with Experiments tab + metric sparklines
✅ Local-first, cloud-ready design

Why This Exists

Biomedical AI requires:

Reproducibility
Auditability
Governance
Offline / air-gapped deployment
Cross-infrastructure execution parity

Traditional ML tooling often assumes cloud-first infrastructure, mutable artifacts, and weak provenance guarantees.

OmniBioAI ModelHub is designed differently.

It treats AI models as scientific artifacts that must be immutable, verifiable, and reproducible across environments.

Experiment Tracking

Two clients cover the two primary execution contexts.

RunLogger (filesystem — local scripts, notebooks)

Writes directly to the registry root on the local filesystem. No network required.

from omnibioai_model_registry import RunLogger, register_model

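# Illustrative per-epoch (accuracy, val_loss) pairs; replace with your training loop's output.
training_curve = [(0.81, 0.52), (0.88, 0.39), (0.93, 0.27)]

with RunLogger(task="celltype_sc", model_name="human_pbmc") as run:
    run.log_params({"lr": 0.001, "epochs": 50, "batch_size": 32})
    for epoch, (acc, loss) in enumerate(training_curve):
        run.log_metric("accuracy", acc, step=epoch)
        run.log_metric("val_loss", loss, step=epoch)
    run.set_tag("team", "bioml")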

register_model(
    task="celltype_sc",
    model_name="human_pbmc",
    version="2026-06-14_001",
    artifacts_dir="/path/to/artifacts",
    metadata={"lineage": {"run_id": run.run_id}},
)

Filesystem layout produced by RunLogger:

{registry_root}/tasks/{task}/models/{model_name}/runs/{run_id}/
    params.json          # {"lr": 0.001, "epochs": 50}
    tags.json            # {"team": "bioml"}
    metrics/
        accuracy.jsonl   # one JSON record per step
        val_loss.jsonl
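
Because each metric is stored as plain JSONL, a run's history can be read back with the standard library alone. The snippet below is a minimal sketch, not part of the package: the helper name read_metric_history is hypothetical, and the record keys "step" and "value" are assumptions, since the exact record schema is not documented here.

```python
import json
from pathlib import Path


def read_metric_history(run_dir, metric):
    """Return the records of metrics/<metric>.jsonl, sorted by step.

    Hypothetical helper. Assumes each line is a JSON object with
    "step" and "value" keys; the real record schema may differ.
    """
    path = Path(run_dir) / "metrics" / f"{metric}.jsonl"
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return sorted(records, key=lambda record: record["step"])
```

Pass the run directory shown in the layout above as run_dir, for example read_metric_history(run_dir, "accuracy").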

PluginRunClient (HTTP — TES container plugins)

Posts metrics and params to the ModelHub REST API. Designed for training jobs running inside TES-scheduled containers that cannot access the registry filesystem directly.

import os
from omnibioai_model_registry import PluginRunClient

with PluginRunClient(
    task="celltype_sc",
    model_name="human_pbmc",
    registry_url=os.environ["MODEL_REGISTRY_BASE_URL"],
) as run:
    run.log_params({"lr": 0.001})
    run.log_metric("accuracy", 0.95, step=0)
    run.set_tag("plugin_version", "1.2.3")

Both clients share the same log_params / log_metric / set_tag interface. The storage backend is the only difference.
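
One practical consequence is that training code can be written once against the shared interface and handed either client. The following is a sketch under that assumption: TrackingRun and log_training are hypothetical names introduced for illustration, and the method signatures are inferred from the calls shown in the examples above rather than taken from the package.

```python
from typing import Iterable, Protocol, Tuple


class TrackingRun(Protocol):
    """Structural type for the three methods both clients are described as sharing."""

    def log_params(self, params: dict) -> None: ...
    def log_metric(self, name: str, value: float, step: int) -> None: ...
    def set_tag(self, key: str, value: str) -> None: ...


def log_training(run: TrackingRun, params: dict,
                 curve: Iterable[Tuple[float, float]]) -> None:
    """Log params once, then accuracy and val_loss for each epoch.

    Hypothetical helper: `curve` yields (accuracy, val_loss) pairs.
    """
    run.log_params(params)
    for epoch, (acc, loss) in enumerate(curve):
        run.log_metric("accuracy", acc, step=epoch)
        run.log_metric("val_loss", loss, step=epoch)
```

Inside either `with RunLogger(...) as run:` or `with PluginRunClient(...) as run:`, the call would be log_training(run, {"lr": 0.001}, curve).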

Role in the OmniBioAI Architecture

OmniBioAI follows a four-plane architecture:

Plane	Responsibility
Control Plane	UI, registries, metadata, governance
Compute Plane	Workflow execution, HPC/cloud adapters
Data Plane	Artifacts, datasets, outputs
AI Plane	Reasoning, RAG, agents, interpretation

The ModelHub belongs to the Control Plane and provides AI artifact governance, deterministic inference references, promotion and audit workflows, and infrastructure-independent model resolution.

Core Design Principles

1) Immutability

Each model version is write-once: no overwrites, no silent mutation, full historical trace. This guarantees scientific reproducibility.

2) Integrity Verification

Every model package includes a SHA256 manifest (sha256sums.txt) that records a hash for every file in the package except the manifest itself. This allows bit-for-bit verification of a package, tamper detection, and trustworthy deployment in regulated environments.
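
To make the idea concrete, here is a minimal sketch of writing and checking such a manifest with hashlib. It is not the registry's implementation: the function names are hypothetical, and the line format ("<hex digest>  <relative path>", as produced by the sha256sum tool) is an assumption about sha256sums.txt.

```python
import hashlib
from pathlib import Path

MANIFEST = "sha256sums.txt"


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model weights are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(pkg_dir):
    """Hash every file under pkg_dir except the manifest itself."""
    pkg = Path(pkg_dir)
    lines = []
    for path in sorted(p for p in pkg.rglob("*") if p.is_file()):
        rel = path.relative_to(pkg).as_posix()
        if rel != MANIFEST:
            lines.append(f"{sha256_file(path)}  {rel}")
    (pkg / MANIFEST).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(pkg_dir):
    """Return relative paths that are missing or whose hash has changed."""
    pkg = Path(pkg_dir)
    mismatched = []
    for line in (pkg / MANIFEST).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(None, 1)
        target = pkg / rel
        actual = sha256_file(target) if target.is_file() else None
        if actual != expected:
            mismatched.append(rel)
    return mismatched
```

An empty list from verify_manifest means every listed file still matches. This sketch does not flag files added after the manifest was written; a stricter check would also compare the set of files on disk against the manifest.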

3) Provenance-Friendly Metadata

Each model stores structured metadata in model_meta.json, including:

training code version (git commit)
dataset reference (e.g., DVC / object store ref)
hyperparameters and preprocessing
lineage.run_id linking back to the originating tracking run

4) Promotion Workflow

Models move through controlled stages:

latest → staging → production

All promotions are explicit, append-only, and audited (audit/promotions.jsonl).
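
The append-only property can be illustrated with a short sketch. This is not the registry's code: record_promotion is a hypothetical helper, and both the audit record fields and the alias file schema ({"version": ...}) are assumptions chosen to mirror the promote_model arguments.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_promotion(model_dir, alias, version, actor, reason):
    """Point aliases/<alias>.json at `version` and append one audit line."""
    model = Path(model_dir)
    event = {
        "alias": alias,
        "version": version,
        "actor": actor,
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    (model / "aliases").mkdir(parents=True, exist_ok=True)
    (model / "audit").mkdir(parents=True, exist_ok=True)
    # The alias pointer is the only mutable piece: it names the current version.
    (model / "aliases" / f"{alias}.json").write_text(
        json.dumps({"version": version}), encoding="utf-8"
    )
    # Mode "a" only ever adds lines, so earlier promotions are never rewritten.
    with (model / "audit" / "promotions.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event
```

Under this design the alias file answers "what is in production now", while the JSONL log answers "who changed it, when, and why".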

5) Storage Abstraction

v0.2.0 supports a local filesystem backend (localfs) with an optional MySQL-backed tracking layer. S3 and Azure Blob backends are on the roadmap.

Repository Structure

omnibioai-model-registry/
├── omnibioai_model_registry/
│   ├── api.py
│   ├── config.py
│   ├── refs.py
│   ├── errors.py
│   ├── run.py              # RunLogger — filesystem-based tracking
│   ├── plugin_client.py    # PluginRunClient — HTTP-based tracking for TES plugins
│   ├── db.py               # MySQL connection + table bootstrap
│   ├── tracking.py         # Pure-SQL tracking functions
│   ├── storage/
│   ├── package/
│   ├── audit/
│   ├── cli/
│   └── service/
├── frontend/
│   └── omnibioai-model-registry-ui/   # ModelHub UI (React + TypeScript)
├── tests/
├── pyproject.toml
└── README.md

Canonical Model Package Layout

Registered models follow a strict, portable structure:

<OMNIBIOAI_MODEL_REGISTRY_ROOT>/
tasks/<task>/models/<model_name>/
    versions/<version>/
        model.pt
        model_genes.txt
        label_map.json
        model_meta.json
        metrics.json
        feature_schema.json
        sha256sums.txt
    aliases/
        latest.json
        staging.json
        production.json
    audit/
        promotions.jsonl

This guarantees deterministic loading, integrity validation, and cross-environment portability.
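
Given this layout, resolving a reference such as human_pbmc@latest amounts to a few path operations. The sketch below shows one way it could work; it is not the package's resolve_model. The name resolve_ref is hypothetical, and the alias file schema ({"version": "<version>"}) is an assumption.

```python
import json
from pathlib import Path


def resolve_ref(registry_root, task, ref):
    """Resolve '<model_name>@<alias-or-version>' to a version directory."""
    model_name, sep, selector = ref.partition("@")
    if not sep or not model_name or not selector:
        raise ValueError(f"expected '<model_name>@<alias-or-version>', got {ref!r}")
    model_dir = Path(registry_root) / "tasks" / task / "models" / model_name
    alias_file = model_dir / "aliases" / f"{selector}.json"
    if alias_file.is_file():
        # Assumed alias schema: {"version": "<version>"}
        selector = json.loads(alias_file.read_text(encoding="utf-8"))["version"]
    version_dir = model_dir / "versions" / selector
    if not version_dir.is_dir():
        raise FileNotFoundError(version_dir)
    return version_dir
```

An alias is tried first and a literal version second, so human_pbmc@production and human_pbmc@2026-06-14_001 can point at the same directory.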

Install, Build, and Use as a Python Package

1) Configure registry root

export OMNIBIOAI_MODEL_REGISTRY_ROOT=~/local_registry/model_registry

2) Install (editable) for development

pip install -e .

Verify:

python -c "import omnibioai_model_registry as m; print('OK', m.__file__)"
omr --help

3) Build a wheel (distribution)

pip install build
python -m build

Artifacts are written to dist/:

dist/omnibioai_model_registry-0.2.0-py3-none-any.whl
dist/omnibioai_model_registry-0.2.0.tar.gz

Install the wheel:

pip install dist/*.whl

CLI Usage (`omr`)

The CLI provides 11 commands covering the full model lifecycle.

Register a model package

omr register \
  --task celltype_sc \
  --model human_pbmc \
  --version 2026-06-14_001 \
  --artifacts /tmp/model_pkg \
  --set-alias latest

Resolve a model reference

omr resolve --task celltype_sc --ref human_pbmc@latest

Promote a version to production

omr promote --task celltype_sc --model human_pbmc --version 2026-06-14_001 --alias production

Verify integrity

omr verify --task celltype_sc --ref human_pbmc@production

Show metadata

omr show --task celltype_sc --ref human_pbmc@production --json

List models for a task

omr list --task celltype_sc

Show version metrics and run history

omr metrics --task celltype_sc --ref human_pbmc@latest

List aliases

omr aliases --task celltype_sc --model human_pbmc

Set a tag on a model version

omr tag --task celltype_sc --ref human_pbmc@2026-06-14_001 --key team --value bioml

Set lifecycle stage

omr stage --task celltype_sc --model human_pbmc --version 2026-06-14_001 --stage production

Valid stages: none, staging, production, archived.

Compare metrics across versions

omr compare --task celltype_sc --model human_pbmc --versions 2026-02-14_001 2026-06-14_001
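
For readers who want the same comparison in a notebook, the following sketch builds a metric-by-version table directly from the package layout. It is an illustration only: compare_versions is a hypothetical helper, and it assumes metrics.json is a flat mapping of metric name to number, which may not match the real schema.

```python
import json
from pathlib import Path


def compare_versions(model_dir, versions):
    """Return {metric: {version: value}} built from each version's metrics.json."""
    table = {}
    for version in versions:
        path = Path(model_dir) / "versions" / version / "metrics.json"
        metrics = json.loads(path.read_text(encoding="utf-8"))
        for name, value in metrics.items():
            table.setdefault(name, {})[version] = value
    return table
```

A metric missing from one version simply has no entry for that version, which makes gaps easy to spot.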

Python API Usage

from omnibioai_model_registry import register_model, resolve_model, promote_model

register_model(
    task="celltype_sc",
    model_name="human_pbmc",
    version="2026-06-14_001",
    artifacts_dir="/tmp/model_pkg",
    metadata={
        "framework": "pytorch",
        "model_type": "classifier",
        "provenance": {
            "git_commit": "abc123",
            "training_data_ref": "s3://bucket/datasets/pbmc_v1",
            "trainer_version": "0.2.0",
        },
    },
    set_alias="latest",
    actor="manish",
    reason="initial training",
)

# Resolve by alias (or version)
path = resolve_model("celltype_sc", "human_pbmc@latest", verify=True)
print("Resolved model dir:", path)

# Promote to production
promote_model(
    task="celltype_sc",
    model_name="human_pbmc",
    alias="production",
    version="2026-06-14_001",
    actor="manish",
    reason="validated metrics",
)

REST Service (FastAPI)

Run locally

pip install -e .
uvicorn omnibioai_model_registry.service.app.main:app --host 0.0.0.0 --port 8095

Health check:

curl -s http://127.0.0.1:8095/health | python -m json.tool

Endpoints

Registry

Method	Path	Description
POST	/v1/register	Register a model version
GET	/v1/resolve	Resolve a model reference to a path
POST	/v1/promote	Promote a version to an alias
POST	/v1/verify	Verify SHA256 integrity
GET	/v1/show	Return model_meta.json for a ref
GET	/v1/models	List all registered model versions

Tracking (requires MySQL — HTTP 503 if DB_HOST is unset)

Method	Path	Description
POST	/v1/runs/log-metric	Log a single metric point
POST	/v1/runs/log-param	Log a single parameter
POST	/v1/runs/log-batch	Log metrics, params, and tags
GET	/v1/runs/get	Fetch a full run snapshot
GET	/v1/runs/list	List runs for a (task, model)

Governance

Method	Path	Description
GET	/v1/aliases	List all aliases for a model
GET	/v1/metrics	Return version metrics + step history from DB/JSONL
GET	/v1/compare	Compare metrics across two or more versions
GET	/v1/artifacts	List files in a version package with SHA256 + sizes
PUT	/v1/tags	Set a tag on a model version
POST	/v1/versions/patch	Patch description or tags on a version
POST	/v1/stage	Set lifecycle stage (none/staging/production/archived)
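
The GET endpoints can be called from Python without extra dependencies. This is a minimal sketch rather than a supported client: endpoint_url and get_json are hypothetical helpers, and the query parameter names (task, ref) are assumed to mirror the CLI flags, so check the service's OpenAPI docs before relying on them.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def endpoint_url(base_url, path, **query):
    """Join base URL and path, appending URL-encoded query parameters."""
    url = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    return f"{url}?{urlencode(query)}" if query else url


def get_json(url, timeout=10.0):
    """GET a URL and decode the JSON body; HTTP errors raise urllib.error.HTTPError."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the service running locally, get_json(endpoint_url("http://127.0.0.1:8095", "/health")) is the Python equivalent of the curl health check above.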

MySQL setup (optional)

When DB_HOST is set, the service bootstraps five tables on startup:

omr_runs          — run lifecycle (run_id, status, started_at, finished_at)
omr_params        — key/value params per run
omr_metrics       — step-indexed metric values per run
omr_tags          — key/value tags per run
omr_version_tags  — key/value tags per model version

Environment variables:

export DB_HOST=localhost
export DB_PORT=3306
export DB_USER=omr
export DB_PASSWORD=secret
export DB_NAME=model_registry

When DB_HOST is absent, the service runs in filesystem-only mode. Tracking endpoints return HTTP 503; all registry and governance endpoints remain fully functional.
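
The mode switch described above can be sketched as a small configuration function. This is not the service's code: db_settings is a hypothetical name, and the fallback values (port 3306, empty strings) are assumptions.

```python
import os


def db_settings(env=None):
    """Return MySQL settings from DB_* variables, or None for filesystem-only mode."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST")
    if not host:
        # No DB_HOST: tracking endpoints would answer HTTP 503 in this mode.
        return None
    return {
        "host": host,
        "port": int(env.get("DB_PORT", "3306")),
        "user": env.get("DB_USER", ""),
        "password": env.get("DB_PASSWORD", ""),
        "database": env.get("DB_NAME", ""),
    }
```

Passing a plain dict as env makes the function easy to test without touching the real environment.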

Testing

pip install -e ".[dev]"
pytest -q

Relationship to OmniBioAI Ecosystem

The ModelHub is a control-plane component of OmniBioAI.

Companion repositories:

omnibioai → AI-powered bioinformatics workbench
omnibioai-tes → execution orchestration across local/HPC/cloud
omnibioai-rag → reasoning and literature intelligence
omnibioai-lims → laboratory data management
omnibioai-workflow-bundles → reproducible pipelines
omnibioai-sdk → Python client access

The ModelHub provides the AI artifact governance layer shared by all.

Roadmap

Done (v0.2.0)

Experiment tracking with RunLogger + PluginRunClient
MySQL-backed run/metric/param/tag storage
ModelHub UI with Experiments tab + metric sparklines
Stage management (none → staging → production → archived)
Alias listing, metric comparison, artifact browser endpoints

Near Term

S3 / Azure Blob storage backends
Step-history sparklines in UI pulled from DB (currently single-point)
Model signature validation (input/output schema enforcement)
RBAC — per-task access control

Mid Term

Parallel coordinates plot for hyperparameter search
Auto-link run_id → model version in UI (Registered As chip)
Pagination + filtering on GET /v1/models and GET /v1/runs/list
Promotion policies (metric threshold gates)

Long Term

Regulatory-ready audit and lineage export (PDF/CSV)
Enterprise biomedical AI governance platform
Deeper LIMS integration (sample → dataset → run → model chain)
