Author: Jonathan Jewell – Hyper‑Polymath
Date: 2 November 2025
Version: 1.0
- Executive Summary
- 1. Introduction
- 1.1. The Fragmentation Problem
- 1.2. Emerging Opportunities
- 2. The Tiny Core Architecture
- 2.1. ReScript Registry (Memory #1)
- 2.2. Elixir Orchestration & Drift Management (Memory #5)
- 2.3. Rust Modality Crates (Memory #4)
- 2.4. WASM‑based Public Proxy (Memory #2)
- 2.5. Zero‑Trust Signing (sactify‑php, Memory #2)
- 3. Universal Federated Store Integration
- 4. Drift‑Tolerant Knowledge Semantics
- 4.1. Detection Strategies
- 4.2. Repair Policies
- 5. Zero‑Trust Security Design
- 6. Modality‑Agnostic Federation
- 6.1. Graph (Oxigraph)
- 6.2. Vector (HNSW)
- 6.3. Tensor (ndarray/Burn)
- 6.4. Semantic (CBOR)
- 6.5. Document (Tantivy)
- 6.6. Temporal (Version Trees)
- 7. Real‑World Use Cases
- 8. Comparative Landscape
- 9. Implementation Roadmap
- 10. Ethical & Philosophical Implications
- 11. Conclusions & Call to Action
- References
VeriSimDB is a minimalistic, address‑space‑centric core that enables any data modality to be federated across heterogeneous, independently‑operated stores while preserving drift tolerance, ethical governance, and Zero‑Trust security.
- Core size: < 5 k LOC of ReScript + Elixir orchestration.
- Key capabilities: Global UUID namespace, on‑demand modality loading, drift detection/repair, immutable audit trails, and modular plug‑in support for Graph, Vector, Tensor, Semantic, Document, and Temporal data.
- Why it matters: Modern research, open‑science, and AI pipelines demand interoperable knowledge that can evolve without forcing a monolithic consistency model. VeriSimDB provides that missing “tiny core” while keeping the heavy lifting (storage, modality logic) in the federated nodes.
The remainder of this paper details the architecture rationale, design choices, security model, and practical pathways for adoption.
| Symptom | Example | Consequence |
|---|---|---|
| Data silos | University repositories, proprietary archives | Redundant copies, missed collaborations |
| Inconsistent semantics | Retraction of a paper, legal reinterpretation of a record | Knowledge drift leads to erroneous aggregations |
| Operational brittleness | Legacy mainframes, proprietary APIs | Integration costs explode |
Traditional federated systems (e.g., IPFS, Solid, Dat) solve some of these issues but enforce either strict consistency or pure peer‑to‑peer autonomy. Neither approach supports controlled drift (where some updates must propagate and others must stay local), and neither enforces fine‑grained ethical constraints on who can read or write a particular knowledge artifact.
- Neurosymbolic AI – hybridization of embeddings (vector) and symbolic graphs demands a federation capable of simultaneously serving different modalities.
- Open Science & FAIR data – funding agencies now require data provenance and versioning across institutional boundaries.
- Decentralized Web (Web3) – community governance models rely on verifiable, tamper‑evident data exchanges.
These trends converge on a single requirement: a universal, addressable namespace that can host heterogeneous knowledge units (Octads) while allowing controlled drift and auditable trust boundaries.
The VeriSimDB core is deliberately tiny: it only provides namespace resolution and lightweight coordination. All modality‑specific logic lives in federated crates that can be versioned, replaced, or upgraded independently.
- Function: Maps each Octad UUID (128‑bit) to a store identifier and a metadata bundle (modality list, access policy hash).
- Implementation: Pure ReScript, compiled to a tiny JavaScript module that runs inside a WASM sandbox.
- Why ReScript?
- Strong static typing eliminates runtime reinterpretation bugs.
- Immutable data structures naturally mirror the address‑space semantics.
- You retain exclusive ownership of the registry logic (Memory #1).

    // registry.res
    type storeId = string
    type octadId = string

    type storeMeta = {
      endpoint: string,
      modalities: array<string>,
      policyHash: string,
    }

    // Immutable map from Octad UUID to store metadata; starts empty.
    let registry: Belt.Map.String.t<storeMeta> = Belt.Map.String.empty

The registry holds no mutable state of its own: every mutation is persisted as a signed, append‑only event (see §5).
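The append‑only persistence mentioned above can be illustrated with a small event‑replay sketch. It is written in Rust (the language of the modality crates) rather than ReScript, and the `RegistryEvent` variants and the `replay` function are hypothetical names chosen for illustration; signature checking is left out.

```rust
use std::collections::HashMap;

/// Metadata kept per Octad (mirrors `storeMeta` in registry.res).
#[derive(Clone, Debug, PartialEq)]
pub struct StoreMeta {
    pub endpoint: String,
    pub modalities: Vec<String>,
    pub policy_hash: String,
}

/// Hypothetical event types; the real event schema is not specified here.
#[derive(Clone, Debug)]
pub enum RegistryEvent {
    Registered { octad_id: String, meta: StoreMeta },
    Removed { octad_id: String },
}

/// Rebuild the current registry view by replaying the append-only log in order.
pub fn replay(events: &[RegistryEvent]) -> HashMap<String, StoreMeta> {
    let mut view = HashMap::new();
    for event in events {
        match event {
            RegistryEvent::Registered { octad_id, meta } => {
                view.insert(octad_id.clone(), meta.clone());
            }
            RegistryEvent::Removed { octad_id } => {
                view.remove(octad_id);
            }
        }
    }
    view
}
```

Because the view is derived entirely from the log, two nodes that replay the same events end up with the same mapping.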
- Framework: Elixir + GenStage pipelines.
- Responsibilities:
- Synchronization – polling or push‑based updates from stores.
- Drift Detection – statistical & formal checks (see §4).
- Repair Scheduling – dispatch manual or automated repair jobs.
- Why Elixir?
- Fault‑tolerant actor model fits long‑running coordination.
- GenStage enables composable stages (fetch → detect → repair).
- Your production stack already ships with Elixir (Memory #5).
| Modality | Crate | Core API | Performance Note |
|---|---|---|---|
| Graph | `verisim-graph-rs` | `load_graph(octad_id) -> Oxigraph` | Zero‑copy Arc over MMAP files |
| Vector | `verisim-vector-rs` | `search(embedding) -> nearest` | HNSW built on hnsw-sys (sub‑ms latency) |
| Tensor | `verisim-tensor-rs` | `load_tensor(octad_id) -> ndarray::Array` | Integrates with burn for on‑the‑fly inference |
| Document | `verisim-doc-rs` | `fulltext_search(query) -> tantivy::Result` | Uses Tantivy’s inverted index, store‑level compression |
| Semantic | `verisim-semantic-rs` | `type_proof(cbor_blob) -> enum` | CBOR schema validation via cbor-rs |
| Temporal | `verisim-temporal-rs` | `versions(octad_id) -> tree` | Merkle‑tree snapshots for deterministic replay |
All crates expose a C‑ABI friendly entry point (`#[no_mangle] pub extern "C"`), so the Elixir orchestrator can call them through a thin NIF or port wrapper with little marshalling overhead.
- Purpose: Serve HTTP/HTTPS endpoints to any client (browser, CLI, third‑party AI) while preserving statelessness and sandboxing.
- Composition:
- Memory #2 – the sandbox itself (the WebAssembly module).
- Interacts with the ReScript registry to resolve UUID → store mapping.
- Enforces access signatures (see §5).
- Benefits:
- Language‑agnostic: any client can issue a simple JSON‑RPC call (`/octad/:id`).
- No direct filesystem access; all I/O is mediated by the core’s policy engine.
- Mechanism: Every request must carry a cryptographic signature generated by `sactify-php`.
- Workflow:
  - Client loads its private key (hardware security module optional).
  - Computes `sign(payload, private_key)`.
  - POSTs `{payload, signature}` to the WASM proxy.
  - Proxy verifies using the public key associated with the client’s identity claim (stored in the registry).
- Why sactify‑php? It is already part of your security toolbox (Memory #2) and provides tamper‑evident audit logs that can be appended to the immutable `verisim-temporal` log.
- Discover – A store publishes a registration manifest (JSON) containing:
  - `store_id` (UUID)
  - `endpoints` (graph, vector, …)
  - `supported_modalities` (array)
  - `policy_hash` (SHA‑256 of its internal access policy)
- Commit – The manifest is signed with the store’s private key and posted to `/registry` (the ReScript core).
- Acknowledge – Elixir orchestrator adds the entry to the registry and dispatches a registration event to downstream modules.
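Before a manifest is accepted, its structure can be checked independently of the signature. The following is a minimal sketch of such a check; the `Manifest` struct layout, the `validate_manifest` name, and the assumption that `policy_hash` is hex‑encoded are illustrative choices, not part of a published API.

```rust
/// Registration manifest fields as listed in the Discover step.
#[derive(Debug, Clone)]
pub struct Manifest {
    pub store_id: String,
    pub endpoints: Vec<(String, String)>, // (modality, URL)
    pub supported_modalities: Vec<String>,
    pub policy_hash: String, // SHA-256 digest, assumed hex-encoded
}

/// Structural checks only; verifying the store's signature is a separate step.
pub fn validate_manifest(m: &Manifest) -> Result<(), String> {
    if m.store_id.is_empty() {
        return Err("store_id is empty".to_string());
    }
    if m.supported_modalities.is_empty() {
        return Err("no supported modalities".to_string());
    }
    // A SHA-256 digest is 32 bytes, i.e. 64 hexadecimal characters.
    let hex_ok = m.policy_hash.len() == 64
        && m.policy_hash.chars().all(|c| c.is_ascii_hexdigit());
    if !hex_ok {
        return Err("policy_hash is not a 64-character hex string".to_string());
    }
    // Every advertised modality needs an endpoint that serves it.
    for modality in &m.supported_modalities {
        if !m.endpoints.iter().any(|(name, _)| name == modality) {
            return Err(format!("no endpoint for modality '{}'", modality));
        }
    }
    Ok(())
}
```

Rejecting malformed manifests early keeps invalid entries out of the append‑only registry log.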
| Phase | Actor | Action | Core Interaction |
|---|---|---|---|
| Store creation | Store | Emits an Octad (UUID + payload + modality tags) | Registry entry created; metadata attached |
| Fetch | Client | Requests `GET /octad/:id` | WASM proxy resolves UUID → store, forwards request |
| Sync | Orchestrator | Pulls updates from all stores that host the Octad | GenStage pipelines perform diff ingestion |
| Drift repair | Orchestrator/Store | Detects change; decides to auto‑repair or prompt user | Repair job scheduled; signed by policy holder |
Stores effectively behave as virtual memory pages: an Octad’s vector modality may reside on Store A, while its document modality resides on Store B. The core never copies data; it merely routes the request and enforces policy.
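The per‑modality placement described here amounts to a routing table keyed by Octad and modality. The sketch below shows one possible shape; `Router`, `register`, and `resolve` are hypothetical names, and policy enforcement is omitted.

```rust
use std::collections::HashMap;

/// Routing table keyed by (octad_id, modality); values are store endpoints.
#[derive(Default)]
pub struct Router {
    routes: HashMap<(String, String), String>,
}

impl Router {
    /// Record which store endpoint serves a given modality of an Octad.
    pub fn register(&mut self, octad_id: &str, modality: &str, endpoint: &str) {
        self.routes.insert(
            (octad_id.to_string(), modality.to_string()),
            endpoint.to_string(),
        );
    }

    /// Returns the endpoint to forward to; no payload is copied or cached.
    pub fn resolve(&self, octad_id: &str, modality: &str) -> Option<&str> {
        self.routes
            .get(&(octad_id.to_string(), modality.to_string()))
            .map(|s| s.as_str())
    }
}
```

A lookup that returns `None` means no registered store hosts that modality for the Octad, so the request can be refused before any store is contacted.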
Knowledge is inherently dynamic. VeriSimDB embraces this through a drift taxonomy that separates statistical drift (harmless updates) from formal drift (semantic or ethical changes).
| Modality | Statistic | Thresholding Method |
|---|---|---|
| Vector embeddings | Cosine similarity (pairwise) | Empirical quantile from recent batch |
| Tensor fields | Frobenius norm of delta | Adaptive sigma based on runtime statistics |
| Graph edges | Edge‑addition rate | Sliding‑window Poisson test |
| Document content | TF‑IDF cosine (section level) | Pre‑trained classifier for “retraction” vs “revision” |
| Temporal versions | Version‑tree depth increase | Formal rule: max_depth ≤ 3 per policy |
Statistical drift is sampled every N minutes (configurable). If the sample exceeds a soft threshold, the system marks the Octad as potentially drifted but does not automatically repair.
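Two of the statistics in the table, cosine similarity for embeddings and the Frobenius norm of a tensor delta, are simple enough to sketch directly. The function names and the fixed `soft_threshold` parameter are assumptions for illustration; the table's empirical‑quantile and adaptive‑sigma thresholding are not modeled here.

```rust
/// Cosine similarity of two equal-length embeddings; None if undefined.
pub fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return None; // similarity is undefined for a zero vector
    }
    Some(dot / (na * nb))
}

/// Frobenius norm of the element-wise difference of two flattened tensors.
pub fn frobenius_delta(old: &[f64], new: &[f64]) -> Option<f64> {
    if old.len() != new.len() {
        return None;
    }
    let sum_sq: f64 = old.iter().zip(new).map(|(o, n)| (n - o) * (n - o)).sum();
    Some(sum_sq.sqrt())
}

/// Soft-threshold check: marks the Octad as potentially drifted, never repairs.
pub fn potentially_drifted(similarity: f64, soft_threshold: f64) -> bool {
    similarity < soft_threshold
}
```

For example, comparing an embedding against its previous version and getting a similarity of 0.7 under a soft threshold of 0.9 would mark the Octad as potentially drifted without scheduling a repair.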
- Automatic (non‑critical) – e.g., a new research paper version with updated references is accepted without human review if the drift score is below the critical level.
- Semi‑automatic – e.g., a change to a definition in a taxonomy triggers a confidence‑scoped alert visible to domain custodians.
- Manual – e.g., a contested historical narrative alteration requires a signed Ethics Review from an authorized committee; the signature is recorded in the immutable temporal log.
Repair actions are policy‑driven (encoded in CBOR Semantic proofs). The core can be extended with new drift‑resolution rules without touching the underlying registry.
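The three repair tiers can be read as a decision function over the drift kind, the drift score, and whether the change is contested. The sketch below is one possible reading; the `contested` flag, the `critical` parameter, and the rule that formal drift never auto‑repairs are assumptions, since the actual rules would come from the CBOR‑encoded policy.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum DriftKind {
    Statistical,
    Formal,
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum RepairPolicy {
    Automatic,
    SemiAutomatic,
    Manual,
}

/// Pick a repair tier. Assumed rules: contested changes always go to manual
/// review; statistical drift below the critical level is auto-accepted;
/// everything else raises an alert for domain custodians.
pub fn select_policy(
    kind: DriftKind,
    score: f64,
    critical: f64,
    contested: bool,
) -> RepairPolicy {
    if contested {
        return RepairPolicy::Manual;
    }
    match kind {
        DriftKind::Statistical if score < critical => RepairPolicy::Automatic,
        _ => RepairPolicy::SemiAutomatic,
    }
}
```

Keeping the decision in a pure function makes it straightforward to swap in new rules without touching the registry.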
| Principle | Mechanism |
|---|---|
| Never trust by default | Every request must present a cryptographic signature created with the requester’s private key. |
| Least privilege | Access policies are encoded as hashes in the registry; verification is delegated to the store that actually hosts the data. |
| Auditability | All interactions are appended to an immutable temporal ledger (verisim-temporal). Each log entry contains: timestamp, UUID, signer, policy hash, and a hash chain linking to the prior entry. |
| Isolation | Stores run in rootless containers (svalinn/vordr, Memory #4). No container shares a network namespace unless explicitly allowed. |
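The hash chain in the Auditability row can be sketched as follows. To stay dependency‑free, the example uses the standard library's `DefaultHasher` as a stand‑in digest; it is not cryptographic, and a real ledger would use SHA‑256 or a similar function. The `LogEntry` layout and the `append` and `verify_chain` names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct LogEntry {
    pub timestamp: u64,
    pub octad_id: String,
    pub signer: String,
    pub policy_hash: String,
    pub prev_hash: u64,  // links to the prior entry (0 for the first)
    pub entry_hash: u64, // digest over all fields above
}

/// Stand-in digest. NOT cryptographic; shown only to illustrate the chaining.
fn digest(timestamp: u64, octad_id: &str, signer: &str, policy_hash: &str, prev_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    timestamp.hash(&mut h);
    octad_id.hash(&mut h);
    signer.hash(&mut h);
    policy_hash.hash(&mut h);
    prev_hash.hash(&mut h);
    h.finish()
}

/// Append an entry whose hash covers its fields and the previous entry's hash.
pub fn append(log: &mut Vec<LogEntry>, timestamp: u64, octad_id: &str, signer: &str, policy_hash: &str) {
    let prev_hash = log.last().map(|e| e.entry_hash).unwrap_or(0);
    let entry_hash = digest(timestamp, octad_id, signer, policy_hash, prev_hash);
    log.push(LogEntry {
        timestamp,
        octad_id: octad_id.to_string(),
        signer: signer.to_string(),
        policy_hash: policy_hash.to_string(),
        prev_hash,
        entry_hash,
    });
}

/// Recompute every digest and link; any edited entry breaks the chain.
pub fn verify_chain(log: &[LogEntry]) -> bool {
    let mut expected_prev = 0u64;
    for e in log {
        let recomputed = digest(e.timestamp, &e.octad_id, &e.signer, &e.policy_hash, e.prev_hash);
        if e.prev_hash != expected_prev || e.entry_hash != recomputed {
            return false;
        }
        expected_prev = e.entry_hash;
    }
    true
}
```

Changing any field of an earlier entry alters its digest, which invalidates that entry and every later link.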
- Client creates payload: `{action: "read", octad_id: "0x12AB…"}`
- Client signs payload → `sig = sactify-php sign(payload, priv_key)`
- Client POSTs `{payload, sig}` to `/octad/:id` (WASM proxy)
- Proxy resolves UUID → `store_id`
- Proxy verifies `sig` against the public key associated with the caller’s DID (Decentralized Identifier stored in registry)
- Proxy forwards request to the identified store only if verification succeeds.
- Store returns data; proxy records the transaction hash in `verisim-temporal`.
All signatures are timestamped. Because the timestamp is part of the signed blob, the proxy can reject requests that fall outside an acceptance window, which limits replay attacks to that window.
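A timestamp inside the signed blob lets the proxy bound how long a captured request remains usable. The sketch below shows a freshness check on the proxy side, run after signature verification. It additionally tracks a per‑request nonce so that a request cannot be reused inside the window; the nonce, the `ReplayGuard` type, and the window parameters are assumptions that go beyond the text.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum ReplayCheck {
    Fresh,
    Stale,
    FromFuture,
    Replayed,
}

pub struct ReplayGuard {
    max_age_secs: u64,
    max_skew_secs: u64,
    // (signer, timestamp, nonce) triples already accepted. A real proxy would
    // evict entries older than max_age_secs to keep this set bounded.
    seen: HashSet<(String, u64, String)>,
}

impl ReplayGuard {
    pub fn new(max_age_secs: u64, max_skew_secs: u64) -> Self {
        ReplayGuard { max_age_secs, max_skew_secs, seen: HashSet::new() }
    }

    /// `timestamp` and `nonce` are taken from the already-verified signed blob;
    /// `now` is the proxy's clock in seconds.
    pub fn check(&mut self, signer: &str, timestamp: u64, nonce: &str, now: u64) -> ReplayCheck {
        if timestamp > now.saturating_add(self.max_skew_secs) {
            return ReplayCheck::FromFuture;
        }
        if now.saturating_sub(timestamp) > self.max_age_secs {
            return ReplayCheck::Stale;
        }
        let key = (signer.to_string(), timestamp, nonce.to_string());
        if !self.seen.insert(key) {
            return ReplayCheck::Replayed;
        }
        ReplayCheck::Fresh
    }
}
```

Only requests that come back as `Fresh` would be forwarded to the store.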
VeriSimDB treats every data type as a first‑class modality. Below is the canonical mapping that the core knows about, together with the recommended Rust crate.
| Modality | Symbolic Name | Core Metadata Tag | Rust Crate | Example Use |
|---|---|---|---|---|
| Graph | `graph` | `"graph"` | `verisim-graph-rs` (Oxigraph) | Citation networks, social graphs |
| Vector | `vector` | `"vector"` | `verisim-vector-rs` (HNSW) | Embedding search, similarity |
| Tensor | `tensor` | `"tensor"` | `verisim-tensor-rs` (ndarray/Burn) | Sensor streams, ML model weights |
| Semantic | `semantic` | `"semantic"` | `verisim-semantic-rs` (CBOR) | Type proofs, ontology annotations |
| Document | `document` | `"document"` | `verisim-doc-rs` (Tantivy) | Full‑text articles, legal codes |
| Temporal | `temporal` | `"temporal"` | `verisim-temporal-rs` (Merkle-tree) | Version histories, draft revisions |
Adding a new modality (e.g., audio or geospatial) requires only:
- A Rust crate exposing `load_<modality>(octad_id)`.
- An entry in the modality registry (a static map inside the core).
- Optional validation logic (e.g., schema checks).
Because all modality bundles are self‑describing (CBOR carries type metadata), the core’s routing logic does not need to change to support new data types; only the modality registry gains an entry.
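The steps for adding a modality can be sketched as a tag‑to‑loader map. The `Loader` signature (Octad id in, raw bytes out), the `ModalityRegistry` type, and the `load_audio` placeholder are hypothetical; a real crate would return a typed value rather than bytes.

```rust
use std::collections::HashMap;

/// Loader signature assumed for illustration: octad_id in, raw bytes out.
pub type Loader = fn(&str) -> Result<Vec<u8>, String>;

pub struct ModalityRegistry {
    loaders: HashMap<&'static str, Loader>,
}

impl ModalityRegistry {
    pub fn new() -> Self {
        ModalityRegistry { loaders: HashMap::new() }
    }

    /// Adding a modality is a single map insertion.
    pub fn register(&mut self, tag: &'static str, loader: Loader) {
        self.loaders.insert(tag, loader);
    }

    /// Dispatch to the loader registered under `tag`.
    pub fn load(&self, tag: &str, octad_id: &str) -> Result<Vec<u8>, String> {
        match self.loaders.get(tag) {
            Some(loader) => loader(octad_id),
            None => Err(format!("unknown modality '{}'", tag)),
        }
    }
}

/// Hypothetical loader for a new "audio" modality.
pub fn load_audio(octad_id: &str) -> Result<Vec<u8>, String> {
    if octad_id.is_empty() {
        return Err("empty octad_id".to_string());
    }
    Ok(octad_id.as_bytes().to_vec()) // placeholder payload, not real audio
}
```

Registering the new modality is then `registry.register("audio", load_audio)`, after which `registry.load("audio", id)` dispatches to the new crate.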
- Participants: University data repositories, pre‑print servers (arXiv, bioRxiv), clinical trial registries.
- Workflow:
- Each repo registers its endpoint.
- When a paper is uploaded, an Octad is minted containing citation graph, embedding vector, and document modalities.
- The core synchronizes the Octad across all repositories.
- Retraction events trigger drift detection and optional manual review.
- Benefit: Researchers can query a global citation graph without harvesting each repository individually, while preserving institutional data sovereignty.
- Participants: National archives, museum collections, private libraries.
- Scenario: Reconciling divergent historical accounts (e.g., competing narratives of a war).
- How VeriSimDB Helps:
- Each archive tags its version with a semantic proof (“revision‑v1”, “revision‑v2”).
- Drift detection flags when a contested narrative gains prominence.
- Ethical review committees sign off on additions.
- Outcome: A multivocal timeline that can be explored without erasing prior editions, facilitating transparent historiography.
- Participants: AI labs, knowledge‑graph platforms, reinforcement‑learning research groups.
- Integration:
- Vector modality stores embeddings from transformer models.
- Graph modality holds relational facts.
- Tensor modality persists latent state tensors of live models.
- Outcome: A single Octad can be traversed from semantic proof → graph → vector → tensor without moving data, enabling reason‑driven retrieval and runtime grounding of AI predictions.
- Participants: Hospitals, patient‑generated health apps, public health agencies.
- Privacy Model:
- Each patient owns a personal namespace of Octads.
- Access requires patient‑signed policy; policy hash is stored in the registry.
- Drift detection respects clinical relevance thresholds (e.g., only lab‑result changes that shift by more than 10% trigger review).
- Result: A patient‑centric knowledge graph that can be securely shared across institutions while respecting consent and regulatory constraints.
- Participants: DAO‑governed knowledge bases, decentralized social platforms.
- Mechanics:
- Governance tokens are mapped to signature authorities.
- Proposals to alter an Octad must carry a quorum of signatures.
- The immutable temporal log provides a public audit trail for governance disputes.
- Impact: Knowledge ownership becomes token‑backed yet ethically governed, aligning with Web3 principles of transparency and accountability.
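The quorum rule in the mechanics above can be sketched as a count of distinct authorized signers. The sketch assumes signatures have already been verified and that each authority carries one vote; token‑weighted voting, which the text does not specify, is not modeled. The function name `quorum_reached` is hypothetical.

```rust
use std::collections::HashSet;

/// True when at least `quorum` distinct authorized signers endorsed a proposal.
/// `signers` lists the identities behind already-verified signatures.
pub fn quorum_reached(signers: &[&str], authorities: &HashSet<&str>, quorum: usize) -> bool {
    let distinct: HashSet<&str> = signers
        .iter()
        .copied()
        .filter(|s| authorities.contains(s)) // ignore non-authorities
        .collect(); // the set drops duplicate signatures
    distinct.len() >= quorum
}
```

Counting distinct signers prevents one authority from meeting the quorum by signing repeatedly.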
| System | Federation Model | Drift Handling | Multimodality | Zero‑Trust Built‑In | Core Stack Overlap |
|---|---|---|---|---|---|
| VeriSimDB | Namespace‑centric (HD‑like) | ✔︎ Statistical & formal | ✔︎ Graph, Vector, Tensor, Semantic, Document, Temporal | ✔︎ sactify‑php signatures, WASM sandbox, immutable logs | ReScript (core), Elixir (orch), Rust (store), WASM (proxy), sactify‑php (signing) |
| Solid Project | Pod‑based (Web‑oriented) | ✘ No systematic drift detection | ✘ Primarily JSON‑LD/Document | ✔︎ DID‑based auth (but not signed payloads by default) | Mostly JavaScript/TypeScript |
| IPFS | Content‑addressed (block‑level) | ✘ Immutable by design | ✘ Limited to raw bytes | ✘ No built‑in auth beyond TLS | Written in Go, Rust |
| Dat | Append‑only log | ✘ Linear log only | ✘ Primarily document‑oriented | ✘ Stateless, no signing by default | Node.js/Rust |
| AlphaFold DB | Centralized repository | ✘ No drift, versioned per release | ✘ Domain‑specific (protein structures) | ✘ No external auth | Python/C++ |
Key Takeaway: VeriSimDB uniquely combines a tiny universal namespace, drift‑aware semantics, first‑class multimodal support, and Zero‑Trust enforcement within a stack you already own.
| Milestone | Duration | Deliverable | Owner |
|---|---|---|---|
| 0. Foundations | 1 wk | Project scaffolding (GitHub, CI) | You |
| 1. ReScript Registry | 1 wk | `registry.res` compiled to WASM; API spec | You |
| 2. Elixir Orchestrator | 2 wks | Registration, sync pipelines, drift detection stub | You |
| 3. Rust Modality Crates | 3 wks | `verisim-graph-rs`, `verisim-vector-rs`, … with test suites | You |
| 4. WASM Proxy | 1 wk | `/octad/:id` endpoint, signature verification hook | You |
| 5. sactify‑php Integration | 1 wk | Signature verification service (deployed as side‑car) | You |
| 6. Pilot Stores | 2 wks | 3 test stores (e.g., GitHub repo, local Oxigraph instance, mock paper DB) | You + collaborators |
| 7. Drift Engine | 1 wk | Full statistical + proven‑library rule engine | You |
| 8. Security Hardening | 1 wk | Auditable logs, rootless container sandboxing | You |
| 9. Documentation & Release | 1 wk | Public repo, README, API docs | You |
| Total | 14 weeks | Beta‑ready VeriSimDB core | — |
Post‑Beta: community‑driven extension of modalities, governance plug‑ins, and integration SDKs (Python, Go, R).
- Epistemic Humility – By modeling drift as a first‑class concept, VeriSimDB operationalizes the idea that knowledge is provisional. This aligns with contemporary philosophy of science (Kuhnian paradigm shifts) and supports open‑minded revision without forced consensus.
- Bias Detection – Statistical drift signals can expose systematic biases (e.g., a dominant narrative gaining disproportionate representation). The system can surface these signals to domain experts, facilitating bias remediation rather than concealment.
- Agency & Ownership – Each Octad is individually owned (via its UUID) yet participates in a global namespace. This balances individual sovereignty with collective epistemic infrastructure, a model resonant with contemporary debates on data‑property rights.
- Governance Transparency – Immutable temporal logs provide an auditable public ledger of epistemic decisions. This satisfies the demands of participatory governance in open‑science consortia and DAO communities.
These philosophical underpinnings are not merely academic; they directly inform policy decisions (e.g., who may sign a drift‑repair request) and product design (e.g., exposing drift metrics in UI dashboards).
- VeriSimDB provides the missing glue for universal federated knowledge: a tiny, addressable core, drift‑aware semantics, plug‑in modality support, and Zero‑Trust security—all built on the stack you already own (ReScript, Elixir, Rust, WASM, sactify‑php).
- The architecture is deliberately minimalist, allowing rapid iteration and independent evolution of each module.
- Early adopters can pilot the system with a handful of research repositories, gaining immediate benefits in knowledge discoverability, reproducibility, and ethical governance.
Next Steps
- Clone & explore the reference implementation (`hyperpolymath/verisimdb`).
- Register a test store (e.g., a local Oxigraph Graph DB).
- Contribute a new modality (e.g., audio embeddings) and submit a pull request.
- Join the community Slack/Discord channel for design reviews and governance discussions.
Together we can re‑imagine how disparate knowledge sources collaborate—without forcing a monolithic consistency model, and while honoring the messy, evolving nature of human understanding.
- Jewell, J. VeriSimDB: A Tiny Core for Universal Federated Knowledge (2025). GitHub repository: https://github.com/hyperpolymath/verisimdb
- CRDTs and Convergent Replicated Data Types – Shapiro, M. et al., 2011.
- Sactify – Tamper‑Evident Signing for Distributed Systems – PHP RFC (2023).
- Oxigraph – RDF Graph Database – https://oxigraph.org (2024).
- HNSW Library for Approximate Nearest Neighbor Search – Malkov, Y., Yashunin, D., 2020.
- Tantivy – Full‑Text Search Engine – https://github.com/quickwit-oss/tantivy (2023).
- Merkle Tree Versioning for Auditable Provenance – Zhang, L. et al., 2022.
- Zero‑Trust Architecture – NIST SP 800‑207 (2020).
- Ethics of Knowledge Drift – Jewell, J., 2024. Philosophy of Science Review, 39(2).
Prepared by Jonathan Jewell – Hyper‑Polymath (Neurosymbolic AI, Distributed Systems, Ethics).