NeuDB

License: MIT | Platform: Arduino | Platform: ESP32

Ultra-Lean Embedded LSM-Tree Storage Engine with Power-Fail Resilient Lifespan Protection and Isolated Twin-Engine Rolling History Channels for ESP32.

NeuDB is a high-performance, transactional Log-Structured Merge-tree (LSM-Tree) storage engine engineered from first principles for resource-constrained ESP32 architectures using the Arduino framework and native FreeRTOS primitives.

Built around a Twin-Engine Storage Pipeline, NeuDB separates standard point-in-time key-value transactions from high-volume sequential telemetry logging. History records live in a 32-bit virtual address space anchored at the top of the range (0xFFFFFFFF), and each object ID gets up to 16,384 circular rolling history slots. The design targets microsecond lookups and fast write ingestion while avoiding heap fragmentation and limiting flash wear on internal Flash (LittleFS) or external MicroSD hardware.


Why NeuDB?

| Feature | Traditional Frameworks / JSON | NeuDB (Twin-Engine LSM) |
| --- | --- | --- |
| Write Latency & Wear | High Flash Wear (In-Place Update Blocks) | Hyper-Drive Speed (Sequential Append Operations) |
| Power-Fail Safety | High Risk of Filesystem Sector Corruption | Bulletproof System (Atomic WAL Replay Validation) |
| Task Synchronization | Blocking I/O Triggers Watchdog (WDT) Resets | Smart Adaptive Stall & Auto-Yielding Backoff |
| History Log Capacity | Linear O(N) Filesystem Search Degradation | 16,384 Circular Slots Per ID with O(1) Bitmasking |
| Search Speed | Full File Table Scans (Highly Inefficient) | Microsecond Lookups (Bloom Filters + RAM Binary Search) |

Advanced Architectural Features

  • Twin-Engine Storage Isolation Pipeline: Cleanly segregates standard variable mutation maps from historical log snap streams into isolated SSTable directory trees (/lsm/lvX_Y.sst vs /lsm/log_lvX_Y.sst), guaranteeing that heavy telemetry drops never pollute regular operational configurations.
  • 32-Bit High-Address Anchor Coordinates: Maps logical circular histories into an absolute 32-bit virtual layout space anchored at the register ceiling (0xFFFFFFFF). This design layout enforces strict address space separation, completely neutralizing memory address collisions with underlying standard key blocks.
  • 16,384 Circular Rolling History Slots: Allocates a deep 14-bit index resolution track per tracking identifier. The internal engine coordinates rolling data wraps automatically via bitwise masking and modulo structures, preserving the freshest historical states while dropping expired segments during compaction sweeps.
  • Asynchronous Incremental Compaction Daemon: Driven by an isolated FreeRTOS background thread pinned exclusively to CPU Core 1. It processes multi-way merge-sort arrays using custom Min-Heaps (std::priority_queue) to consolidate SSTable blocks, clear stale records, and evaluate transaction timestamps under absolute sequential consistency (__ATOMIC_SEQ_CST).
  • Microsecond Read Optimization & Probabilistic Filtering: Combines stack-allocated, multi-way 128-Bit Deduplication Bitmasks with hardware-assisted Dynamic Bloom Filters to intercept absent search coordinates instantly before triggering a physical non-volatile disk lookup loop. Actual data discovery utilizes high-speed binary lookahead arrays (std::lower_bound) on RAM-cached file indices.
  • Smart Adaptive Write Stall Policy: Employs an intelligent concurrency ingestion brake. If the background worker task is heavily churning storage clusters, the ingestion path applies a temporary, dynamic backoff window to resolve filesystem buffer locks without blocking main application loops.
  • Fault-Tolerant Power-Fail Recovery: Implements sequential Write-Ahead Logging (WAL) packed with hardware-backed CRC32 checksum verification. Upon boot initialization, the engine intercepts dirty shutdowns, skips truncated data fragments, and executes an atomic WAL Replay pass to hydrate uncommitted transactions safely back into RAM.
  • Defensive Capacity Boundary Guards: Proactively samples filesystem structures to establish a rigid 90% hard capacity threshold fence (_flashFullGuard). If the storage threshold breaches constraints, the engine safely switches into an automatic fallback, executing reactive Heuristic Eviction over the oldest metadata nodes to protect against memory corruption or panic loops.
  • Zero-Header Pollution Pimpl Idiom Interface: Encapsulates complex standard template structures (std::map, std::vector, std::list), operating system primitives, and mutex flags behind a clean, opaque wrapper allocation mask (void*). Exposes an ultra-lean API block to front-end sketches while keeping code changes securely hidden.
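To make the power-fail recovery idea above more concrete, here is a minimal host-side sketch of a CRC-checked write-ahead log with a replay pass that stops at the first truncated or corrupt record. The record layout (`key`, `len`, payload, CRC32) and the names `crc32`, `appendRecord`, `replay`, and `WalEntry` are assumptions made for illustration; they are not NeuDB's actual on-disk format, and the CRC here is a plain software implementation rather than a hardware-assisted one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
uint32_t crc32(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Mask is all ones when the low bit is set, otherwise zero.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical decoded WAL entry.
struct WalEntry {
    uint16_t key;
    std::vector<uint8_t> value;
};

// Hypothetical record layout: [key:2][len:2][payload:len][crc32:4].
void appendRecord(std::vector<uint8_t>& log, uint16_t key,
                  const void* data, uint16_t len) {
    const size_t start = log.size();
    log.resize(start + 4u + len + 4u);
    std::memcpy(&log[start], &key, 2);
    std::memcpy(&log[start + 2], &len, 2);
    if (len > 0) std::memcpy(&log[start + 4], data, len);
    const uint32_t crc = crc32(&log[start], 4u + len);  // covers header + payload
    std::memcpy(&log[start + 4u + len], &crc, 4);
}

// Replays records in order and stops at the first truncated or corrupt one,
// so a write torn by power loss is never applied.
std::vector<WalEntry> replay(const std::vector<uint8_t>& log) {
    std::vector<WalEntry> out;
    size_t pos = 0;
    while (log.size() - pos >= 8) {
        uint16_t key = 0, len = 0;
        std::memcpy(&key, &log[pos], 2);
        std::memcpy(&len, &log[pos + 2], 2);
        const size_t total = 4u + len + 4u;
        if (log.size() - pos < total) break;             // truncated tail
        uint32_t stored = 0;
        std::memcpy(&stored, &log[pos + 4u + len], 4);
        if (stored != crc32(&log[pos], 4u + len)) break; // checksum mismatch
        out.push_back({key, std::vector<uint8_t>(log.begin() + pos + 4,
                                                 log.begin() + pos + 4 + len)});
        pos += total;
    }
    return out;
}
```

Appending two records and then cutting a few bytes off the end of the buffer yields only the first record on replay, which is the behavior one would want after a dirty shutdown.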

Performance Benchmarks

  • Ingestion Performance: 100 sequential log writes committed with complete physical WAL serialization to non-volatile disk in 12 milliseconds (~120 microseconds per individual write operation).
  • Zero-Malloc Operational Stability: Performs no dynamic heap allocations (malloc/new) during active runtime write transactions. Uses move semantics (std::move) during data flushing passes, keeping memory utilization flat over long-term application cycles.

Quick Start

1. Developer-Friendly API Interface

NeuDB exposes a type-safe template acceleration facade, allowing you to ingest variables, telemetry blocks, or custom packed layout structures cleanly without passing physical data sizes or low-level address pointers manually.

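#include <NeuDB.h>

// The sketch below uses a NeuDB engine instance named `db`.
// Declare it here if the library does not already provide one.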

// Struct packed ensures dense storage alignment without compiler padding bytes
struct __attribute__((packed)) EmbeddedTelemetry {
    uint32_t transactionID;
    float coreTemperature;
    uint32_t hardwarePulseCount;
};

void setup() {
    Serial.begin(115200);

    // Bootstrap Sequence: Mounts VFS partition topology, scans deep file indices,
    // and automatically replays outstanding transactional delta states from the WAL log.
    if (db.init()) {
        Serial.println("NeuDB v2.0.0 Storage Framework Initialized Successfully.");
    }
}

void loop() {
    static uint16_t sampleCounter = 0;
    sampleCounter++;

    // -----------------------------------------------------------------
    // SCENARIO A: STANDARD KEY-VALUE PERSISTENCE (0 - 2047 RANGE)
    // -----------------------------------------------------------------
    uint16_t configKey = 105;
    uint32_t activeSystemBitmask = 0xDEADBEEF;
    db.putVar(configKey, activeSystemBitmask); // Fast standard mutation write (regular pipeline)

    // -----------------------------------------------------------------
    // SCENARIO B: TWIN-ENGINE AUTOMATIC TELEMETRY LOG INGESTION
    // -----------------------------------------------------------------
    uint16_t targetSensorID = 7; // Object tracking limit defined in configuration profile
    EmbeddedTelemetry payload = { sampleCounter, 28.7f + ((float)sampleCounter * 0.1f), 1000 + sampleCounter };

    // Ingest data cleanly: The internal engine resolves circular rolling sequences
    // and calculates absolute high-address virtual coordinates automatically.
    if (db.putLogVar(targetSensorID, payload)) {
        Serial.printf("Telemetry drop appended for ID %d\n", targetSensorID);
    }

    // -----------------------------------------------------------------
    // SCENARIO C: REAL-TIME LOG POINT-LOOKUP (LATEST STATE)
    // -----------------------------------------------------------------
    EmbeddedTelemetry latestFrame;
    if (db.getLogVar(targetSensorID, latestFrame)) {
        Serial.printf("Latest State Retrieved -> Temp: %.2f C | Pulse: %u\n",
                      latestFrame.coreTemperature, latestFrame.hardwarePulseCount);
    }

    // -----------------------------------------------------------------
    // SCENARIO D: LAZY-LOADING STATEFUL RANGE ITERATOR STREAMING
    // -----------------------------------------------------------------
    Serial.println("Streaming chronological log segment from Slot 75 to 85...");

    // Arm a lookahead tracking cursor context over a specific range window
    if (db.logIterator(targetSensorID, 75, 85)) {
        while (db.nextLog()) {
            EmbeddedTelemetry recordStream;
            db.getLogValueVar(recordStream); // Accelerated structured type decompression

            uint16_t decodedSlotIndex = db.getLogIndex();       // Extracts native slot sequence position
            uint32_t nodeTimestamp    = db.getLogTimestamp();   // Extracts millisecond transaction epoch

            Serial.printf("  -> [STREAM] Slot: %u | Time: %u ms | Temp: %.2f C\n",
                          decodedSlotIndex, nodeTimestamp, recordStream.coreTemperature);
        }

        // Always explicitly close the tracking channel to secure heap vector bounds
        db.closeLog();
    }

    // FreeRTOS Cores Coordination Rule (ESP32, S3, C3, C6):
    // Always preserve a brief block delay to allow the background compaction scheduler
    // running pinned on the secondary core to execute transactional flash writes smoothly.
    vTaskDelay(pdMS_TO_TICKS(10));
}

Top-Level API Reference

Core Key-Value Methods (Regular Pipeline)

  • bool init() / begin(): Triggers the hardware bootstrap pipeline, mounts destination virtual filesystems, and executes atomic crash recovery.
  • bool put(uint16_t key, const void *data, size_t size): Low-level byte ingestion handler. Automatically evaluates write stall policies and structural boundaries before committing to the WAL.
  • bool get(uint16_t key, void *out, size_t &size): Point-lookup retrieval path. Executes fast multi-tier lookahead sweeps across RAM and disk using pre-filtered binary search.
  • bool del(uint16_t key): Injects a logical cancellation marker (Tombstone) to discard a key, freeing space asynchronously during compaction tasks.
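The tombstone behavior described for del() can be illustrated with a small multi-way merge, the kind of pass a compaction task might run. This is a sketch under assumptions: the `Entry` fields, the `Cursor` helper, and `mergeRuns` are hypothetical names, and each input run is assumed to be sorted by key with at most one version per key. The newest version of each key wins, and keys whose newest version is a tombstone are dropped.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical SSTable row for illustration.
struct Entry {
    uint16_t key;
    uint32_t timestamp;
    bool tombstone;
    uint32_t value;
};

// Position of the next unread entry inside one sorted run.
struct Cursor {
    size_t run;
    size_t pos;
};

std::vector<Entry> mergeRuns(const std::vector<std::vector<Entry>>& runs) {
    // Min-heap ordering: smallest key on top; for equal keys, newest first.
    auto later = [&runs](const Cursor& a, const Cursor& b) {
        const Entry& x = runs[a.run][a.pos];
        const Entry& y = runs[b.run][b.pos];
        if (x.key != y.key) return x.key > y.key;
        return x.timestamp < y.timestamp;
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);
    for (size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.push({r, 0});
    }

    std::vector<Entry> out;
    bool haveLast = false;
    uint16_t lastKey = 0;
    while (!heap.empty()) {
        const Cursor c = heap.top();
        heap.pop();
        const Entry& e = runs[c.run][c.pos];
        if (!haveLast || e.key != lastKey) {
            // First entry seen for a key is its newest version.
            haveLast = true;
            lastKey = e.key;
            if (!e.tombstone) out.push_back(e);  // deleted keys vanish here
        }
        // Older versions of the same key fall through and are discarded.
        if (c.pos + 1 < runs[c.run].size()) heap.push({c.run, c.pos + 1});
    }
    return out;
}
```

Because space is only reclaimed when such a merge runs, a deleted key keeps occupying storage until the next compaction reaches the files that hold it.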

Twin-Engine Telemetry Methods (Isolated Log Pipeline)

  • bool putLog(uint16_t id, const void *data, size_t size): Automatically resolves the dynamic circular rolling sequence track to append a high-address transactional telemetry record.
  • bool getLog(uint16_t id, void *out, size_t &size): Point-lookup query channel that intercepts and extracts the single newest historical snapshot record based on the highest timestamp.
  • bool getLog(uint16_t id, uint16_t index, void *out, size_t &size): Precision lookup query that extracts a targeted historical snapshot record at an exact circular ring index slot.
  • size_t getTotalLog(uint16_t id): Compiles the true cumulative metric count of active, non-tombstone historical entries currently retained in storage clusters.
  • bool deleteLog(uint16_t id): Enqueues low-overhead cancellation markers across all tracking slots to wipe the entire history track of an object ID asynchronously.
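One way to picture how a log ID and a rolling sequence number could map to a 14-bit ring slot and a high-address 32-bit key is sketched below. The exact bit layout is an assumption chosen for illustration (slot in the low 14 bits, ID above it, subtracted from the 0xFFFFFFFF anchor), and the names `slotForSequence`, `virtualKey`, `idOf`, and `slotOf` are hypothetical rather than part of the NeuDB API.

```cpp
#include <cstdint>

constexpr uint32_t kSlotBits  = 14;
constexpr uint32_t kSlotCount = 1u << kSlotBits;  // 16,384 slots per ID
constexpr uint32_t kSlotMask  = kSlotCount - 1u;  // 0x3FFF
constexpr uint32_t kAnchor    = 0xFFFFFFFFu;      // top of the 32-bit space

// A monotonically increasing sequence wraps onto the ring with a single AND.
constexpr uint32_t slotForSequence(uint32_t sequence) {
    return sequence & kSlotMask;
}

// Assumed layout: offset below the anchor = (id << 14) | slot.
constexpr uint32_t virtualKey(uint16_t id, uint32_t slot) {
    return kAnchor - ((static_cast<uint32_t>(id) << kSlotBits) | (slot & kSlotMask));
}

// Inverse mappings recover the ID and ring slot from a virtual key.
constexpr uint16_t idOf(uint32_t vkey) {
    return static_cast<uint16_t>((kAnchor - vkey) >> kSlotBits);
}

constexpr uint32_t slotOf(uint32_t vkey) {
    return (kAnchor - vkey) & kSlotMask;
}

// With IDs below 2048, the lowest log key is still far above the regular
// 0..2047 key range, so the two spaces cannot collide in this layout.
static_assert(virtualKey(2047, kSlotMask) == 0xFE000000u, "lowest log key");
static_assert(slotForSequence(16384) == 0, "sequence wraps after 16,384 writes");
```

In this layout the 16,385th write to an ID lands on slot 0 again, overwriting the oldest snapshot, which matches the rolling behavior the methods above describe.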

High-Level Architecture Log Range Iterator Pipeline

  • bool logIterator(uint16_t id, uint16_t startIdx, uint16_t endIdx): Spawns a stateful lookahead cursor context to stream historical log boundaries using zero upfront heap overhead allocation.
  • bool nextLog(): Advances the range cursor position forward. Automatically filters out query boundary overflows and intercepts tombstones in-flight inside clean application loops.
  • uint16_t getLogIndex(): Retrieves the decoded circular slot index position currently pointed to by the active stream cursor.
  • uint32_t getLogTimestamp(): Retrieves the native hardware millisecond timestamp tracking exactly when the current cursor entry was recorded.
  • void closeLog(): Destroys the stateful tracking cursor context and deallocates all heap vector partitions to prevent volatile memory fragmentation.

Global System Operations

  • void flush(): Forces immediate isolation splitting and serialization of the volatile RAM MemTable down to Level 0 regular and log SST physical blocks.
  • bool format(): Safely deconstructs active threads, purges the background compaction queue, unlinks physical files, and cleanly resets the state machine from zero.
  • void auditLevels(): Generates an accurate topological live report detailing entries count, file volumes, storage footprints, and active tombstones across regular and log levels.
  • const char* getVersion() const: Retrieves the absolute framework release version string supporting read-only diagnostic metadata checks.
  • bool exportKeyValuesToStream(Stream* targetStream): Sweeps the active regular database partition, serializing all live distinct Key-Value pairs directly onto an external output channel using a zero-copy, space-optimized 4-byte wire header protocol.
  • bool exportLogsToStream(Stream* targetStream): Executes a chronological cascading sweep across multi-version log snapshot layers, unpacking high-address offsets into pure circular rolling coordinates before streaming packed binary telemetry structures onto the target interface bus.

Advanced Type-Safe Utility Accelerators (Regular Pipeline)

  • template <typename T> bool putVar(uint16_t key, const T &value): Ingests arbitrary variables or custom structural packed matrices directly into RAM. Automatically extracts payload address coordinates and data type size bounds at compile-time, removing the need for manual pointer casts or sizeof arithmetic in consumer sketches.
  • template <typename T> bool getVar(uint16_t key, T &out): Recovers arbitrary structures securely from the storage architecture. Compiles exact target size boundaries matching type T to enforce memory safety bounds during physical extraction passes.
  • bool putString(uint16_t key, const String &str): Serializes dynamic heap-allocated Arduino String components down to persistent blocks. Safely traverses heap pointers to capture true character array boundaries and length metrics with embedded null terminators.
  • String getString(uint16_t key): Materializes dynamic heap-allocated text entries back into standard Arduino String objects. Returns a valid verified text record string, or an empty object block on database miss.
  • void setMaxStringLength(size_t maxLen): Configures the maximum internal stack-allocated buffer size for dynamic string retrieval. Features a built-in safety ceiling (up to 1024 bytes) to explicitly safeguard the microcontroller from runtime stack overflows caused by malformed configuration sizes.
  • size_t getMaxStringLength() const: Retrieves the currently configured maximum internal buffer size designated for string extraction routines.
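The typed accelerators above can be thought of as thin templates over the byte-level put/get pair. The following sketch shows that idea against a hypothetical in-memory `MockStore`; the names `MockStore`, `putVarSketch`, and `getVarSketch` are illustrative, and the in/out meaning of the `size` argument (capacity in, stored length out) is an assumption, not documented NeuDB behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <type_traits>
#include <vector>

// Hypothetical in-memory stand-in with the same put/get shape as the raw API.
struct MockStore {
    std::map<uint16_t, std::vector<uint8_t>> rows;

    bool put(uint16_t key, const void* data, size_t size) {
        const uint8_t* p = static_cast<const uint8_t*>(data);
        rows[key].assign(p, p + size);
        return true;
    }

    // Assumption: `size` carries the buffer capacity in and the stored length out.
    bool get(uint16_t key, void* out, size_t& size) {
        auto it = rows.find(key);
        if (it == rows.end() || it->second.size() > size) return false;
        size = it->second.size();
        if (size > 0) std::memcpy(out, it->second.data(), size);
        return true;
    }
};

// The address and size come from the type, so callers never write sizeof or casts.
template <typename Store, typename T>
bool putVarSketch(Store& store, uint16_t key, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte storage requires a trivially copyable type");
    return store.put(key, &value, sizeof(T));
}

// Rejects records whose stored length does not match sizeof(T).
template <typename Store, typename T>
bool getVarSketch(Store& store, uint16_t key, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte storage requires a trivially copyable type");
    size_t size = sizeof(T);
    return store.get(key, &out, size) && size == sizeof(T);
}
```

The static_assert is the reason a packed plain struct works well here while a type that owns heap memory, such as an Arduino String, needs its own dedicated path like putString().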

System Storage Monitoring Metrics (Zero-IO RAM Cached)

  • size_t getTotalBytes() const: Retrieves the total flash storage partition scale in bytes. Directly reads from the internal RAM cache to ensure lightning-fast retrieval with zero hardware bus contention.
  • size_t getUsedBytes() const: Retrieves the currently occupied storage footprint in bytes. Returns the cached non-volatile byte consumption tracked inside the database layer without triggering expensive hardware VFS/SPI operations.
  • size_t getFreeBytes() const: Computes the remaining free non-volatile storage space in bytes. Instantly calculates the mathematical delta between the total capacity register and used space bounds.
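The arithmetic behind these metrics, together with the 90% capacity fence mentioned earlier, fits in a few lines. This is only a sketch: `freeBytes` and `exceedsCapacityGuard` are hypothetical helper names, and whether the real guard trips at exactly 90% or just above it is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Free space is the difference between capacity and usage, clamped at zero.
constexpr size_t freeBytes(size_t total, size_t used) {
    return used >= total ? 0 : total - used;
}

// True once usage reaches `percent` of capacity (assumed default: 90).
// Integer math in 64 bits avoids floating point and overflow on large cards.
constexpr bool exceedsCapacityGuard(size_t total, size_t used, unsigned percent = 90) {
    return static_cast<uint64_t>(used) * 100u >=
           static_cast<uint64_t>(total) * percent;
}
```

In a sketch, the same check could be fed from getTotalBytes() and getUsedBytes() to warn the application before the engine starts evicting old entries.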

Performance Tuning & Hardware Configuration

NeuDB features an Adaptive Compile-Time Configuration Engine managed entirely through the external NeuDB_Config.h header. The core system architecture automatically shifts its internal topology, dynamic memory budgets, tree depths, and pre-filtering matrices based on your active storage pipeline selector.

To modify the database profile, open NeuDB_Config.h and toggle the static preprocessor macro definition:

// ==================================================================================
// STORAGE SETTINGS: CHOOSE ONE (Static Profile Selector)
// ==================================================================================
#define USE_LITTLEFS ///< Mount Built-In Internal Flash Partition Topology (Default)
// #define USE_SDCARD  ///< Mount External MicroSD SPI Hardware Peripheral Bus Layer

Adaptive Compiler Profile Comparison Matrix

When a profile is locked, the internal engine automatically scales the following hardware-aware constraints at compile time:

| Architectural Constant | Profile 1: USE_LITTLEFS (Internal Flash) | Profile 2: USE_SDCARD (MicroSD Card) | Purpose / Technical Impact |
| --- | --- | --- | --- |
| NEU_MAX_LEVEL | 4 Levels (L0-L3) | 5 Levels (L0-L4) | Controls tree depth to avoid SST file compaction bottlenecks. |
| NEU_KEY_SPACE_LIMIT | 2048 Keys | 2048 Keys | Bounds the maximum legal 16-bit Key address to optimize index RAM. |
| NEU_MAX_TOTAL_ENTRIES | 2048 Rows | 32768 Rows | Unlocks physical storage scale up to 32k historical records on disk. |
| NEU_MEMTABLE_MAX_ENTRIES | 512 Entries | 2048 Entries | Batches transactions in fast RAM to minimize flash wear amplification. |
| NEU_SST_TARGET_SIZE | 4 KB Blocks | 32 KB Chunks | Aligns chunk serialization perfectly with physical SD sector clusters. |
| NEU_CACHE_SIZE_BYTES | 1024 Bytes | 4096 Bytes | Allocates dynamic lookahead memory for faster SPI disk search. |
| NEU_BLOOM_FILTER_SIZE | 64 Bytes (512 bits) | 128 Bytes (1024 bits) | Tightens the dense bitmask array to suppress lookup false positives. |
| NEU_BLOOM_HASH_COUNT | 4 Functions | 5 Functions | Optimizes the mathematical collision dampener for target block density. |
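To show what the two Bloom filter constants control, here is a minimal sketch of a fixed-size Bloom filter parameterized by bit count and hash count. The class name `BloomSketch`, the integer mixer, and the double-hashing scheme are assumptions for illustration; NeuDB's actual hash functions are not described in this document.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

template <size_t Bits, unsigned Hashes>
class BloomSketch {
public:
    // Sets one bit per hash function.
    void add(uint32_t key) {
        for (unsigned i = 0; i < Hashes; ++i) bits_.set(index(key, i));
    }

    // False means "definitely absent"; true means "possibly present".
    bool mightContain(uint32_t key) const {
        for (unsigned i = 0; i < Hashes; ++i) {
            if (!bits_.test(index(key, i))) return false;
        }
        return true;
    }

private:
    // Generic 32-bit integer mixer (illustrative choice).
    static uint32_t mix(uint32_t x) {
        x ^= x >> 16;
        x *= 0x7feb352dU;
        x ^= x >> 15;
        x *= 0x846ca68bU;
        x ^= x >> 16;
        return x;
    }

    // Double hashing: the i-th probe is h1 + i * h2 (h2 forced odd).
    static size_t index(uint32_t key, unsigned i) {
        const uint32_t h1 = mix(key);
        const uint32_t h2 = mix(key ^ 0x9E3779B9u) | 1u;
        return static_cast<size_t>(h1 + i * h2) % Bits;
    }

    std::bitset<Bits> bits_;
};

// Matches the LittleFS profile figures: 512 bits, 4 hash functions.
using LittleFsProfileBloom = BloomSketch<512, 4>;
```

For a filter with m bits, k hashes, and n stored keys, the usual estimate of the false-positive rate is (1 - e^(-kn/m))^k. With m = 512, k = 4, and n = 64 keys that works out to roughly 2.4%, so only a small fraction of lookups for absent keys would fall through to a disk read.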

Dynamic Resource-Aware Auto-Tuning Engine

NeuDB embeds an asynchronous, lock-free system telemetry monitor directly within the background tick() daemon running on CPU Core 1. Instead of enforcing rigid static memory boundaries, the engine dynamically recalculates an elastic threshold (_adaptiveLimit) based on an active multi-variable pressure score matrix:

$$\text{Score} = (\text{WritePressure} \times 0.5) + ((1.0 - \text{HeapRatio}) \times 0.3) + (\text{L0FileCount} \times 0.2)$$

  • Low Pressure (< 0.3): Dynamically expands the active transaction threshold up to 8,192 entries, maximizing write throughput during system idle states.
  • Moderate Pressure (< 0.6): Automatically scales back to a balanced 4,000 entries baseline.
  • High/Critical Pressure (>= 0.6): Defensively constricts the ingestion boundary down to 2,048 entries (with a hard safety floor at 1,024), triggering aggressive reactive cache evictions and forced Level 0 SST serialization before hardware Out-Of-Memory (OOM) exceptions occur.
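The score formula and the three pressure bands translate directly into code. In the sketch below, `pressureScore` and `adaptiveLimit` are hypothetical names, and each input is assumed to be pre-normalized to the range 0.0 to 1.0 (including the L0 file count term); how NeuDB normalizes these values internally is not stated here.

```cpp
#include <cstddef>

// Weighted pressure score, following the formula above.
// Assumption: all three inputs are already normalized to [0.0, 1.0].
constexpr double pressureScore(double writePressure, double heapRatio,
                               double l0FilePressure) {
    return writePressure * 0.5 + (1.0 - heapRatio) * 0.3 + l0FilePressure * 0.2;
}

// Maps a score onto the entry limits listed in the three bands.
constexpr size_t adaptiveLimit(double score) {
    return score < 0.3 ? 8192
         : score < 0.6 ? 4000
                       : 2048;
}
```

For example, an idle system with an empty heap budget consumed (write pressure 0.0, heap ratio 1.0, no L0 backlog) scores 0.0 and gets the 8,192-entry limit, while write pressure 0.8 with the same heap and L0 state scores 0.4 and falls back to 4,000 entries.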

Field-tested to sustain continuous file-rolling pipelines of more than 100 sequential SST files (e.g. /lsm/log_lv0_92.sst) under randomized write loads without introducing frontend execution latency or heap fragmentation across supported platforms.

Custom SPI Pin Mapping (MicroSD Mode Only)

If USE_SDCARD is uncommented, the unified STORAGE_INIT() gate automatically configures the hardware SPI peripheral controllers prior to mounting the filesystem. Modify the GPIO pins directly inside the configuration boundary to match your hardware layout:

#define SD_CS     5   ///< SPI Chip Select Active-Low Controller Line
#define SD_MOSI   23  ///< Master Out Slave In Peripheral Data Line
#define SD_MISO   19  ///< Master In Slave Out Peripheral Data Line
#define SD_SCK    18  ///< Serial Clock Synchronizer Signal Line
#define SD_SPEED  4000000 ///< SPI Transmission Bus Clock Velocity (4MHz)

License & Attribution

  • Core Architect & Maintainer: Developed and engineered from first principles by Ulywae (2026).
  • Ecosystem Integration: Officially adopted as a tier-1 core transactional storage engine module within the Neu Embedded Ecosystem framework.
  • Licensing Framework: Distributed under the open-source MIT License. Open collaboration blueprints and re-entrancy diagnostic pull requests are welcome.
