SuyashMullick/distributed-system-production-support


Distributed Sensor Simulation Framework


Overview

This project is a high-performance simulation framework for distributed sensor networks. It enables the study of system-level resilience in the face of complex edge failures by combining a C++ runtime with an automated Python-based telemetry investigation pipeline.

The framework components include:

  • Edge Nodes (C++): High-frequency data producers (10 Hz) that simulate sensor telemetry and health heartbeats. Each node includes a deterministic fault-injection engine.
  • Central Aggregator (C++): A specialized data sink that performs real-time validation, detects sequence gaps, monitors clock synchronization across the cluster, and identifies operational anomalies.
  • Investigation Pipeline (Python): An offline analysis suite that correlates distributed JSONL logs to perform Root Cause Analysis (RCA) and generate detailed incident reports and safety artifacts.
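
As an illustration of the sequence-gap detection mentioned above, here is a minimal Python sketch. The actual aggregator is written in C++ and its logic is not shown in this README, so the function name `find_sequence_gaps` and the assumption that packets carry a monotonically increasing integer sequence number are hypothetical.

```python
def find_sequence_gaps(seqs):
    """Return inclusive (first_missing, last_missing) ranges for a stream of
    sequence numbers. Duplicates and late (out-of-order) packets are ignored."""
    gaps = []
    prev = None  # highest sequence number seen so far
    for seq in seqs:
        if prev is not None and seq > prev + 1:
            gaps.append((prev + 1, seq - 1))
        if prev is None or seq > prev:
            prev = seq
    return gaps
```

For example, `find_sequence_gaps([1, 2, 3, 7, 8, 10])` returns `[(4, 6), (9, 9)]`. At a 10 Hz send rate, each missing sequence number corresponds to roughly 0.1 s of lost telemetry.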

Architecture

The system follows a decoupled, message-oriented architecture using POSIX UDP for low-latency edge communication.
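
To make the message flow concrete, the sketch below frames a telemetry record as a UDP datagram with a CRC32 trailer and sends it over loopback. It is an illustration only: the real nodes and aggregator are C++, and the JSON body, the field names (`node`, `seq`, `value`), and the 4-byte trailer layout are assumptions, not the project's wire format.

```python
import json
import socket
import struct
import zlib


def encode_frame(node_id, seq, value):
    """Serialize a record and append its CRC32 as a 4-byte big-endian trailer."""
    body = json.dumps({"node": node_id, "seq": seq, "value": value}).encode("utf-8")
    return body + struct.pack("!I", zlib.crc32(body))


def decode_frame(frame):
    """Return the decoded record, or None when the CRC32 trailer does not match."""
    if len(frame) < 4:
        return None
    body, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack("!I", trailer)
    if zlib.crc32(body) != expected:
        return None  # payload was corrupted in transit
    return json.loads(body)


def loopback_demo():
    """Send one frame to a receiver bound on an ephemeral loopback port."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(1.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(encode_frame("node-1", 1, 21.5), receiver.getsockname())
        frame, _addr = receiver.recvfrom(2048)
        return decode_frame(frame)
    finally:
        sender.close()
        receiver.close()
```

Because UDP gives no delivery or ordering guarantees, the receiver has to detect loss and corruption itself, which is why the sketch carries both a sequence number and a checksum in every datagram.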

graph TD
    subgraph Edge_Tier [Edge Tier: C++ Nodes]
        Node1[Sensor Node 1\nConfigurable Fault Engine]
        NodeN[Sensor Node N\nConfigurable Fault Engine]
    end

    subgraph Processing_Tier [Processing Tier: C++ Aggregator]
        Agg[Central Aggregator\nValidation & Anomaly Detection]
    end

    subgraph Storage_Tier [Storage Tier: Structured Logs]
        LogNode[(Node Logs\nInternal Ground Truth)]
        LogAgg[(Aggregator Logs\nTelemetry & Alarms)]
    end

    subgraph Analysis_Tier [Analysis Tier: Python Pipeline]
        PyAnalyzer[RCA Engine\nLog Correlation]
        Docs[Technical Artifacts\nFMEA, FTA, RCA Reports]
    end

    Node1 -- UDP Payload --> Agg
    NodeN -- UDP Payload --> Agg

    Node1 -.-> LogNode
    NodeN -.-> LogNode
    Agg -.-> LogAgg

    LogNode == Batch Correlate ==> PyAnalyzer
    LogAgg == Batch Correlate ==> PyAnalyzer
    
    PyAnalyzer ==> Docs

Technical Features

Deterministic Fault Injection

The edge nodes support a configuration-driven failure model. This allows developers to test system stability against specific hardware and environmental edge cases without needing physical failure scenarios.
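
One way to make fault injection deterministic is to drive it from a seeded random number generator, so the same configuration always yields the same failure timeline. The Python sketch below illustrates that idea for dropout windows. The function name, the `start_probability` parameter, and the 0.1 s tick are assumptions for illustration; the project's C++ engine and its configuration schema are not described here.

```python
import random


def dropout_schedule(seed, duration_s, start_probability=0.05, tick_s=0.1):
    """Return a reproducible list of (start, end) dropout windows in seconds.

    On each tick, a dropout starts with the given probability and lasts
    between 0.5 s and 3 s, clipped to the end of the run.
    """
    rng = random.Random(seed)  # private generator: same seed, same schedule
    windows = []
    t = 0.0
    while t < duration_s:
        if rng.random() < start_probability:
            length = rng.uniform(0.5, 3.0)
            windows.append((t, min(t + length, duration_s)))
            t += length
        else:
            t += tick_s
    return windows
```

Calling `dropout_schedule(42, 15.0)` twice returns identical lists, which is what lets a failing scenario be replayed exactly while debugging the detection logic.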

| Feature Profile | Fault Type | System Impact | Detection Mechanism |
| --- | --- | --- | --- |
| dropout | Intermittent Outage | Temporary data loss (0.5 s to 3 s) | Silence Detection |
| clock_drift_high | Sync Degradation | Clock skew (200 ppm) | Median-Relative Tracking |
| corruption | Payload Noise | Bit-flips in transmitted data | CRC32 Verification |
| environmental_noise | Thermal Spike | Signal noise correlated with temperature (>35 °C) | Heuristic Filtering |
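
For a sense of scale, a 200 ppm skew accumulates 200 µs of error per second, or about 3 ms over a 15-second run. The sketch below shows one plausible reading of median-relative tracking: estimate each node's drift rate from two clock-offset samples and compare it with the cluster median, so a single drifting node stands out without needing a trusted reference clock. The function name, the input shape, and the 100 ppm threshold are assumptions, not the aggregator's actual algorithm.

```python
from statistics import median


def drift_outliers_ppm(offsets_start, offsets_end, elapsed_s, threshold_ppm=100.0):
    """Return {node: drift relative to the cluster median, in ppm} for outliers.

    Each offsets dict maps a node id to (node clock - aggregator clock) in seconds.
    """
    rates = {
        node: (offsets_end[node] - offsets_start[node]) / elapsed_s * 1e6
        for node in offsets_start
    }
    cluster_rate = median(rates.values())
    return {
        node: rate - cluster_rate
        for node, rate in rates.items()
        if abs(rate - cluster_rate) > threshold_ppm
    }
```

Using the median rather than the mean keeps the baseline stable when one node drifts badly, since a single outlier cannot pull the median far.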

Automated Investigation Pipeline

The Python pipeline automates the "Post-Mortem" process. By comparing node-side ground truth (what actually happened) with aggregator telemetry (what was observed), it generates:

  • Root Cause Analysis (RCA): A detailed, evidence-based investigation built on log timestamps.
  • Corrective Actions: Technical prescriptions for architectural fixes or requirement updates.
  • Safety Artifacts: Automated generation of FMEA (Failure Mode and Effects Analysis) and Fault Tree Analysis (FTA) diagrams.
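
The comparison of node-side ground truth with aggregator telemetry can be pictured with the following sketch. It assumes each JSONL record carries `node` and `seq` fields and that node logs record an injected fault under a `fault` key; these field names and both helper functions are hypothetical, since the actual log schema is not documented in this README.

```python
import json


def load_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def correlate(node_log, agg_log):
    """List records the node sent but the aggregator never logged,
    annotated with the fault the node reported at that point (if any)."""
    sent = {(rec["node"], rec["seq"]): rec for rec in node_log}
    received = {(rec["node"], rec["seq"]) for rec in agg_log}
    findings = []
    for node, seq in sorted(sent.keys() - received):
        cause = sent[(node, seq)].get("fault", "unknown")
        findings.append({"node": node, "seq": seq, "cause": cause})
    return findings
```

Joining on the `(node, seq)` pair rather than on timestamps keeps the correlation robust even when a node's clock is drifting.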

Setup & Execution

1. Build Requirements

Ensure you have CMake and a C++20-compliant compiler installed.

mkdir -p build && cd build
cmake ../cpp
make -j$(nproc)
cd ..

2. Running Simulations

Use the included run_test.sh script to execute a simulation scenario. The script runs the simulation for 15 seconds and then invokes the analysis pipeline.

# Example: Run a network dropout scenario
./run_test.sh dropout configs/scenario_dropout.json

# Example: Run an environmental noise scenario
./run_test.sh noise configs/scenario_noise.json

3. Reviewing Results

All investigation reports are located in the docs/ directory:

  • rca/: Investigation logs.
  • capa/: Action plans for system improvements.
  • fmea.md / fta.md: Formal risk assessment artifacts.

About

Fault injection and failure analysis framework for a distributed sensor system, simulating field failures and documenting root-cause analysis, corrective actions, and system-level mitigations.
