This project is a high-performance simulation framework for distributed sensor networks. It enables the study of system-level resilience in the face of complex edge failures by combining a C++ runtime with an automated Python-based telemetry investigation pipeline.
The framework components include:
- Edge Nodes (C++): High-frequency data producers (10 Hz) that simulate sensor telemetry and health heartbeats. These nodes include a deterministic fault-injection engine.
- Central Aggregator (C++): A specialized data sink that performs real-time validation, detects sequence gaps, monitors clock synchronization across the cluster, and identifies operational anomalies.
- Investigation Pipeline (Python): An offline analysis suite that correlates distributed JSONL logs to perform Root Cause Analysis (RCA) and generate detailed incident reports and safety artifacts.
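
As an illustration of the sequence-gap detection attributed to the aggregator above, here is a minimal Python sketch. The real aggregator is written in C++ and its logic may differ; the `GapDetector` class, its method names, and the per-node sequence counter are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GapDetector:
    """Tracks the highest sequence number seen per node (hypothetical helper)."""
    last_seq: dict[str, int] = field(default_factory=dict)

    def observe(self, node_id: str, seq: int) -> int:
        """Return how many packets are missing immediately before `seq`."""
        prev = self.last_seq.get(node_id)
        if prev is not None and seq <= prev:
            return 0  # duplicate or late packet: not counted as a gap
        self.last_seq[node_id] = seq
        # First packet from a node establishes the baseline, so no gap yet.
        return 0 if prev is None else seq - prev - 1


detector = GapDetector()
gaps = [detector.observe("node-1", s) for s in (1, 2, 5, 4, 6)]
print(gaps)  # [0, 0, 2, 0, 0]
```

In this sketch the jump from 2 to 5 reports two missing packets, while the late arrival of 4 is ignored rather than flagged. A production detector would likely also handle counter wraparound and node restarts.
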
The system follows a decoupled, message-oriented architecture using POSIX UDP for low-latency edge communication.
graph TD
subgraph Edge_Tier [Edge Tier: C++ Nodes]
Node1[Sensor Node 1\nConfigurable Fault Engine]
NodeN[Sensor Node N\nConfigurable Fault Engine]
end
subgraph Processing_Tier [Processing Tier: C++ Aggregator]
Agg[Central Aggregator\nValidation & Anomaly Detection]
end
subgraph Storage_Tier [Storage Tier: Structured Logs]
LogNode[(Node Logs\nInternal Ground Truth)]
LogAgg[(Aggregator Logs\nTelemetry & Alarms)]
end
subgraph Analysis_Tier [Analysis Tier: Python Pipeline]
PyAnalyzer[RCA Engine\nLog Correlation]
Docs[Technical Artifacts\nFMEA, FTA, RCA Reports]
end
Node1 -- UDP Payload --> Agg
NodeN -- UDP Payload --> Agg
Node1 -.-> LogNode
NodeN -.-> LogNode
Agg -.-> LogAgg
LogNode == Batch Correlate ==> PyAnalyzer
LogAgg == Batch Correlate ==> PyAnalyzer
PyAnalyzer ==> Docs
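
The "UDP Payload" edges in the diagram could carry a fixed-size binary frame with a trailing checksum. The sketch below shows one way to frame and validate such a payload in Python. The field layout (node id, sequence number, timestamp, reading) and the function names are hypothetical; the project's actual wire format is not documented here.

```python
import struct
import zlib

# Hypothetical layout, network byte order:
# node id (u16), sequence (u32), timestamp in microseconds (u64), reading (f32)
BODY = struct.Struct("!HIQf")
TRAILER = struct.Struct("!I")  # CRC32 of the body


def encode_packet(node_id, seq, timestamp_us, reading):
    """Pack the fields and append a CRC32 computed over the packed body."""
    body = BODY.pack(node_id, seq, timestamp_us, reading)
    return body + TRAILER.pack(zlib.crc32(body))


def decode_packet(datagram):
    """Return (node_id, seq, timestamp_us, reading), or None if the datagram
    has the wrong length or fails the CRC32 check."""
    if len(datagram) != BODY.size + TRAILER.size:
        return None
    body, trailer = datagram[:BODY.size], datagram[BODY.size:]
    (expected_crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != expected_crc:
        return None
    return BODY.unpack(body)


packet = encode_packet(7, 42, 1_700_000_000_000_000, 21.5)
print(decode_packet(packet))

# Flip one bit to mimic the kind of corruption a fault profile might inject.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
print(decode_packet(corrupted))  # None
```

The same bytes could be handed to a UDP socket's `sendto` on the node side and checked with `decode_packet` on the receiving side.
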
The edge nodes support a configuration-driven failure model. This allows developers to test system stability against specific hardware and environmental edge cases without needing physical failure scenarios.
| Feature Profile | Fault Type | System Impact | Detection Mechanism |
|---|---|---|---|
| dropout | Intermittent Outage | Temporary data loss (0.5-3 s) | Silence Detection |
| clock_drift_high | Sync Degradation | Clock skew (200 ppm) | Median-Relative Tracking |
| corruption | Payload Noise | Bit-flips in transmitted data | CRC32 Verification |
| environmental_noise | Thermal Spike | Signal noise correlated with Temp (>35°C) | Heuristic Filtering |
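
To make "Median-Relative Tracking" concrete, the sketch below compares each node's clock offset against the cluster median, so that one drifting clock cannot shift the baseline it is judged against. This is one interpretation of the term, not the aggregator's confirmed algorithm; the function names, the 1 ms threshold, and the sample offsets are assumptions.

```python
from statistics import median


def relative_offsets(offsets_us):
    """Map node id -> clock offset relative to the cluster median.

    `offsets_us` maps node id -> (node timestamp - aggregator receive time),
    in microseconds.
    """
    baseline = median(offsets_us.values())
    return {node: offset - baseline for node, offset in offsets_us.items()}


def drifting_nodes(offsets_us, threshold_us=1_000):
    """Return the sorted ids of nodes whose relative offset exceeds the threshold."""
    relative = relative_offsets(offsets_us)
    return sorted(node for node, rel in relative.items() if abs(rel) > threshold_us)


# Illustrative values only: node-3 has drifted far from its peers.
offsets = {"node-1": 120, "node-2": 95, "node-3": 4_300, "node-4": 110}
print(drifting_nodes(offsets))  # ['node-3']
```

For scale, a 200 ppm skew corresponds to 200 µs of drift per second, or about 3 ms over a 15-second run, which is why a millisecond-level threshold is used in this example.
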
The Python pipeline automates the "Post-Mortem" process. By comparing node-side ground truth (what actually happened) with aggregator telemetry (what was observed), it generates:
- Root Cause Analysis (RCA): Detailed, evidence-based investigation built on log timestamps.
- Corrective Actions: Technical prescriptions for architectural fixes or requirement updates.
- Safety Artifacts: Automated generation of FMEA (Failure Mode and Effects Analysis) and Fault Tree Analysis (FTA) diagrams.
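
The comparison of node-side ground truth with aggregator telemetry can be sketched as a time-window join over two JSONL logs. The field names (`ts`, `node`, `fault`, `alarm`), the 3-second window, and the sample records below are made up for illustration; the project's actual log schema may differ.

```python
import json


def load_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def correlate(faults, alarms, window_s=3.0):
    """Pair each injected fault with the first alarm for the same node raised
    within `window_s` seconds after it. Assumes `alarms` is in time order.
    Faults with no matching alarm are reported as undetected."""
    report = []
    for fault in faults:
        match = next(
            (a for a in alarms
             if a["node"] == fault["node"]
             and 0.0 <= a["ts"] - fault["ts"] <= window_s),
            None,
        )
        report.append({
            "node": fault["node"],
            "fault": fault["fault"],
            "detected": match is not None,
            "latency_s": round(match["ts"] - fault["ts"], 3) if match else None,
        })
    return report


# Made-up sample records, for illustration only.
NODE_LOG = """\
{"ts": 2.0, "node": "node-1", "fault": "dropout"}
{"ts": 9.0, "node": "node-2", "fault": "corruption"}
"""
AGG_LOG = """\
{"ts": 2.6, "node": "node-1", "alarm": "silence"}
"""

report = correlate(load_jsonl(NODE_LOG), load_jsonl(AGG_LOG))
for row in report:
    print(row)
```

In this sample the dropout on node-1 is matched to an alarm 0.6 s later, while the corruption on node-2 has no matching alarm and would surface as a detection gap in a report.
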
Ensure you have cmake and a C++20-compliant compiler installed.
mkdir -p build && cd build
cmake ../cpp
make -j$(nproc)
cd ..

Use the included run_test.sh script to execute a simulation scenario. This runs the runtime for 15 seconds and then invokes the analysis pipeline.
# Example: Run a network dropout scenario
./run_test.sh dropout configs/scenario_dropout.json
# Example: Run an environmental noise scenario
./run_test.sh noise configs/scenario_noise.json

All investigation reports are located in the docs/ directory:
- rca/: Investigation logs.
- capa/: Action plans for system improvements.
- fmea.md / fta.md: Formal risk assessment artifacts.