An end-to-end cybersecurity solution that uses 10+ supervised ML algorithms and a hybrid stacked ensemble to detect network intrusions in real time, with live dashboards, attack simulation, and dual logging.
- Overview
- Key Highlights
- Problem Statement
- System Architecture
- Tech Stack
- Dataset Details
- Machine Learning Models
- Real-Time Detection Pipeline
- Dashboard Features
- Project Structure
- Installation & Setup
- Testing Strategy
- Results & Achievements
- Limitations
- Future Scope
- Author
- License
- Contributing
The ML-Based Network Intrusion Detection System (NIDS) is a cybersecurity solution built to monitor, analyze, and detect malicious network activity in real time using machine learning.
Unlike traditional signature-based systems, this project trains on real-world traffic data and applies a hybrid stacked ensemble model to maximize detection accuracy while minimizing false positives. The detection engine feeds into a full MERN stack dashboard, forming a complete, production-ready pipeline:
Packet Capture ──► Feature Extraction ──► ML Prediction ──► Live Dashboard
| Feature | Description |
|---|---|
| Real-Time Sniffing | Live packet capture using Scapy |
| 10+ ML Models | Classical, advanced, and ensemble algorithms |
| Hybrid Ensemble | Stacked model: RF + XGBoost + LightGBM → LR meta-classifier |
| Dual Logging | Simultaneous logging to MongoDB and CSV |
| Live Dashboard | React-based UI with charts, filters, and export |
| Attack Simulation | UDP Flood, Port Scan, SYN Flood built-in |
| Tunable Threshold | Adjustable prediction probability cutoff (default: 0.6) |
| High Accuracy | Achieves ~90%+ detection accuracy on CICIDS2018 |
Traditional Intrusion Detection Systems carry critical weaknesses that leave networks exposed:
| Problem | Impact |
|---|---|
| Static rule-based signatures | Cannot adapt to new or evolving attack patterns |
| Zero-day attack blindness | Unknown threats go undetected |
| High false positive rate | Security teams suffer from alert fatigue |
- ML-based classification: learns attack patterns instead of relying on rigid rules
- Trained on CICIDS2017, a benchmark dataset with real traffic flows
- Ensemble stacking boosts precision and suppresses false alarms
- Configurable threshold provides fine-grained sensitivity control
┌─────────────────────────────────────────────┐
│               Network Traffic               │
│      (Live packets or simulated flows)      │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│           Packet Sniffer (Scapy)            │
│      Captures raw TCP/UDP/ICMP packets      │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│          Feature Extraction Engine          │
│       Computes 6 flow-based features        │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│            ML Prediction Engine             │
│      Stacked Ensemble (RF+XGB+LGBM+LR)      │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│            Classification Output            │
│              ATTACK or BENIGN               │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│     Dual Logging System (CSV + MongoDB)     │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│         Node.js + Express REST API          │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│   React Dashboard (Live Visualization UI)   │
└─────────────────────────────────────────────┘
| Layer | Technology | Purpose |
|---|---|---|
| ML Engine | Python, scikit-learn, XGBoost, LightGBM, Pandas, NumPy | Model training, prediction, feature processing |
| Networking | Scapy | Real-time packet sniffing and flow analysis |
| Backend | Node.js, Express.js, MongoDB (Mongoose) | REST API, database logging, data persistence |
| Frontend | React.js, Tailwind CSS, Recharts | Live dashboard, charts, alerts, export |
| Property | Value |
|---|---|
| Name | CICIDS2018 |
| Source | Canadian Institute for Cybersecurity (UNB) |
| Traffic Types | BENIGN, DoS, DDoS, PortScan, Brute Force |
| Label Encoding | BENIGN → 0 / All Attacks → 1 |
1. Drop irrelevant columns ──► Flow ID, Timestamp, IP addresses removed
2. Binary label encoding ──► BENIGN=0, ATTACK=1
3. Handle nulls & infinities ──► Missing/inf values replaced or dropped
4. Feature normalization ──► MinMaxScaler / StandardScaler applied
5. Feature selection ──► 6 real-time-compatible features retained
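The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up frame, not the project's actual script: the `preprocess` helper and the sample rows are hypothetical, and scaling is done by hand with pandas (equivalent in effect to min-max scaling to [0, 1]) so the sketch needs only pandas and NumPy. Step 5 (feature selection) is omitted here.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame using CICIDS-style column names; these are not real dataset rows.
raw = pd.DataFrame(
    {
        "Flow ID": ["f1", "f2", "f3", "f4"],
        "Timestamp": ["t1", "t2", "t3", "t4"],
        "Destination Port": [80, 443, 22, 53],
        "Flow Duration": [1200.0, np.inf, 560.0, 90.0],
        "Flow IAT Mean": [300.0, 10.0, np.nan, 45.0],
        "Label": ["BENIGN", "DoS", "PortScan", "DDoS"],
    }
)


def preprocess(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper covering steps 1 to 4 of the list above."""
    # Step 1: drop identifier columns the model should not learn from.
    df = df.drop(columns=["Flow ID", "Timestamp"], errors="ignore")
    # Step 3: treat infinities as missing, then drop incomplete rows.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    # Step 2: binary labels, BENIGN -> 0 and every attack class -> 1.
    y = (df["Label"] != "BENIGN").astype(int)
    X = df.drop(columns=["Label"]).astype(float)
    # Step 4: min-max scaling to [0, 1]. A constant column would divide by zero,
    # and in practice the ranges should come from the training split only.
    X = (X - X.min()) / (X.max() - X.min())
    return X, y


X, y = preprocess(raw)
print(X)
print(y.tolist())
```

Rows containing `inf` or `NaN` are dropped in this sketch; replacing them (for example with a column median) is the other option the list mentions.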
These 6 features balance computational speed with classification quality:
| Feature | Description |
|---|---|
| Destination Port | Target port number |
| Flow Duration | Total flow duration (µs) |
| Fwd Packet Length Min | Minimum forward packet size |
| Packet Length Std | Standard deviation of packet lengths |
| Flow IAT Mean | Mean inter-arrival time of packets |
| Fwd IAT Mean | Mean inter-arrival time (forward direction) |
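To make the six features concrete, here is a small sketch that computes them for one flow using only the standard library. The `Packet` record, `mean_iat`, and `extract_features` are hypothetical names for illustration and are not taken from `feature_extractor.py`. Using the population standard deviation is also an assumption, and the flow is assumed to contain at least one packet.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Packet:
    """Hypothetical per-packet record; not the project's actual data structure."""
    ts_us: int      # capture time in microseconds
    length: int     # packet size in bytes
    forward: bool   # True if the packet travels in the flow's forward direction


def mean_iat(timestamps: list[int]) -> float:
    """Mean gap between consecutive timestamps; 0.0 with fewer than two packets."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return fmean(gaps) if gaps else 0.0


def extract_features(dst_port: int, packets: list[Packet]) -> dict[str, float]:
    """Compute the six flow features for a non-empty list of packets."""
    packets = sorted(packets, key=lambda p: p.ts_us)
    all_ts = [p.ts_us for p in packets]
    fwd = [p for p in packets if p.forward]
    return {
        "Destination Port": float(dst_port),
        "Flow Duration": float(all_ts[-1] - all_ts[0]),
        "Fwd Packet Length Min": float(min((p.length for p in fwd), default=0)),
        # Assumption: population standard deviation over all packet lengths.
        "Packet Length Std": pstdev([p.length for p in packets]),
        "Flow IAT Mean": mean_iat(all_ts),
        "Fwd IAT Mean": mean_iat([p.ts_us for p in fwd]),
    }


flow = [
    Packet(0, 60, True),
    Packet(1000, 1500, False),
    Packet(3000, 40, True),
    Packet(6000, 100, True),
]
features = extract_features(443, flow)
for name, value in features.items():
    print(f"{name}: {value}")
```

All six values can be updated incrementally from running counters, which is what makes them practical to compute per flow during live capture.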
Classical models:

| Model | Type | Notes |
|---|---|---|
| Logistic Regression | Linear Classifier | Fast, interpretable baseline |
| Decision Tree | Tree-based | Captures non-linear boundaries |
| Naïve Bayes | Probabilistic | Lightweight, works well with limited data |

Advanced and ensemble models:

| Model | Type | Notes |
|---|---|---|
| Random Forest | Bagging Ensemble | Robust to noise, strong base learner |
| Gradient Boosting | Sequential Boosting | Handles class imbalance well |
| SVM | Margin-based | Effective in high-dimensional space |
| KNN | Instance-based | Simple, good for local patterns |
| XGBoost | Extreme Boosting | Regularized, high-performance |
| LightGBM | Leaf-wise Boosting | Fastest of the ensemble trio |
The production model uses stacking, a meta-learning strategy where base model outputs serve as inputs to a final classifier:
┌───────────────────────────────────────────────────────────────┐
│                          BASE LAYER                           │
│                                                               │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────────┐  │
│  │ Random Forest │  │    XGBoost    │  │     LightGBM      │  │
│  └───────┬───────┘  └───────┬───────┘  └─────────┬─────────┘  │
└──────────┼──────────────────┼────────────────────┼────────────┘
           └──────────────────┼────────────────────┘
                              ▼
┌───────────────────────────────────────────────────────────────┐
│                          META LAYER                           │
│             ┌───────────────────────────────────┐             │
│             │  Logistic Regression (Combiner)   │             │
│             └─────────────────┬─────────────────┘             │
└───────────────────────────────┼───────────────────────────────┘
                                ▼
               Final Prediction (ATTACK / BENIGN)
Why stacking? The meta-learner learns how much to trust each base model for a given kind of input, which lets it correct individual model errors and typically yields higher accuracy than any single base model.
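A stacked ensemble of this shape can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch under stated assumptions, not the project's training code: the data is synthetic (six random features, not CICIDS flows), and `GradientBoostingClassifier` stands in for XGBoost and LightGBM so that the example depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 6-feature flow data (assumption: not real traffic).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
    # Stand-in for the XGBoost / LightGBM base learners named in the README.
    ("gb", GradientBoostingClassifier(n_estimators=30, random_state=0)),
]

# The meta-learner is trained on out-of-fold base-model probabilities (cv=3),
# so it never sees predictions made on data the base models were fitted on.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=3,
)
stack.fit(X_train, y_train)

attack_proba = stack.predict_proba(X_test)[:, 1]  # probability of class 1 (ATTACK)
print(f"held-out accuracy: {stack.score(X_test, y_test):.3f}")
```

Swapping in `xgboost.XGBClassifier` or `lightgbm.LGBMClassifier` as extra entries in `base_learners` should work the same way, since both expose the scikit-learn estimator interface.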
Step 1 ─► Capture raw packets via Scapy
Step 2 ─► Maintain per-flow statistics in memory
Step 3 ─► Extract 6 feature values per flow
Step 4 ─► Pass feature vector to stacked ML model
Step 5 ─► Receive attack probability score (0.0 to 1.0)
Step 6 ─► Apply threshold (default: 0.6)
          │
          ├──► score ≥ 0.6 → ATTACK
          └──► score < 0.6 → BENIGN
Step 7 ─► Log result to CSV + MongoDB
Step 8 ─► Surface alerts on the React Dashboard via REST API
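The threshold rule in Step 6 amounts to a single comparison. A minimal sketch, assuming a hypothetical `classify` helper (the name and signature are illustrative, not taken from `realtime_detector.py`):

```python
DEFAULT_THRESHOLD = 0.6  # default cutoff stated in the pipeline above


def classify(score: float, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Map an attack probability to a label; scores at the threshold count as ATTACK."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be a probability in [0, 1], got {score}")
    return "ATTACK" if score >= threshold else "BENIGN"


for score in (0.10, 0.59, 0.60, 0.95):
    print(f"{score:.2f} -> {classify(score)}")
```

Lowering the threshold flags more flows, which raises recall at the cost of more false positives. Raising it does the opposite.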
| Metric | What It Measures |
|---|---|
| Accuracy | Overall correct predictions across all classes |
| Precision | How many flagged alerts are true attacks (minimizes false positives) |
| Recall | How many real attacks are caught (minimizes missed detections) |
| F1-Score | Harmonic mean of Precision and Recall |
| ROC-AUC | Threshold-independent discrimination ability |
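For reference, the first four metrics follow directly from the confusion-matrix counts. The sketch below uses a hypothetical `binary_metrics` helper and made-up labels (1 = ATTACK, 0 = BENIGN). ROC-AUC is left out because it needs probability scores, not hard labels.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = ATTACK)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up example: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this example every metric comes out to 0.75, because the single false positive and the single false negative balance each other.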
- Live attack alerts with timestamp, source/destination IPs, and probability score
- Continuously updated benign traffic log
| Chart | Purpose |
|---|---|
| Pie Chart | Attack vs Benign traffic distribution |
| Line Chart | Attack frequency trend over time |
| Column | Description |
|---|---|
| Source IP | Origin address |
| Destination IP | Target address |
| Timestamp | Detection time |
| Prediction | ATTACK / BENIGN |
| Probability | Model confidence score |
- Search logs by IP address
- Filter by prediction label (ATTACK / BENIGN)
- Time range selection
- Export to CSV or PDF
ML_NIDS/
│
├── data/                       # CICIDS2018 dataset files
│   └── *.csv                   # Raw and preprocessed data
│
├── detection-engine/           # Python ML core
│   ├── realtime_detector.py    # Main detection loop
│   ├── feature_extractor.py    # Flow-based feature computation
│   ├── simulate_attack.py      # Attack traffic generator
│   └── models/                 # Trained .pkl model files
│       ├── random_forest.pkl
│       ├── xgboost_model.pkl
│       ├── lightgbm_model.pkl
│       └── stacked_ensemble.pkl
│
├── server/                     # Node.js + Express backend
│   ├── models/                 # Mongoose schemas
│   ├── routes/                 # API route handlers
│   └── server.js               # Entry point (port 5000)
│
├── client/                     # React frontend
│   ├── components/             # Reusable UI components
│   ├── pages/                  # Dashboard pages
│   └── App.js                  # Root component (port 3000)
│
├── logs/                       # CSV detection logs
├── docs/                       # Reports & analysis notebooks
└── README.md
Make sure the following are installed and running before setup:
| Requirement | Version |
|---|---|
| Python | 3.10+ |
| Node.js | 16+ |
| MongoDB | Running locally (default port 27017) |
git clone https://github.com/ManojThamke/ML_NIDS.git
cd ML_NIDS

# Create and activate a virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
# Install Python dependencies
pip install pandas numpy scikit-learn joblib scapy matplotlib seaborn xgboost lightgbm

# Start the detection engine
cd detection-engine
python realtime_detector.py

Permissions required: run as Administrator on Windows, or with `sudo` on Linux/macOS, for raw packet capture.

# Start the backend (from the repository root, in a separate terminal)
cd server
npm install
npm start

The backend REST API runs at http://localhost:5000

# Start the dashboard (from the repository root, in a separate terminal)
cd client
npm install
npm start

The dashboard is accessible at http://localhost:3000
- Normal web browsing
- Video/audio streaming traffic
Run these commands from within the detection-engine/ directory:
python simulate_attack.py --type udp_flood
python simulate_attack.py --type port_scan
python simulate_attack.py --type syn_flood

| Attack Type | Method | Description |
|---|---|---|
| UDP Flood | Scapy UDP packet burst | Overwhelms target with UDP datagrams |
| Port Scan | Sequential port probing | Identifies open ports on target host |
| SYN Flood | Half-open TCP connections | Exhausts server connection table |
| Metric | Result |
|---|---|
| Detection Accuracy | ~90%+ on CICIDS2018 test set |
| Real-Time Monitoring | Fully functional |
| Ensemble Improvement over Single Models | Confirmed |
| Live Dashboard with Alerts | Operational |
| Dual Logging (CSV + MongoDB) | Working |
| Limitation | Details |
|---|---|
| Computational overhead | Ensemble inference is slower than single models |
| Feature dependency | Detection quality is tied to feature engineering quality |
| Hardware sensitivity | Performance varies on low-resource machines |
| Dataset gap | Trained on CICIDS2017; real-world accuracy may differ |
| Enhancement | Description |
|---|---|
| Deep Learning | LSTM / CNN / Transformer models for sequential traffic analysis |
| Cloud Deployment | Containerized deployment on AWS / Azure / GCP |
| Auto-Blocking | Automatic firewall rule injection on confirmed attack detection |
| WebSocket Streaming | True real-time push to frontend (replacing polling) |
| Mobile Dashboard | React Native monitoring app for on-the-go alerting |
| Explainability | SHAP / LIME integration for interpretable predictions |
| Multi-Dataset Support | Extend to NSL-KDD, UNSW-NB15 for generalization testing |
This project is developed for academic and educational purposes. You are free to reference, study, and build upon it with attribution.
Contributions, suggestions, and improvements are welcome!
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Commit your changes: `git commit -m 'Add: your feature description'`
4. Push to your branch: `git push origin feature/your-feature`
5. Open a Pull Request
Please ensure your code follows existing conventions and is well-documented.