IoT Intrusion Detection System (IDS)

Apache Spark MLlib | BoT-IoT Dataset | Real-time Detection

A production-grade, scalable IoT Intrusion Detection System built with Apache Spark for processing large volumes of network traffic. It trains ML models on the BoT-IoT dataset from UNSW Canberra Cyber and provides near real-time anomaly detection through reproducible ML pipelines.

🎯 Project Overview

This system implements a complete ML pipeline for detecting IoT-based network attacks using:

Apache Spark MLlib for distributed ML training
BoT-IoT Dataset with realistic attack patterns (DDoS, DoS, Reconnaissance, Theft)
Real-time streaming detection with live alert monitoring
Full-stack GUI for complete pipeline control

Key Features

✅ Data Ingestion

Generate synthetic BoT-IoT dataset with realistic traffic patterns
Support for 72M+ records (configurable)
Automatic preprocessing and validation
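
As a rough illustration of synthetic generation followed by validation, the sketch below produces seeded, reproducible flow records. The field names (`pkts`, `bytes`, `rate`, `category`, `attack`), the value ranges, and the class weights are assumptions made for this example; they are not the project's actual schema or the real BoT-IoT distribution.

```python
import random

# Attack categories named in this README, plus benign traffic.
CATEGORIES = ["Normal", "DDoS", "DoS", "Reconnaissance", "Theft"]

# Hypothetical class weights, for illustration only.
WEIGHTS = [0.05, 0.45, 0.40, 0.09, 0.01]


def generate_records(n, seed=42):
    """Return n synthetic flow records as dicts (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    records = []
    for _ in range(n):
        category = rng.choices(CATEGORIES, weights=WEIGHTS, k=1)[0]
        pkts = rng.randint(1, 500)
        records.append({
            "pkts": pkts,
            "bytes": pkts * rng.randint(40, 1500),
            "rate": round(rng.uniform(0.1, 1000.0), 3),
            "category": category,
            "attack": 0 if category == "Normal" else 1,
        })
    return records


def validate(record):
    """Basic sanity checks of the kind a preprocessing step might apply."""
    return (
        record["pkts"] > 0
        and record["bytes"] >= record["pkts"] * 40
        and record["rate"] > 0
        and record["category"] in CATEGORIES
        and record["attack"] == (record["category"] != "Normal")
    )


if __name__ == "__main__":
    data = generate_records(10_000)
    print(sum(validate(r) for r in data), "of", len(data), "records valid")
```

Seeding the generator keeps the dataset identical across runs, which is one simple way to support reproducible pipelines.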

✅ ML Model Training

Random Forest (F1 Score: >99%)
Decision Tree (F1 Score: ~99%)
Naive Bayes (Baseline)
Chi-square feature selection (Top 5, Top 10, All features)
Hyperparameter tuning (Max Depth, Number of Trees)
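
The training options above could be wired together in Spark MLlib roughly as follows. This is a sketch, not the project's `model_training.py`: the column names (`features`, `selected_features`, `label`), the default hyperparameters, and the helper names `top_k_options` and `build_pipeline` are assumptions for illustration.

```python
def top_k_options(n_features):
    """Map the feature-selection choices listed above to a numTopFeatures value."""
    return {
        "Top 5": min(5, n_features),
        "Top 10": min(10, n_features),
        "All features": n_features,
    }


def build_pipeline(feature_cols, num_top_features=10, num_trees=50, max_depth=10):
    """Assemble features, keep the top-k by chi-square, then fit a Random Forest.

    Call this only after a SparkSession has been created, because MLlib
    stages need an active Spark context when they are constructed.
    """
    # Imported here so this module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import ChiSqSelector, VectorAssembler

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    selector = ChiSqSelector(
        numTopFeatures=min(num_top_features, len(feature_cols)),
        featuresCol="features",
        outputCol="selected_features",
        labelCol="label",
    )
    forest = RandomForestClassifier(
        featuresCol="selected_features",
        labelCol="label",
        numTrees=num_trees,   # "Number of Trees" hyperparameter
        maxDepth=max_depth,   # "Max Depth" hyperparameter
        seed=42,              # fixed seed for reproducible training
    )
    return Pipeline(stages=[assembler, selector, forest])
```

With a hypothetical training DataFrame `train_df` that has a numeric `label` column, `build_pipeline(cols).fit(train_df)` would return a fitted model. Note that `ChiSqSelector` treats features as categorical, so continuous fields are usually bucketized first; recent Spark releases also provide `UnivariateFeatureSelector` as a more general alternative.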

✅ Model Evaluation

Comprehensive metrics (Accuracy, F1, Precision, Recall)
Confusion matrix visualization
Model comparison dashboard
Training time analysis
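
For reference, the four headline metrics can be derived from confusion-matrix counts as sketched below. This is the binary (attack vs. normal) case only; multi-class evaluators typically report weighted averages of the per-class values. The function name `binary_metrics` is introduced here for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(binary_metrics(tp=90, fp=10, fn=30, tn=70))
```

The zero-denominator guards matter in practice: a model that never predicts the attack class would otherwise raise a division error instead of reporting a precision of 0.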

✅ Real-time Detection

Simulated streaming detection service
Live threat alerts with severity levels
Attack categorization (DDoS, DoS, Reconnaissance, Theft)
Real-time statistics and monitoring
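
A simulated detection loop with severity levels might look like the sketch below. The random draw stands in for a trained model's prediction, and the severity names and confidence thresholds are assumptions for this example, not the values used by `streaming_detection.py`.

```python
import random
import time
from collections import Counter
from dataclasses import dataclass
from typing import Iterator

ATTACKS = ["DDoS", "DoS", "Reconnaissance", "Theft"]


@dataclass
class Alert:
    timestamp: float
    category: str
    confidence: float
    severity: str


def severity_for(confidence):
    """Map a confidence score to a severity level (hypothetical thresholds)."""
    if confidence >= 0.9:
        return "critical"
    if confidence >= 0.7:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


def simulate_stream(n_events, seed=0) -> Iterator[Alert]:
    """Yield an Alert for each simulated event classified as an attack."""
    rng = random.Random(seed)
    for _ in range(n_events):
        # Stand-in for a model prediction on one incoming flow record.
        category = rng.choice(["Normal"] + ATTACKS)
        confidence = round(rng.uniform(0.3, 1.0), 2)
        if category == "Normal":
            continue  # benign traffic raises no alert
        yield Alert(time.time(), category, confidence, severity_for(confidence))


if __name__ == "__main__":
    alerts = list(simulate_stream(100))
    print("alerts by category:", Counter(a.category for a in alerts))
    print("alerts by severity:", Counter(a.severity for a in alerts))
```

Using a generator keeps the loop lazy, so a monitoring view can consume alerts one at a time and update its running statistics as they arrive.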

✅ Full GUI Dashboard

System overview with key metrics
Data ingestion and preprocessing interface
Model training configuration
Evaluation and comparison tools
Live monitoring with real-time alerts

🚀 Quick Start

Access the Application

Frontend: http://localhost:3000
Backend API: http://localhost:8001/api

Complete Workflow

1. Generate Dataset
   - Navigate to Data Ingestion
   - Set sample size (e.g., 10,000)
   - Click "Generate Dataset"
   - View statistics
2. Train Model
   - Go to Model Training
   - Select algorithm (Random Forest recommended)
   - Choose feature selection
   - Adjust hyperparameters
   - Click "Start Training"
   - Wait for results (~15-20s)
3. Evaluate Models
   - Visit Evaluation page
   - Compare all trained models
   - View best performing model
   - Analyze metrics
4. Start Monitoring
   - Go to Live Monitoring
   - Click "Start Detection"
   - View real-time alerts
   - Monitor attack patterns

📊 Dataset: BoT-IoT

Source

UNSW Canberra Cyber - https://research.unsw.edu.au/projects/bot-iot-dataset

Attack Types

DDoS - Distributed Denial of Service
DoS - Denial of Service
Reconnaissance - Network scanning
Theft - Data exfiltration

Performance Benchmarks

Algorithm	F1 Score	Accuracy	Training Time
Random Forest	99.7%	99.8%	~15-20s
Decision Tree	99.3%	99.5%	~8-12s
Naive Bayes	50.9%	51.2%	~5-8s

🔧 API Endpoints

Data Ingestion

POST /api/data/generate
GET /api/data/stats

Model Training

POST /api/train
GET /api/models

Streaming Detection

POST /api/streaming/start
POST /api/streaming/stop
GET /api/alerts/recent
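
The endpoints above can also be driven from a script instead of the GUI. The sketch below uses only the Python standard library; the paths and base URL come from this README, while the JSON payload fields (`sample_size`, `algorithm`) and the response format are assumptions, since the request schemas are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001/api"  # backend address from this README


def build_request(method, path, payload=None):
    """Create a Request for an API path, JSON-encoding the payload if given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=30):
    """Send the request and decode a JSON response body, if there is one."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    return json.loads(body) if body else None


def run_workflow():
    """Generate data, train, then start and stop detection (needs a running backend)."""
    call("POST", "/data/generate", {"sample_size": 10_000})  # hypothetical field
    print(call("GET", "/data/stats"))
    call("POST", "/train", {"algorithm": "random_forest"})  # hypothetical field
    print(call("GET", "/models"))
    call("POST", "/streaming/start")
    print(call("GET", "/alerts/recent"))
    call("POST", "/streaming/stop")
```

Calling `run_workflow()` with the backend running would mirror the four GUI steps from the Complete Workflow section.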

📁 Project Structure

/app/
├── backend/               # FastAPI + Spark MLlib
│   ├── server.py
│   ├── spark_engine.py
│   ├── data_ingestion.py
│   ├── model_training.py
│   └── streaming_detection.py
│
├── frontend/              # React Dashboard
│   └── src/
│       ├── pages/
│       └── components/
│
├── data/                  # Datasets & models
└── alerts/                # Detection alerts
