The Future of Steganography Detection: Machine Learning-Powered, Wavelet-Enhanced, GUI-Driven
This project has undergone a significant overhaul for improved maintainability, performance, accuracy, and robust error handling. The core functionalities have been modularized, and several critical issues addressed.
Steganography is not just about hiding information; it's about the invisible war in cyberspace.
Traditional security tools often fail to detect hidden payloads inside images, videos, or signals.
That's why Steg-Detector Pro was born: a futuristic, AI-driven, and research-grade framework that blends:
- Machine Learning for smart detection
- Wavelet Transform (PyWavelets) for frequency feature extraction
- Computer Vision (OpenCV) for image processing
- Beautiful GUI (Tkinter) for easy use
- Dataset Training so it keeps evolving with your data
- Modular Architecture: The entire codebase has been refactored into a `src/` directory with logical submodules (`src/logic`, `src/gui`, `src/utils`). This greatly enhances maintainability and extensibility.
- SSIM Bug Fix: Corrected a critical bug in the Structural Similarity Index (SSIM) feature extraction for images. Previously, SSIM was incorrectly calculated by comparing an image to itself; it now compares the image to a slightly blurred version for more accurate detection of steganographic artifacts.
- Enhanced Audio Feature Extraction:
  - MFCCs Integration: Added Mel-frequency cepstral coefficients (MFCCs) using `librosa`, significantly improving the accuracy of audio steganography detection.
  - Performance Optimization: Optimized audio processing by calculating `rfftfreq` outside loops, reducing computational overhead.
- Robust Dependency Handling: Implemented conditional imports for optional libraries (`librosa`, `pywt`, `skimage`, `pydub`). The application now logs warnings and gracefully falls back to alternative methods or defaults if these libraries are not installed, preventing crashes.
- Centralized Configuration: All application-wide constants are now managed in `src/config.py`, making it easier to customize settings.
- Improved Error Handling: Replaced generic `except Exception` blocks with more specific exception types across the codebase for clearer error reporting and more robust application behavior.
- Comprehensive Unit Tests: Added a new `tests/` directory with unit tests for core functionalities, ensuring code reliability and preventing regressions.
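The conditional-import behavior described in the list above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper names `optional_import` and `probe_optional_dependencies` and the logger name are hypothetical, chosen only for illustration.

```python
import importlib
import logging

# Hypothetical logger name; the project may configure logging differently.
logger = logging.getLogger("steg_detector.optional")


def optional_import(module_name):
    """Return the imported module, or None (after a warning) if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        logger.warning(
            "Optional dependency %r is not installed; related features are disabled.",
            module_name,
        )
        return None


def probe_optional_dependencies(names=("librosa", "pywt", "skimage", "pydub")):
    """Map each optional library name to True if it could be imported."""
    return {name: optional_import(name) is not None for name in names}
```

Feature code can then check the returned module (or the boolean map) before calling into it, and fall back to a default when the value is `None` or `False`, instead of crashing at import time.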
- Clone the repository:

      git clone https://github.com/0warn/STEG-Detector.git
      cd STEG-Detector

- Install dependencies: Ensure you have Python 3.8+ and `pip` installed. Then, install all required libraries:

      pip install -r requirement.txt

  Note: Some optional dependencies like `librosa` and `pywt` (PyWavelets) might require system-level dependencies depending on your OS. The application will function without them, but certain features (MFCCs, Wavelet features) will be disabled.

- Run the detector:

      python3 main.py

Due to significant internal changes in scikit-learn versions, any previously trained model files (steg_detector_model.joblib) are incompatible with this updated version of STEG-Detector.
Upon first launch (or if an incompatible model exists), the application will start, but detection features will not work until a new model is trained and saved.
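One way to get this "start anyway, disable detection" behavior is to treat a missing or unreadable model file as "no model" rather than as a fatal error. The sketch below is an illustration under assumptions, not the project's implementation: `load_model_or_none` and `detection_enabled` are hypothetical names, and `loader` stands in for whatever function reads the file (for example `joblib.load`).

```python
import logging
from pathlib import Path

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("steg_detector.model")

MODEL_PATH = Path("steg_detector_model.joblib")


def load_model_or_none(loader, path=MODEL_PATH):
    """Try to load a saved model; return None if it is missing or unreadable.

    `loader` is any callable that takes a path and returns a model object.
    Passing `joblib.load` here is an assumption about how the file is read.
    """
    path = Path(path)
    if not path.is_file():
        logger.warning("No model found at %s; train and save one first.", path)
        return None
    try:
        return loader(path)
    except (ValueError, AttributeError, ImportError, EOFError) as exc:
        # A scikit-learn version mismatch can surface as ValueError on load.
        logger.warning("Model at %s could not be loaded (%s); retrain it.", path, exc)
        return None


def detection_enabled(model):
    """Detection features stay off until a usable model has been loaded."""
    return model is not None
```

Keeping the loader as a parameter makes the fallback path easy to exercise in unit tests without a real model file.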
Steps to Retrain Your Model:
- Launch the application (`python3 main.py`).
- Navigate to the "Train / Evaluate" tab in the GUI.
- Select your Dataset Source: Choose either "Folders (clean/stego)" or "CSV (path,label)".
  - If "Folders", browse to your root dataset directory (e.g., a folder containing `clean/` and `stego/` subfolders).
  - If "CSV", browse to your CSV file containing paths and labels.
- Select Algorithm: Choose `rf` (RandomForest) or `xgb` (XGBoost, requires the `xgboost` package to be installed).
- Click the "Train & Evaluate" button.
- Once training is complete and evaluation results are displayed, click the "Save Model" button. This will save a new, compatible model file (`steg_detector_model.joblib`) in your project's root directory.
After saving the new model, the application's detection capabilities will be fully functional.
- Detect: Upload a media file (image, audio, or video) to check for hidden data.
- Train / Evaluate: Train new ML models with your datasets and evaluate their performance.
- Dataset: Access tools for generating synthetic stego datasets (useful for bootstrapping training data).
- About / Updates: View application information and check for updates.
- Prepare your dataset:
  - For "Folders" mode: Create a root directory containing two subfolders:
    - `cover/`: Contains original, clean media files.
    - `stego/`: Contains steganographically altered media files.
  - For "CSV" mode: Create a CSV file where each row contains `path_to_media_file,label` (e.g., `image1.png,0` for clean, `stego_image1.png,1` for stego).
- Use the "Train / Evaluate" tab in the GUI, select your source, and follow the retraining steps mentioned above.
- The trained model is saved as `steg_detector_model.joblib` for future detection.
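If your data is already organized in folders and you would rather use "CSV" mode, a small script can generate the `path,label` file. This is a sketch under assumptions: the helper names are hypothetical, the folder-to-label mapping (`cover` = 0, `stego` = 1) follows the example above, and it writes no header row, which may or may not match what the GUI's CSV loader expects.

```python
import csv
from pathlib import Path

# Assumed label convention, taken from the CSV example: 0 = clean, 1 = stego.
# Adjust the folder names if your dataset uses a different layout.
LABELS = {"cover": 0, "stego": 1}


def folders_to_rows(root, labels=LABELS):
    """Collect (path, label) pairs from the labelled subfolders of `root`."""
    rows = []
    for folder, label in labels.items():
        subdir = Path(root) / folder
        if not subdir.is_dir():
            continue  # skip a subfolder that does not exist
        for path in sorted(subdir.rglob("*")):
            if path.is_file():
                rows.append((str(path), label))
    return rows


def write_label_csv(rows, csv_path):
    """Write one `path,label` row per media file, without a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

For example, `write_label_csv(folders_to_rows("dataset"), "labels.csv")` would produce a file you can select under "CSV (path,label)", assuming `dataset/` is your root directory.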
# Assuming you have trained and saved a model using the GUI.
# If you wish to use the detector programmatically:
from src.logic.detector import Detector

# Initialize the detector (it will attempt to load the model saved as steg_detector_model.joblib)
detector = Detector()

# Example detection:
result = detector.detect("path/to/your/sample_image.png")
if result.get('ok'):
    print(f"Detection Result for {result['path']}:")
    print(f"  Type: {result['type']}")
    print(f"  Heuristics Flag: {result['heuristics'].get('flag')}")
    # ml_prediction is None when no model could be loaded
    if result.get('ml_prediction') is not None:
        prediction_label = "Stego Detected" if result['ml_prediction'] == 1 else "Clean"
        print(f"  ML Prediction: {prediction_label}")
    else:
        print("  ML Prediction: Model not loaded or available.")
else:
    print(f"Error detecting steganography: {result.get('error')}")

| Error / Warning | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'pywt'` / `librosa` / `skimage` / `pydub` | Optional dependency not installed. | Run `pip install -r requirement.txt`. If issues persist, note that `librosa` and `pywt` may need underlying system dependencies. The application will function, but related features might be unavailable. |
| `ValueError: node array from the pickle has an incompatible dtype` (on startup or load) | Model file (`steg_detector_model.joblib`) was trained with an older scikit-learn version. | Retrain the model using the "Train / Evaluate" tab in the GUI, then click "Save Model". The new model will be compatible with your current environment. |
| GUI crashes on Linux (`_tkinter` error) | Tkinter development files not installed. | For Debian/Ubuntu, run `sudo apt install python3-tk`. For other systems, consult your package manager. |
| Model not found (no detection results) | No model has been trained or saved yet, or the loaded model is incompatible. | Use the "Train / Evaluate" tab in the GUI to train a new model with your dataset, then click "Save Model". |
- Video steganography detection (Initial support added)
- Audio steganography analysis (Initial support and MFCC features added)
- Deep Learning (CNN, ResNet) integration
- Web-based dashboard
- Real-time stego-sniffer for network traffic

Developed with dedication by CODE.
"Because hidden data should never remain invisible."