A production-ready hybrid log classification system that intelligently combines Regex rules, Machine Learning (Sentence Transformers + Logistic Regression), and LLM fallback to classify system logs with high accuracy and adaptability.
Designed to handle:
- β‘ Simple structured logs
- π€ Complex semantic logs
- π§ Unseen or ambiguous patterns
Modern applications generate massive volumes of logs that are difficult to analyze manually. This system automates log classification using a multi-stage intelligent pipeline that dynamically selects the best method for each log.
β‘ This hybrid pipeline ensures optimal trade-off between speed (Regex), accuracy (ML), and flexibility (LLM).
flowchart TD
A[API Layer - FastAPI] --> B[Incoming Log]
B --> C[Regex Engine]
C -->|Regex Match| F1[Final Output]
C -->|No Regex Match| D[ML Model\nSentence Transformer + LR]
D -->|Confidence >=\n0.75| F1
D -->|Confidence <\n0.75| E[LLM Classifier]
E --> F1
F1 --> F2[Label + Method + Confidence]
- Handles predictable log patterns
- Ultra-fast pattern matching using predefined rules
- Example:
"System reboot initiated"β System Notification
- Uses embeddings from Sentence Transformers
- Applies Logistic Regression for classification
- Works best with sufficient labeled data
- Returns:
- Predicted label
- Confidence score
- Used when:
- ML confidence is low
- Log is complex or unseen
- Uses LLM via Groq API
- Ensures robustness for real-world logs
if regex_match:
return label
label, prob = ML_model(log)
if prob > 0.75:
return label
else:
return LLM(log)Log-Classification-System/
β
βββ models/
β βββ log_classifier.joblib
β
βββ resources/
β βββ test.csv
β βββ output.csv
β
βββ training/
β βββ dataset/
β βββ log-classification.ipynb
β
βββ classify.py
βββ processor_regex.py
βββ processor_bert.py
βββ processor_llm.py
βββ server.py
βββ requirements.txt
βββ .env- Hybrid classification (Regex + ML + LLM)
- Confidence-based intelligent routing
- FastAPI-powered backend API
- CSV upload & batch classification
- Modular and scalable architecture
- Model persistence using joblib
- Handles real-world log patterns
- Backend: FastAPI
- ML: scikit-learn
- Embeddings: SentenceTransformers
- LLM: Groq API (LLaMA models)
- Data: Pandas, NumPy
git clone https://github.com/SwedeshnaMishra/Log-Classification-System.git
cd Log-Classification-Systempip install -r requirements.txtCreate .env file:
GROQ_API_KEY=your_api_key_hereuvicorn server:app --reload| Endpoint | Method | Description |
|---|---|---|
/ |
GET | Health check |
/classify/ |
POST | Upload CSV file for batch log classification |
/classify-single/ |
POST | Classify a single log message |
/docs |
GET | Swagger UI |
/redoc |
GET | API documentation |
CSV must contain the following columns:
source,log_message
ModernCRM,User login failed
BillingSystem,Transaction timeout error
System,CPU usage exceeded thresholdsource,log_message,target_label,method_used,confidence
ModernCRM,User login failed,Security Alert,ML,0.91
BillingSystem,Transaction timeout error,Workflow Error,Regex,0.99
System,CPU usage exceeded,Resource Usage,LLM,0.87- Accuracy: ~99%
- F1 Score: 0.98+
- Dataset Size: 1900+ logs
- Embedding Dimension: 384
Located in:
training/log-classification.ipynbSteps:
- Load dataset
- Generate embeddings using Sentence Transformers
- Train Logistic Regression classifier
- Evaluate model
- Save model using joblib
| Method | Strength | Limitation |
|---|---|---|
| Regex | Fast, deterministic | Limited flexibility |
| ML | Accurate, scalable | Needs labeled data |
| LLM | Flexible, intelligent | Higher latency & cost |
Combining all three ensures:
- Speed β‘
- Accuracy π―
- Robustness π§
- π Streamlit dashboard for visualization
- π‘ Real-time log streaming support
- π³ Docker containerization
- βοΈ Cloud deployment (AWS / Render)
- π Explainable AI (prediction reasoning)
- DevOps monitoring
- Security threat detection
- System observability
- Log anomaly detection
- Automated incident classification
If you want to contribute to this project, please follow these steps:
Forkthe repository.- Create a new branch
(git checkout -b feature/your-feature-name). - Make your changes and commit them
(git commit -m 'Add some feature'). - Push to the branch
(git push origin feature/your-feature-name). - Open a pull request.
Github: Swedeshna Mishra