This project implements a machine learning solution to flag fraudulent transactions under the constraints of the business problem: only 400 transactions can be reviewed per month, and the goal is to maximise the total fraud value prevented.
- Time-based train-test split to simulate production conditions (last month used as test data)
- Custom feature engineering for fraud detection
- Machine learning model (XGBoost) for fraud prediction
- Rule-based fraud detection for comparison
- Cross-validation for robust model evaluation
- Performance evaluation using a custom fraud capture score
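The time-based split listed above can be sketched as follows (the column names here are hypothetical; the real dataset uses obfuscated fields):

```python
import pandas as pd

# Hypothetical column names; the real dataset uses obfuscated fields.
df = pd.DataFrame({
    "transactionTime": pd.to_datetime([
        "2017-10-05", "2017-11-12", "2017-12-03", "2017-12-20"
    ]),
    "amount": [120.0, 55.5, 980.0, 13.2],
})

# Hold out the last calendar month as the test set to mimic production,
# where the model scores transactions it has never seen in time.
last_month = df["transactionTime"].dt.to_period("M").max()
is_test = df["transactionTime"].dt.to_period("M") == last_month
train, test = df[~is_test], df[is_test]
```

Splitting by time rather than at random avoids leaking future behaviour into training, which matters for fraud patterns that drift month to month.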
├── data/
│ ├── transactions_obf.csv
│ └── labels_obf.csv
├── src/
│ ├── main.py
│ ├── data_loader.py
│ ├── feature_engineering.py
│ ├── model_pipeline.py
│ └── evaluation.py
└── README.md
Install the required packages using requirements.txt:
pip install -r requirements.txt
Alternatively, if there are any dependency clashes, the following should work:
pip install pandas numpy scikit-learn xgboost imbalanced-learn joblib
Ensure the data files are in the data/ directory.
Run the main script:
python src/main.py
This will:
- Load and preprocess the data
- Perform cross-validation
- Train the final model
- Evaluate the model on the test set
- Compare the ML model with a rule-based approach
- Save the trained model
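A minimal rule-based baseline of the kind the ML model is compared against might simply flag the highest-value transactions up to the monthly review budget (the column name and rule are hypothetical; the project's actual rules live in the source):

```python
import pandas as pd

def rule_based_flags(df: pd.DataFrame, budget: int = 400) -> pd.Index:
    """Flag the `budget` highest-value transactions for manual review.

    A simple value-ranking rule; the project's actual rule set may differ.
    """
    return df["amount"].nlargest(budget).index

df = pd.DataFrame({"amount": [10.0, 500.0, 30.0, 950.0, 5.0]})
flagged = rule_based_flags(df, budget=2)
# With a budget of 2, the two largest transactions (950 and 500) are flagged.
```

A baseline like this is useful because it is cheap, interpretable, and sets the floor the ML model has to beat under the same 400-review constraint.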
The machine learning model significantly outperforms the rule-based approach:
- ML Model Fraud Capture Score: 90.52%
- Rule-Based Model Fraud Capture Score: 63.93%
- Improvement: 26.59 percentage points
Cross-validation results:
- ML Model Average Fraud Capture Score: 88.36% (+/- 3.89%)
- Rule-Based Model Average Fraud Capture Score: 77.51% (+/- 8.16%)
- Average Improvement: 10.86 percentage points
- DataLoader: Loads transaction and label data
- DataPreprocessor: Performs initial data preprocessing
- CustomFeatureEngineer: Implements domain-specific feature engineering
- FraudDetectionPipeline: Combines feature engineering, preprocessing, and model training
- evaluate_models_with_cv: Performs cross-validation for model evaluation
- fraud_capture_score: Custom metric for evaluating model performance
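The fraud capture score can be read as the share of total fraud value caught within the 400-transaction review budget. A plausible sketch of such a metric (the exact definition lives in src/evaluation.py):

```python
import numpy as np

def fraud_capture_score(amounts, is_fraud, scores, budget=400):
    """Fraction of total fraudulent value among the `budget` transactions
    ranked highest by model score. Hypothetical implementation; the
    project's actual metric is defined in src/evaluation.py."""
    amounts = np.asarray(amounts, dtype=float)
    is_fraud = np.asarray(is_fraud, dtype=bool)
    # Review only the top-scoring transactions, up to the monthly budget.
    order = np.argsort(scores)[::-1][:budget]
    caught = amounts[order][is_fraud[order]].sum()
    total = amounts[is_fraud].sum()
    return caught / total if total > 0 else 0.0
```

Ranking by score and truncating at the budget mirrors the operational constraint: analysts can only inspect 400 transactions, so value caught inside that window is what counts.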
- Feature importance analysis for better understanding of the model
- Hyperparameter tuning for potentially improved performance
- Exploration of other machine learning algorithms