---
tags:
- fraud-detection
- ensemble-learning
- e-commerce
- imbalanced-data
license: mit
metrics:
- accuracy
- precision
- recall
- f1
- auc
---
# E-Commerce Fraud Detection Model
## Model Description
This is an ensemble fraud detection system trained on 1.47M e-commerce transactions with a 5.01% fraud rate.
### Architecture
**Weighted Ensemble Strategy (70%-30%)**
- **Stage 1 - Recall Specialists (70% weight):** Logistic Regression + Random Forest
- **Stage 2 - Precision Specialists (30% weight):** Neural Network + XGBoost
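
One way to read the 70%-30% weighting is as a weighted average of per-stage fraud probabilities. The sketch below assumes each stage's score is the plain mean of its two models' probabilities. The function name `weighted_ensemble_proba` and the averaging step are illustrative assumptions, not necessarily how the released ensemble combines its members.

```python
def weighted_ensemble_proba(lr_p, rf_p, nn_p, xgb_p,
                            recall_weight=0.7, precision_weight=0.3):
    """Blend per-model fraud probabilities into one score.

    Assumption: each stage contributes the mean of its two models.
    Works on plain floats and, elementwise, on NumPy arrays.
    """
    recall_stage = (lr_p + rf_p) / 2.0        # Logistic Regression + Random Forest
    precision_stage = (nn_p + xgb_p) / 2.0    # Neural Network + XGBoost
    return recall_weight * recall_stage + precision_weight * precision_stage


# Made-up probabilities for a single transaction
example = weighted_ensemble_proba(lr_p=0.80, rf_p=0.70, nn_p=0.20, xgb_p=0.10)
print(round(example, 3))
```

Because the recall specialists carry most of the weight, a transaction they both flag strongly keeps a high blended score even when the precision specialists disagree.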
### Performance Metrics
| Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC |
|-------|----------|-----------|--------|----------|---------|
| Logistic Regression | 0.5723 | 0.0988 | 0.9273 | 0.1786 | 0.8619 |
| Random Forest | 0.6203 | 0.1075 | 0.8999 | 0.1920 | 0.8712 |
| Neural Network | 0.9569 | 0.7013 | 0.2442 | 0.3623 | 0.8748 |
| XGBoost | 0.9558 | 0.6632 | 0.2389 | 0.3513 | 0.8459 |
| Stacking Ensemble | 0.8973 | 0.2640 | 0.5868 | 0.3642 | 0.8731 |
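
F1 is the harmonic mean of precision and recall, so the F1-Score column can be reproduced from the two columns before it. The short check below recomputes it from the table's rounded values. Differences in the fourth decimal place are expected from that rounding.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
rows = {
    "Logistic Regression": (0.0988, 0.9273, 0.1786),
    "Random Forest": (0.1075, 0.8999, 0.1920),
    "Neural Network": (0.7013, 0.2442, 0.3623),
    "XGBoost": (0.6632, 0.2389, 0.3513),
    "Stacking Ensemble": (0.2640, 0.5868, 0.3642),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_from(p, r):.4f}, reported {reported:.4f}")
```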
### Key Features
- **52 engineered features**, including:
  - Transaction patterns (amount, quantity, frequency)
  - Customer behavior (account age, transaction history)
  - Temporal features (time-based patterns)
  - Risk indicators (unusual patterns, high-value flags)
  - Interaction features (multi-dimensional risk signals)
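
To make the feature categories above concrete, here is a minimal sketch that derives one or two features per category from a single transaction. Every field name (`timestamp`, `amount`, `quantity`, `account_age_days`), the 30-day and night-hour cutoffs, and the `high_value_threshold` default are hypothetical. None of them are the model's actual 52 features.

```python
from datetime import datetime


def engineer_features(tx, high_value_threshold=500.0):
    """Derive a few illustrative features from one transaction dict.

    All input field names and cutoffs here are hypothetical.
    """
    ts = datetime.fromisoformat(tx["timestamp"])
    amount = tx["amount"]
    is_high_value = int(amount >= high_value_threshold)   # risk indicator
    is_new_account = int(tx["account_age_days"] < 30)     # customer behavior
    return {
        "amount_per_item": amount / max(tx["quantity"], 1),  # transaction pattern
        "hour_of_day": ts.hour,                              # temporal
        "is_night": int(ts.hour < 6),                        # temporal
        "is_high_value": is_high_value,
        "is_new_account": is_new_account,
        # Interaction feature: both risk signals fire together
        "new_account_high_value": is_high_value * is_new_account,
    }


features = engineer_features({
    "timestamp": "2024-03-01T02:15:00",
    "amount": 750.0,
    "quantity": 3,
    "account_age_days": 12,
})
print(features)
```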
### Training
- **Resampling:** ADASYN (1:1 balance)
- **GPU Acceleration:** RAPIDS cuML, PyTorch, XGBoost
- **Threshold Optimization:** F-beta score optimization
- **Validation:** Stratified K-Fold Cross-Validation
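
Threshold optimization by F-beta score can be sketched as a scan over candidate cutoffs that keeps the one with the highest score on validation data. The example below is a dependency-free illustration. The `beta=2.0` default (which favors recall), the 0.01 to 0.99 grid, and the function names are assumptions rather than the settings used in training.

```python
def fbeta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(y_true, y_proba, beta=2.0, candidates=None):
    """Return (threshold, score) maximizing F-beta over the candidate grid.

    Ties are resolved in favor of the lowest threshold.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    pairs = list(zip(y_true, y_proba))
    best_t, best_score = None, -1.0
    for t in candidates:
        tp = sum(1 for y, p in pairs if p >= t and y == 1)
        fp = sum(1 for y, p in pairs if p >= t and y == 0)
        fn = sum(1 for y, p in pairs if p < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = fbeta(precision, recall, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy validation labels and predicted fraud probabilities
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
threshold, score = best_threshold(y_val, p_val, beta=2.0)
print(threshold, score)
```

In practice the scan would run on held-out folds from the stratified cross-validation, not on the resampled training data, so the chosen cutoff reflects the real 5.01% fraud rate.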
### Usage
**Warning:** Requires a GPU environment with CUDA installed.
```python
import joblib
import numpy as np
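
# Load the four base models, the ensemble, and the fitted scaler
lr_model = joblib.load("lr_model.pkl")
rf_model = joblib.load("rf_model.pkl")
nn_model = joblib.load("nn_model.pkl")
xgb_model = joblib.load("xgb_model.pkl")
ensemble_model = joblib.load("ensemble_model.pkl")
scaler = joblib.load("scaler.pkl")  # loaded here but not applied below

# Prepare your data: a DataFrame holding the features and the 'Is Fraudulent' label
df = ...
# Note: columns.difference returns the remaining column names in sorted order
X = df[df.columns.difference(['Is Fraudulent'])].copy()
y = df['Is Fraudulent'].copy()

# Predict with the ensemble
fraud_proba = ensemble_model.predict_proba(X)[:, 1]  # probability of the fraud class
fraud_pred = ensemble_model.predict(X)               # hard 0/1 labels

# Evaluate predictions
# evaluate_models is not defined in this snippet and must be provided separately
evaluate_models(
    [lr_model, rf_model, nn_model, xgb_model, ensemble_model],
    X,
    y,
    ['Logistic Regression', 'Random Forest', 'Neural Network', 'XGBoost', 'Stacking Ensemble'],
)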
```
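
The usage snippet calls `evaluate_models`, which is not defined there. A minimal stand-in, assuming each model exposes a scikit-learn-style `predict` that returns 0/1 labels, might look like the sketch below. Its signature is inferred from the call above and its body is hypothetical. It reports accuracy, precision, recall, and F1 only, and omits AUC-ROC.

```python
def evaluate_models(models, X, y, names):
    """Hypothetical stand-in: print and return basic metrics per model."""
    y_true = [int(v) for v in y]
    results = {}
    for model, name in zip(models, names):
        y_pred = [int(v) for v in model.predict(X)]
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        tn = sum(1 for t, p in pairs if t == 0 and p == 0)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[name] = {
            "accuracy": (tp + tn) / len(y_true),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
        print(name, {k: round(v, 4) for k, v in results[name].items()})
    return results
```

Define it before running the usage snippet, or replace it with the project's own evaluation routine if one is available.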
### License
MIT License
### Contact
COMPSCI 4AL3 - Group 34
- Viransh Shah (shahv47@mcmaster.ca)
- Ellen Xiong (xionge1@mcmaster.ca)