---
tags:
- fraud-detection
- ensemble-learning
- e-commerce
- imbalanced-data
license: mit
metrics:
- accuracy
- precision
- recall
- f1
- auc
---

# E-Commerce Fraud Detection Model

## Model Description

This is an ensemble fraud detection system trained on 1.47M e-commerce transactions with a 5.01% fraud rate.

### Architecture

**Weighted Ensemble Strategy (70%-30%)**

- **Stage 1 - Recall Specialists (70% weight):** Logistic Regression + Random Forest
- **Stage 2 - Precision Specialists (30% weight):** Neural Network + XGBoost
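
One way to read the 70/30 weighting is as a weighted average of the two stages' fraud probabilities. The sketch below illustrates that reading only and is not the released implementation. It assumes each stage's score is the plain mean of its two models' probabilities, and the function name `weighted_ensemble_proba` is hypothetical. The metrics table labels the combined model a stacking ensemble, so the shipped model may combine the base models differently.

```python
def weighted_ensemble_proba(lr_p, rf_p, nn_p, xgb_p, recall_weight=0.7):
    """Blend per-model fraud probabilities with a 70/30 stage weighting.

    Assumption: each stage score is the unweighted mean of its two models.
    """
    precision_weight = 1.0 - recall_weight
    blended = []
    for lr, rf, nn, xgb in zip(lr_p, rf_p, nn_p, xgb_p):
        recall_stage = (lr + rf) / 2.0       # Stage 1: Logistic Regression + Random Forest
        precision_stage = (nn + xgb) / 2.0   # Stage 2: Neural Network + XGBoost
        blended.append(recall_weight * recall_stage + precision_weight * precision_stage)
    return blended


# Made-up probabilities for two transactions, for illustration only.
scores = weighted_ensemble_proba([0.9, 0.2], [0.7, 0.4], [0.3, 0.1], [0.5, 0.1])
print(scores)
```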

### Performance Metrics

| Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC |
|-------|----------|-----------|--------|----------|---------|
| Logistic Regression | 0.5723 | 0.0988 | 0.9273 | 0.1786 | 0.8619 |
| Random Forest | 0.6203 | 0.1075 | 0.8999 | 0.1920 | 0.8712 |
| Neural Network | 0.9569 | 0.7013 | 0.2442 | 0.3623 | 0.8748 |
| XGBoost | 0.9558 | 0.6632 | 0.2389 | 0.3513 | 0.8459 |
| Stacking Ensemble | 0.8973 | 0.2640 | 0.5868 | 0.3642 | 0.8731 |
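
The F1-Score column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The short check below does that for each row of the table. A difference in the fourth decimal place can appear because the published precision and recall are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {
    "Logistic Regression": (0.0988, 0.9273),
    "Random Forest": (0.1075, 0.8999),
    "Neural Network": (0.7013, 0.2442),
    "XGBoost": (0.6632, 0.2389),
    "Stacking Ensemble": (0.2640, 0.5868),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f}")
```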

### Key Features

- **52 engineered features**, including:
  - Transaction patterns (amount, quantity, frequency)
  - Customer behavior (account age, transaction history)
  - Temporal features (time-based patterns)
  - Risk indicators (unusual patterns, high-value flags)
  - Interaction features (multi-dimensional risk signals)
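
To make the feature categories above concrete, here is a minimal sketch of how a few such features could be derived from one raw transaction. It is not the project's actual feature pipeline. The field names (`amount`, `quantity`, `account_age_days`, `timestamp`), both thresholds, and the function `engineer_features` are hypothetical and chosen only for illustration.

```python
from datetime import datetime

HIGH_VALUE_THRESHOLD = 500.0  # hypothetical cutoff for a "high-value" transaction
NEW_ACCOUNT_DAYS = 30         # hypothetical cutoff for a "new" account


def engineer_features(txn):
    """Derive a few illustrative features from a single transaction dict."""
    ts = datetime.fromisoformat(txn["timestamp"])
    high_value = int(txn["amount"] > HIGH_VALUE_THRESHOLD)
    new_account = int(txn["account_age_days"] < NEW_ACCOUNT_DAYS)
    return {
        # Transaction pattern: average spend per item
        "amount_per_item": txn["amount"] / max(txn["quantity"], 1),
        # Temporal features
        "hour_of_day": ts.hour,
        "is_night": int(ts.hour < 6),
        # Risk indicator
        "high_value_flag": high_value,
        # Interaction feature: high-value purchase from a new account
        "new_account_high_value": high_value * new_account,
    }


example = {
    "amount": 900.0,
    "quantity": 3,
    "account_age_days": 5,
    "timestamp": "2024-01-15T02:30:00",
}
features = engineer_features(example)
print(features)
```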

### Training

- **Resampling:** ADASYN (1:1 balance)
- **GPU Acceleration:** RAPIDS cuML, PyTorch, XGBoost
- **Threshold Optimization:** F-beta score optimization
- **Validation:** Stratified K-Fold Cross-Validation
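
Threshold optimization on an F-beta score can be sketched as a search over candidate cutoffs that keeps the one with the highest score. The example below is a minimal pure-Python illustration, not the project's training code. The value `beta=2.0` (which weights recall more heavily than precision), the evenly spaced grid, and the function names are all assumptions, and the labels and probabilities are made up.

```python
def fbeta(precision, recall, beta):
    """F-beta score; beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def best_threshold(y_true, y_proba, beta=2.0, steps=99):
    """Return (threshold, score) maximizing F-beta over an evenly spaced grid."""
    best_t, best_score = 0.5, -1.0
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        tp = sum(1 for y, p in zip(y_true, y_proba) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(y_true, y_proba) if y == 0 and p >= t)
        fn = sum(1 for y, p in zip(y_true, y_proba) if y == 1 and p < t)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = fbeta(precision, recall, beta)
        if score > best_score:  # strict comparison keeps the lowest best threshold
            best_t, best_score = t, score
    return best_t, best_score


# Made-up validation labels and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_proba = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
threshold, score = best_threshold(y_true, y_proba, beta=2.0)
print(f"threshold={threshold:.2f}, F-beta={score:.4f}")
```

In practice the threshold would be chosen on held-out validation folds rather than on the test set, so that the reported metrics are not tuned to the data they are measured on.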
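
### Usage

**Warning:** Requires a GPU environment with CUDA installed.

```python
import joblib
from sklearn.metrics import classification_report, roc_auc_score

# Load the base models, the ensemble, and the fitted scaler
lr_model = joblib.load("lr_model.pkl")
rf_model = joblib.load("rf_model.pkl")
nn_model = joblib.load("nn_model.pkl")
xgb_model = joblib.load("xgb_model.pkl")
ensemble_model = joblib.load("ensemble_model.pkl")
scaler = joblib.load("scaler.pkl")

# Prepare your data: a DataFrame holding the feature columns and the 'Is Fraudulent' label
df = ...

# Note: columns.difference() returns the column names sorted alphabetically,
# so the feature order must match the order used at training time.
X = df[df.columns.difference(['Is Fraudulent'])].copy()
y = df['Is Fraudulent'].copy()

# The scaler is loaded but not applied here. If the models expect scaled inputs
# (an assumption; check the training code), pass scaler.transform(X) instead of X.

# Predict with the ensemble
fraud_proba = ensemble_model.predict_proba(X)[:, 1]
fraud_pred = ensemble_model.predict(X)

# Evaluate the ensemble predictions
print(classification_report(y, fraud_pred, digits=4))
print("AUC-ROC:", roc_auc_score(y, fraud_proba))

# To compare all five models, the original snippet calls evaluate_models, a helper
# from the project's training code that is not defined here:
# evaluate_models([lr_model, rf_model, nn_model, xgb_model, ensemble_model], X, y,
#                 ['Logistic Regression', 'Random Forest', 'Neural Network', 'XGBoost', 'Stacking Ensemble'])
```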

### License

MIT License

### Contact

COMPSCI 4AL3 - Group 34

- Viransh Shah (shahv47@mcmaster.ca)
- Ellen Xiong (xionge1@mcmaster.ca)