---
tags:
- fraud-detection
- ensemble-learning
- e-commerce
- imbalanced-data
license: mit
metrics:
- accuracy
- precision
- recall
- f1
- auc
---

# E-Commerce Fraud Detection Model

## Model Description

This is an ensemble fraud detection system trained on 1.47M e-commerce transactions with a 5.01% fraud rate.

### Architecture

**Weighted Ensemble Strategy (70%-30%)**
- **Stage 1 - Recall Specialists (70% weight):** Logistic Regression + Random Forest
- **Stage 2 - Precision Specialists (30% weight):** Neural Network + XGBoost

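One way to read the 70%-30% weighting is sketched below. It assumes that each stage's score is the plain mean of its two models' fraud probabilities and that the final score blends the two stage scores with weights 0.7 and 0.3. The function names, the averaging inside each stage, and the sample probabilities are illustrative and not taken from the project code.

```python
def stage_score(probas_a, probas_b):
    """Average two models' fraud probabilities element-wise (assumed rule)."""
    return [(a + b) / 2 for a, b in zip(probas_a, probas_b)]


def weighted_ensemble(lr, rf, nn, xgb, recall_weight=0.7, precision_weight=0.3):
    """Blend the recall stage (LR + RF) with the precision stage (NN + XGBoost)."""
    recall_stage = stage_score(lr, rf)
    precision_stage = stage_score(nn, xgb)
    return [
        recall_weight * r + precision_weight * p
        for r, p in zip(recall_stage, precision_stage)
    ]


# Hypothetical fraud probabilities for three transactions.
scores = weighted_ensemble(
    lr=[0.9, 0.6, 0.2],
    rf=[0.8, 0.4, 0.1],
    nn=[0.7, 0.1, 0.0],
    xgb=[0.5, 0.1, 0.0],
)
print([round(s, 3) for s in scores])
```

Weighting the recall stage more heavily pushes the blended score up whenever the recall-oriented models fire, even if the precision-oriented models stay low.
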
### Performance Metrics

| Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC |
|-------|----------|-----------|--------|----------|---------|
| Logistic Regression | 0.5723 | 0.0988 | 0.9273 | 0.1786 | 0.8619 |
| Random Forest | 0.6203 | 0.1075 | 0.8999 | 0.1920 | 0.8712 |
| Neural Network | 0.9569 | 0.7013 | 0.2442 | 0.3623 | 0.8748 |
| XGBoost | 0.9558 | 0.6632 | 0.2389 | 0.3513 | 0.8459 |
| Stacking Ensemble | 0.8973 | 0.2640 | 0.5868 | 0.3642 | 0.8731 |

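The F1-Score column can be cross-checked against the Precision and Recall columns, since F1 is their harmonic mean. The short sketch below recomputes it from the rounded table values; small differences in the fourth decimal are expected because the inputs are already rounded. The helper name `f1_from` is illustrative.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Logistic Regression": (0.0988, 0.9273, 0.1786),
    "Random Forest": (0.1075, 0.8999, 0.1920),
    "Neural Network": (0.7013, 0.2442, 0.3623),
    "XGBoost": (0.6632, 0.2389, 0.3513),
    "Stacking Ensemble": (0.2640, 0.5868, 0.3642),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed F1 = {f1_from(p, r):.4f}, reported = {f1:.4f}")
```

Assuming the evaluation split keeps the 5.01% fraud rate, a model that never flags fraud would already score about 0.95 accuracy, so precision, recall, and F1 are the more informative columns here.
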

### Key Features

- **52 engineered features** including:
  - Transaction patterns (amount, quantity, frequency)
  - Customer behavior (account age, transaction history)
  - Temporal features (time-based patterns)
  - Risk indicators (unusual patterns, high-value flags)
  - Interaction features (multi-dimensional risk signals)

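The feature categories above can be illustrated with a small sketch. The field names (`amount`, `quantity`, `account_age_days`, `timestamp`), the thresholds, and the derived feature names are all hypothetical; the actual 52 features are not listed in this card.

```python
from datetime import datetime


def engineer_features(txn, high_value_threshold=500.0, new_account_days=30):
    """Derive a few illustrative features from one transaction (hypothetical schema)."""
    amount = txn["amount"]
    quantity = txn["quantity"]
    hour = txn["timestamp"].hour

    is_high_value = int(amount >= high_value_threshold)
    is_new_account = int(txn["account_age_days"] < new_account_days)
    return {
        # Transaction pattern: average spend per item.
        "amount_per_item": amount / max(quantity, 1),
        # Temporal feature: purchase made between midnight and 6 a.m.
        "is_night": int(hour < 6),
        # Risk indicators.
        "is_high_value": is_high_value,
        "is_new_account": is_new_account,
        # Interaction feature: both risk signals fire together.
        "new_account_high_value": is_high_value * is_new_account,
    }


example = engineer_features({
    "amount": 900.0,
    "quantity": 3,
    "account_age_days": 5,
    "timestamp": datetime(2024, 1, 15, 2, 30),
})
print(example)
```
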
### Training

- **Resampling:** ADASYN (1:1 balance)
- **GPU Acceleration:** RAPIDS cuML, PyTorch, XGBoost
- **Threshold Optimization:** F-beta score optimization
- **Validation:** Stratified K-Fold Cross-Validation

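Threshold optimization against an F-beta score can be sketched as a simple grid search over candidate cut-offs. This is a minimal illustration, not the project's implementation: the value `beta=2.0` (which favors recall), the 0.01 grid, and the sample labels and probabilities are assumptions.

```python
def fbeta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(y_true, y_proba, beta=2.0, candidates=None):
    """Return the (threshold, score) pair that maximizes F-beta on the given data."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best_t, best_score = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for y, p in zip(y_true, y_proba) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_proba) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, y_proba) if p < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = fbeta(precision, recall, beta)
        if score > best_score:  # keep the lowest threshold on ties
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical validation labels and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_proba = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
threshold, score = best_threshold(y_true, y_proba, beta=2.0)
print(f"best threshold = {threshold:.2f}, F2 = {score:.4f}")
```

In practice the threshold would be chosen on a validation fold rather than on the test set, so that the reported metrics are not tuned to the data they are measured on.
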
### Usage

**Warning:** Requires a GPU environment with CUDA installed.

```python
import joblib
import numpy as np

# Load models
lr_model = joblib.load("lr_model.pkl")
rf_model = joblib.load("rf_model.pkl")
nn_model = joblib.load("nn_model.pkl")
xgb_model = joblib.load("xgb_model.pkl")
ensemble_model = joblib.load("ensemble_model.pkl")
scaler = joblib.load("scaler.pkl")

# Prepare your data
df = ...

X = df[df.columns.difference(['Is Fraudulent'])].copy()
y = df['Is Fraudulent'].copy()

# Predict with ensemble
fraud_proba = ensemble_model.predict_proba(X)[:, 1]
fraud_pred = ensemble_model.predict(X)

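# Evaluate predictions.
# NOTE: evaluate_models is not defined in this snippet; it is assumed to be a
# helper from the project's own code and must be imported before this call.
evaluate_models(
    [lr_model, rf_model, nn_model, xgb_model, ensemble_model],
    X,
    y,
    ['Logistic Regression', 'Random Forest', 'Neural Network', 'XGBoost', 'Stacking Ensemble'],
)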
```

### License

MIT License

### Contact

COMPSCI 4AL3 - Group 34

- Viransh Shah (shahv47@mcmaster.ca)
- Ellen Xiong (xionge1@mcmaster.ca)