---
tags:
- fraud-detection
- ensemble-learning
- e-commerce
- imbalanced-data
license: mit
metrics:
- accuracy
- precision
- recall
- f1
- auc
---

# E-Commerce Fraud Detection Model

## Model Description

This is an ensemble fraud detection system trained on 1.47M e-commerce transactions with a 5.01% fraud rate.

### Architecture

The final classifier blends two stages of models with a 70%/30% weighting:

- Stage 1 - Recall Specialists (70% weight): Logistic Regression + Random Forest
- Stage 2 - Precision Specialists (30% weight): Neural Network + XGBoost
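
The 70%/30% weighting can be sketched as a weighted blend of each model's fraud probability. This is a minimal illustration under stated assumptions, not the saved `ensemble_model.pkl`. It assumes the two models in each stage are averaged equally and the blended score is compared with a 0.5 threshold. The helper names `weighted_ensemble_proba` and `weighted_ensemble_predict` are hypothetical. The performance table below labels the final model a stacking ensemble, so the trained model may combine the stages differently.

```python
import numpy as np


def weighted_ensemble_proba(p_lr, p_rf, p_nn, p_xgb, recall_weight=0.7):
    """Blend per-model fraud probabilities into one score (hypothetical helper).

    Assumption: each stage averages its two models equally; the recall stage
    receives `recall_weight` and the precision stage the remainder.
    """
    p_lr, p_rf, p_nn, p_xgb = (np.asarray(p, dtype=float) for p in (p_lr, p_rf, p_nn, p_xgb))
    recall_stage = (p_lr + p_rf) / 2      # Stage 1: Logistic Regression + Random Forest
    precision_stage = (p_nn + p_xgb) / 2  # Stage 2: Neural Network + XGBoost
    return recall_weight * recall_stage + (1 - recall_weight) * precision_stage


def weighted_ensemble_predict(p_lr, p_rf, p_nn, p_xgb, threshold=0.5):
    """Flag a transaction as fraud (1) when the blended score reaches `threshold` (assumed 0.5)."""
    return (weighted_ensemble_proba(p_lr, p_rf, p_nn, p_xgb) >= threshold).astype(int)


# Two toy transactions: the first is flagged mainly on the strength of the recall stage.
p_lr, p_rf, p_nn, p_xgb = [0.9, 0.2], [0.8, 0.1], [0.3, 0.05], [0.4, 0.05]
print(weighted_ensemble_proba(p_lr, p_rf, p_nn, p_xgb))
print(weighted_ensemble_predict(p_lr, p_rf, p_nn, p_xgb))
```

Under this weighting, a transaction that Logistic Regression and Random Forest score confidently can clear the threshold even when the Neural Network and XGBoost are lukewarm. That fits the recall-first role assigned to Stage 1.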

### Performance Metrics

| Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC |
|-------|----------|-----------|--------|----------|---------|
| Logistic Regression | 0.5723 | 0.0988 | 0.9273 | 0.1786 | 0.8619 |
| Random Forest | 0.6203 | 0.1075 | 0.8999 | 0.1920 | 0.8712 |
| Neural Network | 0.9569 | 0.7013 | 0.2442 | 0.3623 | 0.8748 |
| XGBoost | 0.9558 | 0.6632 | 0.2389 | 0.3513 | 0.8459 |
| Stacking Ensemble | 0.8973 | 0.2640 | 0.5868 | 0.3642 | 0.8731 |
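
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the F1 column can be recomputed from the two columns before it. With a 5.01% fraud rate, a trivial model that labels every transaction as legitimate would already reach about 0.9499 accuracy. Accuracy alone therefore says little here, and the Neural Network's and XGBoost's high accuracy largely reflects the class imbalance. The sketch below checks both points using only values copied from the table and the description.

```python
# Precision, recall, and reported F1 copied from the performance table.
TABLE = {
    "Logistic Regression": (0.0988, 0.9273, 0.1786),
    "Random Forest": (0.1075, 0.8999, 0.1920),
    "Neural Network": (0.7013, 0.2442, 0.3623),
    "XGBoost": (0.6632, 0.2389, 0.3513),
    "Stacking Ensemble": (0.2640, 0.5868, 0.3642),
}
FRAUD_RATE = 0.0501  # from the model description


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (precision, recall, reported_f1) in TABLE.items():
    print(f"{name:20s} recomputed F1={f1_from(precision, recall):.4f} reported={reported_f1:.4f}")

# Accuracy of a trivial model that never predicts fraud.
majority_baseline_accuracy = 1 - FRAUD_RATE
print(f"All-legitimate baseline accuracy: {majority_baseline_accuracy:.4f}")
```

The recomputed values match the table to within rounding. For example, Random Forest recomputes to 0.1921 because the precision and recall in the table are themselves rounded.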


### Key Features

- 52 engineered features, including:
  - Transaction patterns (amount, quantity, frequency)
  - Customer behavior (account age, transaction history)
  - Temporal features (time-based patterns)
  - Risk indicators (unusual patterns, high-value flags)
  - Interaction features (multi-dimensional risk signals)
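
The feature groups above can be illustrated with a few pandas transformations. This is a hypothetical sketch, not the card's actual 52 features. The input column names (`Transaction Amount`, `Quantity`, `Account Age Days`, `Transaction Hour`), the derived feature names, and the cut-offs are all assumptions chosen for illustration: a 30-day new-account flag, a 95th-percentile high-value flag, and a night window of hours 0 to 5.

```python
import numpy as np
import pandas as pd


def add_example_features(df: pd.DataFrame, high_value_quantile: float = 0.95) -> pd.DataFrame:
    """Return a copy of `df` with one illustrative feature per group.

    All input column names and thresholds are hypothetical.
    """
    out = df.copy()
    # Transaction patterns: log-scaled amount and amount per item (quantity floored at 1)
    out["log_amount"] = np.log1p(out["Transaction Amount"])
    out["amount_per_item"] = out["Transaction Amount"] / out["Quantity"].clip(lower=1)
    # Customer behavior: very new accounts (assumed 30-day cut-off)
    out["is_new_account"] = (out["Account Age Days"] < 30).astype(int)
    # Temporal: transactions in hours 0 to 5 inclusive
    out["is_night"] = out["Transaction Hour"].between(0, 5).astype(int)
    # Risk indicator: amount above the chosen quantile of this frame
    cutoff = out["Transaction Amount"].quantile(high_value_quantile)
    out["is_high_value"] = (out["Transaction Amount"] > cutoff).astype(int)
    # Interaction: a high-value purchase from a new account
    out["new_account_x_high_value"] = out["is_new_account"] * out["is_high_value"]
    return out


sample = pd.DataFrame({
    "Transaction Amount": [25.0, 980.0, 60.0, 40.0],
    "Quantity": [1, 2, 3, 0],
    "Account Age Days": [400, 5, 12, 90],
    "Transaction Hour": [14, 3, 23, 9],
})
print(add_example_features(sample))
```

In a real pipeline, the high-value cut-off would be computed once on the training data and reused at inference. Otherwise the flag's meaning shifts with every batch that is scored.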

### Training

- Resampling: ADASYN oversampling to a 1:1 class balance
- GPU Acceleration: RAPIDS cuML, PyTorch, XGBoost
- Threshold Optimization: decision threshold tuned to maximize the F-beta score
- Validation: Stratified K-Fold Cross-Validation
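
The card does not state which β or search grid was used for threshold optimization. The sketch below is hedged accordingly. It assumes β = 2, which weights recall more heavily than precision, and a simple scan over a fixed grid of thresholds applied to validation-set probabilities. `fbeta_at_threshold` and `best_fbeta_threshold` are hypothetical helpers written with NumPy alone. They use the standard definition F_β = (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP).

```python
import numpy as np


def fbeta_at_threshold(y_true, proba, threshold, beta=2.0):
    """F-beta score of the hard predictions `proba >= threshold`."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(proba) >= threshold
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else float((1 + b2) * tp / denom)


def best_fbeta_threshold(y_true, proba, beta=2.0, grid=None):
    """Return (threshold, score) maximizing F-beta over a grid (assumed 0.05 to 0.95)."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    scores = [fbeta_at_threshold(y_true, proba, t, beta) for t in grid]
    best = int(np.argmax(scores))  # first threshold with the highest score
    return float(grid[best]), float(scores[best])


# Toy validation fold: 3 frauds among 10 transactions.
y_val = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
p_val = np.array([0.03, 0.12, 0.18, 0.22, 0.38, 0.47, 0.42, 0.71, 0.88, 0.63])
threshold, score = best_fbeta_threshold(y_val, p_val, beta=2.0)
print(f"best threshold={threshold:.2f}, F2={score:.3f}")
```

The selected threshold would then be applied to `predict_proba` output instead of relying on `predict()`, whose built-in cut-off may differ. In a full pipeline, ADASYN (for example `imblearn.over_sampling.ADASYN`) would be fitted only on the training folds produced by a stratified splitter such as scikit-learn's `StratifiedKFold`. The threshold search would then run on the untouched validation fold. Both steps are omitted here to keep the sketch dependency-free.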

### Usage

**Warning:** A GPU environment with CUDA installed is required.

```python
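import joblib
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Load the trained models and the fitted feature scaler
lr_model = joblib.load("lr_model.pkl")
rf_model = joblib.load("rf_model.pkl")
nn_model = joblib.load("nn_model.pkl")
xgb_model = joblib.load("xgb_model.pkl")
ensemble_model = joblib.load("ensemble_model.pkl")
scaler = joblib.load("scaler.pkl")  # presumably the training scaler; apply scaler.transform(X) if your features are unscaled

# Prepare your data: a DataFrame with the engineered features and the 'Is Fraudulent' label
df = ...

X = df[df.columns.difference(['Is Fraudulent'])].copy()
y = df['Is Fraudulent'].copy()

# Predict with the ensemble: fraud probability and hard label
fraud_proba = ensemble_model.predict_proba(X)[:, 1]
fraud_pred = ensemble_model.predict(X)

# Evaluate every model (assumes each exposes scikit-learn-style predict/predict_proba;
# convert GPU outputs such as CuPy arrays to NumPy first if the metrics reject them)
models = {
    'Logistic Regression': lr_model,
    'Random Forest': rf_model,
    'Neural Network': nn_model,
    'XGBoost': xgb_model,
    'Stacking Ensemble': ensemble_model,
}
for name, model in models.items():
    pred = model.predict(X)
    proba = model.predict_proba(X)[:, 1]
    print(f"{name}: accuracy={accuracy_score(y, pred):.4f} "
          f"precision={precision_score(y, pred):.4f} "
          f"recall={recall_score(y, pred):.4f} "
          f"f1={f1_score(y, pred):.4f} "
          f"auc_roc={roc_auc_score(y, proba):.4f}")
```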

### License

MIT License

### Contact

COMPSCI 4AL3 - Group 34

- Viransh Shah (shahv47@mcmaster.ca)
- Ellen Xiong (xionge1@mcmaster.ca)