	---
	title: Credit Card Fraud Detection
	emoji: 🛡️
	colorFrom: blue
	colorTo: red
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# 🛡️ Credit Card Fraud Detection System

	A machine learning pipeline for detecting fraudulent credit card transactions in real time using an ensemble of LightGBM, XGBoost, and Random Forest models.

	## 🚀 Quick Start

	1. Upload a CSV file with transaction data
	2. Adjust the threshold (recommended: 0.01-0.05 for imbalanced data)
	3. Click "Detect Fraud" to analyze transactions
	4. View results across 15 interactive visualizations

	## 📊 Features

	- Real-time fraud scoring with calibrated probabilities
	- 15 interactive visualizations including:
	  - Fraud probability distributions
	  - Risk level breakdowns
	  - Time series analysis
	  - Model performance metrics (when ground truth available)
	  - ROC and Precision-Recall curves
	  - Feature correlation heatmaps
	  - Threshold sensitivity analysis

	## 📁 Required CSV Format

	Your CSV should include at minimum:
	- `unix_time` or timestamp column
	- `amt` or amount column
	- `city_pop` (city population)
	- `dist_home_merch` (distance from home to merchant)
	- `category` (transaction category)

	Note: If you're missing velocity features (like `txn_count_last_1h`), the system will fill them with sensible defaults.
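
The column requirements above can be checked locally before uploading a file. The following is a minimal sketch, not the app's actual preprocessing: the helper name `prepare_rows`, the alias names `timestamp` and `amount`, and the fallback value `0.0` are assumptions for illustration, since this README does not state which defaults the app uses.

```python
import csv
import io

# Required columns from the README. The alias names ("timestamp", "amount")
# are assumptions about what the app accepts.
REQUIRED = {
    "unix_time": ("unix_time", "timestamp"),
    "amt": ("amt", "amount"),
    "city_pop": ("city_pop",),
    "dist_home_merch": ("dist_home_merch",),
    "category": ("category",),
}

# Hypothetical default: the README only says missing velocity features
# get "sensible defaults", not which values are used.
VELOCITY_DEFAULTS = {"txn_count_last_1h": 0.0}


def prepare_rows(csv_text):
    """Validate required columns and fill missing velocity features."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [
        name
        for name, aliases in REQUIRED.items()
        if not any(alias in header for alias in aliases)
    ]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        for col, default in VELOCITY_DEFAULTS.items():
            # Fill when the column is absent or the cell is empty.
            if row.get(col) in (None, ""):
                row[col] = default
        rows.append(row)
    return rows


sample = (
    "unix_time,amt,city_pop,dist_home_merch,category\n"
    "1371816865,2.86,333497,24.56,personal_care\n"
)
rows = prepare_rows(sample)
print(rows[0]["txn_count_last_1h"])
```

A file that lacks any required column raises `ValueError` listing the missing names, which is easier to act on than a failure later in feature engineering.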

	## 🎯 Model Details

	- 25 engineered features including time-based, velocity, and aggregated features
	- Ensemble approach: LightGBM + XGBoost + Random Forest
	- Calibrated probabilities for reliable threshold tuning
	- Handles imbalanced data (typical fraud rate: 0.2%)
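
This README does not say how the three models' outputs are combined. One common choice is unweighted soft voting, where each transaction's score is the mean of the per-model probabilities. The sketch below shows that approach as an assumption only; the function name `soft_vote` and the scores are made up for illustration.

```python
def soft_vote(model_probs):
    """Average per-model fraud probabilities, transaction by transaction."""
    if not model_probs:
        raise ValueError("need at least one model's probabilities")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same transactions")
    # zip(*model_probs) groups the scores that belong to one transaction.
    return [sum(scores) / len(model_probs) for scores in zip(*model_probs)]


# Made-up scores for three transactions from three hypothetical models.
lgbm = [0.02, 0.40, 0.01]
xgb = [0.04, 0.50, 0.01]
rf = [0.03, 0.30, 0.01]
ensemble = soft_vote([lgbm, xgb, rf])
print(ensemble)
```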

	## 📈 Expected Performance

	- ROC-AUC: ~0.81 (good discrimination)
	- PR-AUC: 0.01-0.10 (typical for imbalanced data)
	- Precision: 0.01-0.20 (depends on threshold)
	- Recall: 0.50-0.95 (depends on threshold)
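
Precision and recall move in opposite directions as the threshold changes, which is why both are quoted as ranges. The sketch below computes them at two thresholds on a tiny made-up sample; the function name `precision_recall` and the `>=` comparison are assumptions, not taken from the app.

```python
def precision_recall(probs, labels, threshold):
    """Return (precision, recall) when flagging probs >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up scores and labels (1 = fraud), for illustration only.
probs = [0.90, 0.04, 0.03, 0.02, 0.005, 0.004]
labels = [1, 1, 0, 0, 1, 0]

high = precision_recall(probs, labels, 0.5)
low = precision_recall(probs, labels, 0.02)
print(high, low)
```

In this sample, lowering the threshold from 0.5 to 0.02 catches one more fraudulent transaction at the cost of two false alarms, so recall rises while precision falls.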

	## 💡 Usage Tips

	1. Threshold Selection: For imbalanced fraud data, use 0.01-0.05 instead of 0.5. With fraud at roughly 0.2% of transactions, calibrated probabilities tend to stay well below 0.5, so the conventional 0.5 cutoff misses most fraud.

	2. File Size: Processing is limited to 10,000 rows for optimal performance.

	3. Ground Truth: If your CSV includes a fraud label column (`is_fraud`, `fraud`, `target`, etc.), the app will automatically calculate model performance metrics.
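
The row limit and label detection described in the tips above can be sketched as follows. This is an illustration under stated assumptions: the README does not say whether larger files are truncated or sampled, nor whether label matching is case-insensitive, and the names `find_label_column` and `cap_rows` are hypothetical.

```python
# Label names listed in the README; the "etc." there is not expanded here.
LABEL_CANDIDATES = ("is_fraud", "fraud", "target")
MAX_ROWS = 10_000


def find_label_column(columns):
    """Return the first matching label column, or None if there is none."""
    # Case-insensitive matching is an assumption.
    lowered = {c.lower(): c for c in columns}
    for name in LABEL_CANDIDATES:
        if name in lowered:
            return lowered[name]
    return None


def cap_rows(rows, limit=MAX_ROWS):
    """Keep only the first `limit` rows (truncation is an assumption)."""
    return rows[:limit]


print(find_label_column(["amt", "category", "Is_Fraud"]))
print(len(cap_rows(list(range(12_000)))))
```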

	## 🔧 Technical Details

	- Framework: Gradio Blocks API
	- Visualizations: Plotly (interactive charts)
	- Model: Calibrated LightGBM ensemble
	- Features: 25 engineered features with automatic feature engineering

	## 📝 Model File

	Important: This Space requires the model file `fraud_lgbm_calibrated.pkl` to be present.

	If deploying this Space:
	1. Train the model using `train_improved_model.py` (if you have the training script)
	2. Upload the resulting model file to the Space repository, using Git LFS if the file is large
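
Because the Space depends on this file, failing early with a clear message can help when it is missing. The sketch below assumes the file was written with Python's `pickle` module (it could equally have been saved with `joblib`, which this README does not specify); `load_model` is a hypothetical helper, not a function from `app.py`.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("fraud_lgbm_calibrated.pkl")


def load_model(path=MODEL_PATH):
    """Load the pickled model, failing with a clear message if absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; train the model or upload the file "
            "to the Space repository"
        )
    # Assumption: the file is a plain pickle. Only unpickle trusted files,
    # since unpickling can execute arbitrary code.
    with path.open("rb") as fh:
        return pickle.load(fh)
```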

	## 🔗 Related Resources

	- Full project documentation: See `README.md` in the repository
	- Model training: `train_improved_model.py`
	- Sample data generator: `generate_sample_dataset.py`
	- Power BI integration: `powerbi_export.py`

	## 📄 License

	This project is released under the MIT license (see the front matter above) and is intended for educational and portfolio purposes. Ensure you have proper data usage rights before processing real transaction data.

	---

	Built with Python, LightGBM, XGBoost, Gradio, and Plotly.