---
title: Credit Card Fraud Detection
emoji: 🛡️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# 🛡️ Credit Card Fraud Detection System
A machine learning pipeline that detects fraudulent credit card transactions in real time using an ensemble of LightGBM, XGBoost, and Random Forest models.
## 🚀 Quick Start
1. **Upload a CSV file** with transaction data
2. **Adjust the threshold** (recommended: 0.01-0.05 for imbalanced data)
3. **Click "Detect Fraud"** to analyze transactions
4. **View results** across 15 interactive visualizations
## 📊 Features
- **Real-time fraud scoring** with calibrated probabilities
- **15 interactive visualizations** including:
  - Fraud probability distributions
  - Risk level breakdowns
  - Time series analysis
  - Model performance metrics (when ground truth is available)
  - ROC and Precision-Recall curves
  - Feature correlation heatmaps
  - Threshold sensitivity analysis
## Required CSV Format
Your CSV should include at minimum:
- `unix_time` or timestamp column
- `amt` or amount column
- `city_pop` (city population)
- `dist_home_merch` (distance from home to merchant)
- `category` (transaction category)
**Note:** If you're missing velocity features (like `txn_count_last_1h`), the system will fill them with sensible defaults.
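
The column check and default filling described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the app's actual code: the `prepare_rows` helper is hypothetical, the default value of `0` for `txn_count_last_1h` is an assumption, and the check only accepts the documented names `unix_time` and `amt` even though the app may accept alternative timestamp and amount columns.

```python
import csv
import io

# Columns named in the list above. Alternative timestamp/amount column names
# are not specified in this README, so this sketch checks the documented names only.
REQUIRED = ["unix_time", "amt", "city_pop", "dist_home_merch", "category"]

# Assumed defaults for optional velocity features (not the app's real values).
VELOCITY_DEFAULTS = {"txn_count_last_1h": 0}


def prepare_rows(csv_text):
    """Parse CSV text, verify required columns, and fill missing velocity features."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    rows = []
    for row in reader:
        for name, default in VELOCITY_DEFAULTS.items():
            # Fill the feature when the column is absent or the cell is empty.
            if row.get(name) in (None, ""):
                row[name] = default
        rows.append(row)
    return rows


sample = (
    "unix_time,amt,city_pop,dist_home_merch,category\n"
    "1700000000,42.50,120000,3.2,grocery_pos\n"
)
rows = prepare_rows(sample)
print(rows[0]["txn_count_last_1h"])
```

Running a check like this before uploading makes a missing column visible as a clear error instead of a failed analysis.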
## 🎯 Model Details
- **25 engineered features** including time-based, velocity, and aggregated features
- **Ensemble approach**: LightGBM + XGBoost + Random Forest
- **Calibrated probabilities** for reliable threshold tuning
- **Handles imbalanced data** (typical fraud rate: 0.2%)
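
As an illustration of what a velocity feature such as `txn_count_last_1h` could look like, here is a minimal sketch. It assumes transactions are grouped by a card identifier (`card_id` is a hypothetical key; this README does not name the grouping column) and counts earlier transactions by the same card within the preceding hour. The real feature engineering may differ.

```python
from collections import defaultdict, deque


def txn_count_last_1h(transactions, window_seconds=3600):
    """For each (card_id, unix_time) pair, count the same card's earlier
    transactions that fall strictly inside the preceding window."""
    # Process in time order while remembering each transaction's original position.
    order = sorted(range(len(transactions)), key=lambda i: transactions[i][1])
    recent = defaultdict(deque)  # card_id -> timestamps still inside the window
    counts = [0] * len(transactions)
    for i in order:
        card, ts = transactions[i]
        window = recent[card]
        # Drop timestamps that are a full window old or older.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        counts[i] = len(window)
        window.append(ts)
    return counts


txns = [("A", 0), ("A", 1800), ("B", 2000), ("A", 3500), ("A", 5400)]
print(txn_count_last_1h(txns))  # [0, 1, 0, 2, 1]
```

The deque keeps each card's window small, so the pass over the data stays roughly linear after sorting.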
## 📈 Expected Performance
- **ROC-AUC**: ~0.81 (good discrimination)
- **PR-AUC**: 0.01-0.10 (typical for imbalanced data)
- **Precision**: 0.01-0.20 (depends on threshold)
- **Recall**: 0.50-0.95 (depends on threshold)
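
To show why precision and recall depend so strongly on the threshold, the following sketch computes both from labels and scores in plain Python. The data is a made-up toy example, not output from this model, and `precision_recall` is an illustrative helper rather than part of the app.

```python
def precision_recall(labels, scores, threshold):
    """Compute precision and recall when flagging scores >= threshold as fraud."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data: 2 fraud cases among 10 transactions (illustrative only).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.30, 0.02, 0.04, 0.03, 0.01, 0.005, 0.004, 0.003, 0.002, 0.001]

for threshold in (0.5, 0.03, 0.01):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, a threshold of 0.5 flags nothing, 0.03 catches one of the two fraud cases with precision 1/3, and 0.01 catches both with precision 0.4. Lowering the threshold trades precision for recall.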
## 💡 Usage Tips
1. **Threshold Selection**: For imbalanced fraud data, use **0.01-0.05** instead of 0.5. The default 0.5 is too high and will miss most fraud.
2. **File Size**: Processing is limited to 10,000 rows for optimal performance.
3. **Ground Truth**: If your CSV includes a fraud label column (`is_fraud`, `fraud`, `target`, etc.), the app will automatically calculate model performance metrics.
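
The tips above can be sketched in a few lines. This is a hedged illustration, not the app's implementation: `find_label_column` and `flag_transactions` are hypothetical helpers, case-insensitive label matching is an assumption, and so is the choice to truncate (rather than reject) input beyond 10,000 rows.

```python
# Label names listed in tip 3; the README's list ends with "etc.", so the app
# may recognize more names than these.
LABEL_CANDIDATES = ("is_fraud", "fraud", "target")
MAX_ROWS = 10_000  # row limit from tip 2


def find_label_column(columns):
    """Return the first column matching a known label name, or None."""
    lowered = {c.lower(): c for c in columns}
    for name in LABEL_CANDIDATES:
        if name in lowered:
            return lowered[name]
    return None


def flag_transactions(probabilities, threshold=0.03, max_rows=MAX_ROWS):
    """Flag scores at or above the threshold, looking at the first max_rows only."""
    return [p >= threshold for p in probabilities[:max_rows]]


probs = [0.20, 0.02, 0.60]
print(find_label_column(["amt", "category", "is_fraud"]))
print(flag_transactions(probs, threshold=0.5))
print(flag_transactions(probs, threshold=0.03))
```

With these three scores, a 0.5 threshold flags only the last transaction, while 0.03 also flags the first.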
## 🔧 Technical Details
- **Framework**: Gradio Blocks API
- **Visualizations**: Plotly (interactive charts)
- **Model**: Calibrated LightGBM ensemble
- **Features**: 25 engineered features with automatic feature engineering
## Model File
**Important**: This Space requires the model file `fraud_lgbm_calibrated.pkl` to be present.
If deploying this Space:
1. Train the model using `train_improved_model.py` (if you have the training script).
2. Upload the model file to the Space repository, using Git LFS if the file is large.
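
A startup check along these lines can make a missing model file fail with a clear message. This is a sketch under assumptions: it supposes the file was saved with Python's `pickle` module (it may instead have been written with `joblib`), and `load_model` is a hypothetical helper, not necessarily what `app.py` does.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("fraud_lgbm_calibrated.pkl")


def load_model(path=MODEL_PATH):
    """Load the pickled model, failing with a helpful message if it is absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {path}. Train the model or upload it "
            "to the Space repository."
        )
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with path.open("rb") as f:
        return pickle.load(f)
```

Unpickling the real model also requires the libraries it was trained with to be installed in the Space.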
## 🔗 Related Resources
- Full project documentation: See `README.md` in the repository
- Model training: `train_improved_model.py`
- Sample data generator: `generate_sample_dataset.py`
- Power BI integration: `powerbi_export.py`
## 📄 License
This project is for educational and portfolio purposes. Ensure you have proper data usage rights before processing real transaction data.
---
Built with Python, LightGBM, XGBoost, Gradio, and Plotly.