---
title: Credit Card Fraud Detection
emoji: 🛡️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# 🛡️ Credit Card Fraud Detection System

A machine learning pipeline that detects fraudulent credit card transactions in real time using an ensemble of LightGBM, XGBoost, and Random Forest models.
## Quick Start

1. **Upload a CSV file** with transaction data
2. **Adjust the threshold** (recommended: 0.01-0.05 for imbalanced data)
3. **Click "Detect Fraud"** to analyze transactions
4. **View results** across 15 interactive visualizations
## Features

- **Real-time fraud scoring** with calibrated probabilities
- **15 interactive visualizations** including:
  - Fraud probability distributions
  - Risk level breakdowns
  - Time series analysis
  - Model performance metrics (when ground truth is available)
  - ROC and Precision-Recall curves
  - Feature correlation heatmaps
  - Threshold sensitivity analysis
## Required CSV Format

Your CSV should include at minimum:

- `unix_time` or timestamp column
- `amt` or amount column
- `city_pop` (city population)
- `dist_home_merch` (distance from home to merchant)
- `category` (transaction category)

**Note:** If you're missing velocity features (like `txn_count_last_1h`), the system will fill them with sensible defaults.
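
The column check and default filling described above can be sketched with the standard library. This is a minimal illustration, not the app's actual loader: the function name `load_transactions` is hypothetical, the default value `0` is an assumption (the README only says "sensible defaults"), and the sketch requires the exact column names rather than accepting alternates such as a generic timestamp or amount column.

```python
import csv
import io

# Columns the README lists as the minimum input.
REQUIRED_COLUMNS = ["unix_time", "amt", "city_pop", "dist_home_merch", "category"]

# Hypothetical defaults: the README only says "sensible defaults",
# so the value 0 here is an assumption, not the app's actual choice.
VELOCITY_DEFAULTS = {"txn_count_last_1h": 0}


def load_transactions(csv_text):
    """Parse CSV text, check required columns, fill missing velocity features."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    rows = []
    for row in reader:
        for name, default in VELOCITY_DEFAULTS.items():
            # Fill when the column is absent or the cell is empty.
            if row.get(name) in (None, ""):
                row[name] = default
        rows.append(row)
    return rows


sample = (
    "unix_time,amt,city_pop,dist_home_merch,category\n"
    "1371816865,2.86,333497,24.6,personal_care\n"
)
rows = load_transactions(sample)
print(rows[0]["txn_count_last_1h"])
```

Failing early on missing required columns keeps the error close to the upload step instead of surfacing later during scoring.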
## 🎯 Model Details

- **25 engineered features** including time-based, velocity, and aggregated features
- **Ensemble approach**: LightGBM + XGBoost + Random Forest
- **Calibrated probabilities** for reliable threshold tuning
- **Handles imbalanced data** (typical fraud rate: 0.2%)
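
To make "time-based" and "velocity" features concrete, here is a rough sketch of how an hour-of-day feature and a `txn_count_last_1h` count could be derived from `unix_time`. The grouping key `card_id` and the function name are hypothetical, since the README does not say how transactions are grouped, and the real pipeline may compute these features differently.

```python
from collections import defaultdict, deque
from datetime import datetime, timezone


def add_time_and_velocity_features(transactions, window_seconds=3600):
    """Return copies of the rows with `hour` and `txn_count_last_1h` added.

    `card_id` is a hypothetical grouping key; the README does not name one.
    """
    recent = defaultdict(deque)  # card_id -> timestamps inside the window
    enriched = []
    for txn in sorted(transactions, key=lambda t: t["unix_time"]):
        ts = txn["unix_time"]
        window = recent[txn["card_id"]]
        # Drop timestamps that have fallen out of the look-back window.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        features = dict(txn)
        features["hour"] = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        features["txn_count_last_1h"] = len(window)  # prior transactions only
        window.append(ts)
        enriched.append(features)
    return enriched


example = [
    {"card_id": "A", "unix_time": 0},
    {"card_id": "A", "unix_time": 1800},
    {"card_id": "A", "unix_time": 3600},
    {"card_id": "B", "unix_time": 3600},
]
enriched = add_time_and_velocity_features(example)
```

Counting only prior transactions avoids leaking the current transaction into its own velocity feature.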
## Expected Performance

- **ROC-AUC**: ~0.81 (good discrimination)
- **PR-AUC**: 0.01-0.10 (typical for imbalanced data)
- **Precision**: 0.01-0.20 (depends on threshold)
- **Recall**: 0.50-0.95 (depends on threshold)
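
Because precision and recall both depend on the threshold, it can help to see how they are computed for a given cutoff. The sketch below uses made-up scores and labels purely for illustration; they are not output from this model, and `precision_recall_at` is a hypothetical helper name.

```python
def precision_recall_at(scores, labels, threshold):
    """Compute precision and recall when flagging scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (illustrative only): two fraud cases among eight transactions.
toy_scores = [0.002, 0.004, 0.03, 0.08, 0.01, 0.6, 0.02, 0.001]
toy_labels = [0, 0, 1, 0, 0, 1, 0, 0]

high = precision_recall_at(toy_scores, toy_labels, 0.5)   # (1.0, 0.5)
low = precision_recall_at(toy_scores, toy_labels, 0.02)   # (0.5, 1.0)
```

In this toy case, lowering the threshold from 0.5 to 0.02 catches both fraud cases at the cost of two false alarms, which is the trade-off the ranges above reflect.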
## 💡 Usage Tips

1. **Threshold Selection**: For imbalanced fraud data, use a threshold of **0.01-0.05** instead of 0.5. With a fraud rate near 0.2%, calibrated fraud probabilities are usually far below 0.5, so a 0.5 threshold is too high and will miss most fraud.
2. **File Size**: Processing is limited to 10,000 rows for optimal performance.
3. **Ground Truth**: If your CSV includes a fraud label column (`is_fraud`, `fraud`, `target`, etc.), the app will automatically calculate model performance metrics.
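
The row limit and label detection in the tips above could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, only the three label names the README spells out are checked (the "etc." is left open), matching is assumed to be case-insensitive, and rows past the limit are assumed to be dropped rather than rejected.

```python
from itertools import islice

MAX_ROWS = 10_000  # row limit stated in the README
# The README lists these names followed by "etc."; only the named ones appear here.
LABEL_CANDIDATES = ("is_fraud", "fraud", "target")


def find_label_column(columns):
    """Return the first column whose name matches a known label name, else None."""
    for name in columns:
        if name.strip().lower() in LABEL_CANDIDATES:
            return name
    return None


def cap_rows(rows, limit=MAX_ROWS):
    """Keep at most `limit` rows (assumes extra rows are dropped, not rejected)."""
    return list(islice(rows, limit))
```

When `find_label_column` returns `None`, an app built this way would simply skip the performance metrics and show only the score-based charts.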
## 🔧 Technical Details

- **Framework**: Gradio Blocks API
- **Visualizations**: Plotly (interactive charts)
- **Model**: Calibrated LightGBM ensemble
- **Features**: 25 engineered features with automatic feature engineering
## Model File

**Important**: This Space requires the model file `fraud_lgbm_calibrated.pkl` to be present.

If deploying this Space:

1. Train the model using `train_improved_model.py` (if you have the training script)
2. Upload the model file to the Space repository, using Git LFS if the file is large
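
Since the Space cannot score anything without the model file, a loader that fails early with a clear message is useful. The following is a minimal sketch that assumes the file is a standard pickle, as the `.pkl` extension suggests; `load_model` is a hypothetical name, not necessarily what `app.py` uses.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("fraud_lgbm_calibrated.pkl")  # file name from the README


def load_model(path=MODEL_PATH):
    """Load the pickled model, failing early with a clear message if absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file '{path}' not found; train it or upload it to the Space."
        )
    with path.open("rb") as handle:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        return pickle.load(handle)
```

Calling `load_model()` once at startup, rather than on every request, keeps scoring fast and surfaces a missing file immediately.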
## Related Resources

- Full project documentation: See `README.md` in the repository
- Model training: `train_improved_model.py`
- Sample data generator: `generate_sample_dataset.py`
- Power BI integration: `powerbi_export.py`
## License

This project is for educational and portfolio purposes. Ensure you have proper data usage rights before processing real transaction data.

---

Built with Python, LightGBM, XGBoost, Gradio, and Plotly.