title: Credit Card Fraud Detection
emoji: 🛡️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
🛡️ Credit Card Fraud Detection System
A machine learning pipeline that detects fraudulent credit card transactions in real time using an ensemble of LightGBM, XGBoost, and Random Forest models.
Quick Start
- Upload a CSV file with transaction data
- Adjust the threshold (recommended: 0.01-0.05 for imbalanced data)
- Click "Detect Fraud" to analyze transactions
- View results across 15 interactive visualizations
Features
- Real-time fraud scoring with calibrated probabilities
- 15 interactive visualizations including:
  - Fraud probability distributions
  - Risk level breakdowns
  - Time series analysis
  - Model performance metrics (when ground truth is available)
  - ROC and Precision-Recall curves
  - Feature correlation heatmaps
  - Threshold sensitivity analysis
Required CSV Format
Your CSV should include at minimum:
- unix_time or timestamp column
- amt or amount column
- city_pop (city population)
- dist_home_merch (distance from home to merchant)
- category (transaction category)
Note: If you're missing velocity features (like txn_count_last_1h), the system will fill them with sensible defaults.
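A minimal sketch of how an upload could be checked against the column list above, using only the standard library. The function name load_rows, the REQUIRED mapping, and the default value "0" for txn_count_last_1h are assumptions made for illustration; the app's actual validation and fill values are not documented here.

```python
import csv
import io

# Accepted header names per required field, taken from the list above.
REQUIRED = {
    "time": ("unix_time", "timestamp"),
    "amount": ("amt", "amount"),
    "city_pop": ("city_pop",),
    "dist_home_merch": ("dist_home_merch",),
    "category": ("category",),
}

# Hypothetical defaults: the real fill values used by the app are unknown.
VELOCITY_DEFAULTS = {"txn_count_last_1h": "0"}


def load_rows(csv_text):
    """Parse CSV text, check required columns, fill missing velocity columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    # A field is satisfied if any one of its accepted names is in the header.
    missing = [field for field, names in REQUIRED.items()
               if not header.intersection(names)]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        for name, default in VELOCITY_DEFAULTS.items():
            row.setdefault(name, default)  # only fills columns that are absent
        rows.append(row)
    return rows
```

Calling load_rows on a file that has the five required columns but no txn_count_last_1h returns rows with that key set to the default, while a file without an amount column raises ValueError before any scoring happens.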
🎯 Model Details
- 25 engineered features including time-based, velocity, and aggregated features
- Ensemble approach: LightGBM + XGBoost + Random Forest
- Calibrated probabilities for reliable threshold tuning
- Handles imbalanced data (typical fraud rate: 0.2%)
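To make the time-based and velocity feature families more concrete, here is a small sketch. The exact definitions of the 25 features are not given in this README, so the two functions below (hour and day of week from unix_time, and a one-hour transaction count per card) are assumptions about what such features might look like, not the project's actual feature code.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def time_features(unix_time):
    """Hour of day and day of week (Monday=0), both computed in UTC."""
    moment = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return {"hour": moment.hour, "day_of_week": moment.weekday()}


def txn_count_last_1h(sorted_times):
    """For each transaction, count earlier ones within the preceding 3600 s.

    `sorted_times` must be ascending unix timestamps for a single card.
    """
    counts = []
    for i, t in enumerate(sorted_times):
        # First index among the earlier transactions that is inside the window.
        start = bisect_left(sorted_times, t - 3600, 0, i)
        counts.append(i - start)
    return counts
```

The binary search keeps the velocity count at O(log n) per transaction, which matters less for a 10,000-row upload than for training data but costs nothing extra.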
Expected Performance
- ROC-AUC: ~0.81 (good discrimination)
- PR-AUC: 0.01-0.10 (typical for imbalanced data)
- Precision: 0.01-0.20 (depends on threshold)
- Recall: 0.50-0.95 (depends on threshold)
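The dependence of precision and recall on the threshold can be seen in a small worked example. The labels and scores below are made up for illustration (2 frauds among 1,000 transactions, matching the roughly 0.2% fraud rate mentioned above) and are not output from the model.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are flagged as fraud."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, invented for this example: 2 frauds, 998 legitimate transactions.
labels = [1, 1] + [0] * 998
scores = [0.30, 0.04] + [0.04] * 8 + [0.02] * 20 + [0.001] * 970

for threshold in (0.01, 0.03, 0.5):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

With so few positives, a handful of false alarms is enough to pull precision well below 0.5 even when every fraud is caught, and a threshold of 0.5 flags nothing at all in this toy data.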
💡 Usage Tips
Threshold Selection: For imbalanced fraud data, use 0.01-0.05 instead of 0.5. The default 0.5 is too high and will miss most fraud.
File Size: Processing is limited to 10,000 rows for optimal performance.
Ground Truth: If your CSV includes a fraud label column (is_fraud, fraud, target, etc.), the app will automatically calculate model performance metrics.
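The row limit and label detection described in these tips could look roughly like the sketch below. The helper names are hypothetical, the case-insensitive match is an assumption, and the app may accept more label names than the three listed or may sample rather than truncate.

```python
# Label names listed above; the app's full list is not documented here.
LABEL_CANDIDATES = ("is_fraud", "fraud", "target")
MAX_ROWS = 10_000


def find_label_column(columns):
    """Return the first column whose lowercased name is a known label, else None."""
    for name in columns:
        if name.strip().lower() in LABEL_CANDIDATES:
            return name
    return None


def cap_rows(rows, limit=MAX_ROWS):
    """Keep only the first `limit` rows (assumed truncation, not sampling)."""
    return rows[:limit]
```

If find_label_column returns None, only the scoring visualizations can be produced, since metrics such as ROC-AUC need ground truth.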
🔧 Technical Details
- Framework: Gradio Blocks API
- Visualizations: Plotly (interactive charts)
- Model: Calibrated ensemble (LightGBM + XGBoost + Random Forest)
- Features: 25 engineered features with automatic feature engineering
Model File
Important: This Space requires the model file fraud_lgbm_calibrated.pkl to be present.
If deploying this Space:
- Train the model using train_improved_model.py (if you have the training script)
- Upload the model file to the Space repository
- Or use Git LFS for large model files
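Because the Space cannot score anything without the model file, it helps to fail early with a clear message when the file is absent. The sketch below assumes the file is a plain pickle; if it was saved with joblib, the loading call would differ. The load_model helper is hypothetical.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("fraud_lgbm_calibrated.pkl")


def load_model(path=MODEL_PATH):
    """Load the pickled model, raising a clear error if the file is absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; train the model or upload the file to the Space"
        )
    with path.open("rb") as handle:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        return pickle.load(handle)
```

Calling load_model() once at startup, rather than on every request, surfaces a missing file in the Space logs immediately.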
Related Resources
- Full project documentation: See README.md in the repository
- Model training: train_improved_model.py
- Sample data generator: generate_sample_dataset.py
- Power BI integration: powerbi_export.py
License
This project is for educational and portfolio purposes. Ensure you have proper data usage rights before processing real transaction data.
Built with Python, LightGBM, XGBoost, Gradio, and Plotly.