---
title: Credit Card Fraud Detection
emoji: 🛡️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

🛡️ Credit Card Fraud Detection System

A machine learning pipeline for detecting fraudulent credit card transactions in real time using an ensemble of LightGBM, XGBoost, and Random Forest models.

🚀 Quick Start

1. Upload a CSV file with transaction data.
2. Adjust the threshold (recommended: 0.01-0.05 for imbalanced data).
3. Click "Detect Fraud" to analyze the transactions.
4. View the results across 15 interactive visualizations.

📊 Features

- Real-time fraud scoring with calibrated probabilities
- 15 interactive visualizations, including:
  - Fraud probability distributions
  - Risk level breakdowns
  - Time series analysis
  - Model performance metrics (when ground truth is available)
  - ROC and Precision-Recall curves
  - Feature correlation heatmaps
  - Threshold sensitivity analysis
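
Threshold sensitivity analysis comes down to re-applying several cutoffs to the same fraud probabilities and comparing how many transactions each one flags. A minimal, standard-library-only sketch of that idea; `flag_counts` and the example scores are hypothetical and not taken from app.py:

```python
def flag_counts(probabilities, thresholds):
    """Map each threshold to the number of transactions scoring at or above it."""
    return {t: sum(1 for p in probabilities if p >= t) for t in thresholds}


# Hypothetical calibrated fraud probabilities for six transactions.
scores = [0.001, 0.004, 0.012, 0.03, 0.08, 0.6]

for threshold, count in flag_counts(scores, [0.01, 0.05, 0.5]).items():
    print(f"threshold={threshold}: flagged {count} of {len(scores)}")
```

With these scores, lowering the threshold from 0.5 to 0.01 raises the number of flagged transactions from 1 to 4, the kind of trade-off a threshold sensitivity chart is meant to show.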

📁 Required CSV Format

Your CSV should include at minimum:

- unix_time or timestamp column
- amt or amount column
- city_pop (city population)
- dist_home_merch (distance from home to merchant)
- category (transaction category)

Note: If your CSV lacks velocity features (such as txn_count_last_1h), the system fills them with default values.
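
The column requirements above can be checked before scoring. A minimal sketch using only the standard library, assuming either name in each pair (unix_time/timestamp, amt/amount) is acceptable, as the list says. `load_transactions` is a hypothetical helper, and the default of 0 for txn_count_last_1h is a placeholder, since the app's actual default values are not documented here:

```python
import csv
import io

# Required columns; a tuple lists interchangeable alternatives.
REQUIRED = [
    ("unix_time", "timestamp"),
    ("amt", "amount"),
    ("city_pop",),
    ("dist_home_merch",),
    ("category",),
]

# Placeholder default for a missing velocity feature (assumption, not the app's value).
VELOCITY_DEFAULTS = {"txn_count_last_1h": 0}


def load_transactions(csv_text):
    """Parse CSV text, check the required columns, and fill absent velocity features."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    missing = [" or ".join(alts) for alts in REQUIRED
               if not any(name in columns for name in alts)]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")
    rows = list(reader)
    for row in rows:
        # Adds the feature only when the column is absent; existing values are kept.
        for name, default in VELOCITY_DEFAULTS.items():
            row.setdefault(name, default)
    return rows


# Made-up one-row example in the required format.
sample = (
    "unix_time,amt,city_pop,dist_home_merch,category\n"
    "1700000000,42.50,15000,12.3,grocery\n"
)
print(load_transactions(sample)[0]["txn_count_last_1h"])
```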

🎯 Model Details

- 25 engineered features, including time-based, velocity, and aggregated features
- Ensemble approach: LightGBM + XGBoost + Random Forest
- Calibrated probabilities for reliable threshold tuning
- Handles imbalanced data (typical fraud rate: 0.2%)
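
The README lists the three ensemble members but not how their outputs are combined. One common option is soft voting: a (weighted) average of each model's fraud probability. A minimal sketch of that technique, assuming each trained model has already produced one probability per transaction (for example via a scikit-learn-style `predict_proba`); `soft_vote`, the equal default weights, and the numbers are hypothetical:

```python
def soft_vote(model_probs, weights=None):
    """Weighted average of per-model fraud probabilities.

    model_probs maps a model name to a list of probabilities, one per
    transaction; every list must have the same length.
    """
    names = list(model_probs)
    lengths = {len(model_probs[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all models must score the same transactions")
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total = sum(weights[name] for name in names)
    n = lengths.pop()
    return [sum(weights[name] * model_probs[name][i] for name in names) / total
            for i in range(n)]


# Hypothetical probabilities from the three models for two transactions.
probs = {
    "lightgbm": [0.02, 0.60],
    "xgboost": [0.04, 0.50],
    "random_forest": [0.03, 0.70],
}
print(soft_vote(probs))
```

Averaging calibrated probabilities does not in general produce calibrated output, so in a setup like this, calibration would usually be fitted on the combined score rather than assumed from the individual models.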

📈 Expected Performance

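- ROC-AUC: ~0.81 (good discrimination; a random classifier scores 0.5)
- PR-AUC: 0.01-0.10 (low in absolute terms but typical for imbalanced data; a random classifier scores roughly the fraud rate, about 0.002)
- Precision: 0.01-0.20 (depends on threshold)
- Recall: 0.50-0.95 (depends on threshold)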
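
The low precision figures follow from how rare fraud is. With a 0.2% fraud rate, flagging every transaction gives perfect recall but a precision of only 0.002, so a PR-AUC of 0.01-0.10 is roughly 5 to 50 times the no-skill level. A small standard-library sketch of precision and recall at a threshold; `precision_recall` is a hypothetical helper, and the labels and scores are synthetic:

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as fraud."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic labels: 10,000 transactions with a 0.2% fraud rate (20 frauds).
y_true = [1] * 20 + [0] * 9_980

# No-skill baseline: flag everything, so precision equals the fraud rate.
print(precision_recall(y_true, [1.0] * len(y_true), threshold=0.5))

# Hypothetical scores: 15 of the 20 frauds and 185 legitimate transactions
# score 0.9; everything else scores 0.001.
scores = [0.9] * 15 + [0.001] * 5 + [0.9] * 185 + [0.001] * 9_795
print(precision_recall(y_true, scores, threshold=0.02))
```

The first call returns (0.002, 1.0): perfect recall, but precision equal to the fraud rate. The second returns (0.075, 0.75), inside the precision and recall ranges listed above.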

💡 Usage Tips

- Threshold Selection: For imbalanced fraud data, use a threshold of 0.01-0.05 instead of 0.5. Fraud is rare (about 0.2% of transactions), so calibrated fraud probabilities seldom exceed 0.5, and the default of 0.5 will miss most fraud.
- File Size: Processing is limited to 10,000 rows for optimal performance.
- Ground Truth: If your CSV includes a fraud label column (is_fraud, fraud, target, etc.), the app automatically calculates model performance metrics.
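
Automatic metrics depend on detecting the label column. A minimal sketch, assuming a case-insensitive match checked in the order listed above; because that list ends in "etc.", the real app may accept other names, and `find_label_column` is a hypothetical helper:

```python
# Candidate label names taken from the list above; the app may recognize others.
LABEL_CANDIDATES = ("is_fraud", "fraud", "target")


def find_label_column(columns):
    """Return the original name of the highest-priority label column present, else None."""
    lowered = {c.lower(): c for c in columns}
    for candidate in LABEL_CANDIDATES:
        if candidate in lowered:
            return lowered[candidate]
    return None


print(find_label_column(["unix_time", "amt", "Is_Fraud"]))  # Is_Fraud
print(find_label_column(["unix_time", "amt"]))              # None
```

When this returns None, there is no ground truth, so only the score-based visualizations would apply.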

🔧 Technical Details

- Framework: Gradio Blocks API
- Visualizations: Plotly (interactive charts)
- Model: Calibrated LightGBM ensemble
- Features: 25 engineered features with automatic feature engineering

📝 Model File

Important: This Space requires the model file fraud_lgbm_calibrated.pkl to be present.

If deploying this Space:

1. Train the model with train_improved_model.py (if you have the training script).
2. Upload fraud_lgbm_calibrated.pkl to the Space repository, using Git LFS if the file is large.
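
Because the app cannot score anything without fraud_lgbm_calibrated.pkl, it helps to fail fast with a clear message at startup. A minimal sketch, assuming the file is a standard pickle in the working directory next to app.py (if it was saved with joblib, `joblib.load` would be the matching loader); `load_model` is a hypothetical helper:

```python
import pickle
from pathlib import Path

# Assumed location: the Space's working directory, next to app.py.
DEFAULT_MODEL_PATH = Path("fraud_lgbm_calibrated.pkl")


def load_model(path=DEFAULT_MODEL_PATH):
    """Unpickle the trained model, failing with a clear message if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Train it with train_improved_model.py and "
            "upload it to the Space (via Git LFS if it is large)."
        )
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with path.open("rb") as f:
        return pickle.load(f)
```

Calling `load_model()` once at startup, rather than on every "Detect Fraud" click, surfaces a missing file immediately and avoids reloading the model per request.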

🔗 Related Resources

- Full project documentation: see README.md in the repository
- Model training: train_improved_model.py
- Sample data generator: generate_sample_dataset.py
- Power BI integration: powerbi_export.py

📄 License

This Space is released under the MIT license and is intended for educational and portfolio purposes. Make sure you have the proper data usage rights before processing real transaction data.

Built with Python, LightGBM, XGBoost, Gradio, and Plotly.