
metadata
title: Credit Card Fraud Detection
emoji: 🛡️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit

πŸ›‘οΈ Credit Card Fraud Detection System

A machine learning pipeline for detecting fraudulent credit card transactions in real time using an ensemble of LightGBM, XGBoost, and Random Forest models.

🚀 Quick Start

  1. Upload a CSV file with transaction data
  2. Adjust the threshold (recommended: 0.01-0.05 for imbalanced data)
  3. Click "Detect Fraud" to analyze transactions
  4. View results across 15 interactive visualizations

📊 Features

  • Real-time fraud scoring with calibrated probabilities
  • 15 interactive visualizations including:
    • Fraud probability distributions
    • Risk level breakdowns
    • Time series analysis
    • Model performance metrics (when ground truth available)
    • ROC and Precision-Recall curves
    • Feature correlation heatmaps
    • Threshold sensitivity analysis

πŸ“ Required CSV Format

Your CSV should include at minimum:

  • unix_time or timestamp column
  • amt or amount column
  • city_pop (city population)
  • dist_home_merch (distance from home to merchant)
  • category (transaction category)

Note: If you're missing velocity features (like txn_count_last_1h), the system will fill them with sensible defaults.
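The column check and default filling described above could look roughly like the following sketch, which uses only the standard library. The accepted column names come from the list above; the helper name `prepare_rows` and the default value of 0 for `txn_count_last_1h` are assumptions for illustration, not the app's confirmed behavior.

```python
import csv
import io

# Each required field may appear under any one of these column names.
REQUIRED_COLUMNS = [
    ("unix_time", "timestamp"),
    ("amt", "amount"),
    ("city_pop",),
    ("dist_home_merch",),
    ("category",),
]

# Assumed default for a missing velocity feature (hypothetical value).
VELOCITY_DEFAULTS = {"txn_count_last_1h": "0"}


def prepare_rows(csv_text):
    """Parse CSV text, check required columns, and fill velocity defaults."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = [
        " or ".join(names)
        for names in REQUIRED_COLUMNS
        if not header.intersection(names)
    ]
    if missing:
        raise ValueError("Missing required columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        # Add a velocity column only when the CSV did not provide it.
        for name, default in VELOCITY_DEFAULTS.items():
            row.setdefault(name, default)
        rows.append(row)
    return rows


sample = (
    "unix_time,amt,city_pop,dist_home_merch,category\n"
    "1700000000,42.50,12000,3.2,grocery\n"
)
rows = prepare_rows(sample)
```

Failing early with the list of missing columns is a design choice in this sketch; the real app may handle missing columns differently.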

🎯 Model Details

  • 25 engineered features including time-based, velocity, and aggregated features
  • Ensemble approach: LightGBM + XGBoost + Random Forest
  • Calibrated probabilities for reliable threshold tuning
  • Handles imbalanced data (typical fraud rate: 0.2%)
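To make the time-based and velocity features more concrete, here is a minimal sketch of how two such features might be derived from `unix_time`. The function names, the exact features returned by `time_features`, and the idea of computing the count per card or account are assumptions; the actual 25 features are defined in the project's own code.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def time_features(unix_time):
    """Derive simple time-based features from a Unix timestamp (UTC)."""
    dt = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),  # Monday is 0, Sunday is 6
        "is_weekend": dt.weekday() >= 5,
    }


def txn_count_last_1h(sorted_times):
    """For each transaction, count earlier transactions in the prior hour.

    `sorted_times` is assumed to hold one card's Unix timestamps in
    ascending order.
    """
    counts = []
    for i, t in enumerate(sorted_times):
        # Index of the first earlier transaction at or after t - 3600 s.
        start = bisect_left(sorted_times, t - 3600, 0, i)
        counts.append(i - start)
    return counts


times = [0, 600, 1800, 4000, 9000]
counts = txn_count_last_1h(times)
```

Because the timestamps are sorted, the binary search keeps each lookup logarithmic rather than rescanning all earlier transactions.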

📈 Expected Performance

  • ROC-AUC: ~0.81 (good discrimination)
  • PR-AUC: 0.01-0.10 (typical for imbalanced data)
  • Precision: 0.01-0.20 (depends on threshold)
  • Recall: 0.50-0.95 (depends on threshold)
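Precision and recall both depend on the threshold, which the following sketch illustrates with plain Python. The labels and scores are toy values made up for this example, not output from the model, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when flagging scores >= threshold as fraud."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
    # Report 0.0 instead of dividing by zero when nothing qualifies.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (invented for illustration): 2 fraud cases among 10 transactions.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.001, 0.002, 0.03, 0.004, 0.08, 0.02, 0.003, 0.001, 0.04, 0.006]

for threshold in (0.01, 0.05, 0.5):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, raising the threshold from 0.01 to 0.05 trades recall for precision, and at 0.5 no transaction is flagged at all.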

💡 Usage Tips

  1. Threshold Selection: For imbalanced fraud data, use 0.01-0.05 instead of 0.5. With a fraud rate around 0.2%, calibrated fraud probabilities rarely approach 0.5, so the conventional 0.5 cutoff will miss most fraud.

  2. File Size: Processing is limited to 10,000 rows for optimal performance.

  3. Ground Truth: If your CSV includes a fraud label column (is_fraud, fraud, target, etc.), the app will automatically calculate model performance metrics.
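The row limit and label detection from the tips above might be sketched as follows. The candidate names come from the tip on ground truth; case-insensitive matching, keeping the first rows when capping, and the helper names are assumptions, since the README does not say how the app handles these details.

```python
# Candidate label names taken from the tip above; the "etc." there
# implies the app may accept other names that are not listed here.
LABEL_CANDIDATES = ("is_fraud", "fraud", "target")
MAX_ROWS = 10_000


def find_label_column(columns):
    """Return the first matching label column name, or None if absent."""
    # Assumption: match names case-insensitively.
    lowered = {name.lower(): name for name in columns}
    for candidate in LABEL_CANDIDATES:
        if candidate in lowered:
            return lowered[candidate]
    return None


def cap_rows(rows, max_rows=MAX_ROWS):
    """Keep at most `max_rows` rows (assumed here to be the first ones)."""
    return rows[:max_rows]


label = find_label_column(["amt", "category", "Is_Fraud"])
capped = cap_rows(list(range(12_000)))
```

When `find_label_column` returns `None`, a caller would skip the performance metrics and show only the scoring visualizations.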

🔧 Technical Details

  • Framework: Gradio Blocks API
  • Visualizations: Plotly (interactive charts)
  • Model: Calibrated ensemble (LightGBM + XGBoost + Random Forest), loaded from fraud_lgbm_calibrated.pkl
  • Features: 25 engineered features with automatic feature engineering

πŸ“ Model File

Important: This Space requires the model file fraud_lgbm_calibrated.pkl to be present.

If deploying this Space:

  1. Train the model using train_improved_model.py (if you have the training script)
  2. Upload the resulting model file to the Space repository, using Git LFS if the file is large
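Since the Space cannot run without the model file, checking for it at startup gives a clearer error than a failure deep inside the app. The sketch below assumes the file is a standard pickle (it could equally have been saved with joblib) and uses a hypothetical `load_model` helper; neither detail is confirmed by this README.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("fraud_lgbm_calibrated.pkl")


def load_model(path=MODEL_PATH):
    """Load the pickled model, failing early with a clear message if absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {path}. Train it or upload it to the Space."
        )
    with path.open("rb") as f:
        # Only unpickle files you trust; pickle can execute arbitrary code.
        return pickle.load(f)
```

Calling `load_model()` once at import time in app.py would surface a missing file as soon as the Space starts.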

🔗 Related Resources

  • Full project documentation: See README.md in the repository
  • Model training: train_improved_model.py
  • Sample data generator: generate_sample_dataset.py
  • Power BI integration: powerbi_export.py

📄 License

This project is released under the MIT license and is intended for educational and portfolio purposes. Ensure you have proper data usage rights before processing real transaction data.


Built with Python, LightGBM, XGBoost, Gradio, and Plotly.