metadata
title: Credit Card Fraud Detection App
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Streamlit template space

💳 Credit Card Fraud Detection

Real-time fraud detection using Machine Learning and an interactive Streamlit dashboard.

🚀 Live App

👉 [HuggingFace Space link]


📌 Problem

Credit card fraud detection is a highly imbalanced classification problem where fraudulent transactions represent a very small fraction of the data.

The goal is to:

  • Detect fraudulent transactions
  • Minimize false negatives
  • Provide real-time predictions

📊 Dataset

Source: Kaggle – Credit Card Fraud Detection

Features

The dataset contains:

  • Time → seconds elapsed since the first transaction in the dataset
  • Amount → transaction value
  • V1 – V28 → PCA-transformed anonymized features

πŸ” Why PCA?

The original transaction data contains sensitive financial information.

To preserve privacy:

  • All original features except Time and Amount were transformed using Principal Component Analysis (PCA)
  • The resulting components are labeled V1–V28

These components:

  • Are not directly interpretable
  • Capture the underlying transaction patterns
  • Retain the information needed for fraud detection

In other words:

V1–V28 are orthogonal principal components: mutually uncorrelated directions that capture the variance of the original features while hiding what those features were.
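To make the idea of orthogonal components concrete, here is a minimal NumPy sketch of PCA on made-up data. The `pca_transform` helper and the synthetic `raw` matrix are illustrative assumptions; the real dataset ships with the components already computed, and the original features are not available.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project X onto its top principal components using SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, sorted by variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    return X_centered @ components.T, components

rng = np.random.default_rng(0)
# Hypothetical "raw" features: 5 columns, some correlated with the first one.
raw = rng.normal(size=(1000, 5))
raw[:, 1] += 0.8 * raw[:, 0]
raw[:, 2] -= 0.5 * raw[:, 0]

V, components = pca_transform(raw, n_components=3)

# The principal directions are orthonormal...
print(np.allclose(components @ components.T, np.eye(3)))
# ...and the projected columns are uncorrelated (off-diagonal covariance ~ 0).
cov = np.cov(V, rowvar=False)
print(np.allclose(cov - np.diag(np.diag(cov)), 0.0, atol=1e-8))
```

The projected columns play the role of V1, V2, V3 here: each is a linear mix of all raw columns, which is why individual components are hard to interpret.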


🧠 Model

Baseline model trained using:

  • Scaled features
  • Train/test split
  • ROC-AUC evaluation
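The three steps above can be sketched with scikit-learn roughly as follows. This is an illustration under assumptions, not the project's actual training script: the README does not name the baseline model, so `LogisticRegression` is a stand-in, and `make_classification` generates synthetic imbalanced data in place of the Kaggle file.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 30 features, only a few percent positives.
X, y = make_classification(
    n_samples=20000, n_features=30, n_informative=10,
    weights=[0.99, 0.01], random_state=42,
)

# Stratify so the rare class appears in both splits at the same rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler sits inside the pipeline, so it is fit on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# ROC-AUC is computed from predicted probabilities, not hard labels.
fraud_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, fraud_proba)
print(f"ROC-AUC: {auc:.3f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, which matters when the same fitted object is later reused for inference in the app.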

Evaluation Metric

ROC-AUC was used because:

  • The dataset is highly imbalanced
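  • Accuracy is misleading: a model that labels every transaction as legitimate is still almost always right
  • ROC-AUC measures class separability, i.e. how reliably fraudulent transactions are scored above legitimate ones, independent of the decision threshold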
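A small pure-Python example may help show why accuracy misleads here. The toy labels and the two helper functions are illustrative assumptions; `roc_auc` uses the pairwise definition of AUC (the chance that a random fraud is scored above a random legitimate transaction), which is fine for small inputs but slow for large ones.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 998 legitimate transactions and 2 frauds.
y_true = [0] * 998 + [1] * 2

# A useless "model": same score for everything, never flags fraud.
constant_scores = [0.0] * 1000
always_legit = [0] * 1000

print(accuracy(y_true, always_legit))     # 0.998
print(roc_auc(y_true, constant_scores))   # 0.5
```

The useless model looks excellent by accuracy, while its ROC-AUC of 0.5 correctly reports that it cannot separate the classes at all.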

🎯 Streamlit App Features

πŸ” Prediction

  • Manual transaction input
  • Random transaction generator
  • Fraud probability score
  • Adjustable decision threshold
  • Downloadable prediction report
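As a sketch of what the adjustable decision threshold changes, the snippet below counts confusion-matrix cells at a few thresholds. The probabilities, labels, and the `confusion_counts` helper are made up for illustration and are not taken from the app.

```python
def confusion_counts(y_true, fraud_proba, threshold):
    """Return (tp, fp, fn, tn) when flagging transactions with proba >= threshold."""
    tp = fp = fn = tn = 0
    for truth, proba in zip(y_true, fraud_proba):
        flagged = proba >= threshold
        if flagged and truth == 1:
            tp += 1
        elif flagged and truth == 0:
            fp += 1
        elif not flagged and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical model outputs for ten transactions (1 = fraud).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
fraud_proba = [0.02, 0.10, 0.35, 0.05, 0.01, 0.60, 0.90, 0.40, 0.25, 0.15]

for threshold in (0.5, 0.3, 0.2):
    tp, fp, fn, tn = confusion_counts(y_true, fraud_proba, threshold)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

On this toy data, lowering the threshold from 0.5 to 0.2 removes the missed frauds (FN goes from 2 to 0) at the cost of one extra false alarm, which is the trade-off the slider exposes.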

📊 Model Insights

  • ROC Curve
  • Confusion Matrix
  • AUC score
  • Feature importance (tree-based models)

βš™οΈ Tech Stack

  • Python
  • Scikit-learn
  • Streamlit
  • NumPy
  • Matplotlib

🧠 What I Learned

  • Handling imbalanced datasets
  • Why ROC-AUC is better than accuracy for fraud detection
  • Feature scaling impact
  • Threshold tuning for business use cases
  • Building ML dashboards for real-time inference

🚀 Future Improvements

  • SMOTE / class weighting
  • XGBoost / LightGBM
  • SHAP explainability
  • Real-time API deployment

👤 Author

Beyza Topbas

Machine Learning Portfolio Project