---
title: Credit Card Fraud Detection App
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
---

# 💳 Credit Card Fraud Detection

Real-time fraud detection using Machine Learning and an interactive Streamlit dashboard.

## 🚀 Live App

👉 [HuggingFace Space link]

---

## 📌 Problem

Credit card fraud detection is a highly imbalanced classification problem: fraudulent transactions make up a very small fraction of the data (roughly 0.17% in this dataset).

The goal is to:

- Detect fraudulent transactions
- Minimize false negatives
- Provide real-time predictions

---

## 📊 Dataset

Source: Kaggle – Credit Card Fraud Detection

### Features

The dataset contains:

- **Time** → seconds elapsed since the first transaction in the dataset
- **Amount** → transaction value
- **V1 – V28** → PCA-transformed anonymized features

### 🔐 Why PCA?

The original transaction data contains sensitive financial information. To preserve privacy:

- The original features, except **Time** and **Amount**, were transformed using **Principal Component Analysis (PCA)**
- The resulting components are labeled **V1–V28**

These components:

- Are **not directly interpretable**
- Capture the **underlying transaction patterns**
- Retain the information needed for fraud detection

In other words:

> V1–V28 are orthogonal principal components that capture the variance of the original feature space while keeping the original features hidden.
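### 🧪 PCA in Miniature

To make the "orthogonal principal components" idea concrete, here is a minimal sketch that runs PCA on a small synthetic table. It assumes NumPy only; the data, the `mixing` matrix, and all variable names are made up for illustration and have nothing to do with the actual Kaggle preprocessing, which is not published.

```python
import numpy as np

# Synthetic stand-in for sensitive raw features (NOT the Kaggle data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.7]])
raw = latent @ mixing + 0.1 * rng.normal(size=(1000, 3))

# PCA via SVD of the mean-centered data.
centered = raw - raw.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Project onto the principal directions; columns play the role of V1, V2, V3.
components = centered @ vt.T

# The directions are orthonormal, so the component columns are uncorrelated:
# their covariance matrix is diagonal (up to floating-point error).
cov = np.cov(components, rowvar=False)
off_diagonal = cov - np.diag(np.diag(cov))
explained_variance = singular_values**2 / (len(raw) - 1)

print("variance per component:", np.round(explained_variance, 3))
print("largest off-diagonal covariance:", np.abs(off_diagonal).max())
```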
---

## 🧠 Model

The baseline model was trained using:

- Scaled features
- Train/test split
- ROC-AUC evaluation

### Evaluation Metric

ROC-AUC was used because:

- The dataset is highly imbalanced
- Accuracy is misleading: a model that labels every transaction as legitimate would still score very high accuracy
- AUC measures class separability across all decision thresholds

---

## 🎯 Streamlit App Features

### 🔍 Prediction

- Manual transaction input
- Random transaction generator
- Fraud probability score
- Adjustable decision threshold
- Downloadable prediction report

### 📊 Model Insights

- ROC Curve
- Confusion Matrix
- AUC score
- Feature importance (tree-based models)

---

## ⚙️ Tech Stack

- Python
- Scikit-learn
- Streamlit
- NumPy
- Matplotlib

---

## 🧠 What I Learned

- Handling imbalanced datasets
- Why ROC-AUC is better than accuracy for fraud detection
- How feature scaling affects model performance
- Threshold tuning for business use cases
- Building ML dashboards for real-time inference

---

## 🚀 Future Improvements

- SMOTE / class weighting
- XGBoost / LightGBM
- SHAP explainability
- Real-time API deployment

---

## 👤 Author

Beyza Topbas

Machine Learning Portfolio Project
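## 🧪 Example: Accuracy vs. Decision Threshold

As a rough illustration of why accuracy is misleading on imbalanced data and why the decision threshold matters, here is a small standard-library sketch. The labels and scores are hypothetical (drawn from arbitrary beta distributions), and the helper names `confusion`, `accuracy`, and `recall` are invented for this example; none of this is output from the app's model.

```python
import random

random.seed(42)

# Hypothetical data: 990 legitimate (0) and 10 fraudulent (1) transactions,
# with made-up model scores in [0, 1]. Not output from the app's model.
labels = [0] * 990 + [1] * 10
scores = [random.betavariate(1, 8) for _ in range(990)]
scores += [random.betavariate(4, 3) for _ in range(10)]


def confusion(labels, scores, threshold):
    """Count (tp, fp, fn, tn) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fp, fn, tn):
    return tp / (tp + fn)


# A threshold above 1 flags nothing: 99% accuracy, yet every fraud is missed.
never_flag = confusion(labels, scores, threshold=1.1)
print("never flag:", accuracy(*never_flag), recall(*never_flag))

# Lowering the threshold catches more fraud at the cost of more false alarms.
results = {t: confusion(labels, scores, t) for t in (0.7, 0.5, 0.3)}
for t, counts in results.items():
    tp, fp, fn, tn = counts
    print(f"threshold={t}: recall={recall(*counts):.2f}, false positives={fp}")
```

The "never flag" case shows that a model can score 99% accuracy while catching no fraud at all. In the app, the adjustable threshold exposes the same trade-off between missed fraud and false alarms.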