BeyzaTopbas committed on
Commit 85783a4 · verified · 1 Parent(s): 4ffcd3e

Update README.md

Files changed (1)
  1. README.md +78 -25
README.md CHANGED
@@ -10,55 +10,98 @@ tags:
  pinned: false
  short_description: Streamlit template space
  ---
- # 💳 Credit Card Fraud Detection – Streamlit App

- An end-to-end Machine Learning project for detecting fraudulent credit card transactions.

- 🚀 **Live demo:** [HuggingFace Space link]

  ---

  ## 📌 Problem

- Credit card fraud detection is a highly imbalanced classification problem where fraudulent transactions represent a very small percentage of the data.

- The goal is to correctly identify fraudulent transactions while minimizing false positives.

  ---

  ## 📊 Dataset

- - European cardholders dataset
- - PCA-transformed features (V1–V28)
- - Time & Amount
- - Highly imbalanced

  ---

- ## ⚙️ Model

- - Algorithm: *(fill in: Logistic Regression / Random Forest / XGBoost)*
- - Evaluation metric: **ROC-AUC**
- - Trained on balanced data using proper preprocessing

  ---

- ## 🖥️ App Features

  ### 🔍 Prediction
  - Manual transaction input
  - Random transaction generator
  - Fraud probability score
- - Real-time prediction

  ### 📊 Model Insights
  - ROC Curve
  - Confusion Matrix
- - Feature Importance

  ---

- ## 🧠 Tech Stack

  - Python
  - Scikit-learn
@@ -68,17 +111,27 @@ The goal is to correctly identify fraudulent transactions while minimizing false

  ---

- ## 📈 Key Learnings

  - Handling imbalanced datasets
- - Fraud detection strategies
- - Model evaluation with ROC-AUC
- - Deploying ML apps using Streamlit & HuggingFace

  ---

- ## 🚀 Run Locally

- ```bash
- pip install -r requirements.txt
- streamlit run src/streamlit_app.py

  pinned: false
  short_description: Streamlit template space
  ---
+ # 💳 Credit Card Fraud Detection

+ Real-time fraud detection using Machine Learning and an interactive Streamlit dashboard.

+ ## 🚀 Live App
+ 👉 [HuggingFace Space link]

  ---

  ## 📌 Problem

+ Credit card fraud detection is a highly imbalanced classification problem where fraudulent transactions represent a very small fraction of the data.

+ The goal is to:
+
+ - Detect fraudulent transactions
+ - Minimize false negatives
+ - Provide real-time predictions

  ---

  ## 📊 Dataset

+ Source: Kaggle – Credit Card Fraud Detection
+
+ ### Features
+
+ The dataset contains:
+
+ - **Time** → seconds since first transaction
+ - **Amount** → transaction value
+ - **V1 – V28** → PCA-transformed anonymized features
+
+ ### 🔍 Why PCA?
+
+ The original transaction data contains sensitive financial information.
+
+ To preserve privacy:
+
+ - All original features were transformed using **Principal Component Analysis (PCA)**
+ - The resulting components are labeled **V1–V28**
+
+ These components:
+
+ - Are **not directly interpretable**
+ - Capture the **underlying transaction patterns**
+ - Retain the information needed for fraud detection
+
+ In other words:
+
+ > V1–V28 are orthogonal principal components representing the variance of the original feature space while ensuring data anonymization.
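
To make the "orthogonal principal components" statement concrete, here is a minimal sketch of PCA on two made-up, correlated features. It is not the transformation applied to the Kaggle data (the original features are undisclosed); `pca_2d` and the sample values are hypothetical and only illustrate that the resulting axes are perpendicular and the projected columns are uncorrelated.

```python
import math


def pca_2d(points):
    """PCA for 2-D points via the closed-form eigenvectors of the 2x2 covariance matrix.

    Returns (components, scores): two unit-length principal axes and the
    centred data expressed in those axes.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)

    # Angle of the direction of largest variance for a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))  # perpendicular to `first`

    # Project the centred data onto both axes.
    scores = [
        (x * first[0] + y * first[1], x * second[0] + y * second[1])
        for x, y in centred
    ]
    return (first, second), scores


# Two made-up, correlated raw features (purely illustrative values).
raw = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.2, 0.9), (4.0, 4.2), (2.6, 2.4)]
(pc1, pc2), scores = pca_2d(raw)

dot = pc1[0] * pc2[0] + pc1[1] * pc2[1]
cov_scores = sum(a * b for a, b in scores) / (len(scores) - 1)
var_first = sum(a * a for a, _ in scores) / (len(scores) - 1)
var_second = sum(b * b for _, b in scores) / (len(scores) - 1)

print(f"component dot product={dot:.6f}  covariance of scores={cov_scores:.6f}")
print(f"variance along PC1={var_first:.4f}  along PC2={var_second:.4f}")
```

The projected values no longer correspond to either raw feature on its own, which is why components such as V1–V28 cannot be read as named transaction attributes.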

  ---

+ ## 🧠 Model

+ Baseline model trained using:
+
+ - Scaled features
+ - Train/test split
+ - ROC-AUC evaluation
+
+ ### Evaluation Metric
+
+ ROC-AUC was used because:
+
+ - The dataset is highly imbalanced
+ - Accuracy is misleading
+ - AUC measures class separability
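
The reasoning in the list above can be checked with a small, self-contained sketch. It uses a toy label set (10 frauds in 1,000 transactions, not the project's data) and a hand-written `roc_auc` helper, which is a hypothetical stand-in for a library routine such as scikit-learn's `roc_auc_score`. A model that never flags fraud scores very high on accuracy but shows no class separability at all.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Toy, highly imbalanced labels: 1 = fraud, 0 = legitimate.
labels = [1] * 10 + [0] * 990

# A "model" that gives every transaction the same score and never flags fraud.
constant_scores = [0.0] * len(labels)
never_fraud = [0] * len(labels)

trivial_accuracy = accuracy(labels, never_fraud)
trivial_auc = roc_auc(labels, constant_scores)

print(f"accuracy={trivial_accuracy:.3f}  roc_auc={trivial_auc:.3f}")
# prints: accuracy=0.990  roc_auc=0.500
```

The pairwise loop is quadratic and only suitable for small illustrations; a real evaluation would use a library implementation.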

  ---

+ ## 🎯 Streamlit App Features

  ### 🔍 Prediction
+
  - Manual transaction input
  - Random transaction generator
  - Fraud probability score
+ - Adjustable decision threshold
+ - Downloadable prediction report
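
As a rough illustration of what an adjustable decision threshold changes, the sketch below counts confusion-matrix cells at two thresholds. The probabilities and the `confusion_at_threshold` helper are made up for this example; in the app the threshold would presumably come from a widget such as `st.slider`, which is an assumption rather than something shown in this commit.

```python
def confusion_at_threshold(labels, probabilities, threshold):
    """Return (tp, fp, fn, tn) when flagging every probability >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, probabilities):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up fraud probabilities for eight transactions (1 = fraud).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.92, 0.55, 0.30, 0.45, 0.20, 0.10, 0.05, 0.02]

for threshold in (0.5, 0.25):
    tp, fp, fn, tn = confusion_at_threshold(labels, probabilities, threshold)
    print(f"threshold={threshold}: tp={tp} fp={fp} fn={fn} tn={tn}")
# prints:
# threshold=0.5: tp=2 fp=0 fn=1 tn=5
# threshold=0.25: tp=3 fp=1 fn=0 tn=4
```

Lowering the threshold removes the missed fraud case at the cost of one false alarm, which is the trade-off behind the README's goal of minimizing false negatives.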

  ### 📊 Model Insights
+
  - ROC Curve
  - Confusion Matrix
+ - AUC score
+ - Feature importance (tree-based models)

  ---

+ ## ⚙️ Tech Stack

  - Python
  - Scikit-learn
 

  ---

+ ## 🧠 What I Learned

  - Handling imbalanced datasets
+ - Why ROC-AUC is better than accuracy for fraud detection
+ - Feature scaling impact
+ - Threshold tuning for business use-cases
+ - Building ML dashboards for real-time inference

  ---

 
124
+ ## πŸš€ Future Improvements
125
+
126
+ - SMOTE / class weighting
127
+ - XGBoost / LightGBM
128
+ - SHAP explainability
129
+ - Real-time API deployment
130
+
131
+ ---
132
+
133
+ ## πŸ‘€ Author
134
+
135
+ Beyza Topbas
136
 
137
+ Machine Learning Portfolio Project