iremrit committed
Commit 0404839 · verified · 1 Parent(s): 95409ed

Update README.md

Files changed (1):
  1. README.md +96 -104
README.md CHANGED
@@ -1,104 +1,96 @@
- # Credit Score Classification Project
-
- ## 1. Problem Definition
- The objective of this project is to build a machine learning model to classify customers' credit scores into three categories: **Good, Standard, and Poor**. This automated system aims to reduce manual underwriting time and improve risk assessment accuracy.
-
- ## 2. Project Scope & Features
- * **Data Cleaning**: Handled dirty data (special characters), missing values (imputation), and outliers.
- * **Feature Engineering**: Created financial ratios, parsed credit history strings, and encoded categorical variables.
- * **Modeling**: Compared Logistic Regression (baseline) vs. Random Forest vs. XGBoost.
- * **Deployment**: Modular pipeline (`src/`) with a Gradio web interface (`app.py`).
-
- ## 3. Deployment
- **Try the Model Instantly:**
- [Link to Live Demo (Simulated)] (e.g., HuggingFace Spaces URL)
-
- To run locally:
- 1. Install dependencies: `pip install -r requirements.txt`
- 2. Run the app: `python src/app.py`
- 3. Open browser at `http://localhost:7860`
-
- ## 4. Key Findings & Results
- * **Baseline Score**: 60% accuracy (Logistic Regression).
- * **Final Score**: **80% accuracy** (XGBoost).
- * **Top Predictors**: Outstanding Debt, Credit Mix, and Interest Rate.
- * **Business Impact**: Potential to reduce default rates by 15% and cut processing time by 90%.
-
- ## 5. Repository Structure
-
- ```
- FinRisk-AI/
- ├── README.md                  # Project Overview
- ├── requirements.txt           # Dependencies
- ├── .gitignore
- ├── data/                      # Raw and Processed Data
- │   ├── raw/
- │   │   ├── train.csv
- │   │   └── test.csv
- │   └── processed/
- │       ├── train_processed.csv
- │       └── test_processed.csv
- ├── docs/                      # Detailed Documentation
- │   ├── 00_setup.md
- │   ├── 01_data_overview.md
- │   ├── 02_baseline.md
- │   ├── 03_feature_engineering.md
- │   ├── 04_model_optimization.md
- │   └── 05_evaluation_report.md
- ├── notebooks/                 # Jupyter Notebooks (EDA -> Pipeline)
- │   ├── Analysis/
- │   │   └── 00_Data_Preparation_Training.ipynb
- │   └── Modeling/
- │       ├── 01_EDA.ipynb
- │       ├── 02_baseline_model.ipynb
- │       ├── 03_feature_engineering.ipynb
- │       ├── 04_model_optimization.ipynb
- │       └── 05_model_evaluation.ipynb
- ├── src/                       # Source Code
- │   ├── templates/             # UI
- │   │   └── index.html
- │   ├── models/                # Saved Artifacts
- │   │   ├── final_model.pkl
- │   │   └── features.json
- │   ├── tests/
- │   ├── app.py                 # App
- │   ├── config.py              # Configuration
- │   ├── inference.py           # Prediction Logic
- │   └── pipeline.py            # Training Pipeline
- └── OIG2.png
- ```
-
- ## 6. Validation Strategy
- We used **Stratified K-Fold Cross-Validation** to ensure the model generalizes across all three credit score classes rather than overfitting to the majority "Standard" class.
-
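The stratification idea above can be illustrated with a minimal, dependency-free sketch (a stand-in for `sklearn.model_selection.StratifiedKFold`, which a real pipeline would typically use); the labels below are toy data mimicking the "Standard"-majority imbalance:

```python
from collections import defaultdict

def stratified_kfold(labels, k=5):
    """Assign each sample index to one of k folds, preserving class balance.

    Round-robin assignment within each class keeps per-fold class
    proportions close to the overall distribution.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds

# Toy labels: "Standard" is the majority class, as in the real dataset.
labels = ["Standard"] * 10 + ["Good"] * 5 + ["Poor"] * 5
folds = stratified_kfold(labels, k=5)
for fold in folds:
    print({c: sum(labels[i] == c for i in fold) for c in ("Standard", "Good", "Poor")})
```

Every fold ends up with the same 2:1:1 class mix as the full sample, so each validation split sees all three classes.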
- ## 7. Pipeline Strategy
- * **Preprocessing**: Robust regex cleaning for dirty numeric columns.
- * **Imputation**: Median imputation for skewed financial data.
- * **Model**: XGBoost, chosen for its ability to capture non-linear relationships and its strong performance on tabular data.
-
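The first two bullets can be sketched in pure Python (the actual pipeline presumably works on pandas DataFrames; the sample values here are made up):

```python
import re
from statistics import median

def clean_numeric(value):
    """Strip stray characters (e.g. '52000_', '_28000') and parse to float.

    Returns None when nothing numeric remains, so the value can be imputed.
    """
    if value is None:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None

def median_impute(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

raw_income = ["34000", "52000_", None, "_28000", "abc"]
parsed = [clean_numeric(v) for v in raw_income]
annual_income = median_impute(parsed)
print(annual_income)  # -> [34000.0, 52000.0, 34000.0, 28000.0, 34000.0]
```

The median (rather than the mean) is used as the fill value because skewed financial columns make the mean sensitive to a few extreme incomes.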
- ## 8. Monitoring
- Post-deployment, we recommend monitoring:
- * **Accuracy**: Check against ground-truth labels after 3 months.
- * **Data Drift**: Monitor `Annual_Income` and `Debt` distributions for shifts.
-
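One common way to quantify such distribution shift is the Population Stability Index (PSI); a self-contained sketch, with synthetic income samples standing in for the real `Annual_Income` column:

```python
import math

def population_stability_index(expected, actual, bins=5):
    """PSI between a baseline sample and a recent sample.

    Bins are derived from the baseline's range; a small epsilon avoids
    log(0) for empty bins. Common rule of thumb: PSI < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 significant drift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        return [(c + 1e-6) / len(sample) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [30_000 + 100 * i for i in range(100)]   # training-time incomes
identical = list(baseline)
shifted = [x + 20_000 for x in baseline]            # incomes drifted upward
print(round(population_stability_index(baseline, identical), 4))  # -> 0.0
print(population_stability_index(baseline, shifted) > 0.25)       # -> True
```

In production, `expected` would be the training-set distribution snapshot and `actual` a recent scoring window, recomputed on a schedule.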
- ## 📌 To-Do: Business & Model Improvements
-
- - [ ] Validate the final model on a separate holdout test set
- - [ ] Set up model monitoring (monthly accuracy, drift in key features)
- - [ ] Define decision thresholds for each credit score class
- - [ ] Add fallback rules for uncertain predictions (e.g., probability < 55%)
- - [ ] Build a feedback loop to compare predicted vs. actual scores
- - [ ] Document model limitations and train the credit team on edge cases
-
- ## Contact
- * **Author**: [Your Name]
- * **Email**: [Your Email]
- * **LinkedIn**: [Your Profile]
 
+ # FinRisk-AI / Credit Score Classification
+
+ ## 1. Problem Definition
+ The objective of this project is to build a machine learning model to classify customers' credit scores into three categories: **Good, Standard, and Poor**. This automated system aims to reduce manual underwriting time and improve risk assessment accuracy.
+
+ ## 2. Project Scope
+ ### 2.1. Documentation
+ - [Setup & Installation](docs/00_setup.md)
+ - [Data Overview](docs/01_data_overview.md) - Dataset schema and relationships
+ - [Baseline Models](docs/02_baseline.md) - Baseline modeling results
+ - [Feature Engineering](docs/03_feature_engineering.md) - Phase-by-phase feature creation
+ - [Model Optimization](docs/04_model_optimization.md) - Hyperparameter tuning with Optuna
+ - [API Deployment](docs/api_deployment.md) - FastAPI deployment guide
+
+ ## 3. Deployment
+ **Try the Model Instantly:**
+ [Link to Live Demo (Simulated)] (e.g., HuggingFace Spaces URL)
+
+ To run locally:
+ 1. Install dependencies: `pip install -r requirements.txt`
+ 2. Run the app: `python src/app.py`
+ 3. Open browser at `http://localhost:7860`
+
+ ## 4. Key Findings & Results
+ * **Baseline Score**: 60% accuracy (Logistic Regression).
+ * **Final Score**: **80% accuracy** (XGBoost).
+ * **Top Predictors**: Outstanding Debt, Credit Mix, and Interest Rate.
+ * **Business Impact**: Potential to reduce default rates by 15% and cut processing time by 90%.
+
+ ## 5. Repository Structure
+
+ ```
+ FinRisk-AI/
+ ├── README.md                  # Project Overview
+ ├── requirements.txt           # Dependencies
+ ├── .gitignore
+ ├── data/                      # Raw and Processed Data
+ │   ├── raw/
+ │   │   ├── train.csv
+ │   │   └── test.csv
+ │   └── processed/
+ │       ├── train_processed.csv
+ │       └── test_processed.csv
+ ├── docs/                      # Detailed Documentation
+ │   ├── 00_setup.md
+ │   ├── 01_data_overview.md
+ │   ├── 02_baseline.md
+ │   ├── 03_feature_engineering.md
+ │   ├── 04_model_optimization.md
+ │   └── 05_evaluation_report.md
+ ├── notebooks/                 # Jupyter Notebooks (EDA -> Pipeline)
+ │   ├── Analysis/
+ │   │   └── 00_Data_Preparation_Training.ipynb
+ │   └── Modeling/
+ │       ├── 01_EDA.ipynb
+ │       ├── 02_baseline_model.ipynb
+ │       ├── 03_feature_engineering.ipynb
+ │       ├── 04_model_optimization.ipynb
+ │       └── 05_model_evaluation.ipynb
+ ├── src/                       # Source Code
+ │   ├── templates/             # UI
+ │   │   └── index.html
+ │   ├── models/                # Saved Artifacts
+ │   │   ├── final_model.pkl
+ │   │   └── features.json
+ │   ├── tests/
+ │   ├── app.py                 # App
+ │   ├── config.py              # Configuration
+ │   ├── inference.py           # Prediction Logic
+ │   └── pipeline.py            # Training Pipeline
+ └── OIG2.png
+ ```
+ <img width="1862" height="853" alt="image" src="https://github.com/user-attachments/assets/0e259956-69d9-4c82-99d3-0ad0fbb619a3" />
+
+ ## 📌 To-Do: Business & Model Improvements
+
+ - [ ] Validate the final model on a separate holdout test set
+ - [ ] Set up model monitoring (monthly accuracy, drift in key features)
+ - [ ] Define decision thresholds for each credit score class
+ - [ ] Add fallback rules for uncertain predictions (e.g., probability < 55%)
+ - [ ] Build a feedback loop to compare predicted vs. actual scores
+ - [ ] Document model limitations and train the credit team on edge cases
+
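The fallback-rule item above could look like this sketch, assuming the served model exposes per-class probabilities (e.g. via `predict_proba`); the function name and output format are hypothetical, and the 55% threshold follows the list above:

```python
def route_prediction(class_probs, threshold=0.55):
    """Return the predicted class, or flag the case for manual review
    when the model's top probability falls below the threshold."""
    label, confidence = max(class_probs.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return {"decision": "manual_review", "confidence": confidence}
    return {"decision": label, "confidence": confidence}

# Confident prediction goes straight through; an uncertain one is routed
# to a human underwriter instead of being auto-approved or auto-rejected.
print(route_prediction({"Good": 0.78, "Standard": 0.15, "Poor": 0.07}))
print(route_prediction({"Good": 0.40, "Standard": 0.38, "Poor": 0.22}))
```

Logging every `manual_review` decision alongside the eventual human verdict would also feed the predicted-vs-actual feedback loop listed above.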
+ ## Contact
+ * **Author**: Rana Irem Turhan
+ * **GitHub**: github.com/Rana-Irem-Turhan
+ * **LinkedIn**: https://www.linkedin.com/in/irem-turhan/