Committed by maxdavinci
Commit a4068e4 · verified · 1 Parent(s): f23e733

Update Readme.md

  1. README.md +127 -3
README.md CHANGED
@@ -1,3 +1,127 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ - ru
+ pipeline_tag: tabular-classification
+ tags:
+ - credit-scoring
+ - catboost
+ - lightgbm
+ - polars
+ - tabular
+ - binary-classification
+ metrics:
+ - roc_auc
+ ---
+
+ Credit Risk Prediction Model
+
+ Description
+
+ A machine learning model that predicts bank client defaults. It combines CatBoost and LightGBM in a weighted ensemble, with engineered features, to assess credit risk.
+
+ Business Context
+
+ The project builds a credit risk assessment system for the banking sector. Its primary goal is to reduce bank losses by automating the prediction of client default probability.
+
+
+ Model Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **ROC-AUC** | 0.7523 |
+ | **Target KPI** | 0.75 |
+ | **Status** | ✅ Achieved |
+
+
+ Tech Stack
+
+ - **Language**: Python 3.10
+ - **Big Data Processing**: Polars (lazy loading)
+ - **Machine Learning**:
+   - CatBoost (weight: 0.05)
+   - LightGBM (weight: 0.95)
+ - **Infrastructure**: GPU acceleration (NVIDIA RTX 3050)
+ - **Tools**: scikit-learn, SciPy, pandas, Matplotlib, Seaborn
+
+
+ Dataset
+
+ - **Records**: 3,000,000
+ - **Files**: 12 Parquet files
+ - **Size**: 4.5 GB
+ - **Class Imbalance**: 1:49 (2% positive class)
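As a rough check on the dataset figures above, the sketch below works out the class counts implied by a 2% positive rate over 3,000,000 records. The function name `imbalance_summary` is hypothetical, and applying the 2% rate uniformly to the whole dataset is an assumption. The negative-to-positive ratio it prints is a common starting heuristic for `scale_pos_weight` in gradient-boosting libraries; the README reports 27.18 for that parameter and does not say how the value was chosen.

```python
def imbalance_summary(n_records: int, positive_rate: float) -> dict:
    """Return class counts and the negative-to-positive ratio."""
    n_pos = round(n_records * positive_rate)
    n_neg = n_records - n_pos
    return {
        "positives": n_pos,
        "negatives": n_neg,
        "neg_pos_ratio": n_neg / n_pos,
    }


# Figures taken from the Dataset list: 3,000,000 records, 2% positive class.
summary = imbalance_summary(3_000_000, 0.02)
print(summary)
```

With these inputs the ratio of negatives to positives matches the stated 1:49 imbalance.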
+
+
+ Key Features
+
+ Over 170 engineered features including:
+ - `utilization_ratio`: credit limit usage level
+ - `overdue_ratio`: share of overdue debt
+ - `delays_per_loan`: frequency of critical delays (90+ days)
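The three features listed above are ratios, so a minimal sketch of how they might be computed per client could look like the following. The README does not list the raw columns or the exact formulas, so every field name in `ClientRecord`, each numerator/denominator pairing, and the choice to return 0.0 on a zero denominator are assumptions made for illustration. Plain Python is used here to keep the example dependency-free; the project itself processes data with Polars.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    # All field names are hypothetical; the README does not list raw columns.
    credit_limit: float
    outstanding_debt: float
    overdue_debt: float
    n_loans: int
    n_delays_90_plus: int


def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def ratio_features(rec: ClientRecord) -> dict:
    """Compute the three ratio features under the assumed definitions."""
    return {
        "utilization_ratio": safe_ratio(rec.outstanding_debt, rec.credit_limit),
        "overdue_ratio": safe_ratio(rec.overdue_debt, rec.outstanding_debt),
        "delays_per_loan": safe_ratio(rec.n_delays_90_plus, rec.n_loans),
    }


features = ratio_features(ClientRecord(100_000.0, 40_000.0, 10_000.0, 4, 1))
print(features)
```

Guarding the division matters in practice because clients with no credit limit or no loans would otherwise raise `ZeroDivisionError`.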
+
+
+ Usage
+
+ Installation
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ```python
+ import joblib
+ import polars as pl
+
+ # Load model
+ model = joblib.load("final_pipeline.pkl")
+
+ # Load data
+ df = pl.read_parquet("client_data.parquet")
+
+ # Make predictions
+ predictions = model.predict(df)
+ probabilities = model.predict_proba(df)
+
+ # Results
+ print(f"Default probability: {probabilities[:, 1]}")
+ ```
+
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import joblib
+
+ # Download model
+ model_path = hf_hub_download(
+     repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
+     filename="final_pipeline.pkl"
+ )
+
+ # Load and use
+ model = joblib.load(model_path)
+ ```
+
+
+ Engineering Solutions
+
+ - **Scalability**: Polars for efficient processing of large datasets
+ - **Class Imbalance**: stratified validation plus `scale_pos_weight` (27.18)
+ - **Ensembling**: rank averaging of model scores for stability
+ - **Production Ready**: custom `CreditEnsemble` class compatible with `sklearn.pipeline`
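The rank averaging mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the `CreditEnsemble` implementation, which the README does not show. The function names `rank_normalize` and `rank_average` are hypothetical, dividing ranks by the sample count is one of several possible normalizations, and the default weights 0.05 and 0.95 are taken from the Tech Stack list.

```python
def rank_normalize(scores: list[float]) -> list[float]:
    """Map scores to average ranks scaled into (0, 1]; ties share a rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / len(scores) for r in ranks]


def rank_average(
    cat_scores: list[float],
    lgbm_scores: list[float],
    w_cat: float = 0.05,
    w_lgbm: float = 0.95,
) -> list[float]:
    """Blend two models by a weighted average of their normalized ranks."""
    cat_r = rank_normalize(cat_scores)
    lgbm_r = rank_normalize(lgbm_scores)
    return [w_cat * c + w_lgbm * l for c, l in zip(cat_r, lgbm_r)]


# Toy scores for three clients from each model (made-up values).
blended = rank_average([0.10, 0.80, 0.30], [0.02, 0.40, 0.90])
print(blended)
```

Because ROC-AUC depends only on how samples are ordered, blending ranks instead of raw probabilities sidesteps differences in how the two models calibrate their scores. The blended values are rank-based scores, not calibrated default probabilities.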
+
+
+ Project Structure
+
+ Credit_Risk_Prediction_Model_0.75/
+ ├── credit_risk_modeling.ipynb # Jupyter notebook with code
+ ├── final_pipeline.pkl # Trained model (90 MB)
+ ├── requirements.txt # Dependencies
+ └── README.md # This file
+
+
+ Links
+
+ - **GitHub Repository**: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
+ - **Author**: @maxdavinci2022