---
license: mit
language:
- en
- ru
pipeline_tag: tabular-classification
tags:
- credit-scoring
- catboost
- lightgbm
- polars
- tabular
- binary-classification
metrics:
- roc_auc
---
Credit Risk Prediction Model
Description
Machine learning model for predicting bank client defaults. This model uses an ensemble of CatBoost and LightGBM with advanced feature engineering to assess credit risk.
Business Context
This project develops a credit risk assessment system for the banking sector. The primary goal is to reduce bank losses by automating the prediction of client default probability.
Model Performance
| Metric | Value |
|--------|-------|
| **ROC-AUC** | 0.7523 |
| **Target KPI** | 0.75 |
| **Status** | Achieved |
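As a rough illustration of how the metric in the table could be checked against the target, the sketch below computes ROC-AUC with scikit-learn on a tiny made-up sample. The helper name `meets_target` and the toy labels and scores are assumptions for illustration only; they are not the author's validation code or data.

```python
from sklearn.metrics import roc_auc_score

TARGET_KPI = 0.75  # target value taken from the table above


def meets_target(y_true, y_score, target=TARGET_KPI):
    """Return the ROC-AUC and whether it reaches the target (hypothetical helper)."""
    auc = roc_auc_score(y_true, y_score)
    return auc, auc >= target


# Toy data (made up): 1 = default, 0 = no default
y_true = [0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.6, 0.5, 0.9]

auc, meets_kpi = meets_target(y_true, y_score)
print(f"ROC-AUC: {auc:.4f}, target reached: {meets_kpi}")
```

ROC-AUC depends only on how the scores order the clients, not on their absolute values, which is why it pairs naturally with rank-based ensembling.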
Tech Stack
- **Language**: Python 3.10
- **Big Data Processing**: Polars (Lazy Loading)
- **Machine Learning**:
  - CatBoost (weight: 0.05)
  - LightGBM (weight: 0.95)
- **Infrastructure**: GPU acceleration (NVIDIA RTX 3050)
- **Tools**: Scikit-learn, Scipy, Pandas, Matplotlib, Seaborn
Dataset
- **Records**: 3,000,000
- **Files**: 12 Parquet files
- **Size**: 4.5 GB
- **Class Imbalance**: 1:49 (2% positive class)
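With a 1:49 imbalance, two common precautions are stratified splits (so every fold keeps the 2% positive rate) and a positive-class weight. The sketch below shows both on synthetic labels, using the widely used heuristic `scale_pos_weight = n_negative / n_positive`. This is an assumption about one possible approach, not the author's code: the value reported elsewhere in this card (27.18) differs from what this heuristic gives for a 1:49 split, and how it was derived is not documented here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels with 2% positives (1:49), mimicking the stated imbalance.
rng = np.random.default_rng(0)
n_rows = 5_000
y = np.zeros(n_rows, dtype=int)
y[: n_rows // 50] = 1
rng.shuffle(y)
X = rng.normal(size=(n_rows, 3))  # placeholder features

# Heuristic positive-class weight: ratio of negatives to positives.
n_pos = int(y.sum())
n_neg = n_rows - n_pos
scale_pos_weight = n_neg / n_pos

# Stratified folds preserve the class ratio in every validation split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_rates = [float(y[val_idx].mean()) for _, val_idx in skf.split(X, y)]

print(f"scale_pos_weight: {scale_pos_weight:.2f}")
print(f"positive rate per fold: {fold_rates}")
```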
Key Features
Over 170 engineered features including:
- `utilization_ratio`: credit limit usage level
- `overdue_ratio`: share of overdue debt
- `delays_per_loan`: frequency of critical delays (90+ days)
Usage
Installation
```bash
pip install -r requirements.txt
```
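```python
import joblib
import polars as pl

# Load the trained pipeline. If the pickle contains the custom CreditEnsemble
# class, that class must be importable in the current Python session.
model = joblib.load("final_pipeline.pkl")

# Load client data
df = pl.read_parquet("client_data.parquet")

# Predict class labels and class probabilities
predictions = model.predict(df)
probabilities = model.predict_proba(df)

# Column 1 holds the probability of the positive (default) class
print(f"Default probability: {probabilities[:, 1]}")
```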
```python
from huggingface_hub import hf_hub_download
import joblib

# Download the model file from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
    filename="final_pipeline.pkl",
)

# Load and use
model = joblib.load(model_path)
```
Engineering Solutions
- **Scalability**: Polars for efficient Big Data processing
- **Class Imbalance**: Stratified validation + `scale_pos_weight` (27.18)
- **Ensembling**: Rank Averaging method for stability
- **Production Ready**: Custom `CreditEnsemble` class compatible with `sklearn.pipeline`
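Rank averaging, as named in the list above, can be sketched as follows: each model's scores are converted to ranks, scaled to [0, 1], and combined with the model weights (0.05 for CatBoost and 0.95 for LightGBM, as listed in the Tech Stack). The function `rank_average` and the toy scores are assumptions for illustration; the internals of the actual `CreditEnsemble` class are not shown in this card and may differ.

```python
import numpy as np
from scipy.stats import rankdata


def rank_average(score_sets, weights):
    """Blend several score vectors by weighted average of their normalized ranks."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to 1
    blended = np.zeros(len(score_sets[0]), dtype=float)
    for scores, weight in zip(score_sets, weights):
        ranks = rankdata(scores, method="average")  # ties share the mean rank
        denominator = max(len(ranks) - 1, 1)        # avoid division by zero
        blended += weight * (ranks - 1) / denominator
    return blended


# Toy model outputs for four clients (made up)
catboost_scores = np.array([0.10, 0.80, 0.30, 0.55])
lightgbm_scores = np.array([0.02, 0.40, 0.45, 0.90])

blended = rank_average([catboost_scores, lightgbm_scores], weights=[0.05, 0.95])
print(blended)
```

Because ranks discard the scale of each model's raw output, the blend is insensitive to differences in calibration between the two models. The result is a ranking score rather than a calibrated probability, which is sufficient for a rank-based metric such as ROC-AUC.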
Project Structure
Credit_Risk_Prediction_Model_0.75/
├── credit_risk_modeling.ipynb   # Jupyter notebook with code
├── final_pipeline.pkl           # Trained model (90 MB)
├── requirements.txt             # Dependencies
└── README.md                    # This file
Links
GitHub Repository: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
Author: @maxdavinci2022