maxdavinci
/

Credit_Risk_Prediction_Model_0.75

Tabular Classification

binary-classification

maxdavinci committed 17 days ago

Commit

a4068e4

·

verified ·

1 Parent(s): f23e733

Update Readme.md

Files changed (1)

README.md +127 -3

@@ -1,3 +1,127 @@
----
-license: mit
----

+---
+license: mit
+language:
+- en
+- ru
+pipeline_tag: tabular-classification
+tags:
+- credit-scoring
+- catboost
+- lightgbm
+- polars
+- tabular
+- binary-classification
+metrics:
+- roc_auc
+---
+Credit Risk Prediction Model
+Description
+A machine learning model that predicts the probability of default for bank clients. It combines CatBoost and LightGBM in a weighted ensemble trained on more than 170 engineered features.
+Business Context
+The model was developed as part of a credit risk assessment system for the banking sector. The primary goal is to reduce bank losses by automating the estimation of each client's default probability.
+Model Performance
+| Metric | Value |
+|--------|-------|
+| **ROC-AUC** | 0.7523 |
+| **Target KPI** | 0.75 |
+| **Status** | ✅ Achieved |
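For readers unfamiliar with the metric in the table above, the following is a minimal sketch of what ROC-AUC measures: the probability that a randomly chosen defaulting client receives a higher score than a randomly chosen non-defaulting one. The function name `roc_auc` and the toy data are illustrative assumptions, not code from the repository.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(n_pos * n_neg) and is meant for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = default, 0 = no default
labels = [0, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.90]
auc = roc_auc(labels, scores)
print(f"ROC-AUC: {auc:.4f}")
```

On a dataset of this size, a library routine such as `sklearn.metrics.roc_auc_score` would be the practical choice; the pairwise loop here only makes the definition explicit.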
+Tech Stack
+- **Language**: Python 3.10
+- **Big Data Processing**: Polars (Lazy Loading)
+- **Machine Learning**:
+  - CatBoost (weight: 0.05)
+  - LightGBM (weight: 0.95)
+- **Infrastructure**: GPU acceleration (NVIDIA RTX 3050)
+- **Tools**: scikit-learn, SciPy, Pandas, Matplotlib, Seaborn
+Dataset
+- **Records**: 3,000,000
+- **Files**: 12 Parquet files
+- **Size**: 4.5 GB
+- **Class Imbalance**: 1:49 (2% positive class)
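As a rough illustration of what the 1:49 imbalance means for class weighting, the sketch below computes the common negative-to-positive ratio that is often used as a starting value for `scale_pos_weight` in gradient boosting libraries. The helper name is hypothetical. Note that the README reports a `scale_pos_weight` of 27.18 and does not explain how it was derived, so the value computed here is only the textbook starting point and not the author's setting.

```python
def naive_scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """Common heuristic: weight positives by the negative-to-positive count ratio."""
    if n_positive <= 0:
        raise ValueError("n_positive must be greater than zero")
    return n_negative / n_positive


# Figures taken from the dataset summary: 3,000,000 records at a 1:49 ratio
n_records = 3_000_000
n_positive = n_records * 1 // (1 + 49)   # 2% positive class
n_negative = n_records - n_positive

weight = naive_scale_pos_weight(n_negative, n_positive)
print(f"positives={n_positive}, negatives={n_negative}, weight={weight}")
```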
+Key Features
+Over 170 engineered features including:
+- `utilization_ratio` — credit limit usage level
+- `overdue_ratio` — share of overdue debt
+- `delays_per_loan` — frequency of critical delays (90+ days)
+Usage
+Installation
+```bash
+pip install -r requirements.txt
+```
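+```python
+import joblib
+import polars as pl
+
+# Load the trained pipeline (expects final_pipeline.pkl in the working directory)
+model = joblib.load("final_pipeline.pkl")
+
+# Load client data; client_data.parquet is a placeholder for your own file
+df = pl.read_parquet("client_data.parquet")
+
+# Predict class labels and class probabilities
+predictions = model.predict(df)
+probabilities = model.predict_proba(df)
+
+# Column 1 holds the probability of the positive (default) class
+print(f"Default probability: {probabilities[:, 1]}")
+```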
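+```python
+from huggingface_hub import hf_hub_download
+import joblib
+
+# Download the serialized pipeline from the Hugging Face Hub (cached locally)
+model_path = hf_hub_download(
+    repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
+    filename="final_pipeline.pkl",
+)
+
+# Load the pipeline. Unpickling generally requires the custom CreditEnsemble
+# class to be importable in the current session.
+model = joblib.load(model_path)
+```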
+Engineering Solutions
+- **Scalability**: Polars for efficient Big Data processing
+- **Class Imbalance**: Stratified validation + `scale_pos_weight` (27.18)
+- **Ensembling**: Rank Averaging method for stability
+- **Production Ready**: Custom `CreditEnsemble` class compatible with `sklearn.pipeline`
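The rank averaging step named above can be sketched as follows. Each model's scores are converted to ranks, the ranks are scaled to (0, 1], and the scaled ranks are blended with the ensemble weights from the Tech Stack section (CatBoost 0.05, LightGBM 0.95). This is not the repository's `CreditEnsemble` code: the function names, the tie handling, and the division of ranks by the sample count are assumptions made for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next sorted value equals the current one
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_average(prediction_sets, weights):
    """Blend several score lists by weighted average of their scaled ranks."""
    n = len(prediction_sets[0])
    if any(len(p) != n for p in prediction_sets):
        raise ValueError("all prediction lists must have the same length")
    total = sum(weights)
    blended = [0.0] * n
    for preds, w in zip(prediction_sets, weights):
        for i, r in enumerate(average_ranks(preds)):
            blended[i] += (w / total) * (r / n)
    return blended


# Made-up scores for four clients from each model
catboost_scores = [0.02, 0.30, 0.10, 0.30]
lightgbm_scores = [0.01, 0.20, 0.40, 0.05]

blended = rank_average([catboost_scores, lightgbm_scores], weights=[0.05, 0.95])
print(blended)
```

Because only the ordering of each model's scores matters, this kind of blend is insensitive to differences in score scale between the two models. The output is a rank-based score rather than a calibrated probability. `scipy.stats.rankdata` provides the same average-rank behavior and would be the usual choice at scale.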
+Project Structure
+Credit_Risk_Prediction_Model_0.75/
+├── credit_risk_modeling.ipynb  # Jupyter notebook with code
+├── final_pipeline.pkl          # Trained model (90 MB)
+├── requirements.txt            # Dependencies
+└── README.md                   # This file
+Links
+- **GitHub Repository**: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
+- **Author**: @maxdavinci2022