	---
	license: mit
	language:
	- en
	- ru
	pipeline_tag: tabular-classification
	tags:
	- credit-scoring
	- catboost
	- lightgbm
	- polars
	- tabular
	- binary-classification
	metrics:
	- roc_auc
	---

	Credit Risk Prediction Model

	Description

	A machine learning model that predicts the probability of default for bank clients. It is a weighted ensemble of CatBoost and LightGBM trained on more than 170 engineered features.

	Business Context

	The model was developed as part of a credit risk assessment system for the banking sector. Its primary goal is to reduce bank losses by automating the estimation of each client's default probability.


	Model Performance

	| Metric | Value |
	|--------|-------|
	| ROC-AUC | 0.7523 |
	| Target KPI | 0.75 |
	| Status | ✅ Achieved |
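
For readers less familiar with the metric: ROC-AUC is the probability that a randomly chosen defaulting client receives a higher score than a randomly chosen non-defaulting one, with ties counted as one half. The following is a minimal pure-Python sketch of that definition. The function name `roc_auc` and the toy data are illustrative and are not taken from the project notebook; in practice `sklearn.metrics.roc_auc_score` computes the same quantity far more efficiently.

```python
def roc_auc(y_true, scores):
    """ROC-AUC by pairwise comparison: O(n_pos * n_neg), fine for small samples."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels (1 = default) and scores, invented for illustration only.
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]
auc = roc_auc(y_true, scores)
print(f"ROC-AUC: {auc:.3f}")  # ROC-AUC: 0.875
```

Because the metric depends only on the ordering of the scores, it is unaffected by any monotone rescaling of the predicted probabilities.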


	Tech Stack

	- Language: Python 3.10
	- Big Data Processing: Polars (Lazy Loading)
	- Machine Learning:
	  - CatBoost (weight: 0.05)
	  - LightGBM (weight: 0.95)
	- Infrastructure: GPU acceleration (NVIDIA RTX 3050)
	- Tools: Scikit-learn, Scipy, Pandas, Matplotlib, Seaborn


	Dataset

	- Records: 3,000,000
	- Files: 12 Parquet files
	- Size: 4.5 GB
	- Class Imbalance: 1:49 (2% positive class)
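
A common starting point for weighting the positive class in gradient-boosting libraries (the `scale_pos_weight` parameter in LightGBM and CatBoost) is the ratio of negative to positive examples. The sketch below computes that heuristic from label counts. The helper name `suggested_scale_pos_weight` is hypothetical, and the counts are derived from the 2% figure above rather than from the actual data. The value used for training is the one reported under Engineering Solutions; this heuristic is only a starting point and is often tuned.

```python
def suggested_scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """Heuristic positive-class weight: number of negatives per positive."""
    if n_positive <= 0:
        raise ValueError("need at least one positive example")
    return n_negative / n_positive


# Counts implied by the figures above: 3,000,000 records, 2% positive class.
n_total = 3_000_000
n_positive = n_total * 2 // 100      # 60,000 defaults
n_negative = n_total - n_positive    # 2,940,000 non-defaults
weight = suggested_scale_pos_weight(n_negative, n_positive)
print(weight)  # 49.0, matching the 1:49 ratio
```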


	Key Features

	Over 170 engineered features including:
	- `utilization_ratio` — credit limit usage level
	- `overdue_ratio` — share of overdue debt
	- `delays_per_loan` — frequency of critical delays (90+ days)
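
The README does not give exact definitions for these features, so the following sketch shows one plausible reading of the three ratios for a single client. All input field names (`credit_limit`, `outstanding_balance`, `overdue_amount`, `n_loans`, `n_delays_90_plus`) and the zero-denominator handling are assumptions made for illustration, not the project's actual schema. In the project itself these would more likely be written as Polars expressions over the full dataset.

```python
from dataclasses import dataclass


@dataclass
class ClientAggregates:
    # Hypothetical per-client totals; the real column names are not documented.
    credit_limit: float
    outstanding_balance: float
    overdue_amount: float
    n_loans: int
    n_delays_90_plus: int


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def engineer_features(c: ClientAggregates) -> dict[str, float]:
    """Compute the three example ratios under the assumed definitions."""
    return {
        "utilization_ratio": safe_ratio(c.outstanding_balance, c.credit_limit),
        "overdue_ratio": safe_ratio(c.overdue_amount, c.outstanding_balance),
        "delays_per_loan": safe_ratio(c.n_delays_90_plus, c.n_loans),
    }


# Made-up client used only to exercise the functions.
client = ClientAggregates(
    credit_limit=10_000.0,
    outstanding_balance=4_000.0,
    overdue_amount=1_000.0,
    n_loans=4,
    n_delays_90_plus=1,
)
features = engineer_features(client)
print(features)
```

Returning 0.0 for an empty denominator is one possible convention; a missing value would be an equally defensible choice, since tree-based models handle missing inputs natively.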


	Usage

	Installation

	```bash
	pip install -r requirements.txt
	```

	```python
	import joblib
	import polars as pl

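	# Load the trained pipeline. If the pickle contains the custom CreditEnsemble
	# class, its definition must be importable before loading.
	model = joblib.load("final_pipeline.pkl")

	# Load client data; columns must match the features used in training
	df = pl.read_parquet("client_data.parquet")

	# Predict class labels and class probabilities
	predictions = model.predict(df)
	probabilities = model.predict_proba(df)

	# Column 1 holds the probability of the positive class (default)
	print(f"Default probability: {probabilities[:, 1]}")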
	```


	```python
	from huggingface_hub import hf_hub_download
	import joblib

	# Download model
	model_path = hf_hub_download(
	    repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
	    filename="final_pipeline.pkl",
	)

	# Load and use
	model = joblib.load(model_path)
	```


	Engineering Solutions

	- Scalability: Polars for efficient Big Data processing
	- Class Imbalance: Stratified validation + `scale_pos_weight` (27.18)
	- Ensembling: Rank Averaging method for stability
	- Production Ready: Custom `CreditEnsemble` class compatible with `sklearn.pipeline`
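
Rank averaging replaces each model's raw scores with their ranks before blending, so the blend depends only on how each model orders the clients and not on how its probabilities are calibrated. The sketch below is a minimal pure-Python illustration using the 0.05 / 0.95 weights listed in the tech stack. The function names and toy scores are hypothetical, and the actual `CreditEnsemble` implementation may differ, for example in how it normalizes ranks or handles ties.

```python
def average_ranks(values):
    """Ranks in [1, n]; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_average(scores_by_model, weights):
    """Weighted mean of per-model ranks, scaled to the interval (0, 1]."""
    n = len(next(iter(scores_by_model.values())))
    total = sum(weights.values())
    blended = [0.0] * n
    for name, scores in scores_by_model.items():
        for i, r in enumerate(average_ranks(scores)):
            blended[i] += weights[name] / total * r / n
    return blended


# Toy scores for four clients, invented for illustration.
scores = {
    "catboost": [0.02, 0.30, 0.10, 0.90],
    "lightgbm": [0.01, 0.05, 0.20, 0.60],
}
weights = {"catboost": 0.05, "lightgbm": 0.95}
blended = rank_average(scores, weights)
print([round(b, 4) for b in blended])
```

The blended values are relative scores within the scored batch, not calibrated probabilities. That is sufficient for a ranking metric such as ROC-AUC, but a separate calibration step would be needed if absolute default probabilities are required.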


	Project Structure

	Credit_Risk_Prediction_Model_0.75/
	├── credit_risk_modeling.ipynb # Jupyter notebook with code
	├── final_pipeline.pkl # Trained model (90 MB)
	├── requirements.txt # Dependencies
	└── README.md # This file


	Links

	- GitHub Repository: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
	- Author: @maxdavinci2022