---
license: mit
language:
- en
- ru
pipeline_tag: tabular-classification
tags:
- credit-scoring
- catboost
- lightgbm
- polars
- tabular
- binary-classification
metrics:
- roc_auc
---

# Credit Risk Prediction

## Model Description

Machine learning model for predicting bank client defaults. It uses an ensemble of CatBoost and LightGBM with extensive feature engineering to assess credit risk.

## Business Context

A credit risk assessment system for the banking sector. The primary goal is to reduce bank losses by automating the prediction of client default probability.

## Model Performance

| Metric | Value |
|--------|-------|
| **ROC-AUC** | 0.7523 |
| **Target KPI** | 0.75 |
| **Status** | ✅ Achieved |

## Tech Stack

- **Language**: Python 3.10
- **Big Data Processing**: Polars (lazy loading)
- **Machine Learning**:
  - CatBoost (weight: 0.05)
  - LightGBM (weight: 0.95)
- **Infrastructure**: GPU acceleration (NVIDIA RTX 3050)
- **Tools**: Scikit-learn, Scipy, Pandas, Matplotlib, Seaborn

## Dataset

- **Records**: 3,000,000
- **Files**: 12 Parquet files
- **Size**: 4.5 GB
- **Class Imbalance**: 1:49 (2% positive class)

## Key Features

More than 170 engineered features, including:

- `utilization_ratio`: credit limit usage level
- `overdue_ratio`: share of overdue debt
- `delays_per_loan`: frequency of critical delays (90+ days)

## Usage

### Installation

```bash
pip install -r requirements.txt
```

Load the saved pipeline from a local file and score a Parquet file:

```python
import joblib
import polars as pl

# Load the trained pipeline
model = joblib.load("final_pipeline.pkl")

# Load client data
df = pl.read_parquet("client_data.parquet")

# Predict class labels and class probabilities
predictions = model.predict(df)
probabilities = model.predict_proba(df)

# Column 1 holds the probability of the positive (default) class
print(f"Default probability: {probabilities[:, 1]}")
```

Alternatively, download the pipeline from the Hugging Face Hub:

```python
from huggingface_hub import hf_hub_download
import joblib

# Download the model file from the Hub
model_path = hf_hub_download(
    repo_id="maxdavinci/Credit_Risk_Prediction_Model_0.75",
    filename="final_pipeline.pkl"
)

# Load and use
model = joblib.load(model_path)
```
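To make the ROC-AUC figure under Model Performance concrete: the metric equals the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as half. The sketch below computes it by brute-force pairwise comparison. The `pairwise_auc` helper and the sample labels and scores are made up for illustration and are not part of this repository; in practice `sklearn.metrics.roc_auc_score` computes the same quantity far more efficiently.

```python
def pairwise_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Hypothetical helper for illustration; it runs in O(n_pos * n_neg),
    so it is only suitable for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = default, 0 = no default.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05]
auc = pairwise_auc(labels, scores)
print(f"ROC-AUC: {auc:.3f}")  # 7 of 8 pairs ordered correctly -> 0.875
```

Because the metric depends only on how clients are ordered, any monotonic rescaling of the scores leaves it unchanged.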
## Engineering Solutions

- **Scalability**: Polars for efficient big data processing
- **Class Imbalance**: stratified validation plus `scale_pos_weight` (27.18)
- **Ensembling**: rank averaging for stability
- **Production Ready**: custom `CreditEnsemble` class compatible with `sklearn.pipeline`

## Project Structure

`Credit_Risk_Prediction_Model_0.75/`

- `credit_risk_modeling.ipynb`: Jupyter notebook with code
- `final_pipeline.pkl`: trained model (90 MB)
- `requirements.txt`: dependencies
- `README.md`: this file

## Links

- **GitHub Repository**: https://github.com/maxdavinci2022/Credit_Risk_Prediction_Model_0.75
- **Author**: @maxdavinci2022
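The rank averaging listed under Engineering Solutions can be sketched as follows. Each model's scores are converted to ranks (tied scores share their average rank), divided by the number of clients so they fall in (0, 1], and combined with the weights given in the Tech Stack (CatBoost 0.05, LightGBM 0.95). This is a dependency-free illustration under those assumptions, not the repository's `CreditEnsemble` code; the helper names and the sample scores are hypothetical. `scipy.stats.rankdata` produces the same average ranks if SciPy is available.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_average(score_lists, weights):
    """Weighted average of per-model ranks, each scaled to (0, 1]."""
    n = len(score_lists[0])
    total = sum(weights)
    blended = [0.0] * n
    for scores, w in zip(score_lists, weights):
        for idx, r in enumerate(average_ranks(scores)):
            blended[idx] += (w / total) * (r / n)
    return blended


# Hypothetical model outputs for four clients.
catboost_scores = [0.02, 0.30, 0.10, 0.30]
lightgbm_scores = [0.01, 0.20, 0.40, 0.05]
blended = rank_average([catboost_scores, lightgbm_scores], weights=[0.05, 0.95])
print([round(b, 5) for b in blended])
```

The blended value is a rank-based score suited to ordering clients by risk, not a calibrated default probability; it preserves what ROC-AUC measures while making the two models' outputs comparable regardless of their raw scales.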