---
title: SME Credit Risk AI
emoji: 🏦
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.55.0
app_file: app.py
pinned: true
license: mit
---

# 🏦 AI-Powered SME Credit Risk Assessment

**Final Project: AI Engineering Bootcamp Batch 10 | Author: 1na37**

> Explainable · Prescriptive · Multilingual Credit Intelligence

---

## ✨ Features

| Feature | Description |
|---------|-------------|
| 🤖 **Ensemble ML** | XGBoost + LightGBM + Random Forest (soft voting, 2:2:1 weights) |
| 🔍 **XAI Engine** | SHAP waterfall plot per applicant, showing *why* each decision was made |
| 💰 **Risk Metrics** | PD score + Expected Loss (EL = PD × LGD × EAD, Basel II) |
| 🔮 **What-If Sim** | Interactive slider simulation with real-time score updates |
| 💬 **LLM Narrative** | Automatic fallback chain: Gemini → Groq → static template |
| 🌐 **Multilingual** | Bahasa Indonesia / English / Hindi |

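Soft voting averages the probability of default predicted by each base model instead of counting hard class votes. The plain-Python sketch below shows the 2:2:1 weighting. The function name and the per-model probabilities are hypothetical, and it assumes each model has already produced its own PD. With scikit-learn, `VotingClassifier(voting="soft", weights=[2, 2, 1])` applies the same weighted average to the estimators' `predict_proba` outputs.

```python
from typing import Sequence


def soft_vote(probas: Sequence[float], weights: Sequence[float]) -> float:
    """Soft voting: weighted average of each model's probability of default."""
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(probas, weights)) / total


# Hypothetical PD outputs for one applicant, ordered XGBoost, LightGBM, RandomForest.
model_pds = [0.30, 0.20, 0.50]
# (2 * 0.30 + 2 * 0.20 + 1 * 0.50) / 5 = 1.5 / 5
ensemble_pd = soft_vote(model_pds, weights=[2, 2, 1])
print(f"Ensemble PD: {ensemble_pd:.2f}")  # Ensemble PD: 0.30
```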
---

## 📊 Dataset

- **Source**: German Credit Dataset (UCI ML Repository, id=144)
- **Rows**: 1,000 | **Original features**: 20 | **After augmentation**: 28
- **Augmentation**: 8 synthetic SME-relevant features generated in Python:
  - `digital_presence_score`, `has_social_media`, `ecommerce_monthly_volume`
  - `has_npwp` (Indonesian taxpayer ID), `has_siup` (trading business license), `business_age_years`, `monthly_cash_flow`, `num_employees`
- **Class balance**: SMOTE applied to the training set only, so no synthetic samples leak into the evaluation data

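Splitting before oversampling is what keeps synthetic rows out of the test data. Below is a minimal, dependency-free sketch of SMOTE (each new sample lies on the segment between a minority-class row and one of its k nearest minority-class neighbours) applied only after the split. The helper names `smote` and `oversample_training_set`, `k=5`, and the toy data are assumptions for illustration; the `ImbPipeline` mentioned under Evaluation suggests the project itself uses imbalanced-learn's `SMOTE` rather than a hand-rolled version.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new points between minority samples and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples (Euclidean).
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_training_set(X_train, y_train, minority_label=1, seed=0):
    """Balance classes by appending SMOTE samples of the minority class (training data only)."""
    minority = [x for x, label in zip(X_train, y_train) if label == minority_label]
    n_majority = sum(1 for label in y_train if label != minority_label)
    new_points = smote(minority, max(0, n_majority - len(minority)), seed=seed)
    return X_train + new_points, y_train + [minority_label] * len(new_points)


# Toy data: two numeric features, label 1 = default (20 of 100 rows).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
y = [1 if i % 5 == 0 else 0 for i in range(100)]

# 1) Split first ...
order = list(range(len(X)))
rng.shuffle(order)
train_idx, test_idx = order[:80], order[80:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# 2) ... then oversample the training split only; the test split stays untouched.
X_res, y_res = oversample_training_set(X_train, y_train)
print(f"train before: {y_train.count(1)} defaults / {y_train.count(0)} non-defaults")
print(f"train after:  {y_res.count(1)} defaults / {y_res.count(0)} non-defaults")
print(f"test (unchanged): {len(X_test)} rows")
```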
---

## 🏗️ Architecture

```
Input (28 features)
    ↓
One-Hot Encoding + StandardScaler + SMOTE (train only)
    ↓
┌─────────────────────────────────────┐
│   Soft Voting Ensemble (2:2:1)      │
│   ├── XGBoost      (n=300, lr=0.05) │
│   ├── LightGBM     (n=300, lr=0.05) │
│   └── RandomForest (n=200)          │
└─────────────────────────────────────┘
    ↓
PD Score → EL = PD × LGD (40%) × EAD
    ↓
SHAP TreeExplainer → Waterfall Plot
    ↓
LLM Narrative (Gemini → Groq → Fallback)
```

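The expected-loss step multiplies the ensemble PD by the fixed 40% LGD and by the exposure at default. A short sketch, assuming EAD equals the full loan amount requested (the function name and figures are hypothetical): for a PD of 12% on an exposure of 100,000,000, EL = 0.12 × 0.40 × 100,000,000 = 4,800,000.

```python
def expected_loss(pd_score: float, ead: float, lgd: float = 0.40) -> float:
    """Expected loss EL = PD x LGD x EAD, with LGD fixed at 40% by default."""
    if not (0.0 <= pd_score <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("PD and LGD must lie in [0, 1]")
    if ead < 0:
        raise ValueError("EAD cannot be negative")
    return pd_score * lgd * ead


# Hypothetical applicant: 12% PD on an exposure of 100,000,000 (any currency unit).
el = expected_loss(pd_score=0.12, ead=100_000_000)
print(f"Expected loss: {el:,.0f}")  # Expected loss: 4,800,000
```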
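Every SHAP waterfall plot follows the same bookkeeping: it starts from the explainer's base value (the model's average output), adds one signed contribution per feature, and ends at the prediction for that applicant, because SHAP values sum to the difference between the two. The sketch below rebuilds those rows from a base value and a dict of SHAP values, listing features by absolute impact as `shap.plots.waterfall` does. The numbers are invented, and it assumes the values are in log-odds (the default raw output of `TreeExplainer` for an XGBoost classifier), so the end point is mapped to a probability with the logistic function.

```python
import math
from typing import Dict, List, Tuple


def waterfall_rows(base_value: float, shap_values: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (feature, contribution, running total) rows, largest |contribution| first."""
    rows = []
    running = base_value
    for name, value in sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True):
        running += value
        rows.append((name, value, running))
    return rows


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical explanation for one applicant; positive values push towards default.
base_value = -0.85  # assumed explainer expected value, in log-odds
contributions = {
    "monthly_cash_flow": -0.60,
    "business_age_years": -0.25,
    "ecommerce_monthly_volume": 0.35,
    "has_npwp": -0.10,
}
rows = waterfall_rows(base_value, contributions)
for name, value, total in rows:
    print(f"{name:<26}{value:+.2f}  -> {total:+.2f}")

# Local accuracy: base value + sum of contributions = model output for this applicant.
final_log_odds = rows[-1][2]
print(f"f(x) = {final_log_odds:+.2f} log-odds, PD = {sigmoid(final_log_odds):.1%}")
```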
---

## 📈 Evaluation

- **ROC-AUC**: primary metric (industry target: > 0.75)
- **KS statistic**: maximum separation between the score distributions of good and bad credits
- **F1-score**: balances precision and recall on imbalanced data
- **5-fold cross-validation**: SMOTE runs inside `ImbPipeline`, so it is fit only on each training fold (no leakage)

> **Note:** The model shows perfect performance because the synthetic features were designed to correlate strongly with the target, simulating ideal conditions. In a real deployment, these features would be replaced with actual data from external sources such as e-commerce APIs, tax records, and bank statements.

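Both ranking metrics can be computed straight from predicted PDs. The plain-Python sketch below uses the rank form of ROC-AUC (the probability that a randomly chosen defaulter scores above a randomly chosen non-defaulter, ties counted as half) and defines KS as the largest gap between the two classes' empirical score CDFs. The scores are made up; in practice `sklearn.metrics.roc_auc_score` and the statistic returned by `scipy.stats.ks_2samp` compute the same quantities.

```python
from typing import List, Sequence, Tuple


def split_by_class(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Separate scores of defaulters (label 1) from non-defaulters (label 0)."""
    bad = [s for s, y in zip(scores, y_true) if y == 1]
    good = [s for s, y in zip(scores, y_true) if y == 0]
    if not bad or not good:
        raise ValueError("both classes must be present")
    return bad, good


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random defaulter scores higher than random non-defaulter), ties counted as 1/2."""
    bad, good = split_by_class(y_true, scores)
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Largest vertical gap between the empirical CDFs of bad and good scores."""
    bad, good = split_by_class(y_true, scores)
    gap = 0.0
    for t in sorted(set(scores)):  # empirical CDFs only change at observed scores
        cdf_bad = sum(s <= t for s in bad) / len(bad)
        cdf_good = sum(s <= t for s in good) / len(good)
        gap = max(gap, abs(cdf_bad - cdf_good))
    return gap


# Hypothetical PD scores for 3 defaulters and 5 non-defaulters.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.70, 0.40, 0.50, 0.30, 0.20, 0.10, 0.05]
print(f"ROC-AUC: {roc_auc(y_true, scores):.3f}")  # ROC-AUC: 0.933
print(f"KS:      {ks_statistic(y_true, scores):.3f}")  # KS:      0.800
```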
---

## 🚀 Run Locally

```bash
git clone https://huggingface.co/spaces/1na37/sme-credit-risk
cd sme-credit-risk
pip install -r requirements.txt

# Train the model first by running train_colab.py in Google Colab,
# then download model_export.zip and extract it into the model/ folder.

streamlit run app.py
```

---

## 🔑 API Keys (Optional)

In your HF Space, open **Settings → Repository Secrets** and add:

- `GEMINI_API_KEY` from [Google AI Studio](https://aistudio.google.com) (free tier)
- `GROQ_API_KEY` from [console.groq.com](https://console.groq.com) (free tier)

The app also works without API keys: it falls back to a static narrative template.

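The fallback behaviour can be sketched as a loop over providers that skips any whose secret is unset or whose call fails, ending at a static template in the user's language. Hugging Face Spaces exposes repository secrets to the running app as environment variables, so the sketch reads `GEMINI_API_KEY` and `GROQ_API_KEY` with `os.environ`. The provider functions are hypothetical stubs standing in for the Gemini and Groq SDK calls, and the template wording and language codes (`id`, `en`, `hi`) are assumptions, not the app's actual text.

```python
import os
from typing import Callable, List, Tuple

# Illustrative static templates per UI language (assumed wording, not the app's text).
STATIC_TEMPLATES = {
    "id": "Probabilitas gagal bayar: {pd:.1%}. Estimasi kerugian: {el:,.0f}.",
    "en": "Probability of default: {pd:.1%}. Expected loss: {el:,.0f}.",
    "hi": "डिफ़ॉल्ट की संभावना: {pd:.1%}. अपेक्षित हानि: {el:,.0f}.",
}

# (provider name, secret / env var name, function(prompt, api_key) -> text)
Provider = Tuple[str, str, Callable[[str, str], str]]


def generate_narrative(prompt: str, pd_score: float, el: float, lang: str,
                       providers: List[Provider]) -> Tuple[str, str]:
    """Try each LLM provider in order; fall back to a static template."""
    for name, env_var, call in providers:
        api_key = os.environ.get(env_var)
        if not api_key:
            continue  # secret not configured: skip this provider
        try:
            text = call(prompt, api_key)
        except Exception:
            continue  # network, quota, or API error: try the next provider
        if text:
            return name, text
    template = STATIC_TEMPLATES.get(lang, STATIC_TEMPLATES["en"])
    return "static", template.format(pd=pd_score, el=el)


def call_gemini(prompt: str, api_key: str) -> str:
    # Hypothetical stub: the real app would call the Gemini SDK here.
    raise NotImplementedError


def call_groq(prompt: str, api_key: str) -> str:
    # Hypothetical stub: the real app would call the Groq SDK here.
    raise NotImplementedError


PROVIDERS: List[Provider] = [
    ("gemini", "GEMINI_API_KEY", call_gemini),
    ("groq", "GROQ_API_KEY", call_groq),
]

# With the stubs above, both providers are skipped or fail, so the static template is used.
source, narrative = generate_narrative("Explain this credit decision.", 0.12, 4_800_000, "en", PROVIDERS)
print(source, "->", narrative)  # static -> Probability of default: 12.0%. Expected loss: 4,800,000.
```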
---

## 📚 References

- [German Credit Dataset (UCI ML Repository)](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data)
- [SHAP Library](https://shap.readthedocs.io/)
- Basel II credit risk framework (expected loss EL = PD × LGD × EAD; this project assumes a fixed LGD of 40%)

---

*Built with ❤️ for AI Engineering Bootcamp Batch 10*