---
title: SME Credit Risk AI
emoji: 🏦
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.55.0
app_file: app.py
pinned: true
license: mit
---
# ๐Ÿฆ AI-Powered SME Credit Risk Assessment
**Final Project โ€” AI Engineering Bootcamp Batch 10 | Author: 1na37**
> Explainable ยท Prescriptive ยท Multilingual Credit Intelligence
---
## ✨ Features
| Feature | Description |
|---------|-------------|
| 🤖 **Ensemble ML** | XGBoost + LightGBM + Random Forest (soft voting, 2:2:1 weights) |
| 🔍 **XAI Engine** | SHAP waterfall plot per applicant, explaining *why* each decision was made |
| 💰 **Risk Metrics** | PD Score + Expected Loss (PD × LGD × EAD, Basel II) |
| 🎮 **What-If Sim** | Interactive slider simulation, real-time score update |
| 💬 **LLM Narrative** | Gemini → Groq → Static fallback (auto chain) |
| 🌐 **Multilingual** | Bahasa Indonesia / English / Hindi |
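
The Expected Loss figure from the Risk Metrics row can be sketched as below. This is a minimal illustration rather than the app's actual code: the function name `expected_loss` and the sample applicant are hypothetical, and the 40% LGD default is the value stated in the architecture diagram of this README.

```python
def expected_loss(pd_score: float, ead: float, lgd: float = 0.40) -> float:
    """Expected Loss = PD x LGD x EAD, in the same currency unit as EAD."""
    if not 0.0 <= pd_score <= 1.0:
        raise ValueError("pd_score must be a probability in [0, 1]")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("lgd must be a fraction in [0, 1]")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return pd_score * lgd * ead


# Hypothetical applicant: 12% probability of default on a 50,000,000 exposure.
loss = expected_loss(pd_score=0.12, ead=50_000_000)
print(f"Expected loss: {loss:,.0f}")
```

Because LGD is held fixed, the expected loss scales linearly with both the PD score and the exposure, which is what makes it easy to recompute in a what-if simulation.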
---
## 📊 Dataset
- **Source**: German Credit Dataset (UCI ML Repository, id=144)
- **Rows**: 1,000 | **Original features**: 20 | **After augmentation**: 28
- **Augmentation**: Synthetic SME-relevant features added via Python:
  - `digital_presence_score`, `has_social_media`, `ecommerce_monthly_volume`
  - `has_npwp`, `has_siup`, `business_age_years`, `monthly_cash_flow`, `num_employees`
- **Class balance**: SMOTE applied on training set only (no data leakage)
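
To illustrate why oversampling only the training set avoids leakage, here is a dependency-free sketch. It is a simplified SMOTE-style interpolation (nearest single neighbour, toy 2-D data), not the imbalanced-learn implementation the project uses; the helper names `split_train_test` and `oversample_minority` and all data are hypothetical.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle and split BEFORE any oversampling so test rows stay untouched."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def oversample_minority(train_rows, minority_label=1, seed=42):
    """Simplified SMOTE-style step: interpolate between a minority row and its
    nearest minority neighbour until both classes have the same size."""
    rng = random.Random(seed)
    minority = [x for x, y in train_rows if y == minority_label]
    majority_count = len(train_rows) - len(minority)
    synthetic = []
    while len(minority) >= 2 and len(minority) + len(synthetic) < majority_count:
        base = rng.choice(minority)
        neighbour = min(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )
        gap = rng.random()  # position along the segment base -> neighbour
        point = tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        synthetic.append((point, minority_label))
    return train_rows + synthetic


# Toy imbalanced data: 70 "good" (0) and 30 "bad" (1) applicants.
data_rng = random.Random(0)
rows = [((data_rng.gauss(0, 1), data_rng.gauss(0, 1)), 0) for _ in range(70)]
rows += [((data_rng.gauss(2, 1), data_rng.gauss(2, 1)), 1) for _ in range(30)]

train, test = split_train_test(rows)
balanced_train = oversample_minority(train)  # the test set is never resampled
```

Splitting first means every synthetic point is derived from training rows alone, so the held-out rows remain genuinely unseen at evaluation time.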
---
## ๐Ÿ—๏ธ Architecture
```
Input (28 features)
        ↓
One-Hot Encoding + StandardScaler + SMOTE (train only)
        ↓
┌─────────────────────────────────────┐
│  Soft Voting Ensemble (2:2:1)       │
│  ├── XGBoost      (n=300, lr=0.05)  │
│  ├── LightGBM     (n=300, lr=0.05)  │
│  └── RandomForest (n=200)           │
└─────────────────────────────────────┘
        ↓
PD Score → EL = PD × LGD(40%) × EAD
        ↓
SHAP TreeExplainer → Waterfall Plot
        ↓
LLM Narrative (Gemini → Groq → Fallback)
```
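
The soft-voting step in the diagram amounts to a weighted average of each model's predicted default probability. The sketch below shows that arithmetic only; the function name `soft_vote`, the dictionary keys, and the per-model probabilities are hypothetical, and the mapping of the 2:2:1 weights to XGBoost, LightGBM, and Random Forest assumes they follow the order listed above.

```python
def soft_vote(probabilities: dict, weights: dict) -> float:
    """Weighted average of per-model default probabilities."""
    total_weight = sum(weights[name] for name in probabilities)
    weighted_sum = sum(probabilities[name] * weights[name] for name in probabilities)
    return weighted_sum / total_weight


# Assumed assignment of the 2:2:1 weights, following the listing order.
WEIGHTS = {"xgboost": 2, "lightgbm": 2, "random_forest": 1}

# Hypothetical per-model PD outputs for a single applicant.
model_pd = {"xgboost": 0.30, "lightgbm": 0.20, "random_forest": 0.50}

pd_score = soft_vote(model_pd, WEIGHTS)
print(f"Ensemble PD: {pd_score:.2f}")
```

With these weights the two boosted models together contribute 80% of the final score, so a single outlying Random Forest prediction moves the ensemble PD only modestly.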
---
## 📈 Evaluation
- **ROC-AUC**: primary metric (industry target: >0.75)
- **KS Statistic**: separation between good/bad credit distributions
- **F1-Score**: balanced precision/recall for imbalanced data
- **5-Fold Cross-Validation**: SMOTE inside ImbPipeline (no leakage)

> The model shows perfect performance because the synthetic features were designed to correlate strongly with the target, as a simulation of ideal conditions. In a real deployment, these features would be replaced with actual data from external sources such as e-commerce APIs, tax records, and bank statements.

![Evaluation Report](evalsme.png)
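
For readers unfamiliar with the KS statistic listed among the metrics, the following sketch computes it as the largest gap between the cumulative score distributions of bad and good applicants. It is an illustrative, dependency-free version; the function name `ks_statistic` and the sample scores are hypothetical and unrelated to the reported results.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of
    bad (label 1) and good (label 0) applicants."""
    bad_total = sum(labels)
    good_total = len(labels) - bad_total
    if bad_total == 0 or good_total == 0:
        raise ValueError("need at least one good and one bad applicant")

    ordered = sorted(zip(scores, labels))
    bad_seen = good_seen = 0
    best = 0.0
    for i, (score, label) in enumerate(ordered):
        if label == 1:
            bad_seen += 1
        else:
            good_seen += 1
        # Compare the two CDFs only after all ties at this score are counted.
        if i + 1 < len(ordered) and ordered[i + 1][0] == score:
            continue
        best = max(best, abs(bad_seen / bad_total - good_seen / good_total))
    return best


# Hypothetical PD scores; label 1 marks applicants who defaulted.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(f"KS = {ks_statistic(scores, labels):.2f}")
```

A KS of 1.0 means the score separates the two groups completely, while values near 0 mean the distributions overlap almost entirely.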
---
## 🚀 Run Locally
```bash
git clone https://huggingface.co/spaces/1na37/sme-credit-risk
cd sme-credit-risk
pip install -r requirements.txt
# Train model first (run train_colab.py in Google Colab)
# Download model_export.zip, extract to model/ folder
streamlit run app.py
```
---
## 🔑 API Keys (Optional)
In HF Spaces → **Settings → Repository Secrets**:
- `GEMINI_API_KEY`: [Google AI Studio](https://aistudio.google.com) (free tier)
- `GROQ_API_KEY`: [console.groq.com](https://console.groq.com) (free tier)

The app works without API keys by falling back to a static narrative template.
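
The Gemini → Groq → static fallback order can be pictured with the sketch below. It shows only the control flow: `call_gemini` and `call_groq` are hypothetical placeholders rather than real SDK calls, and `generate_narrative`, `static_narrative`, and the prompt text are likewise assumptions, not the app's actual code.

```python
import os


def call_gemini(prompt, api_key):
    """Hypothetical placeholder for a Gemini request."""
    raise NotImplementedError


def call_groq(prompt, api_key):
    """Hypothetical placeholder for a Groq request."""
    raise NotImplementedError


def static_narrative(pd_score):
    """Template text used when no LLM provider is available."""
    return (
        f"Estimated probability of default: {pd_score:.0%}. "
        "No LLM provider was available, so this is a template summary."
    )


def generate_narrative(pd_score, providers, env=None):
    """Try each (env var name, callable) pair in order, then fall back."""
    env = os.environ if env is None else env
    prompt = f"Explain a credit decision with PD = {pd_score:.2f}."
    for key_name, call in providers:
        api_key = env.get(key_name)
        if not api_key:
            continue  # key not configured: try the next provider
        try:
            return call(prompt, api_key)
        except Exception:
            continue  # provider failed: fall through to the next one
    return static_narrative(pd_score)


PROVIDERS = [("GEMINI_API_KEY", call_gemini), ("GROQ_API_KEY", call_groq)]

# With no keys configured, the static template is returned.
print(generate_narrative(0.12, PROVIDERS, env={}))
```

Keeping the providers in an ordered list makes the chain easy to extend or reorder without touching the fallback logic.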
---
## 📚 References
- [German Credit Dataset, UCI ML Repository](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data)
- [SHAP Library](https://shap.readthedocs.io/)
- Basel II Credit Risk Framework (this project assumes a fixed LGD of 40%)
---
*Built with ❤️ for AI Engineering Bootcamp Batch 10*