---
title: SME Credit Risk AI
emoji: 🏦
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.55.0
app_file: app.py
pinned: true
license: mit
---

# 🏦 AI-Powered SME Credit Risk Assessment

**Final Project: AI Engineering Bootcamp Batch 10 | Author: 1na37**

> Explainable · Prescriptive · Multilingual Credit Intelligence

---

## ✨ Features

| Feature | Description |
|---------|-------------|
| 🤖 **Ensemble ML** | XGBoost + LightGBM + Random Forest (soft voting, 2:2:1 weights) |
| 🔍 **XAI Engine** | SHAP waterfall plot per applicant, showing *why* each decision was made |
| 💰 **Risk Metrics** | PD Score + Expected Loss (PD × LGD × EAD, Basel II) |
| 🎮 **What-If Sim** | Interactive slider simulation with real-time score updates |
| 💬 **LLM Narrative** | Gemini → Groq → static fallback (automatic chain) |
| 🌐 **Multilingual** | Bahasa Indonesia / English / Hindi |

---

## 📊 Dataset

- **Source**: German Credit Dataset (UCI ML Repository, id=144)
- **Rows**: 1,000 | **Original features**: 20 | **After augmentation**: 28
- **Augmentation**: Synthetic SME-relevant features added via Python:
  - `digital_presence_score`, `has_social_media`, `ecommerce_monthly_volume`
  - `has_npwp`, `has_siup`, `business_age_years`, `monthly_cash_flow`, `num_employees`
- **Class balance**: SMOTE applied on the training set only (no data leakage)

---

## 🏗️ Architecture

```
Input (28 features)
        ↓
One-Hot Encoding + StandardScaler + SMOTE (train only)
        ↓
┌─────────────────────────────────────┐
│  Soft Voting Ensemble (2:2:1)       │
│  ├── XGBoost   (n=300, lr=0.05)     │
│  ├── LightGBM  (n=300, lr=0.05)     │
│  └── RandomForest (n=200)           │
└─────────────────────────────────────┘
        ↓
PD Score → EL = PD × LGD(40%) × EAD
        ↓
SHAP TreeExplainer → Waterfall Plot
        ↓
LLM Narrative (Gemini → Groq → Fallback)
```

---

## 📈 Evaluation

- **ROC-AUC**: primary metric (industry target: >0.75)
- **KS Statistic**: separation between the good and bad credit score distributions
- **F1-Score**: balances precision and recall on imbalanced data
- **5-Fold Cross-Validation**: SMOTE runs inside `ImbPipeline`, so it is fit on the training folds only (no leakage)

> The model shows perfect performance because the synthetic features were designed to correlate strongly with the target, as a simulation of ideal conditions. In a real deployment, these features would be replaced with actual data from external sources such as e-commerce APIs, tax records, and bank statements.

![Evaluation Report](evalsme.png)

---

## 🚀 Run Locally

```bash
git clone https://huggingface.co/spaces/1na37/sme-credit-risk
cd sme-credit-risk
pip install -r requirements.txt

# Train model first (run train_colab.py in Google Colab)
# Download model_export.zip, extract to model/ folder

streamlit run app.py
```

---

## 🔑 API Keys (Optional)

In HF Spaces → **Settings → Repository Secrets**:

- `GEMINI_API_KEY`: [Google AI Studio](https://aistudio.google.com) (free tier)
- `GROQ_API_KEY`: [console.groq.com](https://console.groq.com) (free tier)

The app works without API keys by using a static fallback template.

---

## 📚 References

- [German Credit Dataset (UCI ML Repository)](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data)
- [SHAP Library](https://shap.readthedocs.io/)
- Basel II Credit Risk Framework (this project assumes a fixed LGD of 40%)

---

*Built with ❤️ for AI Engineering Bootcamp Batch 10*
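
---

## 🧮 Expected Loss Example

The Risk Metrics row and the architecture diagram both state EL = PD × LGD × EAD with LGD fixed at 40%. The arithmetic can be sketched as below. This is a minimal illustration, not the app's actual code: the function name `expected_loss`, its parameter names, and the sample applicant figures are all hypothetical.

```python
def expected_loss(pd_score: float, ead: float, lgd: float = 0.40) -> float:
    """Return Expected Loss = PD x LGD x EAD.

    pd_score: probability of default in [0, 1] (e.g. the ensemble's output).
    ead:      exposure at default, in currency units (e.g. the loan amount).
    lgd:      loss given default in [0, 1]; 0.40 mirrors the README's assumption.
    """
    if not 0.0 <= pd_score <= 1.0:
        raise ValueError("pd_score must be between 0 and 1")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("lgd must be between 0 and 1")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return pd_score * lgd * ead


if __name__ == "__main__":
    # Hypothetical applicant: 12% default probability on a 50,000,000 loan.
    el = expected_loss(pd_score=0.12, ead=50_000_000)
    print(f"Expected loss: {el:,.0f}")  # prints "Expected loss: 2,400,000"
```

Because LGD is held constant, the expected loss scales linearly with both the PD score and the exposure, so halving either one halves the result.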