metadata

title: SME Credit Risk AI
emoji: 🏦
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.55.0
app_file: app.py
pinned: true
license: mit

🏦 AI-Powered SME Credit Risk Assessment

Final Project — AI Engineering Bootcamp Batch 10 | Author: 1na37

Explainable · Prescriptive · Multilingual Credit Intelligence

✨ Features

Feature	Description
🤖 Ensemble ML	XGBoost + LightGBM + Random Forest (soft voting, 2:2:1 weights)
🔍 XAI Engine	SHAP waterfall plot per applicant, explaining why each decision was made
💰 Risk Metrics	PD Score + Expected Loss (PD × LGD × EAD, Basel II)
🎮 What-If Sim	Interactive slider simulation, real-time score update
💬 LLM Narrative	Gemini → Groq → Static fallback (auto chain)
🌍 Multilingual	Bahasa Indonesia / English / Hindi
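
To make the 2:2:1 soft voting concrete, here is a minimal sketch in plain Python. The function name `soft_vote` and the three input probabilities are hypothetical and not taken from the project code; scikit-learn's `VotingClassifier` with `voting="soft"` and `weights` performs the same kind of weighted averaging over predicted class probabilities.

```python
from typing import Sequence


def soft_vote(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of per-model default probabilities."""
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(probabilities, weights)) / total_weight


# Hypothetical default probabilities for one applicant, in the order
# XGBoost, LightGBM, Random Forest (illustrative values only).
model_probs = [0.30, 0.40, 0.10]
ensemble_pd = soft_vote(model_probs, weights=[2, 2, 1])
print(round(ensemble_pd, 2))  # prints 0.3
```

With these weights the two boosted models each count twice as much as the Random Forest, so a single outlying forest prediction moves the ensemble score only slightly.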

📊 Dataset

Source: German Credit Dataset (UCI ML Repository, id=144)
Rows: 1,000 | Original features: 20 | After augmentation: 28
Augmentation: Synthetic SME-relevant features added via Python:
- digital_presence_score, has_social_media, ecommerce_monthly_volume
- has_npwp, has_siup, business_age_years, monthly_cash_flow, num_employees
Class balance: SMOTE applied on training set only (no data leakage)
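
The "training set only" rule can be illustrated with a small standard-library sketch: split first, then oversample the minority class inside the training split, so no synthetic row is derived from test data. This is a simplified stand-in, not the project's code. Real SMOTE interpolates toward k-nearest minority neighbours, while this sketch interpolates between random minority pairs. The function name, the toy data, and the assumption that label 1 marks the minority (bad credit) class are all hypothetical.

```python
import random


def split_then_oversample(rows, labels, test_fraction=0.2, seed=42):
    """Split into train/test, then balance classes in the training split only."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    train_x = [rows[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [rows[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]

    # Only original minority rows from the training split act as seeds.
    minority = [x for x, y in zip(train_x, train_y) if y == 1]
    n_majority = sum(1 for y in train_y if y == 0)

    # Add interpolated minority rows until both classes have equal counts.
    while len(minority) >= 2 and sum(train_y) < n_majority:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        train_x.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        train_y.append(1)

    return train_x, train_y, test_x, test_y


# Toy data: 100 rows with a 70/30 class split (values are illustrative).
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [1 if i % 10 < 3 else 0 for i in range(100)]
train_x, train_y, test_x, test_y = split_then_oversample(rows, labels)
print(train_y.count(0), train_y.count(1), len(test_x))
```

The test split keeps its original class ratio, which is what makes the reported metrics comparable to real-world performance.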

🏗️ Architecture

Input (28 features)
    ↓
One-Hot Encoding + StandardScaler + SMOTE (train only)
    ↓
┌─────────────────────────────────────┐
│  Soft Voting Ensemble (2:2:1)       │
│  ├── XGBoost  (n=300, lr=0.05)     │
│  ├── LightGBM (n=300, lr=0.05)     │
│  └── RandomForest (n=200)           │
└─────────────────────────────────────┘
    ↓
PD Score → EL = PD × LGD(40%) × EAD
    ↓
SHAP TreeExplainer → Waterfall Plot
    ↓
LLM Narrative (Gemini → Groq → Fallback)
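
The Expected Loss step in the diagram is a single multiplication. A minimal sketch, assuming the fixed 40% LGD shown above; the function name `expected_loss` and the example figures are hypothetical:

```python
def expected_loss(pd_score: float, ead: float, lgd: float = 0.40) -> float:
    """Expected Loss = PD x LGD x EAD.

    pd_score: probability of default from the ensemble, in [0, 1]
    ead:      exposure at default (outstanding amount, any currency)
    lgd:      loss given default, in [0, 1]; 0.40 mirrors the diagram above
    """
    if not 0.0 <= pd_score <= 1.0:
        raise ValueError("pd_score must be between 0 and 1")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("lgd must be between 0 and 1")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return pd_score * lgd * ead


# Hypothetical applicant: 25% PD on an exposure of 100,000,000 currency units.
el = expected_loss(pd_score=0.25, ead=100_000_000)
print(f"{el:,.0f}")  # prints 10,000,000
```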

📈 Evaluation

ROC-AUC — primary metric (industry target: >0.75)
KS Statistic — separation between good/bad credit distributions
F1-Score — balanced precision/recall for imbalanced data
5-Fold Cross-Validation — SMOTE inside ImbPipeline (no leakage)

Note: The model shows perfect performance because the synthetic features were designed to correlate strongly with the target, as a simulation of ideal conditions. In a real deployment, these features would be replaced with actual data from external sources such as e-commerce APIs, tax records, and bank statements.
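
For readers unfamiliar with the KS statistic listed above, the following standard-library sketch computes it as the largest gap between the cumulative score distributions of bad and good applicants. The function name and sample scores are hypothetical, and the label convention (1 = bad credit, 0 = good credit) is an assumption.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of the two classes."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every entry tied at this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


# Illustrative PD scores with one good applicant scored above one bad applicant.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(ks_statistic(scores, labels))  # prints 0.75
```

A value of 1.0 means the two score distributions do not overlap at all, which is the situation the perfect results on synthetic features correspond to.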

🚀 Run Locally

git clone https://huggingface.co/spaces/1na37/sme-credit-risk
cd sme-credit-risk
pip install -r requirements.txt

# Train model first (run train_colab.py in Google Colab)
# Download model_export.zip, extract to model/ folder

streamlit run app.py

🔑 API Keys (Optional)

In HF Spaces → Settings → Repository Secrets:

GEMINI_API_KEY — Google AI Studio (free tier)
GROQ_API_KEY — console.groq.com (free tier)

The app works without API keys by using the static fallback template.
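
The Gemini → Groq → static chain can be sketched as a loop over providers that skips any provider whose key is missing and moves on when a call fails. This is an illustration under assumptions, not the app's actual code: `generate_narrative`, `static_template`, and the two `call_*` stubs are hypothetical names, and the stubs stand in for real client calls.

```python
import os
from typing import Callable, Optional, Sequence, Tuple

# (provider name, environment variable holding its key or None, callable)
Provider = Tuple[str, Optional[str], Callable[[str], str]]


def static_template(summary: str) -> str:
    """Last-resort narrative that needs no API key."""
    return f"Assessment summary: {summary}"


def generate_narrative(summary: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; fall back to the static template."""
    for name, key_env, call in providers:
        if key_env is not None and not os.environ.get(key_env):
            continue  # key not configured, try the next provider
        try:
            return name, call(summary)
        except Exception:
            continue  # provider failed, try the next one
    return "static", static_template(summary)


def call_gemini(summary: str) -> str:
    # Stub: a real implementation would call the Gemini API here.
    raise RuntimeError("stub: no real Gemini client in this sketch")


def call_groq(summary: str) -> str:
    # Stub: a real implementation would call the Groq API here.
    raise RuntimeError("stub: no real Groq client in this sketch")


chain = [
    ("gemini", "GEMINI_API_KEY", call_gemini),
    ("groq", "GROQ_API_KEY", call_groq),
]
used, text = generate_narrative("PD 0.30, high risk", chain)
print(used, "->", text)
```

Because both stubs raise, this run always ends at the static template, whether or not the keys are set.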

📚 References

German Credit Dataset — UCI ML Repository
SHAP Library
Basel II Credit Risk Framework (this project assumes a fixed LGD of 40%)

Built with ❤️ for AI Engineering Bootcamp Batch 10