	---
	title: SME Credit Risk AI
	emoji: 🏦
	colorFrom: indigo
	colorTo: blue
	sdk: streamlit
	sdk_version: 1.55.0
	app_file: app.py
	pinned: true
	license: mit
	---

	# 🏦 AI-Powered SME Credit Risk Assessment

	Final Project for AI Engineering Bootcamp Batch 10 | Author: 1na37

	> Explainable · Prescriptive · Multilingual Credit Intelligence

	---

	## ✨ Features

	| Feature | Description |
	|---------|-------------|
	| 🤖 Ensemble ML | XGBoost + LightGBM + Random Forest (soft voting, 2:2:1 weights) |
	| 🔍 XAI Engine | Per-applicant SHAP waterfall plot explaining each decision |
	| 💰 Risk Metrics | PD score + Expected Loss (PD × LGD × EAD, Basel II) |
	| 🎮 What-If Simulation | Interactive sliders with real-time score updates |
	| 💬 LLM Narrative | Automatic fallback chain: Gemini → Groq → static template |
	| 🌍 Multilingual | Bahasa Indonesia / English / Hindi |
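
Soft voting with 2:2:1 weights means the ensemble's probability of default is the weighted mean of each model's predicted probability. The minimal sketch below assumes each model has already produced P(default) for one applicant; the `soft_vote` helper and the probability values are hypothetical, chosen only to show the arithmetic.

```python
def soft_vote(probs: dict[str, float], weights: dict[str, float]) -> float:
    """Soft voting: weighted mean of each model's predicted P(default)."""
    total_weight = sum(weights[name] for name in probs)
    return sum(weights[name] * p for name, p in probs.items()) / total_weight


# 2:2:1 weights, as listed in the feature table
WEIGHTS = {"xgboost": 2, "lightgbm": 2, "random_forest": 1}

# Hypothetical per-model P(default) for one applicant
probs = {"xgboost": 0.30, "lightgbm": 0.20, "random_forest": 0.50}

print(f"Ensemble PD: {soft_vote(probs, WEIGHTS):.2f}")  # → Ensemble PD: 0.30
```

With these weights the Random Forest carries one fifth of the vote, so its outlying 0.50 only nudges the result. In scikit-learn, `VotingClassifier(estimators, voting="soft", weights=[2, 2, 1])` performs the same averaging over the models' `predict_proba` outputs.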

	---

	## 📊 Dataset

	- Source: German Credit Dataset (UCI ML Repository, id=144)
	- Rows: 1,000 | Original features: 20 | After augmentation: 28
	- Augmentation: eight synthetic SME-relevant features generated in Python:
	  - `digital_presence_score`, `has_social_media`, `ecommerce_monthly_volume`
	  - `has_npwp` (Indonesian taxpayer ID), `has_siup` (Indonesian trading business licence), `business_age_years`, `monthly_cash_flow`, `num_employees`
	- Class imbalance: SMOTE applied to the training set only, to avoid data leakage
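
SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. The stdlib-only sketch below is a simplified illustration, not the imbalanced-learn implementation behind the `ImbPipeline` mentioned in the evaluation section: it keeps the test rows out of oversampling entirely, which is what "training set only" protects. The `smote_oversample` helper, `k=2`, and the toy rows are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    random minority row and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: (features, label), label 1 = bad credit (minority)
train = [([1.0, 2.0], 1), ([1.5, 1.8], 1), ([2.0, 2.2], 1),
         ([5.0, 5.0], 0), ([5.5, 4.8], 0), ([6.0, 5.2], 0),
         ([6.2, 5.5], 0), ([5.8, 5.9], 0)]
test = [([1.2, 2.4], 1), ([6.5, 6.0], 0)]  # held out: never passed to SMOTE

minority = [x for x, y in train if y == 1]
n_majority = sum(1 for _, y in train if y == 0)
synthetic = smote_oversample(minority, n_new=n_majority - len(minority))
train_balanced = train + [(x, 1) for x in synthetic]
print(len(train_balanced), sum(y for _, y in train_balanced))  # → 10 5
```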

	---

	## 🏗️ Architecture

	```
	Input (28 features)
	↓
	One-Hot Encoding + StandardScaler + SMOTE (train only)
	↓
	┌─────────────────────────────────────┐
	│ Soft Voting Ensemble (2:2:1)        │
	│ ├── XGBoost (n=300, lr=0.05)        │
	│ ├── LightGBM (n=300, lr=0.05)       │
	│ └── RandomForest (n=200)            │
	└─────────────────────────────────────┘
	↓
	PD Score → EL = PD × LGD(40%) × EAD
	↓
	SHAP TreeExplainer → Waterfall Plot
	↓
	LLM Narrative (Gemini → Groq → Fallback)
	```
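
The expected-loss step multiplies the ensemble PD by the fixed 40% LGD and the exposure at default. A minimal sketch, assuming EAD is the requested loan amount in the loan's currency; the `expected_loss` helper and the applicant figures are illustrative only.

```python
LGD = 0.40  # fixed 40% loss given default, as in the pipeline


def expected_loss(pd_score: float, ead: float, lgd: float = LGD) -> float:
    """Expected loss EL = PD x LGD x EAD."""
    if not 0.0 <= pd_score <= 1.0:
        raise ValueError("PD must be a probability in [0, 1]")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return pd_score * lgd * ead


# Hypothetical applicant: 12% PD on an exposure of 50,000,000
print(f"{expected_loss(0.12, 50_000_000):,.0f}")  # → 2,400,000
```

Here 0.12 × 0.40 × 50,000,000 = 2,400,000, i.e. an expected loss of 4.8% of the exposure.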

	---

	## 📈 Evaluation

	- ROC-AUC: primary metric; industry target > 0.75
	- KS Statistic: maximum separation between the good- and bad-credit score distributions
	- F1-Score: balances precision and recall on the imbalanced classes
	- 5-Fold Cross-Validation: SMOTE runs inside an `ImbPipeline`, so only each fold's training split is oversampled (no leakage)

	> Note: the model scores perfectly because the synthetic features were deliberately designed to correlate strongly with the target, simulating ideal conditions. In a real deployment, these features would be replaced with actual data from external sources such as e-commerce APIs, tax records, and bank statements.

	![Evaluation Report](evalsme.png)

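
ROC-AUC and the KS statistic can both be computed directly from predicted PD scores. The stdlib-only sketch below assumes label 1 marks bad credit and that higher scores mean riskier applicants; the labels and scores are made up, and libraries such as scikit-learn (`roc_auc_score`) provide the same metrics. On these toy scores, 14 of the 15 bad/good pairs are ranked correctly (AUC = 14/15) and the largest CDF gap is 0.8, reached at the 0.35 threshold.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random bad applicant scores above a
    random good one (ties count half), i.e. the Mann-Whitney U statistic."""
    bad = [s for s, y in zip(scores, y_true) if y == 1]
    good = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def ks_statistic(y_true, scores):
    """KS = largest gap between the cumulative score distributions of bad and good."""
    bad = [s for s, y in zip(scores, y_true) if y == 1]
    good = [s for s, y in zip(scores, y_true) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_bad = sum(s <= t for s in bad) / len(bad)
        cdf_good = sum(s <= t for s in good) / len(good)
        best = max(best, abs(cdf_bad - cdf_good))
    return best


# Hypothetical labels (1 = bad credit) and predicted PD scores
y = [0, 0, 0, 1, 0, 1, 1, 0]
pd_scores = [0.10, 0.20, 0.35, 0.40, 0.45, 0.70, 0.80, 0.30]
print(roc_auc(y, pd_scores), ks_statistic(y, pd_scores))
```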
	---

	## 🚀 Run Locally

	```bash
	git clone https://huggingface.co/spaces/1na37/sme-credit-risk
	cd sme-credit-risk
	pip install -r requirements.txt

	# Train the model first (run train_colab.py in Google Colab),
	# then download model_export.zip and extract it into the model/ folder

	streamlit run app.py
	```

	---

	## 🔑 API Keys (Optional)

	In HF Spaces → Settings → Repository Secrets:
	- `GEMINI_API_KEY` — [Google AI Studio](https://aistudio.google.com) (free tier)
	- `GROQ_API_KEY` — [console.groq.com](https://console.groq.com) (free tier)

	The app also works without any API keys, falling back to a static narrative template.
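
The narrative step tries Gemini, then Groq, then the static template. A minimal sketch of that chain, assuming the keys are read from the `GEMINI_API_KEY` and `GROQ_API_KEY` environment variables (Spaces secrets are exposed to the app as environment variables). `call_gemini` and `call_groq` are hypothetical placeholders, not real Gemini or Groq client calls, and the template wording is illustrative.

```python
import os
from typing import Callable


def call_gemini(prompt: str, api_key: str) -> str:
    # Hypothetical placeholder for a real Gemini API request.
    raise NotImplementedError("wire up the Gemini client here")


def call_groq(prompt: str, api_key: str) -> str:
    # Hypothetical placeholder for a real Groq API request.
    raise NotImplementedError("wire up the Groq client here")


def static_narrative(pd_score: float) -> str:
    """Offline template used when no LLM provider is configured or reachable."""
    return f"Estimated probability of default: {pd_score:.0%}. See the SHAP chart for the main drivers."


# Providers in priority order, each enabled by its own secret
PROVIDERS: list[tuple[str, Callable[[str, str], str]]] = [
    ("GEMINI_API_KEY", call_gemini),
    ("GROQ_API_KEY", call_groq),
]


def generate_narrative(prompt: str, pd_score: float) -> str:
    for env_var, call in PROVIDERS:
        api_key = os.environ.get(env_var)
        if not api_key:
            continue  # secret not set: skip this provider
        try:
            return call(prompt, api_key)
        except Exception:
            continue  # quota, network or client error: try the next provider
    return static_narrative(pd_score)
```

Catching the broad `Exception` keeps the UI responsive whichever provider fails; a production version would likely log the error before moving on.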

	---

	## 📚 References

	- [German Credit Dataset — UCI ML Repository](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data)
	- [SHAP Library](https://shap.readthedocs.io/)
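	- Basel Committee IRB credit risk framework (EL = PD × LGD × EAD; the 40% LGD used here matches the Basel III foundation-IRB supervisory LGD for senior unsecured claims on non-financial corporates, which Basel II set at 45%)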

	---

	Built with ❤️ for AI Engineering Bootcamp Batch 10