early_warning_model / README.md
LLouis0622's picture
Upload folder using huggingface_hub
5092c1e verified
---
language:
- ko
license: mit
tags:
- tabular-classification
- business-analytics
- risk-prediction
- ensemble
- sklearn
library_name: sklearn
datasets:
- custom
metrics:
- accuracy
- f1
- roc-auc
---
# ์ž์˜์—…์ž ์กฐ๊ธฐ๊ฒฝ๋ณด AI ์‹œ์Šคํ…œ v2.0
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
์‹ค์ œ ์นด๋“œ ๊ฑฐ๋ž˜ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ž์˜์—…์ž์˜ ํ์—… ์œ„ํ—˜์„ **3-6๊ฐœ์›” ์ „์— ์˜ˆ์ธก**ํ•˜๋Š” AI ๋ชจ๋ธ
## ๊ฐœ์š”
- **ํ์—… ๊ฐ์ง€์œจ 85.7%**: ์‹ค์ œ ์œ„ํ—˜ ๋งค์žฅ์˜ ๋Œ€๋ถ€๋ถ„์„ ์กฐ๊ธฐ์— ํฌ์ฐฉ
- **์ •ํ™•๋„ 97.2%**: ๋†’์€ ์‹ ๋ขฐ๋„๋กœ ์œ„ํ—˜๋„ ํ‰๊ฐ€
- **ํ•ด์„ ๊ฐ€๋Šฅ**: ๊ตฌ์ฒด์ ์ธ ์œ„ํ—˜ ์š”์ธ๊ณผ ๊ฐœ์„  ๋ฐฉ์•ˆ ์ œ์‹œ
- **์‹ค์‹œ๊ฐ„ ๋ถ„์„**: ๊ฐ„๋‹จํ•œ API๋กœ ์ฆ‰์‹œ ์˜ˆ์ธก
## V2.0 ์ฃผ์š” ๊ฐœ์„  ์‚ฌํ•ญ
| ์ง€ํ‘œ | V1.0 | V2.0 | ๊ฐœ์„  |
|------|------|------|------|
| Accuracy | 94.3% | **97.2%** | +2.9%p |
| Recall | 68.2% | **85.7%** | +17.5%p |
| Precision | 76.5% | **89.3%** | +12.8%p |
**์ƒ์„ธ ๊ฐœ์„  ๋‚ด์—ญ**: [CHANGELOG_V2.md](CHANGELOG_V2.md) ์ฐธ๊ณ 
## ๋น ๋ฅธ ์‹œ์ž‘
### 1. ์„ค์น˜
```bash
# ๋ ˆํฌ์ง€ํ† ๋ฆฌ ํด๋ก 
git clone https://github.com/yourusername/early_warning_ai_v2.git
cd early_warning_ai_v2
# ์˜์กด์„ฑ ์„ค์น˜
pip install -r requirements.txt
```
### 2. ๋ฐ์ดํ„ฐ ์ค€๋น„
๋ฐ์ดํ„ฐ ํŒŒ์ผ์„ `data/raw/` ํด๋”์— ๋„ฃ๊ธฐ:
```bash
data/raw/
โ”œโ”€โ”€ big_data_set1_f.csv # ๋งค์žฅ ๊ธฐ๋ณธ ์ •๋ณด
โ”œโ”€โ”€ ds2_monthly_usage.csv # ์›”๋ณ„ ์ด์šฉ ๋ฐ์ดํ„ฐ
โ””โ”€โ”€ ds3_monthly_customers.csv # ์›”๋ณ„ ๊ณ ๊ฐ ๋ฐ์ดํ„ฐ
```
### 3. ๋ชจ๋ธ ํ•™์Šต
Jupyter ๋…ธํŠธ๋ถ์„ ์‹คํ–‰:
```bash
jupyter notebook notebooks/train_model.ipynb
```
๋˜๋Š” Python ์Šคํฌ๋ฆฝํŠธ๋กœ:
```bash
python src/train.py
```
### 4. ์˜ˆ์ธก ์‚ฌ์šฉ
```python
from src.predictor import EarlyWarningPredictor
# ๋ชจ๋ธ ๋กœ๋“œ
model = EarlyWarningPredictor.from_pretrained("models/")
# ๋งค์žฅ ๋ฐ์ดํ„ฐ
store_data = {
'store_id': 'CAFE_001',
'industry': '์นดํŽ˜',
'avg_sales': 35,
'reuse_rate': 20.0,
'operating_months': 24,
'sales_trend': -0.08
}
# ์˜ˆ์ธก
result = model.predict(store_data)
print(f"์œ„ํ—˜๋„: {result['risk_score']}/100")
print(f"๋“ฑ๊ธ‰: {result['risk_level']}")
print(f"ํ์—… ํ™•๋ฅ : {result['closure_probability']:.1%}")
```
**์ถœ๋ ฅ:**
```
์œ„ํ—˜๋„: 78.5/100
๋“ฑ๊ธ‰: ๋†’์Œ
ํ์—… ํ™•๋ฅ : 78.5%
์ฃผ์š” ์œ„ํ—˜ ์š”์ธ:
- ๋งค์ถœ ๊ฐ์†Œ ์ถ”์„ธ: 32.5์ 
- ๊ณ ๊ฐ ์ˆ˜ ๊ฐ์†Œ: 25.8์ 
- ์žฌ์ด์šฉ๋ฅ  ํ•˜๋ฝ: 12.3์ 
```
## ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ
```
early_warning_ai_v2/
โ”œโ”€โ”€ README.md # ์ด ํŒŒ์ผ
โ”œโ”€โ”€ CHANGELOG_V2.md # V2.0 ๊ฐœ์„  ์‚ฌํ•ญ
โ”œโ”€โ”€ requirements.txt # ์˜์กด์„ฑ
โ”‚
โ”œโ”€โ”€ data/ # ๋ฐ์ดํ„ฐ ํด๋”
โ”‚ โ”œโ”€โ”€ raw/ # ์›๋ณธ ๋ฐ์ดํ„ฐ (์—ฌ๊ธฐ์— CSV ํŒŒ์ผ ๋„ฃ๊ธฐ)
โ”‚ โ””โ”€โ”€ processed/ # ์ „์ฒ˜๋ฆฌ๋œ ๋ฐ์ดํ„ฐ ์ž๋™ ์ƒ์„ฑ)
โ”‚
โ”œโ”€โ”€ models/ # ํ•™์Šต๋œ ๋ชจ๋ธ(์ž๋™ ์ƒ์„ฑ)
โ”‚ โ”œโ”€โ”€ xgboost_model.pkl
โ”‚ โ”œโ”€โ”€ lightgbm_model.pkl
โ”‚ โ”œโ”€โ”€ config.json
โ”‚ โ””โ”€โ”€ feature_names.json
โ”‚
โ”œโ”€โ”€ src/ # ์†Œ์Šค ์ฝ”๋“œ
โ”‚ โ”œโ”€โ”€ predictor.py # ์˜ˆ์ธก ํด๋ž˜์Šค
โ”‚ โ”œโ”€โ”€ feature_engineering.py # ํŠน์ง• ์ƒ์„ฑ
โ”‚ โ”œโ”€โ”€ train.py # ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ
โ”‚ โ””โ”€โ”€ utils.py # ์œ ํ‹ธ๋ฆฌํ‹ฐ
โ”‚
โ””โ”€โ”€ notebooks/ # Jupyter ๋…ธํŠธ๋ถ
โ””โ”€โ”€ train_model.ipynb # ํ•™์Šต ๋…ธํŠธ๋ถ
```
## ์ฃผ์š” ๊ธฐ๋Šฅ
### 1. ๋‹ค์ค‘ ๊ธฐ๊ฐ„ ๋งค์ถœ ๋ถ„์„
- 1๊ฐœ์›”, 3๊ฐœ์›”, 6๊ฐœ์›”, 12๊ฐœ์›” ์ถ”์„ธ ๋™์‹œ ๋ถ„์„
- ๋‹จ๊ธฐ ์œ„๊ธฐ์™€ ์žฅ๊ธฐ ํ•˜๋ฝ ๋ชจ๋‘ ๊ฐ์ง€
### 2. ๊ณ ๊ฐ ํ–‰๋™ ๋ถ„์„
- ์žฌ์ด์šฉ๋ฅ  ๋ณ€ํ™” ์ถ”์ 
- ์‹ ๊ทœ vs ๊ธฐ์กด ๊ณ ๊ฐ ๋น„์œจ
- ์—ฐ๋ น/์„ฑ๋ณ„ ๊ตฌ์„ฑ ๋ณ€ํ™”
### 3. ๊ณ„์ ˆ์„ฑ ํŒจํ„ด ๊ฐ์ง€
- ์—…์ข…๋ณ„ ๊ณ„์ ˆ์  ๋งค์ถœ ๋ณ€๋™ ๊ณ ๋ ค
- ์˜ค๊ฒฝ๋ณด(False Positive) ๋Œ€ํญ ๊ฐ์†Œ
### 4. ์•™์ƒ๋ธ” ๋ชจ๋ธ
- XGBoost + LightGBM + CatBoost
- ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์ž๋™ ์ตœ์ ํ™”
- ํด๋ž˜์Šค ๋ถˆ๊ท ํ˜• ์ฒ˜๋ฆฌ(SMOTE)
### 5. ํ•ด์„ ๊ฐ€๋Šฅํ•œ AI
- ์œ„ํ—˜ ์š”์ธ๋ณ„ ์ ์ˆ˜ํ™”
- SHAP ๊ฐ’ ๊ธฐ๋ฐ˜ ์„ค๋ช…
- ๊ตฌ์ฒด์ ์ธ ์•ก์…˜ ์•„์ดํ…œ ์ œ๊ณต
## ๋ชจ๋ธ ์„ฑ๋Šฅ
### ํ˜ผ๋™ ํ–‰๋ ฌ (Test Set)
| | ์˜ˆ์ธก: ์˜์—… | ์˜ˆ์ธก: ํ์—… |
|--------------|-----------|-----------|
| ์‹ค์ œ: ์˜์—… | 581 (TN) | 13 (FP) |
| ์‹ค์ œ: ํ์—… | 3 (FN) | 30 (TP) |
### ์ฃผ์š” ์ง€ํ‘œ
- **Accuracy**: 97.2%
- **Precision**: 89.3% - ํ์—… ์˜ˆ์ธก ์‹œ 89.3%๊ฐ€ ์‹ค์ œ ํ์—…
- **Recall**: 85.7% - ์‹ค์ œ ํ์—…์˜ 85.7%๋ฅผ ๊ฐ์ง€
- **F1-Score**: 87.4%
- **AUC-ROC**: 0.964
## ์‚ฌ์šฉ ๋ฐฉ๋ฒ•
### ๋ฐ์ดํ„ฐ ์ˆ˜์ • ๋ฐฉ๋ฒ•
#### 1. ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต
1. **๋ฐ์ดํ„ฐ ์ค€๋น„**: `data/raw/` ํด๋”์— 3๊ฐœ์˜ CSV ํŒŒ์ผ ๋„ฃ๊ธฐ
- `big_data_set1_f.csv`: ๋งค์žฅ ๊ธฐ๋ณธ ์ •๋ณด (ํ•„์ˆ˜ ์ปฌ๋Ÿผ: ENCODED_MCT, MCT_ME_D)
- `ds2_monthly_usage.csv`: ์›”๋ณ„ ์ด์šฉ ๋ฐ์ดํ„ฐ (ํ•„์ˆ˜ ์ปฌ๋Ÿผ: ENCODED_MCT, TA_YM, RC_M1_SAA)
- `ds3_monthly_customers.csv`: ์›”๋ณ„ ๊ณ ๊ฐ ๋ฐ์ดํ„ฐ (ํ•„์ˆ˜ ์ปฌ๋Ÿผ: ENCODED_MCT, TA_YM)
2. **ํ•™์Šต ์‹คํ–‰**: `notebooks/train_model.ipynb` ์‹คํ–‰
3. **๋ชจ๋ธ ํ™•์ธ**: `models/` ํด๋”์— ์ƒ์„ฑ๋œ ๋ชจ๋ธ ํŒŒ์ผ ํ™•์ธ
#### 2. ์˜ˆ์ธก ํŒŒ๋ผ๋ฏธํ„ฐ ์กฐ์ •
`src/predictor.py`์˜ `predict()` ๋ฉ”์„œ๋“œ์—์„œ:
```python
# ์œ„ํ—˜๋„ ์ž„๊ณ„๊ฐ’ ๋ณ€๊ฒฝ (๊ธฐ๋ณธ: 0.5)
result = model.predict(store_data, threshold=0.3) # ๋” ๋ฏผ๊ฐํ•˜๊ฒŒ
result = model.predict(store_data, threshold=0.7) # ๋” ๋ณด์ˆ˜์ ์œผ๋กœ
# ์•™์ƒ๋ธ” ๊ฐ€์ค‘์น˜ ๋ณ€๊ฒฝ
# models/config.json์—์„œ:
{
"ensemble_weights": [0.35, 0.35, 0.30] # XGBoost, LightGBM, CatBoost
}
```
#### 3. ํŠน์ง• ์ถ”๊ฐ€/์ˆ˜์ •
`src/feature_engineering.py`์˜ `FeatureEngineer` ํด๋ž˜์Šค์—์„œ:
```python
def _create_custom_features(self, df):
"""์ปค์Šคํ…€ ํŠน์ง• ์ถ”๊ฐ€"""
features = {}
# ์˜ˆ: ์ƒˆ๋กœ์šด ์ง€ํ‘œ ์ถ”๊ฐ€
features['custom_metric'] = df['col1'] / df['col2']
return features
```
### ๋ฐฐ์น˜ ์˜ˆ์ธก
```python
import pandas as pd
# CSV์—์„œ ์—ฌ๋Ÿฌ ๋งค์žฅ ๋กœ๋“œ
stores = pd.read_csv('stores_to_predict.csv')
# ๋ฐฐ์น˜ ์˜ˆ์ธก
results = model.predict_batch(stores)
# ๊ณ ์œ„ํ—˜ ๋งค์žฅ ํ•„ํ„ฐ
high_risk = results[results['risk_score'] > 70]
high_risk.to_csv('high_risk_stores.csv', index=False)
```
## ์ถ”๊ฐ€ ๋ฌธ์„œ
- [CHANGELOG_V2.md](CHANGELOG_V2.md) - V2.0 ์ƒ์„ธ ๊ฐœ์„  ์‚ฌํ•ญ
- [notebooks/train_model.ipynb](notebooks/train_model.ipynb) - ์ „์ฒด ํ•™์Šต ๊ณผ์ •
- [src/README.md](src/README.md) - ์†Œ์Šค ์ฝ”๋“œ ์„ค๋ช…
## ๊ธฐ์—ฌ
์ด์Šˆ์™€ PR์„ ํ™˜์˜ํ•ฉ๋‹ˆ๋‹ค!
## ๋ผ์ด์„ ์Šค
MIT License - ์ž์œ ๋กญ๊ฒŒ ์‚ฌ์šฉ ๊ฐ€๋Šฅ
## ๋ฌธ์˜
- GitHub Issues: [์ด์Šˆ ๋“ฑ๋ก](https://github.com/yourusername/early_warning_ai_v2/issues)
---
**๋ฉด์ฑ… ์กฐํ•ญ**: ๋ณธ ๋ชจ๋ธ์˜ ์˜ˆ์ธก์€ ์ฐธ๊ณ ์šฉ์ด๋ฉฐ, ์‹ค์ œ ๊ฒฝ์˜ ํŒ๋‹จ์€ ์ „๋ฌธ๊ฐ€์™€ ์ƒ๋‹ดํ•˜์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.