# Source Code Guide
## File Structure
```
src/
├── predictor.py           # Prediction class
├── feature_engineering.py # Feature generation
├── train.py               # Training script
└── README.md              # This file
```
---
## File Descriptions
### 1. `predictor.py` - Prediction class
**Purpose**: Main class that loads the trained models and runs predictions
**Main class**: `EarlyWarningPredictor`
**Main methods**:
```python
# Load the model (Hugging Face style)
model = EarlyWarningPredictor.from_pretrained("models/")
# Single prediction
result = model.predict(store_data)
# Batch prediction
results = model.predict_batch(stores_df)
# Explain a prediction
explanation = model.explain(store_data)
# Model information
info = model.get_model_info()
```
**Return value**:
```python
{
    'risk_score': 78.5,            # Risk score, 0-100
    'risk_level': '높음',           # 낮음/보통/높음 (low/medium/high)
    'closure_probability': 0.785,  # Probability of closure
    'risk_factors': {...},         # Score per risk factor
    'action_items': [...]          # Recommended actions
}
```
**How to modify**:
```python
# 1. Change the risk threshold
def predict(self, store_data, threshold=0.5):  # change the default here
    ...

# 2. Adjust the ensemble weights
# In models/config.json (XGBoost 60%, LightGBM 40%):
{
    "ensemble_weights": [0.6, 0.4]
}

# 3. Change the risk-level cutoffs
if risk_score < 40:  # raised from 30 to 40
    risk_level = '낮음'  # low
```
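To make the relationship between `closure_probability`, `risk_score`, and `risk_level` concrete, here is a minimal sketch. It assumes the score is the probability scaled to 0-100 (consistent with the sample output above, 0.785 and 78.5) and that levels are split at two cutoffs. `score_to_level`, `summarize`, `low_cutoff`, and `high_cutoff` are hypothetical names, and the upper cutoff of 70 is an illustrative value, not one taken from `predictor.py`.

```python
def score_to_level(risk_score, low_cutoff=30, high_cutoff=70):
    """Map a 0-100 risk score to a level label (hypothetical helper)."""
    if risk_score < low_cutoff:
        return '낮음'   # low
    if risk_score < high_cutoff:
        return '보통'   # medium (high_cutoff=70 is an assumed value)
    return '높음'       # high


def summarize(closure_probability, low_cutoff=30, high_cutoff=70):
    # Assumption: risk_score is the closure probability on a 0-100 scale.
    risk_score = round(closure_probability * 100, 1)
    return {
        'risk_score': risk_score,
        'risk_level': score_to_level(risk_score, low_cutoff, high_cutoff),
        'closure_probability': closure_probability,
    }


result = summarize(0.785)
```

Raising `low_cutoff` from 30 to 40, as in item 3 above, moves stores scoring between 30 and 40 from the medium level to the low level.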
---
### 2. `feature_engineering.py` - Feature generation
**Purpose**: Automatically generates 47 features from the raw data
**Main class**: `FeatureEngineer`
**Generated features**:
#### Sales (15)
- `sales_avg_1m`, `sales_avg_3m`, `sales_avg_6m`, `sales_avg_12m`
- `sales_recent_vs_previous`, `sales_mom_change`, `sales_yoy_change`
- `sales_max`, `sales_min`, `sales_range`
#### Customers (12)
- `customer_reuse_rate`, `customer_reuse_trend`
- `customer_new_rate`
- Customer share by age and gender (10)
#### Operations (8)
- `operation_months`, `operation_avg_amount`
- `operation_cancel_rate`, `operation_delivery_rate`
#### Trend (5)
- `trend_slope`, `trend_r2`, `trend_direction`
- `trend_consecutive_down`, `trend_consecutive_up`
#### Volatility (4)
- `volatility_cv`, `volatility_std`, `volatility_mad`, `volatility_recent_std`
#### Seasonality (2)
- `seasonality_detected`, `seasonality_strength`
#### Context (1)
- `context_industry`
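As an illustration of what a few of the trend and volatility features could measure, the sketch below computes them from a plain list of monthly sales using only the standard library. The definitions are assumptions based on the feature names (least-squares slope and R² against the month index, population standard deviation, coefficient of variation as std / mean); the actual formulas in `feature_engineering.py` may differ, and `trend_and_volatility` is a hypothetical helper.

```python
from statistics import mean, pstdev


def trend_and_volatility(sales):
    """Compute a few trend/volatility features from a monthly sales list."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two months of sales")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(sales)

    # Ordinary least-squares slope of sales against the month index.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R^2 = 1 - SS_res / SS_tot (set to 0.0 for a perfectly flat series).
    ss_tot = sum((y - y_bar) ** 2 for y in sales)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, sales))
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0

    # Count month-over-month declines at the end of the series.
    consecutive_down = 0
    for prev, cur in zip(reversed(sales[:-1]), reversed(sales[1:])):
        if cur < prev:
            consecutive_down += 1
        else:
            break

    std = pstdev(sales)  # population standard deviation (an assumption)
    return {
        'trend_slope': slope,
        'trend_r2': r2,
        'trend_consecutive_down': consecutive_down,
        'volatility_std': std,
        'volatility_cv': std / y_bar if y_bar else 0.0,
    }


features = trend_and_volatility([100, 90, 80, 70])
```

For a series that falls by 10 every month, the slope is -10, the linear fit is exact, and all three month-over-month changes count as consecutive declines.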
**Usage example**:
```python
from feature_engineering import FeatureEngineer
engineer = FeatureEngineer()
features = engineer.create_features(
    store_data={'industry': '카페', 'location': '서울'},  # cafe, Seoul
    monthly_usage=usage_df,
    monthly_customers=customer_df
)
```
**How to add a new feature**:
```python
import pandas as pd


class FeatureEngineer:
    def _create_custom_features(self, df):
        """Add custom features."""
        features = {}
        # Example: growth-rate indicator (last 3 months vs. first 3 months)
        if 'RC_M1_SAA' in df.columns and len(df) >= 6:
            recent_3m = df['RC_M1_SAA'].tail(3).mean()
            past_3m = df['RC_M1_SAA'].head(3).mean()
            if past_3m != 0:  # skip when the baseline is zero
                features['growth_rate'] = (recent_3m / past_3m - 1) * 100
        return features

    def create_features(self, store_data, monthly_usage, monthly_customers):
        features = {}
        # Existing features...
        features.update(self._create_sales_features(monthly_usage))
        features.update(self._create_customer_features(monthly_customers))
        # Add the new custom features
        features.update(self._create_custom_features(monthly_usage))
        return pd.DataFrame([features])
```
---
### 3. `train.py` - Training script
**Purpose**: Script that trains the models from the command line
**Usage**:
```bash
# Basic usage
python src/train.py
# With options
python src/train.py --data data/raw --output models/ --max-stores 1000
# Help
python src/train.py --help
```
**Parameters**:
- `--data`: Path to the data directory (default: `data/raw`)
- `--output`: Path where the models are saved (default: `models`)
- `--max-stores`: Maximum number of stores, for test runs (optional)
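The options above could be declared with `argparse` roughly as follows. This is a sketch of one way to wire up the documented flags and defaults, not the parser actually used in `train.py`; `build_parser` and the help strings are hypothetical, and treating `--max-stores` as an integer that defaults to `None` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser for the three documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train the early-warning models")
    parser.add_argument("--data", default="data/raw",
                        help="path to the data directory")
    parser.add_argument("--output", default="models",
                        help="path where the models are saved")
    parser.add_argument("--max-stores", type=int, default=None,
                        help="optional cap on the number of stores (test runs)")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--data", "data/raw", "--max-stores", "1000"])
defaults = build_parser().parse_args([])
```

`argparse` exposes `--max-stores` as `args.max_stores`, and `--help` is generated automatically.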
**Main functions**:
```python
def load_data(data_dir):
    """Load the data."""

def create_features(df_store, df_usage, df_customer):
    """Generate features."""

def preprocess_data(X, y):
    """Preprocess and split the data."""

def apply_smote(X_train, y_train):
    """Apply SMOTE."""

def train_models(X_train, y_train):
    """Train the models."""

def evaluate_models(xgb_model, lgb_model, X_test, y_test):
    """Evaluate the models."""

def save_models(...):
    """Save the models."""
```
**How to modify**:
```python
# 1. Change the model hyperparameters
def train_models(X_train, y_train):
    xgb_model = xgb.XGBClassifier(
        max_depth=8,          # increased from 6 to 8
        learning_rate=0.05,   # decreased from 0.1 to 0.05
        n_estimators=300,     # increased from 200 to 300
        # ...
    )

# 2. Change the ensemble weights
def evaluate_models(...):
    ensemble_pred = 0.6 * xgb_pred + 0.4 * lgb_pred  # previously 0.5, 0.5

# 3. Change the train/test split ratio
def preprocess_data(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, ...  # changed from 0.25 to 0.2
    )
```
---
## Common Modification Scenarios
### Scenario 1: Training on new data
**Step 1**: Prepare the data
```bash
# Place the three CSV files in data/raw/
data/raw/
├── big_data_set1_f.csv
├── ds2_monthly_usage.csv
└── ds3_monthly_customers.csv
```
**Step 2**: Run training
```bash
python src/train.py
```
**Step 3**: Use the model for prediction
```python
from src.predictor import EarlyWarningPredictor
model = EarlyWarningPredictor.from_pretrained("models/")
```
### Scenario 2: Improving model performance
**Method 1**: Add features
```python
# Add a new feature in feature_engineering.py
def _create_custom_features(self, df):
    # Compute the new indicator here
    pass
```
**Method 2**: Tune the hyperparameters
```python
# Adjust the parameters in train.py
xgb_model = xgb.XGBClassifier(
    max_depth=8,
    learning_rate=0.05,
    ...
)
```
**Method 3**: Adjust the ensemble weights
```python
# Edit models/config.json
{
    "ensemble_weights": [0.6, 0.4]
}
```
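The effect of `ensemble_weights` can be sketched as a weighted average of the two models' closure probabilities. The example below parses the weights from JSON text and blends one pair of probabilities. `load_ensemble_weights` and `blend` are hypothetical helpers, the `[XGBoost, LightGBM]` ordering follows the config comment earlier in this README, and the check that the weights sum to 1 is an added assumption rather than documented behavior.

```python
import json


def load_ensemble_weights(config_text):
    """Parse ensemble weights from config JSON text (hypothetical helper)."""
    weights = json.loads(config_text)["ensemble_weights"]
    # Assumption: weights should sum to 1 so the blend stays a probability.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError(f"ensemble weights must sum to 1, got {weights}")
    return weights


def blend(probabilities, weights):
    """Weighted average of per-model probabilities for one store."""
    return sum(w * p for w, p in zip(weights, probabilities))


weights = load_ensemble_weights('{"ensemble_weights": [0.6, 0.4]}')
# Made-up model outputs in [XGBoost, LightGBM] order.
ensemble_prob = blend([0.80, 0.70], weights)
```

With weights of 0.6 and 0.4, the blended probability is 0.6 × 0.80 + 0.4 × 0.70 = 0.76, so shifting weight toward one model pulls the result toward that model's output.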
### Scenario 3: Adjusting the prediction threshold
**More sensitive (stronger early warning)**:
```python
result = model.predict(store_data, threshold=0.3)
# Flag a store as at risk when its closure probability is 30% or higher
```
**More conservative**:
```python
result = model.predict(store_data, threshold=0.7)
# Flag a store as at risk only when its closure probability is 70% or higher
```
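To see how the threshold trades sensitivity against false alarms, the sketch below applies three thresholds to the same set of closure probabilities. The probabilities are made-up illustrative values, and `flag_at_risk` is a hypothetical helper that mirrors the "probability at or above the threshold" rule described above, not a method of `EarlyWarningPredictor`.

```python
def flag_at_risk(probabilities, threshold=0.5):
    """Return one True/False flag per store: closure probability >= threshold."""
    return [p >= threshold for p in probabilities]


# Made-up closure probabilities for five stores (illustrative only).
sample_probs = [0.12, 0.35, 0.48, 0.66, 0.81]

flags_sensitive = flag_at_risk(sample_probs, threshold=0.3)
flags_default = flag_at_risk(sample_probs, threshold=0.5)
flags_conservative = flag_at_risk(sample_probs, threshold=0.7)

print(sum(flags_sensitive), sum(flags_default), sum(flags_conservative))
```

On this sample, lowering the threshold to 0.3 flags four of the five stores, the default 0.5 flags two, and 0.7 flags only one.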
---
## References
- XGBoost documentation: https://xgboost.readthedocs.io/
- LightGBM documentation: https://lightgbm.readthedocs.io/
- SMOTE reference: https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html