Upload xgboost_weekly_demand_hf
- README.md +185 -153
- feature_names.json +39 -0
- model_metadata.json +6 -6
- requirements.txt +24 -0
README.md
CHANGED
@@ -1,154 +1,186 @@
# XGBoost Weekly Demand Forecasting Model

[](https://huggingface.co/datasets/username/xgboost-weekly-demand)

## Overview

This model forecasts weekly product demand with XGBoost and an enhanced feature set. It treats the task as time-series forecasting and is validated with TimeSeriesSplit, so that no future data leaks into training.

> **Note:** The image above will be available once the model has been uploaded to Hugging Face.

## Model Performance

The model achieves the following metrics:

- **R² score**: 0.9677
- **RMSE**: 15.06
- **MAE**: 1.33
- **MAPE**: 1.4%

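The README does not show how these metrics were computed. Below is a minimal NumPy sketch of the four definitions. It assumes they are computed on the original demand scale (after `expm1`) and that weeks with zero actual demand are left out of MAPE, because the percentage error is undefined there; that zero-handling rule is an assumption, not taken from the source. The helper name `regression_metrics` is hypothetical, and its `"R2"` key simply mirrors `test_metrics["R2"]` in the plotting code.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE and MAPE (in %) on the original demand scale.

    Hypothetical helper: rows with zero actual demand are excluded from MAPE
    because the percentage error is undefined there (an assumption).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    # R² = 1 - SS_res / SS_tot
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)

    nonzero = y_true != 0
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        "MAE": float(np.mean(np.abs(errors))),
        "MAPE": float(np.mean(np.abs(errors[nonzero] / y_true[nonzero])) * 100),
    }


metrics = regression_metrics([10, 20, 0, 40], [12, 18, 1, 40])
print(metrics)
```

scikit-learn's `r2_score`, `mean_squared_error` and `mean_absolute_error` compute the same first three quantities; its `mean_absolute_percentage_error` returns a fraction rather than a percentage and does not skip zero targets. Because RMSE squares each error, a few large misses push it well above MAE, so the gap between the RMSE (15.06) and MAE (1.33) reported above suggests that a small number of large errors dominate.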
### Performance by Segment

MAPE stays low in every demand segment, rising from 0.6% for low demand to 2.3% for high demand:

| Demand segment | MAPE |
|-------------------|------|
| Low (≤ 25th percentile) | 0.6% |
| Medium (25th to 75th percentile) | 1.4% |
| High (> 75th percentile) | 2.3% |

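One way to produce a breakdown like this is to bucket rows by percentiles of actual demand and average the absolute percentage error within each bucket. The sketch below assumes the segment boundaries are the 25th and 75th percentiles of actual demand (the table does not state which quantity the percentages refer to) and drops rows with zero actual demand, where MAPE is undefined. The helper name `mape_by_segment` is hypothetical.

```python
import numpy as np
import pandas as pd


def mape_by_segment(y_true, y_pred):
    """MAPE (%) per demand segment, split at the 25th/75th percentiles of y_true.

    Hypothetical helper; rows with zero actual demand are dropped (assumption).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Segment boundaries taken from the distribution of actual demand
    q25, q75 = np.percentile(y_true, [25, 75])
    segment = np.where(y_true <= q25, "Low",
                       np.where(y_true <= q75, "Medium", "High"))

    frame = pd.DataFrame({"segment": segment, "y_true": y_true, "y_pred": y_pred})
    frame = frame[frame["y_true"] != 0]

    # Absolute percentage error per row, then the mean per segment
    ape = (frame["y_true"] - frame["y_pred"]).abs() / frame["y_true"] * 100
    return ape.groupby(frame["segment"]).mean().reindex(["Low", "Medium", "High"])


y_true = [1, 2, 3, 4, 5, 6, 7, 8]
y_pred = [1.1, 2.2, 3, 4, 5, 6, 10.5, 12]
print(mape_by_segment(y_true, y_pred))
```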
## Key Visualizations

### Actual vs Predicted

```python
import matplotlib.pyplot as plt
import numpy as np

# Model performance plots. Assumes y_test (actual demand), y_test_pred
# (predictions on the original scale) and test_metrics (a dict with an
# "R2" key) were produced earlier in the training notebook.
plt.style.use('default')
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('XGBoost Model Performance Analysis', fontsize=16, fontweight='bold')

# 1. Scatter plot of actual vs predicted, with the ideal y = x line in red
ax1 = axes[0, 0]
ax1.scatter(y_test, y_test_pred, alpha=0.6, s=20)
ax1.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
ax1.set_xlabel('Actual demand')
ax1.set_ylabel('Predicted demand')
ax1.set_title(f'Actual vs Predicted\nR² = {test_metrics["R2"]:.3f}')
ax1.grid(True, alpha=0.3)
```

### Residual Analysis

```python
# 2. Residual plot (continues the figure above; reuses axes, y_test, y_test_pred)
ax2 = axes[0, 1]
residuals = y_test - y_test_pred
ax2.scatter(y_test_pred, residuals, alpha=0.6, s=20)
ax2.axhline(y=0, color='r', linestyle='--')
ax2.set_xlabel('Predicted demand')
ax2.set_ylabel('Residual')
ax2.set_title('Residual Plot')
ax2.grid(True, alpha=0.3)

# 3. Error distribution: histogram of residuals with a reference line at zero
ax3 = axes[1, 0]
ax3.hist(residuals, bins=30, alpha=0.7, edgecolor='black')
ax3.axvline(x=0, color='r', linestyle='--', linewidth=2)
ax3.set_xlabel('Residual')
ax3.set_ylabel('Frequency')
ax3.set_title(f'Error Distribution\nMean = {np.mean(residuals):.2f}')
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```

## Implemented Improvements

1. **Advanced feature engineering**
   - Lag features (1, 2, 3 and 4 weeks)
   - Rolling-window features (mean, std and max over 2-, 4- and 8-week windows)
   - Trend features (2 weeks, 4 weeks)
   - Seasonal features (sin/cos transforms of month and week)
   - Interaction features (price × category, holiday × category, price × holiday)
   - Relative features (demand vs. average demand, price vs. category mean)
   - Demand-level features (zero demand, low demand)

2. **Target transformation**
   - `log1p` transform to handle the skewed demand distribution

3. **Hyperparameter optimization**
   - Best parameters from RandomizedSearchCV:
     - n_estimators: 300
     - max_depth: 7
     - learning_rate: 0.12
     - subsample: 1.0
     - colsample_bytree: 1.0
     - reg_alpha: 0.1
     - reg_lambda: 1.0

4. **Time-series validation**
   - TimeSeriesSplit for time-aware validation
   - 3 chronological splits for more robust validation

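To illustrate what time-aware validation means here: scikit-learn's `TimeSeriesSplit` with `n_splits=3` always validates on rows that come after every training row, and the training window grows with each fold. The sketch uses 12 dummy rows as a stand-in for the feature matrix and assumes the rows are already sorted by week; it is not the project's training code.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 dummy rows standing in for the feature matrix, assumed sorted by week
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))

for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"Fold {i}: train rows {train_idx.min()}-{train_idx.max()}, "
          f"validate rows {val_idx.min()}-{val_idx.max()}")
# Fold 1: train rows 0-2, validate rows 3-5
# Fold 2: train rows 0-5, validate rows 6-8
# Fold 3: train rows 0-8, validate rows 9-11
```

The same `tscv` object can be passed as `cv=tscv` to `RandomizedSearchCV`, which is presumably how the hyperparameter search was combined with time-series validation; the source does not show that code.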
## Features Used

The model uses 37 features in total: 11 base features and 26 advanced features.

```python
# 11 base features
base_features = [
    'Year', 'Month', 'Quarter', 'WeekNumber', 'IsHoliday',
    'UnitPrice', 'ProductCategory_encoded',
    'TotalDemand', 'AvgDemand', 'TotalRevenue', 'UniqueCustomers'
]

# 26 advanced features, grouped by type
lag_features = ['Demand_lag_1', 'Demand_lag_2', 'Demand_lag_3', 'Demand_lag_4']
rolling_features = ['Demand_rolling_mean_2', 'Demand_rolling_mean_4', ...]  # abridged; the full mean/std/max list is in feature_names.json
trend_features = ['Demand_trend_2w', 'Demand_trend_4w']
seasonal_features = ['Month_sin', 'Month_cos', 'Week_sin', 'Week_cos']
interaction_features = ['Price_Category_Interaction', 'Holiday_Category_Interaction', 'Price_Holiday_Interaction']
relative_features = ['Demand_vs_AvgDemand', 'Price_vs_Category_Mean']
demand_level_features = ['Is_Zero_Demand', 'Is_Low_Demand']
```

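The README names the features but does not show how they are built. The sketch below is one plausible way to compute the lag, rolling, trend and seasonal groups with pandas. Assumptions: one row per product per week; hypothetical columns `product_id` and `Demand` (the demand target); rolling statistics computed on demand shifted by one week so the current week never feeds its own features; trend defined as `Demand_lag_1 - Demand_lag_k` (the project's actual trend definition is not given); and a 52-week year for `Week_sin`/`Week_cos`.

```python
import numpy as np
import pandas as pd


def add_time_series_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the lag, rolling, trend and seasonal features.

    Expects columns product_id, Year, WeekNumber, Month and Demand
    (product_id and Demand are assumed names).
    """
    out = df.sort_values(["product_id", "Year", "WeekNumber"]).copy()
    by_product = out.groupby("product_id")["Demand"]

    # Lag features: demand 1 to 4 weeks earlier for the same product
    for lag in (1, 2, 3, 4):
        out[f"Demand_lag_{lag}"] = by_product.shift(lag)

    # Rolling mean/std/max over past weeks only: shift first to avoid leakage
    past = by_product.shift(1)
    past_by_product = past.groupby(out["product_id"])
    for window in (2, 4, 8):
        for stat in ("mean", "std", "max"):
            out[f"Demand_rolling_{stat}_{window}"] = past_by_product.transform(
                lambda s: getattr(s.rolling(window, min_periods=1), stat)()
            )

    # Trend features (assumed definition): last week minus k weeks ago
    out["Demand_trend_2w"] = out["Demand_lag_1"] - out["Demand_lag_2"]
    out["Demand_trend_4w"] = out["Demand_lag_1"] - out["Demand_lag_4"]

    # Cyclical encodings so that December/January and week 52/week 1 end up close
    out["Month_sin"] = np.sin(2 * np.pi * out["Month"] / 12)
    out["Month_cos"] = np.cos(2 * np.pi * out["Month"] / 12)
    out["Week_sin"] = np.sin(2 * np.pi * out["WeekNumber"] / 52)
    out["Week_cos"] = np.cos(2 * np.pi * out["WeekNumber"] / 52)
    return out


weekly = pd.DataFrame({
    "product_id": ["A"] * 6 + ["B"] * 3,
    "Year": [2011] * 9,
    "WeekNumber": [1, 2, 3, 4, 5, 6, 1, 2, 3],
    "Month": [1, 1, 1, 1, 2, 2, 1, 1, 1],
    "Demand": [10, 20, 30, 40, 50, 60, 5, 6, 7],
})
features = add_time_series_features(weekly)
print(features[["product_id", "WeekNumber", "Demand_lag_1", "Demand_rolling_mean_2"]])
```

Grouping by product before shifting keeps one product's history from leaking into another's lags: product B's first week gets a missing lag rather than product A's last value.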
## Dataset Information

- **Training**: 70,954 samples
- **Validation**: 15,204 samples
- **Testing**: 15,205 samples
- **Training period**: 2010-W48 to 2011-W49
- **Testing period**: 2010-W48 to 2011-W49

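The three subset sizes add up to 101,363 rows and match a 70/15/15 split: int(101,363 × 0.70) = 70,954, int(101,363 × 0.15) = 15,204, and the remaining 15,205 rows form the test set. Below is a minimal sketch of such a split, assuming it is chronological (earliest rows for training, latest for testing). The helper `chronological_split` and the `week_index` column are hypothetical.

```python
import numpy as np
import pandas as pd


def chronological_split(df, time_col, train_frac=0.70, val_frac=0.15):
    """Split rows in time order: train first, then validation, then test (hypothetical helper)."""
    ordered = df.sort_values(time_col, kind="stable").reset_index(drop=True)
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered.iloc[:train_end], ordered.iloc[train_end:val_end], ordered.iloc[val_end:]


# Dummy frame with the same row count as the README dataset, in reverse time order
rows = pd.DataFrame({"week_index": np.arange(101_363)[::-1]})
train, val, test = chronological_split(rows, "week_index")
print(len(train), len(val), len(test))  # 70954 15204 15205
```

With many products per week, a split by row count can place one week's rows in two subsets; cutting on week boundaries instead avoids that.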
## Using the Model

### Loading the Model

```python
import json

import joblib

# Load the trained model
model_path = 'models/xgboost_weekly_demand_hf'
model = joblib.load(f'{model_path}/xgboost_model.joblib')

# Load the feature names (the column list used in training)
with open(f'{model_path}/feature_names.json', 'r') as f:
    feature_names = json.load(f)

# Load the model metadata (best hyperparameters, performance)
with open(f'{model_path}/model_metadata.json', 'r') as f:
    model_metadata = json.load(f)
```

### Making Predictions

```python
import numpy as np


def transform_target(y, method='log1p'):
    """Apply the target transformation used during training."""
    if method == 'log1p':
        return np.log1p(y)
    elif method == 'sqrt':
        return np.sqrt(y)
    return y


def inverse_transform_target(y_transformed, method='log1p'):
    """Map model outputs back to the original demand scale."""
    if method == 'log1p':
        return np.expm1(y_transformed)
    elif method == 'sqrt':
        return np.square(y_transformed)
    return y_transformed


# Predict on new data. `df` must be a DataFrame that already contains the
# engineered features in feature_names; model and feature_names come from
# the loading step above.
X_new = df[feature_names].fillna(0)
y_pred_transformed = model.predict(X_new)
y_pred = inverse_transform_target(y_pred_transformed, 'log1p')
y_pred = np.maximum(y_pred, 0)  # Ensure demand is non-negative
```

## Conclusion

The enhanced XGBoost model performs very well for weekly demand forecasting. With an R² score of 0.968 and a MAPE of only 1.4%, it predicts accurately in every demand segment.

The main improvements in this model are advanced feature engineering, target transformation, and proper time-series validation. The model is ready for production, with comprehensive monitoring capabilities.

---

Created by: Data Science Team
Date: November 2023

feature_names.json
ADDED
@@ -0,0 +1,39 @@
[
  "Year",
  "Month",
  "Quarter",
  "WeekNumber",
  "IsHoliday",
  "UnitPrice",
  "ProductCategory_encoded",
  "TotalDemand",
  "AvgDemand",
  "TotalRevenue",
  "UniqueCustomers",
  "Demand_lag_1",
  "Demand_lag_2",
  "Demand_lag_3",
  "Demand_lag_4",
  "Demand_rolling_mean_2",
  "Demand_rolling_mean_4",
  "Demand_rolling_mean_8",
  "Demand_rolling_std_2",
  "Demand_rolling_std_4",
  "Demand_rolling_std_8",
  "Demand_rolling_max_2",
  "Demand_rolling_max_4",
  "Demand_rolling_max_8",
  "Demand_trend_2w",
  "Demand_trend_4w",
  "Month_sin",
  "Month_cos",
  "Week_sin",
  "Week_cos",
  "Price_Category_Interaction",
  "Holiday_Category_Interaction",
  "Price_Holiday_Interaction",
  "Demand_vs_AvgDemand",
  "Price_vs_Category_Mean",
  "Is_Zero_Demand",
  "Is_Low_Demand"
]
model_metadata.json
CHANGED
@@ -20,13 +20,13 @@
 }
 },
 "best_params": {
-"subsample":
-"reg_lambda":
-"reg_alpha": 0.
-"n_estimators":
+"subsample": 0.9,
+"reg_lambda": 2.0,
+"reg_alpha": 0.5,
+"n_estimators": 250,
 "max_depth": 7,
-"learning_rate": 0.
-"colsample_bytree":
+"learning_rate": 0.08,
+"colsample_bytree": 0.9
 },
 "performance": {
 "train_r2": 0.9994686245918274,
requirements.txt
ADDED
@@ -0,0 +1,24 @@
# Dependencies for Machine Learning Project
numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=1.0.0
matplotlib>=3.4.0
seaborn>=0.11.0
jupyter>=1.0.0
plotly>=5.0.0
scipy>=1.7.0

# Machine Learning Models
xgboost>=1.6.0
joblib>=1.1.0

# Data processing
openpyxl>=3.0.0
xlrd>=2.0.0

# Model evaluation
mlflow>=1.20.0

# Utilities
tqdm>=4.62.0
python-dotenv>=0.19.0