Revised version
- .gitignore +3 -0
- README.md +103 -24
- app.py +194 -578
- backend.py +674 -0
- config.py +96 -56
- data_loader.py +9 -29
- data_processor.py +150 -111
- data_validator.py +0 -467
- evaluator.py +213 -16
- prophet_predictor.py +223 -40
- ui_components.py +322 -0
.gitignore
CHANGED
@@ -12,3 +12,6 @@ WORKFLOW.md
 data/
 hf_cache/
 MODEL_WORKFLOW.md
+data_validator.py
+utils/
+.gitignore
|
README.md
CHANGED
@@ -9,42 +9,121 @@ app_file: app.py
 pinned: false
 ---

-# SKS Course Enrollment Prediction System

-

-##

-
-
-
-
-

-##

-
-2. **Generate Predictions**: AI analyzes historical enrollment patterns
-3. **View Recommendations**: See which courses should be opened and recommended quotas
-4. **Review Metrics**: Check model performance (MAE, RMSE)

-

-
-
-
-
-- **Cold Start**: For new courses without historical data

-##

-
-- **
 - **Data Processing**: Pandas, NumPy
 - **Deployment**: Hugging Face Spaces

-##

-
 - Mean Absolute Error (MAE): ~31 students
 - Root Mean Squared Error (RMSE): ~49 students
 pinned: false
 ---

+# SKS Course Enrollment & Class Capacity Prediction System

+A system that predicts **how many classes need to be opened**, based on enrollment forecasts that account for the maximum capacity per class, using Prophet time series forecasting.

+## How It Works

+### Single Semester Prediction
+1. **Select Year and Semester**: Choose the academic period to predict
+2. **Generate Predictions**: The AI analyzes historical enrollment patterns
+3. **View Class Recommendations**: How many classes need to be opened for each course
+4. **Review Utilization**: Check the class capacity utilization rate

+### Multi-Year Forecasting
+1. **Set the Starting Period**: The year and semester where the projection begins
+2. **Choose the Forecast Horizon**: How many years ahead (1-5 years)
+3. **View Trends**: How class needs evolve over time

+## Class Capacity Logic

+Example scenario:
+- **PDST Course**: 10-15 students (2023) → 40 students (2024) → projected to keep rising
+- **Max Capacity**: 50 students per class
+- **Recommendation**: 1 class (if ≤50), 2 classes (if 51-100), and so on.

+### Calculation Formula
+```
+Number of Classes = ⌈Predicted Enrollment / Capacity per Class⌉
+```
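The ceiling formula in the README can be sketched in a few lines of Python. This is only an illustration: the function name `classes_needed` and its zero-enrollment guard are hypothetical and not taken from this commit, while the default capacity of 50 comes from the README's example scenario.

```python
import math


def classes_needed(predicted_enrollment: float, capacity_per_class: int = 50) -> int:
    """Return ceil(predicted_enrollment / capacity_per_class).

    Hypothetical helper: returns 0 when nobody is expected to enroll.
    """
    if capacity_per_class <= 0:
        raise ValueError("capacity_per_class must be positive")
    if predicted_enrollment <= 0:
        return 0
    return math.ceil(predicted_enrollment / capacity_per_class)


# Walk through the boundaries mentioned in the README (<=50, 51-100, ...).
for enrollment in (40, 50, 51, 100, 101):
    print(enrollment, "students ->", classes_needed(enrollment), "class(es)")
```

Using `math.ceil` on the ratio keeps the boundaries exact for whole-number forecasts: 50 students still fit in one class, and the 51st student triggers a second one.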

+## Prediction Strategy

+The system uses several forecasting strategies:
+- **Prophet Logistic Growth**: For courses with sufficient historical data, using capacity as the upper bound (cap)
+- **Trend-Based Fallback**: For cases where Prophet's prediction is unrealistic
+- **Mean Fallback**: For courses with limited data
+- **Cold Start**: For new courses with no historical data
+
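One way the four strategies listed above might be chosen is sketched below. Everything here is an assumption made for illustration: the function name `choose_strategy`, the `MIN_POINTS_FOR_PROPHET` and `MAX_PLAUSIBLE_RATIO` thresholds, and the definition of an "unrealistic" forecast are hypothetical. The actual rules live in `prophet_predictor.py`, which this diff excerpt does not show.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_POINTS_FOR_PROPHET = 4   # hypothetical: fewer points counts as "limited data"
MAX_PLAUSIBLE_RATIO = 3.0    # hypothetical: forecast above 3x the mean is "unrealistic"


def choose_strategy(history: Sequence[float], prophet_forecast: Optional[float]) -> str:
    """Pick a strategy label for one course (illustrative only)."""
    if len(history) == 0:
        return "cold_start"            # new course, no history at all
    if len(history) < MIN_POINTS_FOR_PROPHET or prophet_forecast is None:
        return "mean_fallback"         # too little data to fit a model
    baseline = mean(history)
    unrealistic = prophet_forecast < 0 or (
        baseline > 0 and prophet_forecast > MAX_PLAUSIBLE_RATIO * baseline
    )
    return "trend_fallback" if unrealistic else "prophet_logistic"


print(choose_strategy([], None))
print(choose_strategy([10, 12], 11.0))
print(choose_strategy([10, 12, 15, 40], 45.0))
print(choose_strategy([10, 12, 15, 40], 500.0))
```

Ordering the checks from least to most data keeps each fallback reachable, and the plausibility test runs only once a Prophet forecast actually exists.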
+## Technical Stack
+
+- **Framework**: Gradio for the UI
+- **ML Model**: Facebook Prophet with logistic growth
 - **Data Processing**: Pandas, NumPy
+- **Visualization**: Matplotlib, Seaborn
 - **Deployment**: Hugging Face Spaces

+## Configuration
+
+### Class Capacity Settings (config.py)
+```python
+@dataclass
+class ClassCapacityConfig:
+    DEFAULT_CLASS_CAPACITY: int = 50  # Max students per class
+    MIN_STUDENTS_TO_OPEN_CLASS: int = 10  # Minimum to open a class
+    CAPACITY_WARNING_THRESHOLD: float = 0.8  # 80% utilization warning
+    ENABLE_CAPACITY_CONSTRAINTS: bool = True
+```
+
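The capacity settings above suggest how the output columns `Jumlah Kelas`, `Total Kuota`, `Utilization %` and `Status` could be derived. The sketch below is a guess at that logic, not the code from this commit: `capacity_summary` is a hypothetical helper, and it assumes a course is marked `BUKA` only when the forecast reaches `MIN_STUDENTS_TO_OPEN_CLASS`, and that the warning fires when utilization is at or above the threshold. `ENABLE_CAPACITY_CONSTRAINTS` is copied from the README but not used here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassCapacityConfig:
    # Field names and defaults copied from the README snippet.
    DEFAULT_CLASS_CAPACITY: int = 50
    MIN_STUDENTS_TO_OPEN_CLASS: int = 10
    CAPACITY_WARNING_THRESHOLD: float = 0.8
    ENABLE_CAPACITY_CONSTRAINTS: bool = True


def capacity_summary(predicted: float, cfg: Optional[ClassCapacityConfig] = None) -> dict:
    """Hypothetical helper: derive class count, quota, utilization and status."""
    cfg = cfg or ClassCapacityConfig()
    if predicted < cfg.MIN_STUDENTS_TO_OPEN_CLASS:
        # Assumed rule: too few students, so the course stays closed.
        return {"status": "TUTUP", "classes": 0, "total_quota": 0,
                "utilization": 0.0, "warning": False}
    classes = math.ceil(predicted / cfg.DEFAULT_CLASS_CAPACITY)
    total_quota = classes * cfg.DEFAULT_CLASS_CAPACITY
    utilization = predicted / total_quota
    return {"status": "BUKA", "classes": classes, "total_quota": total_quota,
            "utilization": utilization,
            "warning": utilization >= cfg.CAPACITY_WARNING_THRESHOLD}


for forecast in (8, 45, 60):
    print(forecast, capacity_summary(forecast))
```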
+### Multi-Year Forecast Settings
+```python
+@dataclass
+class MultiYearForecastConfig:
+    FORECAST_YEARS_AHEAD: int = 3  # Years to forecast
+    MAX_YEARLY_GROWTH_RATE: float = 0.5  # 50% max growth/year
+    MIN_YEARLY_GROWTH_RATE: float = -0.3  # 30% max decline/year
+```

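The two growth-rate bounds read like a clamp on year-over-year change. A minimal sketch of that idea follows, assuming each yearly forecast is limited relative to the previous year's (already clamped) value. The helper names `clamp_forecast` and `project` are hypothetical, and the commit may apply the bounds differently.

```python
from typing import List, Sequence


def clamp_forecast(previous: float, raw_forecast: float,
                   max_growth: float = 0.5, min_growth: float = -0.3) -> float:
    """Limit raw_forecast to [previous * (1 + min_growth), previous * (1 + max_growth)]."""
    lower = previous * (1.0 + min_growth)
    upper = previous * (1.0 + max_growth)
    return min(max(raw_forecast, lower), upper)


def project(start: float, raw_forecasts: Sequence[float]) -> List[float]:
    """Clamp a sequence of yearly forecasts, each against the prior clamped year."""
    path: List[float] = []
    previous = start
    for raw in raw_forecasts:
        previous = clamp_forecast(previous, raw)
        path.append(previous)
    return path


# A course at 40 students whose raw model output jumps to 100 for two years.
print(project(40, [100, 100]))
```

Clamping against the previous clamped value rather than the last observed value lets growth compound at most 50% per year, so a sudden spike in the raw forecast is spread across several years.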
+## Model Performance
+
+The model is validated by backtesting on historical data:
+
+### Enrollment Prediction
 - Mean Absolute Error (MAE): ~31 students
 - Root Mean Squared Error (RMSE): ~49 students
+
+### Class Count Prediction
+- Class MAE: ~0.5 classes
+- Exact Class Match: ~70%
+- Within ±1 Class: ~95%
+
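The three class-count metrics can be computed directly from predicted and actual class counts. The sketch below shows one plausible definition. `class_count_metrics` is a hypothetical name, the sample numbers are made up for illustration, and the real computation in `evaluator.py` is not visible in this excerpt.

```python
from typing import Dict, Sequence


def class_count_metrics(predicted: Sequence[int], actual: Sequence[int]) -> Dict[str, float]:
    """Return class MAE, exact-match rate and within-one-class rate."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and equally long")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    n = len(errors)
    return {
        "class_mae": sum(errors) / n,
        "exact_match": sum(1 for e in errors if e == 0) / n,
        "within_one": sum(1 for e in errors if e <= 1) / n,
    }


# Illustrative data only: four courses, predicted vs. actual number of classes.
print(class_count_metrics([1, 2, 3, 1], [1, 2, 2, 3]))
```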
+## 🔧 Usage
+
+### Local Development
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Run the app
+python app.py
+```
+
+### Environment Variables
+- `HF_TOKEN`: Hugging Face token for accessing the private dataset
+
+## 📝 Output Columns
+
+### Semester Prediction
+| Column | Description |
+|--------|-------------|
+| Kode MK | Course code |
+| Nama MK | Course name |
+| Prediksi | Predicted number of students |
+| Jumlah Kelas | Recommended number of classes to open |
+| Kapasitas/Kelas | Maximum capacity per class |
+| Total Kuota | Total capacity (Jumlah Kelas × Kapasitas) |
+| Utilization % | Capacity utilization percentage |
+| Status | BUKA (open) / TUTUP (closed) |
+| Confidence | high/medium/low |
+| Strategy | Prediction method used |
+
+### Multi-Year Projection
+| Column | Description |
+|--------|-------------|
+| Tahun | Forecast year |
+| Kode MK | Course code |
+| Nama MK | Course name |
+| Prediksi | Predicted enrollment |
+| Kelas | Number of classes needed |
+| Kapasitas | Total available capacity |
app.py
CHANGED
@@ -1,616 +1,232 @@
-# Version: 3.1 - Dark theme UI with white text
 import logging
-from typing import Optional, Tuple

 import gradio as gr
-import pandas as pd

-from
-from
-
-
 from utils import setup_logging

 setup_logging("INFO")
 logger = logging.getLogger("GradioApp")

-_processor: Optional[DataProcessor] = None
-_predictor: Optional[ProphetPredictor] = None
-_config: Optional[Config] = None
-_df_enrollment: Optional[pd.DataFrame] = None
-_elective_codes: Optional[set] = None
-_backtest_metrics: Optional[dict] = None
-
-
-def initialize_system():
-    """Initialize the prediction system (called once at startup)."""
-    global \
-        _processor, \
-        _predictor, \
-        _config, \
-        _df_enrollment, \
-        _elective_codes, \
-        _backtest_metrics
-
-    try:
-        logger.info("Initializing prediction system...")
-        _config = Config()
-
-        _processor = DataProcessor(_config)
-        _df_enrollment, _elective_codes = _processor.load_and_process()
-
-        _predictor = ProphetPredictor(_config)
-        _predictor.train_student_population_model(
-            _processor.raw_data["students_yearly"]
-        )

-
-
-
-
-
-
-
-def generate_predictions(
-    year: int, semester: int
-) -> Tuple[str, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
-    """
-    Generate enrollment predictions for a given year and semester.
-
-    Args:
-        year: Target year (e.g., 2025)
-        semester: Target semester (1 = Ganjil, 2 = Genap)
-
-    Returns:
-        Tuple of (summary_text, all_predictions_df, comparison_df)
-    """
-    global \
-        _processor, \
-        _predictor, \
-        _config, \
-        _df_enrollment, \
-        _elective_codes, \
-        _backtest_metrics
-
-    try:
-        if semester not in [1, 2]:
-            return (
-                "Error: Semester harus 1 (Ganjil) atau 2 (Genap)",
-                None,
-                None,
-            )
-
-        if year < 2020 or year > 2030:
-            return "Error: Year must be between 2020 and 2030", None, None
-
-        if (
-            _config is None
-            or _predictor is None
-            or _processor is None
-            or _df_enrollment is None
-            or _elective_codes is None
-        ):
-            return (
-                "Error: System not initialized. Please restart the app.",
-                None,
-                None,
-            )
-
-        logger.info(f"Generating predictions for {year} Semester {semester}...")
-
-        _config.prediction.PREDICT_YEAR = year
-        _config.prediction.PREDICT_SEMESTER = semester
-
-        # Check if actual data exists for this year/semester
-        actual_data = _df_enrollment[
-            (_df_enrollment["thn"] == year) & (_df_enrollment["smt"] == semester)
-        ]
-        has_actual_data = len(actual_data) > 0
-
-        if has_actual_data:
-            logger.info(
-                f"Found actual enrollment data for {year} Semester {semester} - will compare predictions vs actual"
-            )
-        else:
-            logger.info(
-                f"No actual data for {year} Semester {semester} - generating future predictions"
-            )
-
-        if _backtest_metrics is None:
-            logger.info("Running backtest for the first time...")
-            evaluator = Evaluator(_config)
-            backtest_results = evaluator.run_backtest(_df_enrollment, _predictor)
-
-            if backtest_results is None or len(backtest_results) == 0:
-                logger.warning("Backtest returned no results, using defaults")
-                _backtest_metrics = {"mae": 0, "rmse": 0}
-            else:
-                metrics_result = evaluator.generate_metrics(backtest_results)
-                if metrics_result is None:
-                    logger.warning("Metrics calculation failed, using defaults")
-                    _backtest_metrics = {"mae": 0, "rmse": 0}
-                else:
-                    _backtest_metrics = metrics_result
-        else:
-            logger.info("Using cached backtest metrics")
-
-        metrics = _backtest_metrics
-
-        predictions = _predictor.generate_batch_predictions(
-            _df_enrollment,
-            _processor.raw_data["courses"],
-            _elective_codes,
-            year,
-            semester,
-        )

-        semester_name = "1 (Ganjil)" if semester == 1 else "2 (Genap)"
-        total_to_open = len(predictions[predictions["recommendation"] == "BUKA"])
-        total_seats = (
-            int(
-                predictions[predictions["recommendation"] == "BUKA"][
-                    "recommended_quota"
-                ].sum()
-            )
-            if total_to_open > 0
-            else 0
-        )

-
-
-
-            comparison = predictions.merge(
-                actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
-            )
-            comparison = comparison.rename(columns={"enrollment": "actual_enrollment"})
-
-            # Calculate comparison metrics only for courses with actual data
-            courses_with_actual = comparison[
-                comparison["actual_enrollment"].notna()
-            ].copy()
-
-            if len(courses_with_actual) > 0:
-                comparison_mae = abs(
-                    courses_with_actual["predicted_enrollment"]
-                    - courses_with_actual["actual_enrollment"]
-                ).mean()
-                comparison_rmse = (
-                    (
-                        courses_with_actual["predicted_enrollment"]
-                        - courses_with_actual["actual_enrollment"]
-                    )
-                    ** 2
-                ).mean() ** 0.5
-                total_actual = courses_with_actual["actual_enrollment"].sum()
-                total_predicted = courses_with_actual["predicted_enrollment"].sum()
-                accuracy_pct = (
-                    1 - abs(total_predicted - total_actual) / total_actual
-                ) * 100
-
-                diff_color = (
-                    "#4ade80" if total_predicted - total_actual >= 0 else "#f87171"
-                )

-
-
-                    <div style="margin-bottom: 24px;">
-                        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
-                        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Validasi prediksi terhadap data aktual</p>
-                    </div>
-
-                    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #4ade80;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">Akurasi</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #4ade80;">{accuracy_pct:.1f}%</div>
-                        </div>
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #60a5fa;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MAE</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #60a5fa;">{comparison_mae:.2f}</div>
-                        </div>
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #a78bfa;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">RMSE</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #a78bfa;">{comparison_rmse:.2f}</div>
-                        </div>
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #fb923c;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MK Divalidasi</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #fb923c;">{len(courses_with_actual)}</div>
-                        </div>
-                    </div>
-
-                    <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 16px;">
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px;">
-                            <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">Ringkasan Enrollment</h4>
-                            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                                <span style="color: #9ca3af;">Total Aktual</span>
-                                <span style="font-weight: 600; color: #fff;">{int(total_actual)}</span>
-                            </div>
-                            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                                <span style="color: #9ca3af;">Total Prediksi</span>
-                                <span style="font-weight: 600; color: #fff;">{int(total_predicted)}</span>
-                            </div>
-                            <div style="display: flex; justify-content: space-between; padding: 12px 0;">
-                                <span style="color: #9ca3af;">Selisih</span>
-                                <span style="font-weight: 600; color: {diff_color};">{int(total_predicted - total_actual):+d}</span>
-                            </div>
-                        </div>
-
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px;">
-                            <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">Rekomendasi</h4>
-                            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                                <span style="color: #9ca3af;">MK Dibuka</span>
-                                <span style="font-weight: 600; color: #fff;">{total_to_open}</span>
-                            </div>
-                            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                                <span style="color: #9ca3af;">Total Kuota</span>
-                                <span style="font-weight: 600; color: #fff;">{total_seats}</span>
-                            </div>
-                            <div style="display: flex; justify-content: space-between; padding: 12px 0;">
-                                <span style="color: #9ca3af;">Backtest MAE</span>
-                                <span style="font-weight: 600; color: #fff;">{metrics["mae"]:.2f}</span>
-                            </div>
-                        </div>
-                    </div>
-                </div>
-                """
-            else:
-                summary = f"""
-                <div style="padding: 24px;">
-                    <div style="margin-bottom: 24px;">
-                        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
-                        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Data semester ada, tetapi tidak ditemukan MK pilihan yang cocok</p>
-                    </div>
-
-                    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px;">
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #60a5fa;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MAE (Backtest)</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #60a5fa;">{metrics["mae"]:.2f}</div>
-                        </div>
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #a78bfa;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">RMSE (Backtest)</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #a78bfa;">{metrics["rmse"]:.2f}</div>
-                        </div>
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #4ade80;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MK Dibuka</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #4ade80;">{total_to_open}</div>
-                        </div>
-                        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #fb923c;">
-                            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">Total Kuota</div>
-                            <div style="font-size: 28px; font-weight: 700; color: #fb923c;">{total_seats}</div>
-                        </div>
-                    </div>
-                </div>
-                """
-        else:
-            summary = f"""
-            <div style="padding: 24px;">
-                <div style="margin-bottom: 24px;">
-                    <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
-                    <p style="color: #9ca3af; margin: 0; font-size: 14px;">Prediksi masa depan berdasarkan tren historis</p>
-                </div>
-
-                <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
-                    <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #60a5fa;">
-                        <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MAE (Backtest)</div>
-                        <div style="font-size: 28px; font-weight: 700; color: #60a5fa;">{metrics["mae"]:.2f}</div>
-                    </div>
-                    <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #a78bfa;">
-                        <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">RMSE (Backtest)</div>
-                        <div style="font-size: 28px; font-weight: 700; color: #a78bfa;">{metrics["rmse"]:.2f}</div>
-                    </div>
-                    <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #4ade80;">
-                        <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MK Dibuka</div>
-                        <div style="font-size: 28px; font-weight: 700; color: #4ade80;">{total_to_open}</div>
-                    </div>
-                    <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #fb923c;">
-                        <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">Total Kuota</div>
-                        <div style="font-size: 28px; font-weight: 700; color: #fb923c;">{total_seats}</div>
-                    </div>
-                </div>
-
-                <div style="background: #1e293b; padding: 20px; border-radius: 12px; max-width: 300px;">
-                    <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">Estimasi Total</h4>
-                    <div style="display: flex; justify-content: space-between; padding: 12px 0;">
-                        <span style="color: #9ca3af;">Total Mahasiswa</span>
-                        <span style="font-weight: 600; color: #fff;">{int(predictions["predicted_enrollment"].sum())}</span>
-                    </div>
-                </div>
-            </div>
-            """
-
-        # Prepare all predictions display
-        all_predictions_display = predictions[
-            [
-                "kode_mk",
-                "nama_mk",
-                "predicted_enrollment",
-                "recommended_quota",
-                "recommendation",
-                "confidence",
-                "strategy",
-            ]
-        ].copy()
-        all_predictions_display.columns = [
-            "Kode MK",
-            "Nama MK",
-            "Prediksi",
-            "Kuota",
-            "Status",
-            "Confidence",
-            "Strategy",
-        ]
-        all_predictions_display["Prediksi"] = all_predictions_display["Prediksi"].round(
-            1
-        )
-        all_predictions_display["Kuota"] = all_predictions_display["Kuota"].astype(int)

-
-
-
-

-
-
         )

-        # Prepare comparison table if actual data exists
-        comparison_display = None
-        if has_actual_data:
-            logger.info(
-                f"Building comparison table - Actual data has {len(actual_data)} courses"
-            )
-            logger.info(f"Predictions has {len(predictions)} courses")
-
-            comparison = predictions.merge(
-                actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
-            )
-            comparison = comparison.rename(columns={"enrollment": "actual_enrollment"})
-
-            # Filter to courses with actual data and calculate error
-            courses_with_actual = comparison[
-                comparison["actual_enrollment"].notna()
-            ].copy()
-
-            logger.info(
-                f"Courses with matching actual data: {len(courses_with_actual)}"
-            )
-            if len(courses_with_actual) > 0:
-                logger.info(
-                    f"Matching courses: {courses_with_actual['kode_mk'].tolist()}"
-                )

-
-
-
-                    - courses_with_actual["actual_enrollment"]
-                )
-                courses_with_actual["abs_error"] = abs(courses_with_actual["error"])
-                courses_with_actual["accuracy_%"] = 100 * (
-                    1
-                    - courses_with_actual["abs_error"]
-                    / courses_with_actual["actual_enrollment"].replace(0, 1)
-                )

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
                 )
-
-
                 )

-
-                    "
-                )

-
-
                 )
-
-
-
                 )
-                logger.warning(f"Predicted courses: {predictions['kode_mk'].tolist()}")
-                logger.warning(f"Actual courses: {actual_data['kode_mk'].tolist()}")

-
-
         )
-        return summary, all_predictions_display, comparison_display

-
-
-
-


-def get_data_info() -> str:
-    """Get information about the loaded dataset."""
-    global _processor, _config
-
-    try:
-        if _processor is None or _config is None:
-            return "System not initialized"
-
-        courses = _processor.raw_data.get("courses")
-        students = _processor.raw_data.get("students_yearly")
-
-        if courses is None or students is None:
-            return "Data not loaded"
-
-        # Get elective courses
-        elective_courses = courses[courses["kategori_mk"] == "P"]
-
-        info = f"""
-        <div style="padding: 8px 0;">
-            <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 12px;">
-                <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-                    <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Total MK</div>
-                    <div style="font-size: 20px; font-weight: 700; color: #fff;">{len(courses)}</div>
-                </div>
-                <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-                    <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">MK Pilihan</div>
-                    <div style="font-size: 20px; font-weight: 700; color: #4ade80;">{len(elective_courses)}</div>
-                </div>
-                <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-                    <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">MK Wajib</div>
-                    <div style="font-size: 20px; font-weight: 700; color: #fff;">{len(courses) - len(elective_courses)}</div>
-                </div>
-                <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-                    <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Tahun Data</div>
-                    <div style="font-size: 20px; font-weight: 700; color: #60a5fa;">{students["thn"].min()}-{students["thn"].max()}</div>
-                </div>
-            </div>
-        </div>
-        """
-        return info
-
-    except Exception as e:
-        return f"Error getting data info: {str(e)}"
-
-
| 497 |
-
# Initialize system at startup
|
| 498 |
logger.info("Starting Gradio app...")
|
| 499 |
-
|
|
|
|
| 500 |
|
| 501 |
if not init_success:
|
| 502 |
logger.error("Failed to initialize system. App may not work correctly.")
|
| 503 |
|
| 504 |
-
|
| 505 |
-
with gr.Blocks(title="SKS Enrollment Predictor") as demo:
|
| 506 |
-
# Header
|
| 507 |
-
gr.Markdown("# Course Enrollment Predictor")
|
| 508 |
-
|
| 509 |
-
with gr.Row():
|
| 510 |
-
# Left panel - Controls
|
| 511 |
-
with gr.Column(scale=1, min_width=300):
|
| 512 |
-
year_input = gr.Number(
|
| 513 |
-
label="Tahun",
|
| 514 |
-
value=2025,
|
| 515 |
-
precision=0,
|
| 516 |
-
minimum=2020,
|
| 517 |
-
maximum=2030,
|
| 518 |
-
)
|
| 519 |
-
|
| 520 |
-
semester_input = gr.Radio(
|
| 521 |
-
choices=[("1 (Ganjil)", 1), ("2 (Genap)", 2)],
|
| 522 |
-
label="Semester",
|
| 523 |
-
value=2,
|
| 524 |
-
)
|
| 525 |
-
|
| 526 |
-
predict_btn = gr.Button(
|
| 527 |
-
"Generate Predictions",
|
| 528 |
-
variant="primary",
|
| 529 |
-
size="lg",
|
| 530 |
-
)
|
| 531 |
-
|
| 532 |
-
gr.Markdown("---")
|
| 533 |
-
|
| 534 |
-
# Data info section
|
| 535 |
-
with gr.Accordion("Dataset Info", open=False):
|
| 536 |
-
data_info_output = gr.HTML()
|
| 537 |
-
demo.load(fn=get_data_info, inputs=[], outputs=data_info_output)
|
| 538 |
-
|
| 539 |
-
# Right panel - Results
|
| 540 |
-
with gr.Column(scale=3):
|
| 541 |
-
summary_output = gr.HTML(
|
| 542 |
-
value="""
|
| 543 |
-
<div style="padding: 60px 40px; text-align: center; background: #1e293b; border-radius: 12px;">
|
| 544 |
-
<h3 style="color: #fff; margin: 0 0 8px 0; font-size: 18px; font-weight: 600;">Pilih tahun dan semester</h3>
|
| 545 |
-
<p style="color: #9ca3af; margin: 0; font-size: 14px;">Klik Generate Predictions untuk melihat hasil</p>
|
| 546 |
-
</div>
|
| 547 |
-
"""
|
| 548 |
-
)
|
| 549 |
-
|
| 550 |
-
gr.Markdown("---")
|
| 551 |
-
|
| 552 |
-
# Predictions table
|
| 553 |
-
gr.Markdown("### Daftar Prediksi Mata Kuliah")
|
| 554 |
-
all_predictions_output = gr.Dataframe(
|
| 555 |
-
label="",
|
| 556 |
-
wrap=True,
|
| 557 |
-
interactive=False,
|
| 558 |
-
)
|
| 559 |
-
|
| 560 |
-
# Comparison section
|
| 561 |
-
with gr.Accordion("Detail Validasi", open=False) as comparison_accordion:
|
| 562 |
-
comparison_info = gr.Markdown(
|
| 563 |
-
value="Data validasi muncul ketika data aktual tersedia",
|
| 564 |
-
)
|
| 565 |
-
comparison_output = gr.Dataframe(
|
| 566 |
-
label="",
|
| 567 |
-
wrap=True,
|
| 568 |
-
interactive=False,
|
| 569 |
-
)
|
| 570 |
-
|
| 571 |
-
def update_ui_with_predictions(year, semester):
|
| 572 |
-
"""Wrapper to handle UI updates based on whether comparison data exists."""
|
| 573 |
-
summary, all_predictions, comparison = generate_predictions(year, semester)
|
| 574 |
-
|
| 575 |
-
logger.info(
|
| 576 |
-
f"UI Update: comparison is None: {comparison is None}, empty: {comparison.empty if comparison is not None else 'N/A'}"
|
| 577 |
-
)
|
| 578 |
-
|
| 579 |
-
if comparison is not None and not comparison.empty:
|
| 580 |
-
logger.info(f"Showing comparison table with {len(comparison)} rows")
|
| 581 |
-
return (
|
| 582 |
-
summary,
|
| 583 |
-
all_predictions,
|
| 584 |
-
gr.update(open=True),
|
| 585 |
-
gr.update(
|
| 586 |
-
value=f"Validasi terhadap {len(comparison)} mata kuliah",
|
| 587 |
-
),
|
| 588 |
-
gr.update(value=comparison),
|
| 589 |
-
)
|
| 590 |
-
else:
|
| 591 |
-
logger.info("Hiding comparison table - no data available")
|
| 592 |
-
return (
|
| 593 |
-
summary,
|
| 594 |
-
all_predictions,
|
| 595 |
-
gr.update(open=False),
|
| 596 |
-
gr.update(
|
| 597 |
-
value="Tidak ada data validasi untuk prediksi masa depan",
|
| 598 |
-
),
|
| 599 |
-
gr.update(value=None),
|
| 600 |
-
)
|
| 601 |
-
|
| 602 |
-
predict_btn.click(
|
| 603 |
-
fn=update_ui_with_predictions,
|
| 604 |
-
inputs=[year_input, semester_input],
|
| 605 |
-
outputs=[
|
| 606 |
-
summary_output,
|
| 607 |
-
all_predictions_output,
|
| 608 |
-
comparison_accordion,
|
| 609 |
-
comparison_info,
|
| 610 |
-
comparison_output,
|
| 611 |
-
],
|
| 612 |
-
)
|
| 613 |
-
|
| 614 |
|
| 615 |
# Launch the app
|
| 616 |
if __name__ == "__main__":
|
| 1 |
import logging
|
| 2 |
|
| 3 |
import gradio as gr
|
| 4 |
|
| 5 |
+
from backend import get_backend
|
| 6 |
+
from ui_components import (
|
| 7 |
+
build_data_info,
|
| 8 |
+
build_multi_year_summary,
|
| 9 |
+
build_prediction_summary,
|
| 10 |
+
get_forecast_placeholder,
|
| 11 |
+
get_prediction_placeholder,
|
| 12 |
+
)
|
| 13 |
from utils import setup_logging
|
| 14 |
|
| 15 |
setup_logging("INFO")
|
| 16 |
logger = logging.getLogger("GradioApp")
|
| 17 |
|
| 18 |
|
| 19 |
+
# Backend Interface
|
| 20 |
+
def get_data_info() -> str:
|
| 21 |
+
backend = get_backend()
|
| 22 |
+
data = backend.get_data_info()
|
| 23 |
+
return build_data_info(data)
|
| 24 |
|
| 25 |
|
| 26 |
+
def generate_predictions(year: int, semester: int):
|
| 27 |
+
backend = get_backend()
|
| 28 |
+
result = backend.generate_predictions(year, semester)
|
| 29 |
|
| 30 |
+
if result.error:
|
| 31 |
+
return f"Error: {result.error}", None, None
|
| 32 |
|
| 33 |
+
summary_html = build_prediction_summary(result.summary_data)
|
| 34 |
+
return summary_html, result.predictions_df, result.comparison_df
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
def generate_multi_year_forecast(year: int, semester: int, years_ahead: int = 3):
|
| 38 |
+
backend = get_backend()
|
| 39 |
+
result = backend.generate_multi_year_forecast(year, semester, years_ahead)
|
| 40 |
|
| 41 |
+
if result.error:
|
| 42 |
+
return f"Error: {result.error}", None
|
| 43 |
+
|
| 44 |
+
summary_html = build_multi_year_summary(result.summary_data)
|
| 45 |
+
return summary_html, result.forecast_df
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
def update_ui_with_predictions(year: int, semester: int):
|
| 49 |
+
summary, all_predictions, comparison = generate_predictions(year, semester)
|
| 50 |
+
|
| 51 |
+
logger.info(
|
| 52 |
+
f"UI Update: comparison is None: {comparison is None}, "
|
| 53 |
+
f"empty: {comparison.empty if comparison is not None else 'N/A'}"
|
| 54 |
+
)
|
| 55 |
+
|
| 56 |
+
if comparison is not None and not comparison.empty:
|
| 57 |
+
logger.info(f"Showing comparison table with {len(comparison)} rows")
|
| 58 |
+
return (
|
| 59 |
+
summary,
|
| 60 |
+
all_predictions,
|
| 61 |
+
gr.update(open=True),
|
| 62 |
+
gr.update(
|
| 63 |
+
value=f"Validasi terhadap {len(comparison)} mata kuliah - "
|
| 64 |
+
"termasuk perbandingan jumlah kelas aktual vs prediksi"
|
| 65 |
+
),
|
| 66 |
+
gr.update(value=comparison),
|
| 67 |
+
)
|
| 68 |
+
else:
|
| 69 |
+
logger.info("Hiding comparison table - no data available")
|
| 70 |
+
return (
|
| 71 |
+
summary,
|
| 72 |
+
all_predictions,
|
| 73 |
+
gr.update(open=False),
|
| 74 |
+
gr.update(value="Tidak ada data validasi untuk prediksi masa depan"),
|
| 75 |
+
gr.update(value=None),
|
| 76 |
)
|
| 77 |
|
| 78 |
|
| 79 |
+
# Gradio UI
|
| 80 |
+
def create_gradio_app() -> gr.Blocks:
|
| 81 |
+
"""Create and configure the Gradio application."""
|
| 82 |
|
| 83 |
+
with gr.Blocks(title="SKS Enrollment Predictor") as demo:
|
| 84 |
+
# Header
|
| 85 |
+
gr.Markdown("# Course Enrollment & Class Capacity Predictor")
|
| 86 |
+
gr.Markdown(
|
| 87 |
+
"Sistem prediksi **jumlah kelas yang perlu dibuka** berdasarkan "
|
| 88 |
+
"forecasting enrollment dengan mempertimbangkan kapasitas maksimum per kelas."
|
| 89 |
+
)
|
| 90 |
+
|
| 91 |
+
with gr.Tabs():
|
| 92 |
+
# Single Year
|
| 93 |
+
with gr.TabItem("Prediksi Semester"):
|
| 94 |
+
with gr.Row():
|
| 95 |
+
with gr.Column(scale=1, min_width=300):
|
| 96 |
+
year_input = gr.Number(
|
| 97 |
+
label="Tahun",
|
| 98 |
+
value=2025,
|
| 99 |
+
precision=0,
|
| 100 |
+
minimum=2020,
|
| 101 |
+
maximum=2030,
|
| 102 |
+
)
|
| 103 |
+
|
| 104 |
+
semester_input = gr.Radio(
|
| 105 |
+
choices=[("1 (Ganjil)", 1), ("2 (Genap)", 2)],
|
| 106 |
+
label="Semester",
|
| 107 |
+
value=2,
|
| 108 |
+
)
|
| 109 |
+
|
| 110 |
+
predict_btn = gr.Button(
|
| 111 |
+
"Generate Predictions",
|
| 112 |
+
variant="primary",
|
| 113 |
+
size="lg",
|
| 114 |
+
)
|
| 115 |
+
|
| 116 |
+
gr.Markdown("---")
|
| 117 |
+
|
| 118 |
+
with gr.Accordion("Dataset Info", open=False):
|
| 119 |
+
data_info_output = gr.HTML()
|
| 120 |
+
demo.load(
|
| 121 |
+
fn=get_data_info, inputs=[], outputs=data_info_output
|
| 122 |
+
)
|
| 123 |
+
|
| 124 |
+
with gr.Column(scale=3):
|
| 125 |
+
summary_output = gr.HTML(value=get_prediction_placeholder())
|
| 126 |
+
|
| 127 |
+
gr.Markdown("---")
|
| 128 |
+
|
| 129 |
+
gr.Markdown("### Rekomendasi Jumlah Kelas per Mata Kuliah")
|
| 130 |
+
gr.Markdown(
|
| 131 |
+
"*Jumlah kelas dihitung berdasarkan prediksi enrollment ÷ "
|
| 132 |
+
"kapasitas per kelas*"
|
| 133 |
)
|
| 134 |
+
all_predictions_output = gr.Dataframe(
|
| 135 |
+
label="",
|
| 136 |
+
wrap=True,
|
| 137 |
+
interactive=False,
|
| 138 |
)
|
| 139 |
|
| 140 |
+
with gr.Accordion(
|
| 141 |
+
"Detail Validasi", open=False
|
| 142 |
+
) as comparison_accordion:
|
| 143 |
+
comparison_info = gr.Markdown(
|
| 144 |
+
value="Data validasi muncul ketika data aktual tersedia"
|
| 145 |
+
)
|
| 146 |
+
comparison_output = gr.Dataframe(
|
| 147 |
+
label="",
|
| 148 |
+
wrap=True,
|
| 149 |
+
interactive=False,
|
| 150 |
+
)
|
| 151 |
|
| 152 |
+
# Multi-Year Forecast
|
| 153 |
+
with gr.TabItem("Proyeksi Multi-Tahun"):
|
| 154 |
+
gr.Markdown("### Forecasting Kebutuhan Kelas Beberapa Tahun ke Depan")
|
| 155 |
+
gr.Markdown(
|
| 156 |
+
"Memprediksi tren jumlah mahasiswa dan kebutuhan kelas "
|
| 157 |
+
"untuk perencanaan jangka panjang."
|
| 158 |
)
|
| 159 |
+
|
| 160 |
+
with gr.Row():
|
| 161 |
+
with gr.Column(scale=1):
|
| 162 |
+
forecast_year = gr.Number(
|
| 163 |
+
label="Tahun Mulai",
|
| 164 |
+
value=2025,
|
| 165 |
+
precision=0,
|
| 166 |
+
minimum=2020,
|
| 167 |
+
maximum=2030,
|
| 168 |
+
)
|
| 169 |
+
|
| 170 |
+
forecast_semester = gr.Radio(
|
| 171 |
+
choices=[("1 (Ganjil)", 1), ("2 (Genap)", 2)],
|
| 172 |
+
label="Semester",
|
| 173 |
+
value=2,
|
| 174 |
+
)
|
| 175 |
+
|
| 176 |
+
forecast_years = gr.Slider(
|
| 177 |
+
label="Tahun ke Depan",
|
| 178 |
+
minimum=1,
|
| 179 |
+
maximum=5,
|
| 180 |
+
value=3,
|
| 181 |
+
step=1,
|
| 182 |
+
)
|
| 183 |
+
|
| 184 |
+
forecast_btn = gr.Button(
|
| 185 |
+
"Generate Forecast",
|
| 186 |
+
variant="primary",
|
| 187 |
+
size="lg",
|
| 188 |
+
)
|
| 189 |
+
|
| 190 |
+
with gr.Column(scale=3):
|
| 191 |
+
forecast_summary = gr.HTML(value=get_forecast_placeholder())
|
| 192 |
+
|
| 193 |
+
gr.Markdown("---")
|
| 194 |
+
gr.Markdown("### Detail Proyeksi per Mata Kuliah per Tahun")
|
| 195 |
+
forecast_table = gr.Dataframe(
|
| 196 |
+
label="",
|
| 197 |
+
wrap=True,
|
| 198 |
+
interactive=False,
|
| 199 |
)
|
| 200 |
|
| 201 |
+
predict_btn.click(
|
| 202 |
+
fn=update_ui_with_predictions,
|
| 203 |
+
inputs=[year_input, semester_input],
|
| 204 |
+
outputs=[
|
| 205 |
+
summary_output,
|
| 206 |
+
all_predictions_output,
|
| 207 |
+
comparison_accordion,
|
| 208 |
+
comparison_info,
|
| 209 |
+
comparison_output,
|
| 210 |
+
],
|
| 211 |
)
|
| 212 |
|
| 213 |
+
forecast_btn.click(
|
| 214 |
+
fn=generate_multi_year_forecast,
|
| 215 |
+
inputs=[forecast_year, forecast_semester, forecast_years],
|
| 216 |
+
outputs=[forecast_summary, forecast_table],
|
| 217 |
+
)
|
| 218 |
+
|
| 219 |
+
return demo
|
| 220 |
|
| 221 |
|
| 222 |
logger.info("Starting Gradio app...")
|
| 223 |
+
backend = get_backend()
|
| 224 |
+
init_success = backend.initialize()
|
| 225 |
|
| 226 |
if not init_success:
|
| 227 |
logger.error("Failed to initialize system. App may not work correctly.")
|
| 228 |
|
| 229 |
+
demo = create_gradio_app()
|
| 230 |
|
| 231 |
# Launch the app
|
| 232 |
if __name__ == "__main__":
|
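The revised `app.py` tells users that the number of classes is derived from predicted enrollment divided by the capacity per class. The diff does not show the body of `Config.calculate_classes_needed`, so the snippet below is only a minimal sketch of that arithmetic under stated assumptions: the helper names `classes_needed` and `utilization_pct`, the choice to round up, and the capacity of 40 are all hypothetical and are not taken from the commit.

```python
import math


def classes_needed(predicted_enrollment: float, class_capacity: int) -> int:
    """Hypothetical helper: classes to open, rounding up so nobody is left without a seat."""
    if class_capacity <= 0:
        raise ValueError("class_capacity must be positive")
    if predicted_enrollment <= 0:
        return 0
    return math.ceil(predicted_enrollment / class_capacity)


def utilization_pct(predicted_enrollment: float, n_classes: int, class_capacity: int) -> float:
    """Hypothetical helper: share of the total quota that the prediction would fill."""
    total_quota = n_classes * class_capacity
    if total_quota == 0:
        return 0.0
    return 100.0 * predicted_enrollment / total_quota


if __name__ == "__main__":
    capacity = 40  # assumed value; the real default lives in Config and is not shown here
    n = classes_needed(95, capacity)  # 95 / 40 = 2.375, rounded up to 3
    print(n, round(utilization_pct(95, n, capacity), 1))
```

Rounding up rather than to the nearest integer is one plausible design choice: it trades lower utilization for guaranteed seats. The actual implementation may apply per-course capacities or minimum-enrollment thresholds instead.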
backend.py
ADDED
|
@@ -0,0 +1,674 @@
|
| 1 |
+
import logging
|
| 2 |
+
from dataclasses import dataclass
|
| 3 |
+
from typing import Dict, Optional, Tuple
|
| 4 |
+
|
| 5 |
+
import pandas as pd
|
| 6 |
+
|
| 7 |
+
from config import Config
|
| 8 |
+
from data_processor import DataProcessor
|
| 9 |
+
from evaluator import Evaluator
|
| 10 |
+
from prophet_predictor import ProphetPredictor
|
| 11 |
+
from utils import setup_logging
|
| 12 |
+
|
| 13 |
+
setup_logging("INFO")
|
| 14 |
+
logger = logging.getLogger("Backend")
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
@dataclass
|
| 18 |
+
class PredictionResult:
|
| 19 |
+
summary_data: Dict
|
| 20 |
+
predictions_df: pd.DataFrame
|
| 21 |
+
comparison_df: Optional[pd.DataFrame]
|
| 22 |
+
has_actual_data: bool
|
| 23 |
+
error: Optional[str] = None
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
@dataclass
|
| 27 |
+
class ForecastResult:
|
| 28 |
+
summary_data: Dict
|
| 29 |
+
forecast_df: pd.DataFrame
|
| 30 |
+
yearly_summary: pd.DataFrame
|
| 31 |
+
error: Optional[str] = None
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
class PredictionBackend:
|
| 35 |
+
def __init__(self):
|
| 36 |
+
self._processor: Optional[DataProcessor] = None
|
| 37 |
+
self._predictor: Optional[ProphetPredictor] = None
|
| 38 |
+
self._config: Optional[Config] = None
|
| 39 |
+
self._df_enrollment: Optional[pd.DataFrame] = None
|
| 40 |
+
self._elective_codes: Optional[set] = None
|
| 41 |
+
self._backtest_metrics: Optional[dict] = None
|
| 42 |
+
self._initialized: bool = False
|
| 43 |
+
|
| 44 |
+
@property
|
| 45 |
+
def is_initialized(self) -> bool:
|
| 46 |
+
return self._initialized
|
| 47 |
+
|
| 48 |
+
@property
|
| 49 |
+
def config(self) -> Optional[Config]:
|
| 50 |
+
return self._config
|
| 51 |
+
|
| 52 |
+
def initialize(self) -> bool:
|
| 53 |
+
try:
|
| 54 |
+
logger.info("Initializing prediction system...")
|
| 55 |
+
self._config = Config()
|
| 56 |
+
|
| 57 |
+
self._processor = DataProcessor(self._config)
|
| 58 |
+
self._df_enrollment, self._elective_codes = (
|
| 59 |
+
self._processor.load_and_process()
|
| 60 |
+
)
|
| 61 |
+
|
| 62 |
+
self._predictor = ProphetPredictor(self._config)
|
| 63 |
+
self._predictor.train_student_population_model(
|
| 64 |
+
self._processor.raw_data["students_yearly"]
|
| 65 |
+
)
|
| 66 |
+
|
| 67 |
+
self._initialized = True
|
| 68 |
+
logger.info("System initialized successfully")
|
| 69 |
+
return True
|
| 70 |
+
|
| 71 |
+
except Exception as e:
|
| 72 |
+
logger.error(f"Failed to initialize system: {e}", exc_info=True)
|
| 73 |
+
self._initialized = False
|
| 74 |
+
return False
|
| 75 |
+
|
| 76 |
+
def get_data_info(self) -> Dict:
|
| 77 |
+
if not self._initialized or self._processor is None or self._config is None:
|
| 78 |
+
return {"error": "System not initialized"}
|
| 79 |
+
|
| 80 |
+
try:
|
| 81 |
+
courses = self._processor.raw_data.get("courses")
|
| 82 |
+
students = self._processor.raw_data.get("students_yearly")
|
| 83 |
+
|
| 84 |
+
if courses is None or students is None:
|
| 85 |
+
return {"error": "Data not loaded"}
|
| 86 |
+
|
| 87 |
+
elective_courses = courses[courses["kategori_mk"] == "P"]
|
| 88 |
+
|
| 89 |
+
return {
|
| 90 |
+
"total_courses": len(courses),
|
| 91 |
+
"elective_courses": len(elective_courses),
|
| 92 |
+
"class_capacity": self._config.class_capacity.DEFAULT_CLASS_CAPACITY,
|
| 93 |
+
"year_min": int(students["thn"].min()),
|
| 94 |
+
"year_max": int(students["thn"].max()),
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
except Exception as e:
|
| 98 |
+
return {"error": str(e)}
|
| 99 |
+
|
| 100 |
+
def _run_backtest_if_needed(self) -> Dict:
|
| 101 |
+
if self._backtest_metrics is not None:
|
| 102 |
+
return self._backtest_metrics
|
| 103 |
+
|
| 104 |
+
if (
|
| 105 |
+
self._config is None
|
| 106 |
+
or self._df_enrollment is None
|
| 107 |
+
or self._predictor is None
|
| 108 |
+
):
|
| 109 |
+
logger.warning("System not initialized, using default metrics")
|
| 110 |
+
self._backtest_metrics = {"mae": 0, "rmse": 0}
|
| 111 |
+
return self._backtest_metrics
|
| 112 |
+
|
| 113 |
+
logger.info("Running backtest for the first time...")
|
| 114 |
+
evaluator = Evaluator(self._config)
|
| 115 |
+
backtest_results = evaluator.run_backtest(self._df_enrollment, self._predictor)
|
| 116 |
+
|
| 117 |
+
if backtest_results is None or len(backtest_results) == 0:
|
| 118 |
+
logger.warning("Backtest returned no results, using defaults")
|
| 119 |
+
self._backtest_metrics = {"mae": 0, "rmse": 0}
|
| 120 |
+
else:
|
| 121 |
+
metrics_result = evaluator.generate_metrics(backtest_results)
|
| 122 |
+
if metrics_result is None:
|
| 123 |
+
logger.warning("Metrics calculation failed, using defaults")
|
| 124 |
+
self._backtest_metrics = {"mae": 0, "rmse": 0}
|
| 125 |
+
else:
|
| 126 |
+
self._backtest_metrics = metrics_result
|
| 127 |
+
|
| 128 |
+
return self._backtest_metrics
|
| 129 |
+
|
| 130 |
+
def _get_actual_data(self, year: int, semester: int) -> Tuple[pd.DataFrame, bool]:
|
| 131 |
+
if self._df_enrollment is None:
|
| 132 |
+
return pd.DataFrame(), False
|
| 133 |
+
|
| 134 |
+
actual_data = self._df_enrollment[
|
| 135 |
+
(self._df_enrollment["thn"] == year)
|
| 136 |
+
& (self._df_enrollment["smt"] == semester)
|
| 137 |
+
]
|
| 138 |
+
return actual_data, len(actual_data) > 0
|
| 139 |
+
|
| 140 |
+
def _calculate_class_metrics(
|
| 141 |
+
self,
|
| 142 |
+
courses_with_actual: pd.DataFrame,
|
| 143 |
+
year: int,
|
| 144 |
+
semester: int,
|
| 145 |
+
) -> Dict:
|
| 146 |
+
if self._processor is None or self._config is None:
|
| 147 |
+
return {
|
| 148 |
+
"class_matches": 0,
|
| 149 |
+
"class_within_one": 0,
|
| 150 |
+
"total_for_class_accuracy": 0,
|
| 151 |
+
"class_accuracy_pct": 0,
|
| 152 |
+
"class_within_one_pct": 0,
|
| 153 |
+
"has_actual_class_data": False,
|
| 154 |
+
"data_source": "kalkulasi",
|
| 155 |
+
}
|
| 156 |
+
|
| 157 |
+
actual_classes_df = self._processor.get_class_count_for_validation(
|
| 158 |
+
year, semester
|
| 159 |
+
)
|
| 160 |
+
|
| 161 |
+
has_actual_class_data = False
|
| 162 |
+
courses_with_class_data: Optional[pd.DataFrame] = None
|
| 163 |
+
|
| 164 |
+
if len(actual_classes_df) > 0:
|
| 165 |
+
courses_with_actual = courses_with_actual.merge(
|
| 166 |
+
actual_classes_df, on="kode_mk", how="left"
|
| 167 |
+
)
|
| 168 |
+
has_actual_class_data = courses_with_actual["actual_classes"].notna().any()
|
| 169 |
+
|
| 170 |
+
if has_actual_class_data:
|
| 171 |
+
courses_with_class_data = courses_with_actual[
|
| 172 |
+
courses_with_actual["actual_classes"].notna()
|
| 173 |
+
].copy()
|
| 174 |
+
courses_with_class_data["actual_classes"] = courses_with_class_data[
|
| 175 |
+
"actual_classes"
|
| 176 |
+
].astype(int)
|
| 177 |
+
|
| 178 |
+
class_matches = (
|
| 179 |
+
courses_with_class_data["classes_needed"]
|
| 180 |
+
== courses_with_class_data["actual_classes"]
|
| 181 |
+
).sum()
|
| 182 |
+
total_for_class_accuracy = len(courses_with_class_data)
|
| 183 |
+
|
| 184 |
+
else:
|
| 185 |
+
config = self._config
|
| 186 |
+
courses_with_actual["actual_classes_calc"] = courses_with_actual.apply(
|
| 187 |
+
lambda row: config.calculate_classes_needed(
|
| 188 |
+
row["actual_enrollment"],
|
| 189 |
+
row["kode_mk"],
|
| 190 |
+
has_historical_data=True,
|
| 191 |
+
),
|
| 192 |
+
axis=1,
|
| 193 |
+
)
|
| 194 |
+
class_matches = (
|
| 195 |
+
courses_with_actual["classes_needed"]
|
| 196 |
+
== courses_with_actual["actual_classes_calc"]
|
| 197 |
+
).sum()
|
| 198 |
+
total_for_class_accuracy = len(courses_with_actual)
|
| 199 |
+
|
| 200 |
+
class_accuracy_pct = (
|
| 201 |
+
(class_matches / total_for_class_accuracy) * 100
|
| 202 |
+
if total_for_class_accuracy > 0
|
| 203 |
+
else 0
|
| 204 |
+
)
|
| 205 |
+
|
| 206 |
+
if has_actual_class_data and courses_with_class_data is not None:
|
| 207 |
+
class_within_one = (
|
| 208 |
+
abs(
|
| 209 |
+
courses_with_class_data["classes_needed"]
|
| 210 |
+
- courses_with_class_data["actual_classes"]
|
| 211 |
+
)
|
| 212 |
+
<= 1
|
| 213 |
+
).sum()
|
| 214 |
+
else:
|
| 215 |
+
class_within_one = (
|
| 216 |
+
abs(
|
| 217 |
+
courses_with_actual["classes_needed"]
|
| 218 |
+
- courses_with_actual["actual_classes_calc"]
|
| 219 |
+
)
|
| 220 |
+
<= 1
|
| 221 |
+
).sum()
|
| 222 |
+
|
| 223 |
+
class_within_one_pct = (
|
| 224 |
+
(class_within_one / total_for_class_accuracy) * 100
|
| 225 |
+
if total_for_class_accuracy > 0
|
| 226 |
+
else 0
|
| 227 |
+
)
|
| 228 |
+
|
| 229 |
+
return {
|
| 230 |
+
"class_matches": int(class_matches),
|
| 231 |
+
"class_within_one": int(class_within_one),
|
| 232 |
+
"total_for_class_accuracy": total_for_class_accuracy,
|
| 233 |
+
"class_accuracy_pct": class_accuracy_pct,
|
| 234 |
+
"class_within_one_pct": class_within_one_pct,
|
| 235 |
+
"has_actual_class_data": has_actual_class_data,
|
| 236 |
+
"data_source": "tabel2" if has_actual_class_data else "kalkulasi",
|
| 237 |
+
}
|
| 238 |
+
|
| 239 |
+
def _prepare_comparison_table(
|
| 240 |
+
self,
|
| 241 |
+
predictions: pd.DataFrame,
|
| 242 |
+
actual_data: pd.DataFrame,
|
| 243 |
+
year: int,
|
| 244 |
+
semester: int,
|
| 245 |
+
) -> Optional[pd.DataFrame]:
|
| 246 |
+
if self._processor is None or self._config is None:
|
| 247 |
+
return None
|
| 248 |
+
|
| 249 |
+
comparison = predictions.merge(
|
| 250 |
+
actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
|
| 251 |
+
)
|
| 252 |
+
comparison = comparison.rename(columns={"enrollment": "actual_enrollment"})
|
| 253 |
+
|
| 254 |
+
actual_classes_df = self._processor.get_class_count_for_validation(
|
| 255 |
+
year, semester
|
| 256 |
+
)
|
| 257 |
+
if len(actual_classes_df) > 0:
|
| 258 |
+
comparison = comparison.merge(actual_classes_df, on="kode_mk", how="left")
|
| 259 |
+
else:
|
| 260 |
+
comparison["actual_classes"] = None
|
| 261 |
+
|
| 262 |
+
courses_with_actual = comparison[comparison["actual_enrollment"].notna()].copy()
|
| 263 |
+
|
| 264 |
+
if len(courses_with_actual) == 0:
|
| 265 |
+
return None
|
| 266 |
+
|
| 267 |
+
courses_with_actual["error"] = (
|
| 268 |
+
courses_with_actual["predicted_enrollment"]
|
| 269 |
+
- courses_with_actual["actual_enrollment"]
|
| 270 |
+
)
|
| 271 |
+
courses_with_actual["abs_error"] = abs(courses_with_actual["error"])
|
| 272 |
+
courses_with_actual["accuracy_%"] = 100 * (
|
| 273 |
+
1
|
| 274 |
+
- courses_with_actual["abs_error"]
|
| 275 |
+
/ courses_with_actual["actual_enrollment"].replace(0, 1)
|
| 276 |
+
)
|
| 277 |
+
|
| 278 |
+
if (
|
| 279 |
+
"actual_classes" not in courses_with_actual.columns
|
| 280 |
+
or courses_with_actual["actual_classes"].isna().all()
|
| 281 |
+
):
|
| 282 |
+
config_ref = self._config
|
| 283 |
+
courses_with_actual["actual_classes"] = courses_with_actual.apply(
|
| 284 |
+
lambda row: config_ref.calculate_classes_needed(
|
| 285 |
+
row["actual_enrollment"],
|
| 286 |
+
row["kode_mk"],
|
| 287 |
+
has_historical_data=True,
|
| 288 |
+
),
|
| 289 |
+
axis=1,
|
| 290 |
+
)
|
| 291 |
+
else:
|
| 292 |
+
config_ref = self._config
|
| 293 |
+
courses_with_actual["actual_classes"] = courses_with_actual.apply(
|
| 294 |
+
lambda row: (
|
| 295 |
+
int(row["actual_classes"])
|
| 296 |
+
if pd.notna(row["actual_classes"])
|
| 297 |
+
else config_ref.calculate_classes_needed(
|
| 298 |
+
row["actual_enrollment"],
|
| 299 |
+
row["kode_mk"],
|
| 300 |
+
has_historical_data=True,
|
| 301 |
+
)
|
| 302 |
+
),
|
| 303 |
+
axis=1,
|
| 304 |
+
)
|
| 305 |
+
|
| 306 |
+
courses_with_actual["class_diff"] = (
|
| 307 |
+
courses_with_actual["classes_needed"]
|
| 308 |
+
- courses_with_actual["actual_classes"]
|
| 309 |
+
)
|
| 310 |
+
|
| 311 |
+
comparison_display = courses_with_actual[
|
| 312 |
+
[
|
| 313 |
+
"kode_mk",
|
| 314 |
+
"nama_mk",
|
| 315 |
+
"actual_enrollment",
|
| 316 |
+
"predicted_enrollment",
|
| 317 |
+
"actual_classes",
|
| 318 |
+
"classes_needed",
|
| 319 |
+
"class_diff",
|
| 320 |
+
"error",
|
| 321 |
+
"accuracy_%",
|
| 322 |
+
"strategy",
|
| 323 |
+
]
|
| 324 |
+
].copy()
|
| 325 |
+
|
| 326 |
+
comparison_display.columns = [
|
| 327 |
+
"Kode MK",
|
| 328 |
+
"Nama MK",
|
| 329 |
+
"Aktual",
|
| 330 |
+
"Prediksi",
|
| 331 |
+
"Kelas Aktual",
|
| 332 |
+
"Kelas Prediksi",
|
| 333 |
+
"Selisih Kelas",
|
| 334 |
+
"Error",
|
| 335 |
+
"Akurasi %",
|
| 336 |
+
"Strategy",
|
| 337 |
+
]
|
| 338 |
+
|
| 339 |
+
comparison_display["Aktual"] = comparison_display["Aktual"].astype(int)
|
| 340 |
+
comparison_display["Prediksi"] = comparison_display["Prediksi"].round(1)
|
| 341 |
+
comparison_display["Error"] = comparison_display["Error"].round(1)
|
| 342 |
+
comparison_display["Akurasi %"] = comparison_display["Akurasi %"].round(1)
|
| 343 |
+
comparison_display["Kelas Aktual"] = comparison_display["Kelas Aktual"].astype(
|
| 344 |
+
int
|
| 345 |
+
)
|
| 346 |
+
comparison_display["Kelas Prediksi"] = comparison_display[
|
| 347 |
+
"Kelas Prediksi"
|
| 348 |
+
].astype(int)
|
| 349 |
+
comparison_display["Selisih Kelas"] = comparison_display[
|
| 350 |
+
"Selisih Kelas"
|
| 351 |
+
].astype(int)
|
| 352 |
+
|
| 353 |
+
return comparison_display.sort_values("Aktual", ascending=False)
|
| 354 |
+
|
| 355 |
+
def _prepare_predictions_display(self, predictions: pd.DataFrame) -> pd.DataFrame:
|
| 356 |
+
"""Prepare predictions dataframe for display."""
|
| 357 |
+
display_df = predictions[
|
| 358 |
+
[
|
| 359 |
+
"kode_mk",
|
| 360 |
+
"nama_mk",
|
| 361 |
+
"predicted_enrollment",
|
| 362 |
+
"classes_needed",
|
| 363 |
+
"class_capacity",
|
| 364 |
+
"total_quota",
|
| 365 |
+
"utilization_pct",
|
| 366 |
+
"recommendation",
|
| 367 |
+
"confidence",
|
| 368 |
+
"strategy",
|
| 369 |
+
]
|
| 370 |
+
].copy()
|
| 371 |
+
|
| 372 |
+
display_df.columns = [
|
| 373 |
+
"Kode MK",
|
| 374 |
+
"Nama MK",
|
| 375 |
+
"Prediksi",
|
| 376 |
+
"Jumlah Kelas",
|
| 377 |
+
"Kapasitas/Kelas",
|
| 378 |
+
"Total Kuota",
|
| 379 |
+
"Utilization %",
|
| 380 |
+
"Status",
|
| 381 |
+
"Confidence",
|
| 382 |
+
"Strategy",
|
| 383 |
+
]
|
| 384 |
+
|
| 385 |
+
display_df["Prediksi"] = display_df["Prediksi"].round(1)
|
| 386 |
+
display_df["Jumlah Kelas"] = display_df["Jumlah Kelas"].astype(int)
|
| 387 |
+
display_df["Total Kuota"] = display_df["Total Kuota"].astype(int)
|
| 388 |
+
|
| 389 |
+
display_df["Status"] = display_df["Status"].map(
|
| 390 |
+
{"BUKA": "BUKA", "TUTUP": "TUTUP"}
|
| 391 |
+
)
|
| 392 |
+
|
| 393 |
+
display_df = display_df[display_df["Confidence"] == "high"]
|
| 394 |
+
display_df = display_df[display_df["Status"] == "BUKA"]
|
| 395 |
+
|
| 396 |
+
display_df = display_df.sort_values("Prediksi", ascending=False)
|
| 397 |
+
display_df = display_df.drop(columns=["Confidence", "Status"])
|
| 398 |
+
|
| 399 |
+
return display_df
|
| 400 |
+
|
| 401 |
+
def generate_predictions(self, year: int, semester: int) -> PredictionResult:
|
| 402 |
+
if semester not in [1, 2]:
|
| 403 |
+
return PredictionResult(
|
| 404 |
+
summary_data={},
|
| 405 |
+
predictions_df=pd.DataFrame(),
|
| 406 |
+
comparison_df=None,
|
| 407 |
+
has_actual_data=False,
|
| 408 |
+
error="Semester harus 1 (Ganjil) atau 2 (Genap)",
|
| 409 |
+
)
|
| 410 |
+
|
| 411 |
+
if year < 2020 or year > 2030:
|
| 412 |
+
return PredictionResult(
|
| 413 |
+
summary_data={},
|
| 414 |
+
predictions_df=pd.DataFrame(),
|
| 415 |
+
comparison_df=None,
|
| 416 |
+
has_actual_data=False,
|
| 417 |
+
error="Year must be between 2020 and 2030",
|
| 418 |
+
)
|
| 419 |
+
|
| 420 |
+
if not self._initialized:
|
| 421 |
+
return PredictionResult(
|
| 422 |
+
summary_data={},
|
| 423 |
+
predictions_df=pd.DataFrame(),
|
| 424 |
+
comparison_df=None,
|
| 425 |
+
has_actual_data=False,
|
| 426 |
+
error="System not initialized. Please restart the app.",
|
| 427 |
+
)
|
| 428 |
+
|
| 429 |
+
try:
|
| 430 |
+
logger.info(f"Generating predictions for {year} Semester {semester}...")
|
| 431 |
+
|
| 432 |
+
assert self._config is not None
|
| 433 |
+
assert self._predictor is not None
|
| 434 |
+
assert self._processor is not None
|
| 435 |
+
assert self._df_enrollment is not None
|
| 436 |
+
assert self._elective_codes is not None
|
| 437 |
+
|
| 438 |
+
self._config.prediction.PREDICT_YEAR = year
|
| 439 |
+
self._config.prediction.PREDICT_SEMESTER = semester
|
| 440 |
+
|
| 441 |
+
actual_data, has_actual_data = self._get_actual_data(year, semester)
|
| 442 |
+
|
| 443 |
+
if has_actual_data:
|
| 444 |
+
logger.info(
|
| 445 |
+
f"Found actual enrollment data for {year} Semester {semester}"
|
| 446 |
+
)
|
| 447 |
+
else:
|
| 448 |
+
logger.info(f"No actual data for {year} Semester {semester}")
|
| 449 |
+
|
| 450 |
+
metrics = self._run_backtest_if_needed()
|
| 451 |
+
|
| 452 |
+
predictions = self._predictor.generate_batch_predictions(
|
| 453 |
+
self._df_enrollment,
|
| 454 |
+
self._processor.raw_data["courses"],
|
| 455 |
+
self._elective_codes,
|
| 456 |
+
year,
|
| 457 |
+
semester,
|
| 458 |
+
)
|
| 459 |
+
|
| 460 |
+
open_courses = predictions[predictions["recommendation"] == "BUKA"]
|
| 461 |
+
total_to_open = len(open_courses)
|
| 462 |
+
total_classes = int(open_courses["classes_needed"].sum())
|
| 463 |
+
total_predicted_students = int(open_courses["predicted_enrollment"].sum())
|
| 464 |
+
total_capacity = int(open_courses["total_quota"].sum())
|
| 465 |
+
class_capacity = self._config.class_capacity.DEFAULT_CLASS_CAPACITY
|
| 466 |
+
|
| 467 |
+
summary_data = {
|
| 468 |
+
"year": year,
|
| 469 |
+
"semester": semester,
|
| 470 |
+
"semester_name": "1 (Ganjil)" if semester == 1 else "2 (Genap)",
|
| 471 |
+
"total_to_open": total_to_open,
|
| 472 |
+
"total_classes": total_classes,
|
| 473 |
+
"total_predicted_students": total_predicted_students,
|
| 474 |
+
"total_capacity": total_capacity,
|
| 475 |
+
"class_capacity": class_capacity,
|
| 476 |
+
"metrics": metrics,
|
| 477 |
+
"has_actual_data": has_actual_data,
|
| 478 |
+
}
|
| 479 |
+
|
| 480 |
+
comparison_df = None
|
| 481 |
+
if has_actual_data:
|
| 482 |
+
comparison = predictions.merge(
|
| 483 |
+
actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
|
| 484 |
+
)
|
| 485 |
+
comparison = comparison.rename(
|
| 486 |
+
columns={"enrollment": "actual_enrollment"}
|
| 487 |
+
)
|
| 488 |
+
|
| 489 |
+
courses_with_actual = comparison[
|
| 490 |
+
comparison["actual_enrollment"].notna()
|
| 491 |
+
].copy()
|
| 492 |
+
|
| 493 |
+
if len(courses_with_actual) > 0:
|
| 494 |
+
comparison_mae = abs(
|
| 495 |
+
courses_with_actual["predicted_enrollment"]
|
| 496 |
+
- courses_with_actual["actual_enrollment"]
|
| 497 |
+
).mean()
|
| 498 |
+
comparison_rmse = (
|
| 499 |
+
(
|
| 500 |
+
courses_with_actual["predicted_enrollment"]
|
| 501 |
+
- courses_with_actual["actual_enrollment"]
|
| 502 |
+
)
|
| 503 |
+
** 2
|
| 504 |
+
).mean() ** 0.5
|
| 505 |
+
|
| 506 |
+
total_actual = courses_with_actual["actual_enrollment"].sum()
|
| 507 |
+
total_predicted = courses_with_actual["predicted_enrollment"].sum()
|
| 508 |
+
accuracy_pct = (
|
| 509 |
+
1 - abs(total_predicted - total_actual) / total_actual
|
| 510 |
+
) * 100
|
| 511 |
+
|
| 512 |
+
class_metrics = self._calculate_class_metrics(
|
| 513 |
+
courses_with_actual.copy(), year, semester
|
| 514 |
+
)
|
| 515 |
+
|
| 516 |
+
summary_data.update(
|
| 517 |
+
{
|
| 518 |
+
"comparison_mae": comparison_mae,
|
| 519 |
+
"comparison_rmse": comparison_rmse,
|
| 520 |
+
"total_actual": total_actual,
|
| 521 |
+
"total_predicted": total_predicted,
|
| 522 |
+
"accuracy_pct": accuracy_pct,
|
| 523 |
+
**class_metrics,
|
| 524 |
+
}
|
| 525 |
+
)
|
| 526 |
+
|
| 527 |
+
comparison_df = self._prepare_comparison_table(
|
| 528 |
+
predictions, actual_data, year, semester
|
| 529 |
+
)
|
| 530 |
+
|
| 531 |
+
predictions_display = self._prepare_predictions_display(predictions)
|
| 532 |
+
|
| 533 |
+
return PredictionResult(
|
| 534 |
+
summary_data=summary_data,
|
| 535 |
+
predictions_df=predictions_display,
|
| 536 |
+
comparison_df=comparison_df,
|
| 537 |
+
has_actual_data=has_actual_data,
|
| 538 |
+
)
|
| 539 |
+
|
| 540 |
+
except Exception as e:
|
| 541 |
+
logger.error(f"Error generating predictions: {e}", exc_info=True)
|
| 542 |
+
return PredictionResult(
|
| 543 |
+
summary_data={},
|
| 544 |
+
predictions_df=pd.DataFrame(),
|
| 545 |
+
comparison_df=None,
|
| 546 |
+
has_actual_data=False,
|
| 547 |
+
error=str(e),
|
| 548 |
+
)
|
| 549 |
+
|
| 550 |
+
def generate_multi_year_forecast(
|
| 551 |
+
self, year: int, semester: int, years_ahead: int = 3
|
| 552 |
+
) -> ForecastResult:
|
| 553 |
+
if not self._initialized:
|
| 554 |
+
return ForecastResult(
|
| 555 |
+
summary_data={},
|
| 556 |
+
forecast_df=pd.DataFrame(),
|
| 557 |
+
yearly_summary=pd.DataFrame(),
|
| 558 |
+
error="System not initialized.",
|
| 559 |
+
)
|
| 560 |
+
|
| 561 |
+
try:
|
| 562 |
+
logger.info(f"Generating {years_ahead}-year forecast from {year}...")
|
| 563 |
+
|
| 564 |
+
assert self._config is not None
|
| 565 |
+
assert self._predictor is not None
|
| 566 |
+
assert self._processor is not None
|
| 567 |
+
assert self._df_enrollment is not None
|
| 568 |
+
assert self._elective_codes is not None
|
| 569 |
+
|
| 570 |
+
forecast_df = self._predictor.generate_multi_year_forecast(
|
| 571 |
+
self._df_enrollment,
|
| 572 |
+
self._processor.raw_data["courses"],
|
| 573 |
+
self._elective_codes,
|
| 574 |
+
year,
|
| 575 |
+
semester,
|
| 576 |
+
years_ahead,
|
| 577 |
+
)
|
| 578 |
+
|
| 579 |
+
if forecast_df.empty:
|
| 580 |
+
return ForecastResult(
|
| 581 |
+
summary_data={},
|
| 582 |
+
forecast_df=pd.DataFrame(),
|
| 583 |
+
yearly_summary=pd.DataFrame(),
|
| 584 |
+
error="No data available for the forecast.",
|
| 585 |
+
)
|
| 586 |
+
|
| 587 |
+
yearly_summary = (
|
| 588 |
+
forecast_df.groupby("year")
|
| 589 |
+
.agg(
|
| 590 |
+
{
|
| 591 |
+
"predicted_enrollment": "sum",
|
| 592 |
+
"classes_needed": "sum",
|
| 593 |
+
"total_capacity": "sum",
|
| 594 |
+
"kode_mk": "count",
|
| 595 |
+
}
|
| 596 |
+
)
|
| 597 |
+
.reset_index()
|
| 598 |
+
)
|
| 599 |
+
yearly_summary.columns = [
|
| 600 |
+
"Tahun",
|
| 601 |
+
"Total Prediksi",
|
| 602 |
+
"Total Kelas",
|
| 603 |
+
"Total Kapasitas",
|
| 604 |
+
"Jumlah MK",
|
| 605 |
+
]
|
| 606 |
+
|
| 607 |
+
class_capacity = self._config.class_capacity.DEFAULT_CLASS_CAPACITY
|
| 608 |
+
semester_name = "Ganjil" if semester == 1 else "Genap"
|
| 609 |
+
|
| 610 |
+
first_year = yearly_summary.iloc[0]
|
| 611 |
+
last_year = yearly_summary.iloc[-1]
|
| 612 |
+
growth_classes = int(last_year["Total Kelas"] - first_year["Total Kelas"])
|
| 613 |
+
growth_students = int(
|
| 614 |
+
last_year["Total Prediksi"] - first_year["Total Prediksi"]
|
| 615 |
+
)
|
| 616 |
+
|
| 617 |
+
summary_data = {
|
| 618 |
+
"year": year,
|
| 619 |
+
"semester": semester,
|
| 620 |
+
"semester_name": semester_name,
|
| 621 |
+
"years_ahead": years_ahead,
|
| 622 |
+
"class_capacity": class_capacity,
|
| 623 |
+
"first_year_classes": int(first_year["Total Kelas"]),
|
| 624 |
+
"last_year_classes": int(last_year["Total Kelas"]),
|
| 625 |
+
"growth_classes": growth_classes,
|
| 626 |
+
"growth_students": growth_students,
|
| 627 |
+
}
|
| 628 |
+
|
| 629 |
+
display_df = forecast_df[
|
| 630 |
+
[
|
| 631 |
+
"year",
|
| 632 |
+
"kode_mk",
|
| 633 |
+
"nama_mk",
|
| 634 |
+
"predicted_enrollment",
|
| 635 |
+
"classes_needed",
|
| 636 |
+
"total_capacity",
|
| 637 |
+
]
|
| 638 |
+
].copy()
|
| 639 |
+
display_df.columns = [
|
| 640 |
+
"Tahun",
|
| 641 |
+
"Kode MK",
|
| 642 |
+
"Nama MK",
|
| 643 |
+
"Prediksi",
|
| 644 |
+
"Kelas",
|
| 645 |
+
"Kapasitas",
|
| 646 |
+
]
|
| 647 |
+
display_df["Prediksi"] = display_df["Prediksi"].round(0).astype(int)
|
| 648 |
+
display_df = display_df.sort_values(["Kode MK", "Tahun"])
|
| 649 |
+
|
| 650 |
+
return ForecastResult(
|
| 651 |
+
summary_data=summary_data,
|
| 652 |
+
forecast_df=display_df,
|
| 653 |
+
yearly_summary=yearly_summary,
|
| 654 |
+
)
|
| 655 |
+
|
| 656 |
+
except Exception as e:
|
| 657 |
+
logger.error(f"Error generating forecast: {e}", exc_info=True)
|
| 658 |
+
return ForecastResult(
|
| 659 |
+
summary_data={},
|
| 660 |
+
forecast_df=pd.DataFrame(),
|
| 661 |
+
yearly_summary=pd.DataFrame(),
|
| 662 |
+
error=str(e),
|
| 663 |
+
)
|
| 664 |
+
|
| 665 |
+
|
| 666 |
+
_backend_instance: Optional[PredictionBackend] = None
|
| 667 |
+
|
| 668 |
+
|
| 669 |
+
def get_backend() -> PredictionBackend:
|
| 670 |
+
"""Get the singleton backend instance."""
|
| 671 |
+
global _backend_instance
|
| 672 |
+
if _backend_instance is None:
|
| 673 |
+
_backend_instance = PredictionBackend()
|
| 674 |
+
return _backend_instance
|
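The comparison metrics that `generate_predictions` adds to `summary_data` when actual enrollment exists (MAE, RMSE, and the aggregate accuracy percentage) can be sketched in isolation. This is a minimal, dependency-free sketch rather than the code from the commit: the helper name `comparison_metrics`, the list-of-pairs input, and the guard against a zero total are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def comparison_metrics(
    pairs: Sequence[Tuple[float, Optional[float]]],
) -> dict:
    """Hypothetical helper: metrics for (predicted, actual) enrollment pairs."""
    # Mirror the notna() filter: skip courses without an actual enrollment.
    rows = [(p, a) for p, a in pairs if a is not None]
    if not rows:
        return {}
    errors = [p - a for p, a in rows]
    total_predicted = sum(p for p, _ in rows)
    total_actual = sum(a for _, a in rows)
    # Assumption (not in the commit): report NaN rather than divide by zero.
    if total_actual == 0:
        accuracy_pct = math.nan
    else:
        accuracy_pct = (1 - abs(total_predicted - total_actual) / total_actual) * 100
    return {
        "comparison_mae": sum(abs(e) for e in errors) / len(errors),
        "comparison_rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "accuracy_pct": accuracy_pct,
    }


# Third course has no actual data, so it is excluded from every metric.
metrics = comparison_metrics([(40.0, 50.0), (30.0, 30.0), (10.0, None)])
```

Note that `accuracy_pct` compares totals, so over- and under-predictions on individual courses cancel out; MAE and RMSE are the per-course error measures.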
config.py
CHANGED
|
@@ -1,29 +1,21 @@
|
|
|
|
|
| 1 |
from dataclasses import dataclass, field
|
| 2 |
from typing import Dict, List
|
| 3 |
-
import os
|
| 4 |
|
| 5 |
-
# Import data loader for private HF dataset support
|
| 6 |
try:
|
| 7 |
from data_loader import load_data_file
|
|
|
|
| 8 |
DATA_LOADER_AVAILABLE = True
|
| 9 |
except ImportError:
|
| 10 |
DATA_LOADER_AVAILABLE = False
|
|
|
|
| 11 |
def load_data_file() -> str:
|
| 12 |
-
"""Fallback if data_loader not available."""
|
| 13 |
return "data/optimized_data.xlsx"
|
| 14 |
|
| 15 |
|
| 16 |
def _get_data_file_path() -> str:
|
| 17 |
-
"""
|
| 18 |
-
Get data file path based on environment.
|
| 19 |
-
|
| 20 |
-
Priority:
|
| 21 |
-
1. If HF_TOKEN set: Load from private HF dataset (muhalwan/optimized_data_mhs)
|
| 22 |
-
2. If DEMO_MODE=true: Use demo_data.xlsx (anonymized)
|
| 23 |
-
3. Otherwise: Use local optimized_data.xlsx
|
| 24 |
-
"""
|
| 25 |
if os.getenv("HF_TOKEN"):
|
| 26 |
-
return load_data_file()
|
| 27 |
elif os.getenv("DEMO_MODE", "false").lower() == "true":
|
| 28 |
return "data/demo_data.xlsx"
|
| 29 |
else:
|
|
@@ -32,12 +24,8 @@ def _get_data_file_path() -> str:
|
|
| 32 |
|
| 33 |
@dataclass
|
| 34 |
class DataConfig:
|
| 35 |
-
"""Data source configuration and validation rules."""
|
| 36 |
-
|
| 37 |
-
# Data file path - automatically determined based on environment
|
| 38 |
FILE_PATH: str = field(default_factory=_get_data_file_path)
|
| 39 |
|
| 40 |
-
# Sheet mappings
|
| 41 |
SHEET_COURSES: str = "tabel1_data_matkul"
|
| 42 |
SHEET_OFFERINGS: str = "tabel2_data_matkul_dibuka"
|
| 43 |
SHEET_STUDENTS_YEARLY: str = "tabel3_data_mahasiswa_per_tahun"
|
|
@@ -48,20 +36,55 @@ class DataConfig:
|
|
| 48 |
default_factory=lambda: {"tahun": "thn", "semester": "smt"}
|
| 49 |
)
|
| 50 |
|
| 51 |
-
# Elective Course Identification
|
| 52 |
-
# IMPORTANT: Elective courses are identified by kategori_mk = 'P' in tabel1
|
| 53 |
-
# Mandatory/Required courses have kategori_mk = 'W'
|
| 54 |
ELECTIVE_CATEGORY: str = "P"
|
| 55 |
MANDATORY_CATEGORY: str = "W"
|
| 56 |
|
| 57 |
-
# Valid category values (will be normalized to uppercase)
|
| 58 |
VALID_CATEGORIES: List[str] = field(default_factory=lambda: ["P", "W"])
|
| 59 |
|
| 60 |
|
| 61 |
@dataclass
|
| 62 |
-
class
|
| 63 |
-
|
| 64 |
|
| 65 |
# Prophet Hyperparameters
|
| 66 |
GROWTH_MODE: str = "logistic"
|
| 67 |
CHANGEPOINT_SCALE: float = 0.01
|
|
@@ -75,6 +98,11 @@ class ModelConfig:
|
|
| 75 |
# Minimum historical data points required for reliable prediction
|
| 76 |
MIN_HISTORY_POINTS: int = 3
|
| 77 |
|
| 78 |
|
| 79 |
@dataclass
|
| 80 |
class PredictionConfig:
|
|
@@ -83,7 +111,6 @@ class PredictionConfig:
|
|
| 83 |
PREDICT_YEAR: int = 2025
|
| 84 |
PREDICT_SEMESTER: int = 2
|
| 85 |
|
| 86 |
-
# Buffer Calculations
|
| 87 |
BUFFER_PERCENT: float = 0.20
|
| 88 |
MIN_QUOTA_OPEN: int = 25
|
| 89 |
MIN_PREDICT_THRESHOLD: int = 15
|
|
@@ -101,8 +128,6 @@ class PredictionConfig:
|
|
| 101 |
|
| 102 |
@dataclass
|
| 103 |
class OutputConfig:
|
| 104 |
-
"""Output settings."""
|
| 105 |
-
|
| 106 |
OUTPUT_DIR: str = "output"
|
| 107 |
LOG_LEVEL: str = "INFO"
|
| 108 |
TOP_N_DISPLAY: int = 30
|
|
@@ -110,8 +135,6 @@ class OutputConfig:
|
|
| 110 |
|
| 111 |
@dataclass
|
| 112 |
class BacktestConfig:
|
| 113 |
-
"""Backtest settings and validation."""
|
| 114 |
-
|
| 115 |
START_YEAR: int = 2010
|
| 116 |
END_YEAR: int = 2024
|
| 117 |
VERBOSE: bool = True
|
|
@@ -123,48 +146,65 @@ class BacktestConfig:
|
|
| 123 |
|
| 124 |
|
| 125 |
class Config:
|
| 126 |
-
"""
|
| 127 |
-
Master Config Object.
|
| 128 |
-
|
| 129 |
-
ELECTIVE COURSE DEFINITION:
|
| 130 |
-
---------------------------
|
| 131 |
-
Elective courses are identified by kategori_mk = 'P' in tabel1_data_matkul.
|
| 132 |
-
This is the ONLY source of truth for course categories.
|
| 133 |
-
|
| 134 |
-
Examples of elective courses (kategori_mk = 'P'):
|
| 135 |
-
- EF234607: Keamanan Aplikasi
|
| 136 |
-
- EF234613: Game Edukasi dan Simulasi
|
| 137 |
-
- UG234922: Kebudayaan dan Kebangsaan
|
| 138 |
-
- IW184301: Sistem Basis Data
|
| 139 |
-
- KI series: Various computer science electives
|
| 140 |
-
|
| 141 |
-
Mandatory courses have kategori_mk = 'W' (Wajib).
|
| 142 |
-
|
| 143 |
-
DATA REQUIREMENTS FOR BACKTESTING:
|
| 144 |
-
-----------------------------------
|
| 145 |
-
To backtest a semester, you need:
|
| 146 |
-
1. Course catalog (tabel1) with kategori_mk properly set
|
| 147 |
-
2. ACTUAL student enrollments (tabel4) for that semester
|
| 148 |
-
3. At least one elective course with enrollments
|
| 149 |
-
|
| 150 |
-
Note: Course offerings (tabel2) alone are NOT sufficient for backtesting.
|
| 151 |
-
You must have actual enrollment data (tabel4) to validate predictions.
|
| 152 |
-
"""
|
| 153 |
-
|
| 154 |
def __init__(self):
|
| 155 |
self.data: DataConfig = DataConfig()
|
| 156 |
self.model: ModelConfig = ModelConfig()
|
| 157 |
self.prediction: PredictionConfig = PredictionConfig()
|
| 158 |
self.output: OutputConfig = OutputConfig()
|
| 159 |
self.backtest: BacktestConfig = BacktestConfig()
|
|
|
|
|
|
|
| 160 |
|
| 161 |
def get_prediction_target_name(self) -> str:
|
| 162 |
sem = "Ganjil" if self.prediction.PREDICT_SEMESTER == 1 else "Genap"
|
| 163 |
return f"{self.prediction.PREDICT_YEAR} Semester {sem}"
|
| 164 |
|
| 165 |
def get_elective_filter_description(self) -> str:
|
| 166 |
-
"""Get human-readable description of elective identification."""
|
| 167 |
return f"kategori_mk = '{self.data.ELECTIVE_CATEGORY}' in {self.data.SHEET_COURSES}"
|
| 168 |
|
| 169 |
|
| 170 |
default_config = Config()
|
|
|
|
| 1 |
+
import os
|
| 2 |
from dataclasses import dataclass, field
|
| 3 |
from typing import Dict, List
|
|
|
|
| 4 |
|
|
|
|
| 5 |
try:
|
| 6 |
from data_loader import load_data_file
|
| 7 |
+
|
| 8 |
DATA_LOADER_AVAILABLE = True
|
| 9 |
except ImportError:
|
| 10 |
DATA_LOADER_AVAILABLE = False
|
| 11 |
+
|
| 12 |
def load_data_file() -> str:
|
|
|
|
| 13 |
return "data/optimized_data.xlsx"
|
| 14 |
|
| 15 |
|
| 16 |
def _get_data_file_path() -> str:
|
| 17 |
if os.getenv("HF_TOKEN"):
|
| 18 |
+
return load_data_file()
|
| 19 |
elif os.getenv("DEMO_MODE", "false").lower() == "true":
|
| 20 |
return "data/demo_data.xlsx"
|
| 21 |
else:
|
|
|
|
| 24 |
|
| 25 |
@dataclass
|
| 26 |
class DataConfig:
|
| 27 |
FILE_PATH: str = field(default_factory=_get_data_file_path)
|
| 28 |
|
|
|
|
| 29 |
SHEET_COURSES: str = "tabel1_data_matkul"
|
| 30 |
SHEET_OFFERINGS: str = "tabel2_data_matkul_dibuka"
|
| 31 |
SHEET_STUDENTS_YEARLY: str = "tabel3_data_mahasiswa_per_tahun"
|
|
|
|
| 36 |
default_factory=lambda: {"tahun": "thn", "semester": "smt"}
|
| 37 |
)
|
| 38 |
|
| 39 |
ELECTIVE_CATEGORY: str = "P"
|
| 40 |
MANDATORY_CATEGORY: str = "W"
|
| 41 |
|
|
|
|
| 42 |
VALID_CATEGORIES: List[str] = field(default_factory=lambda: ["P", "W"])
|
| 43 |
|
| 44 |
|
| 45 |
@dataclass
|
| 46 |
+
class ClassCapacityConfig:
|
| 47 |
+
# Default maximum students per class
|
| 48 |
+
DEFAULT_CLASS_CAPACITY: int = 50
|
| 49 |
+
|
| 50 |
+
# Minimum students required to open a class
|
| 51 |
+
MIN_STUDENTS_TO_OPEN_CLASS: int = 1
|
| 52 |
+
|
| 53 |
+
# Threshold for opening additional classes
|
| 54 |
+
ADDITIONAL_CLASS_THRESHOLD: float = 0.7
|
| 55 |
+
|
| 56 |
+
# Always open at least 1 class if there's any historical enrollment
|
| 57 |
+
OPEN_CLASS_IF_HAS_HISTORY: bool = True
|
| 58 |
+
|
| 59 |
+
# Course-specific capacity overrides (kode_mk -> max_capacity)
|
| 60 |
+
COURSE_CAPACITY_OVERRIDES: Dict[str, int] = field(default_factory=dict)
|
| 61 |
+
|
| 62 |
+
# Warning threshold - if predicted > capacity * threshold, warn about capacity
|
| 63 |
+
CAPACITY_WARNING_THRESHOLD: float = 0.8
|
| 64 |
+
|
| 65 |
+
# Enable capacity-aware prediction
|
| 66 |
+
# When True, predictions will be bounded by realistic capacity constraints
|
| 67 |
+
ENABLE_CAPACITY_CONSTRAINTS: bool = True
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
@dataclass
|
| 71 |
+
class MultiYearForecastConfig:
|
| 72 |
+
# How many years ahead to forecast
|
| 73 |
+
FORECAST_YEARS_AHEAD: int = 3
|
| 74 |
+
|
| 75 |
+
# Include trend analysis in output
|
| 76 |
+
SHOW_TREND_ANALYSIS: bool = True
|
| 77 |
+
|
| 78 |
+
# Confidence interval for forecasts (0-1)
|
| 79 |
+
CONFIDENCE_INTERVAL: float = 0.95
|
| 80 |
+
|
| 81 |
+
# Growth rate limits for sanity checking
|
| 82 |
+
MAX_YEARLY_GROWTH_RATE: float = 0.5 # 50% max growth per year
|
| 83 |
+
MIN_YEARLY_GROWTH_RATE: float = -0.3 # 30% max decline per year
|
| 84 |
|
| 85 |
+
|
| 86 |
+
@dataclass
|
| 87 |
+
class ModelConfig:
|
| 88 |
# Prophet Hyperparameters
|
| 89 |
GROWTH_MODE: str = "logistic"
|
| 90 |
CHANGEPOINT_SCALE: float = 0.01
|
|
|
|
| 98 |
# Minimum historical data points required for reliable prediction
|
| 99 |
MIN_HISTORY_POINTS: int = 3
|
| 100 |
|
| 101 |
+
# Use student population as regressor
|
| 102 |
+
USE_POPULATION_REGRESSOR: bool = True
|
| 103 |
+
# Use capacity as upper bound (cap in logistic growth)
|
| 104 |
+
USE_CAPACITY_AS_CAP: bool = True
|
| 105 |
+
|
| 106 |
|
| 107 |
@dataclass
|
| 108 |
class PredictionConfig:
|
|
|
|
| 111 |
PREDICT_YEAR: int = 2025
|
| 112 |
PREDICT_SEMESTER: int = 2
|
| 113 |
|
|
|
|
| 114 |
BUFFER_PERCENT: float = 0.20
|
| 115 |
MIN_QUOTA_OPEN: int = 25
|
| 116 |
MIN_PREDICT_THRESHOLD: int = 15
|
|
|
|
| 128 |
|
| 129 |
@dataclass
|
| 130 |
class OutputConfig:
|
|
|
|
|
|
|
| 131 |
OUTPUT_DIR: str = "output"
|
| 132 |
LOG_LEVEL: str = "INFO"
|
| 133 |
TOP_N_DISPLAY: int = 30
|
|
|
|
| 135 |
|
| 136 |
@dataclass
|
| 137 |
class BacktestConfig:
|
|
|
|
|
|
|
| 138 |
START_YEAR: int = 2010
|
| 139 |
END_YEAR: int = 2024
|
| 140 |
VERBOSE: bool = True
|
|
|
|
| 146 |
|
| 147 |
|
| 148 |
class Config:
|
| 149 |
def __init__(self):
|
| 150 |
self.data: DataConfig = DataConfig()
|
| 151 |
self.model: ModelConfig = ModelConfig()
|
| 152 |
self.prediction: PredictionConfig = PredictionConfig()
|
| 153 |
self.output: OutputConfig = OutputConfig()
|
| 154 |
self.backtest: BacktestConfig = BacktestConfig()
|
| 155 |
+
self.class_capacity: ClassCapacityConfig = ClassCapacityConfig()
|
| 156 |
+
self.multi_year: MultiYearForecastConfig = MultiYearForecastConfig()
|
| 157 |
|
| 158 |
def get_prediction_target_name(self) -> str:
|
| 159 |
sem = "Ganjil" if self.prediction.PREDICT_SEMESTER == 1 else "Genap"
|
| 160 |
return f"{self.prediction.PREDICT_YEAR} Semester {sem}"
|
| 161 |
|
| 162 |
def get_elective_filter_description(self) -> str:
|
|
|
|
| 163 |
return f"kategori_mk = '{self.data.ELECTIVE_CATEGORY}' in {self.data.SHEET_COURSES}"
|
| 164 |
|
| 165 |
+
def get_class_capacity(self, course_code: str) -> int:
|
| 166 |
+
if course_code in self.class_capacity.COURSE_CAPACITY_OVERRIDES:
|
| 167 |
+
return self.class_capacity.COURSE_CAPACITY_OVERRIDES[course_code]
|
| 168 |
+
return self.class_capacity.DEFAULT_CLASS_CAPACITY
|
| 169 |
+
|
| 170 |
+
def calculate_classes_needed(
|
| 171 |
+
self,
|
| 172 |
+
predicted_enrollment: float,
|
| 173 |
+
course_code: str,
|
| 174 |
+
has_historical_data: bool = True,
|
| 175 |
+
) -> int:
|
| 176 |
+
import math
|
| 177 |
+
|
| 178 |
+
capacity = self.get_class_capacity(course_code)
|
| 179 |
+
|
| 180 |
+
if predicted_enrollment <= 0:
|
| 181 |
+
return 0
|
| 182 |
+
|
| 183 |
+
if predicted_enrollment < 1 and has_historical_data:
|
| 184 |
+
return 1
|
| 185 |
+
|
| 186 |
+
classes = math.ceil(predicted_enrollment / capacity)
|
| 187 |
+
|
| 188 |
+
return max(1, classes)
|
| 189 |
+
|
| 190 |
+
def get_capacity_status(self, predicted_enrollment: float, course_code: str) -> str:
|
| 191 |
+
capacity = self.get_class_capacity(course_code)
|
| 192 |
+
classes_needed = self.calculate_classes_needed(
|
| 193 |
+
predicted_enrollment, course_code
|
| 194 |
+
)
|
| 195 |
+
|
| 196 |
+
if classes_needed == 0:
|
| 197 |
+
return "UNDER"
|
| 198 |
+
|
| 199 |
+
total_capacity = classes_needed * capacity
|
| 200 |
+
utilization = predicted_enrollment / total_capacity
|
| 201 |
+
|
| 202 |
+
if utilization >= 1.0:
|
| 203 |
+
return "OVER"
|
| 204 |
+
elif utilization >= self.class_capacity.CAPACITY_WARNING_THRESHOLD:
|
| 205 |
+
return "WARNING"
|
| 206 |
+
else:
|
| 207 |
+
return "NORMAL"
|
| 208 |
+
|
| 209 |
|
| 210 |
default_config = Config()
|
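The class-count and capacity-status rules added to `Config` can be exercised on their own. The sketch below assumes the defaults from `ClassCapacityConfig` (50 students per class, warning threshold 0.8); the free functions `classes_needed` and `capacity_status` and the module-level constants are hypothetical stand-ins for the methods in the diff, not the committed code.

```python
import math
from typing import Dict, Optional

DEFAULT_CLASS_CAPACITY = 50  # same default as ClassCapacityConfig
CAPACITY_WARNING_THRESHOLD = 0.8


def classes_needed(
    predicted: float, course_code: str, overrides: Optional[Dict[str, int]] = None
) -> int:
    """Hypothetical standalone version of Config.calculate_classes_needed."""
    capacity = (overrides or {}).get(course_code, DEFAULT_CLASS_CAPACITY)
    if predicted <= 0:
        return 0
    # Any positive prediction rounds up to at least one class.
    return max(1, math.ceil(predicted / capacity))


def capacity_status(
    predicted: float, course_code: str, overrides: Optional[Dict[str, int]] = None
) -> str:
    """Hypothetical standalone version of Config.get_capacity_status."""
    capacity = (overrides or {}).get(course_code, DEFAULT_CLASS_CAPACITY)
    n = classes_needed(predicted, course_code, overrides)
    if n == 0:
        return "UNDER"
    utilization = predicted / (n * capacity)
    if utilization >= 1.0:
        return "OVER"
    if utilization >= CAPACITY_WARNING_THRESHOLD:
        return "WARNING"
    return "NORMAL"
```

Because the class count is rounded up, utilization can never exceed 1.0 under these rules, so `"OVER"` is returned only when the prediction fills the opened classes exactly (for example 100 students in two classes of 50).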
data_loader.py
CHANGED
|
@@ -12,8 +12,7 @@ def load_data_file() -> str:
|
|
| 12 |
try:
|
| 13 |
from huggingface_hub import hf_hub_download
|
| 14 |
|
| 15 |
-
logger.info("
|
| 16 |
-
logger.info(" Dataset: muhalwan/optimized_data_mhs")
|
| 17 |
|
| 18 |
file_path = hf_hub_download(
|
| 19 |
repo_id="muhalwan/optimized_data_mhs",
|
|
@@ -23,34 +22,22 @@ def load_data_file() -> str:
|
|
| 23 |
cache_dir="./hf_cache",
|
| 24 |
)
|
| 25 |
|
| 26 |
-
logger.info("
|
| 27 |
-
logger.info(f" Cached at: {file_path}")
|
| 28 |
return file_path
|
| 29 |
|
| 30 |
-
except ImportError:
|
| 31 |
-
logger.error(
|
| 32 |
-
"huggingface_hub not installed. Install with: pip install huggingface_hub"
|
| 33 |
-
)
|
| 34 |
-
raise
|
| 35 |
-
|
| 36 |
except Exception as e:
|
| 37 |
logger.error(f"Failed to download from HF dataset: {e}")
|
| 38 |
-
logger.error("Falling back to local file if available...")
|
| 39 |
|
| 40 |
local_path = "data/optimized_data.xlsx"
|
| 41 |
|
| 42 |
if Path(local_path).exists():
|
| 43 |
-
logger.info(f"
|
| 44 |
return local_path
|
| 45 |
|
| 46 |
-
|
| 47 |
-
"No data
|
| 48 |
-
"
|
| 49 |
-
"1. Set HF_TOKEN environment variable to load from private dataset\n"
|
| 50 |
-
"2. Place optimized_data.xlsx in data/ folder for local development\n"
|
| 51 |
)
|
| 52 |
-
logger.error(error_msg)
|
| 53 |
-
raise FileNotFoundError(error_msg)
|
| 54 |
|
| 55 |
|
| 56 |
def get_data_source_info() -> dict:
|
|
@@ -69,21 +56,14 @@ def get_data_source_info() -> dict:
|
|
| 69 |
|
| 70 |
if __name__ == "__main__":
|
| 71 |
logging.basicConfig(level=logging.INFO)
|
| 72 |
-
|
| 73 |
-
print("=" * 80)
|
| 74 |
-
print("Data Source Information")
|
| 75 |
-
print("=" * 80)
|
| 76 |
|
| 77 |
info = get_data_source_info()
|
| 78 |
for key, value in info.items():
|
| 79 |
print(f" {key}: {value}")
|
| 80 |
|
| 81 |
-
print("\n" + "=" * 80)
|
| 82 |
-
print("Attempting to load data...")
|
| 83 |
-
print("=" * 80)
|
| 84 |
-
|
| 85 |
try:
|
| 86 |
file_path = load_data_file()
|
| 87 |
-
print(f"\
|
| 88 |
except Exception as e:
|
| 89 |
-
print(f"\
|
|
|
|
| 12 |
try:
|
| 13 |
from huggingface_hub import hf_hub_download
|
| 14 |
|
| 15 |
+
logger.info("Dataset: muhalwan/optimized_data_mhs")
|
|
|
|
| 16 |
|
| 17 |
file_path = hf_hub_download(
|
| 18 |
repo_id="muhalwan/optimized_data_mhs",
|
|
|
|
| 22 |
cache_dir="./hf_cache",
|
| 23 |
)
|
| 24 |
|
| 25 |
+
logger.info("Data loaded successfully from HF dataset")
|
|
|
|
| 26 |
return file_path
|
| 27 |
|
| 28 |
except Exception as e:
|
| 29 |
logger.error(f"Failed to download from HF dataset: {e}")
|
|
|
|
| 30 |
|
| 31 |
local_path = "data/optimized_data.xlsx"
|
| 32 |
|
| 33 |
if Path(local_path).exists():
|
| 34 |
+
logger.info(f"Loading data from local file: {local_path}")
|
| 35 |
return local_path
|
| 36 |
|
| 37 |
+
raise FileNotFoundError(
|
| 38 |
+
"No data source available. Either set HF_TOKEN environment variable "
|
| 39 |
+
"or place data file at 'data/optimized_data.xlsx'"
|
|
|
|
|
|
|
| 40 |
)
|
|
|
|
|
|
|
| 41 |
|
| 42 |
|
| 43 |
def get_data_source_info() -> dict:
|
|
|
|
| 56 |
|
| 57 |
if __name__ == "__main__":
|
| 58 |
logging.basicConfig(level=logging.INFO)
|
| 59 |
+
print("Data Information")
|
| 60 |
|
| 61 |
info = get_data_source_info()
|
| 62 |
for key, value in info.items():
|
| 63 |
print(f" {key}: {value}")
|
| 64 |
|
| 65 |
try:
|
| 66 |
file_path = load_data_file()
|
| 67 |
+
print(f"\nSuccess! Data file: {file_path}")
|
| 68 |
except Exception as e:
|
| 69 |
+
print(f"\nFailed: {e}")
|
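Taken together, `_get_data_file_path` in `config.py` and `load_data_file` in `data_loader.py` form a fallback chain: private HF dataset first, then a local copy, with a demo file when `DEMO_MODE` is set. The sketch below restates that chain under stated assumptions: `resolve_data_path` is a hypothetical helper, the `download` callable stands in for the `hf_hub_download` call, and the final `else` branch (not visible in the diff) is assumed to return the local `optimized_data.xlsx` path, as the removed docstring described.

```python
from pathlib import Path
from typing import Callable, Mapping


def resolve_data_path(
    env: Mapping[str, str],
    download: Callable[[], str],
    local_path: str = "data/optimized_data.xlsx",
    demo_path: str = "data/demo_data.xlsx",
) -> str:
    """Hypothetical helper combining the two lookups shown in the diff."""
    if env.get("HF_TOKEN"):
        try:
            # `download` is a stand-in for the hf_hub_download call.
            return download()
        except Exception:
            # Download failed: fall back to a local copy if one exists.
            if Path(local_path).exists():
                return local_path
            raise FileNotFoundError(
                "No data source available. Either set HF_TOKEN or "
                f"place the data file at '{local_path}'"
            )
    if env.get("DEMO_MODE", "false").lower() == "true":
        return demo_path
    # Assumed default branch: the local data file.
    return local_path
```

Passing the environment and the downloader as arguments keeps the sketch testable without network access; in the application the equivalent call would read from `os.environ`.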
data_processor.py
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
import logging
|
| 2 |
-
from typing import Dict, Set, Tuple
|
| 3 |
|
| 4 |
import numpy as np
|
| 5 |
import pandas as pd
|
|
@@ -22,7 +22,6 @@ class DataProcessor:
|
|
| 22 |
return self._preprocess()
|
| 23 |
|
| 24 |
def _load_excel(self):
|
| 25 |
-
logger.info(f"Loading {self.config.data.FILE_PATH}...")
|
| 26 |
try:
|
| 27 |
sheets = pd.read_excel(self.config.data.FILE_PATH, sheet_name=None)
|
| 28 |
self.raw_data = {
|
|
@@ -36,7 +35,6 @@ class DataProcessor:
|
|
| 36 |
raise
|
| 37 |
|
| 38 |
def _validate_raw_data(self):
|
| 39 |
-
"""Validate required columns and log data quality metrics."""
|
| 40 |
req_cols = {
|
| 41 |
"courses": ["kode_mk", "kategori_mk"],
|
| 42 |
"students_ind": ["kode_mk", "thn", "smt", "kode_mhs"],
|
|
@@ -47,46 +45,146 @@ class DataProcessor:
|
|
| 47 |
if not all(col in self.raw_data[key].columns for col in cols):
|
| 48 |
raise ValueError(f"Missing columns in {key}: {cols}")
|
| 49 |
|
| 50 |
-
|
| 51 |
-
self
|
| 52 |
|
| 53 |
-
|
| 54 |
-
"""Log data quality metrics for monitoring."""
|
| 55 |
-
courses_df = self.raw_data["courses"]
|
| 56 |
-
students_df = self.raw_data["students_ind"]
|
| 57 |
|
| 58 |
-
logger.info("=" * 60)
|
| 59 |
-
logger.info("Data Quality Report:")
|
| 60 |
-
logger.info(f" Courses (tabel1): {len(courses_df)} records")
|
| 61 |
-
logger.info(f" - Unique courses: {courses_df['kode_mk'].nunique()}")
|
| 62 |
logger.info(
|
| 63 |
-
f"
|
| 64 |
)
|
| 65 |
-
logger.info(f" Students (tabel4): {len(students_df)} records")
|
| 66 |
-
logger.info(f" - Unique students: {students_df['kode_mhs'].nunique()}")
|
| 67 |
-
logger.info("=" * 60)
|
| 68 |
|
| 69 |
def _clean_courses_data(self, courses: pd.DataFrame) -> pd.DataFrame:
|
| 70 |
-
"""
|
| 71 |
-
Clean and standardize course catalog data.
|
| 72 |
-
|
| 73 |
-
Cleaning steps:
|
| 74 |
-
1. Remove exact duplicates
|
| 75 |
-
2. Standardize kategori_mk values (uppercase, strip whitespace)
|
| 76 |
-
3. Remove courses with invalid/missing data
|
| 77 |
-
4. Keep first occurrence for duplicate course codes
|
| 78 |
-
5. Validate kategori_mk values
|
| 79 |
-
"""
|
| 80 |
initial_count = len(courses)
|
| 81 |
|
| 82 |
-
#
|
| 83 |
courses = courses.drop_duplicates()
|
| 84 |
if len(courses) < initial_count:
|
| 85 |
logger.info(
|
| 86 |
f" Removed {initial_count - len(courses)} exact duplicate rows"
|
| 87 |
)
|
| 88 |
|
| 89 |
-
#
|
| 90 |
courses["kategori_mk"] = (
|
| 91 |
courses["kategori_mk"]
|
| 92 |
.astype(str)
|
|
@@ -95,7 +193,7 @@ class DataProcessor:
|
|
| 95 |
.replace("", np.nan)
|
| 96 |
)
|
| 97 |
|
| 98 |
-
#
|
| 99 |
before_dropna = len(courses)
|
| 100 |
courses = courses.dropna(subset=["kode_mk", "kategori_mk"])
|
| 101 |
if len(courses) < before_dropna:
|
|
@@ -103,7 +201,7 @@ class DataProcessor:
|
|
| 103 |
f" Removed {before_dropna - len(courses)} rows with missing kode_mk or kategori_mk"
|
| 104 |
)
|
| 105 |
|
| 106 |
-
#
|
| 107 |
valid_categories = {"P", "W"}
|
| 108 |
invalid_mask = ~courses["kategori_mk"].isin(valid_categories)
|
| 109 |
if invalid_mask.any():
|
|
@@ -114,7 +212,7 @@ class DataProcessor:
|
|
| 114 |
logger.warning(" Keeping only valid categories (P, W)")
|
| 115 |
courses = courses[~invalid_mask]
|
| 116 |
|
| 117 |
-
#
|
| 118 |
before_dedup = len(courses)
|
| 119 |
courses = courses.drop_duplicates(subset="kode_mk", keep="first")
|
| 120 |
if len(courses) < before_dedup:
|
|
@@ -127,29 +225,20 @@ class DataProcessor:
|
|
| 127 |
return courses
|
| 128 |
|
| 129 |
def _clean_students_data(self, students: pd.DataFrame) -> pd.DataFrame:
|
| 130 |
-
"""
|
| 131 |
-
Clean and validate student enrollment data.
|
| 132 |
-
|
| 133 |
-
Cleaning steps:
|
| 134 |
-
1. Remove rows with missing critical data
|
| 135 |
-
2. Standardize data types
|
| 136 |
-
3. Remove invalid year/semester values
|
| 137 |
-
4. Remove duplicate enrollment records
|
| 138 |
-
"""
|
| 139 |
initial_count = len(students)
|
| 140 |
|
| 141 |
-
#
|
| 142 |
students = students.dropna(subset=["kode_mk", "thn", "smt", "kode_mhs"])
|
| 143 |
if len(students) < initial_count:
|
| 144 |
logger.info(
|
| 145 |
f" Removed {initial_count - len(students)} rows with missing critical data"
|
| 146 |
)
|
| 147 |
|
| 148 |
-
#
|
| 149 |
students["thn"] = pd.to_numeric(students["thn"], errors="coerce")
|
| 150 |
students["smt"] = pd.to_numeric(students["smt"], errors="coerce")
|
| 151 |
|
| 152 |
-
#
|
| 153 |
before_invalid = len(students)
|
| 154 |
students = students.dropna(subset=["thn", "smt"])
|
| 155 |
if len(students) < before_invalid:
|
|
@@ -157,8 +246,8 @@ class DataProcessor:
|
|
| 157 |
f" Removed {before_invalid - len(students)} rows with invalid year/semester values"
|
| 158 |
)
|
| 159 |
|
| 160 |
-
#
|
| 161 |
-
valid_semesters = {1, 2
|
| 162 |
invalid_sem = ~students["smt"].isin(valid_semesters)
|
| 163 |
if invalid_sem.any():
|
| 164 |
logger.warning(
|
|
@@ -166,7 +255,7 @@ class DataProcessor:
|
|
| 166 |
)
|
| 167 |
students = students[~invalid_sem]
|
| 168 |
|
| 169 |
-
#
|
| 170 |
current_year = pd.Timestamp.now().year
|
| 171 |
invalid_year = (students["thn"] < 2000) | (students["thn"] > current_year + 1)
|
| 172 |
if invalid_year.any():
|
|
@@ -175,7 +264,7 @@ class DataProcessor:
|
|
| 175 |
)
|
| 176 |
students = students[~invalid_year]
|
| 177 |
|
| 178 |
-
#
|
| 179 |
before_dedup = len(students)
|
| 180 |
students = students.drop_duplicates(
|
| 181 |
subset=["kode_mhs", "kode_mk", "thn", "smt"], keep="first"
|
|
@@ -190,14 +279,6 @@ class DataProcessor:
|
|
| 190 |
return students
|
| 191 |
|
| 192 |
def _clean_yearly_population(self, yearly_pop: pd.DataFrame) -> pd.DataFrame:
|
| 193 |
-
"""
|
| 194 |
-
Clean and validate yearly student population data.
|
| 195 |
-
|
| 196 |
-
Cleaning steps:
|
| 197 |
-
1. Remove duplicates
|
| 198 |
-
2. Validate and fill missing population data
|
| 199 |
-
3. Ensure chronological order
|
| 200 |
-
"""
|
| 201 |
# Remove duplicate year-semester combinations
|
| 202 |
before_dedup = len(yearly_pop)
|
| 203 |
yearly_pop = yearly_pop.drop_duplicates(subset=["thn", "smt"], keep="first")
|
|
@@ -211,7 +292,7 @@ class DataProcessor:
|
|
| 211 |
yearly_pop["jumlah_aktif"], errors="coerce"
|
| 212 |
)
|
| 213 |
|
| 214 |
-
# Replace zero or negative values with NaN
|
| 215 |
yearly_pop.loc[yearly_pop["jumlah_aktif"] <= 0, "jumlah_aktif"] = np.nan
|
| 216 |
|
| 217 |
# Sort by year and semester
|
|
@@ -222,20 +303,14 @@ class DataProcessor:
|
|
| 222 |
return yearly_pop
|
| 223 |
|
| 224 |
def _preprocess(self) -> Tuple[pd.DataFrame, Set[str]]:
|
| 225 |
-
|
| 226 |
-
logger.info("Preprocessing data...")
|
| 227 |
-
logger.info("-" * 60)
|
| 228 |
-
|
| 229 |
-
# Step 1: Clean course catalog
|
| 230 |
-
logger.info("Step 1: Cleaning course catalog...")
|
| 231 |
courses = self._clean_courses_data(self.raw_data["courses"].copy())
|
| 232 |
|
| 233 |
-
#
|
| 234 |
elective_category = self.config.data.ELECTIVE_CATEGORY
|
| 235 |
self.elective_codes = set(
|
| 236 |
courses[courses["kategori_mk"] == elective_category]["kode_mk"]
|
| 237 |
)
|
| 238 |
-
logger.info(f"Step 2: Identified {len(self.elective_codes)} elective courses")
|
| 239 |
|
| 240 |
if len(self.elective_codes) == 0:
|
| 241 |
logger.warning(
|
|
@@ -246,88 +321,52 @@ class DataProcessor:
|
|
| 246 |
)
|
| 247 |
return pd.DataFrame(), set()
|
| 248 |
|
| 249 |
-
#
|
| 250 |
-
logger.info("Step 3: Cleaning student enrollment data...")
|
| 251 |
students = self._clean_students_data(self.raw_data["students_ind"].copy())
|
| 252 |
|
| 253 |
-
#
|
| 254 |
students = students[students["kode_mk"].isin(self.elective_codes)]
|
| 255 |
-
logger.info(f"Step 4: Filtered to {len(students)} elective enrollment records")
|
| 256 |
|
| 257 |
if len(students) == 0:
|
| 258 |
logger.warning("No enrollment data found for elective courses!")
|
| 259 |
return pd.DataFrame(), self.elective_codes
|
| 260 |
|
| 261 |
-
#
|
| 262 |
-
logger.info("Step 5: Aggregating enrollment data...")
|
| 263 |
enrollment = (
|
| 264 |
students.groupby(["kode_mk", "thn", "smt"])["kode_mhs"]
|
| 265 |
.nunique()
|
| 266 |
.reset_index(name="enrollment")
|
| 267 |
)
|
| 268 |
-
logger.info(f" Created {len(enrollment)} course-semester enrollment records")
|
| 269 |
|
| 270 |
-
#
|
| 271 |
-
logger.info("Step 6: Cleaning yearly population data...")
|
| 272 |
yearly_pop = self._clean_yearly_population(
|
| 273 |
self.raw_data["students_yearly"][["thn", "smt", "jumlah_aktif"]].copy()
|
| 274 |
)
|
| 275 |
|
| 276 |
-
#
|
| 277 |
-
logger.info("Step 7: Merging enrollment with population data...")
|
| 278 |
df = enrollment.merge(yearly_pop, on=["thn", "smt"], how="left")
|
| 279 |
|
| 280 |
-
#
|
| 281 |
missing_pop = df["jumlah_aktif"].isna().sum()
|
| 282 |
if missing_pop > 0:
|
| 283 |
-
logger.warning(
|
| 284 |
-
f" {missing_pop} records missing population data - filling with interpolation"
|
| 285 |
-
)
|
| 286 |
df["jumlah_aktif"] = df["jumlah_aktif"].ffill().bfill()
|
| 287 |
|
| 288 |
-
# If still missing, use a reasonable default
|
| 289 |
if df["jumlah_aktif"].isna().any():
|
| 290 |
-
default_pop = 500
|
| 291 |
-
logger.warning(
|
| 292 |
-
f" Some population data still missing - using default: {default_pop}"
|
| 293 |
-
)
|
| 294 |
df["jumlah_aktif"] = df["jumlah_aktif"].fillna(default_pop)
|
| 295 |
|
| 296 |
-
#
|
| 297 |
-
logger.info("Step 8: Validating final enrollment data...")
|
| 298 |
df = self._validate_enrollment_data(df)
|
| 299 |
|
| 300 |
-
#
|
| 301 |
df = df.sort_values(["kode_mk", "thn", "smt"]).reset_index(drop=True)
|
| 302 |
self.processed_data = df
|
| 303 |
|
| 304 |
-
logger.info("-" * 60)
|
| 305 |
-
logger.info(
|
| 306 |
-
f"✓ Preprocessing complete. {len(df)} enrollment records generated."
|
| 307 |
-
)
|
| 308 |
-
logger.info(f"✓ Year range: {df['thn'].min():.0f} - {df['thn'].max():.0f}")
|
| 309 |
-
logger.info(f"✓ Courses with data: {df['kode_mk'].nunique()}")
|
| 310 |
-
logger.info("-" * 60)
|
| 311 |
-
|
| 312 |
return df, self.elective_codes
|
| 313 |
|
| 314 |
def _validate_enrollment_data(self, df: pd.DataFrame) -> pd.DataFrame:
|
| 315 |
-
"""
|
| 316 |
-
Validate and clean the final enrollment dataset.
|
| 317 |
-
|
| 318 |
-
Checks:
|
| 319 |
-
1. Remove records with zero enrollment
|
| 320 |
-
2. Check for outliers
|
| 321 |
-
3. Validate population data
|
| 322 |
-
"""
|
| 323 |
-
initial_count = len(df)
|
| 324 |
-
|
| 325 |
# Remove zero enrollments
|
| 326 |
df = df[df["enrollment"] > 0]
|
| 327 |
-
if len(df) < initial_count:
|
| 328 |
-
logger.info(
|
| 329 |
-
f" Removed {initial_count - len(df)} records with zero enrollment"
|
| 330 |
-
)
|
| 331 |
|
| 332 |
# Check for extreme outliers in enrollment
|
| 333 |
for course in df["kode_mk"].unique():
|
|
@@ -335,7 +374,7 @@ class DataProcessor:
|
|
| 335 |
if len(course_data) > 1:
|
| 336 |
q75, q25 = course_data.quantile([0.75, 0.25])
|
| 337 |
iqr = q75 - q25
|
| 338 |
-
upper_bound = q75 + (3 * iqr)
|
| 339 |
|
| 340 |
outliers = course_data > upper_bound
|
| 341 |
if outliers.any():
|
|
|
|
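The outlier check at the end of `_validate_enrollment_data` compares each course's enrollment against an upper bound of `q75 + 3 * iqr`. Below is a minimal standalone sketch of that rule on toy data. The helper name `flag_enrollment_outliers`, the parameter `k`, and the sample frame are hypothetical and not part of the repository; the sketch only marks rows and does not reproduce whatever the method does with them afterwards.

```python
import pandas as pd


def flag_enrollment_outliers(df: pd.DataFrame, k: float = 3.0) -> pd.Series:
    """Mark rows whose enrollment exceeds q75 + k * IQR within their course."""
    flags = pd.Series(False, index=df.index)
    for _, group in df.groupby("kode_mk"):
        values = group["enrollment"]
        # A single observation has no spread, so it is never flagged.
        if len(values) > 1:
            q75, q25 = values.quantile(0.75), values.quantile(0.25)
            upper_bound = q75 + k * (q75 - q25)
            flags.loc[values.index] = values > upper_bound
    return flags


# Toy data (invented): course A has one extreme semester, course B has one row.
toy = pd.DataFrame(
    {
        "kode_mk": ["A"] * 5 + ["B"],
        "enrollment": [10, 12, 11, 13, 100, 40],
    }
)
toy_flags = flag_enrollment_outliers(toy)
```

With `k = 3.0` only values far above the upper quartile are flagged, so ordinary semester-to-semester variation is left alone.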
| 1 |
import logging
|
| 2 |
+
from typing import Dict, Optional, Set, Tuple
|
| 3 |
|
| 4 |
import numpy as np
|
| 5 |
import pandas as pd
|
|
|
|
| 22 |
return self._preprocess()
|
| 23 |
|
| 24 |
def _load_excel(self):
|
|
|
|
| 25 |
try:
|
| 26 |
sheets = pd.read_excel(self.config.data.FILE_PATH, sheet_name=None)
|
| 27 |
self.raw_data = {
|
|
|
|
| 35 |
raise
|
| 36 |
|
| 37 |
def _validate_raw_data(self):
|
|
|
|
| 38 |
req_cols = {
|
| 39 |
"courses": ["kode_mk", "kategori_mk"],
|
| 40 |
"students_ind": ["kode_mk", "thn", "smt", "kode_mhs"],
|
|
|
|
| 45 |
if not all(col in self.raw_data[key].columns for col in cols):
|
| 46 |
raise ValueError(f"Missing columns in {key}: {cols}")
|
| 47 |
|
| 48 |
+
def get_actual_classes_opened(
|
| 49 |
+
self, year: int, semester: int, course_code: Optional[str] = None
|
| 50 |
+
) -> Dict[str, int]:
|
| 51 |
+
offerings = self.raw_data.get("offerings")
|
| 52 |
+
if offerings is None or len(offerings) == 0:
|
| 53 |
+
logger.warning("No offerings data (tabel2) available")
|
| 54 |
+
return {}
|
| 55 |
+
|
| 56 |
+
# Standardize column names
|
| 57 |
+
offerings = offerings.copy()
|
| 58 |
+
for old_col, new_col in self.config.data.OFFERINGS_RENAME.items():
|
| 59 |
+
if old_col in offerings.columns and new_col not in offerings.columns:
|
| 60 |
+
offerings = offerings.rename(columns={old_col: new_col})
|
| 61 |
+
|
| 62 |
+
# Log column names for debugging
|
| 63 |
+
logger.debug(f"Offerings columns: {offerings.columns.tolist()}")
|
| 64 |
+
|
| 65 |
+
# Filter by year and semester
|
| 66 |
+
mask = (offerings["thn"] == year) & (offerings["smt"] == semester)
|
| 67 |
+
if course_code:
|
| 68 |
+
mask = mask & (offerings["kode_mk"] == course_code)
|
| 69 |
+
|
| 70 |
+
filtered = offerings[mask]
|
| 71 |
+
|
| 72 |
+
if len(filtered) == 0:
|
| 73 |
+
logger.info(f"No class offerings found for {year} semester {semester}")
|
| 74 |
+
return {}
|
| 75 |
+
|
| 76 |
+
class_id_candidates = [
|
| 77 |
+
"kelas_id",
|
| 78 |
+
"id_kelas",
|
| 79 |
+
"kode_kelas",
|
| 80 |
+
"class_id",
|
| 81 |
+
"kelas",
|
| 82 |
+
"section_id",
|
| 83 |
+
"section",
|
| 84 |
+
]
|
| 85 |
+
class_id_col = None
|
| 86 |
+
|
| 87 |
+
for col in class_id_candidates:
|
| 88 |
+
if col in filtered.columns:
|
| 89 |
+
class_id_col = col
|
| 90 |
+
logger.debug(f"Using class ID column: {col}")
|
| 91 |
+
break
|
| 92 |
+
|
| 93 |
+
if class_id_col is None:
|
| 94 |
+
cols = filtered.columns.tolist()
|
| 95 |
+
if len(cols) > 2:
|
| 96 |
+
potential_id_col = cols[2]
|
| 97 |
+
non_id_cols = [
|
| 98 |
+
"nama_mk",
|
| 99 |
+
"smt",
|
| 100 |
+
"thn",
|
| 101 |
+
"semester",
|
| 102 |
+
"tahun",
|
| 103 |
+
"kuota",
|
| 104 |
+
"kapasitas",
|
| 105 |
+
]
|
| 106 |
+
if potential_id_col.lower() not in non_id_cols:
|
| 107 |
+
class_id_col = potential_id_col
|
| 108 |
+
logger.debug(
|
| 109 |
+
f"Using positional class ID column (index 2): {potential_id_col}"
|
| 110 |
+
)
|
| 111 |
+
|
| 112 |
+
result = {}
|
| 113 |
+
|
| 114 |
+
for kode_mk in filtered["kode_mk"].unique():
|
| 115 |
+
course_data = filtered[filtered["kode_mk"] == kode_mk]
|
| 116 |
+
|
| 117 |
+
if class_id_col and class_id_col in course_data.columns:
|
| 118 |
+
unique_classes = course_data[class_id_col].nunique()
|
| 119 |
+
logger.debug(
|
| 120 |
+
f"Course {kode_mk}: {len(course_data)} rows, {unique_classes} unique classes (by {class_id_col})"
|
| 121 |
+
)
|
| 122 |
+
else:
|
| 123 |
+
all_cols = course_data.columns.tolist()
|
| 124 |
+
|
| 125 |
+
dosen_cols = [
|
| 126 |
+
col
|
| 127 |
+
for col in all_cols
|
| 128 |
+
if "dosen" in col.lower()
|
| 129 |
+
or "pengajar" in col.lower()
|
| 130 |
+
or "teacher" in col.lower()
|
| 131 |
+
]
|
| 132 |
+
|
| 133 |
+
if len(all_cols) > 0:
|
| 134 |
+
last_col = all_cols[-1]
|
| 135 |
+
if last_col not in dosen_cols:
|
| 136 |
+
non_last_cols = [c for c in all_cols if c != last_col]
|
| 137 |
+
if len(non_last_cols) > 0:
|
| 138 |
+
grouped = course_data.groupby(non_last_cols)[
|
| 139 |
+
last_col
|
| 140 |
+
].nunique()
|
| 141 |
+
if (grouped > 1).any():
|
| 142 |
+
dosen_cols.append(last_col)
|
| 143 |
+
|
| 144 |
+
non_dosen_cols = [col for col in all_cols if col not in dosen_cols]
|
| 145 |
+
|
| 146 |
+
if non_dosen_cols:
|
| 147 |
+
unique_classes = len(
|
| 148 |
+
course_data.drop_duplicates(subset=non_dosen_cols)
|
| 149 |
+
)
|
| 150 |
+
else:
|
| 151 |
+
unique_classes = len(course_data.drop_duplicates())
|
| 152 |
+
|
| 153 |
+
logger.debug(
|
| 154 |
+
f"Course {kode_mk}: {len(course_data)} rows, {unique_classes} unique classes (fallback method)"
|
| 155 |
+
)
|
| 156 |
|
| 157 |
+
result[kode_mk] = max(1, unique_classes)
|
| 158 |
|
| 159 |
logger.info(
|
| 160 |
+
f"Found {len(result)} courses with {sum(result.values())} total classes for {year} sem {semester}"
|
| 161 |
+
)
|
| 162 |
+
return result
|
| 163 |
+
|
| 164 |
+
def get_class_count_for_validation(self, year: int, semester: int) -> pd.DataFrame:
|
| 165 |
+
actual_classes = self.get_actual_classes_opened(year, semester)
|
| 166 |
+
|
| 167 |
+
if not actual_classes:
|
| 168 |
+
return pd.DataFrame(columns=["kode_mk", "actual_classes"])
|
| 169 |
+
|
| 170 |
+
return pd.DataFrame(
|
| 171 |
+
[
|
| 172 |
+
{"kode_mk": kode, "actual_classes": count}
|
| 173 |
+
for kode, count in actual_classes.items()
|
| 174 |
+
]
|
| 175 |
)
|
| 176 |
|
| 177 |
def _clean_courses_data(self, courses: pd.DataFrame) -> pd.DataFrame:
|
| 178 |
initial_count = len(courses)
|
| 179 |
|
| 180 |
+
# Remove exact duplicate rows
|
| 181 |
courses = courses.drop_duplicates()
|
| 182 |
if len(courses) < initial_count:
|
| 183 |
logger.info(
|
| 184 |
f" Removed {initial_count - len(courses)} exact duplicate rows"
|
| 185 |
)
|
| 186 |
|
| 187 |
+
# Standardize kategori_mk
|
| 188 |
courses["kategori_mk"] = (
|
| 189 |
courses["kategori_mk"]
|
| 190 |
.astype(str)
|
|
|
|
| 193 |
.replace("", np.nan)
|
| 194 |
)
|
| 195 |
|
| 196 |
+
# Remove rows with missing critical data
|
| 197 |
before_dropna = len(courses)
|
| 198 |
courses = courses.dropna(subset=["kode_mk", "kategori_mk"])
|
| 199 |
if len(courses) < before_dropna:
|
|
|
|
| 201 |
f" Removed {before_dropna - len(courses)} rows with missing kode_mk or kategori_mk"
|
| 202 |
)
|
| 203 |
|
| 204 |
+
# Validate kategori_mk values
|
| 205 |
valid_categories = {"P", "W"}
|
| 206 |
invalid_mask = ~courses["kategori_mk"].isin(valid_categories)
|
| 207 |
if invalid_mask.any():
|
|
|
|
| 212 |
logger.warning(" Keeping only valid categories (P, W)")
|
| 213 |
courses = courses[~invalid_mask]
|
| 214 |
|
| 215 |
+
# Remove duplicate course codes (keep first)
|
| 216 |
before_dedup = len(courses)
|
| 217 |
courses = courses.drop_duplicates(subset="kode_mk", keep="first")
|
| 218 |
if len(courses) < before_dedup:
|
|
|
|
| 225 |
return courses
|
| 226 |
|
| 227 |
def _clean_students_data(self, students: pd.DataFrame) -> pd.DataFrame:
|
| 228 |
initial_count = len(students)
|
| 229 |
|
| 230 |
+
# Remove rows with missing critical data
|
| 231 |
students = students.dropna(subset=["kode_mk", "thn", "smt", "kode_mhs"])
|
| 232 |
if len(students) < initial_count:
|
| 233 |
logger.info(
|
| 234 |
f" Removed {initial_count - len(students)} rows with missing critical data"
|
| 235 |
)
|
| 236 |
|
| 237 |
+
# Ensure correct data types
|
| 238 |
students["thn"] = pd.to_numeric(students["thn"], errors="coerce")
|
| 239 |
students["smt"] = pd.to_numeric(students["smt"], errors="coerce")
|
| 240 |
|
| 241 |
+
# Remove rows with invalid year/semester after conversion
|
| 242 |
before_invalid = len(students)
|
| 243 |
students = students.dropna(subset=["thn", "smt"])
|
| 244 |
if len(students) < before_invalid:
|
|
|
|
| 246 |
f" Removed {before_invalid - len(students)} rows with invalid year/semester values"
|
| 247 |
)
|
| 248 |
|
| 249 |
+
# Validate semester values
|
| 250 |
+
valid_semesters = {1, 2}
|
| 251 |
invalid_sem = ~students["smt"].isin(valid_semesters)
|
| 252 |
if invalid_sem.any():
|
| 253 |
logger.warning(
|
|
|
|
| 255 |
)
|
| 256 |
students = students[~invalid_sem]
|
| 257 |
|
| 258 |
+
# Validate year range
|
| 259 |
current_year = pd.Timestamp.now().year
|
| 260 |
invalid_year = (students["thn"] < 2000) | (students["thn"] > current_year + 1)
|
| 261 |
if invalid_year.any():
|
|
|
|
| 264 |
)
|
| 265 |
students = students[~invalid_year]
|
| 266 |
|
| 267 |
+
# Remove exact duplicate enrollments (same student, course, semester)
|
| 268 |
before_dedup = len(students)
|
| 269 |
students = students.drop_duplicates(
|
| 270 |
subset=["kode_mhs", "kode_mk", "thn", "smt"], keep="first"
|
|
|
|
| 279 |
return students
|
| 280 |
|
| 281 |
def _clean_yearly_population(self, yearly_pop: pd.DataFrame) -> pd.DataFrame:
|
| 282 |
# Remove duplicate year-semester combinations
|
| 283 |
before_dedup = len(yearly_pop)
|
| 284 |
yearly_pop = yearly_pop.drop_duplicates(subset=["thn", "smt"], keep="first")
|
|
|
|
| 292 |
yearly_pop["jumlah_aktif"], errors="coerce"
|
| 293 |
)
|
| 294 |
|
| 295 |
+
# Replace zero or negative values with NaN
|
| 296 |
yearly_pop.loc[yearly_pop["jumlah_aktif"] <= 0, "jumlah_aktif"] = np.nan
|
| 297 |
|
| 298 |
# Sort by year and semester
|
|
|
|
| 303 |
return yearly_pop
|
| 304 |
|
| 305 |
def _preprocess(self) -> Tuple[pd.DataFrame, Set[str]]:
|
| 306 |
+
# Clean course catalog
|
| 307 |
courses = self._clean_courses_data(self.raw_data["courses"].copy())
|
| 308 |
|
| 309 |
+
# Identify elective courses
|
| 310 |
elective_category = self.config.data.ELECTIVE_CATEGORY
|
| 311 |
self.elective_codes = set(
|
| 312 |
courses[courses["kategori_mk"] == elective_category]["kode_mk"]
|
| 313 |
)
|
|
|
|
| 314 |
|
| 315 |
if len(self.elective_codes) == 0:
|
| 316 |
logger.warning(
|
|
|
|
| 321 |
)
|
| 322 |
return pd.DataFrame(), set()
|
| 323 |
|
| 324 |
+
# Clean student enrollment data
|
|
|
|
| 325 |
students = self._clean_students_data(self.raw_data["students_ind"].copy())
|
| 326 |
|
| 327 |
+
# Filter for elective courses only
|
| 328 |
students = students[students["kode_mk"].isin(self.elective_codes)]
|
|
|
|
| 329 |
|
| 330 |
if len(students) == 0:
|
| 331 |
logger.warning("No enrollment data found for elective courses!")
|
| 332 |
return pd.DataFrame(), self.elective_codes
|
| 333 |
|
| 334 |
+
# Aggregate enrollment by course-semester
|
|
|
|
| 335 |
enrollment = (
|
| 336 |
students.groupby(["kode_mk", "thn", "smt"])["kode_mhs"]
|
| 337 |
.nunique()
|
| 338 |
.reset_index(name="enrollment")
|
| 339 |
)
|
|
|
|
| 340 |
|
| 341 |
+
# Clean yearly population data
|
|
|
|
| 342 |
yearly_pop = self._clean_yearly_population(
|
| 343 |
self.raw_data["students_yearly"][["thn", "smt", "jumlah_aktif"]].copy()
|
| 344 |
)
|
| 345 |
|
| 346 |
+
# Merge enrollment with population data
|
|
|
|
| 347 |
df = enrollment.merge(yearly_pop, on=["thn", "smt"], how="left")
|
| 348 |
|
| 349 |
+
# Handle missing population data
|
| 350 |
missing_pop = df["jumlah_aktif"].isna().sum()
|
| 351 |
if missing_pop > 0:
|
| 352 |
df["jumlah_aktif"] = df["jumlah_aktif"].ffill().bfill()
|
| 353 |
|
|
|
|
| 354 |
if df["jumlah_aktif"].isna().any():
|
| 355 |
+
default_pop = 500
|
| 356 |
df["jumlah_aktif"] = df["jumlah_aktif"].fillna(default_pop)
|
| 357 |
|
| 358 |
+
# Validate enrollment data
|
|
|
|
| 359 |
df = self._validate_enrollment_data(df)
|
| 360 |
|
| 361 |
+
# Sort and finalize
|
| 362 |
df = df.sort_values(["kode_mk", "thn", "smt"]).reset_index(drop=True)
|
| 363 |
self.processed_data = df
|
| 364 |
|
| 365 |
return df, self.elective_codes
|
| 366 |
|
| 367 |
def _validate_enrollment_data(self, df: pd.DataFrame) -> pd.DataFrame:
|
| 368 |
# Remove zero enrollments
|
| 369 |
df = df[df["enrollment"] > 0]
|
| 370 |
|
| 371 |
# Check for extreme outliers in enrollment
|
| 372 |
for course in df["kode_mk"].unique():
|
|
|
|
| 374 |
if len(course_data) > 1:
|
| 375 |
q75, q25 = course_data.quantile([0.75, 0.25])
|
| 376 |
iqr = q75 - q25
|
| 377 |
+
upper_bound = q75 + (3 * iqr)
|
| 378 |
|
| 379 |
outliers = course_data > upper_bound
|
| 380 |
if outliers.any():
|
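The core of `_preprocess` in this diff is three steps: count distinct students per course-semester, left-merge the semester population, and fill missing population values. The sketch below replays those steps on invented toy frames, using the same column names as the diff. The data values and the course code `IF101` are assumptions for illustration only.

```python
import pandas as pd

# Toy enrollment rows (invented); student s2 appears twice in 2023 semester 1.
students = pd.DataFrame(
    {
        "kode_mhs": ["s1", "s2", "s2", "s3", "s1"],
        "kode_mk": ["IF101"] * 5,
        "thn": [2023, 2023, 2023, 2024, 2024],
        "smt": [1, 1, 1, 1, 1],
    }
)
# Population is known for 2023 only, so 2024 is missing after the merge.
yearly_pop = pd.DataFrame({"thn": [2023], "smt": [1], "jumlah_aktif": [400.0]})

# Distinct students per course-semester, as in the diff.
enrollment = (
    students.groupby(["kode_mk", "thn", "smt"])["kode_mhs"]
    .nunique()
    .reset_index(name="enrollment")
)

# Left merge keeps every enrollment row even without a population match.
df = enrollment.merge(yearly_pop, on=["thn", "smt"], how="left")

# Forward fill, then backward fill, in the current row order.
df["jumlah_aktif"] = df["jumlah_aktif"].ffill().bfill()
```

Note that `ffill` and `bfill` operate in row order, so the filled values depend on how `df` happens to be sorted at that point.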
data_validator.py
DELETED
|
@@ -1,467 +0,0 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Data Validation Utility
|
| 3 |
-
|
| 4 |
-
Provides pre-flight checks and data quality validation for the enrollment prediction system.
|
| 5 |
-
This module validates data availability, quality, and completeness before processing.
|
| 6 |
-
"""
|
| 7 |
-
|
| 8 |
-
import logging
|
| 9 |
-
from dataclasses import dataclass
|
| 10 |
-
from typing import Dict, List, Optional, Tuple
|
| 11 |
-
|
| 12 |
-
import pandas as pd
|
| 13 |
-
|
| 14 |
-
logger = logging.getLogger(__name__)
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
@dataclass
|
| 18 |
-
class ValidationResult:
|
| 19 |
-
"""Result of a validation check."""
|
| 20 |
-
|
| 21 |
-
passed: bool
|
| 22 |
-
message: str
|
| 23 |
-
severity: str = "INFO" # INFO, WARNING, ERROR
|
| 24 |
-
details: Optional[Dict] = None
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
@dataclass
|
| 28 |
-
class SemesterDataStatus:
|
| 29 |
-
"""Status of data availability for a specific semester."""
|
| 30 |
-
|
| 31 |
-
year: int
|
| 32 |
-
semester: int
|
| 33 |
-
has_offerings: bool
|
| 34 |
-
has_enrollments: bool
|
| 35 |
-
has_elective_enrollments: bool
|
| 36 |
-
total_enrollments: int
|
| 37 |
-
elective_enrollments: int
|
| 38 |
-
elective_courses: Dict[str, int]
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
class DataValidator:
|
| 42 |
-
"""Validates data quality and availability for the enrollment prediction system."""
|
| 43 |
-
|
| 44 |
-
def __init__(self, file_path: str):
|
| 45 |
-
"""
|
| 46 |
-
Initialize the validator.
|
| 47 |
-
|
| 48 |
-
Args:
|
| 49 |
-
file_path: Path to the Excel data file
|
| 50 |
-
"""
|
| 51 |
-
self.file_path = file_path
|
| 52 |
-
self.validation_results: List[ValidationResult] = []
|
| 53 |
-
|
| 54 |
-
def validate_all(self) -> Tuple[bool, List[ValidationResult]]:
|
| 55 |
-
"""
|
| 56 |
-
Run all validation checks.
|
| 57 |
-
|
| 58 |
-
Returns:
|
| 59 |
-
Tuple of (all_passed, list of validation results)
|
| 60 |
-
"""
|
| 61 |
-
logger.info("Running comprehensive data validation...")
|
| 62 |
-
|
| 63 |
-
# Load raw data
|
| 64 |
-
try:
|
| 65 |
-
self.raw_data = self._load_raw_data()
|
| 66 |
-
except Exception as e:
|
| 67 |
-
self.validation_results.append(
|
| 68 |
-
ValidationResult(
|
| 69 |
-
passed=False,
|
| 70 |
-
message=f"Failed to load data: {str(e)}",
|
| 71 |
-
severity="ERROR",
|
| 72 |
-
)
|
| 73 |
-
)
|
| 74 |
-
return False, self.validation_results
|
| 75 |
-
|
| 76 |
-
# Run validation checks
|
| 77 |
-
self._validate_file_structure()
|
| 78 |
-
self._validate_course_catalog()
|
| 79 |
-
self._validate_elective_courses()
|
| 80 |
-
self._validate_enrollment_data()
|
| 81 |
-
self._validate_population_data()
|
| 82 |
-
|
| 83 |
-
# Overall result
|
| 84 |
-
all_passed = all(
|
| 85 |
-
r.passed for r in self.validation_results if r.severity == "ERROR"
|
| 86 |
-
)
|
| 87 |
-
|
| 88 |
-
return all_passed, self.validation_results
|
| 89 |
-
|
| 90 |
-
def check_semester_data_availability(
|
| 91 |
-
self, year: int, semester: int
|
| 92 |
-
) -> SemesterDataStatus:
|
| 93 |
-
"""
|
| 94 |
-
Check data availability for a specific semester.
|
| 95 |
-
|
| 96 |
-
Args:
|
| 97 |
-
year: Academic year
|
| 98 |
-
semester: Semester (1 or 2)
|
| 99 |
-
|
| 100 |
-
Returns:
|
| 101 |
-
SemesterDataStatus object with detailed availability info
|
| 102 |
-
"""
|
| 103 |
-
if not hasattr(self, "raw_data"):
|
| 104 |
-
self.raw_data = self._load_raw_data()
|
| 105 |
-
|
| 106 |
-
# Check course offerings (tabel2)
|
| 107 |
-
offerings = self.raw_data["offerings"]
|
| 108 |
-
has_offerings = (
|
| 109 |
-
len(
|
| 110 |
-
offerings[
|
| 111 |
-
(offerings["tahun"] == year) & (offerings["semester"] == semester)
|
| 112 |
-
]
|
| 113 |
-
)
|
| 114 |
-
> 0
|
| 115 |
-
)
|
| 116 |
-
|
| 117 |
-
# Check enrollments (tabel4)
|
| 118 |
-
students = self.raw_data["students"]
|
| 119 |
-
semester_enrollments = students[
|
| 120 |
-
(students["thn"] == year) & (students["smt"] == semester)
|
| 121 |
-
]
|
| 122 |
-
has_enrollments = len(semester_enrollments) > 0
|
| 123 |
-
|
| 124 |
-
# Check elective enrollments
|
| 125 |
-
elective_codes = self._get_elective_codes()
|
| 126 |
-
elective_enrollments = semester_enrollments[
|
| 127 |
-
semester_enrollments["kode_mk"].isin(elective_codes)
|
| 128 |
-
]
|
| 129 |
-
has_elective_enrollments = len(elective_enrollments) > 0
|
| 130 |
-
|
| 131 |
-
# Get elective courses for this semester
|
| 132 |
-
elective_courses: Dict[str, int] = {}
|
| 133 |
-
if has_elective_enrollments:
|
| 134 |
-
elective_courses = (
|
| 135 |
-
elective_enrollments.groupby("kode_mk")["kode_mhs"]
|
| 136 |
-
.nunique()
|
| 137 |
-
.sort_values(ascending=False)
|
| 138 |
-
.to_dict()
|
| 139 |
-
)
|
| 140 |
-
|
| 141 |
-
return SemesterDataStatus(
|
| 142 |
-
year=year,
|
| 143 |
-
semester=semester,
|
| 144 |
-
has_offerings=has_offerings,
|
| 145 |
-
has_enrollments=has_enrollments,
|
| 146 |
-
has_elective_enrollments=has_elective_enrollments,
|
| 147 |
-
total_enrollments=len(semester_enrollments),
|
| 148 |
-
elective_enrollments=len(elective_enrollments),
|
| 149 |
-
elective_courses=elective_courses,
|
| 150 |
-
)
|
| 151 |
-
|
| 152 |
-
def get_available_semesters_for_backtesting(self) -> List[Tuple[int, int]]:
|
| 153 |
-
"""
|
| 154 |
-
Get list of semesters that have elective enrollment data (suitable for backtesting).
|
| 155 |
-
|
| 156 |
-
Returns:
|
| 157 |
-
List of (year, semester) tuples
|
| 158 |
-
"""
|
| 159 |
-
if not hasattr(self, "raw_data"):
|
| 160 |
-
self.raw_data = self._load_raw_data()
|
| 161 |
-
|
| 162 |
-
students = self.raw_data["students"]
|
| 163 |
-
elective_codes = self._get_elective_codes()
|
| 164 |
-
|
| 165 |
-
# Filter to elective enrollments only
|
| 166 |
-
elective_students = students[students["kode_mk"].isin(elective_codes)]
|
| 167 |
-
|
| 168 |
-
# Get unique year-semester combinations
|
| 169 |
-
available = (
|
| 170 |
-
elective_students.groupby(["thn", "smt"]).size().reset_index(name="count")
|
| 171 |
-
)
|
| 172 |
-
available = available[available["count"] > 0]
|
| 173 |
-
|
| 174 |
-
semesters = [
|
| 175 |
-
(int(row["thn"]), int(row["smt"])) for _, row in available.iterrows()
|
| 176 |
-
]
|
| 177 |
-
semesters.sort(reverse=True) # Most recent first
|
| 178 |
-
|
| 179 |
-
return semesters
|
| 180 |
-
|
| 181 |
-
def print_validation_summary(self):
|
| 182 |
-
"""Print a summary of validation results."""
|
| 183 |
-
if not self.validation_results:
|
| 184 |
-
print("\nWARNING: No validation has been run yet.")
|
| 185 |
-
return
|
| 186 |
-
|
| 187 |
-
print("\n" + "=" * 80)
|
| 188 |
-
print("DATA VALIDATION SUMMARY")
|
| 189 |
-
print("=" * 80)
|
| 190 |
-
|
| 191 |
-
errors = [r for r in self.validation_results if r.severity == "ERROR"]
|
| 192 |
-
warnings = [r for r in self.validation_results if r.severity == "WARNING"]
|
| 193 |
-
info = [r for r in self.validation_results if r.severity == "INFO"]
|
| 194 |
-
|
| 195 |
-
if errors:
|
| 196 |
-
print(f"\nERROR ({len(errors)}):")
|
| 197 |
-
for result in errors:
|
| 198 |
-
print(f" - {result.message}")
|
| 199 |
-
|
| 200 |
-
if warnings:
|
| 201 |
-
print(f"\nWARNING ({len(warnings)}):")
|
| 202 |
-
for result in warnings:
|
| 203 |
-
print(f" - {result.message}")
|
| 204 |
-
|
| 205 |
-
if info:
|
| 206 |
-
print(f"\nINFO ({len(info)}):")
|
| 207 |
-
for result in info:
|
| 208 |
-
print(f" - {result.message}")
|
| 209 |
-
|
| 210 |
-
print("\n" + "=" * 80)
|
| 211 |
-
if not errors:
|
| 212 |
-
print("VALIDATION PASSED - Data is ready for processing")
|
| 213 |
-
else:
|
| 214 |
-
print("VALIDATION FAILED - Please fix errors before proceeding")
|
| 215 |
-
print("=" * 80)
|
| 216 |
-
|
| 217 |
-
def _load_raw_data(self) -> Dict[str, pd.DataFrame]:
|
| 218 |
-
"""Load raw data from Excel file."""
|
| 219 |
-
logger.info(f"Loading data from {self.file_path}...")
|
| 220 |
-
|
| 221 |
-
return {
|
| 222 |
-
"courses": pd.read_excel(self.file_path, sheet_name="tabel1_data_matkul"),
|
| 223 |
-
"offerings": pd.read_excel(
|
| 224 |
-
self.file_path, sheet_name="tabel2_data_matkul_dibuka"
|
| 225 |
-
),
|
| 226 |
-
"population": pd.read_excel(
|
| 227 |
-
self.file_path, sheet_name="tabel3_data_mahasiswa_per_tahun"
|
| 228 |
-
),
|
| 229 |
-
"students": pd.read_excel(
|
| 230 |
-
self.file_path, sheet_name="tabel4_data_individu_mahasiswa"
|
| 231 |
-
),
|
| 232 |
-
}
|
| 233 |
-
|
| 234 |
-
def _validate_file_structure(self):
|
| 235 |
-
"""Validate that all required sheets and columns exist."""
|
| 236 |
-
required_sheets = {
|
| 237 |
-
"courses": ["kode_mk", "nama_mk", "kategori_mk"],
|
| 238 |
-
"offerings": ["kode_mk", "tahun", "semester"],
|
| 239 |
-
"students": ["kode_mk", "kode_mhs", "thn", "smt"],
|
| 240 |
-
"population": ["jumlah_aktif"], # tahun_ajaran and semester may vary
|
| 241 |
-
}
|
| 242 |
-
|
| 243 |
-
for sheet_name, required_cols in required_sheets.items():
|
| 244 |
-
df = self.raw_data.get(sheet_name)
|
| 245 |
-
if df is None:
|
| 246 |
-
self.validation_results.append(
|
| 247 |
-
ValidationResult(
|
| 248 |
-
passed=False,
|
| 249 |
-
message=f"Sheet '{sheet_name}' not found",
|
| 250 |
-
severity="ERROR",
|
| 251 |
-
)
|
| 252 |
-
)
|
| 253 |
-
continue
|
| 254 |
-
|
| 255 |
-
missing_cols = [col for col in required_cols if col not in df.columns]
|
| 256 |
-
if missing_cols:
|
| 257 |
-
self.validation_results.append(
|
| 258 |
-
ValidationResult(
|
| 259 |
-
passed=False,
|
| 260 |
-
message=f"Missing columns in {sheet_name}: {missing_cols}",
|
| 261 |
-
severity="ERROR",
|
| 262 |
-
)
|
| 263 |
-
)
|
| 264 |
-
else:
|
| 265 |
-
self.validation_results.append(
|
| 266 |
-
ValidationResult(
|
| 267 |
-
passed=True,
|
| 268 |
-
message=f"Sheet '{sheet_name}' has all required columns",
|
| 269 |
-
severity="INFO",
|
| 270 |
-
)
|
| 271 |
-
)
|
| 272 |
-
|
| 273 |
-
def _validate_course_catalog(self):
|
| 274 |
-
"""Validate course catalog (tabel1)."""
|
| 275 |
-
courses = self.raw_data["courses"]
|
| 276 |
-
|
| 277 |
-
# Check for duplicates
|
| 278 |
-
total_records = len(courses)
|
| 279 |
-
unique_courses = courses["kode_mk"].nunique()
|
| 280 |
-
duplicate_count = total_records - unique_courses
|
| 281 |
-
|
| 282 |
-
if duplicate_count > 0:
|
| 283 |
-
self.validation_results.append(
|
| 284 |
-
ValidationResult(
|
| 285 |
-
passed=True,
|
| 286 |
-
message=f"Course catalog has {duplicate_count:,} duplicate records (will be cleaned)",
|
| 287 |
-
severity="WARNING",
|
| 288 |
-
details={"total": total_records, "unique": unique_courses},
|
| 289 |
-
)
|
| 290 |
-
)
|
| 291 |
-
|
| 292 |
-
# Check for category consistency
|
| 293 |
-
categories = courses["kategori_mk"].unique()
|
| 294 |
-
non_standard = [c for c in categories if c not in ["W", "P"]]
|
| 295 |
-
if non_standard:
|
| 296 |
-
self.validation_results.append(
|
| 297 |
-
ValidationResult(
|
| 298 |
-
passed=True,
|
| 299 |
-
message=f"Non-standard categories found: {non_standard} (will be normalized)",
|
| 300 |
-
severity="WARNING",
|
| 301 |
-
)
|
| 302 |
-
)
|
| 303 |
-
|
| 304 |
-
def _validate_elective_courses(self):
|
| 305 |
-
"""Validate elective course identification."""
|
| 306 |
-
courses = self.raw_data["courses"]
|
| 307 |
-
|
| 308 |
-
# Clean and identify electives
|
| 309 |
-
courses_clean = courses.drop_duplicates(subset="kode_mk").copy()
|
| 310 |
-
courses_clean["kategori_mk"] = (
|
| 311 |
-
courses_clean["kategori_mk"].astype(str).str.upper().str.strip()
|
| 312 |
-
)
|
| 313 |
-
|
| 314 |
-
electives = courses_clean[courses_clean["kategori_mk"] == "P"]
|
| 315 |
-
elective_count = len(electives)
|
| 316 |
-
|
| 317 |
-
if elective_count == 0:
|
| 318 |
-
self.validation_results.append(
|
| 319 |
-
ValidationResult(
|
| 320 |
-
passed=False,
|
| 321 |
-
message="No elective courses found (kategori_mk = 'P')",
|
| 322 |
-
severity="ERROR",
|
| 323 |
-
)
|
| 324 |
-
)
|
| 325 |
-
else:
|
| 326 |
-
self.validation_results.append(
|
| 327 |
-
ValidationResult(
|
| 328 |
-
passed=True,
|
| 329 |
-
message=f"Found {elective_count} elective courses",
|
| 330 |
-
severity="INFO",
|
| 331 |
-
details={"electives": electives["kode_mk"].tolist()},
|
| 332 |
-
)
|
| 333 |
-
)
|
| 334 |
-
|
| 335 |
-
def _validate_enrollment_data(self):
|
| 336 |
-
"""Validate student enrollment data (tabel4)."""
|
| 337 |
-
students = self.raw_data["students"]
|
| 338 |
-
|
| 339 |
-
# Check for missing critical data
|
| 340 |
-
critical_fields = ["kode_mk", "kode_mhs", "thn", "smt"]
|
| 341 |
-
missing_data = students[critical_fields].isnull().any(axis=1).sum()
|
| 342 |
-
|
| 343 |
-
if missing_data > 0:
|
| 344 |
-
self.validation_results.append(
|
| 345 |
-
ValidationResult(
|
| 346 |
-
passed=True,
|
| 347 |
-
message=f"{missing_data} enrollment records have missing data (will be cleaned)",
|
| 348 |
-
severity="WARNING",
|
| 349 |
-
)
|
| 350 |
-
)
|
| 351 |
-
|
| 352 |
-
# Check for duplicates
|
| 353 |
-
duplicate_enrollments = students.duplicated(
|
| 354 |
-
subset=["kode_mhs", "kode_mk", "thn", "smt"]
|
| 355 |
-
).sum()
|
| 356 |
-
|
| 357 |
-
if duplicate_enrollments > 0:
|
| 358 |
-
self.validation_results.append(
|
| 359 |
-
ValidationResult(
|
| 360 |
-
passed=True,
|
| 361 |
-
message=f"{duplicate_enrollments:,} duplicate enrollment records (will be cleaned)",
|
| 362 |
-
severity="WARNING",
|
| 363 |
-
)
|
| 364 |
-
)
|
| 365 |
-
|
| 366 |
-
# Check year range
|
| 367 |
-
min_year = students["thn"].min()
|
| 368 |
-
max_year = students["thn"].max()
|
| 369 |
-
|
| 370 |
-
self.validation_results.append(
|
| 371 |
-
ValidationResult(
|
| 372 |
-
passed=True,
|
| 373 |
-
message=f"Enrollment data spans {int(min_year)} to {int(max_year)}",
|
| 374 |
-
severity="INFO",
|
| 375 |
-
)
|
| 376 |
-
)
|
| 377 |
-
|
| 378 |
-
def _validate_population_data(self):
|
| 379 |
-
"""Validate yearly population data (tabel3)."""
|
| 380 |
-
population = self.raw_data["population"]
|
| 381 |
-
|
| 382 |
-
if len(population) == 0:
|
| 383 |
-
self.validation_results.append(
|
| 384 |
-
ValidationResult(
|
| 385 |
-
passed=False,
|
| 386 |
-
message="No population data found",
|
| 387 |
-
severity="ERROR",
|
| 388 |
-
)
|
| 389 |
-
)
|
| 390 |
-
return
|
| 391 |
-
|
| 392 |
-
# Check for required fields (note: actual columns are tahun_ajaran/semester, not in sheet_name definition)
|
| 393 |
-
if "jumlah_aktif" in population.columns:
|
| 394 |
-
min_pop = population["jumlah_aktif"].min()
|
| 395 |
-
max_pop = population["jumlah_aktif"].max()
|
| 396 |
-
|
| 397 |
-
self.validation_results.append(
|
| 398 |
-
ValidationResult(
|
| 399 |
-
passed=True,
|
| 400 |
-
message=f"Population data: {len(population)} records, range {int(min_pop)}-{int(max_pop)} students",
|
| 401 |
-
severity="INFO",
|
| 402 |
-
)
|
| 403 |
-
)
|
| 404 |
-
else:
|
| 405 |
-
self.validation_results.append(
|
| 406 |
-
ValidationResult(
|
| 407 |
-
passed=False,
|
| 408 |
-
message="Population data missing 'jumlah_aktif' column",
|
| 409 |
-
severity="ERROR",
|
| 410 |
-
)
|
| 411 |
-
)
|
| 412 |
-
|
| 413 |
-
def _get_elective_codes(self) -> set:
|
| 414 |
-
"""Get set of elective course codes."""
|
| 415 |
-
courses = self.raw_data["courses"]
|
| 416 |
-
courses_clean = courses.drop_duplicates(subset="kode_mk").copy()
|
| 417 |
-
courses_clean["kategori_mk"] = (
|
| 418 |
-
courses_clean["kategori_mk"].astype(str).str.upper().str.strip()
|
| 419 |
-
)
|
| 420 |
-
return set(courses_clean[courses_clean["kategori_mk"] == "P"]["kode_mk"])
|
| 421 |
-
|
| 422 |
-
|
| 423 |
-
if __name__ == "__main__":
|
| 424 |
-
# Example usage
|
| 425 |
-
logging.basicConfig(
|
| 426 |
-
level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
|
| 427 |
-
)
|
| 428 |
-
|
| 429 |
-
validator = DataValidator(
|
| 430 |
-
"data/Data Perkuliahan Mahasiswa untuk Penelitian (8 Oktober 2025).xlsx"
|
| 431 |
-
)
|
| 432 |
-
|
| 433 |
-
# Run validation
|
| 434 |
-
passed, results = validator.validate_all()
|
| 435 |
-
validator.print_validation_summary()
|
| 436 |
-
|
| 437 |
-
# Check specific semesters
|
| 438 |
-
print("\n" + "=" * 80)
|
| 439 |
-
print("SEMESTER DATA AVAILABILITY")
|
| 440 |
-
print("=" * 80)
|
| 441 |
-
|
| 442 |
-
for year, semester in [(2024, 2), (2025, 1)]:
|
| 443 |
-
status = validator.check_semester_data_availability(year, semester)
|
| 444 |
-
print(f"\n{year} Semester {semester}:")
|
| 445 |
-
print(f" Offerings: {'Yes' if status.has_offerings else 'No'}")
|
| 446 |
-
print(
|
| 447 |
-
f" Enrollments: {'Yes' if status.has_enrollments else 'No'} ({status.total_enrollments} records)"
|
| 448 |
-
)
|
| 449 |
-
print(
|
| 450 |
-
f" Elective Enrollments: {'Yes' if status.has_elective_enrollments else 'No'} ({status.elective_enrollments} records)"
|
| 451 |
-
)
|
| 452 |
-
if status.elective_courses:
|
| 453 |
-
print(f" Elective courses: {len(status.elective_courses)}")
|
| 454 |
-
for code, count in list(status.elective_courses.items())[:5]:
|
| 455 |
-
print(f" - {code}: {count} students")
|
| 456 |
-
|
| 457 |
-
# Show available semesters for backtesting
|
| 458 |
-
print("\n" + "=" * 80)
|
| 459 |
-
print("SEMESTERS AVAILABLE FOR BACKTESTING")
|
| 460 |
-
print("=" * 80)
|
| 461 |
-
available = validator.get_available_semesters_for_backtesting()
|
| 462 |
-
if available:
|
| 463 |
-
print(f"\nFound {len(available)} semesters with elective enrollment data:")
|
| 464 |
-
for year, sem in available:
|
| 465 |
-
print(f" • {year} Semester {sem}")
|
| 466 |
-
else:
|
| 467 |
-
print("\nERROR: No semesters with elective enrollment data found!")
|
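The duplicate check in the removed validator counts enrollment rows that repeat the same `(kode_mhs, kode_mk, thn, smt)` key. The sketch below is a standard-library stand-in for `students.duplicated(subset=[...]).sum()`, not the project's code. The helper name `count_duplicate_enrollments` and the sample records are hypothetical. It assumes pandas' default `keep="first"` behaviour, where every repeat after the first occurrence counts as a duplicate.

```python
from collections import Counter

# Key columns taken from the diff above.
KEY_COLUMNS = ("kode_mhs", "kode_mk", "thn", "smt")


def count_duplicate_enrollments(records, keys=KEY_COLUMNS):
    """Count rows that repeat an earlier row's key (hypothetical helper).

    Each key that appears c times contributes c - 1 duplicates, which
    matches counting every occurrence after the first.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return sum(c - 1 for c in counts.values())


# Made-up sample data, for illustration only.
sample = [
    {"kode_mhs": "A1", "kode_mk": "IF101", "thn": 2024, "smt": 1},
    {"kode_mhs": "A1", "kode_mk": "IF101", "thn": 2024, "smt": 1},
    {"kode_mhs": "A2", "kode_mk": "IF101", "thn": 2024, "smt": 1},
]
duplicates = count_duplicate_enrollments(sample)
```

As in the validator, a non-zero count would be reported as a warning rather than a failure, since duplicates are cleaned later in the pipeline.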
evaluator.py
CHANGED
|
@@ -17,12 +17,11 @@ class Evaluator:
|
|
| 17 |
self.config = config
|
| 18 |
|
| 19 |
def run_backtest(self, full_data: pd.DataFrame, predictor):
|
| 20 |
-
"""Simulate past semesters to check accuracy."""
|
| 21 |
-
logger.info("Starting Backtest...")
|
| 22 |
results = []
|
| 23 |
|
| 24 |
start_year: int = self.config.backtest.START_YEAR
|
| 25 |
end_year: int = self.config.backtest.END_YEAR
|
|
|
|
| 26 |
|
| 27 |
for year in range(start_year, end_year + 1):
|
| 28 |
for smt in [1, 2]:
|
|
@@ -47,53 +46,251 @@ class Evaluator:
|
|
| 47 |
row["kode_mk"], train_set, year, smt, pop_est
|
| 48 |
)
|
| 49 |
|
| 50 |
results.append(
|
| 51 |
{
|
| 52 |
"year": year,
|
| 53 |
"semester": smt,
|
| 54 |
"kode_mk": row["kode_mk"],
|
| 55 |
-
"actual":
|
| 56 |
-
"predicted":
|
|
|
|
|
|
|
| 57 |
"strategy": pred["strategy"],
|
| 58 |
-
"error": abs(
|
|
|
|
| 59 |
}
|
| 60 |
)
|
| 61 |
|
| 62 |
return pd.DataFrame(results)
|
| 63 |
|
| 64 |
def generate_metrics(self, results: pd.DataFrame):
|
| 65 |
-
|
|
|
|
|
|
|
|
|
|
| 66 |
results["error"] = abs(results["predicted"] - results["actual"])
|
|
|
|
|
|
|
|
|
|
| 67 |
|
|
|
|
| 68 |
mae = mean_absolute_error(results["actual"], results["predicted"])
|
| 69 |
rmse = np.sqrt(mean_squared_error(results["actual"], results["predicted"]))
|
| 70 |
|
| 71 |
-
|
| 72 |
logger.info("BACKTEST METRICS")
|
| 73 |
-
logger.info("
|
| 74 |
-
logger.info(f"Overall MAE: {mae:.2f}")
|
| 75 |
-
logger.info(f"Overall RMSE: {rmse:.2f}")
|
| 76 |
|
| 77 |
logger.info("\nPerformance by Strategy:")
|
| 78 |
-
strat_perf =
|
| 79 |
logger.info(strat_perf.to_string())
|
| 80 |
|
|
|
|
|
|
|
| 81 |
self._plot_results(results)
|
|
|
|
| 82 |
|
| 83 |
-
return {
|
| 84 |
|
| 85 |
def _plot_results(self, df):
|
| 86 |
-
"""Generate simple Actual vs Predicted scatter plot."""
|
| 87 |
Path(self.config.output.OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
|
| 88 |
|
| 89 |
plt.figure(figsize=(10, 6))
|
| 90 |
sns.scatterplot(
|
| 91 |
-
data=df,
|
| 92 |
)
|
| 93 |
|
| 94 |
limit = max(df["actual"].max(), df["predicted"].max())
|
| 95 |
-
plt.plot([0, limit], [0, limit], "r--", alpha=0.5)
|
| 96 |
|
| 97 |
plt.title("Actual vs Predicted Enrollment")
|
| 98 |
-
plt.
|
| 99 |
plt.close()
|
|
|
| 17 |
self.config = config
|
| 18 |
|
| 19 |
def run_backtest(self, full_data: pd.DataFrame, predictor):
|
|
|
|
|
|
|
| 20 |
results = []
|
| 21 |
|
| 22 |
start_year: int = self.config.backtest.START_YEAR
|
| 23 |
end_year: int = self.config.backtest.END_YEAR
|
| 24 |
+
class_capacity = self.config.class_capacity.DEFAULT_CLASS_CAPACITY
|
| 25 |
|
| 26 |
for year in range(start_year, end_year + 1):
|
| 27 |
for smt in [1, 2]:
|
|
|
|
| 46 |
row["kode_mk"], train_set, year, smt, pop_est
|
| 47 |
)
|
| 48 |
|
| 49 |
+
actual_enrollment = row["enrollment"]
|
| 50 |
+
predicted_enrollment = pred["val"]
|
| 51 |
+
|
| 52 |
+
actual_classes = self._calculate_classes(
|
| 53 |
+
actual_enrollment, class_capacity
|
| 54 |
+
)
|
| 55 |
+
predicted_classes = pred.get(
|
| 56 |
+
"classes_needed",
|
| 57 |
+
self._calculate_classes(predicted_enrollment, class_capacity),
|
| 58 |
+
)
|
| 59 |
+
|
| 60 |
results.append(
|
| 61 |
{
|
| 62 |
"year": year,
|
| 63 |
"semester": smt,
|
| 64 |
"kode_mk": row["kode_mk"],
|
| 65 |
+
"actual": actual_enrollment,
|
| 66 |
+
"predicted": predicted_enrollment,
|
| 67 |
+
"actual_classes": actual_classes,
|
| 68 |
+
"predicted_classes": predicted_classes,
|
| 69 |
"strategy": pred["strategy"],
|
| 70 |
+
"error": abs(actual_enrollment - predicted_enrollment),
|
| 71 |
+
"class_error": abs(actual_classes - predicted_classes),
|
| 72 |
}
|
| 73 |
)
|
| 74 |
|
| 75 |
return pd.DataFrame(results)
|
| 76 |
|
| 77 |
+
def _calculate_classes(self, enrollment: float, capacity: int) -> int:
|
| 78 |
+
if enrollment < self.config.class_capacity.MIN_STUDENTS_TO_OPEN_CLASS:
|
| 79 |
+
return 0
|
| 80 |
+
return int(np.ceil(enrollment / capacity))
|
| 81 |
+
|
| 82 |
def generate_metrics(self, results: pd.DataFrame):
|
| 83 |
+
if results.empty:
|
| 84 |
+
logger.warning("No results to generate metrics from")
|
| 85 |
+
return {"mae": 0, "rmse": 0, "class_mae": 0, "class_accuracy": 0}
|
| 86 |
+
|
| 87 |
results["error"] = abs(results["predicted"] - results["actual"])
|
| 88 |
+
results["class_error"] = abs(
|
| 89 |
+
results["predicted_classes"] - results["actual_classes"]
|
| 90 |
+
)
|
| 91 |
|
| 92 |
+
# Enrollment metrics
|
| 93 |
mae = mean_absolute_error(results["actual"], results["predicted"])
|
| 94 |
rmse = np.sqrt(mean_squared_error(results["actual"], results["predicted"]))
|
| 95 |
|
| 96 |
+
# Class count metrics
|
| 97 |
+
class_mae = results["class_error"].mean()
|
| 98 |
+
|
| 99 |
+
# Class accuracy: percentage of predictions with correct class count
|
| 100 |
+
class_correct = (results["class_error"] == 0).sum()
|
| 101 |
+
class_accuracy = (class_correct / len(results)) * 100 if len(results) > 0 else 0
|
| 102 |
+
|
| 103 |
+
# Class accuracy within 1: predictions within ±1 class
|
| 104 |
+
class_within_1 = (results["class_error"] <= 1).sum()
|
| 105 |
+
class_accuracy_within_1 = (
|
| 106 |
+
(class_within_1 / len(results)) * 100 if len(results) > 0 else 0
|
| 107 |
+
)
|
| 108 |
+
|
| 109 |
logger.info("BACKTEST METRICS")
|
| 110 |
+
logger.info("\nEnrollment Prediction Metrics:")
|
| 111 |
+
logger.info(f" Overall MAE: {mae:.2f} students")
|
| 112 |
+
logger.info(f" Overall RMSE: {rmse:.2f} students")
|
| 113 |
+
|
| 114 |
+
logger.info("\nClass Count Prediction Metrics:")
|
| 115 |
+
logger.info(f" Class MAE: {class_mae:.2f} classes")
|
| 116 |
+
logger.info(f" Exact Class Match: {class_accuracy:.1f}%")
|
| 117 |
+
logger.info(f" Within ±1 Class: {class_accuracy_within_1:.1f}%")
|
| 118 |
|
| 119 |
logger.info("\nPerformance by Strategy:")
|
| 120 |
+
strat_perf = (
|
| 121 |
+
results.groupby("strategy")
|
| 122 |
+
.agg({"error": "mean", "class_error": "mean"})
|
| 123 |
+
.round(2)
|
| 124 |
+
)
|
| 125 |
+
strat_perf.columns = ["Avg Enrollment Error", "Avg Class Error"]
|
| 126 |
logger.info(strat_perf.to_string())
|
| 127 |
|
| 128 |
+
logger.info("=" * 50)
|
| 129 |
+
|
| 130 |
self._plot_results(results)
|
| 131 |
+
self._plot_class_results(results)
|
| 132 |
|
| 133 |
+
return {
|
| 134 |
+
"mae": mae,
|
| 135 |
+
"rmse": rmse,
|
| 136 |
+
"class_mae": class_mae,
|
| 137 |
+
"class_accuracy": class_accuracy,
|
| 138 |
+
"class_accuracy_within_1": class_accuracy_within_1,
|
| 139 |
+
}
|
| 140 |
|
| 141 |
def _plot_results(self, df):
|
|
|
|
| 142 |
Path(self.config.output.OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
|
| 143 |
|
| 144 |
plt.figure(figsize=(10, 6))
|
| 145 |
sns.scatterplot(
|
| 146 |
+
data=df,
|
| 147 |
+
x="actual",
|
| 148 |
+
y="predicted",
|
| 149 |
+
hue="strategy",
|
| 150 |
+
style="strategy",
|
| 151 |
+
alpha=0.7,
|
| 152 |
)
|
| 153 |
|
| 154 |
limit = max(df["actual"].max(), df["predicted"].max())
|
| 155 |
+
plt.plot([0, limit], [0, limit], "r--", alpha=0.5, label="Perfect Prediction")
|
| 156 |
|
| 157 |
plt.title("Actual vs Predicted Enrollment")
|
| 158 |
+
plt.xlabel("Actual Enrollment")
|
| 159 |
+
plt.ylabel("Predicted Enrollment")
|
| 160 |
+
plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
|
| 161 |
+
plt.tight_layout()
|
| 162 |
+
plt.savefig(
|
| 163 |
+
f"{self.config.output.OUTPUT_DIR}/backtest_enrollment_scatter.png", dpi=150
|
| 164 |
+
)
|
| 165 |
plt.close()
|
| 166 |
+
|
| 167 |
+
def _plot_class_results(self, df):
|
| 168 |
+
Path(self.config.output.OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
|
| 169 |
+
|
| 170 |
+
plt.figure(figsize=(10, 6))
|
| 171 |
+
|
| 172 |
+
jitter_strength = 0.1
|
| 173 |
+
df_plot = df.copy()
|
| 174 |
+
df_plot["actual_jitter"] = df_plot["actual_classes"] + np.random.uniform(
|
| 175 |
+
-jitter_strength, jitter_strength, len(df_plot)
|
| 176 |
+
)
|
| 177 |
+
df_plot["predicted_jitter"] = df_plot["predicted_classes"] + np.random.uniform(
|
| 178 |
+
-jitter_strength, jitter_strength, len(df_plot)
|
| 179 |
+
)
|
| 180 |
+
|
| 181 |
+
sns.scatterplot(
|
| 182 |
+
data=df_plot,
|
| 183 |
+
x="actual_jitter",
|
| 184 |
+
y="predicted_jitter",
|
| 185 |
+
hue="strategy",
|
| 186 |
+
style="strategy",
|
| 187 |
+
alpha=0.7,
|
| 188 |
+
)
|
| 189 |
+
|
| 190 |
+
limit = max(df["actual_classes"].max(), df["predicted_classes"].max()) + 1
|
| 191 |
+
plt.plot([0, limit], [0, limit], "r--", alpha=0.5, label="Perfect Prediction")
|
| 192 |
+
|
| 193 |
+
plt.title("Actual vs Predicted Number of Classes")
|
| 194 |
+
plt.xlabel("Actual Classes Needed")
|
| 195 |
+
plt.ylabel("Predicted Classes Needed")
|
| 196 |
+
plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
|
| 197 |
+
plt.tight_layout()
|
| 198 |
+
plt.savefig(
|
| 199 |
+
f"{self.config.output.OUTPUT_DIR}/backtest_classes_scatter.png", dpi=150
|
| 200 |
+
)
|
| 201 |
+
plt.close()
|
| 202 |
+
|
| 203 |
+
def generate_class_capacity_report(self, results: pd.DataFrame) -> pd.DataFrame:
|
| 204 |
+
if results.empty:
|
| 205 |
+
return pd.DataFrame()
|
| 206 |
+
|
| 207 |
+
course_summary = (
|
| 208 |
+
results.groupby("kode_mk")
|
| 209 |
+
.agg(
|
| 210 |
+
{
|
| 211 |
+
"actual": ["mean", "sum", "count"],
|
| 212 |
+
"predicted": ["mean", "sum"],
|
| 213 |
+
"actual_classes": ["mean", "sum"],
|
| 214 |
+
"predicted_classes": ["mean", "sum"],
|
| 215 |
+
"class_error": ["mean", "sum"],
|
| 216 |
+
}
|
| 217 |
+
)
|
| 218 |
+
.round(2)
|
| 219 |
+
)
|
| 220 |
+
|
| 221 |
+
course_summary.columns = [
|
| 222 |
+
"avg_actual_enrollment",
|
| 223 |
+
"total_actual_enrollment",
|
| 224 |
+
"n_semesters",
|
| 225 |
+
"avg_predicted_enrollment",
|
| 226 |
+
"total_predicted_enrollment",
|
| 227 |
+
"avg_actual_classes",
|
| 228 |
+
"total_actual_classes",
|
| 229 |
+
"avg_predicted_classes",
|
| 230 |
+
"total_predicted_classes",
|
| 231 |
+
"avg_class_error",
|
| 232 |
+
"total_class_error",
|
| 233 |
+
]
|
| 234 |
+
|
| 235 |
+
course_summary = course_summary.reset_index()
|
| 236 |
+
course_summary = course_summary.sort_values(
|
| 237 |
+
"total_class_error", ascending=False
|
| 238 |
+
)
|
| 239 |
+
|
| 240 |
+
return course_summary
|
| 241 |
+
|
| 242 |
+
def analyze_capacity_trends(self, full_data: pd.DataFrame) -> pd.DataFrame:
|
| 243 |
+
class_capacity = self.config.class_capacity.DEFAULT_CLASS_CAPACITY
|
| 244 |
+
|
| 245 |
+
trend_data = full_data.copy()
|
| 246 |
+
trend_data["classes_needed"] = trend_data["enrollment"].apply(
|
| 247 |
+
lambda x: self._calculate_classes(x, class_capacity)
|
| 248 |
+
)
|
| 249 |
+
|
| 250 |
+
course_trends = []
|
| 251 |
+
|
| 252 |
+
for course in trend_data["kode_mk"].unique():
|
| 253 |
+
course_data = trend_data[trend_data["kode_mk"] == course].sort_values(
|
| 254 |
+
["thn", "smt"]
|
| 255 |
+
)
|
| 256 |
+
|
| 257 |
+
if len(course_data) < 2:
|
| 258 |
+
continue
|
| 259 |
+
|
| 260 |
+
first_year = course_data.iloc[0]
|
| 261 |
+
last_year = course_data.iloc[-1]
|
| 262 |
+
|
| 263 |
+
enrollment_growth = last_year["enrollment"] - first_year["enrollment"]
|
| 264 |
+
class_growth = last_year["classes_needed"] - first_year["classes_needed"]
|
| 265 |
+
|
| 266 |
+
years_diff = last_year["thn"] - first_year["thn"]
|
| 267 |
+
if years_diff > 0 and first_year["enrollment"] > 0:
|
| 268 |
+
annual_growth_rate = (
|
| 269 |
+
(last_year["enrollment"] / first_year["enrollment"])
|
| 270 |
+
** (1 / years_diff)
|
| 271 |
+
- 1
|
| 272 |
+
) * 100
|
| 273 |
+
else:
|
| 274 |
+
annual_growth_rate = 0
|
| 275 |
+
|
| 276 |
+
course_trends.append(
|
| 277 |
+
{
|
| 278 |
+
"kode_mk": course,
|
| 279 |
+
"first_enrollment": first_year["enrollment"],
|
| 280 |
+
"last_enrollment": last_year["enrollment"],
|
| 281 |
+
"enrollment_growth": enrollment_growth,
|
| 282 |
+
"first_classes": first_year["classes_needed"],
|
| 283 |
+
"last_classes": last_year["classes_needed"],
|
| 284 |
+
"class_growth": class_growth,
|
| 285 |
+
"annual_growth_rate": round(annual_growth_rate, 1),
|
| 286 |
+
"data_points": len(course_data),
|
| 287 |
+
"year_range": f"{int(first_year['thn'])}-{int(last_year['thn'])}",
|
| 288 |
+
}
|
| 289 |
+
)
|
| 290 |
+
|
| 291 |
+
trends_df = pd.DataFrame(course_trends)
|
| 292 |
+
|
| 293 |
+
if not trends_df.empty:
|
| 294 |
+
trends_df = trends_df.sort_values("annual_growth_rate", ascending=False)
|
| 295 |
+
|
| 296 |
+
return trends_df
|
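The class-count metrics added to `evaluator.py` above can be sketched without pandas. This is a minimal illustration, not the project's implementation. The two threshold values are assumptions, since the real ones live in `config.class_capacity` and are not shown in this diff, and the helper names are hypothetical. The sketch also derives the predicted class count from the predicted enrollment, whereas the evaluator prefers `pred["classes_needed"]` when the predictor supplies it.

```python
import math

# Assumed values; the actual settings are defined in config.py.
DEFAULT_CLASS_CAPACITY = 40
MIN_STUDENTS_TO_OPEN_CLASS = 10


def calculate_classes(enrollment, capacity=DEFAULT_CLASS_CAPACITY,
                      min_students=MIN_STUDENTS_TO_OPEN_CLASS):
    """Return 0 below the opening threshold, else ceil(enrollment / capacity)."""
    if enrollment < min_students:
        return 0
    return math.ceil(enrollment / capacity)


def class_metrics(pairs):
    """Compute class-count metrics from (actual, predicted) enrollment pairs."""
    errors = [
        abs(calculate_classes(actual) - calculate_classes(predicted))
        for actual, predicted in pairs
    ]
    n = len(errors)
    if n == 0:
        # Mirror the evaluator's behaviour of returning zeros for empty input.
        return {"class_mae": 0.0, "class_accuracy": 0.0,
                "class_accuracy_within_1": 0.0}
    return {
        "class_mae": sum(errors) / n,
        "class_accuracy": 100 * sum(e == 0 for e in errors) / n,
        "class_accuracy_within_1": 100 * sum(e <= 1 for e in errors) / n,
    }


# Made-up (actual, predicted) enrollments, for illustration only.
metrics = class_metrics([(35, 45), (80, 78), (5, 12), (130, 50)])
```

A small enrollment error can still cost a whole class when it crosses a capacity boundary (35 versus 45 students here), which is why the class-level metrics are reported alongside MAE and RMSE.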
prophet_predictor.py
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
import logging
|
| 2 |
-
from typing import Optional
|
| 3 |
|
| 4 |
import numpy as np
|
| 5 |
import pandas as pd
|
|
@@ -24,8 +24,13 @@ class ProphetPredictor:
|
|
| 24 |
)
|
| 25 |
df["y"] = df["jumlah_aktif"]
|
| 26 |
|
| 27 |
-
self.student_model = Prophet(
|
| 28 |
-
|
|
| 29 |
logger.info("Student population model trained.")
|
| 30 |
|
| 31 |
def get_student_forecast(self, year: int, semester: int) -> float:
|
|
@@ -37,6 +42,19 @@ class ProphetPredictor:
|
|
| 37 |
forecast = self.student_model.predict(future)
|
| 38 |
return max(forecast["yhat"].values[0], 100)
|
| 39 |
|
| 40 |
def predict_course(
|
| 41 |
self,
|
| 42 |
course_code: str,
|
|
@@ -46,23 +64,41 @@ class ProphetPredictor:
|
|
| 46 |
student_pop: float,
|
| 47 |
) -> dict:
|
| 48 |
hist = df_history[
|
| 49 |
-
(df_history["kode_mk"] == course_code) &
|
| 50 |
-
(df_history["smt"] == target_smt)
|
| 51 |
].sort_values(["thn", "smt"])
|
| 52 |
|
| 53 |
-
|
|
|
|
|
|
|
| 54 |
return {
|
| 55 |
"val": self.config.model.FALLBACK_DEFAULT,
|
| 56 |
"strategy": "cold_start",
|
| 57 |
"confidence": "low",
| 58 |
}
|
| 59 |
|
| 60 |
-
|
| 61 |
-
hist, target_year, target_smt, student_pop
|
| 62 |
)
|
| 63 |
|
| 64 |
-
|
| 65 |
-
|
|
|
|
|
|
|
| 66 |
) -> dict:
|
| 67 |
df = hist.copy()
|
| 68 |
df["ds"] = pd.to_datetime(
|
|
@@ -89,14 +125,20 @@ class ProphetPredictor:
|
|
| 89 |
"confidence": "low",
|
| 90 |
}
|
| 91 |
|
| 92 |
-
hist_max = df["y"].max()
|
| 93 |
-
hist_mean = df["y"].mean()
|
|
|
|
|
|
|
| 94 |
|
| 95 |
cap_value = min(
|
| 96 |
hist_max * self.config.prediction.MAX_CAPACITY_MULTIPLIER,
|
| 97 |
self.config.prediction.ABSOLUTE_MAX_STUDENTS,
|
| 98 |
)
|
| 99 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 100 |
df["cap"] = cap_value
|
| 101 |
df["floor"] = 0
|
| 102 |
|
|
@@ -109,8 +151,11 @@ class ProphetPredictor:
|
|
| 109 |
weekly_seasonality=False, # type: ignore[arg-type]
|
| 110 |
)
|
| 111 |
|
| 112 |
-
|
| 113 |
-
|
|
|
|
|
|
|
|
|
|
| 114 |
|
| 115 |
future_date = pd.to_datetime(
|
| 116 |
f"{year}-{self.config.prediction.SEMESTER_TO_MONTH[smt]}"
|
|
@@ -121,10 +166,12 @@ class ProphetPredictor:
|
|
| 121 |
"ds": [future_date],
|
| 122 |
"cap": [cap_value],
|
| 123 |
"floor": [0],
|
| 124 |
-
"jumlah_aktif": [pop],
|
| 125 |
}
|
| 126 |
)
|
| 127 |
|
|
|
|
|
|
|
|
|
|
| 128 |
forecast = m.predict(future)
|
| 129 |
raw_pred = forecast["yhat"].values[0]
|
| 130 |
|
|
@@ -135,18 +182,17 @@ class ProphetPredictor:
|
|
| 135 |
or raw_pred > cap_value * 2
|
| 136 |
):
|
| 137 |
logger.warning(
|
| 138 |
-
f"Prophet prediction ({raw_pred:.1f}) unrealistic. "
|
| 139 |
f"Using trend-based fallback. (hist_max={hist_max}, cap={cap_value})"
|
| 140 |
)
|
|
|
|
| 141 |
if len(df) >= 3:
|
| 142 |
-
recent_trend = df["y"].tail(3).mean()
|
| 143 |
-
pop_growth_factor = pop /
|
| 144 |
-
growth_factor = min(
|
| 145 |
-
max(pop_growth_factor, 0.8), 1.3
|
| 146 |
-
)
|
| 147 |
pred = recent_trend * growth_factor
|
| 148 |
else:
|
| 149 |
-
pop_growth_factor = pop /
|
| 150 |
pred = hist_mean * min(max(pop_growth_factor, 0.8), 1.3)
|
| 151 |
|
| 152 |
pred = min(max(pred, 0), cap_value)
|
|
@@ -166,13 +212,37 @@ class ProphetPredictor:
|
|
| 166 |
}
|
| 167 |
|
| 168 |
except Exception as e:
|
| 169 |
-
logger.warning(
|
|
|
|
|
|
|
| 170 |
return {
|
| 171 |
"val": hist["enrollment"].mean(),
|
| 172 |
"strategy": "fallback_mean",
|
| 173 |
"confidence": "medium",
|
| 174 |
}
|
| 175 |
|
| 176 |
def generate_batch_predictions(
|
| 177 |
self,
|
| 178 |
full_data: pd.DataFrame,
|
|
@@ -180,8 +250,7 @@ class ProphetPredictor:
|
|
| 180 |
electives: set,
|
| 181 |
year: int,
|
| 182 |
smt: int,
|
| 183 |
-
):
|
| 184 |
-
"""Generate predictions for all courses."""
|
| 185 |
student_pop = self.get_student_forecast(year, smt)
|
| 186 |
results = []
|
| 187 |
|
|
@@ -190,35 +259,57 @@ class ProphetPredictor:
|
|
| 190 |
)
|
| 191 |
|
| 192 |
for code in electives:
|
| 193 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 194 |
|
| 195 |
pred_result = self.predict_course(code, full_data, year, smt, student_pop)
|
| 196 |
pred_val = pred_result["val"]
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 200 |
)
|
| 201 |
-
rec_quota = max(rec_quota, self.config.prediction.MIN_QUOTA_OPEN)
|
| 202 |
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 207 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 208 |
|
| 209 |
results.append(
|
| 210 |
{
|
| 211 |
"kode_mk": code,
|
| 212 |
"nama_mk": meta["nama_mk"],
|
| 213 |
-
"sks": meta
|
| 214 |
"predicted_enrollment": round(pred_val, 1),
|
| 215 |
-
"
|
|
|
|
|
|
|
|
|
|
| 216 |
"recommendation": status,
|
|
|
|
| 217 |
"strategy": pred_result["strategy"],
|
| 218 |
"confidence": pred_result["confidence"],
|
| 219 |
-
"classes_est": int(np.ceil(rec_quota / 40))
|
| 220 |
-
if status == "BUKA"
|
| 221 |
-
else 0,
|
| 222 |
}
|
| 223 |
)
|
| 224 |
|
|
@@ -226,6 +317,98 @@ class ProphetPredictor:
|
|
| 226 |
"predicted_enrollment", ascending=False
|
| 227 |
)
|
| 228 |
|
| 229 |
def predict_course_enrollment(
|
| 230 |
self,
|
| 231 |
course_code: str,
|
|
@@ -233,7 +416,7 @@ class ProphetPredictor:
|
|
| 233 |
test_year: int,
|
| 234 |
test_semester: int,
|
| 235 |
test_student_count: float,
|
| 236 |
-
) -> tuple
|
| 237 |
result = self.predict_course(
|
| 238 |
course_code=course_code,
|
| 239 |
df_history=train_data,
|
|
|
|
| 1 |
import logging
|
| 2 |
+
from typing import Dict, List, Optional, Tuple
|
| 3 |
|
| 4 |
import numpy as np
|
| 5 |
import pandas as pd
|
|
|
|
| 24 |
)
|
| 25 |
df["y"] = df["jumlah_aktif"]
|
| 26 |
|
| 27 |
+
self.student_model = Prophet(
|
| 28 |
+
growth="linear",
|
| 29 |
+
daily_seasonality=False, # type: ignore[arg-type]
|
| 30 |
+
weekly_seasonality=False, # type: ignore[arg-type]
|
| 31 |
+
yearly_seasonality=True, # type: ignore[arg-type]
|
| 32 |
+
)
|
| 33 |
+
self.student_model.fit(df[["ds", "y"]])
|
| 34 |
logger.info("Student population model trained.")
|
| 35 |
|
| 36 |
def get_student_forecast(self, year: int, semester: int) -> float:
|
|
|
|
| 42 |
forecast = self.student_model.predict(future)
|
| 43 |
return max(forecast["yhat"].values[0], 100)
|
| 44 |
|
| 45 |
+
def get_multi_year_student_forecast(
|
| 46 |
+
self, start_year: int, semester: int, years_ahead: int
|
| 47 |
+
) -> List[Tuple[int, float]]:
|
| 48 |
+
assert self.student_model is not None, "Student model must be trained first"
|
| 49 |
+
|
| 50 |
+
forecasts = []
|
| 51 |
+
for i in range(years_ahead + 1):
|
| 52 |
+
year = start_year + i
|
| 53 |
+
pop = self.get_student_forecast(year, semester)
|
| 54 |
+
forecasts.append((year, pop))
|
| 55 |
+
|
| 56 |
+
return forecasts
|
| 57 |
+
|
| 58 |
def predict_course(
|
| 59 |
self,
|
| 60 |
course_code: str,
|
|
|
|
| 64 |
student_pop: float,
|
| 65 |
) -> dict:
|
| 66 |
hist = df_history[
|
| 67 |
+
(df_history["kode_mk"] == course_code) & (df_history["smt"] == target_smt)
|
|
|
|
| 68 |
].sort_values(["thn", "smt"])
|
| 69 |
|
| 70 |
+
has_historical_data = len(hist) > 0
|
| 71 |
+
|
| 72 |
+
if not has_historical_data:
|
| 73 |
return {
|
| 74 |
"val": self.config.model.FALLBACK_DEFAULT,
|
| 75 |
"strategy": "cold_start",
|
| 76 |
"confidence": "low",
|
| 77 |
+
"classes_needed": self.config.calculate_classes_needed(
|
| 78 |
+
self.config.model.FALLBACK_DEFAULT,
|
| 79 |
+
course_code,
|
| 80 |
+
has_historical_data=False,
|
| 81 |
+
),
|
| 82 |
+
"capacity_status": self.config.get_capacity_status(
|
| 83 |
+
self.config.model.FALLBACK_DEFAULT, course_code
|
| 84 |
+
),
|
| 85 |
}
|
| 86 |
|
| 87 |
+
result = self._predict_prophet_with_capacity(
|
| 88 |
+
hist, target_year, target_smt, student_pop, course_code
|
| 89 |
+
)
|
| 90 |
+
|
| 91 |
+
result["classes_needed"] = self.config.calculate_classes_needed(
|
| 92 |
+
result["val"], course_code, has_historical_data=has_historical_data
|
| 93 |
+
)
|
| 94 |
+
result["capacity_status"] = self.config.get_capacity_status(
|
| 95 |
+
result["val"], course_code
|
| 96 |
)
|
| 97 |
|
| 98 |
+
return result
|
| 99 |
+
|
| 100 |
+
def _predict_prophet_with_capacity(
|
| 101 |
+
self, hist: pd.DataFrame, year: int, smt: int, pop: float, course_code: str
|
| 102 |
) -> dict:
|
| 103 |
df = hist.copy()
|
| 104 |
df["ds"] = pd.to_datetime(
|
|
|
|
| 125 |
"confidence": "low",
|
| 126 |
}
|
| 127 |
|
| 128 |
+
hist_max = float(df["y"].max())
|
| 129 |
+
hist_mean = float(df["y"].mean())
|
| 130 |
+
|
| 131 |
+
class_capacity = self.config.get_class_capacity(course_code)
|
| 132 |
|
| 133 |
cap_value = min(
|
| 134 |
hist_max * self.config.prediction.MAX_CAPACITY_MULTIPLIER,
|
| 135 |
self.config.prediction.ABSOLUTE_MAX_STUDENTS,
|
| 136 |
)
|
| 137 |
|
| 138 |
+
if self.config.class_capacity.ENABLE_CAPACITY_CONSTRAINTS:
|
| 139 |
+
max_realistic_cap = class_capacity * 4
|
| 140 |
+
cap_value = min(cap_value, max_realistic_cap)
|
| 141 |
+
|
| 142 |
df["cap"] = cap_value
|
| 143 |
df["floor"] = 0
|
| 144 |
|
|
|
|
| 151 |
weekly_seasonality=False, # type: ignore[arg-type]
|
| 152 |
)
|
| 153 |
|
| 154 |
+
if self.config.model.USE_POPULATION_REGRESSOR:
|
| 155 |
+
m.add_regressor("jumlah_aktif", mode="multiplicative")
|
| 156 |
+
m.fit(df[["ds", "y", "cap", "floor", "jumlah_aktif"]])
|
| 157 |
+
else:
|
| 158 |
+
m.fit(df[["ds", "y", "cap", "floor"]])
|
| 159 |
|
| 160 |
future_date = pd.to_datetime(
|
| 161 |
f"{year}-{self.config.prediction.SEMESTER_TO_MONTH[smt]}"
|
|
|
|
| 166 |
"ds": [future_date],
|
| 167 |
"cap": [cap_value],
|
| 168 |
"floor": [0],
|
|
|
|
| 169 |
}
|
| 170 |
)
|
| 171 |
|
| 172 |
+
if self.config.model.USE_POPULATION_REGRESSOR:
|
| 173 |
+
future["jumlah_aktif"] = pop
|
| 174 |
+
|
| 175 |
forecast = m.predict(future)
|
| 176 |
raw_pred = forecast["yhat"].values[0]
|
| 177 |
|
|
|
|
| 182 |
or raw_pred > cap_value * 2
|
| 183 |
):
|
| 184 |
logger.warning(
|
| 185 |
+
f"Prophet prediction ({raw_pred:.1f}) unrealistic for {course_code}. "
|
| 186 |
f"Using trend-based fallback. (hist_max={hist_max}, cap={cap_value})"
|
| 187 |
)
|
| 188 |
+
pop_mean = float(df["jumlah_aktif"].mean())
|
| 189 |
if len(df) >= 3:
|
| 190 |
+
recent_trend = float(df["y"].tail(3).mean())
|
| 191 |
+
pop_growth_factor = pop / pop_mean if pop_mean > 0 else 1.0
|
| 192 |
+
growth_factor = min(max(pop_growth_factor, 0.8), 1.3)
|
|
|
|
|
|
|
| 193 |
pred = recent_trend * growth_factor
|
| 194 |
else:
|
| 195 |
+
pop_growth_factor = pop / pop_mean if pop_mean > 0 else 1.0
|
| 196 |
pred = hist_mean * min(max(pop_growth_factor, 0.8), 1.3)
|
| 197 |
|
| 198 |
pred = min(max(pred, 0), cap_value)
|
|
|
|
| 212 |
}
|
| 213 |
|
| 214 |
except Exception as e:
|
| 215 |
+
logger.warning(
|
| 216 |
+
f"Prophet failed for course {course_code}. Error: {e}. Using fallback."
|
| 217 |
+
)
|
| 218 |
return {
|
| 219 |
"val": hist["enrollment"].mean(),
|
| 220 |
"strategy": "fallback_mean",
|
| 221 |
"confidence": "medium",
|
| 222 |
}
|
| 223 |
|
| 224 |
+
def predict_multi_year(
|
| 225 |
+
self,
|
| 226 |
+
course_code: str,
|
| 227 |
+
df_history: pd.DataFrame,
|
| 228 |
+
start_year: int,
|
| 229 |
+
target_smt: int,
|
| 230 |
+
years_ahead: int = 3,
|
| 231 |
+
) -> List[Dict]:
|
| 232 |
+
predictions = []
|
| 233 |
+
|
| 234 |
+
for i in range(years_ahead + 1):
|
| 235 |
+
year = start_year + i
|
| 236 |
+
pop = self.get_student_forecast(year, target_smt)
|
| 237 |
+
|
| 238 |
+
pred = self.predict_course(course_code, df_history, year, target_smt, pop)
|
| 239 |
+
pred["year"] = year
|
| 240 |
+
pred["semester"] = target_smt
|
| 241 |
+
pred["student_population"] = pop
|
| 242 |
+
predictions.append(pred)
|
| 243 |
+
|
| 244 |
+
return predictions
|
| 245 |
+
|
     def generate_batch_predictions(
         self,
         full_data: pd.DataFrame,
[...]
         electives: set,
         year: int,
         smt: int,
+    ) -> pd.DataFrame:
         student_pop = self.get_student_forecast(year, smt)
         results = []

[...]
         )

         for code in electives:
+            meta_rows = course_metadata[course_metadata["kode_mk"] == code]
+            if len(meta_rows) == 0:
+                logger.warning(f"No metadata found for course {code}, skipping")
+                continue
+            meta = meta_rows.iloc[0]

             pred_result = self.predict_course(code, full_data, year, smt, student_pop)
             pred_val = pred_result["val"]
+            course_history = full_data[full_data["kode_mk"] == code]
+            has_history = len(course_history) > 0
+
+            classes_needed = pred_result.get(
+                "classes_needed",
+                self.config.calculate_classes_needed(
+                    pred_val, code, has_historical_data=has_history
+                ),
             )

+            course_capacity = self.config.get_class_capacity(code)
+
+            if classes_needed > 0:
+                rec_quota = classes_needed * course_capacity
+            else:
+                rec_quota = 0
+
+            min_threshold = self.config.class_capacity.MIN_STUDENTS_TO_OPEN_CLASS
+            should_open = pred_val >= min_threshold or (
+                has_history and self.config.class_capacity.OPEN_CLASS_IF_HAS_HISTORY
             )
+            status = "BUKA" if should_open else "TUTUP"
+
+            if classes_needed > 0:
+                total_capacity = classes_needed * course_capacity
+                utilization = (pred_val / total_capacity) * 100
+            else:
+                utilization = 0

             results.append(
                 {
                     "kode_mk": code,
                     "nama_mk": meta["nama_mk"],
+                    "sks": meta.get("sks_mk", 0),
                     "predicted_enrollment": round(pred_val, 1),
+                    "class_capacity": course_capacity,
+                    "classes_needed": classes_needed,
+                    "total_quota": rec_quota,
+                    "utilization_pct": round(utilization, 1),
                     "recommendation": status,
+                    "capacity_status": pred_result.get("capacity_status", "NORMAL"),
                     "strategy": pred_result["strategy"],
                     "confidence": pred_result["confidence"],
                 }
             )

[...]
             "predicted_enrollment", ascending=False
         )

+    def generate_multi_year_forecast(
+        self,
+        full_data: pd.DataFrame,
+        course_metadata: pd.DataFrame,
+        electives: set,
+        start_year: int,
+        smt: int,
+        years_ahead: int = 3,
+    ) -> pd.DataFrame:
+        all_results = []
+
+        for code in electives:
+            meta_rows = course_metadata[course_metadata["kode_mk"] == code]
+            if len(meta_rows) == 0:
+                continue
+            meta = meta_rows.iloc[0]
+
+            year_predictions = self.predict_multi_year(
+                code, full_data, start_year, smt, years_ahead
+            )
+
+            for pred in year_predictions:
+                course_capacity = self.config.get_class_capacity(code)
+                classes_needed = pred.get("classes_needed", 0)
+
+                all_results.append(
+                    {
+                        "kode_mk": code,
+                        "nama_mk": meta["nama_mk"],
+                        "year": pred["year"],
+                        "semester": pred["semester"],
+                        "predicted_enrollment": round(pred["val"], 1),
+                        "classes_needed": classes_needed,
+                        "total_capacity": classes_needed * course_capacity,
+                        "student_population": round(pred["student_population"], 0),
+                        "strategy": pred["strategy"],
+                        "confidence": pred["confidence"],
+                    }
+                )
+
+        return pd.DataFrame(all_results).sort_values(["kode_mk", "year"])
+
+    def get_course_trend_analysis(
+        self,
+        course_code: str,
+        df_history: pd.DataFrame,
+        target_smt: int,
+    ) -> Dict:
+        hist = df_history[
+            (df_history["kode_mk"] == course_code) & (df_history["smt"] == target_smt)
+        ].sort_values("thn")
+
+        if len(hist) < 2:
+            return {
+                "has_sufficient_data": False,
+                "data_points": len(hist),
+            }
+
+        enrollments = np.array(hist["enrollment"].values, dtype=float)
+        years = np.array(hist["thn"].values, dtype=float)
+
+        growth_rates = []
+        for i in range(1, len(enrollments)):
+            if enrollments[i - 1] > 0:
+                rate = (enrollments[i] - enrollments[i - 1]) / enrollments[i - 1]
+                growth_rates.append(rate)
+
+        avg_growth_rate = float(np.mean(growth_rates)) if growth_rates else 0.0
+
+        if len(years) >= 2:
+            coeffs = np.polyfit(years, enrollments, 1)
+            trend_slope = float(coeffs[0])
+        else:
+            trend_slope = 0.0
+
+        return {
+            "has_sufficient_data": True,
+            "data_points": len(hist),
+            "min_enrollment": int(enrollments.min()),
+            "max_enrollment": int(enrollments.max()),
+            "avg_enrollment": round(float(enrollments.mean()), 1),
+            "latest_enrollment": int(enrollments[-1]),
+            "avg_growth_rate": round(avg_growth_rate * 100, 1),  # as percentage
+            "trend_slope": round(trend_slope, 2),
+            "trend_direction": "increasing"
+            if trend_slope > 0
+            else "decreasing"
+            if trend_slope < 0
+            else "stable",
+            "year_range": f"{int(years.min())}-{int(years.max())}",
+        }
+
     def predict_course_enrollment(
         self,
         course_code: str,
[...]
         test_year: int,
         test_semester: int,
         test_student_count: float,
+    ) -> tuple:
         result = self.predict_course(
             course_code=course_code,
             df_history=train_data,
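The `prophet_predictor.py` changes above bound the population growth factor to the range 0.8 to 1.3, cap the prediction, and derive utilization from the predicted enrollment and the total class capacity. A minimal standalone sketch of that arithmetic follows; the helper names `clamp`, `scaled_prediction`, and `utilization_pct` are hypothetical and not part of the commit, and only the formulas are taken from the diff.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return min(max(value, low), high)


def scaled_prediction(
    baseline: float, pop: float, pop_mean: float, cap_value: float
) -> float:
    """Scale a baseline enrollment by a bounded population growth factor.

    Hypothetical helper: mirrors `pred = hist_mean * min(max(factor, 0.8), 1.3)`
    followed by `pred = min(max(pred, 0), cap_value)` from the diff.
    """
    factor = pop / pop_mean if pop_mean > 0 else 1.0
    factor = clamp(factor, 0.8, 1.3)
    return clamp(baseline * factor, 0, cap_value)


def utilization_pct(pred: float, classes_needed: int, capacity: int) -> float:
    """Percentage of the total quota filled; 0 when no class is opened."""
    if classes_needed <= 0:
        return 0.0
    return pred / (classes_needed * capacity) * 100


if __name__ == "__main__":
    # Population is 50% above its mean, so the factor is limited to 1.3.
    pred = scaled_prediction(baseline=50, pop=1500, pop_mean=1000, cap_value=200)
    print(round(pred, 1), round(utilization_pct(pred, 2, 40), 1))
```

The bounds keep a single unusual population forecast from swinging a course prediction by more than 20% down or 30% up relative to its baseline.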
ui_components.py
ADDED
@@ -0,0 +1,322 @@
+from typing import Dict
+
+
+def get_color(value: float, thresholds: tuple = (50, 25)) -> str:
+    high, low = thresholds
+    if value >= high:
+        return "#4ade80"
+    elif value >= low:
+        return "#fb923c"
+    else:
+        return "#f87171"
+
+
+def get_diff_color(value: float) -> str:
+    return "#4ade80" if value >= 0 else "#f87171"
+
+
+# Card Components
+def metric_card(title: str, value: str, color: str, subtitle: str = "") -> str:
+    subtitle_html = (
+        f'<div style="font-size: 11px; color: #9ca3af; margin-top: 4px;">{subtitle}</div>'
+        if subtitle
+        else ""
+    )
+    return f"""
+    <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid {color};">
+        <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">{title}</div>
+        <div style="font-size: 28px; font-weight: 700; color: {color};">{value}</div>
+        {subtitle_html}
+    </div>
+    """
+
+
+def info_row(label: str, value: str, color: str = "#fff", border: bool = True) -> str:
+    border_style = "border-bottom: 1px solid #334155;" if border else ""
+    return f"""
+    <div style="display: flex; justify-content: space-between; padding: 12px 0; {border_style}">
+        <span style="color: #9ca3af;">{label}</span>
+        <span style="font-weight: 600; color: {color};">{value}</span>
+    </div>
+    """
+
+
+def info_card(title: str, rows: list) -> str:
+    rows_html = "".join(rows)
+    return f"""
+    <div style="background: #1e293b; padding: 20px; border-radius: 12px;">
+        <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">{title}</h4>
+        {rows_html}
+    </div>
+    """
+
+
+# Summary Templates
+def build_validation_summary(data: Dict) -> str:
+    """Build summary HTML for validation mode (when actual data exists)."""
+    year = data["year"]
+    semester_name = data["semester_name"]
+    class_capacity = data["class_capacity"]
+    data_source = data.get("data_source", "kalkulasi")
+
+    # Metrics
+    class_accuracy_pct = data.get("class_accuracy_pct", 0)
+    class_within_one_pct = data.get("class_within_one_pct", 0)
+    total_classes = data.get("total_classes", 0)
+    comparison_mae = data.get("comparison_mae", 0)
+    comparison_rmse = data.get("comparison_rmse", 0)
+    total_for_class_accuracy = data.get("total_for_class_accuracy", 0)
+
+    # Enrollment metrics
+    total_actual = data.get("total_actual", 0)
+    total_predicted = data.get("total_predicted", 0)
+    accuracy_pct = data.get("accuracy_pct", 0)
+    class_matches = data.get("class_matches", 0)
+    class_within_one = data.get("class_within_one", 0)
+
+    # Colors
+    class_accuracy_color = get_color(class_accuracy_pct)
+    diff_color = get_diff_color(total_predicted - total_actual)
+
+    return f"""
+    <div style="padding: 24px;">
+        <div style="margin-bottom: 24px;">
+            <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{
+        year
+    } Semester {semester_name}</h2>
+            <p style="color: #9ca3af; margin: 0; font-size: 14px;">Validasi prediksi terhadap data aktual | Kapasitas per kelas: {
+        class_capacity
+    } mahasiswa | Sumber kelas aktual: {data_source}</p>
+        </div>
+
+        <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
+            {
+        metric_card(
+            "Akurasi Kelas",
+            f"{class_accuracy_pct:.1f}%",
+            class_accuracy_color,
+            f"±1 kelas: {class_within_one_pct:.1f}%",
+        )
+    }
+            {metric_card("Total Kelas Prediksi", str(total_classes), "#60a5fa")}
+            {
+        metric_card(
+            "MAE / RMSE", f"{comparison_mae:.1f} / {comparison_rmse:.1f}", "#a78bfa"
+        )
+    }
+            {metric_card("MK Divalidasi", str(total_for_class_accuracy), "#fb923c")}
+        </div>
+
+        <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 16px;">
+            {
+        info_card(
+            "Ringkasan Enrollment",
+            [
+                info_row("Total Aktual", str(int(total_actual))),
+                info_row("Total Prediksi", str(int(total_predicted))),
+                info_row(
+                    "Selisih",
+                    f"{int(total_predicted - total_actual):+d}",
+                    diff_color,
+                    border=False,
+                ),
+            ],
+        )
+    }
+            {
+        info_card(
+            f"Akurasi Prediksi Kelas (dari {data_source})",
+            [
+                info_row(
+                    "Kelas Tepat",
+                    f"{class_matches}/{total_for_class_accuracy}",
+                    "#4ade80",
+                ),
+                info_row(
+                    "Selisih ±1 Kelas",
+                    f"{class_within_one}/{total_for_class_accuracy}",
+                    "#60a5fa",
+                ),
+                info_row("Akurasi Enrollment", f"{accuracy_pct:.1f}%", border=False),
+            ],
+        )
+    }
+        </div>
+    </div>
+    """
+
+
+def build_no_match_summary(data: Dict) -> str:
+    year = data["year"]
+    semester_name = data["semester_name"]
+    metrics = data.get("metrics", {"mae": 0, "rmse": 0})
+    total_to_open = data.get("total_to_open", 0)
+    total_classes = data.get("total_classes", 0)
+
+    return f"""
+    <div style="padding: 24px;">
+        <div style="margin-bottom: 24px;">
+            <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
+            <p style="color: #9ca3af; margin: 0; font-size: 14px;">Data semester ada, tetapi tidak ditemukan MK pilihan yang cocok</p>
+        </div>
+
+        <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px;">
+            {metric_card("MAE (Backtest)", f"{metrics['mae']:.2f}", "#60a5fa")}
+            {metric_card("RMSE (Backtest)", f"{metrics['rmse']:.2f}", "#a78bfa")}
+            {metric_card("MK Dibuka", str(total_to_open), "#4ade80")}
+            {metric_card("Total Kelas", str(total_classes), "#fb923c")}
+        </div>
+    </div>
+    """
+
+
+def build_future_prediction_summary(data: Dict) -> str:
+    year = data["year"]
+    semester_name = data["semester_name"]
+    class_capacity = data["class_capacity"]
+    metrics = data.get("metrics", {"mae": 0, "rmse": 0})
+
+    total_to_open = data.get("total_to_open", 0)
+    total_classes = data.get("total_classes", 0)
+    total_predicted_students = data.get("total_predicted_students", 0)
+    total_capacity = data.get("total_capacity", 0)
+
+    avg_utilization = (
+        (total_predicted_students / total_capacity * 100) if total_capacity > 0 else 0
+    )
+
+    return f"""
+    <div style="padding: 24px;">
+        <div style="margin-bottom: 24px;">
+            <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{
+        year
+    } Semester {semester_name}</h2>
+            <p style="color: #9ca3af; margin: 0; font-size: 14px;">Prediksi masa depan berdasarkan tren historis | Kapasitas per kelas: {
+        class_capacity
+    } mahasiswa</p>
+        </div>
+
+        <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
+            {metric_card("MK Dibuka", str(total_to_open), "#4ade80")}
+            {metric_card("Total Kelas Dibuka", str(total_classes), "#60a5fa")}
+            {metric_card("Prediksi Mahasiswa", str(total_predicted_students), "#a78bfa")}
+            {metric_card("Total Kuota", str(total_capacity), "#fb923c")}
+        </div>
+
+        <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 16px;">
+            {
+        info_card(
+            "Backtest Metrics",
+            [
+                info_row("MAE", f"{metrics['mae']:.2f}"),
+                info_row("RMSE", f"{metrics['rmse']:.2f}", border=False),
+            ],
+        )
+    }
+            {
+        info_card(
+            "Kapasitas Info",
+            [
+                info_row("Kapasitas/Kelas", f"{class_capacity} mhs"),
+                info_row("Avg Utilization", f"{avg_utilization:.1f}%", border=False),
+            ],
+        )
+    }
+        </div>
+    </div>
+    """
+
+
+def build_prediction_summary(data: Dict) -> str:
+    has_actual_data = data.get("has_actual_data", False)
+
+    if has_actual_data:
+        if "comparison_mae" in data:
+            return build_validation_summary(data)
+        else:
+            return build_no_match_summary(data)
+    else:
+        return build_future_prediction_summary(data)
+
+
+def build_multi_year_summary(data: Dict) -> str:
+    year = data["year"]
+    years_ahead = data["years_ahead"]
+    semester_name = data["semester_name"]
+    class_capacity = data["class_capacity"]
+
+    first_year_classes = data["first_year_classes"]
+    last_year_classes = data["last_year_classes"]
+    growth_classes = data["growth_classes"]
+    growth_students = data["growth_students"]
+
+    growth_class_color = get_diff_color(growth_classes)
+    growth_student_color = get_diff_color(growth_students)
+
+    return f"""
+    <div style="padding: 24px;">
+        <div style="margin-bottom: 24px;">
+            <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">Proyeksi {years_ahead} Tahun ke Depan - Semester {semester_name}</h2>
+            <p style="color: #9ca3af; margin: 0; font-size: 14px;">Forecasting kebutuhan kelas {year} - {year + years_ahead} | Kapasitas per kelas: {class_capacity} mahasiswa</p>
+        </div>
+
+        <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
+            {metric_card(f"Kelas ({year})", str(first_year_classes), "#4ade80")}
+            {metric_card(f"Kelas ({year + years_ahead})", str(last_year_classes), "#60a5fa")}
+            {metric_card("Pertumbuhan Kelas", f"{growth_classes:+d}", growth_class_color)}
+            {metric_card("Pertumbuhan Mhs", f"{growth_students:+d}", growth_student_color)}
+        </div>
+    </div>
+    """
+
+
+# Placeholder Templates
+def placeholder_card(title: str, subtitle: str) -> str:
+    return f"""
+    <div style="padding: 60px 40px; text-align: center; background: #1e293b; border-radius: 12px;">
+        <h3 style="color: #fff; margin: 0 0 8px 0; font-size: 18px; font-weight: 600;">{title}</h3>
+        <p style="color: #9ca3af; margin: 0; font-size: 14px;">{subtitle}</p>
+    </div>
+    """
+
+
+def get_prediction_placeholder() -> str:
+    return placeholder_card(
+        "Pilih tahun dan semester",
+        "Klik Generate Predictions untuk melihat rekomendasi jumlah kelas",
+    )
+
+
+def get_forecast_placeholder() -> str:
+    return placeholder_card(
+        "Proyeksi Multi-Tahun", "Lihat tren kebutuhan kelas beberapa tahun ke depan"
+    )
+
+
+# Data Info Component
+def build_data_info(data: Dict) -> str:
+    if "error" in data:
+        return f"<p style='color: #f87171;'>{data['error']}</p>"
+
+    return f"""
+    <div style="padding: 8px 0;">
+        <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 12px;">
+            <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+                <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Total MK</div>
+                <div style="font-size: 20px; font-weight: 700; color: #fff;">{data["total_courses"]}</div>
+            </div>
+            <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+                <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">MK Pilihan</div>
+                <div style="font-size: 20px; font-weight: 700; color: #4ade80;">{data["elective_courses"]}</div>
+            </div>
+            <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+                <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Kapasitas/Kelas</div>
+                <div style="font-size: 20px; font-weight: 700; color: #60a5fa;">{data["class_capacity"]}</div>
+            </div>
+            <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+                <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Tahun Data</div>
+                <div style="font-size: 20px; font-weight: 700; color: #fb923c;">{data["year_min"]}-{data["year_max"]}</div>
+            </div>
+        </div>
+    </div>
+    """
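`build_prediction_summary` in `ui_components.py` chooses one of three templates from two facts about the input dict: whether `has_actual_data` is set, and whether a `comparison_mae` key is present. The sketch below isolates that decision so it can be checked without rendering HTML. The function name `pick_summary_mode` and its string return values are hypothetical and used for illustration only.

```python
from typing import Dict


def pick_summary_mode(data: Dict) -> str:
    """Return which summary template the dispatcher would select.

    Hypothetical helper mirroring the branching in build_prediction_summary:
    - actual data with comparison metrics -> validation summary
    - actual data without a matching elective -> no-match summary
    - no actual data -> future prediction summary
    """
    if data.get("has_actual_data", False):
        return "validation" if "comparison_mae" in data else "no_match"
    return "future"


if __name__ == "__main__":
    print(pick_summary_mode({"has_actual_data": True, "comparison_mae": 3.2}))
    print(pick_summary_mode({"has_actual_data": True}))
    print(pick_summary_mode({}))
```

Note that the check is on key presence, so a `comparison_mae` of `0` still selects the validation template.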