Spaces: muhalwan/classquota

muhalwan committed on Nov 30, 2025

Commit 6a0a429

1 Parent(s): 1335617

Revised version

Files changed (11)

.gitignore +3 -0
README.md +103 -24
app.py +194 -578
backend.py +674 -0
config.py +96 -56
data_loader.py +9 -29
data_processor.py +150 -111
data_validator.py +0 -467
evaluator.py +213 -16
prophet_predictor.py +223 -40
ui_components.py +322 -0

.gitignore CHANGED

@@ -12,3 +12,6 @@ WORKFLOW.md
 data/
 hf_cache/
 MODEL_WORKFLOW.md
+data_validator.py
+utils/
+.gitignore

README.md CHANGED

@@ -9,42 +9,121 @@ app_file: app.py
 pinned: false
 ---
-# SKS Course Enrollment Prediction System
-Predicts student enrollment for elective courses using Prophet time series forecasting with semester-specific analysis.
-## 🎯 Features
-- **Semester-Specific Predictions**: Separate predictions for Semester 1 (Ganjil/Odd) and Semester 2 (Genap/Even)
-- **Time Series Forecasting**: Uses Facebook Prophet for accurate enrollment predictions
-- **Historical Backtesting**: Validates model accuracy with MAE and RMSE metrics
-- **Automated Recommendations**: Suggests which courses to open based on predicted demand
-- **Private Data**: Loads enrollment data from private Hugging Face dataset
-## 🚀 How It Works
-1. **Select Target Year and Semester**: Choose the academic period to predict
-2. **Generate Predictions**: AI analyzes historical enrollment patterns
-3. **View Recommendations**: See which courses should be opened and recommended quotas
-4. **Review Metrics**: Check model performance (MAE, RMSE)
-## 📈 Prediction Strategy
-The system uses multiple forecasting strategies:
-- **Prophet Logistic Growth**: For courses with sufficient historical data
-- **Trend-Based Fallback**: For courses with unrealistic Prophet predictions
-- **Mean Fallback**: For courses with limited history
-- **Cold Start**: For new courses without historical data
-## 🛠️ Technical Stack
-- **Framework**: Gradio for UI
-- **ML Model**: Facebook Prophet
 - **Data Processing**: Pandas, NumPy
 - **Deployment**: Hugging Face Spaces
-## 📊 Model Performance
-The model is validated through backtesting on historical data:
 - Mean Absolute Error (MAE): ~31 students
 - Root Mean Squared Error (RMSE): ~49 students

 pinned: false
 ---
+# SKS Course Enrollment & Class Capacity Prediction System
+Predicts the **number of classes to open** from enrollment forecasts, taking the maximum capacity per class into account, using Prophet time series forecasting.
+## How It Works
+### Single Semester Prediction
+1. **Select Year and Semester**: Choose the academic period to predict
+2. **Generate Predictions**: AI analyzes historical enrollment patterns
+3. **View Class Recommendations**: See how many classes to open for each course
+4. **Review Utilization**: Check the class capacity utilization rate
+### Multi-Year Forecasting
+1. **Set the Starting Period**: The year and semester where the projection starts
+2. **Choose the Forecast Horizon**: How many years ahead (1-5 years)
+3. **View Trends**: How class requirements evolve over time
+## Class Capacity Logic
+Example scenario:
+- **PDST Course**: 10-15 students (2023) → 40 students (2024) → projected to keep rising
+- **Max Capacity**: 50 students per class
+- **Recommendation**: 1 class (if ≤50), 2 classes (if 51-100), and so on.
+### Calculation Formula
+```
+Number of Classes = ⌈Predicted Enrollment / Capacity per Class⌉
+```
+## Prediction Strategy
+The system uses several forecasting strategies:
+- **Prophet Logistic Growth**: For courses with sufficient historical data, using capacity as the upper bound (cap)
+- **Trend-Based Fallback**: For unrealistic Prophet predictions
+- **Mean Fallback**: For courses with limited data
+- **Cold Start**: For new courses with no historical data
+## Technical Stack
+- **Framework**: Gradio for the UI
+- **ML Model**: Facebook Prophet with logistic growth
 - **Data Processing**: Pandas, NumPy
+- **Visualization**: Matplotlib, Seaborn
 - **Deployment**: Hugging Face Spaces
+## Configuration
+### Class Capacity Settings (config.py)
+```python
+@dataclass
+class ClassCapacityConfig:
+    DEFAULT_CLASS_CAPACITY: int = 50      # Max students per class
+    MIN_STUDENTS_TO_OPEN_CLASS: int = 10  # Minimum to open a class
+    CAPACITY_WARNING_THRESHOLD: float = 0.8  # 80% utilization warning
+    ENABLE_CAPACITY_CONSTRAINTS: bool = True
+```
+### Multi-Year Forecast Settings
+```python
+@dataclass
+class MultiYearForecastConfig:
+    FORECAST_YEARS_AHEAD: int = 3         # Years to forecast
+    MAX_YEARLY_GROWTH_RATE: float = 0.5   # 50% max growth/year
+    MIN_YEARLY_GROWTH_RATE: float = -0.3  # 30% max decline/year
+```
+## Model Performance
+The model is validated through backtesting on historical data:
+### Enrollment Prediction
 - Mean Absolute Error (MAE): ~31 students
 - Root Mean Squared Error (RMSE): ~49 students
+### Class Count Prediction
+- Class MAE: ~0.5 classes
+- Exact Class Match: ~70%
+- Within ±1 Class: ~95%
+## 🔧 Usage
+### Local Development
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Run the app
+python app.py
+```
+### Environment Variables
+- `HF_TOKEN`: Hugging Face token for accessing the private dataset
+## 📝 Output Columns
+### Semester Prediction
+| Column | Description |
+|--------|-------------|
+| Kode MK | Course code |
+| Nama MK | Course name |
+| Prediksi | Predicted number of students |
+| Jumlah Kelas | Recommended number of classes to open |
+| Kapasitas/Kelas | Maximum capacity per class |
+| Total Kuota | Total capacity (Jumlah Kelas × Kapasitas/Kelas) |
+| Utilization % | Capacity utilization percentage |
+| Status | BUKA (open) / TUTUP (closed) |
+| Confidence | high/medium/low |
+| Strategy | Prediction method used |
+### Multi-Year Projection
+| Column | Description |
+|--------|-------------|
+| Tahun | Forecast year |
+| Kode MK | Course code |
+| Nama MK | Course name |
+| Prediksi | Predicted enrollment |
+| Kelas | Number of classes needed |
+| Kapasitas | Total available capacity |
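
The class-count formula and the `Jumlah Kelas`, `Total Kuota`, and `Utilization %` columns described in the README above can be sketched as follows. This is a minimal illustration only, not the code in `backend.py` or `prophet_predictor.py`, whose contents are not shown here. `recommend_classes` is a hypothetical helper, and treating a prediction below `MIN_STUDENTS_TO_OPEN_CLASS` as TUTUP with zero classes is an assumption about how that threshold is applied.

```python
import math
from dataclasses import dataclass


@dataclass
class ClassCapacityConfig:
    # Defaults copied from the README's config.py excerpt.
    DEFAULT_CLASS_CAPACITY: int = 50      # Max students per class
    MIN_STUDENTS_TO_OPEN_CLASS: int = 10  # Minimum to open a class


def recommend_classes(predicted_enrollment: float, cfg: ClassCapacityConfig) -> dict:
    """Hypothetical helper: map a predicted enrollment to a class recommendation."""
    capacity = cfg.DEFAULT_CLASS_CAPACITY

    # Assumption: below the minimum, the course stays closed (TUTUP).
    if predicted_enrollment < cfg.MIN_STUDENTS_TO_OPEN_CLASS:
        return {"classes": 0, "total_quota": 0, "utilization_pct": 0.0, "status": "TUTUP"}

    # Number of Classes = ceil(Predicted Enrollment / Capacity per Class)
    classes = math.ceil(predicted_enrollment / capacity)
    total_quota = classes * capacity
    utilization_pct = 100.0 * predicted_enrollment / total_quota

    return {
        "classes": classes,
        "total_quota": total_quota,
        "utilization_pct": utilization_pct,
        "status": "BUKA",
    }


if __name__ == "__main__":
    cfg = ClassCapacityConfig()
    for enrollment in (5, 40, 51, 100):
        print(enrollment, recommend_classes(enrollment, cfg))
```

With the README's example capacity of 50, a prediction of 40 students yields one class, while 51 students yields two, matching the "1 class (if ≤50), 2 classes (if 51-100)" rule.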

app.py CHANGED

@@ -1,616 +1,232 @@
-# Version: 3.1 - Dark theme UI with white text
 import logging
-from typing import Optional, Tuple
 import gradio as gr
-import pandas as pd
-from config import Config
-from data_processor import DataProcessor
-from evaluator import Evaluator
-from prophet_predictor import ProphetPredictor
 from utils import setup_logging
 setup_logging("INFO")
 logger = logging.getLogger("GradioApp")
-_processor: Optional[DataProcessor] = None
-_predictor: Optional[ProphetPredictor] = None
-_config: Optional[Config] = None
-_df_enrollment: Optional[pd.DataFrame] = None
-_elective_codes: Optional[set] = None
-_backtest_metrics: Optional[dict] = None
-def initialize_system():
-    """Initialize the prediction system (called once at startup)."""
-    global \
-        _processor, \
-        _predictor, \
-        _config, \
-        _df_enrollment, \
-        _elective_codes, \
-        _backtest_metrics
-    try:
-        logger.info("Initializing prediction system...")
-        _config = Config()
-        _processor = DataProcessor(_config)
-        _df_enrollment, _elective_codes = _processor.load_and_process()
-        _predictor = ProphetPredictor(_config)
-        _predictor.train_student_population_model(
-            _processor.raw_data["students_yearly"]
-        )
-        logger.info("System initialized successfully")
-        return True
-    except Exception as e:
-        logger.error(f"Failed to initialize system: {e}", exc_info=True)
-        return False
-def generate_predictions(
-    year: int, semester: int
-) -> Tuple[str, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
-    """
-    Generate enrollment predictions for a given year and semester.
-    Args:
-        year: Target year (e.g., 2025)
-        semester: Target semester (1 = Ganjil, 2 = Genap)
-    Returns:
-        Tuple of (summary_text, all_predictions_df, comparison_df)
-    """
-    global \
-        _processor, \
-        _predictor, \
-        _config, \
-        _df_enrollment, \
-        _elective_codes, \
-        _backtest_metrics
-    try:
-        if semester not in [1, 2]:
-            return (
-                "Error: Semester harus 1 (Ganjil) atau 2 (Genap)",
-                None,
-                None,
-            )
-        if year < 2020 or year > 2030:
-            return "Error: Year must be between 2020 and 2030", None, None
-        if (
-            _config is None
-            or _predictor is None
-            or _processor is None
-            or _df_enrollment is None
-            or _elective_codes is None
-        ):
-            return (
-                "Error: System not initialized. Please restart the app.",
-                None,
-                None,
-            )
-        logger.info(f"Generating predictions for {year} Semester {semester}...")
-        _config.prediction.PREDICT_YEAR = year
-        _config.prediction.PREDICT_SEMESTER = semester
-        # Check if actual data exists for this year/semester
-        actual_data = _df_enrollment[
-            (_df_enrollment["thn"] == year) & (_df_enrollment["smt"] == semester)
-        ]
-        has_actual_data = len(actual_data) > 0
-        if has_actual_data:
-            logger.info(
-                f"Found actual enrollment data for {year} Semester {semester} - will compare predictions vs actual"
-            )
-        else:
-            logger.info(
-                f"No actual data for {year} Semester {semester} - generating future predictions"
-            )
-        if _backtest_metrics is None:
-            logger.info("Running backtest for the first time...")
-            evaluator = Evaluator(_config)
-            backtest_results = evaluator.run_backtest(_df_enrollment, _predictor)
-            if backtest_results is None or len(backtest_results) == 0:
-                logger.warning("Backtest returned no results, using defaults")
-                _backtest_metrics = {"mae": 0, "rmse": 0}
-            else:
-                metrics_result = evaluator.generate_metrics(backtest_results)
-                if metrics_result is None:
-                    logger.warning("Metrics calculation failed, using defaults")
-                    _backtest_metrics = {"mae": 0, "rmse": 0}
-                else:
-                    _backtest_metrics = metrics_result
-        else:
-            logger.info("Using cached backtest metrics")
-        metrics = _backtest_metrics
-        predictions = _predictor.generate_batch_predictions(
-            _df_enrollment,
-            _processor.raw_data["courses"],
-            _elective_codes,
-            year,
-            semester,
-        )
-        semester_name = "1 (Ganjil)" if semester == 1 else "2 (Genap)"
-        total_to_open = len(predictions[predictions["recommendation"] == "BUKA"])
-        total_seats = (
-            int(
-                predictions[predictions["recommendation"] == "BUKA"][
-                    "recommended_quota"
-                ].sum()
-            )
-            if total_to_open > 0
-            else 0
-        )
-        # Build summary with actual vs prediction comparison if data exists
-        if has_actual_data:
-            # Merge predictions with actual data
-            comparison = predictions.merge(
-                actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
-            )
-            comparison = comparison.rename(columns={"enrollment": "actual_enrollment"})
-            # Calculate comparison metrics only for courses with actual data
-            courses_with_actual = comparison[
-                comparison["actual_enrollment"].notna()
-            ].copy()
-            if len(courses_with_actual) > 0:
-                comparison_mae = abs(
-                    courses_with_actual["predicted_enrollment"]
-                    - courses_with_actual["actual_enrollment"]
-                ).mean()
-                comparison_rmse = (
-                    (
-                        courses_with_actual["predicted_enrollment"]
-                        - courses_with_actual["actual_enrollment"]
-                    )
-                    ** 2
-                ).mean() ** 0.5
-                total_actual = courses_with_actual["actual_enrollment"].sum()
-                total_predicted = courses_with_actual["predicted_enrollment"].sum()
-                accuracy_pct = (
-                    1 - abs(total_predicted - total_actual) / total_actual
-                ) * 100
-                diff_color = (
-                    "#4ade80" if total_predicted - total_actual >= 0 else "#f87171"
-                )
-                summary = f"""
-<div style="padding: 24px;">
-    <div style="margin-bottom: 24px;">
-        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
-        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Validasi prediksi terhadap data aktual</p>
-    </div>
-    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #4ade80;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">Akurasi</div>
-            <div style="font-size: 28px; font-weight: 700; color: #4ade80;">{accuracy_pct:.1f}%</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #60a5fa;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MAE</div>
-            <div style="font-size: 28px; font-weight: 700; color: #60a5fa;">{comparison_mae:.2f}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #a78bfa;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">RMSE</div>
-            <div style="font-size: 28px; font-weight: 700; color: #a78bfa;">{comparison_rmse:.2f}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #fb923c;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MK Divalidasi</div>
-            <div style="font-size: 28px; font-weight: 700; color: #fb923c;">{len(courses_with_actual)}</div>
-        </div>
-    </div>
-    <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 16px;">
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px;">
-            <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">Ringkasan Enrollment</h4>
-            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                <span style="color: #9ca3af;">Total Aktual</span>
-                <span style="font-weight: 600; color: #fff;">{int(total_actual)}</span>
-            </div>
-            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                <span style="color: #9ca3af;">Total Prediksi</span>
-                <span style="font-weight: 600; color: #fff;">{int(total_predicted)}</span>
-            </div>
-            <div style="display: flex; justify-content: space-between; padding: 12px 0;">
-                <span style="color: #9ca3af;">Selisih</span>
-                <span style="font-weight: 600; color: {diff_color};">{int(total_predicted - total_actual):+d}</span>
-            </div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px;">
-            <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">Rekomendasi</h4>
-            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                <span style="color: #9ca3af;">MK Dibuka</span>
-                <span style="font-weight: 600; color: #fff;">{total_to_open}</span>
-            </div>
-            <div style="display: flex; justify-content: space-between; padding: 12px 0; border-bottom: 1px solid #334155;">
-                <span style="color: #9ca3af;">Total Kuota</span>
-                <span style="font-weight: 600; color: #fff;">{total_seats}</span>
-            </div>
-            <div style="display: flex; justify-content: space-between; padding: 12px 0;">
-                <span style="color: #9ca3af;">Backtest MAE</span>
-                <span style="font-weight: 600; color: #fff;">{metrics["mae"]:.2f}</span>
-            </div>
-        </div>
-    </div>
-</div>
-"""
-            else:
-                summary = f"""
-<div style="padding: 24px;">
-    <div style="margin-bottom: 24px;">
-        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
-        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Data semester ada, tetapi tidak ditemukan MK pilihan yang cocok</p>
-    </div>
-    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px;">
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #60a5fa;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MAE (Backtest)</div>
-            <div style="font-size: 28px; font-weight: 700; color: #60a5fa;">{metrics["mae"]:.2f}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #a78bfa;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">RMSE (Backtest)</div>
-            <div style="font-size: 28px; font-weight: 700; color: #a78bfa;">{metrics["rmse"]:.2f}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #4ade80;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MK Dibuka</div>
-            <div style="font-size: 28px; font-weight: 700; color: #4ade80;">{total_to_open}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #fb923c;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">Total Kuota</div>
-            <div style="font-size: 28px; font-weight: 700; color: #fb923c;">{total_seats}</div>
-        </div>
-    </div>
-</div>
-"""
-        else:
-            summary = f"""
-<div style="padding: 24px;">
-    <div style="margin-bottom: 24px;">
-        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
-        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Prediksi masa depan berdasarkan tren historis</p>
-    </div>
-    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #60a5fa;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MAE (Backtest)</div>
-            <div style="font-size: 28px; font-weight: 700; color: #60a5fa;">{metrics["mae"]:.2f}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #a78bfa;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">RMSE (Backtest)</div>
-            <div style="font-size: 28px; font-weight: 700; color: #a78bfa;">{metrics["rmse"]:.2f}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #4ade80;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">MK Dibuka</div>
-            <div style="font-size: 28px; font-weight: 700; color: #4ade80;">{total_to_open}</div>
-        </div>
-        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid #fb923c;">
-            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">Total Kuota</div>
-            <div style="font-size: 28px; font-weight: 700; color: #fb923c;">{total_seats}</div>
-        </div>
-    </div>
-    <div style="background: #1e293b; padding: 20px; border-radius: 12px; max-width: 300px;">
-        <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">Estimasi Total</h4>
-        <div style="display: flex; justify-content: space-between; padding: 12px 0;">
-            <span style="color: #9ca3af;">Total Mahasiswa</span>
-            <span style="font-weight: 600; color: #fff;">{int(predictions["predicted_enrollment"].sum())}</span>
-        </div>
-    </div>
-</div>
-"""
-        # Prepare all predictions display
-        all_predictions_display = predictions[
-            [
-                "kode_mk",
-                "nama_mk",
-                "predicted_enrollment",
-                "recommended_quota",
-                "recommendation",
-                "confidence",
-                "strategy",
-            ]
-        ].copy()
-        all_predictions_display.columns = [
-            "Kode MK",
-            "Nama MK",
-            "Prediksi",
-            "Kuota",
-            "Status",
-            "Confidence",
-            "Strategy",
-        ]
-        all_predictions_display["Prediksi"] = all_predictions_display["Prediksi"].round(
-            1
-        )
-        all_predictions_display["Kuota"] = all_predictions_display["Kuota"].astype(int)
-        # Map status to plain text
-        all_predictions_display["Status"] = all_predictions_display["Status"].map(
-            {"BUKA": "BUKA", "TUTUP": "TUTUP"}
-        )
-        all_predictions_display = all_predictions_display.sort_values(
-            "Prediksi", ascending=False
         )
-        # Prepare comparison table if actual data exists
-        comparison_display = None
-        if has_actual_data:
-            logger.info(
-                f"Building comparison table - Actual data has {len(actual_data)} courses"
-            )
-            logger.info(f"Predictions has {len(predictions)} courses")
-            comparison = predictions.merge(
-                actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
-            )
-            comparison = comparison.rename(columns={"enrollment": "actual_enrollment"})
-            # Filter to courses with actual data and calculate error
-            courses_with_actual = comparison[
-                comparison["actual_enrollment"].notna()
-            ].copy()
-            logger.info(
-                f"Courses with matching actual data: {len(courses_with_actual)}"
-            )
-            if len(courses_with_actual) > 0:
-                logger.info(
-                    f"Matching courses: {courses_with_actual['kode_mk'].tolist()}"
-                )
-            if len(courses_with_actual) > 0:
-                courses_with_actual["error"] = (
-                    courses_with_actual["predicted_enrollment"]
-                    - courses_with_actual["actual_enrollment"]
-                )
-                courses_with_actual["abs_error"] = abs(courses_with_actual["error"])
-                courses_with_actual["accuracy_%"] = 100 * (
-                    1
-                    - courses_with_actual["abs_error"]
-                    / courses_with_actual["actual_enrollment"].replace(0, 1)
-                )
-                comparison_display = courses_with_actual[
-                    [
-                        "kode_mk",
-                        "nama_mk",
-                        "actual_enrollment",
-                        "predicted_enrollment",
-                        "error",
-                        "abs_error",
-                        "accuracy_%",
-                        "strategy",
-                    ]
-                ].copy()
-                comparison_display.columns = [
-                    "Kode MK",
-                    "Nama MK",
-                    "Aktual",
-                    "Prediksi",
-                    "Error",
-                    "Abs Error",
-                    "Akurasi %",
-                    "Strategy",
-                ]
-                comparison_display["Aktual"] = comparison_display["Aktual"].astype(int)
-                comparison_display["Prediksi"] = comparison_display["Prediksi"].round(1)
-                comparison_display["Error"] = comparison_display["Error"].round(1)
-                comparison_display["Abs Error"] = comparison_display["Abs Error"].round(
-                    1
                 )
-                comparison_display["Akurasi %"] = comparison_display["Akurasi %"].round(
-                    1
                 )
-                comparison_display = comparison_display.sort_values(
-                    "Abs Error", ascending=False
-                )
-                logger.info(
-                    f"Comparison table created with {len(comparison_display)} courses"
                 )
-            else:
-                logger.warning(
-                    "Actual data exists but no matching courses found for comparison"
                 )
-                logger.warning(f"Predicted courses: {predictions['kode_mk'].tolist()}")
-                logger.warning(f"Actual courses: {actual_data['kode_mk'].tolist()}")
-        logger.info(
-            f"Predictions generated successfully (comparison_display: {comparison_display is not None})"
         )
-        return summary, all_predictions_display, comparison_display
-    except Exception as e:
-        error_msg = f"Error generating predictions: {str(e)}"
-        logger.error(error_msg, exc_info=True)
-        return error_msg, None, None
-def get_data_info() -> str:
-    """Get information about the loaded dataset."""
-    global _processor, _config
-    try:
-        if _processor is None or _config is None:
-            return "System not initialized"
-        courses = _processor.raw_data.get("courses")
-        students = _processor.raw_data.get("students_yearly")
-        if courses is None or students is None:
-            return "Data not loaded"
-        # Get elective courses
-        elective_courses = courses[courses["kategori_mk"] == "P"]
-        info = f"""
-<div style="padding: 8px 0;">
-    <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 12px;">
-        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Total MK</div>
-            <div style="font-size: 20px; font-weight: 700; color: #fff;">{len(courses)}</div>
-        </div>
-        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">MK Pilihan</div>
-            <div style="font-size: 20px; font-weight: 700; color: #4ade80;">{len(elective_courses)}</div>
-        </div>
-        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">MK Wajib</div>
-            <div style="font-size: 20px; font-weight: 700; color: #fff;">{len(courses) - len(elective_courses)}</div>
-        </div>
-        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
-            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Tahun Data</div>
-            <div style="font-size: 20px; font-weight: 700; color: #60a5fa;">{students["thn"].min()}-{students["thn"].max()}</div>
-        </div>
-    </div>
-</div>
-"""
-        return info
-    except Exception as e:
-        return f"Error getting data info: {str(e)}"
-# Initialize system at startup
 logger.info("Starting Gradio app...")
-init_success = initialize_system()
 if not init_success:
     logger.error("Failed to initialize system. App may not work correctly.")
-# Create Gradio Interface
-with gr.Blocks(title="SKS Enrollment Predictor") as demo:
-    # Header
-    gr.Markdown("# Course Enrollment Predictor")
-    with gr.Row():
-        # Left panel - Controls
-        with gr.Column(scale=1, min_width=300):
-            year_input = gr.Number(
-                label="Tahun",
-                value=2025,
-                precision=0,
-                minimum=2020,
-                maximum=2030,
-            )
-            semester_input = gr.Radio(
-                choices=[("1 (Ganjil)", 1), ("2 (Genap)", 2)],
-                label="Semester",
-                value=2,
-            )
-            predict_btn = gr.Button(
-                "Generate Predictions",
-                variant="primary",
-                size="lg",
-            )
-            gr.Markdown("---")
-            # Data info section
-            with gr.Accordion("Dataset Info", open=False):
-                data_info_output = gr.HTML()
-                demo.load(fn=get_data_info, inputs=[], outputs=data_info_output)
-        # Right panel - Results
-        with gr.Column(scale=3):
-            summary_output = gr.HTML(
-                value="""
-                <div style="padding: 60px 40px; text-align: center; background: #1e293b; border-radius: 12px;">
-                    <h3 style="color: #fff; margin: 0 0 8px 0; font-size: 18px; font-weight: 600;">Pilih tahun dan semester</h3>
-                    <p style="color: #9ca3af; margin: 0; font-size: 14px;">Klik Generate Predictions untuk melihat hasil</p>
-                </div>
-                """
-            )
-    gr.Markdown("---")
-    # Predictions table
-    gr.Markdown("### Daftar Prediksi Mata Kuliah")
-    all_predictions_output = gr.Dataframe(
-        label="",
-        wrap=True,
-        interactive=False,
-    )
-    # Comparison section
-    with gr.Accordion("Detail Validasi", open=False) as comparison_accordion:
-        comparison_info = gr.Markdown(
-            value="Data validasi muncul ketika data aktual tersedia",
-        )
-        comparison_output = gr.Dataframe(
-            label="",
-            wrap=True,
-            interactive=False,
-        )
-    def update_ui_with_predictions(year, semester):
-        """Wrapper to handle UI updates based on whether comparison data exists."""
-        summary, all_predictions, comparison = generate_predictions(year, semester)
-        logger.info(
-            f"UI Update: comparison is None: {comparison is None}, empty: {comparison.empty if comparison is not None else 'N/A'}"
-        )
-        if comparison is not None and not comparison.empty:
-            logger.info(f"Showing comparison table with {len(comparison)} rows")
-            return (
-                summary,
-                all_predictions,
-                gr.update(open=True),
-                gr.update(
-                    value=f"Validasi terhadap {len(comparison)} mata kuliah",
-                ),
-                gr.update(value=comparison),
-            )
-        else:
-            logger.info("Hiding comparison table - no data available")
-            return (
-                summary,
-                all_predictions,
-                gr.update(open=False),
-                gr.update(
-                    value="Tidak ada data validasi untuk prediksi masa depan",
-                ),
-                gr.update(value=None),
-            )
-    predict_btn.click(
-        fn=update_ui_with_predictions,
-        inputs=[year_input, semester_input],
-        outputs=[
-            summary_output,
-            all_predictions_output,
-            comparison_accordion,
-            comparison_info,
-            comparison_output,
-        ],
-    )
 # Launch the app
 if __name__ == "__main__":

 import logging
 import gradio as gr
+from backend import get_backend
+from ui_components import (
+    build_data_info,
+    build_multi_year_summary,
+    build_prediction_summary,
+    get_forecast_placeholder,
+    get_prediction_placeholder,
+)
 from utils import setup_logging
 setup_logging("INFO")
 logger = logging.getLogger("GradioApp")
+# Backend Interface
+def get_data_info() -> str:
+    backend = get_backend()
+    data = backend.get_data_info()
+    return build_data_info(data)
+def generate_predictions(year: int, semester: int):
+    backend = get_backend()
+    result = backend.generate_predictions(year, semester)
+    if result.error:
+        return f"Error: {result.error}", None, None
+    summary_html = build_prediction_summary(result.summary_data)
+    return summary_html, result.predictions_df, result.comparison_df
+def generate_multi_year_forecast(year: int, semester: int, years_ahead: int = 3):
+    backend = get_backend()
+    result = backend.generate_multi_year_forecast(year, semester, years_ahead)
+    if result.error:
+        return f"Error: {result.error}", None
+    summary_html = build_multi_year_summary(result.summary_data)
+    return summary_html, result.forecast_df
+def update_ui_with_predictions(year: int, semester: int):
+    summary, all_predictions, comparison = generate_predictions(year, semester)
+    logger.info(
+        f"UI Update: comparison is None: {comparison is None}, "
+        f"empty: {comparison.empty if comparison is not None else 'N/A'}"
+    )
+    if comparison is not None and not comparison.empty:
+        logger.info(f"Showing comparison table with {len(comparison)} rows")
+        return (
+            summary,
+            all_predictions,
+            gr.update(open=True),
+            gr.update(
+                value=f"Validasi terhadap {len(comparison)} mata kuliah - "
+                "termasuk perbandingan jumlah kelas aktual vs prediksi"
+            ),
+            gr.update(value=comparison),
+        )
+    else:
+        logger.info("Hiding comparison table - no data available")
+        return (
+            summary,
+            all_predictions,
+            gr.update(open=False),
+            gr.update(value="Tidak ada data validasi untuk prediksi masa depan"),
+            gr.update(value=None),
         )
+# Gradio UI
+def create_gradio_app() -> gr.Blocks:
+    """Create and configure the Gradio application."""
+    with gr.Blocks(title="SKS Enrollment Predictor") as demo:
+        # Header
+        gr.Markdown("# Course Enrollment & Class Capacity Predictor")
+        gr.Markdown(
+            "Sistem prediksi **jumlah kelas yang perlu dibuka** berdasarkan "
+            "forecasting enrollment dengan mempertimbangkan kapasitas maksimum per kelas."
+        )
+        with gr.Tabs():
+            # Single Year
+            with gr.TabItem("Prediksi Semester"):
+                with gr.Row():
+                    with gr.Column(scale=1, min_width=300):
+                        year_input = gr.Number(
+                            label="Tahun",
+                            value=2025,
+                            precision=0,
+                            minimum=2020,
+                            maximum=2030,
+                        )
+                        semester_input = gr.Radio(
+                            choices=[("1 (Ganjil)", 1), ("2 (Genap)", 2)],
+                            label="Semester",
+                            value=2,
+                        )
+                        predict_btn = gr.Button(
+                            "Generate Predictions",
+                            variant="primary",
+                            size="lg",
+                        )
+                        gr.Markdown("---")
+                        with gr.Accordion("Dataset Info", open=False):
+                            data_info_output = gr.HTML()
+                            demo.load(
+                                fn=get_data_info, inputs=[], outputs=data_info_output
+                            )
+                    with gr.Column(scale=3):
+                        summary_output = gr.HTML(value=get_prediction_placeholder())
+                gr.Markdown("---")
+                gr.Markdown("### Rekomendasi Jumlah Kelas per Mata Kuliah")
+                gr.Markdown(
+                    "*Jumlah kelas dihitung berdasarkan prediksi enrollment ÷ "
+                    "kapasitas per kelas*"
                 )
+                all_predictions_output = gr.Dataframe(
+                    label="",
+                    wrap=True,
+                    interactive=False,
                 )
+                with gr.Accordion(
+                    "Detail Validasi", open=False
+                ) as comparison_accordion:
+                    comparison_info = gr.Markdown(
+                        value="Data validasi muncul ketika data aktual tersedia"
+                    )
+                    comparison_output = gr.Dataframe(
+                        label="",
+                        wrap=True,
+                        interactive=False,
+                    )
+            # Multi-Year Forecast
+            with gr.TabItem("Proyeksi Multi-Tahun"):
+                gr.Markdown("### Forecasting Kebutuhan Kelas Beberapa Tahun ke Depan")
+                gr.Markdown(
+                    "Memprediksi tren jumlah mahasiswa dan kebutuhan kelas "
+                    "untuk perencanaan jangka panjang."
                 )
+                with gr.Row():
+                    with gr.Column(scale=1):
+                        forecast_year = gr.Number(
+                            label="Tahun Mulai",
+                            value=2025,
+                            precision=0,
+                            minimum=2020,
+                            maximum=2030,
+                        )
+                        forecast_semester = gr.Radio(
+                            choices=[("1 (Ganjil)", 1), ("2 (Genap)", 2)],
+                            label="Semester",
+                            value=2,
+                        )
+                        forecast_years = gr.Slider(
+                            label="Tahun ke Depan",
+                            minimum=1,
+                            maximum=5,
+                            value=3,
+                            step=1,
+                        )
+                        forecast_btn = gr.Button(
+                            "Generate Forecast",
+                            variant="primary",
+                            size="lg",
+                        )
+                    with gr.Column(scale=3):
+                        forecast_summary = gr.HTML(value=get_forecast_placeholder())
+                gr.Markdown("---")
+                gr.Markdown("### Detail Proyeksi per Mata Kuliah per Tahun")
+                forecast_table = gr.Dataframe(
+                    label="",
+                    wrap=True,
+                    interactive=False,
                 )
+        predict_btn.click(
+            fn=update_ui_with_predictions,
+            inputs=[year_input, semester_input],
+            outputs=[
+                summary_output,
+                all_predictions_output,
+                comparison_accordion,
+                comparison_info,
+                comparison_output,
+            ],
         )
+        forecast_btn.click(
+            fn=generate_multi_year_forecast,
+            inputs=[forecast_year, forecast_semester, forecast_years],
+            outputs=[forecast_summary, forecast_table],
+        )
+    return demo
 logger.info("Starting Gradio app...")
+backend = get_backend()
+init_success = backend.initialize()
 if not init_success:
     logger.error("Failed to initialize system. App may not work correctly.")
+demo = create_gradio_app()
 # Launch the app
 if __name__ == "__main__":
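
Note on the class-count rule: the UI text added in app.py says the number of classes is derived from predicted enrollment ÷ capacity per class. Below is a minimal sketch of that arithmetic, assuming plain ceiling division. The helper names and the default capacity of 40 are hypothetical. The real `Config.calculate_classes_needed` is not shown in this part of the diff and may apply extra rules, such as per-course capacity or a minimum enrollment threshold.

```python
import math


def classes_needed(predicted_enrollment: float, class_capacity: int = 40) -> int:
    """Hypothetical helper: smallest number of classes that seats everyone."""
    if class_capacity <= 0:
        raise ValueError("class_capacity must be positive")
    if predicted_enrollment <= 0:
        return 0  # nothing to open when no demand is predicted
    return math.ceil(predicted_enrollment / class_capacity)


def utilization_pct(
    predicted_enrollment: float, n_classes: int, class_capacity: int = 40
) -> float:
    """Share of the total quota (classes * capacity) expected to be filled."""
    total_quota = n_classes * class_capacity
    return 100.0 * predicted_enrollment / total_quota if total_quota else 0.0


# 85 predicted students at 40 seats per class need 3 classes (quota 120).
n = classes_needed(85, 40)
print(n, round(utilization_pct(85, n, 40), 1))
```

Rounding up rather than to the nearest integer means no predicted student is left without a seat, at the cost of lower utilization when demand only just exceeds a multiple of the capacity.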

backend.py ADDED

@@ -0,0 +1,674 @@
+import logging
+from dataclasses import dataclass
+from typing import Dict, Optional, Tuple
+import pandas as pd
+from config import Config
+from data_processor import DataProcessor
+from evaluator import Evaluator
+from prophet_predictor import ProphetPredictor
+from utils import setup_logging
+setup_logging("INFO")
+logger = logging.getLogger("Backend")
+@dataclass
+class PredictionResult:
+    summary_data: Dict
+    predictions_df: pd.DataFrame
+    comparison_df: Optional[pd.DataFrame]
+    has_actual_data: bool
+    error: Optional[str] = None
+@dataclass
+class ForecastResult:
+    summary_data: Dict
+    forecast_df: pd.DataFrame
+    yearly_summary: pd.DataFrame
+    error: Optional[str] = None
+class PredictionBackend:
+    def __init__(self):
+        self._processor: Optional[DataProcessor] = None
+        self._predictor: Optional[ProphetPredictor] = None
+        self._config: Optional[Config] = None
+        self._df_enrollment: Optional[pd.DataFrame] = None
+        self._elective_codes: Optional[set] = None
+        self._backtest_metrics: Optional[dict] = None
+        self._initialized: bool = False
+    @property
+    def is_initialized(self) -> bool:
+        return self._initialized
+    @property
+    def config(self) -> Optional[Config]:
+        return self._config
+    def initialize(self) -> bool:
+        try:
+            logger.info("Initializing prediction system...")
+            self._config = Config()
+            self._processor = DataProcessor(self._config)
+            self._df_enrollment, self._elective_codes = (
+                self._processor.load_and_process()
+            )
+            self._predictor = ProphetPredictor(self._config)
+            self._predictor.train_student_population_model(
+                self._processor.raw_data["students_yearly"]
+            )
+            self._initialized = True
+            logger.info("System initialized successfully")
+            return True
+        except Exception as e:
+            logger.error(f"Failed to initialize system: {e}", exc_info=True)
+            self._initialized = False
+            return False
+    def get_data_info(self) -> Dict:
+        if not self._initialized or self._processor is None or self._config is None:
+            return {"error": "System not initialized"}
+        try:
+            courses = self._processor.raw_data.get("courses")
+            students = self._processor.raw_data.get("students_yearly")
+            if courses is None or students is None:
+                return {"error": "Data not loaded"}
+            elective_courses = courses[courses["kategori_mk"] == "P"]
+            return {
+                "total_courses": len(courses),
+                "elective_courses": len(elective_courses),
+                "class_capacity": self._config.class_capacity.DEFAULT_CLASS_CAPACITY,
+                "year_min": int(students["thn"].min()),
+                "year_max": int(students["thn"].max()),
+            }
+        except Exception as e:
+            return {"error": str(e)}
+    def _run_backtest_if_needed(self) -> Dict:
+        if self._backtest_metrics is not None:
+            return self._backtest_metrics
+        if (
+            self._config is None
+            or self._df_enrollment is None
+            or self._predictor is None
+        ):
+            logger.warning("System not initialized, using default metrics")
+            self._backtest_metrics = {"mae": 0, "rmse": 0}
+            return self._backtest_metrics
+        logger.info("Running backtest for the first time...")
+        evaluator = Evaluator(self._config)
+        backtest_results = evaluator.run_backtest(self._df_enrollment, self._predictor)
+        if backtest_results is None or len(backtest_results) == 0:
+            logger.warning("Backtest returned no results, using defaults")
+            self._backtest_metrics = {"mae": 0, "rmse": 0}
+        else:
+            metrics_result = evaluator.generate_metrics(backtest_results)
+            if metrics_result is None:
+                logger.warning("Metrics calculation failed, using defaults")
+                self._backtest_metrics = {"mae": 0, "rmse": 0}
+            else:
+                self._backtest_metrics = metrics_result
+        return self._backtest_metrics
+    def _get_actual_data(self, year: int, semester: int) -> Tuple[pd.DataFrame, bool]:
+        if self._df_enrollment is None:
+            return pd.DataFrame(), False
+        actual_data = self._df_enrollment[
+            (self._df_enrollment["thn"] == year)
+            & (self._df_enrollment["smt"] == semester)
+        ]
+        return actual_data, len(actual_data) > 0
+    def _calculate_class_metrics(
+        self,
+        courses_with_actual: pd.DataFrame,
+        year: int,
+        semester: int,
+    ) -> Dict:
+        if self._processor is None or self._config is None:
+            return {
+                "class_matches": 0,
+                "class_within_one": 0,
+                "total_for_class_accuracy": 0,
+                "class_accuracy_pct": 0,
+                "class_within_one_pct": 0,
+                "has_actual_class_data": False,
+                "data_source": "kalkulasi",
+            }
+        actual_classes_df = self._processor.get_class_count_for_validation(
+            year, semester
+        )
+        has_actual_class_data = False
+        courses_with_class_data: Optional[pd.DataFrame] = None
+        if len(actual_classes_df) > 0:
+            courses_with_actual = courses_with_actual.merge(
+                actual_classes_df, on="kode_mk", how="left"
+            )
+            has_actual_class_data = courses_with_actual["actual_classes"].notna().any()
+        if has_actual_class_data:
+            courses_with_class_data = courses_with_actual[
+                courses_with_actual["actual_classes"].notna()
+            ].copy()
+            courses_with_class_data["actual_classes"] = courses_with_class_data[
+                "actual_classes"
+            ].astype(int)
+            class_matches = (
+                courses_with_class_data["classes_needed"]
+                == courses_with_class_data["actual_classes"]
+            ).sum()
+            total_for_class_accuracy = len(courses_with_class_data)
+        else:
+            config = self._config
+            courses_with_actual["actual_classes_calc"] = courses_with_actual.apply(
+                lambda row: config.calculate_classes_needed(
+                    row["actual_enrollment"],
+                    row["kode_mk"],
+                    has_historical_data=True,
+                ),
+                axis=1,
+            )
+            class_matches = (
+                courses_with_actual["classes_needed"]
+                == courses_with_actual["actual_classes_calc"]
+            ).sum()
+            total_for_class_accuracy = len(courses_with_actual)
+        class_accuracy_pct = (
+            (class_matches / total_for_class_accuracy) * 100
+            if total_for_class_accuracy > 0
+            else 0
+        )
+        if has_actual_class_data and courses_with_class_data is not None:
+            class_within_one = (
+                abs(
+                    courses_with_class_data["classes_needed"]
+                    - courses_with_class_data["actual_classes"]
+                )
+                <= 1
+            ).sum()
+        else:
+            class_within_one = (
+                abs(
+                    courses_with_actual["classes_needed"]
+                    - courses_with_actual["actual_classes_calc"]
+                )
+                <= 1
+            ).sum()
+        class_within_one_pct = (
+            (class_within_one / total_for_class_accuracy) * 100
+            if total_for_class_accuracy > 0
+            else 0
+        )
+        return {
+            "class_matches": int(class_matches),
+            "class_within_one": int(class_within_one),
+            "total_for_class_accuracy": total_for_class_accuracy,
+            "class_accuracy_pct": class_accuracy_pct,
+            "class_within_one_pct": class_within_one_pct,
+            "has_actual_class_data": has_actual_class_data,
+            "data_source": "tabel2" if has_actual_class_data else "kalkulasi",
+        }
+    def _prepare_comparison_table(
+        self,
+        predictions: pd.DataFrame,
+        actual_data: pd.DataFrame,
+        year: int,
+        semester: int,
+    ) -> Optional[pd.DataFrame]:
+        if self._processor is None or self._config is None:
+            return None
+        comparison = predictions.merge(
+            actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
+        )
+        comparison = comparison.rename(columns={"enrollment": "actual_enrollment"})
+        actual_classes_df = self._processor.get_class_count_for_validation(
+            year, semester
+        )
+        if len(actual_classes_df) > 0:
+            comparison = comparison.merge(actual_classes_df, on="kode_mk", how="left")
+        else:
+            comparison["actual_classes"] = None
+        courses_with_actual = comparison[comparison["actual_enrollment"].notna()].copy()
+        if len(courses_with_actual) == 0:
+            return None
+        courses_with_actual["error"] = (
+            courses_with_actual["predicted_enrollment"]
+            - courses_with_actual["actual_enrollment"]
+        )
+        courses_with_actual["abs_error"] = abs(courses_with_actual["error"])
+        courses_with_actual["accuracy_%"] = 100 * (
+            1
+            - courses_with_actual["abs_error"]
+            / courses_with_actual["actual_enrollment"].replace(0, 1)
+        )
+        if (
+            "actual_classes" not in courses_with_actual.columns
+            or courses_with_actual["actual_classes"].isna().all()
+        ):
+            config_ref = self._config
+            courses_with_actual["actual_classes"] = courses_with_actual.apply(
+                lambda row: config_ref.calculate_classes_needed(
+                    row["actual_enrollment"],
+                    row["kode_mk"],
+                    has_historical_data=True,
+                ),
+                axis=1,
+            )
+        else:
+            config_ref = self._config
+            courses_with_actual["actual_classes"] = courses_with_actual.apply(
+                lambda row: (
+                    int(row["actual_classes"])
+                    if pd.notna(row["actual_classes"])
+                    else config_ref.calculate_classes_needed(
+                        row["actual_enrollment"],
+                        row["kode_mk"],
+                        has_historical_data=True,
+                    )
+                ),
+                axis=1,
+            )
+        courses_with_actual["class_diff"] = (
+            courses_with_actual["classes_needed"]
+            - courses_with_actual["actual_classes"]
+        )
+        comparison_display = courses_with_actual[
+            [
+                "kode_mk",
+                "nama_mk",
+                "actual_enrollment",
+                "predicted_enrollment",
+                "actual_classes",
+                "classes_needed",
+                "class_diff",
+                "error",
+                "accuracy_%",
+                "strategy",
+            ]
+        ].copy()
+        comparison_display.columns = [
+            "Kode MK",
+            "Nama MK",
+            "Aktual",
+            "Prediksi",
+            "Kelas Aktual",
+            "Kelas Prediksi",
+            "Selisih Kelas",
+            "Error",
+            "Akurasi %",
+            "Strategy",
+        ]
+        comparison_display["Aktual"] = comparison_display["Aktual"].astype(int)
+        comparison_display["Prediksi"] = comparison_display["Prediksi"].round(1)
+        comparison_display["Error"] = comparison_display["Error"].round(1)
+        comparison_display["Akurasi %"] = comparison_display["Akurasi %"].round(1)
+        comparison_display["Kelas Aktual"] = comparison_display["Kelas Aktual"].astype(
+            int
+        )
+        comparison_display["Kelas Prediksi"] = comparison_display[
+            "Kelas Prediksi"
+        ].astype(int)
+        comparison_display["Selisih Kelas"] = comparison_display[
+            "Selisih Kelas"
+        ].astype(int)
+        return comparison_display.sort_values("Aktual", ascending=False)
+    def _prepare_predictions_display(self, predictions: pd.DataFrame) -> pd.DataFrame:
+        """Prepare predictions dataframe for display."""
+        display_df = predictions[
+            [
+                "kode_mk",
+                "nama_mk",
+                "predicted_enrollment",
+                "classes_needed",
+                "class_capacity",
+                "total_quota",
+                "utilization_pct",
+                "recommendation",
+                "confidence",
+                "strategy",
+            ]
+        ].copy()
+        display_df.columns = [
+            "Kode MK",
+            "Nama MK",
+            "Prediksi",
+            "Jumlah Kelas",
+            "Kapasitas/Kelas",
+            "Total Kuota",
+            "Utilization %",
+            "Status",
+            "Confidence",
+            "Strategy",
+        ]
+        display_df["Prediksi"] = display_df["Prediksi"].round(1)
+        display_df["Jumlah Kelas"] = display_df["Jumlah Kelas"].astype(int)
+        display_df["Total Kuota"] = display_df["Total Kuota"].astype(int)
+        display_df["Status"] = display_df["Status"].map(
+            {"BUKA": "BUKA", "TUTUP": "TUTUP"}
+        )
+        display_df = display_df[display_df["Confidence"] == "high"]
+        display_df = display_df[display_df["Status"] == "BUKA"]
+        display_df = display_df.sort_values("Prediksi", ascending=False)
+        display_df = display_df.drop(columns=["Confidence", "Status"])
+        return display_df
+    def generate_predictions(self, year: int, semester: int) -> PredictionResult:
+        if semester not in [1, 2]:
+            return PredictionResult(
+                summary_data={},
+                predictions_df=pd.DataFrame(),
+                comparison_df=None,
+                has_actual_data=False,
+                error="Semester harus 1 (Ganjil) atau 2 (Genap)",
+            )
+        if year < 2020 or year > 2030:
+            return PredictionResult(
+                summary_data={},
+                predictions_df=pd.DataFrame(),
+                comparison_df=None,
+                has_actual_data=False,
+                error="Year must be between 2020 and 2030",
+            )
+        if not self._initialized:
+            return PredictionResult(
+                summary_data={},
+                predictions_df=pd.DataFrame(),
+                comparison_df=None,
+                has_actual_data=False,
+                error="System not initialized. Please restart the app.",
+            )
+        try:
+            logger.info(f"Generating predictions for {year} Semester {semester}...")
+            assert self._config is not None
+            assert self._predictor is not None
+            assert self._processor is not None
+            assert self._df_enrollment is not None
+            assert self._elective_codes is not None
+            self._config.prediction.PREDICT_YEAR = year
+            self._config.prediction.PREDICT_SEMESTER = semester
+            actual_data, has_actual_data = self._get_actual_data(year, semester)
+            if has_actual_data:
+                logger.info(
+                    f"Found actual enrollment data for {year} Semester {semester}"
+                )
+            else:
+                logger.info(f"No actual data for {year} Semester {semester}")
+            metrics = self._run_backtest_if_needed()
+            predictions = self._predictor.generate_batch_predictions(
+                self._df_enrollment,
+                self._processor.raw_data["courses"],
+                self._elective_codes,
+                year,
+                semester,
+            )
+            open_courses = predictions[predictions["recommendation"] == "BUKA"]
+            total_to_open = len(open_courses)
+            total_classes = int(open_courses["classes_needed"].sum())
+            total_predicted_students = int(open_courses["predicted_enrollment"].sum())
+            total_capacity = int(open_courses["total_quota"].sum())
+            class_capacity = self._config.class_capacity.DEFAULT_CLASS_CAPACITY
+            summary_data = {
+                "year": year,
+                "semester": semester,
+                "semester_name": "1 (Ganjil)" if semester == 1 else "2 (Genap)",
+                "total_to_open": total_to_open,
+                "total_classes": total_classes,
+                "total_predicted_students": total_predicted_students,
+                "total_capacity": total_capacity,
+                "class_capacity": class_capacity,
+                "metrics": metrics,
+                "has_actual_data": has_actual_data,
+            }
+            comparison_df = None
+            if has_actual_data:
+                comparison = predictions.merge(
+                    actual_data[["kode_mk", "enrollment"]], on="kode_mk", how="left"
+                )
+                comparison = comparison.rename(
+                    columns={"enrollment": "actual_enrollment"}
+                )
+                courses_with_actual = comparison[
+                    comparison["actual_enrollment"].notna()
+                ].copy()
+                if len(courses_with_actual) > 0:
+                    comparison_mae = abs(
+                        courses_with_actual["predicted_enrollment"]
+                        - courses_with_actual["actual_enrollment"]
+                    ).mean()
+                    comparison_rmse = (
+                        (
+                            courses_with_actual["predicted_enrollment"]
+                            - courses_with_actual["actual_enrollment"]
+                        )
+                        ** 2
+                    ).mean() ** 0.5
+                    total_actual = courses_with_actual["actual_enrollment"].sum()
+                    total_predicted = courses_with_actual["predicted_enrollment"].sum()
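+                    # Guard against division by zero when no actual enrollment exists
+                    accuracy_pct = (
+                        (1 - abs(total_predicted - total_actual) / total_actual) * 100
+                        if total_actual > 0
+                        else 0
+                    )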
+                    class_metrics = self._calculate_class_metrics(
+                        courses_with_actual.copy(), year, semester
+                    )
+                    summary_data.update(
+                        {
+                            "comparison_mae": comparison_mae,
+                            "comparison_rmse": comparison_rmse,
+                            "total_actual": total_actual,
+                            "total_predicted": total_predicted,
+                            "accuracy_pct": accuracy_pct,
+                            **class_metrics,
+                        }
+                    )
+                    comparison_df = self._prepare_comparison_table(
+                        predictions, actual_data, year, semester
+                    )
+            predictions_display = self._prepare_predictions_display(predictions)
+            return PredictionResult(
+                summary_data=summary_data,
+                predictions_df=predictions_display,
+                comparison_df=comparison_df,
+                has_actual_data=has_actual_data,
+            )
+        except Exception as e:
+            logger.error(f"Error generating predictions: {e}", exc_info=True)
+            return PredictionResult(
+                summary_data={},
+                predictions_df=pd.DataFrame(),
+                comparison_df=None,
+                has_actual_data=False,
+                error=str(e),
+            )
+    def generate_multi_year_forecast(
+        self, year: int, semester: int, years_ahead: int = 3
+    ) -> ForecastResult:
+        if not self._initialized:
+            return ForecastResult(
+                summary_data={},
+                forecast_df=pd.DataFrame(),
+                yearly_summary=pd.DataFrame(),
+                error="System not initialized.",
+            )
+        try:
+            logger.info(f"Generating {years_ahead}-year forecast from {year}...")
+            assert self._config is not None
+            assert self._predictor is not None
+            assert self._processor is not None
+            assert self._df_enrollment is not None
+            assert self._elective_codes is not None
+            forecast_df = self._predictor.generate_multi_year_forecast(
+                self._df_enrollment,
+                self._processor.raw_data["courses"],
+                self._elective_codes,
+                year,
+                semester,
+                years_ahead,
+            )
+            if forecast_df.empty:
+                return ForecastResult(
+                    summary_data={},
+                    forecast_df=pd.DataFrame(),
+                    yearly_summary=pd.DataFrame(),
+                    error="Tidak ada data untuk forecast.",
+                )
+            yearly_summary = (
+                forecast_df.groupby("year")
+                .agg(
+                    {
+                        "predicted_enrollment": "sum",
+                        "classes_needed": "sum",
+                        "total_capacity": "sum",
+                        "kode_mk": "count",
+                    }
+                )
+                .reset_index()
+            )
+            yearly_summary.columns = [
+                "Tahun",
+                "Total Prediksi",
+                "Total Kelas",
+                "Total Kapasitas",
+                "Jumlah MK",
+            ]
+            class_capacity = self._config.class_capacity.DEFAULT_CLASS_CAPACITY
+            semester_name = "Ganjil" if semester == 1 else "Genap"
+            first_year = yearly_summary.iloc[0]
+            last_year = yearly_summary.iloc[-1]
+            growth_classes = int(last_year["Total Kelas"] - first_year["Total Kelas"])
+            growth_students = int(
+                last_year["Total Prediksi"] - first_year["Total Prediksi"]
+            )
+            summary_data = {
+                "year": year,
+                "semester": semester,
+                "semester_name": semester_name,
+                "years_ahead": years_ahead,
+                "class_capacity": class_capacity,
+                "first_year_classes": int(first_year["Total Kelas"]),
+                "last_year_classes": int(last_year["Total Kelas"]),
+                "growth_classes": growth_classes,
+                "growth_students": growth_students,
+            }
+            display_df = forecast_df[
+                [
+                    "year",
+                    "kode_mk",
+                    "nama_mk",
+                    "predicted_enrollment",
+                    "classes_needed",
+                    "total_capacity",
+                ]
+            ].copy()
+            display_df.columns = [
+                "Tahun",
+                "Kode MK",
+                "Nama MK",
+                "Prediksi",
+                "Kelas",
+                "Kapasitas",
+            ]
+            display_df["Prediksi"] = display_df["Prediksi"].round(0).astype(int)
+            display_df = display_df.sort_values(["Kode MK", "Tahun"])
+            return ForecastResult(
+                summary_data=summary_data,
+                forecast_df=display_df,
+                yearly_summary=yearly_summary,
+            )
+        except Exception as e:
+            logger.error(f"Error generating forecast: {e}", exc_info=True)
+            return ForecastResult(
+                summary_data={},
+                forecast_df=pd.DataFrame(),
+                yearly_summary=pd.DataFrame(),
+                error=str(e),
+            )
+_backend_instance: Optional[PredictionBackend] = None
+def get_backend() -> PredictionBackend:
+    """Get the singleton backend instance."""
+    global _backend_instance
+    if _backend_instance is None:
+        _backend_instance = PredictionBackend()
+    return _backend_instance
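
The `get_backend()` accessor above is a lazy module-level singleton: the backend is built on the first call and reused afterwards. A minimal standalone sketch of that pattern follows. `Backend` is a hypothetical stand-in for `PredictionBackend`, and its `created` counter exists only to make the behaviour observable.

```python
from typing import Optional


class Backend:
    """Hypothetical stand-in for PredictionBackend; counts constructions."""

    created = 0

    def __init__(self) -> None:
        Backend.created += 1


# Module-level cache; stays None until the first call to get_backend().
_instance: Optional[Backend] = None


def get_backend() -> Backend:
    """Create the backend on first use, then return the same object."""
    global _instance
    if _instance is None:
        _instance = Backend()
    return _instance
```

Note that this form is not thread-safe: two threads calling it at the same time could each construct an instance. Whether that matters for this Space depends on how the app is served, which the diff does not show.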

config.py CHANGED Viewed

@@ -1,29 +1,21 @@
 from dataclasses import dataclass, field
 from typing import Dict, List
-import os
-# Import data loader for private HF dataset support
 try:
     from data_loader import load_data_file
     DATA_LOADER_AVAILABLE = True
 except ImportError:
     DATA_LOADER_AVAILABLE = False
     def load_data_file() -> str:
-        """Fallback if data_loader not available."""
         return "data/optimized_data.xlsx"
 def _get_data_file_path() -> str:
-    """
-    Get data file path based on environment.
-    Priority:
-    1. If HF_TOKEN set: Load from private HF dataset (muhalwan/optimized_data_mhs)
-    2. If DEMO_MODE=true: Use demo_data.xlsx (anonymized)
-    3. Otherwise: Use local optimized_data.xlsx
-    """
     if os.getenv("HF_TOKEN"):
-        return load_data_file()  # Loads from HF dataset if HF_TOKEN is set
     elif os.getenv("DEMO_MODE", "false").lower() == "true":
         return "data/demo_data.xlsx"
     else:
@@ -32,12 +24,8 @@ def _get_data_file_path() -> str:
 @dataclass
 class DataConfig:
-    """Data source configuration and validation rules."""
-    # Data file path - automatically determined based on environment
     FILE_PATH: str = field(default_factory=_get_data_file_path)
-    # Sheet mappings
     SHEET_COURSES: str = "tabel1_data_matkul"
     SHEET_OFFERINGS: str = "tabel2_data_matkul_dibuka"
     SHEET_STUDENTS_YEARLY: str = "tabel3_data_mahasiswa_per_tahun"
@@ -48,20 +36,55 @@ class DataConfig:
         default_factory=lambda: {"tahun": "thn", "semester": "smt"}
     )
-    # Elective Course Identification
-    # IMPORTANT: Elective courses are identified by kategori_mk = 'P' in tabel1
-    # Mandatory/Required courses have kategori_mk = 'W'
     ELECTIVE_CATEGORY: str = "P"
     MANDATORY_CATEGORY: str = "W"
-    # Valid category values (will be normalized to uppercase)
     VALID_CATEGORIES: List[str] = field(default_factory=lambda: ["P", "W"])
 @dataclass
-class ModelConfig:
-    """Prophet model hyperparameters and prediction strategies."""
     # Prophet Hyperparameters
     GROWTH_MODE: str = "logistic"
     CHANGEPOINT_SCALE: float = 0.01
@@ -75,6 +98,11 @@ class ModelConfig:
     # Minimum historical data points required for reliable prediction
     MIN_HISTORY_POINTS: int = 3
 @dataclass
 class PredictionConfig:
@@ -83,7 +111,6 @@ class PredictionConfig:
     PREDICT_YEAR: int = 2025
     PREDICT_SEMESTER: int = 2
-    # Buffer Calculations
     BUFFER_PERCENT: float = 0.20
     MIN_QUOTA_OPEN: int = 25
     MIN_PREDICT_THRESHOLD: int = 15
@@ -101,8 +128,6 @@ class PredictionConfig:
 @dataclass
 class OutputConfig:
-    """Output settings."""
     OUTPUT_DIR: str = "output"
     LOG_LEVEL: str = "INFO"
     TOP_N_DISPLAY: int = 30
@@ -110,8 +135,6 @@ class OutputConfig:
 @dataclass
 class BacktestConfig:
-    """Backtest settings and validation."""
     START_YEAR: int = 2010
     END_YEAR: int = 2024
     VERBOSE: bool = True
@@ -123,48 +146,65 @@ class BacktestConfig:
 class Config:
-    """
-    Master Config Object.
-    ELECTIVE COURSE DEFINITION:
-    ---------------------------
-    Elective courses are identified by kategori_mk = 'P' in tabel1_data_matkul.
-    This is the ONLY source of truth for course categories.
-    Examples of elective courses (kategori_mk = 'P'):
-    - EF234607: Keamanan Aplikasi
-    - EF234613: Game Edukasi dan Simulasi
-    - UG234922: Kebudayaan dan Kebangsaan
-    - IW184301: Sistem Basis Data
-    - KI series: Various computer science electives
-    Mandatory courses have kategori_mk = 'W' (Wajib).
-    DATA REQUIREMENTS FOR BACKTESTING:
-    -----------------------------------
-    To backtest a semester, you need:
-    1. Course catalog (tabel1) with kategori_mk properly set
-    2. ACTUAL student enrollments (tabel4) for that semester
-    3. At least one elective course with enrollments
-    Note: Course offerings (tabel2) alone are NOT sufficient for backtesting.
-    You must have actual enrollment data (tabel4) to validate predictions.
-    """
     def __init__(self):
         self.data: DataConfig = DataConfig()
         self.model: ModelConfig = ModelConfig()
         self.prediction: PredictionConfig = PredictionConfig()
         self.output: OutputConfig = OutputConfig()
         self.backtest: BacktestConfig = BacktestConfig()
     def get_prediction_target_name(self) -> str:
         sem = "Ganjil" if self.prediction.PREDICT_SEMESTER == 1 else "Genap"
         return f"{self.prediction.PREDICT_YEAR} Semester {sem}"
     def get_elective_filter_description(self) -> str:
-        """Get human-readable description of elective identification."""
         return f"kategori_mk = '{self.data.ELECTIVE_CATEGORY}' in {self.data.SHEET_COURSES}"
 default_config = Config()

+import os
 from dataclasses import dataclass, field
 from typing import Dict, List
 try:
     from data_loader import load_data_file
     DATA_LOADER_AVAILABLE = True
 except ImportError:
     DATA_LOADER_AVAILABLE = False
     def load_data_file() -> str:
         return "data/optimized_data.xlsx"
 def _get_data_file_path() -> str:
     if os.getenv("HF_TOKEN"):
+        return load_data_file()
     elif os.getenv("DEMO_MODE", "false").lower() == "true":
         return "data/demo_data.xlsx"
     else:
 @dataclass
 class DataConfig:
     FILE_PATH: str = field(default_factory=_get_data_file_path)
     SHEET_COURSES: str = "tabel1_data_matkul"
     SHEET_OFFERINGS: str = "tabel2_data_matkul_dibuka"
     SHEET_STUDENTS_YEARLY: str = "tabel3_data_mahasiswa_per_tahun"
         default_factory=lambda: {"tahun": "thn", "semester": "smt"}
     )
     ELECTIVE_CATEGORY: str = "P"
     MANDATORY_CATEGORY: str = "W"
     VALID_CATEGORIES: List[str] = field(default_factory=lambda: ["P", "W"])
 @dataclass
+class ClassCapacityConfig:
+    # Default maximum students per class
+    DEFAULT_CLASS_CAPACITY: int = 50
+    # Minimum students required to open a class
+    MIN_STUDENTS_TO_OPEN_CLASS: int = 1
+    # Threshold for opening additional classes
+    ADDITIONAL_CLASS_THRESHOLD: float = 0.7
+    # Always open at least 1 class if there's any historical enrollment
+    OPEN_CLASS_IF_HAS_HISTORY: bool = True
+    # Course-specific capacity overrides (kode_mk -> max_capacity)
+    COURSE_CAPACITY_OVERRIDES: Dict[str, int] = field(default_factory=dict)
+    # Warning threshold - if predicted > capacity * threshold, warn about capacity
+    CAPACITY_WARNING_THRESHOLD: float = 0.8
+    # Enable capacity-aware prediction
+    # When True, predictions will be bounded by realistic capacity constraints
+    ENABLE_CAPACITY_CONSTRAINTS: bool = True
+@dataclass
+class MultiYearForecastConfig:
+    # How many years ahead to forecast
+    FORECAST_YEARS_AHEAD: int = 3
+    # Include trend analysis in output
+    SHOW_TREND_ANALYSIS: bool = True
+    # Confidence interval for forecasts (0-1)
+    CONFIDENCE_INTERVAL: float = 0.95
+    # Growth rate limits for sanity checking
+    MAX_YEARLY_GROWTH_RATE: float = 0.5  # 50% max growth per year
+    MIN_YEARLY_GROWTH_RATE: float = -0.3  # 30% max decline per year
+@dataclass
+class ModelConfig:
     # Prophet Hyperparameters
     GROWTH_MODE: str = "logistic"
     CHANGEPOINT_SCALE: float = 0.01
     # Minimum historical data points required for reliable prediction
     MIN_HISTORY_POINTS: int = 3
+    # Use student population as regressor
+    USE_POPULATION_REGRESSOR: bool = True
+    # Use capacity as upper bound (cap in logistic growth)
+    USE_CAPACITY_AS_CAP: bool = True
 @dataclass
 class PredictionConfig:
     PREDICT_YEAR: int = 2025
     PREDICT_SEMESTER: int = 2
     BUFFER_PERCENT: float = 0.20
     MIN_QUOTA_OPEN: int = 25
     MIN_PREDICT_THRESHOLD: int = 15
 @dataclass
 class OutputConfig:
     OUTPUT_DIR: str = "output"
     LOG_LEVEL: str = "INFO"
     TOP_N_DISPLAY: int = 30
 @dataclass
 class BacktestConfig:
     START_YEAR: int = 2010
     END_YEAR: int = 2024
     VERBOSE: bool = True
 class Config:
     def __init__(self):
         self.data: DataConfig = DataConfig()
         self.model: ModelConfig = ModelConfig()
         self.prediction: PredictionConfig = PredictionConfig()
         self.output: OutputConfig = OutputConfig()
         self.backtest: BacktestConfig = BacktestConfig()
+        self.class_capacity: ClassCapacityConfig = ClassCapacityConfig()
+        self.multi_year: MultiYearForecastConfig = MultiYearForecastConfig()
     def get_prediction_target_name(self) -> str:
         sem = "Ganjil" if self.prediction.PREDICT_SEMESTER == 1 else "Genap"
         return f"{self.prediction.PREDICT_YEAR} Semester {sem}"
     def get_elective_filter_description(self) -> str:
         return f"kategori_mk = '{self.data.ELECTIVE_CATEGORY}' in {self.data.SHEET_COURSES}"
+    def get_class_capacity(self, course_code: str) -> int:
+        if course_code in self.class_capacity.COURSE_CAPACITY_OVERRIDES:
+            return self.class_capacity.COURSE_CAPACITY_OVERRIDES[course_code]
+        return self.class_capacity.DEFAULT_CLASS_CAPACITY
+    def calculate_classes_needed(
+        self,
+        predicted_enrollment: float,
+        course_code: str,
+        has_historical_data: bool = True,
+    ) -> int:
+        import math
+        capacity = self.get_class_capacity(course_code)
+        if predicted_enrollment <= 0:
+            return 0
+        if predicted_enrollment < 1 and has_historical_data:
+            return 1
+        classes = math.ceil(predicted_enrollment / capacity)
+        return max(1, classes)
+    def get_capacity_status(self, predicted_enrollment: float, course_code: str) -> str:
+        capacity = self.get_class_capacity(course_code)
+        classes_needed = self.calculate_classes_needed(
+            predicted_enrollment, course_code
+        )
+        if classes_needed == 0:
+            return "UNDER"
+        total_capacity = classes_needed * capacity
+        utilization = predicted_enrollment / total_capacity
+        if utilization >= 1.0:
+            return "OVER"
+        elif utilization >= self.class_capacity.CAPACITY_WARNING_THRESHOLD:
+            return "WARNING"
+        else:
+            return "NORMAL"
 default_config = Config()
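
The class-count and capacity-status arithmetic added to `Config` can be checked in isolation. The sketch below restates the same rules as free functions, assuming the defaults shown in the diff (capacity 50, warning threshold 0.8) and leaving out the per-course overrides. The function names are illustrative, not part of the commit.

```python
import math

DEFAULT_CAPACITY = 50    # mirrors DEFAULT_CLASS_CAPACITY in the diff
WARNING_THRESHOLD = 0.8  # mirrors CAPACITY_WARNING_THRESHOLD in the diff


def classes_needed(predicted: float, capacity: int = DEFAULT_CAPACITY) -> int:
    """Round predicted enrollment up to whole classes; zero demand opens none."""
    if predicted <= 0:
        return 0
    return max(1, math.ceil(predicted / capacity))


def capacity_status(predicted: float, capacity: int = DEFAULT_CAPACITY) -> str:
    """Classify how full the opened classes would be."""
    n = classes_needed(predicted, capacity)
    if n == 0:
        return "UNDER"
    utilization = predicted / (n * capacity)
    if utilization >= 1.0:
        return "OVER"
    if utilization >= WARNING_THRESHOLD:
        return "WARNING"
    return "NORMAL"
```

Because the class count is rounded up, total capacity is never below the prediction, so utilization cannot exceed 1.0. In this sketch "OVER" therefore appears only when the prediction is an exact multiple of the capacity (for example 50 students in one class of 50), while 60 students yields two classes at 60% utilization and reads as "NORMAL".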

data_loader.py CHANGED Viewed

@@ -12,8 +12,7 @@ def load_data_file() -> str:
         try:
             from huggingface_hub import hf_hub_download
-            logger.info("🔐 Loading data from private Hugging Face dataset...")
-            logger.info("   Dataset: muhalwan/optimized_data_mhs")
             file_path = hf_hub_download(
                 repo_id="muhalwan/optimized_data_mhs",
@@ -23,34 +22,22 @@ def load_data_file() -> str:
                 cache_dir="./hf_cache",
             )
-            logger.info("✓ Data loaded successfully from HF dataset")
-            logger.info(f"   Cached at: {file_path}")
             return file_path
-        except ImportError:
-            logger.error(
-                "huggingface_hub not installed. Install with: pip install huggingface_hub"
-            )
-            raise
         except Exception as e:
             logger.error(f"Failed to download from HF dataset: {e}")
-            logger.error("Falling back to local file if available...")
     local_path = "data/optimized_data.xlsx"
     if Path(local_path).exists():
-        logger.info(f"📁 Loading data from local file: {local_path}")
         return local_path
-    error_msg = (
-        "No data file found!\n"
-        "Options:\n"
-        "1. Set HF_TOKEN environment variable to load from private dataset\n"
-        "2. Place optimized_data.xlsx in data/ folder for local development\n"
     )
-    logger.error(error_msg)
-    raise FileNotFoundError(error_msg)
 def get_data_source_info() -> dict:
@@ -69,21 +56,14 @@ def get_data_source_info() -> dict:
 if __name__ == "__main__":
     logging.basicConfig(level=logging.INFO)
-    print("=" * 80)
-    print("Data Source Information")
-    print("=" * 80)
     info = get_data_source_info()
     for key, value in info.items():
         print(f"  {key}: {value}")
-    print("\n" + "=" * 80)
-    print("Attempting to load data...")
-    print("=" * 80)
     try:
         file_path = load_data_file()
-        print(f"\n✓ Success! Data file: {file_path}")
     except Exception as e:
-        print(f"\n✗ Failed: {e}")

         try:
             from huggingface_hub import hf_hub_download
+            logger.info("Dataset: muhalwan/optimized_data_mhs")
             file_path = hf_hub_download(
                 repo_id="muhalwan/optimized_data_mhs",
                 cache_dir="./hf_cache",
             )
+            logger.info("Data loaded successfully from HF dataset")
             return file_path
         except Exception as e:
             logger.error(f"Failed to download from HF dataset: {e}")
     local_path = "data/optimized_data.xlsx"
     if Path(local_path).exists():
+        logger.info(f"Loading data from local file: {local_path}")
         return local_path
+    raise FileNotFoundError(
+        "No data source available. Either set HF_TOKEN environment variable "
+        "or place data file at 'data/optimized_data.xlsx'"
     )
 def get_data_source_info() -> dict:
 if __name__ == "__main__":
     logging.basicConfig(level=logging.INFO)
+    print("Data Information")
     info = get_data_source_info()
     for key, value in info.items():
         print(f"  {key}: {value}")
     try:
         file_path = load_data_file()
+        print(f"\nSuccess! Data file: {file_path}")
     except Exception as e:
+        print(f"\nFailed: {e}")

data_processor.py CHANGED Viewed

@@ -1,5 +1,5 @@
 import logging
-from typing import Dict, Set, Tuple
 import numpy as np
 import pandas as pd
@@ -22,7 +22,6 @@ class DataProcessor:
         return self._preprocess()
     def _load_excel(self):
-        logger.info(f"Loading {self.config.data.FILE_PATH}...")
         try:
             sheets = pd.read_excel(self.config.data.FILE_PATH, sheet_name=None)
             self.raw_data = {
@@ -36,7 +35,6 @@ class DataProcessor:
             raise
     def _validate_raw_data(self):
-        """Validate required columns and log data quality metrics."""
         req_cols = {
             "courses": ["kode_mk", "kategori_mk"],
             "students_ind": ["kode_mk", "thn", "smt", "kode_mhs"],
@@ -47,46 +45,146 @@ class DataProcessor:
             if not all(col in self.raw_data[key].columns for col in cols):
                 raise ValueError(f"Missing columns in {key}: {cols}")
-        # Log data quality metrics
-        self._log_data_quality()
-    def _log_data_quality(self):
-        """Log data quality metrics for monitoring."""
-        courses_df = self.raw_data["courses"]
-        students_df = self.raw_data["students_ind"]
-        logger.info("=" * 60)
-        logger.info("Data Quality Report:")
-        logger.info(f"  Courses (tabel1): {len(courses_df)} records")
-        logger.info(f"    - Unique courses: {courses_df['kode_mk'].nunique()}")
         logger.info(
-            f"    - Duplicates: {len(courses_df) - courses_df['kode_mk'].nunique()}"
         )
-        logger.info(f"  Students (tabel4): {len(students_df)} records")
-        logger.info(f"    - Unique students: {students_df['kode_mhs'].nunique()}")
-        logger.info("=" * 60)
     def _clean_courses_data(self, courses: pd.DataFrame) -> pd.DataFrame:
-        """
-        Clean and standardize course catalog data.
-        Cleaning steps:
-        1. Remove exact duplicates
-        2. Standardize kategori_mk values (uppercase, strip whitespace)
-        3. Remove courses with invalid/missing data
-        4. Keep first occurrence for duplicate course codes
-        5. Validate kategori_mk values
-        """
         initial_count = len(courses)
-        # Step 1: Remove exact duplicate rows
         courses = courses.drop_duplicates()
         if len(courses) < initial_count:
             logger.info(
                 f"  Removed {initial_count - len(courses)} exact duplicate rows"
             )
-        # Step 2: Standardize kategori_mk
         courses["kategori_mk"] = (
             courses["kategori_mk"]
             .astype(str)
@@ -95,7 +193,7 @@ class DataProcessor:
             .replace("", np.nan)
         )
-        # Step 3: Remove rows with missing critical data
         before_dropna = len(courses)
         courses = courses.dropna(subset=["kode_mk", "kategori_mk"])
         if len(courses) < before_dropna:
@@ -103,7 +201,7 @@ class DataProcessor:
                 f"  Removed {before_dropna - len(courses)} rows with missing kode_mk or kategori_mk"
             )
-        # Step 4: Validate kategori_mk values (should be P or W)
         valid_categories = {"P", "W"}
         invalid_mask = ~courses["kategori_mk"].isin(valid_categories)
         if invalid_mask.any():
@@ -114,7 +212,7 @@ class DataProcessor:
             logger.warning("  Keeping only valid categories (P, W)")
             courses = courses[~invalid_mask]
-        # Step 5: Remove duplicate course codes (keep first)
         before_dedup = len(courses)
         courses = courses.drop_duplicates(subset="kode_mk", keep="first")
         if len(courses) < before_dedup:
@@ -127,29 +225,20 @@ class DataProcessor:
         return courses
     def _clean_students_data(self, students: pd.DataFrame) -> pd.DataFrame:
-        """
-        Clean and validate student enrollment data.
-        Cleaning steps:
-        1. Remove rows with missing critical data
-        2. Standardize data types
-        3. Remove invalid year/semester values
-        4. Remove duplicate enrollment records
-        """
         initial_count = len(students)
-        # Step 1: Remove rows with missing critical data
         students = students.dropna(subset=["kode_mk", "thn", "smt", "kode_mhs"])
         if len(students) < initial_count:
             logger.info(
                 f"  Removed {initial_count - len(students)} rows with missing critical data"
             )
-        # Step 2: Ensure correct data types
         students["thn"] = pd.to_numeric(students["thn"], errors="coerce")
         students["smt"] = pd.to_numeric(students["smt"], errors="coerce")
-        # Step 3: Remove rows with invalid year/semester after conversion
         before_invalid = len(students)
         students = students.dropna(subset=["thn", "smt"])
         if len(students) < before_invalid:
@@ -157,8 +246,8 @@ class DataProcessor:
                 f"  Removed {before_invalid - len(students)} rows with invalid year/semester values"
             )
-        # Step 4: Validate semester values (should be 1, 2, or 3)
-        valid_semesters = {1, 2, 3}
         invalid_sem = ~students["smt"].isin(valid_semesters)
         if invalid_sem.any():
             logger.warning(
@@ -166,7 +255,7 @@ class DataProcessor:
             )
             students = students[~invalid_sem]
-        # Step 5: Validate year range (reasonable academic years)
         current_year = pd.Timestamp.now().year
         invalid_year = (students["thn"] < 2000) | (students["thn"] > current_year + 1)
         if invalid_year.any():
@@ -175,7 +264,7 @@ class DataProcessor:
             )
             students = students[~invalid_year]
-        # Step 6: Remove exact duplicate enrollments (same student, course, semester)
         before_dedup = len(students)
         students = students.drop_duplicates(
             subset=["kode_mhs", "kode_mk", "thn", "smt"], keep="first"
@@ -190,14 +279,6 @@ class DataProcessor:
         return students
     def _clean_yearly_population(self, yearly_pop: pd.DataFrame) -> pd.DataFrame:
-        """
-        Clean and validate yearly student population data.
-        Cleaning steps:
-        1. Remove duplicates
-        2. Validate and fill missing population data
-        3. Ensure chronological order
-        """
         # Remove duplicate year-semester combinations
         before_dedup = len(yearly_pop)
         yearly_pop = yearly_pop.drop_duplicates(subset=["thn", "smt"], keep="first")
@@ -211,7 +292,7 @@ class DataProcessor:
             yearly_pop["jumlah_aktif"], errors="coerce"
         )
-        # Replace zero or negative values with NaN (will be filled later)
         yearly_pop.loc[yearly_pop["jumlah_aktif"] <= 0, "jumlah_aktif"] = np.nan
         # Sort by year and semester
@@ -222,20 +303,14 @@ class DataProcessor:
         return yearly_pop
     def _preprocess(self) -> Tuple[pd.DataFrame, Set[str]]:
-        """Clean, merge, and aggregate data with comprehensive cleaning."""
-        logger.info("Preprocessing data...")
-        logger.info("-" * 60)
-        # Step 1: Clean course catalog
-        logger.info("Step 1: Cleaning course catalog...")
         courses = self._clean_courses_data(self.raw_data["courses"].copy())
-        # Step 2: Identify elective courses
         elective_category = self.config.data.ELECTIVE_CATEGORY
         self.elective_codes = set(
             courses[courses["kategori_mk"] == elective_category]["kode_mk"]
         )
-        logger.info(f"Step 2: Identified {len(self.elective_codes)} elective courses")
         if len(self.elective_codes) == 0:
             logger.warning(
@@ -246,88 +321,52 @@ class DataProcessor:
             )
             return pd.DataFrame(), set()
-        # Step 3: Clean student enrollment data
-        logger.info("Step 3: Cleaning student enrollment data...")
         students = self._clean_students_data(self.raw_data["students_ind"].copy())
-        # Step 4: Filter for elective courses only
         students = students[students["kode_mk"].isin(self.elective_codes)]
-        logger.info(f"Step 4: Filtered to {len(students)} elective enrollment records")
         if len(students) == 0:
             logger.warning("No enrollment data found for elective courses!")
             return pd.DataFrame(), self.elective_codes
-        # Step 5: Aggregate enrollment by course-semester
-        logger.info("Step 5: Aggregating enrollment data...")
         enrollment = (
             students.groupby(["kode_mk", "thn", "smt"])["kode_mhs"]
             .nunique()
             .reset_index(name="enrollment")
         )
-        logger.info(f"  Created {len(enrollment)} course-semester enrollment records")
-        # Step 6: Clean yearly population data
-        logger.info("Step 6: Cleaning yearly population data...")
         yearly_pop = self._clean_yearly_population(
             self.raw_data["students_yearly"][["thn", "smt", "jumlah_aktif"]].copy()
         )
-        # Step 7: Merge enrollment with population data
-        logger.info("Step 7: Merging enrollment with population data...")
         df = enrollment.merge(yearly_pop, on=["thn", "smt"], how="left")
-        # Step 8: Handle missing population data
         missing_pop = df["jumlah_aktif"].isna().sum()
         if missing_pop > 0:
-            logger.warning(
-                f"  {missing_pop} records missing population data - filling with interpolation"
-            )
             df["jumlah_aktif"] = df["jumlah_aktif"].ffill().bfill()
-            # If still missing, use a reasonable default
             if df["jumlah_aktif"].isna().any():
-                default_pop = 500  # Reasonable default student population
-                logger.warning(
-                    f"  Some population data still missing - using default: {default_pop}"
-                )
                 df["jumlah_aktif"] = df["jumlah_aktif"].fillna(default_pop)
-        # Step 9: Validate enrollment data
-        logger.info("Step 8: Validating final enrollment data...")
         df = self._validate_enrollment_data(df)
-        # Step 10: Sort and finalize
         df = df.sort_values(["kode_mk", "thn", "smt"]).reset_index(drop=True)
         self.processed_data = df
-        logger.info("-" * 60)
-        logger.info(
-            f"✓ Preprocessing complete. {len(df)} enrollment records generated."
-        )
-        logger.info(f"✓ Year range: {df['thn'].min():.0f} - {df['thn'].max():.0f}")
-        logger.info(f"✓ Courses with data: {df['kode_mk'].nunique()}")
-        logger.info("-" * 60)
         return df, self.elective_codes
     def _validate_enrollment_data(self, df: pd.DataFrame) -> pd.DataFrame:
-        """
-        Validate and clean the final enrollment dataset.
-        Checks:
-        1. Remove records with zero enrollment
-        2. Check for outliers
-        3. Validate population data
-        """
-        initial_count = len(df)
         # Remove zero enrollments
         df = df[df["enrollment"] > 0]
-        if len(df) < initial_count:
-            logger.info(
-                f"  Removed {initial_count - len(df)} records with zero enrollment"
-            )
         # Check for extreme outliers in enrollment
         for course in df["kode_mk"].unique():
@@ -335,7 +374,7 @@ class DataProcessor:
             if len(course_data) > 1:
                 q75, q25 = course_data.quantile([0.75, 0.25])
                 iqr = q75 - q25
-                upper_bound = q75 + (3 * iqr)  # Using 3*IQR for outliers
                 outliers = course_data > upper_bound
                 if outliers.any():
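
The outlier check in `_validate_enrollment_data` uses an upper fence of `q75 + 3 * IQR` per course. The sketch below reproduces that fence with the standard library only, as an assumption-laden illustration: it swaps pandas' `quantile` for `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation, and it only flags values, since the hunk does not show what the original code does with them.

```python
import statistics
from typing import List


def flag_outliers(enrollment: List[float], k: float = 3.0) -> List[bool]:
    """Flag values above q75 + k * IQR (upper fence only, as in the diff)."""
    if len(enrollment) < 2:
        # Too few points to estimate quartiles; flag nothing.
        return [False] * len(enrollment)
    q25, _median, q75 = statistics.quantiles(enrollment, n=4, method="inclusive")
    upper_bound = q75 + k * (q75 - q25)
    return [value > upper_bound for value in enrollment]
```

For enrollments of 20, 22, 25, 24, 23 and 200, the quartiles are 22.25 and 24.75, the fence is 32.25, and only the value 200 is flagged.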

 import logging
+from typing import Dict, Optional, Set, Tuple
 import numpy as np
 import pandas as pd
         return self._preprocess()
     def _load_excel(self):
         try:
             sheets = pd.read_excel(self.config.data.FILE_PATH, sheet_name=None)
             self.raw_data = {
             raise
     def _validate_raw_data(self):
         req_cols = {
             "courses": ["kode_mk", "kategori_mk"],
             "students_ind": ["kode_mk", "thn", "smt", "kode_mhs"],
             if not all(col in self.raw_data[key].columns for col in cols):
                 raise ValueError(f"Missing columns in {key}: {cols}")
+    def get_actual_classes_opened(
+        self, year: int, semester: int, course_code: Optional[str] = None
+    ) -> Dict[str, int]:
+        offerings = self.raw_data.get("offerings")
+        if offerings is None or len(offerings) == 0:
+            logger.warning("No offerings data (tabel2) available")
+            return {}
+        # Standardize column names
+        offerings = offerings.copy()
+        for old_col, new_col in self.config.data.OFFERINGS_RENAME.items():
+            if old_col in offerings.columns and new_col not in offerings.columns:
+                offerings = offerings.rename(columns={old_col: new_col})
+        # Log column names for debugging
+        logger.debug(f"Offerings columns: {offerings.columns.tolist()}")
+        # Filter by year and semester
+        mask = (offerings["thn"] == year) & (offerings["smt"] == semester)
+        if course_code:
+            mask = mask & (offerings["kode_mk"] == course_code)
+        filtered = offerings[mask]
+        if len(filtered) == 0:
+            logger.info(f"No class offerings found for {year} semester {semester}")
+            return {}
+        class_id_candidates = [
+            "kelas_id",
+            "id_kelas",
+            "kode_kelas",
+            "class_id",
+            "kelas",
+            "section_id",
+            "section",
+        ]
+        class_id_col = None
+        for col in class_id_candidates:
+            if col in filtered.columns:
+                class_id_col = col
+                logger.debug(f"Using class ID column: {col}")
+                break
+        if class_id_col is None:
+            cols = filtered.columns.tolist()
+            if len(cols) > 2:
+                potential_id_col = cols[2]
+                non_id_cols = [
+                    "nama_mk",
+                    "smt",
+                    "thn",
+                    "semester",
+                    "tahun",
+                    "kuota",
+                    "kapasitas",
+                ]
+                if potential_id_col.lower() not in non_id_cols:
+                    class_id_col = potential_id_col
+                    logger.debug(
+                        f"Using positional class ID column (index 2): {potential_id_col}"
+                    )
+        result = {}
+        for kode_mk in filtered["kode_mk"].unique():
+            course_data = filtered[filtered["kode_mk"] == kode_mk]
+            if class_id_col and class_id_col in course_data.columns:
+                unique_classes = course_data[class_id_col].nunique()
+                logger.debug(
+                    f"Course {kode_mk}: {len(course_data)} rows, {unique_classes} unique classes (by {class_id_col})"
+                )
+            else:
+                all_cols = course_data.columns.tolist()
+                dosen_cols = [
+                    col
+                    for col in all_cols
+                    if "dosen" in col.lower()
+                    or "pengajar" in col.lower()
+                    or "teacher" in col.lower()
+                ]
+                if len(all_cols) > 0:
+                    last_col = all_cols[-1]
+                    if last_col not in dosen_cols:
+                        non_last_cols = [c for c in all_cols if c != last_col]
+                        if len(non_last_cols) > 0:
+                            grouped = course_data.groupby(non_last_cols)[
+                                last_col
+                            ].nunique()
+                            if (grouped > 1).any():
+                                dosen_cols.append(last_col)
+                non_dosen_cols = [col for col in all_cols if col not in dosen_cols]
+                if non_dosen_cols:
+                    unique_classes = len(
+                        course_data.drop_duplicates(subset=non_dosen_cols)
+                    )
+                else:
+                    unique_classes = len(course_data.drop_duplicates())
+                logger.debug(
+                    f"Course {kode_mk}: {len(course_data)} rows, {unique_classes} unique classes (fallback method)"
+                )
+            result[kode_mk] = max(1, unique_classes)
         logger.info(
+            f"Found {len(result)} courses with {sum(result.values())} total classes for {year} sem {semester}"
+        )
+        return result
+    def get_class_count_for_validation(self, year: int, semester: int) -> pd.DataFrame:
+        actual_classes = self.get_actual_classes_opened(year, semester)
+        if not actual_classes:
+            return pd.DataFrame(columns=["kode_mk", "actual_classes"])
+        return pd.DataFrame(
+            [
+                {"kode_mk": kode, "actual_classes": count}
+                for kode, count in actual_classes.items()
+            ]
         )
     def _clean_courses_data(self, courses: pd.DataFrame) -> pd.DataFrame:
         initial_count = len(courses)
+        # Remove exact duplicate rows
         courses = courses.drop_duplicates()
         if len(courses) < initial_count:
             logger.info(
                 f"  Removed {initial_count - len(courses)} exact duplicate rows"
             )
+        # Standardize kategori_mk
         courses["kategori_mk"] = (
             courses["kategori_mk"]
             .astype(str)
             .replace("", np.nan)
         )
+        # Remove rows with missing critical data
         before_dropna = len(courses)
         courses = courses.dropna(subset=["kode_mk", "kategori_mk"])
         if len(courses) < before_dropna:
             logger.info(
                 f"  Removed {before_dropna - len(courses)} rows with missing kode_mk or kategori_mk"
             )
+        # Validate kategori_mk values
         valid_categories = {"P", "W"}
         invalid_mask = ~courses["kategori_mk"].isin(valid_categories)
         if invalid_mask.any():
             logger.warning("  Keeping only valid categories (P, W)")
             courses = courses[~invalid_mask]
+        # Remove duplicate course codes (keep first)
         before_dedup = len(courses)
         courses = courses.drop_duplicates(subset="kode_mk", keep="first")
         if len(courses) < before_dedup:
         return courses
     def _clean_students_data(self, students: pd.DataFrame) -> pd.DataFrame:
         initial_count = len(students)
+        # Remove rows with missing critical data
         students = students.dropna(subset=["kode_mk", "thn", "smt", "kode_mhs"])
         if len(students) < initial_count:
             logger.info(
                 f"  Removed {initial_count - len(students)} rows with missing critical data"
             )
+        # Ensure correct data types
         students["thn"] = pd.to_numeric(students["thn"], errors="coerce")
         students["smt"] = pd.to_numeric(students["smt"], errors="coerce")
+        # Remove rows with invalid year/semester after conversion
         before_invalid = len(students)
         students = students.dropna(subset=["thn", "smt"])
         if len(students) < before_invalid:
             logger.info(
                 f"  Removed {before_invalid - len(students)} rows with invalid year/semester values"
             )
+        # Validate semester values
+        valid_semesters = {1, 2}
         invalid_sem = ~students["smt"].isin(valid_semesters)
         if invalid_sem.any():
             logger.warning(
             )
             students = students[~invalid_sem]
+        # Validate year range
         current_year = pd.Timestamp.now().year
         invalid_year = (students["thn"] < 2000) | (students["thn"] > current_year + 1)
         if invalid_year.any():
             )
             students = students[~invalid_year]
+        # Remove exact duplicate enrollments (same student, course, semester)
         before_dedup = len(students)
         students = students.drop_duplicates(
             subset=["kode_mhs", "kode_mk", "thn", "smt"], keep="first"
         return students
     def _clean_yearly_population(self, yearly_pop: pd.DataFrame) -> pd.DataFrame:
         # Remove duplicate year-semester combinations
         before_dedup = len(yearly_pop)
         yearly_pop = yearly_pop.drop_duplicates(subset=["thn", "smt"], keep="first")
             yearly_pop["jumlah_aktif"], errors="coerce"
         )
+        # Replace zero or negative values with NaN
         yearly_pop.loc[yearly_pop["jumlah_aktif"] <= 0, "jumlah_aktif"] = np.nan
         # Sort by year and semester
         return yearly_pop
     def _preprocess(self) -> Tuple[pd.DataFrame, Set[str]]:
+        # Clean course catalog
         courses = self._clean_courses_data(self.raw_data["courses"].copy())
+        # Identify elective courses
         elective_category = self.config.data.ELECTIVE_CATEGORY
         self.elective_codes = set(
             courses[courses["kategori_mk"] == elective_category]["kode_mk"]
         )
         if len(self.elective_codes) == 0:
             logger.warning(
             )
             return pd.DataFrame(), set()
+        # Clean student enrollment data
         students = self._clean_students_data(self.raw_data["students_ind"].copy())
+        # Filter for elective courses only
         students = students[students["kode_mk"].isin(self.elective_codes)]
         if len(students) == 0:
             logger.warning("No enrollment data found for elective courses!")
             return pd.DataFrame(), self.elective_codes
+        # Aggregate enrollment by course-semester
         enrollment = (
             students.groupby(["kode_mk", "thn", "smt"])["kode_mhs"]
             .nunique()
             .reset_index(name="enrollment")
         )
+        # Clean yearly population data
         yearly_pop = self._clean_yearly_population(
             self.raw_data["students_yearly"][["thn", "smt", "jumlah_aktif"]].copy()
         )
+        # Merge enrollment with population data
         df = enrollment.merge(yearly_pop, on=["thn", "smt"], how="left")
+        # Handle missing population data
         missing_pop = df["jumlah_aktif"].isna().sum()
         if missing_pop > 0:
             df["jumlah_aktif"] = df["jumlah_aktif"].ffill().bfill()
             if df["jumlah_aktif"].isna().any():
+                default_pop = 500
                 df["jumlah_aktif"] = df["jumlah_aktif"].fillna(default_pop)
+        # Validate enrollment data
         df = self._validate_enrollment_data(df)
+        # Sort and finalize
         df = df.sort_values(["kode_mk", "thn", "smt"]).reset_index(drop=True)
         self.processed_data = df
         return df, self.elective_codes
     def _validate_enrollment_data(self, df: pd.DataFrame) -> pd.DataFrame:
         # Remove zero enrollments
         df = df[df["enrollment"] > 0]
         # Check for extreme outliers in enrollment
         for course in df["kode_mk"].unique():
             if len(course_data) > 1:
                 q75, q25 = course_data.quantile([0.75, 0.25])
                 iqr = q75 - q25
+                upper_bound = q75 + (3 * iqr)
                 outliers = course_data > upper_bound
                 if outliers.any():
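
The outlier check in `_validate_enrollment_data` flags any semester whose enrollment exceeds `q75 + 3 * IQR` for that course. The hunk ends before showing what happens to flagged rows, so the following is only a minimal standalone sketch of the bound itself. It assumes linear quantile interpolation (the pandas default); the helper names `quantile` and `flag_high_outliers` and the sample history are hypothetical and do not appear in the repository.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already-sorted list."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def flag_high_outliers(enrollments, k=3.0):
    """Return (upper_bound, values above q75 + k * IQR)."""
    vals = sorted(enrollments)
    q25, q75 = quantile(vals, 0.25), quantile(vals, 0.75)
    upper_bound = q75 + k * (q75 - q25)
    return upper_bound, [v for v in enrollments if v > upper_bound]


# Hypothetical per-semester enrollments for a single course.
history = [20, 22, 25, 24, 21, 120]
bound, outliers = flag_high_outliers(history)
print(bound, outliers)
```

A multiplier of 3 is wider than the common 1.5 fence, so only extreme spikes are flagged and ordinary semester-to-semester variation passes through.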

data_validator.py DELETED Viewed

@@ -1,467 +0,0 @@
-"""
-Data Validation Utility
-Provides pre-flight checks and data quality validation for the enrollment prediction system.
-This module validates data availability, quality, and completeness before processing.
-"""
-import logging
-from dataclasses import dataclass
-from typing import Dict, List, Optional, Tuple
-import pandas as pd
-logger = logging.getLogger(__name__)
-@dataclass
-class ValidationResult:
-    """Result of a validation check."""
-    passed: bool
-    message: str
-    severity: str = "INFO"  # INFO, WARNING, ERROR
-    details: Optional[Dict] = None
-@dataclass
-class SemesterDataStatus:
-    """Status of data availability for a specific semester."""
-    year: int
-    semester: int
-    has_offerings: bool
-    has_enrollments: bool
-    has_elective_enrollments: bool
-    total_enrollments: int
-    elective_enrollments: int
-    elective_courses: Dict[str, int]
-class DataValidator:
-    """Validates data quality and availability for the enrollment prediction system."""
-    def __init__(self, file_path: str):
-        """
-        Initialize the validator.
-        Args:
-            file_path: Path to the Excel data file
-        """
-        self.file_path = file_path
-        self.validation_results: List[ValidationResult] = []
-    def validate_all(self) -> Tuple[bool, List[ValidationResult]]:
-        """
-        Run all validation checks.
-        Returns:
-            Tuple of (all_passed, list of validation results)
-        """
-        logger.info("Running comprehensive data validation...")
-        # Load raw data
-        try:
-            self.raw_data = self._load_raw_data()
-        except Exception as e:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=False,
-                    message=f"Failed to load data: {str(e)}",
-                    severity="ERROR",
-                )
-            )
-            return False, self.validation_results
-        # Run validation checks
-        self._validate_file_structure()
-        self._validate_course_catalog()
-        self._validate_elective_courses()
-        self._validate_enrollment_data()
-        self._validate_population_data()
-        # Overall result
-        all_passed = all(
-            r.passed for r in self.validation_results if r.severity == "ERROR"
-        )
-        return all_passed, self.validation_results
-    def check_semester_data_availability(
-        self, year: int, semester: int
-    ) -> SemesterDataStatus:
-        """
-        Check data availability for a specific semester.
-        Args:
-            year: Academic year
-            semester: Semester (1 or 2)
-        Returns:
-            SemesterDataStatus object with detailed availability info
-        """
-        if not hasattr(self, "raw_data"):
-            self.raw_data = self._load_raw_data()
-        # Check course offerings (tabel2)
-        offerings = self.raw_data["offerings"]
-        has_offerings = (
-            len(
-                offerings[
-                    (offerings["tahun"] == year) & (offerings["semester"] == semester)
-                ]
-            )
-            > 0
-        )
-        # Check enrollments (tabel4)
-        students = self.raw_data["students"]
-        semester_enrollments = students[
-            (students["thn"] == year) & (students["smt"] == semester)
-        ]
-        has_enrollments = len(semester_enrollments) > 0
-        # Check elective enrollments
-        elective_codes = self._get_elective_codes()
-        elective_enrollments = semester_enrollments[
-            semester_enrollments["kode_mk"].isin(elective_codes)
-        ]
-        has_elective_enrollments = len(elective_enrollments) > 0
-        # Get elective courses for this semester
-        elective_courses: Dict[str, int] = {}
-        if has_elective_enrollments:
-            elective_courses = (
-                elective_enrollments.groupby("kode_mk")["kode_mhs"]
-                .nunique()
-                .sort_values(ascending=False)
-                .to_dict()
-            )
-        return SemesterDataStatus(
-            year=year,
-            semester=semester,
-            has_offerings=has_offerings,
-            has_enrollments=has_enrollments,
-            has_elective_enrollments=has_elective_enrollments,
-            total_enrollments=len(semester_enrollments),
-            elective_enrollments=len(elective_enrollments),
-            elective_courses=elective_courses,
-        )
-    def get_available_semesters_for_backtesting(self) -> List[Tuple[int, int]]:
-        """
-        Get list of semesters that have elective enrollment data (suitable for backtesting).
-        Returns:
-            List of (year, semester) tuples
-        """
-        if not hasattr(self, "raw_data"):
-            self.raw_data = self._load_raw_data()
-        students = self.raw_data["students"]
-        elective_codes = self._get_elective_codes()
-        # Filter to elective enrollments only
-        elective_students = students[students["kode_mk"].isin(elective_codes)]
-        # Get unique year-semester combinations
-        available = (
-            elective_students.groupby(["thn", "smt"]).size().reset_index(name="count")
-        )
-        available = available[available["count"] > 0]
-        semesters = [
-            (int(row["thn"]), int(row["smt"])) for _, row in available.iterrows()
-        ]
-        semesters.sort(reverse=True)  # Most recent first
-        return semesters
-    def print_validation_summary(self):
-        """Print a summary of validation results."""
-        if not self.validation_results:
-            print("\nWARNING: No validation has been run yet.")
-            return
-        print("\n" + "=" * 80)
-        print("DATA VALIDATION SUMMARY")
-        print("=" * 80)
-        errors = [r for r in self.validation_results if r.severity == "ERROR"]
-        warnings = [r for r in self.validation_results if r.severity == "WARNING"]
-        info = [r for r in self.validation_results if r.severity == "INFO"]
-        if errors:
-            print(f"\nERROR ({len(errors)}):")
-            for result in errors:
-                print(f"   - {result.message}")
-        if warnings:
-            print(f"\nWARNING ({len(warnings)}):")
-            for result in warnings:
-                print(f"   - {result.message}")
-        if info:
-            print(f"\nINFO ({len(info)}):")
-            for result in info:
-                print(f"   - {result.message}")
-        print("\n" + "=" * 80)
-        if not errors:
-            print("VALIDATION PASSED - Data is ready for processing")
-        else:
-            print("VALIDATION FAILED - Please fix errors before proceeding")
-        print("=" * 80)
-    def _load_raw_data(self) -> Dict[str, pd.DataFrame]:
-        """Load raw data from Excel file."""
-        logger.info(f"Loading data from {self.file_path}...")
-        return {
-            "courses": pd.read_excel(self.file_path, sheet_name="tabel1_data_matkul"),
-            "offerings": pd.read_excel(
-                self.file_path, sheet_name="tabel2_data_matkul_dibuka"
-            ),
-            "population": pd.read_excel(
-                self.file_path, sheet_name="tabel3_data_mahasiswa_per_tahun"
-            ),
-            "students": pd.read_excel(
-                self.file_path, sheet_name="tabel4_data_individu_mahasiswa"
-            ),
-        }
-    def _validate_file_structure(self):
-        """Validate that all required sheets and columns exist."""
-        required_sheets = {
-            "courses": ["kode_mk", "nama_mk", "kategori_mk"],
-            "offerings": ["kode_mk", "tahun", "semester"],
-            "students": ["kode_mk", "kode_mhs", "thn", "smt"],
-            "population": ["jumlah_aktif"],  # tahun_ajaran and semester may vary
-        }
-        for sheet_name, required_cols in required_sheets.items():
-            df = self.raw_data.get(sheet_name)
-            if df is None:
-                self.validation_results.append(
-                    ValidationResult(
-                        passed=False,
-                        message=f"Sheet '{sheet_name}' not found",
-                        severity="ERROR",
-                    )
-                )
-                continue
-            missing_cols = [col for col in required_cols if col not in df.columns]
-            if missing_cols:
-                self.validation_results.append(
-                    ValidationResult(
-                        passed=False,
-                        message=f"Missing columns in {sheet_name}: {missing_cols}",
-                        severity="ERROR",
-                    )
-                )
-            else:
-                self.validation_results.append(
-                    ValidationResult(
-                        passed=True,
-                        message=f"Sheet '{sheet_name}' has all required columns",
-                        severity="INFO",
-                    )
-                )
-    def _validate_course_catalog(self):
-        """Validate course catalog (tabel1)."""
-        courses = self.raw_data["courses"]
-        # Check for duplicates
-        total_records = len(courses)
-        unique_courses = courses["kode_mk"].nunique()
-        duplicate_count = total_records - unique_courses
-        if duplicate_count > 0:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=True,
-                    message=f"Course catalog has {duplicate_count:,} duplicate records (will be cleaned)",
-                    severity="WARNING",
-                    details={"total": total_records, "unique": unique_courses},
-                )
-            )
-        # Check for category consistency
-        categories = courses["kategori_mk"].unique()
-        non_standard = [c for c in categories if c not in ["W", "P"]]
-        if non_standard:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=True,
-                    message=f"Non-standard categories found: {non_standard} (will be normalized)",
-                    severity="WARNING",
-                )
-            )
-    def _validate_elective_courses(self):
-        """Validate elective course identification."""
-        courses = self.raw_data["courses"]
-        # Clean and identify electives
-        courses_clean = courses.drop_duplicates(subset="kode_mk").copy()
-        courses_clean["kategori_mk"] = (
-            courses_clean["kategori_mk"].astype(str).str.upper().str.strip()
-        )
-        electives = courses_clean[courses_clean["kategori_mk"] == "P"]
-        elective_count = len(electives)
-        if elective_count == 0:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=False,
-                    message="No elective courses found (kategori_mk = 'P')",
-                    severity="ERROR",
-                )
-            )
-        else:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=True,
-                    message=f"Found {elective_count} elective courses",
-                    severity="INFO",
-                    details={"electives": electives["kode_mk"].tolist()},
-                )
-            )
-    def _validate_enrollment_data(self):
-        """Validate student enrollment data (tabel4)."""
-        students = self.raw_data["students"]
-        # Check for missing critical data
-        critical_fields = ["kode_mk", "kode_mhs", "thn", "smt"]
-        missing_data = students[critical_fields].isnull().any(axis=1).sum()
-        if missing_data > 0:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=True,
-                    message=f"{missing_data} enrollment records have missing data (will be cleaned)",
-                    severity="WARNING",
-                )
-            )
-        # Check for duplicates
-        duplicate_enrollments = students.duplicated(
-            subset=["kode_mhs", "kode_mk", "thn", "smt"]
-        ).sum()
-        if duplicate_enrollments > 0:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=True,
-                    message=f"{duplicate_enrollments:,} duplicate enrollment records (will be cleaned)",
-                    severity="WARNING",
-                )
-            )
-        # Check year range
-        min_year = students["thn"].min()
-        max_year = students["thn"].max()
-        self.validation_results.append(
-            ValidationResult(
-                passed=True,
-                message=f"Enrollment data spans {int(min_year)} to {int(max_year)}",
-                severity="INFO",
-            )
-        )
-    def _validate_population_data(self):
-        """Validate yearly population data (tabel3)."""
-        population = self.raw_data["population"]
-        if len(population) == 0:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=False,
-                    message="No population data found",
-                    severity="ERROR",
-                )
-            )
-            return
-        # Check for required fields (note: actual columns are tahun_ajaran/semester, not in sheet_name definition)
-        if "jumlah_aktif" in population.columns:
-            min_pop = population["jumlah_aktif"].min()
-            max_pop = population["jumlah_aktif"].max()
-            self.validation_results.append(
-                ValidationResult(
-                    passed=True,
-                    message=f"Population data: {len(population)} records, range {int(min_pop)}-{int(max_pop)} students",
-                    severity="INFO",
-                )
-            )
-        else:
-            self.validation_results.append(
-                ValidationResult(
-                    passed=False,
-                    message="Population data missing 'jumlah_aktif' column",
-                    severity="ERROR",
-                )
-            )
-    def _get_elective_codes(self) -> set:
-        """Get set of elective course codes."""
-        courses = self.raw_data["courses"]
-        courses_clean = courses.drop_duplicates(subset="kode_mk").copy()
-        courses_clean["kategori_mk"] = (
-            courses_clean["kategori_mk"].astype(str).str.upper().str.strip()
-        )
-        return set(courses_clean[courses_clean["kategori_mk"] == "P"]["kode_mk"])
-if __name__ == "__main__":
-    # Example usage
-    logging.basicConfig(
-        level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
-    )
-    validator = DataValidator(
-        "data/Data Perkuliahan Mahasiswa untuk Penelitian (8 Oktober 2025).xlsx"
-    )
-    # Run validation
-    passed, results = validator.validate_all()
-    validator.print_validation_summary()
-    # Check specific semesters
-    print("\n" + "=" * 80)
-    print("SEMESTER DATA AVAILABILITY")
-    print("=" * 80)
-    for year, semester in [(2024, 2), (2025, 1)]:
-        status = validator.check_semester_data_availability(year, semester)
-        print(f"\n{year} Semester {semester}:")
-        print(f"  Offerings: {'Yes' if status.has_offerings else 'No'}")
-        print(
-            f"  Enrollments: {'Yes' if status.has_enrollments else 'No'} ({status.total_enrollments} records)"
-        )
-        print(
-            f"  Elective Enrollments: {'Yes' if status.has_elective_enrollments else 'No'} ({status.elective_enrollments} records)"
-        )
-        if status.elective_courses:
-            print(f"  Elective courses: {len(status.elective_courses)}")
-            for code, count in list(status.elective_courses.items())[:5]:
-                print(f"    - {code}: {count} students")
-    # Show available semesters for backtesting
-    print("\n" + "=" * 80)
-    print("SEMESTERS AVAILABLE FOR BACKTESTING")
-    print("=" * 80)
-    available = validator.get_available_semesters_for_backtesting()
-    if available:
-        print(f"\nFound {len(available)} semesters with elective enrollment data:")
-        for year, sem in available:
-            print(f"  • {year} Semester {sem}")
-    else:
-        print("\nERROR: No semesters with elective enrollment data found!")

evaluator.py CHANGED Viewed

@@ -17,12 +17,11 @@ class Evaluator:
         self.config = config
     def run_backtest(self, full_data: pd.DataFrame, predictor):
-        """Simulate past semesters to check accuracy."""
-        logger.info("Starting Backtest...")
         results = []
         start_year: int = self.config.backtest.START_YEAR
         end_year: int = self.config.backtest.END_YEAR
         for year in range(start_year, end_year + 1):
             for smt in [1, 2]:
@@ -47,53 +46,251 @@ class Evaluator:
                         row["kode_mk"], train_set, year, smt, pop_est
                     )
                     results.append(
                         {
                             "year": year,
                             "semester": smt,
                             "kode_mk": row["kode_mk"],
-                            "actual": row["enrollment"],
-                            "predicted": pred["val"],
                             "strategy": pred["strategy"],
-                            "error": abs(row["enrollment"] - pred["val"]),
                         }
                     )
         return pd.DataFrame(results)
     def generate_metrics(self, results: pd.DataFrame):
-        """Calculate and log performance metrics."""
         results["error"] = abs(results["predicted"] - results["actual"])
         mae = mean_absolute_error(results["actual"], results["predicted"])
         rmse = np.sqrt(mean_squared_error(results["actual"], results["predicted"]))
-        logger.info("\n" + "=" * 40)
         logger.info("BACKTEST METRICS")
-        logger.info("=" * 40)
-        logger.info(f"Overall MAE:  {mae:.2f}")
-        logger.info(f"Overall RMSE: {rmse:.2f}")
         logger.info("\nPerformance by Strategy:")
-        strat_perf = results.groupby("strategy")["error"].mean()
         logger.info(strat_perf.to_string())
         self._plot_results(results)
-        return {"mae": mae, "rmse": rmse}
     def _plot_results(self, df):
-        """Generate simple Actual vs Predicted scatter plot."""
         Path(self.config.output.OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
         plt.figure(figsize=(10, 6))
         sns.scatterplot(
-            data=df, x="actual", y="predicted", hue="strategy", style="strategy"
         )
         limit = max(df["actual"].max(), df["predicted"].max())
-        plt.plot([0, limit], [0, limit], "r--", alpha=0.5)
         plt.title("Actual vs Predicted Enrollment")
-        plt.savefig(f"{self.config.output.OUTPUT_DIR}/backtest_scatter.png")
         plt.close()
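
The revised evaluator converts enrollment into a class count and then scores predictions on that count (class MAE, exact match rate, and match within ±1 class). A rough standalone sketch of that logic follows. The capacity and minimum values are assumptions standing in for `DEFAULT_CLASS_CAPACITY` and `MIN_STUDENTS_TO_OPEN_CLASS`, whose real values live in `config.py` and are not shown here; the sample pairs are made up. Unlike `run_backtest`, the sketch always derives predicted classes from predicted enrollment and ignores any `classes_needed` value supplied by the predictor.

```python
import math

CLASS_CAPACITY = 40        # assumed value, not taken from config.py
MIN_STUDENTS_TO_OPEN = 10  # assumed value, not taken from config.py


def classes_needed(enrollment, capacity=CLASS_CAPACITY, minimum=MIN_STUDENTS_TO_OPEN):
    """Zero classes below the opening threshold, otherwise round up."""
    if enrollment < minimum:
        return 0
    return math.ceil(enrollment / capacity)


def class_metrics(pairs):
    """pairs: iterable of (actual_enrollment, predicted_enrollment)."""
    errors = [abs(classes_needed(a) - classes_needed(p)) for a, p in pairs]
    n = len(errors)
    if n == 0:
        return {"class_mae": 0, "class_accuracy": 0, "class_accuracy_within_1": 0}
    return {
        "class_mae": sum(errors) / n,
        "class_accuracy": 100 * sum(e == 0 for e in errors) / n,
        "class_accuracy_within_1": 100 * sum(e <= 1 for e in errors) / n,
    }


# Hypothetical (actual, predicted) enrollment pairs.
pairs = [(35, 42), (8, 12), (50, 60), (130, 40)]
metrics = class_metrics(pairs)
print(metrics)
```

Because the class count is a step function of enrollment, a small enrollment error near a capacity boundary (35 vs 42 here) still costs a full class, which is why the within-±1 figure is reported alongside the exact match rate.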

         self.config = config
     def run_backtest(self, full_data: pd.DataFrame, predictor):
         results = []
         start_year: int = self.config.backtest.START_YEAR
         end_year: int = self.config.backtest.END_YEAR
+        class_capacity = self.config.class_capacity.DEFAULT_CLASS_CAPACITY
         for year in range(start_year, end_year + 1):
             for smt in [1, 2]:
                         row["kode_mk"], train_set, year, smt, pop_est
                     )
+                    actual_enrollment = row["enrollment"]
+                    predicted_enrollment = pred["val"]
+                    actual_classes = self._calculate_classes(
+                        actual_enrollment, class_capacity
+                    )
+                    predicted_classes = pred.get(
+                        "classes_needed",
+                        self._calculate_classes(predicted_enrollment, class_capacity),
+                    )
                     results.append(
                         {
                             "year": year,
                             "semester": smt,
                             "kode_mk": row["kode_mk"],
+                            "actual": actual_enrollment,
+                            "predicted": predicted_enrollment,
+                            "actual_classes": actual_classes,
+                            "predicted_classes": predicted_classes,
                             "strategy": pred["strategy"],
+                            "error": abs(actual_enrollment - predicted_enrollment),
+                            "class_error": abs(actual_classes - predicted_classes),
                         }
                     )
         return pd.DataFrame(results)
+    def _calculate_classes(self, enrollment: float, capacity: int) -> int:
+        if enrollment < self.config.class_capacity.MIN_STUDENTS_TO_OPEN_CLASS:
+            return 0
+        return int(np.ceil(enrollment / capacity))
     def generate_metrics(self, results: pd.DataFrame):
+        if results.empty:
+            logger.warning("No results to generate metrics from")
+            return {
+                "mae": 0,
+                "rmse": 0,
+                "class_mae": 0,
+                "class_accuracy": 0,
+                "class_accuracy_within_1": 0,
+            }
         results["error"] = abs(results["predicted"] - results["actual"])
+        results["class_error"] = abs(
+            results["predicted_classes"] - results["actual_classes"]
+        )
+        # Enrollment metrics
         mae = mean_absolute_error(results["actual"], results["predicted"])
         rmse = np.sqrt(mean_squared_error(results["actual"], results["predicted"]))
+        # Class count metrics
+        class_mae = results["class_error"].mean()
+        # Class accuracy: percentage of predictions with correct class count
+        class_correct = (results["class_error"] == 0).sum()
+        class_accuracy = (class_correct / len(results)) * 100 if len(results) > 0 else 0
+        # Class accuracy within 1: predictions within ±1 class
+        class_within_1 = (results["class_error"] <= 1).sum()
+        class_accuracy_within_1 = (
+            (class_within_1 / len(results)) * 100 if len(results) > 0 else 0
+        )
         logger.info("BACKTEST METRICS")
+        logger.info("\nEnrollment Prediction Metrics:")
+        logger.info(f"  Overall MAE:  {mae:.2f} students")
+        logger.info(f"  Overall RMSE: {rmse:.2f} students")
+        logger.info("\nClass Count Prediction Metrics:")
+        logger.info(f"  Class MAE:           {class_mae:.2f} classes")
+        logger.info(f"  Exact Class Match:   {class_accuracy:.1f}%")
+        logger.info(f"  Within ±1 Class:     {class_accuracy_within_1:.1f}%")
         logger.info("\nPerformance by Strategy:")
+        strat_perf = (
+            results.groupby("strategy")
+            .agg({"error": "mean", "class_error": "mean"})
+            .round(2)
+        )
+        strat_perf.columns = ["Avg Enrollment Error", "Avg Class Error"]
         logger.info(strat_perf.to_string())
+        logger.info("=" * 50)
         self._plot_results(results)
+        self._plot_class_results(results)
+        return {
+            "mae": mae,
+            "rmse": rmse,
+            "class_mae": class_mae,
+            "class_accuracy": class_accuracy,
+            "class_accuracy_within_1": class_accuracy_within_1,
+        }
     def _plot_results(self, df):
         Path(self.config.output.OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
         plt.figure(figsize=(10, 6))
         sns.scatterplot(
+            data=df,
+            x="actual",
+            y="predicted",
+            hue="strategy",
+            style="strategy",
+            alpha=0.7,
         )
         limit = max(df["actual"].max(), df["predicted"].max())
+        plt.plot([0, limit], [0, limit], "r--", alpha=0.5, label="Perfect Prediction")
         plt.title("Actual vs Predicted Enrollment")
+        plt.xlabel("Actual Enrollment")
+        plt.ylabel("Predicted Enrollment")
+        plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
+        plt.tight_layout()
+        plt.savefig(
+            f"{self.config.output.OUTPUT_DIR}/backtest_enrollment_scatter.png", dpi=150
+        )
         plt.close()
+    def _plot_class_results(self, df):
+        Path(self.config.output.OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
+        plt.figure(figsize=(10, 6))
+        jitter_strength = 0.1
+        df_plot = df.copy()
+        df_plot["actual_jitter"] = df_plot["actual_classes"] + np.random.uniform(
+            -jitter_strength, jitter_strength, len(df_plot)
+        )
+        df_plot["predicted_jitter"] = df_plot["predicted_classes"] + np.random.uniform(
+            -jitter_strength, jitter_strength, len(df_plot)
+        )
+        sns.scatterplot(
+            data=df_plot,
+            x="actual_jitter",
+            y="predicted_jitter",
+            hue="strategy",
+            style="strategy",
+            alpha=0.7,
+        )
+        limit = max(df["actual_classes"].max(), df["predicted_classes"].max()) + 1
+        plt.plot([0, limit], [0, limit], "r--", alpha=0.5, label="Perfect Prediction")
+        plt.title("Actual vs Predicted Number of Classes")
+        plt.xlabel("Actual Classes Needed")
+        plt.ylabel("Predicted Classes Needed")
+        plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
+        plt.tight_layout()
+        plt.savefig(
+            f"{self.config.output.OUTPUT_DIR}/backtest_classes_scatter.png", dpi=150
+        )
+        plt.close()
+    def generate_class_capacity_report(self, results: pd.DataFrame) -> pd.DataFrame:
+        if results.empty:
+            return pd.DataFrame()
+        course_summary = (
+            results.groupby("kode_mk")
+            .agg(
+                {
+                    "actual": ["mean", "sum", "count"],
+                    "predicted": ["mean", "sum"],
+                    "actual_classes": ["mean", "sum"],
+                    "predicted_classes": ["mean", "sum"],
+                    "class_error": ["mean", "sum"],
+                }
+            )
+            .round(2)
+        )
+        course_summary.columns = [
+            "avg_actual_enrollment",
+            "total_actual_enrollment",
+            "n_semesters",
+            "avg_predicted_enrollment",
+            "total_predicted_enrollment",
+            "avg_actual_classes",
+            "total_actual_classes",
+            "avg_predicted_classes",
+            "total_predicted_classes",
+            "avg_class_error",
+            "total_class_error",
+        ]
+        course_summary = course_summary.reset_index()
+        course_summary = course_summary.sort_values(
+            "total_class_error", ascending=False
+        )
+        return course_summary
+    def analyze_capacity_trends(self, full_data: pd.DataFrame) -> pd.DataFrame:
+        """Summarize per-course enrollment and class growth from first to last record."""
+        class_capacity = self.config.class_capacity.DEFAULT_CLASS_CAPACITY
+        trend_data = full_data.copy()
+        trend_data["classes_needed"] = trend_data["enrollment"].apply(
+            lambda x: self._calculate_classes(x, class_capacity)
+        )
+        course_trends = []
+        for course in trend_data["kode_mk"].unique():
+            course_data = trend_data[trend_data["kode_mk"] == course].sort_values(
+                ["thn", "smt"]
+            )
+            if len(course_data) < 2:
+                continue
+            first_year = course_data.iloc[0]
+            last_year = course_data.iloc[-1]
+            enrollment_growth = last_year["enrollment"] - first_year["enrollment"]
+            class_growth = last_year["classes_needed"] - first_year["classes_needed"]
+            years_diff = last_year["thn"] - first_year["thn"]
+            if years_diff > 0 and first_year["enrollment"] > 0:
+                annual_growth_rate = (
+                    (last_year["enrollment"] / first_year["enrollment"])
+                    ** (1 / years_diff)
+                    - 1
+                ) * 100
+            else:
+                annual_growth_rate = 0
+            course_trends.append(
+                {
+                    "kode_mk": course,
+                    "first_enrollment": first_year["enrollment"],
+                    "last_enrollment": last_year["enrollment"],
+                    "enrollment_growth": enrollment_growth,
+                    "first_classes": first_year["classes_needed"],
+                    "last_classes": last_year["classes_needed"],
+                    "class_growth": class_growth,
+                    "annual_growth_rate": round(annual_growth_rate, 1),
+                    "data_points": len(course_data),
+                    "year_range": f"{int(first_year['thn'])}-{int(last_year['thn'])}",
+                }
+            )
+        trends_df = pd.DataFrame(course_trends)
+        if not trends_df.empty:
+            trends_df = trends_df.sort_values("annual_growth_rate", ascending=False)
+        return trends_df
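
Reviewer note on `analyze_capacity_trends`: the `annual_growth_rate` it reports is a compound annual growth rate between a course's first and last record. The arithmetic can be sketched on its own as below. The helper name `compound_annual_growth_pct` is hypothetical and not part of this commit; it only mirrors the expression and guard shown in the diff.

```python
def compound_annual_growth_pct(first: float, last: float, years: float) -> float:
    """Hypothetical helper mirroring the growth-rate arithmetic in the diff."""
    # Same guard as the diff: no elapsed years or a zero baseline yields 0.
    if years <= 0 or first <= 0:
        return 0.0
    return ((last / first) ** (1 / years) - 1) * 100


# Enrollment going from 40 to 90 over 2 years: 2.25 ** 0.5 = 1.5, i.e. 50% per year.
rate = compound_annual_growth_pct(40, 90, 2)
print(round(rate, 1))
```

Because `years_diff` is computed from `thn` alone, a course whose records all fall within a single year takes the `else` branch and reports a rate of 0.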

prophet_predictor.py CHANGED Viewed

@@ -1,5 +1,5 @@
 import logging
-from typing import Optional
 import numpy as np
 import pandas as pd
@@ -24,8 +24,13 @@ class ProphetPredictor:
         )
         df["y"] = df["jumlah_aktif"]
-        self.student_model = Prophet(daily_seasonality=False, weekly_seasonality=False)  # type: ignore[arg-type]
-        self.student_model.fit(df)
         logger.info("Student population model trained.")
     def get_student_forecast(self, year: int, semester: int) -> float:
@@ -37,6 +42,19 @@ class ProphetPredictor:
         forecast = self.student_model.predict(future)
         return max(forecast["yhat"].values[0], 100)
     def predict_course(
         self,
         course_code: str,
@@ -46,23 +64,41 @@ class ProphetPredictor:
         student_pop: float,
     ) -> dict:
         hist = df_history[
-            (df_history["kode_mk"] == course_code) &
-            (df_history["smt"] == target_smt)
         ].sort_values(["thn", "smt"])
-        if len(hist) == 0:
             return {
                 "val": self.config.model.FALLBACK_DEFAULT,
                 "strategy": "cold_start",
                 "confidence": "low",
             }
-        return self._predict_prophet_logistic(
-            hist, target_year, target_smt, student_pop
         )
-    def _predict_prophet_logistic(
-        self, hist: pd.DataFrame, year: int, smt: int, pop: float
     ) -> dict:
         df = hist.copy()
         df["ds"] = pd.to_datetime(
@@ -89,14 +125,20 @@ class ProphetPredictor:
                 "confidence": "low",
             }
-        hist_max = df["y"].max()
-        hist_mean = df["y"].mean()
         cap_value = min(
             hist_max * self.config.prediction.MAX_CAPACITY_MULTIPLIER,
             self.config.prediction.ABSOLUTE_MAX_STUDENTS,
         )
         df["cap"] = cap_value
         df["floor"] = 0
@@ -109,8 +151,11 @@ class ProphetPredictor:
                 weekly_seasonality=False,  # type: ignore[arg-type]
             )
-            m.add_regressor("jumlah_aktif", mode="multiplicative")
-            m.fit(df[["ds", "y", "cap", "floor", "jumlah_aktif"]])
             future_date = pd.to_datetime(
                 f"{year}-{self.config.prediction.SEMESTER_TO_MONTH[smt]}"
@@ -121,10 +166,12 @@ class ProphetPredictor:
                     "ds": [future_date],
                     "cap": [cap_value],
                     "floor": [0],
-                    "jumlah_aktif": [pop],
                 }
             )
             forecast = m.predict(future)
             raw_pred = forecast["yhat"].values[0]
@@ -135,18 +182,17 @@ class ProphetPredictor:
                 or raw_pred > cap_value * 2
             ):
                 logger.warning(
-                    f"Prophet prediction ({raw_pred:.1f}) unrealistic. "
                     f"Using trend-based fallback. (hist_max={hist_max}, cap={cap_value})"
                 )
                 if len(df) >= 3:
-                    recent_trend = df["y"].tail(3).mean()
-                    pop_growth_factor = pop / df["jumlah_aktif"].mean()
-                    growth_factor = min(
-                        max(pop_growth_factor, 0.8), 1.3
-                    )
                     pred = recent_trend * growth_factor
                 else:
-                    pop_growth_factor = pop / df["jumlah_aktif"].mean()
                     pred = hist_mean * min(max(pop_growth_factor, 0.8), 1.3)
                 pred = min(max(pred, 0), cap_value)
@@ -166,13 +212,37 @@ class ProphetPredictor:
             }
         except Exception as e:
-            logger.warning(f"Prophet failed for course. Error: {e}. Using fallback.")
             return {
                 "val": hist["enrollment"].mean(),
                 "strategy": "fallback_mean",
                 "confidence": "medium",
             }
     def generate_batch_predictions(
         self,
         full_data: pd.DataFrame,
@@ -180,8 +250,7 @@ class ProphetPredictor:
         electives: set,
         year: int,
         smt: int,
-    ):
-        """Generate predictions for all courses."""
         student_pop = self.get_student_forecast(year, smt)
         results = []
@@ -190,35 +259,57 @@ class ProphetPredictor:
         )
         for code in electives:
-            meta = course_metadata[course_metadata["kode_mk"] == code].iloc[0]
             pred_result = self.predict_course(code, full_data, year, smt, student_pop)
             pred_val = pred_result["val"]
-            rec_quota = int(
-                np.ceil(pred_val * (1 + self.config.prediction.BUFFER_PERCENT))
             )
-            rec_quota = max(rec_quota, self.config.prediction.MIN_QUOTA_OPEN)
-            status = (
-                "BUKA"
-                if pred_val >= self.config.prediction.MIN_PREDICT_THRESHOLD
-                else "TUTUP"
             )
             results.append(
                 {
                     "kode_mk": code,
                     "nama_mk": meta["nama_mk"],
-                    "sks": meta["sks_mk"],
                     "predicted_enrollment": round(pred_val, 1),
-                    "recommended_quota": rec_quota if status == "BUKA" else 0,
                     "recommendation": status,
                     "strategy": pred_result["strategy"],
                     "confidence": pred_result["confidence"],
-                    "classes_est": int(np.ceil(rec_quota / 40))
-                    if status == "BUKA"
-                    else 0,
                 }
             )
@@ -226,6 +317,98 @@ class ProphetPredictor:
             "predicted_enrollment", ascending=False
         )
     def predict_course_enrollment(
         self,
         course_code: str,
@@ -233,7 +416,7 @@ class ProphetPredictor:
         test_year: int,
         test_semester: int,
         test_student_count: float,
-    ) -> tuple[float, str]:
         result = self.predict_course(
             course_code=course_code,
             df_history=train_data,

 import logging
+from typing import Dict, List, Optional, Tuple
 import numpy as np
 import pandas as pd
         )
         df["y"] = df["jumlah_aktif"]
+        self.student_model = Prophet(
+            growth="linear",
+            daily_seasonality=False,  # type: ignore[arg-type]
+            weekly_seasonality=False,  # type: ignore[arg-type]
+            yearly_seasonality=True,  # type: ignore[arg-type]
+        )
+        self.student_model.fit(df[["ds", "y"]])
         logger.info("Student population model trained.")
     def get_student_forecast(self, year: int, semester: int) -> float:
         forecast = self.student_model.predict(future)
         return max(forecast["yhat"].values[0], 100)
+    def get_multi_year_student_forecast(
+        self, start_year: int, semester: int, years_ahead: int
+    ) -> List[Tuple[int, float]]:
+        assert self.student_model is not None, "Student model must be trained first"
+        forecasts = []
+        for i in range(years_ahead + 1):
+            year = start_year + i
+            pop = self.get_student_forecast(year, semester)
+            forecasts.append((year, pop))
+        return forecasts
     def predict_course(
         self,
         course_code: str,
         student_pop: float,
     ) -> dict:
         hist = df_history[
+            (df_history["kode_mk"] == course_code) & (df_history["smt"] == target_smt)
         ].sort_values(["thn", "smt"])
+        has_historical_data = len(hist) > 0
+        if not has_historical_data:
             return {
                 "val": self.config.model.FALLBACK_DEFAULT,
                 "strategy": "cold_start",
                 "confidence": "low",
+                "classes_needed": self.config.calculate_classes_needed(
+                    self.config.model.FALLBACK_DEFAULT,
+                    course_code,
+                    has_historical_data=False,
+                ),
+                "capacity_status": self.config.get_capacity_status(
+                    self.config.model.FALLBACK_DEFAULT, course_code
+                ),
             }
+        result = self._predict_prophet_with_capacity(
+            hist, target_year, target_smt, student_pop, course_code
+        )
+        result["classes_needed"] = self.config.calculate_classes_needed(
+            result["val"], course_code, has_historical_data=has_historical_data
+        )
+        result["capacity_status"] = self.config.get_capacity_status(
+            result["val"], course_code
         )
+        return result
+    def _predict_prophet_with_capacity(
+        self, hist: pd.DataFrame, year: int, smt: int, pop: float, course_code: str
     ) -> dict:
         df = hist.copy()
         df["ds"] = pd.to_datetime(
                 "confidence": "low",
             }
+        hist_max = float(df["y"].max())
+        hist_mean = float(df["y"].mean())
+        class_capacity = self.config.get_class_capacity(course_code)
         cap_value = min(
             hist_max * self.config.prediction.MAX_CAPACITY_MULTIPLIER,
             self.config.prediction.ABSOLUTE_MAX_STUDENTS,
         )
+        if self.config.class_capacity.ENABLE_CAPACITY_CONSTRAINTS:
+            max_realistic_cap = class_capacity * 4
+            cap_value = min(cap_value, max_realistic_cap)
         df["cap"] = cap_value
         df["floor"] = 0
                 weekly_seasonality=False,  # type: ignore[arg-type]
             )
+            if self.config.model.USE_POPULATION_REGRESSOR:
+                m.add_regressor("jumlah_aktif", mode="multiplicative")
+                m.fit(df[["ds", "y", "cap", "floor", "jumlah_aktif"]])
+            else:
+                m.fit(df[["ds", "y", "cap", "floor"]])
             future_date = pd.to_datetime(
                 f"{year}-{self.config.prediction.SEMESTER_TO_MONTH[smt]}"
                     "ds": [future_date],
                     "cap": [cap_value],
                     "floor": [0],
                 }
             )
+            if self.config.model.USE_POPULATION_REGRESSOR:
+                future["jumlah_aktif"] = pop
             forecast = m.predict(future)
             raw_pred = forecast["yhat"].values[0]
                 or raw_pred > cap_value * 2
             ):
                 logger.warning(
+                    f"Prophet prediction ({raw_pred:.1f}) unrealistic for {course_code}. "
                     f"Using trend-based fallback. (hist_max={hist_max}, cap={cap_value})"
                 )
+                pop_mean = float(df["jumlah_aktif"].mean())
                 if len(df) >= 3:
+                    recent_trend = float(df["y"].tail(3).mean())
+                    pop_growth_factor = pop / pop_mean if pop_mean > 0 else 1.0
+                    growth_factor = min(max(pop_growth_factor, 0.8), 1.3)
                     pred = recent_trend * growth_factor
                 else:
+                    pop_growth_factor = pop / pop_mean if pop_mean > 0 else 1.0
                     pred = hist_mean * min(max(pop_growth_factor, 0.8), 1.3)
                 pred = min(max(pred, 0), cap_value)
             }
         except Exception as e:
+            logger.warning(
+                f"Prophet failed for course {course_code}. Error: {e}. Using fallback."
+            )
             return {
                 "val": hist["enrollment"].mean(),
                 "strategy": "fallback_mean",
                 "confidence": "medium",
             }
+    def predict_multi_year(
+        self,
+        course_code: str,
+        df_history: pd.DataFrame,
+        start_year: int,
+        target_smt: int,
+        years_ahead: int = 3,
+    ) -> List[Dict]:
+        predictions = []
+        for i in range(years_ahead + 1):
+            year = start_year + i
+            pop = self.get_student_forecast(year, target_smt)
+            pred = self.predict_course(course_code, df_history, year, target_smt, pop)
+            pred["year"] = year
+            pred["semester"] = target_smt
+            pred["student_population"] = pop
+            predictions.append(pred)
+        return predictions
     def generate_batch_predictions(
         self,
         full_data: pd.DataFrame,
         electives: set,
         year: int,
         smt: int,
+    ) -> pd.DataFrame:
         student_pop = self.get_student_forecast(year, smt)
         results = []
         )
         for code in electives:
+            meta_rows = course_metadata[course_metadata["kode_mk"] == code]
+            if len(meta_rows) == 0:
+                logger.warning(f"No metadata found for course {code}, skipping")
+                continue
+            meta = meta_rows.iloc[0]
             pred_result = self.predict_course(code, full_data, year, smt, student_pop)
             pred_val = pred_result["val"]
+            course_history = full_data[full_data["kode_mk"] == code]
+            has_history = len(course_history) > 0
+            classes_needed = pred_result.get(
+                "classes_needed",
+                self.config.calculate_classes_needed(
+                    pred_val, code, has_historical_data=has_history
+                ),
             )
+            course_capacity = self.config.get_class_capacity(code)
+            if classes_needed > 0:
+                rec_quota = classes_needed * course_capacity
+            else:
+                rec_quota = 0
+            min_threshold = self.config.class_capacity.MIN_STUDENTS_TO_OPEN_CLASS
+            should_open = pred_val >= min_threshold or (
+                has_history and self.config.class_capacity.OPEN_CLASS_IF_HAS_HISTORY
             )
+            status = "BUKA" if should_open else "TUTUP"
+            if classes_needed > 0:
+                total_capacity = classes_needed * course_capacity
+                utilization = (pred_val / total_capacity) * 100
+            else:
+                utilization = 0
             results.append(
                 {
                     "kode_mk": code,
                     "nama_mk": meta["nama_mk"],
+                    "sks": meta.get("sks_mk", 0),
                     "predicted_enrollment": round(pred_val, 1),
+                    "class_capacity": course_capacity,
+                    "classes_needed": classes_needed,
+                    "total_quota": rec_quota,
+                    "utilization_pct": round(utilization, 1),
                     "recommendation": status,
+                    "capacity_status": pred_result.get("capacity_status", "NORMAL"),
                     "strategy": pred_result["strategy"],
                     "confidence": pred_result["confidence"],
                 }
             )
             "predicted_enrollment", ascending=False
         )
+    def generate_multi_year_forecast(
+        self,
+        full_data: pd.DataFrame,
+        course_metadata: pd.DataFrame,
+        electives: set,
+        start_year: int,
+        smt: int,
+        years_ahead: int = 3,
+    ) -> pd.DataFrame:
+        all_results = []
+        for code in electives:
+            meta_rows = course_metadata[course_metadata["kode_mk"] == code]
+            if len(meta_rows) == 0:
+                continue
+            meta = meta_rows.iloc[0]
+            year_predictions = self.predict_multi_year(
+                code, full_data, start_year, smt, years_ahead
+            )
+            for pred in year_predictions:
+                course_capacity = self.config.get_class_capacity(code)
+                classes_needed = pred.get("classes_needed", 0)
+                all_results.append(
+                    {
+                        "kode_mk": code,
+                        "nama_mk": meta["nama_mk"],
+                        "year": pred["year"],
+                        "semester": pred["semester"],
+                        "predicted_enrollment": round(pred["val"], 1),
+                        "classes_needed": classes_needed,
+                        "total_capacity": classes_needed * course_capacity,
+                        "student_population": round(pred["student_population"], 0),
+                        "strategy": pred["strategy"],
+                        "confidence": pred["confidence"],
+                    }
+                )
+        if not all_results:
+            return pd.DataFrame()
+        return pd.DataFrame(all_results).sort_values(["kode_mk", "year"])
+    def get_course_trend_analysis(
+        self,
+        course_code: str,
+        df_history: pd.DataFrame,
+        target_smt: int,
+    ) -> Dict:
+        hist = df_history[
+            (df_history["kode_mk"] == course_code) & (df_history["smt"] == target_smt)
+        ].sort_values("thn")
+        if len(hist) < 2:
+            return {
+                "has_sufficient_data": False,
+                "data_points": len(hist),
+            }
+        enrollments = np.array(hist["enrollment"].values, dtype=float)
+        years = np.array(hist["thn"].values, dtype=float)
+        growth_rates = []
+        for i in range(1, len(enrollments)):
+            if enrollments[i - 1] > 0:
+                rate = (enrollments[i] - enrollments[i - 1]) / enrollments[i - 1]
+                growth_rates.append(rate)
+        avg_growth_rate = float(np.mean(growth_rates)) if growth_rates else 0.0
+        if len(years) >= 2:
+            coeffs = np.polyfit(years, enrollments, 1)
+            trend_slope = float(coeffs[0])
+        else:
+            trend_slope = 0.0
+        return {
+            "has_sufficient_data": True,
+            "data_points": len(hist),
+            "min_enrollment": int(enrollments.min()),
+            "max_enrollment": int(enrollments.max()),
+            "avg_enrollment": round(float(enrollments.mean()), 1),
+            "latest_enrollment": int(enrollments[-1]),
+            "avg_growth_rate": round(avg_growth_rate * 100, 1),  # as percentage
+            "trend_slope": round(trend_slope, 2),
+            "trend_direction": "increasing"
+            if trend_slope > 0
+            else "decreasing"
+            if trend_slope < 0
+            else "stable",
+            "year_range": f"{int(years.min())}-{int(years.max())}",
+        }
     def predict_course_enrollment(
         self,
         course_code: str,
         test_year: int,
         test_semester: int,
         test_student_count: float,
+    ) -> tuple:
         result = self.predict_course(
             course_code=course_code,
             df_history=train_data,
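
Reviewer note on the trend-based fallback in `_predict_prophet_with_capacity`: when the Prophet output is judged unrealistic, the diff scales a recent enrollment average by a clamped population ratio and then clips to the cap. A minimal standalone sketch of that logic follows. The function name `trend_fallback` and its flat argument list are assumptions made for illustration, and it assumes a non-empty history, as is the case at that point in the diff.

```python
from typing import Sequence


def trend_fallback(
    history: Sequence[float], pop: float, pop_mean: float, cap_value: float
) -> float:
    """Hypothetical standalone version of the trend-based fallback in the diff."""
    # Ratio of forecast population to historical mean, clamped to [0.8, 1.3].
    ratio = pop / pop_mean if pop_mean > 0 else 1.0
    growth_factor = min(max(ratio, 0.8), 1.3)
    # With 3+ points use the mean of the last three, otherwise the overall mean.
    window = history[-3:] if len(history) >= 3 else history
    base = sum(window) / len(window)
    # Keep the result inside [0, cap_value], as the diff does.
    return min(max(base * growth_factor, 0), cap_value)


# A population ratio of 1.5 is clamped to 1.3, so the base of 40 becomes about 52.
print(trend_fallback([30, 40, 50], pop=1500, pop_mean=1000, cap_value=60))
```

The clamp keeps a single unusual population forecast from moving the enrollment estimate by more than 30% up or 20% down.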

ui_components.py ADDED Viewed

@@ -0,0 +1,322 @@

+from typing import Dict
+def get_color(value: float, thresholds: tuple = (50, 25)) -> str:
+    """Return a green, orange, or red hex color based on the (high, low) thresholds."""
+    high, low = thresholds
+    if value >= high:
+        return "#4ade80"
+    elif value >= low:
+        return "#fb923c"
+    else:
+        return "#f87171"
+def get_diff_color(value: float) -> str:
+    return "#4ade80" if value >= 0 else "#f87171"
+# Card Components
+def metric_card(title: str, value: str, color: str, subtitle: str = "") -> str:
+    subtitle_html = (
+        f'<div style="font-size: 11px; color: #9ca3af; margin-top: 4px;">{subtitle}</div>'
+        if subtitle
+        else ""
+    )
+    return f"""
+        <div style="background: #1e293b; padding: 20px; border-radius: 12px; border-left: 4px solid {color};">
+            <div style="font-size: 12px; color: #9ca3af; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 8px;">{title}</div>
+            <div style="font-size: 28px; font-weight: 700; color: {color};">{value}</div>
+            {subtitle_html}
+        </div>
+    """
+def info_row(label: str, value: str, color: str = "#fff", border: bool = True) -> str:
+    border_style = "border-bottom: 1px solid #334155;" if border else ""
+    return f"""
+        <div style="display: flex; justify-content: space-between; padding: 12px 0; {border_style}">
+            <span style="color: #9ca3af;">{label}</span>
+            <span style="font-weight: 600; color: {color};">{value}</span>
+        </div>
+    """
+def info_card(title: str, rows: list) -> str:
+    rows_html = "".join(rows)
+    return f"""
+        <div style="background: #1e293b; padding: 20px; border-radius: 12px;">
+            <h4 style="margin: 0 0 16px 0; color: #fff; font-size: 14px; font-weight: 600;">{title}</h4>
+            {rows_html}
+        </div>
+    """
+# Summary Templates
+def build_validation_summary(data: Dict) -> str:
+    """Build summary HTML for validation mode (when actual data exists)."""
+    year = data["year"]
+    semester_name = data["semester_name"]
+    class_capacity = data["class_capacity"]
+    data_source = data.get("data_source", "kalkulasi")
+    # Metrics
+    class_accuracy_pct = data.get("class_accuracy_pct", 0)
+    class_within_one_pct = data.get("class_within_one_pct", 0)
+    total_classes = data.get("total_classes", 0)
+    comparison_mae = data.get("comparison_mae", 0)
+    comparison_rmse = data.get("comparison_rmse", 0)
+    total_for_class_accuracy = data.get("total_for_class_accuracy", 0)
+    # Enrollment metrics
+    total_actual = data.get("total_actual", 0)
+    total_predicted = data.get("total_predicted", 0)
+    accuracy_pct = data.get("accuracy_pct", 0)
+    class_matches = data.get("class_matches", 0)
+    class_within_one = data.get("class_within_one", 0)
+    # Colors
+    class_accuracy_color = get_color(class_accuracy_pct)
+    diff_color = get_diff_color(total_predicted - total_actual)
+    return f"""
+<div style="padding: 24px;">
+    <div style="margin-bottom: 24px;">
+        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{
+        year
+    } Semester {semester_name}</h2>
+        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Validasi prediksi terhadap data aktual | Kapasitas per kelas: {
+        class_capacity
+    } mahasiswa | Sumber kelas aktual: {data_source}</p>
+    </div>
+    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
+        {
+        metric_card(
+            "Akurasi Kelas",
+            f"{class_accuracy_pct:.1f}%",
+            class_accuracy_color,
+            f"±1 kelas: {class_within_one_pct:.1f}%",
+        )
+    }
+        {metric_card("Total Kelas Prediksi", str(total_classes), "#60a5fa")}
+        {
+        metric_card(
+            "MAE / RMSE", f"{comparison_mae:.1f} / {comparison_rmse:.1f}", "#a78bfa"
+        )
+    }
+        {metric_card("MK Divalidasi", str(total_for_class_accuracy), "#fb923c")}
+    </div>
+    <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 16px;">
+        {
+        info_card(
+            "Ringkasan Enrollment",
+            [
+                info_row("Total Aktual", str(int(total_actual))),
+                info_row("Total Prediksi", str(int(total_predicted))),
+                info_row(
+                    "Selisih",
+                    f"{int(total_predicted - total_actual):+d}",
+                    diff_color,
+                    border=False,
+                ),
+            ],
+        )
+    }
+        {
+        info_card(
+            f"Akurasi Prediksi Kelas (dari {data_source})",
+            [
+                info_row(
+                    "Kelas Tepat",
+                    f"{class_matches}/{total_for_class_accuracy}",
+                    "#4ade80",
+                ),
+                info_row(
+                    "Selisih ±1 Kelas",
+                    f"{class_within_one}/{total_for_class_accuracy}",
+                    "#60a5fa",
+                ),
+                info_row("Akurasi Enrollment", f"{accuracy_pct:.1f}%", border=False),
+            ],
+        )
+    }
+    </div>
+</div>
+"""
+def build_no_match_summary(data: Dict) -> str:
+    """Build summary HTML when semester data exists but no elective courses match."""
+    year = data["year"]
+    semester_name = data["semester_name"]
+    metrics = data.get("metrics", {"mae": 0, "rmse": 0})
+    total_to_open = data.get("total_to_open", 0)
+    total_classes = data.get("total_classes", 0)
+    return f"""
+<div style="padding: 24px;">
+    <div style="margin-bottom: 24px;">
+        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{year} Semester {semester_name}</h2>
+        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Data semester ada, tetapi tidak ditemukan MK pilihan yang cocok</p>
+    </div>
+    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px;">
+        {metric_card("MAE (Backtest)", f"{metrics['mae']:.2f}", "#60a5fa")}
+        {metric_card("RMSE (Backtest)", f"{metrics['rmse']:.2f}", "#a78bfa")}
+        {metric_card("MK Dibuka", str(total_to_open), "#4ade80")}
+        {metric_card("Total Kelas", str(total_classes), "#fb923c")}
+    </div>
+</div>
+"""
+def build_future_prediction_summary(data: Dict) -> str:
+    """Build summary HTML for future-prediction mode (no actual data yet)."""
+    year = data["year"]
+    semester_name = data["semester_name"]
+    class_capacity = data["class_capacity"]
+    metrics = data.get("metrics", {"mae": 0, "rmse": 0})
+    total_to_open = data.get("total_to_open", 0)
+    total_classes = data.get("total_classes", 0)
+    total_predicted_students = data.get("total_predicted_students", 0)
+    total_capacity = data.get("total_capacity", 0)
+    avg_utilization = (
+        (total_predicted_students / total_capacity * 100) if total_capacity > 0 else 0
+    )
+    return f"""
+<div style="padding: 24px;">
+    <div style="margin-bottom: 24px;">
+        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">{
+        year
+    } Semester {semester_name}</h2>
+        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Prediksi masa depan berdasarkan tren historis | Kapasitas per kelas: {
+        class_capacity
+    } mahasiswa</p>
+    </div>
+    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
+        {metric_card("MK Dibuka", str(total_to_open), "#4ade80")}
+        {metric_card("Total Kelas Dibuka", str(total_classes), "#60a5fa")}
+        {metric_card("Prediksi Mahasiswa", str(total_predicted_students), "#a78bfa")}
+        {metric_card("Total Kuota", str(total_capacity), "#fb923c")}
+    </div>
+    <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 16px;">
+        {
+        info_card(
+            "Backtest Metrics",
+            [
+                info_row("MAE", f"{metrics['mae']:.2f}"),
+                info_row("RMSE", f"{metrics['rmse']:.2f}", border=False),
+            ],
+        )
+    }
+        {
+        info_card(
+            "Kapasitas Info",
+            [
+                info_row("Kapasitas/Kelas", f"{class_capacity} mhs"),
+                info_row("Avg Utilization", f"{avg_utilization:.1f}%", border=False),
+            ],
+        )
+    }
+    </div>
+</div>
+"""
+def build_prediction_summary(data: Dict) -> str:
+    """Pick the summary template: validation, no-match, or future prediction."""
+    has_actual_data = data.get("has_actual_data", False)
+    if has_actual_data:
+        if "comparison_mae" in data:
+            return build_validation_summary(data)
+        else:
+            return build_no_match_summary(data)
+    else:
+        return build_future_prediction_summary(data)
+def build_multi_year_summary(data: Dict) -> str:
+    """Build summary HTML for the multi-year class-needs projection."""
+    year = data["year"]
+    years_ahead = data["years_ahead"]
+    semester_name = data["semester_name"]
+    class_capacity = data["class_capacity"]
+    first_year_classes = data["first_year_classes"]
+    last_year_classes = data["last_year_classes"]
+    growth_classes = data["growth_classes"]
+    growth_students = data["growth_students"]
+    growth_class_color = get_diff_color(growth_classes)
+    growth_student_color = get_diff_color(growth_students)
+    return f"""
+<div style="padding: 24px;">
+    <div style="margin-bottom: 24px;">
+        <h2 style="margin: 0 0 8px 0; color: #fff; font-size: 24px; font-weight: 600;">Proyeksi {years_ahead} Tahun ke Depan - Semester {semester_name}</h2>
+        <p style="color: #9ca3af; margin: 0; font-size: 14px;">Forecasting kebutuhan kelas {year} - {year + years_ahead} | Kapasitas per kelas: {class_capacity} mahasiswa</p>
+    </div>
+    <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px;">
+        {metric_card(f"Kelas ({year})", str(first_year_classes), "#4ade80")}
+        {metric_card(f"Kelas ({year + years_ahead})", str(last_year_classes), "#60a5fa")}
+        {metric_card("Pertumbuhan Kelas", f"{int(growth_classes):+d}", growth_class_color)}
+        {metric_card("Pertumbuhan Mhs", f"{int(growth_students):+d}", growth_student_color)}
+    </div>
+</div>
+"""
+# Placeholder Templates
+def placeholder_card(title: str, subtitle: str) -> str:
+    return f"""
+        <div style="padding: 60px 40px; text-align: center; background: #1e293b; border-radius: 12px;">
+            <h3 style="color: #fff; margin: 0 0 8px 0; font-size: 18px; font-weight: 600;">{title}</h3>
+            <p style="color: #9ca3af; margin: 0; font-size: 14px;">{subtitle}</p>
+        </div>
+    """
+def get_prediction_placeholder() -> str:
+    return placeholder_card(
+        "Pilih tahun dan semester",
+        "Klik Generate Predictions untuk melihat rekomendasi jumlah kelas",
+    )
+def get_forecast_placeholder() -> str:
+    return placeholder_card(
+        "Proyeksi Multi-Tahun", "Lihat tren kebutuhan kelas beberapa tahun ke depan"
+    )
+# Data Info Component
+def build_data_info(data: Dict) -> str:
+    if "error" in data:
+        return f"<p style='color: #f87171;'>{data['error']}</p>"
+    return f"""
+<div style="padding: 8px 0;">
+    <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 12px;">
+        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Total MK</div>
+            <div style="font-size: 20px; font-weight: 700; color: #fff;">{data["total_courses"]}</div>
+        </div>
+        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">MK Pilihan</div>
+            <div style="font-size: 20px; font-weight: 700; color: #4ade80;">{data["elective_courses"]}</div>
+        </div>
+        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Kapasitas/Kelas</div>
+            <div style="font-size: 20px; font-weight: 700; color: #60a5fa;">{data["class_capacity"]}</div>
+        </div>
+        <div style="background: #1e293b; padding: 16px; border-radius: 8px; text-align: center;">
+            <div style="font-size: 11px; color: #9ca3af; margin-bottom: 6px;">Tahun Data</div>
+            <div style="font-size: 20px; font-weight: 700; color: #fb923c;">{data["year_min"]}-{data["year_max"]}</div>
+        </div>
+    </div>
+</div>
+"""