DavMelchi committed
Commit a8899bd · 1 Parent(s): 604d355

Add KPI Health Check Panel with multi-RAT analysis, persistent degradation detection, and Excel export functionality

documentations/kpi_health_check_plan.md ADDED
@@ -0,0 +1,230 @@
# KPI Health Check (Panel) — Overall plan

## 1) Context & objective

In NPO, we receive customer complaints (impacted site(s)). The objective is to be able, within a few minutes, to check whether the radio KPIs are:

- Recently degraded
- Degraded for a long time (persistent)
- Recovering / resolved

The goal is a Panel application that is simple to use yet "expert-friendly", standardizes the analysis, and produces an exportable report.

## 2) Inputs & data formats

### 2.1 Files

- KPI reports per technology: 2G / 3G / LTE
- Format: `.csv`, or `.zip` containing one (or several) `.csv`
- Separator: `;` (latin1 encoding), as in the existing apps

### 2.2 Expected key columns

- Date: `PERIOD_START_TIME` (sometimes date only, sometimes date + time)
- NE/site/cell identifier (varies per RAT)
  - 2G: `BCF name` / `DN`
  - 3G: `WBTS name` / `DN`
  - LTE: `LNBTS name` / `DN`

### 2.3 KPIs (list provided)

- The KPI columns are numerous and heterogeneous.
- Two kinds must be distinguished:
  - "rate" KPIs (availability, success rate, CSSR, etc.) => aggregated by mean
  - "counter/volume/traffic" KPIs => aggregated by sum
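The ingestion convention above (`;` separator, latin1 encoding, optional ZIP wrapper) can be sketched as follows. `read_kpi_report` is an illustrative helper, not the actual `read_bytes_to_df` from `process_kpi/kpi_health_check/io.py`:

```python
import io
import zipfile

import pandas as pd


def read_kpi_report(data: bytes, filename: str) -> pd.DataFrame:
    """Read a KPI report given as raw bytes: plain CSV, or ZIP of CSVs."""
    if filename.lower().endswith(".zip"):
        frames = []
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    with zf.open(name) as fh:
                        # same separator/encoding convention as the existing apps
                        frames.append(pd.read_csv(fh, sep=";", encoding="latin1"))
        return pd.concat(frames, ignore_index=True)
    return pd.read_csv(io.BytesIO(data), sep=";", encoding="latin1")
```

Concatenating all CSVs found in a ZIP is what makes the V2 "multi-CSV ZIP" support cheap to add.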
## 3) Expected outputs

### 3.1 UI results

- "Health Summary" table per site:
  - Number of degraded KPIs (per RAT)
  - Number of persistent KPIs
  - Top most critical KPIs
- Drill-down per site:
  - Time-series curves
  - Degraded days
  - Baseline vs recent comparison

### 3.2 Export

- Excel export of a full report:
  - Datasets summary
  - KPI rules (thresholds/direction)
  - Site summary
  - Details per KPI/site
  - Daily series (optional / depending on volume)
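The multi-sheet Excel report can be assembled with a plain `pandas.ExcelWriter`; in the actual app this is delegated to `panel_app/convert_to_excel_panel.py`, so the helper below is only a minimal sketch of the idea (one sheet per output table):

```python
import io

import pandas as pd


def build_report_bytes(sheets: dict[str, pd.DataFrame]) -> bytes:
    """Write each named DataFrame to its own sheet and return the workbook bytes."""
    buf = io.BytesIO()
    with pd.ExcelWriter(buf, engine="openpyxl") as writer:
        for name, df in sheets.items():
            # Excel sheet names are limited to 31 characters
            df.to_excel(writer, sheet_name=name[:31], index=False)
    return buf.getvalue()
```

Returning bytes (rather than writing a file) is what lets the Panel `FileDownload` widget serve the report directly.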
## 4) Detection model ("expert" logic)

### 4.1 Normalization

- Parse the date into a `datetime`
- Build `date_only`
- Extract a `site_code` from the name (numeric pattern / split, as in the traffic app)
- Enrich via `physical_db` (City, Latitude/Longitude), as in `trafic_analysis_panel.py`

### 4.2 KPI rules (configurable)

Each KPI must have:

- `direction`:
  - `higher_is_better` (availability, success rate, CSSR, throughput, traffic)
  - `lower_is_better` (drop rate, blocking, congestion, loss, discard, RTWP, PRB usage)
- `sla` (optional): absolute threshold
- `agg`: `mean` or `sum`

### 4.3 Time windows

Global parameters:

- Baseline window (e.g. 30 days)
- Recent window (e.g. 7 days)
- Min consecutive bad days (e.g. 3 days) => persistent

### 4.4 Degradation criteria

For a (site, KPI, RAT) triple:

- **Relative degradation** vs baseline
  - e.g. variation > X% in the "wrong direction"
- **Absolute degradation** vs SLA
  - e.g. availability < 98% or drop rate > 2%

### 4.5 Classification (states)

- `OK`: no degradation
- `DEGRADED`: recently degraded
- `PERSISTENT_DEGRADED`: recently degraded + streak >= N days
- `RESOLVED` (V2): degraded before but OK over the last days
- `NO_DATA`: no usable data points
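Under the window and direction conventions above, the per-(site, KPI) classification can be sketched as a pure function over a daily series. The names here are illustrative (the real logic lives in `process_kpi/kpi_health_check/engine.py`, which also handles the SLA check and the `RESOLVED` state):

```python
from datetime import date, timedelta


def classify(
    series: dict[date, float],   # daily value per date for one (site, KPI)
    direction: str,              # "higher_is_better" or "lower_is_better"
    baseline_days: int = 30,
    recent_days: int = 7,
    rel_threshold_pct: float = 10.0,
    min_consecutive: int = 3,
) -> str:
    if not series:
        return "NO_DATA"
    end = max(series)
    recent_start = end - timedelta(days=recent_days - 1)
    baseline_start = recent_start - timedelta(days=baseline_days)
    recent = {d: v for d, v in series.items() if d >= recent_start}
    baseline = [v for d, v in series.items() if baseline_start <= d < recent_start]
    if not recent or not baseline:
        return "NO_DATA"
    base = sum(baseline) / len(baseline)
    thr = rel_threshold_pct / 100.0

    def bad(v: float) -> bool:
        # a move beyond the relative threshold, in the "wrong direction"
        if direction == "higher_is_better":
            return v < base * (1.0 - thr)
        return v > base * (1.0 + thr)

    bad_days = sorted(d for d, v in recent.items() if bad(v))
    if not bad_days:
        return "OK"
    # longest run of consecutive bad days decides persistence
    streak = best = 1
    for prev, cur in zip(bad_days, bad_days[1:]):
        streak = streak + 1 if cur == prev + timedelta(days=1) else 1
        best = max(best, streak)
    return "PERSISTENT_DEGRADED" if best >= min_consecutive else "DEGRADED"
```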
## 5) UX / screens (Panel)

### 5.0 Multipage mode (Portal)

The target app is a **multipage Panel portal** that groups several pages (apps) on a single Panel server.

Initial pages:

- Global Traffic Analysis (existing page: `panel_app/trafic_analysis_panel.py`)
- KPI Health Check (new page: `panel_app/kpi_health_check_panel.py`)

Navigation:

- side menu (page selector)
- the sidebar and the main content change with the selected page

### 5.1 Sidebar (configuration)

- 2G/3G/LTE upload
- Optional analysis period
- Parameters: baseline/recent/threshold/persistence
- Buttons:
  - Load & build rules
  - Run health check
  - Excel export

### 5.2 Main

- Datasets summary
- KPI rules table (editable)
- Site summary table
- Drill-down:
  - Site selection
  - RAT selection
  - KPI selection
  - KPI curve
  - KPI/site table (statuses)
## 6) Code architecture & organization

### 6.1 Modules

Goal: a **modular app**, not one monolithic file.

- `panel_app/kpi_health_check_panel.py`
  - Panel UI (widgets, layout)
  - callback wiring
  - no heavy "business" logic
- `panel_app/panel_portal.py`
  - home page + multipage navigation
  - imports the pages and displays them via `get_page_components()`
- `process_kpi/kpi_health_check/io.py`
  - ZIP/CSV reading
  - multi-CSV ZIP support (V2)
- `process_kpi/kpi_health_check/normalization.py`
  - date column detection / parsing
  - `site_code` extraction from BCF/WBTS/LNBTS/DN
  - daily aggregation (mean vs sum)
  - `physical_db` enrichment (City/Lat/Lon)
- `process_kpi/kpi_health_check/rules.py`
  - KPI rule generation (direction, SLA, agg)
  - rule validation / normalization
- `process_kpi/kpi_health_check/engine.py`
  - baseline vs recent computation
  - classification (OK / DEGRADED / PERSISTENT_DEGRADED / RESOLVED)
  - output table construction
- `process_kpi/kpi_health_check/multi_rat.py`
  - cross-RAT synthesis per site
  - multi-RAT "top anomalies"
- `process_kpi/kpi_health_check/export.py`
  - Excel bytes build (reuses `panel_app/convert_to_excel_panel.py`)

### 6.2 Key functions

- ZIP/CSV reading
- Date & ID column detection
- Daily dataset construction
- KPI rule generation
- Health check evaluation
- Excel export

### 6.3 `site_code` rule + physical DB enrichment (as in the traffic app)

- **Site code extraction**
  - main strategy: same logic as `trafic_analysis_panel.py` (split / numeric prefix of the name)
  - fallback: regex on a digit sequence in the name
- **Enrichment**
  - load `physical_db/physical_database.csv` via `get_physical_db()`
  - build `code` from `Code_Sector` (`split('_')[0]`) then cast to int
  - join on the code to retrieve `City`, `Longitude`, `Latitude`
## 7) Roadmap / iterations

### V1 (MVP)

- [DONE] 2G/3G/LTE upload (multi-RAT)
- [DONE] Numeric KPI detection
- [DONE] Editable KPI rules
- [DONE] DEGRADED / PERSISTENT_DEGRADED / OK detection
- [DONE] Simple drill-down + export

### V2 (expert)

- [DONE] RESOLVED (degraded, then OK)
- [DONE] Multi-CSV ZIP support
- [N/A] "cell-level" vs "site-level" switch (KPIs confirmed to be per site)
- [TODO] Criticality score (weight by traffic, population, customer criticality)
- [DONE] Multi-RAT "Top anomalies" table (cross-RAT)
- [TODO] Advanced visualizations (per-day heatmap, histograms, etc.)

### V3 (industrialization)

- [TODO] Rule presets per operator
- [TODO] Profile management / configuration saving
- [TODO] Automatic import of the "complaint sites list"
- [TODO] PDF generation (optional) and evidence pack

## 8) Open points to confirm

- [DONE] KPIs are per site
- [DONE] Do the ZIPs sometimes contain several CSVs? (multi-CSV support implemented)
- [PARTIAL] Exact `PERIOD_START_TIME` format across all reports? (parsing hardened, to be validated on your files)
- [TODO] Site code extraction: single rule, or naming-dependent?

## 9) Success criteria

- Load a KPI report and get a top list of degraded sites in < 1 minute
- Quickly isolate: since when, on which KPIs, and whether it is persistent
- Usable Excel export for internal sharing
panel_app/kpi_health_check_panel.py ADDED
@@ -0,0 +1,475 @@
import io
import os
import sys
from datetime import date

import pandas as pd
import panel as pn
import plotly.express as px

ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)

from process_kpi.kpi_health_check.engine import evaluate_health_check
from process_kpi.kpi_health_check.export import build_export_bytes
from process_kpi.kpi_health_check.io import read_bytes_to_df
from process_kpi.kpi_health_check.multi_rat import compute_multirat_views
from process_kpi.kpi_health_check.normalization import (
    build_daily_kpi,
    infer_date_col,
    infer_id_col,
)
from process_kpi.kpi_health_check.rules import infer_kpi_direction, infer_kpi_sla

pn.extension("plotly", "tabulator")

PLOTLY_CONFIG = {"displaylogo": False, "scrollZoom": True, "displayModeBar": True}


def read_fileinput_to_df(file_input: pn.widgets.FileInput) -> pd.DataFrame | None:
    if file_input is None or not file_input.value:
        return None

    return read_bytes_to_df(file_input.value, file_input.filename or "")


current_daily_by_rat: dict[str, pd.DataFrame] = {}
current_rules_df: pd.DataFrame | None = None
current_status_df: pd.DataFrame | None = None
current_summary_df: pd.DataFrame | None = None
current_multirat_df: pd.DataFrame | None = None
current_top_anomalies_df: pd.DataFrame | None = None
current_export_bytes: bytes | None = None

file_2g = pn.widgets.FileInput(name="2G KPI report", accept=".csv,.zip")
file_3g = pn.widgets.FileInput(name="3G KPI report", accept=".csv,.zip")
file_lte = pn.widgets.FileInput(name="LTE KPI report", accept=".csv,.zip")

analysis_range = pn.widgets.DateRangePicker(name="Analysis date range (optional)")
baseline_days = pn.widgets.IntInput(name="Baseline window (days)", value=30)
recent_days = pn.widgets.IntInput(name="Recent window (days)", value=7)
rel_threshold_pct = pn.widgets.FloatInput(
    name="Relative change threshold (%)", value=10.0, step=1.0
)
min_consecutive_days = pn.widgets.IntInput(
    name="Min consecutive bad days (persistent)", value=3
)

load_button = pn.widgets.Button(
    name="Load datasets & build rules", button_type="primary"
)
run_button = pn.widgets.Button(name="Run health check", button_type="primary")

status_pane = pn.pane.Alert(
    "Upload KPI reports (ZIP/CSV), then load datasets and run health check.",
    alert_type="primary",
)

datasets_table = pn.widgets.Tabulator(
    height=180, sizing_mode="stretch_width", layout="fit_data_table"
)
rules_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

try:
    rules_table.editable = True
except Exception:  # noqa: BLE001
    try:
        cfg = dict(rules_table.configuration or {})
        cfg["editable"] = True
        rules_table.configuration = cfg
    except Exception:  # noqa: BLE001
        pass

site_summary_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

multirat_summary_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

top_anomalies_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

site_select = pn.widgets.AutocompleteInput(
    name="Select a site (Type to search)",
    options={},
    case_sensitive=False,
    search_strategy="includes",
    restrict=True,
)
rat_select = pn.widgets.RadioButtonGroup(
    name="RAT", options=["2G", "3G", "LTE"], value="LTE"
)
kpi_select = pn.widgets.Select(name="KPI", options=[])

site_kpi_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)
trend_plot_pane = pn.pane.Plotly(sizing_mode="stretch_both", config=PLOTLY_CONFIG)

export_button = pn.widgets.FileDownload(
    label="Download KPI Health Check report",
    filename="KPI_Health_Check_Report.xlsx",
    button_type="primary",
)


def _filtered_daily(df: pd.DataFrame) -> pd.DataFrame:
    if df is None or df.empty:
        return pd.DataFrame()
    if (
        analysis_range.value
        and len(analysis_range.value) == 2
        and analysis_range.value[0]
        and analysis_range.value[1]
    ):
        start, end = analysis_range.value
        mask = (df["date_only"] >= start) & (df["date_only"] <= end)
        return df[mask].copy()
    return df


def _update_site_options() -> None:
    all_sites = []
    for df in current_daily_by_rat.values():
        if df is None or df.empty:
            continue
        cols = [c for c in ["site_code", "City"] if c in df.columns]
        all_sites.append(df[cols].drop_duplicates("site_code"))

    if not all_sites:
        site_select.options = {}
        site_select.value = None
        return

    sites_df = pd.concat(all_sites, ignore_index=True).drop_duplicates("site_code")
    if "City" not in sites_df.columns:
        sites_df["City"] = pd.NA

    sites_df = sites_df.sort_values(by=["City", "site_code"], na_position="last")

    opts: dict[str, int] = {}
    for _, row in sites_df.iterrows():
        label = (
            f"{row['City']}_{row['site_code']}"
            if pd.notna(row.get("City"))
            else str(row["site_code"])
        )
        opts[str(label)] = int(row["site_code"])

    site_select.options = opts
    if opts and site_select.value not in opts.values():
        site_select.value = next(iter(opts.values()))


def _update_kpi_options() -> None:
    rat = rat_select.value
    df = current_daily_by_rat.get(rat)
    if df is None or df.empty:
        kpi_select.options = []
        kpi_select.value = None
        return

    kpis = [
        c
        for c in df.columns
        if c not in {"site_code", "date_only", "Longitude", "Latitude", "City", "RAT"}
    ]
    kpis = sorted([str(c) for c in kpis])
    kpi_select.options = kpis
    if kpis and kpi_select.value not in kpis:
        kpi_select.value = kpis[0]


def _update_site_view(event=None) -> None:
    if current_status_df is None or current_status_df.empty:
        site_kpi_table.value = pd.DataFrame()
        trend_plot_pane.object = None
        return

    code = site_select.value
    rat = rat_select.value
    kpi = kpi_select.value

    if code is None or rat is None:
        site_kpi_table.value = pd.DataFrame()
        trend_plot_pane.object = None
        return

    site_df = current_status_df[
        (current_status_df["site_code"] == int(code))
        & (current_status_df["RAT"] == rat)
    ].copy()
    site_kpi_table.value = site_df

    daily = current_daily_by_rat.get(rat)
    if daily is None or daily.empty or not kpi or kpi not in daily.columns:
        trend_plot_pane.object = None
        return

    d = _filtered_daily(daily)
    s = d[d["site_code"] == int(code)].copy().sort_values("date_only")
    if s.empty:
        trend_plot_pane.object = None
        return

    title = f"{rat} - {kpi} - site {int(code)}"
    fig = px.line(s, x="date_only", y=kpi, markers=True)
    fig.update_layout(template="plotly_white", title=title)
    trend_plot_pane.object = fig


def load_datasets(event=None) -> None:
    try:
        status_pane.alert_type = "primary"
        status_pane.object = "Loading datasets..."

        global current_daily_by_rat, current_rules_df
        global current_status_df, current_summary_df, current_export_bytes
        global current_multirat_df, current_top_anomalies_df

        current_daily_by_rat = {}
        current_rules_df = None
        current_status_df = None
        current_summary_df = None
        current_multirat_df = None
        current_top_anomalies_df = None
        current_export_bytes = None

        site_summary_table.value = pd.DataFrame()
        multirat_summary_table.value = pd.DataFrame()
        top_anomalies_table.value = pd.DataFrame()
        site_kpi_table.value = pd.DataFrame()
        trend_plot_pane.object = None

        inputs = {"2G": file_2g, "3G": file_3g, "LTE": file_lte}
        rows = []
        rules_rows = []

        loaded_any = False
        for rat, widget in inputs.items():
            df_raw = read_fileinput_to_df(widget)
            if df_raw is None:
                continue
            loaded_any = True

            date_col = None
            id_col = None
            try:
                date_col = infer_date_col(df_raw)
            except Exception:  # noqa: BLE001
                date_col = None
            try:
                id_col = infer_id_col(df_raw, rat)
            except Exception:  # noqa: BLE001
                id_col = None

            daily, kpi_cols = build_daily_kpi(df_raw, rat)
            current_daily_by_rat[rat] = daily

            d = _filtered_daily(daily)
            rows.append(
                {
                    "RAT": rat,
                    "rows_raw": int(df_raw.shape[0]),
                    "cols_raw": int(df_raw.shape[1]),
                    "date_col": date_col,
                    "id_col": id_col,
                    "sites": int(d["site_code"].nunique()),
                    "days": int(d["date_only"].nunique()),
                    "kpis": int(len(kpi_cols)),
                }
            )

            for kpi in kpi_cols:
                direction = infer_kpi_direction(kpi)
                rules_rows.append(
                    {
                        "RAT": rat,
                        "KPI": kpi,
                        "direction": direction,
                        "sla": infer_kpi_sla(kpi, direction),
                    }
                )

        if not loaded_any:
            raise ValueError("Please upload at least one KPI report")

        datasets_table.value = pd.DataFrame(rows)

        rules_df = (
            pd.DataFrame(rules_rows)
            .drop_duplicates(subset=["RAT", "KPI"])
            .sort_values(by=["RAT", "KPI"])
        )
        current_rules_df = rules_df
        rules_table.value = rules_df

        _update_site_options()
        _update_kpi_options()

        status_pane.alert_type = "success"
        status_pane.object = (
            "Datasets loaded. Edit KPI rules if needed, then run health check."
        )

    except Exception as exc:  # noqa: BLE001
        status_pane.alert_type = "danger"
        status_pane.object = f"Error: {exc}"


def run_health_check(event=None) -> None:
    try:
        status_pane.alert_type = "primary"
        status_pane.object = "Running health check..."

        global current_status_df, current_summary_df, current_export_bytes
        global current_multirat_df, current_top_anomalies_df

        rules_df = (
            rules_table.value
            if isinstance(rules_table.value, pd.DataFrame)
            else pd.DataFrame()
        )
        if rules_df.empty:
            raise ValueError("KPI rules table is empty")

        all_status = []
        all_summary = []

        for rat, daily in current_daily_by_rat.items():
            d = _filtered_daily(daily)
            status_df, summary_df = evaluate_health_check(
                d,
                rat,
                rules_df,
                int(baseline_days.value),
                int(recent_days.value),
                float(rel_threshold_pct.value),
                int(min_consecutive_days.value),
            )
            if not status_df.empty:
                all_status.append(status_df)
            if not summary_df.empty:
                all_summary.append(summary_df)

        current_status_df = (
            pd.concat(all_status, ignore_index=True) if all_status else pd.DataFrame()
        )
        current_summary_df = (
            pd.concat(all_summary, ignore_index=True) if all_summary else pd.DataFrame()
        )
        site_summary_table.value = current_summary_df

        current_multirat_df, current_top_anomalies_df = compute_multirat_views(
            current_status_df
        )
        multirat_summary_table.value = current_multirat_df
        top_anomalies_table.value = current_top_anomalies_df

        current_export_bytes = _build_export_bytes()

        _update_site_view()

        status_pane.alert_type = "success"
        status_pane.object = "Health check completed."

    except Exception as exc:  # noqa: BLE001
        status_pane.alert_type = "danger"
        status_pane.object = f"Error: {exc}"


def _build_export_bytes() -> bytes:
    return build_export_bytes(
        (
            datasets_table.value
            if isinstance(datasets_table.value, pd.DataFrame)
            else None
        ),
        rules_table.value if isinstance(rules_table.value, pd.DataFrame) else None,
        current_summary_df if isinstance(current_summary_df, pd.DataFrame) else None,
        current_status_df if isinstance(current_status_df, pd.DataFrame) else None,
        (
            current_multirat_df
            if isinstance(current_multirat_df, pd.DataFrame)
            else None
        ),
        (
            current_top_anomalies_df
            if isinstance(current_top_anomalies_df, pd.DataFrame)
            else None
        ),
    )


def _export_callback() -> io.BytesIO:
    data = current_export_bytes or b""
    if not data:
        return io.BytesIO()
    return io.BytesIO(data)


load_button.on_click(load_datasets)
run_button.on_click(run_health_check)

rat_select.param.watch(lambda e: (_update_kpi_options(), _update_site_view()), "value")
site_select.param.watch(_update_site_view, "value")
kpi_select.param.watch(_update_site_view, "value")

export_button.callback = _export_callback


# Page layout components (used by the multipage portal)
sidebar = pn.Column(
    file_2g,
    file_3g,
    file_lte,
    "---",
    analysis_range,
    baseline_days,
    recent_days,
    rel_threshold_pct,
    min_consecutive_days,
    "---",
    load_button,
    run_button,
    "---",
    export_button,
)

main = pn.Column(
    status_pane,
    pn.pane.Markdown("## Datasets"),
    datasets_table,
    pn.pane.Markdown("## KPI Rules (editable)"),
    rules_table,
    pn.pane.Markdown("## Site Summary"),
    site_summary_table,
    pn.pane.Markdown("## Multi-RAT Summary"),
    multirat_summary_table,
    pn.pane.Markdown("## Top anomalies (cross-RAT)"),
    top_anomalies_table,
    pn.layout.Divider(),
    pn.pane.Markdown("## Drill-down"),
    pn.Row(site_select, rat_select, kpi_select),
    pn.Row(
        pn.Column(site_kpi_table, sizing_mode="stretch_width"),
        pn.Column(trend_plot_pane, sizing_mode="stretch_both"),
    ),
)


def get_page_components():
    return sidebar, main


if __name__ == "__main__":
    template = pn.template.MaterialTemplate(title="KPI Health Check - Panel")
    template.sidebar.append(sidebar)
    template.main.append(main)
    template.servable()
panel_app/panel_portal.py ADDED
@@ -0,0 +1,115 @@
import os
import sys

import panel as pn

ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)

pn.extension("plotly", "tabulator")

# Import pages (kept as modules, not nested templates)
import kpi_health_check_panel
import trafic_analysis_panel

PAGES = {
    "📊 Global Traffic Analysis": {
        "get_components": trafic_analysis_panel.get_page_components,
        "description": "Multi-RAT traffic analysis + maps + exports.",
    },
    "📈 KPI Health Check": {
        "get_components": kpi_health_check_panel.get_page_components,
        "description": "Detection of degraded/persistent/resolved KPIs + drill-down + export.",
    },
}

HOME_PAGE = "🏠 Gallery"

page_sidebar_container = pn.Column(sizing_mode="stretch_width")
page_main_container = pn.Column(sizing_mode="stretch_both")

page_title = pn.pane.Markdown("", sizing_mode="stretch_width")
back_button = pn.widgets.Button(
    name="← Back to gallery",
    button_type="primary",
    width=180,
)

home_button = pn.widgets.Button(
    name=HOME_PAGE,
    button_type="default",
    width_policy="max",
)


def _load_page(page_name: str) -> None:
    if page_name == HOME_PAGE:
        page_title.object = "## Applications"

        tiles = []
        for title, meta in PAGES.items():
            btn = pn.widgets.Button(name="Open", button_type="primary", width=120)
            btn.on_click(lambda e, t=title: _load_page(t))

            tile = pn.Column(
                pn.pane.Markdown(f"### {title}\n\n{meta.get('description', '')}"),
                btn,
                sizing_mode="stretch_width",
                margin=(10, 10, 10, 10),
            )
            tiles.append(tile)

        gallery = pn.GridBox(*tiles, ncols=2, sizing_mode="stretch_width")
        page_sidebar_container.objects = [
            pn.pane.Markdown(
                """### Welcome\n\nPick an application from the gallery."""
            )
        ]
        page_main_container.objects = [page_title, gallery]
        return

    meta = PAGES.get(page_name)
    if meta is None:
        page_sidebar_container.objects = [
            pn.pane.Alert("Unknown page", alert_type="danger")
        ]
        page_main_container.objects = []
        return

    sidebar, main = meta["get_components"]()
    page_title.object = f"## {page_name}"
    page_sidebar_container.objects = [sidebar]
    page_main_container.objects = [
        pn.Row(back_button, pn.Spacer(), sizing_mode="stretch_width"),
        page_title,
        main,
    ]


template = pn.template.MaterialTemplate(title="OML DB - Panel Portal")


def _go_home(event=None) -> None:
    _load_page(HOME_PAGE)


back_button.on_click(_go_home)
home_button.on_click(_go_home)

_load_page(HOME_PAGE)

template.sidebar.append(
    pn.Column(
        pn.pane.Markdown("## Navigation"),
        home_button,
        pn.layout.Divider(),
        page_sidebar_container,
        sizing_mode="stretch_width",
    )
)

template.main.append(page_main_container)

template.servable()
panel_app/trafic_analysis_panel.py CHANGED
@@ -16,12 +16,16 @@ if ROOT_DIR not in sys.path:
 from panel_app.convert_to_excel_panel import write_dfs_to_excel
 from utils.utils_vars import get_physical_db
 
-pn.extension("plotly", "tabulator", raw_css=[
-    ":fullscreen { background-color: white; overflow: auto; }",
-    "::backdrop { background-color: white; }",
-    ".plot-fullscreen-wrapper:fullscreen { padding: 20px; display: flex; flex-direction: column; }",
-    ".plot-fullscreen-wrapper:fullscreen > * { height: 100% !important; width: 100% !important; }",
-])
+pn.extension(
+    "plotly",
+    "tabulator",
+    raw_css=[
+        ":fullscreen { background-color: white; overflow: auto; }",
+        "::backdrop { background-color: white; }",
+        ".plot-fullscreen-wrapper:fullscreen { padding: 20px; display: flex; flex-direction: column; }",
+        ".plot-fullscreen-wrapper:fullscreen > * { height: 100% !important; width: 100% !important; }",
+    ],
+)
 
 
 def read_fileinput_to_df(file_input: pn.widgets.FileInput) -> pd.DataFrame | None:
@@ -1480,7 +1484,7 @@ def _update_site_view(event=None) -> None:  # noqa: D401, ARG001
     ]
     first_row = site_detail_df.iloc[0]
     site_label = f"{first_row['code']}"
-    if pd.notna(first_row.get('City')):
+    if pd.notna(first_row.get("City")):
         site_label += f" ({first_row['City']})"
 
     if traffic_cols:
@@ -2389,8 +2393,12 @@ main_content = pn.Column(
     export_button,
 )
 
-template.sidebar.append(sidebar_content)
-template.main.append(main_content)
+
+def get_page_components():
+    return sidebar_content, main_content
 
 
-template.servable()
+if __name__ == "__main__":
+    template.sidebar.append(sidebar_content)
+    template.main.append(main_content)
+    template.servable()
process_kpi/__init__.py ADDED
File without changes
process_kpi/kpi_health_check/__init__.py ADDED
File without changes
process_kpi/kpi_health_check/engine.py ADDED
@@ -0,0 +1,210 @@
+ from datetime import date, timedelta
+
+ import numpy as np
+ import pandas as pd
+
+
+ def window_bounds(end_date: date, days: int) -> tuple[date, date]:
+     start = end_date - timedelta(days=days - 1)
+     return start, end_date
+
+
+ def is_bad(
+     value: float | None,
+     baseline: float | None,
+     direction: str,
+     rel_threshold_pct: float,
+     sla: float | None,
+ ) -> bool:
+     if value is None or (isinstance(value, float) and np.isnan(value)):
+         return False
+     bad = False
+     if sla is not None and not (isinstance(sla, float) and np.isnan(sla)):
+         if direction == "higher_is_better":
+             bad = bad or (value < float(sla))
+         else:
+             bad = bad or (value > float(sla))
+
+     if baseline is None or (isinstance(baseline, float) and np.isnan(baseline)):
+         return bad
+
+     thr = float(rel_threshold_pct) / 100.0
+     if direction == "higher_is_better":
+         return bad or (value < baseline * (1.0 - thr))
+     return bad or (value > baseline * (1.0 + thr))
+
+
+ def max_consecutive_days(dates: list[date]) -> int:
+     if not dates:
+         return 0
+     dates_sorted = sorted(set(dates))
+     streak = 1
+     best = 1
+     for prev, cur in zip(dates_sorted, dates_sorted[1:]):
+         if cur == prev + timedelta(days=1):
+             streak += 1
+         else:
+             streak = 1
+         if streak > best:
+             best = streak
+     return best
+
+
+ def evaluate_health_check(
+     daily: pd.DataFrame,
+     rat: str,
+     rules_df: pd.DataFrame,
+     baseline_days_n: int,
+     recent_days_n: int,
+     rel_threshold_pct: float,
+     min_consecutive_days: int,
+ ) -> tuple[pd.DataFrame, pd.DataFrame]:
+     if daily.empty:
+         return pd.DataFrame(), pd.DataFrame()
+
+     end_date = max(daily["date_only"])
+     recent_start, recent_end = window_bounds(end_date, int(recent_days_n))
+     baseline_end = recent_start - timedelta(days=1)
+     baseline_start = baseline_end - timedelta(days=int(baseline_days_n) - 1)
+
+     rat_rules = rules_df[rules_df["RAT"] == rat].copy()
+     kpis = [k for k in rat_rules["KPI"].tolist() if k in daily.columns]
+
+     rows = []
+
+     for site_code, g_site in daily.groupby("site_code"):
+         city = (
+             g_site["City"].dropna().iloc[0]
+             if ("City" in g_site.columns and g_site["City"].notna().any())
+             else None
+         )
+         g_site = g_site.sort_values("date_only")
+
+         for kpi in kpis:
+             rule = rat_rules[rat_rules["KPI"] == kpi].iloc[0]
+             direction = str(rule.get("direction", "higher_is_better"))
+             sla = rule.get("sla", np.nan)
+             try:
+                 sla_val = float(sla) if pd.notna(sla) else None
+             except Exception:
+                 sla_val = None
+
+             s = g_site[["date_only", kpi]].dropna(subset=[kpi])
+             if s.empty:
+                 rows.append(
+                     {
+                         "RAT": rat,
+                         "site_code": int(site_code),
+                         "City": city,
+                         "KPI": kpi,
+                         "status": "NO_DATA",
+                     }
+                 )
+                 continue
+
+             baseline_mask = (s["date_only"] >= baseline_start) & (
+                 s["date_only"] <= baseline_end
+             )
+             recent_mask = (s["date_only"] >= recent_start) & (
+                 s["date_only"] <= recent_end
+             )
+
+             baseline = (
+                 s.loc[baseline_mask, kpi].median() if baseline_mask.any() else np.nan
+             )
+             recent = s.loc[recent_mask, kpi].median() if recent_mask.any() else np.nan
+
+             daily_recent = s.loc[recent_mask, ["date_only", kpi]].copy()
+             bad_dates = []
+             if not daily_recent.empty:
+                 for d, v in zip(
+                     daily_recent["date_only"].tolist(), daily_recent[kpi].tolist()
+                 ):
+                     if is_bad(
+                         float(v) if pd.notna(v) else None,
+                         float(baseline) if pd.notna(baseline) else None,
+                         direction,
+                         rel_threshold_pct,
+                         sla_val,
+                     ):
+                         bad_dates.append(d)
+
+             max_streak = max_consecutive_days(bad_dates)
+             persistent = max_streak >= int(min_consecutive_days)
+
+             is_bad_recent = is_bad(
+                 float(recent) if pd.notna(recent) else None,
+                 float(baseline) if pd.notna(baseline) else None,
+                 direction,
+                 rel_threshold_pct,
+                 sla_val,
+             )
+
+             is_bad_current = is_bad_recent
+             if not daily_recent.empty:
+                 last_row = daily_recent.sort_values("date_only").iloc[-1]
+                 last_val = last_row[kpi]
+                 is_bad_current = is_bad(
+                     float(last_val) if pd.notna(last_val) else None,
+                     float(baseline) if pd.notna(baseline) else None,
+                     direction,
+                     rel_threshold_pct,
+                     sla_val,
+                 )
+
+             had_bad_recent = (len(bad_dates) > 0) or bool(is_bad_recent)
+
+             if is_bad_current and persistent:
+                 status = "PERSISTENT_DEGRADED"
+             elif is_bad_current:
+                 status = "DEGRADED"
+             elif had_bad_recent:
+                 status = "RESOLVED"
+             else:
+                 status = "OK"
+
+             rows.append(
+                 {
+                     "RAT": rat,
+                     "site_code": int(site_code),
+                     "City": city,
+                     "KPI": kpi,
+                     "direction": direction,
+                     "sla": sla_val,
+                     "baseline_median": baseline,
+                     "recent_median": recent,
+                     "bad_days_recent": len(bad_dates),
+                     "max_streak_recent": int(max_streak),
+                     "status": status,
+                 }
+             )
+
+     status_df = pd.DataFrame(rows)
+
+     summary_rows = []
+     for site_code, g in status_df.groupby("site_code"):
+         city = (
+             g["City"].dropna().iloc[0]
+             if ("City" in g.columns and g["City"].notna().any())
+             else None
+         )
+         degraded_cnt = int(g["status"].isin(["DEGRADED", "PERSISTENT_DEGRADED"]).sum())
+         persistent_cnt = int((g["status"] == "PERSISTENT_DEGRADED").sum())
+         resolved_cnt = int((g["status"] == "RESOLVED").sum())
+         summary_rows.append(
+             {
+                 "RAT": rat,
+                 "site_code": int(site_code),
+                 "City": city,
+                 "degraded_kpis": degraded_cnt,
+                 "persistent_kpis": persistent_cnt,
+                 "resolved_kpis": resolved_cnt,
+             }
+         )
+
+     summary_df = pd.DataFrame(summary_rows).sort_values(
+         by=["degraded_kpis", "persistent_kpis", "resolved_kpis"],
+         ascending=[False, False, False],
+     )
+
+     return status_df, summary_df
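The two pure helpers in `engine.py` can be exercised in isolation. The sketch below copies `is_bad` (with the NaN guards dropped for brevity) and `max_consecutive_days` out of the module; the KPI names and threshold values are illustrative only:

```python
from datetime import date, timedelta

def is_bad(value, baseline, direction, rel_threshold_pct, sla):
    # Simplified copy of engine.is_bad (NaN guards omitted): a value is bad
    # if it breaches the SLA or deviates from baseline beyond the threshold.
    bad = False
    if sla is not None:
        if direction == "higher_is_better":
            bad = bad or (value < float(sla))
        else:
            bad = bad or (value > float(sla))
    if baseline is None:
        return bad
    thr = float(rel_threshold_pct) / 100.0
    if direction == "higher_is_better":
        return bad or (value < baseline * (1.0 - thr))
    return bad or (value > baseline * (1.0 + thr))

def max_consecutive_days(dates):
    # Copy of engine.max_consecutive_days: longest run of consecutive days.
    if not dates:
        return 0
    dates_sorted = sorted(set(dates))
    streak = best = 1
    for prev, cur in zip(dates_sorted, dates_sorted[1:]):
        streak = streak + 1 if cur == prev + timedelta(days=1) else 1
        best = max(best, streak)
    return best

# CSSR 97.0 vs baseline 99.0 with a 10% relative threshold: within the
# relative band, but below a 98.0 SLA -> flagged.
print(is_bad(97.0, 99.0, "higher_is_better", 10.0, 98.0))  # True
# Drop rate 1.5 vs baseline 1.0 with a 20% threshold: 1.5 > 1.2 -> flagged.
print(is_bad(1.5, 1.0, "lower_is_better", 20.0, None))     # True
# Three bad days, of which two are consecutive.
bad_days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
print(max_consecutive_days(bad_days))                      # 2
```

With `min_consecutive_days = 3`, the streak of 2 above would not qualify as persistent; the same dates plus January 3rd would.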
process_kpi/kpi_health_check/export.py ADDED
@@ -0,0 +1,38 @@
+ import pandas as pd
+
+ from panel_app.convert_to_excel_panel import write_dfs_to_excel
+
+
+ def build_export_bytes(
+     datasets_df: pd.DataFrame | None,
+     rules_df: pd.DataFrame | None,
+     summary_df: pd.DataFrame | None,
+     status_df: pd.DataFrame | None,
+     multirat_summary_df: pd.DataFrame | None = None,
+     top_anomalies_df: pd.DataFrame | None = None,
+ ) -> bytes:
+     dfs = [
+         datasets_df if isinstance(datasets_df, pd.DataFrame) else pd.DataFrame(),
+         rules_df if isinstance(rules_df, pd.DataFrame) else pd.DataFrame(),
+         summary_df if isinstance(summary_df, pd.DataFrame) else pd.DataFrame(),
+         status_df if isinstance(status_df, pd.DataFrame) else pd.DataFrame(),
+         (
+             multirat_summary_df
+             if isinstance(multirat_summary_df, pd.DataFrame)
+             else pd.DataFrame()
+         ),
+         (
+             top_anomalies_df
+             if isinstance(top_anomalies_df, pd.DataFrame)
+             else pd.DataFrame()
+         ),
+     ]
+     sheet_names = [
+         "Datasets",
+         "KPI_Rules",
+         "Site_Summary",
+         "Site_KPI_Status",
+         "MultiRAT_Summary",
+         "Top_Anomalies",
+     ]
+     return write_dfs_to_excel(dfs, sheet_names, index=False)
process_kpi/kpi_health_check/io.py ADDED
@@ -0,0 +1,45 @@
+ import io
+ import zipfile
+
+ import pandas as pd
+
+
+ def read_bytes_to_df(file_bytes: bytes, filename: str) -> pd.DataFrame:
+     if not file_bytes:
+         raise ValueError("Empty file")
+
+     filename_l = (filename or "").lower()
+     data = io.BytesIO(file_bytes)
+
+     if filename_l.endswith(".zip"):
+         with zipfile.ZipFile(data) as z:
+             csv_files = [f for f in z.namelist() if f.lower().endswith(".csv")]
+             if not csv_files:
+                 raise ValueError("No CSV file found in the ZIP archive")
+             dfs = []
+             for csv_name in csv_files:
+                 try:
+                     with z.open(csv_name) as f:
+                         df = pd.read_csv(
+                             f,
+                             encoding="latin1",
+                             sep=";",
+                             low_memory=False,
+                         )
+                     if isinstance(df, pd.DataFrame) and not df.empty:
+                         dfs.append(df)
+                 except Exception:
+                     continue
+
+             if not dfs:
+                 raise ValueError("No readable CSV content found in the ZIP archive")
+
+             if len(dfs) == 1:
+                 return dfs[0]
+
+             return pd.concat(dfs, ignore_index=True, sort=False)
+
+     if filename_l.endswith(".csv"):
+         return pd.read_csv(data, encoding="latin1", sep=";", low_memory=False)
+
+     raise ValueError("Unsupported file format. Please upload a ZIP or CSV file.")
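The ZIP branch of `read_bytes_to_df` can be sanity-checked end to end with an in-memory archive. This is a stdlib-only sketch of the same logic (latin1 encoding, `;` separator, concatenate every readable `.csv` member) using the `csv` module instead of pandas, not the actual function:

```python
import csv
import io
import zipfile

def read_zip_csv_rows(file_bytes: bytes) -> list[dict]:
    # Mirrors the ZIP branch of io.read_bytes_to_df: find the .csv members,
    # fail if there are none, and concatenate the parsed rows.
    rows: list[dict] = []
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as z:
        csv_files = [f for f in z.namelist() if f.lower().endswith(".csv")]
        if not csv_files:
            raise ValueError("No CSV file found in the ZIP archive")
        for name in csv_files:
            with z.open(name) as f:
                text = io.TextIOWrapper(f, encoding="latin1")
                rows.extend(csv.DictReader(text, delimiter=";"))
    return rows

# Build a small in-memory ZIP holding one ';'-separated KPI report.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("report.csv", "PERIOD_START_TIME;BCF name\n01.02.2024;BCF-12345\n")

rows = read_zip_csv_rows(buf.getvalue())
print(rows)  # [{'PERIOD_START_TIME': '01.02.2024', 'BCF name': 'BCF-12345'}]
```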
process_kpi/kpi_health_check/multi_rat.py ADDED
@@ -0,0 +1,126 @@
+ import pandas as pd
+
+
+ def compute_multirat_views(
+     status_df: pd.DataFrame,
+ ) -> tuple[pd.DataFrame, pd.DataFrame]:
+     if status_df is None or status_df.empty:
+         return pd.DataFrame(), pd.DataFrame()
+
+     df = status_df.copy()
+     df["is_degraded"] = df["status"].isin(["DEGRADED", "PERSISTENT_DEGRADED"])
+     df["is_persistent"] = df["status"].isin(["PERSISTENT_DEGRADED"])
+     df["is_resolved"] = df["status"].isin(["RESOLVED"])
+
+     def _first_city(s: pd.Series):
+         s2 = s.dropna()
+         return s2.iloc[0] if not s2.empty else None
+
+     base = (
+         df.groupby("site_code", as_index=False)
+         .agg(
+             City=("City", _first_city),
+             degraded_kpis_total=("is_degraded", "sum"),
+             persistent_kpis_total=("is_persistent", "sum"),
+             resolved_kpis_total=("is_resolved", "sum"),
+         )
+         .copy()
+     )
+
+     impacted = (
+         df[df["is_degraded"]]
+         .groupby("site_code")["RAT"]
+         .nunique()
+         .rename("impacted_rats")
+         .reset_index()
+     )
+
+     resolved_pivot = (
+         df[df["is_resolved"]]
+         .pivot_table(
+             index="site_code",
+             columns="RAT",
+             values="KPI",
+             aggfunc="count",
+             fill_value=0,
+         )
+         .rename(columns=lambda c: f"resolved_{c}")
+         .reset_index()
+     )
+
+     base = pd.merge(base, impacted, on="site_code", how="left")
+     base["impacted_rats"] = base["impacted_rats"].fillna(0).astype(int)
+
+     degraded_pivot = (
+         df[df["is_degraded"]]
+         .pivot_table(
+             index="site_code",
+             columns="RAT",
+             values="KPI",
+             aggfunc="count",
+             fill_value=0,
+         )
+         .rename(columns=lambda c: f"degraded_{c}")
+         .reset_index()
+     )
+
+     persistent_pivot = (
+         df[df["is_persistent"]]
+         .pivot_table(
+             index="site_code",
+             columns="RAT",
+             values="KPI",
+             aggfunc="count",
+             fill_value=0,
+         )
+         .rename(columns=lambda c: f"persistent_{c}")
+         .reset_index()
+     )
+
+     out = base
+     if not degraded_pivot.empty:
+         out = pd.merge(out, degraded_pivot, on="site_code", how="left")
+     if not persistent_pivot.empty:
+         out = pd.merge(out, persistent_pivot, on="site_code", how="left")
+     if not resolved_pivot.empty:
+         out = pd.merge(out, resolved_pivot, on="site_code", how="left")
+
+     metric_cols = [c for c in out.columns if c != "City"]
+     out[metric_cols] = out[metric_cols].fillna(0)
+     out = out.sort_values(
+         by=["persistent_kpis_total", "degraded_kpis_total", "impacted_rats"],
+         ascending=[False, False, False],
+     )
+
+     top = df[df["is_degraded"]].copy()
+     sev = {"PERSISTENT_DEGRADED": 2, "DEGRADED": 1}
+     top["severity"] = top["status"].map(sev).fillna(0).astype(int)
+
+     for col in ["bad_days_recent", "max_streak_recent"]:
+         if col not in top.columns:
+             top[col] = pd.NA
+
+     top = top.sort_values(
+         by=["severity", "max_streak_recent", "bad_days_recent"],
+         ascending=[False, False, False],
+     )
+
+     top_cols = [
+         c
+         for c in [
+             "severity",
+             "RAT",
+             "site_code",
+             "City",
+             "KPI",
+             "status",
+             "baseline_median",
+             "recent_median",
+             "bad_days_recent",
+             "max_streak_recent",
+         ]
+         if c in top.columns
+     ]
+     top = top[top_cols].head(300)
+
+     return out, top
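The `Top_Anomalies` ordering above boils down to a severity map plus a multi-key descending sort. A pandas-free sketch of that ranking (record fields mirror `status_df`; the values are made up):

```python
# Severity map from compute_multirat_views: persistent issues outrank fresh ones.
SEVERITY = {"PERSISTENT_DEGRADED": 2, "DEGRADED": 1}

def rank_anomalies(records: list[dict]) -> list[dict]:
    # Keep only degraded rows, then sort by severity, longest streak,
    # and number of bad days, all descending.
    degraded = [r for r in records if r["status"] in SEVERITY]
    return sorted(
        degraded,
        key=lambda r: (
            SEVERITY[r["status"]],
            r.get("max_streak_recent", 0),
            r.get("bad_days_recent", 0),
        ),
        reverse=True,
    )

records = [
    {"KPI": "CSSR", "status": "DEGRADED",
     "max_streak_recent": 2, "bad_days_recent": 3},
    {"KPI": "DCR", "status": "PERSISTENT_DEGRADED",
     "max_streak_recent": 5, "bad_days_recent": 6},
    {"KPI": "Traffic", "status": "OK"},
]
print([r["KPI"] for r in rank_anomalies(records)])  # ['DCR', 'CSSR']
```

The `OK` row is dropped entirely, matching the `df[df["is_degraded"]]` filter before the sort.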
process_kpi/kpi_health_check/normalization.py ADDED
@@ -0,0 +1,219 @@
+ import re
+
+ import numpy as np
+ import pandas as pd
+
+ from utils.utils_vars import get_physical_db
+
+
+ def to_numeric(series: pd.Series) -> pd.Series:
+     if pd.api.types.is_numeric_dtype(series):
+         return pd.to_numeric(series, errors="coerce")
+     s = series.astype(str)
+     s = s.str.replace("\u00a0", "", regex=False)
+     s = s.str.replace(" ", "", regex=False)
+     s = s.str.replace("%", "", regex=False)
+     s = s.str.replace(",", ".", regex=False)
+     s = s.replace({"nan": np.nan, "None": np.nan, "": np.nan})
+     return pd.to_numeric(s, errors="coerce")
+
+
+ def parse_datetime(series: pd.Series) -> pd.Series:
+     if series.empty:
+         return pd.to_datetime(series, errors="coerce")
+     first = series.dropna().astype(str).iloc[0] if series.dropna().any() else ""
+
+     formats: list[str | None] = []
+     if len(first) > 10:
+         formats.extend(
+             [
+                 "%m.%d.%Y %H:%M:%S",
+                 "%d.%m.%Y %H:%M:%S",
+                 "%Y-%m-%d %H:%M:%S",
+                 "%Y/%m/%d %H:%M:%S",
+                 "%d/%m/%Y %H:%M:%S",
+                 "%m/%d/%Y %H:%M:%S",
+             ]
+         )
+     formats.extend(
+         [
+             "%m.%d.%Y",
+             "%d.%m.%Y",
+             "%Y-%m-%d",
+             "%Y/%m/%d",
+             "%d/%m/%Y",
+             "%m/%d/%Y",
+         ]
+     )
+
+     for fmt in formats:
+         dt = pd.to_datetime(series, errors="coerce", format=fmt)
+         if dt.notna().any():
+             return dt
+
+     return pd.to_datetime(series, errors="coerce")
+
+
+ def extract_site_code(value: object) -> int | None:
+     if value is None or (isinstance(value, float) and np.isnan(value)):
+         return None
+     s = str(value)
+     m = re.search(r"(\d{4,7})", s)
+     if not m:
+         return None
+     try:
+         return int(m.group(1))
+     except ValueError:
+         return None
+
+
+ def infer_date_col(df: pd.DataFrame) -> str:
+     for c in ["PERIOD_START_TIME", "PERIOD_START_DATE", "date", "Date", "DATE"]:
+         if c in df.columns:
+             return c
+     raise ValueError("Cannot find a date column (expected PERIOD_START_TIME)")
+
+
+ def infer_id_col(df: pd.DataFrame, rat: str) -> str:
+     rat_candidates = {
+         "2G": ["BCF name", "BCF", "BTS name", "BSC name", "DN"],
+         "3G": ["WBTS name", "WBTS ID", "DN"],
+         "LTE": ["LNBTS name", "MRBTS/SBTS name", "DN"],
+     }
+
+     candidates = [c for c in rat_candidates.get(rat, []) if c in df.columns]
+     if not candidates and "DN" in df.columns:
+         candidates = ["DN"]
+     if not candidates:
+         raise ValueError(f"Cannot infer an entity/site column for {rat} dataset")
+
+     physical_codes: set[int] | None = None
+     try:
+         physical = load_physical_db()
+         if not physical.empty and "code" in physical.columns:
+             physical_codes = set(
+                 pd.to_numeric(physical["code"], errors="coerce")
+                 .dropna()
+                 .astype(int)
+                 .tolist()
+             )
+     except Exception:
+         physical_codes = None
+
+     if not physical_codes:
+         return candidates[0]
+
+     best_col = candidates[0]
+     best_score = -1.0
+     for c in candidates:
+         sample = df[c].head(2000)
+         codes = sample.apply(extract_site_code)
+         non_null = float(codes.notna().mean()) if len(codes) else 0.0
+
+         if physical_codes:
+             match = (
+                 float(codes.dropna().astype(int).isin(physical_codes).mean())
+                 if codes.notna().any()
+                 else 0.0
+             )
+             score = match * 10.0 + non_null
+         else:
+             score = non_null
+
+         if score > best_score:
+             best_score = score
+             best_col = c
+
+     return best_col
+
+
+ def non_kpi_identifier_cols(df: pd.DataFrame, rat: str) -> set[str]:
+     common = {
+         "DN",
+         "PLMN name",
+         "RNC name",
+         "BSC name",
+         "BCF name",
+         "MRBTS/SBTS name",
+         "LNBTS name",
+         "WBTS name",
+         "WBTS ID",
+     }
+     rat_specific = {
+         "2G": {"BSC name", "BSC", "BCF name", "BCF", "BTS name"},
+         "3G": {"PLMN name", "RNC name", "WBTS name", "WBTS ID"},
+         "LTE": {"MRBTS/SBTS name", "LNBTS name"},
+     }
+     cols = set()
+     for c in common.union(rat_specific.get(rat, set())):
+         if c in df.columns:
+             cols.add(c)
+     return cols
+
+
+ def infer_agg(kpi: str) -> str:
+     k = str(kpi).lower()
+     if any(x in k for x in ["traffic", "volume", "erl", "total", "gbytes", "gb"]):
+         return "sum"
+     return "mean"
+
+
+ def load_physical_db() -> pd.DataFrame:
+     physical_db = get_physical_db().copy()
+     physical_db["code"] = physical_db["Code_Sector"].str.split("_").str[0]
+     physical_db["code"] = pd.to_numeric(physical_db["code"], errors="coerce")
+     physical_db = physical_db.dropna(subset=["code"])
+     physical_db["code"] = physical_db["code"].astype(int)
+     keep = [
+         c for c in ["code", "Longitude", "Latitude", "City"] if c in physical_db.columns
+     ]
+     return physical_db[keep].drop_duplicates("code")
+
+
+ def build_daily_kpi(df_raw: pd.DataFrame, rat: str) -> tuple[pd.DataFrame, list[str]]:
+     df = df_raw.copy()
+     date_col = infer_date_col(df)
+     id_col = infer_id_col(df, rat)
+
+     df["date"] = parse_datetime(df[date_col])
+     df = df.dropna(subset=["date"])
+     df["date_only"] = df["date"].dt.date
+
+     df["site_code"] = df[id_col].apply(extract_site_code)
+     df = df.dropna(subset=["site_code"])
+     df["site_code"] = df["site_code"].astype(int)
+
+     meta = {date_col, id_col, "date", "date_only", "site_code"}
+     meta = meta.union(non_kpi_identifier_cols(df, rat))
+     candidate_cols = [c for c in df.columns if c not in meta]
+
+     numeric_cols: dict[str, pd.Series] = {}
+     for c in candidate_cols:
+         numeric_cols[c] = to_numeric(df[c])
+
+     numeric_df = pd.DataFrame(numeric_cols)
+     kpi_cols = [c for c in numeric_df.columns if numeric_df[c].notna().any()]
+     if not kpi_cols:
+         raise ValueError(f"No numeric KPI columns detected for {rat}")
+
+     base = pd.concat(
+         [
+             df[["site_code", "date_only"]].reset_index(drop=True),
+             numeric_df[kpi_cols].reset_index(drop=True),
+         ],
+         axis=1,
+     )
+
+     agg_dict = {k: infer_agg(k) for k in kpi_cols}
+     daily = base.groupby(["site_code", "date_only"], as_index=False).agg(agg_dict)
+
+     physical = load_physical_db()
+     if not physical.empty:
+         daily = pd.merge(
+             daily, physical, left_on="site_code", right_on="code", how="left"
+         )
+         daily = daily.drop(columns=[c for c in ["code"] if c in daily.columns])
+
+     daily["RAT"] = rat
+
+     return daily, kpi_cols
process_kpi/kpi_health_check/rules.py ADDED
@@ -0,0 +1,31 @@
+ def infer_kpi_direction(kpi: str) -> str:
+     k = str(kpi).lower()
+     lower_is_better = [
+         "drop",
+         "dcr",
+         "blocking",
+         "block",
+         "congestion",
+         "loss",
+         "discard",
+         "rtwp",
+         "prb usage",
+         "usage",
+         "fail",
+     ]
+     if any(x in k for x in lower_is_better):
+         return "lower_is_better"
+     return "higher_is_better"
+
+
+ def infer_kpi_sla(kpi: str, direction: str) -> float | None:
+     k = str(kpi).lower()
+     if direction == "higher_is_better" and any(
+         x in k for x in ["availability", "cssr", "success", " sr"]
+     ):
+         return 98.0
+     if direction == "lower_is_better" and any(
+         x in k for x in ["drop", "dcr", "blocking", "congestion", "loss", "discard"]
+     ):
+         return 2.0
+     return None