DavMelchi committed
Commit a8899bd · 1 Parent(s): 604d355

Add KPI Health Check Panel with multi-RAT analysis, persistent degradation detection, and Excel export functionality

documentations/kpi_health_check_plan.md ADDED
@@ -0,0 +1,230 @@
# KPI Health Check (Panel) — Overall plan

## 1) Context & objective

In NPO, we receive customer complaints (impacted site(s)). The objective is to be able, within a few minutes, to check whether the radio KPIs are:

- Recently degraded
- Degraded for a long time (persistent)
- Recovering / resolved

The goal is a Panel application that is simple to use yet "expert-friendly", standardizes the analysis, and produces an exportable report.

## 2) Inputs & data formats

### 2.1 Files

- KPI reports per technology: 2G / 3G / LTE
- Format: `.csv`, or `.zip` containing one (or several) `.csv`
- Separator: `;` (latin1 encoding), as in the existing apps

### 2.2 Expected key columns

- Date: `PERIOD_START_TIME` (sometimes date only, sometimes date + time)
- NE/site/cell identifier (varies per RAT)
  - 2G: `BCF name` / `DN`
  - 3G: `WBTS name` / `DN`
  - LTE: `LNBTS name` / `DN`

### 2.3 KPIs (list provided)

- The KPI columns are numerous and heterogeneous.
- Two kinds must be distinguished:
  - "rate" KPIs (availability, success rate, CSSR, etc.) => aggregated by mean
  - "counter/volume/traffic" KPIs => aggregated by sum
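The ingestion convention above (`;` separator, latin1 encoding, optional ZIP wrapper) can be sketched as follows. `read_kpi_report` is an illustrative helper, not the actual `read_bytes_to_df` from `process_kpi/kpi_health_check/io.py`:

```python
import io
import zipfile

import pandas as pd


def read_kpi_report(data: bytes, filename: str) -> pd.DataFrame:
    """Read a KPI report given as raw bytes: plain CSV, or ZIP of CSVs."""
    if filename.lower().endswith(".zip"):
        frames = []
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    with zf.open(name) as fh:
                        # same separator/encoding convention as the existing apps
                        frames.append(pd.read_csv(fh, sep=";", encoding="latin1"))
        return pd.concat(frames, ignore_index=True)
    return pd.read_csv(io.BytesIO(data), sep=";", encoding="latin1")
```

Concatenating all CSVs found in a ZIP is what makes the V2 "multi-CSV ZIP" support cheap to add.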
## 3) Expected outputs

### 3.1 UI results

- "Health Summary" table per site:
  - Number of degraded KPIs (per RAT)
  - Number of persistent KPIs
  - Top most critical KPIs
- Drill-down per site:
  - Time-series curves
  - Degraded days
  - Baseline vs recent comparison

### 3.2 Export

- Excel export of a full report:
  - Datasets summary
  - KPI rules (thresholds/direction)
  - Site summary
  - Details per KPI/site
  - Daily series (optional / depending on volume)
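The multi-sheet Excel report can be assembled with a plain `pandas.ExcelWriter`; in the actual app this is delegated to `panel_app/convert_to_excel_panel.py`, so the helper below is only a minimal sketch of the idea (one sheet per output table):

```python
import io

import pandas as pd


def build_report_bytes(sheets: dict[str, pd.DataFrame]) -> bytes:
    """Write each named DataFrame to its own sheet and return the workbook bytes."""
    buf = io.BytesIO()
    with pd.ExcelWriter(buf, engine="openpyxl") as writer:
        for name, df in sheets.items():
            # Excel sheet names are limited to 31 characters
            df.to_excel(writer, sheet_name=name[:31], index=False)
    return buf.getvalue()
```

Returning bytes (rather than writing a file) is what lets the Panel `FileDownload` widget serve the report directly.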
## 4) Detection model ("expert" logic)

### 4.1 Normalization

- Parse the date into a `datetime`
- Build `date_only`
- Extract a `site_code` from the name (numeric pattern / split, as in the traffic app)
- Enrich via `physical_db` (City, Latitude/Longitude), as in `trafic_analysis_panel.py`

### 4.2 KPI rules (configurable)

Each KPI must have:

- `direction`:
  - `higher_is_better` (availability, success rate, CSSR, throughput, traffic)
  - `lower_is_better` (drop rate, blocking, congestion, loss, discard, RTWP, PRB usage)
- `sla` (optional): absolute threshold
- `agg`: `mean` or `sum`

### 4.3 Time windows

Global parameters:

- Baseline window (e.g. 30 days)
- Recent window (e.g. 7 days)
- Min consecutive bad days (e.g. 3 days) => persistent

### 4.4 Degradation criteria

For a (site, KPI, RAT) triple:

- **Relative degradation** vs baseline
  - e.g. variation > X% in the "wrong direction"
- **Absolute degradation** vs SLA
  - e.g. availability < 98% or drop rate > 2%

### 4.5 Classification (states)

- `OK`: no degradation
- `DEGRADED`: recently degraded
- `PERSISTENT_DEGRADED`: recently degraded + streak >= N days
- `RESOLVED` (V2): degraded before but OK over the last days
- `NO_DATA`: no usable data points
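Under the window and direction conventions above, the per-(site, KPI) classification can be sketched as a pure function over a daily series. The names here are illustrative (the real logic lives in `process_kpi/kpi_health_check/engine.py`, which also handles the SLA check and the `RESOLVED` state):

```python
from datetime import date, timedelta


def classify(
    series: dict[date, float],   # daily value per date for one (site, KPI)
    direction: str,              # "higher_is_better" or "lower_is_better"
    baseline_days: int = 30,
    recent_days: int = 7,
    rel_threshold_pct: float = 10.0,
    min_consecutive: int = 3,
) -> str:
    if not series:
        return "NO_DATA"
    end = max(series)
    recent_start = end - timedelta(days=recent_days - 1)
    baseline_start = recent_start - timedelta(days=baseline_days)
    recent = {d: v for d, v in series.items() if d >= recent_start}
    baseline = [v for d, v in series.items() if baseline_start <= d < recent_start]
    if not recent or not baseline:
        return "NO_DATA"
    base = sum(baseline) / len(baseline)
    thr = rel_threshold_pct / 100.0

    def bad(v: float) -> bool:
        # a move beyond the relative threshold, in the "wrong direction"
        if direction == "higher_is_better":
            return v < base * (1.0 - thr)
        return v > base * (1.0 + thr)

    bad_days = sorted(d for d, v in recent.items() if bad(v))
    if not bad_days:
        return "OK"
    # longest run of consecutive bad days decides persistence
    streak = best = 1
    for prev, cur in zip(bad_days, bad_days[1:]):
        streak = streak + 1 if cur == prev + timedelta(days=1) else 1
        best = max(best, streak)
    return "PERSISTENT_DEGRADED" if best >= min_consecutive else "DEGRADED"
```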
## 5) UX / screens (Panel)

### 5.0 Multipage mode (Portal)

The target app is a **multipage Panel portal** that groups several pages (apps) on a single Panel server.

Initial pages:

- Global Traffic Analysis (existing page: `panel_app/trafic_analysis_panel.py`)
- KPI Health Check (new page: `panel_app/kpi_health_check_panel.py`)

Navigation:

- side menu (page selector)
- the sidebar and the main content change with the selected page

### 5.1 Sidebar (configuration)

- 2G/3G/LTE upload
- Optional analysis period
- Parameters: baseline/recent/threshold/persistence
- Buttons:
  - Load & build rules
  - Run health check
  - Excel export

### 5.2 Main

- Datasets summary
- KPI rules table (editable)
- Site summary table
- Drill-down:
  - Site selection
  - RAT selection
  - KPI selection
  - KPI curve
  - KPI/site table (statuses)
## 6) Code architecture & organization

### 6.1 Modules

Goal: a **modular app**, not one monolithic file.

- `panel_app/kpi_health_check_panel.py`
  - Panel UI (widgets, layout)
  - callback wiring
  - no heavy "business" logic
- `panel_app/panel_portal.py`
  - home page + multipage navigation
  - imports the pages and displays them via `get_page_components()`
- `process_kpi/kpi_health_check/io.py`
  - ZIP/CSV reading
  - multi-CSV ZIP support (V2)
- `process_kpi/kpi_health_check/normalization.py`
  - date column detection / parsing
  - `site_code` extraction from BCF/WBTS/LNBTS/DN
  - daily aggregation (mean vs sum)
  - `physical_db` enrichment (City/Lat/Lon)
- `process_kpi/kpi_health_check/rules.py`
  - KPI rule generation (direction, SLA, agg)
  - rule validation / normalization
- `process_kpi/kpi_health_check/engine.py`
  - baseline vs recent computation
  - classification (OK / DEGRADED / PERSISTENT_DEGRADED / RESOLVED)
  - output table construction
- `process_kpi/kpi_health_check/multi_rat.py`
  - cross-RAT synthesis per site
  - multi-RAT "top anomalies"
- `process_kpi/kpi_health_check/export.py`
  - Excel bytes build (reuses `panel_app/convert_to_excel_panel.py`)

### 6.2 Key functions

- ZIP/CSV reading
- Date & ID column detection
- Daily dataset construction
- KPI rule generation
- Health check evaluation
- Excel export

### 6.3 `site_code` rule + physical DB enrichment (as in the traffic app)

- **Site code extraction**
  - main strategy: same logic as `trafic_analysis_panel.py` (split / numeric prefix of the name)
  - fallback: regex on a digit sequence in the name
- **Enrichment**
  - load `physical_db/physical_database.csv` via `get_physical_db()`
  - build `code` from `Code_Sector` (`split('_')[0]`) then cast to int
  - join on the code to retrieve `City`, `Longitude`, `Latitude`
## 7) Roadmap / iterations

### V1 (MVP)

- [DONE] 2G/3G/LTE upload (multi-RAT)
- [DONE] Numeric KPI detection
- [DONE] Editable KPI rules
- [DONE] DEGRADED / PERSISTENT_DEGRADED / OK detection
- [DONE] Simple drill-down + export

### V2 (expert)

- [DONE] RESOLVED (degraded, then OK)
- [DONE] Multi-CSV ZIP support
- [N/A] "cell-level" vs "site-level" switch (KPIs confirmed to be per site)
- [TODO] Criticality score (weight by traffic, population, customer criticality)
- [DONE] Multi-RAT "Top anomalies" table (cross-RAT)
- [TODO] Advanced visualizations (per-day heatmap, histograms, etc.)

### V3 (industrialization)

- [TODO] Rule presets per operator
- [TODO] Profile management / configuration saving
- [TODO] Automatic import of the "complaint sites list"
- [TODO] PDF generation (optional) and evidence pack

## 8) Open points to confirm

- [DONE] KPIs are per site
- [DONE] Do the ZIPs sometimes contain several CSVs? (multi-CSV support implemented)
- [PARTIAL] Exact `PERIOD_START_TIME` format across all reports? (parsing hardened, to be validated on your files)
- [TODO] Site code extraction: single rule, or naming-dependent?

## 9) Success criteria

- Load a KPI report and get a top list of degraded sites in < 1 minute
- Quickly isolate: since when, on which KPIs, and whether it is persistent
- Usable Excel export for internal sharing
panel_app/kpi_health_check_panel.py ADDED
@@ -0,0 +1,475 @@
import io
import os
import sys
from datetime import date

import pandas as pd
import panel as pn
import plotly.express as px

ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)

from process_kpi.kpi_health_check.engine import evaluate_health_check
from process_kpi.kpi_health_check.export import build_export_bytes
from process_kpi.kpi_health_check.io import read_bytes_to_df
from process_kpi.kpi_health_check.multi_rat import compute_multirat_views
from process_kpi.kpi_health_check.normalization import (
    build_daily_kpi,
    infer_date_col,
    infer_id_col,
)
from process_kpi.kpi_health_check.rules import infer_kpi_direction, infer_kpi_sla

pn.extension("plotly", "tabulator")

PLOTLY_CONFIG = {"displaylogo": False, "scrollZoom": True, "displayModeBar": True}


def read_fileinput_to_df(file_input: pn.widgets.FileInput) -> pd.DataFrame | None:
    if file_input is None or not file_input.value:
        return None

    return read_bytes_to_df(file_input.value, file_input.filename or "")


current_daily_by_rat: dict[str, pd.DataFrame] = {}
current_rules_df: pd.DataFrame | None = None
current_status_df: pd.DataFrame | None = None
current_summary_df: pd.DataFrame | None = None
current_multirat_df: pd.DataFrame | None = None
current_top_anomalies_df: pd.DataFrame | None = None
current_export_bytes: bytes | None = None

file_2g = pn.widgets.FileInput(name="2G KPI report", accept=".csv,.zip")
file_3g = pn.widgets.FileInput(name="3G KPI report", accept=".csv,.zip")
file_lte = pn.widgets.FileInput(name="LTE KPI report", accept=".csv,.zip")

analysis_range = pn.widgets.DateRangePicker(name="Analysis date range (optional)")
baseline_days = pn.widgets.IntInput(name="Baseline window (days)", value=30)
recent_days = pn.widgets.IntInput(name="Recent window (days)", value=7)
rel_threshold_pct = pn.widgets.FloatInput(
    name="Relative change threshold (%)", value=10.0, step=1.0
)
min_consecutive_days = pn.widgets.IntInput(
    name="Min consecutive bad days (persistent)", value=3
)

load_button = pn.widgets.Button(
    name="Load datasets & build rules", button_type="primary"
)
run_button = pn.widgets.Button(name="Run health check", button_type="primary")

status_pane = pn.pane.Alert(
    "Upload KPI reports (ZIP/CSV), then load datasets and run health check.",
    alert_type="primary",
)

datasets_table = pn.widgets.Tabulator(
    height=180, sizing_mode="stretch_width", layout="fit_data_table"
)
rules_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

try:
    rules_table.editable = True
except Exception:  # noqa: BLE001
    try:
        cfg = dict(rules_table.configuration or {})
        cfg["editable"] = True
        rules_table.configuration = cfg
    except Exception:  # noqa: BLE001
        pass

site_summary_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

multirat_summary_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

top_anomalies_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)

site_select = pn.widgets.AutocompleteInput(
    name="Select a site (Type to search)",
    options={},
    case_sensitive=False,
    search_strategy="includes",
    restrict=True,
)
rat_select = pn.widgets.RadioButtonGroup(
    name="RAT", options=["2G", "3G", "LTE"], value="LTE"
)
kpi_select = pn.widgets.Select(name="KPI", options=[])

site_kpi_table = pn.widgets.Tabulator(
    height=260, sizing_mode="stretch_width", layout="fit_data_table"
)
trend_plot_pane = pn.pane.Plotly(sizing_mode="stretch_both", config=PLOTLY_CONFIG)

export_button = pn.widgets.FileDownload(
    label="Download KPI Health Check report",
    filename="KPI_Health_Check_Report.xlsx",
    button_type="primary",
)


def _filtered_daily(df: pd.DataFrame) -> pd.DataFrame:
    if df is None or df.empty:
        return pd.DataFrame()
    if (
        analysis_range.value
        and len(analysis_range.value) == 2
        and analysis_range.value[0]
        and analysis_range.value[1]
    ):
        start, end = analysis_range.value
        mask = (df["date_only"] >= start) & (df["date_only"] <= end)
        return df[mask].copy()
    return df


def _update_site_options() -> None:
    all_sites = []
    for df in current_daily_by_rat.values():
        if df is None or df.empty:
            continue
        cols = [c for c in ["site_code", "City"] if c in df.columns]
        all_sites.append(df[cols].drop_duplicates("site_code"))

    if not all_sites:
        site_select.options = {}
        site_select.value = None
        return

    sites_df = pd.concat(all_sites, ignore_index=True).drop_duplicates("site_code")
    if "City" not in sites_df.columns:
        sites_df["City"] = pd.NA

    sites_df = sites_df.sort_values(by=["City", "site_code"], na_position="last")

    opts: dict[str, int] = {}
    for _, row in sites_df.iterrows():
        label = (
            f"{row['City']}_{row['site_code']}"
            if pd.notna(row.get("City"))
            else str(row["site_code"])
        )
        opts[str(label)] = int(row["site_code"])

    site_select.options = opts
    if opts and site_select.value not in opts.values():
        site_select.value = next(iter(opts.values()))


def _update_kpi_options() -> None:
    rat = rat_select.value
    df = current_daily_by_rat.get(rat)
    if df is None or df.empty:
        kpi_select.options = []
        kpi_select.value = None
        return

    kpis = [
        c
        for c in df.columns
        if c not in {"site_code", "date_only", "Longitude", "Latitude", "City", "RAT"}
    ]
    kpis = sorted([str(c) for c in kpis])
    kpi_select.options = kpis
    if kpis and kpi_select.value not in kpis:
        kpi_select.value = kpis[0]


def _update_site_view(event=None) -> None:
    if current_status_df is None or current_status_df.empty:
        site_kpi_table.value = pd.DataFrame()
        trend_plot_pane.object = None
        return

    code = site_select.value
    rat = rat_select.value
    kpi = kpi_select.value

    if code is None or rat is None:
        site_kpi_table.value = pd.DataFrame()
        trend_plot_pane.object = None
        return

    site_df = current_status_df[
        (current_status_df["site_code"] == int(code))
        & (current_status_df["RAT"] == rat)
    ].copy()
    site_kpi_table.value = site_df

    daily = current_daily_by_rat.get(rat)
    if daily is None or daily.empty or not kpi or kpi not in daily.columns:
        trend_plot_pane.object = None
        return

    d = _filtered_daily(daily)
    s = d[d["site_code"] == int(code)].copy().sort_values("date_only")
    if s.empty:
        trend_plot_pane.object = None
        return

    title = f"{rat} - {kpi} - site {int(code)}"
    fig = px.line(s, x="date_only", y=kpi, markers=True)
    fig.update_layout(template="plotly_white", title=title)
    trend_plot_pane.object = fig


def load_datasets(event=None) -> None:
    try:
        status_pane.alert_type = "primary"
        status_pane.object = "Loading datasets..."

        global current_daily_by_rat, current_rules_df
        global current_status_df, current_summary_df, current_export_bytes
        global current_multirat_df, current_top_anomalies_df

        current_daily_by_rat = {}
        current_rules_df = None
        current_status_df = None
        current_summary_df = None
        current_multirat_df = None
        current_top_anomalies_df = None
        current_export_bytes = None

        site_summary_table.value = pd.DataFrame()
        multirat_summary_table.value = pd.DataFrame()
        top_anomalies_table.value = pd.DataFrame()
        site_kpi_table.value = pd.DataFrame()
        trend_plot_pane.object = None

        inputs = {"2G": file_2g, "3G": file_3g, "LTE": file_lte}
        rows = []
        rules_rows = []

        loaded_any = False
        for rat, widget in inputs.items():
            df_raw = read_fileinput_to_df(widget)
            if df_raw is None:
                continue
            loaded_any = True

            date_col = None
            id_col = None
            try:
                date_col = infer_date_col(df_raw)
            except Exception:  # noqa: BLE001
                date_col = None
            try:
                id_col = infer_id_col(df_raw, rat)
            except Exception:  # noqa: BLE001
                id_col = None

            daily, kpi_cols = build_daily_kpi(df_raw, rat)
            current_daily_by_rat[rat] = daily

            d = _filtered_daily(daily)
            rows.append(
                {
                    "RAT": rat,
                    "rows_raw": int(df_raw.shape[0]),
                    "cols_raw": int(df_raw.shape[1]),
                    "date_col": date_col,
                    "id_col": id_col,
                    "sites": int(d["site_code"].nunique()),
                    "days": int(d["date_only"].nunique()),
                    "kpis": int(len(kpi_cols)),
                }
            )

            for kpi in kpi_cols:
                direction = infer_kpi_direction(kpi)
                rules_rows.append(
                    {
                        "RAT": rat,
                        "KPI": kpi,
                        "direction": direction,
                        "sla": infer_kpi_sla(kpi, direction),
                    }
                )

        if not loaded_any:
            raise ValueError("Please upload at least one KPI report")

        datasets_table.value = pd.DataFrame(rows)

        rules_df = (
            pd.DataFrame(rules_rows)
            .drop_duplicates(subset=["RAT", "KPI"])
            .sort_values(by=["RAT", "KPI"])
        )
        current_rules_df = rules_df
        rules_table.value = rules_df

        _update_site_options()
        _update_kpi_options()

        status_pane.alert_type = "success"
        status_pane.object = (
            "Datasets loaded. Edit KPI rules if needed, then run health check."
        )

    except Exception as exc:  # noqa: BLE001
        status_pane.alert_type = "danger"
        status_pane.object = f"Error: {exc}"


def run_health_check(event=None) -> None:
    try:
        status_pane.alert_type = "primary"
        status_pane.object = "Running health check..."

        global current_status_df, current_summary_df, current_export_bytes
        global current_multirat_df, current_top_anomalies_df

        rules_df = (
            rules_table.value
            if isinstance(rules_table.value, pd.DataFrame)
            else pd.DataFrame()
        )
        if rules_df.empty:
            raise ValueError("KPI rules table is empty")

        all_status = []
        all_summary = []

        for rat, daily in current_daily_by_rat.items():
            d = _filtered_daily(daily)
            status_df, summary_df = evaluate_health_check(
                d,
                rat,
                rules_df,
                int(baseline_days.value),
                int(recent_days.value),
                float(rel_threshold_pct.value),
                int(min_consecutive_days.value),
            )
            if not status_df.empty:
                all_status.append(status_df)
            if not summary_df.empty:
                all_summary.append(summary_df)

        current_status_df = (
            pd.concat(all_status, ignore_index=True) if all_status else pd.DataFrame()
        )
        current_summary_df = (
            pd.concat(all_summary, ignore_index=True) if all_summary else pd.DataFrame()
        )
        site_summary_table.value = current_summary_df

        current_multirat_df, current_top_anomalies_df = compute_multirat_views(
            current_status_df
        )
        multirat_summary_table.value = current_multirat_df
        top_anomalies_table.value = current_top_anomalies_df

        current_export_bytes = _build_export_bytes()

        _update_site_view()

        status_pane.alert_type = "success"
        status_pane.object = "Health check completed."

    except Exception as exc:  # noqa: BLE001
        status_pane.alert_type = "danger"
        status_pane.object = f"Error: {exc}"


def _build_export_bytes() -> bytes:
    return build_export_bytes(
        (
            datasets_table.value
            if isinstance(datasets_table.value, pd.DataFrame)
            else None
        ),
        rules_table.value if isinstance(rules_table.value, pd.DataFrame) else None,
        current_summary_df if isinstance(current_summary_df, pd.DataFrame) else None,
        current_status_df if isinstance(current_status_df, pd.DataFrame) else None,
        (
            current_multirat_df
            if isinstance(current_multirat_df, pd.DataFrame)
            else None
        ),
        (
            current_top_anomalies_df
            if isinstance(current_top_anomalies_df, pd.DataFrame)
            else None
        ),
    )


def _export_callback() -> io.BytesIO:
    data = current_export_bytes or b""
    if not data:
        return io.BytesIO()
    return io.BytesIO(data)


load_button.on_click(load_datasets)
run_button.on_click(run_health_check)

rat_select.param.watch(lambda e: (_update_kpi_options(), _update_site_view()), "value")
site_select.param.watch(_update_site_view, "value")
kpi_select.param.watch(_update_site_view, "value")

export_button.callback = _export_callback


# Page layout components (used by the multipage portal)
sidebar = pn.Column(
    file_2g,
    file_3g,
    file_lte,
    "---",
    analysis_range,
    baseline_days,
    recent_days,
    rel_threshold_pct,
    min_consecutive_days,
    "---",
    load_button,
    run_button,
    "---",
    export_button,
)

main = pn.Column(
    status_pane,
    pn.pane.Markdown("## Datasets"),
    datasets_table,
    pn.pane.Markdown("## KPI Rules (editable)"),
    rules_table,
    pn.pane.Markdown("## Site Summary"),
    site_summary_table,
    pn.pane.Markdown("## Multi-RAT Summary"),
    multirat_summary_table,
    pn.pane.Markdown("## Top anomalies (cross-RAT)"),
    top_anomalies_table,
    pn.layout.Divider(),
    pn.pane.Markdown("## Drill-down"),
    pn.Row(site_select, rat_select, kpi_select),
    pn.Row(
        pn.Column(site_kpi_table, sizing_mode="stretch_width"),
        pn.Column(trend_plot_pane, sizing_mode="stretch_both"),
    ),
)


def get_page_components():
    return sidebar, main


if __name__ == "__main__":
    template = pn.template.MaterialTemplate(title="KPI Health Check - Panel")
    template.sidebar.append(sidebar)
    template.main.append(main)
    template.servable()
panel_app/panel_portal.py ADDED
@@ -0,0 +1,115 @@
import os
import sys

import panel as pn

ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if ROOT_DIR not in sys.path:
    sys.path.insert(0, ROOT_DIR)

pn.extension("plotly", "tabulator")

# Import pages (kept as modules, not nested templates)
import kpi_health_check_panel
import trafic_analysis_panel

PAGES = {
    "📊 Global Traffic Analysis": {
        "get_components": trafic_analysis_panel.get_page_components,
        "description": "Multi-RAT traffic analysis + maps + exports.",
    },
    "📈 KPI Health Check": {
        "get_components": kpi_health_check_panel.get_page_components,
        "description": "Detection of degraded/persistent/resolved KPIs + drill-down + export.",
    },
}

HOME_PAGE = "🏠 Gallery"

page_sidebar_container = pn.Column(sizing_mode="stretch_width")
page_main_container = pn.Column(sizing_mode="stretch_both")

page_title = pn.pane.Markdown("", sizing_mode="stretch_width")
back_button = pn.widgets.Button(
    name="← Back to gallery",
    button_type="primary",
    width=180,
)

home_button = pn.widgets.Button(
    name=HOME_PAGE,
    button_type="default",
    width_policy="max",
)


def _load_page(page_name: str) -> None:
    if page_name == HOME_PAGE:
        page_title.object = "## Applications"

        tiles = []
        for title, meta in PAGES.items():
            btn = pn.widgets.Button(name="Open", button_type="primary", width=120)
            btn.on_click(lambda e, t=title: _load_page(t))

            tile = pn.Column(
                pn.pane.Markdown(f"### {title}\n\n{meta.get('description', '')}"),
                btn,
                sizing_mode="stretch_width",
                margin=(10, 10, 10, 10),
            )
            tiles.append(tile)

        gallery = pn.GridBox(*tiles, ncols=2, sizing_mode="stretch_width")
        page_sidebar_container.objects = [
            pn.pane.Markdown(
                """### Welcome\n\nPick an application from the gallery."""
            )
        ]
        page_main_container.objects = [page_title, gallery]
        return

    meta = PAGES.get(page_name)
    if meta is None:
        page_sidebar_container.objects = [
            pn.pane.Alert("Unknown page", alert_type="danger")
        ]
        page_main_container.objects = []
        return

    sidebar, main = meta["get_components"]()
    page_title.object = f"## {page_name}"
    page_sidebar_container.objects = [sidebar]
    page_main_container.objects = [
        pn.Row(back_button, pn.Spacer(), sizing_mode="stretch_width"),
        page_title,
        main,
    ]


template = pn.template.MaterialTemplate(title="OML DB - Panel Portal")


def _go_home(event=None) -> None:
    _load_page(HOME_PAGE)


back_button.on_click(_go_home)
home_button.on_click(_go_home)

_load_page(HOME_PAGE)

template.sidebar.append(
    pn.Column(
        pn.pane.Markdown("## Navigation"),
        home_button,
        pn.layout.Divider(),
        page_sidebar_container,
        sizing_mode="stretch_width",
    )
)

template.main.append(page_main_container)

template.servable()
panel_app/trafic_analysis_panel.py CHANGED
@@ -16,12 +16,16 @@ if ROOT_DIR not in sys.path:
 from panel_app.convert_to_excel_panel import write_dfs_to_excel
 from utils.utils_vars import get_physical_db
 
-pn.extension("plotly", "tabulator", raw_css=[
-    ":fullscreen { background-color: white; overflow: auto; }",
-    "::backdrop { background-color: white; }",
-    ".plot-fullscreen-wrapper:fullscreen { padding: 20px; display: flex; flex-direction: column; }",
-    ".plot-fullscreen-wrapper:fullscreen > * { height: 100% !important; width: 100% !important; }",
-])
+pn.extension(
+    "plotly",
+    "tabulator",
+    raw_css=[
+        ":fullscreen { background-color: white; overflow: auto; }",
+        "::backdrop { background-color: white; }",
+        ".plot-fullscreen-wrapper:fullscreen { padding: 20px; display: flex; flex-direction: column; }",
+        ".plot-fullscreen-wrapper:fullscreen > * { height: 100% !important; width: 100% !important; }",
+    ],
+)
 
 
 def read_fileinput_to_df(file_input: pn.widgets.FileInput) -> pd.DataFrame | None:
@@ -1480,7 +1484,7 @@ def _update_site_view(event=None) -> None:  # noqa: D401, ARG001
     ]
     first_row = site_detail_df.iloc[0]
     site_label = f"{first_row['code']}"
-    if pd.notna(first_row.get('City')):
+    if pd.notna(first_row.get("City")):
         site_label += f" ({first_row['City']})"
 
     if traffic_cols:
@@ -2389,8 +2393,12 @@ main_content = pn.Column(
     export_button,
 )
 
-template.sidebar.append(sidebar_content)
-template.main.append(main_content)
+
+def get_page_components():
+    return sidebar_content, main_content
 
 
-template.servable()
+if __name__ == "__main__":
+    template.sidebar.append(sidebar_content)
+    template.main.append(main_content)
+    template.servable()
process_kpi/__init__.py ADDED
File without changes
process_kpi/kpi_health_check/__init__.py ADDED
File without changes
process_kpi/kpi_health_check/engine.py ADDED
@@ -0,0 +1,210 @@
+ from datetime import date, timedelta
+
+ import numpy as np
+ import pandas as pd
+
+
+ def window_bounds(end_date: date, days: int) -> tuple[date, date]:
+     start = end_date - timedelta(days=days - 1)
+     return start, end_date
+
+
+ def is_bad(
+     value: float | None,
+     baseline: float | None,
+     direction: str,
+     rel_threshold_pct: float,
+     sla: float | None,
+ ) -> bool:
+     if value is None or (isinstance(value, float) and np.isnan(value)):
+         return False
+     bad = False
+     if sla is not None and not (isinstance(sla, float) and np.isnan(sla)):
+         if direction == "higher_is_better":
+             bad = bad or (value < float(sla))
+         else:
+             bad = bad or (value > float(sla))
+
+     if baseline is None or (isinstance(baseline, float) and np.isnan(baseline)):
+         return bad
+
+     thr = float(rel_threshold_pct) / 100.0
+     if direction == "higher_is_better":
+         return bad or (value < baseline * (1.0 - thr))
+     return bad or (value > baseline * (1.0 + thr))
+
+
+ def max_consecutive_days(dates: list[date]) -> int:
+     if not dates:
+         return 0
+     dates_sorted = sorted(set(dates))
+     streak = 1
+     best = 1
+     for prev, cur in zip(dates_sorted, dates_sorted[1:]):
+         if cur == prev + timedelta(days=1):
+             streak += 1
+         else:
+             streak = 1
+         if streak > best:
+             best = streak
+     return best
+
+
+ def evaluate_health_check(
+     daily: pd.DataFrame,
+     rat: str,
+     rules_df: pd.DataFrame,
+     baseline_days_n: int,
+     recent_days_n: int,
+     rel_threshold_pct: float,
+     min_consecutive_days: int,
+ ) -> tuple[pd.DataFrame, pd.DataFrame]:
+     if daily.empty:
+         return pd.DataFrame(), pd.DataFrame()
+
+     end_date = max(daily["date_only"])
+     recent_start, recent_end = window_bounds(end_date, int(recent_days_n))
+     baseline_end = recent_start - timedelta(days=1)
+     baseline_start = baseline_end - timedelta(days=int(baseline_days_n) - 1)
+
+     rat_rules = rules_df[rules_df["RAT"] == rat].copy()
+     kpis = [k for k in rat_rules["KPI"].tolist() if k in daily.columns]
+
+     rows = []
+
+     for site_code, g_site in daily.groupby("site_code"):
+         city = (
+             g_site["City"].dropna().iloc[0]
+             if ("City" in g_site.columns and g_site["City"].notna().any())
+             else None
+         )
+         g_site = g_site.sort_values("date_only")
+
+         for kpi in kpis:
+             rule = rat_rules[rat_rules["KPI"] == kpi].iloc[0]
+             direction = str(rule.get("direction", "higher_is_better"))
+             sla = rule.get("sla", np.nan)
+             try:
+                 sla_val = float(sla) if pd.notna(sla) else None
+             except Exception:
+                 sla_val = None
+
+             s = g_site[["date_only", kpi]].dropna(subset=[kpi])
+             if s.empty:
+                 rows.append(
+                     {
+                         "RAT": rat,
+                         "site_code": int(site_code),
+                         "City": city,
+                         "KPI": kpi,
+                         "status": "NO_DATA",
+                     }
+                 )
+                 continue
+
+             baseline_mask = (s["date_only"] >= baseline_start) & (
+                 s["date_only"] <= baseline_end
+             )
+             recent_mask = (s["date_only"] >= recent_start) & (
+                 s["date_only"] <= recent_end
+             )
+
+             baseline = (
+                 s.loc[baseline_mask, kpi].median() if baseline_mask.any() else np.nan
+             )
+             recent = s.loc[recent_mask, kpi].median() if recent_mask.any() else np.nan
+
+             daily_recent = s.loc[recent_mask, ["date_only", kpi]].copy()
+             bad_dates = []
+             if not daily_recent.empty:
+                 for d, v in zip(
+                     daily_recent["date_only"].tolist(), daily_recent[kpi].tolist()
+                 ):
+                     if is_bad(
+                         float(v) if pd.notna(v) else None,
+                         float(baseline) if pd.notna(baseline) else None,
+                         direction,
+                         rel_threshold_pct,
+                         sla_val,
+                     ):
+                         bad_dates.append(d)
+
+             max_streak = max_consecutive_days(bad_dates)
+             persistent = max_streak >= int(min_consecutive_days)
+
+             is_bad_recent = is_bad(
+                 float(recent) if pd.notna(recent) else None,
+                 float(baseline) if pd.notna(baseline) else None,
+                 direction,
+                 rel_threshold_pct,
+                 sla_val,
+             )
+
+             is_bad_current = is_bad_recent
+             if not daily_recent.empty:
+                 last_row = daily_recent.sort_values("date_only").iloc[-1]
+                 last_val = last_row[kpi]
+                 is_bad_current = is_bad(
+                     float(last_val) if pd.notna(last_val) else None,
+                     float(baseline) if pd.notna(baseline) else None,
+                     direction,
+                     rel_threshold_pct,
+                     sla_val,
+                 )
+
+             had_bad_recent = (len(bad_dates) > 0) or bool(is_bad_recent)
+
+             if is_bad_current and persistent:
+                 status = "PERSISTENT_DEGRADED"
+             elif is_bad_current:
+                 status = "DEGRADED"
+             elif had_bad_recent:
+                 status = "RESOLVED"
+             else:
+                 status = "OK"
+
+             rows.append(
+                 {
+                     "RAT": rat,
+                     "site_code": int(site_code),
+                     "City": city,
+                     "KPI": kpi,
+                     "direction": direction,
+                     "sla": sla_val,
+                     "baseline_median": baseline,
+                     "recent_median": recent,
+                     "bad_days_recent": len(bad_dates),
+                     "max_streak_recent": int(max_streak),
+                     "status": status,
+                 }
+             )
+
+     status_df = pd.DataFrame(rows)
+
+     summary_rows = []
+     for site_code, g in status_df.groupby("site_code"):
+         city = (
+             g["City"].dropna().iloc[0]
+             if ("City" in g.columns and g["City"].notna().any())
+             else None
+         )
+         degraded_cnt = int(g["status"].isin(["DEGRADED", "PERSISTENT_DEGRADED"]).sum())
+         persistent_cnt = int((g["status"] == "PERSISTENT_DEGRADED").sum())
+         resolved_cnt = int((g["status"] == "RESOLVED").sum())
+         summary_rows.append(
+             {
+                 "RAT": rat,
+                 "site_code": int(site_code),
+                 "City": city,
+                 "degraded_kpis": degraded_cnt,
+                 "persistent_kpis": persistent_cnt,
+                 "resolved_kpis": resolved_cnt,
+             }
+         )
+
+     summary_df = pd.DataFrame(summary_rows).sort_values(
+         by=["degraded_kpis", "persistent_kpis", "resolved_kpis"],
+         ascending=[False, False, False],
+     )
+
+     return status_df, summary_df
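The two pure helpers in `engine.py` can be exercised in isolation. The sketch below copies `is_bad` (with the NaN guards dropped for brevity) and `max_consecutive_days` out of the module; the KPI names and threshold values are illustrative only:

```python
from datetime import date, timedelta

def is_bad(value, baseline, direction, rel_threshold_pct, sla):
    # Simplified copy of engine.is_bad (NaN guards omitted): a value is bad
    # if it breaches the SLA or deviates from baseline beyond the threshold.
    bad = False
    if sla is not None:
        if direction == "higher_is_better":
            bad = bad or (value < float(sla))
        else:
            bad = bad or (value > float(sla))
    if baseline is None:
        return bad
    thr = float(rel_threshold_pct) / 100.0
    if direction == "higher_is_better":
        return bad or (value < baseline * (1.0 - thr))
    return bad or (value > baseline * (1.0 + thr))

def max_consecutive_days(dates):
    # Copy of engine.max_consecutive_days: longest run of consecutive days.
    if not dates:
        return 0
    dates_sorted = sorted(set(dates))
    streak = best = 1
    for prev, cur in zip(dates_sorted, dates_sorted[1:]):
        streak = streak + 1 if cur == prev + timedelta(days=1) else 1
        best = max(best, streak)
    return best

# CSSR 97.0 vs baseline 99.0 with a 10% relative threshold: within the
# relative band, but below a 98.0 SLA -> flagged.
print(is_bad(97.0, 99.0, "higher_is_better", 10.0, 98.0))  # True
# Drop rate 1.5 vs baseline 1.0 with a 20% threshold: 1.5 > 1.2 -> flagged.
print(is_bad(1.5, 1.0, "lower_is_better", 20.0, None))     # True
# Three bad days, of which two are consecutive.
bad_days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
print(max_consecutive_days(bad_days))                      # 2
```

With `min_consecutive_days = 3`, the streak of 2 above would not qualify as persistent; the same dates plus January 3rd would.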
process_kpi/kpi_health_check/export.py ADDED
@@ -0,0 +1,38 @@
+ import pandas as pd
+
+ from panel_app.convert_to_excel_panel import write_dfs_to_excel
+
+
+ def build_export_bytes(
+     datasets_df: pd.DataFrame | None,
+     rules_df: pd.DataFrame | None,
+     summary_df: pd.DataFrame | None,
+     status_df: pd.DataFrame | None,
+     multirat_summary_df: pd.DataFrame | None = None,
+     top_anomalies_df: pd.DataFrame | None = None,
+ ) -> bytes:
+     dfs = [
+         datasets_df if isinstance(datasets_df, pd.DataFrame) else pd.DataFrame(),
+         rules_df if isinstance(rules_df, pd.DataFrame) else pd.DataFrame(),
+         summary_df if isinstance(summary_df, pd.DataFrame) else pd.DataFrame(),
+         status_df if isinstance(status_df, pd.DataFrame) else pd.DataFrame(),
+         (
+             multirat_summary_df
+             if isinstance(multirat_summary_df, pd.DataFrame)
+             else pd.DataFrame()
+         ),
+         (
+             top_anomalies_df
+             if isinstance(top_anomalies_df, pd.DataFrame)
+             else pd.DataFrame()
+         ),
+     ]
+     sheet_names = [
+         "Datasets",
+         "KPI_Rules",
+         "Site_Summary",
+         "Site_KPI_Status",
+         "MultiRAT_Summary",
+         "Top_Anomalies",
+     ]
+     return write_dfs_to_excel(dfs, sheet_names, index=False)
process_kpi/kpi_health_check/io.py ADDED
@@ -0,0 +1,45 @@
+ import io
+ import zipfile
+
+ import pandas as pd
+
+
+ def read_bytes_to_df(file_bytes: bytes, filename: str) -> pd.DataFrame:
+     if not file_bytes:
+         raise ValueError("Empty file")
+
+     filename_l = (filename or "").lower()
+     data = io.BytesIO(file_bytes)
+
+     if filename_l.endswith(".zip"):
+         with zipfile.ZipFile(data) as z:
+             csv_files = [f for f in z.namelist() if f.lower().endswith(".csv")]
+             if not csv_files:
+                 raise ValueError("No CSV file found in the ZIP archive")
+             dfs = []
+             for csv_name in csv_files:
+                 try:
+                     with z.open(csv_name) as f:
+                         df = pd.read_csv(
+                             f,
+                             encoding="latin1",
+                             sep=";",
+                             low_memory=False,
+                         )
+                     if isinstance(df, pd.DataFrame) and not df.empty:
+                         dfs.append(df)
+                 except Exception:
+                     continue
+
+             if not dfs:
+                 raise ValueError("No readable CSV content found in the ZIP archive")
+
+             if len(dfs) == 1:
+                 return dfs[0]
+
+             return pd.concat(dfs, ignore_index=True, sort=False)
+
+     if filename_l.endswith(".csv"):
+         return pd.read_csv(data, encoding="latin1", sep=";", low_memory=False)
+
+     raise ValueError("Unsupported file format. Please upload a ZIP or CSV file.")
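The ZIP branch of `read_bytes_to_df` can be sanity-checked end to end with an in-memory archive. This is a stdlib-only sketch of the same logic (latin1 encoding, `;` separator, concatenate every readable `.csv` member) using the `csv` module instead of pandas, not the actual function:

```python
import csv
import io
import zipfile

def read_zip_csv_rows(file_bytes: bytes) -> list[dict]:
    # Mirrors the ZIP branch of io.read_bytes_to_df: find the .csv members,
    # fail if there are none, and concatenate the parsed rows.
    rows: list[dict] = []
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as z:
        csv_files = [f for f in z.namelist() if f.lower().endswith(".csv")]
        if not csv_files:
            raise ValueError("No CSV file found in the ZIP archive")
        for name in csv_files:
            with z.open(name) as f:
                text = io.TextIOWrapper(f, encoding="latin1")
                rows.extend(csv.DictReader(text, delimiter=";"))
    return rows

# Build a small in-memory ZIP holding one ';'-separated KPI report.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("report.csv", "PERIOD_START_TIME;BCF name\n01.02.2024;BCF-12345\n")

rows = read_zip_csv_rows(buf.getvalue())
print(rows)  # [{'PERIOD_START_TIME': '01.02.2024', 'BCF name': 'BCF-12345'}]
```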
process_kpi/kpi_health_check/multi_rat.py ADDED
@@ -0,0 +1,126 @@
+ import pandas as pd
+
+
+ def compute_multirat_views(
+     status_df: pd.DataFrame,
+ ) -> tuple[pd.DataFrame, pd.DataFrame]:
+     if status_df is None or status_df.empty:
+         return pd.DataFrame(), pd.DataFrame()
+
+     df = status_df.copy()
+     df["is_degraded"] = df["status"].isin(["DEGRADED", "PERSISTENT_DEGRADED"])
+     df["is_persistent"] = df["status"].isin(["PERSISTENT_DEGRADED"])
+     df["is_resolved"] = df["status"].isin(["RESOLVED"])
+
+     def _first_city(s: pd.Series):
+         s2 = s.dropna()
+         return s2.iloc[0] if not s2.empty else None
+
+     base = (
+         df.groupby("site_code", as_index=False)
+         .agg(
+             City=("City", _first_city),
+             degraded_kpis_total=("is_degraded", "sum"),
+             persistent_kpis_total=("is_persistent", "sum"),
+             resolved_kpis_total=("is_resolved", "sum"),
+         )
+         .copy()
+     )
+
+     impacted = (
+         df[df["is_degraded"]]
+         .groupby("site_code")["RAT"]
+         .nunique()
+         .rename("impacted_rats")
+         .reset_index()
+     )
+
+     resolved_pivot = (
+         df[df["is_resolved"]]
+         .pivot_table(
+             index="site_code",
+             columns="RAT",
+             values="KPI",
+             aggfunc="count",
+             fill_value=0,
+         )
+         .rename(columns=lambda c: f"resolved_{c}")
+         .reset_index()
+     )
+
+     base = pd.merge(base, impacted, on="site_code", how="left")
+     base["impacted_rats"] = base["impacted_rats"].fillna(0).astype(int)
+
+     degraded_pivot = (
+         df[df["is_degraded"]]
+         .pivot_table(
+             index="site_code",
+             columns="RAT",
+             values="KPI",
+             aggfunc="count",
+             fill_value=0,
+         )
+         .rename(columns=lambda c: f"degraded_{c}")
+         .reset_index()
+     )
+
+     persistent_pivot = (
+         df[df["is_persistent"]]
+         .pivot_table(
+             index="site_code",
+             columns="RAT",
+             values="KPI",
+             aggfunc="count",
+             fill_value=0,
+         )
+         .rename(columns=lambda c: f"persistent_{c}")
+         .reset_index()
+     )
+
+     out = base
+     if not degraded_pivot.empty:
+         out = pd.merge(out, degraded_pivot, on="site_code", how="left")
+     if not persistent_pivot.empty:
+         out = pd.merge(out, persistent_pivot, on="site_code", how="left")
+     if not resolved_pivot.empty:
+         out = pd.merge(out, resolved_pivot, on="site_code", how="left")
+
+     metric_cols = [c for c in out.columns if c != "City"]
+     out[metric_cols] = out[metric_cols].fillna(0)
+     out = out.sort_values(
+         by=["persistent_kpis_total", "degraded_kpis_total", "impacted_rats"],
+         ascending=[False, False, False],
+     )
+
+     top = df[df["is_degraded"]].copy()
+     sev = {"PERSISTENT_DEGRADED": 2, "DEGRADED": 1}
+     top["severity"] = top["status"].map(sev).fillna(0).astype(int)
+
+     for col in ["bad_days_recent", "max_streak_recent"]:
+         if col not in top.columns:
+             top[col] = pd.NA
+
+     top = top.sort_values(
+         by=["severity", "max_streak_recent", "bad_days_recent"],
+         ascending=[False, False, False],
+     )
+
+     top_cols = [
+         c
+         for c in [
+             "severity",
+             "RAT",
+             "site_code",
+             "City",
+             "KPI",
+             "status",
+             "baseline_median",
+             "recent_median",
+             "bad_days_recent",
+             "max_streak_recent",
+         ]
+         if c in top.columns
+     ]
+     top = top[top_cols].head(300)
+
+     return out, top
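The `Top_Anomalies` ordering above boils down to a severity map plus a multi-key descending sort. A pandas-free sketch of that ranking (record fields mirror `status_df`; the values are made up):

```python
# Severity map from compute_multirat_views: persistent issues outrank fresh ones.
SEVERITY = {"PERSISTENT_DEGRADED": 2, "DEGRADED": 1}

def rank_anomalies(records: list[dict]) -> list[dict]:
    # Keep only degraded rows, then sort by severity, longest streak,
    # and number of bad days, all descending.
    degraded = [r for r in records if r["status"] in SEVERITY]
    return sorted(
        degraded,
        key=lambda r: (
            SEVERITY[r["status"]],
            r.get("max_streak_recent", 0),
            r.get("bad_days_recent", 0),
        ),
        reverse=True,
    )

records = [
    {"KPI": "CSSR", "status": "DEGRADED",
     "max_streak_recent": 2, "bad_days_recent": 3},
    {"KPI": "DCR", "status": "PERSISTENT_DEGRADED",
     "max_streak_recent": 5, "bad_days_recent": 6},
    {"KPI": "Traffic", "status": "OK"},
]
print([r["KPI"] for r in rank_anomalies(records)])  # ['DCR', 'CSSR']
```

The `OK` row is dropped entirely, matching the `df[df["is_degraded"]]` filter before the sort.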
process_kpi/kpi_health_check/normalization.py ADDED
@@ -0,0 +1,219 @@
+ import re
+
+ import numpy as np
+ import pandas as pd
+
+ from utils.utils_vars import get_physical_db
+
+
+ def to_numeric(series: pd.Series) -> pd.Series:
+     if pd.api.types.is_numeric_dtype(series):
+         return pd.to_numeric(series, errors="coerce")
+     s = series.astype(str)
+     s = s.str.replace("\u00a0", "", regex=False)
+     s = s.str.replace(" ", "", regex=False)
+     s = s.str.replace("%", "", regex=False)
+     s = s.str.replace(",", ".", regex=False)
+     s = s.replace({"nan": np.nan, "None": np.nan, "": np.nan})
+     return pd.to_numeric(s, errors="coerce")
+
+
+ def parse_datetime(series: pd.Series) -> pd.Series:
+     if series.empty:
+         return pd.to_datetime(series, errors="coerce")
+     first = series.dropna().astype(str).iloc[0] if series.dropna().any() else ""
+
+     formats: list[str | None] = []
+     if len(first) > 10:
+         formats.extend(
+             [
+                 "%m.%d.%Y %H:%M:%S",
+                 "%d.%m.%Y %H:%M:%S",
+                 "%Y-%m-%d %H:%M:%S",
+                 "%Y/%m/%d %H:%M:%S",
+                 "%d/%m/%Y %H:%M:%S",
+                 "%m/%d/%Y %H:%M:%S",
+             ]
+         )
+     formats.extend(
+         [
+             "%m.%d.%Y",
+             "%d.%m.%Y",
+             "%Y-%m-%d",
+             "%Y/%m/%d",
+             "%d/%m/%Y",
+             "%m/%d/%Y",
+         ]
+     )
+
+     for fmt in formats:
+         dt = pd.to_datetime(series, errors="coerce", format=fmt)
+         if dt.notna().any():
+             return dt
+
+     return pd.to_datetime(series, errors="coerce")
+
+
+ def extract_site_code(value: object) -> int | None:
+     if value is None or (isinstance(value, float) and np.isnan(value)):
+         return None
+     s = str(value)
+     m = re.search(r"(\d{4,7})", s)
+     if not m:
+         return None
+     try:
+         return int(m.group(1))
+     except ValueError:
+         return None
+
+
+ def infer_date_col(df: pd.DataFrame) -> str:
+     for c in ["PERIOD_START_TIME", "PERIOD_START_DATE", "date", "Date", "DATE"]:
+         if c in df.columns:
+             return c
+     raise ValueError("Cannot find a date column (expected PERIOD_START_TIME)")
+
+
+ def infer_id_col(df: pd.DataFrame, rat: str) -> str:
+     rat_candidates = {
+         "2G": ["BCF name", "BCF", "BTS name", "BSC name", "DN"],
+         "3G": ["WBTS name", "WBTS ID", "DN"],
+         "LTE": ["LNBTS name", "MRBTS/SBTS name", "DN"],
+     }
+
+     candidates = [c for c in rat_candidates.get(rat, []) if c in df.columns]
+     if not candidates and "DN" in df.columns:
+         candidates = ["DN"]
+     if not candidates:
+         raise ValueError(f"Cannot infer an entity/site column for {rat} dataset")
+
+     physical_codes: set[int] | None = None
+     try:
+         physical = load_physical_db()
+         if not physical.empty and "code" in physical.columns:
+             physical_codes = set(
+                 pd.to_numeric(physical["code"], errors="coerce")
+                 .dropna()
+                 .astype(int)
+                 .tolist()
+             )
+     except Exception:
+         physical_codes = None
+
+     if not physical_codes:
+         return candidates[0]
+
+     best_col = candidates[0]
+     best_score = -1.0
+     for c in candidates:
+         sample = df[c].head(2000)
+         codes = sample.apply(extract_site_code)
+         non_null = float(codes.notna().mean()) if len(codes) else 0.0
+
+         if physical_codes:
+             match = (
+                 float(codes.dropna().astype(int).isin(physical_codes).mean())
+                 if codes.notna().any()
+                 else 0.0
+             )
+             score = match * 10.0 + non_null
+         else:
+             score = non_null
+
+         if score > best_score:
+             best_score = score
+             best_col = c
+
+     return best_col
+
+
+ def non_kpi_identifier_cols(df: pd.DataFrame, rat: str) -> set[str]:
+     common = {
+         "DN",
+         "PLMN name",
+         "RNC name",
+         "BSC name",
+         "BCF name",
+         "MRBTS/SBTS name",
+         "LNBTS name",
+         "WBTS name",
+         "WBTS ID",
+     }
+     rat_specific = {
+         "2G": {"BSC name", "BSC", "BCF name", "BCF", "BTS name"},
+         "3G": {"PLMN name", "RNC name", "WBTS name", "WBTS ID"},
+         "LTE": {"MRBTS/SBTS name", "LNBTS name"},
+     }
+     cols = set()
+     for c in common.union(rat_specific.get(rat, set())):
+         if c in df.columns:
+             cols.add(c)
+     return cols
+
+
+ def infer_agg(kpi: str) -> str:
+     k = str(kpi).lower()
+     if any(x in k for x in ["traffic", "volume", "erl", "total", "gbytes", "gb"]):
+         return "sum"
+     return "mean"
+
+
+ def load_physical_db() -> pd.DataFrame:
+     physical_db = get_physical_db().copy()
+     physical_db["code"] = physical_db["Code_Sector"].str.split("_").str[0]
+     physical_db["code"] = pd.to_numeric(physical_db["code"], errors="coerce")
+     physical_db = physical_db.dropna(subset=["code"])
+     physical_db["code"] = physical_db["code"].astype(int)
+     keep = [
+         c for c in ["code", "Longitude", "Latitude", "City"] if c in physical_db.columns
+     ]
+     return physical_db[keep].drop_duplicates("code")
+
+
+ def build_daily_kpi(df_raw: pd.DataFrame, rat: str) -> tuple[pd.DataFrame, list[str]]:
+     df = df_raw.copy()
+     date_col = infer_date_col(df)
+     id_col = infer_id_col(df, rat)
+
+     df["date"] = parse_datetime(df[date_col])
+     df = df.dropna(subset=["date"])
+     df["date_only"] = df["date"].dt.date
+
+     df["site_code"] = df[id_col].apply(extract_site_code)
+     df = df.dropna(subset=["site_code"])
+     df["site_code"] = df["site_code"].astype(int)
+
+     meta = {date_col, id_col, "date", "date_only", "site_code"}
+     meta = meta.union(non_kpi_identifier_cols(df, rat))
+     candidate_cols = [c for c in df.columns if c not in meta]
+
+     numeric_cols: dict[str, pd.Series] = {}
+     for c in candidate_cols:
+         numeric_cols[c] = to_numeric(df[c])
+
+     numeric_df = pd.DataFrame(numeric_cols)
+     kpi_cols = [c for c in numeric_df.columns if numeric_df[c].notna().any()]
+     if not kpi_cols:
+         raise ValueError(f"No numeric KPI columns detected for {rat}")
+
+     base = pd.concat(
+         [
+             df[["site_code", "date_only"]].reset_index(drop=True),
+             numeric_df[kpi_cols].reset_index(drop=True),
+         ],
+         axis=1,
+     )
+
+     agg_dict = {k: infer_agg(k) for k in kpi_cols}
+     daily = base.groupby(["site_code", "date_only"], as_index=False).agg(agg_dict)
+
+     physical = load_physical_db()
+     if not physical.empty:
+         daily = pd.merge(
+             daily, physical, left_on="site_code", right_on="code", how="left"
+         )
+         daily = daily.drop(columns=[c for c in ["code"] if c in daily.columns])
+
+     daily["RAT"] = rat
+
+     return daily, kpi_cols
process_kpi/kpi_health_check/rules.py ADDED
@@ -0,0 +1,31 @@
+ def infer_kpi_direction(kpi: str) -> str:
+     k = str(kpi).lower()
+     lower_is_better = [
+         "drop",
+         "dcr",
+         "blocking",
+         "block",
+         "congestion",
+         "loss",
+         "discard",
+         "rtwp",
+         "prb usage",
+         "usage",
+         "fail",
+     ]
+     if any(x in k for x in lower_is_better):
+         return "lower_is_better"
+     return "higher_is_better"
+
+
+ def infer_kpi_sla(kpi: str, direction: str) -> float | None:
+     k = str(kpi).lower()
+     if direction == "higher_is_better" and any(
+         x in k for x in ["availability", "cssr", "success", " sr"]
+     ):
+         return 98.0
+     if direction == "lower_is_better" and any(
+         x in k for x in ["drop", "dcr", "blocking", "congestion", "loss", "discard"]
+     ):
+         return 2.0
+     return None