SHELLAPANDIANGANHUNGING committed on
Commit 86440a2 · verified · 1 Parent(s): 317487e

Upload 5 files

Files changed (5)
  1. README.md +42 -12
  2. app.py +1247 -0
  3. btech.png +0 -0
  4. data.csv +0 -0
  5. requirements.txt +13 -2
README.md CHANGED
@@ -1,19 +1,49 @@
  ---
- title: Pln
- emoji: 🚀
- colorFrom: red
  colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
  pinned: false
- short_description: Streamlit template space
  ---
 
- # Welcome to Streamlit!
 
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
 
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
  ---
+ title: MineVision AI - Advanced Fatigue Analytics
+ emoji: ⛏️
+ colorFrom: blue
  colorTo: red
+ sdk: streamlit
+ sdk_version: 1.38.0 # Replace with the Streamlit version you use
+ app_file: app.py
  pinned: false
+ license: apache-2.0
  ---
 
+ # MineVision AI - Advanced Fatigue Analytics
 
+ ## Description
+ This application is a web-based fatigue analytics dashboard designed for mining operations. Using data from a fatigue detection system (such as Wenco DSS), it provides real-time insights and analysis to help identify, assess, and manage operator fatigue risk. The goal is to improve workplace safety and productivity by reducing fatigue-related accidents.
 
+ ## Key Features
+ * **Executive Dashboard**: Displays key safety metrics such as total alerts, number of operators and assets, and average event duration.
+ * **Trend Analysis**: Visualizes fatigue trends by hour, shift, day of the week, and week.
+ * **Advanced Analysis**: Analysis by fleet type, speed vs. hour, duration vs. hour, speed distribution, and operator distribution per shift.
+ * **Fatigue Risk Categorization**: Classifies events with a fatigue risk matrix (Critical, High, Medium, Low) based on speed and time.
+ * **AI-Based Insights**: Automatic summaries and insights based on the analyzed data.
+ * **Interactive AI Assistant**: A simple chatbot for asking about the fatigue data (operator with the most events, shift with the most events, etc.).
+
+ ## Technologies Used
+ * **Streamlit**: Framework for building interactive web applications in Python.
+ * **Pandas**: Data manipulation and analysis.
+ * **Plotly/Plotly Express**: Interactive data visualization.
+ * **Openpyxl**: Reading Excel files.
+
+ ## How to Use
+ 1. Open the application via its Hugging Face Spaces URL.
+ 2. Use the sidebar filters to narrow the data by Year, Month, Week, Date Range, Operator, Shift, and Hour Range.
+ 3. Explore the dashboard sections to understand fatigue patterns.
+ 4. Use the "MineVision AI Assistant" chat box at the top to ask specific questions about the data.
+
+ ## Project Structure
+ * `app.py`: Main file containing the Streamlit application code.
+ * `requirements.txt`: List of Python dependencies required to run the application.
+ * `manual fatique.xlsx`: Sample input data file (if included in the repository).
+
+ ## Notes
+ * This application is designed to analyze operator fatigue data from an Excel file. Make sure the input data structure matches, or adapt the code to read data from another source.
+ * Insights and recommendations are based on analysis of historical data and fatigue risk management (FRMS) principles.
+ * The AI assistant currently provides simple rule-based answers drawn from the available data and general FRMS information. It is not an advanced AI model like GPT.
+
+ ## License
+ Apache 2.0
app.py ADDED
@@ -0,0 +1,1247 @@
+
+ import streamlit as st
+ import pandas as pd
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import numpy as np
+ from datetime import datetime, timedelta
+ from typing import List
+ import os
+
+ # =================== PAGE CONFIG ===================
+ st.set_page_config(
+     page_title="PLN Audit Insight & Intelligence Dashboard",
+     page_icon=None,  # None falls back to Streamlit's default favicon
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ # =================== CUSTOM CSS (Updated for PLN Colors) ===================
+ st.markdown("""<style>
+ .main-header {
+     background-color: white;
+     padding: 25px;
+     border-radius: 12px;
+     margin-bottom: 25px;
+     box-shadow: 0 4px 12px rgba(0,0,0,0.06);
+     border: 1px solid #e0e0e0;
+ }
+ h1, h2, h3, h4, h5, .stMarkdown h1, .stMarkdown h2, .stMarkdown h3 {
+     text-align: center;
+     font-weight: 700;
+     color: #003DA5; /* Dark Blue - PLN Color */
+ }
+ .metric-card {
+     background: white;
+     padding: 16px;
+     border-radius: 10px;
+     box-shadow: 0 3px 10px rgba(0,0,0,0.05);
+     text-align: center;
+     border: 1px solid #f0f0f0;
+ }
+ .ai-insight {
+     background: #f0f4ff; /* Light Blue */
+     padding: 14px 18px;
+     border-left: 4px solid #003DA5; /* PLN Blue */
+     margin: 10px 0;
+     border-radius: 0 6px 6px 0;
+     font-size: 0.95em;
+ }
+ .ai-recommendation {
+     background: #e8f5e9;
+     padding: 14px 18px;
+     border-left: 4px solid #4caf50;
+     margin: 10px 0;
+     border-radius: 0 6px 6px 0;
+     font-size: 0.95em;
+ }
+ .risk-very-high { color: #c62828; font-weight: bold; }
+ .risk-high { color: #d32f2f; }
+ .risk-moderate { color: #f57c00; }
+ .risk-slight { color: #388e3c; }
+ .trend-worsening { color: #d32f2f; }
+ .trend-improvement { color: #388e3c; }
+ .trend-stable { color: #616161; }
+ .chart-container {
+     border: 1px solid #e0e0e0;
+     border-radius: 8px;
+     padding: 15px;
+     margin: 10px 0;
+     background-color: white;
+     box-shadow: 0 2px 6px rgba(0,0,0,0.03);
+ }
+ .section-title {
+     color: #003DA5; /* PLN Blue */
+     font-weight: 700;
+     font-size: 1.5em;
+     text-align: left;
+     margin-top: 20px;
+     margin-bottom: 10px;
+ }
+ .ai-section {
+     background: #ffffff;
+     padding: 20px;
+     border-radius: 8px;
+     margin: 10px 0;
+     box-shadow: 0 2px 6px rgba(0,0,0,0.03);
+ }
+ /* PLN Styled Selectbox and Multiselect */
+ .stSelectbox > label, .stMultiselect > label {
+     color: #003DA5; /* PLN Blue */
+     font-weight: bold;
+ }
+ .stSelectbox > div > div, .stMultiselect > div > div {
+     border: 2px solid #003DA5; /* PLN Blue Border */
+     border-radius: 8px;
+ }
+ .st-bq {
+     background-color: #f0f4ff; /* Light Blue Background */
+ }
+ .stButton > button {
+     background-color: #003DA5; /* PLN Blue Button */
+     color: white;
+     border: none;
+     border-radius: 8px;
+     padding: 8px 16px;
+     font-weight: bold;
+ }
+ .stButton > button:hover {
+     background-color: #0050A0; /* Slightly lighter PLN Blue on Hover */
+ }
+ /* Filter Container Styling */
+ .filter-container {
+     background-color: #f9f9f9;
+     border-radius: 10px;
+     padding: 15px;
+     margin-bottom: 15px;
+     box-shadow: 0 2px 4px rgba(0,0,0,0.05);
+ }
+ .filter-title {
+     color: #003DA5;
+     font-weight: bold;
+     font-size: 1.1em;
+     margin-bottom: 10px;
+     text-align: center;
+ }
+ </style>""", unsafe_allow_html=True)
+
+ # =================== DATA LOADING (FROM data.xlsx) ===================
+ @st.cache_data(ttl=300)  # refresh every 5 min
+ def load_data():
+     file_path = "data.xlsx"
+     if not os.path.exists(file_path):
+         st.error(f"❌ File **`{file_path}`** not found. Please ensure it's in the same directory as this script.")
+         return pd.DataFrame()
+     try:
+         # Load Excel file
+         df = pd.read_excel(file_path, sheet_name='Sheet1', engine='openpyxl')
+         # Check for required columns
+         required_cols = ['created_at']
+         missing = [c for c in required_cols if c not in df.columns]
+         if missing:
+             st.error(f"❌ Missing required columns: {missing}. Available: {list(df.columns)}")
+             return pd.DataFrame()
+         # Parse datetime
+         df['created_at'] = pd.to_datetime(df['created_at'], errors='coerce')
+         if df['created_at'].isna().all():
+             st.error("❌ `created_at` column could not be parsed as datetime.")
+             return pd.DataFrame()
+         # Optional: close_at
+         if 'close_at' in df.columns:
+             df['close_at'] = pd.to_datetime(df['close_at'], errors='coerce')
+             df['days_to_close'] = (df['close_at'] - df['created_at']).dt.total_seconds() / (24 * 3600)
+             # Negative durations (closed before created) are treated as invalid
+             df['days_to_close'] = df['days_to_close'].where(df['days_to_close'] >= 0)
+         else:
+             df['days_to_close'] = np.nan
+         # Derived columns
+         df['created_month'] = df['created_at'].dt.to_period('M')
+         df['created_date'] = df['created_at'].dt.date
+         df['created_week'] = df['created_at'].dt.to_period('W')
+         # Keep only valid rows
+         df = df.dropna(subset=['created_at']).copy()
+         # Report how many rows were loaded
+         st.sidebar.success(f"Loaded {len(df):,} Audit Findings from `data.xlsx`")
+         return df
+     except Exception as e:
+         # st.exception expects an exception object, not a formatted string
+         st.error(f"❌ Error loading data.xlsx: {e}")
+         st.exception(e)
+         return pd.DataFrame()
+
+ df = load_data()
+ if df.empty:
+     st.stop()
+
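`load_data()` reads only `data.xlsx`, although this commit uploads `data.csv`. A minimal Streamlit-free sketch, assuming the CSV carries the same columns as the Excel sheet (`created_at`, optionally `close_at`), that falls back to the CSV and builds the same derived columns; `load_findings` and `add_derived_columns` are hypothetical names, not functions in `app.py`. It also uses `Series.where` instead of a row-wise `apply` to blank out negative durations:

```python
import os

import numpy as np
import pandas as pd


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Parse timestamps and add the derived columns that load_data() builds."""
    df = df.copy()
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    # Drop rows whose creation time could not be parsed
    df = df.dropna(subset=["created_at"]).copy()
    if "close_at" in df.columns:
        df["close_at"] = pd.to_datetime(df["close_at"], errors="coerce")
        days = (df["close_at"] - df["created_at"]).dt.total_seconds() / (24 * 3600)
        # Negative durations become NaN, matching the app's rule
        df["days_to_close"] = days.where(days >= 0)
    else:
        df["days_to_close"] = np.nan
    df["created_month"] = df["created_at"].dt.to_period("M")
    df["created_date"] = df["created_at"].dt.date
    df["created_week"] = df["created_at"].dt.to_period("W")
    return df


def load_findings(candidates=("data.xlsx", "data.csv")) -> pd.DataFrame:
    """Load the first candidate file that exists (hypothetical helper)."""
    for path in candidates:
        if not os.path.exists(path):
            continue
        if path.endswith(".xlsx"):
            # Needs openpyxl, as in the app
            raw = pd.read_excel(path, sheet_name="Sheet1", engine="openpyxl")
        else:
            raw = pd.read_csv(path)
        if "created_at" not in raw.columns:
            raise ValueError(f"{path} has no 'created_at' column")
        return add_derived_columns(raw)
    # No candidate found: return an empty frame, like load_data() does
    return pd.DataFrame()
```

Keeping the pandas logic outside the `@st.cache_data` function makes it testable on its own; `load_data()` would then only wrap it with the `st.error` and `st.sidebar` messages.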
+ # =================== SIDEBAR FILTERS (Revised) ===================
+ st.sidebar.markdown('<div class="filter-container">', unsafe_allow_html=True)
+ st.sidebar.markdown('<h4 class="filter-title">Filter Dashboard</h4>', unsafe_allow_html=True)
+
+ # Initialize df_filtered
+ df_filtered = df.copy()
+
+ # Flag to track if filters were applied
+ filters_applied = False
+
+ # 1. Date Range Filter
+ min_date = df['created_at'].min().date()
+ max_date = df['created_at'].max().date()
+ date_range = st.sidebar.date_input(
+     "Date Range",
+     value=(min_date, max_date),
+     min_value=min_date,
+     max_value=max_date
+ )
+
+ # 2. Filter by Vendor (nama_perusahaan) - Default to All
+ if 'nama_perusahaan' in df.columns:
+     unique_vendors = sorted(df['nama_perusahaan'].dropna().astype(str).unique())
+     all_vendors_option = "All Vendors"
+     vendor_options = [all_vendors_option] + list(unique_vendors)
+     selected_vendor = st.sidebar.selectbox("Vendor", vendor_options, index=0)  # Default to "All"
+     if selected_vendor != all_vendors_option:
+         df_filtered = df_filtered[df_filtered['nama_perusahaan'].astype(str) == selected_vendor]
+         filters_applied = True
+
+ # 3. Filter by Area/Unit Type (temuan_nama_distrik or creator_nama_distrik) - Renamed
+ area_col = None
+ if 'temuan_nama_distrik' in df_filtered.columns:
+     area_col = 'temuan_nama_distrik'
+ elif 'creator_nama_distrik' in df_filtered.columns:
+     area_col = 'creator_nama_distrik'
+
+ if area_col:
+     # Define mapping for display names
+     area_mapping = {
+         'UMRO': 'Unit Maintenance',
+         'UP GRESIK': 'Unit Pembangkit'
+     }
+     unique_areas_raw = sorted(df_filtered[area_col].dropna().astype(str).unique())
+     # Map raw values to display names, keep unmapped values as is
+     unique_areas_display = [area_mapping.get(area, area) for area in unique_areas_raw]
+     # Prepend "All" option
+     all_areas_option = "All Units"
+     area_options = [all_areas_option] + unique_areas_display
+     selected_area_display = st.sidebar.selectbox("Unit Type", area_options, index=0)  # Default to "All"
+     if selected_area_display != all_areas_option:
+         # Reverse map the selected display name back to the raw value for filtering
+         selected_area_raw = next((raw for raw, disp in area_mapping.items() if disp == selected_area_display), selected_area_display)
+         df_filtered = df_filtered[df_filtered[area_col].astype(str) == selected_area_raw]
+         filters_applied = True
+
+ # 4. Status filter - Changed to selectbox (dropdown)
+ status_filter_applied = False
+ if 'temuan_status' in df_filtered.columns:
+     all_status = sorted(df_filtered['temuan_status'].dropna().astype(str).unique())
+     # Prepend "All" option
+     all_status_option = "All Status"
+     status_options = [all_status_option] + list(all_status)
+     selected_status = st.sidebar.selectbox(
+         "Status",
+         status_options,
+         index=0  # Default to "All"
+     )
+     if selected_status != all_status_option:
+         df_filtered = df_filtered[df_filtered['temuan_status'].astype(str) == selected_status]
+         status_filter_applied = True
+         filters_applied = True
+
+ # Apply date filter *after* other filters
+ if len(date_range) == 2:
+     df_filtered = df_filtered[
+         (df_filtered['created_at'].dt.date >= date_range[0]) &
+         (df_filtered['created_at'].dt.date <= date_range[1])
+     ]
+     if date_range[0] != min_date or date_range[1] != max_date:
+         filters_applied = True
+
+ # Submit Button
+ submit_clicked = st.sidebar.button("Apply Filters")
+
+ # Show the filter summary when the button is clicked
+ if submit_clicked:
+     # The filtering based on selections already happened above (on every rerun);
+     # here we only summarize the current state of df_filtered
+     active_filters = []
+     if 'selected_vendor' in locals() and selected_vendor != all_vendors_option:
+         active_filters.append(f"Vendor: {selected_vendor}")
+     if 'selected_area_display' in locals() and selected_area_display != all_areas_option:
+         active_filters.append(f"Unit: {selected_area_display}")
+     if 'selected_status' in locals() and selected_status != all_status_option:
+         active_filters.append(f"Status: {selected_status}")
+     if len(date_range) == 2 and (date_range[0] != min_date or date_range[1] != max_date):
+         active_filters.append(f"Date: {date_range[0]} to {date_range[1]}")
+
+     if active_filters:
+         st.sidebar.success("**Active Filters:**")
+         for f in active_filters:
+             st.sidebar.markdown(f"- {f}")
+         st.sidebar.info(f"Showing {len(df_filtered)} records based on filters.")
+     else:
+         st.sidebar.info("No specific filters applied (showing all records).")
+ else:
+     # Default message before the button is clicked; filters are already live
+     st.sidebar.info("Filters apply automatically. Click 'Apply Filters' to see a summary.")
+
+ st.sidebar.markdown('</div>', unsafe_allow_html=True)
+
+
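Because every selectbox is applied on each rerun, the "Apply Filters" button only changes the summary text. Moving the filtering into a pure function makes it testable without Streamlit. A minimal sketch, where `apply_filters`, its parameters and `AREA_MAPPING` are hypothetical names and `None` stands for the "All Vendors" / "All Units" / "All Status" options; the filter order and the display-name reverse mapping follow the sidebar code:

```python
from datetime import date
from typing import Optional, Tuple

import pandas as pd

# Display names copied from the app's area_mapping
AREA_MAPPING = {"UMRO": "Unit Maintenance", "UP GRESIK": "Unit Pembangkit"}


def apply_filters(
    df: pd.DataFrame,
    vendor: Optional[str] = None,
    area_display: Optional[str] = None,
    area_col: str = "temuan_nama_distrik",
    status: Optional[str] = None,
    date_range: Optional[Tuple[date, date]] = None,
) -> pd.DataFrame:
    """Apply vendor, unit, status and date filters; None means 'All'."""
    out = df
    if vendor is not None:
        out = out[out["nama_perusahaan"].astype(str) == vendor]
    if area_display is not None:
        # Reverse-map the display name to the raw district value
        raw = next((k for k, v in AREA_MAPPING.items() if v == area_display), area_display)
        out = out[out[area_col].astype(str) == raw]
    if status is not None:
        out = out[out["temuan_status"].astype(str) == status]
    if date_range is not None:
        start, end = date_range
        days = out["created_at"].dt.date
        # Inclusive on both ends, like the sidebar date filter
        out = out[(days >= start) & (days <= end)]
    return out
```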
+ # =================== HEADER ===================
+ st.markdown("""
+ <div class="main-header">
+ <h1>PLN Audit Insight & Intelligence Dashboard</h1>
+ <p style="text-align:center; color:#546e7a; font-size:1.1em; margin-top:8px;">
+ Operational Risk Intelligence for Audit & Compliance
+ </p>
+ </div>
+ """, unsafe_allow_html=True)
+
+
+ # =================== 1. Pie Charts: Findings/Person by Company (PG & UM) - REVISED ===================
+ st.markdown("<h3 class='section-title'>OBJECTIVE 1 - Company Reporting Activity: Who Reports the Most?</h3>", unsafe_allow_html=True)
+
+ # Assume df_filtered is the main, already-filtered dataset
+ df_local = df_filtered.copy()
+
+ # Add month column
+ df_local['created_month'] = df_local['created_at'].dt.to_period('M')
+
+ # --- Derive Area_Type (PG / UM) directly, without filtering ---
+
+ if 'temuan_kode_distrik' in df_local.columns:
+
+     df_local['Area_Type'] = df_local['temuan_kode_distrik'].apply(
+         lambda x: 'PG' if 'PG' in str(x).upper()
+         else 'UM' if 'UM' in str(x).upper()
+         else 'Other'
+     )
+
+     # Split the dataset automatically
+     df_pg = df_local[df_local['Area_Type'] == 'PG'].copy()
+     df_um = df_local[df_local['Area_Type'] == 'UM'].copy()
+
+ else:
+     df_pg = pd.DataFrame()
+     df_um = pd.DataFrame()
+
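The row-wise `apply` that builds `Area_Type` can be vectorized. A sketch using `numpy.select`, which takes the first matching condition and therefore keeps the lambda's precedence (a code containing both "PG" and "UM" is classified as PG); `classify_area` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def classify_area(codes: pd.Series) -> pd.Series:
    """Vectorized Area_Type: 'PG' wins over 'UM', everything else is 'Other'."""
    # astype(str) turns None/NaN into 'None'/'nan', as str(x) does in the lambda
    upper = codes.astype(str).str.upper()
    conditions = [
        upper.str.contains("PG", regex=False),
        upper.str.contains("UM", regex=False),
    ]
    # np.select picks the first True condition per row, mirroring if/elif order
    labels = np.select(conditions, ["PG", "UM"], default="Other")
    return pd.Series(labels, index=codes.index)
```

In the app this would read `df_local['Area_Type'] = classify_area(df_local['temuan_kode_distrik'])`.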
+ # --- Function to compute the per-company ratio ---
+ def calculate_avg_ratio_per_company(df_area):
+     if df_area.empty:
+         # Area not selected, or no data left after filtering
+         return pd.DataFrame()
+     # Count findings per month per company
+     findings_by_company_month = df_area.groupby(['created_month', 'nama_perusahaan']).size().reset_index(name='findings_count')
+     # Count unique reporters per month per company
+     creators_by_company_month = df_area.groupby(['created_month', 'nama_perusahaan'])['creator_nid'].nunique().reset_index(name='unique_creators')
+     # Merge
+     merged = findings_by_company_month.merge(creators_by_company_month, on=['created_month', 'nama_perusahaan'], how='outer')
+     # Fill NaN with 0 for columns that may be missing after the merge
+     merged = merged.fillna({'findings_count': 0, 'unique_creators': 0})
+     # Filter to avoid division by zero: only compute the ratio when there is at least one reporter
+     # (.copy() avoids a SettingWithCopyWarning when adding the ratio column)
+     merged = merged[merged['unique_creators'] > 0].copy()
+     # Compute the ratio; any residual inf is replaced with NaN as a safeguard
+     merged['ratio'] = merged['findings_count'] / merged['unique_creators']
+     merged['ratio'] = merged['ratio'].replace([np.inf, -np.inf], np.nan)
+
+     # If no valid rows remain after filtering, return an empty DataFrame
+     if merged.empty:
+         return pd.DataFrame()
+
+     # Monthly average per company
+     # Group by nama_perusahaan and take the mean of the ratio
+     # (mean() skips NaN by default)
+     avg_ratio = merged.groupby('nama_perusahaan')['ratio'].mean().reset_index(name='avg_monthly_ratio')
+
+     # If every company's ratio is NaN, return an empty DataFrame
+     if avg_ratio['avg_monthly_ratio'].isna().all():
+         return pd.DataFrame()
+
+     return avg_ratio
+
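The same findings-per-reporter computation is repeated here for companies, in Objective 2 for locations and in 3a for divisions. A sketch of one shared helper, assuming the column names used in the app; `avg_monthly_ratio` is a hypothetical name. Named aggregation computes both counts from the same groups in a single `groupby`, so the outer merge and `fillna` step are no longer needed:

```python
import pandas as pd


def avg_monthly_ratio(df: pd.DataFrame, key: str,
                      month_col: str = "created_month",
                      person_col: str = "creator_nid") -> pd.DataFrame:
    """Mean over months of (findings / unique reporters) for each value of `key`."""
    if df.empty or key not in df.columns:
        return pd.DataFrame(columns=[key, "avg_monthly_ratio"])
    monthly = (
        df.groupby([month_col, key])
        # 'size' counts every row (like .size()), 'nunique' counts distinct reporters
        .agg(findings_count=(person_col, "size"),
             unique_creators=(person_col, "nunique"))
        .reset_index()
    )
    # Months without an identifiable reporter would divide by zero, so drop them
    monthly = monthly[monthly["unique_creators"] > 0].assign(
        ratio=lambda m: m["findings_count"] / m["unique_creators"]
    )
    return monthly.groupby(key)["ratio"].mean().reset_index(name="avg_monthly_ratio")
```

Usage would be `avg_monthly_ratio(df_pg, "nama_perusahaan")`, `avg_monthly_ratio(df_local, "nama_lokasi_full")` or `avg_monthly_ratio(df_local, "nama")`.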
+ # Compute for each area
+ avg_ratio_pg = calculate_avg_ratio_per_company(df_pg)
+ avg_ratio_um = calculate_avg_ratio_per_company(df_um)
+
+ # Function to assign colors
+ def get_color_map(company_series):
+     pln_color = "#FFD700"  # Yellow for PLN
+     # List of blue shades (roughly dark to light)
+     blue_colors = ["#1E90FF", "#87CEEB", "#B0E0E6", "#ADD8E6", "#E0F6FF"]
+     color_map = {}
+     for company in company_series:
+         if 'PLN' in str(company).upper():
+             color_map[company] = pln_color
+         else:
+             # Pick a blue shade by index, cycling if needed
+             idx = len([c for c in color_map.values() if c != pln_color]) % len(blue_colors)
+             color_map[company] = blue_colors[idx]
+     return color_map
+
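The list comprehension inside `get_color_map` recounts the non-PLN entries on every iteration, which is quadratic in the number of companies. An equivalent sketch with a running counter, which gives the same colors for a list of unique company names (the grouped output the app passes in); the constant names are hypothetical:

```python
from typing import Dict, Iterable

PLN_COLOR = "#FFD700"  # yellow for PLN, as in the app
BLUE_COLORS = ["#1E90FF", "#87CEEB", "#B0E0E6", "#ADD8E6", "#E0F6FF"]


def get_color_map(companies: Iterable) -> Dict:
    """Yellow for PLN companies, cycling blue shades for all other companies."""
    color_map = {}
    blue_index = 0  # number of non-PLN companies colored so far
    for company in companies:
        if "PLN" in str(company).upper():
            color_map[company] = PLN_COLOR
        else:
            color_map[company] = BLUE_COLORS[blue_index % len(BLUE_COLORS)]
            blue_index += 1
    return color_map
```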
+ # Plot
+ col1, col2 = st.columns(2)
+
+ with col1:
+     st.markdown("<h5>Avg Monthly Finding by Company</h5>", unsafe_allow_html=True)
+     if not avg_ratio_pg.empty:
+         color_discrete_map_pg = get_color_map(avg_ratio_pg['nama_perusahaan'])
+         fig_pg = px.pie(
+             avg_ratio_pg,
+             values='avg_monthly_ratio',
+             names='nama_perusahaan',
+             title='Unit Pembangkit',
+             color='nama_perusahaan',
+             color_discrete_map=color_discrete_map_pg
+         )
+         st.plotly_chart(fig_pg, use_container_width=True)
+
+     # AI Insight for PG
+     if not avg_ratio_pg.empty:
+         # Find the companies with the highest and lowest ratio in PG
+         top_company_pg = avg_ratio_pg.loc[avg_ratio_pg['avg_monthly_ratio'].idxmax()]
+         low_company_pg = avg_ratio_pg.loc[avg_ratio_pg['avg_monthly_ratio'].idxmin()]
+
+         st.markdown("### Insight")
+         insight_text = (
+             f"<div class='ai-insight'>"
+             f"In the PG area, <strong>{top_company_pg['nama_perusahaan']}</strong> has the highest average finding-to-person ratio "
+             f"(<strong>{top_company_pg['avg_monthly_ratio']:.2f}</strong>), indicating potentially high exposure or active reporting. "
+             f"Consider reviewing their operational procedures. "
+             f"Conversely, <strong>{low_company_pg['nama_perusahaan']}</strong> has the lowest ratio "
+             f"(<strong>{low_company_pg['avg_monthly_ratio']:.2f}</strong>), suggesting effective risk management or lower activity levels."
+             f"</div>"
+         )
+         st.markdown(insight_text, unsafe_allow_html=True)
+     else:
+         st.warning("No data for PG area or all ratios are NaN.")
+
+ with col2:
+     st.markdown("<h5>Avg Monthly Finding by Company</h5>", unsafe_allow_html=True)
+     if not avg_ratio_um.empty:
+         color_discrete_map_um = get_color_map(avg_ratio_um['nama_perusahaan'])
+         fig_um = px.pie(
+             avg_ratio_um,
+             values='avg_monthly_ratio',
+             names='nama_perusahaan',
+             title='Unit Maintenance',
+             color='nama_perusahaan',
+             color_discrete_map=color_discrete_map_um
+         )
+         st.plotly_chart(fig_um, use_container_width=True)
+
+     # AI Insight for UM
+     if not avg_ratio_um.empty:
+         # Find the companies with the highest and lowest ratio in UM
+         top_company_um = avg_ratio_um.loc[avg_ratio_um['avg_monthly_ratio'].idxmax()]
+         low_company_um = avg_ratio_um.loc[avg_ratio_um['avg_monthly_ratio'].idxmin()]
+
+         st.markdown("### Insight")
+         insight_text = (
+             f"<div class='ai-insight'>"
+             f"In the UM area, <strong>{top_company_um['nama_perusahaan']}</strong> has the highest average finding-to-person ratio "
+             f"(<strong>{top_company_um['avg_monthly_ratio']:.2f}</strong>), warranting a focused safety audit. "
+             f"<strong>{low_company_um['nama_perusahaan']}</strong> has the lowest ratio "
+             f"(<strong>{low_company_um['avg_monthly_ratio']:.2f}</strong>), which could reflect strong safety practices or may call for verifying reporting completeness."
+             f"</div>"
+         )
+         st.markdown(insight_text, unsafe_allow_html=True)
+     else:
+         st.warning("No data for UM area or all ratios are NaN.")
+
+
+ # =================== 2. Treemap: Distribution of Findings per Area (nama_lokasi_full) - REVISED ===================
+ st.markdown("<h3 class='section-title'>OBJECTIVE 2 - Active vs Inactive Locations: Who Leads?</h3>", unsafe_allow_html=True)
+
+ # Count findings per month per location
+ findings_by_location_month = df_local.groupby(['created_month', 'nama_lokasi_full']).size().reset_index(name='findings_count')
+ # Count unique reporters per month per location
+ creators_by_location_month = df_local.groupby(['created_month', 'nama_lokasi_full'])['creator_nid'].nunique().reset_index(name='unique_creators')
+ # Merge
+ merged_loc = findings_by_location_month.merge(creators_by_location_month, on=['created_month', 'nama_lokasi_full'], how='outer')
+ # Fill NaN with 0 for columns that may be missing after the merge
+ merged_loc = merged_loc.fillna({'findings_count': 0, 'unique_creators': 0})
+ # Filter to avoid division by zero (.copy() avoids a SettingWithCopyWarning below)
+ merged_loc = merged_loc[merged_loc['unique_creators'] > 0].copy()
+ # Compute the ratio; any residual inf is replaced with NaN as a safeguard
+ merged_loc['ratio'] = merged_loc['findings_count'] / merged_loc['unique_creators']
+ merged_loc['ratio'] = merged_loc['ratio'].replace([np.inf, -np.inf], np.nan)
+
+ # Monthly average per location
+ # Group by nama_lokasi_full and take the mean of the ratio
+ # (mean() skips NaN by default)
+ avg_ratio_per_location = merged_loc.groupby('nama_lokasi_full')['ratio'].mean().reset_index(name='avg_monthly_ratio')
+
+ # Drop NaN results
+ avg_ratio_per_location = avg_ratio_per_location.dropna(subset=['avg_monthly_ratio'])
+
+ # Plot Treemap
+ if not avg_ratio_per_location.empty:
+     # Add an activity category column used for coloring
+     def categorize_risk(r):
+         if r > 1.3:
+             return 'High Activity (> 1.3)'  # Green
+         elif r > 1.0:
+             return 'Medium Activity (1.0 - 1.3)'  # Yellow
+         else:
+             return 'Low Activity (<= 1.0)'  # Red
+
+     avg_ratio_per_location['Activity_Category'] = avg_ratio_per_location['avg_monthly_ratio'].apply(categorize_risk)
+
+     # Color map
+     color_map = {
+         'High Activity (> 1.3)': '#4CAF50',  # Green
+         'Medium Activity (1.0 - 1.3)': '#FFB300',  # Yellow
+         'Low Activity (<= 1.0)': '#D32F2F'  # Red
+     }
+
+     # Treemap: tile size reflects the average ratio, color reflects the activity category
+     fig_treemap = px.treemap(
+         avg_ratio_per_location,
+         path=['nama_lokasi_full'],  # Hierarchy path (only one level here)
+         values='avg_monthly_ratio',  # Value that determines tile size
+         title='Avg Monthly Finding by Location',
+         labels={'avg_monthly_ratio': 'Avg Monthly Finding/Person Ratio', 'nama_lokasi_full': 'Location'},
+         color='Activity_Category',  # Color by activity category
+         color_discrete_map=color_map,
+         custom_data=['Activity_Category']  # Passed through to the hover template
+     )
+     # Format hover: %{color} would print the hex color, so read the category from customdata
+     fig_treemap.update_traces(
+         hovertemplate="<b>%{label}</b><br>Avg Ratio: %{value:.2f}<br>Activity Level: %{customdata[0]}<extra></extra>"
+     )
+     fig_treemap.update_layout(height=600)
+     st.plotly_chart(fig_treemap, use_container_width=True)
+
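`categorize_risk` is applied row by row; `pandas.cut` with right-closed bins yields the same three categories in one vectorized call. A sketch, assuming the thresholds stay at 1.0 and 1.3; `categorize_activity`, `BINS` and `LABELS` are hypothetical names. It returns a categorical Series, and unlike the original function it maps NaN to NaN rather than to "Low", which does not matter here because NaN ratios are dropped first:

```python
import numpy as np
import pandas as pd

# Right-closed intervals (-inf, 1.0], (1.0, 1.3], (1.3, inf): 1.0 falls in "Low"
# and 1.3 in "Medium", exactly as the if/elif chain in categorize_risk() does
BINS = [-np.inf, 1.0, 1.3, np.inf]
LABELS = ["Low Activity (<= 1.0)", "Medium Activity (1.0 - 1.3)", "High Activity (> 1.3)"]


def categorize_activity(ratios: pd.Series) -> pd.Series:
    """Vectorized equivalent of ratios.apply(categorize_risk)."""
    return pd.cut(ratios, bins=BINS, labels=LABELS, right=True)
```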
+ # AI Insight for the location treemap (business-focused)
+ if not avg_ratio_per_location.empty:
+     # Find the locations with the highest and lowest ratio
+     top_location = avg_ratio_per_location.loc[avg_ratio_per_location['avg_monthly_ratio'].idxmax()]
+     low_location = avg_ratio_per_location.loc[avg_ratio_per_location['avg_monthly_ratio'].idxmin()]
+
+     st.markdown("### Insight")
+     insight_text = (
+         f"<div class='ai-insight'>"
+         f"The treemap visualizes the average finding-to-person ratio per location, indicating reporting activity levels. "
+         f"Locations in <span style='color:#4CAF50; font-weight:bold;'>green</span> have a high ratio, indicating active reporting. "
+         f"Those in <span style='color:#FFB300; font-weight:bold;'>yellow</span> have a medium ratio, indicating moderate reporting. "
+         f"Locations in <span style='color:#D32F2F; font-weight:bold;'>red</span> have a low ratio, indicating lower activity levels or potential under-reporting. "
+         f"<strong>{top_location['nama_lokasi_full']}</strong> shows the highest activity level "
+         f"(<strong>{top_location['avg_monthly_ratio']:.2f}</strong>, category: {top_location['Activity_Category']}). "
+         f"<strong>{low_location['nama_lokasi_full']}</strong> shows the lowest activity level "
+         f"(<strong>{low_location['avg_monthly_ratio']:.2f}</strong>, category: {low_location['Activity_Category']}). "
+         f"Areas with high activity (green) warrant investigation into the underlying causes of frequent findings. "
+         f"Areas with low activity (red) should be reviewed to ensure reporting completeness and identify any hidden risks."
+         f"</div>"
+     )
+     st.markdown(insight_text, unsafe_allow_html=True)
+ else:
+     st.warning("No data available for location ratio calculation or all ratios are NaN.")
+
+ # =================== 3. Reporter & Executor Analysis (3a, 3b, 3c, 3d) ===================
+ st.markdown("<h3 class='section-title'>OBJECTIVE 3 - Frequency & Response Time: Who Reports Well? Who Executes Well?</h3>", unsafe_allow_html=True)
+
+ # 3a & 3b: Reporter Frequency & Executor Lead Time by nama (Average Monthly Rate per Division)
+ col_3a, col_3b = st.columns(2)
+
+ with col_3a:
+     st.markdown("<h5>3a. Average Finding by Division (Reporter)</h5>", unsafe_allow_html=True)
+     if 'nama' in df_local.columns:
+         # Count findings per month per division (nama)
+         findings_by_nama_month = df_local.groupby(['created_month', 'nama']).size().reset_index(name='findings_count')
+         # Count unique reporters per month per division
+         creators_by_nama_month = df_local.groupby(['created_month', 'nama'])['creator_nid'].nunique().reset_index(name='unique_creators')
+         # Merge
+         merged_rep = findings_by_nama_month.merge(creators_by_nama_month, on=['created_month', 'nama'], how='outer')
+         # Fill NaN with 0 for columns that may be missing after the merge
+         merged_rep = merged_rep.fillna({'findings_count': 0, 'unique_creators': 0})
+         # Filter to avoid division by zero (.copy() avoids a SettingWithCopyWarning below)
+         merged_rep = merged_rep[merged_rep['unique_creators'] > 0].copy()
+         # Compute the ratio; any residual inf is replaced with NaN as a safeguard
+         merged_rep['ratio'] = merged_rep['findings_count'] / merged_rep['unique_creators']
+         merged_rep['ratio'] = merged_rep['ratio'].replace([np.inf, -np.inf], np.nan)
+
+         # Monthly average per division
+         avg_ratio_per_nama = merged_rep.groupby('nama')['ratio'].mean().reset_index(name='avg_monthly_ratio')
+
+         # Drop NaN results
+         avg_ratio_per_nama = avg_ratio_per_nama.dropna(subset=['avg_monthly_ratio'])
+         if not avg_ratio_per_nama.empty:
+             # Add a color column to the DataFrame
+             # Sort to determine the top 5
+             avg_ratio_per_nama_sorted = avg_ratio_per_nama.sort_values('avg_monthly_ratio', ascending=True)
+             top_5_indices = avg_ratio_per_nama_sorted.tail(5).index
+             # Set a default color, then override it for the top 5
+             avg_ratio_per_nama_sorted['color'] = '#1f77b4'  # Plotly default color
+             avg_ratio_per_nama_sorted.loc[avg_ratio_per_nama_sorted.index.isin(top_5_indices), 'color'] = '#4CAF50'  # Green for the top 5
+
+             # Sorting option
+             sort_option_3a = st.selectbox("Sort 3a by:", ["Lowest First", "Highest First"], key='sort_3a')
+             if sort_option_3a == "Highest First":
+                 avg_ratio_per_nama_sorted = avg_ratio_per_nama_sorted.sort_values('avg_monthly_ratio', ascending=False)
+             # "Lowest First" is already sorted ascending above
+
+             fig_rep_nama = px.bar(
+                 avg_ratio_per_nama_sorted,
+                 x='avg_monthly_ratio',
+                 y='nama',
+                 orientation='h',
+                 title='Avg Monthly Finding by Division',
+                 labels={'avg_monthly_ratio': 'Avg Monthly Finding/Person Ratio', 'nama': 'Division'},
+                 color='color',  # Use the added color column
+                 color_discrete_map={c: c for c in avg_ratio_per_nama_sorted['color'].unique()},  # Identity color map
+                 text=avg_ratio_per_nama_sorted['avg_monthly_ratio'].apply(lambda x: f'{x:.2f}')  # Format to 2 decimal places
+             )
+             # Hide the color legend since it is not informative.
+             # An explicit category array keeps the selected sort order ('total ascending'
+             # would ignore the "Sort 3a by" choice); it is reversed because a horizontal
+             # bar chart draws the first category at the bottom.
+             fig_rep_nama.update_layout(
+                 yaxis={'categoryorder': 'array', 'categoryarray': avg_ratio_per_nama_sorted['nama'].tolist()[::-1]},
+                 height=500,
+                 showlegend=False
+             )
+             fig_rep_nama.update_traces(textposition='auto')  # Automatic text position
+             st.plotly_chart(fig_rep_nama, use_container_width=True)
+
+             # AI Insight for 3a: idxmax/idxmin keep top/low correct under either sort order
+             top_nama = avg_ratio_per_nama_sorted.loc[avg_ratio_per_nama_sorted['avg_monthly_ratio'].idxmax()]
+             low_nama = avg_ratio_per_nama_sorted.loc[avg_ratio_per_nama_sorted['avg_monthly_ratio'].idxmin()]
+             if top_nama is not None and low_nama is not None:
+                 st.markdown("### Insight")
+                 insight_text = (
+                     f"<div class='ai-insight'>"
+                     f"The division <strong>{top_nama['nama']}</strong> has the highest average finding-to-person ratio "
+                     f"(<strong>{top_nama['avg_monthly_ratio']:.2f}</strong>), indicating potentially high reporting activity or exposure. "
611
+ f"Conversely, <strong>{low_nama['nama']}</strong> has the lowest ratio "
612
+ f"(<strong>{low_nama['avg_monthly_ratio']:.2f}</strong>), suggesting lower activity or potentially under-reporting. "
613
+ f"Monitor high-ratio divisions for potential systemic issues and verify reporting completeness in low-ratio ones."
614
+ f"</div>"
615
+ )
616
+ st.markdown(insight_text, unsafe_allow_html=True)
617
+ else:
618
+ st.warning("No data or all ratios are NaN for reporter analysis by division.")
619
+ else:
620
+ st.warning("Column 'nama' not available for reporter analysis (3a).")
621
+
+ with col_3b:
+     st.markdown("<h5>3b. Average Lead Time by Division (Executor)</h5>", unsafe_allow_html=True)
+     if 'nama' in df_local.columns and 'days_to_close' in df_local.columns:
+         # Average lead time per division per month
+         leadtime_by_nama_month = df_local.groupby(['created_month', 'nama'])['days_to_close'].mean().reset_index(name='avg_leadtime')
+         # Average of the monthly values per division (months with no closed findings are NaN and skipped by mean())
+         avg_leadtime_nama = leadtime_by_nama_month.groupby('nama')['avg_leadtime'].mean().reset_index(name='avg_monthly_leadtime')
+
+         # Drop divisions whose average is NaN
+         avg_leadtime_nama = avg_leadtime_nama.dropna(subset=['avg_monthly_leadtime'])
+         if not avg_leadtime_nama.empty:
+             # Sort ascending so the five slowest divisions are the last five rows
+             avg_leadtime_nama_sorted = avg_leadtime_nama.sort_values('avg_monthly_leadtime', ascending=True)
+             top_5_indices = avg_leadtime_nama_sorted.tail(5).index
+             # Color column: default Plotly blue, red for the five slowest
+             avg_leadtime_nama_sorted['color'] = '#1f77b4'
+             avg_leadtime_nama_sorted.loc[avg_leadtime_nama_sorted.index.isin(top_5_indices), 'color'] = '#D32F2F'
+
+             # Sort option
+             sort_option_3b = st.selectbox("Sort 3b by:", ["Fastest First", "Slowest First"], key='sort_3b')
+             if sort_option_3b == "Slowest First":
+                 avg_leadtime_nama_sorted = avg_leadtime_nama_sorted.sort_values('avg_monthly_leadtime', ascending=False)
+             # "Fastest First" keeps the ascending order from above
+
+             fig_exec_nama = px.bar(
+                 avg_leadtime_nama_sorted,
+                 x='avg_monthly_leadtime',
+                 y='nama',
+                 orientation='h',
+                 title='Avg Monthly Lead Time by Division',
+                 labels={'avg_monthly_leadtime': 'Avg Lead Time (Days)', 'nama': 'Division'},
+                 color='color',  # Use the color column added above
+                 color_discrete_map={c: c for c in avg_leadtime_nama_sorted['color'].unique()},  # Map each hex value to itself
+                 text=avg_leadtime_nama_sorted['avg_monthly_leadtime'].apply(lambda x: f'{x:.2f}')  # Two decimal places
+             )
+             # Plotly draws the first category at the bottom, so reverse the rows to list them top-down in the
+             # selected sort order; the color legend is hidden because it carries no information
+             fig_exec_nama.update_layout(
+                 yaxis={'categoryorder': 'array', 'categoryarray': avg_leadtime_nama_sorted['nama'].tolist()[::-1]},
+                 height=500,
+                 showlegend=False,
+             )
+             fig_exec_nama.update_traces(textposition='auto')  # Let Plotly place the labels
+             st.plotly_chart(fig_exec_nama, use_container_width=True)
+
+             # AI insight for 3b: the largest lead time is the slowest division, the smallest the fastest
+             top_nama = avg_leadtime_nama_sorted.loc[avg_leadtime_nama_sorted['avg_monthly_leadtime'].idxmax()]
+             low_nama = avg_leadtime_nama_sorted.loc[avg_leadtime_nama_sorted['avg_monthly_leadtime'].idxmin()]
+             st.markdown("### Insight")
+             insight_text = (
+                 f"<div class='ai-insight'>"
+                 f"The division <strong>{top_nama['nama']}</strong> has the highest average lead time "
+                 f"(<strong>{top_nama['avg_monthly_leadtime']:.2f} days</strong>), indicating slower resolution. "
+                 f"<strong>{low_nama['nama']}</strong> has the fastest average resolution "
+                 f"(<strong>{low_nama['avg_monthly_leadtime']:.2f} days</strong>). "
+                 f"Focus on improving SLA compliance in divisions with longer lead times."
+                 f"</div>"
+             )
+             st.markdown(insight_text, unsafe_allow_html=True)
+         else:
+             st.warning("No data, or all lead times are NaN, for the executor analysis by division.")
+     else:
+         st.warning("Columns 'nama' or 'days_to_close' not available for executor analysis (3b).")
+
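The 3a chart uses a two-stage average. For each month, findings are divided by the number of distinct reporters, and those monthly ratios are then averaged per division. Every month therefore weighs the same regardless of its volume. A minimal pandas sketch of that calculation follows. The function name `avg_monthly_findings_per_person` is hypothetical. The sketch assumes the same `created_month`, `nama` and `creator_nid` columns the dashboard uses.

```python
import pandas as pd


def avg_monthly_findings_per_person(df: pd.DataFrame, group_col: str = "nama") -> pd.Series:
    """Hypothetical helper: mean over months of (findings / unique reporters) per group."""
    keys = ["created_month", group_col]
    # Rows per month and group (every row is one finding)
    findings = df.groupby(keys).size().rename("findings_count")
    # Distinct reporters per month and group
    reporters = df.groupby(keys)["creator_nid"].nunique().rename("unique_creators")
    monthly = pd.concat([findings, reporters], axis=1)
    # Months without an identifiable reporter cannot produce a ratio
    monthly = monthly[monthly["unique_creators"] > 0]
    ratio = monthly["findings_count"] / monthly["unique_creators"]
    # Average the monthly ratios so each month counts equally, highest first
    return ratio.groupby(level=group_col).mean().sort_values(ascending=False)


sample = pd.DataFrame({
    "created_month": ["2024-01", "2024-01", "2024-01", "2024-02", "2024-02"],
    "nama": ["A", "A", "B", "A", "B"],
    "creator_nid": [1, 1, 2, 1, 3],
})
print(avg_monthly_findings_per_person(sample))
```

In the sample, division A has 3 findings from a single reporter over two months. Pooling the whole period would give 3.0. The month-by-month average gives (2.0 + 1.0) / 2 = 1.5, which is the kind of figure the chart reports.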
+ # 3c & 3d: reporter frequency and executor lead time per person ('creator_name' and 'nama_pic'), as average monthly rates
+ col_3c, col_3d = st.columns(2)
+
+ with col_3c:
+     st.markdown("<h5>3c. Average Finding Rate per Reporter (Name)</h5>", unsafe_allow_html=True)
+     if 'creator_name' in df_local.columns:
+         # Count findings per month per reporter
+         findings_by_creator_month = df_local.groupby(['created_month', 'creator_name']).size().reset_index(name='findings_count')
+         # Number of active months (months with at least one finding) per reporter
+         active_months_by_creator = findings_by_creator_month.groupby('creator_name')['created_month'].nunique().reset_index(name='active_months')
+         # Total findings per reporter
+         total_findings_by_creator = findings_by_creator_month.groupby('creator_name')['findings_count'].sum().reset_index()
+         # Combine totals and active months
+         merged_rep_creator = total_findings_by_creator.merge(active_months_by_creator, on='creator_name', how='outer')
+         # Fill NaN with 0
+         merged_rep_creator = merged_rep_creator.fillna({'findings_count': 0, 'active_months': 0})
+         # Drop reporters with no active months to avoid division by zero
+         merged_rep_creator = merged_rep_creator[merged_rep_creator['active_months'] > 0].copy()
+         # Average monthly rate = total findings / active months; any infinities become NaN
+         merged_rep_creator['avg_monthly_rate'] = merged_rep_creator['findings_count'] / merged_rep_creator['active_months']
+         merged_rep_creator['avg_monthly_rate'] = merged_rep_creator['avg_monthly_rate'].replace([np.inf, -np.inf], np.nan)
+
+         # Drop reporters whose rate is NaN
+         avg_rate_per_creator = merged_rep_creator.dropna(subset=['avg_monthly_rate'])
+         if not avg_rate_per_creator.empty:
+             # Sort ascending so the five highest rates are the last five rows
+             avg_rate_per_creator_sorted = avg_rate_per_creator.sort_values('avg_monthly_rate', ascending=True)
+             top_5_indices = avg_rate_per_creator_sorted.tail(5).index
+             # Color column: default Plotly blue, green for the top 5
+             avg_rate_per_creator_sorted['color'] = '#1f77b4'
+             avg_rate_per_creator_sorted.loc[avg_rate_per_creator_sorted.index.isin(top_5_indices), 'color'] = '#4CAF50'
+
+             # Sort option
+             sort_option_3c = st.selectbox("Sort 3c by:", ["Lowest First", "Highest First"], key='sort_3c')
+             if sort_option_3c == "Highest First":
+                 avg_rate_per_creator_sorted = avg_rate_per_creator_sorted.sort_values('avg_monthly_rate', ascending=False)
+             # "Lowest First" keeps the ascending order from above
+
+             # Cap the chart at 1,000 reporters (the last 1,000 rows in the current sort order)
+             chart_creators = avg_rate_per_creator_sorted.tail(1000)
+             fig_rep_creator = px.bar(
+                 chart_creators,
+                 x='avg_monthly_rate',
+                 y='creator_name',
+                 orientation='h',
+                 title='Avg Monthly Findings by Creator Name',
+                 labels={'avg_monthly_rate': 'Avg Monthly Finding Rate', 'creator_name': 'Creator Name'},
+                 color='color',  # Use the color column added above
+                 color_discrete_map={c: c for c in chart_creators['color'].unique()},  # Map each hex value to itself
+                 text=chart_creators['avg_monthly_rate'].apply(lambda x: f'{x:.2f}')  # Two decimal places
+             )
+             # Plotly draws the first category at the bottom, so reverse the rows to list them top-down in the
+             # selected sort order; the color legend is hidden because it carries no information
+             fig_rep_creator.update_layout(
+                 yaxis={'categoryorder': 'array', 'categoryarray': chart_creators['creator_name'].tolist()[::-1]},
+                 height=500,
+                 showlegend=False,
+             )
+             fig_rep_creator.update_traces(textposition='auto')  # Let Plotly place the labels
+             st.plotly_chart(fig_rep_creator, use_container_width=True)
+
+             # AI insight for 3c: idxmax/idxmin keep "highest" and "lowest" correct whatever the sort option
+             top_creator = avg_rate_per_creator_sorted.loc[avg_rate_per_creator_sorted['avg_monthly_rate'].idxmax()]
+             low_creator = avg_rate_per_creator_sorted.loc[avg_rate_per_creator_sorted['avg_monthly_rate'].idxmin()]
+             st.markdown("### Insight")
+             insight_text = (
+                 f"<div class='ai-insight'>"
+                 f"The reporter <strong>{top_creator['creator_name']}</strong> has the highest average monthly finding rate "
+                 f"(<strong>{top_creator['avg_monthly_rate']:.2f}</strong>), indicating active engagement. "
+                 f"<strong>{low_creator['creator_name']}</strong> has the lowest rate "
+                 f"(<strong>{low_creator['avg_monthly_rate']:.2f}</strong>), which might indicate lower activity or under-reporting. "
+                 f"Recognize high performers and follow up with low performers."
+                 f"</div>"
+             )
+             st.markdown(insight_text, unsafe_allow_html=True)
+         else:
+             st.warning("No data, or all rates are NaN, for the reporter analysis by creator_name.")
+     else:
+         st.warning("Column 'creator_name' not available for reporter analysis (3c).")
+
+ with col_3d:
+     st.markdown("<h5>3d. Average Lead Time by Executor (Name)</h5>", unsafe_allow_html=True)
+     if 'nama_pic' in df_local.columns and 'days_to_close' in df_local.columns:
+         # Average lead time per executor per month; months whose findings are all still open
+         # (days_to_close is NaN) are dropped so they do not count as active months with 0 days
+         leadtime_by_executor_month = (
+             df_local.groupby(['created_month', 'nama_pic'])['days_to_close'].mean()
+             .reset_index(name='avg_leadtime')
+             .dropna(subset=['avg_leadtime'])
+         )
+         # Number of active months per executor
+         active_months_by_executor = leadtime_by_executor_month.groupby('nama_pic')['created_month'].nunique().reset_index(name='active_months')
+         # Sum of the monthly averages per executor
+         total_leadtime_by_executor = leadtime_by_executor_month.groupby('nama_pic')['avg_leadtime'].sum().reset_index()
+         # Combine sums and active months
+         merged_exec_pic = total_leadtime_by_executor.merge(active_months_by_executor, on='nama_pic', how='outer')
+         # Fill NaN with 0
+         merged_exec_pic = merged_exec_pic.fillna({'avg_leadtime': 0, 'active_months': 0})
+         # Drop executors with no active months to avoid division by zero
+         merged_exec_pic = merged_exec_pic[merged_exec_pic['active_months'] > 0].copy()
+         # Average monthly lead time = sum of monthly averages / active months; any infinities become NaN
+         merged_exec_pic['avg_monthly_leadtime'] = merged_exec_pic['avg_leadtime'] / merged_exec_pic['active_months']
+         merged_exec_pic['avg_monthly_leadtime'] = merged_exec_pic['avg_monthly_leadtime'].replace([np.inf, -np.inf], np.nan)
+
+         # Drop executors whose average is NaN
+         avg_leadtime_per_executor = merged_exec_pic.dropna(subset=['avg_monthly_leadtime'])
+         if not avg_leadtime_per_executor.empty:
+             # Sort ascending so the five slowest executors are the last five rows
+             avg_leadtime_per_executor_sorted = avg_leadtime_per_executor.sort_values('avg_monthly_leadtime', ascending=True)
+             top_5_indices = avg_leadtime_per_executor_sorted.tail(5).index
+             # Color column: default Plotly blue, red for the five slowest
+             avg_leadtime_per_executor_sorted['color'] = '#1f77b4'
+             avg_leadtime_per_executor_sorted.loc[avg_leadtime_per_executor_sorted.index.isin(top_5_indices), 'color'] = '#D32F2F'
+
+             # Sort option
+             sort_option_3d = st.selectbox("Sort 3d by:", ["Fastest First", "Slowest First"], key='sort_3d')
+             if sort_option_3d == "Slowest First":
+                 avg_leadtime_per_executor_sorted = avg_leadtime_per_executor_sorted.sort_values('avg_monthly_leadtime', ascending=False)
+             # "Fastest First" keeps the ascending order from above
+
+             # Cap the chart at the 1,000 slowest executors while keeping the selected sort order
+             slowest_idx = avg_leadtime_per_executor_sorted.nlargest(1000, 'avg_monthly_leadtime').index
+             chart_executors = avg_leadtime_per_executor_sorted[avg_leadtime_per_executor_sorted.index.isin(slowest_idx)]
+             fig_exec_pic = px.bar(
+                 chart_executors,
+                 x='avg_monthly_leadtime',
+                 y='nama_pic',
+                 orientation='h',
+                 title='Avg Monthly Lead Time by Executor (Name)',
+                 labels={'avg_monthly_leadtime': 'Avg Monthly Lead Time (Days)', 'nama_pic': 'Executor Name'},
+                 color='color',  # Use the color column added above
+                 color_discrete_map={c: c for c in chart_executors['color'].unique()},  # Map each hex value to itself
+                 text=chart_executors['avg_monthly_leadtime'].apply(lambda x: f'{x:.2f}')  # Two decimal places
+             )
+             # Plotly draws the first category at the bottom, so reverse the rows to list them top-down in the
+             # selected sort order; the color legend is hidden because it carries no information
+             fig_exec_pic.update_layout(
+                 yaxis={'categoryorder': 'array', 'categoryarray': chart_executors['nama_pic'].tolist()[::-1]},
+                 height=500,
+                 showlegend=False,
+             )
+             fig_exec_pic.update_traces(textposition='auto')  # Let Plotly place the labels
+             st.plotly_chart(fig_exec_pic, use_container_width=True)
+
+             # AI insight for 3d: the largest lead time is the slowest executor, the smallest the fastest
+             top_executor = avg_leadtime_per_executor_sorted.loc[avg_leadtime_per_executor_sorted['avg_monthly_leadtime'].idxmax()]
+             low_executor = avg_leadtime_per_executor_sorted.loc[avg_leadtime_per_executor_sorted['avg_monthly_leadtime'].idxmin()]
+             st.markdown("### Insight")
+             insight_text = (
+                 f"<div class='ai-insight'>"
+                 f"The executor <strong>{top_executor['nama_pic']}</strong> has the highest average monthly lead time "
+                 f"(<strong>{top_executor['avg_monthly_leadtime']:.2f} days</strong>), indicating slower resolution. "
+                 f"<strong>{low_executor['nama_pic']}</strong> resolves tasks fastest on average "
+                 f"(<strong>{low_executor['avg_monthly_leadtime']:.2f} days</strong>). "
+                 f"Focus on improving SLA compliance for executors with longer lead times."
+                 f"</div>"
+             )
+             st.markdown(insight_text, unsafe_allow_html=True)
+         else:
+             st.warning("No data, or all lead times are NaN, for the executor analysis by nama_pic.")
+     else:
+         st.warning("Columns 'nama_pic' or 'days_to_close' not available for executor analysis (3d).")
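The per-executor lead time in 3d is also an average of monthly averages. How open findings are handled decides the result. The sketch below assumes `days_to_close` is NaN for findings that are still open, as the `closed_df` filter in the insight engine suggests. It drops months in which nothing has been closed yet before averaging. The name `avg_monthly_leadtime` is hypothetical.

```python
import pandas as pd


def avg_monthly_leadtime(df: pd.DataFrame, who: str = "nama_pic") -> pd.Series:
    """Hypothetical helper: mean of each person's monthly average days_to_close."""
    monthly = (
        df.groupby(["created_month", who])["days_to_close"]
        .mean()     # NaN when every finding in that month is still open
        .dropna()   # such months neither add 0 days nor count as active
    )
    return monthly.groupby(level=who).mean()


sample = pd.DataFrame({
    "created_month": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "nama_pic": ["X", "X", "X", "Y"],
    "days_to_close": [2.0, 4.0, float("nan"), 5.0],
})
print(avg_monthly_leadtime(sample))
```

Executor X averages 3.0 days in January and has only open work in February. If that all-open February were counted as an active month, X's figure would halve to 1.5 days.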
+ #### OBJECTIVE 4
+ try:
+     from wordcloud import WordCloud
+     import matplotlib.pyplot as plt
+     WORDCLOUD_AVAILABLE = True
+ except ImportError:
+     WORDCLOUD_AVAILABLE = False
+     st.warning("⚠️ Library `wordcloud` or `matplotlib` not found. Install them with `pip install wordcloud matplotlib` to enable the word clouds.")
+
+ if WORDCLOUD_AVAILABLE:
+     st.markdown("<h3 class='section-title'>4. Global Text Insights (Word Clouds)</h3>", unsafe_allow_html=True)
+
+     col_wc1, col_wc2, col_wc3 = st.columns(3)
+
+     # Build and render a word cloud for one text column
+     def generate_wordcloud(text_data, title, col):
+         # Stop if the Series is None or empty
+         if text_data is None or text_data.empty:
+             col.warning(f"No data available in series for {title}.")
+             return
+         # Stop if every value is NaN
+         if text_data.isna().all():
+             col.warning(f"All data is NaN for {title}.")
+             return
+         # Join all text into a single string
+         text = ' '.join(text_data.dropna().astype(str))
+         # Replace every non-letter character with a space (optional cleaning step)
+         import re
+         text = re.sub(r'[^a-zA-Z\s]', ' ', text)
+         if text.strip():  # Make sure some text remains after cleaning
+             # Build the word cloud
+             wordcloud = WordCloud(
+                 width=400,
+                 height=300,
+                 background_color='white',
+                 colormap='viridis',
+                 max_words=100,
+                 relative_scaling=0.5,
+                 random_state=42
+             ).generate(text)
+
+             # Plot with matplotlib
+             fig, ax = plt.subplots(figsize=(8, 6))
+             ax.imshow(wordcloud, interpolation='bilinear')
+             ax.axis('off')
+             ax.set_title(title, fontsize=16)
+             fig.tight_layout()
+
+             # Render in Streamlit, then close the figure so figures do not pile up across reruns
+             col.pyplot(fig, use_container_width=True)
+             plt.close(fig)
+         else:
+             col.warning(f"No valid text data for {title} after cleaning.")
+
+     # 'judul' (title) column
+     with col_wc1:
+         if 'judul' in df_local.columns:
+             generate_wordcloud(df_local['judul'], "Word Cloud: Title (judul)", col_wc1)
+         else:
+             col_wc1.warning("Column 'judul' not available.")
+
+     # 'kondisi' (condition) column
+     with col_wc2:
+         if 'kondisi' in df_local.columns:
+             generate_wordcloud(df_local['kondisi'], "Word Cloud: Condition (kondisi)", col_wc2)
+         else:
+             col_wc2.warning("Column 'kondisi' not available.")
+
+     # 'rekomendasi' (recommendation) column
+     with col_wc3:
+         if 'rekomendasi' in df_local.columns:
+             generate_wordcloud(df_local['rekomendasi'], "Word Cloud: Recommendation (rekomendasi)", col_wc3)
+         else:
+             col_wc3.warning("Column 'rekomendasi' not available.")
+ else:
+     st.markdown("<h3 class='section-title'>4. Global Text Insights (Word Clouds)</h3>", unsafe_allow_html=True)
+     st.info("WordCloud library not installed. Install `wordcloud` and `matplotlib` to enable this feature.")
+
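Before rendering, it can help to check which words will dominate a cloud. The hypothetical helper `top_words` below approximates the cleaning step in `generate_wordcloud`, which replaces non-letters with spaces. It then counts lower-cased words with `collections.Counter`. It is only a rough preview, because `WordCloud` runs its own tokenization and, by default, an English stopword list. Indonesian function words such as "yang" or "dan" may therefore need to be passed explicitly through its `stopwords` parameter. The helper expects text that has already had missing values removed, for example `series.dropna()`.

```python
import re
from collections import Counter


def top_words(texts, n=5, min_len=3, stopwords=frozenset()):
    """Hypothetical preview of the most frequent words in a text column."""
    joined = " ".join(str(t) for t in texts if t is not None)
    # Same cleaning as generate_wordcloud, then lower-case
    cleaned = re.sub(r"[^a-zA-Z\s]", " ", joined).lower()
    # Keep words of at least min_len letters that are not stopwords
    words = [w for w in cleaned.split() if len(w) >= min_len and w not in stopwords]
    return Counter(words).most_common(n)


print(top_words(["APD tidak lengkap", "APD rusak", "Jalan licin, APD"], n=2))
```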
+ # =================== 5. Matrix (Kept As Is) ===================
+ st.markdown("<h3 class='section-title'>OBJECTIVE 5 - Findings vs Lead Time: Which Companies Move Slow?</h3>", unsafe_allow_html=True)
+
+ import math
+ import plotly.express as px
+ import pandas as pd
+ try:
+     df_local_matrix = df.copy()
+     # ============================
+     # 0. Filter: ONLY 1 COMPANY & 1 PROFILE (if applicable)
+     # ============================
+     # (Skipped for general dashboard view)
+     # ============================
+     # 1. Exclude Positive findings
+     # ============================
+     if 'temuan_kategori' in df_local_matrix.columns:
+         df_local_matrix = df_local_matrix[df_local_matrix["temuan_kategori"] != "Positive"].copy()
+     # ============================
+     # 2. Ensure datetime columns
+     # ============================
+     df_local_matrix['created_at'] = pd.to_datetime(df_local_matrix['created_at'], errors='coerce')
+     df_local_matrix['close_at'] = pd.to_datetime(df_local_matrix['close_at'], errors='coerce')
+     # ============================
+     # 3. Compute LEAD TIME
+     # ============================
+     df_local_matrix['lead_time_days'] = (df_local_matrix['close_at'] - df_local_matrix['created_at']).dt.days
+     # Findings without a close date (still open) are counted as 0 days
+     df_local_matrix['lead_time_days'] = df_local_matrix['lead_time_days'].fillna(0)
+     # ============================
+     # 4. Average Monthly Finding Count per Operator
+     # ============================
+     if 'nama' not in df_local_matrix.columns:
+         st.error("❌ Column 'nama' (operator) not found.")
+         # st.stop() is left out so the rest of the script keeps running
+     else:
+         # Month column (YYYY-MM)
+         df_local_matrix = df_local_matrix.assign(month=df_local_matrix['created_at'].dt.to_period('M').astype(str))
+         # Unique findings per operator per month
+         monthly_counts = (
+             df_local_matrix
+             .groupby(['nama', 'month'])['kode_temuan']
+             .nunique()
+             .reset_index(name='monthly_count')
+         )
+         # Average monthly count per operator
+         operator_avg = (
+             monthly_counts
+             .groupby('nama')['monthly_count']
+             .mean()  # <-- MONTHLY AVERAGE (not the total!)
+             .reset_index(name='Finding Count')
+         )
+         # ============================
+         # 5. Average Lead Time per Operator
+         # ============================
+         operator_lead = (
+             df_local_matrix.groupby('nama')['lead_time_days']
+             .mean()
+             .reset_index(name='Average Lead Time')
+         )
+         # ============================
+         # 6. Merge Risk Matrix
+         # ============================
+         risk_matrix = operator_avg.merge(operator_lead, on='nama', how='left')
+         risk_matrix = risk_matrix.rename(columns={'nama': 'Operator Name'})
+         # Operators without a lead time (e.g., nothing closed yet) get 0
+         risk_matrix['Average Lead Time'] = risk_matrix['Average Lead Time'].fillna(0)
+         # ============================
+         # 7. Quadrant Logic (unchanged)
+         # ============================
+         X_LIMIT = 20
+         Y_LIMIT = 3
+         def assign_quadrant(row):
+             if row['Finding Count'] >= X_LIMIT and row['Average Lead Time'] >= Y_LIMIT:
+                 return "Quadrant I – High Leadtime & High Count"
+             elif row['Finding Count'] < X_LIMIT and row['Average Lead Time'] >= Y_LIMIT:
+                 return "Quadrant II – High Leadtime but Low Count"
+             elif row['Finding Count'] >= X_LIMIT and row['Average Lead Time'] < Y_LIMIT:
+                 return "Quadrant III – Low Leadtime but High Count"
+             else:
+                 return "Quadrant IV – Low Leadtime & Low Count"
+         risk_matrix['quadrant'] = risk_matrix.apply(assign_quadrant, axis=1)
+         quadrant_count = risk_matrix['quadrant'].value_counts()
+         # ============================
+         # 8. Scatter Plot (visual format kept exactly the same)
+         # ============================
+         # Axis extents never fall below the thresholds, so all four quadrant bands stay visible
+         max_x = max(risk_matrix['Finding Count'].max(), X_LIMIT) + 1
+         max_y = max(risk_matrix['Average Lead Time'].max(), Y_LIMIT) + 5
+         fig = px.scatter(
+             risk_matrix,
+             x='Finding Count',
+             y='Average Lead Time',
+             hover_name="Operator Name",
+             size=[12] * len(risk_matrix),
+             size_max=15,
+             title="Audit Findings Risk Matrix: Avg Monthly Count vs Lead Time"
+         )
+         # Background quadrants (same as original)
+         fig.add_shape(type="rect", x0=X_LIMIT, x1=max_x, y0=Y_LIMIT, y1=max_y,
+                       fillcolor="rgba(255,0,0,0.25)", line_width=0)  # Q1
+         fig.add_shape(type="rect", x0=0, x1=X_LIMIT, y0=Y_LIMIT, y1=max_y,
+                       fillcolor="rgba(255,150,50,0.25)", line_width=0)  # Q2
+         fig.add_shape(type="rect", x0=X_LIMIT, x1=max_x, y0=0, y1=Y_LIMIT,
+                       fillcolor="rgba(255,200,200,0.25)", line_width=0)  # Q3
+         fig.add_shape(type="rect", x0=0, x1=X_LIMIT, y0=0, y1=Y_LIMIT,
+                       fillcolor="rgba(0,120,255,0.15)", line_width=0)  # Q4
+         fig.add_vline(x=X_LIMIT, line_dash="dash", line_color="black")
+         fig.add_hline(y=Y_LIMIT, line_dash="dash", line_color="black")
+         # Quadrant count annotations (same positions & style)
+         fig.add_annotation(x=X_LIMIT + (max_x - X_LIMIT)/2,
+                            y=Y_LIMIT + (max_y - Y_LIMIT)/2,
+                            text=f"<b>{quadrant_count.get('Quadrant I – High Leadtime & High Count',0)}</b>",
+                            showarrow=False, font=dict(size=22, color="darkred"))
+         fig.add_annotation(x=X_LIMIT/2,
+                            y=Y_LIMIT + (max_y - Y_LIMIT)/2,
+                            text=f"<b>{quadrant_count.get('Quadrant II – High Leadtime but Low Count',0)}</b>",
+                            showarrow=False, font=dict(size=22, color="orange"))
+         fig.add_annotation(x=X_LIMIT + (max_x - X_LIMIT)/2,
+                            y=Y_LIMIT/2,
+                            text=f"<b>{quadrant_count.get('Quadrant III – Low Leadtime but High Count',0)}</b>",
+                            showarrow=False, font=dict(size=22, color="red"))
+         fig.add_annotation(x=X_LIMIT/2,
+                            y=Y_LIMIT/2,
+                            text=f"<b>{quadrant_count.get('Quadrant IV – Low Leadtime & Low Count',0)}</b>",
+                            showarrow=False, font=dict(size=22, color="green"))
+         st.plotly_chart(fig, use_container_width=True)
+         # ============================
+         # 9. Summary Table
+         # ============================
+         st.subheader("Summary (Avg Monthly Count vs Avg Lead Time)")
+         st.dataframe(
+             risk_matrix.sort_values("Finding Count", ascending=False),
+             use_container_width=True
+         )
+ except Exception as e:
+     st.error(f"⚠️ Risk Matrix error: {e}")
+     # st.exception(e)  # Uncomment for debugging
+
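Step 7 uses a row-wise `assign_quadrant` function with `DataFrame.apply`. The same logic can also be written as vectorized boolean masks with `numpy.select`, which avoids one Python call per operator. Below is a sketch that uses the same thresholds (`X_LIMIT = 20`, `Y_LIMIT = 3`) and quadrant labels. `assign_quadrants` is a hypothetical name, not part of the app.

```python
import numpy as np
import pandas as pd

X_LIMIT = 20  # avg monthly finding count threshold (same value as step 7)
Y_LIMIT = 3   # avg lead time threshold in days (same value as step 7)


def assign_quadrants(risk_matrix: pd.DataFrame) -> pd.Series:
    """Hypothetical vectorized version of the step 7 quadrant logic."""
    high_count = risk_matrix["Finding Count"] >= X_LIMIT
    high_lead = risk_matrix["Average Lead Time"] >= Y_LIMIT
    labels = np.select(
        [high_count & high_lead, ~high_count & high_lead, high_count & ~high_lead],
        [
            "Quadrant I – High Leadtime & High Count",
            "Quadrant II – High Leadtime but Low Count",
            "Quadrant III – Low Leadtime but High Count",
        ],
        default="Quadrant IV – Low Leadtime & Low Count",  # low count and low lead time
    )
    return pd.Series(labels, index=risk_matrix.index, name="quadrant")


sample = pd.DataFrame({"Finding Count": [25, 5, 30, 2], "Average Lead Time": [4.0, 10.0, 1.0, 0.5]})
print(assign_quadrants(sample).tolist())
```

Assigning `risk_matrix['quadrant'] = assign_quadrants(risk_matrix)` and then calling `value_counts()` should yield the same counts the annotations read. Values exactly on a threshold fall into the "high" side, matching the `>=` comparisons in step 7.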
+ # =================== 6. ✅ AI INSIGHT ENGINE (BARU - BERDASARKAN DATA & RATIO) ===================
1047
+ st.markdown("## 6. Insight & Recommendation")
1048
+
1049
+ def compute_ai_insights(df: pd.DataFrame) -> List[dict]:
1050
+ """
1051
+ Generates insights and recommendations based on the current data and average monthly ratios.
1052
+ Returns a list of dictionaries, each containing an 'insight' and a 'recommendation'.
1053
+ """
1054
+ insight_recommendations = []
1055
+
1056
+ if df.empty:
1057
+ return insight_recommendations
1058
+
1059
+ total_findings = len(df)
1060
+ total_locations = df['nama_lokasi_full'].nunique() if 'nama_lokasi_full' in df.columns else 0
1061
+ total_companies = df['nama_perusahaan'].nunique() if 'nama_perusahaan' in df.columns else 0
1062
+ total_divisions = df['nama'].nunique() if 'nama' in df.columns else 0
1063
+
1064
+ # --- 1. Insight & Recommendation: Rata-rata Bulanan Ratio Temuan/Orang Perusahaan ---
1065
+ if 'nama_perusahaan' in df.columns and 'creator_nid' in df.columns:
1066
+ df_with_month = df.copy()
1067
+ df_with_month['created_month'] = df_with_month['created_at'].dt.to_period('M')
1068
+
1069
+ # Hitung temuan per bulan per perusahaan
1070
+ findings_by_company_month = df_with_month.groupby(['created_month', 'nama_perusahaan']).size().reset_index(name='findings_count')
1071
+ # Hitung jumlah orang unik per bulan per perusahaan
1072
+ creators_by_company_month = df_with_month.groupby(['created_month', 'nama_perusahaan'])['creator_nid'].nunique().reset_index(name='unique_creators')
1073
+ # Gabung
1074
+ merged_ratio = findings_by_company_month.merge(creators_by_company_month, on=['created_month', 'nama_perusahaan'], how='outer')
1075
+ # Filter untuk menghindari pembagian dengan nol
1076
+ merged_ratio = merged_ratio[merged_ratio['unique_creators'] > 0]
1077
+ # Hitung rasio (ignore NaN)
1078
+ merged_ratio['ratio'] = merged_ratio['findings_count'] / merged_ratio['unique_creators']
1079
+ merged_ratio['ratio'] = merged_ratio['ratio'].replace([np.inf, -np.inf], np.nan)
1080
+
1081
+ # Rata-rata bulanan per perusahaan
1082
+ avg_ratio_per_company = merged_ratio.groupby('nama_perusahaan')['ratio'].mean().reset_index(name='avg_monthly_ratio')
1083
+ # Filter hasil akhir untuk menghindari NaN
1084
+ avg_ratio_per_company = avg_ratio_per_company.dropna(subset=['avg_monthly_ratio'])
1085
+
1086
+ if not avg_ratio_per_company.empty:
1087
+ # Temukan perusahaan dengan rasio tertinggi dan terendah
1088
+ top_company_ratio = avg_ratio_per_company.loc[avg_ratio_per_company['avg_monthly_ratio'].idxmax()]
1089
+ low_company_ratio = avg_ratio_per_company.loc[avg_ratio_per_company['avg_monthly_ratio'].idxmin()]
1090
+
1091
+ insight_text = (
1092
+ f"Based on the average monthly finding-to-person ratio, "
1093
+ f"Company '{top_company_ratio['nama_perusahaan']}' has the highest activity level ({top_company_ratio['avg_monthly_ratio']:.2f} findings/person/month), "
1094
+ f"while '{low_company_ratio['nama_perusahaan']}' has the lowest ({low_company_ratio['avg_monthly_ratio']:.2f} findings/person/month)."
1095
+ )
1096
+ recommendation_text = (
1097
+ f"For '{top_company_ratio['nama_perusahaan']}': Investigate the underlying reasons for the high ratio. Is it due to active reporting, higher risk, or more personnel? "
1098
+ f"For '{low_company_ratio['nama_perusahaan']}': Verify if the low ratio reflects effective risk management or potential under-reporting."
1099
+ )
1100
+ insight_recommendations.append({"insight": insight_text, "recommendation": recommendation_text})
1101
+
1102
+ # --- 2. Insight & Recommendation: Distribusi Temuan (Umum) ---
1103
+ if 'temuan_kategori' in df.columns:
1104
+ cat_counts = df['temuan_kategori'].value_counts()
1105
+ top_cat = cat_counts.index[0] if not cat_counts.empty else "N/A"
1106
+ top_cat_count = cat_counts.iloc[0] if not cat_counts.empty else 0
1107
+ if top_cat != "N/A":
1108
+ perc = (top_cat_count / total_findings) * 100
1109
+ if top_cat == "Positive":
1110
+ insight_text = (
1111
+ f"The majority of findings ({top_cat_count} or {perc:.1f}%) are categorized as 'Positive'. "
1112
+ f"This indicates a strong culture of recognizing and reporting good practices and safety compliance."
1113
+ )
1114
+ recommendation_text = (
1115
+ f"Maintain and reinforce the positive reporting culture. "
1116
+ f"Consider using these 'Positive' examples as best practice case studies for training and awareness programs."
1117
+ )
1118
+ else:
1119
+ insight_text = (
1120
+ f"The most frequent finding category is '{top_cat}' ({top_cat_count} instances, {perc:.1f}% of total). "
1121
+ f"This highlights a specific area requiring focused attention."
1122
+ )
1123
+ recommendation_text = (
1124
+ f"Conduct a root-cause analysis for the '{top_cat}' category. "
1125
+ f"Develop targeted corrective actions and preventive measures to address the underlying issues."
1126
+ )
1127
+ insight_recommendations.append({"insight": insight_text, "recommendation": recommendation_text})
1128
+
1129
+ # --- 3. Insight & Recommendation: Aktivitas Lokasi (Umum) ---
1130
+ if 'nama_lokasi_full' in df.columns and total_locations > 0:
1131
+ loc_counts = df['nama_lokasi_full'].value_counts()
1132
+ top_loc = loc_counts.index[0] if not loc_counts.empty else "N/A"
1133
+ top_loc_count = loc_counts.iloc[0] if not loc_counts.empty else 0
1134
+ if top_loc != "N/A":
1135
+ insight_text = (
1136
+ f"Location '{top_loc}' has the highest number of findings ({top_loc_count}). "
1137
+ f"This could indicate higher activity, more scrutiny, or potentially higher risk in this area."
1138
+ )
1139
+ recommendation_text = (
1140
+ f"Perform a detailed review of activities in '{top_loc}'. "
1141
+ f"Determine if the high volume is due to increased activity or specific risk factors. "
1142
+ f"Ensure adequate resources and controls are in place."
1143
+ )
1144
+ insight_recommendations.append({"insight": insight_text, "recommendation": recommendation_text})
1145
+
1146
+ # --- 4. Insight & Recommendation: Kinerja Resolusi (Umum) ---
1147
+ if 'days_to_close' in df.columns:
1148
+ closed_df = df.dropna(subset=['days_to_close'])
1149
+ if not closed_df.empty:
1150
+ avg_close_time = closed_df['days_to_close'].mean()
1151
+ median_close_time = closed_df['days_to_close'].median()
1152
+ # Ambang batas SLA, misal 7 hari
1153
+ sla_threshold = 7
1154
+ slow_findings = closed_df[closed_df['days_to_close'] > sla_threshold]
1155
+ slow_count = len(slow_findings)
1156
+ slow_percentage = (slow_count / len(closed_df)) * 100 if len(closed_df) > 0 else 0
1157
+
1158
+ insight_text = (
1159
+ f"The average time to close findings is {avg_close_time:.1f} days (median: {median_close_time:.1f} days). "
1160
+ f"{slow_count} findings ({slow_percentage:.1f}%) exceeded the {sla_threshold}-day SLA."
1161
+ )
1162
+ if slow_percentage > 20:
1163
+ recommendation_text = (
1164
+ f"The resolution performance is below target. Investigate bottlenecks in the closure process. "
1165
+ f"Prioritize findings that are taking longer than {sla_threshold} days. Consider implementing an escalation matrix."
1166
+ )
1167
+ else:
1168
+ recommendation_text = (
1169
+ "The resolution performance is generally good, but there's room for improvement. "
1170
+ f"Focus on reducing the backlog of findings that exceed the {sla_threshold}-day SLA."
1171
+ )
1172
+ insight_recommendations.append({"insight": insight_text, "recommendation": recommendation_text})
1173
+
1174
+ # --- 5. Insight & Recommendation: Monthly Trend (General) ---
1175
+ if 'created_at' in df.columns:
1176
+ monthly_trend = df.set_index('created_at').resample('ME').size()
1177
+ if len(monthly_trend) >= 2:
1178
+ last_month_count = monthly_trend.iloc[-1]
1179
+ prev_month_count = monthly_trend.iloc[-2]
1180
+ if prev_month_count > 0:
1181
+ change_pct = (last_month_count - prev_month_count) / prev_month_count * 100
1182
+ trend_word = "increase" if change_pct > 0 else ("decrease" if change_pct < 0 else "change")
1183
+ insight_text = (
1184
+ f"There was a {abs(change_pct):.1f}% {trend_word} in finding volume between the last two months "
1185
+ f"({monthly_trend.index[-2].strftime('%b %Y')} and {monthly_trend.index[-1].strftime('%b %Y')})."
1186
+ )
1187
+ if abs(change_pct) > 20: # Large change (more than 20%)
1188
+ recommendation_text = (
1189
+ f"Investigate the cause of this significant {trend_word} in findings. "
1190
+ "Review operational changes, contractor activities, or audit focus shifts that occurred recently."
1191
+ )
1192
+ else:
1193
+ recommendation_text = (
1194
+ "Monitor the trend over the next few weeks to see if this change represents a new pattern or a temporary fluctuation."
1195
+ )
1196
+ insight_recommendations.append({"insight": insight_text, "recommendation": recommendation_text})
1197
+
1198
+ # --- 6. Insight & Recommendation: Reporter Activity (General) ---
1199
+ if 'creator_nid' in df.columns:
1200
+ active_reporters = df['creator_nid'].nunique()
1201
+ total_reports = len(df)
1202
+ avg_reports_per_person = total_reports / active_reporters if active_reporters > 0 else 0
1203
+ # Check whether a single reporter dominates
1204
+ top_reporter_counts = df['creator_nid'].value_counts()
1205
+ if not top_reporter_counts.empty:
1206
+ top_reporter_id = top_reporter_counts.index[0]
1207
+ top_reporter_count = top_reporter_counts.iloc[0]
1208
+ if top_reporter_count / total_reports > 0.15: # If one person submitted more than 15% of reports
1209
+ insight_text = (
1210
+ f"Reporter with ID '{top_reporter_id}' has submitted a disproportionately high number of findings ({top_reporter_count}). "
1211
+ f"They account for {top_reporter_count/total_reports*100:.1f}% of the total volume."
1212
+ )
1213
+ recommendation_text = (
1214
+ "Recognize the active reporter. Also, ensure reporting is distributed across the team "
1216
+ "to provide a more comprehensive view of risks across all areas and activities."
1216
+ )
1217
+ insight_recommendations.append({"insight": insight_text, "recommendation": recommendation_text})
1218
+
1219
+ return insight_recommendations
1220
+
1221
+ # Call the function to get insights and recommendations
1222
+ ai_insights_and_recs = compute_ai_insights(df_filtered)
1223
+
1224
+ # Display the results
1225
+
1226
+ if ai_insights_and_recs:
1227
+ for i, item in enumerate(ai_insights_and_recs):
1228
+ insight = item["insight"]
1229
+ recommendation = item["recommendation"]
1230
+ # Display the insight
1231
+ st.markdown(f'<div class="ai-insight"><strong>Insight {i+1}:</strong> {insight}</div>', unsafe_allow_html=True)
1232
+ # Display the recommendation
1233
+ st.markdown(f'<div class="ai-recommendation"><strong>Recommendation {i+1}:</strong> {recommendation}</div>', unsafe_allow_html=True)
1234
+ else:
1235
+ # If no insights were generated, the data is probably empty or required columns are missing
1236
+ st.markdown('<div class="ai-insight">No significant AI insights could be generated. This might be due to insufficient data or missing required columns after filtering.</div>', unsafe_allow_html=True)
1237
+
1238
+ # =================== FOOTER ===================
1239
+ st.markdown("---")
1240
+ st.markdown(
1241
+ """
1242
+ <div style="text-align:center; color:#757575; font-size:0.9em;">
1243
+ <strong> Special Design for PLN </strong> • © 2025 PT Bukit Technology
1244
+ </div>
1245
+ """,
1246
+ unsafe_allow_html=True
1247
+ )
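The rules in `compute_ai_insights` are simple enough to test outside Streamlit. The sketch below re-expresses two of them, the closure-time SLA summary (section 4) and the month-over-month change (section 5), in plain Python so they can be checked without a DataFrame. It assumes `days_to_close` values are numbers with `None` for open findings and that `created_at` values are `datetime.date` objects. The function names `resolution_summary` and `month_over_month` are hypothetical and do not exist in app.py.

```python
from collections import Counter
from datetime import date
from statistics import mean, median
from typing import Iterable, Optional

SLA_THRESHOLD_DAYS = 7  # example threshold, matching the 7-day value used in app.py


def resolution_summary(days_to_close: Iterable[Optional[float]],
                       sla_threshold: float = SLA_THRESHOLD_DAYS) -> Optional[dict]:
    """Average/median closure time and share of closed findings over the SLA."""
    # Drop open findings (None), similar to dropna(subset=['days_to_close']).
    closed = [d for d in days_to_close if d is not None]
    if not closed:
        return None
    slow_count = sum(1 for d in closed if d > sla_threshold)
    return {
        "avg": mean(closed),
        "median": median(closed),
        "slow_count": slow_count,
        "slow_pct": slow_count / len(closed) * 100,
    }


def month_over_month(created: list[date]) -> Optional[tuple[float, str]]:
    """Percent change in finding count between the last two calendar months."""
    if not created:
        return None
    counts = Counter((d.year, d.month) for d in created)
    # Enumerate every month from the first to the last so that months with no
    # findings count as 0, the way resample(...).size() fills empty bins.
    months = []
    year, month = min(counts)
    last = max(counts)
    while (year, month) <= last:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    if len(months) < 2:
        return None
    prev, curr = counts[months[-2]], counts[months[-1]]
    if prev == 0:
        return None  # percent change is undefined for an empty previous month
    change_pct = (curr - prev) / prev * 100
    # Label a zero change neutrally instead of calling it a decrease.
    if change_pct > 0:
        word = "increase"
    elif change_pct < 0:
        word = "decrease"
    else:
        word = "change"
    return change_pct, word
```

Keeping the rules as plain functions like these would let them be unit-tested separately from the Streamlit rendering; inside the app, the pandas versions remain the natural fit.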
btech.png ADDED
data.csv ADDED
requirements.txt CHANGED
@@ -1,3 +1,14 @@
1
  altair
2
- pandas
3
- streamlit
1
  altair
2
+ streamlit>=1.38.0
3
+ pandas>=2.2.2
4
+ numpy>=1.26.4
5
+ plotly>=5.24.1
6
+ plotly-express>=0.4.1
7
+ openpyxl>=3.1.5
8
+ python-dateutil>=2.9.0
9
+ # --- Added for WordCloud ---
10
+ wordcloud>=1.9.3
11
+ matplotlib>=3.8.0
12
+ # --- Added for predictive analysis (AI Insights) ---
13
+ statsmodels>=0.14.0
14
+ # -------------------------------