NinjainPJs committed on
Commit 40ac7c3 · verified · 1 Parent(s): ea5e15b

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +16 -5
  2. __pycache__/app.cpython-312.pyc +0 -0
  3. app.py +1051 -0
  4. requirements.txt +3 -0
README.md CHANGED
@@ -1,12 +1,23 @@
  ---
  title: EvalPulse
- emoji: 🏃
- colorFrom: gray
- colorTo: green
+ emoji: 📑
+ colorFrom: indigo
+ colorTo: purple
  sdk: gradio
- sdk_version: 6.9.0
+ sdk_version: 5.23.0
  app_file: app.py
  pinned: false
+ license: apache-2.0
+ short_description: LLM Evaluation & Drift Monitoring Dashboard
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # EvalPulse Dashboard
+
+ Open-source LLM evaluation and semantic drift monitoring platform.
+
+ This Space runs a demo dashboard with synthetic data showing EvalPulse's monitoring capabilities:
+ - Health Score tracking
+ - Hallucination detection
+ - Semantic drift monitoring
+ - RAG quality evaluation
+ - Response quality scoring
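
For readers of this diff: the Health Score listed above is computed in `app.py` (inside `generate_demo_records`) as a weighted average of hallucination, drift, optional RAG groundedness, and response quality. The sketch below restates that arithmetic as a standalone function. The function name `health_score` and its signature are hypothetical and do not appear in the commit; the weights (0.35, 0.25, 0.20, 0.15) and the quality formula are taken from the demo generator.

```python
from __future__ import annotations


def health_score(
    hallucination: float,
    drift: float,
    sentiment: float,
    toxicity: float,
    groundedness: float | None = None,
) -> int:
    """Weighted 0-100 health score, mirroring the demo generator in app.py.

    Hypothetical helper for illustration; all inputs are expected in [0, 1].
    """
    # (value, weight) pairs, where a higher value means a healthier response.
    parts = [(1 - hallucination, 0.35), (1 - drift, 0.25)]
    if groundedness is not None:
        # The RAG term only counts when a groundedness score exists.
        parts.append((groundedness, 0.20))
    quality = (1 - toxicity) * 0.5 + sentiment * 0.4 + 0.1
    parts.append((quality, 0.15))

    weighted = sum(value * weight for value, weight in parts)
    total_weight = sum(weight for _, weight in parts)
    # Normalise by the weights actually used, then clamp to 0..100.
    return max(0, min(100, int(weighted / total_weight * 100)))


if __name__ == "__main__":
    # Non-RAG call: the 0.20 groundedness weight is left out of the average.
    print(health_score(0.12, 0.05, 0.7, 0.02))
    # RAG call with a groundedness score.
    print(health_score(0.12, 0.05, 0.7, 0.02, groundedness=0.78))
```

Dividing by the sum of the weights actually used keeps non-RAG records on the same 0-100 scale as RAG records, rather than penalising them for a missing groundedness score.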
__pycache__/app.cpython-312.pyc ADDED
Binary file (42.2 kB).
 
app.py ADDED
@@ -0,0 +1,1051 @@
+ """EvalPulse Demo Dashboard - self-contained HuggingFace Spaces deployment.
+
+ Runs entirely on synthetic data. No external dependencies on evalpulse or
+ dashboard packages.
+ """
+
+ from __future__ import annotations
+
+ import random
+ from collections import defaultdict
+ from dataclasses import dataclass, field
+ from datetime import datetime, timedelta, timezone
+
+ import gradio as gr
+ import plotly.graph_objects as go
+
+ # ── Lightweight EvalRecord (replaces pydantic model) ─────────────────
+
+ UTC = timezone.utc
+
+
+ @dataclass
+ class EvalRecord:
+     """Minimal evaluation record for demo purposes."""
+
+     app_name: str = "default"
+     timestamp: datetime = field(default_factory=lambda: datetime.now(UTC))
+     query: str = ""
+     context: str | None = None
+     response: str = ""
+     model_name: str = "unknown"
+     latency_ms: int = 0
+     tags: list[str] = field(default_factory=list)
+
+     # Hallucination
+     hallucination_score: float = 0.0
+     hallucination_method: str = "none"
+     flagged_claims: list[str] = field(default_factory=list)
+
+     # Drift
+     embedding_vector: list[float] = field(default_factory=list)
+     drift_score: float | None = None
+
+     # RAG Quality
+     faithfulness_score: float | None = None
+     context_relevance: float | None = None
+     answer_relevancy: float | None = None
+     groundedness_score: float | None = None
+
+     # Response Quality
+     sentiment_score: float = 0.5
+     toxicity_score: float = 0.0
+     response_length: int = 0
+     language_detected: str = "en"
+     is_denial: bool = False
+
+     # Composite
+     health_score: int = 0
+
+
+ # ── Demo data generator ─────────────────────────────────────────────
+
+
+ def generate_demo_records(n: int = 200) -> list[EvalRecord]:
+     """Generate N synthetic EvalRecords with realistic distributions.
+
+     Simulates an LLM app with:
+     - Generally good performance (health 70-95)
+     - Occasional hallucination spikes
+     - Gradual drift over time
+     - Some toxic/denial responses
+     """
+     random.seed(42)
+     records: list[EvalRecord] = []
+     now = datetime.now(UTC)
+
+     queries = [
+         "What is machine learning?",
+         "Explain neural networks",
+         "How does RAG work?",
+         "What is Python used for?",
+         "Describe transformer architecture",
+         "What are embeddings?",
+         "How do LLMs handle context?",
+         "What is fine-tuning?",
+         "Explain attention mechanism",
+         "What is prompt engineering?",
+     ]
+
+     models = ["llama-3.1-70b", "gpt-4o-mini", "gemini-flash"]
+
+     for i in range(n):
+         ts = now - timedelta(hours=n - i)
+         query = random.choice(queries)
+         model = random.choice(models)
+
+         # Simulate drift: later responses drift slightly
+         drift_factor = i / n * 0.1
+
+         # Base scores
+         halluc = random.gauss(0.12, 0.08) + drift_factor * 0.5
+         halluc = max(0.0, min(1.0, halluc))
+
+         drift = random.gauss(0.05, 0.03) + drift_factor
+         drift = max(0.0, min(1.0, drift))
+
+         sentiment = random.gauss(0.7, 0.1)
+         sentiment = max(0.0, min(1.0, sentiment))
+
+         toxicity = abs(random.gauss(0.02, 0.02))
+         toxicity = max(0.0, min(1.0, toxicity))
+
+         is_denial = random.random() < 0.05
+         length = random.randint(20, 200)
+
+         # RAG scores (70% of calls are RAG)
+         is_rag = random.random() < 0.7
+         faith = None
+         ctx_rel = None
+         ans_rel = None
+         ground = None
+         context = None
+
+         if is_rag:
+             faith = random.gauss(0.75, 0.1)
+             faith = max(0.0, min(1.0, faith))
+             ctx_rel = random.gauss(0.8, 0.08)
+             ctx_rel = max(0.0, min(1.0, ctx_rel))
+             ans_rel = random.gauss(0.78, 0.09)
+             ans_rel = max(0.0, min(1.0, ans_rel))
+             ground = 0.4 * faith + 0.3 * ctx_rel + 0.3 * ans_rel
+             context = f"Context for: {query}"
+
+         # Compute health score as a weighted average of the available components.
+         # Use "is not None" throughout so a legitimate score of 0.0 is not dropped.
+         components = [(1 - halluc) * 0.35, (1 - drift) * 0.25]
+         if ground is not None:
+             components.append(ground * 0.20)
+         quality = (1 - toxicity) * 0.5 + sentiment * 0.4 + 0.1
+         components.append(quality * 0.15)
+         health = int(
+             sum(components)
+             / sum([0.35, 0.25] + ([0.20] if ground is not None else []) + [0.15])
+             * 100
+         )
+         health = max(0, min(100, health))
+
+         record = EvalRecord(
+             app_name="demo-app",
+             timestamp=ts,
+             query=query,
+             context=context,
+             response=f"Demo response for: {query}",
+             model_name=model,
+             latency_ms=random.randint(50, 500),
+             tags=["demo"],
+             hallucination_score=round(halluc, 4),
+             hallucination_method="embedding",
+             drift_score=round(drift, 4),
+             faithfulness_score=round(faith, 4) if faith is not None else None,
+             context_relevance=round(ctx_rel, 4) if ctx_rel is not None else None,
+             answer_relevancy=round(ans_rel, 4) if ans_rel is not None else None,
+             groundedness_score=round(ground, 4) if ground is not None else None,
+             sentiment_score=round(sentiment, 4),
+             toxicity_score=round(toxicity, 4),
+             response_length=length,
+             language_detected="en",
+             is_denial=is_denial,
+             health_score=health,
+         )
+         records.append(record)
+
+     return records
+
+
+ # ── Chart helpers (inlined from dashboard/charts.py) ─────────────────
+
+ _BG = "#0a0e1a"
+ _SURFACE = "#111827"
+ _BORDER = "#1e293b"
+ _TEXT = "#e2e8f0"
+ _TEXT_DIM = "#64748b"
+ _CYAN = "#06d6a0"
+ _AMBER = "#f59e0b"
+ _RED = "#ef4444"
+ _BLUE = "#3b82f6"
+ _PURPLE = "#a78bfa"
+ _PINK = "#f472b6"
+
+ _LAYOUT_BASE: dict = dict(
+     paper_bgcolor="rgba(0,0,0,0)",
+     plot_bgcolor="rgba(0,0,0,0)",
+     font=dict(family="JetBrains Mono, monospace", color=_TEXT, size=11),
+     margin=dict(l=48, r=24, t=48, b=40),
+     xaxis=dict(
+         gridcolor="rgba(255,255,255,0.04)",
+         zerolinecolor="rgba(255,255,255,0.06)",
+         tickfont=dict(size=10, color=_TEXT_DIM),
+     ),
+     yaxis=dict(
+         gridcolor="rgba(255,255,255,0.04)",
+         zerolinecolor="rgba(255,255,255,0.06)",
+         tickfont=dict(size=10, color=_TEXT_DIM),
+     ),
+     legend=dict(
+         font=dict(size=10, color=_TEXT_DIM),
+         bgcolor="rgba(0,0,0,0)",
+     ),
+ )
+
+
+ def _apply_layout(fig: go.Figure, height: int = 320, **kwargs) -> go.Figure:
+     layout = {**_LAYOUT_BASE, "height": height}
+     layout.update(kwargs)
+     fig.update_layout(**layout)
+     return fig
+
+
+ def empty_figure(title: str = "", message: str = "No data available") -> go.Figure:
+     """Create an empty figure with a message."""
+     fig = go.Figure()
+     _apply_layout(
+         fig,
+         height=260,
+         xaxis=dict(visible=False),
+         yaxis=dict(visible=False),
+         annotations=[
+             dict(
+                 text=f"<i>{message}</i>",
+                 xref="paper",
+                 yref="paper",
+                 x=0.5,
+                 y=0.5,
+                 showarrow=False,
+                 font=dict(size=13, color=_TEXT_DIM),
+             )
+         ],
+     )
+     return fig
+
+
+ def health_gauge_chart(score: int | None = None) -> go.Figure:
+     """Create a health score gauge chart (0-100)."""
+     if score is None:
+         return empty_figure("", "Awaiting first evaluation")
+
+     if score >= 75:
+         bar_color = _CYAN
+     elif score >= 40:
+         bar_color = _AMBER
+     else:
+         bar_color = _RED
+
+     fig = go.Figure(
+         go.Indicator(
+             mode="gauge+number",
+             value=score,
+             number=dict(
+                 font=dict(
+                     size=48, color=bar_color, family="JetBrains Mono, monospace"
+                 ),
+                 suffix="",
+             ),
+             gauge=dict(
+                 axis=dict(
+                     range=[0, 100],
+                     tickcolor=_TEXT_DIM,
+                     tickfont=dict(size=9, color=_TEXT_DIM),
+                     dtick=25,
+                 ),
+                 bgcolor="rgba(255,255,255,0.03)",
+                 bordercolor="rgba(255,255,255,0.08)",
+                 bar=dict(color=bar_color, thickness=0.75),
+                 steps=[
+                     dict(range=[0, 40], color="rgba(239,68,68,0.08)"),
+                     dict(range=[40, 75], color="rgba(245,158,11,0.06)"),
+                     dict(range=[75, 100], color="rgba(6,214,160,0.06)"),
+                 ],
+             ),
+         )
+     )
+     _apply_layout(fig, height=220, margin=dict(l=24, r=24, t=16, b=8))
+     return fig
+
+
+ def radar_chart(
+     categories: list[str],
+     values: list[float],
+     title: str = "",
+ ) -> go.Figure:
+     """Create a radar/spider chart for multi-dimensional scores."""
+     if not categories or not values:
+         return empty_figure(title, "No RAG data yet")
+
+     # Close the polygon
+     cats = categories + [categories[0]]
+     vals = values + [values[0]]
+
+     fig = go.Figure()
+     fig.add_trace(
+         go.Scatterpolar(
+             r=vals,
+             theta=cats,
+             fill="toself",
+             fillcolor=f"rgba({int(_CYAN[1:3], 16)},{int(_CYAN[3:5], 16)},{int(_CYAN[5:7], 16)},0.12)",
+             line=dict(color=_CYAN, width=2),
+             marker=dict(size=5, color=_CYAN),
+         )
+     )
+
+     _apply_layout(fig, height=340)
+     fig.update_layout(
+         polar=dict(
+             bgcolor="rgba(0,0,0,0)",
+             radialaxis=dict(
+                 visible=True,
+                 range=[0, 1],
+                 gridcolor="rgba(255,255,255,0.06)",
+                 tickfont=dict(size=8, color=_TEXT_DIM),
+             ),
+             angularaxis=dict(
+                 gridcolor="rgba(255,255,255,0.06)",
+                 tickfont=dict(size=10, color=_TEXT),
+             ),
+         ),
+         title=dict(
+             text=title, font=dict(size=12, color=_TEXT_DIM), x=0, xanchor="left"
+         ),
+     )
+     return fig
+
+
+ # ── Plotly dark theme for dashboard figures ──────────────────────────
+
+ _DARK_LAYOUT: dict = dict(
+     paper_bgcolor="rgba(0,0,0,0)",
+     plot_bgcolor="rgba(0,0,0,0)",
+     font=dict(family="JetBrains Mono, monospace", color="#94a3b8", size=11),
+     autosize=True,
+     margin=dict(l=50, r=20, t=44, b=40),
+     xaxis=dict(
+         gridcolor="rgba(255,255,255,0.04)",
+         tickfont=dict(size=10, color="#475569"),
+     ),
+     yaxis=dict(
+         gridcolor="rgba(255,255,255,0.04)",
+         tickfont=dict(size=10, color="#475569"),
+     ),
+     legend=dict(font=dict(size=10, color="#64748b"), bgcolor="rgba(0,0,0,0)"),
+ )
+
+
+ def _dark(fig: go.Figure, **kw) -> go.Figure:
+     """Apply dark theme to a Plotly figure."""
+     layout = {**_DARK_LAYOUT, **kw}
+     fig.update_layout(**layout)
+     return fig
+
+
+ # ── Data layer (demo-only) ───────────────────────────────────────────
+
+ _DEMO_RECORDS: list[EvalRecord] | None = None
+
+
+ def _fetch_records(limit: int = 500) -> list[EvalRecord]:
+     """Return cached demo records (generated once on first call)."""
+     global _DEMO_RECORDS
+     if _DEMO_RECORDS is None:
+         _DEMO_RECORDS = generate_demo_records(200)
+     return _DEMO_RECORDS[:limit]
+
+
+ def _fetch_alerts(limit: int = 20) -> list:
+     """No real alerts in demo mode."""
+     return []
+
+
+ # ── KPI card HTML helper ────────────────────────────────────────────
+
+
+ def _kpi_card(label: str, value: str, sub: str, color: str) -> str:
+     return f"""<div style="
+         background:linear-gradient(145deg,#111827,#0f172a);
+         border:1px solid #1e293b;
+         border-radius:14px;
+         padding:18px 20px;
+         border-top:2.5px solid {color};
+         min-height:90px;
+         min-width:0;
+         width:100%;
+         box-sizing:border-box;
+         overflow:hidden;
+     ">
+         <div style="
+             font-family:'JetBrains Mono',monospace;
+             font-size:0.62em;font-weight:600;
+             text-transform:uppercase;letter-spacing:1.5px;
+             color:#64748b;margin-bottom:8px;
+         ">{label}</div>
+         <div style="
+             font-family:'Outfit',sans-serif;
+             font-size:1.8em;font-weight:700;
+             color:{color};line-height:1;margin-bottom:5px;
+         ">{value}</div>
+         <div style="
+             font-family:'JetBrains Mono',monospace;
+             font-size:0.68em;color:#475569;
+         ">{sub}</div>
+     </div>"""
+
+
+ # ── Tab 1: Overview ─────────────────────────────────────────────────
+
+
+ def build_overview():
+     records = _fetch_records(500)
+     alerts = _fetch_alerts(20)
+
+     if not records:
+         return (
+             _kpi_card("Health Score", "---", "no data", "#06d6a0"),
+             _kpi_card("Hallucination", "---", "no data", "#f59e0b"),
+             _kpi_card("Drift", "---", "no data", "#3b82f6"),
+             _kpi_card("Evaluations", "0", "", "#a78bfa"),
+             health_gauge_chart(None),
+             empty_figure("", "No evaluations yet"),
+             [["No alerts yet", "", "", "", "", ""]],
+         )
+
+     avg_health = int(sum(r.health_score for r in records) / len(records))
+     avg_halluc = sum(r.hallucination_score for r in records) / len(records)
+     drift_vals = [r.drift_score for r in records if r.drift_score is not None]
+     avg_drift = sum(drift_vals) / len(drift_vals) if drift_vals else None
+
+     if avg_health >= 90:
+         h_sub = "HEALTHY"
+     elif avg_health >= 75:
+         h_sub = "MONITORING"
+     elif avg_health >= 60:
+         h_sub = "DEGRADING"
+     else:
+         h_sub = "CRITICAL"
+
+     d_val = f"{avg_drift:.3f}" if avg_drift is not None else "..."
+     d_sub = (
+         "STABLE"
+         if avg_drift is not None and avg_drift < 0.15
+         else "DRIFTING"
+         if avg_drift is not None
+         else "BUILDING BASELINE"
+     )
+
+     sorted_recs = sorted(records, key=lambda r: r.timestamp)
+     times = [r.timestamp.strftime("%m-%d %H:%M") for r in sorted_recs]
+     scores = [r.health_score for r in sorted_recs]
+
+     trend = go.Figure()
+     trend.add_trace(
+         go.Scatter(
+             x=times,
+             y=scores,
+             mode="lines",
+             name="Health Score",
+             line=dict(color="#06d6a0", width=2, shape="spline"),
+             fill="tozeroy",
+             fillcolor="rgba(6,214,160,0.08)",
+         )
+     )
+     trend.add_hline(y=75, line_dash="dot", line_color="#f59e0b", line_width=1)
+     trend.add_hline(y=40, line_dash="dot", line_color="#ef4444", line_width=1)
+     _dark(
+         trend,
+         title="Health Score Trend",
+         yaxis=dict(range=[0, 105], **_DARK_LAYOUT["yaxis"]),
+         height=350,
+     )
+
+     alert_rows = [["---", "", "", "", "", "No alerts triggered"]]
+     if alerts:
+         alert_rows = []
+         for a in alerts[:20]:
+             alert_rows.append(
+                 [
+                     a.timestamp.strftime("%Y-%m-%d %H:%M"),
+                     a.severity.upper(),
+                     a.metric,
+                     f"{a.value:.4f}",
+                     f"{a.threshold:.4f}",
+                     a.message,
+                 ]
+             )
+
+     return (
+         _kpi_card("Health Score", str(avg_health), h_sub, "#06d6a0"),
+         _kpi_card(
+             "Hallucination", f"{avg_halluc:.1%}", f"avg of {len(records)}", "#f59e0b"
+         ),
+         _kpi_card("Drift", d_val, d_sub, "#3b82f6"),
+         _kpi_card("Evaluations", f"{len(records):,}", "total tracked", "#a78bfa"),
+         health_gauge_chart(avg_health),
+         trend,
+         alert_rows,
+     )
+
+
+ # ── Tab 2: Hallucination ────────────────────────────────────────────
+
+
+ def build_hallucination():
+     records = _fetch_records(500)
+     if not records:
+         e = empty_figure("", "No data yet")
+         return e, e, e, [["No data", "", "", "", ""]]
+
+     sorted_recs = sorted(records, key=lambda r: r.timestamp)
+     times = [r.timestamp.strftime("%m-%d %H:%M") for r in sorted_recs]
+     h_scores = [r.hallucination_score for r in sorted_recs]
+
+     rate = go.Figure()
+     rate.add_trace(
+         go.Scatter(
+             x=times,
+             y=h_scores,
+             mode="lines",
+             line=dict(color="#ef4444", width=2, shape="spline"),
+             fill="tozeroy",
+             fillcolor="rgba(239,68,68,0.08)",
+         )
+     )
+     rate.add_hline(
+         y=0.3,
+         line_dash="dot",
+         line_color="#f59e0b",
+         annotation_text="Threshold 0.3",
+         annotation_font_size=9,
+         annotation_font_color="#f59e0b",
+     )
+     _dark(
+         rate,
+         title="Hallucination Score Over Time",
+         yaxis=dict(range=[0, 1.05], **_DARK_LAYOUT["yaxis"]),
+         height=350,
+     )
+
+     dist = go.Figure(
+         go.Histogram(
+             x=h_scores,
+             nbinsx=25,
+             marker_color="#ef4444",
+             opacity=0.7,
+             marker_line_width=0,
+         )
+     )
+     dist.add_vline(x=0.3, line_dash="dot", line_color="#f59e0b")
+     _dark(dist, title="Score Distribution", height=300, bargap=0.05)
+
+     ms: dict[str, list[float]] = defaultdict(list)
+     for r in records:
+         ms[r.model_name].append(r.hallucination_score)
+     model_names = list(ms.keys())
+     avgs = [sum(v) / len(v) for v in ms.values()]
+     model_fig = go.Figure(
+         go.Bar(
+             x=model_names,
+             y=avgs,
+             marker_color=["#ef4444" if a > 0.3 else "#06d6a0" for a in avgs],
+             marker_line_width=0,
+         )
+     )
+     _dark(model_fig, title="Avg Hallucination by Model", height=300)
+
+     top = sorted(records, key=lambda r: r.hallucination_score, reverse=True)[:10]
+     rows = [
+         [
+             r.timestamp.strftime("%H:%M:%S"),
+             r.query[:50],
+             r.response[:60],
+             f"{r.hallucination_score:.3f}",
+             ", ".join(r.flagged_claims[:2]) if r.flagged_claims else "",
+         ]
+         for r in top
+     ]
+
+     return rate, dist, model_fig, rows
+
+
+ # ── Tab 3: Drift ────────────────────────────────────────────────────
+
+
+ def build_drift():
+     records = _fetch_records(500)
+     if not records:
+         e = empty_figure("", "No data yet")
+         return e, e, "No data"
+
+     sorted_recs = sorted(records, key=lambda r: r.timestamp)
+     drift_recs = [r for r in sorted_recs if r.drift_score is not None]
+
+     emb_recs = [
+         r for r in sorted_recs if r.embedding_vector and len(r.embedding_vector) > 2
+     ]
+     if len(emb_recs) >= 3:
+         embed = go.Figure(
+             go.Scatter(
+                 x=[r.embedding_vector[0] for r in emb_recs],
+                 y=[r.embedding_vector[1] for r in emb_recs],
+                 mode="markers",
+                 marker=dict(
+                     size=8,
+                     color=[r.hallucination_score for r in emb_recs],
+                     colorscale=[
+                         [0, "#06d6a0"],
+                         [0.5, "#f59e0b"],
+                         [1, "#ef4444"],
+                     ],
+                     showscale=True,
+                     # Nested title.font replaces the deprecated colorbar "titlefont".
+                     colorbar=dict(
+                         title=dict(
+                             text="Halluc",
+                             font=dict(size=10, color="#64748b"),
+                         ),
+                         tickfont=dict(size=9, color="#64748b"),
+                     ),
+                     line=dict(width=0),
+                 ),
+                 text=[r.query[:30] for r in emb_recs],
+                 hovertemplate="%{text}<br>Halluc: %{marker.color:.3f}<extra></extra>",
+             )
+         )
+         _dark(embed, title="Response Embedding Space", height=350)
+     else:
+         embed = empty_figure("", "Need more data for visualization")
+
+     if not drift_recs:
+         return (
+             empty_figure("", "Building baseline (need 10+ evaluations)"),
+             embed,
+             "Building baseline...",
+         )
+
+     times = [r.timestamp.strftime("%m-%d %H:%M") for r in drift_recs]
+     scores = [r.drift_score for r in drift_recs]
+
+     dfig = go.Figure()
+     dfig.add_trace(
+         go.Scatter(
+             x=times,
+             y=scores,
+             mode="lines",
+             line=dict(color="#a78bfa", width=2, shape="spline"),
+             fill="tozeroy",
+             fillcolor="rgba(167,139,250,0.08)",
+         )
+     )
+     dfig.add_hline(
+         y=0.15,
+         line_dash="dot",
+         line_color="#ef4444",
+         annotation_text="Threshold 0.15",
+         annotation_font_size=9,
+         annotation_font_color="#ef4444",
+     )
+     y_max = max(max(scores) * 1.2, 0.3)
+     _dark(
+         dfig,
+         title="Drift Score Over Time",
+         yaxis=dict(range=[0, y_max], **_DARK_LAYOUT["yaxis"]),
+         height=350,
+     )
+
+     avg = sum(scores) / len(scores)
+     if avg < 0.1:
+         st = "Stable"
+     elif avg < 0.2:
+         st = "Minor drift"
+     else:
+         st = "Significant drift!"
+
+     return dfig, embed, st
+
+
+ # ── Tab 4: RAG & Quality ────────────────────────────────────────────
+
+
+ def build_rag_quality():
+     records = _fetch_records(500)
+     if not records:
+         e = empty_figure("", "No data yet")
+         return e, e, e, e
+
+     sorted_recs = sorted(records, key=lambda r: r.timestamp)
+     times = [r.timestamp.strftime("%m-%d %H:%M") for r in sorted_recs]
+
+     qfig = go.Figure()
+     qfig.add_trace(
+         go.Scatter(
+             x=times,
+             y=[r.sentiment_score for r in sorted_recs],
+             mode="lines",
+             name="Sentiment",
+             line=dict(color="#3b82f6", width=2, shape="spline"),
+         )
+     )
+     qfig.add_trace(
+         go.Scatter(
+             x=times,
+             y=[r.toxicity_score for r in sorted_recs],
+             mode="lines",
+             name="Toxicity",
+             line=dict(color="#ef4444", width=2, shape="spline"),
+         )
+     )
+     _dark(
+         qfig,
+         title="Quality Metrics Over Time",
+         yaxis=dict(range=[0, 1.05], **_DARK_LAYOUT["yaxis"]),
+         height=350,
+     )
+
+     rag_recs = [r for r in sorted_recs if r.groundedness_score is not None]
+     if rag_recs:
+         rt = [r.timestamp.strftime("%m-%d %H:%M") for r in rag_recs]
+         rfig = go.Figure()
+         rfig.add_trace(
+             go.Scatter(
+                 x=rt,
+                 y=[r.faithfulness_score or 0 for r in rag_recs],
+                 mode="lines",
+                 name="Faithfulness",
+                 line=dict(color="#06d6a0", width=2, shape="spline"),
+             )
+         )
+         rfig.add_trace(
+             go.Scatter(
+                 x=rt,
+                 y=[r.context_relevance or 0 for r in rag_recs],
+                 mode="lines",
+                 name="Context Relevance",
+                 line=dict(color="#3b82f6", width=2, shape="spline"),
+             )
+         )
+         rfig.add_trace(
+             go.Scatter(
+                 x=rt,
+                 y=[r.groundedness_score or 0 for r in rag_recs],
+                 mode="lines",
+                 name="Groundedness",
+                 line=dict(color="#a78bfa", width=2, dash="dash"),
+             )
+         )
+         _dark(
+             rfig,
+             title="RAG Quality Metrics",
+             yaxis=dict(range=[0, 1.05], **_DARK_LAYOUT["yaxis"]),
+             height=350,
+         )
+
+         af = sum(r.faithfulness_score or 0 for r in rag_recs) / len(rag_recs)
+         ac = sum(r.context_relevance or 0 for r in rag_recs) / len(rag_recs)
+         aa = sum(r.answer_relevancy or 0 for r in rag_recs) / len(rag_recs)
+         ag = sum(r.groundedness_score or 0 for r in rag_recs) / len(rag_recs)
+         radar = radar_chart(
+             ["Faithfulness", "Context Relevance", "Answer Relevancy", "Groundedness"],
+             [af, ac, aa, ag],
+             title="RAG Quality Radar",
+         )
+     else:
+         rfig = empty_figure("", "No RAG calls yet")
+         radar = empty_figure("", "No RAG data")
+
+     lang: dict[str, int] = defaultdict(int)
+     denials = 0
+     for r in records:
+         lang[r.language_detected] += 1
+         if r.is_denial:
+             denials += 1
+     bfig = go.Figure(
+         go.Bar(
+             x=list(lang.keys()),
+             y=list(lang.values()),
+             marker_color="#3b82f6",
+             marker_line_width=0,
+         )
+     )
+     _dark(
+         bfig,
+         title=f"Language Distribution | Denials: {denials}/{len(records)}",
+         height=300,
+     )
+
+     return qfig, rfig, radar, bfig
+
+
791
+ # ── CSS ─────────────────────────────────────────────────────────────
792
+
793
+ THEME_CSS = """
794
+ @import url('https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@300;400;500;600;700&family=Outfit:wght@300;400;500;600;700;800&display=swap');
795
+
796
+ body, .gradio-container {
797
+ background: #060a14 !important;
798
+ color: #e2e8f0 !important;
799
+ font-family: 'Outfit', sans-serif !important;
800
+ }
801
+ .gradio-container {
802
+ max-width: 100% !important;
803
+ width: 100% !important;
804
+ margin: 0 !important;
805
+ padding: 0 20px !important;
806
+ box-sizing: border-box !important;
807
+ overflow-x: hidden !important;
808
+ }
809
+ .main, .wrap, .contain {
810
+ max-width: 100% !important;
811
+ width: 100% !important;
812
+ overflow-x: hidden !important;
813
+ }
814
+ .app {
815
+ max-width: 100% !important;
816
+ overflow-x: hidden !important;
817
+ }
818
+ /* Plotly charts should not overflow */
819
+ .js-plotly-plot, .plotly, .plot-container, .svg-container {
820
+ max-width: 100% !important;
821
+ width: 100% !important;
822
+ overflow: hidden !important;
823
+ }
824
+ .js-plotly-plot .main-svg, .js-plotly-plot .svg-container {
825
+ max-width: 100% !important;
826
+ width: 100% !important;
827
+ }
828
+ .plot-container.plotly {
829
+ width: 100% !important;
830
+ }
831
+ /* Gradio plot wrapper */
832
+ .gr-plot, .plot-padding {
833
+ max-width: 100% !important;
834
+ overflow: hidden !important;
835
+ }
836
+
837
+ ::-webkit-scrollbar { width: 6px; }
838
+ ::-webkit-scrollbar-track { background: #0a0e1a; }
839
+ ::-webkit-scrollbar-thumb { background: #1e293b; border-radius: 3px; }
840
+
841
+ .ep-hdr {
842
+ position: relative;
843
+ padding: 24px 32px;
844
+ margin: 0 -20px 20px -20px;
845
+ background: linear-gradient(135deg, #0a0e1a 0%, #111827 50%, #0f172a 100%);
846
+ border-bottom: 1px solid rgba(6,214,160,0.15);
847
+ overflow: hidden;
848
+ box-sizing: border-box;
849
+ }
850
+ .ep-hdr::before {
851
+ content:'';position:absolute;inset:0;
852
+ background:
853
+ radial-gradient(ellipse 600px 300px at 15% 50%,rgba(6,214,160,0.06),transparent 70%),
854
+ radial-gradient(ellipse 400px 200px at 85% 30%,rgba(59,130,246,0.04),transparent 70%);
855
+ pointer-events:none;
856
+ }
857
+ .ep-hdr-in { position:relative;display:flex;align-items:center;justify-content:space-between;z-index:1; }
858
+ .ep-brand { display:flex;align-items:center;gap:14px; }
859
+ .ep-logo {
860
+ width:40px;height:40px;border-radius:10px;
861
+ background:linear-gradient(135deg,#06d6a0,#3b82f6);
862
+ display:flex;align-items:center;justify-content:center;
863
+ font-size:18px;font-weight:700;color:#060a14;
864
+ font-family:'JetBrains Mono',monospace;
865
+ box-shadow:0 0 20px rgba(6,214,160,0.3);
866
+ }
867
+ .ep-t { font-family:'Outfit';font-size:1.6em;font-weight:700;letter-spacing:-0.5px;color:#f1f5f9!important;margin:0!important; }
868
+ .ep-st { font-family:'JetBrains Mono';font-size:0.7em;color:#64748b!important;margin:3px 0 0!important;letter-spacing:0.5px;text-transform:uppercase; }
869
+ .ep-live { display:flex;align-items:center;gap:8px;font-family:'JetBrains Mono';font-size:0.72em;color:#06d6a0;letter-spacing:0.3px; }
870
+ .ep-dot {
871
+ width:7px;height:7px;border-radius:50%;background:#06d6a0;
872
+ box-shadow:0 0 8px rgba(6,214,160,0.6);
873
+ animation:pdot 2s ease-in-out infinite;
874
+ }
875
+ @keyframes pdot { 0%,100%{opacity:1} 50%{opacity:0.4} }
876
+
877
+ .tab-nav { background:transparent!important;border:none!important;gap:4px!important;padding:0 0 14px!important;border-bottom:1px solid #1e293b!important;margin-bottom:18px!important; }
878
+ .tab-nav button {
879
+ font-family:'JetBrains Mono',monospace!important;font-size:0.76em!important;font-weight:500!important;
880
+     letter-spacing:0.5px!important;text-transform:uppercase!important;color:#64748b!important;
+     background:transparent!important;border:1px solid transparent!important;border-radius:8px!important;
+     padding:8px 18px!important;transition:all 0.2s!important;
+ }
+ .tab-nav button:hover { color:#e2e8f0!important;background:rgba(255,255,255,0.03)!important; }
+ .tab-nav button.selected { color:#06d6a0!important;background:rgba(6,214,160,0.08)!important;border-color:rgba(6,214,160,0.2)!important; }
+ .tabitem { border:none!important;background:transparent!important;padding:0!important; }
+
+ table { background:#111827!important;border:1px solid #1e293b!important;border-radius:10px!important;overflow:hidden!important; }
+ table thead th {
+     background:#0f172a!important;color:#64748b!important;
+     font-family:'JetBrains Mono',monospace!important;font-size:0.7em!important;
+     font-weight:600!important;letter-spacing:0.8px!important;text-transform:uppercase!important;
+     padding:10px 14px!important;border-bottom:1px solid #1e293b!important;
+ }
+ table tbody td {
+     background:#111827!important;color:#cbd5e1!important;
+     font-family:'JetBrains Mono',monospace!important;font-size:0.78em!important;
+     padding:8px 14px!important;border-bottom:1px solid rgba(30,41,59,0.5)!important;
+ }
+ table tbody tr:hover td { background:rgba(6,214,160,0.03)!important; }
+
+ button.primary, button.secondary {
+     font-family:'JetBrains Mono',monospace!important;font-size:0.74em!important;
+     letter-spacing:0.4px!important;border-radius:8px!important;
+ }
+ button.primary { background:rgba(6,214,160,0.12)!important;color:#06d6a0!important;border:1px solid rgba(6,214,160,0.25)!important; }
+ button.primary:hover { background:rgba(6,214,160,0.2)!important; }
+ button.secondary { background:rgba(59,130,246,0.1)!important;color:#3b82f6!important;border:1px solid rgba(59,130,246,0.2)!important; }
+ button.secondary:hover { background:rgba(59,130,246,0.18)!important; }
+
+ .gr-row {
+     gap:14px!important;
+     flex-wrap: wrap !important;
+     max-width: 100% !important;
+     overflow: hidden !important;
+ }
+ /* Remove all white backgrounds from Gradio components */
+ .gr-block, .block:not(.gr-group) { border:none!important;background:transparent!important; }
+ .gr-padded { padding:0!important; }
+ .label-wrap { background:#0a0e1a!important;border:1px solid #1e293b!important;border-radius:8px!important;padding:4px 10px!important; }
+ .label-wrap span { color:#64748b!important;font-family:'JetBrains Mono',monospace!important;font-size:0.72em!important;letter-spacing:0.5px!important; }
+ /* Plot containers */
+ .gr-plot, .plot-wrap, .gradio-plot { background:transparent!important;border:none!important; }
+ div[class*="plot"] { background:transparent!important; }
+ /* All panel/group/box backgrounds */
+ .panel, .gr-panel, .gr-box, .gr-form, .gr-input-label, .gr-check-radio { background:#111827!important;border-color:#1e293b!important;color:#e2e8f0!important; }
+ /* File download component */
+ .file-preview, .upload-button { background:#111827!important;border-color:#1e293b!important;color:#94a3b8!important; }
+ /* Inputs and textboxes */
+ input, textarea, select, .gr-input { background:#111827!important;border-color:#1e293b!important;color:#e2e8f0!important; }
+ /* Any remaining white wrapper divs */
+ .contain > div, .wrap > div { background:transparent!important; }
+ /* Markdown text areas */
+ .prose, .markdown-text, .md { background:transparent!important;color:#94a3b8!important; }
+ /* Accordion headers */
+ .accordion { background:#111827!important;border-color:#1e293b!important; }
+ /* Prevent dataframes from causing horizontal scroll */
+ .dataframe, .table-wrap, .svelte-table {
+     max-width: 100% !important;
+     overflow-x: auto !important;
+     overflow-y: hidden !important;
+ }
+ /* KPI card row in HTML shouldn't overflow */
+ div[style*="display:flex"] {
+     flex-wrap: wrap !important;
+     max-width: 100% !important;
+ }
+
+ .ep-ftr {
+     margin-top:28px;padding:14px 0;border-top:1px solid #1e293b;
+     text-align:center;font-family:'JetBrains Mono',monospace;
+     font-size:0.68em;color:#334155;letter-spacing:0.3px;
+ }
+ .ep-ftr a { color:#475569;text-decoration:none; }
+ .ep-ftr a:hover { color:#06d6a0; }
+
+ .markdown-text h4 { color:#94a3b8!important;font-family:'Outfit',sans-serif!important; }
+ .markdown-text p, .markdown-text { color:#94a3b8!important; }
+
+ @media(max-width:768px) { .ep-hdr-in{flex-direction:column;gap:10px;align-items:flex-start;} }
+ """
+
+
+ # ── App ─────────────────────────────────────────────────────────────
+
+
+ def create_app() -> gr.Blocks:
+     with gr.Blocks(title="EvalPulse Dashboard", css=THEME_CSS) as app:
+         gr.HTML("""
+         <div class="ep-hdr"><div class="ep-hdr-in">
+           <div class="ep-brand">
+             <div class="ep-logo">EP</div>
+             <div><div class="ep-t">EvalPulse</div>
+             <div class="ep-st">LLM Evaluation &amp; Drift Monitor</div></div>
+           </div>
+           <div class="ep-live"><div class="ep-dot"></div>DEMO MODE</div>
+         </div></div>
+         """)
+
+         with gr.Tabs():
+             with gr.TabItem("Overview"):
+                 with gr.Row():
+                     hc = gr.HTML("Loading...")
+                     hac = gr.HTML("Loading...")
+                     dc = gr.HTML("Loading...")
+                     tc = gr.HTML("Loading...")
+                 with gr.Row():
+                     hg = gr.Plot(label="Health Gauge")
+                     ht = gr.Plot(label="Health Trend")
+                 gr.Markdown("#### Recent Alerts")
+                 at = gr.Dataframe(
+                     headers=[
+                         "Time",
+                         "Severity",
+                         "Metric",
+                         "Value",
+                         "Threshold",
+                         "Message",
+                     ],
+                     interactive=False,
+                 )
+                 gr.Button("Refresh", variant="primary", size="sm").click(
+                     fn=build_overview, outputs=[hc, hac, dc, tc, hg, ht, at]
+                 )
+
+             with gr.TabItem("Hallucination"):
+                 hr = gr.Plot()
+                 with gr.Row():
+                     hd = gr.Plot()
+                     hm = gr.Plot()
+                 gr.Markdown("#### Highest Hallucination Responses")
+                 htb = gr.Dataframe(
+                     headers=["Time", "Query", "Response", "Score", "Flagged"],
+                     interactive=False,
+                 )
+                 gr.Button("Refresh", variant="primary", size="sm").click(
+                     fn=build_hallucination, outputs=[hr, hd, hm, htb]
+                 )
+
+             with gr.TabItem("Semantic Drift"):
+                 ds = gr.Markdown("Loading...")
+                 dp = gr.Plot()
+                 de = gr.Plot()
+                 gr.Button("Refresh", variant="primary", size="sm").click(
+                     fn=build_drift, outputs=[dp, de, ds]
+                 )
+
+             with gr.TabItem("RAG & Quality"):
+                 qp = gr.Plot()
+                 with gr.Row():
+                     rp = gr.Plot()
+                     rr = gr.Plot()
+                 bp = gr.Plot()
+                 gr.Button("Refresh", variant="primary", size="sm").click(
+                     fn=build_rag_quality, outputs=[qp, rp, rr, bp]
+                 )
+
+         gr.HTML("""
+         <div class="ep-ftr">
+           EvalPulse v0.1.0 &middot; Open Source LLM Evaluation &amp; Drift Monitoring
+           &middot; <a href="https://github.com/ninjacode911/Project-EvalPulse">GitHub</a>
+         </div>
+         """)
+
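+         # Populate every tab on page load; without the extra calls only the
+         # Overview tab fills in and the others stay empty until Refresh is clicked.
+         app.load(fn=build_overview, outputs=[hc, hac, dc, tc, hg, ht, at])
+         app.load(fn=build_hallucination, outputs=[hr, hd, hm, htb])
+         app.load(fn=build_drift, outputs=[dp, de, ds])
+         app.load(fn=build_rag_quality, outputs=[qp, rp, rr, bp])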
+
+     return app
+
+
+ if __name__ == "__main__":
+     create_app().launch(server_name="0.0.0.0", server_port=7860)
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio>=4.0
+ plotly>=5.0
+ numpy>=1.24.0
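
Note on the callback contract used in app.py: each `Refresh` button passes a no-argument function and an `outputs` list, and Gradio assigns the function's return values to those components positionally. For the Semantic Drift tab that means `build_drift` must return two plot values followed by the Markdown summary, matching `outputs=[dp, de, ds]`. `build_drift` itself is not part of this hunk, so the sketch below is a hypothetical stand-in rather than the project's implementation: the name `build_drift_sketch`, its parameters, and the synthetic data are assumptions, and plain dicts take the place of Plotly figures so that it runs with the standard library alone.

```python
import random
from typing import Any


def build_drift_sketch(
    n_points: int = 30, seed: int = 0
) -> tuple[dict[str, Any], dict[str, Any], str]:
    """Hypothetical stand-in for build_drift (not the project's code).

    Returns three values in the same order as outputs=[dp, de, ds]:
    a trend series, an embedding scatter, and a Markdown summary string.
    """
    rng = random.Random(seed)

    # Synthetic drift scores in [0, 1), one per time step.
    scores = [rng.random() for _ in range(n_points)]

    # First return value -> dp (drift trend). A real callback would build
    # a Plotly figure here; a dict is used only to avoid the dependency.
    trend = {"x": list(range(n_points)), "y": scores}

    # Second return value -> de (2-D embedding scatter), also synthetic.
    embedding = {
        "x": [rng.gauss(0.0, 1.0) for _ in range(n_points)],
        "y": [rng.gauss(0.0, 1.0) for _ in range(n_points)],
    }

    # Third return value -> ds (Markdown summary).
    summary = f"**Mean drift score:** {sum(scores) / n_points:.3f}"

    return trend, embedding, summary
```

Swapping the order of the returned tuple, or returning a different number of values than there are entries in `outputs`, is the usual way this wiring breaks, so it is worth keeping each `build_*` function's return statement next to a comment naming the components it feeds.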