---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---

# Agentic Evaluation Framework (Hugging Face Space)

This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, ...) using lightweight scorers and visualizations.

## How to use
1. Upload a CSV/JSON/JSONL file with columns `prompt`, `response`, `task`, `agent`, and `reference` (the `reference` column is optional).
2. Click **Run Evaluation**.
3. View per-task spider charts, heatmaps, and bar plots in the Gallery; inspect per-example metrics in the table; and download the CSV report.
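A minimal input file matching this schema can be generated with Python's standard `csv` module. The file name and row contents below are illustrative only:

```python
import csv

# Illustrative rows matching the expected upload schema.
rows = [
    {"prompt": "What is the capital of France?",
     "response": "Paris is the capital of France.",
     "task": "qa", "agent": "agent_a", "reference": "Paris"},
    {"prompt": "Summarize: The cat sat on the mat.",
     "response": "A cat sat on a mat.",
     "task": "summarization", "agent": "agent_b", "reference": ""},  # reference may be empty
]

with open("eval_input.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["prompt", "response", "task", "agent", "reference"]
    )
    writer.writeheader()
    writer.writerows(rows)
```

The same rows can be written as JSONL instead, one JSON object per line with the same keys.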

If no file is uploaded, a small synthetic demo dataset will be evaluated.
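The fallback behavior can be sketched as below; `load_rows` and `DEMO_ROWS` are illustrative names, not the app's actual API:

```python
import csv
import json

# Tiny synthetic demo set used when no file is uploaded (illustrative rows).
DEMO_ROWS = [
    {"prompt": "What is 2 + 2?", "response": "4",
     "task": "reasoning", "agent": "demo_agent", "reference": "4"},
    {"prompt": "Name the largest planet.", "response": "Jupiter",
     "task": "qa", "agent": "demo_agent", "reference": "Jupiter"},
]

def load_rows(path=None):
    """Return evaluation rows from a CSV/JSON/JSONL file, or the demo set if no file."""
    if path is None:
        return DEMO_ROWS
    if path.endswith(".csv"):
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if path.endswith(".jsonl"):
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```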

## Deploying
- Push this repo into a Hugging Face Space (Gradio). The `requirements.txt` will be installed automatically.

## Notes & Limitations
- The models used are lightweight and pure Python (no Java dependency), but they still require CPU memory.
- If `reference` is missing, hallucination and accuracy signals will be weaker.
- The coherence metric is a placeholder heuristic; you can replace it with grammar or perplexity models if desired.
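As one illustration of such a heuristic, a toy coherence score might combine lexical diversity with average sentence length. This is a sketch, not the app's actual scorer:

```python
import re

def coherence_score(text: str) -> float:
    """Toy coherence heuristic (illustrative; the app's real scorer may differ).

    Combines lexical diversity with a capped average sentence length.
    Returns a score in [0, 1].
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.lower().split()
    if not sentences or not words:
        return 0.0
    # Lexical diversity: unique words / total words.
    diversity = len(set(words)) / len(words)
    # Average sentence length in words, capped at 20 and scaled to [0, 1].
    avg_len = min(sum(len(s.split()) for s in sentences) / len(sentences), 20) / 20
    return round(0.5 * diversity + 0.5 * avg_len, 3)
```

A perplexity-based replacement would keep the same signature and return scale, so it can be swapped in without changing the surrounding pipeline.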