---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---
# Agentic Evaluation Framework (Hugging Face Space)
This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, and more) using lightweight scorers and visualizations.
## How to use
1. Upload a CSV/JSON/JSONL file with the columns `prompt`, `response`, `task`, `agent`, and `reference` (`reference` is optional).
2. Click **Run Evaluation**.
3. View per-task spider charts, heatmaps, and bar plots in the Gallery; inspect per-example metrics in the table; and download the CSV report.
If no file is uploaded, a small synthetic demo dataset will be evaluated.
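To illustrate the expected input shape, here is a minimal sketch that builds a valid CSV with the columns listed above. The rows and agent names are purely illustrative, not part of the demo dataset:

```python
import csv
import io

# Tiny example dataset with the expected columns.
# `reference` may be left empty when no gold answer exists.
rows = [
    {"prompt": "What is the capital of France?",
     "response": "Paris.",
     "task": "qa",
     "agent": "agent_a",
     "reference": "Paris"},
    {"prompt": "Summarize: The cat sat on the mat.",
     "response": "A cat sat on a mat.",
     "task": "summarization",
     "agent": "agent_b",
     "reference": ""},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["prompt", "response", "task", "agent", "reference"]
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Saving this text to a `.csv` file and uploading it should produce per-agent scores for the two example tasks.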
## Deploying
- Push this repo to a Hugging Face Space (Gradio SDK). The `requirements.txt` will be installed automatically.
## Notes & Limitations
- The models used are lightweight but still require CPU memory (no GPU is needed).
- If `reference` is missing, hallucination/accuracy signals will be reduced.
- The coherence metric is a placeholder heuristic; you can replace it with grammar- or perplexity-based models if desired.
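The actual heuristic is not specified here; a minimal sketch of what such a placeholder might look like (assumed scoring: penalize very short responses and repeated sentences, neither of which is confirmed to be the app's method) is:

```python
import re

def coherence_score(text: str) -> float:
    """Placeholder coherence heuristic returning a score in [0, 1].

    Rewards multi-word responses and unique sentences; penalizes
    repetition. Intended to be swapped for a grammar or perplexity
    model in serious use.
    """
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    # Fraction of unique sentences: repetition lowers the score.
    uniqueness = len(set(sentences)) / len(sentences)
    # Very short responses give weak evidence of coherence.
    length_factor = min(len(text.split()) / 10.0, 1.0)
    return uniqueness * length_factor

print(coherence_score("Paris is the capital of France. It is in Europe."))
```

A perplexity-based replacement would keep the same signature, so the rest of the pipeline would not need to change.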