---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---
# Agentic Evaluation Framework – Hugging Face Space
This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, ...) using lightweight scorers and visualizations.
## How to use

- Upload a CSV/JSON/JSONL file with the columns `prompt`, `response`, `task`, `agent`, `reference` (`reference` is optional).
- Click **Run Evaluation**.
- View per-task spider charts, heatmaps, and bar plots in the Gallery, inspect per-example metrics in the table, and download the CSV report.

If no file is uploaded, a small synthetic demo dataset is evaluated.
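For reference, here is one way to build a minimal input file with the expected columns. The example rows are invented placeholders, and leaving `reference` empty is allowed:

```python
import csv
import io

# Hypothetical example rows matching the expected columns;
# "reference" may be left empty when no gold answer exists.
rows = [
    {"prompt": "What is the capital of France?", "response": "Paris.",
     "task": "qa", "agent": "agent_a", "reference": "Paris"},
    {"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat.",
     "task": "summarization", "agent": "agent_b", "reference": ""},
]

# Write the rows as CSV text (swap io.StringIO for open("data.csv", "w") to save a file).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["prompt", "response", "task", "agent", "reference"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

A JSONL upload would carry the same keys, one JSON object per line.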
## Deploying

- Push this repo to a Hugging Face Space (Gradio SDK). The `requirements.txt` will be installed automatically.
## Notes & Limitations

- Models used are lightweight but still require CPU memory (no Java).
- If `reference` is missing, hallucination/accuracy signals will be reduced.
- The coherence metric is a placeholder heuristic – you can replace it with grammar/perplexity models if desired.
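To make the "placeholder heuristic" concrete, here is a toy coherence scorer of the kind this metric could be. This is a sketch under assumptions, not the app's actual implementation; the function name and scoring weights are invented:

```python
import re


def coherence_score(text: str) -> float:
    """Toy placeholder heuristic (illustrative only, not the app's metric).

    Rewards responses that end with sentence-final punctuation and whose
    average sentence length is moderate, returning a score in [0, 1].
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    avg_len = sum(len(s.split()) for s in sentences) / len(sentences)
    # Length score peaks when the average sentence is around 15 words.
    length_score = max(0.0, 1.0 - abs(avg_len - 15) / 15)
    # Penalize responses cut off mid-sentence.
    ends_cleanly = 1.0 if text.rstrip().endswith((".", "!", "?")) else 0.5
    return round(0.5 * length_score + 0.5 * ends_cleanly, 3)
```

Swapping this for a grammar checker or a perplexity score from a small language model, as suggested above, only requires replacing the function body while keeping the `str -> float` signature.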