---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---
# Agentic Evaluation Framework – Hugging Face Space
This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, ...) using lightweight scorers and visualizations.
## How to use

- Upload a CSV/JSON/JSONL file with the columns `prompt`, `response`, `task`, `agent`, `reference` (`reference` is optional).
- Click **Run Evaluation**.
- View per-task spider charts, heatmaps, and bar plots in the Gallery, inspect per-example metrics in the table, and download the CSV report.

If no file is uploaded, a small synthetic demo dataset is evaluated.
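For reference, here is one way to build a minimal input file with the expected columns. The example rows are invented placeholders, and leaving `reference` empty is allowed:

```python
import csv
import io

# Hypothetical example rows matching the expected columns;
# "reference" may be left empty when no gold answer exists.
rows = [
    {"prompt": "What is the capital of France?", "response": "Paris.",
     "task": "qa", "agent": "agent_a", "reference": "Paris"},
    {"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat.",
     "task": "summarization", "agent": "agent_b", "reference": ""},
]

# Write the rows as CSV text (swap io.StringIO for open("data.csv", "w") to save a file).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["prompt", "response", "task", "agent", "reference"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

A JSONL upload would carry the same keys, one JSON object per line.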
## Deploying

- Push this repo to a Hugging Face Space (Gradio SDK). The `requirements.txt` will be installed automatically.
## Notes & Limitations

- Models used are lightweight but still require CPU memory (no Java).
- If `reference` is missing, hallucination/accuracy signals will be reduced.
- The coherence metric is a placeholder heuristic – you can replace it with grammar/perplexity models if desired.
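To make the "placeholder heuristic" concrete, here is a toy coherence scorer of the kind this metric could be. This is a sketch under assumptions, not the app's actual implementation; the function name and scoring weights are invented:

```python
import re


def coherence_score(text: str) -> float:
    """Toy placeholder heuristic (illustrative only, not the app's metric).

    Rewards responses that end with sentence-final punctuation and whose
    average sentence length is moderate, returning a score in [0, 1].
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    avg_len = sum(len(s.split()) for s in sentences) / len(sentences)
    # Length score peaks when the average sentence is around 15 words.
    length_score = max(0.0, 1.0 - abs(avg_len - 15) / 15)
    # Penalize responses cut off mid-sentence.
    ends_cleanly = 1.0 if text.rstrip().endswith((".", "!", "?")) else 0.5
    return round(0.5 * length_score + 0.5 * ends_cleanly, 3)
```

Swapping this for a grammar checker or a perplexity score from a small language model, as suggested above, only requires replacing the function body while keeping the `str -> float` signature.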