metadata
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false

Agentic Evaluation Framework - Hugging Face Space

This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, and others) using lightweight scorers and visualizations.

How to use

  1. Upload a CSV, JSON, or JSONL file with the columns prompt, response, task, agent, and reference (reference is optional).
  2. Click Run Evaluation.
  3. View the per-task spider charts, heatmaps, and bar plots in the Gallery, inspect per-example metrics in the table, and download the CSV report.

If no file is uploaded, a small synthetic demo dataset will be evaluated.
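
As an illustration of the input format described in step 1, here is a minimal sketch that builds the same two records as CSV and as JSONL using only the Python standard library. The column names come from this README; the row contents, the task labels (`qa`, `summarization`), the agent names, and the helper functions are made-up assumptions, so check them against what `app.py` actually expects.

```python
import csv
import io
import json

# Column names are taken from the README; everything else is illustrative.
COLUMNS = ["prompt", "response", "task", "agent", "reference"]

rows = [
    {
        "prompt": "What is the capital of France?",
        "response": "Paris is the capital of France.",
        "task": "qa",  # assumed label; the app may expect a different spelling
        "agent": "agent_a",
        "reference": "Paris",
    },
    {
        "prompt": "Summarize: The cat sat on the mat all day.",
        "response": "A cat stayed on a mat.",
        "task": "summarization",
        "agent": "agent_b",
        "reference": "",  # reference is optional, so it may be left empty
    },
]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_jsonl(records):
    """Serialize records to JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


def missing_columns(record):
    """Return the required columns absent from a record (reference is optional)."""
    required = [c for c in COLUMNS if c != "reference"]
    return [c for c in required if c not in record]


csv_text = to_csv(rows)
jsonl_text = to_jsonl(rows)
```

Writing `csv_text` or `jsonl_text` to a file gives something that can be uploaded in step 1. Running `missing_columns` over each record before uploading is a cheap way to catch a mistyped header.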

Deploying

  • Push this repo into a Hugging Face Space (Gradio). The requirements.txt will be installed automatically.

Notes & Limitations

  • The models used are lightweight but still need CPU memory (no Java dependency).
  • If the reference column is missing or empty, the hallucination and accuracy signals will be reduced, because there is no ground truth to compare the response against.
  • Coherence metric is a placeholder heuristic; you can replace it with grammar/perplexity models if desired.
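
One way to keep the coherence metric replaceable is to treat it as a plain function from text to a score and pass it in as a parameter. The sketch below shows that pattern. It is not the heuristic this app uses: `heuristic_coherence`, `score_responses`, and the scoring rule itself are hypothetical names and logic chosen only for illustration.

```python
import re
from typing import Callable, List

# A coherence scorer maps a response string to a score in [0, 1].
CoherenceScorer = Callable[[str], float]


def heuristic_coherence(text: str) -> float:
    """Hypothetical placeholder: the fraction of sentences that start with a
    capital letter and contain at least three words."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    ok = sum(1 for s in sentences if s[0].isupper() and len(s.split()) >= 3)
    return ok / len(sentences)


def score_responses(
    responses: List[str], scorer: CoherenceScorer = heuristic_coherence
) -> List[float]:
    """Score each response with whichever scorer is plugged in."""
    return [scorer(r) for r in responses]
```

With this shape, swapping in a grammar-based or perplexity-based model means writing another function with the same signature and passing it as `scorer`, without touching the code that calls `score_responses`.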