---
title: Agentic Evaluation Framework
emoji: πŸ€–
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---
# Agentic Evaluation Framework β€” Hugging Face Space
This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, and more) using lightweight scorers and visualizations.
## How to use
1. Upload a CSV/JSON/JSONL file with columns `prompt`, `response`, `task`, `agent`, and (optionally) `reference`.
2. Click **Run Evaluation**.
3. View per-task spider charts, heatmaps, and bar plots in the Gallery, inspect per-example metrics in the table, and download the CSV report.
If no file is uploaded, a small synthetic demo dataset will be evaluated.
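To illustrate the expected input schema, here is a minimal sketch that writes a two-row JSONL file with the columns listed above. The row contents, agent names, and the `demo_eval.jsonl` filename are hypothetical; note that `reference` can simply be omitted from a row when no gold answer exists.

```python
import json

# Hypothetical two-row dataset matching the expected columns.
rows = [
    {"prompt": "What is the capital of France?", "response": "Paris.",
     "task": "qa", "agent": "agent-a", "reference": "Paris"},
    # Row without a `reference` column (the field is optional).
    {"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat.",
     "task": "summarization", "agent": "agent-b"},
]

# Write one JSON object per line (JSONL).
with open("demo_eval.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```

Uploading a file like this should produce per-agent scores for the `qa` and `summarization` tasks.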
## Deploying
- Push this repo into a Hugging Face Space (Gradio). The `requirements.txt` will be installed automatically.
## Notes & Limitations
- The models used are lightweight but still require CPU memory; no Java runtime is needed.
- If `reference` is missing, hallucination and accuracy signals will be weaker.
- Coherence metric is a placeholder heuristic β€” you can replace it with grammar/perplexity models if desired.
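If you want a drop-in stand-in before wiring up a grammar or perplexity model, a placeholder heuristic along these lines could serve. This is a hypothetical sketch, not the app's actual metric: it penalizes heavy token repetition and very short responses, returning a score in [0, 1].

```python
import re

def coherence_score(text: str) -> float:
    """Toy coherence heuristic (placeholder, not the app's implementation):
    combines a repetition penalty with a length factor, in [0, 1]."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    distinct_ratio = len(set(tokens)) / len(tokens)  # penalize repetition
    length_factor = min(len(tokens) / 20.0, 1.0)     # short answers score lower
    return round(distinct_ratio * length_factor, 3)
```

A repetitive response like `"the the the the"` scores far lower than a varied sentence of the same register, which is roughly the signal a real grammar/perplexity model would sharpen.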