---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---

# Agentic Evaluation Framework — Hugging Face Space

This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, ...) using lightweight scorers and visualizations.

## How to use

1. Upload a CSV/JSON/JSONL file with the columns `prompt`, `response`, `task`, `agent`, and `reference` (`reference` is optional).
2. Click **Run Evaluation**.
3. View per-task spider charts, heatmaps, and bar plots in the Gallery, inspect per-example metrics in the table, and download the CSV report.

If no file is uploaded, a small synthetic demo dataset is evaluated instead.

## Deploying

- Push this repo to a Hugging Face Space (Gradio). The `requirements.txt` will be installed automatically.

## Notes & Limitations

- The models used are lightweight but still require CPU memory (no GPU needed).
- If `reference` is missing, hallucination/accuracy signals will be weaker.
- The coherence metric is a placeholder heuristic; you can replace it with grammar- or perplexity-based models if desired.
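
## Example input file

The column layout described above can be sketched with a minimal script that writes a valid input CSV. The prompts, responses, agent names, and task labels below are hypothetical examples, not part of the app itself; only the column names come from this README.

```python
import csv

# Hypothetical sample rows illustrating the expected columns
# (`prompt`, `response`, `task`, `agent`, `reference`).
rows = [
    {
        "prompt": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "task": "qa",
        "agent": "agent-a",
        "reference": "Paris",
    },
    {
        "prompt": "Summarize: The cat sat on the mat.",
        "response": "A cat sat on a mat.",
        "task": "summarization",
        "agent": "agent-b",
        "reference": "",  # `reference` is optional and may be left empty
    },
]

# Write the rows with a header line, ready to upload to the Space.
with open("eval_input.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["prompt", "response", "task", "agent", "reference"]
    )
    writer.writeheader()
    writer.writerows(rows)
```

The same rows could equally be saved as JSONL (one JSON object per line) with identical keys.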