---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---

# Agentic Evaluation Framework — Hugging Face Space

This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, ...) using lightweight scorers and visualizations.

## How to use

1. Upload a CSV/JSON/JSONL file with the columns `prompt`, `response`, `task`, `agent`, and `reference` (`reference` is optional).
2. Click **Run Evaluation**.
3. View per-task spider charts, heatmaps, and bar plots in the Gallery, inspect per-example metrics in the table, and download the CSV report.

If no file is uploaded, a small synthetic demo dataset is evaluated instead.

## Deploying

- Push this repo to a Hugging Face Space (Gradio). The `requirements.txt` will be installed automatically.

## Notes & Limitations

- The models used are lightweight but still require CPU memory (no GPU needed).
- If `reference` is missing, hallucination/accuracy signals will be weaker.
- The coherence metric is a placeholder heuristic; you can replace it with grammar- or perplexity-based models if desired.
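
## Example input file

The column layout described above can be sketched with a minimal script that writes a valid input CSV. The prompts, responses, agent names, and task labels below are hypothetical examples, not part of the app itself; only the column names come from this README.

```python
import csv

# Hypothetical sample rows illustrating the expected columns
# (`prompt`, `response`, `task`, `agent`, `reference`).
rows = [
    {
        "prompt": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "task": "qa",
        "agent": "agent-a",
        "reference": "Paris",
    },
    {
        "prompt": "Summarize: The cat sat on the mat.",
        "response": "A cat sat on a mat.",
        "task": "summarization",
        "agent": "agent-b",
        "reference": "",  # `reference` is optional and may be left empty
    },
]

# Write the rows with a header line, ready to upload to the Space.
with open("eval_input.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["prompt", "response", "task", "agent", "reference"]
    )
    writer.writeheader()
    writer.writerows(rows)
```

The same rows could equally be saved as JSONL (one JSON object per line) with identical keys.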