---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---

# Agentic Evaluation Framework — Hugging Face Space

Upload a CSV/JSON/JSONL file with rows containing:

- `prompt` (or `instruction`)
- `response`
- `task` (qa, summarization, reasoning, etc.)
- `agent`
- `reference` (optional — used for accuracy / hallucination checks)

Features:

- Rule-based scoring (instruction-following, coherence, grammar).
- Optional LLM-based hallucination detection (`ComprehensiveHallucinationDetector`) — toggleable in the UI.
- Per-task tabs with:
  - Per-example metrics table
  - Radar (spider) charts comparing agents
  - Horizontal leaderboard (downloadable)
  - Heatmap of metric correlations
- Exportable CSV report.

Notes:

- The LLM judge uses transformer models and may be memory-heavy. Enable it only when you have sufficient resources; the app falls back if model loading fails.
- No Java dependency: the grammar check uses `LanguageToolPublicAPI`, so it works on Hugging Face Spaces.
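A minimal sketch of what a valid JSONL upload might look like, parsed and validated in Python. The field names (`prompt`, `response`, `task`, `agent`, `reference`) are the ones listed above; the example values and the validation loop are illustrative, not part of the app:

```python
import json

# Hypothetical example rows in the JSONL upload format described above.
# The second row omits the optional `reference` field.
sample_jsonl = """\
{"prompt": "What is the capital of France?", "response": "Paris.", "task": "qa", "agent": "agent_a", "reference": "Paris"}
{"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat.", "task": "summarization", "agent": "agent_b"}
"""

rows = [json.loads(line) for line in sample_jsonl.splitlines()]

# Minimal validation: every row needs a prompt (or instruction),
# a response, a task, and an agent; `reference` may be omitted.
for row in rows:
    assert ("prompt" in row) or ("instruction" in row)
    assert "response" in row and "task" in row and "agent" in row

print(len(rows))  # number of parsed example rows
```

The same columns apply to CSV uploads, with one row per evaluated response.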