---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---
# Agentic Evaluation Framework: Hugging Face Space
Upload a CSV/JSON/JSONL file with rows containing:
- `prompt` (or `instruction`)
- `response`
- `task` (qa, summarization, reasoning, etc.)
- `agent`
- `reference` (optional; used for accuracy and hallucination checks)
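The expected input schema can be illustrated with a small standard-library loader. This is a minimal sketch, not the Space's actual code: the names `parse_records`, `normalize_row`, and `load_file` are hypothetical, and it assumes a JSON file holds an array of row objects and a JSONL file holds one object per line.

```python
import csv
import io
import json
from pathlib import Path

# Fields every row must provide after normalization (per the list above).
REQUIRED = ("prompt", "response", "task", "agent")


def normalize_row(row: dict) -> dict:
    """Map `instruction` to `prompt`, check required fields, default `reference`."""
    row = dict(row)
    if not row.get("prompt") and row.get("instruction"):
        row["prompt"] = row.pop("instruction")
    missing = [field for field in REQUIRED if not row.get(field)]
    if missing:
        raise ValueError(f"row is missing required fields: {missing}")
    # An absent or empty reference becomes None so reference-based checks can skip it.
    row["reference"] = row.get("reference") or None
    return row


def parse_records(text: str, fmt: str) -> list[dict]:
    """Parse CSV, JSON (array of objects), or JSONL text into normalized rows."""
    fmt = fmt.lower()
    if fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(text)))
    elif fmt == "json":
        data = json.loads(text)
        rows = data if isinstance(data, list) else [data]
    elif fmt == "jsonl":
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return [normalize_row(r) for r in rows]


def load_file(path: str) -> list[dict]:
    """Pick the parser from the file extension (.csv, .json, .jsonl)."""
    p = Path(path)
    return parse_records(p.read_text(encoding="utf-8"), p.suffix.lstrip("."))
```

Validating every row up front means a malformed upload fails with a clear message before any scoring runs.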
Features:
- Rule-based scoring (instruction-following, coherence, grammar).
- Optional LLM-based hallucination detection (`ComprehensiveHallucinationDetector`), which can be toggled on or off in the UI.
- Per-task tabs with:
  - Per-example metrics table
  - Radar (spider) charts comparing agents
  - Horizontal leaderboard (downloadable)
  - Heatmap of metric correlations
- Exportable CSV report.
Notes:
- The LLM judge runs transformer models and can be memory-intensive, so enable it only when the Space has enough memory. If the model fails to load, the app falls back to running without it.
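- No Java dependency: the grammar check uses `LanguageToolPublicAPI` from `language_tool_python`, which queries the public LanguageTool web service instead of starting a local Java server, so it works on Hugging Face Spaces.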
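Both notes describe optional components that may be unavailable at runtime. The following is a minimal sketch of a graceful-fallback pattern for them. The helper names `load_optional` and `grammar_score` are hypothetical, as is the scoring formula (1 minus grammar issues per word, floored at 0). This is not the app's actual scoring code.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)


def load_optional(loader: Callable[[], Any]) -> Optional[Any]:
    """Build an optional component (e.g. an LLM judge); return None if it fails."""
    try:
        return loader()
    except (ImportError, OSError, RuntimeError, MemoryError, ValueError) as exc:
        # Missing package, missing weights, network error, or out-of-memory.
        log.warning("optional component unavailable, falling back: %s", exc)
        return None


def grammar_score(text: str, check: Optional[Callable[[str], list]]) -> Optional[float]:
    """Hypothetical grammar score in [0, 1]; None when no checker is available."""
    if check is None:
        return None
    words = len(text.split())
    if words == 0:
        return None
    try:
        issues = check(text)
    except Exception as exc:  # e.g. the public API is rate-limited or unreachable
        log.warning("grammar check failed: %s", exc)
        return None
    return max(0.0, 1.0 - len(issues) / words)
```

With `language_tool_python` installed, the real checker could be wired in as `tool = load_optional(lambda: language_tool_python.LanguageToolPublicAPI("en-US"))` followed by `grammar_score(text, tool.check if tool else None)`. An LLM-judge loader would go through `load_optional` in the same way, so a failed load disables only that metric.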