---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---

# Agentic Evaluation Framework – Hugging Face Space

Upload a CSV/JSON/JSONL file with rows containing:

- `prompt` (or `instruction`)
- `response`
- `task` (qa, summarization, reasoning, etc.)
- `agent`
- `reference` (optional – used for accuracy / hallucination checks)
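A minimal sketch of a valid JSONL input using the columns above; the example rows and agent names are made up for illustration:

```python
import json

# Hypothetical example rows; `reference` may be omitted per row.
rows = [
    {"prompt": "What is the capital of France?", "response": "Paris.",
     "task": "qa", "agent": "agent-a", "reference": "Paris"},
    {"prompt": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat.",
     "task": "summarization", "agent": "agent-b"},
]

with open("sample_eval.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```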

Features:

- Rule-based scoring (instruction-following, coherence, grammar).
- Optional LLM-based hallucination detection (ComprehensiveHallucinationDetector) – toggleable in the UI.
- Per-task tabs with:
  - Per-example metrics table
  - Radar (spider) charts comparing agents
  - Horizontal leaderboard (downloadable)
  - Heatmap of metric correlations
- Exportable CSV report.
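Rule-based scores of this kind can be sketched as simple heuristics; the two below (length ratio for instruction-following, sentence capitalization for coherence) are illustrative assumptions, not the app's actual rules:

```python
def instruction_following_score(prompt: str, response: str) -> float:
    """Crude heuristic: empty responses score 0; otherwise reward
    responses that are not wildly shorter than the prompt."""
    if not response.strip():
        return 0.0
    ratio = len(response.split()) / max(len(prompt.split()), 1)
    return min(ratio, 1.0)

def coherence_score(response: str) -> float:
    """Crude heuristic: fraction of sentences starting with a capital."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    if not sentences:
        return 0.0
    return sum(s[0].isupper() for s in sentences) / len(sentences)
```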

Notes:

- The LLM judge uses transformer models and can be memory-heavy; enable it only when you have sufficient resources. The app falls back automatically if model loading fails.
- No Java dependency: the grammar check uses `LanguageToolPublicAPI`, so it works on Hugging Face Spaces.
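The fallback behaviour described above can be sketched as a try/except around model loading; `load_llm_judge` and `score_example` are hypothetical names, not the app's real functions:

```python
def load_llm_judge(loader):
    """Try to build the LLM judge; return None so callers can fall
    back to rule-based scoring if loading fails (e.g. out of memory)."""
    try:
        return loader()
    except Exception as exc:
        print(f"LLM judge unavailable ({exc}); falling back to rules.")
        return None

def score_example(example, loader):
    judge = load_llm_judge(loader)
    if judge is None:
        return "rule-based"   # fallback path
    return "llm-judged"
```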