---
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---
# Agentic Evaluation Framework (Hugging Face Space)
This Gradio app evaluates and compares multiple AI agents across tasks (QA, summarization, reasoning, and more) using lightweight scorers and visualizations.
## How to use
1. Upload a CSV/JSON/JSONL file with the columns `prompt`, `response`, `task`, `agent`, and `reference` (`reference` is optional).
2. Click **Run Evaluation**.
3. View per-task spider charts, heatmaps, and bar plots in the Gallery; inspect per-example metrics in the table; and download the CSV report.
If no file is uploaded, a small synthetic demo dataset will be evaluated.
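To illustrate the expected input shape, here is a minimal sketch that builds a valid CSV with the columns listed above. The rows and agent names are purely illustrative, not part of the demo dataset:

```python
import csv
import io

# Tiny example dataset with the expected columns.
# `reference` may be left empty when no gold answer exists.
rows = [
    {"prompt": "What is the capital of France?",
     "response": "Paris.",
     "task": "qa",
     "agent": "agent_a",
     "reference": "Paris"},
    {"prompt": "Summarize: The cat sat on the mat.",
     "response": "A cat sat on a mat.",
     "task": "summarization",
     "agent": "agent_b",
     "reference": ""},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["prompt", "response", "task", "agent", "reference"]
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Saving this text to a `.csv` file and uploading it should produce per-agent scores for the two example tasks.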
## Deploying
- Push this repo to a Hugging Face Space (Gradio SDK). The `requirements.txt` will be installed automatically.
## Notes & Limitations
- The models used are lightweight but still require CPU memory (no GPU is needed).
- If `reference` is missing, hallucination/accuracy signals will be reduced.
- The coherence metric is a placeholder heuristic; you can replace it with grammar- or perplexity-based models if desired.
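The actual heuristic is not specified here; a minimal sketch of what such a placeholder might look like (assumed scoring: penalize very short responses and repeated sentences, neither of which is confirmed to be the app's method) is:

```python
import re

def coherence_score(text: str) -> float:
    """Placeholder coherence heuristic returning a score in [0, 1].

    Rewards multi-word responses and unique sentences; penalizes
    repetition. Intended to be swapped for a grammar or perplexity
    model in serious use.
    """
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    # Fraction of unique sentences: repetition lowers the score.
    uniqueness = len(set(sentences)) / len(sentences)
    # Very short responses give weak evidence of coherence.
    length_factor = min(len(text.split()) / 10.0, 1.0)
    return uniqueness * length_factor

print(coherence_score("Paris is the capital of France. It is in Europe."))
```

A perplexity-based replacement would keep the same signature, so the rest of the pipeline would not need to change.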