metadata
title: Agentic Evaluation Framework
emoji: 🤖
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false

Agentic Evaluation Framework — Hugging Face Space

Upload a CSV/JSON/JSONL file with rows containing:

  • prompt (or instruction)
  • response
  • task (qa, summarization, reasoning, etc.)
  • agent
  • reference (optional — used for accuracy / hallucination checks)
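As a rough illustration of the input format above, the following sketch builds two rows, checks them for the listed fields, and serializes them as JSONL. The sample values and the helper names `validate_row` and `to_jsonl` are hypothetical and are not part of the app; the check only mirrors the field list above, under the assumption that `prompt` and `instruction` are interchangeable and `reference` may be omitted.

```python
import json

# Fields every row is expected to carry, besides prompt/instruction.
REQUIRED_FIELDS = ("response", "task", "agent")


def validate_row(row):
    """Return a list of problems for one row (an empty list means the row is OK)."""
    problems = []
    # Either "prompt" or "instruction" is accepted for the input text.
    if not (row.get("prompt") or row.get("instruction")):
        problems.append("missing prompt/instruction")
    for field in REQUIRED_FIELDS:
        if not row.get(field):
            problems.append(f"missing {field}")
    return problems


def to_jsonl(rows):
    """Serialize rows as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows) + "\n"


# Made-up sample rows; "reference" is optional and omitted in the second one.
sample_rows = [
    {
        "prompt": "What is the capital of France?",
        "response": "Paris.",
        "task": "qa",
        "agent": "agent_a",
        "reference": "Paris",
    },
    {
        "instruction": "Summarize: The cat sat on the mat.",
        "response": "A cat sat on a mat.",
        "task": "summarization",
        "agent": "agent_b",
    },
]

jsonl_text = to_jsonl(sample_rows)
print(jsonl_text, end="")
```

Saving `jsonl_text` to a file with a `.jsonl` extension should give a file in the shape the upload step describes.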

Features:

  • Rule-based scoring (instruction-following, coherence, grammar).
  • Optional LLM-based hallucination detection (ComprehensiveHallucinationDetector) — toggleable in UI.
  • Per-task tabs with:
    • Per-example metrics table
    • Radar (spider) charts comparing agents
    • Horizontal leaderboard (downloadable)
    • Heatmap of metric correlations
  • Exportable CSV report.
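To make the leaderboard idea concrete, here is a minimal sketch of one way per-example metrics could be rolled up into a ranking of agents. It assumes each example already has numeric scores and that an agent's leaderboard score is the plain mean of its per-example averages; the app's actual aggregation is not described here and may differ. The function name `leaderboard` and the metric names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def leaderboard(rows, metric_keys):
    """Rank agents by the mean of their per-example average scores (assumed scheme)."""
    per_agent = defaultdict(list)
    for row in rows:
        # Average the chosen metrics for this single example.
        per_agent[row["agent"]].append(mean(row[key] for key in metric_keys))
    board = [(agent, mean(scores)) for agent, scores in per_agent.items()]
    # Highest average first.
    return sorted(board, key=lambda item: item[1], reverse=True)


# Made-up per-example metrics for two agents.
scored_rows = [
    {"agent": "agent_a", "coherence": 1.0, "grammar": 0.5},
    {"agent": "agent_a", "coherence": 0.5, "grammar": 0.5},
    {"agent": "agent_b", "coherence": 0.25, "grammar": 0.25},
]

ranking = leaderboard(scored_rows, ["coherence", "grammar"])
for agent, score in ranking:
    print(f"{agent}: {score:.3f}")
```

Grouping the rows by `task` first and calling the function once per group would give one ranking per task tab.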

Notes:

  • The LLM judge uses transformer models and can be memory-heavy. Enable it only when you have sufficient resources. The app will fall back if model loading fails.
  • No Java dependency: the grammar check uses LanguageToolPublicAPI, which calls the remote LanguageTool service instead of starting a local Java server, so it works on Hugging Face Spaces.
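The fallback behaviour mentioned in the notes can be pictured with a small guarded-loading pattern. This is only a sketch under assumptions: the constructor of ComprehensiveHallucinationDetector is not shown in this README, so a hypothetical `factory` callable stands in for it, and the sketch assumes that "falling back" means continuing with rule-based scores only.

```python
def load_detector(factory):
    """Try to build the optional LLM judge; return None if loading fails."""
    try:
        return factory()
    except (ImportError, OSError, MemoryError, RuntimeError) as exc:
        # Assumed fallback: keep going with rule-based scores only.
        print(f"LLM judge unavailable, using rule-based scores only: {exc}")
        return None


def evaluate(response, rule_scorer, detector=None):
    """Always run rule-based scoring; add a hallucination score if a judge exists."""
    scores = dict(rule_scorer(response))
    if detector is not None:
        scores["hallucination"] = detector(response)
    return scores


def broken_factory():
    """Hypothetical stand-in for a model load that fails."""
    raise OSError("model weights could not be loaded")


def toy_rule_scorer(response):
    """Toy rule: a non-empty response counts as coherent."""
    return {"coherence": 1.0 if response.strip() else 0.0}


detector = load_detector(broken_factory)
result = evaluate("Paris.", toy_rule_scorer, detector)
print(result)
```

With a factory that succeeds, `evaluate` would add a `hallucination` entry alongside the rule-based scores, so the rest of the pipeline does not need to know whether the judge loaded.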