Spaces: F4biian/RAGognizer

Batch hallucination + accuracy evaluation tool for LLMs in RAG systems

by vigneshwar234 - opened about 11 hours ago

Hi 👋

RAGognizer's real-time token-level hallucination detection is impressive! For teams running batch evaluation across multiple LLMs before choosing a backbone for their RAG system, I built a complementary tool.

LLM Evaluation Framework provides batch evaluation with:

→ 🔍 Hallucination Rate: scored across all test samples in a batch and reported as a single rate between 0.0 and 1.0
→ 🎯 Accuracy: verified against ground-truth answers
→ 💰 Cost per 1K tokens: so you can budget your RAG inference
→ ⚡ Latency p95: RAG pipelines are latency-sensitive, so tail latency matters
→ 🧠 Reasoning Quality: for RAG models that cite reasoning
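As a rough illustration, the following is a minimal sketch of how the first four batch metrics above could be aggregated from per-sample results. It is not the LLM Evaluation Framework's actual API. The `SampleResult` record, its field names, exact-match accuracy, the nearest-rank p95, and the flat price per 1K tokens are all assumptions for illustration. The per-sample `hallucinated` flag is assumed to come from some upstream judge or detector.

```python
import math
from dataclasses import dataclass


@dataclass
class SampleResult:
    # Hypothetical per-sample record; field names are illustrative only.
    hallucinated: bool   # assumed flag from an upstream judge/detector
    answer: str          # model output
    ground_truth: str    # reference answer
    latency_s: float     # end-to-end latency in seconds
    tokens: int          # total tokens billed (prompt + completion)


def p95(values):
    """Nearest-rank 95th percentile: smallest value with >= 95% of samples at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(results, usd_per_1k_tokens):
    """Aggregate batch metrics; exact-match accuracy is an assumed stand-in for real verification."""
    if not results:
        raise ValueError("need at least one result")
    n = len(results)
    # Fraction of samples flagged as hallucinated, so the rate lies in [0.0, 1.0].
    hallucination_rate = sum(r.hallucinated for r in results) / n
    # Case- and whitespace-insensitive exact match against the ground truth.
    accuracy = sum(
        r.answer.strip().lower() == r.ground_truth.strip().lower() for r in results
    ) / n
    total_tokens = sum(r.tokens for r in results)
    return {
        "hallucination_rate": hallucination_rate,
        "accuracy": accuracy,
        # Budget estimate: tokens / 1000 * price per 1K tokens (assumed flat pricing).
        "estimated_cost_usd": total_tokens / 1000 * usd_per_1k_tokens,
        "latency_p95_s": p95([r.latency_s for r in results]),
    }
```

Using a nearest-rank percentile keeps p95 equal to an actually observed latency, which is easy to explain when comparing candidate backbones. Interpolating percentiles (e.g. `statistics.quantiles`) would give slightly different numbers on small batches.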

Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

Would love to discuss combining batch evaluation with real-time token-level detection!
