Evaluate LLM generation quality on top of your retrieval — cost + hallucination

by vigneshwar234 - opened 21 days ago

Hi Sentence Transformers team 👋

Quantized retrieval is impressive work. To cover the LLM generation layer that sits on top of the retrieved passages, I built an evaluation framework.

The LLM Evaluation Framework evaluates the generation side of RAG:

→ 🔍 Hallucination Rate — does the LLM stay grounded in the retrieved content, or does it hallucinate?
→ 🎯 Accuracy — answer quality against ground truth
→ 💰 Cost per 1K tokens — the other side of RAG optimization alongside quantized retrieval
→ ⚡ Latency p95 — generation latency on top of retrieval latency
→ 🧠 Reasoning Quality — for models that cite retrieved passages in their reasoning
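
To make the first four metrics concrete, here is a minimal sketch of how they could be computed from logged generations. It is not the framework's actual API: `EvalRecord`, `is_grounded`, `p95`, and `summarize` are hypothetical names, the grounding check is a naive token-overlap heuristic with an arbitrary 0.6 threshold, and accuracy is plain exact match. A real evaluation would likely swap in an NLI model or an LLM judge.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One logged generation (hypothetical schema, for illustration only)."""
    answer: str          # text produced by the LLM
    passages: list[str]  # retrieved passages the answer should be grounded in
    reference: str       # ground-truth answer
    total_tokens: int    # prompt + completion tokens billed
    cost_usd: float      # cost of this call
    latency_ms: float    # generation latency


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def is_grounded(answer: str, passages: list[str], threshold: float = 0.6) -> bool:
    """Naive stand-in: share of answer tokens that appear in any passage."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return True
    passage_tokens = set().union(*(_tokens(p) for p in passages))
    return len(answer_tokens & passage_tokens) / len(answer_tokens) >= threshold


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (integer ceil avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    n = len(records)
    hallucinated = sum(not is_grounded(r.answer, r.passages) for r in records)
    # Exact match after normalisation; an assumption, not the framework's scorer.
    correct = sum(r.answer.strip().lower() == r.reference.strip().lower() for r in records)
    total_tokens = sum(r.total_tokens for r in records)
    total_cost = sum(r.cost_usd for r in records)
    return {
        "hallucination_rate": hallucinated / n,
        "accuracy": correct / n,
        "cost_per_1k_tokens": 1000 * total_cost / total_tokens,
        "latency_p95_ms": p95([r.latency_ms for r in records]),
    }
```

Calling `summarize(records)` on a non-empty list of `EvalRecord` objects returns one dictionary per run, which makes it easy to compare models or prompt variants side by side.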

Quantized retrieval (fast, cheap) + cost-optimized generation (this tool) = efficient full RAG stack.

Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework
