Systematic SLM evaluation: accuracy + cost + hallucination for RAG model selection

by vigneshwar234 - opened 20 days ago

Hi aizip-dev team 👋

SLM RAG Arena is great for head-to-head comparison. For teams that also want reproducible, quantitative benchmarks rather than only preference-based arena scores, I built a complementary evaluation framework.

LLM Evaluation Framework for SLMs in RAG systems:

→ 🎯 Accuracy: reproducible, not subjective
→ 🔍 Hallucination Rate: critical for RAG, where models must stay grounded in the retrieved context
→ 💰 Cost per 1K tokens: SLMs are often chosen for cost reasons, so this should be quantified precisely
→ ⚡ Latency p95: RAG pipelines are latency-sensitive
→ 🧠 Reasoning Quality: for SLMs that explain their retrieval reasoning
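
To make the metrics above concrete, here is a minimal Python sketch of how four of them could be aggregated from per-query results. It is an illustration only, not the API of the linked framework: `EvalRecord`, its field names, and `summarize` are hypothetical, and it assumes each query has already been labeled as correct and/or hallucinated by some upstream check. Reasoning Quality is left out because it needs a rubric or judge that this post does not specify. The p95 uses the nearest-rank definition, which is one of several common conventions.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One evaluated RAG query (hypothetical schema, for illustration)."""
    correct: bool       # answer matched the reference
    hallucinated: bool  # answer contained claims not supported by the context
    tokens: int         # prompt + completion tokens billed for this query
    cost_usd: float     # cost billed for this query
    latency_s: float    # end-to-end latency in seconds


def p95(values):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(records):
    """Aggregate per-query records into the headline metrics."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    total_tokens = sum(r.tokens for r in records)
    total_cost = sum(r.cost_usd for r in records)
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "hallucination_rate": sum(r.hallucinated for r in records) / n,
        # Normalize total spend to a per-1K-token figure.
        "cost_per_1k_tokens": (
            1000 * total_cost / total_tokens if total_tokens else 0.0
        ),
        "latency_p95_s": p95([r.latency_s for r in records]),
    }
```

Running `summarize` once per candidate model on the same query set gives numbers that can be compared side by side with the arena's preference scores.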

Arena preference + quantitative metrics = better SLM selection.

Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework
