Complementary text LLM evaluation: accuracy + cost + hallucination

#136

by vigneshwar234 - opened 25 days ago

Hi TIGER-Lab team 👋

MMEB's multimodal embedding evaluation is impressive. For the text-side evaluation of models you're benchmarking, I built a complementary framework.

The LLM Evaluation Framework evaluates text LLMs on five metrics in a single run:

→ 🎯 Accuracy — 4-strategy cascade on MMLU and TruthfulQA
→ 💰 Cost per 1K tokens — especially relevant for embedding models at scale
→ ⚡ Latency p50/p95/p99
→ 🔍 Hallucination Rate — runs locally
→ 🧠 Reasoning Quality — CoT depth
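
To make two of these metrics concrete, here is a minimal sketch of how latency percentiles and cost per 1K tokens could be computed. It is an illustration only, not the framework's actual code: the function names `percentile` and `cost_per_1k_tokens`, the nearest-rank percentile method, and the sample numbers are all assumptions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it.
    (Hypothetical helper; other percentile definitions interpolate.)"""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that p=0 returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def cost_per_1k_tokens(total_cost_usd, total_tokens):
    """Normalize a total spend to a price per 1,000 tokens.
    (Hypothetical helper; assumes cost and token counts are already known.)"""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd / total_tokens * 1000


if __name__ == "__main__":
    # Made-up request latencies in milliseconds, for illustration only.
    latencies_ms = [120, 95, 110, 480, 105, 130, 99, 101, 250, 115]
    for p in (50, 95, 99):
        print(f"p{p}: {percentile(latencies_ms, p)} ms")
    # Made-up spend and token count, for illustration only.
    print(f"cost per 1K tokens: ${cost_per_1k_tokens(0.50, 250_000):.4f}")
```

Reporting p95 and p99 alongside p50 matters because a few slow requests can dominate user-perceived latency even when the median looks fine.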

Live demo (no API key): https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

Open source, free forever. Feedback from the TIGER-Lab community welcome!
