
Open-source 5-metric evaluation as a complement to VANTAGE-Bench

by vigneshwar234 - opened 25 days ago

Hi Clemson Computing team 👋

VANTAGE-Bench's approach to model ranking is solid. I built an open-source framework that complements structured benchmarks with production-relevant metrics.

The LLM Evaluation Framework adds five metrics:

→ 💰 Cost per 1K tokens — the production budget dimension
→ ⚡ Latency p50/p95/p99 — real-world deployment latency
→ 🔍 Hallucination Rate — overconfidence detection
→ 🎯 Accuracy — 4-strategy cascade scorer
→ 🧠 Reasoning Quality — chain-of-thought depth
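The post doesn't say how these metrics are computed, so here is a rough, hypothetical sketch of three of them in Python. It is not the LLM Evaluation Framework's code. The function names (`cost_per_1k_tokens`, `latency_summary`, `cascade_score`) are made up. The nearest-rank percentile method is one common choice among several. The four cascade strategies (exact, normalized, numeric, contains) are my assumption about what a "4-strategy cascade" could look like.

```python
import math
import re
import string
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical helpers illustrating the metrics; not the framework's actual API.

def cost_per_1k_tokens(total_cost_usd: float, total_tokens: int) -> float:
    """Normalize a run's total spend to USD per 1,000 tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd / total_tokens * 1000

def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Report p50/p95/p99 latency from per-request measurements."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}

def _normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def _as_number(text: str) -> Optional[float]:
    """Parse a bare number such as '42', '-3.5' or '1,000'; otherwise None."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*", text.replace(",", ""))
    return float(match.group(1)) if match else None

def cascade_score(prediction: str, reference: str) -> Tuple[float, str]:
    """Try strict strategies first, falling back to more lenient ones.

    Returns (score, name of the strategy that matched, or "none").
    """
    if prediction == reference:
        return 1.0, "exact"
    if _normalize(prediction) == _normalize(reference):
        return 1.0, "normalized"
    pred_num, ref_num = _as_number(prediction), _as_number(reference)
    if pred_num is not None and ref_num is not None and math.isclose(
        pred_num, ref_num, rel_tol=1e-6
    ):
        return 1.0, "numeric"
    norm_ref = _normalize(reference)
    # Whole-word containment: the reference appears inside a longer answer.
    if norm_ref and f" {norm_ref} " in f" {_normalize(prediction)} ":
        return 1.0, "contains"
    return 0.0, "none"
```

Running the strict checks first means each answer is credited by the most precise strategy that accepts it. Logging the returned strategy name shows how often lenient matching (especially `contains`, which can over-credit) is doing the work.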

One CLI command. Any LiteLLM-compatible model.

Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

Open source, 71 tests, free forever!
