Open source 5-metric evaluation as a complement to VANTAGE-Bench

#1
by vigneshwar234 - opened

Hi Clemson Computing team 👋

VANTAGE-Bench's approach to model ranking is solid. I built an open-source framework that complements structured benchmarks with production-relevant metrics.

LLM Evaluation Framework adds:

→ 💰 Cost per 1K tokens: the production budget dimension
→ ⚡ Latency p50/p95/p99: real-world deployment latency
→ 🔍 Hallucination Rate: overconfidence detection
→ 🎯 Accuracy: 4-strategy cascade scorer
→ 🧠 Reasoning Quality: chain-of-thought depth
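
To make the first two metrics concrete, here is a minimal sketch of how cost per 1K tokens and latency percentiles could be computed from raw run data. The function names and the nearest-rank percentile method are assumptions for illustration only; they are not taken from the framework's code, which may calculate these differently.

```python
def latency_percentiles(latencies_ms):
    """Return p50/p95/p99 latency using the nearest-rank method.

    Hypothetical helper, not the framework's API.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def pct(p):
        # Nearest rank: ceil(p * n / 100), done in integer arithmetic
        # to avoid floating-point rounding surprises.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}


def cost_per_1k_tokens(total_cost_usd, total_tokens):
    """Normalize total spend to a per-1,000-token figure (hypothetical helper)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd / total_tokens * 1000


if __name__ == "__main__":
    # Made-up sample data: 100 requests with latencies of 1..100 ms.
    samples = list(range(1, 101))
    print(latency_percentiles(samples))
    print(cost_per_1k_tokens(0.03, 15000))
```

Reporting p95 and p99 alongside the median matters because tail latency is often what users notice in production, even when the typical request is fast.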

One CLI command. Any LiteLLM-compatible model.

Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

Open source, 71 tests, free forever!
