Open source 5-metric evaluation as a complement to VANTAGE-Bench
#1
by vigneshwar234 - opened
Hi Clemson Computing team,
VANTAGE-Bench's approach to model ranking is solid. I built an open-source framework that complements structured benchmarks with production-relevant metrics.
LLM Evaluation Framework adds:
- Cost per 1K tokens: the production budget dimension
- Latency p50/p95/p99: real-world deployment latency
- Hallucination Rate: overconfidence detection
- Accuracy: 4-strategy cascade scorer
- Reasoning Quality: chain-of-thought depth
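
To make the cost and latency metrics above concrete, here is a minimal Python sketch of how they could be computed from raw measurements. It is an illustration only, not the framework's actual code: the function names, the nearest-rank percentile method, and the sample numbers are all assumptions made for this example.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (p is an integer 1..100)."""
    if not sorted_values:
        raise ValueError("need at least one latency sample")
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = (p * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def latency_summary(latencies_ms):
    """Return p50/p95/p99 for a list of per-request latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    return {f"p{p}": percentile(ordered, p) for p in (50, 95, 99)}


def cost_per_1k_tokens(total_cost_usd, total_tokens):
    """Normalize total spend to a cost per 1,000 tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd * 1000 / total_tokens


if __name__ == "__main__":
    # Hypothetical measurements: 100 requests taking 1..100 ms.
    samples = list(range(1, 101))
    print(latency_summary(samples))
    print(cost_per_1k_tokens(0.5, 250_000))
```

Reporting p95 and p99 alongside p50 matters because tail latency is what users notice in production, and a mean or median alone hides it.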
One CLI command. Any LiteLLM-compatible model.
Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework
Open source, 71 tests, free forever!