Benchmarks - a hppdqdq Collection

Benchmarks

Updated Jan 13, 2025

Running on CPU Upgrade

Agents

251

MMLU-Pro Leaderboard

🥇

More advanced and challenging multi-task evaluation

Running

62

Stick To Your Role! Leaderboard

🎭

Benchmarking LLMs on the stability of simulated populations

Running

53

ZeroEval Leaderboard

📊

Explore ZeroEval embedding benchmark online

Runtime error

Agents

26

Decentralized Arena Leaderboard

🥇

View and compare LLM evaluations across various domains

Runtime error

Agents

Featured

437

Open Medical-LLM Leaderboard

🥇

Explore and submit models for benchmarking

Paused

Agents

354

GPU Poor LLM Arena

🏆

Compact LLM Battle Arena: Frugal AI Face-Off!

Running

Agents

Featured

135

Open VLM Video Leaderboard

🌎

VLMEvalKit evaluation results on video understanding benchmarks

Running on CPU Upgrade

14k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots

Running

Agents

486

TTS Spaces Arena

🤗

Blind vote on HF TTS models!