leaderboards - a MoritzLaurer Collection

leaderboards

updated Mar 2

Running

4.85k

Arena Leaderboard

🏆

View the LMArena language model leaderboard
Running on CPU Upgrade

13.9k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade

7.28k

MTEB Leaderboard

🥇

Embedding Leaderboard
Running on CPU Upgrade

Agents

Featured

1.31k

Open ASR Leaderboard

🏆

Explore speech recognition model benchmarks and rankings
Running

Agents

Featured

586

LLM-Perf Leaderboard

🏆

Explore LLM performance across hardware configurations
Running

Agents

1.5k

Big Code Models Leaderboard

📈

Explore and submit code model evaluations on a leaderboard
Runtime error

Agents

78

Human & GPT-4 Evaluation of LLMs Leaderboard

👩

Runtime error

Agents

145

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Build error

Agents

105

Enterprise Scenarios Leaderboard

🥇

Running on CPU Upgrade

Agents

93

LLM Safety Leaderboard

🥇

Explore and submit LLM benchmarks
Running

Featured

562

Vision Arena (Testing VLMs side-by-side)

🖼

Explore AI-powered visual tasks in Vision Arena
Running

72

CyberSecEvalTest

📈

Evaluate LLMs' cybersecurity risks and capabilities
Running

Featured

452

LLM Performance Leaderboard

🐨

View the latest LLM performance leaderboard online
Running on CPU Upgrade

Agents

76

AIR-Bench Leaderboard

🥇

Explore and compare QA and long doc benchmarks
Running on CPU Upgrade

Agents

1.01k

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running

Agents

427

Reward Bench Leaderboard

📐

Explore RewardBench model rankings and scores
Running

Agents

230

BigCodeBench Leaderboard

🥇

Explore code-generation model leaderboards and task details
Runtime error

Agents

10

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results
Running

116

MTEB Arena

⚔

Display MTEB Arena interface
Runtime error

Agents

Featured

151

Open LLM Progress Tracker

🔬

Visualize Open vs. Proprietary LLM Progress
Running

Agents

110

Judge Arena

💻

View and compare open‑source AI model rankings with Elo scores
Running

Agents

481

TTS Spaces Arena

🤗

Blind vote on HF TTS models!
Runtime error

Featured

141

smolagents LLM leaderboard

🏆

A leaderboard for LLMs powering smolagents