some benchmark - a little-jack Collection

little-jack's Collections

some benchmark

Updated Oct 28, 2025

cais/mmlu
Viewer • Updated Mar 8, 2024 • 231k • 434k • 777

TIGER-Lab/MMLU-Pro
Benchmark • Updated May 2 • 12.1k • 158k • 489

cais/hle
Benchmark • Updated Jan 20 • 2.5k • 29.6k • 843

m-a-p/SuperGPQA
Viewer • Updated Apr 30, 2025 • 26.5k • 8.38k • 90

lmarena-ai/arena-hard-auto
Updated May 1, 2025 • 1.33k • 8

MT Bench 📊
Running • Agents • 202
Explore and compare AI model answers on benchmark questions