little-jack's Collections
some benchmark
Updated Oct 28, 2025
cais/mmlu • Viewer • Updated Mar 8, 2024 • 231k • 379k • 683
TIGER-Lab/MMLU-Pro • Benchmark • Updated 2 days ago • 12.1k • 121k • 447
cais/hle • Benchmark • Updated Jan 20 • 2.5k • 42.8k • 740
m-a-p/SuperGPQA • Viewer • Updated Apr 30, 2025 • 26.5k • 9.44k • 86
lmarena-ai/arena-hard-auto • Updated May 1, 2025 • 962 • 6
MT Bench 📊 • Running • 202 • Compare AI model responses side-by-side