little-jack's Collections
agent
planning
IFT
RLHF
sft
pre-train
some benchmark
some benchmark
Updated Oct 28, 2025
cais/mmlu • Viewer • Updated Mar 8, 2024 • 231k • 434k • 777
TIGER-Lab/MMLU-Pro • Benchmark • Updated May 2 • 12.1k • 158k • 489
cais/hle • Benchmark • Updated Jan 20 • 2.5k • 29.6k • 843
m-a-p/SuperGPQA • Viewer • Updated Apr 30, 2025 • 26.5k • 8.38k • 90
lmarena-ai/arena-hard-auto • Updated May 1, 2025 • 1.33k • 8
MT Bench 📊 • Running • Agents • 202
Explore and compare AI model answers on benchmark questions