Croc-Prog-HF's Collections
Benchmark datasets
Updated 6 days ago
cais/hle • Benchmark • Updated Jan 20 • 2.5k • 28.7k • 849
Qwen/DeepPlanning • Viewer • Updated Mar 3 • 2.14k • 579 • 210
gaia-benchmark/GAIA • Viewer • Updated Oct 28, 2025 • 932 • 23.1k • 706
BLINK-Benchmark/BLINK • Viewer • Updated Sep 3, 2025 • 3.81k • 11.9k • 48
openai/gsm8k • Benchmark • Updated Mar 23 • 17.6k • 935k • 1.42k
allenai/olmOCR-bench • Benchmark • Updated Feb 19 • 7.83k • 256
TIGER-Lab/MMLU-Pro • Benchmark • Updated May 2 • 12.1k • 158k • 489
openai/openai_humaneval • Viewer • Updated Jan 4, 2024 • 164 • 231k • 395
Muennighoff/mbpp • Viewer • Updated Oct 20, 2022 • 1.4k • 3.12k • 25
bigcode/bigcodebench • Viewer • Updated Apr 30, 2025 • 5.7k • 46.4k • 84
livecodebench/test_generation • Viewer • Updated Jun 13, 2024 • 442 • 839 • 7
ScaleAI/SWE-bench_Pro • Benchmark • Updated Feb 23 • 731 • 68.4k • 145
SWE-bench/SWE-bench_Verified • Benchmark • Updated Feb 27 • 500 • 69.2k • 101
mteb/arguana • Benchmark • Updated Apr 17 • 11.5k • 11.3k • 5
MathArena/hmmt_feb_2026 • Benchmark • Updated May 15 • 33 • 5.18k • 5
Idavidrein/gpqa • Benchmark • Updated Mar 5 • 1.25k • 93.5k • 471
likaixin/ScreenSpot-Pro • Benchmark • Updated Mar 18 • 10.4k • 67
harborframework/terminal-bench-2.0 • Benchmark • Updated Apr 24 • 18.7k • 44
FutureMa/EvasionBench • Benchmark • Updated Feb 19 • 16.7k • 444 • 110
internlm/WildClawBench • Benchmark • Updated May 15 • 23.6k • 62
FINAL-Bench/World-Model • Viewer • Updated May 15 • 100 • 371 • 38
llamaindex/ParseBench • Benchmark • Updated Apr 19 • 169k • 14.1k • 101
mteb/BRIGHT • Benchmark • Updated Apr 2 • 1.35M • 3.42k • 3
Delores-Lin/MDPBench • Benchmark • Updated Apr 26 • 1.57k • 20
collinear-ai/yc-bench • Benchmark • Updated Mar 23 • 79 • 18
nvidia/compute-eval • Benchmark • Updated Apr 27 • 2.46k • 1.32k • 26
Jackrong/Qwen3.5-reasoning-700x • Viewer • Updated Mar 2 • 633 • 1.42k • 119
SWE-bench/SWE-smith • Viewer • Updated Dec 14, 2025 • 59.1k • 16.5k • 54
mercor/apex-agents • Benchmark • Updated 21 days ago • 480 • 67.9k • 133
actava/chi-bench • Benchmark • Updated about 1 month ago • 101 • 4.07k • 56
agents-last-exam/agents-last-exam • Viewer • Updated 20 days ago • 153 • 8.26k • 197
tiiuae/PBench • Benchmark • Updated May 11 • 6.34k • 1.12k • 15
datacurve/deep-swe • Benchmark • Updated about 1 month ago • 113 • 671 • 13
meituan-longcat/WBench • Benchmark • Updated May 29 • 867 • 2.37k • 21
ChrisHayduk/nanofold-public • Benchmark • Updated 16 days ago • 11k • 245 • 16
mercor/ACE • Benchmark • Updated Apr 13 • 592 • 6.07k • 5
LEXam-Benchmark/LEXam • Benchmark • Updated May 21 • 7.54k • 2.07k • 45
Qwen/AgentWorldBench • Viewer • Updated 9 days ago • 2.17k • 1.58k • 61
ArtificialAnalysis/ITBench-AA • Viewer • Updated May 27 • 40 • 47.6k • 46