SaylorTwift's Collections
benchmarks
RULER Datasets Falcon-H1-3B-Base
RULER Datasets Lamma3-Instruct
RULER Datasets Qwen2.5-Instruct
RULER Datasets Qwen-3-Instruct
RULER Datasets Qwen-3
agents
Agents ressources
benchmarks (updated May 13)
meituan-longcat/LARYBench • Updated Apr 30 • 2.55k • 18
llamaindex/ParseBench • Benchmark • Updated Apr 19 • 169k • 16.5k • 100
nvidia/QCalEval • Viewer • Updated Apr 13 • 243 • 1.1k • 19
allenai/olmOCR-bench • Benchmark • Updated Feb 19 • 7.48k • 251
LongHorizonReasoning/longcot • Viewer • Updated Apr 20 • 5k • 300 • 13
mercor/apex-agents • Benchmark • Updated 18 days ago • 480 • 68.2k • 132
hsiung/MagicBench • Viewer • Updated Apr 18 • 50 • 26 • 10
openlifescienceai/medmcqa • Viewer • Updated Jan 4, 2024 • 193k • 35.6k • 228
openai/healthbench-professional • Viewer • Updated Apr 22 • 525 • 1.24k • 54
ShadenA/MathNet • Viewer • Updated 14 days ago • 55.6k • 4.68k • 88
claw-eval/Claw-Eval • Benchmark • Updated May 8 • 3.46k • 28
FrontierCS/Frontier-CS • Viewer • Updated about 12 hours ago • 268 • 5.5k • 6