some benchmark - a qqliangqi Collection



some benchmark

Updated Oct 28, 2025

cais/mmlu: Viewer • Updated Mar 8, 2024 • 231k rows • 297k downloads • 632 likes

TIGER-Lab/MMLU-Pro: Benchmark • Updated 11 days ago • 12.1k rows • 84.9k downloads • 412 likes

cais/hle: Benchmark • Updated 9 days ago • 2.5k rows • 20.5k downloads • 671 likes

m-a-p/SuperGPQA: Viewer • Updated Apr 30, 2025 • 26.5k rows • 4.92k downloads • 80 likes

lmarena-ai/arena-hard-auto: Updated May 1, 2025 • 203 downloads • 6 likes

MT Bench (Space • Running • 202 likes): Compare AI model responses side-by-side
