DirtyAnonymous / benchmarks.csv
Benchmark,Base %,Distilled %,Std Dev
AIME 2024,1.5,35.2,0.8
MATH-500,25.0,89.1,1.2
GSM8K,65.0,92.8,0.5
GPQA Diamond,28.0,45.5,1.5
LiveCodeBench,15.0,32.5,2.1
HumanEval,55.0,82.3,1.8