Benchmark,Base %,Distilled %,Std Dev
AIME 2024,1.5,35.2,0.8
MATH-500,25.0,89.1,1.2
GSM8K,65.0,92.8,0.5
GPQA Diamond,28.0,45.5,1.5
LiveCodeBench,15.0,32.5,2.1
HumanEval,55.0,82.3,1.8