| Benchmark | Base % | Distilled % | Std Dev |
| --- | ---: | ---: | ---: |
| AIME 2024 | 1.5 | 35.2 | 0.8 |
| MATH-500 | 25.0 | 89.1 | 1.2 |
| GSM8K | 65.0 | 92.8 | 0.5 |
| GPQA Diamond | 28.0 | 45.5 | 1.5 |
| LiveCodeBench | 15.0 | 32.5 | 2.1 |
| HumanEval | 55.0 | 82.3 | 1.8 |