Add MathArena evaluation result for aime/aime_2026
#44 opened by JasperDekoninck
This PR adds a new MathArena evaluation result so it can be indexed on the model leaderboard page.
Model: deepseek-ai/DeepSeek-V3.2
Competition dataset id: MathArena/aime_2026
Score: 94.17
Result file: .eval_results/MathArena--aime_2026.yaml
The results match those displayed on our webpage.
Note: this is an experimental feature; we are currently trying to make it work as smoothly as possible.
May I ask which version of deepseek-ai/DeepSeek-V3.2 these evaluation results refer to? Is it chat, think, or speciale?
DeepSeek-V3.2-Thinking
Excuse me, is this score a pass@k? What is the value of k?
(Pass@1)
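For reference, pass@1 means each problem is judged on a single sampled answer. The general pass@k metric is usually computed with the unbiased estimator of Chen et al. (2021); the sketch below shows that estimator and how it reduces to the fraction of correct attempts when k=1. How MathArena aggregates per-problem scores into the reported 94.17 (e.g. averaging over repetitions) is not stated in this thread, so treat this purely as a definition of the metric:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn (without replacement) from n total attempts,
    c of which are correct, is correct."""
    if n - c < k:
        # Fewer incorrect attempts than k: a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to the plain success rate c / n:
print(pass_at_k(10, 9, 1))  # 0.9
```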