Add MathArena evaluation result for aime/aime_2026

#44

This PR adds a new MathArena evaluation result so it can be indexed on the model leaderboard page.

Model: deepseek-ai/DeepSeek-V3.2
Competition dataset id: MathArena/aime_2026
Score: 94.17
Result file: .eval_results/MathArena--aime_2026.yaml

The results match those displayed on our webpage.
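For reviewers unfamiliar with the result-file layout: the YAML below is a hypothetical sketch of what `.eval_results/MathArena--aime_2026.yaml` could contain, inferred from the fields listed above. The key names (`model`, `dataset`, `score`) are assumptions, not the repository's actual schema.

```yaml
# Hypothetical sketch of the result file — field names are assumed,
# only the values come from this PR's description.
model: deepseek-ai/DeepSeek-V3.2
dataset: MathArena/aime_2026
score: 94.17
```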

Note: this is an experimental feature; we are currently trying to make it work as smoothly as possible.

May I ask which version of deepseek-ai/DeepSeek-V3.2 these evaluation results refer to? Is it chat, think, or speciale?

DeepSeek-V3.2-Thinking

Excuse me, is this score a pass@k? If so, what is the value of k?
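For context on the question above: pass@k is commonly computed with the standard unbiased estimator (1 − C(n−c, k)/C(n, k), where n is the number of samples and c the number that pass). Whether MathArena uses this exact estimator is not stated in this thread; the sketch below just illustrates the usual definition.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them correct,
    k attempts allowed. Returns the probability that at least one of
    k randomly chosen samples is correct."""
    if n - c < k:
        # Fewer incorrect samples than attempts: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 4 correct, single attempt -> 4/10.
print(pass_at_k(10, 4, 1))  # 0.4
```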

