Add evaluation results from model card benchmark tables

#21
by SaylorTwift HF Staff - opened
Files changed (1)
  1. .eval_results/vibethinker-3b.yaml +34 -0
.eval_results/vibethinker-3b.yaml ADDED
@@ -0,0 +1,34 @@
+ # Evaluation results for WeiboAI/VibeThinker-3B
+ # Extracted from the model card benchmark tables (pictures/VibeThiinker-3B.png, pictures/VibeThinker-3B+CLR.png)
+ # https://huggingface.co/WeiboAI/VibeThinker-3B
+ # Paper: https://huggingface.co/papers/2606.16140
+
+ # ---------------------------------------------------------------------------
+ # Mathematics
+ # ---------------------------------------------------------------------------
+
+ # AIME 2026 - 94.3
+ - dataset:
+     id: MathArena/aime_2026
+     task_id: MathArena/aime_2026
+   value: 94.3
+   date: "2026-06-19"
+   source:
+     url: https://huggingface.co/WeiboAI/VibeThinker-3B
+     name: VibeThinker-3B model card evaluation table
+   notes: "Evaluated with vLLM, temperature=1.0, top_p=0.95, top_k=-1."
+
+ # ---------------------------------------------------------------------------
+ # Knowledge / Reasoning
+ # ---------------------------------------------------------------------------
+
+ # GPQA Diamond - 70.2
+ - dataset:
+     id: Idavidrein/gpqa
+     task_id: diamond
+   value: 70.2
+   date: "2026-06-19"
+   source:
+     url: https://huggingface.co/WeiboAI/VibeThinker-3B
+     name: VibeThinker-3B model card evaluation table
+   notes: "Evaluated with vLLM, temperature=1.0, top_p=0.95, top_k=-1."
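
For reviewers, the two entries in the diff above can be sanity-checked before merging. The following is a minimal sketch, not the Hub's own validator: the entries are written out as plain Python dicts mirroring the YAML, and `REQUIRED_KEYS`, `validate_entry`, and the 0 to 100 score range are hypothetical choices made for illustration.

```python
from datetime import date

# Assumption: these are the keys each entry is expected to carry,
# inferred from the two entries in this PR rather than from a schema.
REQUIRED_KEYS = {"dataset", "value", "date", "source"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one evaluation entry."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    dataset = entry["dataset"]
    if not dataset.get("id") or not dataset.get("task_id"):
        problems.append("dataset needs both id and task_id")
    # Assumption: scores are percentages in [0, 100].
    value = entry["value"]
    if not isinstance(value, (int, float)) or not 0 <= value <= 100:
        problems.append("value should be a number between 0 and 100")
    try:
        date.fromisoformat(entry["date"])
    except (TypeError, ValueError):
        problems.append("date should be an ISO string such as 2026-06-19")
    if not str(entry["source"].get("url", "")).startswith("https://"):
        problems.append("source.url should be an https URL")
    return problems


# The two entries from this PR, transcribed by hand.
NOTES = "Evaluated with vLLM, temperature=1.0, top_p=0.95, top_k=-1."
SOURCE = {
    "url": "https://huggingface.co/WeiboAI/VibeThinker-3B",
    "name": "VibeThinker-3B model card evaluation table",
}
entries = [
    {
        "dataset": {"id": "MathArena/aime_2026", "task_id": "MathArena/aime_2026"},
        "value": 94.3,
        "date": "2026-06-19",
        "source": SOURCE,
        "notes": NOTES,
    },
    {
        "dataset": {"id": "Idavidrein/gpqa", "task_id": "diamond"},
        "value": 70.2,
        "date": "2026-06-19",
        "source": SOURCE,
        "notes": NOTES,
    },
]

for entry in entries:
    print(entry["dataset"]["id"], validate_entry(entry) or "ok")
```

The check only covers the shape of each entry. It does not confirm that the scores match the model card images.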