Delores-Lin committed on
Commit e51b498 · verified · 1 parent: 808e461

Add MDPBench evaluation results


Adds MDPBench benchmark results. Keeps source attribution on the overall leaderboard entry.

Files changed (1)
  1. .eval_results/mdpbench.yaml +135 -0
.eval_results/mdpbench.yaml ADDED
@@ -0,0 +1,135 @@
```yaml
- dataset:
    id: Delores-Lin/MDPBench
    task_id: overall
  value: 79.7
  date: "2026-06-08"
  source:
    url: https://huggingface.co/datasets/Delores-Lin/MDPBench
    name: MDPBench leaderboard
    user: Delores-Lin

- dataset:
    id: Delores-Lin/MDPBench
    task_id: digital
  value: 87.8
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: photographed
  value: 77.1
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: latin
  value: 82.7
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: de
  value: 86.6
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: en
  value: 86.5
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: es
  value: 69.7
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: fr
  value: 70.3
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: id
  value: 84.6
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: it
  value: 87.4
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: nl
  value: 82.7
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: pt
  value: 90.7
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: vi
  value: 85.6
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: non_latin
  value: 76.4
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: ar
  value: 78.2
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: hi
  value: 81.1
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: jp
  value: 68.8
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: ko
  value: 80.3
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: ru
  value: 74.0
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: th
  value: 78.5
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: zh
  value: 73.8
  date: "2026-06-08"

- dataset:
    id: Delores-Lin/MDPBench
    task_id: zh_t
  value: 76.3
  date: "2026-06-08"
```
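The per-language scores in this file appear to roll up as unweighted means. `latin` matches the mean of the nine Latin-script languages, `non_latin` matches the mean of the eight non-Latin ones, and `overall` matches the mean of all seventeen, each to one decimal place. This is an observation from the numbers here, not a documented MDPBench definition. The split of language codes into the two groups is an assumption inferred from the entry order. `digital` and `photographed` are left out: their simple mean (82.45) does not match `overall`, and this file does not state how they are weighted. The sketch below hardcodes the values instead of parsing the YAML, so it needs only the standard library. The names `LATIN`, `NON_LATIN`, `REPORTED`, and `recompute_aggregates` are hypothetical.

```python
from statistics import mean

# Per-language scores copied from .eval_results/mdpbench.yaml.
# ASSUMPTION: grouping into Latin / non-Latin scripts is inferred from entry order.
LATIN = {
    "de": 86.6, "en": 86.5, "es": 69.7, "fr": 70.3, "id": 84.6,
    "it": 87.4, "nl": 82.7, "pt": 90.7, "vi": 85.6,
}
NON_LATIN = {
    "ar": 78.2, "hi": 81.1, "jp": 68.8, "ko": 80.3,
    "ru": 74.0, "th": 78.5, "zh": 73.8, "zh_t": 76.3,
}
# Aggregate entries as reported in the same file.
REPORTED = {"latin": 82.7, "non_latin": 76.4, "overall": 79.7}


def recompute_aggregates() -> dict[str, float]:
    """Recompute the aggregates as unweighted means, rounded to one decimal.

    ASSUMPTION: unweighted averaging is inferred, not documented by MDPBench.
    """
    return {
        "latin": round(mean(LATIN.values()), 1),
        "non_latin": round(mean(NON_LATIN.values()), 1),
        "overall": round(mean([*LATIN.values(), *NON_LATIN.values()]), 1),
    }


if __name__ == "__main__":
    # Print each recomputed aggregate next to the value reported in the file.
    for task_id, value in recompute_aggregates().items():
        status = "matches" if value == REPORTED[task_id] else "differs from"
        print(f"{task_id}: {value} {status} reported {REPORTED[task_id]}")
```

If a later revision of the file changes one language score without updating the aggregates, this check would report "differs from" for the affected group.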