Yi30 commited on
Commit
f2ba3d4
·
verified ·
1 Parent(s): 24d23c0

Upload folder using huggingface_hub

Browse files
20250731_120452-unified-expand/configs/20250731_120452_340278.py ADDED
@@ -0,0 +1,1976 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ datasets = [
2
+ dict(
3
+ abbr='openai_humaneval',
4
+ eval_cfg=dict(
5
+ evaluator=dict(
6
+ type='ais_bench.benchmark.datasets.HumanEvalEvaluator'),
7
+ k=[
8
+ 1,
9
+ 10,
10
+ 100,
11
+ ],
12
+ pred_postprocessor=dict(
13
+ type='ais_bench.benchmark.datasets.humaneval_postprocess_v2')),
14
+ infer_cfg=dict(
15
+ inferencer=dict(
16
+ type='ais_bench.benchmark.openicl.icl_inferencer.GenInferencer'
17
+ ),
18
+ prompt_template=dict(
19
+ template=
20
+ 'You are an intelligent programming assistant to produce Python algorithmic solutions.\nCan you complete the following Python function?\n```python\n{prompt}\n```',
21
+ type=
22
+ 'ais_bench.benchmark.openicl.icl_prompt_template.PromptTemplate'
23
+ ),
24
+ retriever=dict(
25
+ type='ais_bench.benchmark.openicl.icl_retriever.ZeroRetriever')
26
+ ),
27
+ path='ais_bench/datasets/humaneval/human-eval-v2-20210705.jsonl',
28
+ reader_cfg=dict(
29
+ input_columns=[
30
+ 'prompt',
31
+ ],
32
+ output_column='task_id',
33
+ train_split='test'),
34
+ type='ais_bench.benchmark.datasets.HumanevalDataset'),
35
+ ]
36
+ eval = dict(
37
+ partitioner=dict(
38
+ out_dir='outputs/default/20250731_120452/results/',
39
+ type='ais_bench.benchmark.partitioners.naive.NaivePartitioner'),
40
+ runner=dict(
41
+ debug=True,
42
+ max_num_workers=1,
43
+ max_workers_per_gpu=1,
44
+ task=dict(
45
+ dump_details=True,
46
+ type='ais_bench.benchmark.tasks.openicl_eval.OpenICLEvalTask'),
47
+ type='ais_bench.benchmark.runners.local.LocalRunner'))
48
+ infer = dict(
49
+ partitioner=dict(
50
+ out_dir='outputs/default/20250731_120452/predictions/',
51
+ type='ais_bench.benchmark.partitioners.naive.NaivePartitioner'),
52
+ runner=dict(
53
+ debug=True,
54
+ disable_cb=False,
55
+ max_num_workers=1,
56
+ num_prompts=None,
57
+ task=dict(
58
+ type='ais_bench.benchmark.tasks.openicl_infer.OpenICLInferTask'),
59
+ type='ais_bench.benchmark.runners.local_api.LocalAPIRunner'))
60
+ models = [
61
+ dict(
62
+ abbr='vllm-api-general-chat',
63
+ attr='service',
64
+ batch_size=256,
65
+ generation_kwargs=dict(
66
+ repetition_penalty=1.03,
67
+ seed=None,
68
+ temperature=0.6,
69
+ top_k=64,
70
+ top_p=0.95),
71
+ host_ip='127.0.0.1',
72
+ host_port=8688,
73
+ max_out_len=16384,
74
+ model='/mnt/disk7/yiliu4/DeepSeek-R1-0528-G2-2nd/',
75
+ path='',
76
+ request_rate=0,
77
+ retry=2,
78
+ trust_remote_code=True,
79
+ type='ais_bench.benchmark.models.VLLMCustomAPIChat'),
80
+ ]
81
+ summarizer = dict(summary_groups=[
82
+ dict(
83
+ name='agieval-chinese',
84
+ subsets=[
85
+ 'agieval-gaokao-chinese',
86
+ 'agieval-gaokao-english',
87
+ 'agieval-gaokao-geography',
88
+ 'agieval-gaokao-history',
89
+ 'agieval-gaokao-biology',
90
+ 'agieval-gaokao-chemistry',
91
+ 'agieval-gaokao-physics',
92
+ 'agieval-gaokao-mathqa',
93
+ 'agieval-logiqa-zh',
94
+ 'agieval-jec-qa-kd',
95
+ 'agieval-jec-qa-ca',
96
+ 'agieval-gaokao-mathcloze',
97
+ ]),
98
+ dict(
99
+ name='agieval-english',
100
+ subsets=[
101
+ 'agieval-lsat-ar',
102
+ 'agieval-lsat-lr',
103
+ 'agieval-lsat-rc',
104
+ 'agieval-logiqa-en',
105
+ 'agieval-sat-math',
106
+ 'agieval-sat-en',
107
+ 'agieval-sat-en-without-passage',
108
+ 'agieval-aqua-rat',
109
+ 'agieval-math',
110
+ ]),
111
+ dict(
112
+ name='agieval-gaokao',
113
+ subsets=[
114
+ 'agieval-gaokao-chinese',
115
+ 'agieval-gaokao-english',
116
+ 'agieval-gaokao-geography',
117
+ 'agieval-gaokao-history',
118
+ 'agieval-gaokao-biology',
119
+ 'agieval-gaokao-chemistry',
120
+ 'agieval-gaokao-physics',
121
+ 'agieval-gaokao-mathqa',
122
+ 'agieval-gaokao-mathcloze',
123
+ ]),
124
+ dict(
125
+ name='agieval',
126
+ subsets=[
127
+ 'agieval-gaokao-chinese',
128
+ 'agieval-gaokao-english',
129
+ 'agieval-gaokao-geography',
130
+ 'agieval-gaokao-history',
131
+ 'agieval-gaokao-biology',
132
+ 'agieval-gaokao-chemistry',
133
+ 'agieval-gaokao-physics',
134
+ 'agieval-gaokao-mathqa',
135
+ 'agieval-logiqa-zh',
136
+ 'agieval-lsat-ar',
137
+ 'agieval-lsat-lr',
138
+ 'agieval-lsat-rc',
139
+ 'agieval-logiqa-en',
140
+ 'agieval-sat-math',
141
+ 'agieval-sat-en',
142
+ 'agieval-sat-en-without-passage',
143
+ 'agieval-aqua-rat',
144
+ 'agieval-jec-qa-kd',
145
+ 'agieval-jec-qa-ca',
146
+ 'agieval-gaokao-mathcloze',
147
+ 'agieval-math',
148
+ ]),
149
+ dict(
150
+ name='mmlu-humanities',
151
+ subsets=[
152
+ 'lukaemon_mmlu_formal_logic',
153
+ 'lukaemon_mmlu_high_school_european_history',
154
+ 'lukaemon_mmlu_high_school_us_history',
155
+ 'lukaemon_mmlu_high_school_world_history',
156
+ 'lukaemon_mmlu_international_law',
157
+ 'lukaemon_mmlu_jurisprudence',
158
+ 'lukaemon_mmlu_logical_fallacies',
159
+ 'lukaemon_mmlu_moral_disputes',
160
+ 'lukaemon_mmlu_moral_scenarios',
161
+ 'lukaemon_mmlu_philosophy',
162
+ 'lukaemon_mmlu_prehistory',
163
+ 'lukaemon_mmlu_professional_law',
164
+ 'lukaemon_mmlu_world_religions',
165
+ ]),
166
+ dict(
167
+ name='mmlu-stem',
168
+ subsets=[
169
+ 'lukaemon_mmlu_abstract_algebra',
170
+ 'lukaemon_mmlu_anatomy',
171
+ 'lukaemon_mmlu_astronomy',
172
+ 'lukaemon_mmlu_college_biology',
173
+ 'lukaemon_mmlu_college_chemistry',
174
+ 'lukaemon_mmlu_college_computer_science',
175
+ 'lukaemon_mmlu_college_mathematics',
176
+ 'lukaemon_mmlu_college_physics',
177
+ 'lukaemon_mmlu_computer_security',
178
+ 'lukaemon_mmlu_conceptual_physics',
179
+ 'lukaemon_mmlu_electrical_engineering',
180
+ 'lukaemon_mmlu_elementary_mathematics',
181
+ 'lukaemon_mmlu_high_school_biology',
182
+ 'lukaemon_mmlu_high_school_chemistry',
183
+ 'lukaemon_mmlu_high_school_computer_science',
184
+ 'lukaemon_mmlu_high_school_mathematics',
185
+ 'lukaemon_mmlu_high_school_physics',
186
+ 'lukaemon_mmlu_high_school_statistics',
187
+ 'lukaemon_mmlu_machine_learning',
188
+ ]),
189
+ dict(
190
+ name='mmlu-social-science',
191
+ subsets=[
192
+ 'lukaemon_mmlu_econometrics',
193
+ 'lukaemon_mmlu_high_school_geography',
194
+ 'lukaemon_mmlu_high_school_government_and_politics',
195
+ 'lukaemon_mmlu_high_school_macroeconomics',
196
+ 'lukaemon_mmlu_high_school_microeconomics',
197
+ 'lukaemon_mmlu_high_school_psychology',
198
+ 'lukaemon_mmlu_human_sexuality',
199
+ 'lukaemon_mmlu_professional_psychology',
200
+ 'lukaemon_mmlu_public_relations',
201
+ 'lukaemon_mmlu_security_studies',
202
+ 'lukaemon_mmlu_sociology',
203
+ 'lukaemon_mmlu_us_foreign_policy',
204
+ ]),
205
+ dict(
206
+ name='mmlu-other',
207
+ subsets=[
208
+ 'lukaemon_mmlu_business_ethics',
209
+ 'lukaemon_mmlu_clinical_knowledge',
210
+ 'lukaemon_mmlu_college_medicine',
211
+ 'lukaemon_mmlu_global_facts',
212
+ 'lukaemon_mmlu_human_aging',
213
+ 'lukaemon_mmlu_management',
214
+ 'lukaemon_mmlu_marketing',
215
+ 'lukaemon_mmlu_medical_genetics',
216
+ 'lukaemon_mmlu_miscellaneous',
217
+ 'lukaemon_mmlu_nutrition',
218
+ 'lukaemon_mmlu_professional_accounting',
219
+ 'lukaemon_mmlu_professional_medicine',
220
+ 'lukaemon_mmlu_virology',
221
+ ]),
222
+ dict(
223
+ name='mmlu',
224
+ subsets=[
225
+ 'lukaemon_mmlu_formal_logic',
226
+ 'lukaemon_mmlu_high_school_european_history',
227
+ 'lukaemon_mmlu_high_school_us_history',
228
+ 'lukaemon_mmlu_high_school_world_history',
229
+ 'lukaemon_mmlu_international_law',
230
+ 'lukaemon_mmlu_jurisprudence',
231
+ 'lukaemon_mmlu_logical_fallacies',
232
+ 'lukaemon_mmlu_moral_disputes',
233
+ 'lukaemon_mmlu_moral_scenarios',
234
+ 'lukaemon_mmlu_philosophy',
235
+ 'lukaemon_mmlu_prehistory',
236
+ 'lukaemon_mmlu_professional_law',
237
+ 'lukaemon_mmlu_world_religions',
238
+ 'lukaemon_mmlu_abstract_algebra',
239
+ 'lukaemon_mmlu_anatomy',
240
+ 'lukaemon_mmlu_astronomy',
241
+ 'lukaemon_mmlu_college_biology',
242
+ 'lukaemon_mmlu_college_chemistry',
243
+ 'lukaemon_mmlu_college_computer_science',
244
+ 'lukaemon_mmlu_college_mathematics',
245
+ 'lukaemon_mmlu_college_physics',
246
+ 'lukaemon_mmlu_computer_security',
247
+ 'lukaemon_mmlu_conceptual_physics',
248
+ 'lukaemon_mmlu_electrical_engineering',
249
+ 'lukaemon_mmlu_elementary_mathematics',
250
+ 'lukaemon_mmlu_high_school_biology',
251
+ 'lukaemon_mmlu_high_school_chemistry',
252
+ 'lukaemon_mmlu_high_school_computer_science',
253
+ 'lukaemon_mmlu_high_school_mathematics',
254
+ 'lukaemon_mmlu_high_school_physics',
255
+ 'lukaemon_mmlu_high_school_statistics',
256
+ 'lukaemon_mmlu_machine_learning',
257
+ 'lukaemon_mmlu_econometrics',
258
+ 'lukaemon_mmlu_high_school_geography',
259
+ 'lukaemon_mmlu_high_school_government_and_politics',
260
+ 'lukaemon_mmlu_high_school_macroeconomics',
261
+ 'lukaemon_mmlu_high_school_microeconomics',
262
+ 'lukaemon_mmlu_high_school_psychology',
263
+ 'lukaemon_mmlu_human_sexuality',
264
+ 'lukaemon_mmlu_professional_psychology',
265
+ 'lukaemon_mmlu_public_relations',
266
+ 'lukaemon_mmlu_security_studies',
267
+ 'lukaemon_mmlu_sociology',
268
+ 'lukaemon_mmlu_us_foreign_policy',
269
+ 'lukaemon_mmlu_business_ethics',
270
+ 'lukaemon_mmlu_clinical_knowledge',
271
+ 'lukaemon_mmlu_college_medicine',
272
+ 'lukaemon_mmlu_global_facts',
273
+ 'lukaemon_mmlu_human_aging',
274
+ 'lukaemon_mmlu_management',
275
+ 'lukaemon_mmlu_marketing',
276
+ 'lukaemon_mmlu_medical_genetics',
277
+ 'lukaemon_mmlu_miscellaneous',
278
+ 'lukaemon_mmlu_nutrition',
279
+ 'lukaemon_mmlu_professional_accounting',
280
+ 'lukaemon_mmlu_professional_medicine',
281
+ 'lukaemon_mmlu_virology',
282
+ ]),
283
+ dict(
284
+ name='mmlu-weighted',
285
+ subsets=[
286
+ 'lukaemon_mmlu_formal_logic',
287
+ 'lukaemon_mmlu_high_school_european_history',
288
+ 'lukaemon_mmlu_high_school_us_history',
289
+ 'lukaemon_mmlu_high_school_world_history',
290
+ 'lukaemon_mmlu_international_law',
291
+ 'lukaemon_mmlu_jurisprudence',
292
+ 'lukaemon_mmlu_logical_fallacies',
293
+ 'lukaemon_mmlu_moral_disputes',
294
+ 'lukaemon_mmlu_moral_scenarios',
295
+ 'lukaemon_mmlu_philosophy',
296
+ 'lukaemon_mmlu_prehistory',
297
+ 'lukaemon_mmlu_professional_law',
298
+ 'lukaemon_mmlu_world_religions',
299
+ 'lukaemon_mmlu_abstract_algebra',
300
+ 'lukaemon_mmlu_anatomy',
301
+ 'lukaemon_mmlu_astronomy',
302
+ 'lukaemon_mmlu_college_biology',
303
+ 'lukaemon_mmlu_college_chemistry',
304
+ 'lukaemon_mmlu_college_computer_science',
305
+ 'lukaemon_mmlu_college_mathematics',
306
+ 'lukaemon_mmlu_college_physics',
307
+ 'lukaemon_mmlu_computer_security',
308
+ 'lukaemon_mmlu_conceptual_physics',
309
+ 'lukaemon_mmlu_electrical_engineering',
310
+ 'lukaemon_mmlu_elementary_mathematics',
311
+ 'lukaemon_mmlu_high_school_biology',
312
+ 'lukaemon_mmlu_high_school_chemistry',
313
+ 'lukaemon_mmlu_high_school_computer_science',
314
+ 'lukaemon_mmlu_high_school_mathematics',
315
+ 'lukaemon_mmlu_high_school_physics',
316
+ 'lukaemon_mmlu_high_school_statistics',
317
+ 'lukaemon_mmlu_machine_learning',
318
+ 'lukaemon_mmlu_econometrics',
319
+ 'lukaemon_mmlu_high_school_geography',
320
+ 'lukaemon_mmlu_high_school_government_and_politics',
321
+ 'lukaemon_mmlu_high_school_macroeconomics',
322
+ 'lukaemon_mmlu_high_school_microeconomics',
323
+ 'lukaemon_mmlu_high_school_psychology',
324
+ 'lukaemon_mmlu_human_sexuality',
325
+ 'lukaemon_mmlu_professional_psychology',
326
+ 'lukaemon_mmlu_public_relations',
327
+ 'lukaemon_mmlu_security_studies',
328
+ 'lukaemon_mmlu_sociology',
329
+ 'lukaemon_mmlu_us_foreign_policy',
330
+ 'lukaemon_mmlu_business_ethics',
331
+ 'lukaemon_mmlu_clinical_knowledge',
332
+ 'lukaemon_mmlu_college_medicine',
333
+ 'lukaemon_mmlu_global_facts',
334
+ 'lukaemon_mmlu_human_aging',
335
+ 'lukaemon_mmlu_management',
336
+ 'lukaemon_mmlu_marketing',
337
+ 'lukaemon_mmlu_medical_genetics',
338
+ 'lukaemon_mmlu_miscellaneous',
339
+ 'lukaemon_mmlu_nutrition',
340
+ 'lukaemon_mmlu_professional_accounting',
341
+ 'lukaemon_mmlu_professional_medicine',
342
+ 'lukaemon_mmlu_virology',
343
+ ],
344
+ weights=dict(
345
+ lukaemon_mmlu_abstract_algebra=100,
346
+ lukaemon_mmlu_anatomy=135,
347
+ lukaemon_mmlu_astronomy=152,
348
+ lukaemon_mmlu_business_ethics=100,
349
+ lukaemon_mmlu_clinical_knowledge=265,
350
+ lukaemon_mmlu_college_biology=144,
351
+ lukaemon_mmlu_college_chemistry=100,
352
+ lukaemon_mmlu_college_computer_science=100,
353
+ lukaemon_mmlu_college_mathematics=100,
354
+ lukaemon_mmlu_college_medicine=173,
355
+ lukaemon_mmlu_college_physics=102,
356
+ lukaemon_mmlu_computer_security=100,
357
+ lukaemon_mmlu_conceptual_physics=235,
358
+ lukaemon_mmlu_econometrics=114,
359
+ lukaemon_mmlu_electrical_engineering=145,
360
+ lukaemon_mmlu_elementary_mathematics=378,
361
+ lukaemon_mmlu_formal_logic=126,
362
+ lukaemon_mmlu_global_facts=100,
363
+ lukaemon_mmlu_high_school_biology=310,
364
+ lukaemon_mmlu_high_school_chemistry=203,
365
+ lukaemon_mmlu_high_school_computer_science=100,
366
+ lukaemon_mmlu_high_school_european_history=165,
367
+ lukaemon_mmlu_high_school_geography=198,
368
+ lukaemon_mmlu_high_school_government_and_politics=193,
369
+ lukaemon_mmlu_high_school_macroeconomics=390,
370
+ lukaemon_mmlu_high_school_mathematics=270,
371
+ lukaemon_mmlu_high_school_microeconomics=238,
372
+ lukaemon_mmlu_high_school_physics=151,
373
+ lukaemon_mmlu_high_school_psychology=545,
374
+ lukaemon_mmlu_high_school_statistics=216,
375
+ lukaemon_mmlu_high_school_us_history=204,
376
+ lukaemon_mmlu_high_school_world_history=237,
377
+ lukaemon_mmlu_human_aging=223,
378
+ lukaemon_mmlu_human_sexuality=131,
379
+ lukaemon_mmlu_international_law=121,
380
+ lukaemon_mmlu_jurisprudence=108,
381
+ lukaemon_mmlu_logical_fallacies=163,
382
+ lukaemon_mmlu_machine_learning=112,
383
+ lukaemon_mmlu_management=103,
384
+ lukaemon_mmlu_marketing=234,
385
+ lukaemon_mmlu_medical_genetics=100,
386
+ lukaemon_mmlu_miscellaneous=783,
387
+ lukaemon_mmlu_moral_disputes=346,
388
+ lukaemon_mmlu_moral_scenarios=895,
389
+ lukaemon_mmlu_nutrition=306,
390
+ lukaemon_mmlu_philosophy=311,
391
+ lukaemon_mmlu_prehistory=324,
392
+ lukaemon_mmlu_professional_accounting=282,
393
+ lukaemon_mmlu_professional_law=1534,
394
+ lukaemon_mmlu_professional_medicine=272,
395
+ lukaemon_mmlu_professional_psychology=612,
396
+ lukaemon_mmlu_public_relations=110,
397
+ lukaemon_mmlu_security_studies=245,
398
+ lukaemon_mmlu_sociology=201,
399
+ lukaemon_mmlu_us_foreign_policy=100,
400
+ lukaemon_mmlu_virology=166,
401
+ lukaemon_mmlu_world_religions=171)),
402
+ dict(
403
+ name='cmmlu-humanities',
404
+ subsets=[
405
+ 'cmmlu-arts',
406
+ 'cmmlu-chinese_history',
407
+ 'cmmlu-chinese_literature',
408
+ 'cmmlu-college_law',
409
+ 'cmmlu-global_facts',
410
+ 'cmmlu-international_law',
411
+ 'cmmlu-jurisprudence',
412
+ 'cmmlu-logical',
413
+ 'cmmlu-marxist_theory',
414
+ 'cmmlu-philosophy',
415
+ 'cmmlu-professional_law',
416
+ 'cmmlu-world_history',
417
+ 'cmmlu-world_religions',
418
+ ]),
419
+ dict(
420
+ name='cmmlu-stem',
421
+ subsets=[
422
+ 'cmmlu-anatomy',
423
+ 'cmmlu-astronomy',
424
+ 'cmmlu-college_actuarial_science',
425
+ 'cmmlu-college_engineering_hydrology',
426
+ 'cmmlu-college_mathematics',
427
+ 'cmmlu-college_medical_statistics',
428
+ 'cmmlu-computer_science',
429
+ 'cmmlu-conceptual_physics',
430
+ 'cmmlu-electrical_engineering',
431
+ 'cmmlu-elementary_mathematics',
432
+ 'cmmlu-genetics',
433
+ 'cmmlu-high_school_biology',
434
+ 'cmmlu-high_school_chemistry',
435
+ 'cmmlu-high_school_mathematics',
436
+ 'cmmlu-high_school_physics',
437
+ 'cmmlu-machine_learning',
438
+ 'cmmlu-virology',
439
+ ]),
440
+ dict(
441
+ name='cmmlu-social-science',
442
+ subsets=[
443
+ 'cmmlu-ancient_chinese',
444
+ 'cmmlu-business_ethics',
445
+ 'cmmlu-chinese_civil_service_exam',
446
+ 'cmmlu-chinese_food_culture',
447
+ 'cmmlu-chinese_foreign_policy',
448
+ 'cmmlu-chinese_teacher_qualification',
449
+ 'cmmlu-college_education',
450
+ 'cmmlu-economics',
451
+ 'cmmlu-education',
452
+ 'cmmlu-elementary_chinese',
453
+ 'cmmlu-ethnology',
454
+ 'cmmlu-high_school_geography',
455
+ 'cmmlu-high_school_politics',
456
+ 'cmmlu-journalism',
457
+ 'cmmlu-management',
458
+ 'cmmlu-marketing',
459
+ 'cmmlu-modern_chinese',
460
+ 'cmmlu-professional_accounting',
461
+ 'cmmlu-professional_psychology',
462
+ 'cmmlu-public_relations',
463
+ 'cmmlu-security_study',
464
+ 'cmmlu-sociology',
465
+ ]),
466
+ dict(
467
+ name='cmmlu-other',
468
+ subsets=[
469
+ 'cmmlu-agronomy',
470
+ 'cmmlu-chinese_driving_rule',
471
+ 'cmmlu-clinical_knowledge',
472
+ 'cmmlu-college_medicine',
473
+ 'cmmlu-computer_security',
474
+ 'cmmlu-construction_project_management',
475
+ 'cmmlu-elementary_commonsense',
476
+ 'cmmlu-elementary_information_and_technology',
477
+ 'cmmlu-food_science',
478
+ 'cmmlu-human_sexuality',
479
+ 'cmmlu-legal_and_moral_basis',
480
+ 'cmmlu-nutrition',
481
+ 'cmmlu-professional_medicine',
482
+ 'cmmlu-sports_science',
483
+ 'cmmlu-traditional_chinese_medicine',
484
+ ]),
485
+ dict(
486
+ name='cmmlu-china-specific',
487
+ subsets=[
488
+ 'cmmlu-ancient_chinese',
489
+ 'cmmlu-chinese_civil_service_exam',
490
+ 'cmmlu-chinese_driving_rule',
491
+ 'cmmlu-chinese_food_culture',
492
+ 'cmmlu-chinese_foreign_policy',
493
+ 'cmmlu-chinese_history',
494
+ 'cmmlu-chinese_literature',
495
+ 'cmmlu-chinese_teacher_qualification',
496
+ 'cmmlu-construction_project_management',
497
+ 'cmmlu-elementary_chinese',
498
+ 'cmmlu-elementary_commonsense',
499
+ 'cmmlu-ethnology',
500
+ 'cmmlu-high_school_politics',
501
+ 'cmmlu-modern_chinese',
502
+ 'cmmlu-traditional_chinese_medicine',
503
+ ]),
504
+ dict(
505
+ name='cmmlu',
506
+ subsets=[
507
+ 'cmmlu-agronomy',
508
+ 'cmmlu-anatomy',
509
+ 'cmmlu-ancient_chinese',
510
+ 'cmmlu-arts',
511
+ 'cmmlu-astronomy',
512
+ 'cmmlu-business_ethics',
513
+ 'cmmlu-chinese_civil_service_exam',
514
+ 'cmmlu-chinese_driving_rule',
515
+ 'cmmlu-chinese_food_culture',
516
+ 'cmmlu-chinese_foreign_policy',
517
+ 'cmmlu-chinese_history',
518
+ 'cmmlu-chinese_literature',
519
+ 'cmmlu-chinese_teacher_qualification',
520
+ 'cmmlu-college_actuarial_science',
521
+ 'cmmlu-college_education',
522
+ 'cmmlu-college_engineering_hydrology',
523
+ 'cmmlu-college_law',
524
+ 'cmmlu-college_mathematics',
525
+ 'cmmlu-college_medical_statistics',
526
+ 'cmmlu-clinical_knowledge',
527
+ 'cmmlu-college_medicine',
528
+ 'cmmlu-computer_science',
529
+ 'cmmlu-computer_security',
530
+ 'cmmlu-conceptual_physics',
531
+ 'cmmlu-construction_project_management',
532
+ 'cmmlu-economics',
533
+ 'cmmlu-education',
534
+ 'cmmlu-elementary_chinese',
535
+ 'cmmlu-elementary_commonsense',
536
+ 'cmmlu-elementary_information_and_technology',
537
+ 'cmmlu-electrical_engineering',
538
+ 'cmmlu-elementary_mathematics',
539
+ 'cmmlu-ethnology',
540
+ 'cmmlu-food_science',
541
+ 'cmmlu-genetics',
542
+ 'cmmlu-global_facts',
543
+ 'cmmlu-high_school_biology',
544
+ 'cmmlu-high_school_chemistry',
545
+ 'cmmlu-high_school_geography',
546
+ 'cmmlu-high_school_mathematics',
547
+ 'cmmlu-high_school_physics',
548
+ 'cmmlu-high_school_politics',
549
+ 'cmmlu-human_sexuality',
550
+ 'cmmlu-international_law',
551
+ 'cmmlu-journalism',
552
+ 'cmmlu-jurisprudence',
553
+ 'cmmlu-legal_and_moral_basis',
554
+ 'cmmlu-logical',
555
+ 'cmmlu-machine_learning',
556
+ 'cmmlu-management',
557
+ 'cmmlu-marketing',
558
+ 'cmmlu-marxist_theory',
559
+ 'cmmlu-modern_chinese',
560
+ 'cmmlu-nutrition',
561
+ 'cmmlu-philosophy',
562
+ 'cmmlu-professional_accounting',
563
+ 'cmmlu-professional_law',
564
+ 'cmmlu-professional_medicine',
565
+ 'cmmlu-professional_psychology',
566
+ 'cmmlu-public_relations',
567
+ 'cmmlu-security_study',
568
+ 'cmmlu-sociology',
569
+ 'cmmlu-sports_science',
570
+ 'cmmlu-traditional_chinese_medicine',
571
+ 'cmmlu-virology',
572
+ 'cmmlu-world_history',
573
+ 'cmmlu-world_religions',
574
+ ]),
575
+ dict(
576
+ name='cmmlu-weighted',
577
+ subsets=[
578
+ 'cmmlu-agronomy',
579
+ 'cmmlu-anatomy',
580
+ 'cmmlu-ancient_chinese',
581
+ 'cmmlu-arts',
582
+ 'cmmlu-astronomy',
583
+ 'cmmlu-business_ethics',
584
+ 'cmmlu-chinese_civil_service_exam',
585
+ 'cmmlu-chinese_driving_rule',
586
+ 'cmmlu-chinese_food_culture',
587
+ 'cmmlu-chinese_foreign_policy',
588
+ 'cmmlu-chinese_history',
589
+ 'cmmlu-chinese_literature',
590
+ 'cmmlu-chinese_teacher_qualification',
591
+ 'cmmlu-college_actuarial_science',
592
+ 'cmmlu-college_education',
593
+ 'cmmlu-college_engineering_hydrology',
594
+ 'cmmlu-college_law',
595
+ 'cmmlu-college_mathematics',
596
+ 'cmmlu-college_medical_statistics',
597
+ 'cmmlu-clinical_knowledge',
598
+ 'cmmlu-college_medicine',
599
+ 'cmmlu-computer_science',
600
+ 'cmmlu-computer_security',
601
+ 'cmmlu-conceptual_physics',
602
+ 'cmmlu-construction_project_management',
603
+ 'cmmlu-economics',
604
+ 'cmmlu-education',
605
+ 'cmmlu-elementary_chinese',
606
+ 'cmmlu-elementary_commonsense',
607
+ 'cmmlu-elementary_information_and_technology',
608
+ 'cmmlu-electrical_engineering',
609
+ 'cmmlu-elementary_mathematics',
610
+ 'cmmlu-ethnology',
611
+ 'cmmlu-food_science',
612
+ 'cmmlu-genetics',
613
+ 'cmmlu-global_facts',
614
+ 'cmmlu-high_school_biology',
615
+ 'cmmlu-high_school_chemistry',
616
+ 'cmmlu-high_school_geography',
617
+ 'cmmlu-high_school_mathematics',
618
+ 'cmmlu-high_school_physics',
619
+ 'cmmlu-high_school_politics',
620
+ 'cmmlu-human_sexuality',
621
+ 'cmmlu-international_law',
622
+ 'cmmlu-journalism',
623
+ 'cmmlu-jurisprudence',
624
+ 'cmmlu-legal_and_moral_basis',
625
+ 'cmmlu-logical',
626
+ 'cmmlu-machine_learning',
627
+ 'cmmlu-management',
628
+ 'cmmlu-marketing',
629
+ 'cmmlu-marxist_theory',
630
+ 'cmmlu-modern_chinese',
631
+ 'cmmlu-nutrition',
632
+ 'cmmlu-philosophy',
633
+ 'cmmlu-professional_accounting',
634
+ 'cmmlu-professional_law',
635
+ 'cmmlu-professional_medicine',
636
+ 'cmmlu-professional_psychology',
637
+ 'cmmlu-public_relations',
638
+ 'cmmlu-security_study',
639
+ 'cmmlu-sociology',
640
+ 'cmmlu-sports_science',
641
+ 'cmmlu-traditional_chinese_medicine',
642
+ 'cmmlu-virology',
643
+ 'cmmlu-world_history',
644
+ 'cmmlu-world_religions',
645
+ ],
646
+ weights=dict({
647
+ 'cmmlu-agronomy': 169,
648
+ 'cmmlu-anatomy': 148,
649
+ 'cmmlu-ancient_chinese': 164,
650
+ 'cmmlu-arts': 160,
651
+ 'cmmlu-astronomy': 165,
652
+ 'cmmlu-business_ethics': 209,
653
+ 'cmmlu-chinese_civil_service_exam': 160,
654
+ 'cmmlu-chinese_driving_rule': 131,
655
+ 'cmmlu-chinese_food_culture': 136,
656
+ 'cmmlu-chinese_foreign_policy': 107,
657
+ 'cmmlu-chinese_history': 323,
658
+ 'cmmlu-chinese_literature': 204,
659
+ 'cmmlu-chinese_teacher_qualification': 179,
660
+ 'cmmlu-clinical_knowledge': 237,
661
+ 'cmmlu-college_actuarial_science': 106,
662
+ 'cmmlu-college_education': 107,
663
+ 'cmmlu-college_engineering_hydrology': 106,
664
+ 'cmmlu-college_law': 108,
665
+ 'cmmlu-college_mathematics': 105,
666
+ 'cmmlu-college_medical_statistics': 106,
667
+ 'cmmlu-college_medicine': 273,
668
+ 'cmmlu-computer_science': 204,
669
+ 'cmmlu-computer_security': 171,
670
+ 'cmmlu-conceptual_physics': 147,
671
+ 'cmmlu-construction_project_management': 139,
672
+ 'cmmlu-economics': 159,
673
+ 'cmmlu-education': 163,
674
+ 'cmmlu-electrical_engineering': 172,
675
+ 'cmmlu-elementary_chinese': 252,
676
+ 'cmmlu-elementary_commonsense': 198,
677
+ 'cmmlu-elementary_information_and_technology': 238,
678
+ 'cmmlu-elementary_mathematics': 230,
679
+ 'cmmlu-ethnology': 135,
680
+ 'cmmlu-food_science': 143,
681
+ 'cmmlu-genetics': 176,
682
+ 'cmmlu-global_facts': 149,
683
+ 'cmmlu-high_school_biology': 169,
684
+ 'cmmlu-high_school_chemistry': 132,
685
+ 'cmmlu-high_school_geography': 118,
686
+ 'cmmlu-high_school_mathematics': 164,
687
+ 'cmmlu-high_school_physics': 110,
688
+ 'cmmlu-high_school_politics': 143,
689
+ 'cmmlu-human_sexuality': 126,
690
+ 'cmmlu-international_law': 185,
691
+ 'cmmlu-journalism': 172,
692
+ 'cmmlu-jurisprudence': 411,
693
+ 'cmmlu-legal_and_moral_basis': 214,
694
+ 'cmmlu-logical': 123,
695
+ 'cmmlu-machine_learning': 122,
696
+ 'cmmlu-management': 210,
697
+ 'cmmlu-marketing': 180,
698
+ 'cmmlu-marxist_theory': 189,
699
+ 'cmmlu-modern_chinese': 116,
700
+ 'cmmlu-nutrition': 145,
701
+ 'cmmlu-philosophy': 105,
702
+ 'cmmlu-professional_accounting': 175,
703
+ 'cmmlu-professional_law': 211,
704
+ 'cmmlu-professional_medicine': 376,
705
+ 'cmmlu-professional_psychology': 232,
706
+ 'cmmlu-public_relations': 174,
707
+ 'cmmlu-security_study': 135,
708
+ 'cmmlu-sociology': 226,
709
+ 'cmmlu-sports_science': 165,
710
+ 'cmmlu-traditional_chinese_medicine': 185,
711
+ 'cmmlu-virology': 169,
712
+ 'cmmlu-world_history': 161,
713
+ 'cmmlu-world_religions': 160
714
+ })),
715
+ dict(
716
+ name='ceval-stem',
717
+ subsets=[
718
+ 'ceval-computer_network',
719
+ 'ceval-operating_system',
720
+ 'ceval-computer_architecture',
721
+ 'ceval-college_programming',
722
+ 'ceval-college_physics',
723
+ 'ceval-college_chemistry',
724
+ 'ceval-advanced_mathematics',
725
+ 'ceval-probability_and_statistics',
726
+ 'ceval-discrete_mathematics',
727
+ 'ceval-electrical_engineer',
728
+ 'ceval-metrology_engineer',
729
+ 'ceval-high_school_mathematics',
730
+ 'ceval-high_school_physics',
731
+ 'ceval-high_school_chemistry',
732
+ 'ceval-high_school_biology',
733
+ 'ceval-middle_school_mathematics',
734
+ 'ceval-middle_school_biology',
735
+ 'ceval-middle_school_physics',
736
+ 'ceval-middle_school_chemistry',
737
+ 'ceval-veterinary_medicine',
738
+ ]),
739
+ dict(
740
+ name='ceval-social-science',
741
+ subsets=[
742
+ 'ceval-college_economics',
743
+ 'ceval-business_administration',
744
+ 'ceval-marxism',
745
+ 'ceval-mao_zedong_thought',
746
+ 'ceval-education_science',
747
+ 'ceval-teacher_qualification',
748
+ 'ceval-high_school_politics',
749
+ 'ceval-high_school_geography',
750
+ 'ceval-middle_school_politics',
751
+ 'ceval-middle_school_geography',
752
+ ]),
753
+ dict(
754
+ name='ceval-humanities',
755
+ subsets=[
756
+ 'ceval-modern_chinese_history',
757
+ 'ceval-ideological_and_moral_cultivation',
758
+ 'ceval-logic',
759
+ 'ceval-law',
760
+ 'ceval-chinese_language_and_literature',
761
+ 'ceval-art_studies',
762
+ 'ceval-professional_tour_guide',
763
+ 'ceval-legal_professional',
764
+ 'ceval-high_school_chinese',
765
+ 'ceval-high_school_history',
766
+ 'ceval-middle_school_history',
767
+ ]),
768
+ dict(
769
+ name='ceval-other',
770
+ subsets=[
771
+ 'ceval-civil_servant',
772
+ 'ceval-sports_science',
773
+ 'ceval-plant_protection',
774
+ 'ceval-basic_medicine',
775
+ 'ceval-clinical_medicine',
776
+ 'ceval-urban_and_rural_planner',
777
+ 'ceval-accountant',
778
+ 'ceval-fire_engineer',
779
+ 'ceval-environmental_impact_assessment_engineer',
780
+ 'ceval-tax_accountant',
781
+ 'ceval-physician',
782
+ ]),
783
+ dict(
784
+ name='ceval-hard',
785
+ subsets=[
786
+ 'ceval-advanced_mathematics',
787
+ 'ceval-discrete_mathematics',
788
+ 'ceval-probability_and_statistics',
789
+ 'ceval-college_chemistry',
790
+ 'ceval-college_physics',
791
+ 'ceval-high_school_mathematics',
792
+ 'ceval-high_school_chemistry',
793
+ 'ceval-high_school_physics',
794
+ ]),
795
+ dict(
796
+ name='ceval',
797
+ subsets=[
798
+ 'ceval-computer_network',
799
+ 'ceval-operating_system',
800
+ 'ceval-computer_architecture',
801
+ 'ceval-college_programming',
802
+ 'ceval-college_physics',
803
+ 'ceval-college_chemistry',
804
+ 'ceval-advanced_mathematics',
805
+ 'ceval-probability_and_statistics',
806
+ 'ceval-discrete_mathematics',
807
+ 'ceval-electrical_engineer',
808
+ 'ceval-metrology_engineer',
809
+ 'ceval-high_school_mathematics',
810
+ 'ceval-high_school_physics',
811
+ 'ceval-high_school_chemistry',
812
+ 'ceval-high_school_biology',
813
+ 'ceval-middle_school_mathematics',
814
+ 'ceval-middle_school_biology',
815
+ 'ceval-middle_school_physics',
816
+ 'ceval-middle_school_chemistry',
817
+ 'ceval-veterinary_medicine',
818
+ 'ceval-college_economics',
819
+ 'ceval-business_administration',
820
+ 'ceval-marxism',
821
+ 'ceval-mao_zedong_thought',
822
+ 'ceval-education_science',
823
+ 'ceval-teacher_qualification',
824
+ 'ceval-high_school_politics',
825
+ 'ceval-high_school_geography',
826
+ 'ceval-middle_school_politics',
827
+ 'ceval-middle_school_geography',
828
+ 'ceval-modern_chinese_history',
829
+ 'ceval-ideological_and_moral_cultivation',
830
+ 'ceval-logic',
831
+ 'ceval-law',
832
+ 'ceval-chinese_language_and_literature',
833
+ 'ceval-art_studies',
834
+ 'ceval-professional_tour_guide',
835
+ 'ceval-legal_professional',
836
+ 'ceval-high_school_chinese',
837
+ 'ceval-high_school_history',
838
+ 'ceval-middle_school_history',
839
+ 'ceval-civil_servant',
840
+ 'ceval-sports_science',
841
+ 'ceval-plant_protection',
842
+ 'ceval-basic_medicine',
843
+ 'ceval-clinical_medicine',
844
+ 'ceval-urban_and_rural_planner',
845
+ 'ceval-accountant',
846
+ 'ceval-fire_engineer',
847
+ 'ceval-environmental_impact_assessment_engineer',
848
+ 'ceval-tax_accountant',
849
+ 'ceval-physician',
850
+ ]),
851
+ dict(
852
+ name='ceval-weighted',
853
+ subsets=[
854
+ 'ceval-computer_network',
855
+ 'ceval-operating_system',
856
+ 'ceval-computer_architecture',
857
+ 'ceval-college_programming',
858
+ 'ceval-college_physics',
859
+ 'ceval-college_chemistry',
860
+ 'ceval-advanced_mathematics',
861
+ 'ceval-probability_and_statistics',
862
+ 'ceval-discrete_mathematics',
863
+ 'ceval-electrical_engineer',
864
+ 'ceval-metrology_engineer',
865
+ 'ceval-high_school_mathematics',
866
+ 'ceval-high_school_physics',
867
+ 'ceval-high_school_chemistry',
868
+ 'ceval-high_school_biology',
869
+ 'ceval-middle_school_mathematics',
870
+ 'ceval-middle_school_biology',
871
+ 'ceval-middle_school_physics',
872
+ 'ceval-middle_school_chemistry',
873
+ 'ceval-veterinary_medicine',
874
+ 'ceval-college_economics',
875
+ 'ceval-business_administration',
876
+ 'ceval-marxism',
877
+ 'ceval-mao_zedong_thought',
878
+ 'ceval-education_science',
879
+ 'ceval-teacher_qualification',
880
+ 'ceval-high_school_politics',
881
+ 'ceval-high_school_geography',
882
+ 'ceval-middle_school_politics',
883
+ 'ceval-middle_school_geography',
884
+ 'ceval-modern_chinese_history',
885
+ 'ceval-ideological_and_moral_cultivation',
886
+ 'ceval-logic',
887
+ 'ceval-law',
888
+ 'ceval-chinese_language_and_literature',
889
+ 'ceval-art_studies',
890
+ 'ceval-professional_tour_guide',
891
+ 'ceval-legal_professional',
892
+ 'ceval-high_school_chinese',
893
+ 'ceval-high_school_history',
894
+ 'ceval-middle_school_history',
895
+ 'ceval-civil_servant',
896
+ 'ceval-sports_science',
897
+ 'ceval-plant_protection',
898
+ 'ceval-basic_medicine',
899
+ 'ceval-clinical_medicine',
900
+ 'ceval-urban_and_rural_planner',
901
+ 'ceval-accountant',
902
+ 'ceval-fire_engineer',
903
+ 'ceval-environmental_impact_assessment_engineer',
904
+ 'ceval-tax_accountant',
905
+ 'ceval-physician',
906
+ ],
907
+ weights=dict({
908
+ 'ceval-accountant': 49,
909
+ 'ceval-advanced_mathematics': 19,
910
+ 'ceval-art_studies': 33,
911
+ 'ceval-basic_medicine': 19,
912
+ 'ceval-business_administration': 33,
913
+ 'ceval-chinese_language_and_literature': 23,
914
+ 'ceval-civil_servant': 47,
915
+ 'ceval-clinical_medicine': 22,
916
+ 'ceval-college_chemistry': 24,
917
+ 'ceval-college_economics': 55,
918
+ 'ceval-college_physics': 19,
919
+ 'ceval-college_programming': 37,
920
+ 'ceval-computer_architecture': 21,
921
+ 'ceval-computer_network': 19,
922
+ 'ceval-discrete_mathematics': 16,
923
+ 'ceval-education_science': 29,
924
+ 'ceval-electrical_engineer': 37,
925
+ 'ceval-environmental_impact_assessment_engineer': 31,
926
+ 'ceval-fire_engineer': 31,
927
+ 'ceval-high_school_biology': 19,
928
+ 'ceval-high_school_chemistry': 19,
929
+ 'ceval-high_school_chinese': 19,
930
+ 'ceval-high_school_geography': 19,
931
+ 'ceval-high_school_history': 20,
932
+ 'ceval-high_school_mathematics': 18,
933
+ 'ceval-high_school_physics': 19,
934
+ 'ceval-high_school_politics': 19,
935
+ 'ceval-ideological_and_moral_cultivation': 19,
936
+ 'ceval-law': 24,
937
+ 'ceval-legal_professional': 23,
938
+ 'ceval-logic': 22,
939
+ 'ceval-mao_zedong_thought': 24,
940
+ 'ceval-marxism': 19,
941
+ 'ceval-metrology_engineer': 24,
942
+ 'ceval-middle_school_biology': 21,
943
+ 'ceval-middle_school_chemistry': 20,
944
+ 'ceval-middle_school_geography': 12,
945
+ 'ceval-middle_school_history': 22,
946
+ 'ceval-middle_school_mathematics': 19,
947
+ 'ceval-middle_school_physics': 19,
948
+ 'ceval-middle_school_politics': 21,
949
+ 'ceval-modern_chinese_history': 23,
950
+ 'ceval-operating_system': 19,
951
+ 'ceval-physician': 49,
952
+ 'ceval-plant_protection': 22,
953
+ 'ceval-probability_and_statistics': 18,
954
+ 'ceval-professional_tour_guide': 29,
955
+ 'ceval-sports_science': 19,
956
+ 'ceval-tax_accountant': 49,
957
+ 'ceval-teacher_qualification': 44,
958
+ 'ceval-urban_and_rural_planner': 46,
959
+ 'ceval-veterinary_medicine': 23
960
+ })),
961
+ dict(
962
+ name='ceval-test-stem',
963
+ subsets=[
964
+ 'ceval-test-computer_network',
965
+ 'ceval-test-operating_system',
966
+ 'ceval-test-computer_architecture',
967
+ 'ceval-test-college_programming',
968
+ 'ceval-test-college_physics',
969
+ 'ceval-test-college_chemistry',
970
+ 'ceval-test-advanced_mathematics',
971
+ 'ceval-test-probability_and_statistics',
972
+ 'ceval-test-discrete_mathematics',
973
+ 'ceval-test-electrical_engineer',
974
+ 'ceval-test-metrology_engineer',
975
+ 'ceval-test-high_school_mathematics',
976
+ 'ceval-test-high_school_physics',
977
+ 'ceval-test-high_school_chemistry',
978
+ 'ceval-test-high_school_biology',
979
+ 'ceval-test-middle_school_mathematics',
980
+ 'ceval-test-middle_school_biology',
981
+ 'ceval-test-middle_school_physics',
982
+ 'ceval-test-middle_school_chemistry',
983
+ 'ceval-test-veterinary_medicine',
984
+ ]),
985
+ dict(
986
+ name='ceval-test-social-science',
987
+ subsets=[
988
+ 'ceval-test-college_economics',
989
+ 'ceval-test-business_administration',
990
+ 'ceval-test-marxism',
991
+ 'ceval-test-mao_zedong_thought',
992
+ 'ceval-test-education_science',
993
+ 'ceval-test-teacher_qualification',
994
+ 'ceval-test-high_school_politics',
995
+ 'ceval-test-high_school_geography',
996
+ 'ceval-test-middle_school_politics',
997
+ 'ceval-test-middle_school_geography',
998
+ ]),
999
+ dict(
1000
+ name='ceval-test-humanities',
1001
+ subsets=[
1002
+ 'ceval-test-modern_chinese_history',
1003
+ 'ceval-test-ideological_and_moral_cultivation',
1004
+ 'ceval-test-logic',
1005
+ 'ceval-test-law',
1006
+ 'ceval-test-chinese_language_and_literature',
1007
+ 'ceval-test-art_studies',
1008
+ 'ceval-test-professional_tour_guide',
1009
+ 'ceval-test-legal_professional',
1010
+ 'ceval-test-high_school_chinese',
1011
+ 'ceval-test-high_school_history',
1012
+ 'ceval-test-middle_school_history',
1013
+ ]),
1014
+ dict(
1015
+ name='ceval-test-other',
1016
+ subsets=[
1017
+ 'ceval-test-civil_servant',
1018
+ 'ceval-test-sports_science',
1019
+ 'ceval-test-plant_protection',
1020
+ 'ceval-test-basic_medicine',
1021
+ 'ceval-test-clinical_medicine',
1022
+ 'ceval-test-urban_and_rural_planner',
1023
+ 'ceval-test-accountant',
1024
+ 'ceval-test-fire_engineer',
1025
+ 'ceval-test-environmental_impact_assessment_engineer',
1026
+ 'ceval-test-tax_accountant',
1027
+ 'ceval-test-physician',
1028
+ ]),
1029
+ dict(
1030
+ name='ceval-test-hard',
1031
+ subsets=[
1032
+ 'ceval-test-advanced_mathematics',
1033
+ 'ceval-test-discrete_mathematics',
1034
+ 'ceval-test-probability_and_statistics',
1035
+ 'ceval-test-college_chemistry',
1036
+ 'ceval-test-college_physics',
1037
+ 'ceval-test-high_school_mathematics',
1038
+ 'ceval-test-high_school_chemistry',
1039
+ 'ceval-test-high_school_physics',
1040
+ ]),
1041
+ dict(
1042
+ name='ceval-test',
1043
+ subsets=[
1044
+ 'ceval-test-computer_network',
1045
+ 'ceval-test-operating_system',
1046
+ 'ceval-test-computer_architecture',
1047
+ 'ceval-test-college_programming',
1048
+ 'ceval-test-college_physics',
1049
+ 'ceval-test-college_chemistry',
1050
+ 'ceval-test-advanced_mathematics',
1051
+ 'ceval-test-probability_and_statistics',
1052
+ 'ceval-test-discrete_mathematics',
1053
+ 'ceval-test-electrical_engineer',
1054
+ 'ceval-test-metrology_engineer',
1055
+ 'ceval-test-high_school_mathematics',
1056
+ 'ceval-test-high_school_physics',
1057
+ 'ceval-test-high_school_chemistry',
1058
+ 'ceval-test-high_school_biology',
1059
+ 'ceval-test-middle_school_mathematics',
1060
+ 'ceval-test-middle_school_biology',
1061
+ 'ceval-test-middle_school_physics',
1062
+ 'ceval-test-middle_school_chemistry',
1063
+ 'ceval-test-veterinary_medicine',
1064
+ 'ceval-test-college_economics',
1065
+ 'ceval-test-business_administration',
1066
+ 'ceval-test-marxism',
1067
+ 'ceval-test-mao_zedong_thought',
1068
+ 'ceval-test-education_science',
1069
+ 'ceval-test-teacher_qualification',
1070
+ 'ceval-test-high_school_politics',
1071
+ 'ceval-test-high_school_geography',
1072
+ 'ceval-test-middle_school_politics',
1073
+ 'ceval-test-middle_school_geography',
1074
+ 'ceval-test-modern_chinese_history',
1075
+ 'ceval-test-ideological_and_moral_cultivation',
1076
+ 'ceval-test-logic',
1077
+ 'ceval-test-law',
1078
+ 'ceval-test-chinese_language_and_literature',
1079
+ 'ceval-test-art_studies',
1080
+ 'ceval-test-professional_tour_guide',
1081
+ 'ceval-test-legal_professional',
1082
+ 'ceval-test-high_school_chinese',
1083
+ 'ceval-test-high_school_history',
1084
+ 'ceval-test-middle_school_history',
1085
+ 'ceval-test-civil_servant',
1086
+ 'ceval-test-sports_science',
1087
+ 'ceval-test-plant_protection',
1088
+ 'ceval-test-basic_medicine',
1089
+ 'ceval-test-clinical_medicine',
1090
+ 'ceval-test-urban_and_rural_planner',
1091
+ 'ceval-test-accountant',
1092
+ 'ceval-test-fire_engineer',
1093
+ 'ceval-test-environmental_impact_assessment_engineer',
1094
+ 'ceval-test-tax_accountant',
1095
+ 'ceval-test-physician',
1096
+ ]),
1097
+ dict(
1098
+ name='ceval-test-weighted',
1099
+ subsets=[
1100
+ 'ceval-test-computer_network',
1101
+ 'ceval-test-operating_system',
1102
+ 'ceval-test-computer_architecture',
1103
+ 'ceval-test-college_programming',
1104
+ 'ceval-test-college_physics',
1105
+ 'ceval-test-college_chemistry',
1106
+ 'ceval-test-advanced_mathematics',
1107
+ 'ceval-test-probability_and_statistics',
1108
+ 'ceval-test-discrete_mathematics',
1109
+ 'ceval-test-electrical_engineer',
1110
+ 'ceval-test-metrology_engineer',
1111
+ 'ceval-test-high_school_mathematics',
1112
+ 'ceval-test-high_school_physics',
1113
+ 'ceval-test-high_school_chemistry',
1114
+ 'ceval-test-high_school_biology',
1115
+ 'ceval-test-middle_school_mathematics',
1116
+ 'ceval-test-middle_school_biology',
1117
+ 'ceval-test-middle_school_physics',
1118
+ 'ceval-test-middle_school_chemistry',
1119
+ 'ceval-test-veterinary_medicine',
1120
+ 'ceval-test-college_economics',
1121
+ 'ceval-test-business_administration',
1122
+ 'ceval-test-marxism',
1123
+ 'ceval-test-mao_zedong_thought',
1124
+ 'ceval-test-education_science',
1125
+ 'ceval-test-teacher_qualification',
1126
+ 'ceval-test-high_school_politics',
1127
+ 'ceval-test-high_school_geography',
1128
+ 'ceval-test-middle_school_politics',
1129
+ 'ceval-test-middle_school_geography',
1130
+ 'ceval-test-modern_chinese_history',
1131
+ 'ceval-test-ideological_and_moral_cultivation',
1132
+ 'ceval-test-logic',
1133
+ 'ceval-test-law',
1134
+ 'ceval-test-chinese_language_and_literature',
1135
+ 'ceval-test-art_studies',
1136
+ 'ceval-test-professional_tour_guide',
1137
+ 'ceval-test-legal_professional',
1138
+ 'ceval-test-high_school_chinese',
1139
+ 'ceval-test-high_school_history',
1140
+ 'ceval-test-middle_school_history',
1141
+ 'ceval-test-civil_servant',
1142
+ 'ceval-test-sports_science',
1143
+ 'ceval-test-plant_protection',
1144
+ 'ceval-test-basic_medicine',
1145
+ 'ceval-test-clinical_medicine',
1146
+ 'ceval-test-urban_and_rural_planner',
1147
+ 'ceval-test-accountant',
1148
+ 'ceval-test-fire_engineer',
1149
+ 'ceval-test-environmental_impact_assessment_engineer',
1150
+ 'ceval-test-tax_accountant',
1151
+ 'ceval-test-physician',
1152
+ ],
1153
+ weights=dict({
1154
+ 'ceval-test-accountant': 443,
1155
+ 'ceval-test-advanced_mathematics': 173,
1156
+ 'ceval-test-art_studies': 298,
1157
+ 'ceval-test-basic_medicine': 175,
1158
+ 'ceval-test-business_administration': 301,
1159
+ 'ceval-test-chinese_language_and_literature': 209,
1160
+ 'ceval-test-civil_servant': 429,
1161
+ 'ceval-test-clinical_medicine': 200,
1162
+ 'ceval-test-college_chemistry': 224,
1163
+ 'ceval-test-college_economics': 497,
1164
+ 'ceval-test-college_physics': 176,
1165
+ 'ceval-test-college_programming': 342,
1166
+ 'ceval-test-computer_architecture': 193,
1167
+ 'ceval-test-computer_network': 171,
1168
+ 'ceval-test-discrete_mathematics': 153,
1169
+ 'ceval-test-education_science': 270,
1170
+ 'ceval-test-electrical_engineer': 339,
1171
+ 'ceval-test-environmental_impact_assessment_engineer': 281,
1172
+ 'ceval-test-fire_engineer': 282,
1173
+ 'ceval-test-high_school_biology': 175,
1174
+ 'ceval-test-high_school_chemistry': 172,
1175
+ 'ceval-test-high_school_chinese': 178,
1176
+ 'ceval-test-high_school_geography': 178,
1177
+ 'ceval-test-high_school_history': 182,
1178
+ 'ceval-test-high_school_mathematics': 166,
1179
+ 'ceval-test-high_school_physics': 175,
1180
+ 'ceval-test-high_school_politics': 176,
1181
+ 'ceval-test-ideological_and_moral_cultivation': 172,
1182
+ 'ceval-test-law': 221,
1183
+ 'ceval-test-legal_professional': 215,
1184
+ 'ceval-test-logic': 204,
1185
+ 'ceval-test-mao_zedong_thought': 219,
1186
+ 'ceval-test-marxism': 179,
1187
+ 'ceval-test-metrology_engineer': 219,
1188
+ 'ceval-test-middle_school_biology': 192,
1189
+ 'ceval-test-middle_school_chemistry': 185,
1190
+ 'ceval-test-middle_school_geography': 108,
1191
+ 'ceval-test-middle_school_history': 207,
1192
+ 'ceval-test-middle_school_mathematics': 177,
1193
+ 'ceval-test-middle_school_physics': 178,
1194
+ 'ceval-test-middle_school_politics': 193,
1195
+ 'ceval-test-modern_chinese_history': 212,
1196
+ 'ceval-test-operating_system': 179,
1197
+ 'ceval-test-physician': 443,
1198
+ 'ceval-test-plant_protection': 199,
1199
+ 'ceval-test-probability_and_statistics': 166,
1200
+ 'ceval-test-professional_tour_guide': 266,
1201
+ 'ceval-test-sports_science': 180,
1202
+ 'ceval-test-tax_accountant': 443,
1203
+ 'ceval-test-teacher_qualification': 399,
1204
+ 'ceval-test-urban_and_rural_planner': 418,
1205
+ 'ceval-test-veterinary_medicine': 210
1206
+ })),
1207
+ dict(
1208
+ name='bbh',
1209
+ subsets=[
1210
+ 'bbh-temporal_sequences',
1211
+ 'bbh-disambiguation_qa',
1212
+ 'bbh-date_understanding',
1213
+ 'bbh-tracking_shuffled_objects_three_objects',
1214
+ 'bbh-penguins_in_a_table',
1215
+ 'bbh-geometric_shapes',
1216
+ 'bbh-snarks',
1217
+ 'bbh-ruin_names',
1218
+ 'bbh-tracking_shuffled_objects_seven_objects',
1219
+ 'bbh-tracking_shuffled_objects_five_objects',
1220
+ 'bbh-logical_deduction_three_objects',
1221
+ 'bbh-hyperbaton',
1222
+ 'bbh-logical_deduction_five_objects',
1223
+ 'bbh-logical_deduction_seven_objects',
1224
+ 'bbh-movie_recommendation',
1225
+ 'bbh-salient_translation_error_detection',
1226
+ 'bbh-reasoning_about_colored_objects',
1227
+ 'bbh-multistep_arithmetic_two',
1228
+ 'bbh-navigate',
1229
+ 'bbh-dyck_languages',
1230
+ 'bbh-word_sorting',
1231
+ 'bbh-sports_understanding',
1232
+ 'bbh-boolean_expressions',
1233
+ 'bbh-object_counting',
1234
+ 'bbh-formal_fallacies',
1235
+ 'bbh-causal_judgement',
1236
+ 'bbh-web_of_lies',
1237
+ ]),
1238
+ dict(
1239
+ name='GaokaoBench',
1240
+ subsets=[
1241
+ 'GaokaoBench_2010-2022_Math_II_MCQs',
1242
+ 'GaokaoBench_2010-2022_Math_I_MCQs',
1243
+ 'GaokaoBench_2010-2022_History_MCQs',
1244
+ 'GaokaoBench_2010-2022_Biology_MCQs',
1245
+ 'GaokaoBench_2010-2022_Political_Science_MCQs',
1246
+ 'GaokaoBench_2010-2022_Physics_MCQs',
1247
+ 'GaokaoBench_2010-2022_Chemistry_MCQs',
1248
+ 'GaokaoBench_2010-2013_English_MCQs',
1249
+ 'GaokaoBench_2010-2022_Chinese_Modern_Lit',
1250
+ 'GaokaoBench_2010-2022_English_Fill_in_Blanks',
1251
+ 'GaokaoBench_2012-2022_English_Cloze_Test',
1252
+ 'GaokaoBench_2010-2022_Geography_MCQs',
1253
+ 'GaokaoBench_2010-2022_English_Reading_Comp',
1254
+ 'GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs',
1255
+ ],
1256
+ weights=dict({
1257
+ 'GaokaoBench_2010-2013_English_MCQs': 105,
1258
+ 'GaokaoBench_2010-2022_Biology_MCQs': 900,
1259
+ 'GaokaoBench_2010-2022_Chemistry_MCQs': 744,
1260
+ 'GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs': 240,
1261
+ 'GaokaoBench_2010-2022_Chinese_Modern_Lit': 261,
1262
+ 'GaokaoBench_2010-2022_English_Fill_in_Blanks': 900.0,
1263
+ 'GaokaoBench_2010-2022_English_Reading_Comp': 940,
1264
+ 'GaokaoBench_2010-2022_Geography_MCQs': 380,
1265
+ 'GaokaoBench_2010-2022_History_MCQs': 1148,
1266
+ 'GaokaoBench_2010-2022_Math_II_MCQs': 1090,
1267
+ 'GaokaoBench_2010-2022_Math_I_MCQs': 1070,
1268
+ 'GaokaoBench_2010-2022_Physics_MCQs': 384,
1269
+ 'GaokaoBench_2010-2022_Political_Science_MCQs': 1280,
1270
+ 'GaokaoBench_2012-2022_English_Cloze_Test': 260
1271
+ })),
1272
+ dict(
1273
+ name='flores_100_Indo-European-Germanic_English',
1274
+ subsets=[
1275
+ 'flores_100_afr-eng',
1276
+ 'flores_100_dan-eng',
1277
+ 'flores_100_deu-eng',
1278
+ 'flores_100_isl-eng',
1279
+ 'flores_100_ltz-eng',
1280
+ 'flores_100_nld-eng',
1281
+ 'flores_100_nob-eng',
1282
+ 'flores_100_swe-eng',
1283
+ ]),
1284
+ dict(
1285
+ name='flores_100_English_Indo-European-Germanic',
1286
+ subsets=[
1287
+ 'flores_100_eng-afr',
1288
+ 'flores_100_eng-dan',
1289
+ 'flores_100_eng-deu',
1290
+ 'flores_100_eng-isl',
1291
+ 'flores_100_eng-ltz',
1292
+ 'flores_100_eng-nld',
1293
+ 'flores_100_eng-nob',
1294
+ 'flores_100_eng-swe',
1295
+ ]),
1296
+ dict(
1297
+ name='flores_100_Indo-European-Romance_English',
1298
+ subsets=[
1299
+ 'flores_100_ast-eng',
1300
+ 'flores_100_cat-eng',
1301
+ 'flores_100_fra-eng',
1302
+ 'flores_100_glg-eng',
1303
+ 'flores_100_oci-eng',
1304
+ 'flores_100_por-eng',
1305
+ 'flores_100_ron-eng',
1306
+ 'flores_100_spa-eng',
1307
+ ]),
1308
+ dict(
1309
+ name='flores_100_English_Indo-European-Romance',
1310
+ subsets=[
1311
+ 'flores_100_eng-ast',
1312
+ 'flores_100_eng-cat',
1313
+ 'flores_100_eng-fra',
1314
+ 'flores_100_eng-glg',
1315
+ 'flores_100_eng-oci',
1316
+ 'flores_100_eng-por',
1317
+ 'flores_100_eng-ron',
1318
+ 'flores_100_eng-spa',
1319
+ ]),
1320
+ dict(
1321
+ name='flores_100_Indo-European-Slavic_English',
1322
+ subsets=[
1323
+ 'flores_100_bel-eng',
1324
+ 'flores_100_bos-eng',
1325
+ 'flores_100_bul-eng',
1326
+ 'flores_100_ces-eng',
1327
+ 'flores_100_hrv-eng',
1328
+ 'flores_100_mkd-eng',
1329
+ 'flores_100_pol-eng',
1330
+ 'flores_100_rus-eng',
1331
+ 'flores_100_slk-eng',
1332
+ 'flores_100_slv-eng',
1333
+ 'flores_100_srp-eng',
1334
+ 'flores_100_ukr-eng',
1335
+ ]),
1336
+ dict(
1337
+ name='flores_100_English_Indo-European-Slavic',
1338
+ subsets=[
1339
+ 'flores_100_eng-bel',
1340
+ 'flores_100_eng-bos',
1341
+ 'flores_100_eng-bul',
1342
+ 'flores_100_eng-ces',
1343
+ 'flores_100_eng-hrv',
1344
+ 'flores_100_eng-mkd',
1345
+ 'flores_100_eng-pol',
1346
+ 'flores_100_eng-rus',
1347
+ 'flores_100_eng-slk',
1348
+ 'flores_100_eng-slv',
1349
+ 'flores_100_eng-srp',
1350
+ 'flores_100_eng-ukr',
1351
+ ]),
1352
+ dict(
1353
+ name='flores_100_Indo-European-Indo-Aryan_English',
1354
+ subsets=[
1355
+ 'flores_100_asm-eng',
1356
+ 'flores_100_ben-eng',
1357
+ 'flores_100_guj-eng',
1358
+ 'flores_100_hin-eng',
1359
+ 'flores_100_mar-eng',
1360
+ 'flores_100_npi-eng',
1361
+ 'flores_100_ory-eng',
1362
+ 'flores_100_pan-eng',
1363
+ 'flores_100_snd-eng',
1364
+ 'flores_100_urd-eng',
1365
+ ]),
1366
+ dict(
1367
+ name='flores_100_English_Indo-European-Indo-Aryan',
1368
+ subsets=[
1369
+ 'flores_100_eng-asm',
1370
+ 'flores_100_eng-ben',
1371
+ 'flores_100_eng-guj',
1372
+ 'flores_100_eng-hin',
1373
+ 'flores_100_eng-mar',
1374
+ 'flores_100_eng-npi',
1375
+ 'flores_100_eng-ory',
1376
+ 'flores_100_eng-pan',
1377
+ 'flores_100_eng-snd',
1378
+ 'flores_100_eng-urd',
1379
+ ]),
1380
+ dict(
1381
+ name='flores_100_Indo-European-Other_English',
1382
+ subsets=[
1383
+ 'flores_100_ckb-eng',
1384
+ 'flores_100_cym-eng',
1385
+ 'flores_100_ell-eng',
1386
+ 'flores_100_fas-eng',
1387
+ 'flores_100_gle-eng',
1388
+ 'flores_100_hye-eng',
1389
+ 'flores_100_ita-eng',
1390
+ 'flores_100_lav-eng',
1391
+ 'flores_100_lit-eng',
1392
+ 'flores_100_pus-eng',
1393
+ 'flores_100_tgk-eng',
1394
+ ]),
1395
+ dict(
1396
+ name='flores_100_English_Indo-European-Other',
1397
+ subsets=[
1398
+ 'flores_100_eng-ckb',
1399
+ 'flores_100_eng-cym',
1400
+ 'flores_100_eng-ell',
1401
+ 'flores_100_eng-fas',
1402
+ 'flores_100_eng-gle',
1403
+ 'flores_100_eng-hye',
1404
+ 'flores_100_eng-ita',
1405
+ 'flores_100_eng-lav',
1406
+ 'flores_100_eng-lit',
1407
+ 'flores_100_eng-pus',
1408
+ 'flores_100_eng-tgk',
1409
+ ]),
1410
+ dict(
1411
+ name='flores_100_Austronesian_English',
1412
+ subsets=[
1413
+ 'flores_100_ceb-eng',
1414
+ 'flores_100_ind-eng',
1415
+ 'flores_100_jav-eng',
1416
+ 'flores_100_mri-eng',
1417
+ 'flores_100_msa-eng',
1418
+ 'flores_100_tgl-eng',
1419
+ ]),
1420
+ dict(
1421
+ name='flores_100_English_Austronesian',
1422
+ subsets=[
1423
+ 'flores_100_eng-ceb',
1424
+ 'flores_100_eng-ind',
1425
+ 'flores_100_eng-jav',
1426
+ 'flores_100_eng-mri',
1427
+ 'flores_100_eng-msa',
1428
+ 'flores_100_eng-tgl',
1429
+ ]),
1430
+ dict(
1431
+ name='flores_100_Atlantic-Congo_English',
1432
+ subsets=[
1433
+ 'flores_100_ibo-eng',
1434
+ 'flores_100_kam-eng',
1435
+ 'flores_100_kea-eng',
1436
+ 'flores_100_lin-eng',
1437
+ 'flores_100_lug-eng',
1438
+ 'flores_100_nso-eng',
1439
+ 'flores_100_nya-eng',
1440
+ 'flores_100_sna-eng',
1441
+ 'flores_100_swh-eng',
1442
+ 'flores_100_umb-eng',
1443
+ 'flores_100_wol-eng',
1444
+ 'flores_100_xho-eng',
1445
+ 'flores_100_yor-eng',
1446
+ 'flores_100_zul-eng',
1447
+ ]),
1448
+ dict(
1449
+ name='flores_100_English_Atlantic-Congo',
1450
+ subsets=[
1451
+ 'flores_100_eng-ibo',
1452
+ 'flores_100_eng-kam',
1453
+ 'flores_100_eng-kea',
1454
+ 'flores_100_eng-lin',
1455
+ 'flores_100_eng-lug',
1456
+ 'flores_100_eng-nso',
1457
+ 'flores_100_eng-nya',
1458
+ 'flores_100_eng-sna',
1459
+ 'flores_100_eng-swh',
1460
+ 'flores_100_eng-umb',
1461
+ 'flores_100_eng-wol',
1462
+ 'flores_100_eng-xho',
1463
+ 'flores_100_eng-yor',
1464
+ 'flores_100_eng-zul',
1465
+ ]),
1466
+ dict(
1467
+ name='flores_100_Afro-Asiatic_English',
1468
+ subsets=[
1469
+ 'flores_100_amh-eng',
1470
+ 'flores_100_ara-eng',
1471
+ 'flores_100_ful-eng',
1472
+ 'flores_100_mlt-eng',
1473
+ 'flores_100_orm-eng',
1474
+ 'flores_100_som-eng',
1475
+ ]),
1476
+ dict(
1477
+ name='flores_100_English_Afro-Asiatic',
1478
+ subsets=[
1479
+ 'flores_100_eng-amh',
1480
+ 'flores_100_eng-ara',
1481
+ 'flores_100_eng-ful',
1482
+ 'flores_100_eng-mlt',
1483
+ 'flores_100_eng-orm',
1484
+ 'flores_100_eng-som',
1485
+ ]),
1486
+ dict(
1487
+ name='flores_100_Turkic_English',
1488
+ subsets=[
1489
+ 'flores_100_azj-eng',
1490
+ 'flores_100_kaz-eng',
1491
+ 'flores_100_kir-eng',
1492
+ 'flores_100_tur-eng',
1493
+ 'flores_100_uzb-eng',
1494
+ ]),
1495
+ dict(
1496
+ name='flores_100_English_Turkic',
1497
+ subsets=[
1498
+ 'flores_100_eng-azj',
1499
+ 'flores_100_eng-kaz',
1500
+ 'flores_100_eng-kir',
1501
+ 'flores_100_eng-tur',
1502
+ 'flores_100_eng-uzb',
1503
+ ]),
1504
+ dict(
1505
+ name='flores_100_Dravidian_English',
1506
+ subsets=[
1507
+ 'flores_100_kan-eng',
1508
+ 'flores_100_mal-eng',
1509
+ 'flores_100_tam-eng',
1510
+ 'flores_100_tel-eng',
1511
+ ]),
1512
+ dict(
1513
+ name='flores_100_English_Dravidian',
1514
+ subsets=[
1515
+ 'flores_100_eng-kan',
1516
+ 'flores_100_eng-mal',
1517
+ 'flores_100_eng-tam',
1518
+ 'flores_100_eng-tel',
1519
+ ]),
1520
+ dict(
1521
+ name='flores_100_Sino-Tibetan_English',
1522
+ subsets=[
1523
+ 'flores_100_mya-eng',
1524
+ 'flores_100_zho_simpl-eng',
1525
+ 'flores_100_zho_trad-eng',
1526
+ ]),
1527
+ dict(
1528
+ name='flores_100_English_Sino-Tibetan',
1529
+ subsets=[
1530
+ 'flores_100_eng-mya',
1531
+ 'flores_100_eng-zho_simpl',
1532
+ 'flores_100_eng-zho_trad',
1533
+ ]),
1534
+ dict(
1535
+ name='flores_100_Other_English',
1536
+ subsets=[
1537
+ 'flores_100_est-eng',
1538
+ 'flores_100_fin-eng',
1539
+ 'flores_100_hau-eng',
1540
+ 'flores_100_heb-eng',
1541
+ 'flores_100_hun-eng',
1542
+ 'flores_100_jpn-eng',
1543
+ 'flores_100_kat-eng',
1544
+ 'flores_100_khm-eng',
1545
+ 'flores_100_kor-eng',
1546
+ 'flores_100_lao-eng',
1547
+ 'flores_100_luo-eng',
1548
+ 'flores_100_mon-eng',
1549
+ 'flores_100_tha-eng',
1550
+ 'flores_100_vie-eng',
1551
+ ]),
1552
+ dict(
1553
+ name='flores_100_English_Other',
1554
+ subsets=[
1555
+ 'flores_100_eng-est',
1556
+ 'flores_100_eng-fin',
1557
+ 'flores_100_eng-hau',
1558
+ 'flores_100_eng-heb',
1559
+ 'flores_100_eng-hun',
1560
+ 'flores_100_eng-jpn',
1561
+ 'flores_100_eng-kat',
1562
+ 'flores_100_eng-khm',
1563
+ 'flores_100_eng-kor',
1564
+ 'flores_100_eng-lao',
1565
+ 'flores_100_eng-luo',
1566
+ 'flores_100_eng-mon',
1567
+ 'flores_100_eng-tha',
1568
+ 'flores_100_eng-vie',
1569
+ ]),
1570
+ dict(
1571
+ name='flores_100',
1572
+ subsets=[
1573
+ 'flores_100_afr-eng',
1574
+ 'flores_100_dan-eng',
1575
+ 'flores_100_deu-eng',
1576
+ 'flores_100_isl-eng',
1577
+ 'flores_100_ltz-eng',
1578
+ 'flores_100_nld-eng',
1579
+ 'flores_100_nob-eng',
1580
+ 'flores_100_swe-eng',
1581
+ 'flores_100_ast-eng',
1582
+ 'flores_100_cat-eng',
1583
+ 'flores_100_fra-eng',
1584
+ 'flores_100_glg-eng',
1585
+ 'flores_100_oci-eng',
1586
+ 'flores_100_por-eng',
1587
+ 'flores_100_ron-eng',
1588
+ 'flores_100_spa-eng',
1589
+ 'flores_100_bel-eng',
1590
+ 'flores_100_bos-eng',
1591
+ 'flores_100_bul-eng',
1592
+ 'flores_100_ces-eng',
1593
+ 'flores_100_hrv-eng',
1594
+ 'flores_100_mkd-eng',
1595
+ 'flores_100_pol-eng',
1596
+ 'flores_100_rus-eng',
1597
+ 'flores_100_slk-eng',
1598
+ 'flores_100_slv-eng',
1599
+ 'flores_100_srp-eng',
1600
+ 'flores_100_ukr-eng',
1601
+ 'flores_100_asm-eng',
1602
+ 'flores_100_ben-eng',
1603
+ 'flores_100_guj-eng',
1604
+ 'flores_100_hin-eng',
1605
+ 'flores_100_mar-eng',
1606
+ 'flores_100_npi-eng',
1607
+ 'flores_100_ory-eng',
1608
+ 'flores_100_pan-eng',
1609
+ 'flores_100_snd-eng',
1610
+ 'flores_100_urd-eng',
1611
+ 'flores_100_ckb-eng',
1612
+ 'flores_100_cym-eng',
1613
+ 'flores_100_ell-eng',
1614
+ 'flores_100_fas-eng',
1615
+ 'flores_100_gle-eng',
1616
+ 'flores_100_hye-eng',
1617
+ 'flores_100_ita-eng',
1618
+ 'flores_100_lav-eng',
1619
+ 'flores_100_lit-eng',
1620
+ 'flores_100_pus-eng',
1621
+ 'flores_100_tgk-eng',
1622
+ 'flores_100_ceb-eng',
1623
+ 'flores_100_ind-eng',
1624
+ 'flores_100_jav-eng',
1625
+ 'flores_100_mri-eng',
1626
+ 'flores_100_msa-eng',
1627
+ 'flores_100_tgl-eng',
1628
+ 'flores_100_ibo-eng',
1629
+ 'flores_100_kam-eng',
1630
+ 'flores_100_kea-eng',
1631
+ 'flores_100_lin-eng',
1632
+ 'flores_100_lug-eng',
1633
+ 'flores_100_nso-eng',
1634
+ 'flores_100_nya-eng',
1635
+ 'flores_100_sna-eng',
1636
+ 'flores_100_swh-eng',
1637
+ 'flores_100_umb-eng',
1638
+ 'flores_100_wol-eng',
1639
+ 'flores_100_xho-eng',
1640
+ 'flores_100_yor-eng',
1641
+ 'flores_100_zul-eng',
1642
+ 'flores_100_amh-eng',
1643
+ 'flores_100_ara-eng',
1644
+ 'flores_100_ful-eng',
1645
+ 'flores_100_mlt-eng',
1646
+ 'flores_100_orm-eng',
1647
+ 'flores_100_som-eng',
1648
+ 'flores_100_azj-eng',
1649
+ 'flores_100_kaz-eng',
1650
+ 'flores_100_kir-eng',
1651
+ 'flores_100_tur-eng',
1652
+ 'flores_100_uzb-eng',
1653
+ 'flores_100_kan-eng',
1654
+ 'flores_100_mal-eng',
1655
+ 'flores_100_tam-eng',
1656
+ 'flores_100_tel-eng',
1657
+ 'flores_100_mya-eng',
1658
+ 'flores_100_zho_simpl-eng',
1659
+ 'flores_100_zho_trad-eng',
1660
+ 'flores_100_est-eng',
1661
+ 'flores_100_fin-eng',
1662
+ 'flores_100_hau-eng',
1663
+ 'flores_100_heb-eng',
1664
+ 'flores_100_hun-eng',
1665
+ 'flores_100_jpn-eng',
1666
+ 'flores_100_kat-eng',
1667
+ 'flores_100_khm-eng',
1668
+ 'flores_100_kor-eng',
1669
+ 'flores_100_lao-eng',
1670
+ 'flores_100_luo-eng',
1671
+ 'flores_100_mon-eng',
1672
+ 'flores_100_tha-eng',
1673
+ 'flores_100_vie-eng',
1674
+ 'flores_100_eng-afr',
1675
+ 'flores_100_eng-dan',
1676
+ 'flores_100_eng-deu',
1677
+ 'flores_100_eng-isl',
1678
+ 'flores_100_eng-ltz',
1679
+ 'flores_100_eng-nld',
1680
+ 'flores_100_eng-nob',
1681
+ 'flores_100_eng-swe',
1682
+ 'flores_100_eng-ast',
1683
+ 'flores_100_eng-cat',
1684
+ 'flores_100_eng-fra',
1685
+ 'flores_100_eng-glg',
1686
+ 'flores_100_eng-oci',
1687
+ 'flores_100_eng-por',
1688
+ 'flores_100_eng-ron',
1689
+ 'flores_100_eng-spa',
1690
+ 'flores_100_eng-bel',
1691
+ 'flores_100_eng-bos',
1692
+ 'flores_100_eng-bul',
1693
+ 'flores_100_eng-ces',
1694
+ 'flores_100_eng-hrv',
1695
+ 'flores_100_eng-mkd',
1696
+ 'flores_100_eng-pol',
1697
+ 'flores_100_eng-rus',
1698
+ 'flores_100_eng-slk',
1699
+ 'flores_100_eng-slv',
1700
+ 'flores_100_eng-srp',
1701
+ 'flores_100_eng-ukr',
1702
+ 'flores_100_eng-asm',
1703
+ 'flores_100_eng-ben',
1704
+ 'flores_100_eng-guj',
1705
+ 'flores_100_eng-hin',
1706
+ 'flores_100_eng-mar',
1707
+ 'flores_100_eng-npi',
1708
+ 'flores_100_eng-ory',
1709
+ 'flores_100_eng-pan',
1710
+ 'flores_100_eng-snd',
1711
+ 'flores_100_eng-urd',
1712
+ 'flores_100_eng-ckb',
1713
+ 'flores_100_eng-cym',
1714
+ 'flores_100_eng-ell',
1715
+ 'flores_100_eng-fas',
1716
+ 'flores_100_eng-gle',
1717
+ 'flores_100_eng-hye',
1718
+ 'flores_100_eng-ita',
1719
+ 'flores_100_eng-lav',
1720
+ 'flores_100_eng-lit',
1721
+ 'flores_100_eng-pus',
1722
+ 'flores_100_eng-tgk',
1723
+ 'flores_100_eng-ceb',
1724
+ 'flores_100_eng-ind',
1725
+ 'flores_100_eng-jav',
1726
+ 'flores_100_eng-mri',
1727
+ 'flores_100_eng-msa',
1728
+ 'flores_100_eng-tgl',
1729
+ 'flores_100_eng-ibo',
1730
+ 'flores_100_eng-kam',
1731
+ 'flores_100_eng-kea',
1732
+ 'flores_100_eng-lin',
1733
+ 'flores_100_eng-lug',
1734
+ 'flores_100_eng-nso',
1735
+ 'flores_100_eng-nya',
1736
+ 'flores_100_eng-sna',
1737
+ 'flores_100_eng-swh',
1738
+ 'flores_100_eng-umb',
1739
+ 'flores_100_eng-wol',
1740
+ 'flores_100_eng-xho',
1741
+ 'flores_100_eng-yor',
1742
+ 'flores_100_eng-zul',
1743
+ 'flores_100_eng-amh',
1744
+ 'flores_100_eng-ara',
1745
+ 'flores_100_eng-ful',
1746
+ 'flores_100_eng-mlt',
1747
+ 'flores_100_eng-orm',
1748
+ 'flores_100_eng-som',
1749
+ 'flores_100_eng-azj',
1750
+ 'flores_100_eng-kaz',
1751
+ 'flores_100_eng-kir',
1752
+ 'flores_100_eng-tur',
1753
+ 'flores_100_eng-uzb',
1754
+ 'flores_100_eng-kan',
1755
+ 'flores_100_eng-mal',
1756
+ 'flores_100_eng-tam',
1757
+ 'flores_100_eng-tel',
1758
+ 'flores_100_eng-mya',
1759
+ 'flores_100_eng-zho_simpl',
1760
+ 'flores_100_eng-zho_trad',
1761
+ 'flores_100_eng-est',
1762
+ 'flores_100_eng-fin',
1763
+ 'flores_100_eng-hau',
1764
+ 'flores_100_eng-heb',
1765
+ 'flores_100_eng-hun',
1766
+ 'flores_100_eng-jpn',
1767
+ 'flores_100_eng-kat',
1768
+ 'flores_100_eng-khm',
1769
+ 'flores_100_eng-kor',
1770
+ 'flores_100_eng-lao',
1771
+ 'flores_100_eng-luo',
1772
+ 'flores_100_eng-mon',
1773
+ 'flores_100_eng-tha',
1774
+ 'flores_100_eng-vie',
1775
+ ]),
1776
+ dict(
1777
+ name='tydiqa-goldp',
1778
+ subsets=[
1779
+ 'tydiqa-goldp_arabic',
1780
+ 'tydiqa-goldp_bengali',
1781
+ 'tydiqa-goldp_english',
1782
+ 'tydiqa-goldp_finnish',
1783
+ 'tydiqa-goldp_indonesian',
1784
+ 'tydiqa-goldp_japanese',
1785
+ 'tydiqa-goldp_korean',
1786
+ 'tydiqa-goldp_russian',
1787
+ 'tydiqa-goldp_swahili',
1788
+ 'tydiqa-goldp_telugu',
1789
+ 'tydiqa-goldp_thai',
1790
+ ]),
1791
+ dict(
1792
+ name='xiezhi',
1793
+ subsets=[
1794
+ 'xiezhi-spec_eng',
1795
+ 'xiezhi-spec_chn',
1796
+ 'xiezhi-inter_eng',
1797
+ 'xiezhi-inter_chn',
1798
+ ]),
1799
+ dict(
1800
+ name='scibench',
1801
+ subsets=[
1802
+ 'scibench-atkins',
1803
+ 'scibench-calculus',
1804
+ 'scibench-chemmc',
1805
+ 'scibench-class',
1806
+ 'scibench-diff',
1807
+ 'scibench-fund',
1808
+ 'scibench-matter',
1809
+ 'scibench-quan',
1810
+ 'scibench-stat',
1811
+ 'scibench-thermo',
1812
+ ]),
1813
+ dict(
1814
+ name='scibench_zs-cot',
1815
+ subsets=[
1816
+ 'scibench-atkins_zs-cot',
1817
+ 'scibench-calculus_zs-cot',
1818
+ 'scibench-chemmc_zs-cot',
1819
+ 'scibench-class_zs-cot',
1820
+ 'scibench-diff_zs-cot',
1821
+ 'scibench-fund_zs-cot',
1822
+ 'scibench-matter_zs-cot',
1823
+ 'scibench-quan_zs-cot',
1824
+ 'scibench-stat_zs-cot',
1825
+ 'scibench-thermo_zs-cot',
1826
+ ]),
1827
+ dict(
1828
+ name='scibench_fs',
1829
+ subsets=[
1830
+ 'scibench-atkins_fs',
1831
+ 'scibench-calculus_fs',
1832
+ 'scibench-chemmc_fs',
1833
+ 'scibench-class_fs',
1834
+ 'scibench-diff_fs',
1835
+ 'scibench-fund_fs',
1836
+ 'scibench-matter_fs',
1837
+ 'scibench-quan_fs',
1838
+ 'scibench-stat_fs',
1839
+ 'scibench-thermo_fs',
1840
+ ]),
1841
+ dict(
1842
+ name='scibench_fs-cot',
1843
+ subsets=[
1844
+ 'scibench-atkins_fs-cot',
1845
+ 'scibench-calculus_fs-cot',
1846
+ 'scibench-chemmc_fs-cot',
1847
+ 'scibench-class_fs-cot',
1848
+ 'scibench-diff_fs-cot',
1849
+ 'scibench-fund_fs-cot',
1850
+ 'scibench-matter_fs-cot',
1851
+ 'scibench-quan_fs-cot',
1852
+ 'scibench-stat_fs-cot',
1853
+ 'scibench-thermo_fs-cot',
1854
+ ]),
1855
+ dict(
1856
+ name='mgsm_latin',
1857
+ subsets=[
1858
+ 'mgsm_de',
1859
+ 'mgsm_en',
1860
+ 'mgsm_es',
1861
+ 'mgsm_fr',
1862
+ 'mgsm_sw',
1863
+ ]),
1864
+ dict(
1865
+ name='mgsm_non_latin',
1866
+ subsets=[
1867
+ 'mgsm_bn',
1868
+ 'mgsm_ja',
1869
+ 'mgsm_ru',
1870
+ 'mgsm_te',
1871
+ 'mgsm_th',
1872
+ 'mgsm_zh',
1873
+ ]),
1874
+ dict(
1875
+ name='mgsm',
1876
+ subsets=[
1877
+ 'mgsm_bn',
1878
+ 'mgsm_de',
1879
+ 'mgsm_en',
1880
+ 'mgsm_es',
1881
+ 'mgsm_fr',
1882
+ 'mgsm_ja',
1883
+ 'mgsm_ru',
1884
+ 'mgsm_sw',
1885
+ 'mgsm_te',
1886
+ 'mgsm_th',
1887
+ 'mgsm_zh',
1888
+ ]),
1889
+ dict(
1890
+ name='longbench_single-document-qa',
1891
+ subsets=[
1892
+ 'LongBench_narrativeqa',
1893
+ 'LongBench_qasper',
1894
+ 'LongBench_multifieldqa_en',
1895
+ 'LongBench_multifieldqa_zh',
1896
+ ]),
1897
+ dict(
1898
+ name='longbench_multi-document-qa',
1899
+ subsets=[
1900
+ 'LongBench_hotpotqa',
1901
+ 'LongBench_2wikimqa',
1902
+ 'LongBench_musique',
1903
+ 'LongBench_dureader',
1904
+ ]),
1905
+ dict(
1906
+ name='longbench_summarization',
1907
+ subsets=[
1908
+ 'LongBench_gov_report',
1909
+ 'LongBench_qmsum',
1910
+ 'LongBench_multi_news',
1911
+ 'LongBench_vcsum',
1912
+ ]),
1913
+ dict(
1914
+ name='longbench_few-shot-learning',
1915
+ subsets=[
1916
+ 'LongBench_trec',
1917
+ 'LongBench_triviaqa',
1918
+ 'LongBench_samsum',
1919
+ 'LongBench_lsht',
1920
+ ]),
1921
+ dict(
1922
+ name='longbench_synthetic-tasks',
1923
+ subsets=[
1924
+ 'LongBench_passage_count',
1925
+ 'LongBench_passage_retrieval_en',
1926
+ 'LongBench_passage_retrieval_zh',
1927
+ ]),
1928
+ dict(
1929
+ name='longbench_code-completion',
1930
+ subsets=[
1931
+ 'LongBench_lcc',
1932
+ 'LongBench_repobench-p',
1933
+ ]),
1934
+ dict(
1935
+ name='longbench_zh',
1936
+ subsets=[
1937
+ 'LongBench_multifieldqa_zh',
1938
+ 'LongBench_dureader',
1939
+ 'LongBench_vcsum',
1940
+ 'LongBench_lsht',
1941
+ 'LongBench_passage_retrieval_zh',
1942
+ 'LongBench_lcc',
1943
+ 'LongBench_repobench-p',
1944
+ ]),
1945
+ dict(
1946
+ name='longbench_en',
1947
+ subsets=[
1948
+ 'LongBench_narrativeqa',
1949
+ 'LongBench_qasper',
1950
+ 'LongBench_multifieldqa_en',
1951
+ 'LongBench_hotpotqa',
1952
+ 'LongBench_2wikimqa',
1953
+ 'LongBench_musique',
1954
+ 'LongBench_gov_report',
1955
+ 'LongBench_qmsum',
1956
+ 'LongBench_multi_news',
1957
+ 'LongBench_trec',
1958
+ 'LongBench_triviaqa',
1959
+ 'LongBench_samsum',
1960
+ 'LongBench_passage_count',
1961
+ 'LongBench_passage_retrieval_en',
1962
+ 'LongBench_lcc',
1963
+ 'LongBench_repobench-p',
1964
+ ]),
1965
+ dict(
1966
+ name='longbench',
1967
+ subsets=[
1968
+ 'longbench_single-document-qa',
1969
+ 'longbench_multi-document-qa',
1970
+ 'longbench_summarization',
1971
+ 'longbench_few-shot-learning',
1972
+ 'longbench_synthetic-tasks',
1973
+ 'longbench_code-completion',
1974
+ ]),
1975
+ ])
1976
+ work_dir = 'outputs/default/20250731_120452'
20250731_120452-unified-expand/predictions/vllm-api-general-chat/openai_humaneval.json ADDED
The diff for this file is too large to render. See raw diff
 
20250731_120452-unified-expand/results/vllm-api-general-chat/openai_humaneval.json ADDED
The diff for this file is too large to render. See raw diff
 
20250731_120452-unified-expand/summary/summary_20250731_120452.csv ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ dataset,version,metric,mode,vllm-api-general-chat
2
+ openai_humaneval,f4a973,humaneval_pass@1,gen,95.73
20250731_120452-unified-expand/summary/summary_20250731_120452.md ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ | dataset | version | metric | mode | vllm-api-general-chat |
2
+ |----- | ----- | ----- | ----- | -----|
3
+ | openai_humaneval | f4a973 | humaneval_pass@1 | gen | 95.73 |
20250731_120452-unified-expand/summary/summary_20250731_120452.txt ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ 20250731_120452
2
+ tabulate format
3
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4
+ dataset version metric mode vllm-api-general-chat
5
+ ---------------- --------- ---------------- ------ -----------------------
6
+ openai_humaneval f4a973 humaneval_pass@1 gen 95.73
7
+ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
8
+
9
+ -------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------
10
+
11
+ csv format
12
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13
+ dataset,version,metric,mode,vllm-api-general-chat
14
+ openai_humaneval,f4a973,humaneval_pass@1,gen,95.73
15
+ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
16
+
17
+ markdown format
18
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
19
+ | dataset | version | metric | mode | vllm-api-general-chat |
20
+ |----- | ----- | ----- | ----- | -----|
21
+ | openai_humaneval | f4a973 | humaneval_pass@1 | gen | 95.73 |
22
+
23
+ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
24
+ -------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------
25
+
26
+ raw format
27
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
28
+ -------------------------------
29
+ Model: vllm-api-general-chat
30
+ openai_humaneval: {'humaneval_pass@1': 95.73170731707317}
31
+ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$