---
base_model:
- ByteDance-Seed/Seed-Coder-8B-Instruct
---
vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.576|±  |0.0313|
|     |       |strict-match    |     5|exact_match|↑  |0.576|±  |0.0313|

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.602|±  |0.0219|
|     |       |strict-match    |     5|exact_match|↑  |0.598|±  |0.0219|
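
The `Stderr` column in the gsm8k tables above can be sanity-checked from the accuracy and the `limit` value alone. The following is a minimal sketch, assuming the harness reports the sample standard deviation of per-example 0/1 scores divided by the square root of the sample count; that assumption matches the numbers shown here but is not stated in the logs. The helper name `binomial_stderr` is hypothetical.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of a mean of n binary (0/1) scores.

    Assumption: the sample variance uses the n - 1 denominator,
    which is what reproduces the values in the tables above.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Baseline Seed-Coder-8B-Instruct, gsm8k flexible-extract
stderr_250 = round(binomial_stderr(0.576, 250), 4)  # limit: 250
stderr_500 = round(binomial_stderr(0.602, 500), 4)  # limit: 500

print(stderr_250)  # 0.0313
print(stderr_500)  # 0.0219
```

Doubling the sample count from 250 to 500 shrinks the standard error by roughly a factor of the square root of 2, which is why the 500-sample rows are the more reliable ones to compare across checkpoints.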

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.4386|±  |0.0167|
| - humanities     |      2|none  |      |acc   |↑  |0.4000|±  |0.0343|
| - other          |      2|none  |      |acc   |↑  |0.4872|±  |0.0356|
| - social sciences|      2|none  |      |acc   |↑  |0.4389|±  |0.0364|
| - stem           |      2|none  |      |acc   |↑  |0.4316|±  |0.0288|


vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.56|±  |0.0315|
|     |       |strict-match    |     5|exact_match|↑  | 0.56|±  |0.0315|

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.590|±  |0.0220|
|     |       |strict-match    |     5|exact_match|↑  |0.584|±  |0.0221|

vllm (pretrained=/root/autodl-tmp/80-128,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.4339|±  |0.0166|
| - humanities     |      2|none  |      |acc   |↑  |0.3949|±  |0.0338|
| - other          |      2|none  |      |acc   |↑  |0.4769|±  |0.0355|
| - social sciences|      2|none  |      |acc   |↑  |0.4333|±  |0.0361|
| - stem           |      2|none  |      |acc   |↑  |0.4316|±  |0.0290|


vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.584|±  |0.0312|
|     |       |strict-match    |     5|exact_match|↑  |0.584|±  |0.0312|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.590|±  | 0.022|
|     |       |strict-match    |     5|exact_match|↑  |0.586|±  | 0.022|

vllm (pretrained=/root/autodl-tmp/80-256,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.4246|±  |0.0165|
| - humanities     |      2|none  |      |acc   |↑  |0.3795|±  |0.0336|
| - other          |      2|none  |      |acc   |↑  |0.4872|±  |0.0356|
| - social sciences|      2|none  |      |acc   |↑  |0.4333|±  |0.0360|
| - stem           |      2|none  |      |acc   |↑  |0.4070|±  |0.0282|


vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.604|±  | 0.031|
|     |       |strict-match    |     5|exact_match|↑  |0.600|±  | 0.031|

vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.594|±  | 0.022|
|     |       |strict-match    |     5|exact_match|↑  |0.586|±  | 0.022|

vllm (pretrained=/root/autodl-tmp/80-512,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.4316|±  |0.0166|
| - humanities     |      2|none  |      |acc   |↑  |0.4000|±  |0.0341|
| - other          |      2|none  |      |acc   |↑  |0.4821|±  |0.0355|
| - social sciences|      2|none  |      |acc   |↑  |0.4278|±  |0.0356|
| - stem           |      2|none  |      |acc   |↑  |0.4211|±  |0.0289|
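
To judge whether the `80-128`, `80-256`, and `80-512` checkpoints differ meaningfully from the base model, the reported values and standard errors can be combined into a rough z-score. This is only a sketch: it treats the runs as independent samples, which is an assumption (the runs presumably share the same evaluation examples, so a paired test would be tighter). The numbers are the gsm8k flexible-extract rows at `limit: 500` copied from the tables above; the names `z_score` and `z_scores` are hypothetical.

```python
import math

# (exact_match, stderr) for gsm8k, flexible-extract, limit 500
baseline = (0.602, 0.0219)  # Seed-Coder-8B-Instruct
variants = {
    "80-128": (0.590, 0.0220),
    "80-256": (0.590, 0.022),
    "80-512": (0.594, 0.022),
}


def z_score(variant, reference):
    """Difference in accuracy divided by the combined standard error.

    Assumption: the two estimates are independent, so their
    variances add.
    """
    diff = variant[0] - reference[0]
    combined = math.sqrt(variant[1] ** 2 + reference[1] ** 2)
    return diff / combined


z_scores = {name: z_score(v, baseline) for name, v in variants.items()}

for name, z in z_scores.items():
    print(f"{name}: z = {z:+.2f}")
```

Under this reading, every checkpoint sits well within one combined standard error of the base model on gsm8k, so the small drops in the tables are not distinguishable from sampling noise at these sample sizes.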