---
base_model:
- huihui-ai/Seed-Coder-8B-Instruct-abliterated
---
vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.552|±  |0.0315|
|     |       |strict-match    |     5|exact_match|↑  |0.552|±  |0.0315|

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.566|±  |0.0222|
|     |       |strict-match    |     5|exact_match|↑  |0.564|±  |0.0222|
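
The Stderr column in the gsm8k tables above looks consistent with the standard error of a mean of 0/1 scores computed from the sample (n - 1) variance, where n is the `limit` value. The sketch below checks that against two reported rows; the helper name `binomial_stderr` is hypothetical, and the formula is an assumption about how the harness derives the column, not something confirmed by this card.

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the sample (n - 1) variance.

    Assumption: this is the formula behind the Stderr column above.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Base model, limit 250: reported 0.552 +/- 0.0315
print(round(binomial_stderr(0.552, 250), 4))
# Base model, limit 500: reported 0.566 +/- 0.0222
print(round(binomial_stderr(0.566, 500), 4))
```

Doubling the limit from 250 to 500 shrinks the standard error by roughly a factor of sqrt(2), which matches the drop from about 0.0315 to about 0.0222 in the tables.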

vllm (pretrained=/root/autodl-tmp/Seed-Coder-8B-Instruct-abliterated,add_bos_token=true,max_model_len=3048,dtype=bfloat16,model_impl=transformers,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.4316|±  |0.0167|
| - humanities     |      2|none  |      |acc   |↑  |0.4205|±  |0.0344|
| - other          |      2|none  |      |acc   |↑  |0.4615|±  |0.0356|
| - social sciences|      2|none  |      |acc   |↑  |0.4278|±  |0.0359|
| - stem           |      2|none  |      |acc   |↑  |0.4211|±  |0.0289|


vllm (pretrained=/root/autodl-tmp/80-128-4096,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.544|±  |0.0316|
|     |       |strict-match    |     5|exact_match|↑  |0.540|±  |0.0316|


vllm (pretrained=/root/autodl-tmp/80-256-4096,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.56|±  |0.0315|
|     |       |strict-match    |     5|exact_match|↑  | 0.56|±  |0.0315|

vllm (pretrained=/root/autodl-tmp/80-256-4096,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.578|±  |0.0221|
|     |       |strict-match    |     5|exact_match|↑  |0.574|±  |0.0221|


vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.564|±  |0.0314|
|     |       |strict-match    |     5|exact_match|↑  |0.564|±  |0.0314|

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.570|±  |0.0222|
|     |       |strict-match    |     5|exact_match|↑  |0.566|±  |0.0222|

vllm (pretrained=/root/autodl-tmp/80-512-8192,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.4246|±  |0.0167|
| - humanities     |      2|none  |      |acc   |↑  |0.3897|±  |0.0344|
| - other          |      2|none  |      |acc   |↑  |0.4667|±  |0.0356|
| - social sciences|      2|none  |      |acc   |↑  |0.4222|±  |0.0366|
| - stem           |      2|none  |      |acc   |↑  |0.4211|±  |0.0290|


vllm (pretrained=/root/autodl-tmp/81-512-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.56|±  |0.0315|
|     |       |strict-match    |     5|exact_match|↑  | 0.56|±  |0.0315|

vllm (pretrained=/root/autodl-tmp/81-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.564|±  |0.0222|
|     |       |strict-match    |     5|exact_match|↑  |0.562|±  |0.0222|


vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.564|±  |0.0314|
|     |       |strict-match    |     5|exact_match|↑  |0.564|±  |0.0314|

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.586|±  |0.0220|
|     |       |strict-match    |     5|exact_match|↑  |0.580|±  |0.0221|

vllm (pretrained=/root/autodl-tmp/82-256-8192,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.4292|±  |0.0166|
| - humanities     |      2|none  |      |acc   |↑  |0.4051|±  |0.0340|
| - other          |      2|none  |      |acc   |↑  |0.4718|±  |0.0355|
| - social sciences|      2|none  |      |acc   |↑  |0.4278|±  |0.0362|
| - stem           |      2|none  |      |acc   |↑  |0.4175|±  |0.0289|


vllm (pretrained=/root/autodl-tmp/82-512-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.564|±  |0.0314|
|     |       |strict-match    |     5|exact_match|↑  |0.560|±  |0.0315|

vllm (pretrained=/root/autodl-tmp/82-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.572|±  |0.0221|
|     |       |strict-match    |     5|exact_match|↑  |0.564|±  |0.0222|


vllm (pretrained=/root/autodl-tmp/82-1024-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.552|±  |0.0315|
|     |       |strict-match    |     5|exact_match|↑  |0.552|±  |0.0315|


vllm (pretrained=/root/autodl-tmp/83-512-8192,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.532|±  |0.0316|
|     |       |strict-match    |     5|exact_match|↑  |0.532|±  |0.0316|


vllm (pretrained=/root/autodl-tmp/84-256-8192,add_bos_token=true,max_model_len=8096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.540|±  |0.0316|
|     |       |strict-match    |     5|exact_match|↑  |0.536|±  |0.0316|
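
Since many of the gsm8k scores above differ by less than their reported standard errors, a rough way to read the tables is to compare two runs with a z-score on the difference. The sketch below does this for the limit-500 flexible-extract rows of the base model (0.566 ± 0.0222) and `82-256-8192` (0.586 ± 0.0220). The helper `z_score` is hypothetical, and it treats the two runs as independent samples; because both runs score the same questions, that is a simplifying assumption and a paired test on per-question results would be more precise.

```python
import math


def z_score(value_a: float, stderr_a: float, value_b: float, stderr_b: float) -> float:
    """z-score of (value_b - value_a), assuming the two estimates are independent."""
    combined_stderr = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    return (value_b - value_a) / combined_stderr


# Values copied from the limit-500 gsm8k flexible-extract rows above.
base = (0.566, 0.0222)      # Seed-Coder-8B-Instruct-abliterated
variant = (0.586, 0.0220)   # 82-256-8192

z = z_score(*base, *variant)
print(f"z = {z:.2f}")
```

Under this approximation the z-score is well below 1, so the gap between these two rows is within the noise expected at this sample size.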