---
base_model:
- a-m-team/AM-Thinking-v1
---
vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=5096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.792|±  |0.0257|
|     |       |strict-match    |     5|exact_match|↑  |0.780|±  |0.0263|

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.798|±  |0.0180|
|     |       |strict-match    |     5|exact_match|↑  |0.786|±  |0.0184|

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.8023|±  |0.0131|
| - humanities     |      2|none  |      |acc   |↑  |0.8154|±  |0.0276|
| - other          |      2|none  |      |acc   |↑  |0.8000|±  |0.0276|
| - social sciences|      2|none  |      |acc   |↑  |0.8556|±  |0.0255|
| - stem           |      2|none  |      |acc   |↑  |0.7614|±  |0.0237|

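As a rough sanity check on the `Stderr` column, the sketch below recomputes the standard error of the base-model gsm8k scores from the reported accuracy and the sample count. It assumes that `limit` equals the number of scored samples and that the harness reports the standard error of the mean using the sample (n - 1) variance; the function name `accuracy_stderr` is hypothetical and not part of any library.

```python
import math


def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with mean p.

    Assumption: sample variance with the (n - 1) denominator is used.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (accuracy, sample count) taken from the base-model gsm8k flexible-extract rows
for p, n in [(0.792, 250), (0.798, 500)]:
    print(f"p={p:.3f} n={n} stderr={accuracy_stderr(p, n):.4f}")
```

Under these assumptions the recomputed values round to the 0.0257 and 0.0180 shown in the tables, which suggests that the larger `limit` is what narrows the interval.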

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=5096,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.820|±  |0.0243|
|     |       |strict-match    |     5|exact_match|↑  |0.816|±  |0.0246|

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.816|±  |0.0173|
|     |       |strict-match    |     5|exact_match|↑  |0.814|±  |0.0174|

vllm (pretrained=/root/autodl-tmp/AM-Thinking-v1-awq,add_bos_token=true,max_model_len=3048,dtype=bfloat16), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: 1
|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.7930|±  |0.0132|
| - humanities     |      2|none  |      |acc   |↑  |0.8051|±  |0.0278|
| - other          |      2|none  |      |acc   |↑  |0.7846|±  |0.0277|
| - social sciences|      2|none  |      |acc   |↑  |0.8444|±  |0.0261|
| - stem           |      2|none  |      |acc   |↑  |0.7579|±  |0.0242|
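
To judge whether the gaps between the base and AWQ runs exceed sampling noise, a minimal sketch is to divide each difference by the combined standard error. The values are copied from the tables in this file; the helper `z_score` is hypothetical, and the calculation treats the two runs as independent samples, which is only an approximation because both models were scored on the same questions.

```python
import math


def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """Difference a - b in units of the combined standard error.

    Assumption: the two estimates are independent.
    """
    return (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)


# name -> (base value, base stderr, AWQ value, AWQ stderr)
runs = {
    "gsm8k flexible-extract (limit 500)": (0.798, 0.0180, 0.816, 0.0173),
    "gsm8k strict-match (limit 500)": (0.786, 0.0184, 0.814, 0.0174),
    "mmlu": (0.8023, 0.0131, 0.7930, 0.0132),
}

for name, (base, se_base, awq, se_awq) in runs.items():
    z = z_score(awq, se_awq, base, se_base)
    print(f"{name}: AWQ - base = {awq - base:+.4f}, z = {z:+.2f}")
```

Under this approximation every |z| stays below 2, so these runs do not show a clear accuracy difference between the base and AWQ checkpoints at the sample sizes used.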