noneUsername committed on
Commit c1c3f61 · verified · 1 Parent(s): bf51479

Create README.md

Files changed (1)
  1. README.md +60 -0
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ base_model:
+ - gghfez/Mistral-Small-3.2-24B-Instruct-hf
+ ---
+ vllm (pretrained=/root/autodl-tmp/Mistral-Small-3.2-24B-Instruct-hf,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.908|±  |0.0183|
+ |     |       |strict-match    |     5|exact_match|↑  |0.904|±  |0.0187|
+
+ vllm (pretrained=/root/autodl-tmp/Mistral-Small-3.2-24B-Instruct-hf,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.908|±  |0.0129|
+ |     |       |strict-match    |     5|exact_match|↑  |0.902|±  |0.0133|
+
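The Stderr values in the two gsm8k tables above are consistent with the sample standard error of a mean of 0/1 scores, sqrt(p(1-p)/(n-1)), with n taken from `limit`. The sketch below recomputes them under that assumption; the helper name `binomial_stderr` is hypothetical and not part of any evaluation tool.

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with mean p.

    Assumes the sample (n - 1) divisor; this is an assumption that
    happens to match the reported numbers, not a documented formula.
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (value, limit) pairs copied from the gsm8k tables above
rows = [(0.908, 250), (0.904, 250), (0.908, 500), (0.902, 500)]
for p, n in rows:
    print(f"p={p:.3f} n={n} stderr={binomial_stderr(p, n):.4f}")
```

Rounded to four decimals, these reproduce the reported 0.0183, 0.0187, 0.0129 and 0.0133, which shows that doubling `limit` from 250 to 500 shrinks the error bar by roughly a factor of sqrt(2).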
+ vllm (pretrained=/root/autodl-tmp/Mistral-Small-3.2-24B-Instruct-hf,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
+ |      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu              |      2|none  |      |acc   |↑  |0.8035|±  |0.0129|
+ | - humanities     |      2|none  |      |acc   |↑  |0.8462|±  |0.0247|
+ | - other          |      2|none  |      |acc   |↑  |0.8256|±  |0.0262|
+ | - social sciences|      2|none  |      |acc   |↑  |0.8389|±  |0.0271|
+ | - stem           |      2|none  |      |acc   |↑  |0.7368|±  |0.0246|
+
+
+ vllm (pretrained=/root/autodl-tmp/root90-128-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.900|±  |0.0190|
+ |     |       |strict-match    |     5|exact_match|↑  |0.896|±  |0.0193|
+
+ vllm (pretrained=/root/autodl-tmp/root90-128-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.892|±  |0.0139|
+ |     |       |strict-match    |     5|exact_match|↑  |0.886|±  |0.0142|
+
+
+
+ vllm (pretrained=/root/autodl-tmp/root90-256-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.916|±  |0.0176|
+ |     |       |strict-match    |     5|exact_match|↑  |0.908|±  |0.0183|
+
+ vllm (pretrained=/root/autodl-tmp/root90-256-4096-9.9999,add_bos_token=true,max_model_len=3096,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.5), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
+ |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+ |-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+ |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.904|±  |0.0132|
+ |     |       |strict-match    |     5|exact_match|↑  |0.898|±  |0.0135|
+
+ vllm (pretrained=/root/autodl-tmp/root90-256-4096-9.9999,add_bos_token=true,max_model_len=3048,dtype=bfloat16,trust_remote_code=true,gpu_memory_utilization=0.9), gen_kwargs: (None), limit: 15.0, num_fewshot: None, batch_size: auto
+ |      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+ |------------------|------:|------|------|------|---|-----:|---|-----:|
+ |mmlu              |      2|none  |      |acc   |↑  |0.7895|±  |0.0132|
+ | - humanities     |      2|none  |      |acc   |↑  |0.8256|±  |0.0251|
+ | - other          |      2|none  |      |acc   |↑  |0.8051|±  |0.0273|
+ | - social sciences|      2|none  |      |acc   |↑  |0.7889|±  |0.0292|
+ | - stem           |      2|none  |      |acc   |↑  |0.7544|±  |0.0241|
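
To gauge whether the gsm8k gaps between the base model and the two `root90-*` checkpoints exceed sampling noise, a rough z statistic can be computed from the reported values and standard errors. This is a minimal sketch with loud assumptions: it treats the runs as independent samples, although they score the same questions, so a paired test on per-item results would be sharper. The helper name `z_score` is hypothetical.

```python
import math


def z_score(base: float, base_se: float, other: float, other_se: float) -> float:
    """z statistic for (other - base), assuming the two runs are independent."""
    return (other - base) / math.sqrt(base_se**2 + other_se**2)


# gsm8k flexible-extract, limit 500, copied from the tables above
base_value, base_se = 0.908, 0.0129
candidates = {
    "root90-128-4096-9.9999": (0.892, 0.0139),
    "root90-256-4096-9.9999": (0.904, 0.0132),
}

for name, (value, se) in candidates.items():
    z = z_score(base_value, base_se, value, se)
    print(f"{name}: diff={value - base_value:+.3f} z={z:+.2f}")
```

Under these assumptions both |z| values are below 1.96, so neither gsm8k difference at limit 500 is distinguishable from sampling noise at the 5% level.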