jpacifico committed on
Commit 0444b8d · verified · 1 Parent(s): ba2dd3a

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -16,12 +16,10 @@ This is a merge of pre-trained language models created using [mergekit](https://
 
  # First benchmarks
 
- **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit baseline.
+ **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit baseline (58,38).
  All scores are reported in comparison with the original Microsoft BitNet b1.58 BF16 model.
  Evaluations were performed using LM Eval Harness, all results are fully reproducible.
 
- **ARC-Challenge:** 51.62 (First-ever ≥50 score for a model in the 2B category, i.e., >1.5B and <2.5B params)
-
  | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
  |------------------------------------|-----------------------------------|--------------------------------|
  | arc_challenge 0 shot | 47.95 | **51.62** |
@@ -39,6 +37,7 @@ Evaluations were performed using LM Eval Harness, all results are fully reproduc
  | mmlu 5 shot acc | 52.96 | **53.39** |
  | commonsense_qa 10 shot acc | **71.17** | 70.76 |
 
+ **ARC-Challenge:** 51.62 (First-ever ≥50 score for a model in the 2B category, i.e., >1.5B and <2.5B params)
 
  | Model | arc_challenge (0 shot) |
  |----------------------------------------------------|------------------------|