jpacifico/Aramis-2B-BitNet-bf16

Text Generation


jpacifico committed on Aug 13, 2025

Commit 0444b8d · verified · 1 parent: ba2dd3a

Update README.md

Files changed (1)

README.md +2 -3

README.md CHANGED

@@ -16,12 +16,10 @@ This is a merge of pre-trained language models created using [mergekit](https://
 # First benchmarks
-**Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit baseline.
+**Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit baseline (58,38).
 All scores are reported in comparison with the original Microsoft BitNet b1.58 BF16 model.
 Evaluations were performed using LM Eval Harness, all results are fully reproducible.
-**ARC-Challenge:** 51.62 (First-ever ≥50 score for a model in the 2B category, i.e., >1.5B and <2.5B params)
 | Benchmark (metric)                 | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
 |------------------------------------|-----------------------------------|--------------------------------|
 | arc_challenge 0 shot               | 47.95                             | **51.62**                      |
@@ -39,6 +37,7 @@ Evaluations were performed using LM Eval Harness, all results are fully reproduc
 | mmlu 5 shot acc                    | 52.96                             | **53.39**                      |
 | commonsense_qa 10 shot acc         | **71.17**                         | 70.76                          |
+**ARC-Challenge:** 51.62 (First-ever ≥50 score for a model in the 2B category, i.e., >1.5B and <2.5B params)
 | Model                                              | arc_challenge (0 shot) |
 |----------------------------------------------------|------------------------|
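The benchmark table lists raw scores only, so the size of the gains and losses behind the "significant gains ... with stability" reading is left to the reader. The sketch below computes the per-benchmark differences (merged minus baseline, in points) for the three rows visible in this diff. It is illustrative only: `SCORES` and `deltas` are hypothetical names, not part of the repository, and the figures are copied by hand from the table rather than loaded from LM Eval Harness output.

```python
# Hypothetical helper: scores copied by hand from the README table in this diff.
# Tuple order is (baseline, merged):
#   baseline = microsoft/bitnet-b1.58-2B-4T-bf16
#   merged   = bitnet-dpo-merged-modelstock7
SCORES: dict[str, tuple[float, float]] = {
    "arc_challenge 0 shot": (47.95, 51.62),
    "mmlu 5 shot acc": (52.96, 53.39),
    "commonsense_qa 10 shot acc": (71.17, 70.76),
}


def deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return merged-minus-baseline differences in points, rounded to 2 decimals."""
    return {name: round(merged - base, 2) for name, (base, merged) in scores.items()}


if __name__ == "__main__":
    # Print one signed difference per benchmark row.
    for name, diff in deltas(SCORES).items():
        print(f"{name:<28} {diff:+.2f}")
```

Run as a script, it prints +3.67 for arc_challenge, +0.43 for mmlu and -0.41 for commonsense_qa. This shows that only the ARC-Challenge row moves by several points, while the other two rows shift by less than half a point in either direction.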