Update README.md
README.md
@@ -18,6 +18,8 @@ This is a merge of pre-trained language models created using [mergekit](https://
 
 **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit baseline.
 
+**ARC-Challenge:** 51.62 (First-ever ≥50 score for a model in the 2B category, i.e., >1.5B and <2.5B params)
+
 | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
 |------------------------------------|-----------------------------------|--------------------------------|
 | arc_challenge 0 shot | 47.95 | **51.62** |
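As a quick sanity check on the `arc_challenge` row, the sketch below computes the absolute and relative gain of the merged model over the baseline and checks the score against the 50-point threshold mentioned in the added line. The helper name `score_gain` is hypothetical and not part of the README or of any library; the two scores are copied from the table row.

```python
def score_gain(baseline: float, merged: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent).

    Hypothetical helper for illustration only; not from the README.
    """
    absolute = merged - baseline
    relative = absolute / baseline * 100.0
    return absolute, relative


# Scores taken from the arc_challenge 0-shot row of the table.
BASELINE_ARC_C = 47.95  # microsoft/bitnet-b1.58-2B-4T-bf16
MERGED_ARC_C = 51.62    # bitnet-dpo-merged-modelstock7

abs_gain, rel_gain = score_gain(BASELINE_ARC_C, MERGED_ARC_C)
print(f"absolute gain: {abs_gain:.2f} points")
print(f"relative gain: {rel_gain:.2f}%")
print(f"at or above 50: {MERGED_ARC_C >= 50}")
```

This only restates the arithmetic behind the reported numbers. It does not reproduce the evaluation itself.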