Update README.md
README.md CHANGED
@@ -47,7 +47,7 @@ Iterative DPO + Model merging :
 
 **Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA) with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit quantized baseline (58.38).
 All scores are reported in comparison with the original [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16) model.
-Evaluations were performed using LM Eval Harness, all results are fully reproducible.
+Evaluations were performed using [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness); all results are fully reproducible.
 
 | Benchmark (metric) | microsoft/bitnet-b1.58-2B-4T-bf16 | bitnet-dpo-merged-modelstock7 |
 |------------------------------------|-----------------------------------|--------------------------------|
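A run like the one described above can typically be reproduced with the harness CLI. This is a hedged sketch: the exact task list, few-shot settings, and output path are assumptions inferred from the benchmarks named in the README, not the author's confirmed command.

```shell
# Sketch of a reproduction run with LM Eval Harness.
# Task names and output path are assumptions, not the author's exact setup.
pip install lm-eval

lm_eval \
  --model hf \
  --model_args pretrained=microsoft/bitnet-b1.58-2B-4T-bf16 \
  --tasks arc_challenge,arc_easy,winogrande,boolq,hellaswag,triviaqa,gsm8k \
  --batch_size auto \
  --output_path results/baseline
```

Swapping `pretrained=` to the merged model repo would produce the second column of the table for a side-by-side comparison.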