jpacifico/Aramis-2B-BitNet-bf16

jpacifico committed on Aug 15, 2025

Commit 76f4a9f · verified · 1 Parent(s): 0444b8d

Update README.md

Files changed (1)

README.md +16 -3


@@ -10,9 +10,22 @@ tags:
 - merge
 ---
-# merge
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 # First benchmarks
@@ -50,7 +63,7 @@ Evaluations were performed using LM Eval Harness, all results are fully reproduc
 | jpacifico/bitnet-dpo-merged-modelstock7            | **51,62**              |
-## Merge Details
 ### Merge Method
 This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [jpacifico/bitnet-dpo-merged-modelstock-retrain](https://huggingface.co/jpacifico/bitnet-dpo-merged-modelstock-retrain) as a base.

 - merge
 ---
+# Model Summary
+**bitnet-dpo-merged-modelstock7** (≈2B, BitNet b1.58)
+A compact, agent-oriented small language model focused on language understanding and contextual decision-making.
+Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
+Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, with no loss relative to the BF16 checkpoints.
+**Why BitNet (and why this model)**
+- BitNet b1.58 uses ternary weights (−1, 0, +1) with abs-mean scaling, which gives very low memory and energy use and high CPU/edge throughput compared with classic FP/INT SLMs.
+- ModelStock7 demonstrates that a 2B BitNet can deliver SOTA language understanding in its class without sacrificing efficiency.
+# Training Recipe
+- Bilingual DPO (FR+EN) to sharpen preference selection across two languages.
+- Model merging (FR-centric + EN-centric) to broaden stylistic/lexical coverage.
+Goal: agent-oriented behavior → better instruction following, contextual disambiguation, and pragmatic reasoning in multi-turn settings.
 # First benchmarks
 | jpacifico/bitnet-dpo-merged-modelstock7            | **51,62**              |
+## Last checkpoint
 ### Merge Method
 This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [jpacifico/bitnet-dpo-merged-modelstock-retrain](https://huggingface.co/jpacifico/bitnet-dpo-merged-modelstock-retrain) as a base.
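
For readers unfamiliar with the "ternary weights with abs-mean scaling" mentioned in the added Model Summary, the idea can be sketched in a few lines of NumPy. This is only an illustration of the general BitNet b1.58 weight quantization scheme, assuming a single per-tensor scale; the function names `absmean_ternary_quantize` and `dequantize` are hypothetical, and nothing here is taken from this repository's training or inference code.

```python
import numpy as np


def absmean_ternary_quantize(w, eps=1e-8):
    """Quantize a weight tensor to {-1, 0, +1} using abs-mean scaling.

    Assumption: one scale per tensor (the mean absolute weight).
    Illustrative sketch only, not this model's actual kernel.
    """
    scale = float(np.mean(np.abs(w)))
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    q = np.clip(np.rint(w / (scale + eps)), -1, 1).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map ternary codes back to an approximate float tensor."""
    return q.astype(np.float32) * scale


# Small usage example on a toy 2x3 weight matrix.
w = np.array([[0.9, -0.05, -1.2],
              [0.3, 0.0, -0.4]])
q, scale = absmean_ternary_quantize(w)
# q is [[1, 0, -1], [1, 0, -1]] and scale is approximately 0.475
w_hat = dequantize(q, scale)
```

Each weight then needs only about log2(3) ≈ 1.58 bits of storage plus one shared scale, which is where the "1.58-bit" name comes from.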