jpacifico committed
Commit 9a30e92 · verified · 1 Parent(s): d3d1e24

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED

@@ -21,7 +21,7 @@ language:
  **Aramis-2B-BitNet** *(2.41B params / Context Length: Maximum sequence length of 4096 tokens)*
  A compact, agent-oriented small language model focused on contextual reasoning, language understanding and multi-turn instruction following.
  Built with an iterative post-training recipe: bilingual DPO (FR+EN) + model merging of FR-centric and EN-centric variants.
- Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless to the BF16 checkpoint.
+ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lossless from the BF16 checkpoint.
 
  **Why BitNet (and why this model)**
  - BitNet b1.58 uses ternary weights (−1,0,+1) with abs-mean scaling: very low memory & energy, great CPU/edge throughput, unlike classic FP/INT SLMs. For more details on the underlying architecture and efficiency of BitNet, please refer to the official Microsoft Research publication: [BitNet b1.58 2B4T Technical Report](https://arxiv.org/abs/2504.12285)
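The abs-mean ternary scheme mentioned above can be sketched in a few lines. This is an illustrative NumPy reconstruction of the absmean quantizer described in the BitNet b1.58 report (scale = mean(|W|), then round W/scale and clip to {−1, 0, +1}); the function name and shapes are hypothetical and this is not the model's actual quantization code.

```python
import numpy as np

def absmean_ternary_quantize(w, eps=1e-8):
    """Quantize a weight matrix to ternary values {-1, 0, +1} using
    abs-mean scaling, as described for BitNet b1.58.
    Returns (ternary int8 weights, scale) so that W ~= scale * W_ternary."""
    scale = np.abs(w).mean() + eps              # abs-mean scaling factor
    w_t = np.clip(np.round(w / scale), -1, 1)   # round, then clip to ternary set
    return w_t.astype(np.int8), float(scale)

# Example: quantize a small random weight matrix
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_t, s = absmean_ternary_quantize(w)
print(np.unique(w_t))  # values drawn from {-1, 0, 1}
```

Storing only the ternary matrix plus one scale per tensor is what yields the very low memory footprint; matrix multiplies reduce to additions and subtractions, which is why CPU/edge throughput benefits.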