jpacifico committed on
Commit 5fa3cea · verified · 1 Parent(s): 4495094

Update README.md

Files changed (1): README.md (+7 -0)
README.md CHANGED
@@ -32,6 +32,7 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
- jpacifico/bitnet-dpo-merged-modelstock7 (this repo): Contains the retrainable weights in BF16 format
- [jpacifico/bitnet-dpo-fr-i2s-2](https://huggingface.co/jpacifico/bitnet-dpo-fr-i2s-2): Quantized 1.58-bit GGUF version, which you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)

# Training Recipe

Base model: [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16)
@@ -43,6 +44,7 @@ Iterative DPO + Model merging:
  [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
- Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit)) to combine the complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability. Illustrative sketches of the DPO and merge steps follow this list.
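
The diff doesn't show the exact DPO tooling, so the following is only a plausible sketch of a single English-side DPO round using TRL's `trl dpo` CLI; the trainer choice, hyperparameters, output path, and iteration count are all assumptions, and Intel/orca_dpo_pairs would likely need its `system`/`question` columns mapped to TRL's `prompt` format first.

```bash
# Hypothetical single DPO round via TRL's CLI; not the author's actual
# recipe. Intel/orca_dpo_pairs may need remapping to prompt/chosen/rejected.
pip install trl

trl dpo \
  --model_name_or_path microsoft/bitnet-b1.58-2B-4T-bf16 \
  --dataset_name Intel/orca_dpo_pairs \
  --output_dir ./bitnet-dpo-round1 \
  --per_device_train_batch_size 2 \
  --learning_rate 5e-6 \
  --num_train_epochs 1
```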
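
For the merging step, a minimal Mergekit invocation could look like the sketch below; the config filename is a placeholder (the actual ModelStock config is reproduced in the "Last checkpoint" section of this README), and `mergekit-yaml` is Mergekit's standard CLI entry point.

```bash
# Minimal merge sketch: merge-config.yml is a placeholder filename;
# the real config (models, merge method, tokenizer_source) is shown
# further down in this README under "Last checkpoint".
pip install mergekit

mergekit-yaml ./merge-config.yml ./bitnet-dpo-merged --copy-tokenizer
```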

# First benchmarks

**Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA), with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit quantized baseline (58.38).
@@ -99,6 +101,8 @@ HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
- Randomness (e.g. seeds, batch sizes) may cause slight variations in results
- The same procedure was used to evaluate all tasks presented in the benchmark tables
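
For reference, a reproduction command in the spirit of the `lm-eval` invocation shown in the hunk header above might look like the following; the task list, dtype, and batch size are illustrative assumptions rather than the exact settings behind the reported numbers.

```bash
# Illustrative evaluation sketch; the exact task list and settings used
# for the benchmark tables are not shown in this diff.
pip install lm-eval

HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
  --model_args pretrained=jpacifico/bitnet-dpo-merged-modelstock7,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,boolq,hellaswag,winogrande,triviaqa,gsm8k \
  --batch_size 8
```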

# Usage with `bitnet.cpp`

You can run this model using my demo [Colab notebook](https://github.com/jpacifico/) (TBD).
@@ -106,6 +110,7 @@ You can run this model using my demo [Colab notebook](https://github.com/jpacifi
Please refer to the [bitnet.cpp](https://github.com/microsoft/BitNet) GitHub repository for detailed compilation steps, usage examples, and command-line options.
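
As a rough sketch of what running the quantized GGUF checkpoint with bitnet.cpp looks like (local paths and the GGUF filename are assumptions; the upstream README is the authority on flags and build steps):

```bash
# Set up bitnet.cpp following the upstream README's general flow
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Fetch the quantized checkpoint; local dir and filename are assumptions
huggingface-cli download jpacifico/bitnet-dpo-fr-i2s-2 \
  --local-dir models/bitnet-dpo-fr-i2s-2

# Prepare the environment for the model, then chat with it
python setup_env.py -md models/bitnet-dpo-fr-i2s-2 -q i2_s
python run_inference.py \
  -m models/bitnet-dpo-fr-i2s-2/ggml-model-i2_s.gguf \
  -p "Explique le modèle BitNet en une phrase." \
  -cnv
```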

# Last checkpoint
### Merge Method

@@ -138,6 +143,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain

```

# Limitations

  Not tuned for coding or formal math; prefer specialized variants if those are critical.
@@ -147,6 +153,7 @@ No explicit chain-of-thought training; improvements come from bilingual DPO + me
This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.

- **Developed by:** Jonathan Pacifico, 2025
- **Model type:** LLM
- **Language(s) (NLP):** French, English
 