jpacifico committed on
Commit 5fa3cea · verified · 1 Parent(s): 4495094

Update README.md

Files changed (1): README.md (+7 -0)
README.md CHANGED
@@ -32,6 +32,7 @@ Runs natively as BitNet 1.58-bit (ternary) and is available in GGUF 1.58-bit, lo
- jpacifico/bitnet-dpo-merged-modelstock7 (this repo): Contains the retrainable weights in BF16 format
- [jpacifico/bitnet-dpo-fr-i2s-2](https://huggingface.co/jpacifico/bitnet-dpo-fr-i2s-2): Quantized 1.58-bit GGUF version, which you can use with [bitnet.cpp](https://github.com/microsoft/BitNet)

# Training Recipe

Base model: [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16)
@@ -43,6 +44,7 @@ Iterative DPO + Model merging:
  [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
- Model merging (ModelStock and TIES methods, via [Mergekit](https://github.com/cg123/mergekit)) to combine the complementary strengths of bilingual models (FR-centric + EN-centric), improving robustness across reasoning and comprehension tasks while maintaining stability. Illustrative sketches of the DPO and merge steps follow this list.
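
The diff doesn't show the exact DPO tooling, so the following is only a plausible sketch of a single English-side DPO round using TRL's `trl dpo` CLI; the trainer choice, hyperparameters, output path, and iteration count are all assumptions, and Intel/orca_dpo_pairs would likely need its `system`/`question` columns mapped to TRL's `prompt` format first.

```bash
# Hypothetical single DPO round via TRL's CLI; not the author's actual
# recipe. Intel/orca_dpo_pairs may need remapping to prompt/chosen/rejected.
pip install trl

trl dpo \
  --model_name_or_path microsoft/bitnet-b1.58-2B-4T-bf16 \
  --dataset_name Intel/orca_dpo_pairs \
  --output_dir ./bitnet-dpo-round1 \
  --per_device_train_batch_size 2 \
  --learning_rate 5e-6 \
  --num_train_epochs 1
```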
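
For the merging step, a minimal Mergekit invocation could look like the sketch below; the config filename is a placeholder (the actual ModelStock config is reproduced in the "Last checkpoint" section of this README), and `mergekit-yaml` is Mergekit's standard CLI entry point.

```bash
# Minimal merge sketch: merge-config.yml is a placeholder filename;
# the real config (models, merge method, tokenizer_source) is shown
# further down in this README under "Last checkpoint".
pip install mergekit

mergekit-yaml ./merge-config.yml ./bitnet-dpo-merged --copy-tokenizer
```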

# First benchmarks

**Interpretation:** Significant gains on language understanding & pragmatic reasoning (ARC-C/E, Wino, BoolQ, HellaSwag, TriviaQA), with stability on other skills. Math/code are not the optimization target; GSM8K stays essentially stable relative to the BitNet 1.58-bit quantized baseline (58.38).
@@ -99,6 +101,8 @@ HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
- Randomness (e.g. seeds, batch sizes) may cause slight variations in results
- The same procedure was used to evaluate all tasks presented in the benchmark tables
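
For reference, a reproduction command in the spirit of the `lm-eval` invocation shown in the hunk header above might look like the following; the task list, dtype, and batch size are illustrative assumptions rather than the exact settings behind the reported numbers.

```bash
# Illustrative evaluation sketch; the exact task list and settings used
# for the benchmark tables are not shown in this diff.
pip install lm-eval

HF_ALLOW_CODE_EVAL=1 lm-eval --model hf \
  --model_args pretrained=jpacifico/bitnet-dpo-merged-modelstock7,dtype=bfloat16 \
  --tasks arc_challenge,arc_easy,boolq,hellaswag,winogrande,triviaqa,gsm8k \
  --batch_size 8
```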

# Usage with `bitnet.cpp`

You can run this model using my demo [Colab notebook](https://github.com/jpacifico/) (TBD).
@@ -106,6 +110,7 @@ You can run this model using my demo [Colab notebook](https://github.com/jpacifi
Please refer to the [bitnet.cpp](https://github.com/microsoft/BitNet) GitHub repository for detailed compilation steps, usage examples, and command-line options.
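
As a rough sketch of what running the quantized GGUF checkpoint with bitnet.cpp looks like (local paths and the GGUF filename are assumptions; the upstream README is the authority on flags and build steps):

```bash
# Set up bitnet.cpp following the upstream README's general flow
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Fetch the quantized checkpoint; local dir and filename are assumptions
huggingface-cli download jpacifico/bitnet-dpo-fr-i2s-2 \
  --local-dir models/bitnet-dpo-fr-i2s-2

# Prepare the environment for the model, then chat with it
python setup_env.py -md models/bitnet-dpo-fr-i2s-2 -q i2_s
python run_inference.py \
  -m models/bitnet-dpo-fr-i2s-2/ggml-model-i2_s.gguf \
  -p "Explique le modèle BitNet en une phrase." \
  -cnv
```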

# Last checkpoint
### Merge Method

@@ -138,6 +143,7 @@ tokenizer_source: jpacifico/bitnet-dpo-merged-modelstock-retrain

```

# Limitations

  Not tuned for coding or formal math; prefer specialized variants if those are critical.
@@ -147,6 +153,7 @@ No explicit chain-of-thought training; improvements come from bilingual DPO + me
This model is intended for research and development purposes only and should not be used in commercial or real-world applications without further testing. While the Microsoft Research team has applied SFT and DPO to align the BitNet base model, it may still produce unexpected, biased, or inaccurate outputs. Please use responsibly.

- **Developed by:** Jonathan Pacifico, 2025
- **Model type:** LLM
- **Language(s) (NLP):** French, English
 