Zhayr1 committed
Commit 807f72f · verified · 1 Parent(s): 46b64af

Initial commit: Upload BitMamba-1B model, weights and benchmarks

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ training_loss_1b.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,97 @@
- ---
- license: mit
- ---
+ ---
+ language:
+ - en
+ license: mit
+ tags:
+ - bitnet
+ - mamba
+ - ssm
+ - 1.58-bit
+ - ternary
+ - efficient-inference
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ - cosmopedia
+ - bigcode/the-stack-dedup
+ metrics:
+ - accuracy
+ - perplexity
+ library_name: jax
+ pipeline_tag: text-generation
+ inference: false
+ ---
+
+ # BitMamba-2-1B
+
+ <div align="center">
+
+ [![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](YOUR_ZENODO_LINK)
+ [![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
+
+ </div>
+
+ **BitMamba-2-1B** is a scalable hybrid architecture that integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capabilities with a drastically reduced memory footprint.
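The 1.58-bit scheme mentioned above can be sketched with the absmean quantizer from BitNet b1.58: scale each weight tensor by its mean absolute value, then round-clip into {-1, 0, 1}. The NumPy version below is an illustration of the idea (function name and example matrix are ours), not the model's actual training code:

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Absmean quantization (BitNet b1.58): scale by the mean
    absolute weight, then round-clip to {-1, 0, 1}."""
    gamma = np.mean(np.abs(w)) + eps            # per-tensor scale
    w_q = np.clip(np.round(w / gamma), -1, 1)   # ternary codes
    return w_q.astype(np.int8), gamma           # dequantize as w_q * gamma

w = np.array([[0.9, -0.02, -1.3],
              [0.4,  0.05, -0.6]])
w_q, gamma = ternary_quantize(w)
# w_q == [[1, 0, -1], [1, 0, -1]]: small weights snap to 0,
# everything else keeps only its sign.
```

During training, schemes like this are typically paired with a straight-through estimator so gradients flow through the full-precision latent weights.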
+
+ ## ⚡ Key Features
+
+ - **Architecture:** Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
+ - **Parameters:** 1B.
+ - **Precision:** 1.58-bit (weights $\in \{-1, 0, 1\}$).
+ - **Training Tokens:** 150 Billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
+ - **Hardware:** Trained on Google Cloud TPU v6e.
+
+ ## 📊 Benchmark Results
+
+ | Benchmark | Metric | BitMamba-2-1B | vs. 255M Baseline |
+ | :------------- | :--------: | :-----------: | :---------------: |
+ | **ARC-Easy** | Accuracy | **63.30%** | +7.8% |
+ | **PIQA** | Accuracy | **68.77%** | +4.4% |
+ | **BoolQ** | Accuracy | **62.35%** | +3.1% |
+ | **HellaSwag** | Acc Norm | **45.59%** | +10.4% |
+ | **WikiText-2** | Perplexity | **29.62** | -22.1 |
+
+ Scaling from 255M to 1B parameters yields consistent improvements across all five benchmarks (for WikiText-2, lower perplexity is better, so the -22.1 delta is also a gain).
+
+ ![Scaling Laws](training_loss_1b.png)
+
+ ## 🚀 Usage (Inference)
+
+ This model is optimized for edge deployment using our custom C++ inference engine.
+
+ ### 1. Download the Quantized Model
+
+ Download the `bitmamba_1b.bin` file from the `bitmamba_cpp` folder in the Files tab.
+
+ ### 2. Run with C++
+
+ Get the inference code from our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp).
+
+ ```bash
+ # Example usage after compiling bitmamba.cpp. The quoted string is the
+ # prompt as GPT-2 token IDs; the trailing numbers are sampling
+ # parameters (see the repository for their exact order).
+ ./bitmamba bitmamba_1b.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200
+ ```
+
+ ### 3. JAX/Flax Usage
+
+ The `jax_weights/bit_mamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
+
+ ## 🛠️ Efficient Deployment
+
+ Running on a consumer **Intel Core i3-12100F CPU**:
+
+ | Model | RAM Usage | Speed |
+ | ----------------- | ---------- | ------------- |
+ | **BitMamba-2-1B** | **621 MB** | **~53 tok/s** |
+
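The small footprint follows from the ternary weights themselves: each carries at most log2(3) ≈ 1.58 bits of information, so even a naive 2-bits-per-weight packing (an illustrative scheme, not necessarily the layout `bitmamba.cpp` uses) stores four weights per byte:

```python
import numpy as np

def pack_ternary(w_q: np.ndarray) -> np.ndarray:
    """Pack ternary values {-1, 0, 1} at 2 bits each, 4 per byte."""
    codes = (w_q.ravel() + 1).astype(np.uint8)   # map {-1,0,1} -> {0,1,2}
    codes = np.pad(codes, (0, (-len(codes)) % 4))
    codes = codes.reshape(-1, 4)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    return (codes << shifts).sum(axis=1).astype(np.uint8)

def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary: recover the first n ternary values."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    return codes.ravel()[:n].astype(np.int8) - 1

w_q = np.array([-1, 0, 1, 1, -1, 0, 0, 1], dtype=np.int8)
packed = pack_ternary(w_q)        # 8 weights -> 2 bytes
assert np.array_equal(unpack_ternary(packed, len(w_q)), w_q)
```

Real engines typically go further (e.g. grouping weights or fusing the dequantize into the matmul), but the 4x-plus reduction over fp16 already explains the 621 MB figure for a 1B-parameter model.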
+ ## 📜 Citation
+
+ ```bibtex
+ @misc{salazar2026bitmamba2,
+   author    = {Salazar, Jesus},
+   title     = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
+   year      = {2026},
+   publisher = {Zenodo},
+   doi       = {10.5281/zenodo.XXXXXXX},
+   url       = {https://doi.org/10.5281/zenodo.XXXXXXX}
+ }
+ ```
bitmamba_cpp/bitmamba_1b.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f2b9ba34d9a23a9712b8f8c19a60641a96d3809a331ddffe0699b8d179888e1
+ size 644297164
config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "architectures": [
+     "BitMamba2LM"
+   ],
+   "model_type": "bitmamba",
+   "d_model": 2048,
+   "n_layers": 32,
+   "n_heads": 32,
+   "vocab_size": 50257,
+   "ssm_d_state": 128,
+   "ssm_d_conv": 4,
+   "expand": 2,
+   "rms_norm_eps": 1e-6,
+   "quantization": {
+     "bits": 1.58,
+     "group_size": null,
+     "zero_point": false
+   },
+   "bos_token_id": 50256,
+   "eos_token_id": 50256,
+   "transformers_version": "5.0.0"
+ }
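Two quick sanity checks on these fields (back-of-the-envelope arithmetic, not figures from the model card): the token-embedding table alone accounts for roughly 103M of the ~1B parameters, and "1.58-bit" is shorthand for log2(3), the information content of one ternary weight:

```python
import json
import math

# Fields copied from the config.json above.
cfg = json.loads('{"d_model": 2048, "n_layers": 32, "vocab_size": 50257}')

# Token embeddings: one d_model-sized vector per vocabulary entry.
emb_params = cfg["vocab_size"] * cfg["d_model"]
print(f"embedding params: {emb_params:,}")  # 102,926,336

# A ternary weight in {-1, 0, 1} carries log2(3) bits of information.
print(round(math.log2(3), 2))  # 1.58
```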
jax_weights/bit_mamba_1b.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a9b4aa0abbf45088c23d19ca922ef95e1cf68aa7aa4f1c3111b7d80f6900676
+ size 4073769440
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "is_local": false,
+   "model_max_length": 1024,
+   "pad_token": null,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>"
+ }
training_loss_1b.png ADDED

Git LFS Details

  • SHA256: 5c824d3203e1f3b62bcae7cf7239fa6c1094eeb7a2c412d23ef7aaca94d27471
  • Pointer size: 131 Bytes
  • Size of remote file: 189 kB