Zhayr1 committed (verified)
Commit 60df1d3 · Parent: a95ce7e

Initial commit: Upload BitMamba-1B model, weights and benchmarks

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ scaling_comparisson.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,103 @@
- ---
- license: mit
- ---
+ ---
+ language:
+ - en
+ license: mit
+ tags:
+ - bitnet
+ - mamba
+ - ssm
+ - 1.58-bit
+ - ternary
+ - efficient-inference
+ - edge-computing
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ - cosmopedia
+ - bigcode/the-stack-dedup
+ metrics:
+ - accuracy
+ - perplexity
+ library_name: jax
+ pipeline_tag: text-generation
+ inference: false
+ ---
+
+ # BitMamba-2-255M
+
+ <div align="center">
+
+ [![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](TU_LINK_DE_ZENODO)
+ [![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
+
+ </div>
+
+ **BitMamba-2-255M** is the ultra-efficient baseline model of the BitMamba-2 family. It integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** architecture. Despite its small size, it converges stably and shows reasoning capabilities that are surprising at this scale, serving as the proof of concept for scaling ternary state space models.
+
+ ## ⚡ Key Features
+
+ - **Architecture:** Mamba-2 SSM + BitNet b1.58 (ternary weights).
+ - **Parameters:** 255M.
+ - **Precision:** 1.58-bit (weights $\in \{-1, 0, 1\}$).
+ - **Training Data:** High-quality corpora (FineWeb-Edu, Cosmopedia, The Stack dedup).
+ - **Hardware:** Trained on Google Cloud TPU v6e.
+
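The 1.58-bit precision above refers to BitNet b1.58-style absmean ternary quantization, which maps each weight to {-1, 0, 1} with one shared scale per tensor. A minimal pure-Python sketch of the idea (the function name and epsilon are illustrative, not taken from the BitMamba-2 source):

```python
def ternary_quantize(weights, eps=1e-5):
    """Absmean ternary quantization: scale by the mean |w|,
    round to the nearest integer, and clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

q, s = ternary_quantize([0.9, -0.05, -1.2, 0.4])
print(q)  # every entry is -1, 0, or 1
```

Only the shared float scale and the ternary codes need to be stored, which is what brings the effective precision down to ~1.58 bits per weight (log2 of 3 states).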
+ ## 📊 Benchmark Results
+
+ This model serves as the baseline for our scaling-laws analysis.
+
+ | Benchmark | Metric | BitMamba-2-255M |
+ | :------------- | :--------: | :-------------: |
+ | **ARC-Easy** | Accuracy | 55.51% |
+ | **PIQA** | Accuracy | 64.42% |
+ | **BoolQ** | Accuracy | 59.30% |
+ | **HellaSwag** | Acc Norm | 35.22% |
+ | **WikiText-2** | Perplexity | 51.69 |
+
+ As the scaling analysis below shows, the 255M model (blue line) establishes a stable learning trajectory, which the 1B model (red line) improves on significantly.
+
+ ![Scaling Laws](scaling_comparisson.png)
+
+ ## 🚀 Usage (Inference)
+
+ This model is optimized for extreme edge deployment (IoT, mobile, legacy hardware) using our custom C++ inference engine.
+
+ ### 1. Download the Quantized Model
+
+ Download the `bitmamba_255m.bin` file from the Files tab.
+
+ ### 2. Run with C++
+
+ Get the inference code from our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp).
+
+ ```bash
+ # Example usage after compiling bitmamba.cpp.
+ # The prompt is passed as GPT-2 token IDs ("15496 11 314 716" decodes to "Hello, I am"),
+ # followed by sampling parameters; a smaller context size is used here for speed.
+ ./bitmamba bitmamba_255m.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200
+ ```
+
+ ### 3. JAX/Flax Usage
+
+ The `bitmamba_255m.msgpack` file contains the raw JAX weights for research purposes. You can load them with the source code provided in `src/` on GitHub.
+
+ ## 🛠️ Efficient Deployment
+
+ Running on a consumer **Intel Core i3-12100F CPU**:
+
+ | Model | RAM Usage | Speed |
+ | ------------------- | ---------- | -------------- |
+ | **BitMamba-2-255M** | **252 MB** | **~146 tok/s** |
+
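These CPU numbers are plausible because ternary weights remove almost all multiplications: with every weight in {-1, 0, 1}, a dot product reduces to additions and subtractions plus a single multiply by the shared scale. A hedged pure-Python sketch of this idea (not the actual bitmamba.cpp kernel):

```python
def ternary_dot(q_weights, x, scale):
    """Dot product with ternary weights: accumulate +v, -v, or nothing,
    then apply the shared dequantization scale once at the end."""
    acc = 0.0
    for q, v in zip(q_weights, x):
        if q == 1:
            acc += v
        elif q == -1:
            acc -= v
    return acc * scale

print(ternary_dot([1, 0, -1, 1], [0.5, 2.0, 1.5, -0.25], 0.6375))
```

The same trick, done on packed 2-bit codes with SIMD, is what keeps both the memory footprint and the per-token cost low on commodity CPUs.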
+ ## 📜 Citation
+
+ If you use this model or our architecture, please cite our paper:
+
+ ```bibtex
+ @misc{salazar2026bitmamba2,
+   author    = {Salazar, Jesus},
+   title     = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
+   year      = {2026},
+   publisher = {Zenodo},
+   doi       = {10.5281/zenodo.XXXXXXX},
+   url       = {https://doi.org/10.5281/zenodo.XXXXXXX}
+ }
+ ```
bitmamba_cpp/bitmamba_255m.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:701ea630139c4099668b2eb849f3c1ebf30fcbbb400e83782da5f743610fa198
+ size 258250988
config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "architectures": ["BitMamba2LM"],
+   "model_type": "bitmamba",
+   "d_model": 1024,
+   "n_layers": 24,
+   "n_heads": 16,
+   "vocab_size": 50257,
+   "ssm_d_state": 128,
+   "ssm_d_conv": 4,
+   "expand": 2,
+   "rms_norm_eps": 1e-6,
+   "quantization": {
+     "bits": 1.58,
+     "group_size": null,
+     "zero_point": false
+   },
+   "bos_token_id": 50256,
+   "eos_token_id": 50256,
+   "transformers_version": "5.0.0"
+ }
jax_weights/bitmamba_255m.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b1ee4ad01aed142c1fb8e94e3efd8260ee7be01141b89b98322ac7108fdf43c0
+ size 1021897520
scaling_comparisson.png ADDED

Git LFS Details

  • SHA256: 2d73b1d3404dd914d99210b25a0afab5e0262f1c14a4db5dff145d9345e6aef2
  • Pointer size: 131 Bytes
  • Size of remote file: 207 kB
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "is_local": false,
+   "model_max_length": 1024,
+   "pad_token": null,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": "<|endoftext|>"
+ }