Initial commit: Upload BitMamba-1B model, weights and benchmarks

Files changed (8)

.gitattributes +1 -0
README.md +97 -3
bitmamba_cpp/bitmamba_1b.bin +3 -0
config.json +22 -0
jax_weights/bit_mamba_1b.msgpack +3 -0
tokenizer.json +0 -0
tokenizer_config.json +12 -0
training_loss_1b.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+training_loss_1b.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,97 @@
----
-license: mit
----

+---
+language:
+  - en
+license: mit
+tags:
+  - bitnet
+  - mamba
+  - ssm
+  - 1.58-bit
+  - ternary
+  - efficient-inference
+datasets:
+  - HuggingFaceFW/fineweb-edu
+  - cosmopedia
+  - bigcode/the-stack-dedup
+metrics:
+  - accuracy
+  - perplexity
+library_name: jax
+pipeline_tag: text-generation
+inference: false
+---
+# BitMamba-2-1B
+<div align="center">
+[![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](TU_LINK_DE_ZENODO)
+[![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
+</div>
+**BitMamba-2-1B** is a scalable, hybrid architecture that integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capabilities with a drastically reduced memory footprint.
+## ⚡ Key Features
+- **Architecture:** Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
+- **Parameters:** 1B.
+- **Precision:** 1.58-bit (weights $\in \{-1, 0, 1\}$).
+- **Training Tokens:** 150 Billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
+- **Hardware:** Trained on Google Cloud TPU v6e.
+## 📊 Benchmark Results
+| Benchmark      |   Metric   | BitMamba-2-1B | vs. 255M Baseline |
+| :------------- | :--------: | :-----------: | :---------------: |
+| **ARC-Easy**   |  Accuracy  |  **63.30%**   |       +7.8%       |
+| **PIQA**       |  Accuracy  |  **68.77%**   |       +4.4%       |
+| **BoolQ**      |  Accuracy  |  **62.35%**   |       +3.1%       |
+| **HellaSwag**  |  Acc Norm  |  **45.59%**   |      +10.4%       |
+| **WikiText-2** | Perplexity |   **29.62**   |       -22.1       |
+Scaling from 255M to 1B parameters yields consistent improvements...
+![Scaling Laws](training_loss_1b.png)
+## 🚀 Usage (Inference)
+This model is optimized for edge deployment using our custom C++ inference engine.
+### 1. Download the Quantized Model
+Download the `bitmamba_1b.bin` file from the `bitmamba_cpp` folder in the Files tab.
+### 2. Run with C++
+Go to our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp) to get the inference code.
+```bash
+# Example usage after compiling bitmamba.cpp
+./bitmamba bitmamba_1b.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200
+```
+### 3. JAX/Flax Usage
+The `jax_weights/bit_mamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
+## 🛠️ Efficient Deployment
+Running on a consumer **Intel Core i3-12100F CPU**:
+| Model             | RAM Usage  | Speed         |
+| ----------------- | ---------- | ------------- |
+| **BitMamba-2-1B** | **621 MB** | **~53 tok/s** |
+## 📜 Citation
+```bibtex
+@misc{salazar2026bitmamba2,
+  author       = {Salazar, Jesus},
+  title        = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
+  year         = {2026},
+  publisher    = {Zenodo},
+  doi          = {10.5281/zenodo.XXXXXXX},
+  url          = {https://doi.org/10.5281/zenodo.XXXXXXX}
+}
+```
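
Note on the README's "1.58-bit (weights in {-1, 0, 1})" claim: the sketch below shows the absmean ternary quantization described for BitNet b1.58, in plain Python. It is an illustration only. Whether BitMamba-2 uses exactly this scheme (per-tensor scale, this epsilon) is an assumption, and the names `ternary_quantize` and `dequantize` are hypothetical, not taken from the repository.

```python
# Sketch of absmean ternary quantization (BitNet b1.58 style).
# Assumption: a single per-tensor scale and eps=1e-5; the actual
# BitMamba-2 training code may differ.

def ternary_quantize(weights, eps=1e-5):
    """Map a flat list of floats to values in {-1, 0, 1} plus one scale."""
    # Per-tensor scale: mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate reconstruction of the original weights."""
    return [q * scale for q in quantized]


weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.3]
q, s = ternary_quantize(weights)
print(q)  # every entry is -1, 0, or 1
print(dequantize(q, s))
```

Small weights collapse to 0 and large ones saturate at ±1, so only the sign pattern and one floating-point scale per tensor need to be stored.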

bitmamba_cpp/bitmamba_1b.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4f2b9ba34d9a23a9712b8f8c19a60641a96d3809a331ddffe0699b8d179888e1
+size 644297164
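
The three lines above are a Git LFS pointer, not the model itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch of how a downloaded `bitmamba_1b.bin` could be checked against this pointer follows. The helper names `parse_lfs_pointer` and `verify_download` are hypothetical, and the local path passed to `verify_download` is whatever location the file was saved to.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, pointer, chunk_size=1 << 20):
    """Return True if the file matches the pointer's size and SHA-256 digest."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        # Read in chunks so a multi-hundred-megabyte file is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4f2b9ba34d9a23a9712b8f8c19a60641a96d3809a331ddffe0699b8d179888e1
size 644297164
"""

pointer = parse_lfs_pointer(POINTER)
print(f"{pointer['size'] / 2**20:.1f} MiB")
```

The recorded size works out to roughly 614 MiB, which is in the same range as the 621 MB RAM figure quoted in the README.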

config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "architectures": [
+    "BitMamba2LM"
+  ],
+  "model_type": "bitmamba",
+  "d_model": 2048,
+  "n_layers": 32,
+  "n_heads": 32,
+  "vocab_size": 50257,
+  "ssm_d_state": 128,
+  "ssm_d_conv": 4,
+  "expand": 2,
+  "rms_norm_eps": 1e-6,
+  "quantization": {
+    "bits": 1.58,
+    "group_size": null,
+    "zero_point": false
+  },
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "transformers_version": "5.0.0"
+}
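
A few derived quantities help when reading this config. The sketch below computes them from the values above. It assumes the usual Mamba-2 convention that the inner width is `expand * d_model` and that heads split it evenly; the config does not state this, so treat `d_inner` and `head_dim` as inferred rather than confirmed. The `"bits": 1.58` entry corresponds to log2(3), the information content of one three-valued weight.

```python
import json
import math

# Subset of config.json relevant to the arithmetic below.
CONFIG = json.loads("""
{
  "d_model": 2048,
  "n_layers": 32,
  "n_heads": 32,
  "vocab_size": 50257,
  "ssm_d_state": 128,
  "ssm_d_conv": 4,
  "expand": 2
}
""")

# Assumption (Mamba-2 convention): inner width is expand * d_model.
d_inner = CONFIG["expand"] * CONFIG["d_model"]

# Assumption: heads split the inner width evenly.
head_dim = d_inner // CONFIG["n_heads"]

# A weight with three possible values carries log2(3) bits of information.
bits_per_ternary_weight = math.log2(3)

# Number of entries in a vocab_size x d_model embedding table.
embedding_params = CONFIG["vocab_size"] * CONFIG["d_model"]

print(d_inner, head_dim, round(bits_per_ternary_weight, 3), embedding_params)
```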

jax_weights/bit_mamba_1b.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a9b4aa0abbf45088c23d19ca922ef95e1cf68aa7aa4f1c3111b7d80f6900676
+size 4073769440

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "is_local": false,
+  "model_max_length": 1024,
+  "pad_token": null,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
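
The README's C++ example passes the prompt as a quoted string of numbers (`"15496 11 314 716"`), which appear to be GPT-2 token IDs separated by spaces. Under that assumption, the sketch below validates a list of IDs against `vocab_size` from `config.json` and `model_max_length` from this file, then formats it for the command line. The function name `format_prompt_ids` is hypothetical, and treating `model_max_length` as a prompt-length limit is a conservative choice made here, not a documented requirement of the engine.

```python
VOCAB_SIZE = 50257        # "vocab_size" in config.json
MODEL_MAX_LENGTH = 1024   # "model_max_length" in tokenizer_config.json


def format_prompt_ids(token_ids):
    """Validate token IDs and join them into a space-separated string."""
    if len(token_ids) > MODEL_MAX_LENGTH:
        raise ValueError(
            f"prompt has {len(token_ids)} tokens, more than {MODEL_MAX_LENGTH}"
        )
    for tid in token_ids:
        if not 0 <= tid < VOCAB_SIZE:
            raise ValueError(f"token id {tid} is outside [0, {VOCAB_SIZE})")
    return " ".join(str(tid) for tid in token_ids)


prompt_arg = format_prompt_ids([15496, 11, 314, 716])
print(prompt_arg)
```

The token IDs themselves would come from a GPT-2 compatible tokenizer loaded from `tokenizer.json`; that step is left out here to keep the sketch free of third-party dependencies.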

training_loss_1b.png ADDED

Git LFS Details

SHA256: 5c824d3203e1f3b62bcae7cf7239fa6c1094eeb7a2c412d23ef7aaca94d27471
Pointer size: 131 Bytes
Size of remote file: 189 kB