Initial commit: Upload BitMamba-1B model, weights and benchmarks


Files changed (8)

.gitattributes +1 -0
README.md +103 -3
bitmamba_cpp/bitmamba_255m.bin +3 -0
config.json +20 -0
jax_weights/bitmamba_255m.msgpack +3 -0
scaling_comparisson.png +3 -0
tokenizer.json +0 -0
tokenizer_config.json +12 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+scaling_comparisson.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,103 @@
----
-license: mit
----

+---
+language:
+  - en
+license: mit
+tags:
+  - bitnet
+  - mamba
+  - ssm
+  - 1.58-bit
+  - ternary
+  - efficient-inference
+  - edge-computing
+datasets:
+  - HuggingFaceFW/fineweb-edu
+  - cosmopedia
+  - bigcode/the-stack-dedup
+metrics:
+  - accuracy
+  - perplexity
+library_name: jax
+pipeline_tag: text-generation
+inference: false
+---
+# BitMamba-2-255M
+<div align="center">
+[![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](TU_LINK_DE_ZENODO)
+[![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
+</div>
+**BitMamba-2-255M** is the efficient baseline model of the BitMamba-2 family. It integrates **1.58-bit ternary quantization** (BitNet b1.58) into the **Mamba-2** architecture. Despite its small size, it converges stably and scores above chance on the reasoning benchmarks reported below, serving as the proof of concept for scaling ternary State Space Models.
+## ⚡ Key Features
+- **Architecture:** Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
+- **Parameters:** 255M.
+- **Precision:** 1.58-bit (weights $\in \{-1, 0, 1\}$).
+- **Training Data:** FineWeb-Edu, Cosmopedia, and Stack-Dedup.
+- **Hardware:** Trained on Google Cloud TPU v6e.
+## 📊 Benchmark Results
+This model serves as the baseline for our scaling laws analysis.
+| Benchmark      |   Metric   | BitMamba-2-255M |
+| :------------- | :--------: | :-------------: |
+| **ARC-Easy**   |  Accuracy  |     55.51%      |
+| **PIQA**       |  Accuracy  |     64.42%      |
+| **BoolQ**      |  Accuracy  |     59.30%      |
+| **HellaSwag**  |  Acc Norm  |     35.22%      |
+| **WikiText-2** | Perplexity |      51.69      |
+As shown in the scaling analysis below, the 255M model (blue line) establishes a stable learning trajectory, which is significantly improved upon by the 1B model (red line).
+![Scaling Laws](scaling_comparisson.png)
+## 🚀 Usage (Inference)
+This model is optimized for extreme edge deployment (IoT, Mobile, Legacy Hardware) using our custom C++ inference engine.
+### 1. Download the Quantized Model
+Download `bitmamba_255m.bin` from the `bitmamba_cpp/` folder in the Files tab.
+### 2. Run with C++
+Go to our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp) to get the inference code.
+```bash
+# Example usage after compiling bitmamba.cpp
+# Note: Using smaller context size for speed demonstration
+./bitmamba bitmamba_255m.bin "15496 11 314 716" 0.7 1.1 0.05 0.9 40 200
+```
+### 3. JAX/Flax Usage
+The file `jax_weights/bitmamba_255m.msgpack` contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
+## 🛠️ Efficient Deployment
+Running on a consumer **Intel Core i3-12100F CPU**:
+| Model               | RAM Usage  | Speed          |
+| ------------------- | ---------- | -------------- |
+| **BitMamba-2-255M** | **252 MB** | **~146 tok/s** |
+## 📜 Citation
+If you use this model or our architecture, please cite our paper:
+```bibtex
+@misc{salazar2026bitmamba2,
+  author       = {Salazar, Jesus},
+  title        = {BitMamba-2: Efficient Scaling of 1.58-bit State Space Models},
+  year         = {2026},
+  publisher    = {Zenodo},
+  doi          = {10.5281/zenodo.XXXXXXX},
+  url          = {https://doi.org/10.5281/zenodo.XXXXXXX}
+}
+```
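
The README's "1.58-bit" precision refers to weights restricted to three values, which carry log2(3) ≈ 1.585 bits each. The sketch below illustrates the absmean rounding scheme described for BitNet b1.58: scale by the mean absolute weight, round, then clip to [-1, 1]. It is a pure-Python illustration on a flat list, not the repository's training or export code; the function name `absmean_ternary` and the `eps` value are assumptions made for this example.

```python
import math

def absmean_ternary(weights, eps=1e-5):
    """Quantize a flat list of floats to {-1, 0, 1} using an absmean scale.

    Hypothetical helper for illustration only; not taken from the BitMamba-2 source.
    """
    # The scale is the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale

# Three possible states carry log2(3) bits of information per weight.
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, 0.4, -1.2, 0.02]
ternary, scale = absmean_ternary(weights)
print(ternary, round(scale, 3), round(bits_per_weight, 3))
```

Small weights collapse to 0 and large ones saturate at ±1, so the scale factor has to be kept alongside the ternary values to recover approximate magnitudes.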

bitmamba_cpp/bitmamba_255m.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:701ea630139c4099668b2eb849f3c1ebf30fcbbb400e83782da5f743610fa198
+size 258250988
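
The three added lines are a Git LFS pointer, not the model itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the final comment is an assumption.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256 digest."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        # Read in chunks so a large weights file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:701ea630139c4099668b2eb849f3c1ebf30fcbbb400e83782da5f743610fa198
size 258250988
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
# After downloading, one could call: matches_pointer("bitmamba_255m.bin", pointer)
```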

config.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "architectures": ["BitMamba2LM"],
+  "model_type": "bitmamba",
+  "d_model": 1024,
+  "n_layers": 24,
+  "n_heads": 16,
+  "vocab_size": 50257,
+  "ssm_d_state": 128,
+  "ssm_d_conv": 4,
+  "expand": 2,
+  "rms_norm_eps": 1e-6,
+  "quantization": {
+    "bits": 1.58,
+    "group_size": null,
+    "zero_point": false
+  },
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "transformers_version": "5.0.0"
+}
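
A few derived sizes can be read off this config. The sketch below assumes the usual Mamba-2 conventions, namely that the inner width is `expand * d_model` and that the head dimension is the inner width divided by `n_heads`. The commit does not state how `BitMamba2LM` interprets these fields, so treat the derived values as illustrative.

```python
import json

# Subset of the fields from config.json above.
CONFIG = json.loads("""
{
  "d_model": 1024,
  "n_layers": 24,
  "n_heads": 16,
  "vocab_size": 50257,
  "ssm_d_state": 128,
  "ssm_d_conv": 4,
  "expand": 2
}
""")

# Assumption: inner (expanded) width follows the Mamba convention expand * d_model.
d_inner = CONFIG["expand"] * CONFIG["d_model"]
# Assumption: heads split the inner width evenly.
head_dim = d_inner // CONFIG["n_heads"]
# Token embedding table size: one d_model-wide row per vocabulary entry.
embedding_params = CONFIG["vocab_size"] * CONFIG["d_model"]

print(d_inner, head_dim, embedding_params)
```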

jax_weights/bitmamba_255m.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b1ee4ad01aed142c1fb8e94e3efd8260ee7be01141b89b98322ac7108fdf43c0
+size 1021897520
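
The two LFS sizes allow a rough sanity check against the 255M parameter figure in the README. The arithmetic below assumes the msgpack file stores 32-bit floats and ignores container overhead; neither point is stated in the commit, so the results are estimates rather than exact counts.

```python
MSGPACK_BYTES = 1_021_897_520  # jax_weights/bitmamba_255m.msgpack
BIN_BYTES = 258_250_988        # bitmamba_cpp/bitmamba_255m.bin

# Assumption: 4 bytes per parameter (float32), no serialization overhead.
params_if_fp32 = MSGPACK_BYTES // 4

# Average bytes per parameter in the C++ binary under the same parameter count.
bytes_per_param_bin = BIN_BYTES / params_if_fp32

# For comparison: size if every weight were packed into 2 bits.
packed_2bit_bytes = params_if_fp32 * 2 // 8

print(params_if_fp32, round(bytes_per_param_bin, 3), packed_2bit_bytes)
```

Under these assumptions the msgpack size corresponds to about 255.5M parameters, in line with the model name. The binary averages roughly one byte per parameter, well above the 2-bit packed size, so the `.bin` file presumably holds something other than tightly packed ternary weights alone; the commit does not document its layout.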

scaling_comparisson.png ADDED

Git LFS Details

SHA256: 2d73b1d3404dd914d99210b25a0afab5e0262f1c14a4db5dff145d9345e6aef2
Pointer size: 131 Bytes
Size of remote file: 207 kB

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "is_local": false,
+  "model_max_length": 1024,
+  "pad_token": null,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}