---
language:
- en
license: mit
tags:
- bitnet
- mamba
- ssm
- 1.58-bit
- ternary
- efficient-inference
datasets:
- HuggingFaceFW/fineweb-edu
- bigcode/the-stack-dedup
- HuggingFaceTB/cosmopedia
metrics:
- accuracy
- perplexity
library_name: jax
pipeline_tag: text-generation
inference: false
---

# BitMamba-2-1B

<div align="center">

[Live Demo](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-1B)
[DOI](https://doi.org/10.5281/zenodo.18394665)
[GitHub](https://github.com/Zhayr1/BitMamba-2)

</div>

**BitMamba-2-1B** is a scalable hybrid architecture that integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capabilities with a drastically reduced memory footprint.

## ⚡ Key Features

- **Architecture:** Mamba-2 SSM + BitNet b1.58 (ternary weights).
- **Parameters:** 1B.
- **Precision:** 1.58-bit (weights in {-1, 0, 1}; see the quantization sketch below).
- **Training Tokens:** 150 billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
- **Hardware:** Trained on Google Cloud TPU v6e.
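
For reference, BitNet b1.58 maps each weight matrix to the ternary set {-1, 0, 1} using a single absmean scale per tensor. The following is a minimal JAX sketch of that scheme, not the training code itself (the function name is ours, and during training such quantizers are typically wrapped in a straight-through estimator):

```python
import jax.numpy as jnp

def absmean_ternary(w, eps=1e-6):
    """Quantize a weight tensor to {-1, 0, 1} with a per-tensor absmean scale."""
    gamma = jnp.mean(jnp.abs(w)) + eps           # absmean scale
    w_q = jnp.clip(jnp.round(w / gamma), -1, 1)  # ternary codes in {-1, 0, 1}
    return w_q, gamma

# A ternary "linear" layer then computes y = (x @ w_q) * gamma,
# so the matmul itself only ever sees values from {-1, 0, 1}.
```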

## 📊 Benchmark Results

| Benchmark      | Metric     | BitMamba-2-1B | vs. 255M Baseline |
| :------------- | :--------: | :-----------: | :---------------: |
| **ARC-Easy**   | Accuracy   | **63.30%**    | +7.8%             |
| **PIQA**       | Accuracy   | **68.77%**    | +4.4%             |
| **BoolQ**      | Accuracy   | **62.35%**    | +3.1%             |
| **HellaSwag**  | Acc Norm   | **45.59%**    | +10.4%            |
| **WikiText-2** | Perplexity | **29.62**     | -22.1             |

Scaling from 255M to 1B parameters yields consistent improvements across all five benchmarks, including a 22-point drop in WikiText-2 perplexity.

![Scaling Results]()

## 🚀 Usage (Inference)

This model is optimized for edge deployment using our custom C++ inference engine.

### 1. Download the Quantized Model

Download the `bitmamba_1b.bin` file from the Files tab (or the `bitmamba_cpp` folder).

### 2. Run with C++

Get the inference code from our [GitHub repository](https://github.com/Zhayr1/bitmamba.cpp).

```bash
# Example usage after compiling bitmamba.cpp
./bitmamba bitmamba_1b.bin "Hello, I am" tokenizer 0.7 1.1 0.05 0.9 40 200
```

### 3. JAX/Flax Usage

The `bitmamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
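
To simply inspect the checkpoint, here is a minimal sketch assuming standard Flax msgpack serialization (the file name comes from this repo; the rest is illustrative):

```python
import jax
import flax.serialization

# Restore the raw parameter pytree from the msgpack checkpoint.
with open("bitmamba_1b.msgpack", "rb") as f:
    params = flax.serialization.msgpack_restore(f.read())

# Print the shape of every parameter array in the tree.
print(jax.tree_util.tree_map(lambda x: x.shape, params))
```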

## 🛠️ Efficient Deployment

Running on a consumer **Intel Core i3-12100F** CPU:

| Model             | RAM Usage  | Speed         |
| ----------------- | ---------- | ------------- |
| **BitMamba-2-1B** | **621 MB** | **~53 tok/s** |
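
As a rough sanity check on the memory figure (a back-of-the-envelope estimate assuming 2-bit packing, not a measured breakdown): 1B ternary weights at 2 bits each occupy about 1e9 × 2 / 8 ≈ 250 MB, leaving the remainder of the 621 MB for higher-precision components such as embeddings and norms, plus activations and the recurrent SSM state.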

## 📜 Citation

```bibtex
@misc{salazar2026bitmamba2,
  author    = {Salazar, Jesus},
  title     = {{BitMamba}-2: Efficient Scaling of 1.58-bit State Space Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18394665},
  url       = {https://doi.org/10.5281/zenodo.18394665}
}
```