---
language:
- en
license: mit
tags:
- bitnet
- mamba
- ssm
- 1.58-bit
- ternary
- efficient-inference
datasets:
- HuggingFaceFW/fineweb-edu
- bigcode/the-stack-dedup
- HuggingFaceTB/cosmopedia
metrics:
- accuracy
- perplexity
library_name: jax
pipeline_tag: text-generation
inference: false
---
# BitMamba-2-1B
[Demo Space](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-1B)
[DOI: 10.5281/zenodo.18394665](https://doi.org/10.5281/zenodo.18394665)
[GitHub: Zhayr1/BitMamba-2](https://github.com/Zhayr1/BitMamba-2)
**BitMamba-2-1B** is a 1B-parameter model built on a scalable hybrid architecture that integrates **1.58-bit ternary quantization** (BitNet b1.58) into the **Mamba-2** state space model (SSM) framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning performance with a drastically reduced memory footprint.
## ⚡ Key Features
- **Architecture:** Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
- **Parameters:** 1B.
- **Precision:** 1.58-bit ternary weights in {-1, 0, +1} (see the quantization sketch below).
- **Training Tokens:** 150 Billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
- **Hardware:** Trained on Google Cloud TPU v6e.
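
The core weight transform here is BitNet b1.58's absmean quantization: scale each weight matrix by the mean of its absolute values, then round and clip to {-1, 0, +1}. Below is a minimal JAX sketch of that scheme as described in the BitNet b1.58 paper; it is illustrative only, not the training code from this repository's `src/`.

```python
# Minimal sketch of BitNet b1.58 absmean ternary quantization (illustrative;
# not the exact training code of this model).
import jax.numpy as jnp

def ternary_quantize(w: jnp.ndarray, eps: float = 1e-5):
    """Map a full-precision weight matrix to {-1, 0, +1} plus one scale."""
    gamma = jnp.mean(jnp.abs(w)) + eps            # absmean scale
    w_ternary = jnp.clip(jnp.round(w / gamma), -1.0, 1.0)
    return w_ternary, gamma                       # dequantize: w_ternary * gamma

w = jnp.array([[0.4, -0.02, -1.3], [0.9, 0.05, -0.6]])
w_q, scale = ternary_quantize(w)
print(w_q)  # every entry is -1., 0., or 1.
```

Because the quantized weights are just signs and zeros, the matrix products at inference reduce to additions and subtractions, rescaled once by `gamma`.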
## 📊 Benchmark Results
| Benchmark | Metric | BitMamba-2-1B | vs. 255M Baseline |
| :------------- | :--------: | :-----------: | :---------------: |
| **ARC-Easy** | Accuracy | **63.30%** | +7.8% |
| **PIQA** | Accuracy | **68.77%** | +4.4% |
| **BoolQ** | Accuracy | **62.35%** | +3.1% |
| **HellaSwag** | Acc Norm | **45.59%** | +10.4% |
| **WikiText-2** | Perplexity | **29.62** | -22.1 |
Scaling from 255M to 1B parameters yields consistent improvements across all five benchmarks, in line with the predictable scaling behavior described above.

## 🚀 Usage (Inference)
This model is optimized for edge deployment using our custom C++ inference engine.
### 1. Download the Quantized Model
Download the `bitmamba_1b.bin` file from the **Files** tab of this repository (or from the `bitmamba_cpp` folder).
### 2. Run with C++
Get the inference code from our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp) and compile it:
```bash
# Example usage after compiling bitmamba.cpp
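# Positional args after the model file: prompt, tokenizer path, then sampling
# hyperparameters; see the repo README for their exact order and meaning.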
./bitmamba bitmamba_1b.bin "Hello, I am" tokenizer 0.7 1.1 0.05 0.9 40 200
```
### 3. JAX/Flax Usage
The `bitmamba_1b.msgpack` contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
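As a minimal sketch (assuming the checkpoint is a standard Flax msgpack serialization; the Mamba-2 model definition itself lives in `src/`):

```python
# Restore the raw parameter pytree from the msgpack checkpoint.
# Assumes a standard Flax msgpack serialization; running the model
# additionally requires the model definition from src/ on GitHub.
import jax
from flax import serialization

with open("bitmamba_1b.msgpack", "rb") as f:
    params = serialization.msgpack_restore(f.read())

# Inspect parameter names and shapes (layout depends on the checkpoint).
print(jax.tree_util.tree_map(lambda x: x.shape, params))
```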
## 🛠️ Efficient Deployment
Running on a consumer **Intel Core i3-12100F CPU**:
| Model | RAM Usage | Speed |
| ----------------- | ---------- | ------------- |
| **BitMamba-2-1B** | **621 MB** | **~53 tok/s** |
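
A back-of-the-envelope check (our estimate, not an official memory breakdown) shows why ternary weights make this footprint plausible: packing five ternary values per byte costs 1.6 bits per weight.

```python
# Rough footprint estimate for 1B ternary weights (our assumption, not an
# official breakdown). Five ternary values fit in one byte since 3**5 = 243.
n_params = 1_000_000_000
bits_per_weight = 8 / 5                        # 1.6 bits via base-3 packing
packed_mb = n_params * bits_per_weight / 8 / 1e6
print(f"~{packed_mb:.0f} MB")                  # ~200 MB of packed weights
```

The remainder of the measured 621 MB plausibly covers embeddings, the SSM state, and runtime buffers.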
## 📜 Citation
```bibtex
@misc{salazar2026bitmamba2,
  author    = {Salazar, Jesus},
  title     = {{BitMamba}-2: Efficient Scaling of 1.58-bit State Space Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18394665},
  url       = {https://doi.org/10.5281/zenodo.18394665}
}
```