---
language:
- en
license: mit
tags:
- bitnet
- mamba
- ssm
- 1.58-bit
- ternary
- efficient-inference
datasets:
- HuggingFaceFW/fineweb-edu
- bigcode/the-stack-dedup
- HuggingFaceTB/cosmopedia
metrics:
- accuracy
- perplexity
library_name: jax
pipeline_tag: text-generation
inference: false
---

# BitMamba-2-1B
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-1B) [![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](https://doi.org/10.5281/zenodo.18394665) [![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
**BitMamba-2-1B** is a scalable hybrid architecture that integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capability with a drastically reduced memory footprint.

## ⚡ Key Features

- **Architecture:** Mamba-2 SSM + BitNet b1.58 (ternary weights).
- **Parameters:** 1B.
- **Precision:** 1.58-bit (weights in {-1, 0, 1}).
- **Training tokens:** 150 billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
- **Hardware:** Trained on Google Cloud TPU v6e.

## 📊 Benchmark Results

| Benchmark      | Metric     | BitMamba-2-1B | vs. 255M Baseline |
| :------------- | :--------: | :-----------: | :---------------: |
| **ARC-Easy**   | Accuracy   | **63.30%**    | +7.8%             |
| **PIQA**       | Accuracy   | **68.77%**    | +4.4%             |
| **BoolQ**      | Accuracy   | **62.35%**    | +3.1%             |
| **HellaSwag**  | Acc Norm   | **45.59%**    | +10.4%            |
| **WikiText-2** | Perplexity | **29.62**     | -22.1             |

Scaling from 255M to 1B parameters yields consistent improvements across all benchmarks.

![Scaling Laws](training_loss_1b.png)

## 🚀 Usage (Inference)

This model is optimized for edge deployment using our custom C++ inference engine.

### 1. Download the Quantized Model

Download the `bitmamba_1b.bin` file from the files tab (or the `bitmamba_cpp` folder).

### 2. Run with C++

Get the inference code from our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp).

```bash
# Example usage after compiling bitmamba.cpp
./bitmamba bitmamba_1b.bin "Hello, I am" tokenizer 0.7 1.1 0.05 0.9 40 200
```

### 3. JAX/Flax Usage

The `bitmamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
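For intuition about the 1.58-bit weight format, the snippet below sketches the absmean ternary quantization scheme described in the BitNet b1.58 paper: each weight tensor is scaled by the mean of its absolute values, rounded, and clipped to {-1, 0, 1}. This is an illustrative NumPy sketch with names of our own choosing, not the actual training or inference code from this repository.

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-6):
    """Quantize a weight tensor to ternary codes {-1, 0, 1} with a
    per-tensor scale, following the BitNet b1.58 absmean scheme."""
    gamma = np.abs(w).mean() + eps             # per-tensor scale
    w_q = np.clip(np.round(w / gamma), -1, 1)  # ternary codes
    return w_q.astype(np.int8), float(gamma)

# With ternary codes, a matmul reduces to additions/subtractions plus
# one multiply by the scale: x @ (gamma * w_q) approximates x @ w.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)
w_q, gamma = absmean_ternary(w)
```

Storing only the int8 codes (packable to ~1.58 bits each) plus one float scale per tensor is what allows the deployment footprint reported below.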
## 🛠️ Efficient Deployment

Running on a consumer **Intel Core i3-12100F** CPU:

| Model             | RAM Usage  | Speed         |
| ----------------- | ---------- | ------------- |
| **BitMamba-2-1B** | **621 MB** | **~53 tok/s** |

## 📜 Citation

```bibtex
@misc{salazar2026bitmamba2,
  author    = {Salazar, Jesus},
  title     = {{BitMamba}-2: Efficient Scaling of 1.58-bit State Space Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18394665},
  url       = {https://doi.org/10.5281/zenodo.18394665}
}
```