---
language:
- en
license: mit
tags:
- bitnet
- mamba
- ssm
- 1.58-bit
- ternary
- efficient-inference
datasets:
- HuggingFaceFW/fineweb-edu
- bigcode/the-stack-dedup
- HuggingFaceTB/cosmopedia
metrics:
- accuracy
- perplexity
library_name: jax
pipeline_tag: text-generation
inference: false
---

# BitMamba-2-1B

<div align="center">

[Live Demo](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-1B)
[DOI](https://doi.org/10.5281/zenodo.18394665)
[GitHub](https://github.com/Zhayr1/BitMamba-2)

</div>

**BitMamba-2-1B** is a scalable hybrid architecture that integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capabilities with a drastically reduced memory footprint.

## ⚡ Key Features

- **Architecture:** Mamba-2 SSM + BitNet b1.58 (ternary weights).
- **Parameters:** 1B.
- **Precision:** 1.58-bit (weights in {-1, 0, 1}; see the quantization sketch below).
- **Training Tokens:** 150 billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
- **Hardware:** Trained on Google Cloud TPU v6e.
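
For reference, BitNet b1.58 maps each weight matrix to the ternary set {-1, 0, 1} using a single absmean scale per tensor. The following is a minimal JAX sketch of that scheme, not the training code itself (the function name is ours, and during training such quantizers are typically wrapped in a straight-through estimator):

```python
import jax.numpy as jnp

def absmean_ternary(w, eps=1e-6):
    """Quantize a weight tensor to {-1, 0, 1} with a per-tensor absmean scale."""
    gamma = jnp.mean(jnp.abs(w)) + eps           # absmean scale
    w_q = jnp.clip(jnp.round(w / gamma), -1, 1)  # ternary codes in {-1, 0, 1}
    return w_q, gamma

# A ternary "linear" layer then computes y = (x @ w_q) * gamma,
# so the matmul itself only ever sees values from {-1, 0, 1}.
```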

## 📊 Benchmark Results

| Benchmark      | Metric     | BitMamba-2-1B | vs. 255M Baseline |
| :------------- | :--------: | :-----------: | :---------------: |
| **ARC-Easy**   | Accuracy   | **63.30%**    | +7.8%             |
| **PIQA**       | Accuracy   | **68.77%**    | +4.4%             |
| **BoolQ**      | Accuracy   | **62.35%**    | +3.1%             |
| **HellaSwag**  | Acc Norm   | **45.59%**    | +10.4%            |
| **WikiText-2** | Perplexity | **29.62**     | -22.1             |

Scaling from 255M to 1B parameters yields consistent improvements across all five benchmarks, including a 22-point drop in WikiText-2 perplexity.

![Scaling Results]()

## 🚀 Usage (Inference)

This model is optimized for edge deployment using our custom C++ inference engine.

### 1. Download the Quantized Model

Download the `bitmamba_1b.bin` file from the Files tab (or the `bitmamba_cpp` folder).

### 2. Run with C++

Get the inference code from our [GitHub repository](https://github.com/Zhayr1/bitmamba.cpp).

```bash
# Example usage after compiling bitmamba.cpp
./bitmamba bitmamba_1b.bin "Hello, I am" tokenizer 0.7 1.1 0.05 0.9 40 200
```

### 3. JAX/Flax Usage

The `bitmamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
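
To simply inspect the checkpoint, here is a minimal sketch assuming standard Flax msgpack serialization (the file name comes from this repo; the rest is illustrative):

```python
import jax
import flax.serialization

# Restore the raw parameter pytree from the msgpack checkpoint.
with open("bitmamba_1b.msgpack", "rb") as f:
    params = flax.serialization.msgpack_restore(f.read())

# Print the shape of every parameter array in the tree.
print(jax.tree_util.tree_map(lambda x: x.shape, params))
```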

## 🛠️ Efficient Deployment

Running on a consumer **Intel Core i3-12100F** CPU:

| Model             | RAM Usage  | Speed         |
| ----------------- | ---------- | ------------- |
| **BitMamba-2-1B** | **621 MB** | **~53 tok/s** |
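
As a rough sanity check on the memory figure (a back-of-the-envelope estimate assuming 2-bit packing, not a measured breakdown): 1B ternary weights at 2 bits each occupy about 1e9 × 2 / 8 ≈ 250 MB, leaving the remainder of the 621 MB for higher-precision components such as embeddings and norms, plus activations and the recurrent SSM state.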

## 📜 Citation

```bibtex
@misc{salazar2026bitmamba2,
  author    = {Salazar, Jesus},
  title     = {{BitMamba}-2: Efficient Scaling of 1.58-bit State Space Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18394665},
  url       = {https://doi.org/10.5281/zenodo.18394665}
}
```