---
language:
- en
license: mit
tags:
- bitnet
- mamba
- ssm
- 1.58-bit
- ternary
- efficient-inference
datasets:
- HuggingFaceFW/fineweb-edu
- bigcode/the-stack-dedup
- HuggingFaceTB/cosmopedia
metrics:
- accuracy
- perplexity
library_name: jax
pipeline_tag: text-generation
inference: false
---

# BitMamba-2-1B
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-1B) [![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](https://doi.org/10.5281/zenodo.18394665) [![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
**BitMamba-2-1B** is a scalable hybrid architecture that integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** state space model framework. Trained from scratch on 150B tokens of high-quality data, it demonstrates that ternary SSMs follow predictable scaling laws, achieving competitive reasoning capability with a drastically reduced memory footprint.

## ⚡ Key Features

- **Architecture:** Mamba-2 SSM + BitNet b1.58 (ternary weights).
- **Parameters:** 1B.
- **Precision:** 1.58-bit (weights in {-1, 0, 1}).
- **Training tokens:** 150 billion (FineWeb-Edu, Cosmopedia, Stack-Dedup).
- **Hardware:** Trained on Google Cloud TPU v6e.

## 📊 Benchmark Results

| Benchmark      | Metric     | BitMamba-2-1B | vs. 255M Baseline |
| :------------- | :--------: | :-----------: | :---------------: |
| **ARC-Easy**   | Accuracy   | **63.30%**    | +7.8%             |
| **PIQA**       | Accuracy   | **68.77%**    | +4.4%             |
| **BoolQ**      | Accuracy   | **62.35%**    | +3.1%             |
| **HellaSwag**  | Acc Norm   | **45.59%**    | +10.4%            |
| **WikiText-2** | Perplexity | **29.62**     | -22.1             |

Scaling from 255M to 1B parameters yields consistent improvements across all benchmarks.

![Scaling Laws](training_loss_1b.png)

## 🚀 Usage (Inference)

This model is optimized for edge deployment using our custom C++ inference engine.

### 1. Download the Quantized Model

Download the `bitmamba_1b.bin` file from the files tab (or the `bitmamba_cpp` folder).

### 2. Run with C++

Get the inference code from our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp).

```bash
# Example usage after compiling bitmamba.cpp
./bitmamba bitmamba_1b.bin "Hello, I am" tokenizer 0.7 1.1 0.05 0.9 40 200
```

### 3. JAX/Flax Usage

The `bitmamba_1b.msgpack` file contains the raw JAX weights for research purposes. You can load them using the source code provided in `src/` on GitHub.
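For intuition about the 1.58-bit weight format, the snippet below sketches the absmean ternary quantization scheme described in the BitNet b1.58 paper: each weight tensor is scaled by the mean of its absolute values, rounded, and clipped to {-1, 0, 1}. This is an illustrative NumPy sketch with names of our own choosing, not the actual training or inference code from this repository.

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-6):
    """Quantize a weight tensor to ternary codes {-1, 0, 1} with a
    per-tensor scale, following the BitNet b1.58 absmean scheme."""
    gamma = np.abs(w).mean() + eps             # per-tensor scale
    w_q = np.clip(np.round(w / gamma), -1, 1)  # ternary codes
    return w_q.astype(np.int8), float(gamma)

# With ternary codes, a matmul reduces to additions/subtractions plus
# one multiply by the scale: x @ (gamma * w_q) approximates x @ w.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)
w_q, gamma = absmean_ternary(w)
```

Storing only the int8 codes (packable to ~1.58 bits each) plus one float scale per tensor is what allows the deployment footprint reported below.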
## 🛠️ Efficient Deployment

Running on a consumer **Intel Core i3-12100F** CPU:

| Model             | RAM Usage  | Speed         |
| ----------------- | ---------- | ------------- |
| **BitMamba-2-1B** | **621 MB** | **~53 tok/s** |

## 📜 Citation

```bibtex
@misc{salazar2026bitmamba2,
  author    = {Salazar, Jesus},
  title     = {{BitMamba}-2: Efficient Scaling of 1.58-bit State Space Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18394665},
  url       = {https://doi.org/10.5281/zenodo.18394665}
}
```