Add ARM NEON port reference, preprint citation, and mirror notice

README.md +79 -10

@@ -10,6 +10,9 @@ tags:
 - ternary
 - efficient-inference
 - edge-computing
 datasets:
 - HuggingFaceFW/fineweb-edu
 - bigcode/the-stack-dedup
@@ -29,11 +32,55 @@ inference: false
 [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-0.25B)
 [![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](https://doi.org/10.5281/zenodo.18394665)
 [![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
 </div>
 **BitMamba-2-255M** is the ultra-efficient baseline model of the BitMamba-2 family. It integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** architecture. Despite its small size, it demonstrates stable convergence and surprising reasoning capabilities, serving as the proof-of-concept for scaling ternary State Space Models.
 ## ⚡ Key Features
 - **Architecture:** Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
@@ -66,9 +113,9 @@ This model is optimized for extreme edge deployment (IoT, Mobile, Legacy Hardwar
 Download the `bitmamba_255m.bin` file located in the files tab.
-### 2. Run with C++
-Go to our [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp) to get the inference code.
 ```bash
 # Example usage after compiling bitmamba.cpp
@@ -81,15 +128,14 @@ The `bitmamba_255m.msgpack` contains the raw JAX weights for research purposes.
 ## 🛠️ Efficient Deployment
-Running on a consumer **Intel Core i3-12100F CPU**:
-| Model               | RAM Usage  | Speed          |
-| ------------------- | ---------- | -------------- |
-| **BitMamba-2-255M** | **252 MB** | **~146 tok/s** |
-## 📜 Citation
-If you use this model or our architecture, please cite our paper:
 ```bibtex
 @misc{salazar2026bitmamba2,
@@ -100,4 +146,27 @@ If you use this model or our architecture, please cite our paper:
   doi          = {10.5281/zenodo.18394665},
   url          = {https://doi.org/10.5281/zenodo.18394665}
 }
-```

 - ternary
 - efficient-inference
 - edge-computing
+- arm-neon
+- apple-silicon
+- cpu-inference
 datasets:
 - HuggingFaceFW/fineweb-edu
 - bigcode/the-stack-dedup
 [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-0.25B)
 [![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](https://doi.org/10.5281/zenodo.18394665)
 [![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
+[![ARM NEON Port](https://img.shields.io/badge/ARM%20NEON-Port-green)](https://github.com/rasata/bitmamba.cpp)
+[![Preprint](https://img.shields.io/badge/Preprint-engrXiv-blue)](https://engrxiv.org/)
 </div>
+> **Mirror repository** of [Zhayr1/BitMamba-2-0.25B](https://huggingface.co/Zhayr1/BitMamba-2-0.25B), maintained by [Aquantic Research](https://github.com/rasata/zonova-research-gpu-to-cpu-transposition) for the GPU-to-CPU/ARM neural network transposition programme.
 **BitMamba-2-255M** is the ultra-efficient baseline model of the BitMamba-2 family. It integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** architecture. Despite its small size, it demonstrates stable convergence and surprising reasoning capabilities, serving as the proof-of-concept for scaling ternary State Space Models.
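The "1.58-bit" figure is the information content of one ternary weight in {-1, 0, +1}, log2(3) ≈ 1.585 bits. The sketch below computes that value and shows one common way to approach it in storage: packing five ternary weights into one byte, since 3^5 = 243 ≤ 256, which costs 8/5 = 1.6 bits per weight. The `pack5`/`unpack5` helpers are hypothetical illustrations. This README does not say how bitmamba.cpp actually lays out its weights.

```bash
#!/usr/bin/env bash
# Information content of one ternary weight {-1, 0, +1}: log2(3) bits.
awk 'BEGIN { printf "bits per ternary weight: %.3f\n", log(3) / log(2) }'   # prints 1.585

# Hypothetical base-3 packing (not necessarily what bitmamba.cpp does):
# five trits fit in one byte because 3^5 = 243 <= 256.
pack5() {  # args: five weights, each -1, 0 or 1
  local byte=0 w
  for w in "$@"; do
    byte=$(( byte * 3 + w + 1 ))   # shift to digit {0,1,2}, append in base 3
  done
  echo "$byte"
}

unpack5() {  # arg: packed byte; prints the five weights in their original order
  local byte=$1 out=() i
  for i in 1 2 3 4 5; do
    out=( "$(( byte % 3 - 1 ))" "${out[@]}" )   # lowest base-3 digit is the last weight
    byte=$(( byte / 3 ))
  done
  echo "${out[@]}"
}

packed=$(pack5 1 0 -1 -1 1)
echo "packed byte: $packed -> $(unpack5 "$packed")"   # prints: packed byte: 191 -> 1 0 -1 -1 1
```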
+---
+## ARM NEON Port — Cross-Platform CPU Inference
+An **ARM NEON port** of the BitMamba-2 inference engine has been developed by Aquantic Research, enabling native inference on **Apple Silicon** (M1/M2/M3/M4) and ARM-based processors.
+| Model | Hardware | Speed | Latency/token | RAM |
+|-------|----------|-------|---------------|-----|
+| **BitMamba-2 255M** | **Apple M1 (ARM NEON)** | **82.5 tok/s** | 12.1 ms | 252 MB |
+| BitMamba-2 255M | Intel Core i3-12100F (AVX2) | ~146 tok/s | — | 252 MB |
+**Key finding**: Speed stays **constant** across the tested sequence lengths (50 and 200 tokens). This is consistent with the **O(1) per-token memory** of SSM architectures, which carry a fixed-size recurrent state, unlike Transformers, whose KV cache grows with sequence length.
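To make the SSM versus Transformer contrast concrete, the sketch below compares a Transformer's KV cache, which stores a key and a value vector for every past token in every layer, with an SSM's fixed-size recurrent state. All dimensions (`LAYERS`, `D_MODEL`, `SSM_STATE`, fp16 storage) are hypothetical round numbers chosen for illustration, not BitMamba-2-255M's actual configuration.

```bash
#!/usr/bin/env bash
# Hypothetical dimensions for illustration only; NOT BitMamba-2-255M's real config.
LAYERS=24
D_MODEL=1024
BYTES=2           # bytes per cached element (fp16, assumption)
SSM_STATE=65536   # hypothetical per-layer recurrent state size, in elements

kv_cache_bytes() {   # Transformer: one K and one V vector per layer per token
  echo $(( 2 * LAYERS * $1 * D_MODEL * BYTES ))   # assumes no grouped-query sharing
}

ssm_state_bytes() {  # SSM: fixed-size state, independent of the token count
  echo $(( LAYERS * SSM_STATE * BYTES ))
}

for n in 50 200 2000; do
  printf '%5d tokens: KV cache %9d B, SSM state %9d B\n' \
    "$n" "$(kv_cache_bytes "$n")" "$(ssm_state_bytes)"
done
```

Only the KV cache column grows with the number of tokens; the SSM state stays the same size no matter how many tokens have already been processed.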
+### ARM NEON Port Resources
+- **Code**: [rasata/bitmamba.cpp](https://github.com/rasata/bitmamba.cpp) — ARM NEON fork with cross-platform dispatch (x86 AVX2 + ARM NEON)
+- **Preprint**: *"State Space Models as CPU-Native Neural Network Architectures: Experimental Evidence from ARM NEON Inference with 1.58-bit Quantized Mamba"* — Gabriel Zo-Hasina Rasatavohary, Aquantic Research, March 2026. To be published on [engrXiv](https://engrxiv.org/) (DOI pending).
+- **Research programme**: [GPU-to-CPU/ARM Neural Network Transposition](https://github.com/rasata/zonova-research-gpu-to-cpu-transposition)
+### Quick Start (ARM)
+```bash
+# Clone the ARM NEON fork
+git clone https://github.com/rasata/bitmamba.cpp
+cd bitmamba.cpp
+# Build (macOS Apple Silicon)
+brew install libomp
+cmake -B build && cmake --build build
+# Download weights from this repo
+curl -L -O https://huggingface.co/rasatavohary/BitMamba-2-0.25B/resolve/main/bitmamba_cpp/bitmamba_255m.bin
+# Run inference
+cd build && cp ../tokenizer.bin .
+./bitmamba ../bitmamba_255m.bin "The future of AI is" tokenizer 0.7 1.1 0.05 0.9 40 200
+```
+---
 ## ⚡ Key Features
 - **Architecture:** Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
 Download the `bitmamba_255m.bin` file located in the files tab.
+### 2. Run with C++ (x86)
+Go to the original [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp) for x86 AVX2 inference, or [rasata/bitmamba.cpp](https://github.com/rasata/bitmamba.cpp) for cross-platform (x86 + ARM NEON) inference.
 ```bash
 # Example usage after compiling bitmamba.cpp
 ## 🛠️ Efficient Deployment
+| Platform | Hardware | RAM Usage | Speed |
+|----------|----------|-----------|-------|
+| x86 (original) | Intel Core i3-12100F (AVX2) | 252 MB | ~146 tok/s |
+| **ARM (NEON port)** | **Apple M1** | **252 MB** | **82.5 tok/s** |
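Throughput and per-token latency are reciprocals: latency in milliseconds is 1000 divided by tokens per second. A minimal shell sketch (assuming `awk` is available; the helper name `tok_per_s_to_ms` is hypothetical) reproduces the 12.1 ms/token reported for the Apple M1 and gives roughly 6.8 ms/token for the ~146 tok/s x86 figure.

```bash
#!/usr/bin/env bash
# Convert decoding throughput (tokens per second) to per-token latency (ms).
tok_per_s_to_ms() {
  awk -v r="$1" 'BEGIN { printf "%.1f\n", 1000 / r }'
}

tok_per_s_to_ms 82.5   # Apple M1 (ARM NEON): prints 12.1
tok_per_s_to_ms 146    # Intel Core i3-12100F (AVX2): prints 6.8
```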
+## 📜 Citations
+### Original model
 ```bibtex
 @misc{salazar2026bitmamba2,
   doi          = {10.5281/zenodo.18394665},
   url          = {https://doi.org/10.5281/zenodo.18394665}
 }
+```
+### ARM NEON port and CPU-native research
+```bibtex
+@misc{rasatavohary2026ssm,
+  author       = {Rasatavohary, Gabriel Zo-Hasina},
+  title        = {State Space Models as {CPU}-Native Neural Network Architectures:
+                   Experimental Evidence from {ARM NEON} Inference with 1.58-bit
+                   Quantized {Mamba}},
+  year         = {2026},
+  howpublished = {engrXiv preprint (DOI pending)},
+  note         = {Aquantic Research. First ARM NEON port of BitMamba-2.
+                   Code: \url{https://github.com/rasata/bitmamba.cpp}},
+}
+```
+## Links
+- [Original paper (Zenodo)](https://doi.org/10.5281/zenodo.18394665) — Salazar, 2026
+- [Original GitHub](https://github.com/Zhayr1/BitMamba-2) — Zhayr1
+- [ARM NEON fork](https://github.com/rasata/bitmamba.cpp) — Aquantic Research
+- [Research programme](https://github.com/rasata/zonova-research-gpu-to-cpu-transposition) — GPU-to-CPU/ARM transposition
+- [Interactive Demo](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-0.25B)