rasatavohary committed commit c3007f3 (verified; parent: 1635175)

Add ARM NEON port reference, preprint citation, and mirror notice

Files changed (1): README.md (+79, −10)
- ternary
- efficient-inference
- edge-computing
- arm-neon
- apple-silicon
- cpu-inference
datasets:
- HuggingFaceFW/fineweb-edu
- bigcode/the-stack-dedup
 
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-0.25B)
[![Paper](https://img.shields.io/badge/Paper-Zenodo-00649C.svg)](https://doi.org/10.5281/zenodo.18394665)
[![GitHub](https://img.shields.io/badge/GitHub-Source%20Code-black)](https://github.com/Zhayr1/BitMamba-2)
[![ARM NEON Port](https://img.shields.io/badge/ARM%20NEON-Port-green)](https://github.com/rasata/bitmamba.cpp)
[![Preprint](https://img.shields.io/badge/Preprint-engrXiv-blue)](https://engrxiv.org/)

</div>
> **Mirror repository** of [Zhayr1/BitMamba-2-0.25B](https://huggingface.co/Zhayr1/BitMamba-2-0.25B), maintained by [Aquantic Research](https://github.com/rasata/zonova-research-gpu-to-cpu-transposition) for the GPU-to-CPU/ARM neural network transposition programme.

**BitMamba-2-255M** is the ultra-efficient baseline model of the BitMamba-2 family. It integrates **1.58-bit ternary quantization** (BitNet) into the **Mamba-2** architecture. Despite its small size, it demonstrates stable convergence and surprising reasoning capabilities, serving as the proof-of-concept for scaling ternary State Space Models.
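The ternary idea fits in a few lines. The snippet below is an illustrative sketch of BitNet b1.58-style "absmean" quantization (scale by the mean absolute weight, then round each weight to the nearest value in {-1, 0, +1}); the helper name is hypothetical and this is not the model's actual training code.

```python
# Illustrative sketch of BitNet b1.58-style "absmean" ternary quantization.
# Hypothetical helper; NOT the actual BitMamba-2 implementation.

def ternary_quantize(weights, eps=1e-8):
    """Round weights to {-1, 0, +1} after scaling by the mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

# Each weight now takes one of 3 values: log2(3) ~= 1.58 bits of information.
q, s = ternary_quantize([0.4, -0.2, 0.05, -0.9])
print(q)  # every entry is -1, 0, or +1
```

Replacing full-precision matrix multiplies with additions and sign flips over such weights is what makes CPU-only inference practical for this family.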

---

## ARM NEON Port — Cross-Platform CPU Inference

An **ARM NEON port** of the BitMamba-2 inference engine has been developed by Aquantic Research, enabling native inference on **Apple Silicon** (M1/M2/M3/M4) and other ARM-based processors.

| Model | Hardware | Speed | Latency/token | RAM |
|-------|----------|-------|---------------|-----|
| **BitMamba-2 255M** | **Apple M1 (ARM NEON)** | **82.5 tok/s** | 12.1 ms | 252 MB |
| BitMamba-2 255M | Intel Core i3-12100F (AVX2) | ~146 tok/s | — | 252 MB |

**Key finding**: Per-token speed stays effectively constant regardless of sequence length (50, 200, or more tokens). This is experimental evidence for the fixed-size, **O(1) memory** state of SSM architectures, unlike Transformers, whose KV cache and per-token attention cost grow with sequence length.
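The constant-speed observation can be illustrated with a toy recurrence. This minimal Python sketch (illustrative only, not the real NEON kernels) contrasts an SSM's fixed-size state with a Transformer-style growing KV cache:

```python
# Toy illustration: an SSM carries a fixed-size state between tokens,
# while a Transformer-style decoder appends to a KV cache every step.

def ssm_decode(tokens, a=0.9, b=0.1):
    h = 0.0  # fixed-size recurrent state: same memory at token 50 or 5000
    for x in tokens:
        h = a * h + b * x  # constant work per token
    return h

def transformer_cache_size(tokens):
    kv_cache = []  # grows by one entry per generated token
    for x in tokens:
        kv_cache.append(x)
    return len(kv_cache)

# The SSM state stays O(1) while the KV cache scales linearly with length.
print(transformer_cache_size([0.0] * 50), transformer_cache_size([0.0] * 200))
```

Because per-token work never depends on how many tokens came before, tok/s stays flat as generation gets longer, which is exactly what the benchmark above observed.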

### ARM NEON Port Resources

- **Code**: [rasata/bitmamba.cpp](https://github.com/rasata/bitmamba.cpp) — ARM NEON fork with cross-platform dispatch (x86 AVX2 + ARM NEON)
- **Preprint**: *"State Space Models as CPU-Native Neural Network Architectures: Experimental Evidence from ARM NEON Inference with 1.58-bit Quantized Mamba"* — Gabriel Zo-Hasina Rasatavohary, Aquantic Research, March 2026. To be published on [engrXiv](https://engrxiv.org/) (DOI pending).
- **Research programme**: [GPU-to-CPU/ARM Neural Network Transposition](https://github.com/rasata/zonova-research-gpu-to-cpu-transposition)

### Quick Start (ARM)

```bash
# Clone the ARM NEON fork
git clone https://github.com/rasata/bitmamba.cpp
cd bitmamba.cpp

# Build (macOS, Apple Silicon)
brew install libomp
cmake -B build && cmake --build build

# Download weights from this repo
wget https://huggingface.co/rasatavohary/BitMamba-2-0.25B/resolve/main/bitmamba_cpp/bitmamba_255m.bin

# Run inference
cd build && cp ../tokenizer.bin .
./bitmamba ../bitmamba_255m.bin "The future of AI is" tokenizer 0.7 1.1 0.05 0.9 40 200
```

---

## ⚡ Key Features

- **Architecture:** Mamba-2 SSM + BitNet b1.58 (Ternary Weights).
 
Download the `bitmamba_255m.bin` file located in the Files tab.

### 2. Run with C++ (x86)

Go to the original [GitHub Repository](https://github.com/Zhayr1/bitmamba.cpp) for x86 AVX2 inference, or to [rasata/bitmamba.cpp](https://github.com/rasata/bitmamba.cpp) for cross-platform (x86 + ARM NEON) inference.

```bash
# Example usage after compiling bitmamba.cpp
```
 

## 🛠️ Efficient Deployment

| Platform | Hardware | RAM Usage | Speed |
|----------|----------|-----------|-------|
| x86 (original) | Intel Core i3-12100F (AVX2) | 252 MB | ~146 tok/s |
| **ARM (NEON port)** | **Apple M1** | **252 MB** | **82.5 tok/s** |
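As a sanity check, the throughput figures above are consistent with the per-token latency reported for the M1 (12.1 ms): latency in milliseconds is simply 1000 divided by tokens per second.

```python
# Convert throughput (tok/s) to per-token latency (ms).
def latency_ms(tokens_per_second):
    return 1000.0 / tokens_per_second

print(round(latency_ms(82.5), 1))   # 12.1 ms on the Apple M1 (matches the table)
print(round(latency_ms(146.0), 1))  # ~6.8 ms on the Intel i3-12100F
```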
 
## 📜 Citations
 
 
### Original model

```bibtex
@misc{salazar2026bitmamba2,
  doi = {10.5281/zenodo.18394665},
  url = {https://doi.org/10.5281/zenodo.18394665}
}
```

### ARM NEON port and CPU-native research

```bibtex
@misc{rasatavohary2026ssm,
  author       = {Rasatavohary, Gabriel Zo-Hasina},
  title        = {State Space Models as {CPU}-Native Neural Network Architectures:
                  Experimental Evidence from {ARM NEON} Inference with 1.58-bit
                  Quantized {Mamba}},
  year         = {2026},
  howpublished = {engrXiv preprint (DOI pending)},
  note         = {Aquantic Research. First ARM NEON port of BitMamba-2.
                  Code: \url{https://github.com/rasata/bitmamba.cpp}},
}
```

## Links

- [Original paper (Zenodo)](https://doi.org/10.5281/zenodo.18394665) — Salazar, 2026
- [Original GitHub](https://github.com/Zhayr1/BitMamba-2) — Zhayr1
- [ARM NEON fork](https://github.com/rasata/bitmamba.cpp) — Aquantic Research
- [Research programme](https://github.com/rasata/zonova-research-gpu-to-cpu-transposition) — GPU-to-CPU/ARM transposition
- [Interactive Demo](https://huggingface.co/spaces/Zhayr1/Bitmamba-2-0.25B)