jakmro committed af063ca · verified · 1 parent: de85205

Update organization README

Files changed (1)
1. README.md +270 -6

README.md CHANGED

@@ -1,10 +1,274 @@

```diff
 ---
-title: README
-emoji: 👍
-colorFrom: red
-colorTo: purple
 sdk: static
-pinned: false
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
```
```diff
 ---
+title: jakmro
 sdk: static
+pinned: true
 ---
```
 
# Cactus

<img src="assets/banner.jpg" alt="Logo" style="border-radius: 30px; width: 100%;">

[![Docs][docs-shield]][docs-url]
[![Website][website-shield]][website-url]
[![GitHub][github-shield]][github-url]
[![HuggingFace][hf-shield]][hf-url]
[![Reddit][reddit-shield]][reddit-url]
[![Blog][blog-shield]][blog-url]

A hybrid, low-latency, energy-efficient AI engine for mobile devices and wearables.
```
┌─────────────────┐
│  Cactus Engine  │ ←── OpenAI-compatible APIs for all major languages
└─────────────────┘     Chat, vision, STT, RAG, tool call, cloud handoff
         │
┌─────────────────┐
│  Cactus Graph   │ ←── Zero-copy computation graph (PyTorch for mobile)
└─────────────────┘     Custom models, optimised for RAM & quantisation
         │
┌─────────────────┐
│ Cactus Kernels  │ ←── ARM SIMD kernels (Apple, Snapdragon, Exynos, etc)
└─────────────────┘     Custom attention, KV-cache quant, chunked prefill
```
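The kernel layer above lists chunked prefill. As a generic illustration of that technique (not Cactus's kernel code), a long prompt is processed in fixed-size slices, each slice's keys and values being appended to the KV cache before the next slice runs, which keeps most per-step activation buffers proportional to the chunk size rather than to the whole prompt. The sketch below only computes the slice boundaries; `prefill_chunks` and `chunk_size` are hypothetical names, and the chunk size Cactus actually uses is not stated in this README.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split a prompt of `num_tokens` tokens into
// half-open ranges [begin, end) of at most `chunk_size` tokens each.
// A chunked prefill would run the model over each range in order,
// appending that range's keys/values to the KV cache before the next one.
std::vector<std::pair<std::size_t, std::size_t>>
prefill_chunks(std::size_t num_tokens, std::size_t chunk_size) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    if (chunk_size == 0) return chunks;  // avoid an infinite loop
    for (std::size_t begin = 0; begin < num_tokens; begin += chunk_size) {
        std::size_t end = std::min(begin + chunk_size, num_tokens);
        chunks.emplace_back(begin, end);
    }
    return chunks;
}
```

For a 1,000-token prompt and a chunk size of 256, `prefill_chunks` returns four ranges, the last one covering tokens 768 to 999.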
## Quick Demo

- Step 1: `brew install cactus-compute/cactus/cactus`
- Step 2: `cactus transcribe` or `cactus run <model>`
## Cactus Engine

```cpp
#include <cstdio>
#include "cactus.h"

int main() {
    // Load the model; the second argument is a .txt file or a directory
    // of .txt files used as the auto-RAG corpus
    cactus_model_t model = cactus_init(
        "path/to/weight/folder",
        "path to txt or dir of txts for auto-rag"
    );

    // Chat history as a JSON array of {role, content} messages
    const char* messages = R"([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Henry Ndubuaku"}
    ])";

    // Generation options as a JSON object
    const char* options = R"({
        "max_tokens": 50,
        "stop_sequences": ["<|im_end|>"]
    })";

    char response[4096];
    int result = cactus_complete(
        model,             // model handle
        messages,          // JSON chat messages
        response,          // response buffer
        sizeof(response),  // buffer size
        options,           // generation options
        nullptr,           // tools JSON
        nullptr,           // streaming callback
        nullptr            // user data
    );

    // response now holds a JSON object like the example below
    std::printf("cactus_complete returned %d\n%s\n", result, response);
    return 0;
}
```
Example response from Gemma3-270m:

```json
{
    "success": true,         // generation succeeded
    "error": null,           // error details if failed
    "cloud_handoff": false,  // true if cloud model used
    "response": "Hi there!",
    "function_calls": [],    // parsed tool calls
    "confidence": 0.8193,    // model confidence
    "time_to_first_token_ms": 45.23,
    "total_time_ms": 163.67,
    "prefill_tps": 1621.89,
    "decode_tps": 168.42,
    "ram_usage_mb": 245.67,
    "prefill_tokens": 28,
    "decode_tokens": 50,
    "total_tokens": 78
}
```
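The `messages` argument of `cactus_complete` is a JSON string, so any user-supplied text has to be JSON-escaped before it is spliced in; a raw quote or newline would otherwise produce invalid JSON. A minimal sketch, assuming the `role`/`content` message objects shown in the engine example; `json_escape` and `build_messages_json` are hypothetical helpers, not part of the Cactus API.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Escape a string for use inside a JSON string literal (RFC 8259):
// quotes, backslashes and control characters must be escaped.
// UTF-8 bytes >= 0x20 pass through unchanged, which JSON permits.
std::string json_escape(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    char buf[7];  // "\u" + 4 hex digits + NUL
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build a JSON array of {"role": ..., "content": ...} objects.
std::string build_messages_json(
        const std::vector<std::pair<std::string, std::string>>& messages) {
    std::string json = "[";
    for (std::size_t i = 0; i < messages.size(); ++i) {
        if (i > 0) json += ",";
        json += "{\"role\":\"" + json_escape(messages[i].first) +
                "\",\"content\":\"" + json_escape(messages[i].second) + "\"}";
    }
    json += "]";
    return json;
}
```

The result can be passed as `build_messages_json(history).c_str()`, as long as the `std::string` stays alive for the duration of the `cactus_complete` call.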
## Cactus Graph

```cpp
#include "cactus.h"

int main() {
    CactusGraph graph;

    // Declare inputs: a is 2x3 (FP16), b is 3x4 (INT8)
    auto a = graph.input({2, 3}, Precision::FP16);
    auto b = graph.input({3, 4}, Precision::INT8);

    // Define the computation; it runs when graph.execute() is called
    auto x1 = graph.matmul(a, b, false);      // (2x3)(3x4) -> 2x4
    auto x2 = graph.transpose(x1);            // 2x4 -> 4x2
    auto result = graph.matmul(b, x2, true);  // third argument is a transpose flag (see the Graph API)

    float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
    float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

    // Bind host data to the input nodes
    graph.set_input(a, a_data, Precision::FP16);
    graph.set_input(b, b_data, Precision::INT8);

    graph.execute();
    void* output_data = graph.get_output(result);
    (void)output_data;  // use the result here

    graph.hard_reset();
    return 0;
}
```
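To make the shapes in the graph example concrete, here is a plain CPU reference for its first two nodes, assuming row-major storage and an ordinary (non-transposed) product for `matmul(a, b, false)`. It is a sketch for checking shapes and values, not how Cactus implements its kernels; `Matrix`, `matmul_ref` and `transpose_ref` are hypothetical names. The final `matmul(b, x2, true)` is left out because the exact meaning of its flag is defined by the Graph API reference.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Minimal row-major matrix used only for this illustration.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;  // rows * cols values, row-major
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// C = A * B, where A is (m x k) and B is (k x n); the result is (m x n).
Matrix matmul_ref(const Matrix& a, const Matrix& b) {
    if (a.cols != b.rows) throw std::invalid_argument("inner dimensions differ");
    Matrix c{a.rows, b.cols, std::vector<float>(a.rows * b.cols, 0.0f)};
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < b.cols; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < a.cols; ++p) acc += a.at(i, p) * b.at(p, j);
            c.data[i * c.cols + j] = acc;
        }
    }
    return c;
}

// Transpose: an (m x n) matrix becomes (n x m).
Matrix transpose_ref(const Matrix& m) {
    Matrix t{m.cols, m.rows, std::vector<float>(m.data.size())};
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            t.data[c * t.cols + r] = m.at(r, c);
    return t;
}
```

With the example's inputs, `matmul_ref(a, b)` is 2x4 and its transpose is 4x2, matching the shape comments on `x1` and `x2`.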
## API & SDK References

| Reference | Language | Description / Platforms |
|-----------|----------|-------------------------|
| [Engine API](cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
| [Graph API](cactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |
| [Python SDK](/python/) | Python | macOS, Linux |
| [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
| [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |
| [Flutter SDK](/flutter/) | Dart | iOS, macOS, Android |
| [Rust SDK](/rust/) | Rust | macOS, Linux |
| [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |
## Benchmarks

- All weights are INT4-quantised.
- LFM: 1k-token prefill / 100-token decode; values are prefill / decode throughput in tokens per second (tps).
- LFM-VL: 256px image input; values are latency / decode tps.
- Parakeet: 30 s audio input; values are latency / decode tps.
- A `-` in place of a latency means there is no NPU support yet.

| Device | LFM 1.2B | LFM-VL 1.6B | Parakeet 1.1B | RAM |
|--------|----------|-------------|---------------|-----|
| Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
| iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
| iPhone 17 Pro | 327/48 | 0.3s/48 | 0.3s/300k+ | 108MB |
| iPhone 13 Mini | 148/34 | 0.3s/35 | 0.7s/90k+ | 1GB |
| Galaxy S25 Ultra | 255/37 | -/34 | -/250k+ | 1.5GB |
| Pixel 6a | 70/15 | -/15 | -/17k+ | 1GB |
| Galaxy A17 5G | 32/10 | -/11 | -/40k+ | 727MB |
| CMF Phone 2 Pro | - | - | - | - |
| Raspberry Pi 5 | 69/11 | 13.3s/11 | 4.5s/180k+ | 869MB |
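The LFM column reports throughput rather than end-to-end time. Assuming prefill and decode run back to back at the quoted rates, and ignoring model loading and other overheads, the total time for a request is roughly prefill_tokens / prefill_tps + decode_tokens / decode_tps. A small sketch; `estimate_seconds` is an illustrative name, and "1k" is taken as 1,000 tokens.

```cpp
// Rough end-to-end generation time in seconds from throughput figures.
// Assumes prefill and decode run sequentially at constant rates and
// ignores model loading, tokenization and scheduling overhead.
double estimate_seconds(double prefill_tokens, double prefill_tps,
                        double decode_tokens, double decode_tps) {
    double prefill_s = prefill_tokens / prefill_tps;  // time to ingest the prompt
    double decode_s = decode_tokens / decode_tps;     // time to generate the output
    return prefill_s + decode_s;
}
```

For the Mac M4 Pro row (582/100) this gives about 1000/582 + 100/100 ≈ 2.7 s for the 1k-prefill / 100-decode workload; for the Pixel 6a row (70/15) it gives about 14.3 + 6.7 ≈ 21 s.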
## Roadmap

| Date | Status | Milestone |
|------|--------|-----------|
| Sep 2025 | Done | Released v1 |
| Oct 2025 | Done | Chunked prefill, KV-cache quantisation (2x prefill) |
| Nov 2025 | Done | Cactus Attention (same decode speed after a 10- or 1k-token prefill) |
| Dec 2025 | Done | Team grows to 6+ research engineers |
| Jan 2026 | Done | Apple NPU/RAM, 5-11x faster iOS/Mac |
| Feb 2026 | Done | Hybrid inference, INT4, lossless quantisation (1.5x) |
| Mar 2026 | Coming | Qualcomm/Google NPUs, 5-11x faster Android |
| Apr 2026 | Coming | MediaTek/Exynos NPUs, Cactus@ICLR |
| May 2026 | Coming | Kernel→C++, Graph/Engine→Rust, Mac GPU & VR |
| Jun 2026 | Coming | Torch/JAX model transpilers |
| Jul 2026 | Coming | Wearables optimisations, Cactus@ICML |
| Aug 2026 | Coming | Orchestration |
| Sep 2026 | Coming | Full Cactus paper, chip manufacturer partners |
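The benchmarks use INT4 weights, and INT4 appears on the roadmap. Cactus's actual INT4 format (group size, rounding, scale type, packing order) is not described in this README, so the following is only a generic sketch of symmetric per-group INT4 quantization: each group of weights shares one float scale, values are rounded to integers in [-8, 7], and two 4-bit values are packed into each byte. `Int4Group`, `quantize_int4_group` and `dequantize_int4_group` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One quantized group: packed 4-bit values plus a single float scale.
struct Int4Group {
    std::vector<std::uint8_t> packed;  // two signed 4-bit values per byte
    float scale;                       // dequantized value = q * scale
    std::size_t count;                 // number of original weights
};

// Symmetric per-group INT4 quantization: q = clamp(round(w / scale), -8, 7).
Int4Group quantize_int4_group(const std::vector<float>& w) {
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    float scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f;

    Int4Group g{std::vector<std::uint8_t>((w.size() + 1) / 2, 0), scale, w.size()};
    for (std::size_t i = 0; i < w.size(); ++i) {
        int q = static_cast<int>(std::lround(w[i] / scale));
        q = std::clamp(q, -8, 7);
        // Keep the low 4 bits of the two's-complement value
        auto nibble = static_cast<std::uint8_t>(q & 0x0F);
        if (i % 2 == 0) {
            g.packed[i / 2] |= nibble;                                   // low nibble
        } else {
            g.packed[i / 2] |= static_cast<std::uint8_t>(nibble << 4);  // high nibble
        }
    }
    return g;
}

// Unpack each 4-bit value, sign-extend it and rescale back to float.
std::vector<float> dequantize_int4_group(const Int4Group& g) {
    std::vector<float> out(g.count);
    for (std::size_t i = 0; i < g.count; ++i) {
        std::uint8_t byte = g.packed[i / 2];
        int nibble = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        int q = nibble >= 8 ? nibble - 16 : nibble;  // 4-bit sign extension
        out[i] = static_cast<float>(q) * g.scale;
    }
    return out;
}
```

Storing 4 bits per weight plus one scale per group cuts weight memory to roughly an eighth of FP32 (a quarter of FP16), at the cost of a rounding error of at most half a quantization step (scale / 2) per weight.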
## Using this repo

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│  Step 0: if on Linux (Ubuntu/Debian)                                         │
│  sudo apt-get install python3 python3-venv python3-pip cmake \               │
│    build-essential libcurl4-openssl-dev                                      │
│                                                                              │
│  Step 1: clone and setup                                                     │
│  git clone https://github.com/cactus-compute/cactus && cd cactus             │
│  source ./setup                                                              │
│                                                                              │
│  Step 2: use the commands                                                    │
│──────────────────────────────────────────────────────────────────────────────│
│                                                                              │
│  cactus auth                   manage Cloud API key                          │
│    --status                    show key status                               │
│    --clear                     remove saved key                              │
│                                                                              │
│  cactus run <model>            opens playground (auto downloads)             │
│    --precision INT4|INT8|FP16  quantization (default: INT4)                  │
│    --token <token>             HF token (gated models)                       │
│    --reconvert                 force reconversion from source                │
│                                                                              │
│  cactus transcribe [model]     live mic transcription (parakeet-1.1b)        │
│    --file <audio.wav>          transcribe file instead of mic                │
│    --precision INT4|INT8|FP16  quantization (default: INT4)                  │
│    --token <token>             HF token (gated models)                       │
│    --reconvert                 force reconversion from source                │
│                                                                              │
│  cactus download <model>       downloads model to ./weights                  │
│    --precision INT4|INT8|FP16  quantization (default: INT4)                  │
│    --token <token>             HuggingFace API token                         │
│    --reconvert                 force reconversion from source                │
│                                                                              │
│  cactus convert <model> [dir]  convert model, supports LoRA merge            │
│    --precision INT4|INT8|FP16  quantization (default: INT4)                  │
│    --lora <path>               LoRA adapter to merge                         │
│    --token <token>             HuggingFace API token                         │
│                                                                              │
│  cactus build                  build for ARM → build/libcactus.a             │
│    --apple                     Apple (iOS/macOS)                             │
│    --android                   Android                                       │
│    --flutter                   Flutter (all platforms)                       │
│    --python                    shared lib for Python FFI                     │
│                                                                              │
│  cactus test                   run unit tests and benchmarks                 │
│    --model <model>             default: LFM2-VL-450M                         │
│    --transcribe_model <model>  default: moonshine-base                       │
│    --benchmark                 use larger models                             │
│    --precision INT4|INT8|FP16  regenerate weights with precision             │
│    --reconvert                 force reconversion from source                │
│    --no-rebuild                skip building library                         │
│    --only <test>               specific test (llm, vlm, stt, etc)            │
│    --ios                       run on connected iPhone                       │
│    --android                   run on connected Android                      │
│                                                                              │
│  cactus clean                  remove all build artifacts                    │
│  cactus --help                 show all commands and flags                   │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
## Maintaining Organisations

1. [Cactus Compute, Inc. (YC S25)](https://cactuscompute.com/)
2. [UCLA's BruinAI](https://bruinai.org/)
3. [Char (YC S25)](https://char.com/)
4. [Yale's AI Society](https://www.yale-ai.org/team)
5. [National University of Singapore's AI Society](https://www.nusaisociety.org/)
6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
7. [Imperial College's AI Society](https://www.imperialcollegeunion.org/csp/1391)
8. [University of Pennsylvania's AI@Penn](https://ai-at-penn-main-105.vercel.app/)
9. [University of Michigan Ann Arbor's MSAIL](https://msail.github.io/)
10. [University of Colorado Boulder's AI Club](https://www.cuaiclub.org/)
## Citation

If you use Cactus in your research, please cite it as follows:

```bibtex
@software{cactus,
  title  = {Cactus: AI Inference Engine for Phones \& Wearables},
  author = {Ndubuaku, Henry and {Cactus Team}},
  url    = {https://github.com/cactus-compute/cactus},
  year   = {2025}
}
```
**N.B.:** Scroll back to the top and click the shields for links to resources.

[docs-shield]: https://img.shields.io/badge/Docs-555?style=for-the-badge&logo=readthedocs&logoColor=white
[docs-url]: https://cactus-compute.github.io/cactus/

[website-shield]: https://img.shields.io/badge/Website-555?style=for-the-badge&logo=safari&logoColor=white
[website-url]: https://cactuscompute.com/

[github-shield]: https://img.shields.io/badge/GitHub-555?style=for-the-badge&logo=github&logoColor=white
[github-url]: https://github.com/cactus-compute/cactus

[hf-shield]: https://img.shields.io/badge/HuggingFace-555?style=for-the-badge&logo=huggingface&logoColor=white
[hf-url]: https://huggingface.co/Cactus-Compute

[reddit-shield]: https://img.shields.io/badge/Reddit-555?style=for-the-badge&logo=reddit&logoColor=white
[reddit-url]: https://www.reddit.com/r/cactuscompute/

[blog-shield]: https://img.shields.io/badge/Blog-555?style=for-the-badge&logo=hashnode&logoColor=white
[blog-url]: https://cactus-compute.github.io/cactus/blog/README/