hmunachii committed (verified)
Commit 960c7eb · 1 parent: f3c3eb1

Update README.md

Files changed (1): README.md (+276 −1)
README.md CHANGED
@@ -5,6 +5,281 @@ colorFrom: indigo
 colorTo: green
 sdk: static
 pinned: false
+license: apache-2.0
+short_description: Framework for running AI locally on mobile devices and weara
 ---
-Edit this `README.md` markdown file to author your organization card.
[![Email][gmail-shield]][gmail-url]   [![Discord][discord-shield]][discord-url]   [![Design Docs][docs-shield]][docs-url]

[gmail-shield]: https://img.shields.io/badge/Gmail-red?style=for-the-badge&logo=gmail&logoColor=white
[gmail-url]: mailto:founders@cactuscompute.com

[discord-shield]: https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white
[discord-url]: https://discord.gg/SdZjmfWQ

[docs-shield]: https://img.shields.io/badge/DeepWiki-009485?style=for-the-badge&logo=readthedocs&logoColor=white
[docs-url]: https://deepwiki.com/cactus-compute/cactus

Cactus is a lightweight, high-performance framework for running AI models on mobile devices, with simple and consistent APIs across C/C++, Dart/Flutter, and TypeScript/React Native. Cactus currently leverages GGML backends to support any GGUF model already compatible with llama.cpp.

## ![Features](https://img.shields.io/badge/Features-grey.svg?style=for-the-badge)

- Text completion and chat completion
- Vision Language Models
- Streaming token generation
- Embedding generation
- Text-to-speech model support (early stages)
- JSON mode with schema validation
- Chat templates with Jinja2 support
- Low memory footprint
- Battery-efficient inference
- Background processing

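To make the "JSON mode with schema validation" feature concrete, the sketch below shows the kind of check such a mode performs on model output. This is a standalone, hypothetical illustration, not the Cactus API; the `Schema` type and `validateAgainstSchema` helper are invented for this example.

```typescript
// Hypothetical sketch of JSON-mode output validation: the model is constrained
// to emit JSON, and the result is checked against a (tiny subset of a) schema.
// This is NOT the Cactus API; names here are illustrative only.

type Schema = {
  type: "object";
  required: string[];
  properties: Record<string, { type: string }>;
};

function validateAgainstSchema(raw: string, schema: Schema): { ok: boolean; errors: string[] } {
  const errors: string[] = [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw); // JSON mode aims to guarantee parseable output; we still check
  } catch {
    return { ok: false, errors: ["output is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["output is not a JSON object"] };
  }
  const obj = parsed as Record<string, unknown>;
  for (const key of schema.required) {
    if (!(key in obj)) errors.push(`missing required field: ${key}`);
  }
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (key in obj && typeof obj[key] !== spec.type) {
      errors.push(`field ${key} should be ${spec.type}`);
    }
  }
  return { ok: errors.length === 0, errors };
}
```

A failed validation would typically trigger a retry or a constrained re-generation rather than surfacing malformed output to the app.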
## ![Why Cactus?](https://img.shields.io/badge/Why_Cactus-grey.svg?style=for-the-badge)

- APIs are increasingly expensive, especially at scale
- Private and local: data never leaves the device
- Low-latency and fault-tolerant: users do not need an internet connection
- Small models excel at most tasks; big APIs are often only better at enterprise tasks like coding
- Freedom to use any GGUF model, unlike Apple Foundation Models and Google AI Core
- React Native and Flutter APIs, no need for separate Swift and Android setups
- iOS xcframework and JNILibs if working in a native setup
- Neat and tiny C++ build for custom hardware

## ![Flutter](https://img.shields.io/badge/Flutter-grey.svg?style=for-the-badge&logo=Flutter&logoColor=white)

1. **Update `pubspec.yaml`:**
    Add `cactus` to your project's dependencies. Ensure you have `flutter: sdk: flutter` (usually present by default).
    ```yaml
    dependencies:
      flutter:
        sdk: flutter
      cactus: ^0.1.0
    ```
2. **Install dependencies:**
    Execute the following command in your project terminal:
    ```bash
    flutter pub get
    ```
3. **Basic Flutter Text Completion**
    ```dart
    import 'package:cactus/cactus.dart';

    Future<String> basicCompletion() async {
      // Initialize context
      final context = await CactusContext.init(CactusInitParams(
        modelPath: '/path/to/model.gguf',
        contextSize: 2048,
        threads: 4,
      ));

      // Generate response
      final result = await context.completion(CactusCompletionParams(
        messages: [
          ChatMessage(role: 'user', content: 'Hello, how are you?')
        ],
        maxPredictedTokens: 100,
        temperature: 0.7,
      ));

      context.free();
      return result.text;
    }
    ```
To learn more, see the [Flutter Docs](docs/flutter.md). It covers chat design, embeddings, multimodal models, text-to-speech, and more.

## ![React Native](https://img.shields.io/badge/React%20Native-grey.svg?style=for-the-badge&logo=react&logoColor=%2361DAFB)

1. **Install the `cactus-react-native` package:**
    Using npm:
    ```bash
    npm install cactus-react-native
    ```
    Or using yarn:
    ```bash
    yarn add cactus-react-native
    ```
2. **Install iOS Pods (if not using Expo):**
    For native iOS projects, ensure you link the native dependencies. Navigate to your `ios` directory and run:
    ```bash
    npx pod-install
    ```
3. **Basic React-Native Text Completion**
    ```typescript
    import { initLlama } from 'cactus-react-native';

    // Initialize a context
    const context = await initLlama({
      model: '/path/to/your/model.gguf',
      n_ctx: 2048,
      n_threads: 4,
    });

    // Generate text
    const result = await context.completion({
      messages: [
        { role: 'user', content: 'Hello, how are you?' }
      ],
      n_predict: 100,
      temperature: 0.7,
    });

    console.log(result.text);
    ```
To learn more, see the [React Docs](docs/react.md). It covers chat design, embeddings, multimodal models, text-to-speech, and more.

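The streaming token generation listed under Features is consumed token-by-token on the app side. The sketch below shows that accumulation pattern in isolation: `fakeTokenStream` stands in for the native inference loop, and the `onToken` callback shape is an assumption for illustration, not the actual cactus-react-native API.

```typescript
// Illustrative-only sketch of consuming a token stream in app code.
// `fakeTokenStream` simulates the native inference loop emitting partial tokens;
// the real library's streaming interface may look different.

async function* fakeTokenStream(): AsyncGenerator<string> {
  for (const tok of ["Hel", "lo", ", ", "world", "!"]) {
    yield tok; // each yield is one decoded token fragment
  }
}

async function streamCompletion(onToken: (partial: string) => void): Promise<string> {
  let text = "";
  for await (const tok of fakeTokenStream()) {
    text += tok;  // accumulate the full response
    onToken(tok); // update the UI incrementally, e.g. setState in React
  }
  return text;
}
```

The point of the pattern: the UI renders each fragment as it arrives, while the caller still gets the complete text when generation finishes.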
## ![C++](https://img.shields.io/badge/C%2B%2B-grey.svg?style=for-the-badge&logo=c%2B%2B&logoColor=white)

The Cactus backend is written in C/C++ and can run directly on any ARM/x86 hardware, including phones, smart TVs, watches, speakers, cameras, laptops, and Raspberry Pi boards.

1. **Setup**
    You need CMake 3.14+ installed; install it with `brew install cmake` on macOS or your standard package manager on Linux.

2. **Build from Source**
    ```bash
    git clone https://github.com/cactus-compute/cactus.git
    cd cactus
    mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=Release
    make -j$(nproc)
    ```

3. **CMake Integration**
    Add to your `CMakeLists.txt`:

    ```cmake
    # Add Cactus as a subdirectory
    add_subdirectory(cactus)

    # Link to your target
    target_link_libraries(your_target cactus)
    target_include_directories(your_target PRIVATE cactus)

    # Requires C++17 or higher
    ```

4. **Basic Text Completion**
    ```cpp
    #include "cactus/cactus.h"
    #include <iostream>

    int main() {
        cactus::cactus_context context;

        // Configure parameters
        common_params params;
        params.model.path = "model.gguf";
        params.n_ctx = 2048;
        params.n_threads = 4;
        params.n_gpu_layers = 99; // Use GPU acceleration

        // Load model
        if (!context.loadModel(params)) {
            std::cerr << "Failed to load model" << std::endl;
            return 1;
        }

        // Set prompt
        context.params.prompt = "Hello, how are you?";
        context.params.n_predict = 100;

        // Initialize sampling
        if (!context.initSampling()) {
            std::cerr << "Failed to initialize sampling" << std::endl;
            return 1;
        }

        // Generate response
        context.beginCompletion();
        context.loadPrompt();

        while (context.has_next_token && !context.is_interrupted) {
            auto token_output = context.doCompletion();
            if (token_output.tok == -1) break;
        }

        std::cout << "Response: " << context.generated_text << std::endl;
        return 0;
    }
    ```
To learn more, see the [C++ Docs](docs/cpp.md). It covers chat design, embeddings, multimodal models, text-to-speech, and more.


## ![Using this Repo & Example Apps](https://img.shields.io/badge/Using_Repo_And_Examples-grey.svg?style=for-the-badge)

First, clone the repo with `git clone https://github.com/cactus-compute/cactus.git`, cd into it, and make all scripts executable with `chmod +x scripts/*.sh`.

1. **Flutter**
    - Build the Android JNILibs with `scripts/build-flutter-android.sh`.
    - Build the Flutter plugin with `scripts/build-flutter-android.sh`.
    - Navigate to the example app with `cd examples/flutter`.
    - Open your simulator via Xcode or Android Studio ([walkthrough](https://medium.com/@daspinola/setting-up-android-and-ios-emulators-22d82494deda) if you have not done this before).
    - Always start the app with `flutter clean && flutter pub get && flutter run`.
    - Play with the app, and make changes either to the example app or the plugin as desired.

2. **React Native**
    - Build the Android JNILibs with `scripts/build-react-android.sh`.
    - Build the React Native package with `scripts/build-react-android.sh`.
    - Navigate to the example app with `cd examples/react`.
    - Set up your simulator via Xcode or Android Studio ([walkthrough](https://medium.com/@daspinola/setting-up-android-and-ios-emulators-22d82494deda) if you have not done this before).
    - Always start the app with `yarn && yarn ios` or `yarn && yarn android`.
    - Play with the app, and make changes either to the example app or the package as desired.
    - For now, if you change the package, manually copy the changed files/folders into `examples/react/node_modules/cactus-react-native`.

3. **C/C++**
    - Navigate to the example app with `cd examples/cpp`.
    - There are multiple main files: `main_vlm`, `main_llm`, `main_embed`, `main_tts`.
    - Build both the libraries and executables using `build.sh`.
    - Run one of the executables: `./cactus_vlm`, `./cactus_llm`, `./cactus_embed`, `./cactus_tts`.
    - Try different models and make changes as desired.

4. **Contributing**
    - To contribute a bug fix, make your changes on a branch (`git checkout -b <branch-name>`) and submit a PR.
    - To contribute a feature, please raise an issue first so it can be discussed and so your work does not overlap with someone else's.
    - [Join our discord](https://discord.gg/SdZjmfWQ)

## ![Performance](https://img.shields.io/badge/Performance-grey.svg?style=for-the-badge)

| Device                         | Gemma-3 1B Q8 (toks/sec) | Qwen-2.5 1.5B Q8 (toks/sec) |
|:-------------------------------|:------------------------:|:---------------------------:|
| iPhone 16 Pro Max              | 46                       | 37                          |
| iPhone 16 Pro                  | 46                       | 37                          |
| iPhone 16                      | 42                       | 36                          |
| iPhone 15 Pro Max              | 39                       | 31                          |
| iPhone 15 Pro                  | 39                       | 31                          |
| iPhone 14 Pro Max              | 38                       | 29                          |
| OnePlus 13 5G                  | 37                       | -                           |
| Samsung Galaxy S24 Ultra       | 36                       | -                           |
| iPhone 15                      | 36                       | 25                          |
| OnePlus Open                   | 33                       | -                           |
| Samsung Galaxy S23 5G          | 32                       | -                           |
| Samsung Galaxy S24             | 31                       | -                           |
| iPhone 13 Pro                  | 30                       | -                           |
| OnePlus 12                     | 30                       | -                           |
| Galaxy S25 Ultra               | 25                       | -                           |
| OnePlus 11                     | 23                       | -                           |
| iPhone 13 mini                 | 22                       | -                           |
| Redmi K70 Ultra                | 21                       | -                           |
| Xiaomi 13                      | 21                       | -                           |
| Samsung Galaxy S24+            | 19                       | -                           |
| Samsung Galaxy Z Fold 4        | 19                       | -                           |
| Xiaomi Poco F6 5G              | 19                       | -                           |

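A quick way to read the table: divide the number of tokens you expect in a reply by the device's decode speed. The helper below is just that arithmetic, with two example rows taken from the table.

```typescript
// Back-of-envelope latency estimate from the benchmark table above:
// time to generate N tokens at a device's measured tokens-per-second.

function generationSeconds(tokens: number, toksPerSec: number): number {
  return tokens / toksPerSec;
}

// A 100-token reply from Gemma-3 1B Q8, using table rows:
const iphone16ProMax = generationSeconds(100, 46); // ~2.2 s
const galaxyS24Ultra = generationSeconds(100, 36); // ~2.8 s
```

This ignores prompt-processing time, which adds a fixed cost before the first token appears, so treat these as lower bounds.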
## ![Demo](https://img.shields.io/badge/Demo-grey.svg?style=for-the-badge)

We created a demo chat app we use for benchmarking:

[![Download App](https://img.shields.io/badge/Download_iOS_App-grey?style=for-the-badge&logo=apple&logoColor=white)](https://apps.apple.com/gb/app/cactus-chat/id6744444212)
[![Download App](https://img.shields.io/badge/Download_Android_App-grey?style=for-the-badge&logo=android&logoColor=white)](https://play.google.com/store/apps/details?id=com.rshemetsubuser.myapp&pcampaignid=web_share)

## ![Recommendations](https://img.shields.io/badge/Our_Recommendations-grey.svg?style=for-the-badge)
You can run models up to 10B at Q4 on most devices, but we do not recommend that in production due to file size, speed, battery drain, and device heating.
We generally recommend the following:

- **Language Generation**: `SmolLM2-360m`, `Qwen-3-600m-Q6`, `Gemma-3-1B-Q6`, `Qwen-3-1.7B-Q6`
- **Multimodal Language Generation**: `Smol-VLM-500m-Q6`, `Gemma-3n-2B-Q6`
- **Embeddings**: `nomic-v2-moe-300m-Q6`, `jina-v3-570m-Q6`
- **Text-To-Speech**: `OuteTTS-0.2-500m-Q6`

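For convenience, the recommendations above can be kept as a small lookup table in app code. The model names are copied from the list; the `pickModels` helper itself is purely illustrative.

```typescript
// The recommendation list above as a task-to-models lookup.
// Task keys and the helper are illustrative conveniences, not a Cactus API.

const recommendedModels: Record<string, string[]> = {
  "language-generation": ["SmolLM2-360m", "Qwen-3-600m-Q6", "Gemma-3-1B-Q6", "Qwen-3-1.7B-Q6"],
  "multimodal": ["Smol-VLM-500m-Q6", "Gemma-3n-2B-Q6"],
  "embeddings": ["nomic-v2-moe-300m-Q6", "jina-v3-570m-Q6"],
  "text-to-speech": ["OuteTTS-0.2-500m-Q6"],
};

function pickModels(task: string): string[] {
  return recommendedModels[task] ?? []; // empty list for unknown tasks
}
```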
Gemma-3n-2B-Q6 is a great omni model and beats GPT-4.1 across many metrics. It is multimodal (vision, audio) and, with clever prompt engineering, can also be used for embedding text, images, and audio, as well as for zero-shot classification and more. We are trying hard to get the weights.