Spaces:
Configuration error
Configuration error
Update README.md
Browse files
README.md
CHANGED
|
@@ -1,244 +1,213 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
1. **Setup**
|
| 144 |
-
You need `CMake 3.14+` installed, or install with `brew install cmake` (on macOS) or standard package managers on Linux.
|
| 145 |
-
|
| 146 |
-
2. **Build from Source**
|
| 147 |
-
```bash
|
| 148 |
-
git clone https://github.com/your-org/cactus.git
|
| 149 |
-
cd cactus
|
| 150 |
-
mkdir build && cd build
|
| 151 |
-
cmake .. -DCMAKE_BUILD_TYPE=Release
|
| 152 |
-
make -j$(nproc)
|
| 153 |
-
```
|
| 154 |
-
|
| 155 |
-
3. **CMake Integration**
|
| 156 |
-
Add to your `CMakeLists.txt`:
|
| 157 |
-
|
| 158 |
-
```cmake
|
| 159 |
-
# Add Cactus as subdirectory
|
| 160 |
-
add_subdirectory(cactus)
|
| 161 |
-
|
| 162 |
-
# Link to your target
|
| 163 |
-
target_link_libraries(your_target cactus)
|
| 164 |
-
target_include_directories(your_target PRIVATE cactus)
|
| 165 |
-
|
| 166 |
-
# Requires C++17 or higher
|
| 167 |
-
```
|
| 168 |
-
|
| 169 |
-
4. **Basic Text Completion**
|
| 170 |
-
```cpp
|
| 171 |
-
#include "cactus/cactus.h"
|
| 172 |
-
#include <iostream>
|
| 173 |
-
|
| 174 |
-
int main() {
|
| 175 |
-
cactus::cactus_context context;
|
| 176 |
-
|
| 177 |
-
// Configure parameters
|
| 178 |
-
common_params params;
|
| 179 |
-
params.model.path = "model.gguf";
|
| 180 |
-
params.n_ctx = 2048;
|
| 181 |
-
params.n_threads = 4;
|
| 182 |
-
params.n_gpu_layers = 99; // Use GPU acceleration
|
| 183 |
-
|
| 184 |
-
// Load model
|
| 185 |
-
if (!context.loadModel(params)) {
|
| 186 |
-
std::cerr << "Failed to load model" << std::endl;
|
| 187 |
-
return 1;
|
| 188 |
}
|
| 189 |
-
|
| 190 |
-
// Set prompt
|
| 191 |
-
context.params.prompt = "Hello, how are you?";
|
| 192 |
-
context.params.n_predict = 100;
|
| 193 |
-
|
| 194 |
-
// Initialize sampling
|
| 195 |
-
if (!context.initSampling()) {
|
| 196 |
-
std::cerr << "Failed to initialize sampling" << std::endl;
|
| 197 |
-
return 1;
|
| 198 |
-
}
|
| 199 |
-
|
| 200 |
-
// Generate response
|
| 201 |
-
context.beginCompletion();
|
| 202 |
-
context.loadPrompt();
|
| 203 |
-
|
| 204 |
-
while (context.has_next_token && !context.is_interrupted) {
|
| 205 |
-
auto token_output = context.doCompletion();
|
| 206 |
-
if (token_output.tok == -1) break;
|
| 207 |
-
}
|
| 208 |
-
|
| 209 |
-
std::cout << "Response: " << context.generated_text << std::endl;
|
| 210 |
-
return 0;
|
| 211 |
}
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<img src="assets/banner.jpg" alt="Logo" style="border-radius: 30px; width: 100%;">
|
| 2 |
+
|
| 3 |
+
Energy-efficient kernels & inference engine for phones.
|
| 4 |
+
|
| 5 |
+
## Why Cactus?
|
| 6 |
+
- Phones run on battery, GPUs drain energy and heat the devices.
|
| 7 |
+
- 70% of phones today don't ship NPUs, which most frameworks optimise for.
|
| 8 |
+
- Cactus is optimised for old and new ARM CPUs first, with NPU/DSP/ISP coming.
|
| 9 |
+
- Fast on all phones with less battery drain and heating.
|
| 10 |
+
|
| 11 |
+
## Performance (CPU only)
|
| 12 |
+
|
| 13 |
+
- Speed for various sizes can be estimated proportionally
|
| 14 |
+
- INT4 will give 30% gains when merged
|
| 15 |
+
- GPUs yield gains but drain battery, will be passed on for NPUs
|
| 16 |
+
|
| 17 |
+
| Device | Qwen3-INT8-600m (toks/sec) |
|
| 18 |
+
|:------------------------------|:------------------------:|
|
| 19 |
+
| iPhone 17 Pro | 74 |
|
| 20 |
+
| Galaxy S25 Ultra / 16 Pro | 58 |
|
| 21 |
+
| iPhone 16 / Galaxy S25 / Nothing 3 | 52 |
|
| 22 |
+
| iPhone 15 Pro | 48 |
|
| 23 |
+
| iPhone 14 Pro / OnePlus 13 5G | 47 |
|
| 24 |
+
| Galaxy S24 Ultra / iPhone 15 | 42 |
|
| 25 |
+
| OnePlus Open / Galaxy S23 | 41 |
|
| 26 |
+
| iPhone 13 Pro / OnePlus 12 | 38 |
|
| 27 |
+
| iPhone 13 mini / Redmi K70 Ultra / Xiaomi 13 / OnePlus 11 | 27 |
|
| 28 |
+
| Pixel 6a / Nothing 3a / iPhone X / Galaxy S21 | 16 |
|
| 29 |
+
|
| 30 |
+
## File Size Comparison
|
| 31 |
+
|
| 32 |
+
| Format | Size (Qwen3-0.6B-INT8) |
|
| 33 |
+
|--------|------------------------|
|
| 34 |
+
| Cactus | 370-420 MB |
|
| 35 |
+
| ONNX/TFLite/MLX | 600 MB |
|
| 36 |
+
| GGUF | 800 MB |
|
| 37 |
+
| Executorch | 944 MB |
|
| 38 |
+
|
| 39 |
+
## Battery drain
|
| 40 |
+
|
| 41 |
+
- Newer devices have bigger batteries
|
| 42 |
+
- NPUs are designed for less drain (2-10x)
|
| 43 |
+
- Apple Intelligence drains 0.6 percent/min on iPhone 16 Pro Max
|
| 44 |
+
|
| 45 |
+
| Device | Qwen3-INT8-600m (percent/min) |
|
| 46 |
+
|:------------------------------|:------------------------:|
|
| 47 |
+
| OnePlus 13 5G | 0.33 |
|
| 48 |
+
| Redmi K70 Ultra / OnePlus 12 | 0.41 |
|
| 49 |
+
| Galaxy S25 Ultra / iPhone 17 Pro / Nothing 3 | 0.44 |
|
| 50 |
+
| Galaxy S24 Ultra / Nothing 3a / Pixel 6a | 0.48 |
|
| 51 |
+
| iPhone 16 Pro Max / Xiaomi 13 | 0.50 |
|
| 52 |
+
|
| 53 |
+
## Design
|
| 54 |
+
```
|
| 55 |
+
┌─────────────────┐
|
| 56 |
+
│ Cactus FFI │ ←── OpenAI compatible C API for integration
|
| 57 |
+
└─────────────────┘
|
| 58 |
+
│
|
| 59 |
+
┌─────────────────┐
|
| 60 |
+
│ Cactus Engine │ ←── High-level transformer engine
|
| 61 |
+
└─────────────────┘
|
| 62 |
+
│
|
| 63 |
+
┌─────────────────┐
|
| 64 |
+
│ Cactus Graph │ ←── Unified zero-copy computation graph
|
| 65 |
+
└─────────────────┘
|
| 66 |
+
│
|
| 67 |
+
┌─────────────────┐
|
| 68 |
+
│ Cactus Kernels │ ←── Low-level ARM-specific SIMD operations
|
| 69 |
+
└─────────────────┘
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
## Cactus Graph & Kernels
|
| 73 |
+
Cactus Graph is a general numerical computing framework that runs on Cactus Kernels.
|
| 74 |
+
Great for implementing custom models and scientific computing, like JAX for phones.
|
| 75 |
+
|
| 76 |
+
```cpp
|
| 77 |
+
#include "cactus.h"
|
| 78 |
+
|
| 79 |
+
CactusGraph graph;
|
| 80 |
+
|
| 81 |
+
auto a = graph.input({2, 3}, Precision::FP16);
|
| 82 |
+
auto b = graph.input({3, 4}, Precision::INT8);
|
| 83 |
+
|
| 84 |
+
auto x1 = graph.matmul(a, b, false);
|
| 85 |
+
auto x2 = graph.transpose(x1);
|
| 86 |
+
auto result = graph.matmul(b, x2, true);
|
| 87 |
+
|
| 88 |
+
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
|
| 89 |
+
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
|
| 90 |
+
|
| 91 |
+
graph.set_input(a, a_data, Precision::FP16);
|
| 92 |
+
graph.set_input(b, b_data, Precision::INT8);
|
| 93 |
+
graph.execute();
|
| 94 |
+
|
| 95 |
+
void* output_data = graph.get_output(result);
|
| 96 |
+
graph.hard_reset();
|
| 97 |
+
|
| 98 |
+
```
|
| 99 |
+
|
| 100 |
+
## Cactus Engine & APIs
|
| 101 |
+
Cactus Engine is a transformer inference engine built on top of Cactus Graphs.
|
| 102 |
+
It is abstracted via Cactus Foreign Function Interface APIs.
|
| 103 |
+
Header files are self-documenting but documentation contributions are welcome.
|
| 104 |
+
|
| 105 |
+
```cpp
|
| 106 |
+
#include "cactus.h"
|
| 107 |
+
|
| 108 |
+
const char* model_path = "path/to/weight/folder";
|
| 109 |
+
cactus_model_t model = cactus_init(model_path, 2048);
|
| 110 |
+
|
| 111 |
+
const char* messages = R"([
|
| 112 |
+
{"role": "system", "content": "You are a helpful assistant."},
|
| 113 |
+
{"role": "user", "content": "/nothink My name is Henry Ndubuaku"}
|
| 114 |
+
])";
|
| 115 |
+
|
| 116 |
+
const char* options = R"({
|
| 117 |
+
"max_tokens": 50,
|
| 118 |
+
"stop_sequences": ["<|im_end|>"]
|
| 119 |
+
})";
|
| 120 |
+
|
| 121 |
+
char response[1024];
|
| 122 |
+
int result = cactus_complete(model, messages, response, sizeof(response), options, nullptr, nullptr, nullptr);
|
| 123 |
+
```
|
| 124 |
+
|
| 125 |
+
With tool support:
|
| 126 |
+
```cpp
|
| 127 |
+
const char* tools = R"([
|
| 128 |
+
{
|
| 129 |
+
"function": {
|
| 130 |
+
"name": "get_weather",
|
| 131 |
+
"description": "Get weather for a location",
|
| 132 |
+
"parameters": {
|
| 133 |
+
"properties": {
|
| 134 |
+
"location": {
|
| 135 |
+
"type": "string",
|
| 136 |
+
"description": "City name",
|
| 137 |
+
"required": true
|
| 138 |
+
}
|
| 139 |
+
},
|
| 140 |
+
"required": ["location"]
|
| 141 |
+
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 142 |
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 143 |
}
|
| 144 |
+
])";
|
| 145 |
+
|
| 146 |
+
int result = cactus_complete(model, messages, response, sizeof(response), options, tools, nullptr, nullptr);
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
## Using Cactus in your apps
|
| 150 |
+
Cactus SDKs run 500k+ weekly inference tasks in production today, try them!
|
| 151 |
+
|
| 152 |
+
<a href="https://github.com/cactus-compute/cactus-flutter" target="_blank">
|
| 153 |
+
<img alt="Flutter" src="https://img.shields.io/badge/Flutter-grey.svg?style=for-the-badge&logo=Flutter&logoColor=white">
|
| 154 |
+
</a> <a href="https://github.com/cactus-compute/cactus-react" target="_blank">
|
| 155 |
+
<img alt="React Native" src="https://img.shields.io/badge/React%20Native-grey.svg?style=for-the-badge&logo=react&logoColor=%2361DAFB">
|
| 156 |
+
</a> <a href="https://github.com/cactus-compute/cactus-kotlin" target="_blank">
|
| 157 |
+
<img alt="Kotlin" src="https://img.shields.io/badge/Kotlin_MP-grey.svg?style=for-the-badge&logo=kotlin&logoColor=white">
|
| 158 |
+
</a>
|
| 159 |
+
|
| 160 |
+
## Getting started
|
| 161 |
+
<a href="https://cactuscompute.com/docs" target="_blank">
|
| 162 |
+
<img alt="Documentation" src="https://img.shields.io/badge/Documentation-4A90E2?style=for-the-badge&logo=gitbook&logoColor=white">
|
| 163 |
+
</a> <a href="https://discord.gg/bNurx3AXTJ" target="_blank">
|
| 164 |
+
<img alt="Discord" src="https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white">
|
| 165 |
+
</a>
|
| 166 |
+
|
| 167 |
+
## Demo
|
| 168 |
+
<a href="https://apps.apple.com/gb/app/cactus-chat/id6744444212" target="_blank">
|
| 169 |
+
<img alt="Download iOS App" src="https://img.shields.io/badge/Try_iOS_Demo-grey?style=for-the-badge&logo=apple&logoColor=white">
|
| 170 |
+
</a> <a href="https://play.google.com/store/apps/details?id=com.rshemetsubuser.myapp&pcampaignid=web_share" target="_blank">
|
| 171 |
+
<img alt="Download Android App" src="https://img.shields.io/badge/Try_Android_Demo-grey?style=for-the-badge&logo=android&logoColor=white">
|
| 172 |
+
</a>
|
| 173 |
+
|
| 174 |
+
## Using this repo
|
| 175 |
+
You can run this code directly on M-series MacBooks since they are ARM-based.
|
| 176 |
+
Vanilla M3 CPU-only can run Qwen3-600m-INT8 at 60-70 toks/sec, just run the following:
|
| 177 |
+
|
| 178 |
+
```bash
|
| 179 |
+
./tests/run.sh # chmod +x first time
|
| 180 |
+
```
|
| 181 |
+
|
| 182 |
+
## Generating weights from HuggingFace
|
| 183 |
+
Use any of the following model sizes (270m, 350m, 360m, 600m, 750m, 1B, 1.2B, 1.7B activated params):
|
| 184 |
+
```bash
|
| 185 |
+
# Language models
|
| 186 |
+
python3 tools/convert_hf.py google/gemma-3-270m-it weights/gemma3-270m/ --precision INT8
|
| 187 |
+
python3 tools/convert_hf.py LiquidAI/LFM2-350M weights/lfm2-350m/ --precision INT8
|
| 188 |
+
python3 tools/convert_hf.py HuggingFaceTB/SmolLM2-360m-Instruct weights/smollm2-360m/ --precision INT8
|
| 189 |
+
python3 tools/convert_hf.py Qwen/Qwen3-0.6B weights/qwen3-600m/ --precision INT8
|
| 190 |
+
python3 tools/convert_hf.py LiquidAI/LFM2-700M weights/lfm2-700m/ --precision INT8
|
| 191 |
+
python3 tools/convert_hf.py google/gemma-3-1b-it weights/gemma3-1b/ --precision INT8
|
| 192 |
+
python3 tools/convert_hf.py LiquidAI/LFM2-1.2B weights/lfm2-1.2B/ --precision INT8
|
| 193 |
+
python3 tools/convert_hf.py Qwen/Qwen3-1.7B weights/qwen3-1.7B/ --precision INT8
|
| 194 |
+
|
| 195 |
+
# Embedding models
|
| 196 |
+
python3 tools/convert_hf.py Qwen/Qwen3-Embedding-0.6B weights/qwen3-embed-600m/ --precision INT8
|
| 197 |
+
python3 tools/convert_hf.py nomic-ai/nomic-embed-text-v2-moe weights/nomic/ --precision INT8
|
| 198 |
+
```
|
| 199 |
+
|
| 200 |
+
Simply replace the weight path in `tests/test_engine.cpp` with your choice.
|
| 201 |
+
|
| 202 |
+
## Roadmap:
|
| 203 |
+
- Llama, LFM, SmolVLM, Whisper, Kitten, Neuphonic
|
| 204 |
+
- Python tools for porting any Torch/JAX to cactus
|
| 205 |
+
- GPTQ & NPU/DSP/ISP for high-end phones
|
| 206 |
+
|
| 207 |
+
## Limitations
|
| 208 |
+
While Cactus can be used on all Apple devices including MacBooks, for desktop/AMD/Intel/Nvidia hardware generally,
|
| 209 |
+
please use HuggingFace, Llama.cpp, Ollama, vLLM, MLX. They're built for those, support x86, and are all great!
|
| 210 |
+
|
| 211 |
+
## Contributing
|
| 212 |
+
|
| 213 |
+
We welcome contributions! Please see our [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
|