Spaces:
Configuration error
Configuration error
| Energy-efficient kernels & inference engine for phones. | |
| ## Why Cactus? | |
| - Phones run on battery, GPUs drain energy and heat the devices. | |
| - 70% of phones today don't ship NPUs which most frameworks optimse for. | |
| - Cactus is optimsed for old and new ARM-CPU first, with NPU/DSP/ISP coming. | |
| - Fast on all phones with less battery drain and heating. | |
| ## Performance (CPU only) | |
| - Speed for various sizes can be estimated proportionally | |
| - INT4 wiil give 30% gains when merged | |
| - GPUs yield gains but drain battery, will be passed on for NPUs | |
| | Device | Qwen3-INT8-600m (toks/sec) | | |
| |:------------------------------|:------------------------:| | |
| | iPhone 17 Pro | 74 | | |
| | Galaxy S25 Ultra / 16 Pro | 58 | | |
| | iPhone 16 / Galaxy S25 / Nothing 3 | 52 | | |
| | iPhone 15 Pro | 48 | | |
| | iPhone 14 Pro / OnePlus 13 5G | 47 | | |
| | Galaxy S24 Ultra / iPhone 15 | 42 | | |
| | OnePlus Open / Galaxy S23 | 41 | | |
| | iPhone 13 Pro / OnePlus 12 | 38 | | |
| | iPhone 13 mini / Redmi K70 Ultra / Xiaomi 13 / OnePlus 11 | 27 | | |
| | Pixel 6a / Nothing 3a / iPhone X / Galaxy S21 | 16 | | |
| ## File Size Comparison | |
| | Format | Size (Qwen3-0.6B-INT8) | | |
| |--------|------------------------| | |
| | Cactus | 370-420 MB | | |
| | ONNX/TFLite/MLX | 600 MB | | |
| | GGUF | 800 MB | | |
| | Executorch | 944 MB | | |
| ## Battery drain | |
| - Newer devices have bigger battery | |
| - NPUs are designed for less drain (2-10x) | |
| - Apple Intelligence drain 0.6 percent/min on iPhone 16 Pro Max | |
| | Device | Qwen3-INT8-600m (percent/min) | | |
| |:------------------------------|:------------------------:| | |
| | OnePlus 13 5G | 0.33 | | |
| | Redmi K70 Ultra / OnePlus 12 | 0.41 | | |
| | Galaxy S25 Ultra / Iphone 17 Pro / Nothing 3 | 0.44 | | |
| | Galaxy S24 Ultra / Nothing 3a / Pixel 6a | 0.48 | | |
| | iPhone 16 Pro Max / Xiaomi 13 | 0.50 | | |
| ## Design | |
| ``` | |
| βββββββββββββββββββ | |
| β Cactus FFI β βββ OpenAI compatible C API for integration | |
| βββββββββββββββββββ | |
| β | |
| βββββββββββββββββββ | |
| β Cactus Engine β βββ High-level transformer engine | |
| βββββββββββββββββββ | |
| β | |
| βββββββββββββββββββ | |
| β Cactus Graph β βββ Unified zero-copy computation graph | |
| βββββββββββββββββββ | |
| β | |
| βββββββββββββββββββ | |
| β Cactus Kernels β βββ Low-level ARM-specific SIMD operations | |
| βββββββββββββββββββ | |
| ``` | |
| ## Cactus Graph & Kernels | |
| Cactus Graph is a general numerical computing framework that runs on Cactus Kernels. | |
| Great for implementing custom models and scientific computing, like JAX for phones. | |
| ```cpp | |
| #include cactus.h | |
| CactusGraph graph; | |
| auto a = graph.input({2, 3}, Precision::FP16); | |
| auto b = graph.input({3, 4}, Precision::INT8); | |
| auto x1 = graph.matmul(a, b, false); | |
| auto x2 = graph.transpose(x1); | |
| auto result = graph.matmul(b, x2, true); | |
| float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f}; | |
| float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}; | |
| graph.set_input(a, a_data, Precision::FP16); | |
| graph.set_input(b, b_data, Precision::INT8); | |
| graph.execute(); | |
| void* output_data = graph.get_output(result); | |
| graph.hard_reset(); | |
| ``` | |
| ## Cactus Engine & APIs | |
| Cactus Engine is a transformer inference engine built on top of Cactus Graphs. | |
| It is abstracted via Cactus Foreign Function Interface APIs. | |
| Header files are self-documenting but documentation contributions are welcome. | |
| ```cpp | |
| #include cactus.h | |
| const char* model_path = "path/to/weight/folder"; | |
| cactus_model_t model = cactus_init(model_path, 2048); | |
| const char* messages = R"([ | |
| {"role": "system", "content": "You are a helpful assistant."}, | |
| {"role": "user", "content": "/nothink My name is Henry Ndubuaku"} | |
| ])"; | |
| const char* options = R"({ | |
| "max_tokens": 50, | |
| "stop_sequences": ["<|im_end|>"] | |
| })"; | |
| char response[1024]; | |
| int result = cactus_complete(model, messages, response, sizeof(response), options, nullptr, nullptr, nullptr); | |
| ``` | |
| With tool support: | |
| ```cpp | |
| const char* tools = R"([ | |
| { | |
| "function": { | |
| "name": "get_weather", | |
| "description": "Get weather for a location", | |
| "parameters": { | |
| "properties": { | |
| "location": { | |
| "type": "string", | |
| "description": "City name", | |
| "required": true | |
| } | |
| }, | |
| "required": ["location"] | |
| } | |
| } | |
| } | |
| ])"; | |
| int result = cactus_complete(model, messages, response, sizeof(response), options, tools, nullptr, nullptr); | |
| ``` | |
| ## Using Cactus in your apps | |
| Cactus SDKs run 500k+ weekly inference tasks in production today, try them! | |
| <a href="https://github.com/cactus-compute/cactus-flutter" target="_blank"> | |
| <img alt="Flutter" src="https://img.shields.io/badge/Flutter-grey.svg?style=for-the-badge&logo=Flutter&logoColor=white"> | |
| </a> <a href="https://github.com/cactus-compute/cactus-react" target="_blank"> | |
| <img alt="React Native" src="https://img.shields.io/badge/React%20Native-grey.svg?style=for-the-badge&logo=react&logoColor=%2361DAFB"> | |
| </a> <a href="https://github.com/cactus-compute/cactus-kotlin" target="_blank"> | |
| <img alt="Kotlin" src="https://img.shields.io/badge/Kotlin_MP-grey.svg?style=for-the-badge&logo=kotlin&logoColor=white"> | |
| </a> | |
| ## Getting started | |
| <a href="https://cactuscompute.com/docs" target="_blank"> | |
| <img alt="Documentation" src="https://img.shields.io/badge/Documentation-4A90E2?style=for-the-badge&logo=gitbook&logoColor=white"> | |
| </a> <a href="https://discord.gg/bNurx3AXTJ" target="_blank"> | |
| <img alt="Discord" src="https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white"> | |
| </a> | |
| ## Demo | |
| <a href="https://apps.apple.com/gb/app/cactus-chat/id6744444212" target="_blank"> | |
| <img alt="Download iOS App" src="https://img.shields.io/badge/Try_iOS_Demo-grey?style=for-the-badge&logo=apple&logoColor=white"> | |
| </a> <a href="https://play.google.com/store/apps/details?id=com.rshemetsubuser.myapp&pcampaignid=web_share" target="_blank"> | |
| <img alt="Download Android App" src="https://img.shields.io/badge/Try_Android_Demo-grey?style=for-the-badge&logo=android&logoColor=white"> | |
| </a> | |
| ## Using this repo | |
| You can run these codes directly on M-series Macbooks since they are ARM-based. | |
| Vanilla M3 CPU-only can run Qwen3-600m-INT8 at 60-70 toks/sec, just run the following: | |
| ```bash | |
| ./tests/run.sh # chmod +x first time | |
| ``` | |
| ## Generating weights from HuggingFace | |
| Use any of the (270m, 350m, 360m, 600m, 750m, 1B, 1.2B, 1.7B activated params): | |
| ```bash | |
| # Language models | |
| python3 tools/convert_hf.py google/gemma-3-270m-it weights/gemma3-270m/ --precision INT8 | |
| python3 tools/convert_hf.py LiquidAI/LFM2-350M weights/lfm2-350m/ --precision INT8 | |
| python3 tools/convert_hf.py HuggingFaceTB/SmolLM2-360m-Instruct weights/smollm2-360m/ --precision INT8 | |
| python3 tools/convert_hf.py Qwen/Qwen3-0.6B weights/qwen3-600m/ --precision INT8 | |
| python3 tools/convert_hf.py LiquidAI/LFM2-700M weights/lfm2-700m/ --precision INT8 | |
| python3 tools/convert_hf.py google/gemma-3-1b-it weights/gemma3-1b/ --precision INT8 | |
| python3 tools/convert_hf.py LiquidAI/LFM2-1.2B weights/lfm2-1.2B/ --precision INT8 | |
| python3 tools/convert_hf.py Qwen/Qwen3-1.7B weights/qwen3-1.7B/ --precision INT8 | |
| # Embedding models | |
| python3 tools/convert_hf.py Qwen/Qwen3-Embedding-0.6B weights/qwen3-embed-600m/ --precision INT8 | |
| python3 tools/convert_hf.py nomic-ai/nomic-embed-text-v2-moe weights/nomic/ --precision INT8 | |
| ``` | |
| Simply replace the weight path `tests/test_engine.cpp` with your choice. | |
| ## Limitations | |
| While Cactus can be used for all Apple devices including Macbooks, for computers/AMD/Intel/Nvidia generally, | |
| please use HuggingFace, Llama.cpp, Ollama, vLLM, MLX. They're built for those, support x86, and are all |