---
title: jakmro
sdk: static
pinned: true
---

# Cactus

<img src="assets/banner.jpg" alt="Logo" style="border-radius: 30px; width: 100%;">

[![Docs][docs-shield]][docs-url]
[![Website][website-shield]][website-url]
[![GitHub][github-shield]][github-url]
[![HuggingFace][hf-shield]][hf-url]
[![Reddit][reddit-shield]][reddit-url]
[![Blog][blog-shield]][blog-url]

A hybrid, low-latency, energy-efficient AI engine for mobile devices & wearables.

```
┌──────────────────┐
│  Cactus Engine   │ ◄── OpenAI-compatible APIs for all major languages
└──────────────────┘     Chat, vision, STT, RAG, tool call, cloud handoff
         │
┌──────────────────┐
│  Cactus Graph    │ ◄── Zero-copy computation graph (PyTorch for mobile)
└──────────────────┘     Custom models, optimised for RAM & quantisation
         │
┌──────────────────┐
│ Cactus Kernels   │ ◄── ARM SIMD kernels (Apple, Snapdragon, Exynos, etc.)
└──────────────────┘     Custom attention, KV-cache quant, chunked prefill
```

## Quick Demo

- Step 1: `brew install cactus-compute/cactus/cactus`
- Step 2: `cactus transcribe` or `cactus run`

## Cactus Engine

```cpp
#include "cactus.h"

cactus_model_t model = cactus_init(
    "path/to/weight/folder",
    "path to txt or dir of txts for auto-rag"
);

const char* messages = R"([
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "My name is Henry Ndubuaku"}
])";

const char* options = R"({
  "max_tokens": 50,
  "stop_sequences": ["<|im_end|>"]
})";

char response[4096];
int result = cactus_complete(
    model,             // model handle
    messages,          // JSON chat messages
    response,          // response buffer
    sizeof(response),  // buffer size
    options,           // generation options
    nullptr,           // tools JSON
    nullptr,           // streaming callback
    nullptr            // user data
);
```
Example response from Gemma3-270m:
```json
{
  "success": true,              // generation succeeded
  "error": null,                // error details if failed
  "cloud_handoff": false,       // true if cloud model used
  "response": "Hi there!",
  "function_calls": [],         // parsed tool calls
  "confidence": 0.8193,         // model confidence
  "time_to_first_token_ms": 45.23,
  "total_time_ms": 163.67,
  "prefill_tps": 1621.89,
  "decode_tps": 168.42,
  "ram_usage_mb": 245.67,
  "prefill_tokens": 28,
  "decode_tokens": 50,
  "total_tokens": 78
}
```
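Note the example above uses `//` comments for readability; strict JSON forbids them, so the actual buffer contains the fields without annotations. A minimal sketch of how a caller might inspect the result (field names taken from the example above, comments stripped):

```python
import json

# The example payload above, without the explanatory comments.
raw = """{
  "success": true,
  "error": null,
  "cloud_handoff": false,
  "response": "Hi there!",
  "function_calls": [],
  "confidence": 0.8193,
  "time_to_first_token_ms": 45.23,
  "total_time_ms": 163.67,
  "prefill_tps": 1621.89,
  "decode_tps": 168.42,
  "ram_usage_mb": 245.67,
  "prefill_tokens": 28,
  "decode_tokens": 50,
  "total_tokens": 78
}"""

result = json.loads(raw)
if result["success"]:
    print(result["response"])        # the generated text
    for call in result["function_calls"]:
        print("tool call:", call)    # parsed tool calls, if any
else:
    print("generation failed:", result["error"])
```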

## Cactus Graph

```cpp
#include "cactus.h"

CactusGraph graph;
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);

graph.execute();
void* output_data = graph.get_output(result);

graph.hard_reset();
```
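For intuition about the shapes flowing through the graph above, the same chain can be reproduced with plain nested loops. This is a hypothetical reference sketch only: it ignores precision handling, and the boolean arguments to `graph.matmul` (engine-specific, e.g. pre-transposition) are not modelled.

```python
def matmul(A, B):
    """Naive (m x k) . (k x n) matrix product on lists of lists."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def transpose(M):
    """Swap rows and columns."""
    return [list(row) for row in zip(*M)]

a = [[1.1, 2.3, 3.4], [4.2, 5.7, 6.8]]             # shape (2, 3)
b = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # shape (3, 4)

x1 = matmul(a, b)       # (2, 3) . (3, 4) -> (2, 4)
x2 = transpose(x1)      # (4, 2)
result = matmul(b, x2)  # (3, 4) . (4, 2) -> (3, 2)
print(result)
```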

## API & SDK References

| Reference | Language | Platforms / Description |
|-----------|----------|-------------------------|
| [Engine API](cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
| [Graph API](cactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |
| [Python SDK](/python/) | Python | Mac, Linux |
| [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
| [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |
| [Flutter SDK](/flutter/) | Dart | iOS, macOS, Android |
| [Rust SDK](/rust/) | Rust | Mac, Linux |
| [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |

## Benchmarks

- All weights are INT4-quantised
- LFM: 1k-token prefill / 100-token decode; values are prefill tps / decode tps
- LFM-VL: 256px image input; values are latency / decode tps
- Parakeet: 30s audio input; values are latency / decode tps
- Missing latency = no NPU support yet

| Device | LFM 1.2B | LFM-VL 1.6B | Parakeet 1.1B | RAM |
|--------|----------|-------------|---------------|-----|
| Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
| iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
| iPhone 17 Pro | 327/48 | 0.3s/48 | 0.3s/300k+ | 108MB |
| iPhone 13 Mini | 148/34 | 0.3s/35 | 0.7s/90k+ | 1GB |
| Galaxy S25 Ultra | 255/37 | -/34 | -/250k+ | 1.5GB |
| Pixel 6a | 70/15 | -/15 | -/17k+ | 1GB |
| Galaxy A17 5G | 32/10 | -/11 | -/40k+ | 727MB |
| CMF Phone 2 Pro | - | - | - | - |
| Raspberry Pi 5 | 69/11 | 13.3s/11 | 4.5s/180k+ | 869MB |
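To read the LFM column as wall-clock time: with prefill and decode throughputs in tokens/s, the end-to-end time for the 1k-prefill / 100-decode workload is approximately `prefill_tokens / prefill_tps + decode_tokens / decode_tps`. A rough sketch using the iPhone 17 Pro row above (this estimate ignores time-to-first-token overheads):

```python
def lfm_latency_s(prefill_tps, decode_tps,
                  prefill_tokens=1000, decode_tokens=100):
    """Approximate end-to-end seconds for a prefill + decode workload."""
    return prefill_tokens / prefill_tps + decode_tokens / decode_tps

# iPhone 17 Pro row above: 327 prefill tps, 48 decode tps
print(round(lfm_latency_s(327, 48), 2))  # roughly 5.1 s end to end
```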

## Roadmap

| Date | Status | Milestone |
|------|--------|-----------|
| Sep 2025 | Done | Released v1 |
| Oct 2025 | Done | Chunked prefill, KV-cache quant (2x prefill) |
| Nov 2025 | Done | Cactus Attention (10 & 1k prefill = same decode) |
| Dec 2025 | Done | Team grows to 6+ Research Engineers |
| Jan 2026 | Done | Apple NPU/RAM, 5-11x faster iOS/Mac |
| Feb 2026 | Done | Hybrid inference, INT4, lossless quant (1.5x) |
| Mar 2026 | Coming | Qualcomm/Google NPUs, 5-11x faster Android |
| Apr 2026 | Coming | Mediatek/Exynos NPUs, Cactus@ICLR |
| May 2026 | Coming | Kernel→C++, Graph/Engine→Rust, Mac GPU & VR |
| Jun 2026 | Coming | Torch/JAX model transpilers |
| Jul 2026 | Coming | Wearables optimisations, Cactus@ICML |
| Aug 2026 | Coming | Orchestration |
| Sep 2026 | Coming | Full Cactus paper, chip manufacturer partners |
|
| | ## Using this repo |
| |
|
| | ``` |
| | ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| | β β |
| | β Step 0: if on Linux (Ubuntu/Debian) β |
| | β sudo apt-get install python3 python3-venv python3-pip cmake β |
| | β build-essential libcurl4-openssl-dev β |
| | β β |
| | β Step 1: clone and setup β |
| | β git clone https://github.com/cactus-compute/cactus && cd cactus β |
| | β source ./setup β |
| | β β |
| | β Step 2: use the commands β |
| | ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| | β β |
| | β cactus auth manage Cloud API key β |
| | β --status show key status β |
| | β --clear remove saved key β |
| | β β |
| | β cactus run <model> opens playground (auto downloads) β |
| | β --precision INT4|INT8|FP16 quantization (default: INT4) β |
| | β --token <token> HF token (gated models) β |
| | β --reconvert force reconversion from source β |
| | β β |
| | β cactus transcribe [model] live mic transcription (parakeet-1.1b) β |
| | β --file <audio.wav> transcribe file instead of mic β |
| | β --precision INT4|INT8|FP16 quantization (default: INT4) β |
| | β --token <token> HF token (gated models) β |
| | β --reconvert force reconversion from source β |
| | β β |
| | β cactus download <model> downloads model to ./weights β |
| | β --precision INT4|INT8|FP16 quantization (default: INT4) β |
| | β --token <token> HuggingFace API token β |
| | β --reconvert force reconversion from source β |
| | β β |
| | β cactus convert <model> [dir] convert model, supports LoRA merge β |
| | β --precision INT4|INT8|FP16 quantization (default: INT4) β |
| | β --lora <path> LoRA adapter to merge β |
| | β --token <token> HuggingFace API token β |
| | β β |
| | β cactus build build for ARM β build/libcactus.a β |
| | β --apple Apple (iOS/macOS) β |
| | β --android Android β |
| | β --flutter Flutter (all platforms) β |
| | β --python shared lib for Python FFI β |
| | β β |
| | β cactus test run unit tests and benchmarks β |
| | β --model <model> default: LFM2-VL-450M β |
| | β --transcribe_model <model> default: moonshine-base β |
| | β --benchmark use larger models β |
| | β --precision INT4|INT8|FP16 regenerate weights with precision β |
| | β --reconvert force reconversion from source β |
| | β --no-rebuild skip building library β |
| | β --only <test> specific test (llm, vlm, stt, etc) β |
| | β --ios run on connected iPhone β |
| | β --android run on connected Android β |
| | β β |
| | β cactus clean remove all build artifacts β |
| | β cactus --help show all commands and flags β |
| | β β |
| | ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| | ``` |
## Maintaining Organisations

1. [Cactus Compute, Inc. (YC S25)](https://cactuscompute.com/)
2. [UCLA's BruinAI](https://bruinai.org/)
3. [Char (YC S25)](https://char.com/)
4. [Yale's AI Society](https://www.yale-ai.org/team)
5. [National University of Singapore's AI Society](https://www.nusaisociety.org/)
6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
7. [Imperial College's AI Society](https://www.imperialcollegeunion.org/csp/1391)
8. [University of Pennsylvania's AI@Penn](https://ai-at-penn-main-105.vercel.app/)
9. [University of Michigan Ann Arbor's MSAIL](https://msail.github.io/)
10. [University of Colorado Boulder's AI Club](https://www.cuaiclub.org/)
|
| | ## Citation |
| |
|
| | If you use Cactus in your research, please cite it as follows: |
| |
|
| | ```bibtex |
| | @software{cactus, |
| | title = {Cactus: AI Inference Engine for Phones & Wearables}, |
| | author = {Ndubuaku, Henry and Cactus Team}, |
| | url = {https://github.com/cactus-compute/cactus}, |
| | year = {2025} |
| | } |
| | ``` |
| |
|
| | **N/B:** Scroll all the way up and click the shields link for resources! |
| |
|
| | [docs-shield]: https://img.shields.io/badge/Docs-555?style=for-the-badge&logo=readthedocs&logoColor=white |
| | [docs-url]: https://cactus-compute.github.io/cactus/ |
| |
|
| | [website-shield]: https://img.shields.io/badge/Website-555?style=for-the-badge&logo=safari&logoColor=white |
| | [website-url]: https://cactuscompute.com/ |
| |
|
| | [github-shield]: https://img.shields.io/badge/GitHub-555?style=for-the-badge&logo=github&logoColor=white |
| | [github-url]: https://github.com/cactus-compute/cactus |
| |
|
| | [hf-shield]: https://img.shields.io/badge/HuggingFace-555?style=for-the-badge&logo=huggingface&logoColor=white |
| | [hf-url]: https://huggingface.co/Cactus-Compute |
| |
|
| | [reddit-shield]: https://img.shields.io/badge/Reddit-555?style=for-the-badge&logo=reddit&logoColor=white |
| | [reddit-url]: https://www.reddit.com/r/cactuscompute/ |
| |
|
| | [blog-shield]: https://img.shields.io/badge/Blog-555?style=for-the-badge&logo=hashnode&logoColor=white |
| | [blog-url]: https://cactus-compute.github.io/cactus/blog/README/ |