jakmro committed on Feb 26

Commit

af063ca

1 Parent(s): de85205

Update organization README

Files changed (1)

README.md +270 -6

@@ -1,10 +1,274 @@
 ---
-title: README
-emoji: 👁
-colorFrom: red
-colorTo: purple
 sdk: static
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: jakmro
 sdk: static
+pinned: true
 ---
+# Cactus
+<img src="assets/banner.jpg" alt="Logo" style="border-radius: 30px; width: 100%;">
+[![Docs][docs-shield]][docs-url]
+[![Website][website-shield]][website-url]
+[![GitHub][github-shield]][github-url]
+[![HuggingFace][hf-shield]][hf-url]
+[![Reddit][reddit-shield]][reddit-url]
+[![Blog][blog-shield]][blog-url]
+A hybrid, low-latency, energy-efficient AI engine for mobile devices & wearables.
+```
+┌─────────────────┐
+│  Cactus Engine  │ ←── OpenAI-compatible APIs for all major languages
+└─────────────────┘     Chat, vision, STT, RAG, tool call, cloud handoff
+         │
+┌─────────────────┐
+│  Cactus Graph   │ ←── Zero-copy computation graph (PyTorch for mobile)
+└─────────────────┘     Custom models, optimised for RAM & quantisation
+         │
+┌─────────────────┐
+│ Cactus Kernels  │ ←── ARM SIMD kernels (Apple, Snapdragon, Exynos, etc)
+└─────────────────┘     Custom attention, KV-cache quant, chunked prefill
+```
+## Quick Demo
+- Step 1: `brew install cactus-compute/cactus/cactus`
+- Step 2: `cactus transcribe` or `cactus run`
+## Cactus Engine
+```cpp
+#include "cactus.h"
+cactus_model_t model = cactus_init(
+    "path/to/weight/folder",
+    "path to txt or dir of txts for auto-rag"
+);
+const char* messages = R"([
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "My name is Henry Ndubuaku"}
+])";
+const char* options = R"({
+    "max_tokens": 50,
+    "stop_sequences": ["<|im_end|>"]
+})";
+char response[4096];
+int result = cactus_complete(
+    model,            // model handle
+    messages,         // JSON chat messages
+    response,         // response buffer
+    sizeof(response), // buffer size
+    options,          // generation options
+    nullptr,          // tools JSON
+    nullptr,          // streaming callback
+    nullptr           // user data
+);
+```
+Example response from Gemma3-270m (the `//` comments are annotations, not part of the returned JSON):
+```json
+{
+    "success": true,        // generation succeeded
+    "error": null,          // error details if failed
+    "cloud_handoff": false, // true if cloud model used
+    "response": "Hi there!",
+    "function_calls": [],   // parsed tool calls
+    "confidence": 0.8193,   // model confidence
+    "time_to_first_token_ms": 45.23,
+    "total_time_ms": 163.67,
+    "prefill_tps": 1621.89,
+    "decode_tps": 168.42,
+    "ram_usage_mb": 245.67,
+    "prefill_tokens": 28,
+    "decode_tokens": 50,
+    "total_tokens": 78
+}
+```
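The response above arrives as a JSON string in the `response` buffer. As a minimal sketch (not part of the Cactus API), one numeric metric can be read from it as shown below. The helper name `json_number` is hypothetical, and the code assumes a flat object shaped like the example; a real application would more likely use a proper JSON library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper, not provided by Cactus.
// Looks up `key` in a flat JSON object and stores its numeric value in `out`.
// Returns false if the key is missing or its value is not a number.
bool json_number(const std::string& json, const std::string& key, double& out) {
    assert(!key.empty());
    const std::string needle = "\"" + key + "\"";
    std::size_t pos = json.find(needle);
    if (pos == std::string::npos) return false;

    // Move to the colon that separates the key from its value.
    pos = json.find(':', pos + needle.size());
    if (pos == std::string::npos) return false;

    // strtod skips leading whitespace and reports where parsing stopped.
    const char* start = json.c_str() + pos + 1;
    char* end = nullptr;
    const double value = std::strtod(start, &end);
    if (end == start) return false;  // e.g. null, true, or a string value

    out = value;
    return true;
}
```

For example, after `cactus_complete` returns, `json_number(response, "decode_tps", tps)` would fill `tps` with the decode rate. The substring search is deliberately simple and would be fooled by a key name appearing inside a string value.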
+## Cactus Graph
+```cpp
+#include "cactus.h"
+CactusGraph graph;
+auto a = graph.input({2, 3}, Precision::FP16);
+auto b = graph.input({3, 4}, Precision::INT8);
+auto x1 = graph.matmul(a, b, false);
+auto x2 = graph.transpose(x1);
+auto result = graph.matmul(b, x2, true);
+float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
+float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
+graph.set_input(a, a_data, Precision::FP16);
+graph.set_input(b, b_data, Precision::INT8);
+graph.execute();
+void* output_data = graph.get_output(result);
+graph.hard_reset();
+```
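To make the shapes in the graph above concrete, here is a plain C++ reference for its first two operations, using the same input data. This is only an illustrative sketch: `Matrix`, `matmul`, `transpose`, and `example_x2` are hypothetical names and not Cactus types, everything is computed in `float` with no quantisation, and the third call is left out because the meaning of its `true` flag is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain row-major matrix used only for this illustration (not a Cactus type).
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Naive reference matmul: (m x k) * (k x n) -> (m x n).
Matrix matmul(const Matrix& lhs, const Matrix& rhs) {
    assert(lhs.cols == rhs.rows);
    Matrix out{lhs.rows, rhs.cols,
               std::vector<float>(lhs.rows * rhs.cols, 0.0f)};
    for (std::size_t i = 0; i < lhs.rows; ++i)
        for (std::size_t j = 0; j < rhs.cols; ++j)
            for (std::size_t k = 0; k < lhs.cols; ++k)
                out.at(i, j) += lhs.at(i, k) * rhs.at(k, j);
    return out;
}

// Transpose: (m x n) -> (n x m).
Matrix transpose(const Matrix& m) {
    Matrix out{m.cols, m.rows, std::vector<float>(m.rows * m.cols, 0.0f)};
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            out.at(j, i) = m.at(i, j);
    return out;
}

// Mirrors x1 = matmul(a, b) and x2 = transpose(x1) from the graph example.
Matrix example_x2() {
    Matrix a{2, 3, {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f}};
    Matrix b{3, 4, {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f,
                    7.0f, 8.0f, 9.0f, 10.0f, 11.0f, 12.0f}};
    Matrix x1 = matmul(a, b);  // 2x3 times 3x4 gives 2x4
    return transpose(x1);      // 2x4 becomes 4x2
}
```

The `{2, 3}` input multiplied by the `{3, 4}` input yields a 2x4 intermediate, and transposing it gives a 4x2 tensor.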
+## API & SDK References
+| Reference | Language | Description |
+|-----------|----------|-------------|
+| [Engine API](cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
+| [Graph API](cactus_graph.md) | C++ | Tensor operations, matrix multiplication, attention, normalization, activation functions |
+| [Python SDK](/python/) | Python | macOS, Linux |
+| [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
+| [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |
+| [Flutter SDK](/flutter/) | Dart | iOS, macOS, Android |
+| [Rust SDK](/rust/) | Rust | macOS, Linux |
+| [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |
+## Benchmarks
+- All weights INT4 quantised
+- LFM: 1k-prefill / 100-decode, values are prefill tps / decode tps
+- LFM-VL: 256px input, values are latency / decode tps
+- Parakeet: 30s audio input, values are latency / decode tps
+- Missing latency = no NPU support yet
+| Device | LFM 1.2B | LFM-VL 1.6B | Parakeet 1.1B | RAM |
+|--------|----------|------------|---------------|-----|
+| Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
+| iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
+| iPhone 17 Pro | 327/48 | 0.3s/48 | 0.3s/300k+ | 108MB |
+| iPhone 13 Mini | 148/34 | 0.3s/35 | 0.7s/90k+ | 1GB |
+| Galaxy S25 Ultra | 255/37 | -/34 | -/250k+ | 1.5GB |
+| Pixel 6a | 70/15 | -/15 | -/17k+ | 1GB |
+| Galaxy A17 5G | 32/10 | -/11 | -/40k+ | 727MB |
+| CMF Phone 2 Pro | - | - | - | - |
+| Raspberry Pi 5 | 69/11 | 13.3s/11 | 4.5s/180k+ | 869MB |
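As a rough way to read the LFM column above, the two throughput figures can be turned into an approximate wall-clock time for the stated workload. The sketch below is only back-of-the-envelope arithmetic: `estimate_seconds` is a hypothetical helper, "1k" is assumed to mean 1000 tokens, and model loading and other overheads are ignored.

```cpp
#include <cassert>

// Hypothetical helper: approximate seconds to prefill and then decode,
// given token counts and the quoted tokens-per-second rates.
double estimate_seconds(double prefill_tokens, double prefill_tps,
                        double decode_tokens, double decode_tps) {
    assert(prefill_tps > 0.0 && decode_tps > 0.0);
    return prefill_tokens / prefill_tps + decode_tokens / decode_tps;
}
```

With the Mac M4 Pro row (582/100), `estimate_seconds(1000, 582, 100, 100)` comes to roughly 2.7 s. With the Pixel 6a row (70/15), `estimate_seconds(1000, 70, 100, 15)` comes to roughly 21 s.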
+## Roadmap
+| Date | Status | Milestone |
+|------|--------|-----------|
+| Sep 2025 | Done | Released v1 |
+| Oct 2025 | Done | Chunked prefill, KVCache Quant (2x prefill) |
+| Nov 2025 | Done | Cactus Attention (10 & 1k prefill = same decode) |
+| Dec 2025 | Done | Team grows to +6 Research Engineers |
+| Jan 2026 | Done | Apple NPU/RAM, 5-11x faster iOS/Mac |
+| Feb 2026 | Done | Hybrid inference, INT4, lossless Quant (1.5x) |
+| Mar 2026 | Coming | Qualcomm/Google NPUs, 5-11x faster Android |
+| Apr 2026 | Coming | Mediatek/Exynos NPUs, Cactus@ICLR |
+| May 2026 | Coming | Kernel→C++, Graph/Engine→Rust, Mac GPU & VR |
+| Jun 2026 | Coming | Torch/JAX model transpilers |
+| Jul 2026 | Coming | Wearables optimisations, Cactus@ICML |
+| Aug 2026 | Coming | Orchestration |
+| Sep 2026 | Coming | Full Cactus paper, chip manufacturer partners |
+## Using this repo
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│                                                                              │
+│ Step 0: if on Linux (Ubuntu/Debian)                                          │
+│ sudo apt-get install python3 python3-venv python3-pip cmake                  │
+│   build-essential libcurl4-openssl-dev                                       │
+│                                                                              │
+│ Step 1: clone and setup                                                      │
+│ git clone https://github.com/cactus-compute/cactus && cd cactus              │
+│ source ./setup                                                               │
+│                                                                              │
+│ Step 2: use the commands                                                     │
+│──────────────────────────────────────────────────────────────────────────────│
+│                                                                              │
+│  cactus auth                         manage Cloud API key                    │
+│    --status                          show key status                         │
+│    --clear                           remove saved key                        │
+│                                                                              │
+│  cactus run <model>                  opens playground (auto downloads)       │
+│    --precision INT4|INT8|FP16        quantization (default: INT4)            │
+│    --token <token>                   HF token (gated models)                 │
+│    --reconvert                       force reconversion from source          │
+│                                                                              │
+│  cactus transcribe [model]           live mic transcription (parakeet-1.1b)  │
+│    --file <audio.wav>                transcribe file instead of mic          │
+│    --precision INT4|INT8|FP16        quantization (default: INT4)            │
+│    --token <token>                   HF token (gated models)                 │
+│    --reconvert                       force reconversion from source          │
+│                                                                              │
+│  cactus download <model>             downloads model to ./weights            │
+│    --precision INT4|INT8|FP16        quantization (default: INT4)            │
+│    --token <token>                   HuggingFace API token                   │
+│    --reconvert                       force reconversion from source          │
+│                                                                              │
+│  cactus convert <model> [dir]        convert model, supports LoRA merge      │
+│    --precision INT4|INT8|FP16        quantization (default: INT4)            │
+│    --lora <path>                     LoRA adapter to merge                   │
+│    --token <token>                   HuggingFace API token                   │
+│                                                                              │
+│  cactus build                        build for ARM → build/libcactus.a       │
+│    --apple                           Apple (iOS/macOS)                       │
+│    --android                         Android                                 │
+│    --flutter                         Flutter (all platforms)                 │
+│    --python                          shared lib for Python FFI               │
+│                                                                              │
+│  cactus test                         run unit tests and benchmarks           │
+│    --model <model>                   default: LFM2-VL-450M                   │
+│    --transcribe_model <model>        default: moonshine-base                 │
+│    --benchmark                       use larger models                       │
+│    --precision INT4|INT8|FP16        regenerate weights with precision       │
+│    --reconvert                       force reconversion from source          │
+│    --no-rebuild                      skip building library                   │
+│    --only <test>                     specific test (llm, vlm, stt, etc)      │
+│    --ios                             run on connected iPhone                 │
+│    --android                         run on connected Android                │
+│                                                                              │
+│  cactus clean                        remove all build artifacts              │
+│  cactus --help                       show all commands and flags             │
+│                                                                              │
+└──────────────────────────────────────────────────────────────────────────────┘
+```
+## Maintaining Organisations
+1. [Cactus Compute, Inc. (YC S25)](https://cactuscompute.com/)
+2. [UCLA's BruinAI](https://bruinai.org/)
+3. [Char (YC S25)](https://char.com/)
+4. [Yale's AI Society](https://www.yale-ai.org/team)
+5. [National University of Singapore's AI Society](https://www.nusaisociety.org/)
+6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
+7. [Imperial College's AI Society](https://www.imperialcollegeunion.org/csp/1391)
+8. [University of Pennsylvania's AI@Penn](https://ai-at-penn-main-105.vercel.app/)
+9. [University of Michigan Ann Arbor's MSAIL](https://msail.github.io/)
+10. [University of Colorado Boulder's AI Club](https://www.cuaiclub.org/)
+## Citation
+If you use Cactus in your research, please cite it as follows:
+```bibtex
+@software{cactus,
+  title        = {Cactus: AI Inference Engine for Phones & Wearables},
+  author       = {Ndubuaku, Henry and Cactus Team},
+  url          = {https://github.com/cactus-compute/cactus},
+  year         = {2025}
+}
+```
+**N.B.:** Scroll to the top and click the shield links for resources!
+[docs-shield]: https://img.shields.io/badge/Docs-555?style=for-the-badge&logo=readthedocs&logoColor=white
+[docs-url]: https://cactus-compute.github.io/cactus/
+[website-shield]: https://img.shields.io/badge/Website-555?style=for-the-badge&logo=safari&logoColor=white
+[website-url]: https://cactuscompute.com/
+[github-shield]: https://img.shields.io/badge/GitHub-555?style=for-the-badge&logo=github&logoColor=white
+[github-url]: https://github.com/cactus-compute/cactus
+[hf-shield]: https://img.shields.io/badge/HuggingFace-555?style=for-the-badge&logo=huggingface&logoColor=white
+[hf-url]: https://huggingface.co/Cactus-Compute
+[reddit-shield]: https://img.shields.io/badge/Reddit-555?style=for-the-badge&logo=reddit&logoColor=white
+[reddit-url]: https://www.reddit.com/r/cactuscompute/
+[blog-shield]: https://img.shields.io/badge/Blog-555?style=for-the-badge&logo=hashnode&logoColor=white
+[blog-url]: https://cactus-compute.github.io/cactus/blog/README/