---
language: en
license: apache-2.0
library_name: mlx
pipeline_tag: text-generation
tags:
- mlx
- apple-silicon
- tool-calling
- orchestrator
- local-ai
- personal-ai-os
base_model: Qwen/Qwen3-4B
---

# Zora 4B

The orchestrator brain for [Zora](https://github.com/Azkabanned/zora) — a private, local-first personal AI OS that runs on Apple Silicon.

## What is this?

Zora 4B is a fine-tuned [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) model, quantised to 4-bit for efficient inference on Apple Silicon via [MLX](https://github.com/ml-explore/mlx). It serves as Zora's primary reasoning brain — handling tool calling, task routing, structured reflection, and conversational interaction.

This is **not** a general-purpose chat model. It is trained specifically for orchestrator behaviour: deciding which tools to call, routing tasks across local and remote compute, producing structured JSON for autonomous cognition, and managing multi-step goals.

## Key capabilities

- **Tool calling** — 39+ tools with structured `<tool_call>` output format
- **Task routing** — classifies prompts as direct responses, queued goals, or work to delegate to 70B worker nodes
- **Structured reflection** — produces the complete COG-X JSON schema for autonomous cognition loops
- **Task delegation** — routes complex build/code/refactor tasks to worker nodes running larger models
- **Multi-turn reasoning** — maintains context across tool call chains (up to 8 rounds)
- **Thinking mode** — optional `<think>` blocks for chain-of-thought reasoning
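
For illustration, a single tool call in the output uses Qwen-style `<tool_call>` tags wrapping a JSON object. The `cluster_status` tool and its arguments below are hypothetical, not part of Zora's actual tool set:

```
<tool_call>
{"name": "cluster_status", "arguments": {"node": "all"}}
</tool_call>
```

The orchestrator parses the JSON between the tags, executes the named tool, and feeds the result back for the next round.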

## Hardware requirements

| Config | RAM | Performance |
|--------|-----|-------------|
| Mac Mini M4 24GB | 24GB | ~90 tok/s with TurboQuant KV cache |
| MacBook Pro M5 Max 128GB | 128GB | ~110 tok/s with speculative decoding |
| MacBook Air M3 16GB | 16GB | ~35 tok/s |
| Any Apple Silicon | 8GB+ | Will run, but may be slow |

The entire stack — model, KV cache, and OS — runs in **7GB RAM** on a 24GB Mac Mini.

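The 7GB figure can be sanity-checked with quick arithmetic: at 4.5 bits per weight (the quantisation reported in the Training section), the 4B parameters alone account for roughly 2.25 GB, leaving headroom for the KV cache and runtime. A back-of-envelope sketch, not a measurement:

```python
# Back-of-envelope weight-memory estimate: 4B params at 4.5 bits/weight.
params = 4e9
bits_per_weight = 4.5

weight_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"weights: ~{weight_gb:.2f} GB")  # weights: ~2.25 GB
```
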
## Usage

### With MLX

```python
from mlx_lm import load, generate

model, tokenizer = load("project-zora/zora-4b")
response = generate(model, tokenizer, prompt="What's running on my cluster?", max_tokens=512)
```

### As an Anthropic-compatible API

```bash
# Zora exposes the same API format as Anthropic on port 4001
export ANTHROPIC_BASE_URL=http://localhost:4001
export ANTHROPIC_API_KEY=local
claude  # now running on your Metal GPU
```

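Any HTTP client can hit the same endpoint directly. This sketch assumes the server implements the standard Anthropic Messages API shape (`POST /v1/messages`); the `zora-4b` model name in the payload is illustrative:

```python
import json
import urllib.request

# Build a standard Anthropic Messages API request against the local server.
# Nothing is sent until urlopen() is actually called.
BASE_URL = "http://localhost:4001"

def messages_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    body = json.dumps({
        "model": "zora-4b",  # illustrative model name
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=body,
        headers={
            "x-api-key": "local",
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )

req = messages_request("What's running on my cluster?")
print(req.full_url)  # http://localhost:4001/v1/messages
# response = urllib.request.urlopen(req)  # requires the Zora server to be running
```
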
### With Zora orchestrator

This model is downloaded automatically when you run `./install.sh` in the [Zora repository](https://github.com/Azkabanned/zora). The orchestrator loads it at startup and uses it for all local reasoning.

## Training

| Round | Focus | Examples |
|-------|-------|----------|
| R1-R3 | Core tool calling, multi-step chains | 600+ |
| R4-R5 | Edge cases, delegation rules | 200+ |
| R6 | All features (Team Zora, Enhanced Memory, Presence) | 200+ |
| R7 | Structured JSON reflection (COG-X schema) | 37 |
| R8 | Delegation routing (complex build tasks → worker) | 40 |
| **Total** | | **1,107 examples** |

- **Base model:** Qwen3-4B
- **Method:** LoRA SFT (16 layers, lr=1e-4, 2500 iterations)
- **Final val loss:** 0.017
- **Quantisation:** 4-bit (4.5 bits per weight) via MLX
- **Hardware:** MacBook Pro M5 Max 128GB
- **Test result:** 8/10 tool calling accuracy
- **No personal data** — all examples are synthetic
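
A run with these hyperparameters maps onto the `mlx_lm.lora` CLI roughly as follows. This is a hedged sketch, not the project's actual training script: flag names vary across `mlx_lm` versions (`--num-layers` was previously `--lora-layers`), and `./data` stands in for the synthetic example set.

```bash
mlx_lm.lora \
  --model Qwen/Qwen3-4B \
  --train \
  --data ./data \
  --num-layers 16 \
  --learning-rate 1e-4 \
  --iters 2500
```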

## Architecture

```
Qwen3ForCausalLM
├── 36 layers, 2560 hidden size
├── 32 attention heads, 8 KV heads (GQA)
├── 9728 intermediate size (SiLU)
├── RoPE (theta=1M, max 40960 positions)
├── 4-bit quantisation (4.5 bits/weight)
└── TurboQuant PolarQuant KV cache compatible
```

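As a cross-check, these dimensions imply the ~4B parameter count. The sketch below assumes Qwen3-4B's decoupled head dimension of 128 and vocabulary of 151,936 — both from the upstream config, as neither is listed in this card — and tied input/output embeddings:

```python
# Approximate parameter count from the listed dimensions.
# head_dim=128 and vocab=151_936 are taken from the Qwen3-4B config
# (assumptions here; the card above does not state them).
layers, hidden, heads, kv_heads = 36, 2560, 32, 8
inter, head_dim, vocab = 9728, 128, 151_936

attn = hidden * heads * head_dim             # q projection
attn += 2 * hidden * kv_heads * head_dim     # k and v (GQA: only 8 KV heads)
attn += heads * head_dim * hidden            # o projection
mlp = 3 * hidden * inter                     # gate, up, down (SiLU gating)
embed = vocab * hidden                       # embeddings (tied output head)

total = layers * (attn + mlp) + embed
print(f"~{total / 1e9:.2f}B parameters")  # ~4.02B parameters
```
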
## What makes Zora different

Zora is a personal AI OS — not a chatbot. This brain model is one part of a larger system:

- **Real-time nervous system** — events from every channel flow through one universal event bus
- **Autonomous operator** — a follow-through engine that owns work: triages, drafts replies, sends follow-ups, and tracks commitments
- **Self-improving** — a LoRA training pipeline runs on your hardware, improving the model from your usage patterns
- **Privacy by architecture** — all inference is on-device; data never leaves your machine

## Limitations

- Trained for Zora's orchestrator context — may underperform on general chat benchmarks
- English only
- Best results with the Zora tool/system prompt format
- Structured JSON output may truncate on very large contexts (>10K chars) — the orchestrator has repair logic for this
- Not suitable for tasks requiring >40K context (delegate to a larger model on a worker node instead)

## License

Apache 2.0

## Links

- [Zora on GitHub](https://github.com/Azkabanned/zora)
- [Qwen3-4B (base model)](https://huggingface.co/Qwen/Qwen3-4B)
- [MLX](https://github.com/ml-explore/mlx)