Zora 4B
The orchestrator brain for Zora — a private, local-first personal AI OS that runs on Apple Silicon.
What is this?
Zora 4B is a fine-tuned Qwen3-4B model, quantised to 4-bit for efficient inference on Apple Silicon via MLX. It serves as Zora's primary reasoning brain — handling tool calling, task routing, structured reflection, and conversational interaction.
This is not a general-purpose chat model. It is specifically trained for orchestrator behaviour: deciding which tools to call, how to route tasks across local and remote compute, producing structured JSON for autonomous cognition, and managing multi-step goals.
Key capabilities
- Tool calling — 39+ tools with structured `<tool_call>` output format
- Task routing — classifies prompts into direct response, queued goal, or delegated work to 70B worker nodes
- Structured reflection — produces complete COG-X JSON schema for autonomous cognition loops
- Task delegation — routes complex build/code/refactor tasks to worker nodes with bigger models
- Multi-turn reasoning — maintains context across tool call chains (up to 8 rounds)
- Thinking mode — optional `<think>` blocks for chain-of-thought reasoning
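As a rough sketch of how an orchestrator host might consume these outputs: the snippet below splits a completion into its thinking, tool-call, and reply parts. It assumes Qwen-style delimiters, i.e. optional `<think>…</think>` blocks and `<tool_call>…</tool_call>` blocks each containing one JSON object; the exact wire format used by Zora may differ.

```python
import json
import re

def parse_model_output(text: str) -> dict:
    """Split a completion into thinking blocks, tool calls, and plain reply.

    Assumes Qwen-style delimiters: <think>...</think> for chain-of-thought
    and <tool_call>{...}</tool_call> holding one JSON object per call.
    """
    thinking = re.findall(r"<think>(.*?)</think>", text, re.DOTALL)
    tool_calls = [
        json.loads(m)
        for m in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    ]
    # Whatever remains after stripping the tagged blocks is the reply text.
    reply = re.sub(
        r"<think>.*?</think>|<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL
    ).strip()
    return {"thinking": thinking, "tool_calls": tool_calls, "reply": reply}

# Hypothetical completion for illustration only.
sample = (
    "<think>The user wants cluster status.</think>"
    '<tool_call>{"name": "cluster_status", "arguments": {}}</tool_call>'
    "Checking your cluster now."
)
parsed = parse_model_output(sample)
```

In a multi-turn loop, the host would execute each entry in `parsed["tool_calls"]`, append the results to the conversation, and re-prompt the model for up to the 8 rounds noted above.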
Hardware requirements
| Config | RAM | Performance |
|---|---|---|
| Mac Mini M4 24GB | 24GB | ~90 tok/s with TurboQuant KV cache |
| MacBook Pro M5 Max 128GB | 128GB | ~110 tok/s with speculative decoding |
| MacBook Air M3 16GB | 16GB | ~35 tok/s |
| Any Apple Silicon | 8GB+ | Will run, but may be slow |
The entire stack — model, KV cache, and OS — runs in 7GB RAM on a 24GB Mac Mini.
Usage
With MLX
```python
from mlx_lm import load, generate

model, tokenizer = load("project-zora/zora-4b")
response = generate(model, tokenizer, prompt="What's running on my cluster?", max_tokens=512)
```
As an Anthropic-compatible API
```bash
export ANTHROPIC_BASE_URL=http://localhost:4001
export ANTHROPIC_API_KEY=local
claude  # now running on your Metal GPU
```
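Any client that speaks the Anthropic Messages API shape can also target the local endpoint directly. The sketch below builds such a request with only the standard library; the model name `"zora-4b"` and the assumption that the proxy exposes `POST /v1/messages` are illustrative, not confirmed by this card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4001"  # local endpoint from the env vars above

def build_messages_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build an Anthropic-Messages-style request aimed at the local server.

    The model name "zora-4b" is an assumption; a local proxy may ignore it.
    """
    body = json.dumps({
        "model": "zora-4b",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=body,
        headers={
            "content-type": "application/json",
            "x-api-key": "local",           # matches ANTHROPIC_API_KEY above
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

req = build_messages_request("What's running on my cluster?")
# To actually send it (server must be running):
#   response = json.load(urllib.request.urlopen(req))
```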
With Zora orchestrator
This model is downloaded automatically when you run ./install.sh in the Zora repository.
Training
| Round | Focus | Examples |
|---|---|---|
| R1-R3 | Core tool calling, multi-step chains | 600+ |
| R4-R5 | Edge cases, delegation rules | 200+ |
| R6 | All features (Team Zora, Enhanced Memory, Presence) | 200+ |
| R7 | Structured JSON reflection (COG-X schema) | 37 |
| R8 | Delegation routing (complex build tasks) | 40 |
| Total | | 1,107 |
- Base model: Qwen3-4B
- Method: LoRA SFT (16 layers, lr=1e-4, 2500 iterations)
- Final val loss: 0.017
- Quantisation: 4-bit (4.5 bits per weight) via MLX
- Hardware: MacBook Pro M5 Max 128GB
- Test result: 8/10 tool calling accuracy
- No personal data — all examples are synthetic
Architecture
```
Qwen3ForCausalLM
+-- 36 layers, 2560 hidden size
+-- 32 attention heads, 8 KV heads (GQA)
+-- 9728 intermediate size (SiLU)
+-- RoPE (theta=1M, max 40960 positions)
+-- 4-bit quantisation (4.5 bits/weight)
+-- TurboQuant PolarQuant KV cache compatible
```
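The numbers above support a back-of-envelope memory estimate. This is a sketch, not a measurement: `HEAD_DIM = 128` is an assumption (Qwen3 decouples head size from `hidden_size / num_heads`), and the KV cache is priced at fp16, i.e. before any TurboQuant compression.

```python
# Back-of-envelope memory estimate from the architecture figures above.
N_PARAMS = 4e9          # nominal "4B" parameter count
BITS_PER_WEIGHT = 4.5   # stated effective quantised width
N_LAYERS = 36
N_KV_HEADS = 8          # GQA
HEAD_DIM = 128          # assumption, see note above
KV_BYTES = 2            # fp16 cache entries (before compression)
MAX_POS = 40960

weights_gb = N_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
# Per token, each layer caches one K and one V vector per KV head.
kv_per_token = N_LAYERS * 2 * N_KV_HEADS * HEAD_DIM * KV_BYTES
kv_full_gb = kv_per_token * MAX_POS / 1e9

print(f"weights ≈ {weights_gb:.2f} GB")
print(f"KV cache ≈ {kv_per_token / 1024:.0f} KiB/token, "
      f"≈ {kv_full_gb:.1f} GB at {MAX_POS} positions (fp16)")
```

Under these assumptions the weights come to about 2.25 GB, which is consistent with the whole stack fitting in the ~7GB envelope quoted earlier once the (compressed) KV cache and runtime overhead are added.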
What makes Zora different
Zora is a personal AI OS — not a chatbot. This brain model is one part of a larger system:
- Real-time nervous system — events from every channel flow through one universal event bus
- Autonomous operator — follow-through engine that owns work across all channels
- Self-improving — LoRA training pipeline runs on your hardware
- Privacy by architecture — all inference on-device, data never leaves your machine
Limitations
- Trained for Zora's orchestrator context — may underperform on general chat benchmarks
- English only
- Best results with the Zora tool/system prompt format
- Not suitable for tasks requiring >40K context
License
Apache 2.0