Zora 4B

The orchestrator brain for Zora — a private, local-first personal AI OS that runs on Apple Silicon.

What is this?

Zora 4B is a fine-tuned Qwen3-4B model, quantised to 4-bit for efficient inference on Apple Silicon via MLX. It serves as Zora's primary reasoning brain — handling tool calling, task routing, structured reflection, and conversational interaction.

This is not a general-purpose chat model. It is specifically trained for orchestrator behaviour: deciding which tools to call, how to route tasks across local and remote compute, producing structured JSON for autonomous cognition, and managing multi-step goals.

Key capabilities

  • Tool calling — 39+ tools with structured <tool_call> output format
  • Task routing — classifies each prompt as a direct response, a queued goal, or work to delegate to 70B worker nodes
  • Structured reflection — emits complete JSON conforming to the COG-X schema for autonomous cognition loops
  • Task delegation — routes complex build/code/refactor tasks to worker nodes with bigger models
  • Multi-turn reasoning — maintains context across tool call chains (up to 8 rounds)
  • Thinking mode — optional <think> blocks for chain-of-thought reasoning
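The <tool_call> wrapper is plain text around a JSON object, so downstream code can recover calls with a small parser. A minimal sketch, assuming the Qwen-style format the base model uses (the tool name and arguments here are hypothetical, not part of Zora's actual tool set):

```python
import json
import re

# Example assistant output in the <tool_call> format described above.
# The tool name "list_processes" and its arguments are hypothetical.
raw = (
    "<tool_call>\n"
    '{"name": "list_processes", "arguments": {"node": "mini-m4"}}\n'
    "</tool_call>"
)

# Extract and decode every <tool_call> block in the response.
calls = [
    json.loads(m)
    for m in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", raw, re.S)
]
print(calls[0]["name"])  # list_processes
```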

Hardware requirements

Config                      RAM    Performance
Mac Mini M4 24GB            24GB   ~90 tok/s with TurboQuant KV cache
MacBook Pro M5 Max 128GB    128GB  ~110 tok/s with speculative decoding
MacBook Air M3 16GB         16GB   ~35 tok/s
Any Apple Silicon           8GB+   Will run, but may be slow

The entire stack — model, KV cache, and OS — runs in 7GB RAM on a 24GB Mac Mini.

Usage

With MLX

from mlx_lm import load, generate

model, tokenizer = load("project-zora/zora-4b")
response = generate(model, tokenizer, prompt="What's running on my cluster?", max_tokens=512)

As an Anthropic-compatible API

export ANTHROPIC_BASE_URL=http://localhost:4001
export ANTHROPIC_API_KEY=local
claude  # now running on your Metal GPU
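With those variables set, the local gateway can also be exercised directly. A sketch assuming it implements Anthropic's /v1/messages endpoint with the standard Messages API headers (the model name and endpoint path are assumptions about this particular setup):

```shell
# Hypothetical request against the local Anthropic-compatible gateway.
# Headers follow the Anthropic Messages API convention.
curl http://localhost:4001/v1/messages \
  -H "x-api-key: local" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
        "model": "zora-4b",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "What is running on my cluster?"}]
      }'
```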

With Zora orchestrator

This model is downloaded automatically when you run ./install.sh in the Zora repository.

Training

Round    Focus                                                Examples
R1-R3    Core tool calling, multi-step chains                 600+
R4-R5    Edge cases, delegation rules                         200+
R6       All features (Team Zora, Enhanced Memory, Presence)  200+
R7       Structured JSON reflection (COG-X schema)            37
R8       Delegation routing (complex build tasks)             40
Total                                                         1,107 examples

  • Base model: Qwen3-4B
  • Method: LoRA SFT (16 layers, lr=1e-4, 2500 iterations)
  • Final val loss: 0.017
  • Quantisation: 4-bit (4.5 bits per weight) via MLX
  • Hardware: MacBook Pro M5 Max 128GB
  • Test result: 8/10 tool calling accuracy
  • No personal data — all examples are synthetic
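The recipe above maps onto mlx_lm's stock LoRA tooling. A sketch of roughly equivalent commands, assuming training data in mlx_lm's expected jsonl layout (data and output paths are placeholders; the actual Zora pipeline may differ):

```shell
# LoRA SFT roughly matching the listed hyperparameters:
# 16 layers, lr=1e-4, 2500 iterations. Data path is a placeholder.
python -m mlx_lm.lora \
  --model Qwen/Qwen3-4B \
  --train \
  --data ./data \
  --num-layers 16 \
  --learning-rate 1e-4 \
  --iters 2500

# Fuse the adapter into the base weights, then quantise to 4-bit.
python -m mlx_lm.fuse --model Qwen/Qwen3-4B --adapter-path ./adapters
python -m mlx_lm.convert --hf-path ./fused_model -q --q-bits 4
```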

Architecture

Qwen3ForCausalLM
+-- 36 layers, 2560 hidden size
+-- 32 attention heads, 8 KV heads (GQA)
+-- 9728 intermediate size (SiLU)
+-- RoPE (theta=1M, max 40960 positions)
+-- 4-bit quantisation (4.5 bits/weight)
+-- TurboQuant PolarQuant KV cache compatible
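These numbers make it easy to see why a compressed KV cache matters. A back-of-envelope sizing sketch, assuming head_dim = hidden_size / num_heads = 80 (some Qwen3 configs instead use a decoupled head_dim such as 128, which scales the result proportionally); thanks to GQA, only the 8 KV heads are cached:

```python
# Back-of-envelope KV-cache sizing from the architecture listed above.
# Assumption: head_dim = hidden_size // num_heads = 80; Qwen3 configs
# may use a decoupled head_dim (e.g. 128), scaling the result.
layers, kv_heads = 36, 8
hidden, heads = 2560, 32
head_dim = hidden // heads          # 80 (assumption, see note above)
max_pos = 40960
bytes_16bit = 2                     # bytes per element at 16-bit

# K and V per token: 2 tensors x layers x kv_heads x head_dim.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_16bit
full_cache_gib = kv_bytes_per_token * max_pos / 2**30
print(f"{kv_bytes_per_token} B/token, {full_cache_gib:.1f} GiB at full context")
```

At full 40K context the uncompressed 16-bit cache alone approaches the model's whole 7GB budget on a 24GB Mac Mini, which is the motivation for the TurboQuant-compressed KV cache.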

What makes Zora different

Zora is a personal AI OS — not a chatbot. This brain model is one part of a larger system:

  • Real-time nervous system — events from every channel flow through one universal event bus
  • Autonomous operator — follow-through engine that owns work across all channels
  • Self-improving — LoRA training pipeline runs on your hardware
  • Privacy by architecture — all inference on-device, data never leaves your machine

Limitations

  • Trained for Zora's orchestrator context — may underperform on general chat benchmarks
  • English only
  • Best results with the Zora tool/system prompt format
  • Not suitable for tasks requiring >40K context

License

Apache 2.0
