Ornith-1.0-35B-3bit

3-bit (group size 64, 3.662 bits/weight) MLX quantization of deepreinforce-ai/Ornith-1.0-35B, produced with mlx-vlm 0.6.3. Fully multimodal: the vision encoder is preserved and quantized alongside the language model. Built for Apple Silicon; runs with mlx-vlm or any other MLX-compatible app.

This is the smallest variant (≈16 GB). It stays coherent on both vision and reasoning tasks, but it uses the most aggressive quantization of the variants, so expect some quality loss compared with 4-bit and higher.
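The size and bits/weight figures above are consistent with each other. A rough sketch, assuming MLX's affine quantization spends one 16-bit scale and one 16-bit bias per group of 64 weights, and treating "35B" as exactly 35 × 10^9 parameters: a purely 3-bit layer costs 3.5 bits/weight, and the reported 3.662 average works out to about 16 GB. The gap between 3.5 and 3.662 is presumably from some tensors kept at higher precision, which is an assumption this card does not state.

```python
# Back-of-the-envelope check of the quantization numbers in this card.
# Assumptions: MLX affine quantization stores one 16-bit scale and one
# 16-bit bias per group, and "35B" is taken as exactly 35e9 parameters.

def packed_bits_per_weight(bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Bits per weight for a quantized layer: payload plus per-group scale/bias."""
    return bits + meta_bits / group_size

def size_gb(num_params: float, bits_per_weight: float) -> float:
    """On-disk size in decimal gigabytes for a given average bits/weight."""
    return num_params * bits_per_weight / 8 / 1e9

print(packed_bits_per_weight(3, 64))   # 3.5
print(round(size_gb(35e9, 3.662), 1))  # 16.0
```

The same helper gives 4.5 bits/weight for a 4-bit, group-size-64 layer, which is why the 4-bit variants are noticeably larger.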

Conversion note (MoE expert fusion)

Ornith stores its 256 MoE experts unfused (one set of weights per expert), but mlx-vlm's qwen3_5_moe loader expects them fused (stacked into batched tensors). A monkeypatch to the loader's sanitize step was required to stack the experts before conversion; without it, the conversion failed. Apart from this fix, this is a standard mlx-vlm 3-bit quant.
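The fusion step amounts to grouping every per-expert tensor by layer and projection and stacking them in expert order. A minimal sketch of that idea, not the actual monkeypatch used here: the key pattern `...mlp.experts.<idx>.<proj>.weight` and the fused name `...mlp.switch_mlp.<proj>.weight` are assumptions modeled on how mlx-lm's Qwen MoE models name stacked experts, and `fuse_experts` is a hypothetical helper. Plain lists stand in for arrays so the sketch runs anywhere; with MLX you would pass `mx.stack` as `stack`.

```python
import re
from collections import defaultdict

# Hypothetical key layout; the real Ornith checkpoint names may differ.
EXPERT_KEY = re.compile(
    r"^(?P<prefix>.*\.mlp)\.experts\.(?P<idx>\d+)\.(?P<proj>\w+)\.weight$"
)

def fuse_experts(weights, num_experts, stack=list):
    """Replace per-expert weights with one stacked tensor per (layer, projection)."""
    fused = {}
    groups = defaultdict(dict)
    for key, value in weights.items():
        m = EXPERT_KEY.match(key)
        if m is None:
            fused[key] = value  # non-expert weights pass through unchanged
            continue
        groups[(m["prefix"], m["proj"])][int(m["idx"])] = value
    for (prefix, proj), experts in groups.items():
        if sorted(experts) != list(range(num_experts)):
            raise ValueError(
                f"{prefix}.{proj}: expected {num_experts} experts, got {len(experts)}"
            )
        # Order by expert index so slice e of the stacked tensor is expert e.
        fused[f"{prefix}.switch_mlp.{proj}.weight"] = stack(
            [experts[e] for e in range(num_experts)]
        )
    return fused

toy = {
    "model.layers.0.mlp.experts.1.gate_proj.weight": [2],
    "model.layers.0.mlp.experts.0.gate_proj.weight": [1],
    "model.embed_tokens.weight": [9],
}
print(fuse_experts(toy, num_experts=2))
```

Checking that every expert index is present before stacking turns a silently misaligned tensor into an explicit error.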

Usage

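uvx --from mlx-vlm mlx_vlm.generate \
  --model mlx-community/Ornith-1.0-35B-3bit --image image.png \
  --prompt "Describe this image." --max-tokens 512

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
model, processor = load("mlx-community/Ornith-1.0-35B-3bit")
# Wrap the prompt in the model's chat template, reserving one image slot.
prompt = apply_chat_template(processor, model.config, "Describe this image.", num_images=1)
print(generate(model, processor, prompt, ["image.png"], max_tokens=512))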

Conversion check

Smoke-tested after conversion: the model stayed coherent on both an image prompt (it correctly read an evaluation bar chart) and a text reasoning prompt (it solved 17 * 24 = 408 with correct step-by-step work), with no repetition loops. Measured 125.3 tok/s generation, 946.2 tok/s prompt processing, and 18.1 GB peak memory on a MacBook Pro (M5 Max, 128 GB, 40-core GPU).
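The smoke-test throughput translates into a rough latency budget. A minimal sketch, assuming both rates stay constant (in practice decoding slows as the context grows) and ignoring model load time; the 1,000-token prompt is a hypothetical example, not a measurement.

```python
# Rough wall-clock estimate from the smoke-test throughput figures.
# Assumptions: constant rates, no model load time, same hardware as the test.
PREFILL_TOK_S = 946.2  # prompt processing rate from the smoke test
DECODE_TOK_S = 125.3   # generation rate from the smoke test

def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Seconds to process the prompt, then generate output_tokens."""
    return prompt_tokens / PREFILL_TOK_S + output_tokens / DECODE_TOK_S

# Hypothetical 1,000-token prompt followed by the 512-token cap from Usage.
print(f"{estimate_seconds(1000, 512):.1f} s")  # 5.1 s
```

Generation dominates here: roughly 4.1 s of the total is spent decoding the 512 output tokens.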

Refer to the original model card for architecture, benchmarks, license, and intended use.
