# YOLO26s-OptiQ-6bit

*Mixed-precision quantized YOLO26s for Apple Silicon via OptiQ*

This is a mixed-precision quantized version of YOLO26s in MLX format, optimized with `mlx-optiq` for Apple Silicon inference via `yolo-mlx`.

## Quantization Details

| Property | Value |
|---|---|
| Target BPW | 6.0 |
| Achieved BPW | 5.97 |
| Layers at 4-bit | 11 |
| Layers at 8-bit | 115 |
| Original size | 38.4 MB |
| Quantized size | 8.9 MB |
| Compression | 4.3x |
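As a quick sanity check, the compression figure in the table follows directly from the reported sizes:

```python
# Sanity check: compression ratio from the reported model sizes.
original_mb = 38.4   # FP32 model size
quantized_mb = 8.9   # OptiQ 6-bit model size
ratio = original_mb / quantized_mb
print(f"{ratio:.1f}x")  # → 4.3x
```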

## Benchmark Results (COCO128)

| Model | Total Detections | Avg/Image |
|---|---|---|
| OptiQ 6-bit | 633 | 4.9 |
| Original (FP32) | 681 | 5.3 |

Detection delta: -48 (-7.0%) at 4.3x compression.
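The quoted delta can be reproduced from the table's totals:

```python
# Reproduce the detection delta from the benchmark table.
optiq_dets, fp32_dets = 633, 681
delta = optiq_dets - fp32_dets
pct = 100 * delta / fp32_dets
print(delta, f"({pct:.1f}%)")  # → -48 (-7.0%)
```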

## Usage

Requires `mlx-optiq` and `yolo-mlx`:

```bash
pip install mlx-optiq yolo-mlx
```

```python
from optiq.models.yolo import load_quantized_yolo

model = load_quantized_yolo("mlx-community/YOLO26s-OptiQ-6bit")
results = model.predict("image.jpg")
```

## How OptiQ Works

OptiQ measures each conv layer's sensitivity via KL divergence on detection outputs, then assigns per-layer bit-widths using a greedy knapsack optimization. Sensitive layers (detection head, feature pyramid) get 8-bit precision, while robust backbone layers get 4-bit.
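The assignment step can be sketched as a greedy knapsack: every layer starts at the 4-bit floor, and the layers with the highest sensitivity per extra bit are promoted to 8-bit until the target average bits-per-weight (BPW) budget is exhausted. This is an illustrative sketch, not the actual `mlx-optiq` implementation; the function name, layer names, and sensitivity values below are made up.

```python
# Illustrative greedy-knapsack bit-width assignment (NOT the actual mlx-optiq
# code). Each layer is (name, n_params, sensitivity); sensitivity stands in
# for the measured KL divergence on detection outputs.

def assign_bit_widths(layers, target_bpw=6.0):
    """Start every layer at 4-bit, then promote the layers with the best
    sensitivity-per-extra-bit ratio to 8-bit while staying within the
    target average bits-per-weight budget."""
    total_params = sum(n for _, n, _ in layers)
    budget_bits = target_bpw * total_params
    bits = {name: 4 for name, _, _ in layers}
    used_bits = 4 * total_params  # everything at the 4-bit floor
    # Knapsack value = sensitivity, weight = extra bits for promotion.
    ranked = sorted(layers, key=lambda l: l[2] / l[1], reverse=True)
    for name, n_params, _ in ranked:
        extra = (8 - 4) * n_params
        if used_bits + extra <= budget_bits:
            bits[name] = 8
            used_bits += extra
    return bits

# Made-up layer sizes and sensitivities for illustration only.
layers = [
    ("backbone.0", 2_000_000, 0.01),
    ("backbone.1", 1_500_000, 0.02),
    ("neck.0",       400_000, 0.30),
    ("head.0",       300_000, 0.90),
]
print(assign_bit_widths(layers))
# → {'backbone.0': 4, 'backbone.1': 4, 'neck.0': 8, 'head.0': 8}
```

Note how the small but sensitive head and neck layers are promoted first, while the large, robust backbone layers stay at 4-bit, which is the pattern the model card's layer counts reflect.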

## Article

For more details on the methodology and results, see: *Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon*.

## Credits
