YOLO26l-OptiQ-6bit

Mixed-precision quantized YOLO26l for Apple Silicon via OptiQ

This is a mixed-precision quantized version of YOLO26l in MLX format, optimized with mlx-optiq for Apple Silicon inference via yolo-mlx.

Quantization Details

| Property | Value |
| --- | --- |
| Target BPW (bits per weight) | 6.0 |
| Achieved BPW | 6.00 |
| Layers at 4-bit | 16 |
| Layers at 8-bit | 174 |
| Original size | 100.7 MB |
| Quantized size | 22.9 MB |
| Compression | 4.4x |
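The figures above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the compression ratio from the two reported sizes and shows how an average BPW is commonly computed as a parameter-weighted mean of per-layer bit-widths. The weighting scheme is an assumption about how the achieved BPW is reported, and the per-layer parameter counts in `toy_layers` are made-up placeholders, not the real YOLO26l layer sizes.

```python
def compression_ratio(original_mb: float, quantized_mb: float) -> float:
    """Ratio of original to quantized file size."""
    return original_mb / quantized_mb


def average_bpw(layers):
    """Parameter-weighted mean bit-width.

    layers: iterable of (num_params, bits) pairs, one per quantized layer.
    """
    layers = list(layers)
    total_params = sum(n for n, _ in layers)
    total_bits = sum(n * b for n, b in layers)
    return total_bits / total_params


# Sizes taken from the table above.
ratio = compression_ratio(100.7, 22.9)
print(f"{ratio:.1f}x")  # 4.4x

# Hypothetical layer sizes: half of the parameters at 4-bit, half at 8-bit.
toy_layers = [(1_000_000, 4), (1_000_000, 8)]
print(average_bpw(toy_layers))  # 6.0

# Unweighted view of the reported layer counts (every layer the same size).
unweighted = average_bpw([(1, 4)] * 16 + [(1, 8)] * 174)
print(f"{unweighted:.2f}")  # 7.66
```

If every layer had the same size, 16 layers at 4-bit and 174 at 8-bit would average about 7.66 bits. Under the parameter-weighted assumption, an achieved BPW of 6.00 therefore suggests that the 16 layers kept at 4-bit hold roughly half of the quantized parameters.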

Benchmark Results (COCO128)

| Model | Total Detections | Avg/Image |
| --- | --- | --- |
| OptiQ 6-bit | 766 | 6.0 |
| Original (FP32) | 766 | 6.0 |

Detection delta: +0 (+0.0%). The quantized model produces the same total number of detections as the FP32 original at 4.4x compression.
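For reference, the summary numbers can be reproduced from the raw totals as follows. This is a minimal sketch: `detection_summary` is a hypothetical helper written for this card, and the image count of 128 comes from the COCO128 dataset size.

```python
def detection_summary(quantized_total: int, baseline_total: int, num_images: int) -> dict:
    """Compare total detection counts of a quantized model against a baseline."""
    delta = quantized_total - baseline_total
    # Guard against an empty baseline to avoid dividing by zero.
    delta_pct = 100.0 * delta / baseline_total if baseline_total else 0.0
    return {
        "avg_per_image": quantized_total / num_images,
        "delta": delta,
        "delta_pct": delta_pct,
    }


summary = detection_summary(quantized_total=766, baseline_total=766, num_images=128)
print(f"avg/image: {summary['avg_per_image']:.1f}")  # avg/image: 6.0
print(f"delta: {summary['delta']:+d} ({summary['delta_pct']:+.1f}%)")  # delta: +0 (+0.0%)
```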

Usage

Requires mlx-optiq and yolo-mlx:

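```bash
pip install mlx-optiq yolo-mlx
```

```python
from optiq.models.yolo import load_quantized_yolo

# Load the quantized model by its repository name.
model = load_quantized_yolo("mlx-community/YOLO26l-OptiQ-6bit")
# Run detection on a single image.
results = model.predict("image.jpg")
```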
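Because the model targets Apple Silicon, it can help to check the environment before loading it. The following is a small optional pre-flight sketch that uses only the Python standard library. `check_environment` is a hypothetical helper, not part of mlx-optiq or yolo-mlx, and it only checks for the `optiq` package because that is the import name shown above.

```python
import importlib.util
import platform


def check_environment(system=None, machine=None, module="optiq"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    system = system or platform.system()
    machine = machine or platform.machine()
    problems = []
    # A native Python interpreter on an Apple Silicon Mac reports Darwin / arm64.
    if (system, machine) != ("Darwin", "arm64"):
        problems.append(f"expected Apple Silicon (Darwin/arm64), found {system}/{machine}")
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(module) is None:
        problems.append(f"Python package '{module}' is not importable")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Note that a Python interpreter running under Rosetta reports `x86_64`, so a warning from this check on an Apple Silicon Mac may simply mean that a non-native interpreter is in use.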

How OptiQ Works

OptiQ measures each convolutional layer's sensitivity to quantization via the KL divergence it induces on the detection outputs, then assigns per-layer bit-widths with a greedy knapsack optimization under the target BPW budget. Sensitive layers (detection head, feature pyramid) get 8-bit precision, while robust backbone layers get 4-bit.
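The idea can be illustrated with a toy example. The sketch below is not the mlx-optiq implementation: the function names, layer names, parameter counts, and output distributions are all hypothetical, and small discrete probability vectors stand in for real detection outputs. It scores each layer by the KL divergence between the full-precision output and the output with only that layer quantized to 4-bit, then greedily promotes layers to 8-bit in order of sensitivity per extra bit spent until the bit budget is used up.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def allocate_bits(layers, target_bpw, low=4, high=8):
    """Greedy knapsack over layers.

    layers: dict name -> (num_params, sensitivity), where sensitivity is the
            KL divergence observed when only that layer is quantized to `low` bits.
    Returns a dict name -> assigned bit-width.
    """
    total_params = sum(n for n, _ in layers.values())
    budget = target_bpw * total_params      # total bits available
    bits = {name: low for name in layers}   # start with every layer at low precision
    used = low * total_params

    def gain_per_bit(name):
        num_params, sensitivity = layers[name]
        return sensitivity / (num_params * (high - low))

    # Promote the layers with the highest sensitivity per extra bit first.
    for name in sorted(layers, key=gain_per_bit, reverse=True):
        cost = layers[name][0] * (high - low)
        if used + cost <= budget:
            bits[name] = high
            used += cost
    return bits


# Toy stand-ins: full-precision output vs. output with one layer at 4-bit.
reference = [0.7, 0.2, 0.1]
layers = {
    "backbone.conv1": (3000, kl_divergence(reference, [0.69, 0.21, 0.10])),
    "backbone.conv2": (3000, kl_divergence(reference, [0.68, 0.21, 0.11])),
    "neck.conv": (3000, kl_divergence(reference, [0.55, 0.30, 0.15])),
    "head.conv": (3000, kl_divergence(reference, [0.40, 0.35, 0.25])),
}

assignment = allocate_bits(layers, target_bpw=6.0)
print(assignment)

achieved = sum(layers[n][0] * b for n, b in assignment.items()) / sum(n for n, _ in layers.values())
print(achieved)
```

In this toy setup the head and neck layers shift the output the most, so they are promoted to 8-bit, while the two backbone layers stay at 4-bit and the average lands on the 6.0 BPW target. A greedy pass like this is a heuristic: it is fast and simple, but it is not guaranteed to find the best possible assignment for every budget.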

Article

For more details on the methodology and results, see the article "Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon".
