---
library_name: mlx
license: agpl-3.0
pipeline_tag: object-detection
base_model: Ultralytics/YOLO26
tags:
- mlx
- quantized
- mixed-precision
- yolo
- yolo26
- object-detection
- optiq
- apple-silicon
---
# YOLO26m-OptiQ-6bit

**Mixed-precision quantized YOLO26m for Apple Silicon via OptiQ.**

This is a mixed-precision quantized version of YOLO26m in MLX format, optimized with `mlx-optiq` for Apple Silicon inference via `yolo-mlx`.
## Quantization Details
| Property | Value |
|---|---|
| Target BPW | 6.0 |
| Achieved BPW | 5.97 |
| Layers at 4-bit | 12 |
| Layers at 8-bit | 124 |
| Original size | 83.8 MB |
| Quantized size | 18.9 MB |
| Compression | 4.4x |
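The achieved BPW in the table is a parameter-weighted average over the per-layer bit-widths (plus a small overhead for quantization scales, which is why it lands slightly under the 6.0 target here). A minimal sketch of the weighted average, with hypothetical layer sizes:

```python
def achieved_bpw(layers):
    """Parameter-weighted average bits per weight.

    layers: list of (num_params, bits) tuples, one per quantized layer.
    Ignores scale/zero-point overhead for simplicity.
    """
    total_bits = sum(n * b for n, b in layers)
    total_params = sum(n for n, _ in layers)
    return total_bits / total_params


# Hypothetical mix: one small 4-bit layer, one larger 8-bit layer
print(achieved_bpw([(1_000, 4), (3_000, 8)]))  # 7.0
```

Because bit-width is weighted by parameter count, a few small 4-bit layers barely move the average, while quantizing large backbone layers to 4-bit pulls it down quickly.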
## Benchmark Results (COCO128)
| Model | Total Detections | Avg/Image |
|---|---|---|
| OptiQ 6-bit | 747 | 5.8 |
| Original (FP32) | 746 | 5.8 |
Detection delta: +1 (+0.1%) at 4.4x compression.
## Usage

Requires `mlx-optiq` and `yolo-mlx`:

```bash
pip install mlx-optiq yolo-mlx
```

```python
from optiq.models.yolo import load_quantized_yolo

model = load_quantized_yolo("mlx-community/YOLO26m-OptiQ-6bit")
results = model.predict("image.jpg")
```
## How OptiQ Works

OptiQ measures each convolutional layer's sensitivity via the KL divergence of detection outputs under quantization, then assigns per-layer bit-widths with a greedy knapsack optimization against the target BPW budget. Sensitive layers (the detection head and feature pyramid) receive 8-bit precision, while robust backbone layers drop to 4-bit.
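The greedy knapsack step can be sketched as follows. This is an illustrative toy, not OptiQ's actual implementation: start every layer at 4-bit, then promote layers to 8-bit in order of decreasing sensitivity until the total bit budget implied by the target BPW is exhausted. The layer names and sensitivity values below are hypothetical.

```python
def assign_bits(layers, target_bpw):
    """Greedy mixed-precision bit assignment (illustrative sketch).

    layers: list of (name, num_params, sensitivity), where sensitivity
    is e.g. the KL divergence of detection outputs when that layer
    alone is quantized to 4-bit.
    Returns a {layer_name: bits} mapping.
    """
    total_params = sum(n for _, n, _ in layers)
    budget = target_bpw * total_params          # total bit budget
    bits = {name: 4 for name, _, _ in layers}   # start everything at 4-bit
    used = 4 * total_params
    # Promote the most sensitive layers to 8-bit first
    for name, n, _ in sorted(layers, key=lambda l: l[2], reverse=True):
        if used + 4 * n <= budget:              # upgrade costs 4 extra bits/weight
            bits[name] = 8
            used += 4 * n
    return bits


# Hypothetical layers: a sensitive head, a mid-sensitivity neck,
# and a large, robust backbone
layers = [("head", 100, 0.9), ("neck", 200, 0.5), ("backbone", 700, 0.1)]
print(assign_bits(layers, target_bpw=6.0))
# {'head': 8, 'neck': 8, 'backbone': 4}
```

With these numbers, the head and neck fit within the 6.0-BPW budget at 8-bit while the backbone stays at 4-bit, mirroring the 4-bit/8-bit split reported in the table above.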
## Article

For more details on the methodology and results, see: *Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon*