---
library_name: mlx
license: agpl-3.0
pipeline_tag: object-detection
base_model: Ultralytics/YOLO26
tags:
  - mlx
  - quantized
  - mixed-precision
  - yolo
  - yolo26
  - object-detection
  - optiq
  - apple-silicon
---

# YOLO26m-OptiQ-6bit

**Mixed-precision quantized YOLO26m for Apple Silicon via OptiQ**

This is a mixed-precision quantized version of YOLO26m in MLX format, optimized with mlx-optiq for Apple Silicon inference via yolo-mlx.

## Quantization Details

| Property | Value |
|---|---|
| Target BPW | 6.0 |
| Achieved BPW | 5.97 |
| Layers at 4-bit | 12 |
| Layers at 8-bit | 124 |
| Original size | 83.8 MB |
| Quantized size | 18.9 MB |
| Compression | 4.4x |
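Note that the achieved BPW is a parameter-weighted average, not a per-layer count: averaging 12 four-bit and 124 eight-bit layers by layer count would give about 7.6 bits, so the 4-bit layers must hold roughly half the model's weights. A minimal arithmetic sketch (the parameter fraction is illustrative, not taken from the real model):

```python
# Compression ratio follows directly from the table above.
original_mb, quantized_mb = 83.8, 18.9
assert round(original_mb / quantized_mb, 1) == 4.4

# Achieved BPW is weighted by how many parameters sit at each precision.
# An illustrative split where 4-bit layers hold ~51% of weights
# reproduces the reported 5.97 BPW:
frac_4bit = 0.5075
bpw = frac_4bit * 4 + (1 - frac_4bit) * 8
assert abs(bpw - 5.97) < 0.01
```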

## Benchmark Results (COCO128)

| Model | Total Detections | Avg/Image |
|---|---|---|
| OptiQ 6-bit | 747 | 5.8 |
| Original (FP32) | 746 | 5.8 |

Detection delta: +1 (+0.1%) at 4.4x compression.

## Usage

Requires mlx-optiq and yolo-mlx:

```bash
pip install mlx-optiq yolo-mlx
```

```python
from optiq.models.yolo import load_quantized_yolo

model = load_quantized_yolo("mlx-community/YOLO26m-OptiQ-6bit")
results = model.predict("image.jpg")
```

## How OptiQ Works

OptiQ measures each conv layer's sensitivity via KL divergence on detection outputs, then assigns optimal per-layer bit-widths using greedy knapsack optimization. Sensitive layers (detection head, feature pyramid) get 8-bit precision while robust backbone layers get 4-bit.
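The bit-assignment step can be sketched as a greedy knapsack: start every layer at 4-bit, then upgrade layers in order of sensitivity per parameter, as long as the parameter-weighted average stays under the target BPW. This is an illustrative reconstruction, not the actual mlx-optiq implementation; the layer names, sensitivity values, and function signature below are hypothetical.

```python
def assign_bits(layers, target_bpw):
    """Greedy knapsack sketch: layers is a list of
    (name, param_count, kl_sensitivity) tuples."""
    bits = {name: 4 for name, _, _ in layers}
    total_params = sum(p for _, p, _ in layers)

    def avg_bpw():
        # Parameter-weighted average bits per weight.
        return sum(bits[n] * p for n, p, _ in layers) / total_params

    # Upgrade the most sensitive layers (per parameter) first.
    for name, params, sens in sorted(layers, key=lambda l: l[2] / l[1],
                                     reverse=True):
        bits[name] = 8
        if avg_bpw() > target_bpw:
            bits[name] = 4  # upgrade would exceed the budget; revert

    return bits

# Toy example: detection-head and FPN layers are far more sensitive
# (higher KL divergence on detection outputs) than backbone layers.
layers = [
    ("backbone.0", 100_000, 0.01),
    ("backbone.1", 200_000, 0.02),
    ("fpn.0",       50_000, 0.50),
    ("head.0",      30_000, 0.90),
]
bits = assign_bits(layers, target_bpw=5.0)
# Sensitive layers end up at 8-bit, robust backbone layers at 4-bit.
```

The greedy order (sensitivity divided by parameter count) buys the largest accuracy protection per bit of budget spent, which is why the detection head and feature pyramid end up at 8-bit while the heavy backbone stays at 4-bit.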

## Article

For more details on the methodology and results, see: *Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon*

## Credits