mlx-community/YOLO26s-OptiQ-6bit

@@ -60,6 +60,12 @@ results = model.predict("image.jpg")
 OptiQ measures each conv layer's sensitivity via KL divergence on detection outputs, then assigns optimal per-layer bit-widths using greedy knapsack optimization. Sensitive layers (detection head, feature pyramid) get 8-bit precision while robust backbone layers get 4-bit.
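
The allocation step described above can be illustrated with a minimal sketch. This is not the mlx-optiq implementation: the function names (`kl_divergence`, `assign_bits`), the layer names, the parameter counts, the sensitivity scores, and the 5-bit average budget are all hypothetical and chosen only to show how a greedy knapsack could trade KL-measured sensitivity against bit cost.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability sequences."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def assign_bits(layers, target_avg_bits, low=4, high=8):
    """Greedy knapsack over layers (hypothetical helper, not the mlx-optiq API).

    layers: dict mapping layer name -> (num_params, sensitivity), where
    sensitivity is assumed to be the KL divergence observed on the model
    outputs when only that layer is quantized to `low` bits.
    """
    total_params = sum(n for n, _ in layers.values())
    # Extra bits we may spend on top of the all-`low` baseline.
    budget = (target_avg_bits - low) * total_params
    bits = {name: low for name in layers}
    # Rank by sensitivity removed per parameter upgraded (value / cost).
    ranked = sorted(layers, key=lambda k: layers[k][1] / layers[k][0], reverse=True)
    for name in ranked:
        cost = (high - low) * layers[name][0]
        if cost <= budget:
            bits[name] = high
            budget -= cost
    return bits


# Toy inputs: (parameter count, sensitivity) per layer, all made up.
layers = {
    "backbone.0": (1000, 0.01),
    "backbone.1": (2000, 0.02),
    "neck.fpn": (500, 0.30),
    "head.detect": (500, 0.50),
}
bits = assign_bits(layers, target_avg_bits=5)
print(bits)
```

With these toy numbers the small but sensitive head and feature-pyramid layers are upgraded to 8-bit first, and the budget runs out before the larger backbone layers can be upgraded, which mirrors the pattern the README describes.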
+## Article
+For more details on the methodology and results, see: [Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon](https://x.com/thin_signal/status/2028412948167942334)
 ## Credits
 - **Quantization:** [mlx-optiq](https://pypi.org/project/mlx-optiq/) by Thin Signal