OptiQ measures each conv layer's sensitivity via KL divergence on detection outputs, then assigns optimal per-layer bit-widths using greedy knapsack optimization. Sensitive layers (detection head, feature pyramid) get 8-bit precision while robust backbone layers get 4-bit.
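The sensitivity-then-assignment scheme above can be sketched roughly as follows. This is an illustrative sketch, not the mlx-optiq API: the layer names, sizes, and bit budget are made up, and the greedy rule (promote the most sensitivity-dense layers from 4-bit to 8-bit until the budget is spent) is one plausible reading of "greedy knapsack optimization".

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two detection-output distributions (lists of scores)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq)) for pi, qi in zip(p, q))

def assign_bit_widths(sensitivities, sizes, budget_bits):
    """Greedy knapsack: start every layer at 4-bit, then promote the most
    sensitivity-dense layers to 8-bit while the total bit budget allows."""
    bits = {name: 4 for name in sensitivities}
    used = sum(4 * sizes[n] for n in bits)
    # Promote layers in order of sensitivity gained per extra bit spent.
    order = sorted(sensitivities, key=lambda n: sensitivities[n] / sizes[n], reverse=True)
    for name in order:
        extra = 4 * sizes[name]  # cost of promoting this layer from 4-bit to 8-bit
        if used + extra <= budget_bits:
            bits[name] = 8
            used += extra
    return bits

# Hypothetical per-layer KL sensitivities and weight counts.
sens = {"backbone.conv1": 0.01, "fpn.lateral": 0.30, "head.cls": 0.50}
sizes = {"backbone.conv1": 1000, "fpn.lateral": 200, "head.cls": 100}
plan = assign_bit_widths(sens, sizes, budget_bits=6400)
# The sensitive head and FPN layers land at 8-bit; the large, robust backbone layer stays 4-bit.
```

With these toy numbers the greedy pass promotes `head.cls` and `fpn.lateral` to 8-bit and leaves `backbone.conv1` at 4-bit, matching the behavior described above.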
## Article
For more details on the methodology and results, see: [Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon](https://x.com/thin_signal/status/2028412948167942334)
## Credits
- **Quantization:** [mlx-optiq](https://pypi.org/project/mlx-optiq/) by Thin Signal