codelion committed on
Commit 9ea426f (verified) · 1 parent: a434aee

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -60,6 +60,12 @@ results = model.predict("image.jpg")
 
 OptiQ measures each conv layer's sensitivity via KL divergence on detection outputs, then assigns optimal per-layer bit-widths using greedy knapsack optimization. Sensitive layers (detection head, feature pyramid) get 8-bit precision while robust backbone layers get 4-bit.
 
+
+
+## Article
+
+For more details on the methodology and results, see: [Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon](https://x.com/thin_signal/status/2028412948167942334)
+
 ## Credits
 
 - **Quantization:** [mlx-optiq](https://pypi.org/project/mlx-optiq/) by Thin Signal
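The README paragraph touched by this diff describes the per-layer allocation only in prose. The following is a minimal, stdlib-only sketch of that idea, assuming each layer's sensitivity has already been measured as a KL divergence at 4-bit and at 8-bit, and assuming the memory budget is expressed as an average bit-width. It is not the mlx-optiq API: `LayerStats`, `kl_divergence`, `allocate_bits`, and `avg_bit_budget` are hypothetical names, and the layer sizes and KL values in the usage example are made-up placeholders, not measured results.

```python
import math
from dataclasses import dataclass


@dataclass
class LayerStats:
    """Hypothetical per-layer record (not part of mlx-optiq)."""
    name: str
    num_params: int
    kl_at_4bit: float  # assumed: KL(full precision || 4-bit) on detection outputs
    kl_at_8bit: float  # assumed: KL(full precision || 8-bit) on detection outputs


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability sequences."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def allocate_bits(layers, avg_bit_budget):
    """Greedy knapsack over two bit-widths (4 or 8).

    Every layer starts at 4 bits; layers are then upgraded to 8 bits in order of
    KL reduction per extra bit of storage, skipping any upgrade that would
    exceed the total bit budget.
    """
    total_params = sum(layer.num_params for layer in layers)
    budget_bits = avg_bit_budget * total_params
    used_bits = 4 * total_params
    bits = {layer.name: 4 for layer in layers}

    def density(layer):
        # Value (sensitivity removed) divided by weight (extra bits spent).
        return (layer.kl_at_4bit - layer.kl_at_8bit) / (4 * layer.num_params)

    for layer in sorted(layers, key=density, reverse=True):
        extra_bits = 4 * layer.num_params
        if used_bits + extra_bits <= budget_bits:
            bits[layer.name] = 8
            used_bits += extra_bits
    return bits


# Illustrative placeholder numbers only; layer names are hypothetical.
layers = [
    LayerStats("backbone.conv1", 10_000, kl_at_4bit=0.002, kl_at_8bit=0.0005),
    LayerStats("neck.fpn_conv", 5_000, kl_at_4bit=0.030, kl_at_8bit=0.001),
    LayerStats("head.cls_conv", 2_000, kl_at_4bit=0.050, kl_at_8bit=0.001),
]
result = allocate_bits(layers, avg_bit_budget=6.0)
print(result)
```

With a 6-bit average budget, the two most sensitive layers per extra bit (the head and FPN placeholders) are promoted to 8 bits while the large, robust backbone layer stays at 4 bits, mirroring the pattern the README describes. Ranking by value per unit cost is the standard greedy heuristic for 0/1 knapsack; it is fast but not guaranteed optimal, and the real tool may use a different ranking or more bit-width options.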