codelion committed · Commit a434aee · verified · 1 parent: aa5e42e

Upload README.md with huggingface_hub

Files changed (1): README.md (+68, -0)
---
library_name: mlx
license: agpl-3.0
pipeline_tag: object-detection
base_model: Ultralytics/YOLO26
tags:
- mlx
- quantized
- mixed-precision
- yolo
- yolo26
- object-detection
- optiq
- apple-silicon
---

# YOLO26s-OptiQ-6bit

> Mixed-precision quantized YOLO26s for Apple Silicon via OptiQ

This is a mixed-precision quantized version of [YOLO26s](https://github.com/ultralytics/ultralytics) in MLX format, optimized with [mlx-optiq](https://pypi.org/project/mlx-optiq/) for Apple Silicon inference via [yolo-mlx](https://pypi.org/project/yolo-mlx/).

## Quantization Details

| Property | Value |
|---|---|
| Target bits per weight (BPW) | 6.0 |
| Achieved BPW | 5.97 |
| Layers at 4-bit | 11 |
| Layers at 8-bit | 115 |
| Original size | 38.4 MB |
| Quantized size | 8.9 MB |
| Compression | 4.3x |

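The size and BPW figures above are easy to sanity-check. A short sketch (illustrative arithmetic only; the parameter split in the last example is hypothetical, not taken from this model):

```python
def compression_ratio(original_mb, quantized_mb):
    """Checkpoint size ratio (FP32 vs. quantized)."""
    return original_mb / quantized_mb

def parameter_weighted_bpw(layers):
    """Average bits per weight, weighted by parameter count.

    layers: list of (n_params, bits) pairs. The achieved BPW depends on
    how many *parameters* sit at each precision, not on layer counts,
    which is how 11 layers at 4-bit can pull the average down to ~5.97
    even with 115 layers at 8-bit.
    """
    total_params = sum(n for n, _ in layers)
    return sum(n * b for n, b in layers) / total_params

# Size figures from the table above.
print(round(compression_ratio(38.4, 8.9), 1))  # -> 4.3

# Hypothetical parameter split: the 4-bit layers hold most of the weights.
print(parameter_weighted_bpw([(6_000_000, 4), (4_000_000, 8)]))  # -> 5.6
```

Note the 4.3x figure is smaller than the raw 32/5.97 bit ratio would suggest, since the quantized file also stores per-group scales and other metadata.
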
## Benchmark Results (COCO128)

| Model | Total Detections | Avg/Image |
|---|---|---|
| **OptiQ 6-bit** | **633** | **4.9** |
| Original (FP32) | 681 | 5.3 |

Detection delta: -48 (-7.0%) at 4.3x compression.

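The delta line follows directly from the table (a quick check, not part of the benchmark harness):

```python
quantized_dets = 633   # OptiQ 6-bit, from the table above
original_dets = 681    # FP32 baseline

delta = quantized_dets - original_dets
pct = 100 * delta / original_dets

print(delta)           # -> -48
print(round(pct, 1))   # -> -7.0
```
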
## Usage

Requires `mlx-optiq` and `yolo-mlx`:

```bash
pip install mlx-optiq yolo-mlx
```

```python
from optiq.models.yolo import load_quantized_yolo

model = load_quantized_yolo("mlx-community/YOLO26s-OptiQ-6bit")
results = model.predict("image.jpg")
```

## How OptiQ Works

OptiQ measures each conv layer's sensitivity via KL divergence on detection outputs, then assigns per-layer bit-widths with a greedy knapsack-style optimization under a target bits-per-weight budget. Sensitive layers (detection head, feature pyramid) get 8-bit precision, while robust backbone layers get 4-bit.

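The assignment idea can be sketched in a few lines. This is a minimal toy illustration of a greedy knapsack-style scheme, not OptiQ's actual implementation; the function name, promotion order, and cost model are all assumptions:

```python
def assign_bitwidths(layers, target_bpw):
    """Toy greedy knapsack-style bit-width assignment.

    layers: list of (name, n_params, sensitivity) tuples, where
    sensitivity is e.g. a KL-divergence score on detection outputs.
    Every layer starts at the 4-bit floor; the most sensitive layers
    are promoted to 8-bit while the parameter-weighted average stays
    within the target bits-per-weight budget.
    """
    total_params = sum(n for _, n, _ in layers)
    budget = target_bpw * total_params          # total bit budget
    bits = {name: 4 for name, _, _ in layers}
    used = 4 * total_params                     # cost of the 4-bit floor
    # Promote the most sensitive layers first.
    for name, n_params, _ in sorted(layers, key=lambda l: l[2], reverse=True):
        extra = (8 - 4) * n_params              # cost of going 4 -> 8 bit
        if used + extra <= budget:
            bits[name] = 8
            used += extra
    return bits

# Toy layers: the detection head is most sensitive, the backbone least.
layers = [("head", 100_000, 0.9), ("neck", 100_000, 0.5), ("backbone", 800_000, 0.1)]
print(assign_bitwidths(layers, target_bpw=6.0))
# -> {'head': 8, 'neck': 8, 'backbone': 4}
```

In this toy example the large backbone layer stays at 4-bit because promoting it would blow the budget, which mirrors the sensitive-head/robust-backbone split described above.
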
## Credits

- **Quantization:** [mlx-optiq](https://pypi.org/project/mlx-optiq/) by Thin Signal
- **Base model:** [YOLO26](https://github.com/ultralytics/ultralytics) by Ultralytics
- **MLX runtime:** [yolo-mlx](https://pypi.org/project/yolo-mlx/)
- **Framework:** [MLX](https://github.com/ml-explore/mlx) by Apple