How to use StephanST/C-radiov4_quantized with MLX:

```sh
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir C-radiov4_quantized StephanST/C-radiov4_quantized
```
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -30,6 +30,8 @@ https://github.com/stephansturges/c-radio_v4_MLX
 | --- | --- | --- | --- |
 | `so400m/8bit-affine` | `nvidia/C-RADIOv4-SO400M` | 8-bit affine, group size 64 | Compact/high-precision |
 | `h/8bit-affine` | `nvidia/C-RADIOv4-H` | 8-bit affine, group size 64 | Compact/high-precision |
 | `so400m/mxfp8` | `nvidia/C-RADIOv4-SO400M` | `mxfp8`, group size 32 | Experimental/lower precision |
 | `h/mxfp8` | `nvidia/C-RADIOv4-H` | `mxfp8`, group size 32 | Experimental/lower precision |

@@ -58,23 +60,35 @@ Measured against local bf16 MLX bundles at `512x512` on 12 WALDO crop images.
 | `so400m/mxfp8` | 0.989820 / 0.950717 | 0.993502 / 0.977879 |
 | `h/mxfp8` | 0.990217 / 0.974710 | 0.988696 / 0.976071 |

-The 8-bit affine bundles are the recommended compact/high-precision artifacts.
-[…]

 ## Speed Summary

-[…]

 | Bundle | p50 latency | Throughput |
 | --- | ---: | ---: |
-| `so400m/8bit-affine` | […]
-| `h/8bit-affine` | […]
 | `so400m/mxfp8` | 49.8 ms | 20.1 images/s |
 | `h/mxfp8` | 52.6 ms | 19.0 images/s |

-These are packed low-bit runtime measurements. The […]
-storage and lower runtime weight memory
-[…]

 ## Usage

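The Speed Summary table in the hunk above reports a p50 latency and a throughput per bundle. The following is a minimal sketch of how such figures could be collected, not the author's benchmark: `run_once` is a hypothetical zero-argument callable standing in for one batch-1 forward pass, and the relation throughput ≈ 1000 / p50 is an assumption that happens to agree with the rows shown.

```python
import statistics
import time


def throughput_from_p50(p50_ms):
    """Images per second at batch 1, assuming one image per timed call."""
    return 1000.0 / p50_ms


def benchmark(run_once, warmup=3, iters=20):
    """Return (p50 latency in ms, images/s) for a zero-argument callable.

    `run_once` is a hypothetical stand-in for one forward pass. With MLX,
    which evaluates lazily, it should force the result (for example with
    `mx.eval`) so that the timer covers the actual compute.
    """
    for _ in range(warmup):  # untimed warm-up calls
        run_once()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    p50_ms = statistics.median(samples_ms)  # p50 is the median sample
    return p50_ms, throughput_from_p50(p50_ms)


# The unchanged `so400m/mxfp8` row: 49.8 ms p50 corresponds to about 20.1 images/s.
print(round(throughput_from_p50(49.8), 1))
```

Reporting the median rather than the mean keeps a single slow iteration from skewing the latency figure, which is presumably why the table uses p50.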
@@ -103,6 +117,19 @@ cradio-mlx embed \
   --save-npz embedding.npz
 ```

 ## License

 The implementation code in `c-radio_v4_MLX` is MIT licensed. The model weights and these

@@ -30,6 +30,8 @@ https://github.com/stephansturges/c-radio_v4_MLX
 | --- | --- | --- | --- |
 | `so400m/8bit-affine` | `nvidia/C-RADIOv4-SO400M` | 8-bit affine, group size 64 | Compact/high-precision |
 | `h/8bit-affine` | `nvidia/C-RADIOv4-H` | 8-bit affine, group size 64 | Compact/high-precision |
+| `so400m/cider-w8a8` | `nvidia/C-RADIOv4-SO400M` | Cider W8A8, per-channel | M5+ compact/runtime low-bit |
+| `h/cider-w8a8` | `nvidia/C-RADIOv4-H` | Cider W8A8, per-channel | M5+ compact/runtime low-bit |
 | `so400m/mxfp8` | `nvidia/C-RADIOv4-SO400M` | `mxfp8`, group size 32 | Experimental/lower precision |
 | `h/mxfp8` | `nvidia/C-RADIOv4-H` | `mxfp8`, group size 32 | Experimental/lower precision |

@@ -58,23 +60,35 @@ Measured against local bf16 MLX bundles at `512x512` on 12 WALDO crop images.
 | `so400m/mxfp8` | 0.989820 / 0.950717 | 0.993502 / 0.977879 |
 | `h/mxfp8` | 0.990217 / 0.974710 | 0.988696 / 0.976071 |

+The 8-bit affine bundles are the recommended compact/high-precision artifacts. Cider W8A8
+is a real low-bit runtime path for Apple M5+ machines and trades a little more embedding
+drift for lower memory and modest speedups in some cells. The `mxfp8` bundles are included
+for experimentation and are lower precision in these checks.
+
+Smoke-image Cider W8A8 precision versus local bf16 MLX at `512x512`:
+
+| Bundle | Summary cosine | Spatial cosine |
+| --- | ---: | ---: |
+| `so400m/cider-w8a8` | 0.998164 | 0.998889 |
+| `h/cider-w8a8` | 0.997202 | 0.996210 |

 ## Speed Summary

+MLX measurements on Apple M5 Max at `512x512`, batch 1:

 | Bundle | p50 latency | Throughput |
 | --- | ---: | ---: |
+| `so400m/8bit-affine` | 49.6 ms | 20.2 images/s |
+| `h/8bit-affine` | 74.2 ms | 13.5 images/s |
+| `so400m/cider-w8a8` | 32.5 ms | 30.8 images/s |
+| `h/cider-w8a8` | 47.1 ms | 21.2 images/s |
 | `so400m/mxfp8` | 49.8 ms | 20.1 images/s |
 | `h/mxfp8` | 52.6 ms | 19.0 images/s |

+These are packed low-bit runtime measurements. The MLX affine and `mxfp8` bundles
+prioritize compact storage and lower runtime weight memory over throughput. Cider W8A8 is
+the faster low-bit runtime path found so far, but it requires Apple M5+ hardware and the
+optional Cider package.

 ## Usage

@@ -103,6 +117,19 @@ cradio-mlx embed \
   --save-npz embedding.npz
 ```

+Cider W8A8 bundles require Python `>=3.12`, Apple M5+ hardware, and Cider:
+
+```sh
+python -m pip install "cider @ git+https://github.com/Mininglamp-AI/cider.git"
+cradio-mlx embed \
+  --backend mlx-h \
+  --checkpoint h/cider-w8a8 \
+  --image image.jpg \
+  --image-size 512 \
+  --dtype bfloat16 \
+  --save-npz embedding.npz
+```
+
 ## License

 The implementation code in `c-radio_v4_MLX` is MIT licensed. The model weights and these
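
The precision tables in this diff compare quantized bundles with a bf16 reference by cosine similarity, and the usage commands write embeddings with `--save-npz`. The sketch below shows one way two such files could be compared. It rests on assumptions: the array names inside `embedding.npz` are not given in the diff, so the code compares whatever names both files share, and flattening each array before taking the cosine is only one possible definition, which may differ from how the README's summary and spatial figures were computed.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two arrays, each flattened to a single vector."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def compare_npz(reference_path, candidate_path):
    """Cosine similarity for every array name present in both .npz files."""
    with np.load(reference_path) as ref, np.load(candidate_path) as cand:
        shared = sorted(set(ref.files) & set(cand.files))
        return {name: cosine(ref[name], cand[name]) for name in shared}
```

A call such as `compare_npz("embedding_bf16.npz", "embedding_w8a8.npz")` (both file names hypothetical) returns a dictionary mapping each shared array name to its similarity. Values near 1.0 indicate little embedding drift from the reference.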