StephanST committed on
Commit 87a03f5 · verified · 1 parent: 029a4fb

Upload so400m/mxfp8/README.md with huggingface_hub

Files changed (1):
so400m/mxfp8/README.md +8 -10
so400m/mxfp8/README.md CHANGED
@@ -45,22 +45,21 @@ Against the local bf16 MLX bundle at `512x512` on 12 WALDO crop images:
 
 | Metric | Mean | Min |
 | --- | ---: | ---: |
-| Summary cosine | 0.989676 | 0.949449 |
-| Spatial cosine | 0.993379 | 0.978096 |
+| Summary cosine | 0.989820 | 0.950717 |
+| Spatial cosine | 0.993502 | 0.977879 |
 
 This is lower precision than the 8-bit affine bundle. Treat this as experimental.
 
 ## Measured Speed
 
-Fast-kernel compiled-forward MLX measurements at `512x512`, batch 1:
+Packed low-bit runtime, fast-kernel compiled-forward MLX at `512x512`, batch 1:
 
-| Runtime | p50 latency | Throughput |
-| --- | ---: | ---: |
-| packed | 49.8 ms | 20.1 images/s |
-| dequantize at load | 32.5 ms | 30.8 images/s |
+| p50 latency | Throughput |
+| ---: | ---: |
+| 49.8 ms | 20.1 images/s |
 
-`packed` keeps weights low-bit at runtime but is slower for this ViT encoder. Use
-`--quantized-runtime dequantize` when latency matters; it expands weights to bf16 at load.
+The bf16 SO400M bundle is faster on this workload when it fits. This bundle is experimental
+and lower precision than 8-bit affine.
 
 ## Usage
 
@@ -71,7 +70,6 @@ cradio-mlx embed \
 --image image.jpg \
 --image-size 512 \
 --dtype bfloat16 \
---quantized-runtime dequantize \
 --save-npz embedding.npz
 ```
 
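The README section changed in this commit reports per-image cosine similarity against the bf16 bundle, reduced to a mean and a min over 12 images. Its throughput figures appear to follow directly from the batch-1 p50 latency (1000 / 49.8 ms ≈ 20.1 images/s; 1000 / 32.5 ms ≈ 30.8 images/s). Below is a minimal pure-Python sketch of both calculations. It assumes the embeddings are already loaded as flat lists of floats. The README does not say how the summary and spatial outputs are read from `embedding.npz` or how spatial features are flattened or pooled before comparison, so those steps are left out. The function names are hypothetical and are not part of `cradio-mlx`.

```python
import math
from statistics import fmean


def cosine(a, b):
    """Cosine similarity between two equal-length flat vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_mean_min(reference, candidate):
    """Per-image cosine between reference (bf16) and candidate (e.g. mxfp8)
    embeddings, reduced to (mean, min) like the README's metric table.
    Hypothetical helper: assumes one flat vector per image on each side."""
    if len(reference) != len(candidate):
        raise ValueError("need one candidate embedding per reference image")
    scores = [cosine(r, c) for r, c in zip(reference, candidate)]
    return fmean(scores), min(scores)


def throughput_from_p50(latency_ms, batch_size=1):
    """Images per second implied by a per-batch latency in milliseconds."""
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    # Toy 2-D vectors standing in for two images' embeddings (made-up data).
    mean, worst = cosine_mean_min([[1.0, 0.0], [1.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]])
    print(f"mean={mean:.4f} min={worst:.4f}")  # prints mean=0.8536 min=0.7071
    # Batch-1 p50 latency of the packed runtime from the speed table.
    print(f"{throughput_from_p50(49.8):.1f} images/s")  # prints 20.1 images/s
```

Reporting the min alongside the mean is a useful design choice here. A single badly degraded image (for example, the 0.95 summary-cosine minimum) can be hidden by a mean that stays near 0.99.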