lpalbou committed on
Commit 358f022 · verified · 1 Parent(s): a6b8045

Put validation values directly in comparison table

Files changed (1): README.md (+5 -11)
@@ -51,20 +51,14 @@ Bottom line:
  - The BF16 package reduces storage, not runtime memory.
  - This mixed q8/BF16 package reduces both storage and runtime memory. This is the package to use when generation memory footprint matters.
 
- | Layout | Disk | Runtime Memory | Improvement |
- | --- | ---: | --- | --- |
- | Original source snapshot | 118 GiB | Baseline | Baseline. |
- | BF16 package | 64 GiB | Same class as original | Storage only; output was byte-identical. |
- | This mixed q8/BF16 package | 40 GiB | Lower | Storage and memory; side-by-side quality validation passed. |
+ | Layout | Disk | MLX Peak | Physical Peak | Time | Result |
+ | --- | ---: | ---: | ---: | ---: | --- |
+ | Original source snapshot | 118 GiB | 32.99 GiB | 48.90 GiB | 108.31 s | Baseline. |
+ | BF16 package | 64 GiB | 32.98 GiB | 45.12 GiB | 114.39 s | Storage only; output was byte-identical. |
+ | This mixed q8/BF16 package | 40 GiB | 20.84 GiB | 31.75 GiB | 110.34 s | Storage and memory; side-by-side quality validation passed. |
 
  Compared with the original source snapshot, this mixed q8/BF16 package cuts disk usage by about 66%, MLX peak memory by about 37%, and physical peak memory by about 35% in this validation run. It is not byte-identical to BF16, but the validation contact sheet stayed in the same visual family. The prepared q8/BF16 output was byte-identical to running `--quantize 8` from the upstream source snapshot.
 
- Raw measurements:
-
- - Original source snapshot: 32.99 GiB MLX peak, 48.90 GiB physical peak, 108.31 s.
- - BF16 package: 32.98 GiB MLX peak, 45.12 GiB physical peak, 114.39 s.
- - This mixed q8/BF16 package: 20.84 GiB MLX peak, 31.75 GiB physical peak, 110.34 s.
-
  ## Compatibility
 
  Requires `mlx-gen >= 0.18.8`.
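
The percentages quoted in the README paragraph (about 66%, 37%, and 35%) can be cross-checked against the values this commit moves into the comparison table. The following is a minimal sketch of that arithmetic. The dictionary keys and the `percent_reduction` helper are illustrative names chosen here, not part of the repository or of `mlx-gen`.

```python
# Values copied from the comparison table, all in GiB.
# The key names are hypothetical labels used only for this check.
measurements = {
    "original": {"disk": 118.0, "mlx_peak": 32.99, "physical_peak": 48.90},
    "mixed_q8_bf16": {"disk": 40.0, "mlx_peak": 20.84, "physical_peak": 31.75},
}


def percent_reduction(baseline: float, value: float) -> float:
    """Return how much smaller `value` is than `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


# Compare the mixed q8/BF16 package against the original source snapshot.
reductions = {
    metric: percent_reduction(
        measurements["original"][metric],
        measurements["mixed_q8_bf16"][metric],
    )
    for metric in measurements["original"]
}

for metric, pct in reductions.items():
    print(f"{metric}: {pct:.1f}% lower than the original snapshot")
```

Rounded to whole percentages, the three results match the figures stated in the paragraph.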