lpalbou committed on
Commit a6b8045 · verified · 1 Parent(s): 25db82c

Improve validation summary readability

Files changed (1)
  1. README.md +18 -9
README.md CHANGED
@@ -44,17 +44,26 @@ See the [MLX-Gen quantization docs](https://github.com/lpalbou/mlx-gen/blob/main
 
  ## Local Validation
 
- Validation used 384x224, 17 frames, 12 denoising steps, guidance 4, guidance-2 3, fps 8, and seed 4242 on Apple Silicon. Memory is measured for the full package run from model init through video save. RSS alone is not enough for MLX/Metal unified-memory pressure, so the table reports both MLX allocator peak and Darwin physical footprint.
 
- | Package | Folder Size | Runtime Precision | Mode | MLX Peak | Physical Footprint Peak | Time | Notes |
- | --- | ---: | --- | --- | ---: | ---: | ---: | --- |
- | Upstream source snapshot | 118 GiB | BF16 runtime from FP32/BF16 source files | release inactive denoiser | 32.99 GiB | 48.90 GiB | 108.31 s | Baseline source-cache run. |
- | BF16 prepared folder | 64 GiB | BF16 | release inactive denoiser | 32.98 GiB | 45.12 GiB | 114.39 s | Output was byte-identical to the source-cache run. |
- | This mixed q8/BF16 folder | 40 GiB | q8 transformer block linears, BF16 sensitive paths and VAE | release inactive denoiser | 20.84 GiB | 31.75 GiB | 110.34 s | Lower usage memory; not byte-identical to BF16 but passed side-by-side validation. |
- | This mixed q8/BF16 folder | 40 GiB | q8 transformer block linears, BF16 sensitive paths and VAE | no denoiser release | 35.19 GiB | 50.55 GiB | 108.10 s | Retains both quantized A14B denoisers through decode. |
- | This mixed q8/BF16 folder | 40 GiB | q8 transformer block linears, BF16 sensitive paths and VAE | low-RAM release | 15.48 GiB | 20.74 GiB | 108.70 s | Lowest measured usage-memory mode. |
 
- Within each package/precision, release/no-release/low-RAM outputs were byte-identical. This prepared q8 folder was also byte-identical to runtime `--quantize 8` from the upstream source snapshot. q8 is not byte-identical to BF16, but the validation contact sheet stayed in the same visual family.
 
  ## Compatibility
 
  ## Local Validation
 
+ Validation used 384x224, 17 frames, 12 denoising steps, guidance 4, guidance-2 3, fps 8, and seed 4242 on Apple Silicon. The memory numbers cover the full run from model init through video save. MLX peak is the MLX allocator peak; physical peak is the Darwin process physical footprint, which better reflects Apple Silicon unified-memory pressure than RSS alone.
 
+ Bottom line:
 
+ - The BF16 package reduces storage, not runtime memory.
+ - This mixed q8/BF16 package reduces both storage and runtime memory. This is the package to use when generation memory footprint matters.
+
+ | Layout | Disk | Runtime Memory | Improvement |
+ | --- | ---: | --- | --- |
+ | Original source snapshot | 118 GiB | Baseline | Baseline. |
+ | BF16 package | 64 GiB | Same class as original | Storage only; output was byte-identical. |
+ | This mixed q8/BF16 package | 40 GiB | Lower | Storage and memory; side-by-side quality validation passed. |
+
+ Compared with the original source snapshot, this mixed q8/BF16 package cuts disk usage by about 66%, MLX peak memory by about 37%, and physical peak memory by about 35% in this validation run. It is not byte-identical to BF16, but the validation contact sheet stayed in the same visual family. The prepared q8/BF16 output was byte-identical to running `--quantize 8` from the upstream source snapshot.
+
+ Raw measurements:
+
+ - Original source snapshot: 32.99 GiB MLX peak, 48.90 GiB physical peak, 108.31 s.
+ - BF16 package: 32.98 GiB MLX peak, 45.12 GiB physical peak, 114.39 s.
+ - This mixed q8/BF16 package: 20.84 GiB MLX peak, 31.75 GiB physical peak, 110.34 s.
 
  ## Compatibility
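The new README text quotes rounded reductions for the mixed q8/BF16 package: about 66% disk, 37% MLX peak, and 35% physical peak. These figures can be re-derived from the raw measurements. The Python sketch below is a minimal check under stated assumptions:

- It assumes each reduction is computed as (baseline - candidate) / baseline against the original source snapshot. The commit does not state its formula.
- The names `BASELINE`, `Q8_BF16`, `percent_reduction`, and `reductions` are illustrative only.

```python
"""Recompute the relative reductions quoted in the validation notes.

Values are copied from the layout table and the raw measurements (GiB).
Assumption: reduction = (baseline - candidate) / baseline * 100.
"""

# Original source snapshot vs. the mixed q8/BF16 package, all in GiB.
BASELINE = {"disk": 118.0, "mlx_peak": 32.99, "physical_peak": 48.90}
Q8_BF16 = {"disk": 40.0, "mlx_peak": 20.84, "physical_peak": 31.75}


def percent_reduction(baseline: float, candidate: float) -> float:
    """Return how much smaller `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


def reductions(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Per-metric percentage reduction, rounded to one decimal place."""
    return {
        metric: round(percent_reduction(baseline[metric], candidate[metric]), 1)
        for metric in baseline
    }


if __name__ == "__main__":
    for metric, pct in reductions(BASELINE, Q8_BF16).items():
        print(f"{metric}: {pct:.1f}% lower than the original snapshot")
```

Run as a script, it prints 66.1% for disk, 36.8% for MLX peak, and 35.1% for physical peak. These values are consistent with the rounded figures in the README.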