AbstractFramework/wan2.2-t2v-a14b-diffusers-bf16

@@ -37,16 +37,26 @@ For Wan checkpoints, MLX-Gen loads transformer and VAE weights at BF16 runtime p
 ## Local Validation
-Validation used 384x224, 17 frames, 12 denoising steps, guidance 4, guidance-2 3, fps 8, and seed 4242 on Apple Silicon. Memory is measured for the full package run from model init through video save. RSS alone is not enough for MLX/Metal unified-memory pressure, so the table reports both MLX allocator peak and Darwin physical footprint.
-| Package | Folder Size | Runtime Precision | Mode | MLX Peak | Physical Footprint Peak | Time | Notes |
-| --- | ---: | --- | --- | ---: | ---: | ---: | --- |
-| Upstream source snapshot | 118 GiB | BF16 runtime from FP32/BF16 source files | release inactive denoiser | 32.99 GiB | 48.90 GiB | 108.31 s | Baseline source-cache run. |
-| This BF16 prepared folder | 64 GiB | BF16 | release inactive denoiser | 32.98 GiB | 45.12 GiB | 114.39 s | Output was byte-identical to the source-cache run. |
-| This BF16 prepared folder | 64 GiB | BF16 | no denoiser release | 59.72 GiB | 75.14 GiB | 110.97 s | Retains both A14B denoisers through decode. |
-| This BF16 prepared folder | 64 GiB | BF16 | low-RAM release | 27.75 GiB | 33.18 GiB | 110.76 s | Releases denoisers before decode and clears per-step cache. |
-| Mixed q8/BF16 prepared folder | 40 GiB | q8 transformer block linears, BF16 sensitive paths and VAE | release inactive denoiser | 20.84 GiB | 31.75 GiB | 110.34 s | Lower usage memory; not byte-identical to BF16 but passed side-by-side validation. |
-| Mixed q8/BF16 prepared folder | 40 GiB | q8 transformer block linears, BF16 sensitive paths and VAE | low-RAM release | 15.48 GiB | 20.74 GiB | 108.70 s | Lowest measured usage-memory mode. |
 ## Compatibility

 ## Local Validation
+Validation used 384x224, 17 frames, 12 denoising steps, guidance 4, guidance-2 3, fps 8, and seed 4242 on Apple Silicon. The memory numbers cover the full run from model init through video save. MLX peak is the MLX allocator peak; physical peak is the Darwin process physical footprint, which better reflects Apple Silicon unified-memory pressure than RSS alone.
+Bottom line:
+- The BF16 package reduces storage, not runtime memory. It is useful when you want a smaller, uploadable package with byte-identical output to the original source-cache run.
+- The mixed q8/BF16 package reduces both storage and runtime memory. Use it when memory footprint matters.
+| Layout | Disk | Runtime Memory | Improvement |
+| --- | ---: | --- | --- |
+| Original source snapshot | 118 GiB | Baseline | Baseline. |
+| This BF16 package | 64 GiB | Same class as original | Storage only; output was byte-identical. |
+| Mixed q8/BF16 package | 40 GiB | Lower | Storage and memory; side-by-side quality validation passed. |
+Compared with the original source snapshot, this BF16 package cuts disk usage by about 46% (118 GiB to 64 GiB) but does not materially reduce generation memory: MLX peak is unchanged and physical peak is about 8% lower. The mixed q8/BF16 package cuts disk usage by about 66% (118 GiB to 40 GiB) and physical peak memory by about 35% (48.90 GiB to 31.75 GiB) in this validation run.
+Raw measurements:
+- Original source snapshot: 32.99 GiB MLX peak, 48.90 GiB physical peak, 108.31 s.
+- This BF16 package: 32.98 GiB MLX peak, 45.12 GiB physical peak, 114.39 s.
+- Mixed q8/BF16 package: 20.84 GiB MLX peak, 31.75 GiB physical peak, 110.34 s.
 ## Compatibility
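
The percentages quoted in the added summary can be re-derived from the raw measurements. The following is a minimal sketch for checking that arithmetic, not part of the package. The dictionary keys and the helper names `reduction_pct` and `summarize` are hypothetical, chosen for illustration; the numbers are copied from the table and the raw-measurement list in the diff.

```python
# Figures copied from the diff above (GiB). Key names are illustrative only.
measurements = {
    "original": {"disk_gib": 118, "mlx_peak_gib": 32.99, "physical_peak_gib": 48.90},
    "bf16": {"disk_gib": 64, "mlx_peak_gib": 32.98, "physical_peak_gib": 45.12},
    "mixed_q8_bf16": {"disk_gib": 40, "mlx_peak_gib": 20.84, "physical_peak_gib": 31.75},
}


def reduction_pct(baseline, value):
    """Return how much smaller `value` is than `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


def summarize(name):
    """Percentage reduction of each metric for package `name` vs. the original snapshot."""
    base = measurements["original"]
    pkg = measurements[name]
    return {key: round(reduction_pct(base[key], pkg[key]), 1) for key in base}


if __name__ == "__main__":
    for name in ("bf16", "mixed_q8_bf16"):
        print(name, summarize(name))
```

Under these figures the BF16 package's MLX peak is essentially unchanged while its physical peak is a few percent lower, which is consistent with the "storage only" framing; the mixed q8/BF16 package reduces both peaks by roughly a third.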