ApacheOne committed · verified
Commit 5cbeabb · 1 Parent(s): e48d653

Update README.md

Files changed (1): README.md (+4 −1)

README.md CHANGED
@@ -23,4 +23,7 @@ This isn't the perfect balance between nvfp4 layers and Dfloat11 compressed layers.
 `flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors`
 
 Other models I have done get 86% size at 100% accuracy doing plain Dfloat11 compression,
-and around 74.4% size 100% accuracy doing nvfp4 mixed with Dfloat11 compression.
+and around 74.4% size at 100% accuracy doing nvfp4 mixed with Dfloat11 compression.
+
+A balance needs to be found between the layers we want at nvfp4 speed and those kept in Dfloat11 lossless compression, which is slower than bf16 but faster than offloading the model into RAM.
+This matters more for larger models with many bf16 layers. Wan, Qwen, and LTX are high on the list to do.
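For scale, the quoted size ratios can be sanity-checked with a back-of-envelope sketch. The snippet below is not from this repo: it assumes nvfp4 stores weights at roughly 0.25× their bf16 size (4 bits vs 16, scale-factor overhead ignored) and reuses the 86% plain-DFloat11 ratio quoted above; the function name and the 0.25 figure are illustrative assumptions.

```python
# Back-of-envelope sketch (assumptions, not repo code): estimate the size of a
# checkpoint that mixes nvfp4 and DFloat11 layers, relative to plain bf16.

DFLOAT11_RATIO = 0.86  # size vs bf16 quoted above for plain DFloat11
NVFP4_RATIO = 0.25     # assumed: 4 bits vs 16 bits, scale overhead ignored

def mixed_size_ratio(nvfp4_fraction: float) -> float:
    """Checkpoint size relative to bf16, given the fraction of parameters
    stored as nvfp4 (the remainder stays DFloat11-compressed)."""
    return nvfp4_fraction * NVFP4_RATIO + (1 - nvfp4_fraction) * DFLOAT11_RATIO

# Under these assumptions, the reported ~74.4% overall size corresponds to
# roughly 19% of parameters living in nvfp4 layers:
fraction = (DFLOAT11_RATIO - 0.744) / (DFLOAT11_RATIO - NVFP4_RATIO)
print(round(fraction, 2))                     # ~0.19
print(round(mixed_size_ratio(fraction), 3))   # ~0.744
```

Pushing more layers into nvfp4 shrinks the file further (and speeds them up), but each converted layer gives up DFloat11's lossless guarantee, which is the balance described above.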