Update README.md
README.md
This isn't the perfect balance between nvfp4 layers and Dfloat11 compressed layers.
`flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors`
Other models I have done get 86% size at 100% accuracy with plain Dfloat11 compression, and around 74.4% size at 100% accuracy with nvfp4 mixed with Dfloat11 compression.
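The blended size follows directly from how many parameters go to each format. A minimal sketch of that arithmetic, assuming the 86% Dfloat11 ratio above and an assumed ~28% ratio for nvfp4 weights plus scales versus bf16 (that number is an illustration, not measured from this repo):

```python
DFLOAT11_RATIO = 0.86  # plain Dfloat11 size vs original, from the figure above
NVFP4_RATIO = 0.28     # assumption: nvfp4 weights + block scales vs bf16

def mixed_size_ratio(nvfp4_fraction: float) -> float:
    """Blend the two per-layer ratios by the fraction of parameters in nvfp4."""
    return nvfp4_fraction * NVFP4_RATIO + (1 - nvfp4_fraction) * DFLOAT11_RATIO

# Under these assumed ratios, ~20% of parameters in nvfp4 lands at 74.4%.
print(round(mixed_size_ratio(0.2), 3))  # → 0.744
```

The same formula works in reverse: given a target size, you can solve for the nvfp4 fraction needed.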
A balance needs to be found between the layers where we want nvfp4 speed and the layers kept in Dfloat11 lossless compression, which is slower than bf16 but still faster than offloading the model into RAM.
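One way to sketch that balancing decision: spend an nvfp4 "budget" on the most compute-heavy layers first (where the speed matters most) and leave the rest in Dfloat11. The layer names, cost numbers, and greedy heuristic here are all hypothetical illustrations, not the method used for this checkpoint:

```python
def split_layers(costs: dict[str, float], nvfp4_budget: float) -> dict[str, str]:
    """Greedily assign 'nvfp4' to the highest-cost layers until the budget
    (as a fraction of total cost) is spent; everything else stays 'dfloat11'."""
    total = sum(costs.values())
    plan, spent = {}, 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        if spent + cost <= nvfp4_budget * total:
            plan[name] = "nvfp4"
            spent += cost
        else:
            plan[name] = "dfloat11"
    return plan

# Made-up per-layer compute costs; put ~half the compute on the nvfp4 path.
layers = {"attn.0": 4.0, "mlp.0": 6.0, "attn.1": 3.0, "mlp.1": 7.0}
print(split_layers(layers, nvfp4_budget=0.5))
```

A real split would also weigh per-layer quantization sensitivity, not just compute cost, but the budget framing stays the same.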
This matters more for larger models with many bf16 layers. Wan, Qwen, and LTX are high on the list to do next.