Update README.md
README.md
This isn't the perfect balance between nvfp4 layers and Dfloat11 compressed layers.
`flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors`
Other models I have done get 86% size at 100% accuracy with plain Dfloat11 compression, and around 74.4% size at 100% accuracy with nvfp4 mixed with Dfloat11 compression.
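The blended size follows directly from how many parameters go to each format. A minimal sketch of that arithmetic, assuming the 86% Dfloat11 ratio above and an assumed ~28% ratio for nvfp4 weights plus scales versus bf16 (that number is an illustration, not measured from this repo):

```python
DFLOAT11_RATIO = 0.86  # plain Dfloat11 size vs original, from the figure above
NVFP4_RATIO = 0.28     # assumption: nvfp4 weights + block scales vs bf16

def mixed_size_ratio(nvfp4_fraction: float) -> float:
    """Blend the two per-layer ratios by the fraction of parameters in nvfp4."""
    return nvfp4_fraction * NVFP4_RATIO + (1 - nvfp4_fraction) * DFLOAT11_RATIO

# Under these assumed ratios, ~20% of parameters in nvfp4 lands at 74.4%.
print(round(mixed_size_ratio(0.2), 3))  # → 0.744
```

The same formula works in reverse: given a target size, you can solve for the nvfp4 fraction needed.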
A balance needs to be found between the layers where we want nvfp4 speed and the layers kept in Dfloat11 lossless compression, which is slower than bf16 but still faster than offloading the model into RAM.
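One way to sketch that balancing decision: spend an nvfp4 "budget" on the most compute-heavy layers first (where the speed matters most) and leave the rest in Dfloat11. The layer names, cost numbers, and greedy heuristic here are all hypothetical illustrations, not the method used for this checkpoint:

```python
def split_layers(costs: dict[str, float], nvfp4_budget: float) -> dict[str, str]:
    """Greedily assign 'nvfp4' to the highest-cost layers until the budget
    (as a fraction of total cost) is spent; everything else stays 'dfloat11'."""
    total = sum(costs.values())
    plan, spent = {}, 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        if spent + cost <= nvfp4_budget * total:
            plan[name] = "nvfp4"
            spent += cost
        else:
            plan[name] = "dfloat11"
    return plan

# Made-up per-layer compute costs; put ~half the compute on the nvfp4 path.
layers = {"attn.0": 4.0, "mlp.0": 6.0, "attn.1": 3.0, "mlp.1": 7.0}
print(split_layers(layers, nvfp4_budget=0.5))
```

A real split would also weigh per-layer quantization sensitivity, not just compute cost, but the budget framing stays the same.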
This matters more for larger models with many bf16 layers. Wan, Qwen, and LTX are high on the list to do next.