| --- |
| base_model_relation: quantized |
| tags: |
| - dfloat11 |
| - nvfp4 |
| - df11 |
| - lossless compression |
| - 70% size, 100% accuracy |
| - 86% size 100% accuracy |
| - 74.4% size 100% accuracy |
| language: |
| - en |
| base_model: |
| - black-forest-labs/FLUX.2-klein-4B |
| --- |
| |
| Just some testing. |
|
|
| The BF16 layers are compressed with DFloat11 lossless compression, mixed with NVFP4-quantized layers.
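
A minimal sketch (not the official DFloat11 implementation) of why BF16 weights compress losslessly to roughly 70%: the 8 exponent bits of trained weights are low-entropy, so entropy-coding them takes only a few bits on average while the sign and mantissa stay raw. The Gaussian weight distribution below is an assumption for illustration.

```python
import numpy as np

# Illustrative stand-in for trained weights (assumed distribution).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# BF16 is the top 16 bits of float32: 1 sign, 8 exponent, 7 mantissa bits.
bits = w.view(np.uint32)
exponent = (bits >> 23) & 0xFF

# Shannon entropy of the exponent field, in bits per weight.
_, counts = np.unique(exponent, return_counts=True)
p = counts / counts.sum()
h_exp = -(p * np.log2(p)).sum()

# Ideal lossless size: raw sign + mantissa, entropy-coded exponent.
bits_per_weight = 1 + 7 + h_exp
print(f"exponent entropy: {h_exp:.2f} bits")
print(f"~{bits_per_weight:.1f} bits/weight vs 16 for BF16 "
      f"({bits_per_weight / 16:.0%} of original size)")
```

The ~11 bits/weight this lands on is where the "DF11" name comes from.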
|
|
| This isn't the perfect balance between NVFP4 layers and DFloat11-compressed layers (the ideal split varies a good amount from model to model), but it is a start.
|
|
| `flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors` |
|
|
| Other models I have done reach 86% of the original size with 100% accuracy using plain DFloat11 compression,
| and around 74.4% size with 100% accuracy using NVFP4 mixed with DFloat11 compression.
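
A back-of-envelope sketch of what those two figures imply, under assumed bit widths (NVFP4 ≈ 4.5 bits/weight including block scales, DFloat11 ≈ 11 bits/weight): solving for the fraction of weights that must move from DFloat11 to NVFP4 to drop from 86% to 74.4% of BF16 size.

```python
# Assumed effective bits per weight; not measured from this checkpoint.
BF16, DF11, NVFP4 = 16.0, 11.0, 4.5

df11_ratio = 0.86    # reported: plain DFloat11 model size vs BF16
mixed_ratio = 0.744  # reported: NVFP4 + DFloat11 mix vs BF16

# Each weight moved to NVFP4 saves (DF11 - NVFP4) / BF16 of the original size.
fraction_nvfp4 = (df11_ratio - mixed_ratio) * BF16 / (DF11 - NVFP4)
print(f"~{fraction_nvfp4:.0%} of weights in NVFP4")
```

That is only arithmetic over the reported ratios; the real split depends on which tensors the compressor touches.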
|
|
| The balance to find is between layers where we want NVFP4 speed and layers kept lossless with DFloat11, which is slower than plain BF16 but still faster than offloading the model to RAM.
| This matters more for larger models with many BF16 layers; Wan, Qwen, and LTX are high on the list to do next.