| --- |
| base_model_relation: quantized |
| tags: |
| - dfloat11 |
| - nvfp4 |
| - df11 |
| - lossless compression |
| - 70% size, 100% accuracy |
| - 86% size 100% accuracy |
| - 74.4% size 100% accuracy |
| language: |
| - en |
| base_model: |
| - black-forest-labs/FLUX.2-klein-4B |
| --- |
| |
| Just some testing. |
|
|
| The BF16 layers are compressed with DFloat11 lossless compression, mixed with NVFP4-quantized layers.
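
A minimal sketch (not the official DFloat11 implementation) of why BF16 weights compress losslessly to roughly 70%: the 8 exponent bits of trained weights are low-entropy, so entropy-coding them takes only a few bits on average while the sign and mantissa stay raw. The Gaussian weight distribution below is an assumption for illustration.

```python
import numpy as np

# Illustrative stand-in for trained weights (assumed distribution).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# BF16 is the top 16 bits of float32: 1 sign, 8 exponent, 7 mantissa bits.
bits = w.view(np.uint32)
exponent = (bits >> 23) & 0xFF

# Shannon entropy of the exponent field, in bits per weight.
_, counts = np.unique(exponent, return_counts=True)
p = counts / counts.sum()
h_exp = -(p * np.log2(p)).sum()

# Ideal lossless size: raw sign + mantissa, entropy-coded exponent.
bits_per_weight = 1 + 7 + h_exp
print(f"exponent entropy: {h_exp:.2f} bits")
print(f"~{bits_per_weight:.1f} bits/weight vs 16 for BF16 "
      f"({bits_per_weight / 16:.0%} of original size)")
```

The ~11 bits/weight this lands on is where the "DF11" name comes from.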
|
|
| This isn't the perfect balance between NVFP4 layers and DFloat11-compressed layers (the ideal split varies a good amount from model to model), but it is a start.
|
|
| `flux-2-klein-4b-nvfp4_nvfp4_dfloat11.safetensors` |
|
|
| Other models I have done reach 86% of the original size with 100% accuracy using plain DFloat11 compression,
| and around 74.4% size with 100% accuracy using NVFP4 mixed with DFloat11 compression.
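
A back-of-envelope sketch of what those two figures imply, under assumed bit widths (NVFP4 ≈ 4.5 bits/weight including block scales, DFloat11 ≈ 11 bits/weight): solving for the fraction of weights that must move from DFloat11 to NVFP4 to drop from 86% to 74.4% of BF16 size.

```python
# Assumed effective bits per weight; not measured from this checkpoint.
BF16, DF11, NVFP4 = 16.0, 11.0, 4.5

df11_ratio = 0.86    # reported: plain DFloat11 model size vs BF16
mixed_ratio = 0.744  # reported: NVFP4 + DFloat11 mix vs BF16

# Each weight moved to NVFP4 saves (DF11 - NVFP4) / BF16 of the original size.
fraction_nvfp4 = (df11_ratio - mixed_ratio) * BF16 / (DF11 - NVFP4)
print(f"~{fraction_nvfp4:.0%} of weights in NVFP4")
```

That is only arithmetic over the reported ratios; the real split depends on which tensors the compressor touches.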
|
|
| The balance to find is between layers where we want NVFP4 speed and layers kept lossless with DFloat11, which is slower than plain BF16 but still faster than offloading the model to RAM.
| This matters more for larger models with many BF16 layers; Wan, Qwen, and LTX are high on the list to do next.