---
base_model:
- black-forest-labs/FLUX.1-dev
base_model_relation: quantized
pipeline_tag: text-to-image
tags:
- dfloat11
- df11
- lossless compression
- 70% size, 100% accuracy
---

## DFloat11 Compressed Model: `black-forest-labs/FLUX.1-dev`

This is a **losslessly compressed** version of [`black-forest-labs/FLUX.1-dev`](https://huggingface.co/black-forest-labs/FLUX.1-dev) using our custom **DFloat11** format. The outputs of this compressed model are **bit-for-bit identical** to the original BFloat16 model, while reducing GPU memory consumption by approximately **30%**.

### 🔍 How It Works

DFloat11 compresses model weights using **Huffman coding** of BFloat16 exponent bits, combined with **hardware-aware algorithmic designs** that enable efficient on-the-fly decompression directly on the GPU. During inference, the weights remain compressed in GPU memory and are **decompressed just before matrix multiplications**, then **immediately discarded after use** to minimize the memory footprint.

Key benefits:

* **No CPU decompression or host-device data transfer**: all operations are handled entirely on the GPU.
* DFloat11 is **much faster than CPU-offloading approaches**, enabling practical deployment in memory-constrained environments.
* The compression is **fully lossless**, guaranteeing that the model's outputs are **bit-for-bit identical** to those of the original model.

### 🔧 How to Use

1. Install or upgrade the DFloat11 package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:

    ```bash
    pip install dfloat11[cuda12]
    # or if you have CUDA version 11:
    # pip install dfloat11[cuda11]
    ```

2. Install or upgrade the diffusers package:

    ```bash
    pip install -U diffusers
    ```
3. Save the following code as a Python file `flux1.py`:

    ```python
    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel
    from dfloat11 import DFloat11Model
    from transformers.modeling_utils import no_init_weights

    # Build an empty bfloat16 transformer skeleton without materializing
    # the original weights, which will be replaced by DFloat11 weights.
    with no_init_weights():
        transformer = FluxTransformer2DModel.from_config(
            FluxTransformer2DModel.load_config(
                "black-forest-labs/FLUX.1-dev", subfolder="transformer"
            )
        ).to(torch.bfloat16)

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )

    # Load the losslessly compressed DFloat11 weights into the transformer.
    DFloat11Model.from_pretrained(
        "DFloat11/FLUX.1-dev-DF11",
        device="cpu",
        bfloat16_model=pipe.transformer,
    )

    pipe.enable_model_cpu_offload()

    prompt = "A scenic landscape with mountains, a river, and a clear sky."
    image = pipe(
        prompt,
        width=1024,
        height=1024,
        guidance_scale=3.5,
        num_inference_steps=50,
        max_sequence_length=512,
        generator=torch.Generator(device="cuda").manual_seed(0),
    ).images[0]
    image.save("image.png")
    ```

4. Run `python flux1.py` in your terminal.

### 📄 Learn More

* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)
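The intuition behind the "How It Works" section above can be sketched numerically. The snippet below is a standalone illustration, not the actual DFloat11 implementation: it assumes Gaussian-distributed synthetic weights (a common stand-in for trained model weights) and builds an ordinary Huffman code over their BFloat16 exponent bits to estimate how many bits per exponent are actually needed, compared to the 8 bits of fixed-width storage.

```python
import heapq
import random
import struct
from collections import Counter

# Hypothetical illustration: the weight distribution and sample count below
# are assumptions for demonstration, not taken from the FLUX.1-dev checkpoint.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(100_000)]

def bf16_exponent(x: float) -> int:
    # float32 bit layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
    # BFloat16 keeps the sign, the full 8-bit exponent, and the top 7
    # mantissa bits, so the exponent field is identical in both formats.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF

freqs = Counter(bf16_exponent(w) for w in weights)

# Build a Huffman code over the observed exponent values. Each heap entry is
# (total count, unique tiebreaker, {symbol: code-so-far}).
heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freqs.items())]
heapq.heapify(heap)
uid = len(heap)
while len(heap) > 1:
    n1, _, c1 = heapq.heappop(heap)
    n2, _, c2 = heapq.heappop(heap)
    merged = {s: "0" + code for s, code in c1.items()}
    merged.update({s: "1" + code for s, code in c2.items()})
    heapq.heappush(heap, (n1 + n2, uid, merged))
    uid += 1
codebook = heap[0][2]

# Average code length vs. the fixed 8 bits per exponent; sign (1 bit) and
# mantissa (7 bits) stay uncompressed, as in the DFloat11 paper's scheme.
total = sum(freqs.values())
avg_bits = sum(freqs[s] * len(codebook[s]) for s in freqs) / total
print(f"distinct exponent values observed: {len(freqs)} of 256")
print(f"average Huffman bits per exponent: {avg_bits:.2f} (vs 8 fixed)")
print(f"estimated size vs BFloat16: {(1 + avg_bits + 7) / 16:.1%}")
```

Because trained weights occupy a narrow dynamic range, only a small fraction of the 256 possible exponent values occurs with any frequency, which is why entropy-coding exactly this field yields roughly 30% savings while the sign and mantissa bits are stored as-is.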