For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11
This is my first time using DF11 to compress a model outside the Flux architecture.
The process for compressing Flux-based models is much more straightforward than for other architectures: the compression code requires a `pattern_dict` as input, but the original example code only provides one for Flux, so I had to learn the notation myself and adapt it to other models.
After a long wait, the compressed model works fine on my RTX 4090 and fits fully into 24 GB of VRAM, unlike the uncompressed version. Do let me know if you run into any problems.
This is the `pattern_dict` I used for compression:
```python
pattern_dict = {
    r"transformer_blocks\.\d+": (
        "norm1.linear_1",
        "norm1.linear_2",
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "norm2.linear_1",
        "norm2.linear_2",
        "attn2.to_q",
        "attn2.to_k",
        "attn2.to_v",
        "attn2.to_out.0",
        "norm3.linear_1",
        "norm3.linear_2",
        "ff.net.0.proj",
        "ff.net.2",
    ),
}
```
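As I understand the notation, each key of the `pattern_dict` is a regular expression matched against module paths in the model (as reported by `named_modules()` in PyTorch), and the tuple lists the linear submodules inside each matching block whose weights get compressed. A minimal sketch of that matching, with hypothetical module paths for illustration:

```python
import re

# The key from the pattern_dict above: matches every numbered transformer block.
pattern = re.compile(r"transformer_blocks\.\d+")

# Hypothetical module paths of the kind model.named_modules() would yield.
paths = [
    "transformer_blocks.0",
    "transformer_blocks.27",
    "patch_embed",  # not matched: sits outside the transformer blocks
]
for p in paths:
    print(p, "->", bool(pattern.fullmatch(p)))
```

Whether the library uses `fullmatch` or a prefix match internally is an implementation detail I have not verified; the point is that one regex key covers all repeated blocks, so you only spell out the per-block submodule names once.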
How to Use
diffusers
Install the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):
```bash
pip install dfloat11[cuda12]
# or if you have CUDA version 11:
# pip install dfloat11[cuda11]
```

To use the DFloat11 model, run the following example code in Python:
```python
import torch
from diffusers import Cosmos2TextToImagePipeline, CosmosTransformer3DModel
from dfloat11 import DFloat11Model
from transformers.modeling_utils import no_init_weights

with no_init_weights():
    transformer = CosmosTransformer3DModel.from_config(
        CosmosTransformer3DModel.load_config(
            "nvidia/Cosmos-Predict2-14B-Text2Image", subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16,
    ).to(torch.bfloat16)

pipe = Cosmos2TextToImagePipeline.from_pretrained(
    "nvidia/Cosmos-Predict2-14B-Text2Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

DFloat11Model.from_pretrained(
    "mingyi456/Cosmos-Predict2-14B-Text2Image-DF11",
    device="cpu",
    bfloat16_model=pipe.transformer,
)

pipe.enable_model_cpu_offload()

prompt = "A close-up shot captures a vibrant yellow scrubber vigorously working on a grimy plate, its bristles moving in circular motions to lift stubborn grease and food residue. The dish, once covered in remnants of a hearty meal, gradually reveals its original glossy surface. Suds form and bubble around the scrubber, creating a satisfying visual of cleanliness in progress. The sound of scrubbing fills the air, accompanied by the gentle clinking of the dish against the sink. As the scrubber continues its task, the dish transforms, gleaming under the bright kitchen lights, symbolizing the triumph of cleanliness over mess."
negative_prompt = "The video captures a series of frames showing ugly scenes, static with no motion, motion blur, over-saturation, shaky footage, low resolution, grainy texture, pixelated images, poorly lit areas, underexposed and overexposed scenes, poor color balance, washed out colors, choppy sequences, jerky movements, low frame rate, artifacting, color banding, unnatural transitions, outdated special effects, fake elements, unconvincing visuals, poorly edited content, jump cuts, visual noise, and flickering. Overall, the video is of poor quality."

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("Cosmos-Predict2-14B-Text2Image.png")
```
ComfyUI
Follow the instructions here (I have not tested them myself): https://github.com/LeanModels/ComfyUI-DFloat11
Currently, this model will not work with ComfyUI out of the box, because the custom node only supports Flux models. It should be possible to modify the code to load this model as well, but doing so requires another pattern_dict in a completely different form from the one used for compression. If you are interested in running this model in ComfyUI, please contact the developer to request support.
Model tree for mingyi456/Cosmos-Predict2-14B-Text2Image-DF11
Base model: nvidia/Cosmos-Predict2-14B-Text2Image