---
base_model:
- nvidia/Cosmos-Predict2-14B-Text2Image
base_model_relation: quantized
library_name: diffusers
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
language:
- en
pipeline_tag: text-to-image
---
For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11
This is my first time using DF11 to compress a model outside the Flux architecture.
Compressing Flux-based models is much more straightforward than compressing other architectures: the compression code requires a `pattern_dict` as input, and the original [example code](https://github.com/LeanModels/DFloat11/tree/master/examples/compress_flux1) only provides one for Flux, so I had to work out the notation myself and adapt it to this model.
Compression took a long time, but the compressed model works fine on my RTX 4090 and, unlike the uncompressed version, fits fully into 24 GB of VRAM. Do let me know if you run into any problems.
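For a rough sense of why the compressed model fits where the original does not, here is a back-of-the-envelope estimate. It is only a sketch: the 14 billion parameter count is taken from the "14B" in the model name, and the 11 effective bits per weight is an assumption based on the DFloat11 name rather than a measured figure for this checkpoint. It covers the transformer weights only and ignores activations, the text encoder, and the VAE.

```python
# Back-of-the-envelope weight footprint; every figure here is approximate.
PARAMS = 14e9    # assumed from the "14B" in the model name
BF16_BITS = 16   # bits per weight in the uncompressed BF16 model
DF11_BITS = 11   # assumed effective bits per weight after DF11 compression


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Return the weight footprint in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = weights_gb(PARAMS, BF16_BITS)
df11_gb = weights_gb(PARAMS, DF11_BITS)
print(f"BF16: {bf16_gb:.2f} GB, DF11 (estimated): {df11_gb:.2f} GB")
```

Under these assumptions the BF16 transformer alone is larger than 24 GB, while the DF11 estimate is below it, which matches what I see on the RTX 4090.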
This is the `pattern_dict` I used for compression:
```python
pattern_dict = {
    r"transformer_blocks\.\d+": (
        "norm1.linear_1",
        "norm1.linear_2",
        "attn1.to_q",
        "attn1.to_k",
        "attn1.to_v",
        "attn1.to_out.0",
        "norm2.linear_1",
        "norm2.linear_2",
        "attn2.to_q",
        "attn2.to_k",
        "attn2.to_v",
        "attn2.to_out.0",
        "norm3.linear_1",
        "norm3.linear_2",
        "ff.net.0.proj",
        "ff.net.2",
    )
}
```
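To illustrate the notation, here is a minimal sketch of how I understand it: each key is a regular expression for a block's module name, and each value lists the sub-layers inside that block to compress. The use of `re.fullmatch`, the helper `blocks_to_compress`, the shortened sub-layer tuple, and the module names below are all illustrative assumptions, not the DFloat11 library's actual code.

```python
import re

# Shortened version of the pattern_dict, for illustration only.
pattern_dict = {
    r"transformer_blocks\.\d+": ("attn1.to_q", "ff.net.2"),
}

# Hypothetical module names, in the style of model.named_modules().
module_names = [
    "transformer_blocks.0",
    "transformer_blocks.35",
    "transformer_blocks.0.attn1.to_q",  # a sub-layer, not a block
    "patch_embed.proj",                 # outside the transformer blocks
]


def blocks_to_compress(names, patterns):
    """Map each module whose full name matches a pattern to its listed sub-layers."""
    selected = {}
    for name in names:
        for pattern, sublayers in patterns.items():
            # Assumption: the whole module name must match the pattern.
            if re.fullmatch(pattern, name):
                selected[name] = [f"{name}.{sub}" for sub in sublayers]
    return selected


selected = blocks_to_compress(module_names, pattern_dict)
for block, layers in selected.items():
    print(block, "->", layers)
```

With full-name matching, only the two block-level names are selected; the sub-layer name and the module outside `transformer_blocks` are skipped.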
### How to Use
#### `diffusers`
1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:
```bash
# quotes keep shells such as zsh from interpreting the square brackets
pip install "dfloat11[cuda12]"
# or if you have CUDA version 11:
# pip install "dfloat11[cuda11]"
```
2. To use the DFloat11 model, run the following example code in Python:
```python
import torch
from diffusers import Cosmos2TextToImagePipeline, CosmosTransformer3DModel
from dfloat11 import DFloat11Model
from transformers.modeling_utils import no_init_weights

# Build the transformer from its config only, skipping weight initialization;
# the DF11 weights are loaded into it further down.
with no_init_weights():
    transformer = CosmosTransformer3DModel.from_config(
        CosmosTransformer3DModel.load_config(
            "nvidia/Cosmos-Predict2-14B-Text2Image",
            subfolder="transformer"
        ),
        torch_dtype=torch.bfloat16
    ).to(torch.bfloat16)

# Load the remaining pipeline components around the placeholder transformer.
pipe = Cosmos2TextToImagePipeline.from_pretrained(
    "nvidia/Cosmos-Predict2-14B-Text2Image",
    transformer=transformer,
    torch_dtype=torch.bfloat16
)

# Load the DF11-compressed weights into the pipeline's transformer.
DFloat11Model.from_pretrained(
    "mingyi456/Cosmos-Predict2-14B-Text2Image-DF11",
    device="cpu",
    bfloat16_model=pipe.transformer,
)

pipe.enable_model_cpu_offload()

prompt = "A close-up shot captures a vibrant yellow scrubber vigorously working on a grimy plate, its bristles moving in circular motions to lift stubborn grease and food residue. The dish, once covered in remnants of a hearty meal, gradually reveals its original glossy surface. Suds form and bubble around the scrubber, creating a satisfying visual of cleanliness in progress. The sound of scrubbing fills the air, accompanied by the gentle clinking of the dish against the sink. As the scrubber continues its task, the dish transforms, gleaming under the bright kitchen lights, symbolizing the triumph of cleanliness over mess."
negative_prompt = "The video captures a series of frames showing ugly scenes, static with no motion, motion blur, over-saturation, shaky footage, low resolution, grainy texture, pixelated images, poorly lit areas, underexposed and overexposed scenes, poor color balance, washed out colors, choppy sequences, jerky movements, low frame rate, artifacting, color banding, unnatural transitions, outdated special effects, fake elements, unconvincing visuals, poorly edited content, jump cuts, visual noise, and flickering. Overall, the video is of poor quality."

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

image.save("Cosmos-Predict2-14B-Text2Image.png")
```
#### ComfyUI
~~Follow the instructions (have not tested myself) here: https://github.com/LeanModels/ComfyUI-DFloat11~~
This model does not currently work with ComfyUI out of the box, because the custom node only supports Flux models. It should be possible to modify the node's code to load this model as well, but doing so requires another `pattern_dict` whose form is completely different from the one used to compress the model. If you are interested in running this model in ComfyUI, please contact the developer of the custom node to request support.