---
license: apache-2.0
base_model:
- lodestones/Chroma1-Flash
base_model_relation: quantized
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
---
## Update: I have uploaded an updated version of this model, which should further reduce disk size and VRAM usage by ~82 MB. This is because I missed compressing a small portion of the model (the `distilled_guidance_layer.layers`) in my original upload. There is <u>no need to download again</u> if you are not having any issues with the older version.

Being a distilled model, this model requires different parameters to run optimally compared to the undistilled Chroma1-HD version. However, even after quite a while of testing, I have been unable to determine the best settings to use. The [model card](https://huggingface.co/lodestones/Chroma1-Flash) is blank, but the author's commit message says "use heun 8 steps CFG=1". Sadly, trying to use `HeunDiscreteScheduler` or `FlowMatchHeunDiscreteScheduler` with this model causes the pipeline to fail, presumably due to [this issue](https://github.com/huggingface/diffusers/issues/9971) in the `diffusers` library.

For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11.

This is my first time using DF11 to compress a model outside the Flux architecture. Compressing Flux-based models is much more straightforward than other architectures: the compression code requires a `pattern_dict` as input, and the original [example code](https://github.com/LeanModels/DFloat11/tree/master/examples/compress_flux1) only provides one for Flux, so for anything else I had to learn the notation myself and modify it to fit the model. Fortunately, Chroma is just a pruned version of Flux, so it was relatively simple to derive the correct `pattern_dict` this time. Do let me know if you run into any problems.

This is the `pattern_dict` I used for compression:

```python
pattern_dict = {
    r"distilled_guidance_layer\.layers\.\d+": (
        "linear_1",
        "linear_2",
    ),
    r"transformer_blocks\.\d+": (
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
        "attn.add_k_proj",
        "attn.add_v_proj",
        "attn.add_q_proj",
        "attn.to_out.0",
        "attn.to_add_out",
        "ff.net.0.proj",
        "ff.net.2",
        "ff_context.net.0.proj",
        "ff_context.net.2",
    ),
    r"single_transformer_blocks\.\d+": (
        "proj_mlp",
        "proj_out",
        "attn.to_q",
        "attn.to_k",
        "attn.to_v",
    ),
}
```
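As a rough illustration of how I understand these keys to work: each key is a regular expression matched against module paths in the transformer, and each value lists the submodules (under any matching path) whose weights get compressed. The sketch below is only my assumption of the matching behaviour, not DFloat11's actual implementation, and uses a truncated copy of the dictionary above:

```python
import re

# Truncated copy of the pattern_dict above, for illustration only
pattern_dict = {
    r"distilled_guidance_layer\.layers\.\d+": ("linear_1", "linear_2"),
    r"transformer_blocks\.\d+": ("attn.to_q", "attn.to_k", "attn.to_v"),
    r"single_transformer_blocks\.\d+": ("proj_mlp", "proj_out"),
}

def compressed_submodules(module_path):
    """Return the full paths of submodules to compress under module_path, if any."""
    for pattern, submodules in pattern_dict.items():
        if re.fullmatch(pattern, module_path):
            return [f"{module_path}.{s}" for s in submodules]
    return []

# The numbered layer matches the distilled guidance pattern:
print(compressed_submodules("distilled_guidance_layer.layers.3"))
# A module outside every pattern is left uncompressed:
print(compressed_submodules("x_embedder"))  # -> []
```

This is why missing the `distilled_guidance_layer.layers` pattern in the original upload simply left those weights uncompressed rather than breaking the model.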
### How to Use

#### `diffusers`

1. Install the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:

    ```bash
    pip install dfloat11[cuda12]
    # or if you have CUDA version 11:
    # pip install dfloat11[cuda11]
    ```

2. To use the DFloat11 model, run the following example code in Python:

    ```python
    import torch
    from diffusers import ChromaPipeline, ChromaTransformer2DModel
    from dfloat11 import DFloat11Model
    from transformers.modeling_utils import no_init_weights

    # Build an empty bfloat16 transformer from the original config,
    # then load the DF11-compressed weights into it
    with no_init_weights():
        transformer = ChromaTransformer2DModel.from_config(
            ChromaTransformer2DModel.load_config(
                "lodestones/Chroma1-Flash",
                subfolder="transformer"
            ),
            torch_dtype=torch.bfloat16
        ).to(torch.bfloat16)

    pipe = ChromaPipeline.from_pretrained(
        "lodestones/Chroma1-Flash",
        transformer=transformer,
        torch_dtype=torch.bfloat16
    )
    DFloat11Model.from_pretrained(
        "mingyi456/Chroma1-Flash-DF11",
        device="cpu",
        bfloat16_model=pipe.transformer
    )
    pipe.enable_model_cpu_offload()

    prompt = "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
    negative_prompt = "low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"

    # Call the pipeline with your own parameters; I am not sure what the
    # optimal settings for this model are in `diffusers`. The author's commit
    # message suggests 8 steps with CFG=1, used here only as a starting point.
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=8,
        guidance_scale=1.0,
        generator=torch.Generator("cpu").manual_seed(0),
    ).images[0]
    image.save("output.png")
    ```
#### ComfyUI

Refer to this [model page](https://huggingface.co/mingyi456/Chroma1-Flash-DF11-ComfyUI) instead, and follow the instructions there.