mingyi456 committed (verified) · Commit a142735 · 1 Parent(s): 8339ed8

Update README.md

Files changed (1): README.md (+84 −1)
README.md CHANGED
@@ -10,4 +10,87 @@ tags:
  base_model:
  - ACE-Step/Ace-Step1.5
  base_model_relation: quantized
- ---
 
+ ---
+ For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11
+
+ Feel free to request other models for compression as well, although models whose architectures I am unfamiliar with may be trickier for me.
+
+ ### How to Use
+
+ #### ComfyUI
+ Install the ComfyUI DFloat11 Extended node via the ComfyUI Manager. After installing, simply replace the "Load Diffusion Model" node of an existing workflow with the extension's DFloat11 loader node. If you run into any issues, feel free to leave a comment.
+
+ #### Official implementation
+ Support is coming soon, but I suspect that these compressed weights may already be compatible out-of-the-box with the official implementation.
+
+ ### Compression Details
+
+ This is the `pattern_dict` for compressing ACE-Step1.5-based models in ComfyUI:
+
+ ```python
+ pattern_dict_comfyui = {
+     r"decoder\.time_embed": (
+         "linear_1",
+         "linear_2",
+         "time_proj",
+     ),
+     r"decoder\.time_embed_r": (
+         "linear_1",
+         "linear_2",
+         "time_proj",
+     ),
+
+     r"decoder\.layers\.\d+": (
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "cross_attn.q_proj",
+         "cross_attn.k_proj",
+         "cross_attn.v_proj",
+         "cross_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj",
+     ),
+
+     r"encoder\.lyric_encoder\.layers\.\d+": (
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj",
+     ),
+     r"encoder\.timbre_encoder\.layers\.\d+": (
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj",
+     ),
+
+     r"tokenizer\.attention_pooler\.layers\.\d+": (
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj",
+     ),
+
+     r"detokenizer\.layers\.\d+": (
+         "self_attn.q_proj",
+         "self_attn.k_proj",
+         "self_attn.v_proj",
+         "self_attn.o_proj",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.down_proj",
+     ),
+ }
+ ```
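To illustrate how a `pattern_dict` of this shape selects layers, here is a minimal sketch: each regex key is matched against a module's dotted path, and the remainder of the path must name one of the listed submodules (typically BF16 `nn.Linear` layers) for that weight to be compressed. The `selected_for_compression` helper is hypothetical, written only for this illustration, and is not part of the DFloat11 API.

```python
import re

# Trimmed-down pattern_dict in the same shape as the one above:
# regex over the parent module path -> names of Linear submodules to compress.
pattern_dict = {
    r"decoder\.layers\.\d+": (
        "self_attn.q_proj",
        "mlp.down_proj",
    ),
}

def selected_for_compression(module_path, pattern_dict):
    """Return True if a full module path is covered by pattern_dict."""
    for pattern, submodules in pattern_dict.items():
        m = re.match(pattern, module_path)
        # The part of the path after the matched prefix must be a listed submodule.
        if m and module_path[m.end():].lstrip(".") in submodules:
            return True
    return False

print(selected_for_compression("decoder.layers.3.self_attn.q_proj", pattern_dict))   # True
print(selected_for_compression("decoder.layers.3.self_attn.rotary", pattern_dict))   # False
```

In practice the module paths would come from something like PyTorch's `named_modules()` iterator over the loaded model, so the `\d+` in each key covers every repeated transformer block with a single entry.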