lainlives kelseye committed on
Commit 4e7e518
0 Parent(s):

Duplicate from DiffSynth-Studio/Qwen-Image-Layered-Control

Co-authored-by: kelseye.xh <kelseye@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,54 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_0_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_1_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_2_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_3_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_4_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_5_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_6_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_7_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_input.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_0_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_1_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_2_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_3_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_input.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_0_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_1_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_2_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_3_0.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_input.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ license: apache-2.0
+ ---
+ # Qwen-Image-Layered
+
+ ## Model Introduction
+
+ This model was fine-tuned from [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) on the dataset [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro), enabling text-controlled extraction of individual image layers.
+
+ For more on the training strategy and implementation, see our [technical blog](https://modelscope.cn/learn/4938).
+
+ ## Usage Tips
+
+ * The model architecture has been changed from multi-image output to single-image output, producing only the layer relevant to the provided text description.
+ * The model was trained exclusively on English text, but retains the Chinese language understanding inherited from the base model.
+ * The native training resolution is 1024x1024; inference at other resolutions is also supported.
+ * The model struggles to separate multiple entities that are heavily occluded or overlapping, such as the cartoon skeleton head and hat in the examples.
+ * The model excels at decomposing poster-like graphics but performs poorly on photographic images, especially those with complex lighting and shadows.
+ * The model supports negative prompts: users can specify content to exclude via the negative prompt.
+
+ ## Demo Examples
+
+ **Some images contain white text on light backgrounds. ModelScope users should click the "☀︎" icon in the top-right corner to switch to dark mode for better visibility.**
+
+ ### Example 1
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_1_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |A solid, uniform color with no distinguishable features or objects|![](./assets/image_1_0_0.png)|Text 'TRICK'|![](./assets/image_1_4_0.png)|
+ |Cloud|![](./assets/image_1_1_0.png)|Text 'TRICK OR TREAT'|![](./assets/image_1_3_0.png)|
+ |A cartoon skeleton character wearing a purple hat and holding a gift box|![](./assets/image_1_2_0.png)|Text 'TRICK OR'|![](./assets/image_1_7_0.png)|
+ |A purple hat and a head|![](./assets/image_1_5_0.png)|A gift box|![](./assets/image_1_6_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 2
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_2_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |Blue sky, white clouds, a garden with colorful flowers|![](./assets/image_2_0_0.png)|Colorful, intricate floral wreath|![](./assets/image_2_2_0.png)|
+ |Girl, wreath, kitten|![](./assets/image_2_1_0.png)|Girl, kitten|![](./assets/image_2_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 3
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_3_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |A clear blue sky and a turbulent sea|![](./assets/image_3_0_0.png)|Text "The Life I Long For"|![](./assets/image_3_2_0.png)|
+ |A seagull|![](./assets/image_3_1_0.png)|Text "Life"|![](./assets/image_3_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ## Inference Code
+
+ Install DiffSynth-Studio:
+
+ ```
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ Model inference:
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+ from PIL import Image
+ import torch, requests
+
+ # Load the layered-control transformer together with the Qwen-Image text
+ # encoder, the layered VAE, and the Qwen-Image-Edit processor.
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
+ )
+ prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"
+ # Fetch the demo input and convert it to RGBA at the native 1024x1024 resolution.
+ input_image = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True).raw
+ input_image = Image.open(input_image).convert("RGBA").resize((1024, 1024))
+ input_image.save("image_input.png")
+ # Extract the single layer described by the prompt.
+ images = pipe(
+     prompt,
+     seed=0,
+     num_inference_steps=30, cfg_scale=4,
+     height=1024, width=1024,
+     layer_input_image=input_image,
+     layer_num=0,
+ )
+ images[0].save("image.png")
+ ```
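
Each extracted layer is an RGBA image, so a decomposition can be sanity-checked by alpha-compositing the layers back together and comparing against the input. A minimal sketch using Pillow; the `composite_layers` helper and the synthetic layers are illustrative, not part of the model's API:

```python
from PIL import Image

def composite_layers(layers):
    """Alpha-composite a list of RGBA PIL images, bottom layer first."""
    canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
    for layer in layers:
        # alpha_composite pastes the second image over the first
        canvas = Image.alpha_composite(canvas, layer.convert("RGBA"))
    return canvas

# Synthetic stand-ins for extracted layers: an opaque red background
# and a half-transparent green overlay.
background = Image.new("RGBA", (64, 64), (255, 0, 0, 255))
overlay = Image.new("RGBA", (64, 64), (0, 255, 0, 128))
result = composite_layers([background, overlay])
```

With real outputs, pass the layer images returned by the pipeline in back-to-front order.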
README_from_modelscope.md ADDED
@@ -0,0 +1,144 @@
+ ---
+ frameworks: PyTorch
+ license: Apache License 2.0
+ tags: []
+ tasks:
+ - text-to-image-synthesis
+ base_model:
+ - Qwen/Qwen-Image-Layered
+ base_model_relation: finetune
+ ---
+ # Qwen-Image-Layered
+
+ ## Model Introduction
+
+ This model was fine-tuned from [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) on the dataset [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro), enabling text-controlled extraction of individual image layers.
+
+ For more on the training strategy and implementation, see our [technical blog](https://modelscope.cn/learn/4938).
+
+ ## Usage Tips
+
+ * The model architecture has been changed from multi-image output to single-image output, producing only the layer relevant to the provided text description.
+ * The model was trained exclusively on English text, but retains the Chinese language understanding inherited from the base model.
+ * The native training resolution is 1024x1024; inference at other resolutions is also supported.
+ * The model struggles to separate multiple entities that occlude each other, such as the cartoon skeleton head and hat in the examples.
+ * The model excels at decomposing poster-like graphics but performs poorly on photographic images, especially those with complex lighting and shadows.
+ * The model supports negative prompts: users can specify content to exclude via the negative prompt.
+
+ ## Demo Examples
+
+ **Some images contain pure-white text. ModelScope users should click the "☀︎" icon in the top-right corner to switch to dark mode.**
+
+ ### Example 1
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_1_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |A solid, uniform color with no distinguishable features or objects|![](./assets/image_1_0_0.png)|Text 'TRICK'|![](./assets/image_1_4_0.png)|
+ |Cloud|![](./assets/image_1_1_0.png)|Text 'TRICK OR TREAT'|![](./assets/image_1_3_0.png)|
+ |A cartoon skeleton character wearing a purple hat and holding a gift box|![](./assets/image_1_2_0.png)|Text 'TRICK OR'|![](./assets/image_1_7_0.png)|
+ |A purple hat and a head|![](./assets/image_1_5_0.png)|A gift box|![](./assets/image_1_6_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 2
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_2_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |Blue sky, white clouds, a garden with colorful flowers|![](./assets/image_2_0_0.png)|Colorful, intricate floral wreath|![](./assets/image_2_2_0.png)|
+ |Girl, wreath, kitten|![](./assets/image_2_1_0.png)|Girl, kitten|![](./assets/image_2_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ### Example 3
+
+ <div style="display: flex; justify-content: space-between;">
+
+ <div style="width: 30%;">
+
+ |Input Image|
+ |-|
+ |![](./assets/image_3_input.png)|
+
+ </div>
+
+ <div style="width: 66%;">
+
+ |Prompt|Output Image|Prompt|Output Image|
+ |-|-|-|-|
+ |A clear blue sky and a turbulent sea|![](./assets/image_3_0_0.png)|Text "The Life I Long For"|![](./assets/image_3_2_0.png)|
+ |A seagull|![](./assets/image_3_1_0.png)|Text "Life"|![](./assets/image_3_3_0.png)|
+
+ </div>
+
+ </div>
+
+ ## Inference Code
+
+ Install DiffSynth-Studio:
+
+ ```
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ Model inference:
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+ from PIL import Image
+ import torch, requests
+
+ # Load the layered-control transformer together with the Qwen-Image text
+ # encoder, the layered VAE, and the Qwen-Image-Edit processor.
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
+ )
+ prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"
+ # Fetch the demo input and convert it to RGBA at the native 1024x1024 resolution.
+ input_image = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True).raw
+ input_image = Image.open(input_image).convert("RGBA").resize((1024, 1024))
+ input_image.save("image_input.png")
+ # Extract the single layer described by the prompt.
+ images = pipe(
+     prompt,
+     seed=0,
+     num_inference_steps=30, cfg_scale=4,
+     height=1024, width=1024,
+     layer_input_image=input_image,
+     layer_num=0,
+ )
+ images[0].save("image.png")
+ ```
assets/image_1_0_0.png ADDED

Git LFS Details

  • SHA256: 7571c7a59e6a301c2909978baeffa4c2d25aa31103dc026e702e6e0b77f4d545
  • Pointer size: 131 Bytes
  • Size of remote file: 766 kB
assets/image_1_1_0.png ADDED

Git LFS Details

  • SHA256: cf931fc683c3b51aea11d0cc18bcb3e108fea6e157a60f9aa64d3e9316edb67b
  • Pointer size: 131 Bytes
  • Size of remote file: 764 kB
assets/image_1_2_0.png ADDED

Git LFS Details

  • SHA256: e5e1e50a3549c9a88fac681ede16d5682e6bfc52bc584276fef9fd4b1439dda8
  • Pointer size: 131 Bytes
  • Size of remote file: 880 kB
assets/image_1_3_0.png ADDED

Git LFS Details

  • SHA256: 695c67883053681cbde394e0189cfc31c2a45d5c9b44887e9872dca8b4ec20b3
  • Pointer size: 131 Bytes
  • Size of remote file: 720 kB
assets/image_1_4_0.png ADDED

Git LFS Details

  • SHA256: fb02a4888540a023af32cb13c52f8883bc83f436544a3d9dec3c07a9c59578ca
  • Pointer size: 131 Bytes
  • Size of remote file: 650 kB
assets/image_1_5_0.png ADDED

Git LFS Details

  • SHA256: 9cc2f7958c5c27cdefa7309112d831435ac5b05d075bde7b4a6571e6a81e5f40
  • Pointer size: 131 Bytes
  • Size of remote file: 714 kB
assets/image_1_6_0.png ADDED

Git LFS Details

  • SHA256: 8c243e61ce6f592e936013fa33c8825edf544a9ddc31cdf3e65a7fedfc857741
  • Pointer size: 131 Bytes
  • Size of remote file: 637 kB
assets/image_1_7_0.png ADDED

Git LFS Details

  • SHA256: a15ad9e370a58b5e77f608affaf44870888e0081a2294f04119ca98131561ea4
  • Pointer size: 131 Bytes
  • Size of remote file: 660 kB
assets/image_1_input.png ADDED

Git LFS Details

  • SHA256: 0bf0cf15ba21de772f11eb11bf9fa9f62a4d2467347c98559b1d257220bd50ef
  • Pointer size: 131 Bytes
  • Size of remote file: 902 kB
assets/image_2_0_0.png ADDED

Git LFS Details

  • SHA256: f72f561ea8b1a20ab9215ef1285d5a767867d63a79b8384cdcb65ab281e3cca5
  • Pointer size: 132 Bytes
  • Size of remote file: 1.11 MB
assets/image_2_1_0.png ADDED

Git LFS Details

  • SHA256: 21615ea7ff938ba73922c36daac996da4efa97984bfd72f42c4cab73c04e864a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.27 MB
assets/image_2_2_0.png ADDED

Git LFS Details

  • SHA256: f387a8f1646ce99b06156596fa0210fdfbb5b71c349427eb8e848b2722bfe569
  • Pointer size: 131 Bytes
  • Size of remote file: 761 kB
assets/image_2_3_0.png ADDED

Git LFS Details

  • SHA256: 149bc856488fe40d485d93e5788c3ea66ebab22cf0faa5bd5b11e10080602441
  • Pointer size: 132 Bytes
  • Size of remote file: 1.17 MB
assets/image_2_input.png ADDED

Git LFS Details

  • SHA256: ba1980967215c5090e26673dd38805b6d140662a9fff6f4e3fe2422485723c9a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.32 MB
assets/image_3_0_0.png ADDED

Git LFS Details

  • SHA256: bcebe462984c8df120eddc998f7277f3c226dd717d3270b9b0cdba9154d5b65e
  • Pointer size: 132 Bytes
  • Size of remote file: 1.31 MB
assets/image_3_1_0.png ADDED

Git LFS Details

  • SHA256: fac7be288f3c4ead811edc2a388651424a07ac5ce6ef9f278af0861589bf5c01
  • Pointer size: 131 Bytes
  • Size of remote file: 613 kB
assets/image_3_2_0.png ADDED

Git LFS Details

  • SHA256: 168cff1bc58b7ef2e98dee24686ec9cf4923c79910c5728a3c0366307fbe5214
  • Pointer size: 131 Bytes
  • Size of remote file: 671 kB
assets/image_3_3_0.png ADDED

Git LFS Details

  • SHA256: e8e98774b8dd5afad15d12ef7f5895c5b0280391f6f26b4a8ec736356c602e49
  • Pointer size: 131 Bytes
  • Size of remote file: 627 kB
assets/image_3_input.png ADDED

Git LFS Details

  • SHA256: 17af2255d4311cc9a9bf96b3c5650a7754a74ccbb6fc677487f5c16de7264d91
  • Pointer size: 132 Bytes
  • Size of remote file: 1.37 MB
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-to-image-synthesis"}
qwen_image_layered_control_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63b1966f0423bdc94d87273b8958de91e0a8f642c635f9113632d09cae3aa4ad
+ size 40861043888
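
The three `version`/`oid`/`size` lines above are a Git LFS pointer stub: only this stub is committed, while the ~40 GB checkpoint itself lives in LFS storage. A minimal sketch of reading such a pointer; the `parse_lfs_pointer` helper is illustrative, not a git-lfs API:

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer stub into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>"
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:63b1966f0423bdc94d87273b8958de91e0a8f642c635f9113632d09cae3aa4ad
size 40861043888
"""
info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e9)  # ~40.9 GB behind this 3-line stub
```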
transformer/config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "_class_name": "QwenImageTransformer2DModel",
+   "_diffusers_version": "0.36.0.dev0",
+   "use_additional_t_cond": true,
+   "attention_head_dim": 128,
+   "axes_dims_rope": [
+     16,
+     56,
+     56
+   ],
+   "guidance_embeds": false,
+   "in_channels": 64,
+   "joint_attention_dim": 3584,
+   "num_attention_heads": 24,
+   "num_layers": 60,
+   "out_channels": 16,
+   "patch_size": 2,
+   "use_layer3d_rope": true,
+   "zero_cond_t": false
+ }
transformer/diffusion_pytorch_model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5353f1dbff8445840012bd2aff2fd209034aa42d0ce623a55f3f542036244a2
+ size 9973590960
transformer/diffusion_pytorch_model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:957d266a7ccdcc9d3f225c82b0afa831ba5084c851b86934b9e4e9f10163b985
+ size 9987326040
transformer/diffusion_pytorch_model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f0e2bec2869de66f02b53bda77bc11618aba229453be56170209a654ddff0c0
+ size 9987307408
transformer/diffusion_pytorch_model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5244cf56dd45667fc8f373d43550bc187909bc48489f380fa3dcbb02901e7dcf
+ size 9930685680
transformer/diffusion_pytorch_model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45ecb944aad539ceaae9e3ba99dc9f2d650ba034cf4b305b0e83ebce0bb7b55c
+ size 982130448
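
As a consistency check, the five transformer shards listed above sum to almost exactly the size of the single-file bf16 checkpoint (40,861,043,888 bytes); at 2 bytes per bf16 parameter this puts the transformer at roughly 20 billion parameters:

```python
# Shard sizes in bytes, copied from the LFS metadata above.
shard_sizes = [
    9973590960,  # 00001-of-00005
    9987326040,  # 00002-of-00005
    9987307408,  # 00003-of-00005
    9930685680,  # 00004-of-00005
    982130448,   # 00005-of-00005
]
total_bytes = sum(shard_sizes)
params_billion = total_bytes / 2 / 1e9  # bf16 stores 2 bytes per parameter
print(total_bytes)               # 40861040536
print(round(params_billion, 1))  # 20.4
```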