kelseye committed
Commit b4dc7c3 · verified · 1 Parent(s): 2d4079f

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/canny_1.png filter=lfs diff=lfs merge=lfs -text
+ assets/canny_2.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_1.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_1_2.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_1.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_2_2.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_1.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_3_2.png filter=lfs diff=lfs merge=lfs -text
+ assets/title.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ license: apache-2.0
+ ---
+ # Qwen-Image Image Structure Control Model
+
+ ![](./assets/title.png)
+
+ ## Model Introduction
+
+ This model is an image structure control model trained on top of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image) using the ControlNet architecture. It controls the structure of generated images via edge detection (Canny) maps. The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and the training dataset is [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k).
+
+ ## Result Demonstration
+
+ |Canny Edge Map|Generated Image 1|Generated Image 2|
+ |-|-|-|
+ |![](./assets/canny_3.png)|![](./assets/image_3_1.png)|![](./assets/image_3_2.png)|
+ |![](./assets/canny_2.png)|![](./assets/image_2_1.png)|![](./assets/image_2_2.png)|
+ |![](./assets/canny_1.png)|![](./assets/image_1_1.png)|![](./assets/image_1_2.png)|
+
+ ## Inference Code
+ ```
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
+ from PIL import Image
+ import torch
+ from modelscope import dataset_snapshot_download
+
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny", origin_file_pattern="model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+ )
+
+ dataset_snapshot_download(
+     dataset_id="DiffSynth-Studio/example_image_dataset",
+     local_dir="./data/example_image_dataset",
+     allow_file_pattern="canny/image_1.jpg",
+ )
+ controlnet_image = Image.open("data/example_image_dataset/canny/image_1.jpg").resize((1328, 1328))
+
+ prompt = "A little dog with shiny, soft fur and lively eyes, set in a spring courtyard with cherry blossoms falling, creating a beautiful and warm atmosphere."
+ image = pipe(
+     prompt, seed=0,
+     blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)],
+ )
+ image.save("image.jpg")
+ ```
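The inference example above feeds a ready-made Canny map from the example dataset. As a rough, self-contained sketch of producing a control image yourself, the snippet below uses PIL's `FIND_EDGES` filter as a stand-in for true Canny edge detection (the synthetic input image and the output file name are illustrative):

```python
from PIL import Image, ImageDraw, ImageFilter, ImageOps

# Synthetic stand-in for a real photo: a white square on black.
img = Image.new("RGB", (1328, 1328), "black")
ImageDraw.Draw(img).rectangle([300, 300, 1000, 1000], fill="white")

# Approximate edge map; FIND_EDGES is a simple convolution filter,
# not true Canny, but it yields a similar structural outline.
edges = ImageOps.grayscale(img).filter(ImageFilter.FIND_EDGES)

# The pipeline's ControlNetInput expects an RGB image.
control = edges.convert("RGB")
control.save("canny_control.png")
```

For production use, OpenCV's `cv2.Canny` gives cleaner, thresholded edges closer to what the model was trained on.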
README_from_modelscope.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ frameworks:
+ - Pytorch
+ license: Apache License 2.0
+ tasks:
+ - text-to-image-synthesis
+
+ #model-type:
+ ## e.g. gpt, phi, llama, chatglm, baichuan
+ #- gpt
+
+ #domain:
+ ## e.g. nlp, cv, audio, multi-modal
+ #- nlp
+
+ #language:
+ ## language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
+ #- cn
+
+ #metrics:
+ ## e.g. CIDEr, Blue, ROUGE
+ #- CIDEr
+
+ #tags:
+ ## custom tags, e.g. training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
+ #- pretrained
+
+ #tools:
+ ## e.g. vllm, fastchat, llamacpp, AdaSeq
+ #- vllm
+ base_model:
+ - Qwen/Qwen-Image
+ base_model_relation: adapter
+ ---
+ # Qwen-Image Image Structure Control Model
+
+ ![](./assets/title.png)
+
+ ## Model Introduction
+
+ This model is an image structure control model trained on top of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image) using the ControlNet architecture. It controls the structure of generated images via edge detection (Canny) maps. The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and the training dataset is [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k).
+
+ ## Result Demonstration
+
+ |Structure Map|Generated Image 1|Generated Image 2|
+ |-|-|-|
+ |![](./assets/canny_3.png)|![](./assets/image_3_1.png)|![](./assets/image_3_2.png)|
+ |![](./assets/canny_2.png)|![](./assets/image_2_1.png)|![](./assets/image_2_2.png)|
+ |![](./assets/canny_1.png)|![](./assets/image_1_1.png)|![](./assets/image_1_2.png)|
+
+ ## Inference Code
+ ```
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
+ from PIL import Image
+ import torch
+ from modelscope import dataset_snapshot_download
+
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny", origin_file_pattern="model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+ )
+
+ dataset_snapshot_download(
+     dataset_id="DiffSynth-Studio/example_image_dataset",
+     local_dir="./data/example_image_dataset",
+     allow_file_pattern="canny/image_1.jpg",
+ )
+ controlnet_image = Image.open("data/example_image_dataset/canny/image_1.jpg").resize((1328, 1328))
+
+ prompt = "A little dog with shiny, soft fur and lively eyes, set in a spring courtyard with cherry blossoms falling, creating a beautiful and warm atmosphere."
+ image = pipe(
+     prompt, seed=0,
+     blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)],
+ )
+ image.save("image.jpg")
+ ```
assets/canny_1.png ADDED

Git LFS Details

  • SHA256: d1fdd81e3a6b25b26098deea9830e1a8cfed5e685a0b350d1dad8378a1953c6d
  • Pointer size: 131 Bytes
  • Size of remote file: 158 kB
assets/canny_2.png ADDED

Git LFS Details

  • SHA256: 2e8da61bdaad91a4a72cc8082e4e268f98ac63f12ca8081f0b7b0d7eb398bc76
  • Pointer size: 131 Bytes
  • Size of remote file: 199 kB
assets/canny_3.png ADDED
assets/image_1_1.png ADDED

Git LFS Details

  • SHA256: 05a80d44518b67b0f614a41db04b8638e933f665284e47bc5ad286c924aed968
  • Pointer size: 132 Bytes
  • Size of remote file: 1.89 MB
assets/image_1_2.png ADDED

Git LFS Details

  • SHA256: 0bba256177b81223cc073ecd2ef604a91c12e6bdc8dd3b7757b8bd41e55375e0
  • Pointer size: 132 Bytes
  • Size of remote file: 1.79 MB
assets/image_2_1.png ADDED

Git LFS Details

  • SHA256: 0fc1afdedb9093b550e7b5b160bd9dcf2231627579c4820473e847ccdb0e5592
  • Pointer size: 132 Bytes
  • Size of remote file: 2.08 MB
assets/image_2_2.png ADDED

Git LFS Details

  • SHA256: cb52be64e0875f8388613b72bf98e04b7e5d38dce3b106d6d364dde3f94c61b7
  • Pointer size: 132 Bytes
  • Size of remote file: 1.98 MB
assets/image_3_1.png ADDED

Git LFS Details

  • SHA256: 173e42f753139ef2811b098980f154ee2123837ac715439c39a6247a2e3e0011
  • Pointer size: 132 Bytes
  • Size of remote file: 1.71 MB
assets/image_3_2.png ADDED

Git LFS Details

  • SHA256: 3c6083e3b7a6a8c40b361a010892229e0ed0a233457acaa5701aef7f4b25a0f8
  • Pointer size: 132 Bytes
  • Size of remote file: 1.77 MB
assets/title.png ADDED

Git LFS Details

  • SHA256: 839913c7c151aedb2ad6e7f52f8b0aede4c1eafcb23b0b68cb05fde93a12d763
  • Pointer size: 132 Bytes
  • Size of remote file: 3.78 MB
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-to-image-synthesis"}
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16f54f5c3c84cb213eba98d304e7486816a39b2891715c662024f40192093429
+ size 2266838080
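The pointer above records the LFS object's SHA-256 and byte size. A downloaded `model.safetensors` can be checked against the `oid` with a small streaming hash; this is a generic sketch (the helper name is ours, not part of any library):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its hex SHA-256 digest
    (the value after 'sha256:' in a Git LFS pointer)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()
```

A mismatch between the computed digest and the pointer's `oid` indicates a truncated or corrupted download.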