kelseye committed · Commit b42868f · verified · 1 Parent(s): 9901bb0

Upload folder using huggingface_hub
README.md CHANGED
@@ -5,23 +5,22 @@ license: apache-2.0
 
 ## Model Introduction
 
-This model is trained based on [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) using the dataset [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro), enabling text-controlled extraction of segmented image layers.
+This model is trained based on the model [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) using the dataset [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro), enabling text-controlled extraction of segmented layers.
 
-
-For more details about training strategies and implementation, feel free to check our [technical blog](https://huggingface.co/blog/kelseye/qwen-image-layered-control).
+For more details about training strategies and implementation, feel free to check our [technical blog](https://modelscope.cn/learn/4938).
 
 ## Usage Tips
 
-* The model architecture has been modified from multi-image output to single-image output, producing only the layer relevant to the textual description.
-* The model was trained exclusively on English text but inherits Chinese language understanding capabilities from the base model.
+* The model architecture has been changed from multi-image output to single-image output, producing only the layer relevant to the provided text description.
+* The model was trained exclusively on English text, but retains Chinese language understanding capabilities inherited from the base model.
 * The native training resolution is 1024x1024; however, inference at other resolutions is supported.
-* The model struggles to separate multiple overlapping entities (e.g., the cartoon skeleton and hat in the examples).
-* The model excels at decomposing poster-like images but performs poorly on photographic images, especially those involving complex lighting and shadows.
-* Negative prompts are supported; use them to specify content you want excluded from the output.
+* The model struggles to separate multiple entities that are heavily occluded or overlapping, such as the cartoon skeleton head and hat in the examples.
+* The model excels at decomposing poster-like graphics but performs poorly on photographic images, especially those involving complex lighting and shadows.
+* The model supports negative prompts; users can specify content they wish to exclude via negative prompt descriptions.
 
 ## Demo Examples
 
-**Some images contain white text on light backgrounds. Users of the ModelScope community should click the "☀︎" icon at the top-right corner to switch to dark mode for better visibility.**
+**Some images contain white text on light backgrounds. ModelScope users should click the "☀︎" icon in the top-right corner to switch to dark mode for better visibility.**
 
 ### Example 1
 
@@ -64,8 +63,8 @@ For more details about training strategies and implementation, feel free to chec
 
 |Prompt|Output Image|Prompt|Output Image|
 |-|-|-|-|
-|蓝天,白云,一片花园,花园里有五颜六色的花|![](./assets/image_2_0_0.png)|五彩的精致花环|![](./assets/image_2_2_0.png)|
-|少女、花环、小猫|![](./assets/image_2_1_0.png)|少女、小猫|![](./assets/image_2_3_0.png)|
+|Blue sky, white clouds, a garden with colorful flowers|![](./assets/image_2_0_0.png)|Colorful, intricate floral wreath|![](./assets/image_2_2_0.png)|
+|Girl, wreath, kitten|![](./assets/image_2_1_0.png)|Girl, kitten|![](./assets/image_2_3_0.png)|
 
 </div>
 
@@ -87,8 +86,8 @@ For more details about training strategies and implementation, feel free to chec
 
 |Prompt|Output Image|Prompt|Output Image|
 |-|-|-|-|
-|一片湛蓝的天空和波涛汹涌的大海|![](./assets/image_3_0_0.png)|文字“向往的生活”|![](./assets/image_3_2_0.png)|
-|一只海鸥|![](./assets/image_3_1_0.png)|文字“生活”|![](./assets/image_3_3_0.png)|
+|A clear blue sky and a turbulent sea|![](./assets/image_3_0_0.png)|Text "The Life I Long For"|![](./assets/image_3_2_0.png)|
+|A seagull|![](./assets/image_3_1_0.png)|Text "Life"|![](./assets/image_3_3_0.png)|
 
 </div>
 
@@ -104,7 +103,7 @@ cd DiffSynth-Studio
 pip install -e .
 ```
 
-Model Inference:
+Model inference:
 
 ```python
 from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
README_from_modelscope.md CHANGED
@@ -14,6 +14,9 @@ base_model_relation: finetune
 
 本模型基于模型 [Qwen/Qwen-Image-Layered](https://modelscope.cn/models/Qwen/Qwen-Image-Layered) 在数据集 [artplus/PrismLayersPro](https://modelscope.cn/datasets/artplus/PrismLayersPro) 上进行了训练,可以通过文本控制拆分的图层内容。
 
+
+更多关于训练策略和实现细节,欢迎查看我们的[技术博客](https://modelscope.cn/learn/4938)。
+
 ## 使用技巧
 
 * 模型结构从多图输出改为了单图输出,仅输出与文本描述相关的图层
qwen_image_layered_control_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:63b1966f0423bdc94d87273b8958de91e0a8f642c635f9113632d09cae3aa4ad
+size 40861043888