Moran232 committed (verified) · Commit 1b7991e · Parent(s): c3fffb4

Upload README.md with huggingface_hub

Files changed (1): README.md (+210 lines)
---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-image
---
<h1 align="center">JoyAI-Image-Edit<br><sub><sup>Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation</sup></sub></h1>

<div align="center">

[![Report PDF](https://img.shields.io/badge/Report-PDF-red)](https://joyai-image.s3.cn-north-1.jdcloud-oss.com/JoyAI-Image.pdf)
[![Project](https://img.shields.io/badge/Project-JoyAI--Image-333399)](https://github.com/jd-opensource/JoyAI-Image)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-JoyAI--Image--Edit-yellow)](https://huggingface.co/jdopensource/JoyAI-Image-Edit)&#160;
[![ModelScope](https://img.shields.io/badge/%F0%9F%A4%96%20ModelScope-JoyAI--Image--Edit-624aff)](https://modelscope.cn/models/jd-opensource/JoyAI-Image-Edit)&#160;
[![Demo](https://img.shields.io/badge/%F0%9F%9A%80%20Demo-Spatial--Edit-orange)](https://huggingface.co/spaces/stevengrove/JoyAI-Image-Edit-Space)&#160;
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

</div>

## 🐶 JoyAI-Image-Edit

JoyAI-Image-Edit is a multimodal foundation model specialized in instruction-guided image editing. It enables precise and controllable edits by leveraging strong spatial understanding, including scene parsing, relational grounding, and instruction decomposition, allowing complex modifications to be applied accurately to specified regions.

## 🚀 Quick Start

**Requirements**: Python >= 3.10, CUDA-capable GPU

### Core Dependencies

The `transformers` version must be **>= 4.57.0 and < 4.58.0**; other versions may produce incorrect results.

| Package | Version | Purpose |
|---------|---------|---------|
| `torch` | >= 2.8 | PyTorch |
| `transformers` | >= 4.57.0, < 4.58.0 | Text encoder |
| `torchvision` | - | Image processing |
| `einops` | - | Tensor manipulation |

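Because the `transformers` constraint is strict, it can be worth failing fast with an explicit version check. A minimal sketch (the helper name is ours, not part of any package):

```python
def is_supported_transformers(version_str: str) -> bool:
    """Return True when the version is in the required [4.57, 4.58) range."""
    major, minor = (int(part) for part in version_str.split(".")[:2])
    return (major, minor) == (4, 57)

# At runtime, pass the installed version, e.g.:
# import transformers
# assert is_supported_transformers(transformers.__version__), "unsupported transformers version"
```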

### Install the diffusers [pull request](https://github.com/huggingface/diffusers/pull/13444) for JoyAI-Image-Edit

```bash
pip install git+https://github.com/huggingface/diffusers.git@refs/pull/13444/head
```

### Or install from this fork (the PR will be merged into the diffusers main branch soon)

```bash
pip install torch==2.8 transformers==4.57.6 torchvision einops

pip install git+https://github.com/Moran232/diffusers.git@joyimage_edit
```

### Running with Diffusers

```python
import torch
from PIL import Image

from diffusers import JoyImageEditPipeline

# Load the pipeline in bfloat16 and move it to the GPU.
pipeline = JoyImageEditPipeline.from_pretrained("jdopensource/JoyAI-Image-Edit-Diffusers")
pipeline.to(torch.bfloat16)
pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=None)
print("pipeline loaded")

img_path = "./test_images/input.png"
prompt = "Move the board into the red box and finally remove the red box."

image = Image.open(img_path).convert("RGB")
# Wrap the raw instruction in the chat template expected by the pipeline.
prompts = [f"<|im_start|>user\n<image>\n{prompt}<|im_end|>\n"]

inputs = {
    "image": image,
    "prompt": prompts,
    "generator": torch.manual_seed(0),
    "num_inference_steps": 30,
    "guidance_scale": 4.0,
}

print("run pipeline...")

with torch.inference_mode():
    output = pipeline(**inputs)
image = output.images[0]
image.save("joyai_image_edit_output.png")
print("image saved.")
```

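The chat-template wrapper used in the snippet above can be factored into a small helper so that every instruction is formatted identically. This is our own convenience function, not part of the pipeline API:

```python
def wrap_edit_prompt(instruction: str) -> str:
    """Wrap a plain editing instruction in the chat template used by the pipeline."""
    return f"<|im_start|>user\n<image>\n{instruction}<|im_end|>\n"

# Example:
# prompts = [wrap_edit_prompt("Move the board into the red box and finally remove the red box.")]
```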

## More Usages

### Spatial Editing Reference

JoyAI-Image supports three spatial editing prompt patterns: **Object Move**, **Object Rotation**, and **Camera Control**. For the most stable behavior, we recommend following the prompt templates below as closely as possible.

#### 1. Object Move

Use this pattern when you want to move a target object into a specified region.

**Prompt template:**

```text
Move the <object> into the red box and finally remove the red box.
```

**Rules:**

* Replace `<object>` with a clear description of the target object to be moved.
* The **red box** indicates the target destination in the image.
* The phrase **"finally remove the red box"** means the guidance box should not appear in the final edited result.

**Example:**

```text
Move the board into the red box and finally remove the red box.
```
<p align="center">
  <img src="test_images/input1.png" width="40%" />
  <img src="test_images/output1_predicted.png" width="40%" />
</p>
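Since the red box is part of the input image itself, you need to draw it before running the pipeline. One way to do that with Pillow (a sketch; the function name, box thickness, and coordinates are our own choices, not prescribed by the model card):

```python
from PIL import Image, ImageDraw

def add_target_box(image, box, width=5):
    """Return a copy of `image` with a red guidance box drawn at `box`.

    `box` is (left, top, right, bottom) in pixel coordinates and marks the
    destination region for the Object Move instruction.
    """
    out = image.copy()
    draw = ImageDraw.Draw(out)
    draw.rectangle(box, outline=(255, 0, 0), width=width)
    return out

# Demo on a blank canvas; in practice, load your input image instead.
demo = add_target_box(Image.new("RGB", (64, 64), "white"), (10, 10, 50, 50))
```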

#### 2. Object Rotation

Use this pattern when you want to rotate an object to a specific canonical view.

**Prompt template:**

```text
Rotate the <object> to show the <view> side view.
```

**Supported `<view>` values:**

* `front`
* `right`
* `left`
* `rear`
* `front right`
* `front left`
* `rear right`
* `rear left`

**Rules:**

* Replace `<object>` with a clear description of the object to rotate.
* Replace `<view>` with one of the supported directions above.
* This instruction is intended to change the **object orientation**, while keeping the object identity and surrounding scene as consistent as possible.

**Example:**

```text
Rotate the dog to show the left side view.
```
<p align="center">
  <img src="test_images/input2.png" width="40%" />
  <img src="test_images/output2_predicted.png" width="40%" />
</p>
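The template and the eight supported views above can be enforced with a small prompt builder, so an unsupported view fails loudly instead of producing unstable edits. The helper is ours, for illustration only:

```python
SUPPORTED_VIEWS = {
    "front", "right", "left", "rear",
    "front right", "front left", "rear right", "rear left",
}

def rotation_prompt(obj: str, view: str) -> str:
    """Build an Object Rotation instruction, validating the requested view."""
    if view not in SUPPORTED_VIEWS:
        raise ValueError(f"unsupported view: {view!r}")
    return f"Rotate the {obj} to show the {view} side view."
```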

#### 3. Camera Control

Use this pattern when you want to change only the camera viewpoint while keeping the 3D scene itself unchanged.

**Prompt template:**

```text
Move the camera.
- Camera rotation: Yaw {y_rotation}°, Pitch {p_rotation}°.
- Camera zoom: in/out/unchanged.
- Keep the 3D scene static; only change the viewpoint.
```

**Rules:**

* `{y_rotation}` specifies the yaw rotation angle in degrees.
* `{p_rotation}` specifies the pitch rotation angle in degrees.
* `Camera zoom` must be one of:
  * `in`
  * `out`
  * `unchanged`
* The last line is important: it explicitly tells the model to preserve the 3D scene content and geometry, and only adjust the camera viewpoint.

**Examples:**

```text
Move the camera.
- Camera rotation: Yaw 45°, Pitch 0°.
- Camera zoom: in.
- Keep the 3D scene static; only change the viewpoint.
```

```text
Move the camera.
- Camera rotation: Yaw 0.0°, Pitch -15.0°.
- Camera zoom: unchanged.
- Keep the 3D scene static; only change the viewpoint.
```
<p align="center">
  <img src="test_images/input3.png" width="40%" />
  <img src="test_images/output3_predicted.png" width="40%" />
</p>
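Because the Camera Control template is multi-line and exact, it is easy to get a separator or suffix slightly wrong when writing it by hand. A small formatter (our own sketch, mirroring the template and rules above) keeps it consistent:

```python
def camera_prompt(yaw, pitch, zoom="unchanged"):
    """Build a Camera Control instruction from yaw/pitch angles (degrees) and a zoom mode."""
    if zoom not in {"in", "out", "unchanged"}:
        raise ValueError(f"zoom must be 'in', 'out', or 'unchanged', got {zoom!r}")
    return (
        "Move the camera.\n"
        f"- Camera rotation: Yaw {yaw}°, Pitch {pitch}°.\n"
        f"- Camera zoom: {zoom}.\n"
        "- Keep the 3D scene static; only change the viewpoint."
    )
```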

## License Agreement

JoyAI-Image is licensed under Apache 2.0.

## ☎️ We're Hiring!

We are actively hiring Research Scientists, AI Infra Engineers, and Interns to join us in building next-generation generative foundation models and bringing them into real-world applications. If you're interested, please send your resume to: [huanghaoyang.ocean@jd.com](mailto:huanghaoyang.ocean@jd.com)