yeq6x committed
Commit 0b35a03 · 1 Parent(s): 6fda490

Customized

Files changed (2)
  1. README.md +50 -5
  2. app.py +87 -278
README.md CHANGED
@@ -1,14 +1,59 @@
1
  ---
2
- title: Qwen Image Edit 2511 Fast
3
- emoji: 🏆💨
4
- colorFrom: pink
5
- colorTo: red
6
  sdk: gradio
7
  sdk_version: 6.2.0
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
- short_description: Fast 4 step inference of Qwen Image Edit 2511
12
  ---
13
 
14
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: QIE-Image2GuideBody
3
+ emoji: 🎨✨
4
+ colorFrom: blue
5
+ colorTo: purple
6
  sdk: gradio
7
  sdk_version: 6.2.0
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
+ short_description: Two-stage anime character to guide body converter
12
+ ---
13
+
14
+ # 🎨✨ QIE-Image2GuideBody
15
+
16
+ A two-stage conversion pipeline that transforms anime character images into structured guide body representations.
17
+
18
+ ## Overview
19
+
20
+ This application performs a two-stage conversion process:
21
+
22
+ 1. **Stage 1: Anime Character → Base Body**
23
+ - Converts anime-style character images into base body structure
24
+ - Removes stylistic details while preserving pose and proportions
25
+
26
+ 2. **Stage 2: Base Body → Guide Body**
27
+ - Transforms the base body into a clear guide with structure lines
28
+ - Produces easily understandable skeletal/structural representations
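+
+ As a minimal sketch (not the exact `app.py` code), the two stages chain as follows, assuming the `pipe_stage1` / `pipe_stage2` pipelines and the stage prompts defined in `app.py` are already in scope:
+
```python
# Minimal sketch of the two-stage flow; pipe_stage1, pipe_stage2,
# STAGE1_PROMPT and STAGE2_PROMPT are assumed to come from app.py.
from PIL import Image

character = Image.open("anime_character.png")  # hypothetical input path

# Stage 1: anime character -> base body
base_body = pipe_stage1(
    image=[character],
    prompt=STAGE1_PROMPT,
    num_inference_steps=4,
    true_cfg_scale=1.0,
).images

# Stage 2: base body -> guide body (Stage 1 output feeds Stage 2)
guide_body = pipe_stage2(
    image=base_body,
    prompt=STAGE2_PROMPT,
    num_inference_steps=4,
    true_cfg_scale=1.0,
).images

guide_body[0].save("guide_body.png")
```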
29
+
30
+ ## Technology
31
+
32
+ Built on [Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) with:
33
+ - Custom LoRA models for each conversion stage
34
+ - [Qwen-Image-Lightning-2511](https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning) for fast 4-step inference
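+
+ A rough sketch of how one stage pipeline is assembled (mirroring, but not copying, the loading code in `app.py`); the stage LoRA repo and weight file below are placeholders that `app.py` reads from the environment variables listed under Configuration:
+
```python
import torch
from diffusers import QwenImageEditPlusPipeline

# Base editing model; app.py additionally installs a FlowMatchEulerDiscreteScheduler
# built from the Lightning config before loading.
pipe_stage1 = QwenImageEditPlusPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Lightning LoRA: enables the fast 4-step schedule
pipe_stage1.load_lora_weights(
    "lightx2v/Qwen-Image-Edit-2511-Lightning",
    weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors",
)

# Stage-specific LoRA (anime -> base body); repo and weight name are placeholders
# configured via STAGE1_LORA_REPO / STAGE1_LORA_WEIGHT in practice.
pipe_stage1.load_lora_weights("your-org/stage1-lora", weight_name="stage1.safetensors", adapter_name="stage1")
pipe_stage1.fuse_lora()
```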
35
+
36
+ ## Configuration
37
+
38
+ The application uses environment variables for customization:
39
+
40
+ ### LoRA Settings
41
+ - `STAGE1_LORA_REPO`: Repository for Stage 1 LoRA (anime → base body)
42
+ - `STAGE1_LORA_WEIGHT`: Weight filename for Stage 1 LoRA
43
+ - `STAGE2_LORA_REPO`: Repository for Stage 2 LoRA (base body → guide body)
44
+ - `STAGE2_LORA_WEIGHT`: Weight filename for Stage 2 LoRA
45
+
46
+ ### Prompt Settings
47
+ - `STAGE1_PROMPT`: Prompt for Stage 1 conversion
48
+ - `STAGE2_PROMPT`: Prompt for Stage 2 conversion
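+
+ In `app.py` these variables are read at startup with fallback defaults (the default repo names below are placeholders):
+
```python
import os

# LoRA sources
STAGE1_LORA_REPO = os.environ.get("STAGE1_LORA_REPO", "default/stage1-lora")
STAGE1_LORA_WEIGHT = os.environ.get("STAGE1_LORA_WEIGHT", "stage1.safetensors")
STAGE2_LORA_REPO = os.environ.get("STAGE2_LORA_REPO", "default/stage2-lora")
STAGE2_LORA_WEIGHT = os.environ.get("STAGE2_LORA_WEIGHT", "stage2.safetensors")

# Stage prompts
STAGE1_PROMPT = os.environ.get("STAGE1_PROMPT", "Convert anime character to base body structure")
STAGE2_PROMPT = os.environ.get("STAGE2_PROMPT", "Convert base body to clear guide body with structure lines")
```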
49
+
50
+ ## Usage
51
+
52
+ 1. Upload an anime character image
53
+ 2. Click "Convert to Guide Body"
54
+ 3. View the intermediate base body result (Stage 1)
55
+ 4. View the final guide body result (Stage 2)
56
+
57
  ---
58
 
59
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py CHANGED
@@ -11,202 +11,16 @@ from diffusers import FlowMatchEulerDiscreteScheduler, QwenImageEditPlusPipeline
11
  # from qwenimage.transformer_qwenimage import QwenImageTransformer2DModel
12
  # from qwenimage.qwen_fa3_processor import QwenDoubleStreamAttnProcessorFA3
13
 
14
- from huggingface_hub import InferenceClient
15
  import math
16
-
17
  import os
18
- import base64
19
- from io import BytesIO
20
- import json
21
-
22
- SYSTEM_PROMPT = '''
23
- # Edit Instruction Rewriter
24
- You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
25
-
26
- Please strictly follow the rewriting rules below:
27
-
28
- ## 1. General Principles
29
- - Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
30
- - If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
31
- - Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
32
- - All added objects or modifications must align with the logic and style of the scene in the input images.
33
- - If multiple sub-images are to be generated, describe the content of each sub-image individually.
34
-
35
- ## 2. Task-Type Handling Rules
36
-
37
- ### 1. Add, Delete, Replace Tasks
38
- - If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
39
- - If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
40
- > Original: "Add an animal"
41
- > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
42
- - Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
43
- - For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
44
-
45
- ### 2. Text Editing Tasks
46
- - All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
47
- - Both adding new text and replacing existing text are text replacement tasks, For example:
48
- - Replace "xx" to "yy"
49
- - Replace the mask / bounding box to "yy"
50
- - Replace the visual object to "yy"
51
- - Specify text position, color, and layout only if user has required.
52
- - If font is specified, keep the original language of the font.
53
-
54
- ### 3. Human Editing Tasks
55
- - Make the smallest changes to the given user's prompt.
56
- - If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
57
- - **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject's identity consistency.**
58
- > Original: "Add eyebrows to the face"
59
- > Rewritten: "Slightly thicken the person's eyebrows with little change, look natural."
60
-
61
- ### 4. Style Conversion or Enhancement Tasks
62
- - If a style is specified, describe it concisely using key visual features. For example:
63
- > Original: "Disco style"
64
- > Rewritten: "1970s disco style: flashing lights, disco ball, mirrored walls, vibrant colors"
65
- - For style reference, analyze the original image and extract key characteristics (color, composition, texture, lighting, artistic style, etc.), integrating them into the instruction.
66
- - **Colorization tasks (including old photo restoration) must use the fixed template:**
67
- "Restore and colorize the old photo."
68
- - Clearly specify the object to be modified. For example:
69
- > Original: Modify the subject in Picture 1 to match the style of Picture 2.
70
- > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
71
-
72
- ### 5. Material Replacement
73
- - Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
74
- - For text material replacement, use the fixed template:
75
- "Change the material of text "xxxx" to laser style"
76
-
77
- ### 6. Logo/Pattern Editing
78
- - Material replacement should preserve the original shape and structure as much as possible. For example:
79
- > Original: "Convert to sapphire material"
80
- > Rewritten: "Convert the main subject in the image to sapphire material, preserving similar shape and structure"
81
- - When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
82
- > Original: "Migrate the logo in the image to a new scene"
83
- > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
84
-
85
- ### 7. Multi-Image Tasks
86
- - Rewritten prompts must clearly point out which image's element is being modified. For example:
87
- > Original: "Replace the subject of picture 1 with the subject of picture 2"
88
- > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2's background unchanged"
89
- - For stylization tasks, describe the reference image's style in the rewritten prompt, while preserving the visual content of the source image.
90
-
91
- ## 3. Rationale and Logic Check
92
- - Resolve contradictory instructions: e.g., "Remove all trees but keep all trees" requires logical correction.
93
- - Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
94
-
95
- # Output Format Example
96
- ```json
97
- {
98
- "Rewritten": "..."
99
- }
100
- '''
101
 
102
- def polish_prompt_hf(original_prompt, img_list):
103
- """
104
- Rewrites the prompt using a Hugging Face InferenceClient.
105
- Supports multiple images via img_list.
106
- """
107
- # Ensure HF_TOKEN is set
108
- api_key = os.environ.get("inference_providers")
109
- if not api_key:
110
- print("Warning: HF_TOKEN not set. Falling back to original prompt.")
111
- return original_prompt
112
- prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {original_prompt}\n\nRewritten Prompt:"
113
- system_prompt = "you are a helpful assistant, you should provide useful answers to users."
114
- try:
115
- # Initialize the client
116
- client = InferenceClient(
117
- provider="nebius",
118
- api_key=api_key,
119
- )
120
-
121
- # Convert list of images to base64 data URLs
122
- image_urls = []
123
- if img_list is not None:
124
- # Ensure img_list is actually a list
125
- if not isinstance(img_list, list):
126
- img_list = [img_list]
127
-
128
- for img in img_list:
129
- image_url = None
130
- # If img is a PIL Image
131
- if hasattr(img, 'save'): # Check if it's a PIL Image
132
- buffered = BytesIO()
133
- img.save(buffered, format="PNG")
134
- img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
135
- image_url = f"data:image/png;base64,{img_base64}"
136
- # If img is already a file path (string)
137
- elif isinstance(img, str):
138
- with open(img, "rb") as image_file:
139
- img_base64 = base64.b64encode(image_file.read()).decode('utf-8')
140
- image_url = f"data:image/png;base64,{img_base64}"
141
- else:
142
- print(f"Warning: Unexpected image type: {type(img)}, skipping...")
143
- continue
144
-
145
- if image_url:
146
- image_urls.append(image_url)
147
-
148
- # Build the content array with text first, then all images
149
- content = [
150
- {
151
- "type": "text",
152
- "text": prompt
153
- }
154
- ]
155
-
156
- # Add all images to the content
157
- for image_url in image_urls:
158
- content.append({
159
- "type": "image_url",
160
- "image_url": {
161
- "url": image_url
162
- }
163
- })
164
-
165
- # Format the messages for the chat completions API
166
- messages = [
167
- {"role": "system", "content": system_prompt},
168
- {
169
- "role": "user",
170
- "content": content
171
- }
172
- ]
173
-
174
- # Call the API
175
- completion = client.chat.completions.create(
176
- model="Qwen/Qwen2.5-VL-72B-Instruct",
177
- messages=messages,
178
- )
179
-
180
- # Parse the response
181
- result = completion.choices[0].message.content
182
-
183
- # Try to extract JSON if present
184
- if '"Rewritten"' in result:
185
- try:
186
- # Clean up the response
187
- result = result.replace('```json', '').replace('```', '')
188
- result_json = json.loads(result)
189
- polished_prompt = result_json.get('Rewritten', result)
190
- except:
191
- polished_prompt = result
192
- else:
193
- polished_prompt = result
194
-
195
- polished_prompt = polished_prompt.strip().replace("\n", " ")
196
- return polished_prompt
197
-
198
- except Exception as e:
199
- print(f"Error during API call to Hugging Face: {e}")
200
- # Fallback to original prompt if enhancement fails
201
- return original_prompt
202
-
203
-
204
-
205
- def encode_image(pil_image):
206
- import io
207
- buffered = io.BytesIO()
208
- pil_image.save(buffered, format="PNG")
209
- return base64.b64encode(buffered.getvalue()).decode("utf-8")
210
 
211
  # --- Model Loading ---
212
  dtype = torch.bfloat16
@@ -233,15 +47,29 @@ scheduler_config = {
233
  # Initialize scheduler with Lightning config
234
  scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
235
 
236
- # Load the model pipeline
237
- pipe = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2511",
238
  scheduler=scheduler,
239
  torch_dtype=dtype).to(device)
240
- pipe.load_lora_weights(
241
- "lightx2v/Qwen-Image-Edit-2511-Lightning",
242
  weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors"
243
  )
244
- pipe.fuse_lora()
 
 
245
 
246
  # # Apply the same optimizations from the first version
247
  # pipe.transformer.__class__ = QwenImageTransformer2DModel
@@ -250,59 +78,47 @@ pipe.fuse_lora()
250
  # # --- Ahead-of-time compilation ---
251
  # optimize_pipeline_(pipe, image=[Image.new("RGB", (1024, 1024)), Image.new("RGB", (1024, 1024))], prompt="prompt")
252
 
253
- # --- UI Constants and Helpers ---
254
  MAX_SEED = np.iinfo(np.int32).max
255
 
256
- def use_output_as_input(output_images):
257
- """Convert output images to input format for the gallery"""
258
- if output_images is None or len(output_images) == 0:
259
- return []
260
- return output_images
261
-
262
- # --- Main Inference Function (with hardcoded negative prompt) ---
263
  @spaces.GPU()
264
  def infer(
265
  images,
266
- prompt,
267
  seed=42,
268
  randomize_seed=False,
269
  true_guidance_scale=1.0,
270
  num_inference_steps=4,
271
  height=None,
272
  width=None,
273
- rewrite_prompt=True,
274
- num_images_per_prompt=1,
275
  progress=gr.Progress(track_tqdm=True),
276
  ):
277
  """
278
- Run image-editing inference using the Qwen-Image-Edit pipeline.
279
 
280
  Parameters:
281
  images (list): Input images from the Gradio gallery (PIL or path-based).
282
- prompt (str): Editing instruction (may be rewritten by LLM if enabled).
283
  seed (int): Random seed for reproducibility.
284
  randomize_seed (bool): If True, overrides seed with a random value.
285
  true_guidance_scale (float): CFG scale used by Qwen-Image.
286
  num_inference_steps (int): Number of diffusion steps.
287
  height (int | None): Optional output height override.
288
  width (int | None): Optional output width override.
289
- rewrite_prompt (bool): Whether to rewrite the prompt using Qwen-2.5-VL.
290
- num_images_per_prompt (int): Number of images to generate.
291
  progress: Gradio progress callback.
292
 
293
  Returns:
294
- tuple: (generated_images, seed_used, UI_visibility_update)
295
  """
296
-
297
- # Hardcode the negative prompt as requested
298
  negative_prompt = " "
299
-
300
  if randomize_seed:
301
  seed = random.randint(0, MAX_SEED)
302
 
303
  # Set up the generator for reproducibility
304
  generator = torch.Generator(device=device).manual_seed(seed)
305
-
306
  # Load input images into PIL Images
307
  pil_images = []
308
  if images is not None:
@@ -319,29 +135,45 @@ def infer(
319
 
320
  if height==256 and width==256:
321
  height, width = None, None
322
- print(f"Calling pipeline with prompt: '{prompt}'")
323
- print(f"Negative Prompt: '{negative_prompt}'")
 
 
324
  print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
325
- if rewrite_prompt and len(pil_images) > 0:
326
- prompt = polish_prompt_hf(prompt, pil_images)
327
- print(f"Rewritten Prompt: {prompt}")
328
-
329
 
330
- # Generate the image
331
- image = pipe(
332
  image=pil_images if len(pil_images) > 0 else None,
333
- prompt=prompt,
334
  height=height,
335
  width=width,
336
  negative_prompt=negative_prompt,
337
  num_inference_steps=num_inference_steps,
338
  generator=generator,
339
  true_cfg_scale=true_guidance_scale,
340
- num_images_per_prompt=num_images_per_prompt,
341
  ).images
342
 
343
- # Return images, seed, and make button visible
344
- return image, seed, gr.update(visible=True)
345
 
346
  # --- Examples and UI Layout ---
347
  examples = []
@@ -349,54 +181,47 @@ examples = []
349
  css = """
350
  #col-container {
351
  margin: 0 auto;
352
- max-width: 1024px;
353
  }
354
  #logo-title {
355
  text-align: center;
356
  }
357
- #logo-title img {
358
- width: 400px;
359
- }
360
- #edit_text{margin-top: -62px !important}
361
  """
362
 
363
  with gr.Blocks(css=css) as demo:
364
  with gr.Column(elem_id="col-container"):
365
  gr.HTML("""
366
  <div id="logo-title">
367
- <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png" alt="Qwen-Image Edit Logo" width="400" style="display: block; margin: 0 auto;">
368
- <h2 style="font-style: italic;color: #5b47d1;margin-top: -27px !important;margin-left: 96px">[Plus] Fast, 4-steps with LightX2V LoRA</h2>
369
  </div>
370
  """)
371
  gr.Markdown("""
372
- [Learn more](https://github.com/QwenLM/Qwen-Image) about the Qwen-Image series.
373
- This demo uses the new [Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) with the [Qwen-Image-Lightning-2511](https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning) LoRA for accelerated inference.
374
- Try on [Qwen Chat](https://chat.qwen.ai/), or [download model](https://huggingface.co/Qwen/Qwen-Image-Edit-2509) to run locally with ComfyUI or diffusers.
 
375
  """)
 
376
  with gr.Row():
377
- with gr.Column():
378
- input_images = gr.Gallery(label="Input Images",
379
- show_label=False,
380
- type="pil",
 
381
  interactive=True)
382
 
383
- with gr.Column():
384
- result = gr.Gallery(label="Result", show_label=False, type="pil", interactive=False)
385
- # Add this button right after the result gallery - initially hidden
386
- use_output_btn = gr.Button("↗️ Use as input", variant="secondary", size="sm", visible=False)
387
 
388
- with gr.Row():
389
- prompt = gr.Text(
390
- label="Prompt",
391
- show_label=False,
392
- placeholder="describe the edit instruction",
393
- container=False,
394
- )
395
- run_button = gr.Button("Edit!", variant="primary")
396
 
397
- with gr.Accordion("Advanced Settings", open=False):
398
- # Negative prompt UI element is removed here
399
 
 
400
  seed = gr.Slider(
401
  label="Seed",
402
  minimum=0,
@@ -408,7 +233,6 @@ with gr.Blocks(css=css) as demo:
408
  randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
409
 
410
  with gr.Row():
411
-
412
  true_guidance_scale = gr.Slider(
413
  label="True guidance scale",
414
  minimum=1.0,
@@ -424,7 +248,7 @@ with gr.Blocks(css=css) as demo:
424
  step=1,
425
  value=4,
426
  )
427
-
428
  height = gr.Slider(
429
  label="Height",
430
  minimum=256,
@@ -432,7 +256,7 @@ with gr.Blocks(css=css) as demo:
432
  step=8,
433
  value=None,
434
  )
435
-
436
  width = gr.Slider(
437
  label="Width",
438
  minimum=256,
@@ -440,34 +264,19 @@ with gr.Blocks(css=css) as demo:
440
  step=8,
441
  value=None,
442
  )
443
-
444
-
445
- rewrite_prompt = gr.Checkbox(label="Rewrite prompt", value=True)
446
-
447
- # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
448
 
449
- gr.on(
450
- triggers=[run_button.click, prompt.submit],
451
  fn=infer,
452
  inputs=[
453
  input_images,
454
- prompt,
455
  seed,
456
  randomize_seed,
457
  true_guidance_scale,
458
  num_inference_steps,
459
  height,
460
  width,
461
- rewrite_prompt,
462
  ],
463
- outputs=[result, seed, use_output_btn], # Added use_output_btn to outputs
464
- )
465
-
466
- # Add the new event handler for the "Use Output as Input" button
467
- use_output_btn.click(
468
- fn=use_output_as_input,
469
- inputs=[result],
470
- outputs=[input_images]
471
  )
472
 
473
  if __name__ == "__main__":
 
11
  # from qwenimage.transformer_qwenimage import QwenImageTransformer2DModel
12
  # from qwenimage.qwen_fa3_processor import QwenDoubleStreamAttnProcessorFA3
13
 
 
14
  import math
 
15
  import os
16
 
17
+ # --- Environment Variables for LoRA and Prompts ---
18
+ STAGE1_LORA_REPO = os.environ.get("STAGE1_LORA_REPO", "default/stage1-lora")
19
+ STAGE1_LORA_WEIGHT = os.environ.get("STAGE1_LORA_WEIGHT", "stage1.safetensors")
20
+ STAGE2_LORA_REPO = os.environ.get("STAGE2_LORA_REPO", "default/stage2-lora")
21
+ STAGE2_LORA_WEIGHT = os.environ.get("STAGE2_LORA_WEIGHT", "stage2.safetensors")
22
+ STAGE1_PROMPT = os.environ.get("STAGE1_PROMPT", "Convert anime character to base body structure")
23
+ STAGE2_PROMPT = os.environ.get("STAGE2_PROMPT", "Convert base body to clear guide body with structure lines")
24
 
25
  # --- Model Loading ---
26
  dtype = torch.bfloat16
 
47
  # Initialize scheduler with Lightning config
48
  scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
49
 
50
+ # Load Stage 1 pipeline (Anime -> Base Body)
51
+ pipe_stage1 = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2511",
52
+ scheduler=scheduler,
53
+ torch_dtype=dtype).to(device)
54
+ pipe_stage1.load_lora_weights(
55
+ "lightx2v/Qwen-Image-Edit-2511-Lightning",
56
+ weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors"
57
+ )
58
+ pipe_stage1.load_lora_weights(STAGE1_LORA_REPO, weight_name=STAGE1_LORA_WEIGHT, adapter_name="stage1")
59
+ pipe_stage1.set_adapters(["default", "stage1"], adapter_weights=[1.0, 1.0])
60
+ pipe_stage1.fuse_lora()
61
+
62
+ # Load Stage 2 pipeline (Base Body -> Guide Body)
63
+ pipe_stage2 = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2511",
64
  scheduler=scheduler,
65
  torch_dtype=dtype).to(device)
66
+ pipe_stage2.load_lora_weights(
67
+ "lightx2v/Qwen-Image-Edit-2511-Lightning",
68
  weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors"
69
  )
70
+ pipe_stage2.load_lora_weights(STAGE2_LORA_REPO, weight_name=STAGE2_LORA_WEIGHT, adapter_name="stage2")
71
+ pipe_stage2.set_adapters(["default", "stage2"], adapter_weights=[1.0, 1.0])
72
+ pipe_stage2.fuse_lora()
73
 
74
  # # Apply the same optimizations from the first version
75
  # pipe.transformer.__class__ = QwenImageTransformer2DModel
 
78
  # # --- Ahead-of-time compilation ---
79
  # optimize_pipeline_(pipe, image=[Image.new("RGB", (1024, 1024)), Image.new("RGB", (1024, 1024))], prompt="prompt")
80
 
81
+ # --- UI Constants ---
82
  MAX_SEED = np.iinfo(np.int32).max
83
 
84
+ # --- Main Inference Function (Two-Stage Conversion) ---
85
  @spaces.GPU()
86
  def infer(
87
  images,
 
88
  seed=42,
89
  randomize_seed=False,
90
  true_guidance_scale=1.0,
91
  num_inference_steps=4,
92
  height=None,
93
  width=None,
 
 
94
  progress=gr.Progress(track_tqdm=True),
95
  ):
96
  """
97
+ Run two-stage image conversion: Anime Character -> Base Body -> Guide Body.
98
 
99
  Parameters:
100
  images (list): Input images from the Gradio gallery (PIL or path-based).
 
101
  seed (int): Random seed for reproducibility.
102
  randomize_seed (bool): If True, overrides seed with a random value.
103
  true_guidance_scale (float): CFG scale used by Qwen-Image.
104
  num_inference_steps (int): Number of diffusion steps.
105
  height (int | None): Optional output height override.
106
  width (int | None): Optional output width override.
 
 
107
  progress: Gradio progress callback.
108
 
109
  Returns:
110
+ tuple: (stage1_images, stage2_images, seed_used)
111
  """
112
+
113
+ # Hardcode the negative prompt
114
  negative_prompt = " "
115
+
116
  if randomize_seed:
117
  seed = random.randint(0, MAX_SEED)
118
 
119
  # Set up the generator for reproducibility
120
  generator = torch.Generator(device=device).manual_seed(seed)
121
+
122
  # Load input images into PIL Images
123
  pil_images = []
124
  if images is not None:
 
135
 
136
  if height==256 and width==256:
137
  height, width = None, None
138
+
139
+ # Stage 1: Anime Character -> Base Body
140
+ print(f"[Stage 1] Converting to base body...")
141
+ print(f"Prompt: '{STAGE1_PROMPT}'")
142
  print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
143
 
144
+ stage1_images = pipe_stage1(
 
145
  image=pil_images if len(pil_images) > 0 else None,
146
+ prompt=STAGE1_PROMPT,
147
  height=height,
148
  width=width,
149
  negative_prompt=negative_prompt,
150
  num_inference_steps=num_inference_steps,
151
  generator=generator,
152
  true_cfg_scale=true_guidance_scale,
153
+ num_images_per_prompt=1,
154
  ).images
155
 
156
+ # Stage 2: Base Body -> Guide Body
157
+ print(f"[Stage 2] Converting to guide body...")
158
+ print(f"Prompt: '{STAGE2_PROMPT}'")
159
+
160
+ # Use same seed for stage 2
161
+ generator = torch.Generator(device=device).manual_seed(seed)
162
+
163
+ stage2_images = pipe_stage2(
164
+ image=stage1_images,
165
+ prompt=STAGE2_PROMPT,
166
+ height=height,
167
+ width=width,
168
+ negative_prompt=negative_prompt,
169
+ num_inference_steps=num_inference_steps,
170
+ generator=generator,
171
+ true_cfg_scale=true_guidance_scale,
172
+ num_images_per_prompt=1,
173
+ ).images
174
+
175
+ # Return stage1 (base body), stage2 (guide body), and seed
176
+ return stage1_images, stage2_images, seed
177
 
178
  # --- Examples and UI Layout ---
179
  examples = []
 
181
  css = """
182
  #col-container {
183
  margin: 0 auto;
184
+ max-width: 1600px;
185
  }
186
  #logo-title {
187
  text-align: center;
188
  }
189
  """
190
 
191
  with gr.Blocks(css=css) as demo:
192
  with gr.Column(elem_id="col-container"):
193
  gr.HTML("""
194
  <div id="logo-title">
195
+ <h1>🎨✨ QIE-Image2GuideBody</h1>
196
+ <h3 style="color: #5b47d1;">Anime Character → Base Body → Guide Body Converter</h3>
197
  </div>
198
  """)
199
  gr.Markdown("""
200
+ Two-stage conversion pipeline powered by [Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) with custom LoRAs.
201
+
202
+ **Stage 1:** Converts anime characters to base body structure
203
+ **Stage 2:** Converts base body to clear guide body with structure lines
204
  """)
205
+
206
  with gr.Row():
207
+ with gr.Column(scale=1):
208
+ gr.Markdown("### 1️⃣ Input (Anime Character)")
209
+ input_images = gr.Gallery(label="Input Images",
210
+ show_label=False,
211
+ type="pil",
212
  interactive=True)
213
 
214
+ with gr.Column(scale=1):
215
+ gr.Markdown("### 2️⃣ Stage 1 (Base Body)")
216
+ stage1_result = gr.Gallery(label="Base Body", show_label=False, type="pil", interactive=False)
 
217
 
218
+ with gr.Column(scale=1):
219
+ gr.Markdown("### 3️⃣ Stage 2 (Guide Body)")
220
+ stage2_result = gr.Gallery(label="Guide Body", show_label=False, type="pil", interactive=False)
221
 
222
+ run_button = gr.Button("🚀 Convert to Guide Body", variant="primary", size="lg")
 
223
 
224
+ with gr.Accordion("Advanced Settings", open=False):
225
  seed = gr.Slider(
226
  label="Seed",
227
  minimum=0,
 
233
  randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
234
 
235
  with gr.Row():
 
236
  true_guidance_scale = gr.Slider(
237
  label="True guidance scale",
238
  minimum=1.0,
 
248
  step=1,
249
  value=4,
250
  )
251
+
252
  height = gr.Slider(
253
  label="Height",
254
  minimum=256,
 
256
  step=8,
257
  value=None,
258
  )
259
+
260
  width = gr.Slider(
261
  label="Width",
262
  minimum=256,
 
264
  step=8,
265
  value=None,
266
  )
267
 
268
+ run_button.click(
 
269
  fn=infer,
270
  inputs=[
271
  input_images,
 
272
  seed,
273
  randomize_seed,
274
  true_guidance_scale,
275
  num_inference_steps,
276
  height,
277
  width,
 
278
  ],
279
+ outputs=[stage1_result, stage2_result, seed],
280
  )
281
 
282
  if __name__ == "__main__":