Spaces: Running on Zero
Commit: Customized (カスタムした)

README.md CHANGED
@@ -1,14 +1,59 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: QIE-Image2GuideBody
+emoji: 🎨✨
+colorFrom: blue
+colorTo: purple
 sdk: gradio
 sdk_version: 6.2.0
 app_file: app.py
 pinned: false
 license: apache-2.0
-short_description:
+short_description: Two-stage anime character to guide body converter
+---
+
+# 🎨✨ QIE-Image2GuideBody
+
+A two-stage conversion pipeline that transforms anime character images into structured guide body representations.
+
+## Overview
+
+This application performs a two-stage conversion process:
+
+1. **Stage 1: Anime Character → Base Body**
+   - Converts anime-style character images into base body structure
+   - Removes stylistic details while preserving pose and proportions
+
+2. **Stage 2: Base Body → Guide Body**
+   - Transforms the base body into a clear guide with structure lines
+   - Produces easily understandable skeletal/structural representations
+
+## Technology
+
+Built on [Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) with:
+- Custom LoRA models for each conversion stage
+- [Qwen-Image-Lightning-2511](https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning) for fast 4-step inference
+
+## Configuration
+
+The application uses environment variables for customization:
+
+### LoRA Settings
+- `STAGE1_LORA_REPO`: Repository for Stage 1 LoRA (anime → base body)
+- `STAGE1_LORA_WEIGHT`: Weight filename for Stage 1 LoRA
+- `STAGE2_LORA_REPO`: Repository for Stage 2 LoRA (base body → guide body)
+- `STAGE2_LORA_WEIGHT`: Weight filename for Stage 2 LoRA
+
+### Prompt Settings
+- `STAGE1_PROMPT`: Prompt for Stage 1 conversion
+- `STAGE2_PROMPT`: Prompt for Stage 2 conversion
+
+## Usage
+
+1. Upload an anime character image
+2. Click "Convert to Guide Body"
+3. View the intermediate base body result (Stage 1)
+4. View the final guide body result (Stage 2)
+
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
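
The configuration section above maps one-to-one onto module-level `os.environ.get` lookups in app.py (see the diff below). As a minimal sketch of how a duplicated Space or a local run could override them (the LoRA repository ids are hypothetical placeholders, not published models):

```python
# Sketch: override the Space's configuration before app.py is imported.
# Variable names match app.py; both repo ids are placeholders.
import os

os.environ["STAGE1_LORA_REPO"] = "your-username/anime-to-base-body-lora"   # placeholder
os.environ["STAGE1_LORA_WEIGHT"] = "stage1.safetensors"
os.environ["STAGE2_LORA_REPO"] = "your-username/base-body-to-guide-lora"   # placeholder
os.environ["STAGE2_LORA_WEIGHT"] = "stage2.safetensors"
os.environ["STAGE1_PROMPT"] = "Convert anime character to base body structure"
os.environ["STAGE2_PROMPT"] = "Convert base body to clear guide body with structure lines"

# app.py reads these with os.environ.get(...) at import time, so set them
# before importing the module (assumes app.py is on the import path).
import app  # noqa: F401
```

On a hosted Space the same values would normally be set as Space variables or secrets rather than in code.
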
app.py CHANGED
@@ -11,202 +11,16 @@ from diffusers import FlowMatchEulerDiscreteScheduler, QwenImageEditPlusPipeline
 # from qwenimage.transformer_qwenimage import QwenImageTransformer2DModel
 # from qwenimage.qwen_fa3_processor import QwenDoubleStreamAttnProcessorFA3
 
-from huggingface_hub import InferenceClient
 import math
-
 import os
-import base64
-from io import BytesIO
-import json
-
-SYSTEM_PROMPT = '''
-# Edit Instruction Rewriter
-You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
-
-Please strictly follow the rewriting rules below:
-
-## 1. General Principles
-- Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
-- If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
-- Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
-- All added objects or modifications must align with the logic and style of the scene in the input images.
-- If multiple sub-images are to be generated, describe the content of each sub-image individually.
-
-## 2. Task-Type Handling Rules
-
-### 1. Add, Delete, Replace Tasks
-- If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
-- If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
-    > Original: "Add an animal"
-    > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
-- Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
-- For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
-
-### 2. Text Editing Tasks
-- All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
-- Both adding new text and replacing existing text are text replacement tasks, For example:
-    - Replace "xx" to "yy"
-    - Replace the mask / bounding box to "yy"
-    - Replace the visual object to "yy"
-- Specify text position, color, and layout only if user has required.
-- If font is specified, keep the original language of the font.
-
-### 3. Human Editing Tasks
-- Make the smallest changes to the given user's prompt.
-- If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
-- **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject's identity consistency.**
-    > Original: "Add eyebrows to the face"
-    > Rewritten: "Slightly thicken the person's eyebrows with little change, look natural."
-
-### 4. Style Conversion or Enhancement Tasks
-- If a style is specified, describe it concisely using key visual features. For example:
-    > Original: "Disco style"
-    > Rewritten: "1970s disco style: flashing lights, disco ball, mirrored walls, vibrant colors"
-- For style reference, analyze the original image and extract key characteristics (color, composition, texture, lighting, artistic style, etc.), integrating them into the instruction.
-- **Colorization tasks (including old photo restoration) must use the fixed template:**
-    "Restore and colorize the old photo."
-- Clearly specify the object to be modified. For example:
-    > Original: Modify the subject in Picture 1 to match the style of Picture 2.
-    > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
-
-### 5. Material Replacement
-- Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
-- For text material replacement, use the fixed template:
-    "Change the material of text "xxxx" to laser style"
-
-### 6. Logo/Pattern Editing
-- Material replacement should preserve the original shape and structure as much as possible. For example:
-    > Original: "Convert to sapphire material"
-    > Rewritten: "Convert the main subject in the image to sapphire material, preserving similar shape and structure"
-- When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
-    > Original: "Migrate the logo in the image to a new scene"
-    > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
-
-### 7. Multi-Image Tasks
-- Rewritten prompts must clearly point out which image's element is being modified. For example:
-    > Original: "Replace the subject of picture 1 with the subject of picture 2"
-    > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2's background unchanged"
-- For stylization tasks, describe the reference image's style in the rewritten prompt, while preserving the visual content of the source image.
-
-## 3. Rationale and Logic Check
-- Resolve contradictory instructions: e.g., "Remove all trees but keep all trees" requires logical correction.
-- Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
-
-# Output Format Example
-```json
-{
-    "Rewritten": "..."
-}
-'''
 
-
-
-def polish_prompt_hf(original_prompt, img_list):
-
-
-    api_key = os.environ.get("HF_TOKEN")
-
-    if not api_key:
-        print("Warning: HF_TOKEN not set. Falling back to original prompt.")
-        return original_prompt
-    prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {original_prompt}\n\nRewritten Prompt:"
-    system_prompt = "you are a helpful assistant, you should provide useful answers to users."
-    try:
-        # Initialize the client
-        client = InferenceClient(
-            provider="nebius",
-            api_key=api_key,
-        )
-
-        # Convert list of images to base64 data URLs
-        image_urls = []
-        if img_list is not None:
-            # Ensure img_list is actually a list
-            if not isinstance(img_list, list):
-                img_list = [img_list]
-
-            for img in img_list:
-                image_url = None
-                # If img is a PIL Image
-                if hasattr(img, 'save'):  # Check if it's a PIL Image
-                    buffered = BytesIO()
-                    img.save(buffered, format="PNG")
-                    img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
-                    image_url = f"data:image/png;base64,{img_base64}"
-                # If img is already a file path (string)
-                elif isinstance(img, str):
-                    with open(img, "rb") as image_file:
-                        img_base64 = base64.b64encode(image_file.read()).decode('utf-8')
-                        image_url = f"data:image/png;base64,{img_base64}"
-                else:
-                    print(f"Warning: Unexpected image type: {type(img)}, skipping...")
-                    continue
-
-                if image_url:
-                    image_urls.append(image_url)
-
-        # Build the content array with text first, then all images
-        content = [
-            {
-                "type": "text",
-                "text": prompt
-            }
-        ]
-
-        # Add all images to the content
-        for image_url in image_urls:
-            content.append({
-                "type": "image_url",
-                "image_url": {
-                    "url": image_url
-                }
-            })
-
-        # Format the messages for the chat completions API
-        messages = [
-            {"role": "system", "content": system_prompt},
-            {
-                "role": "user",
-                "content": content
-            }
-        ]
-
-        # Call the API
-        completion = client.chat.completions.create(
-            model="Qwen/Qwen2.5-VL-72B-Instruct",
-            messages=messages,
-        )
-
-        # Parse the response
-        result = completion.choices[0].message.content
-
-        # Try to extract JSON if present
-        if '"Rewritten"' in result:
-            try:
-                # Clean up the response
-                result = result.replace('```json', '').replace('```', '')
-                result_json = json.loads(result)
-                polished_prompt = result_json.get('Rewritten', result)
-            except:
-                polished_prompt = result
-        else:
-            polished_prompt = result
-
-        polished_prompt = polished_prompt.strip().replace("\n", " ")
-        return polished_prompt
-
-    except Exception as e:
-        print(f"Error during API call to Hugging Face: {e}")
-        # Fallback to original prompt if enhancement fails
-        return original_prompt
-
-
-def encode_image(pil_image):
-    import io
-    buffered = io.BytesIO()
-    pil_image.save(buffered, format="PNG")
-    return base64.b64encode(buffered.getvalue()).decode("utf-8")
+# --- Environment Variables for LoRA and Prompts ---
+STAGE1_LORA_REPO = os.environ.get("STAGE1_LORA_REPO", "default/stage1-lora")
+STAGE1_LORA_WEIGHT = os.environ.get("STAGE1_LORA_WEIGHT", "stage1.safetensors")
+STAGE2_LORA_REPO = os.environ.get("STAGE2_LORA_REPO", "default/stage2-lora")
+STAGE2_LORA_WEIGHT = os.environ.get("STAGE2_LORA_WEIGHT", "stage2.safetensors")
+STAGE1_PROMPT = os.environ.get("STAGE1_PROMPT", "Convert anime character to base body structure")
+STAGE2_PROMPT = os.environ.get("STAGE2_PROMPT", "Convert base body to clear guide body with structure lines")
 
 # --- Model Loading ---
 dtype = torch.bfloat16
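
The hunk above drops the entire LLM prompt-rewriting path (SYSTEM_PROMPT, the `polish_prompt_hf` helper, and `encode_image`): both stage prompts are now fixed strings supplied through environment variables. The removed helpers revolved around one reusable pattern, serializing a PIL image into a base64 data URL for a multimodal chat payload; a self-contained sketch of just that pattern:

```python
# Sketch of the PIL -> base64 data-URL pattern from the removed helpers.
import base64
from io import BytesIO

from PIL import Image


def to_data_url(img: Image.Image) -> str:
    """Serialize a PIL image as a PNG data URL for a multimodal chat API."""
    buf = BytesIO()
    img.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode("utf-8")


print(to_data_url(Image.new("RGB", (8, 8)))[:30])  # data:image/png;base64,...
```
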
@@ -233,15 +47,29 @@ scheduler_config = {
 # Initialize scheduler with Lightning config
 scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
 
-# Load
-pipe = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2511",
-                                                 scheduler=scheduler,
-                                                 torch_dtype=dtype).to(device)
-pipe.load_lora_weights(
-    "lightx2v/Qwen-Image-Edit-2511-Lightning",
-    weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors"
-)
-pipe.fuse_lora()
+# Load Stage 1 pipeline (Anime -> Base Body)
+pipe_stage1 = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2511",
+                                                        scheduler=scheduler,
+                                                        torch_dtype=dtype).to(device)
+pipe_stage1.load_lora_weights(
+    "lightx2v/Qwen-Image-Edit-2511-Lightning",
+    weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors"
+)
+pipe_stage1.load_lora_weights(STAGE1_LORA_REPO, weight_name=STAGE1_LORA_WEIGHT, adapter_name="stage1")
+pipe_stage1.set_adapters(["default", "stage1"], adapter_weights=[1.0, 1.0])
+pipe_stage1.fuse_lora()
+
+# Load Stage 2 pipeline (Base Body -> Guide Body)
+pipe_stage2 = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2511",
+                                                        scheduler=scheduler,
+                                                        torch_dtype=dtype).to(device)
+pipe_stage2.load_lora_weights(
+    "lightx2v/Qwen-Image-Edit-2511-Lightning",
+    weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors"
+)
+pipe_stage2.load_lora_weights(STAGE2_LORA_REPO, weight_name=STAGE2_LORA_WEIGHT, adapter_name="stage2")
+pipe_stage2.set_adapters(["default", "stage2"], adapter_weights=[1.0, 1.0])
+pipe_stage2.fuse_lora()
 
 # # Apply the same optimizations from the first version
 # pipe.transformer.__class__ = QwenImageTransformer2DModel
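
Each pipeline above stacks two LoRAs (the Lightning distillation LoRA that enables 4-step sampling, plus the stage-specific conversion LoRA) and then fuses them into the base weights. A reduced sketch of this diffusers multi-adapter pattern; the stage repo id is a placeholder, and explicit adapter names are used here instead of relying on the auto-assigned name that the app's `set_adapters(["default", ...])` call assumes:

```python
# Sketch: stack two LoRA adapters on one pipeline, then fuse them.
import torch
from diffusers import QwenImageEditPlusPipeline

pipe = QwenImageEditPlusPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511", torch_dtype=torch.bfloat16
).to("cuda")

# Speed LoRA: distills sampling down to ~4 steps.
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Edit-2511-Lightning",
    weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors",
    adapter_name="lightning",
)
# Task LoRA: the stage-specific conversion behavior (placeholder repo).
pipe.load_lora_weights(
    "your-username/stage-lora",
    weight_name="stage.safetensors",
    adapter_name="stage",
)

# Activate both adapters at full strength, then bake them into the base
# weights so inference pays no per-step LoRA overhead.
pipe.set_adapters(["lightning", "stage"], adapter_weights=[1.0, 1.0])
pipe.fuse_lora(adapter_names=["lightning", "stage"])
```

Fusing trades flexibility for speed: after `fuse_lora()` the adapters can no longer be re-weighted per request, which fits this app since each pipeline serves exactly one fixed stage.
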
@@ -250,59 +78,47 @@ pipe.fuse_lora()
 # # --- Ahead-of-time compilation ---
 # optimize_pipeline_(pipe, image=[Image.new("RGB", (1024, 1024)), Image.new("RGB", (1024, 1024))], prompt="prompt")
 
-# --- UI Constants
+# --- UI Constants ---
 MAX_SEED = np.iinfo(np.int32).max
 
-def use_output_as_input(output_images):
-    """Convert output images to input format for the gallery"""
-    if output_images is None or len(output_images) == 0:
-        return []
-    return output_images
-
-# --- Main Inference Function (with hardcoded negative prompt) ---
+# --- Main Inference Function (Two-Stage Conversion) ---
 @spaces.GPU()
 def infer(
     images,
-    prompt,
     seed=42,
     randomize_seed=False,
     true_guidance_scale=1.0,
     num_inference_steps=4,
     height=None,
     width=None,
-    rewrite_prompt=True,
-    num_images_per_prompt=1,
     progress=gr.Progress(track_tqdm=True),
 ):
     """
-    Run
+    Run two-stage image conversion: Anime Character -> Base Body -> Guide Body.
 
     Parameters:
         images (list): Input images from the Gradio gallery (PIL or path-based).
-        prompt (str): Editing instruction (may be rewritten by LLM if enabled).
         seed (int): Random seed for reproducibility.
         randomize_seed (bool): If True, overrides seed with a random value.
         true_guidance_scale (float): CFG scale used by Qwen-Image.
         num_inference_steps (int): Number of diffusion steps.
         height (int | None): Optional output height override.
         width (int | None): Optional output width override.
-        rewrite_prompt (bool): Whether to rewrite the prompt using Qwen-2.5-VL.
-        num_images_per_prompt (int): Number of images to generate.
         progress: Gradio progress callback.
 
     Returns:
-        tuple: (
+        tuple: (stage1_images, stage2_images, seed_used)
     """
 
     # Hardcode the negative prompt
     negative_prompt = " "
 
     if randomize_seed:
         seed = random.randint(0, MAX_SEED)
 
     # Set up the generator for reproducibility
     generator = torch.Generator(device=device).manual_seed(seed)
 
     # Load input images into PIL Images
     pil_images = []
     if images is not None:
@@ -319,29 +135,45 @@ def infer(
 
     if height==256 and width==256:
         height, width = None, None
-
+
+    # Stage 1: Anime Character -> Base Body
+    print(f"[Stage 1] Converting to base body...")
+    print(f"Prompt: '{STAGE1_PROMPT}'")
     print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
-    if rewrite_prompt and len(pil_images) > 0:
-        prompt = polish_prompt_hf(prompt, pil_images)
-        print(f"Rewritten Prompt: {prompt}")
 
-    image = pipe(
+    stage1_images = pipe_stage1(
         image=pil_images if len(pil_images) > 0 else None,
-        prompt=prompt,
+        prompt=STAGE1_PROMPT,
         height=height,
         width=width,
         negative_prompt=negative_prompt,
         num_inference_steps=num_inference_steps,
         generator=generator,
         true_cfg_scale=true_guidance_scale,
-        num_images_per_prompt=num_images_per_prompt,
+        num_images_per_prompt=1,
     ).images
 
-    #
-    return image, seed
+    # Stage 2: Base Body -> Guide Body
+    print(f"[Stage 2] Converting to guide body...")
+    print(f"Prompt: '{STAGE2_PROMPT}'")
+
+    # Use same seed for stage 2
+    generator = torch.Generator(device=device).manual_seed(seed)
+
+    stage2_images = pipe_stage2(
+        image=stage1_images,
+        prompt=STAGE2_PROMPT,
+        height=height,
+        width=width,
+        negative_prompt=negative_prompt,
+        num_inference_steps=num_inference_steps,
+        generator=generator,
+        true_cfg_scale=true_guidance_scale,
+        num_images_per_prompt=1,
+    ).images
+
+    # Return stage1 (base body), stage2 (guide body), and seed
+    return stage1_images, stage2_images, seed
 
 # --- Examples and UI Layout ---
 examples = []
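
Because `infer` no longer takes a prompt and returns both stages, it can be exercised headlessly. A usage sketch, assuming app.py is importable, a GPU is available, and the file names are placeholders:

```python
# Sketch: run the two-stage conversion without the Gradio UI.
from PIL import Image

import app  # assumes app.py (and its module-level pipelines) is importable

character = Image.open("anime_character.png")  # placeholder input

base_body, guide_body, used_seed = app.infer(
    images=[character],
    seed=42,
    randomize_seed=False,
    true_guidance_scale=1.0,  # Lightning-distilled weights target CFG 1.0
    num_inference_steps=4,    # matches the 4-step Lightning LoRA
)

base_body[0].save("base_body.png")    # Stage 1 intermediate
guide_body[0].save("guide_body.png")  # Stage 2 final guide body
print(f"Seed used: {used_seed}")
```
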
@@ -349,54 +181,47 @@ examples = []
 css = """
 #col-container {
     margin: 0 auto;
-    max-width:
+    max-width: 1600px;
 }
 #logo-title {
     text-align: center;
 }
-#logo-title img {
-    width: 400px;
-}
-#edit_text{margin-top: -62px !important}
 """
 
 with gr.Blocks(css=css) as demo:
     with gr.Column(elem_id="col-container"):
         gr.HTML("""
         <div id="logo-title">
-            <
-            <
+            <h1>🎨✨ QIE-Image2GuideBody</h1>
+            <h3 style="color: #5b47d1;">Anime Character → Base Body → Guide Body Converter</h3>
         </div>
         """)
         gr.Markdown("""
-        [
-
-
+        Two-stage conversion pipeline powered by [Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) with custom LoRAs.
+
+        **Stage 1:** Converts anime characters to base body structure
+        **Stage 2:** Converts base body to clear guide body with structure lines
         """)
+
         with gr.Row():
-            with gr.Column():
-
-
-
+            with gr.Column(scale=1):
+                gr.Markdown("### 1️⃣ Input (Anime Character)")
+                input_images = gr.Gallery(label="Input Images",
+                                          show_label=False,
+                                          type="pil",
                                           interactive=True)
 
-            with gr.Column():
-
-
-                use_output_btn = gr.Button("↗️ Use as input", variant="secondary", size="sm", visible=False)
+            with gr.Column(scale=1):
+                gr.Markdown("### 2️⃣ Stage 1 (Base Body)")
+                stage1_result = gr.Gallery(label="Base Body", show_label=False, type="pil", interactive=False)
 
-
-
-            prompt = gr.Text(
-                show_label=False,
-                placeholder="describe the edit instruction",
-                container=False,
-            )
-            run_button = gr.Button("Edit!", variant="primary")
+            with gr.Column(scale=1):
+                gr.Markdown("### 3️⃣ Stage 2 (Guide Body)")
+                stage2_result = gr.Gallery(label="Guide Body", show_label=False, type="pil", interactive=False)
 
-
-            # Negative prompt UI element is removed here
+        run_button = gr.Button("🚀 Convert to Guide Body", variant="primary", size="lg")
 
+        with gr.Accordion("Advanced Settings", open=False):
             seed = gr.Slider(
                 label="Seed",
                 minimum=0,
@@ -408,7 +233,6 @@ with gr.Blocks(css=css) as demo:
             randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
 
             with gr.Row():
-
                 true_guidance_scale = gr.Slider(
                     label="True guidance scale",
                     minimum=1.0,
@@ -424,7 +248,7 @@ with gr.Blocks(css=css) as demo:
                     step=1,
                     value=4,
                 )
-
+
                 height = gr.Slider(
                     label="Height",
                     minimum=256,
@@ -432,7 +256,7 @@ with gr.Blocks(css=css) as demo:
                     step=8,
                     value=None,
                 )
-
+
                 width = gr.Slider(
                     label="Width",
                     minimum=256,
@@ -440,34 +264,19 @@ with gr.Blocks(css=css) as demo:
                     step=8,
                     value=None,
                 )
-
-
-            rewrite_prompt = gr.Checkbox(label="Rewrite prompt", value=True)
-
-        # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
 
-    gr.on(
-        triggers=[run_button.click, prompt.submit],
+    run_button.click(
         fn=infer,
         inputs=[
            input_images,
-            prompt,
            seed,
            randomize_seed,
            true_guidance_scale,
            num_inference_steps,
            height,
            width,
-            rewrite_prompt,
        ],
-        outputs=[
-    )
-
-    # Add the new event handler for the "Use Output as Input" button
-    use_output_btn.click(
-        fn=use_output_as_input,
-        inputs=[result],
-        outputs=[input_images]
+        outputs=[stage1_result, stage2_result, seed],
     )
 
 if __name__ == "__main__":
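
The rewiring above replaces a `gr.on(...)` listener over two triggers with a single `run_button.click`, and the three values returned by `infer` map positionally onto `outputs`. The same fan-out pattern in isolation, with a toy handler (assumes a recent Gradio):

```python
# Sketch: one handler whose returned tuple fans out over `outputs`.
import gradio as gr


def fake_infer(images, seed):
    # Placeholder standing in for the real two-stage pipeline.
    return images, images, seed


with gr.Blocks() as demo:
    inp = gr.Gallery(label="Input", type="pil", interactive=True)
    stage1 = gr.Gallery(label="Stage 1", type="pil")
    stage2 = gr.Gallery(label="Stage 2", type="pil")
    seed = gr.Slider(0, 2**31 - 1, value=42, label="Seed")
    btn = gr.Button("Run")
    # Return value i of fake_infer updates outputs[i].
    btn.click(fn=fake_infer, inputs=[inp, seed], outputs=[stage1, stage2, seed])

if __name__ == "__main__":
    demo.launch()
```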