SEVAQWERTY committed on
Commit
29014f3
·
verified ·
1 Parent(s): 24ba517

Update app.py

Browse files
Files changed (1) hide show
  1. app.py +104 -474
app.py CHANGED
@@ -1,474 +1,104 @@
1
- import gradio as gr
2
- import numpy as np
3
- import random
4
- import torch
5
- import spaces
6
-
7
- from PIL import Image
8
- from diffusers import FlowMatchEulerDiscreteScheduler, QwenImageEditPlusPipeline
9
- # from optimization import optimize_pipeline_
10
- # from qwenimage.pipeline_qwenimage_edit_plus import QwenImageEditPlusPipeline
11
- # from qwenimage.transformer_qwenimage import QwenImageTransformer2DModel
12
- # from qwenimage.qwen_fa3_processor import QwenDoubleStreamAttnProcessorFA3
13
-
14
- from huggingface_hub import InferenceClient
15
- import math
16
-
17
- import os
18
- import base64
19
- from io import BytesIO
20
- import json
21
-
22
- SYSTEM_PROMPT = '''
23
- # Edit Instruction Rewriter
24
- You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
25
-
26
- Please strictly follow the rewriting rules below:
27
-
28
- ## 1. General Principles
29
- - Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
30
- - If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
31
- - Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
32
- - All added objects or modifications must align with the logic and style of the scene in the input images.
33
- - If multiple sub-images are to be generated, describe the content of each sub-image individually.
34
-
35
- ## 2. Task-Type Handling Rules
36
-
37
- ### 1. Add, Delete, Replace Tasks
38
- - If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
39
- - If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
40
- > Original: "Add an animal"
41
- > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
42
- - Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
43
- - For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
44
-
45
- ### 2. Text Editing Tasks
46
- - All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
47
- - Both adding new text and replacing existing text are text replacement tasks, For example:
48
- - Replace "xx" to "yy"
49
- - Replace the mask / bounding box to "yy"
50
- - Replace the visual object to "yy"
51
- - Specify text position, color, and layout only if user has required.
52
- - If font is specified, keep the original language of the font.
53
-
54
- ### 3. Human Editing Tasks
55
- - Make the smallest changes to the given user's prompt.
56
- - If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
57
- - **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject's identity consistency.**
58
- > Original: "Add eyebrows to the face"
59
- > Rewritten: "Slightly thicken the person's eyebrows with little change, look natural."
60
-
61
- ### 4. Style Conversion or Enhancement Tasks
62
- - If a style is specified, describe it concisely using key visual features. For example:
63
- > Original: "Disco style"
64
- > Rewritten: "1970s disco style: flashing lights, disco ball, mirrored walls, vibrant colors"
65
- - For style reference, analyze the original image and extract key characteristics (color, composition, texture, lighting, artistic style, etc.), integrating them into the instruction.
66
- - **Colorization tasks (including old photo restoration) must use the fixed template:**
67
- "Restore and colorize the old photo."
68
- - Clearly specify the object to be modified. For example:
69
- > Original: Modify the subject in Picture 1 to match the style of Picture 2.
70
- > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
71
-
72
- ### 5. Material Replacement
73
- - Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
74
- - For text material replacement, use the fixed template:
75
- "Change the material of text "xxxx" to laser style"
76
-
77
- ### 6. Logo/Pattern Editing
78
- - Material replacement should preserve the original shape and structure as much as possible. For example:
79
- > Original: "Convert to sapphire material"
80
- > Rewritten: "Convert the main subject in the image to sapphire material, preserving similar shape and structure"
81
- - When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
82
- > Original: "Migrate the logo in the image to a new scene"
83
- > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
84
-
85
- ### 7. Multi-Image Tasks
86
- - Rewritten prompts must clearly point out which image's element is being modified. For example:
87
- > Original: "Replace the subject of picture 1 with the subject of picture 2"
88
- > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2's background unchanged"
89
- - For stylization tasks, describe the reference image's style in the rewritten prompt, while preserving the visual content of the source image.
90
-
91
- ## 3. Rationale and Logic Check
92
- - Resolve contradictory instructions: e.g., "Remove all trees but keep all trees" requires logical correction.
93
- - Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
94
-
95
- # Output Format Example
96
- ```json
97
- {
98
- "Rewritten": "..."
99
- }
100
- '''
101
-
102
- def polish_prompt_hf(original_prompt, img_list):
103
- """
104
- Rewrites the prompt using a Hugging Face InferenceClient.
105
- Supports multiple images via img_list.
106
- """
107
- # Ensure HF_TOKEN is set
108
- api_key = os.environ.get("inference_providers")
109
- if not api_key:
110
- print("Warning: HF_TOKEN not set. Falling back to original prompt.")
111
- return original_prompt
112
- prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {original_prompt}\n\nRewritten Prompt:"
113
- system_prompt = "you are a helpful assistant, you should provide useful answers to users."
114
- try:
115
- # Initialize the client
116
- client = InferenceClient(
117
- provider="nebius",
118
- api_key=api_key,
119
- )
120
-
121
- # Convert list of images to base64 data URLs
122
- image_urls = []
123
- if img_list is not None:
124
- # Ensure img_list is actually a list
125
- if not isinstance(img_list, list):
126
- img_list = [img_list]
127
-
128
- for img in img_list:
129
- image_url = None
130
- # If img is a PIL Image
131
- if hasattr(img, 'save'): # Check if it's a PIL Image
132
- buffered = BytesIO()
133
- img.save(buffered, format="PNG")
134
- img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
135
- image_url = f"data:image/png;base64,{img_base64}"
136
- # If img is already a file path (string)
137
- elif isinstance(img, str):
138
- with open(img, "rb") as image_file:
139
- img_base64 = base64.b64encode(image_file.read()).decode('utf-8')
140
- image_url = f"data:image/png;base64,{img_base64}"
141
- else:
142
- print(f"Warning: Unexpected image type: {type(img)}, skipping...")
143
- continue
144
-
145
- if image_url:
146
- image_urls.append(image_url)
147
-
148
- # Build the content array with text first, then all images
149
- content = [
150
- {
151
- "type": "text",
152
- "text": prompt
153
- }
154
- ]
155
-
156
- # Add all images to the content
157
- for image_url in image_urls:
158
- content.append({
159
- "type": "image_url",
160
- "image_url": {
161
- "url": image_url
162
- }
163
- })
164
-
165
- # Format the messages for the chat completions API
166
- messages = [
167
- {"role": "system", "content": system_prompt},
168
- {
169
- "role": "user",
170
- "content": content
171
- }
172
- ]
173
-
174
- # Call the API
175
- completion = client.chat.completions.create(
176
- model="Qwen/Qwen2.5-VL-72B-Instruct",
177
- messages=messages,
178
- )
179
-
180
- # Parse the response
181
- result = completion.choices[0].message.content
182
-
183
- # Try to extract JSON if present
184
- if '"Rewritten"' in result:
185
- try:
186
- # Clean up the response
187
- result = result.replace('```json', '').replace('```', '')
188
- result_json = json.loads(result)
189
- polished_prompt = result_json.get('Rewritten', result)
190
- except:
191
- polished_prompt = result
192
- else:
193
- polished_prompt = result
194
-
195
- polished_prompt = polished_prompt.strip().replace("\n", " ")
196
- return polished_prompt
197
-
198
- except Exception as e:
199
- print(f"Error during API call to Hugging Face: {e}")
200
- # Fallback to original prompt if enhancement fails
201
- return original_prompt
202
-
203
-
204
-
205
- def encode_image(pil_image):
206
- import io
207
- buffered = io.BytesIO()
208
- pil_image.save(buffered, format="PNG")
209
- return base64.b64encode(buffered.getvalue()).decode("utf-8")
210
-
211
- # --- Model Loading ---
212
- dtype = torch.bfloat16
213
- device = "cuda" if torch.cuda.is_available() else "cpu"
214
-
215
- # Scheduler configuration for Lightning
216
- scheduler_config = {
217
- "base_image_seq_len": 256,
218
- "base_shift": math.log(3),
219
- "invert_sigmas": False,
220
- "max_image_seq_len": 8192,
221
- "max_shift": math.log(3),
222
- "num_train_timesteps": 1000,
223
- "shift": 1.0,
224
- "shift_terminal": None,
225
- "stochastic_sampling": False,
226
- "time_shift_type": "exponential",
227
- "use_beta_sigmas": False,
228
- "use_dynamic_shifting": True,
229
- "use_exponential_sigmas": False,
230
- "use_karras_sigmas": False,
231
- }
232
-
233
- # Initialize scheduler with Lightning config
234
- scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
235
-
236
- # Load the model pipeline
237
- pipe = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2511",
238
- scheduler=scheduler,
239
- torch_dtype=dtype).to(device)
240
- pipe.load_lora_weights(
241
- "lightx2v/Qwen-Image-Edit-2511-Lightning",
242
- weight_name="Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors"
243
- )
244
- pipe.fuse_lora()
245
-
246
- # # Apply the same optimizations from the first version
247
- # pipe.transformer.__class__ = QwenImageTransformer2DModel
248
- # pipe.transformer.set_attn_processor(QwenDoubleStreamAttnProcessorFA3())
249
-
250
- # # --- Ahead-of-time compilation ---
251
- # optimize_pipeline_(pipe, image=[Image.new("RGB", (1024, 1024)), Image.new("RGB", (1024, 1024))], prompt="prompt")
252
-
253
- # --- UI Constants and Helpers ---
254
- MAX_SEED = np.iinfo(np.int32).max
255
-
256
- def use_output_as_input(output_images):
257
- """Convert output images to input format for the gallery"""
258
- if output_images is None or len(output_images) == 0:
259
- return []
260
- return output_images
261
-
262
- # --- Main Inference Function (with hardcoded negative prompt) ---
263
- @spaces.GPU()
264
- def infer(
265
- images,
266
- prompt,
267
- seed=42,
268
- randomize_seed=False,
269
- true_guidance_scale=1.0,
270
- num_inference_steps=4,
271
- height=None,
272
- width=None,
273
- rewrite_prompt=True,
274
- num_images_per_prompt=1,
275
- progress=gr.Progress(track_tqdm=True),
276
- ):
277
- """
278
- Run image-editing inference using the Qwen-Image-Edit pipeline.
279
-
280
- Parameters:
281
- images (list): Input images from the Gradio gallery (PIL or path-based).
282
- prompt (str): Editing instruction (may be rewritten by LLM if enabled).
283
- seed (int): Random seed for reproducibility.
284
- randomize_seed (bool): If True, overrides seed with a random value.
285
- true_guidance_scale (float): CFG scale used by Qwen-Image.
286
- num_inference_steps (int): Number of diffusion steps.
287
- height (int | None): Optional output height override.
288
- width (int | None): Optional output width override.
289
- rewrite_prompt (bool): Whether to rewrite the prompt using Qwen-2.5-VL.
290
- num_images_per_prompt (int): Number of images to generate.
291
- progress: Gradio progress callback.
292
-
293
- Returns:
294
- tuple: (generated_images, seed_used, UI_visibility_update)
295
- """
296
-
297
- # Hardcode the negative prompt as requested
298
- negative_prompt = " "
299
-
300
- if randomize_seed:
301
- seed = random.randint(0, MAX_SEED)
302
-
303
- # Set up the generator for reproducibility
304
- generator = torch.Generator(device=device).manual_seed(seed)
305
-
306
- # Load input images into PIL Images
307
- pil_images = []
308
- if images is not None:
309
- for item in images:
310
- try:
311
- if isinstance(item[0], Image.Image):
312
- pil_images.append(item[0].convert("RGB"))
313
- elif isinstance(item[0], str):
314
- pil_images.append(Image.open(item[0]).convert("RGB"))
315
- elif hasattr(item, "name"):
316
- pil_images.append(Image.open(item.name).convert("RGB"))
317
- except Exception:
318
- continue
319
-
320
- if height==256 and width==256:
321
- height, width = None, None
322
- print(f"Calling pipeline with prompt: '{prompt}'")
323
- print(f"Negative Prompt: '{negative_prompt}'")
324
- print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
325
- if rewrite_prompt and len(pil_images) > 0:
326
- prompt = polish_prompt_hf(prompt, pil_images)
327
- print(f"Rewritten Prompt: {prompt}")
328
-
329
-
330
- # Generate the image
331
- image = pipe(
332
- image=pil_images if len(pil_images) > 0 else None,
333
- prompt=prompt,
334
- height=height,
335
- width=width,
336
- negative_prompt=negative_prompt,
337
- num_inference_steps=num_inference_steps,
338
- generator=generator,
339
- true_cfg_scale=true_guidance_scale,
340
- num_images_per_prompt=num_images_per_prompt,
341
- ).images
342
-
343
- # Return images, seed, and make button visible
344
- return image, seed, gr.update(visible=True)
345
-
346
- # --- Examples and UI Layout ---
347
- examples = []
348
-
349
- css = """
350
- #col-container {
351
- margin: 0 auto;
352
- max-width: 1024px;
353
- }
354
- #logo-title {
355
- text-align: center;
356
- }
357
- #logo-title img {
358
- width: 400px;
359
- }
360
- #edit_text{margin-top: -62px !important}
361
- """
362
-
363
- with gr.Blocks(css=css) as demo:
364
- with gr.Column(elem_id="col-container"):
365
- gr.HTML("""
366
- <div id="logo-title">
367
- <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png" alt="Qwen-Image Edit Logo" width="400" style="display: block; margin: 0 auto;">
368
- <h2 style="font-style: italic;color: #5b47d1;margin-top: -27px !important;margin-left: 96px">[Plus] Fast, 4-steps with LightX2V LoRA</h2>
369
- </div>
370
- """)
371
- gr.Markdown("""
372
- [Learn more](https://github.com/QwenLM/Qwen-Image) about the Qwen-Image series.
373
- This demo uses the new [Qwen-Image-Edit-2511](https://huggingface.co/Qwen/Qwen-Image-Edit-2511) with the [Qwen-Image-Lightning-2511](https://huggingface.co/lightx2v/Qwen-Image-Edit-2511-Lightning) LoRA for accelerated inference.
374
- Try on [Qwen Chat](https://chat.qwen.ai/), or [download model](https://huggingface.co/Qwen/Qwen-Image-Edit-2509) to run locally with ComfyUI or diffusers.
375
- """)
376
- with gr.Row():
377
- with gr.Column():
378
- input_images = gr.Gallery(label="Input Images",
379
- show_label=False,
380
- type="pil",
381
- interactive=True)
382
-
383
- with gr.Column():
384
- result = gr.Gallery(label="Result", show_label=False, type="pil", interactive=False)
385
- # Add this button right after the result gallery - initially hidden
386
- use_output_btn = gr.Button("↗️ Use as input", variant="secondary", size="sm", visible=False)
387
-
388
- with gr.Row():
389
- prompt = gr.Text(
390
- label="Prompt",
391
- show_label=False,
392
- placeholder="describe the edit instruction",
393
- container=False,
394
- )
395
- run_button = gr.Button("Edit!", variant="primary")
396
-
397
- with gr.Accordion("Advanced Settings", open=False):
398
- # Negative prompt UI element is removed here
399
-
400
- seed = gr.Slider(
401
- label="Seed",
402
- minimum=0,
403
- maximum=MAX_SEED,
404
- step=1,
405
- value=0,
406
- )
407
-
408
- randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
409
-
410
- with gr.Row():
411
-
412
- true_guidance_scale = gr.Slider(
413
- label="True guidance scale",
414
- minimum=1.0,
415
- maximum=10.0,
416
- step=0.1,
417
- value=1.0
418
- )
419
-
420
- num_inference_steps = gr.Slider(
421
- label="Number of inference steps",
422
- minimum=1,
423
- maximum=40,
424
- step=1,
425
- value=4,
426
- )
427
-
428
- height = gr.Slider(
429
- label="Height",
430
- minimum=256,
431
- maximum=2048,
432
- step=8,
433
- value=None,
434
- )
435
-
436
- width = gr.Slider(
437
- label="Width",
438
- minimum=256,
439
- maximum=2048,
440
- step=8,
441
- value=None,
442
- )
443
-
444
-
445
- rewrite_prompt = gr.Checkbox(label="Rewrite prompt", value=True)
446
-
447
- # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
448
-
449
- gr.on(
450
- triggers=[run_button.click, prompt.submit],
451
- fn=infer,
452
- inputs=[
453
- input_images,
454
- prompt,
455
- seed,
456
- randomize_seed,
457
- true_guidance_scale,
458
- num_inference_steps,
459
- height,
460
- width,
461
- rewrite_prompt,
462
- ],
463
- outputs=[result, seed, use_output_btn], # Added use_output_btn to outputs
464
- )
465
-
466
- # Add the new event handler for the "Use Output as Input" button
467
- use_output_btn.click(
468
- fn=use_output_as_input,
469
- inputs=[result],
470
- outputs=[input_images]
471
- )
472
-
473
- if __name__ == "__main__":
474
- demo.launch(mcp_server=True)
 
# --- Runtime setup: imports, device selection, and model load ---
import random

import gradio as gr
import numpy as np
import spaces
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# This Space targets the free CPU tier: run everything on the CPU in full
# float32 precision (half precision is unsupported/slow on CPU).
device = "cpu"
dtype = torch.float32

print("Loading InstructPix2Pix pipeline...")

# NOTE(review): safety_checker=None disables the NSFW filter — confirm this
# is intentional for a public demo.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",
    torch_dtype=dtype,
    safety_checker=None,
).to(device)

# Trade a little speed for a much smaller attention memory footprint on CPU.
pipe.enable_attention_slicing()

print("Model loaded successfully.")

# Upper bound for the seed slider and for randomly drawn seeds.
MAX_SEED = np.iinfo(np.int32).max
# ==============================
# Inference Function
# ==============================
@spaces.GPU()  # No-op on CPU hardware; enables ZeroGPU when available.
def infer(
    image,
    prompt,
    seed=0,
    randomize_seed=True,
    guidance_scale=7.5,
    num_inference_steps=20,
):
    """Edit *image* according to *prompt* using InstructPix2Pix.

    Parameters:
        image (PIL.Image.Image | None): Source image; None yields no result.
        prompt (str): Natural-language edit instruction.
        seed (int): RNG seed (ignored when ``randomize_seed`` is True).
        randomize_seed (bool): If True, draw a fresh seed in [0, MAX_SEED].
        guidance_scale (float): Classifier-free guidance strength.
        num_inference_steps (int): Requested diffusion steps (capped at 30).

    Returns:
        tuple: (edited PIL image or None, seed actually used)
    """
    if image is None:
        return None, seed

    if randomize_seed:
        seed = random.randint(0, MAX_SEED)

    # Gradio may deliver slider values as floats (especially via the API);
    # torch.Generator.manual_seed requires a real int.
    seed = int(seed)
    generator = torch.Generator(device=device).manual_seed(seed)

    # Fixed 512x512 keeps CPU inference time bounded. Aspect ratio is not
    # preserved — acceptable for this demo.
    image = image.convert("RGB").resize((512, 512))

    result = pipe(
        prompt=prompt,
        image=image,
        guidance_scale=guidance_scale,
        # Hard cap so a maxed-out slider cannot stall the CPU queue.
        num_inference_steps=min(num_inference_steps, 30),
        generator=generator,
    ).images[0]

    return result, seed
# ==============================
# UI
# ==============================
with gr.Blocks() as demo:
    # Header
    gr.Markdown("# 🖼️ Image Edit (CPU Version)")
    gr.Markdown("Stable Diffusion InstructPix2Pix works on 16GB CPU Basic")

    # Side-by-side source / result images.
    with gr.Row():
        input_image = gr.Image(type="pil", label="Input Image")
        output_image = gr.Image(type="pil", label="Edited Image")

    # Free-text edit instruction.
    prompt = gr.Textbox(label="Edit Instruction", placeholder="e.g. make the sky pink")

    # Seed controls.
    with gr.Row():
        seed = gr.Slider(minimum=0, maximum=MAX_SEED, value=0, step=1, label="Seed")
        randomize_seed = gr.Checkbox(value=True, label="Randomize Seed")

    # Sampler controls.
    with gr.Row():
        guidance_scale = gr.Slider(
            minimum=1.0, maximum=15.0, value=7.5, step=0.5, label="Guidance Scale"
        )
        num_inference_steps = gr.Slider(
            minimum=1, maximum=40, value=20, step=1, label="Steps"
        )

    run_button = gr.Button("Edit Image")

    # Run inference on click; the seed actually used is echoed back into
    # the seed slider so results can be reproduced.
    run_button.click(
        fn=infer,
        inputs=[
            input_image,
            prompt,
            seed,
            randomize_seed,
            guidance_scale,
            num_inference_steps,
        ],
        outputs=[output_image, seed],
    )


if __name__ == "__main__":
    demo.launch()