import base64
import io
import json
import os
import re
import time

import dashscope
from PIL import Image


def contains_chinese(text):
    """Return 'zh' if `text` contains any CJK Unified Ideograph, otherwise 'en'."""
    pattern = re.compile(r'[\u4e00-\u9fff]')
    if pattern.search(text):
        return 'zh'
    return 'en'
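

class PromptAugment:
    """Rewrites image-edit instructions into precise prompts using a DashScope vision-language model."""

    def __init__(self):
        self.SYSTEM_PROMPT_ZH = """你是一名专业的编辑指令改写者。你的任务是基于用户提供的指令以及待编辑的图像,生成一条精准、简洁、且在视觉上可实现的专业级编辑指令。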
请严格遵循以下改写规则:
## 1. 通用原则
- 保持改写后的提示词**简洁且信息完整**。避免过长句子和不必要的描写性语言。
- 若指令存在矛盾、含糊或不可实现之处,优先进行合理推断与修正,并在必要时补充细节。
- 保持原指令的核心内容不变,仅增强其清晰度、合理性与视觉可实现性。
- 所有新增物体或修改都必须符合输入图像场景的逻辑与风格。
- 若需要生成多个子图,请分别逐一描述每个子图的内容。
## 2. 不同任务类型处理规则
### 1)添加、删除、替换任务
- 若指令清晰(已包含任务类型、目标实体、位置、数量、属性),保留原意,仅润色语法。
- 若描述含糊,补充最少但足够的细节(类别、颜色、大小、朝向、位置等)。例如:
> 原始:“Add an animal”
> 改写:“在右下角添加一只浅灰色的猫,坐着并面向镜头”
- 删除无意义指令:例如 “Add 0 objects” 应被忽略或标记为无效。
- 对于替换任务,需明确写成 “用 X 替换 Y”,并简要描述 X 的关键视觉特征。
### 2)文本编辑任务
- 所有文本内容必须使用英文双引号 `" "` 包裹。保留文本原语言与大小写。
- 新增文本与替换文本都视为“文本替换”任务。例如:
- 将 "xx" 替换为 "yy"
- 将遮罩/框选区域替换为 "yy"
- 将视觉对象替换为 "yy"
- 仅在用户要求时才说明文字的位置、颜色与排版。
- 若指定字体,保留字体名称的原语言。
### 3)人物编辑任务
- 对用户提示词做最小幅度修改。
- 若需要修改背景、动作、表情、镜头或环境光,请将每项修改单独列出。
- **妆容/五官/表情的编辑必须细微不过度,并保持主体身份一致性。**
> 原始:“Add eyebrows to the face”
> 改写:“轻微加粗人物眉毛,变化很小,效果自然。”
### 4)风格转换或增强任务
- 若指定风格,用关键视觉特征简洁描述。例如:
> 原始:“Disco style”
> 改写:“70 年代迪斯科风:闪烁灯光、迪斯科球、镜面墙、鲜艳色彩”
- 若为风格参考,应分析原图并提取关键特征(颜色、构图、质感、光照、艺术风格等),再融合进指令。
- **上色任务(含老照片修复)必须使用固定模板:**
"Restore and colorize the old photo."(“修复并为老照片上色。”)
- 明确指出要修改的对象。例如:
> 原始:将图 1 主体改成图 2 风格。
> 改写:将图 1 的女孩改为图 2 的水墨风——黑白水彩渲染,色彩过渡柔和。
### 5)材质替换
- 明确对象与材质。例如:“将苹果的材质改为剪纸风格。”
- 对文字的材质替换使用固定模板:
"Change the material of text \\"xxxx\\" to laser style"
(将文本 "xxxx" 的材质改为激光风格)
### 6)Logo / 图案编辑
- 材质替换应尽量保留原始形状与结构。例如:
> 原始:“Convert to sapphire material”
> 改写:“将图中主体转换为蓝宝石材质,尽量保持相近的形状与结构。”
- 将 logo/图案迁移到新场景时,确保形状与结构一致。例如:
> 原始:“Migrate the logo in the image to a new scene”
> 改写:“将图中的 logo 迁移到新场景,尽量保持相近的形状与结构。”
### 7)人物姿态变化
- 人物姿态变换应描述细致一些。例如:
> 原始:“让图中两个人蹲下”
> 改写:“将图中两个人的人物姿态改为蹲下”
- 若涉及多个人物,需分别描述每个人物的姿态变化。
## 3. 合理性与逻辑检查
- 解决矛盾指令:例如 “Remove all trees but keep all trees” 需要进行逻辑修正。
- 补充关键缺失信息:例如位置未指定时,应基于构图选择合理区域(靠近主体、留白处、中心/边缘等)。
# 输出格式示例
```json
{
"Rewritten": "..."
}
```"""
        self.SYSTEM_PROMPT_EN = '''
You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
Please strictly follow the rewriting rules below:
## 1. General Principles
- Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
- If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
- Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
- All added objects or modifications must align with the logic and style of the scene in the input images.
- If multiple sub-images are to be generated, describe the content of each sub-image individually.
## 2. Task-Type Handling Rules
### 1. Add, Delete, Replace Tasks
- If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
- If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
> Original: "Add an animal"
> Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
- Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
- For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
### 2. Text Editing Tasks
- All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
- Both adding new text and replacing existing text are treated as text replacement tasks. For example:
    - Replace "xx" with "yy"
    - Replace the mask / bounding box with "yy"
    - Replace the visual object with "yy"
- Specify text position, color, and layout only if the user requests it.
- If a font is specified, keep the font name in its original language.
### 3. Human Editing Tasks
- Make minimal changes to the user's prompt.
- If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
- **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject’s identity consistency.**
> Original: "Add eyebrows to the face"
> Rewritten: "Slightly thicken the person’s eyebrows, keeping the change small and the look natural."
### 4. Style Conversion or Enhancement Tasks
- If a style is specified, describe it concisely using key visual features. For example:
> Original: "Disco style"
> Rewritten: "1970s disco style: flashing lights, disco ball, mirrored walls, vibrant colors"
- For style reference, analyze the original image and extract key characteristics (color, composition, texture, lighting, artistic style, etc.), integrating them into the instruction.
- **Colorization tasks (including old photo restoration) must use the fixed template:**
"Restore and colorize the old photo."
- Clearly specify the object to be modified. For example:
> Original: Modify the subject in Picture 1 to match the style of Picture 2.
> Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
### 5. Material Replacement
- Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
- For text material replacement, use the fixed template:
"Change the material of text "xxxx" to laser style"
### 6. Logo/Pattern Editing
- Material replacement should preserve the original shape and structure as much as possible. For example:
> Original: "Convert to sapphire material"
> Rewritten: "Convert the main subject in the image to sapphire material, preserving similar shape and structure"
- When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
> Original: "Migrate the logo in the image to a new scene"
> Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
### 7. Multi-Image Tasks
- Rewritten prompts must clearly point out which image’s element is being modified. For example:
> Original: "Replace the subject of picture 1 with the subject of picture 2"
> Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2’s background unchanged"
- For stylization tasks, describe the reference image’s style in the rewritten prompt, while preserving the visual content of the source image.
## 3. Rationality and Logic Check
- Resolve contradictory instructions: e.g., “Remove all trees but keep all trees” requires logical correction.
- Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
# Output Format Example
```json
{
"Rewritten": "..."
}
```
'''
    def encode_image(self, pil_image):
        """Return the PIL image as a base64-encoded PNG string."""
        buffered = io.BytesIO()
        pil_image.save(buffered, format="PNG")
        return base64.b64encode(buffered.getvalue()).decode("utf-8")

    def predict(self, original_prompt, img_list=None):
        """Rewrite `original_prompt` for the given images and return the polished prompt."""
        # Avoid a mutable default argument; None means "no images".
        img_list = img_list or []
        api_key = os.environ.get('DASH_API_KEY')
        model = "qwen3-vl-235b-a22b-thinking"
        original_prompt = original_prompt.strip()
        # Pick the system prompt that matches the language of the user's instruction.
        language = contains_chinese(original_prompt)
        if language == 'zh':
            prompt = f"{self.SYSTEM_PROMPT_ZH}\n\n用户输入为:{original_prompt}\n\n改写后的prompt为:"
        else:
            prompt = f"{self.SYSTEM_PROMPT_EN}\n\nUser Input: {original_prompt}\n\nRewritten Prompt:"
        # prompt = f"{self.SYSTEM_PROMPT}\n\nUser Input: {original_prompt}\n\nRewritten Prompt:"
        # Images come first, followed by the text prompt, all in one user message.
        all_content = []
        for img in img_list:
            all_content.append({"image": f"data:image/png;base64,{self.encode_image(img)}"})
        all_content.append({"type": "text", "text": prompt})
        # print(f"{all_content=}")
        messages = [
            {'role': 'system', 'content': 'you are a helpful assistant, you should provide useful answers to users.'},
            {'role': 'user', 'content': all_content},
        ]
        # Retry once per second until the API call returns without raising.
        success = False
        while not success:
            try:
                # completion = self.client.chat.completions.create( model='/workspace/Qwen3-VL-235B-A22B-Instruct', messages=messages, stream=False, max_tokens=1600, temperature=0.9, response_format = {'type': 'json_object'},)
                response = dashscope.MultiModalConversation.call(
                    api_key=api_key,
                    model=model,
                    messages=messages,
                    result_format='message',
                    response_format=None,
                )
                success = True
            except Exception as e:
                print(f"Error during API call: {e}")
                time.sleep(1)
        # polished_prompt = json.loads(completion.choices[0].message.content)['Rewritten']
        # The reply text is expected to be a JSON object with a "Rewritten" key.
        polished_prompt = json.loads(response.output.choices[0].message.content[0]['text'])['Rewritten']
        return polished_prompt  # + magic_prompt
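
`predict` passes the model's reply straight to `json.loads`, while both system prompts show the expected output inside a Markdown json fence. If the model copies that fence into its answer (an assumption about model behavior, not something the original code documents), parsing the raw text would raise. The sketch below shows one tolerant way to read the `"Rewritten"` field; `extract_rewritten` and `_FENCE_RE` are hypothetical names that do not appear in the original file.

```python
import json
import re

# Hypothetical helper, not part of the original file.
# The fence marker is built at runtime so this snippet never contains a
# literal triple backtick of its own.
_FENCE = "`" * 3
_FENCE_RE = re.compile(_FENCE + r"(?:json)?\s*(.*?)\s*" + _FENCE, re.DOTALL)


def extract_rewritten(reply_text):
    """Return the "Rewritten" value from a reply that may be wrapped in a code fence."""
    text = reply_text.strip()
    match = _FENCE_RE.search(text)
    if match:
        # Keep only the payload between the opening and closing fence.
        text = match.group(1)
    return json.loads(text)["Rewritten"]
```

Under these assumptions, the last parsing line of `predict` could call `extract_rewritten(...)` on the reply text instead of `json.loads(...)['Rewritten']`. A reply with an opening fence but no closing fence is not handled by this sketch and would still raise.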