the-carnage committed
Commit 177eb65 · 0 Parent(s):

🚀 Upgrade to AI Image Editor Pro with Gemini-style parsing

Files changed (4)
  1. .gitignore +3 -0
  2. README.md +37 -0
  3. app.py +964 -0
  4. requirements.txt +19 -0
.gitignore ADDED
@@ -0,0 +1,3 @@
+ .venv/
+ __pycache__/
+ *.pyc
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ title: AI Image Editor
+ emoji: 🎨
+ colorFrom: purple
+ colorTo: blue
+ sdk: streamlit
+ sdk_version: 1.28.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 🎨 AI Image Editor
+
+ > **Gemini-style image editing, but private and self-hosted!**
+
+ Edit images using natural language instructions. Remove objects, replace elements, change colors - all powered by open-source AI models.
+
+ ## ✨ Features
+
+ - 🗣️ **Natural Language Editing** - Just describe what you want to change
+ - 🎯 **Auto Object Detection** - Automatically finds objects using CLIPSeg
+ - 🖌️ **Precise Inpainting** - Stable Diffusion for high-quality results
+ - 🔒 **100% Private** - No external APIs, all processing is local
+ - 💻 **CPU & GPU Support** - Works on any hardware
+
+ ## 📝 Usage Examples
+
+ | Instruction | What It Does |
+ | -------------------------------- | ----------------------------------------- |
+ | `remove the person` | Erases a person and fills with background |
+ | `replace the car with a bicycle` | Swaps objects |
+ | `change the sky to sunset` | Changes appearance |
+
+ ## 🔒 Privacy
+
+ All processing happens locally. No data is sent to external APIs.
app.py ADDED
@@ -0,0 +1,964 @@
+ """
+ 🎨 AI Image Editor Pro - Streamlit Version
+ =============================================
+ A private, self-hosted AI image editing tool using open-source models.
+ Runs on Hugging Face Spaces with Streamlit SDK.
+ Now with advanced Gemini-style instruction understanding!
+ """
+
+ import os
+ import gc
+ import re
+ import torch
+ import numpy as np
+ import streamlit as st
+ from PIL import Image
+ from typing import Tuple, Optional, Dict, List
+ from io import BytesIO
+
+ # ============================================================================
+ # PAGE CONFIG (must be first Streamlit command)
+ # ============================================================================
+
+ st.set_page_config(
+     page_title="🎨 AI Image Editor Pro",
+     page_icon="🎨",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ # ============================================================================
+ # CONFIGURATION
+ # ============================================================================
+
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+ DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
+ INPAINT_MODEL = "runwayml/stable-diffusion-inpainting"
+ CLIPSEG_MODEL = "CIDAS/clipseg-rd64-refined"
+
+ # ============================================================================
+ # ADVANCED INSTRUCTION PARSER - GEMINI-STYLE
+ # ============================================================================
+
+ class GeminiStyleParser:
+     """
+     Advanced natural language parser that understands complex editing instructions
+     like Google Gemini. Handles various phrasings, synonyms, and compound commands.
+     """
+
+     # Comprehensive action patterns with synonyms
+     REMOVE_KEYWORDS = [
+         "remove", "delete", "erase", "get rid of", "take out", "eliminate",
+         "clear", "wipe", "clean up", "take away", "disappear", "vanish",
+         "make disappear", "get away", "rid of", "cut out", "crop out",
+         "hide", "discard", "throw away", "dispose", "extract", "pull out",
+         "subtract", "minus", "without", "lose", "drop", "ditch", "nix",
+         "scratch", "strike", "zap", "nuke", "kill", "destroy", "obliterate"
+     ]
+
+     REPLACE_KEYWORDS = [
+         "replace", "swap", "switch", "substitute", "exchange", "trade",
+         "put", "place", "add", "insert", "set", "change to", "turn into",
+         "transform to", "convert to", "make it", "make this", "transform into",
+         "morph into", "become", "evolve into", "shift to"
+     ]
+
+     CHANGE_KEYWORDS = [
+         "change", "modify", "alter", "adjust", "edit", "transform",
+         "convert", "turn", "make", "update", "recolor", "repaint",
+         "tint", "color", "paint", "dye", "shade", "hue", "tone",
+         "brighten", "darken", "lighten", "saturate", "desaturate"
+     ]
+
+     ADD_KEYWORDS = [
+         "add", "insert", "put", "place", "include", "attach",
+         "append", "introduce", "bring", "create", "generate",
+         "draw", "paint", "render", "give", "apply", "overlay"
+     ]
+
+     ENHANCE_KEYWORDS = [
+         "enhance", "improve", "beautify", "upgrade", "refine",
+         "polish", "perfect", "optimize", "boost", "amplify",
+         "sharpen", "clarify", "fix", "repair", "restore"
+     ]
+
+     # Prepositions and connectors
+     PREPOSITIONS = [
+         "with", "to", "into", "as", "by", "for", "from",
+         "using", "via", "through", "in place of", "instead of"
+     ]
+
+     # Color mappings for better understanding
+     COLORS = {
+         "red": "vibrant red colored",
+         "blue": "deep blue colored",
+         "green": "lush green colored",
+         "yellow": "bright yellow colored",
+         "orange": "warm orange colored",
+         "purple": "rich purple colored",
+         "pink": "soft pink colored",
+         "black": "pure black colored",
+         "white": "clean white colored",
+         "gold": "shimmering golden colored",
+         "silver": "metallic silver colored",
+         "brown": "natural brown colored",
+         "gray": "neutral gray colored",
+         "grey": "neutral grey colored",
+         "cyan": "cyan turquoise colored",
+         "magenta": "vivid magenta colored",
+         "teal": "elegant teal colored",
+         "navy": "deep navy blue colored",
+         "maroon": "rich maroon colored",
+         "olive": "earthy olive colored",
+         "coral": "beautiful coral colored",
+         "beige": "soft beige colored",
+         "tan": "warm tan colored",
+         "cream": "creamy off-white colored",
+         "mint": "fresh mint green colored",
+         "lavender": "delicate lavender colored",
+         "rose": "romantic rose colored",
+         "burgundy": "deep burgundy colored",
+         "bronze": "warm bronze colored"
+     }
+
+     # Object synonyms for better detection
+     OBJECT_SYNONYMS = {
+         "person": ["person", "human", "man", "woman", "people", "guy", "girl", "boy", "lady", "gentleman", "individual", "figure", "someone", "somebody", "pedestrian"],
+         "sky": ["sky", "clouds", "heaven", "atmosphere", "air above", "skyline"],
+         "car": ["car", "vehicle", "automobile", "auto", "ride", "wheels", "sedan", "suv", "truck", "van"],
+         "background": ["background", "backdrop", "behind", "scenery", "setting", "surroundings", "environment"],
+         "text": ["text", "words", "letters", "writing", "inscription", "watermark", "logo", "signature", "label", "caption"],
+         "grass": ["grass", "lawn", "turf", "field", "meadow", "greenery"],
+         "tree": ["tree", "plant", "vegetation", "foliage", "bush", "shrub"],
+         "water": ["water", "ocean", "sea", "lake", "river", "pond", "pool", "stream"],
+         "building": ["building", "house", "structure", "architecture", "construction", "edifice"],
+         "animal": ["animal", "pet", "creature", "dog", "cat", "bird"],
+         "face": ["face", "facial", "head", "portrait", "visage"],
+         "hair": ["hair", "hairstyle", "locks", "mane", "tresses"],
+         "clothes": ["clothes", "clothing", "outfit", "dress", "shirt", "pants", "garment", "attire", "wear"],
+         "wall": ["wall", "walls", "surface", "partition"],
+         "floor": ["floor", "ground", "flooring", "surface below"],
+         "window": ["window", "glass", "pane", "windowpane"],
+         "door": ["door", "doorway", "entrance", "entry", "gate"]
+     }
+
+     # Scene/style transformations
+     STYLE_TRANSFORMS = {
+         "sunset": "beautiful golden sunset sky with orange and pink clouds, dramatic lighting",
+         "sunrise": "stunning sunrise with warm golden light, peaceful morning atmosphere",
+         "night": "dark nighttime scene with stars, moonlit atmosphere",
+         "day": "bright daylight, clear blue sky, natural sunlight",
+         "winter": "snowy winter scene, frost covered, cold atmosphere",
+         "summer": "bright summer day, warm sunny atmosphere",
+         "autumn": "fall colors, orange and brown leaves, autumn atmosphere",
+         "spring": "fresh spring scene, blooming flowers, new growth",
+         "rain": "rainy weather, wet surfaces, overcast sky",
+         "snow": "heavy snowfall, white snow covered, winter wonderland",
+         "foggy": "misty foggy atmosphere, soft diffused light",
+         "stormy": "dramatic stormy sky, dark clouds, lightning",
+         "vintage": "vintage retro aesthetic, warm sepia tones, nostalgic feel",
+         "cyberpunk": "neon cyberpunk aesthetic, futuristic, glowing lights",
+         "fantasy": "magical fantasy scene, ethereal atmosphere, dreamlike",
+         "realistic": "photorealistic, natural, lifelike quality",
+         "cartoon": "cartoon animated style, colorful, illustrated",
+         "anime": "anime style, japanese animation aesthetic",
+         "watercolor": "watercolor painting style, soft brushstrokes",
+         "oil painting": "oil painting style, rich textures, artistic",
+         "sketch": "pencil sketch style, hand-drawn look",
+         "cinematic": "cinematic movie quality, dramatic lighting, film-like",
+         "hdr": "high dynamic range, vivid colors, enhanced contrast",
+         "dreamy": "soft dreamy atmosphere, ethereal glow, romantic",
+         "dramatic": "dramatic lighting, high contrast, intense mood",
+         "peaceful": "calm peaceful atmosphere, serene, tranquil",
+         "scary": "dark scary atmosphere, horror aesthetic, ominous",
+         "happy": "bright cheerful atmosphere, joyful, vibrant colors",
+         "sad": "melancholic atmosphere, muted colors, somber mood"
+     }
+
+     def __init__(self):
+         self.last_confidence = 0.0
+         self.interpretation = ""
+
+     def normalize_text(self, text: str) -> str:
+         """Normalize input text for better parsing."""
+         text = text.lower().strip()
+         # Remove extra whitespace
+         text = re.sub(r'\s+', ' ', text)
+         # Remove common punctuation that doesn't affect meaning
+         text = re.sub(r'[.,!?;:]+$', '', text)
+         # Handle contractions
+         text = text.replace("don't", "do not")
+         text = text.replace("can't", "cannot")
+         text = text.replace("won't", "will not")
+         text = text.replace("i'd", "i would")
+         text = text.replace("i'm", "i am")
+         text = text.replace("it's", "it is")
+         return text
+
+     def extract_target_object(self, text: str) -> str:
+         """Extract the target object from the instruction."""
+         # Remove common filler words
+         filler_words = ["the", "a", "an", "this", "that", "those", "these", "my", "your", "please", "kindly", "can you", "could you", "would you", "i want to", "i'd like to", "i would like to"]
+         result = text
+         for filler in filler_words:
+             result = re.sub(r'\b' + filler + r'\b', '', result, flags=re.IGNORECASE)
+         return result.strip()
+
+     def find_best_synonym(self, target: str) -> str:
+         """Find the best matching object for CLIPSeg detection."""
+         target_lower = target.lower()
+
+         # Check if target matches any known synonym
+         for main_object, synonyms in self.OBJECT_SYNONYMS.items():
+             for synonym in synonyms:
+                 if synonym in target_lower or target_lower in synonym:
+                     return main_object
+
+         return target
+
+     def enhance_prompt(self, prompt: str) -> str:
+         """Enhance the replacement prompt for better results."""
+         prompt_lower = prompt.lower()
+
+         # Check for style transformations
+         for style_key, style_value in self.STYLE_TRANSFORMS.items():
+             if style_key in prompt_lower:
+                 return f"{style_value}, high quality, detailed, professional"
+
+         # Check for colors and enhance
+         for color_key, color_value in self.COLORS.items():
+             if color_key in prompt_lower:
+                 prompt = prompt.replace(color_key, color_value)
+
+         # Add quality modifiers if not present
+         quality_terms = ["high quality", "detailed", "professional", "beautiful", "stunning"]
+         has_quality = any(term in prompt_lower for term in quality_terms)
+
+         if not has_quality:
+             prompt = f"{prompt}, high quality, detailed, professional photography"
+
+         return prompt
+
+     def detect_action_type(self, text: str) -> str:
+         """Detect the type of editing action requested."""
+         text_lower = text.lower()
+
+         def has_keyword(keywords):
+             # Word-boundary match so e.g. "place" does not fire inside "replace"
+             return any(
+                 re.search(r'\b' + re.escape(kw) + r'\b', text_lower)
+                 for kw in keywords
+             )
+
+         if has_keyword(self.REMOVE_KEYWORDS):
+             return "remove"
+
+         if has_keyword(self.ADD_KEYWORDS):
+             return "add"
+
+         if has_keyword(self.REPLACE_KEYWORDS):
+             return "replace"
+
+         if has_keyword(self.CHANGE_KEYWORDS):
+             return "change"
+
+         if has_keyword(self.ENHANCE_KEYWORDS):
+             return "enhance"
+
+         return "general"
+
+     def parse(self, instruction: str) -> Tuple[str, str, float]:
+         """
+         Parse the instruction and return (target, replacement_prompt, confidence).
+         This is the main parsing method that handles all types of instructions.
+         """
+         original = instruction
+         normalized = self.normalize_text(instruction)
+         action_type = self.detect_action_type(normalized)
+
+         target = ""
+         replacement = ""
+         confidence = 0.5
+
+         # ===== REMOVE ACTION =====
+         if action_type == "remove":
+             for keyword in self.REMOVE_KEYWORDS:
+                 if keyword in normalized:
+                     target = normalized.split(keyword, 1)[-1].strip()
+                     break
+
+             target = self.extract_target_object(target)
+             target = self.find_best_synonym(target)
+             replacement = "clean empty background, seamless natural texture, nothing there, blank space"
+             confidence = 0.85
+             self.interpretation = f"🗑️ Remove: Detecting and removing '{target}'"
+
+         # ===== ADD ACTION =====
+         elif action_type == "add":
+             for keyword in self.ADD_KEYWORDS:
+                 if keyword in normalized:
+                     parts = normalized.split(keyword, 1)
+                     if len(parts) > 1:
+                         target = "main subject area"
+                         replacement = parts[1].strip()
+                     break
+
+             replacement = self.extract_target_object(replacement)
+             replacement = self.enhance_prompt(replacement)
+             confidence = 0.75
+             self.interpretation = f"➕ Add: Adding '{replacement}' to the image"
+
+         # ===== REPLACE ACTION =====
+         elif action_type == "replace":
+             # Try to find "X with Y" or "X to Y" patterns
+             preposition_found = False
+             for prep in self.PREPOSITIONS:
+                 if f" {prep} " in normalized:
+                     parts = normalized.split(f" {prep} ", 1)
+
+                     # Extract target from first part
+                     first_part = parts[0]
+                     for keyword in self.REPLACE_KEYWORDS + self.CHANGE_KEYWORDS:
+                         first_part = first_part.replace(keyword, "")
+                     target = self.extract_target_object(first_part)
+                     target = self.find_best_synonym(target)
+
+                     # Extract replacement from second part
+                     replacement = self.extract_target_object(parts[1])
+                     replacement = self.enhance_prompt(replacement)
+
+                     preposition_found = True
+                     confidence = 0.9
+                     break
+
+             if not preposition_found:
+                 # Fallback: try to extract target and use generic replacement
+                 for keyword in self.REPLACE_KEYWORDS:
+                     if keyword in normalized:
+                         target = normalized.split(keyword, 1)[-1].strip()
+                         target = self.extract_target_object(target)
+                         target = self.find_best_synonym(target)
+                         replacement = "something different, new object, alternative"
+                         confidence = 0.6
+                         break
+
+             self.interpretation = f"🔄 Replace: Replacing '{target}' with '{replacement[:50]}...'"
+
+         # ===== CHANGE ACTION =====
+         elif action_type == "change":
+             # Look for patterns like "change X to Y" or "make X Y"
+             preposition_found = False
+             for prep in ["to", "into", "as"]:
+                 if f" {prep} " in normalized:
+                     parts = normalized.split(f" {prep} ", 1)
+
+                     # Extract target from first part
+                     first_part = parts[0]
+                     for keyword in self.CHANGE_KEYWORDS:
+                         first_part = first_part.replace(keyword, "")
+                     target = self.extract_target_object(first_part)
+                     target = self.find_best_synonym(target)
+
+                     # Extract new state from second part
+                     new_state = self.extract_target_object(parts[1])
+
+                     # Combine target with new state for replacement
+                     replacement = f"{target} that is {new_state}, {self.enhance_prompt(new_state)}"
+
+                     preposition_found = True
+                     confidence = 0.85
+                     break
+
+             if not preposition_found:
+                 # Check for color changes like "make it red"
+                 for color in self.COLORS.keys():
+                     if color in normalized:
+                         target = "main subject"
+                         replacement = f"{self.COLORS[color]}, high quality, detailed"
+                         confidence = 0.8
+                         preposition_found = True
+                         break
+
+             if not preposition_found:
+                 target = "main subject"
+                 replacement = self.enhance_prompt(normalized)
+                 confidence = 0.6
+
+             self.interpretation = f"✏️ Change: Modifying '{target}' → '{replacement[:50]}...'"
+
+         # ===== ENHANCE ACTION =====
+         elif action_type == "enhance":
+             target = "main subject"
+             replacement = "enhanced improved professional high quality detailed stunning beautiful"
+             confidence = 0.7
+             self.interpretation = "✨ Enhance: Improving overall image quality"
+
+         # ===== GENERAL/UNKNOWN ACTION =====
+         else:
+             # Try to intelligently guess from the instruction
+             # Check if it's just a noun/object (user wants to remove it)
+             words = normalized.split()
+             if len(words) <= 3:
+                 target = self.find_best_synonym(normalized)
+                 replacement = "clean empty background, seamless natural texture"
+                 confidence = 0.5
+                 self.interpretation = f"🤔 Guessing: You might want to remove '{target}'?"
+             else:
+                 # Treat as a creative prompt
+                 target = "main subject area"
+                 replacement = self.enhance_prompt(normalized)
+                 confidence = 0.5
+                 self.interpretation = f"🎨 Creative: Applying '{replacement[:50]}...'"
+
+         # Final cleanup
+         target = target.strip() if target else "main subject"
+         replacement = replacement.strip() if replacement else "improved version"
+
+         # Store confidence
+         self.last_confidence = confidence
+
+         return target, replacement, confidence
+
+
+ # Create global parser instance
+ gemini_parser = GeminiStyleParser()
+
+
+ def parse_instruction(instruction: str) -> Tuple[str, str]:
+     """
+     Enhanced parsing function that uses the GeminiStyleParser.
+     Maintains backward compatibility with existing code.
+     """
+     target, replacement, _ = gemini_parser.parse(instruction)
+     return target, replacement
+
+
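A detail worth noting in the keyword detection above: plain substring checks misfire on nested words ("place" occurs inside "replace", "add" inside "paddle"), which is why word-boundary regex matching matters. A minimal standalone sketch of the pattern, with deliberately shortened keyword lists rather than the app's full ones:

```python
import re

# Abbreviated keyword lists for illustration only
REPLACE_KEYWORDS = ["replace", "swap", "switch"]
ADD_KEYWORDS = ["add", "put", "place"]

def detect_action(text: str) -> str:
    """Classify an instruction by the first action keyword it contains,
    using word boundaries so 'place' does not match inside 'replace'."""
    text = text.lower()
    for action, keywords in (("replace", REPLACE_KEYWORDS), ("add", ADD_KEYWORDS)):
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw) + r"\b", text):
                return action
    return "general"

print(detect_action("replace the car with a bicycle"))  # replace
print(detect_action("place a hat on the dog"))          # add
```

With a bare `in` check instead of `re.search`, the first call would be misclassified, since `"place" in "replace..."` is true.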
+ # ============================================================================
+ # MODEL CACHING
+ # ============================================================================
+
+ @st.cache_resource
+ def load_inpaint_pipeline():
+     """Load and cache the inpainting pipeline."""
+     from diffusers import StableDiffusionInpaintPipeline
+
+     pipe = StableDiffusionInpaintPipeline.from_pretrained(
+         INPAINT_MODEL,
+         torch_dtype=DTYPE,
+         safety_checker=None,
+         requires_safety_checker=False
+     )
+
+     pipe = pipe.to(DEVICE)
+
+     if DEVICE == "cuda":
+         pipe.enable_attention_slicing()
+         try:
+             pipe.enable_xformers_memory_efficient_attention()
+         except Exception:
+             pass
+     else:
+         pipe.enable_attention_slicing(1)
+
+     return pipe
+
+
+ @st.cache_resource
+ def load_clipseg():
+     """Load and cache CLIPSeg for automatic mask generation."""
+     from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
+
+     processor = CLIPSegProcessor.from_pretrained(CLIPSEG_MODEL)
+     model = CLIPSegForImageSegmentation.from_pretrained(CLIPSEG_MODEL)
+     model = model.to(DEVICE)
+     model.eval()
+
+     return processor, model
+
+
+ # ============================================================================
+ # MASK GENERATION (Enhanced)
+ # ============================================================================
+
+ def generate_mask_clipseg(
+     image: Image.Image,
+     target_text: str,
+     threshold: float = 0.3,
+     expand_pixels: int = 10
+ ) -> Optional[Image.Image]:
+     """Generate a segmentation mask using CLIPSeg with enhanced detection."""
+     try:
+         processor, model = load_clipseg()
+
+         # Try multiple variations of the target text for better detection
+         target_variations = [
+             target_text,
+             f"a {target_text}",
+             f"the {target_text}",
+             f"{target_text} in photo",
+             f"photo of {target_text}"
+         ]
+
+         best_mask = None
+         best_score = 0
+
+         for variation in target_variations:
+             inputs = processor(
+                 text=[variation],
+                 images=[image],
+                 padding=True,
+                 return_tensors="pt"
+             )
+             inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
+
+             with torch.no_grad():
+                 outputs = model(**inputs)
+                 preds = outputs.logits
+
+             pred = torch.sigmoid(preds[0]).cpu().numpy()
+             score = pred.max()
+
+             if score > best_score:
+                 best_score = score
+                 best_mask = pred
+
+         if best_mask is None:
+             return None
+
+         # Resize to original image size
+         pred_pil = Image.fromarray((best_mask * 255).astype(np.uint8))
+         pred_resized = pred_pil.resize(image.size, Image.BILINEAR)
+         pred_array = np.array(pred_resized)
+
+         # Apply threshold
+         mask = (pred_array > (threshold * 255)).astype(np.uint8) * 255
+
+         # Expand mask
+         if expand_pixels > 0:
+             from PIL import ImageFilter
+             mask_image = Image.fromarray(mask, mode="L")
+             mask_image = mask_image.filter(
+                 ImageFilter.MaxFilter(size=expand_pixels * 2 + 1)
+             )
+             mask_image = mask_image.filter(
+                 ImageFilter.GaussianBlur(radius=3)
+             )
+             return mask_image
+
+         return Image.fromarray(mask, mode="L")
+
+     except Exception as e:
+         st.error(f"Mask generation error: {str(e)}")
+         return None
+
+
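The dilate-and-feather step used for mask expansion can be exercised on its own with PIL; this standalone sketch mirrors the `MaxFilter`/`GaussianBlur` combination from `generate_mask_clipseg` on a synthetic mask (the 64x64 test image is an assumption for illustration):

```python
import numpy as np
from PIL import Image, ImageFilter

def expand_mask(mask: Image.Image, expand_pixels: int) -> Image.Image:
    """Dilate a binary L-mode mask with a max filter, then feather the
    edge with a Gaussian blur so the inpainted region blends smoothly."""
    if expand_pixels > 0:
        # Kernel size must be odd: 2 * expand_pixels + 1
        mask = mask.filter(ImageFilter.MaxFilter(size=expand_pixels * 2 + 1))
        mask = mask.filter(ImageFilter.GaussianBlur(radius=3))
    return mask

# A 64x64 mask with a single white square in the middle
arr = np.zeros((64, 64), dtype=np.uint8)
arr[24:40, 24:40] = 255
grown = expand_mask(Image.fromarray(arr, mode="L"), expand_pixels=4)
print(np.array(grown).sum() > arr.sum())  # True: the white region grew
```

Without the blur, the hard mask edge tends to leave a visible seam after inpainting; the feathered falloff gives the diffusion model room to blend.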
+ def process_manual_mask(mask_image: Image.Image, target_size: Tuple[int, int]) -> Image.Image:
+     """Process a manually uploaded mask."""
+     mask = mask_image.convert("L")
+     mask = mask.resize(target_size, Image.LANCZOS)
+     mask_array = np.array(mask)
+     mask_array = ((mask_array > 127) * 255).astype(np.uint8)
+     return Image.fromarray(mask_array, mode="L")
+
+
+ # ============================================================================
+ # IMAGE INPAINTING (Enhanced)
+ # ============================================================================
+
+ def inpaint_image(
+     image: Image.Image,
+     mask: Image.Image,
+     prompt: str,
+     negative_prompt: str = "blurry, bad quality, distorted, ugly, deformed, low resolution, pixelated, jpeg artifacts, watermark, text, logo",
+     num_inference_steps: int = 30,
+     guidance_scale: float = 7.5
+ ) -> Optional[Image.Image]:
+     """Inpaint the masked region of an image with enhanced prompts."""
+     try:
+         pipe = load_inpaint_pipeline()
+
+         # Resize for SD (512x512)
+         original_size = image.size
+         target_size = (512, 512)
+
+         image_resized = image.resize(target_size, Image.LANCZOS)
+         mask_resized = mask.resize(target_size, Image.NEAREST)
+
+         if image_resized.mode != "RGB":
+             image_resized = image_resized.convert("RGB")
+
+         # Adjust steps for CPU
+         if DEVICE == "cpu":
+             num_inference_steps = min(num_inference_steps, 20)
+
+         # Enhanced prompt engineering
+         enhanced_prompt = f"{prompt}, masterpiece, best quality, highly detailed, sharp focus, professional"
+
+         with torch.inference_mode():
+             result = pipe(
+                 prompt=enhanced_prompt,
+                 negative_prompt=negative_prompt,
+                 image=image_resized,
+                 mask_image=mask_resized,
+                 num_inference_steps=num_inference_steps,
+                 guidance_scale=guidance_scale
+             ).images[0]
+
+         result = result.resize(original_size, Image.LANCZOS)
+
+         if DEVICE == "cpu":
+             gc.collect()
+
+         return result
+
+     except Exception as e:
+         st.error(f"Inpainting error: {str(e)}")
+         return None
+
+
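Both the parser's `enhance_prompt` and the inpainting step above append generic quality modifiers to the user's prompt unless one is already present. A minimal sketch of that pattern in isolation (the modifier strings below are taken from the code; the function is a simplified stand-in, not the full method):

```python
QUALITY_TERMS = ("high quality", "detailed", "professional", "beautiful", "stunning")

def enhance_prompt(prompt: str) -> str:
    """Append quality modifiers unless the prompt already contains one,
    mirroring the enhancement applied before inpainting."""
    if any(term in prompt.lower() for term in QUALITY_TERMS):
        return prompt
    return f"{prompt}, high quality, detailed, professional photography"

print(enhance_prompt("a red bicycle"))
# a red bicycle, high quality, detailed, professional photography
print(enhance_prompt("a detailed mural"))
# a detailed mural (unchanged: already has a quality term)
```

The check prevents modifiers from stacking when a prompt passes through the enhancement step more than once.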
617
+ # ============================================================================
618
+ # CUSTOM CSS FOR PRO LOOK
619
+ # ============================================================================
620
+
621
+ def inject_custom_css():
622
+ """Inject custom CSS for a more professional look."""
623
+ st.markdown("""
624
+ <style>
625
+ /* Dark theme with gradients */
626
+ .stApp {
627
+ background: linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%);
628
+ }
629
+
630
+ /* Styled headers */
631
+ h1 {
632
+ background: linear-gradient(90deg, #e94560, #0f3460);
633
+ -webkit-background-clip: text;
634
+ -webkit-text-fill-color: transparent;
635
+ font-size: 2.5rem !important;
636
+ }
637
+
638
+ /* Card-like containers */
639
+ .stButton > button {
640
+ background: linear-gradient(90deg, #e94560, #533483);
641
+ border: none;
642
+ border-radius: 10px;
643
+ font-weight: bold;
644
+ transition: all 0.3s ease;
645
+ }
646
+
647
+ .stButton > button:hover {
648
+ transform: translateY(-2px);
649
+ box-shadow: 0 5px 20px rgba(233, 69, 96, 0.4);
650
+ }
651
+
652
+ /* Styled file uploader */
653
+ .stFileUploader {
654
+ border: 2px dashed #e94560;
655
+ border-radius: 15px;
656
+ padding: 20px;
657
+ }
658
+
659
+ /* Confidence indicator */
660
+ .confidence-high {
661
+ color: #4ade80;
662
+ font-weight: bold;
663
+ }
664
+
665
+ .confidence-medium {
666
+ color: #fbbf24;
667
+ font-weight: bold;
668
+ }
669
+
670
+ .confidence-low {
671
+ color: #f87171;
672
+ font-weight: bold;
673
+ }
674
+
675
+ /* Interpretation box */
676
+ .interpretation-box {
677
+ background: rgba(233, 69, 96, 0.1);
678
+ border-left: 4px solid #e94560;
679
+ padding: 10px 15px;
680
+ border-radius: 0 10px 10px 0;
681
+ margin: 10px 0;
682
+ }
683
+
684
+ /* Pro badge */
685
+ .pro-badge {
686
+ background: linear-gradient(90deg, #e94560, #533483);
687
+ padding: 2px 10px;
688
+ border-radius: 20px;
689
+ font-size: 0.8rem;
690
+ font-weight: bold;
691
+ color: white;
692
+ }
693
+
694
+ /* Smooth transitions */
695
+ * {
696
+ transition: background-color 0.3s ease, color 0.3s ease;
697
+ }
698
+ </style>
699
+ """, unsafe_allow_html=True)
700
+
701
+
702
+ # ============================================================================
703
+ # MAIN APP
704
+ # ============================================================================
705
+
706
+ def main():
707
+ inject_custom_css()
708
+
709
+ st.markdown("""
710
+ <div style="display: flex; align-items: center; gap: 10px;">
711
+ <h1>🎨 AI Image Editor</h1>
712
+ <span class="pro-badge">PRO</span>
713
+ </div>
714
+ """, unsafe_allow_html=True)
715
+
716
+ st.markdown("**Gemini-style image editing with advanced prompt understanding - 100% Private!**")
717
+
718
+ # Sidebar
719
+ with st.sidebar:
720
+ st.header("βš™οΈ Settings")
721
+
722
+ auto_mask = st.checkbox(
723
+ "πŸ” Auto-detect region",
724
+ value=True,
725
+ help="Automatically find the object to edit using AI"
726
+ )
727
+
728
+ st.markdown("---")
729
+ st.subheader("🎚️ Advanced Options")
730
+
731
+ mask_threshold = st.slider(
732
+ "Detection Sensitivity",
733
+ min_value=0.1,
734
+ max_value=0.9,
735
+ value=0.25,
736
+ step=0.05,
737
+ help="Lower = larger detection area"
738
+ )
739
+
740
+ mask_expansion = st.slider(
741
+ "Mask Expansion (px)",
742
+ min_value=0,
743
+ max_value=50,
744
+ value=15,
745
+ step=2,
746
+ help="Expand the detected area for better blending"
747
+ )
748
+
749
+ num_steps = st.slider(
750
+ "Quality Steps",
751
+ min_value=10,
752
+ max_value=50,
753
+ value=20 if DEVICE == "cpu" else 35,
754
+ step=5,
755
+ help="More = better quality but slower"
756
+ )
757
+
758
+ guidance_scale = st.slider(
759
+ "Prompt Strength",
760
+ min_value=1.0,
761
+ max_value=15.0,
762
+ value=8.5,
763
+ step=0.5,
764
+ help="Higher = more closely follows your instructions"
765
+ )
766
+
767
+ st.markdown("---")
768
+ device_emoji = "πŸš€" if DEVICE == "cuda" else "πŸ’»"
769
+ st.info(f"{device_emoji} Device: **{DEVICE.upper()}**")
770
+
771
+ if DEVICE == "cpu":
772
+ st.warning("⚠️ Running on CPU. Edits may take 1-3 minutes.")
773
+ else:
774
+ st.success("βœ… GPU detected! Fast processing enabled.")
775
+
776
+    # Main content
+    col1, col2 = st.columns(2)
+
+    with col1:
+        st.subheader("📷 Upload Image")
+        uploaded_file = st.file_uploader(
+            "Choose an image",
+            type=["png", "jpg", "jpeg", "webp", "bmp"],
+            label_visibility="collapsed"
+        )
+
+        image = None
+        if uploaded_file is not None:
+            image = Image.open(uploaded_file).convert("RGB")
+            st.image(image, caption="Original Image", use_container_width=True)
+
+        st.subheader("✏️ What would you like to change?")
+        instruction = st.text_area(
+            "Describe your edit naturally",
+            placeholder="Examples:\n• 'Remove the person in the background'\n• 'Replace the sky with a sunset'\n• 'Make the car red'\n• 'Add a rainbow'\n• 'Turn the grass into snow'\n• 'Delete the watermark'",
+            label_visibility="collapsed",
+            height=120
+        )
+
+        # Show interpretation preview
+        if instruction:
+            target_preview, replacement_preview, confidence = gemini_parser.parse(instruction)
+
+            confidence_class = "high" if confidence >= 0.8 else "medium" if confidence >= 0.6 else "low"
+            confidence_pct = int(confidence * 100)
+
+            st.markdown(f"""
+<div class="interpretation-box">
+<strong>🧠 Understanding:</strong> {gemini_parser.interpretation}<br>
+<span class="confidence-{confidence_class}">Confidence: {confidence_pct}%</span>
+</div>
+""", unsafe_allow_html=True)
+
+        mask_file = None
+        if not auto_mask:
+            st.subheader("📝 Manual Mask")
+            mask_file = st.file_uploader(
+                "Upload a black & white mask (white = area to edit)",
+                type=["png", "jpg", "jpeg"],
+                key="mask"
+            )
+
+        edit_clicked = st.button(
+            "🎨 Apply Edit",
+            type="primary",
+            use_container_width=True,
+            disabled=(uploaded_file is None or not instruction)
+        )
+
+    with col2:
+        st.subheader("✨ Result")
+        result_placeholder = st.empty()
+        mask_placeholder = st.empty()
+        status_placeholder = st.empty()
+        download_placeholder = st.empty()
+
+    if edit_clicked and image is not None and instruction:
+        try:
+            target, replacement_prompt, confidence = gemini_parser.parse(instruction)
+
+            status_placeholder.info(f"🎯 **Target:** `{target}`\n\n✨ **Generating:** `{replacement_prompt[:100]}...`")
+
+            # Generate mask
+            if mask_file is not None:
+                mask_img = Image.open(mask_file)
+                final_mask = process_manual_mask(mask_img, image.size)
+                status_placeholder.info("📝 Using manual mask...")
+            elif auto_mask:
+                with st.spinner(f"🔍 AI detecting '{target}'..."):
+                    final_mask = generate_mask_clipseg(
+                        image=image,
+                        target_text=target,
+                        threshold=mask_threshold,
+                        expand_pixels=mask_expansion
+                    )
+                if final_mask is None:
+                    st.error("Failed to generate mask")
+                    st.stop()
+            else:
+                st.error("Please upload a mask or enable auto-detection!")
+                st.stop()
+
+            # Check mask has content
+            mask_array = np.array(final_mask)
+            if mask_array.max() < 128:
+                st.warning(f"⚠️ Could not confidently detect '{target}'. Trying with broader detection...")
+                # Retry with lower threshold
+                final_mask = generate_mask_clipseg(
+                    image=image,
+                    target_text=target,
+                    threshold=mask_threshold * 0.5,
+                    expand_pixels=mask_expansion * 2
+                )
+                if final_mask is None or np.array(final_mask).max() < 128:
+                    st.error(f"❌ Still could not detect '{target}'. Try different wording or upload a mask.")
+                    st.stop()
+
+            mask_placeholder.image(final_mask, caption="🎭 Detected Area", use_container_width=True)
+
+            # Inpaint
+            with st.spinner("🎨 AI is editing your image... This may take a moment."):
+                result = inpaint_image(
+                    image=image,
+                    mask=final_mask,
+                    prompt=replacement_prompt,
+                    num_inference_steps=num_steps,
+                    guidance_scale=guidance_scale
+                )
+
+            if result is not None:
+                result_placeholder.image(result, caption="✅ Edited Image", use_container_width=True)
+                status_placeholder.success("✅ Edit complete!")
+
+                buf = BytesIO()
+                result.save(buf, format="PNG")
+                download_placeholder.download_button(
+                    label="📥 Download Result",
+                    data=buf.getvalue(),
+                    file_name="edited_image.png",
+                    mime="image/png",
+                    use_container_width=True
+                )
+            else:
+                st.error("Inpainting failed")
+
+        except Exception as e:
+            st.error(f"❌ Error: {str(e)}")
+
+    elif uploaded_file is None:
+        result_placeholder.info("👆 Upload an image to get started")
+
+    # Enhanced Examples Section
+    st.markdown("---")
+    st.subheader("💡 Pro Tips & Examples")
+
+    c1, c2, c3, c4 = st.columns(4)
+
+    with c1:
+        st.markdown("""
+**🗑️ Remove Objects:**
+- `remove the person`
+- `delete the watermark`
+- `erase the car`
+- `get rid of the background`
+- `take out the text`
+""")
+
+    with c2:
+        st.markdown("""
+**🔄 Replace Objects:**
+- `replace sky with sunset`
+- `swap the car with a bike`
+- `change background to beach`
+- `turn grass into snow`
+""")
+
+    with c3:
+        st.markdown("""
+**🎨 Change Colors:**
+- `make the car red`
+- `change dress to blue`
+- `turn hair blonde`
+- `paint walls white`
+""")
+
+    with c4:
+        st.markdown("""
+**✨ Transform Styles:**
+- `make it sunset lighting`
+- `turn into winter scene`
+- `add cyberpunk aesthetic`
+- `make it cinematic`
+""")
+
+    st.markdown("---")
+    st.markdown(
+        """<center>🔒 <b>Privacy First</b> - All processing happens locally. No data sent to external APIs.<br>
+<span style="color: #888;">Powered by Stable Diffusion + CLIPSeg | Created with ❤️</span></center>""",
+        unsafe_allow_html=True
+    )
+
+
+ if __name__ == "__main__":
+     main()
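
The `expand_pixels` argument passed to `generate_mask_clipseg` above maps to the "Mask Expansion (px)" slider; the function body lives in an earlier part of app.py, outside this hunk. A common way to implement such an expansion is a morphological dilation of the binary mask. The sketch below (names hypothetical, not necessarily what app.py does) uses Pillow's `MaxFilter`, whose kernel size must be odd:

```python
import numpy as np
from PIL import Image, ImageFilter

def expand_mask(mask: Image.Image, expand_pixels: int) -> Image.Image:
    """Hypothetical sketch: dilate the white (editable) region of a grayscale mask."""
    if expand_pixels <= 0:
        return mask
    # MaxFilter replaces each pixel with the max of a (2k+1)x(2k+1) window,
    # which grows the white region by k pixels in every direction.
    kernel = 2 * expand_pixels + 1
    return mask.filter(ImageFilter.MaxFilter(kernel))

# Example: a single white pixel in a 9x9 mask grows into a 5x5 block.
m = Image.new("L", (9, 9), 0)
m.putpixel((4, 4), 255)
expanded = expand_mask(m, 2)
print(int(np.array(expanded).sum() // 255))  # → 25
```

Growing the mask slightly before inpainting is what the slider's help text means by "better blending": the model gets a margin around the detected object to reconcile with the surrounding pixels.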
requirements.txt ADDED
@@ -0,0 +1,19 @@
+ # Core ML Framework
+ torch>=2.0.0
+ torchvision>=0.15.0
+
+ # Hugging Face Libraries
+ transformers>=4.35.0
+ diffusers>=0.24.0
+ accelerate>=0.25.0
+ safetensors>=0.4.0
+
+ # Image Processing
+ Pillow>=10.0.0
+ numpy>=1.24.0
+
+ # Streamlit UI
+ streamlit>=1.28.0
+
+ # Hugging Face Hub
+ huggingface_hub>=0.19.0
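
The UI in the app.py diff leans on `gemini_parser.parse`, which returns a `(target, replacement_prompt, confidence)` triple; its definition falls outside this hunk, so the real implementation may differ entirely. As a rough illustration only, a keyword-based parser with that return shape could look like:

```python
import re

def parse_instruction(text: str):
    """Hypothetical sketch: map a natural-language edit to (target, prompt, confidence)."""
    t = text.lower().strip()
    # "remove/delete/erase X" -> inpaint background over X
    m = re.match(r"(?:remove|delete|erase)\s+(?:the\s+)?(.+)", t)
    if m:
        return m.group(1), "clean background, seamless fill", 0.9
    # "replace X with Y" -> target X, generate Y in its place
    m = re.match(r"replace\s+(?:the\s+)?(.+?)\s+with\s+(?:a\s+)?(.+)", t)
    if m:
        return m.group(1), m.group(2), 0.85
    # Fallback: treat the whole instruction as both target and prompt, low confidence.
    return t, t, 0.5

print(parse_instruction("Remove the person"))
print(parse_instruction("replace the car with a bicycle"))
```

The confidence value is what drives the `confidence-high`/`medium`/`low` CSS classes in the interpretation preview box.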