HariLogicgo committed on
Commit 861422e · 1 Parent(s): 5517f63

new gemini api

Files changed (10):
  1. API_Usage_Guide.md +8 -4
  2. HTTP_API_Documentation.txt +10 -6
  3. README.md +0 -0
  4. api/main.py +93 -56
  5. app.py +0 -2
  6. assets/big-lama.pt +0 -3
  7. requirements.txt +8 -26
  8. src/core.py +196 -536
  9. src/helper.py +0 -87
  10. src/st_style.py +0 -42
API_Usage_Guide.md CHANGED
@@ -4,7 +4,9 @@
 This guide provides step-by-step instructions for using the Photo Object Removal API to remove objects from images using AI inpainting.
 
 **Base URL:** `https://logicgoinfotechspaces-object-remover.hf.space`
 **Authentication:** Bearer token (optional)
+**Storage:** Uploaded images and masks are saved in MongoDB GridFS (database `object_remover`); the IDs returned by the upload endpoints are fetched from GridFS before the data is sent to Gemini.
+- Processing is delegated to the Google Gemini/Imagen edit API; only lightweight CPU work (mask preparation, file I/O) happens on this server.
 
 ## Quick Start
 
@@ -39,7 +41,7 @@ Remove objects using the uploaded image and mask:
 ```bash
 curl -H "Authorization: Bearer <API_TOKEN>" \
   -H "Content-Type: application/json" \
-  -d '{"image_id":"9cf61445-f83b-4c97-9272-c81647f90d68","mask_id":"d044a390-dde2-408a-b7cf-d508385e56ed"}' \
+  -d '{"image_id":"9cf61445-f83b-4c97-9272-c81647f90d68","mask_id":"d044a390-dde2-408a-b7cf-d508385e56ed","prompt":"Describe what should be removed"}' \
   https://logicgoinfotechspaces-object-remover.hf.space/inpaint
 ```
 **Response:** `{"result":"output_b09568698bbd4aa591b1598c01f2f745.png"}`
@@ -57,7 +59,7 @@ Use `/inpaint-url` to get a shareable URL:
 ```bash
 curl -H "Authorization: Bearer <API_TOKEN>" \
   -H "Content-Type: application/json" \
-  -d '{"image_id":"<image_id>","mask_id":"<mask_id>"}' \
+  -d '{"image_id":"<image_id>","mask_id":"<mask_id>","prompt":"Describe what should be removed"}' \
   https://logicgoinfotechspaces-object-remover.hf.space/inpaint-url
 ```
 **Response:**
@@ -74,6 +76,7 @@ Upload and process in a single request:
 curl -H "Authorization: Bearer <API_TOKEN>" \
   -F image=@image.jpg \
   -F mask=@mask.jpg \
+  -F prompt="Describe what should be removed" \
   https://logicgoinfotechspaces-object-remover.hf.space/inpaint-multipart
 ```
 
@@ -112,7 +115,8 @@ curl -L https://logicgoinfotechspaces-object-remover.hf.space/download/output_xx
 ```json
 {
   "image_id": "9cf61445-f83b-4c97-9272-c81647f90d68",
-  "mask_id": "d044a390-dde2-408a-b7cf-d508385e56ed"
+  "mask_id": "d044a390-dde2-408a-b7cf-d508385e56ed",
+  "prompt": "Describe what should be removed"
 }
 ```
 
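Aside from curl, the requests in the guide above can be scripted. The sketch below assembles a `/inpaint` call exactly as documented (bearer token optional, `prompt` optional); `build_inpaint_request` is an illustrative helper, not part of the repository, and the actual HTTP send is left as a comment so the snippet stays offline.

```python
import json
from typing import Dict, Optional, Tuple

BASE_URL = "https://logicgoinfotechspaces-object-remover.hf.space"


def build_inpaint_request(
    image_id: str,
    mask_id: str,
    prompt: Optional[str] = None,
    token: Optional[str] = None,
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, json_body) for a POST /inpaint call."""
    payload = {"image_id": image_id, "mask_id": mask_id}
    if prompt:
        payload["prompt"] = prompt  # prompt is optional; omit the key when absent
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # bearer auth is optional
    return f"{BASE_URL}/inpaint", headers, json.dumps(payload)


url, headers, body = build_inpaint_request(
    "9cf61445-f83b-4c97-9272-c81647f90d68",
    "d044a390-dde2-408a-b7cf-d508385e56ed",
    prompt="Describe what should be removed",
)
# Send with any HTTP client, e.g. requests.post(url, headers=headers, data=body)
```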
HTTP_API_Documentation.txt CHANGED
@@ -7,6 +7,8 @@ Authentication:
 - Set API_TOKEN environment variable on server to enable auth
 - Send header: Authorization: Bearer <API_TOKEN>
 - If API_TOKEN not set, all endpoints are publicly accessible
+- Inpainting work is delegated to the Google Gemini/Imagen edit API; no GPU is needed on the server.
+- Uploads are stored in MongoDB GridFS (database `object_remover`); the IDs returned by upload endpoints are fetched from GridFS when processing.
 
 Available Endpoints:
 
@@ -34,21 +36,21 @@ Available Endpoints:
 4. POST /inpaint
    - Process inpainting using uploaded image and mask IDs
    - Content-Type: application/json
-   - Body: {"image_id":"<image_id>","mask_id":"<mask_id>"}
+   - Body: {"image_id":"<image_id>","mask_id":"<mask_id>","prompt":"optional text about what to remove"}
    - Returns: {"result":"output_xxx.png"}
    - Simple response with just the filename
 
 5. POST /inpaint-url
    - Same as /inpaint but returns JSON with public download URL
    - Content-Type: application/json
-   - Body: {"image_id":"<image_id>","mask_id":"<mask_id>"}
+   - Body: {"image_id":"<image_id>","mask_id":"<mask_id>","prompt":"optional text about what to remove"}
    - Returns: {"result":"output_xxx.png","url":"https://.../download/output_xxx.png"}
    - Use this endpoint if you need a shareable URL
 
 6. POST /inpaint-multipart
    - Process inpainting with direct file upload (no separate upload steps)
    - Content-Type: multipart/form-data
-   - Form fields: image (file), mask (file)
+   - Form fields: image (file), mask (file), prompt (optional text)
    - Returns: {"result":"output_xxx.png","url":"https://.../download/output_xxx.png"}
 
 7. GET /download/(unknown)
@@ -84,14 +86,14 @@ curl -H "Authorization: Bearer <API_TOKEN>" \
 4. Inpaint (returns simple JSON):
 curl -H "Authorization: Bearer <API_TOKEN>" \
   -H "Content-Type: application/json" \
-  -d '{"image_id":"9cf61445-f83b-4c97-9272-c81647f90d68","mask_id":"d044a390-dde2-408a-b7cf-d508385e56ed"}' \
+  -d '{"image_id":"9cf61445-f83b-4c97-9272-c81647f90d68","mask_id":"d044a390-dde2-408a-b7cf-d508385e56ed","prompt":"Remove the car and repair the road"}' \
   https://logicgoinfotechspaces-object-remover.hf.space/inpaint
 # Response: {"result":"output_b09568698bbd4aa591b1598c01f2f745.png"}
 
 5. Inpaint-URL (returns JSON with public URL):
 curl -H "Authorization: Bearer <API_TOKEN>" \
   -H "Content-Type: application/json" \
-  -d '{"image_id":"9cf61445-f83b-4c97-9272-c81647f90d68","mask_id":"d044a390-dde2-408a-b7cf-d508385e56ed"}' \
+  -d '{"image_id":"9cf61445-f83b-4c97-9272-c81647f90d68","mask_id":"d044a390-dde2-408a-b7cf-d508385e56ed","prompt":"Remove the car and repair the road"}' \
   https://logicgoinfotechspaces-object-remover.hf.space/inpaint-url
 # Response: {"result":"output_b09568698bbd4aa591b1598c01f2f745.png","url":"https://logicgoinfotechspaces-object-remover.hf.space/download/output_b09568698bbd4aa591b1598c01f2f745.png"}
 
@@ -140,7 +142,8 @@ POSTMAN EXAMPLES:
 Body: raw JSON
 {
   "image_id": "9cf61445-f83b-4c97-9272-c81647f90d68",
-  "mask_id": "d044a390-dde2-408a-b7cf-d508385e56ed"
+  "mask_id": "d044a390-dde2-408a-b7cf-d508385e56ed",
+  "prompt": "Remove the car and restore the street"
 }
 
 5. Inpaint Multipart (one-step):
@@ -150,6 +153,7 @@ POSTMAN EXAMPLES:
 Body: form-data
 Key: image, Type: File, Value: select your image file
 Key: mask, Type: File, Value: select your mask file
+Key: prompt, Type: Text, Value: describe what to remove (optional)
 
 IMPORTANT NOTES:
 
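The auth rule documented above (token optional; fully public when `API_TOKEN` is unset) fits in a few lines. This is a hedged sketch only — the server's real dependency in `api/main.py` is named `bearer_auth` and its body is not shown in this diff, so `check_bearer` is a hypothetical standalone version of the same rule.

```python
from typing import Optional


def check_bearer(auth_header: Optional[str], api_token: Optional[str]) -> bool:
    """Return True when a request may proceed under the documented auth rule."""
    if not api_token:
        # No API_TOKEN configured on the server -> all endpoints are public
        return True
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    # Token configured: the header must carry exactly that token
    return auth_header[len("Bearer "):] == api_token
```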
README.md CHANGED
Binary files a/README.md and b/README.md differ
 
api/main.py CHANGED
@@ -4,6 +4,7 @@ import uuid
 import shutil
 import re
 from datetime import datetime, timedelta, date
+from io import BytesIO
 from typing import Dict, List, Optional
 
 import numpy as np
@@ -19,14 +20,24 @@ from fastapi import (
 )
 from fastapi.responses import FileResponse, JSONResponse
 from pydantic import BaseModel
-from PIL import Image
+from PIL import Image, UnidentifiedImageError
 import cv2
 import logging
+from gridfs import GridFS
+from gridfs.errors import NoFile
 
 from bson import ObjectId
 from pymongo import MongoClient
 import time
 
+# Load environment variables from .env if present
+try:
+    from dotenv import load_dotenv
+
+    load_dotenv()
+except Exception:
+    pass
+
 logging.basicConfig(level=logging.INFO)
 log = logging.getLogger("api")
 
@@ -53,10 +64,15 @@ app = FastAPI(title="Photo Object Removal API", version="1.0.0")
 file_store: Dict[str, Dict[str, str]] = {}
 logs: List[Dict[str, str]] = []
 
-MONGO_URI = "mongodb+srv://harilogicgo_db_user:pdnh6UCMsWvuTCoi@kiddoimages.k2a4nuv.mongodb.net/?appName=KiddoImages"
+MONGO_URI = (
+    os.environ.get("MONGO_URI")
+    or os.environ.get("MONGODB_URI")
+    or "mongodb+srv://harilogicgo_db_user:pdnh6UCMsWvuTCoi@kiddoimages.k2a4nuv.mongodb.net/?appName=KiddoImages"
+)
 mongo_client = MongoClient(MONGO_URI)
 mongo_db = mongo_client["object_remover"]
 mongo_logs = mongo_db["api_logs"]
+grid_fs = GridFS(mongo_db)
 
 ADMIN_MONGO_URI = os.environ.get("MONGODB_ADMIN")
 DEFAULT_CATEGORY_ID = "69368f722e46bd68ae188984"
@@ -71,11 +87,15 @@ def _init_admin_mongo() -> None:
     try:
         admin_client = MongoClient(ADMIN_MONGO_URI)
         # get_default_database() extracts database from connection string (e.g., /adminPanel)
-        admin_db = admin_client.get_default_database()
+        try:
+            admin_db = admin_client.get_default_database()
+        except Exception as db_err:
+            admin_db = None
+            log.warning("Admin Mongo URI has no default DB; error=%s", db_err)
         if admin_db is None:
-            # Fallback if no database in URI
-            admin_db = admin_client["admin"]
-            log.warning("No database in connection string, defaulting to 'admin'")
+            # Fallback to provided default for this app
+            admin_db = admin_client["object_remover"]
+            log.warning("No database in connection string, defaulting to 'object_remover'")
 
         admin_media_clicks = admin_db["media_clicks"]
         log.info(
@@ -112,6 +132,50 @@ def _admin_logging_status() -> Dict[str, object]:
     }
 
 
+def _save_upload_to_gridfs(upload: UploadFile, file_type: str) -> str:
+    """Store an uploaded file into GridFS and return its ObjectId string."""
+    data = upload.file.read()
+    if not data:
+        raise HTTPException(status_code=400, detail=f"{file_type} file is empty")
+    oid = grid_fs.put(
+        data,
+        filename=upload.filename or f"{file_type}.bin",
+        contentType=upload.content_type,
+        metadata={"type": file_type},
+    )
+    return str(oid)
+
+
+def _read_gridfs_bytes(file_id: str, expected_type: str) -> bytes:
+    """Fetch raw bytes from GridFS and validate the stored type metadata."""
+    try:
+        oid = ObjectId(file_id)
+    except Exception:
+        raise HTTPException(status_code=404, detail=f"{expected_type}_id invalid")
+
+    try:
+        grid_out = grid_fs.get(oid)
+    except NoFile:
+        raise HTTPException(status_code=404, detail=f"{expected_type}_id not found")
+
+    meta = grid_out.metadata or {}
+    stored_type = meta.get("type")
+    if stored_type and stored_type != expected_type:
+        raise HTTPException(status_code=404, detail=f"{expected_type}_id not found")
+
+    return grid_out.read()
+
+
+def _load_rgba_image_from_gridfs(file_id: str, expected_type: str) -> Image.Image:
+    """Load an image from GridFS and convert to RGBA."""
+    data = _read_gridfs_bytes(file_id, expected_type)
+    try:
+        img = Image.open(BytesIO(data))
+    except UnidentifiedImageError:
+        raise HTTPException(status_code=422, detail=f"{expected_type} is not a valid image")
+    return img.convert("RGBA")
+
+
 def _build_ai_edit_daily_count(
     existing: Optional[List[Dict[str, object]]],
     today: date,
@@ -223,6 +287,7 @@ class InpaintRequest(BaseModel):
     mask_id: str
     invert_mask: bool = True  # True => selected/painted area is removed
     passthrough: bool = False  # If True, return the original image unchanged
+    prompt: Optional[str] = None  # Optional: describe what to remove
     user_id: Optional[str] = None
     category_id: Optional[str] = None
 
@@ -382,47 +447,18 @@ def logging_status(_: None = Depends(bearer_auth)) -> Dict[str, object]:
 
 @app.post("/upload-image")
 def upload_image(image: UploadFile = File(...), _: None = Depends(bearer_auth)) -> Dict[str, str]:
-    ext = os.path.splitext(image.filename)[1] or ".png"
-    file_id = str(uuid.uuid4())
-    stored_name = f"{file_id}{ext}"
-    stored_path = os.path.join(UPLOAD_DIR, stored_name)
-    with open(stored_path, "wb") as f:
-        shutil.copyfileobj(image.file, f)
-    file_store[file_id] = {
-        "type": "image",
-        "filename": image.filename,
-        "stored_name": stored_name,
-        "path": stored_path,
-        "timestamp": datetime.utcnow().isoformat(),
-    }
+    file_id = _save_upload_to_gridfs(image, "image")
     logs.append({"id": file_id, "filename": image.filename, "type": "image", "timestamp": datetime.utcnow().isoformat()})
     return {"id": file_id, "filename": image.filename}
 
 
 @app.post("/upload-mask")
 def upload_mask(mask: UploadFile = File(...), _: None = Depends(bearer_auth)) -> Dict[str, str]:
-    ext = os.path.splitext(mask.filename)[1] or ".png"
-    file_id = str(uuid.uuid4())
-    stored_name = f"{file_id}{ext}"
-    stored_path = os.path.join(UPLOAD_DIR, stored_name)
-    with open(stored_path, "wb") as f:
-        shutil.copyfileobj(mask.file, f)
-    file_store[file_id] = {
-        "type": "mask",
-        "filename": mask.filename,
-        "stored_name": stored_name,
-        "path": stored_path,
-        "timestamp": datetime.utcnow().isoformat(),
-    }
+    file_id = _save_upload_to_gridfs(mask, "mask")
     logs.append({"id": file_id, "filename": mask.filename, "type": "mask", "timestamp": datetime.utcnow().isoformat()})
     return {"id": file_id, "filename": mask.filename}
 
 
-def _load_rgba_image(path: str) -> Image.Image:
-    img = Image.open(path)
-    return img.convert("RGBA")
-
-
 def _compress_image(image_path: str, output_path: str, quality: int = 85) -> None:
     """
     Compress an image to reduce file size.
@@ -503,14 +539,8 @@ def inpaint(req: InpaintRequest, request: Request, _: None = Depends(bearer_auth
     compressed_url = None
 
     try:
-        if req.image_id not in file_store or file_store[req.image_id]["type"] != "image":
-            raise HTTPException(status_code=404, detail="image_id not found")
-
-        if req.mask_id not in file_store or file_store[req.mask_id]["type"] != "mask":
-            raise HTTPException(status_code=404, detail="mask_id not found")
-
-        img_rgba = _load_rgba_image(file_store[req.image_id]["path"])
-        mask_img = Image.open(file_store[req.mask_id]["path"])
+        img_rgba = _load_rgba_image_from_gridfs(req.image_id, "image")
+        mask_img = _load_rgba_image_from_gridfs(req.mask_id, "mask")
         mask_rgba = _load_rgba_mask_from_image(mask_img)
 
         if req.passthrough:
@@ -519,7 +549,8 @@
             result = process_inpaint(
                 np.array(img_rgba),
                 mask_rgba,
-                invert_mask=req.invert_mask
+                invert_mask=req.invert_mask,
+                prompt=req.prompt,
             )
 
         output_name = f"output_{uuid.uuid4().hex}.png"
@@ -608,19 +639,19 @@ def inpaint_url(req: InpaintRequest, request: Request, _: None = Depends(bearer_
     result_name = None
 
     try:
-        if req.image_id not in file_store or file_store[req.image_id]["type"] != "image":
-            raise HTTPException(status_code=404, detail="image_id not found")
-        if req.mask_id not in file_store or file_store[req.mask_id]["type"] != "mask":
-            raise HTTPException(status_code=404, detail="mask_id not found")
-
-        img_rgba = _load_rgba_image(file_store[req.image_id]["path"])
-        mask_img = Image.open(file_store[req.mask_id]["path"])  # may be RGB/gray/RGBA
+        img_rgba = _load_rgba_image_from_gridfs(req.image_id, "image")
+        mask_img = _load_rgba_image_from_gridfs(req.mask_id, "mask")  # may be RGB/gray/RGBA
         mask_rgba = _load_rgba_mask_from_image(mask_img)
 
         if req.passthrough:
             result = np.array(img_rgba.convert("RGB"))
         else:
-            result = process_inpaint(np.array(img_rgba), mask_rgba, invert_mask=req.invert_mask)
+            result = process_inpaint(
+                np.array(img_rgba),
+                mask_rgba,
+                invert_mask=req.invert_mask,
+                prompt=req.prompt,
+            )
         result_name = f"output_{uuid.uuid4().hex}.png"
         result_path = os.path.join(OUTPUT_DIR, result_name)
         Image.fromarray(result).save(result_path, "PNG", optimize=False, compress_level=1)
@@ -662,6 +693,7 @@ def inpaint_multipart(
     invert_mask: bool = True,
     mask_is_painted: bool = False,  # if True, mask file is the painted-on image (e.g., black strokes on original)
     passthrough: bool = False,
+    prompt: Optional[str] = Form(None),
     user_id: Optional[str] = Form(None),
     category_id: Optional[str] = Form(None),
     _: None = Depends(bearer_auth),
@@ -774,7 +806,12 @@
         actual_invert = invert_mask  # Use default True for painted masks
     log.info("Using invert_mask=%s (mask_is_painted=%s)", actual_invert, mask_is_painted)
 
-    result = process_inpaint(np.array(img), mask_rgba, invert_mask=actual_invert)
+    result = process_inpaint(
+        np.array(img),
+        mask_rgba,
+        invert_mask=actual_invert,
+        prompt=prompt,
+    )
     result_name = f"output_{uuid.uuid4().hex}.png"
     result_path = os.path.join(OUTPUT_DIR, result_name)
     Image.fromarray(result).save(result_path, "PNG", optimize=False, compress_level=1)
@@ -1930,4 +1967,4 @@ def get_logs(_: None = Depends(bearer_auth)) -> JSONResponse:
 
 # @app.get("/logs")
 # def get_logs(_: None = Depends(bearer_auth)) -> JSONResponse:
-#     return JSONResponse(content=logs)
+#     return JSONResponse(content=logs)
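The `invert_mask` flag and `_load_rgba_mask_from_image` recur throughout this diff, but their internals are not shown here. As a rough sketch of one plausible reading of the flag's comment (`True => selected/painted area is removed`), a binary removal map could be derived as below; `prepare_removal_mask` and its threshold are illustrative assumptions, not the repository's actual mask code.

```python
import numpy as np


def prepare_removal_mask(mask: np.ndarray, invert_mask: bool = True,
                         threshold: int = 10) -> np.ndarray:
    """Reduce a mask array to a binary removal map (255 = remove, 0 = keep).

    Uses the alpha channel of an RGBA mask when present, otherwise mean
    intensity. invert_mask=True treats the painted/selected pixels as the
    region to remove, matching the flag's comment in the request model.
    """
    if mask.ndim == 3 and mask.shape[2] == 4:
        gray = mask[:, :, 3]            # alpha channel marks the painted area
    elif mask.ndim == 3:
        gray = mask.mean(axis=2)        # fall back to luminance for RGB
    else:
        gray = mask                     # already single-channel
    selected = gray > threshold                      # painted/selected pixels
    remove = selected if invert_mask else ~selected  # flip which side is removed
    return remove.astype(np.uint8) * 255
```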
app.py CHANGED
@@ -2,10 +2,8 @@
 # Model based on: https://github.com/saic-mdal/lama
 
 import numpy as np
-import pandas as pd
 import streamlit as st
 import os
-from datetime import datetime
 from PIL import Image
 from streamlit_drawable_canvas import st_canvas
 from io import BytesIO
assets/big-lama.pt DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:344c77bbcb158f17dd143070d1e789f38a66c04202311ae3a258ef66667a9ea9
-size 205669692
requirements.txt CHANGED
@@ -1,29 +1,11 @@
-torch
-torchvision
-numpy
-opencv-python-headless
-matplotlib
-streamlit==1.24.1
-gradio==5.47.0
-streamlit-drawable-canvas==0.9.0
-pyyaml
-tqdm
-easydict==1.9.0
-scikit-image
-scipy>=1.14.1
-tensorflow
-joblib
-pandas
-albumentations==0.5.2
-hydra-core==1.1.0
-pytorch-lightning==1.2.9
-tabulate
-kornia==0.5.0
-webdataset
-packaging
-wldhx.yadisk-direct
-altair<5
 fastapi
 uvicorn[standard]
 python-multipart
-pymongo
+google-genai>=1.38.0
+google-generativeai
+python-dotenv
+numpy
+opencv-python-headless
+Pillow
+pymongo
+streamlit==1.24.1
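The new `python-dotenv` dependency pairs with the guarded import added to `api/main.py`: the package (or the `.env` file) may be absent, so the app must still start without it. A minimal standalone version of that pattern, with a local placeholder default instead of the app's real connection string (`resolve_mongo_uri` is illustrative, not from the repo):

```python
import os

# Optional .env loading: guard the import so a missing package is harmless,
# mirroring the try/except block added in api/main.py.
try:
    from dotenv import load_dotenv
    load_dotenv()
except Exception:
    pass


def resolve_mongo_uri(default: str = "mongodb://localhost:27017") -> str:
    """First configured environment variable wins; the default argument here
    is a placeholder for local development, not the app's real URI."""
    return (
        os.environ.get("MONGO_URI")
        or os.environ.get("MONGODB_URI")
        or default
    )
```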
src/core.py CHANGED
@@ -1,556 +1,216 @@
-import base64
-import json
 import os
-import re
-import time
-import uuid
 from io import BytesIO
-from pathlib import Path
-import cv2
-
-# For inpainting
-
-import numpy as np
-import pandas as pd
-import streamlit as st
-from PIL import Image
-from streamlit_drawable_canvas import st_canvas
-
-
-import argparse
-import io
-import multiprocessing
-from typing import Union
-
-import torch
 
 try:
-    torch._C._jit_override_can_fuse_on_cpu(False)
-    torch._C._jit_override_can_fuse_on_gpu(False)
-    torch._C._jit_set_texpr_fuser_enabled(False)
-    torch._C._jit_set_nvfuser_enabled(False)
-except:
     pass
 
-from src.helper import (
-    download_model,
-    load_img,
-    norm_img,
-    numpy_to_bytes,
-    pad_img_to_modulo,
-    resize_max_size,
 )
 
-NUM_THREADS = str(multiprocessing.cpu_count())
-
-os.environ["OMP_NUM_THREADS"] = NUM_THREADS
-os.environ["OPENBLAS_NUM_THREADS"] = NUM_THREADS
-os.environ["MKL_NUM_THREADS"] = NUM_THREADS
-os.environ["VECLIB_MAXIMUM_THREADS"] = NUM_THREADS
-os.environ["NUMEXPR_NUM_THREADS"] = NUM_THREADS
-if os.environ.get("CACHE_DIR"):
-    os.environ["TORCH_HOME"] = os.environ["CACHE_DIR"]
 
-#BUILD_DIR = os.environ.get("LAMA_CLEANER_BUILD_DIR", "./lama_cleaner/app/build")
 
-# For Seam-carving
 
-from scipy import ndimage as ndi
-
-SEAM_COLOR = np.array([255, 200, 200])  # seam visualization color (BGR)
-SHOULD_DOWNSIZE = True                  # if True, downsize image for faster carving
-DOWNSIZE_WIDTH = 500                    # resized image width if SHOULD_DOWNSIZE is True
-ENERGY_MASK_CONST = 100000.0            # large energy value for protective masking
-MASK_THRESHOLD = 10                     # minimum pixel intensity for binary mask
-USE_FORWARD_ENERGY = True               # if True, use forward energy algorithm
-
-device_str = os.environ.get("DEVICE", "cuda" if torch.cuda.is_available() else "cpu")
-device = torch.device(device_str)
-model_path = "./assets/big-lama.pt"
-model = torch.jit.load(model_path, map_location=device)
-model = model.to(device)
-model.eval()
-
-
-########################################
-# UTILITY CODE
-########################################
-
-
-def visualize(im, boolmask=None, rotate=False):
-    vis = im.astype(np.uint8)
-    if boolmask is not None:
-        vis[np.where(boolmask == False)] = SEAM_COLOR
-    if rotate:
-        vis = rotate_image(vis, False)
-    cv2.imshow("visualization", vis)
-    cv2.waitKey(1)
-    return vis
-
-def resize(image, width):
-    dim = None
-    h, w = image.shape[:2]
-    dim = (width, int(h * width / float(w)))
-    image = image.astype('float32')
-    return cv2.resize(image, dim)
-
-def rotate_image(image, clockwise):
-    k = 1 if clockwise else 3
-    return np.rot90(image, k)
-
-
-########################################
-# ENERGY FUNCTIONS
-########################################
-
-def backward_energy(im):
     """
-    Simple gradient magnitude energy map.
     """
-    xgrad = ndi.convolve1d(im, np.array([1, 0, -1]), axis=1, mode='wrap')
-    ygrad = ndi.convolve1d(im, np.array([1, 0, -1]), axis=0, mode='wrap')
-
-    grad_mag = np.sqrt(np.sum(xgrad**2, axis=2) + np.sum(ygrad**2, axis=2))
-
-    # vis = visualize(grad_mag)
-    # cv2.imwrite("backward_energy_demo.jpg", vis)
-
-    return grad_mag
 
-def forward_energy(im):
-    """
-    Forward energy algorithm as described in "Improved Seam Carving for Video Retargeting"
-    by Rubinstein, Shamir, Avidan.
-    Vectorized code adapted from
-    https://github.com/axu2/improved-seam-carving.
-    """
-    h, w = im.shape[:2]
-    im = cv2.cvtColor(im.astype(np.uint8), cv2.COLOR_BGR2GRAY).astype(np.float64)
-
-    energy = np.zeros((h, w))
-    m = np.zeros((h, w))
-
-    U = np.roll(im, 1, axis=0)
-    L = np.roll(im, 1, axis=1)
-    R = np.roll(im, -1, axis=1)
-
-    cU = np.abs(R - L)
-    cL = np.abs(U - L) + cU
-    cR = np.abs(U - R) + cU
-
-    for i in range(1, h):
-        mU = m[i-1]
-        mL = np.roll(mU, 1)
-        mR = np.roll(mU, -1)
-
-        mULR = np.array([mU, mL, mR])
-        cULR = np.array([cU[i], cL[i], cR[i]])
-        mULR += cULR
-
-        argmins = np.argmin(mULR, axis=0)
-        m[i] = np.choose(argmins, mULR)
-        energy[i] = np.choose(argmins, cULR)
-
-    # vis = visualize(energy)
-    # cv2.imwrite("forward_energy_demo.jpg", vis)
-
-    return energy
-
-########################################
-# SEAM HELPER FUNCTIONS
-########################################
-
-def add_seam(im, seam_idx):
-    """
-    Add a vertical seam to a 3-channel color image at the indices provided
-    by averaging the pixels values to the left and right of the seam.
-    Code adapted from https://github.com/vivianhylee/seam-carving.
-    """
-    h, w = im.shape[:2]
-    output = np.zeros((h, w + 1, 3))
-    for row in range(h):
-        col = seam_idx[row]
-        for ch in range(3):
-            if col == 0:
-                p = np.mean(im[row, col: col + 2, ch])
-                output[row, col, ch] = im[row, col, ch]
-                output[row, col + 1, ch] = p
-                output[row, col + 1:, ch] = im[row, col:, ch]
-            else:
-                p = np.mean(im[row, col - 1: col + 1, ch])
-                output[row, : col, ch] = im[row, : col, ch]
-                output[row, col, ch] = p
-                output[row, col + 1:, ch] = im[row, col:, ch]
-
-    return output
-
-def add_seam_grayscale(im, seam_idx):
-    """
-    Add a vertical seam to a grayscale image at the indices provided
-    by averaging the pixels values to the left and right of the seam.
-    """
-    h, w = im.shape[:2]
-    output = np.zeros((h, w + 1))
-    for row in range(h):
-        col = seam_idx[row]
-        if col == 0:
-            p = np.mean(im[row, col: col + 2])
-            output[row, col] = im[row, col]
-            output[row, col + 1] = p
-            output[row, col + 1:] = im[row, col:]
-        else:
-            p = np.mean(im[row, col - 1: col + 1])
-            output[row, : col] = im[row, : col]
-            output[row, col] = p
-            output[row, col + 1:] = im[row, col:]
-
-    return output
-
-def remove_seam(im, boolmask):
-    h, w = im.shape[:2]
-    boolmask3c = np.stack([boolmask] * 3, axis=2)
-    return im[boolmask3c].reshape((h, w - 1, 3))
-
-def remove_seam_grayscale(im, boolmask):
-    h, w = im.shape[:2]
-    return im[boolmask].reshape((h, w - 1))
-
-def get_minimum_seam(im, mask=None, remove_mask=None):
     """
-    DP algorithm for finding the seam of minimum energy. Code adapted from
-    https://karthikkaranth.me/blog/implementing-seam-carving-with-python/
     """
-    h, w = im.shape[:2]
-    energyfn = forward_energy if USE_FORWARD_ENERGY else backward_energy
-    M = energyfn(im)
-
-    if mask is not None:
-        M[np.where(mask > MASK_THRESHOLD)] = ENERGY_MASK_CONST
-
-    # give removal mask priority over protective mask by using larger negative value
-    if remove_mask is not None:
-        M[np.where(remove_mask > MASK_THRESHOLD)] = -ENERGY_MASK_CONST * 100
-
-    seam_idx, boolmask = compute_shortest_path(M, im, h, w)
-
-    return np.array(seam_idx), boolmask
-
-def compute_shortest_path(M, im, h, w):
-    backtrack = np.zeros_like(M, dtype=np.int_)
-
-    # populate DP matrix
-    for i in range(1, h):
-        for j in range(0, w):
-            if j == 0:
-                idx = np.argmin(M[i - 1, j:j + 2])
-                backtrack[i, j] = idx + j
-                min_energy = M[i-1, idx + j]
-            else:
-                idx = np.argmin(M[i - 1, j - 1:j + 2])
-                backtrack[i, j] = idx + j - 1
-                min_energy = M[i - 1, idx + j - 1]
-
-            M[i, j] += min_energy
-
-    # backtrack to find path
-    seam_idx = []
-    boolmask = np.ones((h, w), dtype=np.bool_)
-    j = np.argmin(M[-1])
-    for i in range(h-1, -1, -1):
261
- boolmask[i, j] = False
262
- seam_idx.append(j)
263
- j = backtrack[i, j]
264
-
265
- seam_idx.reverse()
266
- return seam_idx, boolmask
267
-
268
- ########################################
269
- # MAIN ALGORITHM
270
- ########################################
271
-
272
- def seams_removal(im, num_remove, mask=None, vis=False, rot=False):
273
- for _ in range(num_remove):
274
- seam_idx, boolmask = get_minimum_seam(im, mask)
275
- if vis:
276
- visualize(im, boolmask, rotate=rot)
277
- im = remove_seam(im, boolmask)
278
- if mask is not None:
279
- mask = remove_seam_grayscale(mask, boolmask)
280
- return im, mask
281
-
282
-
283
- def seams_insertion(im, num_add, mask=None, vis=False, rot=False):
284
- seams_record = []
285
- temp_im = im.copy()
286
- temp_mask = mask.copy() if mask is not None else None
287
-
288
- for _ in range(num_add):
289
- seam_idx, boolmask = get_minimum_seam(temp_im, temp_mask)
290
- if vis:
291
- visualize(temp_im, boolmask, rotate=rot)
292
-
293
- seams_record.append(seam_idx)
294
- temp_im = remove_seam(temp_im, boolmask)
295
- if temp_mask is not None:
296
- temp_mask = remove_seam_grayscale(temp_mask, boolmask)
297
-
298
- seams_record.reverse()
299
-
300
- for _ in range(num_add):
301
- seam = seams_record.pop()
302
- im = add_seam(im, seam)
303
- if vis:
304
- visualize(im, rotate=rot)
305
- if mask is not None:
306
- mask = add_seam_grayscale(mask, seam)
307
-
308
- # update the remaining seam indices
309
- for remaining_seam in seams_record:
310
- remaining_seam[np.where(remaining_seam >= seam)] += 2
311
-
312
- return im, mask
313
-
314
- ########################################
315
- # MAIN DRIVER FUNCTIONS
316
- ########################################
317
-
318
- def seam_carve(im, dy, dx, mask=None, vis=False):
319
- im = im.astype(np.float64)
320
- h, w = im.shape[:2]
321
- assert h + dy > 0 and w + dx > 0 and dy <= h and dx <= w
322
-
323
- if mask is not None:
324
- mask = mask.astype(np.float64)
325
-
326
- output = im
327
-
328
- if dx < 0:
329
- output, mask = seams_removal(output, -dx, mask, vis)
330
-
331
- elif dx > 0:
332
- output, mask = seams_insertion(output, dx, mask, vis)
333
-
334
- if dy < 0:
335
- output = rotate_image(output, True)
336
- if mask is not None:
337
- mask = rotate_image(mask, True)
338
- output, mask = seams_removal(output, -dy, mask, vis, rot=True)
339
- output = rotate_image(output, False)
340
-
341
- elif dy > 0:
342
- output = rotate_image(output, True)
343
- if mask is not None:
344
- mask = rotate_image(mask, True)
345
- output, mask = seams_insertion(output, dy, mask, vis, rot=True)
346
- output = rotate_image(output, False)
347
-
348
- return output
349
-
350
-
351
- def object_removal(im, rmask, mask=None, vis=False, horizontal_removal=False):
352
- im = im.astype(np.float64)
353
- rmask = rmask.astype(np.float64)
354
- if mask is not None:
355
- mask = mask.astype(np.float64)
356
- output = im
357
-
358
- h, w = im.shape[:2]
359
-
360
- if horizontal_removal:
361
- output = rotate_image(output, True)
362
- rmask = rotate_image(rmask, True)
363
- if mask is not None:
364
- mask = rotate_image(mask, True)
365
-
366
- while len(np.where(rmask > MASK_THRESHOLD)[0]) > 0:
367
- seam_idx, boolmask = get_minimum_seam(output, mask, rmask)
368
- if vis:
369
- visualize(output, boolmask, rotate=horizontal_removal)
370
- output = remove_seam(output, boolmask)
371
- rmask = remove_seam_grayscale(rmask, boolmask)
372
- if mask is not None:
373
- mask = remove_seam_grayscale(mask, boolmask)
374
-
375
- num_add = (h if horizontal_removal else w) - output.shape[1]
376
- output, mask = seams_insertion(output, num_add, mask, vis, rot=horizontal_removal)
377
- if horizontal_removal:
378
- output = rotate_image(output, False)
379
-
380
- return output
381
-
382
-
383
-
384
- def s_image(im,mask,vs,hs,mode="resize"):
385
- im = cv2.cvtColor(im, cv2.COLOR_RGBA2RGB)
386
- mask = 255-mask[:,:,3]
387
- h, w = im.shape[:2]
388
- if SHOULD_DOWNSIZE and w > DOWNSIZE_WIDTH:
389
- im = resize(im, width=DOWNSIZE_WIDTH)
390
- if mask is not None:
391
- mask = resize(mask, width=DOWNSIZE_WIDTH)
392
-
393
- # image resize mode
394
- if mode=="resize":
395
- dy = hs#reverse
396
- dx = vs#reverse
397
- assert dy is not None and dx is not None
398
- output = seam_carve(im, dy, dx, mask, False)
399
-
400
-
401
- # object removal mode
402
- elif mode=="remove":
403
- assert mask is not None
404
- output = object_removal(im, mask, None, False, True)
405
-
406
- return output
407
-
408
-
409
- ##### Inpainting helper code
410
-
411
- def run(image, mask):
412
  """
413
- image: [C, H, W]
414
- mask: [1, H, W]
415
- return: BGR IMAGE
416
  """
417
- origin_height, origin_width = image.shape[1:]
418
- image = pad_img_to_modulo(image, mod=8)
419
- mask = pad_img_to_modulo(mask, mod=8)
420
-
421
- mask = (mask > 0) * 1
422
- image = torch.from_numpy(image).unsqueeze(0).to(device)
423
- mask = torch.from_numpy(mask).unsqueeze(0).to(device)
424
-
425
- start = time.time()
426
- with torch.no_grad():
427
- inpainted_image = model(image, mask)
428
-
429
- print(f"process time: {(time.time() - start)*1000}ms")
430
- cur_res = inpainted_image[0].permute(1, 2, 0).detach().cpu().numpy()
431
- cur_res = cur_res[0:origin_height, 0:origin_width, :]
432
- cur_res = np.clip(cur_res * 255, 0, 255).astype("uint8")
433
- cur_res = cv2.cvtColor(cur_res, cv2.COLOR_BGR2RGB)
434
- return cur_res
435
-
436
-
437
- def get_args_parser():
438
- parser = argparse.ArgumentParser()
439
- parser.add_argument("--port", default=8080, type=int)
440
- parser.add_argument("--device", default="cuda", type=str)
441
- parser.add_argument("--debug", action="store_true")
442
- return parser.parse_args()
443
 
 
 
 
444
 
445
- def process_inpaint(image, mask, invert_mask=True):
446
- """
447
- Process inpainting - handles both alpha-based masks and RGB-based masks.
448
- Preserves original image quality and dimensions.
449
- Reference: https://huggingface.co/spaces/aryadytm/remove-photo-object
450
- """
451
- image = cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)
452
- original_shape = image.shape # (H, W, C)
453
- interpolation = cv2.INTER_CUBIC
454
-
455
- # Preserve original size - only resize if absolutely necessary for memory/performance
456
- # Keep original quality by preserving dimensions
457
- max_dimension = max(image.shape[:2])
458
- # Don't resize unless image is extremely large (over 3000px) to preserve quality
459
- if max_dimension > 3000:
460
- size_limit = 3000
461
- print(f"Very large image detected ({max_dimension}px), resizing to {size_limit}px for processing")
462
- else:
463
- size_limit = max_dimension # Keep original size to preserve quality
464
- print(f"Preserving original image size: {max_dimension}px (no resize)")
465
-
466
- print(f"Origin image shape: {original_shape}")
467
-
468
- # Resize image only if needed
469
- if size_limit < max_dimension:
470
- image_resized = resize_max_size(image, size_limit=size_limit, interpolation=interpolation)
471
- print(f"Resized image shape: {image_resized.shape}")
472
- else:
473
- image_resized = image
474
- print(f"Image not resized: {image_resized.shape}")
475
-
476
- image = norm_img(image_resized)
477
-
478
- # Handle mask: check if we should use alpha channel or RGB channels
479
- alpha_channel = mask[:,:,3]
480
- rgb_channels = mask[:,:,:3]
481
-
482
- # Check if alpha is meaningful (not all 255)
483
- alpha_mean = alpha_channel.mean()
484
-
485
- if alpha_mean < 240:
486
- # Alpha channel is meaningful (has transparent areas)
487
- # Reference model logic: mask = 255-mask[:,:,3]
488
- # alpha=0 (transparent) → 255 (white/remove)
489
- # alpha=255 (opaque) → 0 (black/keep)
490
- mask = 255 - alpha_channel
491
- transparent_count = int((alpha_channel < 128).sum())
492
- print(f"Using alpha channel: {transparent_count} transparent pixels → white (to remove)")
493
- # For alpha-based masks: invert_mask=True means keep current (white=remove is correct)
494
- # invert_mask=False means invert (white becomes black)
495
- if not invert_mask:
496
- mask = 255 - mask
497
- print(f"Applied invert_mask=False: inverted alpha-based mask")
498
- else:
499
- # Alpha is mostly opaque (255), use RGB channels instead
500
- # RGB masks: white (255) = remove, black (0) = keep (standard convention)
501
- gray = cv2.cvtColor(rgb_channels, cv2.COLOR_RGB2GRAY)
502
- mask = (gray > 128).astype(np.uint8) * 255
503
- white_count = int((mask > 128).sum())
504
- print(f"Using RGB channels: {white_count} white pixels (to remove)")
505
- # For RGB-based masks: white=remove is already correct
506
- # invert_mask=False means we want black=remove (invert it)
507
- if not invert_mask:
508
- mask = 255 - mask # invert: white becomes black, black becomes white
509
- print(f"Applied invert_mask=False: inverted RGB mask (now black=remove)")
510
-
511
- # Resize mask to match image dimensions (always force exact match)
512
- target_h, target_w = image_resized.shape[:2]
513
- if mask.shape[:2] != (target_h, target_w):
514
- mask = cv2.resize(mask, (target_w, target_h), interpolation=cv2.INTER_NEAREST)
515
-
516
- # Debug: log final mask statistics
517
- mask_nonzero = int((mask > 128).sum())
518
- mask_total = mask.shape[0] * mask.shape[1]
519
- print(f"Final mask before normalization: {mask_nonzero}/{mask_total} pixels marked for removal ({100*mask_nonzero/mask_total:.2f}%)")
520
-
521
- if mask_nonzero < 10:
522
- print("ERROR: Mask is empty or almost empty! Returning original image.")
523
- # Return original image at original size
524
- original_rgb = (image_resized * 255).astype(np.uint8)
525
- return cv2.resize(cv2.cvtColor(original_rgb, cv2.COLOR_RGB2BGR),
526
- (original_shape[1], original_shape[0]),
527
- interpolation=cv2.INTER_CUBIC)
528
-
529
- # Verify mask is correct before normalization
530
- print(f"Mask verification: {mask_nonzero} pixels will be removed, shape: {mask.shape}")
531
-
532
- mask = norm_img(mask)
533
-
534
- # Verify normalized mask
535
- mask_normalized_ones = int((mask > 0.5).sum())
536
- print(f"After normalization: {mask_normalized_ones} pixels marked for removal (value > 0.5)")
537
-
538
- # Run inpainting
539
- print("Running LaMa model for inpainting...")
540
- res_np_img = run(image, mask)
541
- print(f"Inpainting complete. Output shape: {res_np_img.shape}")
542
-
543
- # Verify output changed
544
- original_for_compare = (image_resized * 255).astype(np.uint8)
545
- original_bgr = cv2.cvtColor(original_for_compare, cv2.COLOR_RGB2BGR)
546
- diff = np.abs(res_np_img.astype(np.float32) - original_bgr.astype(np.float32))
547
- diff_pixels = int((diff.sum(axis=2) > 10).sum()) # Pixels that changed by more than 10 in any channel
548
- print(f"Output verification: {diff_pixels} pixels differ from input (should be > 0 if inpainting worked)")
549
-
550
- # Resize back to original dimensions if we resized (use LANCZOS4 for better quality)
551
- if size_limit < max_dimension:
552
- res_np_img = cv2.resize(res_np_img, (original_shape[1], original_shape[0]),
553
- interpolation=cv2.INTER_LANCZOS4)
554
- print(f"Resized output back to original size: {res_np_img.shape}")
555
-
556
- return cv2.cvtColor(res_np_img, cv2.COLOR_BGR2RGB)
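The deleted `compute_shortest_path` above is the standard seam-carving dynamic program. As a standalone sanity sketch (a hypothetical toy, not part of the repo), the same recurrence can be traced on a tiny energy map:

```python
import numpy as np

def minimum_seam(M):
    """Column indices of the vertical seam with minimum cumulative energy
    (same recurrence as the deleted compute_shortest_path)."""
    M = M.astype(np.float64).copy()
    h, w = M.shape
    backtrack = np.zeros((h, w), dtype=np.int_)
    for i in range(1, h):
        for j in range(w):
            if j == 0:
                idx = int(np.argmin(M[i - 1, j:j + 2]))
                backtrack[i, j] = idx + j
            else:
                idx = int(np.argmin(M[i - 1, j - 1:j + 2]))
                backtrack[i, j] = idx + j - 1
            # accumulate the cheapest reachable energy from the row above
            M[i, j] += M[i - 1, backtrack[i, j]]
    # walk back up from the cheapest bottom cell
    seam = []
    j = int(np.argmin(M[-1]))
    for i in range(h - 1, -1, -1):
        seam.append(j)
        j = int(backtrack[i, j])
    seam.reverse()
    return seam

energy = np.array([[1, 2, 3],
                   [4, 1, 5],
                   [6, 7, 1]])
print(minimum_seam(energy))  # -> [0, 1, 2]
```

The diagonal of ones is the cheapest path, so the returned seam follows it; `seams_removal` would then drop those pixels row by row.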
 
+ import logging
  import os
  from io import BytesIO

+ # Load environment variables from .env if present (helps local dev)
  try:
+     from dotenv import load_dotenv
+
+     load_dotenv()
+ except Exception:
      pass

+ import base64
+ import cv2
+ import numpy as np
+ from PIL import Image
+ import google.generativeai as genai
+
+ log = logging.getLogger(__name__)
+
+ # Remote inference configuration (Gemini API key only; no Vertex required)
+ DEFAULT_MODEL_ID = os.environ.get("GEMINI_IMAGE_MODEL", "gemini-2.5-flash-image")
+ DEFAULT_PROMPT = os.environ.get(
+     "GEMINI_IMAGE_PROMPT",
+     (
+         "TASK TYPE: STRICT IMAGE INPAINTING — OBJECT REMOVAL ONLY\n\n"
+         "You are given:\n"
+         "1) An original image\n"
+         "2) A binary mask image\n\n"
+         "MASK RULE (MANDATORY):\n"
+         "• White pixels (#FFFFFF) indicate the exact region to be REMOVED.\n"
+         "• Black pixels (#000000) indicate regions that MUST remain completely unchanged.\n\n"
+         "PRIMARY OBJECTIVE:\n"
+         "Completely delete everything inside the white masked area.\n"
+         "The object in the white region must be fully removed with no visible remnants,\n"
+         "no partial shapes, no outlines, no shadows, and no color traces.\n\n"
+         "INPAINTING INSTRUCTIONS:\n"
+         "Ignore the content of the white masked area entirely.\n"
+         "Reconstruct that region using ONLY surrounding background information.\n"
+         "Extend nearby background textures, patterns, and structures naturally.\n"
+         "Match lighting direction, brightness, contrast, color temperature, and noise.\n"
+         "Continue edges, lines, and surfaces realistically across the removed area.\n"
+         "Blend boundaries smoothly so the edit is visually undetectable.\n\n"
+         "STRICT CONSTRAINTS:\n"
+         "• Do NOT generate or keep any part of the removed object.\n"
+         "• Do NOT invent new objects or details.\n"
+         "• Do NOT repaint, modify, blur, or enhance any black (unmasked) area.\n"
+         "• Do NOT change the original image composition.\n"
+         "• Do NOT change camera angle, perspective, or scale.\n\n"
+         "QUALITY REQUIREMENTS:\n"
+         "• No ghosting or transparent object remains.\n"
+         "• No edge halos or smearing.\n"
+         "• No repeated textures or patchy fills.\n"
+         "• Result must look like the object never existed.\n\n"
+         "FAILURE CONDITIONS (MUST BE AVOIDED):\n"
+         "If any object fragment, outline, shadow, or color from the removed object\n"
+         "is still visible, the result is incorrect and must be re-generated."
+     ),
  )
+ _GENAI_MODEL: genai.GenerativeModel | None = None
+
+
+ def _resize_mask(mask: np.ndarray, target_hw: tuple[int, int]) -> np.ndarray:
+     """Resize mask to match the target height/width."""
+     target_h, target_w = target_hw
+     if mask.shape[:2] == (target_h, target_w):
+         return mask
+     return cv2.resize(mask, (target_w, target_h), interpolation=cv2.INTER_NEAREST)
+
+
+ def _binary_mask_from_rgba(mask: np.ndarray, invert_mask: bool) -> np.ndarray:
      """
+     Normalize incoming RGBA masks to a 0/255 binary mask.
+     - Transparent alpha (0) is treated as "remove"
+     - White/bright RGB is treated as "remove" when alpha is mostly opaque
      """
+     if mask.shape[2] == 3:
+         alpha_channel = np.ones(mask.shape[:2], dtype=np.uint8) * 255
+         rgb_channels = mask
+     else:
+         alpha_channel = mask[:, :, 3]
+         rgb_channels = mask[:, :, :3]
+
+     # If alpha carries information, prefer it
+     if alpha_channel.mean() < 240:
+         mask_bw = np.where(alpha_channel < 128, 255, 0).astype(np.uint8)
+     else:
+         gray = cv2.cvtColor(rgb_channels, cv2.COLOR_RGB2GRAY)
+         mask_bw = np.where(gray > 128, 255, 0).astype(np.uint8)
+
+     if not invert_mask:
+         mask_bw = 255 - mask_bw
+
+     return mask_bw
+
+
+ def _pil_to_png_bytes(img: Image.Image) -> bytes:
+     """Encode a PIL image to PNG bytes for Gemini edit endpoints."""
+     buf = BytesIO()
+     img.save(buf, format="PNG")
+     buf.seek(0)
+     return buf.getvalue()
+
+
+ def _get_gemini_model() -> genai.GenerativeModel:
+     global _GENAI_MODEL
+     if _GENAI_MODEL is None:
+         api_key = (
+             os.environ.get("GEMINI_API_KEY")
+             or os.environ.get("GOOGLE_API_KEY")
+             or os.environ.get("GOOGLE_GENAI_API_KEY")
+         )
+         if not api_key:
+             raise RuntimeError("Set Gemini API key via GEMINI_API_KEY / GOOGLE_API_KEY / GOOGLE_GENAI_API_KEY")
+         genai.configure(api_key=api_key)
+         model_id = os.environ.get("GEMINI_IMAGE_MODEL", DEFAULT_MODEL_ID)
+         _GENAI_MODEL = genai.GenerativeModel(model_id)
+     return _GENAI_MODEL
+
+
+ def _call_gemini_edit(
+     image_rgb: np.ndarray,
+     mask_bw: np.ndarray,
+     prompt: str | None,
+     target_size: tuple[int, int],
+ ) -> Image.Image:
      """
+     Send source image + binary mask to Gemini via API-key-only generate_content.
+     We include both the base image and the mask as separate parts and instruct the model to remove masked regions.
      """
+     model = _get_gemini_model()
+
+     base_image = Image.fromarray(image_rgb).convert("RGB")
+     mask_image = Image.fromarray(mask_bw).convert("L")
+
+     # Build a guidance image where the removal region is painted white for clarity
+     guidance_rgb = image_rgb.copy()
+     guidance_rgb[mask_bw > 0] = 255
+     guidance_image = Image.fromarray(guidance_rgb).convert("RGB")
+
+     base_bytes = _pil_to_png_bytes(base_image)
+     mask_bytes = _pil_to_png_bytes(mask_image)
+     guidance_bytes = _pil_to_png_bytes(guidance_image)
+
+     # Enrich prompt to explicitly describe the two images being sent
+     effective_prompt = (
+         (prompt or DEFAULT_PROMPT).strip()
+         + "\nIMAGE ORDER:\n"
+         + "Image A: Original photo with the removal region painted white.\n"
+         + "Image B: Binary mask (white=remove, black=keep). Use this mask to decide what to remove.\n"
+     )
+     log.info(
+         "Calling Gemini generate_content model=%s (mask-guided remove) mask_pixels=%d",
+         model.model_name,
+         int((mask_bw > 0).sum()),
+     )
+
+     # Build content parts: prompt + guidance image + mask image (explicit order)
+     content = [
+         effective_prompt,
+         {"mime_type": "image/png", "data": guidance_bytes},
+         {"mime_type": "image/png", "data": mask_bytes},
+     ]
+
+     response = model.generate_content(content, stream=False)
+
+     output_img: Image.Image | None = None
+
+     # Extract first image from response parts
+     try:
+         for candidate in getattr(response, "candidates", []):
+             parts = getattr(candidate, "content", None)
+             if not parts or not getattr(parts, "parts", None):
+                 continue
+             for part in parts.parts:
+                 inline = getattr(part, "inline_data", None)
+                 if inline and inline.data:
+                     data = inline.data
+                     if isinstance(data, str):
+                         data = base64.b64decode(data)
+                     output_img = Image.open(BytesIO(data)).convert("RGB")
+                     break
+             if output_img:
+                 break
+     except Exception as err:
+         log.warning("Failed to parse Gemini response image: %s", err)
+
+     if output_img is None:
+         raise RuntimeError("Gemini generate_content returned no image")
+
+     # Ensure output matches original dimensions if Gemini rescaled
+     if output_img.size != target_size:
+         output_img = output_img.resize(target_size, Image.Resampling.LANCZOS)
+
+     return output_img
+
+
+ def process_inpaint(
+     image: np.ndarray,
+     mask: np.ndarray,
+     invert_mask: bool = True,
+     prompt: str | None = None,
+ ) -> np.ndarray:
      """
+     Forward inpainting to Gemini edit API using source image + mask.
      """
+     image_rgba = Image.fromarray(image).convert("RGBA")
+     image_rgb = np.array(image_rgba.convert("RGB"))
+
+     mask_rgba = np.array(Image.fromarray(mask).convert("RGBA"))
+     mask_bw = _binary_mask_from_rgba(mask_rgba, invert_mask)
+     mask_bw = _resize_mask(mask_bw, image_rgb.shape[:2])
+
+     target_size = (image_rgb.shape[1], image_rgb.shape[0])  # (width, height)
+     edited_image = _call_gemini_edit(image_rgb, mask_bw, prompt, target_size)
+     return np.array(edited_image)
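Outside the diff, the mask convention that `_binary_mask_from_rgba` implements can be checked with a small pure-NumPy stand-in (a hypothetical sketch; `rgb.mean(axis=2)` replaces OpenCV's weighted `COLOR_RGB2GRAY`, so values near the 128 threshold can differ slightly):

```python
import numpy as np

def binary_mask_from_rgba(mask: np.ndarray, invert_mask: bool = True) -> np.ndarray:
    """0/255 binary mask: transparent alpha, or bright RGB when alpha is opaque, means 'remove'."""
    if mask.shape[2] == 3:                      # no alpha channel at all
        alpha = np.full(mask.shape[:2], 255, dtype=np.uint8)
        rgb = mask
    else:
        alpha, rgb = mask[:, :, 3], mask[:, :, :3]
    if alpha.mean() < 240:                      # alpha carries information
        out = np.where(alpha < 128, 255, 0).astype(np.uint8)
    else:                                       # fall back to RGB brightness
        gray = rgb.mean(axis=2)                 # stand-in for cv2's weighted grayscale
        out = np.where(gray > 128, 255, 0).astype(np.uint8)
    if not invert_mask:
        out = 255 - out
    return out

# one transparent pixel in an otherwise opaque RGBA mask -> marked for removal
m = np.zeros((2, 2, 4), dtype=np.uint8)
m[..., 3] = 255
m[0, 0, 3] = 0
print(int(binary_mask_from_rgba(m)[0, 0]))  # -> 255
```

An all-opaque mask (alpha mean ≥ 240) falls through to the RGB branch, which is why a plain white-on-black mask image still works for marking removal regions.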
src/helper.py DELETED
@@ -1,87 +0,0 @@
- import os
- import sys
-
- from urllib.parse import urlparse
- import cv2
- import numpy as np
- import torch
- from torch.hub import download_url_to_file, get_dir
-
- LAMA_MODEL_URL = os.environ.get(
-     "LAMA_MODEL_URL",
-     "https://github.com/Sanster/models/releases/download/add_big_lama/big-lama.pt",
- )
-
-
- def download_model(url=LAMA_MODEL_URL):
-     parts = urlparse(url)
-     hub_dir = get_dir()
-     model_dir = os.path.join(hub_dir, "checkpoints")
-     if not os.path.isdir(model_dir):
-         os.makedirs(os.path.join(model_dir, "hub", "checkpoints"))
-     filename = os.path.basename(parts.path)
-     cached_file = os.path.join(model_dir, filename)
-     if not os.path.exists(cached_file):
-         sys.stderr.write('Downloading: "{}" to {}\n'.format(url, cached_file))
-         hash_prefix = None
-         download_url_to_file(url, cached_file, hash_prefix, progress=True)
-     return cached_file
-
-
- def ceil_modulo(x, mod):
-     if x % mod == 0:
-         return x
-     return (x // mod + 1) * mod
-
-
- def numpy_to_bytes(image_numpy: np.ndarray) -> bytes:
-     data = cv2.imencode(".jpg", image_numpy)[1]
-     image_bytes = data.tobytes()
-     return image_bytes
-
-
- def load_img(img_bytes, gray: bool = False):
-     nparr = np.frombuffer(img_bytes, np.uint8)
-     if gray:
-         np_img = cv2.imdecode(nparr, cv2.IMREAD_GRAYSCALE)
-     else:
-         np_img = cv2.imdecode(nparr, cv2.IMREAD_UNCHANGED)
-         if len(np_img.shape) == 3 and np_img.shape[2] == 4:
-             np_img = cv2.cvtColor(np_img, cv2.COLOR_BGRA2RGB)
-         else:
-             np_img = cv2.cvtColor(np_img, cv2.COLOR_BGR2RGB)
-
-     return np_img
-
-
- def norm_img(np_img):
-     if len(np_img.shape) == 2:
-         np_img = np_img[:, :, np.newaxis]
-     np_img = np.transpose(np_img, (2, 0, 1))
-     np_img = np_img.astype("float32") / 255
-     return np_img
-
-
- def resize_max_size(
-     np_img, size_limit: int, interpolation=cv2.INTER_CUBIC
- ) -> np.ndarray:
-     # Resize image's longer size to size_limit if longer size larger than size_limit
-     h, w = np_img.shape[:2]
-     if max(h, w) > size_limit:
-         ratio = size_limit / max(h, w)
-         new_w = int(w * ratio + 0.5)
-         new_h = int(h * ratio + 0.5)
-         return cv2.resize(np_img, dsize=(new_w, new_h), interpolation=interpolation)
-     else:
-         return np_img
-
-
- def pad_img_to_modulo(img, mod):
-     channels, height, width = img.shape
-     out_height = ceil_modulo(height, mod)
-     out_width = ceil_modulo(width, mod)
-     return np.pad(
-         img,
-         ((0, 0), (0, out_height - height), (0, out_width - width)),
-         mode="symmetric",
-     )
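The last two helpers existed because the old `run()` in core.py padded both image and mask with `mod=8` before LaMa inference. Reproduced standalone (same logic as the deleted code, with NumPy only):

```python
import numpy as np

def ceil_modulo(x: int, mod: int) -> int:
    # round x up to the next multiple of mod
    return x if x % mod == 0 else (x // mod + 1) * mod

def pad_img_to_modulo(img: np.ndarray, mod: int) -> np.ndarray:
    # img is [C, H, W]; mirror-pad bottom and right edges up to the next multiple of mod
    channels, height, width = img.shape
    out_h, out_w = ceil_modulo(height, mod), ceil_modulo(width, mod)
    return np.pad(img, ((0, 0), (0, out_h - height), (0, out_w - width)), mode="symmetric")

img = np.zeros((3, 30, 50), dtype=np.float32)
print(pad_img_to_modulo(img, 8).shape)  # -> (3, 32, 56)
```

`run()` then cropped the inpainted result back to `origin_height` × `origin_width`, undoing this padding after inference.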
src/st_style.py DELETED
@@ -1,42 +0,0 @@
- button_style = """
- <style>
- div.stButton > button:first-child {
-     background-color: rgb(255, 75, 75);
-     color: rgb(255, 255, 255);
- }
- div.stButton > button:hover {
-     background-color: rgb(255, 75, 75);
-     color: rgb(255, 255, 255);
- }
- div.stButton > button:active {
-     background-color: rgb(255, 75, 75);
-     color: rgb(255, 255, 255);
- }
- div.stButton > button:focus {
-     background-color: rgb(255, 75, 75);
-     color: rgb(255, 255, 255);
- }
- .css-1cpxqw2:focus:not(:active) {
-     background-color: rgb(255, 75, 75);
-     border-color: rgb(255, 75, 75);
-     color: rgb(255, 255, 255);
- }
- """
-
- style = """
- <style>
- #MainMenu {
-     visibility: hidden;
- }
- footer {
-     visibility: hidden;
- }
- header {
-     visibility: hidden;
- }
- </style>
- """
-
-
- def apply_prod_style(st):
-     return st.markdown(style, unsafe_allow_html=True)