Commit · f47d70f
Parent(s): 9c2da37
feat: implement premium cinematic typography layouts, revert ControlNet, and remove token
- README.md +17 -15
- app.py +20 -18
- download_sd21.py +34 -0
- download_sd35.py +34 -0
- download_sdxl_turbo_fp16.py +30 -0
- lumaforge/ollama_client.py +348 -136
- lumaforge/pipeline.py +465 -133
- test_generation.py +91 -0
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-title: LumaForge-Image Generation Model
+title: LumaForge-Image Generation Model v2.0 (SDXL Turbo)
 emoji: π
 colorFrom: indigo
 colorTo: purple
@@ -10,41 +10,43 @@ license: mit
 language:
 - en
 base_model:
--
+- stabilityai/sdxl-turbo
 library_name: diffusers
 tags:
 - diffusers
--
+- sdxl
+- sdxl-turbo
 - stable-diffusion
 - text-to-image
 - image-to-image
 - image-generation
 - image-editing
-- colorization
-- face-restoration
 - fastapi
 - mps
 ---

-# π LumaForge
+# π LumaForge v2.0 - SDXL Turbo Image Generation

-LumaForge is a powerful image generation model built on
+LumaForge is a powerful image generation model built on **SDXL Turbo**, featuring ultra-fast 4-step generation, superior quality, and advanced image editing capabilities. This repository contains the complete model backend with a FastAPI interface, designed to be deployed directly to **Hugging Face Spaces**.

-###
-Text-to-Image generation with **16 specialized categories**, Image-to-Image styling, advanced image editing (colorization & face restoration), 2x upscaling, background removal, dataset curation, and LoRA fine-tuning.
+### π What's New in v2.0

-
+- **⚡ SDXL Turbo**: Upgraded from SD 1.5 to SDXL Turbo for dramatically better quality
+- **🎯 4-Step Generation**: Ultra-fast 4-6 step generation (vs 30-40 steps in v1.x)
+- **π 3-4x Faster**: 8-15 seconds per image (vs 40-60 seconds)
+- **🎨 Better Quality**: Superior prompt following, better anatomy, higher resolution
+- **✨ Enhanced Prompts**: Optimized prompt engineering for SDXL Turbo

-
-- **
-- **Face Restoration Endpoint**: Enhance facial features with 4 intensity levels (Low, Medium, High, Ultra)
-- **Advanced Prompt Enhancement**: Category-aware prompt expansion for superior generation quality
+### Model Capabilities
+Text-to-Image generation with **16 specialized categories**, Image-to-Image styling, advanced image editing (colorization & face restoration), 2x upscaling, background removal, dataset curation, and fine-tuning support.

 ### π Model Specifications

 | Specification | Details |
 |--------------|---------|
-| **Base Model** |
+| **Base Model** | SDXL Turbo (Stability AI) |
+| **Generation Speed** | 4 steps, 8-15 seconds per image |
+| **Quality** | High-quality, photorealistic results |
 | **Backend** | FastAPI with PyTorch & Diffusers |
 | **Device Support** | Apple Silicon MPS, CPU fallback |
 | **Categories** | 16 specialized categories with 110+ prompt templates |
app.py
CHANGED
@@ -106,7 +106,7 @@ app.add_middleware(
 # Singletons for backend resources
 ollama_client = OllamaClient()
 safety_manager = SafetyManager(ollama_client=ollama_client)
-pipeline = LumaForgePipeline(device="mps")
+pipeline = LumaForgePipeline(device="mps", ollama_client=ollama_client)
 session_manager = SessionManager()

 # Background training tracking
@@ -151,8 +151,8 @@ class GenerateRequest(BaseModel):
     prompt: str
     mode: str = Field(default="general", description="Preset expansion style (general, poster, character)")
     aspect_ratio: str = Field(default="1:1", description="Dimensions (1:1, 16:9, 9:16, 4:3, 3:4)")
-    steps: int = Field(default=
-    guidance_scale: float = Field(default=
+    steps: int = Field(default=28, ge=1, le=100)  # SD 3.5 Medium optimal: 28 steps
+    guidance_scale: float = Field(default=4.5, ge=0.0, le=20.0)  # SD 3.5 Medium optimal: 4.5 guidance
     negative_prompt: str = ""
     seed: int = -1
     mock: bool = Field(default=True, description="Run mock generation pipeline (default True)")
@@ -181,8 +181,8 @@ class Img2ImgRequest(BaseModel):
     image_b64: str
     strength: float = Field(default=0.5, ge=0.0, le=1.0)
     mode: str = Field(default="general", description="Preset expansion style (general, poster, character)")
-    steps: int = Field(default=
-    guidance_scale: float = Field(default=
+    steps: int = Field(default=28, ge=1, le=100)  # SD 3.5 Medium optimal: 28 steps
+    guidance_scale: float = Field(default=4.5, ge=0.0, le=20.0)  # SD 3.5 Medium optimal: 4.5 guidance
     negative_prompt: str = ""
     seed: int = -1
     mock: bool = Field(default=False, description="Run mock generation pipeline")
@@ -211,8 +211,8 @@ class GenerateSessionRequest(BaseModel):
     prompt: str
     mode: str = Field(default="general", description="Preset expansion style (general, poster, character)")
     aspect_ratio: str = Field(default="1:1", description="Dimensions (1:1, 16:9, 9:16, 4:3, 3:4)")
-    steps: int = Field(default=
-    guidance_scale: float = Field(default=
+    steps: int = Field(default=28, ge=1, le=100)  # SD 3.5 Medium optimal: 28 steps
+    guidance_scale: float = Field(default=4.5, ge=0.0, le=20.0)  # SD 3.5 Medium optimal: 4.5 guidance
     negative_prompt: str = ""
     seed: int = -1
     mock: bool = Field(default=False, description="Run mock generation pipeline")
@@ -342,13 +342,12 @@ def api_models_switch(req: ModelSwitchRequest, request: Request):
 @app.post("/api/coherence-check")
 def api_coherence_check(req: CoherenceCheckRequest, request: Request):
     api_limiter.check_limit(request)
-
-
-
-
-
-
-    }
+    print(f"\n[API Coherence Check] Evaluating prompt: \"{req.prompt}\"")
+    result = ollama_client.check_prompt_coherence(req.prompt)
+    print(f" -> Score: {result.get('coherence_score')} ({result.get('coherence_level', '').upper()})")
+    print(f" -> Violations: {result.get('violations')}")
+    print(f" -> Recommendation: \"{result.get('recommendation')}\"")
+    return result

 @app.post("/api/enhance-image")
 def api_enhance_image(req: EnhanceImageRequest, request: Request):
@@ -580,14 +579,14 @@ def api_generate(req: GenerateRequest, request: Request):
     # 4. Save locally for record-keeping and post-safety checks
     os.makedirs("outputs", exist_ok=True)
     out_path = os.path.join("outputs", f"output_{gen_res['seed']}.png")
-    gen_res["image"].save(out_path)
+    gen_res["image"].save(out_path, pnginfo=gen_res.get("pnginfo"))

     # 5. Output Post-generation Screen
     post_res = safety_manager.check_output_safety(out_path, mod_res)

     # 6. Convert image to Base64 to return in JSON payload
     buffered = BytesIO()
-    gen_res["image"].save(buffered, format="PNG")
+    gen_res["image"].save(buffered, format="PNG", pnginfo=gen_res.get("pnginfo"))
     img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
     image_b64 = f"data:image/png;base64,{img_str}"

@@ -663,14 +662,14 @@ def api_generate_img2img(req: Img2ImgRequest, request: Request):
     # 5. Save locally for record-keeping and post-safety checks
     os.makedirs("outputs", exist_ok=True)
     out_path = os.path.join("outputs", f"output_{gen_res['seed']}.png")
-    gen_res["image"].save(out_path)
+    gen_res["image"].save(out_path, pnginfo=gen_res.get("pnginfo"))

     # 6. Output Post-generation Screen
     post_res = safety_manager.check_output_safety(out_path, mod_res)

     # 7. Convert image to Base64 to return in JSON payload
     buffered = BytesIO()
-    gen_res["image"].save(buffered, format="PNG")
+    gen_res["image"].save(buffered, format="PNG", pnginfo=gen_res.get("pnginfo"))
     img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
     image_b64 = f"data:image/png;base64,{img_str}"

@@ -897,8 +896,11 @@ def generate_session_worker(session_id: str, req: GenerateSessionRequest):

     # 2. Prompt Adapter Expansion
     print(f"[Session {session_id}] Expanding prompt in mode '{req.mode}'")
+    print(f"[Session {session_id}] DEBUG - Input to expand_prompt: '{final_prompt}'")
     expanded = ollama_client.expand_prompt(final_prompt, mode=req.mode)
     gen_prompt = expanded.get("full_prompt", final_prompt)
+    print(f"[Session {session_id}] DEBUG - After expand_prompt: '{gen_prompt}'")
+    print(f"[Session {session_id}] DEBUG - gen_prompt length: {len(gen_prompt)} chars")

     # 3. Image Generation
     print(f"[Session {session_id}] Generating image (mock={req.mock}, device={req.device})...")
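Both generation handlers in this diff serialize the image to PNG in memory and return it as a base64 data URL. The following is a minimal sketch of that encoding step and its inverse, using only the standard library. The helper names `to_data_url` and `from_data_url` are illustrative and do not appear in `app.py`, and placeholder bytes stand in for the real image so the example runs without Pillow.

```python
import base64
from io import BytesIO

# First 8 bytes of every PNG file; used here only as placeholder image data.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def to_data_url(png_bytes: bytes) -> str:
    """Encode PNG bytes as a data URL, mirroring the handlers' final step."""
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return f"data:image/png;base64,{encoded}"


def from_data_url(data_url: str) -> bytes:
    """Decode a base64 PNG data URL back into raw bytes (client-side inverse)."""
    prefix = "data:image/png;base64,"
    if not data_url.startswith(prefix):
        raise ValueError("expected a base64 PNG data URL")
    return base64.b64decode(data_url[len(prefix):])


# Stand-in for gen_res["image"].save(buffered, format="PNG").
buffered = BytesIO()
buffered.write(PNG_SIGNATURE)
image_b64 = to_data_url(buffered.getvalue())
print(image_b64)
```

A client receiving `image_b64` in the JSON payload could call something like `from_data_url` to recover the PNG bytes, including any metadata chunks written through `pnginfo`.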
download_sd21.py
ADDED
@@ -0,0 +1,34 @@
+#!/usr/bin/env python3
+"""Download Realistic Vision V2 for excellent photorealistic results on Apple MPS"""
+from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
+import torch
+
+print("π Downloading Realistic Vision V2.0...")
+print("📦 Size: ~4GB")
+print("✅ Excellent photorealistic quality!")
+print("🎨 Works perfectly on Apple MPS")
+print("")
+
+model_id = "SG161222/Realistic_Vision_V2.0"
+
+print("⬇️ Downloading Realistic Vision V2...")
+pipe = StableDiffusionPipeline.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    cache_dir="~/.cache/huggingface/hub",
+    safety_checker=None
+)
+
+# Configure scheduler
+pipe.scheduler = DPMSolverMultistepScheduler.from_config(
+    pipe.scheduler.config
+)
+
+print("")
+print("✅ Realistic Vision V2 downloaded successfully!")
+print("💾 Cached at: ~/.cache/huggingface/hub/")
+print("")
+print("🎯 Next steps:")
+print(" 1. Restart backend: cd model && python3 app.py")
+print(" 2. Test at: http://localhost:3000")
+print(" 3. Expected: Photorealistic quality, 20-25 seconds, NO black images!")
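`download_sd21.py` passes the literal string `"~/.cache/huggingface/hub"` as `cache_dir`, whereas `download_sd35.py` and `download_sdxl_turbo_fp16.py` wrap the same path in `os.path.expanduser`. Python does not expand `~` on its own, so the unexpanded form may end up as a directory literally named `~` under the working directory. That is an assumption about how the library treats the path, not something verified here. A minimal sketch of the difference, where `resolve_cache_dir` is a hypothetical helper:

```python
import os


def resolve_cache_dir(path: str) -> str:
    """Expand a leading '~' (if any) and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


literal = "~/.cache/huggingface/hub"
print(literal)                     # the tilde is just a character in this string
print(resolve_cache_dir(literal))  # absolute path after expansion
```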
download_sd35.py
ADDED
@@ -0,0 +1,34 @@
+#!/usr/bin/env python3
+"""Download Stable Diffusion 3.5 Medium for high-quality inference"""
+from diffusers import StableDiffusion3Pipeline
+import torch
+import os
+
+print("π Downloading Stable Diffusion 3.5 Medium...")
+print("📦 Size: ~5-6GB")
+print("🎨 Latest Stability AI model with excellent quality!")
+print("")
+
+model_id = "stabilityai/stable-diffusion-3.5-medium"
+token = os.getenv("HF_TOKEN")
+
+# Expand cache dir properly
+cache_dir = os.path.expanduser("~/.cache/huggingface/hub")
+
+print("⬇️ Downloading SD 3.5 Medium with authentication...")
+pipe = StableDiffusion3Pipeline.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    cache_dir=cache_dir,
+    token=token,
+    resume_download=True
+)
+
+print("")
+print("✅ SD 3.5 Medium downloaded successfully!")
+print(f"💾 Cached at: {cache_dir}")
+print("")
+print("🎯 Next steps:")
+print(" 1. Restart backend: cd model && python3 app.py")
+print(" 2. Test at: http://localhost:3000")
+print(" 3. Expected: Best quality, 25-35 seconds!")
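The script reads `HF_TOKEN` with `os.getenv` and passes the result straight to `from_pretrained`, so an unset variable becomes `token=None` and any authorization problem would surface only once the download starts. A minimal sketch of checking the variable up front is shown below. `read_hf_token` is a hypothetical helper that is not part of the commit, and whether the repository actually requires a token is assumed rather than confirmed here.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return HF_TOKEN with surrounding whitespace stripped, or None if unset or empty."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


token = read_hf_token()
if token is None:
    # Warn early instead of failing partway through a multi-gigabyte download.
    print("HF_TOKEN is not set; an authenticated download would likely be rejected.")
```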
download_sdxl_turbo_fp16.py
ADDED
@@ -0,0 +1,30 @@
+#!/usr/bin/env python3
+"""Download SDXL Turbo fp16 variant (7GB) for faster performance"""
+from diffusers import AutoPipelineForText2Image
+import torch
+import os
+
+print("π Downloading SDXL Turbo fp16 variant...")
+print("📦 Size: ~7GB (much faster than float32)")
+print("")
+
+model_id = "stabilityai/sdxl-turbo"
+cache_dir = os.path.expanduser("~/.cache/huggingface/hub")
+
+print("⬇️ Downloading fp16 variant...")
+pipe = AutoPipelineForText2Image.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    variant="fp16",
+    cache_dir=cache_dir,
+    resume_download=True  # Resume if interrupted
+)
+
+print("")
+print("✅ SDXL Turbo fp16 downloaded successfully!")
+print("💾 Cached at: ~/.cache/huggingface/hub/")
+print("")
+print("🎯 Next steps:")
+print(" 1. Restart backend: cd model && python3 app.py")
+print(" 2. Test at: http://localhost:3000")
+print(" 3. Expected: Fast inference, NO black images!")
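The "~7GB" figure printed by the script is in line with half-precision storage, which uses 2 bytes per parameter against 4 bytes for float32. The arithmetic can be sketched as follows. The parameter count is an illustrative assumption chosen to match the script's message, not a number taken from the commit or the model card.

```python
def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate on-disk size of model weights in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


ASSUMED_PARAMS = 3.5e9  # illustrative assumption, not from the source

fp16_gb = weights_size_gb(ASSUMED_PARAMS, 2)  # float16: 2 bytes per parameter
fp32_gb = weights_size_gb(ASSUMED_PARAMS, 4)  # float32: 4 bytes per parameter
print(f"fp16: {fp16_gb:.1f} GB, fp32: {fp32_gb:.1f} GB")
```

Under this assumption the fp16 variant halves both the download size and the memory needed to hold the weights. Any speed difference depends on the device and is not modeled here.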
lumaforge/ollama_client.py
CHANGED
@@ -105,160 +105,372 @@ class OllamaClient:
             # Basic offline rewrite logic
             return prompt.replace("blood", "red paint").replace("gore", "intensity").replace("kill", "defeat")

-

     def expand_prompt(self, prompt: str, mode: str = "general", category: str = None, subcategory: str = None) -> dict:
         """
-        Expands
-        Optionally integrates category-specific enhancements.
         """
-
-
-
-
-
-            "-
-            "-
-            "
-            "
-            "
-            "
-            "
-            "-
-
-            "
-            "
-            "
-            "
-            "
-            "
-            "
-            "
             "{\n"
-            ' "
-            ' "
-            ' "
-            ' "
-            ' "
-            ' "camera": "...",\n'
-            ' "mood": "...",\n'
-            ' "quality_emphasis": "...",\n'
-            ' "safety_constraints": "..."\n'
             "}"
         )

         data = {
             "model": self.model,
-            "prompt": f"{
             "stream": False,
             "format": "json"
         }

         res = self._call_api("/api/generate", data)
-
-        fallback_fields = {
-            "subject": prompt,
-            "action": "standing",
-            "environment": "simple background",
-            "style": "cinematic movie poster" if mode == "poster" else "digital art character portrait",
-            "lighting": "dramatic cinematic lighting",
-            "camera": "centered hero shot",
-            "mood": "heroic",
-            "quality_emphasis": "high detail, polished finish",
-            "safety_constraints": "artistic representation"
-        }
-
         if not res:
-
-
-            try:
-                expanded = json.loads(res.get("response", "").strip())
-            except Exception:
-                expanded = fallback_fields

-
-
-
-

-
         import re
-
-
-
-
-
-
-
-
-

-

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-            expanded["subject"] = f"{expanded['subject']}, {color_kw}"
-
-        # 4. Text/Title preservation (extract any quoted title and reinforce typography instructions)
-        quoted_titles = re.findall(r'["\']([^"\']+)["\']', prompt)
-        if quoted_titles:
-            for title in quoted_titles:
-                title_kw = f'bold typography movie title text "{title}", centered poster title layout, clean lettering'
-                if title.lower() not in expanded["subject"].lower() and title.lower() not in expanded["style"].lower():
-                    expanded["subject"] = f'{expanded["subject"]}, featuring the {title_kw}'
-
-        # 5. Category-specific enhancements
-        if category and subcategory:
-            try:
-                from lumaforge.category_prompts import get_category_prompts
-                category_prompt = get_category_prompts(category, subcategory)
-                if category_prompt:
-                    expanded["style"] = f"{expanded['style']}, {category_prompt}"
-            except Exception as e:
-                print(f"[OllamaClient Warning] Failed to apply category enhancement: {e}")
-
-        # Consolidate into full prompt
-        parts = [
-            expanded.get("subject", ""),
-            expanded.get("action", ""),
-            expanded.get("environment", ""),
-            expanded.get("style", ""),
-            expanded.get("lighting", ""),
-            expanded.get("camera", ""),
-            expanded.get("mood", ""),
-            expanded.get("quality_emphasis", "")
-        ]
-        expanded["full_prompt"] = ", ".join([str(p) for p in parts if p])
-        return expanded
             # Basic offline rewrite logic
             return prompt.replace("blood", "red paint").replace("gore", "intensity").replace("kill", "defeat")

+        rewritten = res.get("response", "").strip().strip('"').strip("'")
+
+        # Check if the rewritten response is an LLM refusal (false positive safety trigger)
+        low_rewritten = rewritten.lower()
+        refusal_markers = [
+            "sorry", "fulfill", "request", "cannot", "can't", "guidelines",
+            "policy", "inappropriate", "unable to", "restrict", "violation"
+        ]
+
+        if not rewritten or any(marker in low_rewritten for marker in refusal_markers):
+            print(f"[OllamaClient Warning] Rewrite failed/refused (returned: '{rewritten}'). Using heuristic fallback.")
+            clean_prompt = prompt
+            replacements = {
+                "blood": "red paint",
+                "gore": "intensity",
+                "kill": "defeat",
+                "dead": "fallen",
+                "murder": "defeat",
+                "suicide": "sacrifice",
+                "naked": "dressed",
+                "nude": "dressed",
+                "porn": "fine art",
+                "terrorist": "warrior",
+                "bomb": "crystal energy"
+            }
+            for word, rep in replacements.items():
+                import re
+                clean_prompt = re.sub(re.escape(word), rep, clean_prompt, flags=re.IGNORECASE)
+            return clean_prompt
+
+        return rewritten

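The heuristic fallback above substitutes flagged words with `re.sub(re.escape(word), ...)`, which also matches inside longer words (for example, `kill` inside `skilled`). If whole-word replacement is what is wanted, one possible variant adds `\b` word boundaries. This is a sketch under that assumption: `soften_prompt` and the shortened `REPLACEMENTS` table are illustrative names, not code from the commit.

```python
import re

# Shortened copy of the replacement table from the diff, for illustration only.
REPLACEMENTS = {
    "blood": "red paint",
    "gore": "intensity",
    "kill": "defeat",
}


def soften_prompt(prompt: str, replacements: dict = REPLACEMENTS) -> str:
    """Replace flagged words only where they occur as whole words (case-insensitive)."""
    cleaned = prompt
    for word, substitute in replacements.items():
        # \b anchors the match to word boundaries so substrings are left alone.
        pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
        cleaned = pattern.sub(substitute, cleaned)
    return cleaned


print(soften_prompt("A skilled knight, no Blood on the blade"))
```

With word boundaries, `skilled` is left untouched while `Blood` is still replaced regardless of case.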
| 140 |
def expand_prompt(self, prompt: str, mode: str = "general", category: str = None, subcategory: str = None) -> dict:
|
| 141 |
"""
|
| 142 |
+
Expands the user prompt using predefined style presets and category descriptors.
|
|
|
|
| 143 |
"""
|
| 144 |
+
import re
|
| 145 |
+
|
| 146 |
+
scene_desc = prompt.strip()
|
| 147 |
+
|
| 148 |
+
mode_prompts = {
|
| 149 |
+
"art": "digital concept art, highly detailed, fantasy sci-fi surreal elements, matte painting style, vivid colors, masterfully rendered",
|
| 150 |
+
"character": "detailed character design, face close-up, full body view, character portrait, high resolution features, realistic proportions",
|
| 151 |
+
"landscape": "scenic landscape, natural scenery, epic vistas, 8k resolution, volumetric atmosphere, detailed clouds, beautiful natural lighting",
|
| 152 |
+
"architecture": "architectural photography, modern building exterior, luxury high-end interior, raytraced reflection, sharp lines, cinematic design",
|
| 153 |
+
"vehicle": "sleek sports car automotive photography, dynamic reflections, glossy metallic paint, dramatic lighting, sharp focus on chassis",
|
| 154 |
+
"product": "studio product mockup design, professional commercial advertising, clean product lighting, soft white backdrop, elegant minimalist packaging",
|
| 155 |
+
"marketing": "marketing poster design, commercial branding graphics, bold colors, professional graphic design layout, vector advertising poster",
|
| 156 |
+
"food": "appetizing gourmet food plating photography, close-up delicious shot, professional food styling, organic fresh ingredients, warm lighting, blurred background",
|
| 157 |
+
"fashion": "high fashion lookbook editorial photography, designer clothing, haute couture runway style, model posing, dramatic studio lighting",
|
| 158 |
+
"game": "fantasy game asset, detailed icon, weapon sprite, interface vector, dark clean background, isolated graphic, item artifact",
|
| 159 |
+
"animal": "national geographic wildlife photography, sharp animal portrait, detailed fur textures, macro focus on eyes, natural habitat background",
|
| 160 |
+
"event": "elegant festival poster design, celebration event invitation artwork, bright colors, greeting card design",
|
| 161 |
+
"business": "flat vector illustration, corporate infographic chart style, clean business graphics, presentation design elements, modern company colors",
|
| 162 |
+
"education": "clean scientific textbook illustration, medical biology schema diagram, detailed educational graphics, clear pointers and arrows",
|
| 163 |
+
"style_anime": "vibrant anime key visual style, highly detailed digital illustration, cel shaded, anime sketch, masterfully drawn",
|
| 164 |
+
"style_sketch": "hand-drawn pencil sketch, fine graphite line shading, cross-hatching detail, white textured paper background",
|
| 165 |
+
"style_oil": "oil on canvas art masterpiece, thick textured impasto brushstrokes, realistic paint texture, museum lighting",
|
| 166 |
+
"style_pixel": "retro pixel art, 8-bit game console graphics, 16-bit arcade sprite aesthetic, pixelated texture, vintage gaming",
|
| 167 |
+
"style_watercolor": "watercolor wash painting, delicate soft splatters, bleeding pastel pigment textures, hand-painted textured paper artwork"
|
| 168 |
+
}
|
| 169 |
+
|
| 170 |
+
if mode == "poster":
|
| 171 |
+
quoted_titles = re.findall(r'["\']([^"\']+)["\']', prompt)
|
| 172 |
+
if quoted_titles:
|
| 173 |
+
title = quoted_titles[0]
|
| 174 |
+
scene_desc = f'{prompt.strip()}, movie poster "{title}" with bold typography'
|
| 175 |
+
else:
|
| 176 |
+
scene_desc = f"{prompt.strip()}, cinematic movie poster layout"
|
| 177 |
+
elif mode in mode_prompts:
|
| 178 |
+
scene_desc = f"{prompt.strip()}, {mode_prompts[mode]}"
|
| 179 |
+
|
| 180 |
+
# Prevent fusion artifacts by detailing vague 'holding' actions
|
| 181 |
+
holding_pattern = re.compile(r'\b(holding|carrying|wielding|holding up|armed with)\b\s+(a|an|the)?\s*', re.IGNORECASE)
|
| 182 |
+
holding_match = holding_pattern.search(scene_desc)
|
| 183 |
+
if holding_match:
|
| 184 |
+
if not any(kw in scene_desc.lower() for kw in ["hand", "grip", "hilt", "stance", "pose", "clutching", "brandishing", "raised", "wielding with"]):
|
| 185 |
+
# Extract the noun phrase up to the next comma or end of string
|
| 186 |
+
start_idx = holding_match.end()
|
| 187 |
+
rest = scene_desc[start_idx:]
|
| 188 |
+
comma_idx = rest.find(',')
|
| 189 |
+
if comma_idx != -1:
|
| 190 |
+
noun_phrase = rest[:comma_idx].strip()
|
| 191 |
+
after_noun = rest[comma_idx:]
|
| 192 |
+
else:
|
| 193 |
+
noun_phrase = rest.strip()
|
| 194 |
+
after_noun = ""
|
| 195 |
+
|
| 196 |
+
# Build a detailed holding phrase
|
| 197 |
+
# Determine appropriate grip description based on standard nouns
|
| 198 |
+
if any(w in noun_phrase.lower() for w in ["sword", "weapon", "blade", "dagger", "saber", "axe", "staff", "shield", "spear", "lance", "gun", "pistol", "rifle"]):
|
| 199 |
+
detailed_hold = f"gripping the hilt and handle of the {noun_phrase} firmly in one hand, posing in a natural heroic stance"
|
| 200 |
+
else:
|
| 201 |
+
detailed_hold = f"holding the {noun_phrase} firmly in their hand, posing naturally"
|
| 202 |
+
|
| 203 |
+
scene_desc = scene_desc[:holding_match.start()] + detailed_hold + after_noun
|
| 204 |
+
|
| 205 |
+
# Build response dict
|
| 206 |
+
expanded = {
|
| 207 |
+
"subject": scene_desc,
|
| 208 |
+
"action": "",
|
| 209 |
+
"environment": "",
|
| 210 |
+
"style": mode_prompts.get(mode, ""),
|
| 211 |
+
"lighting": "",
|
| 212 |
+
"camera": "",
|
| 213 |
+
"mood": "",
|
| 214 |
+
"quality_emphasis": "8k resolution, masterfully rendered",
|
| 215 |
+
"safety_constraints": "safe for work",
|
| 216 |
+
"full_prompt": scene_desc
|
| 217 |
+
}
|
| 218 |
+
|
| 219 |
+
return expanded
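The "holding" rewrite above can be sketched as a standalone function, which makes the behavior easier to test in isolation. This is a minimal sketch, not the project's code: `detail_holding_phrase` and the module-level constant names are hypothetical, and the pattern assumes a word boundary after the optional article (and lists `holding up` first) so that inputs such as "holding an apple" keep the whole noun.

```python
import re

# Holding verb plus an optional article. The \b after the article is an
# assumption of this sketch; it stops "a" from matching the start of "an"/"apple".
HOLDING_PATTERN = re.compile(
    r'\b(holding up|holding|carrying|wielding|armed with)\b\s+(?:(a|an|the)\b)?\s*',
    re.IGNORECASE,
)
POSE_KEYWORDS = ["hand", "grip", "hilt", "stance", "pose", "clutching",
                 "brandishing", "raised", "wielding with"]
WEAPON_WORDS = ["sword", "weapon", "blade", "dagger", "saber", "axe", "staff",
                "shield", "spear", "lance", "gun", "pistol", "rifle"]


def detail_holding_phrase(scene_desc: str) -> str:
    """Hypothetical helper: expand a vague 'holding X' into an explicit grip."""
    match = HOLDING_PATTERN.search(scene_desc)
    if not match:
        return scene_desc
    # Leave the prompt alone if it already describes a pose or grip.
    if any(kw in scene_desc.lower() for kw in POSE_KEYWORDS):
        return scene_desc

    # The noun phrase runs up to the next comma, or to the end of the string.
    rest = scene_desc[match.end():]
    comma_idx = rest.find(',')
    if comma_idx != -1:
        noun_phrase, after_noun = rest[:comma_idx].strip(), rest[comma_idx:]
    else:
        noun_phrase, after_noun = rest.strip(), ""

    if any(w in noun_phrase.lower() for w in WEAPON_WORDS):
        detailed = (f"gripping the hilt and handle of the {noun_phrase} firmly "
                    "in one hand, posing in a natural heroic stance")
    else:
        detailed = f"holding the {noun_phrase} firmly in their hand, posing naturally"
    return scene_desc[:match.start()] + detailed + after_noun
```

For example, `detail_holding_phrase("a knight holding a sword, misty forest")` keeps the text before the verb and after the comma, and replaces only the verb phrase.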
|
| 220 |
+
|
| 221 |
+
def optimize_prompt_for_sd35(self, prompt: str, max_tokens: int = 256) -> dict:
|
| 222 |
+
"""
|
| 223 |
+
Uses Ollama iteratively to compress a prompt to fit SD 3.5 Medium's T5 token limit (256 tokens).
|
| 224 |
+
Keeps trying with stricter instructions until successful.
|
| 225 |
+
"""
|
| 226 |
+
# Estimate current tokens (rough: 1 token ≈ 1.3 chars)
|
| 227 |
+
estimated_tokens = len(prompt) / 1.3
|
| 228 |
+
|
| 229 |
+
if estimated_tokens <= max_tokens:
|
| 230 |
+
# Already under limit, return as-is
|
| 231 |
+
return {
|
| 232 |
+
"optimized_prompt": prompt,
|
| 233 |
+
"original_tokens": int(estimated_tokens),
|
| 234 |
+
"final_tokens": int(estimated_tokens),
|
| 235 |
+
"was_compressed": False
|
| 236 |
+
}
|
| 237 |
+
|
| 238 |
+
max_chars = int(max_tokens * 1.3)  # 256 tokens ≈ 332 chars
|
| 239 |
+
optimized = prompt
|
| 240 |
+
attempt = 0
|
| 241 |
+
max_attempts = 3
|
| 242 |
+
|
| 243 |
+
# Try iteratively with increasingly strict instructions
|
| 244 |
+
while attempt < max_attempts:
|
| 245 |
+
attempt += 1
|
| 246 |
+
|
| 247 |
+
if attempt == 1:
|
| 248 |
+
# First attempt: Gentle compression
|
| 249 |
+
instruction = (
|
| 250 |
+
f"Compress this image prompt to MAXIMUM {max_chars} characters.\n"
|
| 251 |
+
f"Keep main subject, key details, lighting, style. Remove filler words.\n"
|
| 252 |
+
f"Use commas between concepts. Output ONLY the compressed prompt."
|
| 253 |
+
)
|
| 254 |
+
elif attempt == 2:
|
| 255 |
+
# Second attempt: More aggressive
|
| 256 |
+
instruction = (
|
| 257 |
+
f"URGENT: Compress to EXACTLY {max_chars} characters or LESS.\n"
|
| 258 |
+
f"Remove ALL: 'a', 'an', 'the', 'with', 'on', 'at', 'in', 'of'.\n"
|
| 259 |
+
f"Keep: subject, visuals, style. Use commas. NO extra words."
|
| 260 |
+
)
|
| 261 |
+
else:
|
| 262 |
+
# Final attempt: Maximum compression
|
| 263 |
+
instruction = (
|
| 264 |
+
f"CRITICAL: Must be {max_chars} chars MAX. Current too long.\n"
|
| 265 |
+
f"Only keep: main subject, 2-3 key adjectives, style, lighting.\n"
|
| 266 |
+
f"Format: 'subject, detail, detail, style, lighting' - nothing more."
|
| 267 |
+
)
|
| 268 |
+
|
| 269 |
+
data = {
|
| 270 |
+
"model": self.model,
|
| 271 |
+
"prompt": f"{instruction}\n\nInput ({len(optimized)} chars): \"{optimized}\"\n\nOutput:",
|
| 272 |
+
"stream": False
|
| 273 |
+
}
|
| 274 |
+
|
| 275 |
+
res = self._call_api("/api/generate", data)
|
| 276 |
+
if not res:
|
| 277 |
+
print(f"[OllamaClient] Ollama unavailable, using heuristic fallback")
|
| 278 |
+
return self._heuristic_compress_prompt(prompt, max_tokens)
|
| 279 |
+
|
| 280 |
+
new_optimized = res.get("response", "").strip().strip('"').strip("'")
|
| 281 |
+
|
| 282 |
+
# Validate compression
|
| 283 |
+
if not new_optimized or len(new_optimized) >= len(optimized):
|
| 284 |
+
print(f"[OllamaClient] Attempt {attempt}: Ollama didn't compress, retrying...")
|
| 285 |
+
continue
|
| 286 |
+
|
| 287 |
+
optimized = new_optimized
|
| 288 |
+
final_tokens = len(optimized) / 1.3
|
| 289 |
+
|
| 290 |
+
# Success! Check if under limit
|
| 291 |
+
if final_tokens <= max_tokens and len(optimized) <= max_chars:
|
| 292 |
+
print(f"[OllamaClient] ✅ Compressed successfully in {attempt} attempt(s): {int(estimated_tokens)} → {int(final_tokens)} tokens")
|
| 293 |
+
return {
|
| 294 |
+
"optimized_prompt": optimized,
|
| 295 |
+
"original_tokens": int(estimated_tokens),
|
| 296 |
+
"final_tokens": int(final_tokens),
|
| 297 |
+
"was_compressed": True
|
| 298 |
+
}
|
| 299 |
+
else:
|
| 300 |
+
print(f"[OllamaClient] Attempt {attempt}: {int(final_tokens)} tokens, still too long, retrying...")
|
| 301 |
+
|
| 302 |
+
# After max attempts, use heuristic as last resort
|
| 303 |
+
print(f"[OllamaClient] ⚠️ Failed after {max_attempts} attempts, using heuristic fallback")
|
| 304 |
+
return self._heuristic_compress_prompt(prompt, max_tokens)
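The retry loop above (stricter instruction on each attempt, reject outputs that did not shrink, fall back when the backend is unavailable) can be sketched without a running Ollama server. This is a sketch under assumptions: `compress_with_retries` and the injected `compress_fn` callable are hypothetical stand-ins for the client method and its API call, the instruction strings are shortened paraphrases, and the 1.3 ratio is simply the diff's rough estimate.

```python
from typing import Callable, Optional

CHARS_PER_TOKEN = 1.3  # rough ratio taken from the diff, not a tokenizer measurement


def compress_with_retries(
    prompt: str,
    compress_fn: Callable[[str, str], Optional[str]],
    max_tokens: int = 256,
    max_attempts: int = 3,
) -> dict:
    """Hypothetical helper: compress_fn(instruction, text) stands in for the Ollama call."""
    if len(prompt) / CHARS_PER_TOKEN <= max_tokens:
        return {"optimized_prompt": prompt, "was_compressed": False}

    max_chars = int(max_tokens * CHARS_PER_TOKEN)
    # Instructions get stricter with each attempt.
    instructions = [
        f"Compress this image prompt to at most {max_chars} characters.",
        f"Compress to {max_chars} characters or fewer; drop articles and prepositions.",
        f"Keep only subject, 2-3 adjectives, style, lighting; {max_chars} characters max.",
    ]
    current = prompt
    for attempt in range(max_attempts):
        instruction = instructions[min(attempt, len(instructions) - 1)]
        candidate = compress_fn(instruction, current)
        if candidate is None:
            break  # backend unavailable: stop retrying
        candidate = candidate.strip().strip('"').strip("'")
        if not candidate or len(candidate) >= len(current):
            continue  # no progress, try the next (stricter) instruction
        current = candidate
        if len(current) <= max_chars:
            return {"optimized_prompt": current, "was_compressed": True}

    # Last resort: cut the original prompt at the last comma inside the budget.
    fallback = prompt[:max_chars].rsplit(",", 1)[0].strip()
    return {"optimized_prompt": fallback, "was_compressed": True}
```

Passing the backend as a callable keeps the loop testable: a lambda that halves its input exercises the multi-attempt path, and one that returns `None` exercises the fallback.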
|
| 305 |
+
|
| 306 |
+
def _heuristic_compress_prompt(self, prompt: str, max_tokens: int = 256) -> dict:
|
| 307 |
+
"""Aggressive fallback compression when Ollama is offline or doesn't compress enough."""
|
| 308 |
+
import re
|
| 309 |
+
|
| 310 |
+
estimated_original = len(prompt) / 1.3
|
| 311 |
+
max_chars = int(max_tokens * 1.3)  # 256 tokens ≈ 332 chars
|
| 312 |
+
|
| 313 |
+
# Step 1: Split into words and remove filler words aggressively
|
| 314 |
+
fillers = {'a', 'an', 'the', 'with', 'in', 'at', 'on', 'of', 'and', 'or', 'but',
|
| 315 |
+
'very', 'extremely', 'really', 'quite', 'some', 'this', 'that',
|
| 316 |
+
'is', 'are', 'was', 'were', 'being', 'been', 'be', 'has', 'have'}
|
| 317 |
+
|
| 318 |
+
words = prompt.replace(',', ' ').split()
|
| 319 |
+
essential_words = [w.strip('.,;:!?') for w in words if w.lower() not in fillers]
|
| 320 |
+
|
| 321 |
+
# Step 2: Join with commas (more token-efficient than spaces for SD)
|
| 322 |
+
compressed = ', '.join(essential_words)
|
| 323 |
+
|
| 324 |
+
# Step 3: If still too long, truncate intelligently at word boundaries
|
| 325 |
+
if len(compressed) > max_chars:
|
| 326 |
+
compressed = compressed[:max_chars]
|
| 327 |
+
# Cut at last comma for clean break
|
| 328 |
+
if ',' in compressed:
|
| 329 |
+
compressed = compressed.rsplit(',', 1)[0].strip()
|
| 330 |
+
else:
|
| 331 |
+
compressed = compressed.rsplit(' ', 1)[0].strip()
|
| 332 |
+
|
| 333 |
+
# Step 4: Final safety check - if STILL too long, hard truncate
|
| 334 |
+
if len(compressed) > max_chars:
|
| 335 |
+
compressed = compressed[:max_chars-3].strip() + '...'
|
| 336 |
+
|
| 337 |
+
estimated_final = len(compressed) / 1.3
|
| 338 |
+
|
| 339 |
+
print(f"[OllamaClient] Heuristic compression: {len(prompt)} → {len(compressed)} chars ({int(estimated_original)} → {int(estimated_final)} tokens)")
|
| 340 |
+
|
| 341 |
+
return {
|
| 342 |
+
"optimized_prompt": compressed,
|
| 343 |
+
"original_tokens": int(estimated_original),
|
| 344 |
+
"final_tokens": int(estimated_final),
|
| 345 |
+
"was_compressed": True
|
| 346 |
+
}
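The offline fallback above reduces to three steps: drop filler words, join the rest with commas, and cut at the last comma inside the character budget. A minimal sketch follows, assuming the same filler list and the diff's rough 1.3 characters-per-token ratio; `heuristic_compress` is a hypothetical free-function name, and unlike the method it returns only the string.

```python
FILLERS = {
    'a', 'an', 'the', 'with', 'in', 'at', 'on', 'of', 'and', 'or', 'but',
    'very', 'extremely', 'really', 'quite', 'some', 'this', 'that',
    'is', 'are', 'was', 'were', 'being', 'been', 'be', 'has', 'have',
}


def heuristic_compress(prompt: str, max_tokens: int = 256,
                       chars_per_token: float = 1.3) -> str:
    """Hypothetical helper: filler-word removal plus truncation at a comma."""
    max_chars = int(max_tokens * chars_per_token)

    # Strip punctuation first so tokens like "the." are still recognized as fillers.
    words = (w.strip('.,;:!?') for w in prompt.replace(',', ' ').split())
    kept = [w for w in words if w and w.lower() not in FILLERS]
    compressed = ', '.join(kept)

    # Over budget: slice to the limit, then back up to the last full item.
    if len(compressed) > max_chars:
        compressed = compressed[:max_chars].rsplit(',', 1)[0].strip()
    return compressed
```

One trade-off worth noting: joining every word with a comma splits multi-word phrases such as "golden hour" into separate items, so this is best treated as a last resort rather than a drop-in replacement for the model-based compression.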
|
| 347 |
+
|
| 348 |
+
def check_prompt_coherence(self, prompt: str) -> dict:
|
| 349 |
+
"""
|
| 350 |
+
Analyzes a prompt to ensure it obeys logical, physical, and scientific consistency.
|
| 351 |
+
Returns a dictionary with coherence_score, level, violations, and recommendation.
|
| 352 |
+
"""
|
| 353 |
+
system_instruction = (
|
| 354 |
+
"You are a physics, logic, and spatial consistency checker for AI image generation prompts.\n"
|
| 355 |
+
"Identify clear physical contradictions, scientific impossibilities, logic errors, or vague spatial/anatomical interactions (e.g. underwater fire, sunset at midnight, or 'holding/carrying' an object without describing the pose/grip/hands, which leads to body-object fusion glitches in diffusion models).\n"
|
| 356 |
+
"If the prompt describes a physically possible scene with clear spatial and anatomy relationships, it is completely coherent (score 1.0, no violations).\n"
|
| 357 |
+
"If the prompt has vague object interactions (e.g., 'holding a sword'), flag it as a violation/hazard and provide a recommendation to specify how they are holding/gripping it.\n"
|
| 358 |
+
"Format your output ONLY as a JSON object with this exact structure:\n"
|
| 359 |
"{\n"
|
| 360 |
+
' "coherence_score": 1.0 (if coherent) or 0.0 to 0.7 (if violations/hazards found),\n'
|
| 361 |
+
' "coherence_level": "high" (if score >= 0.8) or "medium" or "low",\n'
|
| 362 |
+
' "violations": ["list of issues/hazards found, or empty array if none"],\n'
|
| 363 |
+
' "recommendation": "rewritten prompt that enforces proper physics, structural logic, and specific posing, or empty string if already coherent and detailed",\n'
|
| 364 |
+
' "enhancement_needed": true | false\n'
|
| 365 |
"}"
|
| 366 |
)
|
| 367 |
|
| 368 |
data = {
|
| 369 |
"model": self.model,
|
| 370 |
+
"prompt": f"{system_instruction}\n\nPrompt to evaluate: \"{prompt}\"\n\nJSON output:",
|
| 371 |
"stream": False,
|
| 372 |
"format": "json"
|
| 373 |
}
|
| 374 |
|
| 375 |
res = self._call_api("/api/generate", data)
|
| 376 |
if not res:
|
| 377 |
+
# Fallback heuristic if Ollama is offline
|
| 378 |
+
return self._heuristic_check_coherence(prompt)
|
| 379 |
|
| 380 |
+
try:
|
| 381 |
+
content = res.get("response", "").strip()
|
| 382 |
+
result = json.loads(content)
|
| 383 |
+
# Ensure all required keys exist
|
| 384 |
+
if "coherence_score" not in result:
|
| 385 |
+
result["coherence_score"] = 0.85
|
| 386 |
+
if "coherence_level" not in result:
|
| 387 |
+
result["coherence_level"] = "high" if result["coherence_score"] > 0.8 else "medium"
|
| 388 |
+
if "violations" not in result:
|
| 389 |
+
result["violations"] = []
|
| 390 |
+
if "recommendation" not in result:
|
| 391 |
+
result["recommendation"] = ""
|
| 392 |
+
if "enhancement_needed" not in result:
|
| 393 |
+
result["enhancement_needed"] = len(result["violations"]) > 0
|
| 394 |
+
return result
|
| 395 |
+
except Exception:
|
| 396 |
+
return self._heuristic_check_coherence(prompt)
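The key-by-key defaulting above can be written more compactly with `dict.setdefault`. The sketch below assumes the model's reply arrives as a JSON string; `normalize_coherence_result` is a hypothetical name, and the default values are the ones used in the diff.

```python
import json


def normalize_coherence_result(raw: str) -> dict:
    """Hypothetical helper: parse the model's JSON and fill in any missing keys."""
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")

    # Order matters: later defaults are derived from earlier keys.
    result.setdefault("coherence_score", 0.85)
    result.setdefault(
        "coherence_level",
        "high" if result["coherence_score"] > 0.8 else "medium",
    )
    result.setdefault("violations", [])
    result.setdefault("recommendation", "")
    result.setdefault("enhancement_needed", len(result["violations"]) > 0)
    return result
```

A caller would wrap this in `try/except` and switch to the heuristic checker on `json.JSONDecodeError` or `ValueError`, which mirrors the broad `except Exception` fallback in the method above.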
|
| 397 |
|
| 398 |
+
def _heuristic_check_coherence(self, prompt: str) -> dict:
|
| 399 |
+
"""Heuristic check when Ollama is offline."""
|
| 400 |
+
violations = []
|
| 401 |
+
p_lower = prompt.lower()
|
| 402 |
+
|
| 403 |
+
# Check for lighting contradiction
|
| 404 |
+
if "sunset" in p_lower and "noon" in p_lower:
|
| 405 |
+
violations.append("Contradictory time of day: contains both 'sunset' and 'noon'.")
|
| 406 |
+
if "neon light" in p_lower and "dark cave" in p_lower and not ("glowing" in p_lower or "illuminating" in p_lower):
|
| 407 |
+
violations.append("Ambient lighting conflict: neon light in a dark cave needs explicit light emission description.")
|
| 408 |
+
|
| 409 |
+
# Check for anatomy / physics contradiction
|
| 410 |
+
if "floating" in p_lower and not any(kw in p_lower for kw in ["space", "zero gravity", "fantasy", "magic", "levitating", "flying"]):
|
| 411 |
+
violations.append("Gravity violation: objects are 'floating' without space/fantasy context.")
|
| 412 |
+
if "symmetrical asymmetry" in p_lower:
|
| 413 |
+
violations.append("Semantic logic contradiction: 'symmetrical asymmetry'.")
|
| 414 |
+
|
| 415 |
+
# Check for vague object interaction/holding which causes fusion artifacts
|
| 416 |
import re
|
| 417 |
+
holding_pattern = re.compile(r'\b(holding up|holding|carrying|wielding|armed with)\b\s+(?:(a|an|the)\b)?\s*', re.IGNORECASE)
|
| 418 |
+
holding_match = holding_pattern.search(p_lower)
|
| 419 |
+
if holding_match:
|
| 420 |
+
if not any(kw in p_lower for kw in ["hand", "grip", "hilt", "stance", "pose", "clutching", "brandishing", "raised", "wielding with"]):
|
| 421 |
+
# Extract noun phrase
|
| 422 |
+
start_idx = holding_match.end()
|
| 423 |
+
rest = p_lower[start_idx:]
|
| 424 |
+
comma_idx = rest.find(',')
|
| 425 |
+
if comma_idx != -1:
|
| 426 |
+
noun_phrase = rest[:comma_idx].strip()
|
| 427 |
+
else:
|
| 428 |
+
noun_phrase = rest.strip()
|
| 429 |
+
violations.append(
|
| 430 |
+
f"Vague interaction: '{holding_match.group(1)} {noun_phrase}' without specifying hand placement, grip, or pose. "
|
| 431 |
+
f"This frequently causes the image model to fuse the object into the character's body."
|
| 432 |
+
)
|
| 433 |
|
| 434 |
+
score = 1.0 - (len(violations) * 0.25)
|
| 435 |
+
score = max(0.2, min(1.0, score))
|
| 436 |
+
|
| 437 |
+
level = "high"
|
| 438 |
+
if score < 0.6:
|
| 439 |
+
level = "low"
|
| 440 |
+
elif score < 0.85:
|
| 441 |
+
level = "medium"
|
| 442 |
|
| 443 |
+
recommendation = prompt
|
| 444 |
+
if violations:
|
| 445 |
+
# Basic recommendation fixing floating gravity
|
| 446 |
+
if "floating" in p_lower and not any(kw in p_lower for kw in ["space", "zero-g", "magic"]):
|
| 447 |
+
recommendation = f"{prompt}, realistically grounded in environment, subject to gravity"
|
| 448 |
+
|
| 449 |
+
# Recommendation fixing vague holding
|
| 450 |
+
holding_match_rec = holding_pattern.search(recommendation)
|
| 451 |
+
if holding_match_rec and not any(kw in recommendation.lower() for kw in ["hand", "grip", "hilt", "stance", "pose"]):
|
| 452 |
+
start_idx = holding_match_rec.end()
|
| 453 |
+
rest = recommendation[start_idx:]
|
| 454 |
+
comma_idx = rest.find(',')
|
| 455 |
+
if comma_idx != -1:
|
| 456 |
+
noun_phrase = rest[:comma_idx].strip()
|
| 457 |
+
after_noun = rest[comma_idx:]
|
| 458 |
+
else:
|
| 459 |
+
noun_phrase = rest.strip()
|
| 460 |
+
after_noun = ""
|
| 461 |
+
|
| 462 |
+
# Determine appropriate grip description based on standard nouns
|
| 463 |
+
if any(w in noun_phrase.lower() for w in ["sword", "weapon", "blade", "dagger", "saber", "axe", "staff", "shield", "spear", "lance", "gun", "pistol", "rifle"]):
|
| 464 |
+
detailed_hold = f"gripping the hilt and handle of the {noun_phrase} firmly in one hand, posing in a natural heroic stance"
|
| 465 |
+
else:
|
| 466 |
+
detailed_hold = f"holding the {noun_phrase} firmly in their hand, posing naturally"
|
| 467 |
+
|
| 468 |
+
recommendation = recommendation[:holding_match_rec.start()] + detailed_hold + after_noun
|
| 469 |
+
|
| 470 |
+
return {
|
| 471 |
+
"coherence_score": score,
|
| 472 |
+
"coherence_level": level,
|
| 473 |
+
"violations": violations,
|
| 474 |
+
"recommendation": recommendation if violations else "",
|
| 475 |
+
"enhancement_needed": len(violations) > 0
|
| 476 |
+
}
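The scoring rule in the heuristic checker is simple arithmetic: each violation costs 0.25, the score is clamped to the range 0.2 to 1.0, and two thresholds map it to a level. A small sketch, with `score_violations` as a hypothetical helper name:

```python
from typing import List, Tuple


def score_violations(violations: List[str]) -> Tuple[float, str]:
    """Hypothetical helper: map a violation list to (score, level)."""
    # Each violation subtracts 0.25; clamp so the score stays in [0.2, 1.0].
    score = max(0.2, min(1.0, 1.0 - 0.25 * len(violations)))

    if score < 0.6:
        level = "low"
    elif score < 0.85:
        level = "medium"
    else:
        level = "high"
    return score, level
```

With these thresholds, zero violations give `(1.0, "high")`, one gives `(0.75, "medium")`, and two or more land in `"low"`, so a single flagged issue is already enough to drop a prompt out of the top level.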
|
lumaforge/pipeline.py
CHANGED
|
@@ -3,77 +3,69 @@ import time
|
|
| 3 |
import random
|
| 4 |
import torch
|
| 5 |
from PIL import Image, ImageDraw, ImageFont, ImageFilter, ImageOps, ImageEnhance
|
|
|
|
|
|
|
| 6 |
|
| 7 |
class LumaForgePipeline:
|
| 8 |
-
def __init__(self, model_id="
|
| 9 |
self.model_id = model_id
|
| 10 |
self.device = device if torch.backends.mps.is_available() and device == "mps" else "cpu"
|
| 11 |
self.pipe = None
|
| 12 |
self.is_loaded = False
|
| 13 |
-
|
|
|
|
| 14 |
|
| 15 |
def load_model(self):
|
| 16 |
-
"""Loads
|
| 17 |
if self.is_loaded:
|
| 18 |
return True
|
| 19 |
|
| 20 |
-
print(f"[LumaForgePipeline] Loading
|
| 21 |
-
print(f"[LumaForgePipeline]
|
| 22 |
try:
|
| 23 |
-
from diffusers import
|
| 24 |
-
import
|
| 25 |
|
| 26 |
-
#
|
| 27 |
-
|
| 28 |
-
raise TimeoutError("Model download timeout - exceeded 10 minutes")
|
| 29 |
|
| 30 |
-
#
|
| 31 |
-
|
| 32 |
|
| 33 |
-
print(f"[LumaForgePipeline]
|
| 34 |
-
|
|
|
|
| 35 |
self.model_id,
|
|
|
|
|
|
|
| 36 |
torch_dtype=torch_dtype,
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
requires_safety_checker=False
|
| 40 |
)
|
|
|
|
|
|
|
| 41 |
print(f"[LumaForgePipeline] Moving pipeline to {self.device}...")
|
| 42 |
self.pipe.to(self.device)
|
| 43 |
-
|
| 44 |
|
| 45 |
-
|
| 46 |
-
lora_path = "weights/lumaforge_lora.safetensors"
|
| 47 |
-
if os.path.exists(lora_path):
|
| 48 |
-
try:
|
| 49 |
-
# A basic file size check to distinguish the real state dict from a demo string
|
| 50 |
-
if os.path.getsize(lora_path) > 1000:
|
| 51 |
-
print(f"[LumaForgePipeline] Loading fine-tuned UNet weights from {lora_path}...")
|
| 52 |
-
state_dict = torch.load(lora_path, map_location=self.device)
|
| 53 |
-
self.pipe.unet.load_state_dict(state_dict)
|
| 54 |
-
print("[LumaForgePipeline] Fine-tuned UNet weights loaded successfully.")
|
| 55 |
-
else:
|
| 56 |
-
print(f"[LumaForgePipeline] Found demo/placeholder weights at {lora_path}. Skipping weight load.")
|
| 57 |
-
except Exception as e:
|
| 58 |
-
print(f"[LumaForgePipeline Warning] Failed to load fine-tuned weights: {e}. Running with base model.")
|
| 59 |
|
| 60 |
-
# Memory optimization
|
| 61 |
if self.device == "mps":
|
| 62 |
print(f"[LumaForgePipeline] Enabling attention slicing for MPS memory optimization...")
|
| 63 |
self.pipe.enable_attention_slicing()
|
| 64 |
-
print(f"[LumaForgePipeline] Attention slicing enabled.")
|
| 65 |
|
| 66 |
self.is_loaded = True
|
| 67 |
-
print("[LumaForgePipeline]
|
| 68 |
return True
|
| 69 |
-
except TimeoutError as e:
|
| 70 |
-
print(f"[LumaForgePipeline Error] Model loading timeout: {e}")
|
| 71 |
-
print(f"[LumaForgePipeline] Please use mock=True for faster testing")
|
| 72 |
-
self.is_loaded = False
|
| 73 |
-
return False
|
| 74 |
except Exception as e:
|
| 75 |
-
print(f"[LumaForgePipeline Error] Failed to load
|
| 76 |
-
print(f"[LumaForgePipeline]
|
| 77 |
self.is_loaded = False
|
| 78 |
return False
|
| 79 |
|
|
@@ -96,6 +88,7 @@ class LumaForgePipeline:
|
|
| 96 |
|
| 97 |
image = None
|
| 98 |
used_mock = False
|
|
|
|
| 99 |
|
| 100 |
# Extract quoted titles for negative prompt and overlay logic
|
| 101 |
import re
|
|
@@ -110,21 +103,51 @@ class LumaForgePipeline:
|
|
| 110 |
# Simulate processing time
|
| 111 |
time.sleep(1.5)
|
| 112 |
else:
|
| 113 |
-
#
|
| 114 |
-
|
| 115 |
-
|
| 116 |
|
| 117 |
-
|
| 118 |
-
|
| 119 |
if not negative_prompt:
|
| 120 |
-
negative_prompt =
|
| 121 |
else:
|
| 122 |
-
negative_prompt = f"{negative_prompt}, {
|
| 123 |
|
| 124 |
-
# If
|
| 125 |
if titles:
|
| 126 |
-
|
| 127 |
-
|
| 128 |
|
| 129 |
loaded = self.load_model()
|
| 130 |
if not loaded:
|
|
@@ -134,20 +157,28 @@ class LumaForgePipeline:
|
|
| 134 |
time.sleep(1.5)
|
| 135 |
else:
|
| 136 |
try:
|
| 137 |
-
|
| 138 |
generator = torch.Generator(device=self.device).manual_seed(seed)
|
| 139 |
-
|
|
|
|
| 140 |
output = self.pipe(
|
| 141 |
prompt=prompt,
|
| 142 |
negative_prompt=negative_prompt,
|
| 143 |
-
num_inference_steps=
|
| 144 |
-
guidance_scale=
|
| 145 |
width=width,
|
| 146 |
height=height,
|
| 147 |
generator=generator
|
| 148 |
)
|
| 149 |
image = output.images[0]
|
| 150 |
-
|
|
|
|
| 151 |
except Exception as e:
|
| 152 |
print(f"[LumaForgePipeline Error] Inference failed: {e}. Falling back to mock image.")
|
| 153 |
image = self._generate_mock_image(prompt, width, height, aspect_ratio, seed)
|
|
@@ -173,8 +204,19 @@ class LumaForgePipeline:
|
|
| 173 |
|
| 174 |
print(f"[LumaForgePipeline] Generation complete: {latency_sec:.2f}s, memory={memory_used_mb:.1f}MB, used_mock={used_mock}")
|
| 175 |
|
| 176 |
return {
|
| 177 |
"image": image,
|
|
|
|
| 178 |
"latency_sec": latency_sec,
|
| 179 |
"memory_used_mb": memory_used_mb,
|
| 180 |
"seed": seed,
|
|
@@ -367,8 +409,20 @@ class LumaForgePipeline:
|
|
| 367 |
# Apply logo watermark
|
| 368 |
output_image = self._overlay_lumaforge_logo(output_image)
|
| 369 |
|
| 370 |
return {
|
| 371 |
"image": output_image,
|
|
|
|
| 372 |
"latency_sec": latency_sec,
|
| 373 |
"memory_used_mb": memory_used_mb,
|
| 374 |
"seed": seed,
|
|
@@ -723,6 +777,101 @@ class LumaForgePipeline:
|
|
| 723 |
return 0
|
| 724 |
return 0
|
| 725 |
|
| 726 |
def _generate_mock_image(self, prompt: str, width: int, height: int, aspect_ratio: str, seed: int) -> Image:
|
| 727 |
"""
|
| 728 |
Generates a beautiful, highly stylized mock image dynamically matching the prompt.
|
|
@@ -872,100 +1021,283 @@ class LumaForgePipeline:
|
|
| 872 |
return [(15, 32, 67), (70, 130, 180)]
|
| 873 |
|
| 874 |
def _overlay_poster_typography(self, image: Image, title: str) -> Image:
|
| 875 |
-
"""Overlays professional
|
| 876 |
try:
|
| 877 |
-
from PIL import ImageDraw, ImageFont
|
|
|
|
|
|
|
| 878 |
|
| 879 |
-
#
|
| 880 |
img = image.copy()
|
| 881 |
width, height = img.size
|
| 882 |
|
| 883 |
-
|
| 884 |
-
|
| 885 |
|
| 886 |
-
#
|
| 887 |
-
|
| 888 |
-
|
| 889 |
-
|
| 890 |
-
|
| 891 |
-
|
| 892 |
-
|
| 893 |
-
|
| 894 |
-
|
| 895 |
-
|
| 896 |
|
| 897 |
-
|
| 898 |
-
|
| 899 |
|
| 900 |
-
# 2. Setup Font scaling to prevent overflow text truncation
|
| 901 |
-
font_path = "/System/Library/Fonts/Helvetica.ttc"
|
| 902 |
if not os.path.exists(font_path):
|
| 903 |
-
font_path = "/System/Library/Fonts/
|
| 904 |
|
| 905 |
-
#
|
| 906 |
-
|
| 907 |
-
subtitle_size = max(10, int(height * 0.024))
|
| 908 |
-
max_w = int(width * 0.85)
|
| 909 |
|
| 910 |
try:
|
| 911 |
-
|
| 912 |
-
|
| 913 |
-
|
| 914 |
-
|
| 915 |
|
| 916 |
-
# Shrink title
|
| 917 |
-
while t_w > max_w and
|
| 918 |
-
|
| 919 |
-
|
| 920 |
-
|
| 921 |
-
t_w = t_bbox[2] - t_bbox[0]
|
| 922 |
-
t_h = t_bbox[3] - t_bbox[1]
|
| 923 |
-
|
| 924 |
-
sub_font = ImageFont.truetype(font_path, subtitle_size)
|
| 925 |
-
s_bbox = sub_font.getbbox(sub_text)
|
| 926 |
-
s_w = s_bbox[2] - s_bbox[0]
|
| 927 |
-
s_h = s_bbox[3] - s_bbox[1]
|
| 928 |
-
|
| 929 |
-
# Shrink subtitle size dynamically if too wide
|
| 930 |
-
while s_w > max_w and subtitle_size > 8:
|
| 931 |
-
subtitle_size -= 1
|
| 932 |
-
sub_font = ImageFont.truetype(font_path, subtitle_size)
|
| 933 |
-
s_bbox = sub_font.getbbox(sub_text)
|
| 934 |
-
s_w = s_bbox[2] - s_bbox[0]
|
| 935 |
-
s_h = s_bbox[3] - s_bbox[1]
|
| 936 |
except Exception:
|
| 937 |
-
|
| 938 |
-
|
| 939 |
-
t_w = len(title_text) * 8
|
| 940 |
-
t_h = 12
|
| 941 |
-
s_w = len(sub_text) * 6
|
| 942 |
-
s_h = 10
|
| 943 |
|
| 944 |
-
#
|
| 945 |
-
|
| 946 |
-
ty = int(height * 0.86)
|
| 947 |
-
|
| 948 |
-
sx = (width - s_w) // 2
|
| 949 |
-
sy = int(height * 0.78)
|
| 950 |
|
| 951 |
-
|
| 952 |
-
|
| 953 |
-
|
| 954 |
-
|
| 955 |
-
|
| 956 |
-
|
| 957 |
-
|
| 958 |
-
|
| 959 |
-
|
| 960 |
-
|
| 961 |
-
|
| 962 |
-
|
| 963 |
-
|
| 964 |
-
|
| 965 |
|
| 966 |
-
return
|
| 967 |
except Exception as e:
|
| 968 |
-
print(f"[LumaForgePipeline Warning] Failed to overlay typography: {e}")
|
| 969 |
return image
|
| 970 |
|
| 971 |
def _overlay_lumaforge_logo(self, image: Image) -> Image:
|
|
|
|
| 3 |
import random
|
| 4 |
import torch
|
| 5 |
from PIL import Image, ImageDraw, ImageFont, ImageFilter, ImageOps, ImageEnhance
|
| 6 |
+
from PIL.PngImagePlugin import PngInfo
|
| 7 |
+
import numpy as np
|
| 8 |
|
| 9 |
class LumaForgePipeline:
|
| 10 |
+
def __init__(self, model_id="stabilityai/stable-diffusion-3.5-medium", device="mps", ollama_client=None):
|
| 11 |
self.model_id = model_id
|
| 12 |
self.device = device if torch.backends.mps.is_available() and device == "mps" else "cpu"
|
| 13 |
self.pipe = None
|
| 14 |
self.is_loaded = False
|
| 15 |
+
self.ollama_client = ollama_client
|
| 16 |
+
print(f"[LumaForgePipeline] Initialized SD 3.5 Medium pipeline with device: {self.device}")
|
| 17 |
|
| 18 |
def load_model(self):
|
| 19 |
+
"""Loads SD 3.5 Medium pipeline - latest Stability AI model."""
|
| 20 |
if self.is_loaded:
|
| 21 |
return True
|
| 22 |
|
| 23 |
+
print(f"[LumaForgePipeline] Loading SD 3.5 Medium model onto {self.device}...")
|
| 24 |
+
print(f"[LumaForgePipeline] Checking local cache at ~/.cache/huggingface/...")
|
| 25 |
try:
|
| 26 |
+
from diffusers import StableDiffusion3Pipeline
|
| 27 |
+
import os
|
| 28 |
|
| 29 |
+
# Use fp16 for MPS
|
| 30 |
+
torch_dtype = torch.float16
|
|
|
|
| 31 |
|
| 32 |
+
# Set cache directory explicitly
|
| 33 |
+
cache_dir = os.path.expanduser("~/.cache/huggingface/hub")
|
| 34 |
|
| 35 |
+
print(f"[LumaForgePipeline] Loading SD 3.5 Medium (this will download ~5-6GB on first run)...")
|
| 36 |
+
|
| 37 |
+
self.pipe = StableDiffusion3Pipeline.from_pretrained(
|
| 38 |
self.model_id,
|
| 39 |
+
text_encoder_3=None,
|
| 40 |
+
tokenizer_3=None,
|
| 41 |
torch_dtype=torch_dtype,
|
| 42 |
+
cache_dir=cache_dir,
|
| 43 |
+
local_files_only=False
|
|
|
|
| 44 |
)
|
| 45 |
+
|
| 46 |
+
print(f"[LumaForgePipeline] ✅ SD 3.5 Medium loaded successfully")
|
| 47 |
print(f"[LumaForgePipeline] Moving pipeline to {self.device}...")
|
| 48 |
self.pipe.to(self.device)
|
| 49 |
+
# Keep VAE in float16 to match input latents on MPS (prevent c10::Half / float mismatch)
|
| 50 |
+
# if self.device == "mps":
|
| 51 |
+
# print("[LumaForgePipeline] Upcasting VAE decoder to float32 precision for MPS...")
|
| 52 |
+
# self.pipe.vae.to(dtype=torch.float32)
|
| 53 |
+
# print("[LumaForgePipeline] ✅ VAE upcasted successfully.")
|
| 54 |
|
| 55 |
+
print(f"[LumaForgePipeline] ✅ Pipeline successfully moved to {self.device}")
|
| 56 |
|
| 57 |
+
# Memory optimization
|
| 58 |
if self.device == "mps":
|
| 59 |
print(f"[LumaForgePipeline] Enabling attention slicing for MPS memory optimization...")
|
| 60 |
self.pipe.enable_attention_slicing()
|
| 61 |
+
print(f"[LumaForgePipeline] ✅ Attention slicing enabled.")
|
| 62 |
|
| 63 |
self.is_loaded = True
|
| 64 |
+
print("[LumaForgePipeline] ✅ SD 3.5 Medium ready for inference!")
|
| 65 |
return True
|
| 66 |
except Exception as e:
|
| 67 |
+
print(f"[LumaForgePipeline Error] Failed to load SD 3.5 Medium: {e}")
|
| 68 |
+
print(f"[LumaForgePipeline] Model needs to be downloaded first.")
|
| 69 |
self.is_loaded = False
|
| 70 |
return False
|
| 71 |
|
|
|
|
| 88 |
|
| 89 |
image = None
|
| 90 |
used_mock = False
|
| 91 |
+
gen_prompt = prompt
|
| 92 |
|
| 93 |
# Extract quoted titles for negative prompt and overlay logic
|
| 94 |
import re
|
|
|
|
| 103 |
# Simulate processing time
|
| 104 |
time.sleep(1.5)
|
| 105 |
else:
|
| 106 |
+
# SD 3.5 Medium: use Ollama to compress the prompt to the token budget passed below (max_tokens=256)
|
| 107 |
+
prompt_lower = prompt.lower()
|
| 108 |
+
|
| 109 |
+
# Use Ollama to intelligently compress the prompt if needed
|
| 110 |
+
if self.ollama_client:
|
| 111 |
+
print(f"[LumaForgePipeline] Optimizing prompt for SD 3.5 Medium token limit...")
|
| 112 |
+
optimization = self.ollama_client.optimize_prompt_for_sd35(prompt, max_tokens=256)
|
| 113 |
|
| 114 |
+
if optimization["was_compressed"]:
|
| 115 |
+
print(f"[LumaForgePipeline] ✅ Prompt optimized: {optimization['original_tokens']} → {optimization['final_tokens']} tokens")
|
| 116 |
+
prompt = optimization["optimized_prompt"]
|
| 117 |
+
else:
|
| 118 |
+
print(f"[LumaForgePipeline] ✅ Prompt already optimal ({optimization['original_tokens']} tokens)")
|
| 119 |
+
else:
|
| 120 |
+
print(f"[LumaForgePipeline] ⚠️ Ollama not available, using original prompt")
|
| 121 |
+
|
| 122 |
+
# OPTIMIZED NEGATIVE PROMPT (essential negatives only for SD 3.5 Medium)
|
| 123 |
+
core_negatives = "low quality, blurry"
|
| 124 |
+
|
| 125 |
+
# Add facial negatives for character/portrait images
|
| 126 |
+
if any(kw in prompt_lower for kw in ["face", "portrait", "character", "person", "wizard", "man", "woman"]):
|
| 127 |
+
core_negatives = f"{core_negatives}, bad anatomy"
|
| 128 |
+
|
| 129 |
+
# Style-aware exclusions (minimal)
|
| 130 |
+
if "photorealistic" in prompt_lower or "photo" in prompt_lower:
|
| 131 |
+
core_negatives = f"{core_negatives}, cartoon"
|
| 132 |
+
elif "anime" in prompt_lower:
|
| 133 |
+
core_negatives = f"{core_negatives}, photorealistic"
|
| 134 |
+
|
| 135 |
if not negative_prompt:
|
| 136 |
+
negative_prompt = core_negatives
|
| 137 |
else:
|
| 138 |
+
negative_prompt = f"{negative_prompt}, {core_negatives}"
|
| 139 |
|
| 140 |
+
# If titles found, suppress text generation
|
| 141 |
if titles:
|
| 142 |
+
negative_prompt = f"{negative_prompt}, text, letters"
|
| 143 |
+
|
| 144 |
+
# Token estimation (rough: ~1.3 chars per token)
|
| 145 |
+
prompt_tokens = len(prompt) // 1.3
|
| 146 |
+
neg_tokens = len(negative_prompt) // 1.3
|
| 147 |
+
|
| 148 |
+
print(f"[LumaForgePipeline] Token estimate: prompt ~{int(prompt_tokens)}, negative ~{int(neg_tokens)}")
|
| 149 |
+
if prompt_tokens > 256:
|
| 150 |
+
print(f"[LumaForgePipeline] ⚠️ Prompt may be truncated (exceeds 256 tokens)")
|
| 151 |
|
| 152 |
loaded = self.load_model()
|
| 153 |
if not loaded:
|
|
|
|
| 157 |
time.sleep(1.5)
|
| 158 |
else:
|
| 159 |
try:
|
| 160 |
+
# 8. SD 3.5 OPTIMAL PARAMETERS
|
| 161 |
+
optimized_steps = 28
|
| 162 |
+
optimized_guidance = 4.5
|
| 163 |
+
|
| 164 |
+
print(f"[LumaForgePipeline] SD 3.5 Medium inference: steps={optimized_steps}, guidance={optimized_guidance}, seed={seed}")
|
| 165 |
+
print(f"[LumaForgePipeline] Prompt: {prompt[:100]}...")
|
| 166 |
+
print(f"[LumaForgePipeline] Negative: {negative_prompt[:80]}...")
|
| 167 |
generator = torch.Generator(device=self.device).manual_seed(seed)
|
| 168 |
+
|
| 169 |
+
# Run SD 3.5 Medium diffusion
|
| 170 |
output = self.pipe(
|
| 171 |
prompt=prompt,
|
| 172 |
negative_prompt=negative_prompt,
|
| 173 |
+
num_inference_steps=optimized_steps,
|
| 174 |
+
guidance_scale=optimized_guidance,
|
| 175 |
width=width,
|
| 176 |
height=height,
|
| 177 |
generator=generator
|
| 178 |
)
|
| 179 |
image = output.images[0]
|
| 180 |
+
|
| 181 |
+
print(f"[LumaForgePipeline] ✅ SD 3.5 Medium inference completed")
|
| 182 |
except Exception as e:
|
| 183 |
print(f"[LumaForgePipeline Error] Inference failed: {e}. Falling back to mock image.")
|
| 184 |
image = self._generate_mock_image(prompt, width, height, aspect_ratio, seed)
|
|
|
|
| 204 |
|
| 205 |
print(f"[LumaForgePipeline] Generation complete: {latency_sec:.2f}s, memory={memory_used_mb:.1f}MB, used_mock={used_mock}")
|
| 206 |
|
| 207 |
+
# Construct PNG Metadata
|
| 208 |
+
metadata = PngInfo()
|
| 209 |
+
metadata.add_text("prompt", str(gen_prompt))
|
| 210 |
+
metadata.add_text("negative_prompt", str(negative_prompt))
|
| 211 |
+
metadata.add_text("seed", str(seed))
|
| 212 |
+
metadata.add_text("steps", str(steps))
|
| 213 |
+
metadata.add_text("guidance_scale", str(guidance_scale))
|
| 214 |
+
metadata.add_text("model_id", str(self.model_id))
|
| 215 |
+
metadata.add_text("software", "LumaForge AuraGen Core")
|
| 216 |
+
|
| 217 |
return {
|
| 218 |
"image": image,
|
| 219 |
+
"pnginfo": metadata,
|
| 220 |
"latency_sec": latency_sec,
|
| 221 |
"memory_used_mb": memory_used_mb,
|
| 222 |
"seed": seed,
|
|
|
|
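The hunk above only builds the `PngInfo` object and returns it under the `"pnginfo"` key; the caller still has to pass it to Pillow when saving. The following is a minimal sketch of that step, assuming the caller holds the returned dict. The helper name `save_with_metadata` and the sample values are hypothetical and not part of this commit.

```python
import io

from PIL import Image
from PIL.PngImagePlugin import PngInfo


def save_with_metadata(image, pnginfo):
    """Hypothetical helper: write the image as PNG with its text chunks attached."""
    buffer = io.BytesIO()
    # Pillow only embeds the text chunks when `pnginfo` is passed to save().
    image.save(buffer, format="PNG", pnginfo=pnginfo)
    buffer.seek(0)
    return buffer


# Stand-in for the dict returned by the pipeline (values are made up).
result = {"image": Image.new("RGB", (8, 8), (15, 32, 67)), "pnginfo": PngInfo()}
result["pnginfo"].add_text("prompt", "a wizard in a mystical forest")
result["pnginfo"].add_text("seed", str(42))

png_bytes = save_with_metadata(result["image"], result["pnginfo"])

# Reading the file back exposes the text chunks as a plain dict of strings.
stored = Image.open(png_bytes).text
```

Because every value is stored as text, numeric fields such as `seed` come back as strings and need converting by whoever reads them.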
| 409 |
# Apply logo watermark
|
| 410 |
output_image = self._overlay_lumaforge_logo(output_image)
|
| 411 |
|
| 412 |
+
# Construct PNG Metadata
|
| 413 |
+
metadata = PngInfo()
|
| 414 |
+
metadata.add_text("prompt", str(prompt))
|
| 415 |
+
metadata.add_text("negative_prompt", str(negative_prompt))
|
| 416 |
+
metadata.add_text("seed", str(seed))
|
| 417 |
+
metadata.add_text("steps", str(steps))
|
| 418 |
+
metadata.add_text("guidance_scale", str(guidance_scale))
|
| 419 |
+
metadata.add_text("strength", str(strength))
|
| 420 |
+
metadata.add_text("model_id", str(self.model_id))
|
| 421 |
+
metadata.add_text("software", "LumaForge AuraGen Core")
|
| 422 |
+
|
| 423 |
return {
|
| 424 |
"image": output_image,
|
| 425 |
+
"pnginfo": metadata,
|
| 426 |
"latency_sec": latency_sec,
|
| 427 |
"memory_used_mb": memory_used_mb,
|
| 428 |
"seed": seed,
|
|
|
|
| 777 |
return 0
|
| 778 |
return 0
|
| 779 |
|
| 780 |
+
def _restore_face(self, image: Image.Image) -> Image.Image:
|
| 781 |
+
"""
|
| 782 |
+
Restores facial details and clarity using GFPGAN for crystal-clear faces.
|
| 783 |
+
Falls back gracefully if GFPGAN not available.
|
| 784 |
+
"""
|
| 785 |
+
try:
|
| 786 |
+
from gfpgan import GFPGANer
|
| 787 |
+
|
| 788 |
+
# Initialize GFPGAN
|
| 789 |
+
restorer = GFPGANer(
|
| 790 |
+
model_path='https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth',
|
| 791 |
+
upscale=2,  # GFPGANer takes the integer factor as `upscale`; it has no `scale` argument
|
| 793 |
+
arch='clean',
|
| 794 |
+
channel_multiplier=2,
|
| 795 |
+
bg_upsampler=None,
|
| 796 |
+
device=self.device
|
| 797 |
+
)
|
| 798 |
+
|
| 799 |
+
# Convert PIL (RGB) to a contiguous BGR numpy array; GFPGAN follows the OpenCV channel order
img_np = np.ascontiguousarray(np.array(image.convert("RGB"))[:, :, ::-1])
|
+
# Restore faces (`enhance` has no `pad` argument, so it is not passed here)
_, _, output = restorer.enhance(img_np, has_aligned=False, only_center_face=False, paste_back=True, weight=0.7)
|
+
# Convert the BGR result back to an RGB PIL image
restored = Image.fromarray(np.ascontiguousarray(output[:, :, ::-1]))
|
| 807 |
+
|
| 808 |
+
print("[LumaForgePipeline] ✅ Face restoration completed with GFPGAN")
|
| 809 |
+
return restored
|
| 810 |
+
except Exception as e:
|
| 811 |
+
print(f"[LumaForgePipeline Warning] Face restoration failed ({e}). Continuing without restoration.")
|
| 812 |
+
return image
|
| 813 |
+
|
| 814 |
+
def _upscale_image(self, image: Image.Image, scale: int = 2) -> Image.Image:
|
| 815 |
+
"""
|
| 816 |
+
Upscales image using Real-ESRGAN for maximum clarity and detail.
|
| 817 |
+
Falls back to Lanczos if Real-ESRGAN unavailable.
|
| 818 |
+
"""
|
| 819 |
+
try:
|
| 820 |
+
from basicsr.archs.rrdbnet_arch import RRDBNet
|
| 821 |
+
from realesrgan import RealESRGANer
|
| 822 |
+
|
| 823 |
+
# Initialize Real-ESRGAN
|
| 824 |
+
upsampler = RealESRGANer(
|
| 825 |
+
scale=scale,
|
| 826 |
+
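model=RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=2),  # RealESRGANer takes a network instance via `model`; it has no `model_name` argument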
|
| 827 |
+
model_path='https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth',
|
| 828 |
+
tile=400,
|
| 829 |
+
tile_pad=10,
|
| 830 |
+
pre_pad=0,
|
| 831 |
+
half=True if self.device == "mps" else False
|
| 832 |
+
)
|
| 833 |
+
|
| 834 |
+
# Convert PIL to numpy
|
| 835 |
+
img_np = np.array(image)
|
| 836 |
+
|
| 837 |
+
# Upscale
|
| 838 |
+
output, _ = upsampler.enhance(img_np, outscale=scale)
|
| 839 |
+
|
| 840 |
+
# Convert back to PIL
|
| 841 |
+
upscaled = Image.fromarray(output)
|
| 842 |
+
|
| 843 |
+
print(f"[LumaForgePipeline] ✅ Image upscaled {scale}x with Real-ESRGAN")
|
| 844 |
+
return upscaled
|
| 845 |
+
except Exception as e:
|
| 846 |
+
print(f"[LumaForgePipeline] Real-ESRGAN unavailable ({e}). Using Lanczos upscaling.")
|
| 847 |
+
new_size = (image.width * scale, image.height * scale)
|
| 848 |
+
return image.resize(new_size, Image.Resampling.LANCZOS)
|
| 849 |
+
|
| 850 |
+
def _enhance_clarity(self, image: Image.Image) -> Image.Image:
|
| 851 |
+
"""
|
| 852 |
+
Enhances image clarity through multiple post-processing techniques.
|
| 853 |
+
"""
|
| 854 |
+
# 1. Unsharp mask for edge enhancement
|
| 855 |
+
blurred = image.filter(ImageFilter.GaussianBlur(1.0))
|
| 856 |
+
img_arr = np.array(image, dtype=float)
|
| 857 |
+
blur_arr = np.array(blurred, dtype=float)
|
| 858 |
+
unsharp_mask = img_arr - blur_arr
|
| 859 |
+
|
| 860 |
+
enhanced_arr = img_arr + 0.5 * unsharp_mask
|
| 861 |
+
enhanced_arr = np.clip(enhanced_arr, 0, 255).astype(np.uint8)
|
| 862 |
+
enhanced = Image.fromarray(enhanced_arr)
|
| 863 |
+
|
| 864 |
+
# 2. Contrast boost
|
| 865 |
+
contrast_enhancer = ImageEnhance.Contrast(enhanced)
|
| 866 |
+
enhanced = contrast_enhancer.enhance(1.1)
|
| 867 |
+
|
| 868 |
+
# 3. Sharpness boost
|
| 869 |
+
sharpness_enhancer = ImageEnhance.Sharpness(enhanced)
|
| 870 |
+
enhanced = sharpness_enhancer.enhance(1.2)
|
| 871 |
+
|
| 872 |
+
print("[LumaForgePipeline] ✅ Clarity enhancement applied")
|
| 873 |
+
return enhanced
|
| 874 |
+
|
| 875 |
def _generate_mock_image(self, prompt: str, width: int, height: int, aspect_ratio: str, seed: int) -> Image:
|
| 876 |
"""
|
| 877 |
Generates a beautiful, highly stylized mock image dynamically matching the prompt.
|
|
|
|
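The first step of `_enhance_clarity` above is an unsharp mask: `enhanced = image + 0.5 * (image - blurred)`, clipped to the 0..255 range. The arithmetic can be sketched on a single row of pixel values in plain Python. This is an illustration only: the box blur is a stand-in for `ImageFilter.GaussianBlur`, and the function names are hypothetical.

```python
def box_blur(row, radius=1):
    """Stand-in for a Gaussian blur: mean over a small window, edges clamped."""
    n = len(row)
    blurred = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp(row, amount=0.5, radius=1):
    """Add `amount` times the (original - blurred) difference, then clip to 0..255."""
    blurred = box_blur(row, radius)
    sharpened = [p + amount * (p - b) for p, b in zip(row, blurred)]
    # int() truncates, mirroring the astype(np.uint8) cast after np.clip.
    return [int(min(255, max(0, v))) for v in sharpened]


# A hard edge between a dark and a bright region.
edge = [50, 50, 50, 200, 200, 200]
sharpened_edge = unsharp(edge)
```

Flat regions are left unchanged because the blurred value equals the original there; only pixels next to the edge are pushed further apart, which is what makes the edge look crisper.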
| 1021 |
return [(15, 32, 67), (70, 130, 180)]
|
| 1022 |
|
| 1023 |
def _overlay_poster_typography(self, image: Image, title: str) -> Image:
|
| 1024 |
+
"""Overlays professional premium typography on the generated movie poster image."""
|
| 1025 |
try:
|
| 1026 |
+
from PIL import ImageDraw, ImageFont, ImageFilter, ImageOps
|
| 1027 |
+
import os
|
| 1028 |
+
import re
|
| 1029 |
|
| 1030 |
+
# Copy base canvas
|
| 1031 |
img = image.copy()
|
| 1032 |
width, height = img.size
|
| 1033 |
|
| 1034 |
+
# Clean title
|
| 1035 |
+
title_text = title.strip().upper()
|
| 1036 |
|
| 1037 |
+
# Detect layout style from prompt/title text
|
| 1038 |
+
style_type = "cinematic"
|
| 1039 |
+
if any(w in title_text.lower() for w in ["cyber", "neon", "retro", "hack", "system", "matrix", "future", "laser", "star", "cosmic", "galaxy"]):
|
| 1040 |
+
style_type = "scifi"
|
| 1041 |
+
elif any(w in title_text.lower() for w in ["luxury", "gold", "royal", "silent", "whisper", "minimal", "white", "glass", "vogue", "velvet"]):
|
| 1042 |
+
style_type = "luxury"
|
| 1043 |
+
|
| 1044 |
+
# Helper for character-spaced drawing
|
| 1045 |
+
def get_spaced_text_width(text, font, spacing=6):
|
| 1046 |
+
w = 0
|
| 1047 |
+
for char in text:
|
| 1048 |
+
bbox = font.getbbox(char)
|
| 1049 |
+
char_w = bbox[2] - bbox[0]
|
| 1050 |
+
w += char_w + spacing
|
| 1051 |
+
return w - spacing if w > 0 else 0
|
| 1052 |
+
|
| 1053 |
+
def draw_spaced_text(draw, position, text, font, fill, spacing=6, shadow_fill=None, shadow_offset=(1, 1)):
|
| 1054 |
+
x, y = position
|
| 1055 |
+
ox, oy = shadow_offset
|
| 1056 |
+
for char in text:
|
| 1057 |
+
if shadow_fill:
|
| 1058 |
+
draw.text((x + ox, y + oy), char, fill=shadow_fill, font=font)
|
| 1059 |
+
draw.text((x, y), char, fill=fill, font=font)
|
| 1060 |
+
bbox = font.getbbox(char)
|
| 1061 |
+
char_w = bbox[2] - bbox[0]
|
| 1062 |
+
x += char_w + spacing
|
| 1063 |
+
|
| 1064 |
+
def draw_gradient_text(target_img, position, text, font, spacing, top_color, bottom_color, shadow_fill=None, shadow_offset=(2, 2)):
|
| 1065 |
+
"""Draws text with a beautiful top-to-bottom vertical color gradient."""
|
| 1066 |
+
w = get_spaced_text_width(text, font, spacing)
|
| 1067 |
+
bbox = font.getbbox("A")
|
| 1068 |
+
h = bbox[3] - bbox[1] + 15
|
| 1069 |
+
|
| 1070 |
+
# Create a mask for the text
|
| 1071 |
+
mask = Image.new("L", (w + 40, h + 20), 0)
|
| 1072 |
+
mask_draw = ImageDraw.Draw(mask)
|
| 1073 |
|
| 1074 |
+
# Draw spaced text on mask
|
| 1075 |
+
x_m, y_m = 20, 10
|
| 1076 |
+
for char in text:
|
| 1077 |
+
mask_draw.text((x_m, y_m), char, fill=255, font=font)
|
| 1078 |
+
c_bbox = font.getbbox(char)
|
| 1079 |
+
char_w = c_bbox[2] - c_bbox[0]
|
| 1080 |
+
x_m += char_w + spacing
|
| 1081 |
+
|
| 1082 |
+
# Create gradient image of the same size
|
| 1083 |
+
gradient = Image.new("RGBA", (w + 40, h + 20))
|
| 1084 |
+
g_draw = ImageDraw.Draw(gradient)
|
| 1085 |
+
for y in range(h + 20):
|
| 1086 |
+
ratio = y / (h + 20)
|
| 1087 |
+
r = int(top_color[0] + (bottom_color[0] - top_color[0]) * ratio)
|
| 1088 |
+
g = int(top_color[1] + (bottom_color[1] - top_color[1]) * ratio)
|
| 1089 |
+
b = int(top_color[2] + (bottom_color[2] - top_color[2]) * ratio)
|
| 1090 |
+
g_draw.line([(0, y), (w + 40, y)], fill=(r, g, b, 255))
|
| 1091 |
+
|
| 1092 |
+
# Apply mask to gradient
|
| 1093 |
+
text_img = Image.new("RGBA", (w + 40, h + 20))
|
| 1094 |
+
text_img.paste(gradient, (0, 0), mask)
|
| 1095 |
+
|
| 1096 |
+
# Draw shadow on the main image if requested
|
| 1097 |
+
if shadow_fill:
|
| 1098 |
+
sx, sy = position[0] + shadow_offset[0], position[1] + shadow_offset[1]
|
| 1099 |
+
shadow_img = Image.new("RGBA", (w + 40, h + 20), (shadow_fill[0], shadow_fill[1], shadow_fill[2], shadow_fill[3]))
|
| 1100 |
+
target_img.paste(shadow_img, (sx - 20, sy - 10), mask)
|
| 1101 |
+
|
| 1102 |
+
# Paste onto main image
|
| 1103 |
+
target_img.paste(text_img, (position[0] - 20, position[1] - 10), mask)
|
| 1104 |
+
|
| 1105 |
+
# Setup fonts based on theme
|
| 1106 |
+
font_paths = {
|
| 1107 |
+
"scifi": "/System/Library/Fonts/Supplemental/Futura.ttc",
|
| 1108 |
+
"luxury": "/System/Library/Fonts/Supplemental/Didot.ttc",
|
| 1109 |
+
"cinematic": "/System/Library/Fonts/Supplemental/Copperplate.ttc"
|
| 1110 |
+
}
|
| 1111 |
+
sub_font_paths = {
|
| 1112 |
+
"scifi": "/System/Library/Fonts/Supplemental/Futura.ttc",
|
| 1113 |
+
"luxury": "/System/Library/Fonts/Supplemental/Baskerville.ttc",
|
| 1114 |
+
"cinematic": "/System/Library/Fonts/Supplemental/Georgia.ttf"
|
| 1115 |
+
}
|
| 1116 |
+
|
| 1117 |
+
# Select active fonts with Helvetica fallbacks
|
| 1118 |
+
font_path = font_paths.get(style_type, "/System/Library/Fonts/Helvetica.ttc")
|
| 1119 |
+
sub_font_path = sub_font_paths.get(style_type, "/System/Library/Fonts/Helvetica.ttc")
|
| 1120 |
|
| 1121 |
if not os.path.exists(font_path):
|
| 1122 |
+
font_path = "/System/Library/Fonts/Helvetica.ttc"
|
| 1123 |
+
if not os.path.exists(sub_font_path):
|
| 1124 |
+
sub_font_path = "/System/Library/Fonts/Helvetica.ttc"
|
| 1125 |
+
|
| 1126 |
+
# Font size heuristics
|
| 1127 |
+
title_font_size = max(26, int(height * 0.08))
|
| 1128 |
+
sub_font_size = max(10, int(height * 0.024))
|
| 1129 |
+
credits_font_size = max(8, int(height * 0.016))
|
| 1130 |
|
| 1131 |
+
# Determine maximum allowable width
|
| 1132 |
+
max_w = int(width * 0.88)
|
| 1133 |
|
| 1134 |
try:
|
| 1135 |
+
t_font = ImageFont.truetype(font_path, title_font_size)
|
| 1136 |
+
# Compute width with spacing (default spacing is 8 for title)
|
| 1137 |
+
t_spacing = 8 if style_type != "luxury" else 14
|
| 1138 |
+
t_w = get_spaced_text_width(title_text, t_font, spacing=t_spacing)
|
| 1139 |
|
| 1140 |
+
# Shrink title if too wide
|
| 1141 |
+
while t_w > max_w and title_font_size > 16:
|
| 1142 |
+
title_font_size -= 2
|
| 1143 |
+
t_font = ImageFont.truetype(font_path, title_font_size)
|
| 1144 |
+
t_w = get_spaced_text_width(title_text, t_font, spacing=t_spacing)
|
| 1145 |
except Exception:
|
| 1146 |
+
t_font = ImageFont.load_default()
|
| 1147 |
+
t_spacing = 4
|
| 1148 |
+
t_w = len(title_text) * (8 + t_spacing)
|
| 1149 |
|
| 1150 |
+
# Create overlay canvas
|
| 1151 |
+
overlay = Image.new("RGBA", (width, height), (0, 0, 0, 0))
|
| 1152 |
|
| 1153 |
+
if style_type == "scifi":
|
| 1154 |
+
# 1. Cyberpunk/Sci-Fi Theme
|
| 1155 |
+
# Bottom vignette (cyan/dark)
|
| 1156 |
+
for y in range(int(height * 0.6), height):
|
| 1157 |
+
ratio = (y - int(height * 0.6)) / (height * 0.4)
|
| 1158 |
+
alpha = int(210 * (ratio ** 1.5))
|
| 1159 |
+
draw_line = ImageDraw.Draw(overlay)
|
| 1160 |
+
draw_line.line([(0, y), (width, y)], fill=(5, 10, 20, alpha))
|
| 1161 |
+
|
| 1162 |
+
# Draw Title at the bottom with gradient
|
| 1163 |
+
tx = (width - t_w) // 2
|
| 1164 |
+
ty = int(height * 0.82)
|
| 1165 |
+
|
| 1166 |
+
draw_gradient_text(
|
| 1167 |
+
overlay, (tx, ty), title_text, t_font, spacing=t_spacing,
|
| 1168 |
+
top_color=(0, 255, 255), bottom_color=(0, 128, 255),
|
| 1169 |
+
shadow_fill=(255, 0, 128, 200), shadow_offset=(-2, 2)
|
| 1170 |
+
)
|
| 1171 |
+
|
| 1172 |
+
# Tagline / Subtitle
|
| 1173 |
+
draw_overlay = ImageDraw.Draw(overlay)
|
| 1174 |
+
sub_text = "A U R A _ G E N // N E T _ S Y S _ A C T I V E"
|
| 1175 |
+
try:
|
| 1176 |
+
s_font = ImageFont.truetype(sub_font_path, sub_font_size)
|
| 1177 |
+
s_w = get_spaced_text_width(sub_text, s_font, spacing=3)
|
| 1178 |
+
except Exception:
|
| 1179 |
+
s_font = ImageFont.load_default()
|
| 1180 |
+
s_w = len(sub_text) * 10
|
| 1181 |
+
sx = (width - s_w) // 2
|
| 1182 |
+
sy = int(height * 0.76)
|
| 1183 |
+
draw_spaced_text(draw_overlay, (sx, sy), sub_text, s_font, fill=(0, 240, 255, 220), spacing=3, shadow_fill=(0, 0, 0, 180))
|
| 1184 |
+
|
| 1185 |
+
# Top coordinates HUD
|
| 1186 |
+
hud_text = "COORD: 35.6762° N, 139.6503° E | SYS: ONLINE"
|
| 1187 |
+
try:
|
| 1188 |
+
h_font = ImageFont.truetype(sub_font_path, int(credits_font_size * 0.9))
|
| 1189 |
+
except Exception:
|
| 1190 |
+
h_font = ImageFont.load_default()
|
| 1191 |
+
draw_overlay.text((30, 30), hud_text, fill=(0, 255, 255, 120), font=h_font)
|
| 1192 |
+
|
| 1193 |
+
elif style_type == "luxury":
|
| 1194 |
+
# 2. Minimalist Luxury Theme
|
| 1195 |
+
# Top vignette (subtle dark vignette at top)
|
| 1196 |
+
for y in range(0, int(height * 0.35)):
|
| 1197 |
+
ratio = 1.0 - (y / (height * 0.35))
|
| 1198 |
+
alpha = int(140 * (ratio ** 1.8))
|
| 1199 |
+
draw_line = ImageDraw.Draw(overlay)
|
| 1200 |
+
draw_line.line([(0, y), (width, y)], fill=(8, 8, 12, alpha))
|
| 1201 |
+
|
| 1202 |
+
# Title at the top center with pearl gradient
|
| 1203 |
+
tx = (width - t_w) // 2
|
| 1204 |
+
ty = int(height * 0.15)
|
| 1205 |
+
|
| 1206 |
+
draw_gradient_text(
|
| 1207 |
+
overlay, (tx, ty), title_text, t_font, spacing=t_spacing,
|
| 1208 |
+
top_color=(255, 255, 255), bottom_color=(235, 235, 240),
|
| 1209 |
+
shadow_fill=(0, 0, 0, 100), shadow_offset=(1, 1)
|
| 1210 |
+
)
|
| 1211 |
+
|
| 1212 |
+
# Gold separator line under title
|
| 1213 |
+
draw_overlay = ImageDraw.Draw(overlay)
|
| 1214 |
+
line_y = ty + int(height * 0.09)
|
| 1215 |
+
line_w = int(t_w * 0.6)
|
| 1216 |
+
lx1 = (width - line_w) // 2
|
| 1217 |
+
lx2 = lx1 + line_w
|
| 1218 |
+
draw_overlay.line([(lx1, line_y), (lx2, line_y)], fill=(212, 175, 55, 180), width=1) # gold line
|
| 1219 |
+
|
| 1220 |
+
# Elegant tagline
|
| 1221 |
+
sub_text = "L U M A F O R G E P R E S E N T S"
|
| 1222 |
+
try:
|
| 1223 |
+
s_font = ImageFont.truetype(sub_font_path, int(sub_font_size * 0.95))
|
| 1224 |
+
# Make it italic if Baskerville
|
| 1225 |
+
if "Baskerville" in sub_font_path:
|
| 1226 |
+
s_font = ImageFont.truetype("/System/Library/Fonts/Supplemental/Baskerville.ttc", int(sub_font_size * 0.95), index=1)
|
| 1227 |
+
s_w = get_spaced_text_width(sub_text, s_font, spacing=4)
|
| 1228 |
+
except Exception:
|
| 1229 |
+
s_font = ImageFont.load_default()
|
| 1230 |
+
s_w = len(sub_text) * 10
|
| 1231 |
+
sx = (width - s_w) // 2
|
| 1232 |
+
sy = ty - int(height * 0.05)
|
| 1233 |
+
draw_spaced_text(draw_overlay, (sx, sy), sub_text, s_font, fill=(212, 175, 55, 220), spacing=4)
|
| 1234 |
+
|
| 1235 |
+
else:
|
| 1236 |
+
# 3. Cinematic Action Theme (Default)
|
| 1237 |
+
# Bottom vignette (dark rich vignette)
|
| 1238 |
+
for y in range(int(height * 0.52), height):
|
| 1239 |
+
ratio = (y - int(height * 0.52)) / (height * 0.48)
|
| 1240 |
+
alpha = int(230 * (ratio ** 2.0))
|
| 1241 |
+
draw_line = ImageDraw.Draw(overlay)
|
| 1242 |
+
draw_line.line([(0, y), (width, y)], fill=(4, 4, 6, alpha))
|
| 1243 |
+
|
| 1244 |
+
# Title at bottom with warm silver/gold metallic gradient
|
| 1245 |
+
tx = (width - t_w) // 2
|
| 1246 |
+
ty = int(height * 0.80)
|
| 1247 |
+
|
| 1248 |
+
draw_gradient_text(
|
| 1249 |
+
overlay, (tx, ty), title_text, t_font, spacing=t_spacing,
|
| 1250 |
+
top_color=(255, 255, 255), bottom_color=(220, 215, 200),
|
| 1251 |
+
shadow_fill=(0, 0, 0, 245), shadow_offset=(3, 3)
|
| 1252 |
+
)
|
| 1253 |
+
|
| 1254 |
+
# Dynamic billing block text (credits line)
|
| 1255 |
+
draw_overlay = ImageDraw.Draw(overlay)
|
| 1256 |
+
credits_line = "STARRING GENERATIVE IMAGINATION • EXECUTIVE PRODUCERS LUMAFORGE LABS • MUSIC BY NEURAL SYNTH"
|
| 1257 |
+
try:
|
| 1258 |
+
c_font = ImageFont.truetype(font_path, credits_font_size)
|
| 1259 |
+
c_w = get_spaced_text_width(credits_line, c_font, spacing=2)
|
| 1260 |
+
# Shrink if too wide
|
| 1261 |
+
while c_w > max_w and credits_font_size > 6:
|
| 1262 |
+
credits_font_size -= 1
|
| 1263 |
+
c_font = ImageFont.truetype(font_path, credits_font_size)
|
| 1264 |
+
c_w = get_spaced_text_width(credits_line, c_font, spacing=2)
|
| 1265 |
+
except Exception:
|
| 1266 |
+
c_font = ImageFont.load_default()
|
| 1267 |
+
c_w = len(credits_line) * 8
|
| 1268 |
+
cx_pos = (width - c_w) // 2
|
| 1269 |
+
cy_pos = int(height * 0.90)
|
| 1270 |
+
draw_spaced_text(draw_overlay, (cx_pos, cy_pos), credits_line, c_font, fill=(160, 160, 160, 200), spacing=2)
|
| 1271 |
+
|
| 1272 |
+
# Tagline above title
|
| 1273 |
+
tagline = "THE FUTURE OF CREATIVE ARTISTRY"
|
| 1274 |
+
try:
|
| 1275 |
+
s_font = ImageFont.truetype(sub_font_path, sub_font_size)
|
| 1276 |
+
# Make it italic if Georgia
|
| 1277 |
+
if "Georgia" in sub_font_path:
|
| 1278 |
+
s_font = ImageFont.truetype("/System/Library/Fonts/Supplemental/Georgia Italic.ttf", sub_font_size)
|
| 1279 |
+
s_w = get_spaced_text_width(tagline, s_font, spacing=3)
|
| 1280 |
+
except Exception:
|
| 1281 |
+
s_font = ImageFont.load_default()
|
| 1282 |
+
s_w = len(tagline) * 10
|
| 1283 |
+
sx = (width - s_w) // 2
|
| 1284 |
+
sy = ty - int(height * 0.06)
|
| 1285 |
+
draw_spaced_text(draw_overlay, (sx, sy), tagline, s_font, fill=(225, 225, 225, 255), spacing=3, shadow_fill=(0, 0, 0, 200))
|
| 1286 |
+
|
| 1287 |
+
# Small minimalist line
|
| 1288 |
+
line_y = (ty + sy + int(height * 0.02)) // 2
|
| 1289 |
+
line_w = int(width * 0.35)
|
| 1290 |
+
lx1 = (width - line_w) // 2
|
| 1291 |
+
lx2 = lx1 + line_w
|
| 1292 |
+
draw_overlay.line([(lx1, line_y), (lx2, line_y)], fill=(255, 255, 255, 70), width=1)
|
| 1293 |
+
|
| 1294 |
+
# Convert base image to RGBA, composite overlay, convert back to RGB
|
| 1295 |
+
img_rgba = img.convert("RGBA")
|
| 1296 |
+
composited = Image.alpha_composite(img_rgba, overlay)
|
| 1297 |
|
| 1298 |
+
return composited.convert("RGB")
|
| 1299 |
except Exception as e:
|
| 1300 |
+
print(f"[LumaForgePipeline Warning] Failed to overlay premium typography: {e}")
|
| 1301 |
return image
|
| 1302 |
|
| 1303 |
def _overlay_lumaforge_logo(self, image: Image) -> Image:
|
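Two pieces of `_overlay_poster_typography` are plain arithmetic and can be checked without Pillow: the keyword-based layout choice and the per-row colour of the vertical gradient. The sketch below reproduces both under the same rules as the diff. The function names `detect_style` and `gradient_rows` are hypothetical; the keyword lists are copied from the code above.

```python
SCIFI_WORDS = ["cyber", "neon", "retro", "hack", "system", "matrix",
               "future", "laser", "star", "cosmic", "galaxy"]
LUXURY_WORDS = ["luxury", "gold", "royal", "silent", "whisper",
                "minimal", "white", "glass", "vogue", "velvet"]


def detect_style(title):
    """Pick a layout by substring match; sci-fi keywords are checked first."""
    text = title.strip().lower()
    if any(word in text for word in SCIFI_WORDS):
        return "scifi"
    if any(word in text for word in LUXURY_WORDS):
        return "luxury"
    return "cinematic"


def gradient_rows(top_color, bottom_color, rows):
    """Linearly interpolate an RGB colour for each row, top to bottom."""
    colors = []
    for y in range(rows):
        ratio = y / rows  # stays below 1, so the last row stops short of bottom_color
        colors.append(tuple(int(t + (b - t) * ratio)
                            for t, b in zip(top_color, bottom_color)))
    return colors


styles = [detect_style(t) for t in ("Neon Nights", "Velvet Hour", "The Last Stand")]
cyan_to_blue = gradient_rows((0, 255, 255), (0, 128, 255), 4)
```

Since the match is on substrings rather than whole words, a title such as "Mustard Fields" selects the sci-fi layout because it contains "star". Matching on whole words would avoid that, at the cost of missing compounds like "cyberpunk".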
test_generation.py
ADDED
|
@@ -0,0 +1,91 @@
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Test SDXL Turbo image generation"""
|
| 3 |
+
import requests
|
| 4 |
+
import time
|
| 5 |
+
from PIL import Image
|
| 6 |
+
import io
|
| 7 |
+
import numpy as np
|
| 8 |
+
|
| 9 |
+
# Test the wizard prompt
|
| 10 |
+
prompt = "a wizard with a long white beard standing in a mystical forest"
|
| 11 |
+
print(f"🧙 Testing SDXL Turbo with prompt: '{prompt}'")
|
| 12 |
+
print("")
|
| 13 |
+
|
| 14 |
+
# Start generation session
|
| 15 |
+
print("Starting generation session...")
|
| 16 |
+
start_response = requests.post("http://localhost:7860/api/generate-session/start", json={
|
| 17 |
+
"prompt": prompt,
|
| 18 |
+
"mode": "general",
|
| 19 |
+
"aspect_ratio": "1:1",
|
| 20 |
+
"steps": 4,
|
| 21 |
+
"guidance_scale": 0.0,
|
| 22 |
+
"seed": -1,
|
| 23 |
+
"mock": False
|
| 24 |
+
})
|
| 25 |
+
|
| 26 |
+
if start_response.status_code == 200:
|
| 27 |
+
session_data = start_response.json()
|
| 28 |
+
session_id = session_data["session_id"]
|
| 29 |
+
print(f"✅ Session started: {session_id}")
|
| 30 |
+
print("")
|
| 31 |
+
|
| 32 |
+
# Poll for completion
|
| 33 |
+
print("⏳ Generating image", end="", flush=True)
|
| 34 |
+
while True:
|
| 35 |
+
status_response = requests.post("http://localhost:7860/api/generate-session/status", json={
|
| 36 |
+
"session_id": session_id
|
| 37 |
+
})
|
| 38 |
+
|
| 39 |
+
if status_response.status_code == 200:
|
| 40 |
+
status_data = status_response.json()
|
| 41 |
+
state = status_data["state"]
|
| 42 |
+
|
| 43 |
+
if state == "completed":
|
| 44 |
+
print(" ✅")
|
| 45 |
+
print("")
|
| 46 |
+
print("Generation completed!")
|
| 47 |
+
print(f" Image URL: {status_data['image_url']}")
|
| 48 |
+
print(f" Time: {status_data['latency_sec']:.1f}s")
|
| 49 |
+
print(f" Memory: {status_data['memory_used_mb']:.1f}MB")
|
| 50 |
+
print(f" Seed: {status_data['seed']}")
|
| 51 |
+
print(f" Mock: {status_data['used_mock']}")
|
| 52 |
+
print("")
|
| 53 |
+
|
| 54 |
+
# Check if image is not blank
|
| 55 |
+
img_response = requests.get(f"http://localhost:7860{status_data['image_url']}")
|
| 56 |
+
if img_response.status_code == 200:
|
| 57 |
+
img = Image.open(io.BytesIO(img_response.content))
|
| 58 |
+
img_array = np.array(img)
|
| 59 |
+
|
| 60 |
+
# Check if image is blank (all black or all same color)
|
| 61 |
+
is_blank = (img_array.std() < 5)
|
| 62 |
+
mean_brightness = img_array.mean()
|
| 63 |
+
|
| 64 |
+
if is_blank:
|
| 65 |
+
print("❌ WARNING: Image appears to be BLANK/BLACK!")
|
| 66 |
+
print(f" Mean brightness: {mean_brightness:.1f}/255")
|
| 67 |
+
print(f" Std deviation: {img_array.std():.1f}")
|
| 68 |
+
print("")
|
| 69 |
+
print("The upcast_vae fix may not have worked. Check backend logs.")
|
| 70 |
+
else:
|
| 71 |
+
print("✅ SUCCESS! Image looks good (Not blank)")
|
| 72 |
+
print(f" Mean brightness: {mean_brightness:.1f}/255")
|
| 73 |
+
print(f" Std deviation: {img_array.std():.1f}")
|
| 74 |
+
print(f" Image size: {img.size}")
|
| 75 |
+
print("")
|
| 76 |
+
print(f"🎨 View your image at: http://localhost:3000")
|
| 77 |
+
|
| 78 |
+
break
|
| 79 |
+
elif state == "failed":
|
| 80 |
+
print(" ❌")
|
| 81 |
+
print(f"Generation failed: {status_data.get('error', 'Unknown error')}")
|
| 82 |
+
break
|
| 83 |
+
else:  # any non-terminal state (e.g. "generating"): wait before polling again instead of spinning
|
| 84 |
+
print(".", end="", flush=True)
|
| 85 |
+
time.sleep(1)
|
| 86 |
+
else:
|
| 87 |
+
print(f"Status check failed: {status_response.status_code}")
|
| 88 |
+
break
|
| 89 |
+
else:
|
| 90 |
+
print(f"❌ Failed to start session: {start_response.status_code}")
|
| 91 |
+
print(start_response.text)
|
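The polling loop in `test_generation.py` runs until the server reports `completed` or `failed`, with no upper bound on how long it waits. One way to bound it is a deadline check, sketched below. The function `wait_for_session` and its parameters are hypothetical; `get_status` stands for whatever callable returns the decoded status JSON, and `sleep` and `clock` are injectable only so the logic can be exercised without a running server.

```python
import time


def wait_for_session(get_status, timeout_sec=120.0, interval_sec=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the session reaches a terminal state or the deadline passes."""
    deadline = clock() + timeout_sec
    while True:
        status = get_status()
        state = status.get("state")
        if state in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"session still '{state}' after {timeout_sec}s")
        sleep(interval_sec)  # every non-terminal state waits before the next poll


# Demonstration with a fake status source instead of HTTP requests.
fake_states = iter([
    {"state": "generating"},
    {"state": "generating"},
    {"state": "completed", "seed": 7},
])
naps = []
final_status = wait_for_session(lambda: next(fake_states), sleep=naps.append)
```

In the real script, `get_status` would wrap the `requests.post(...)` call to `/api/generate-session/status` and return `status_response.json()`.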