Olivia committed on
Commit d3d412a · 1 Parent(s): fac30cc

info endpoint

Files changed (2):
  1. README.md +89 -0
  2. app.py +53 -49
README.md CHANGED
@@ -125,6 +125,95 @@ Monitor and compare inference performance across backends.
 
 ---
 
+## Deep Dive: New AI Features 🆕
+
+### AI-Powered Segmentation (U²-Net)
+
+**Overview**: StyleForge now uses the U²-Net (nested two-level U-structure) deep learning model for automatic foreground/background segmentation. This eliminates the need for manual masking when applying different styles to specific image regions.
+
+#### How U²-Net Works
+
+```
+Input Image (any size)
+
+┌────────────────────────────────────┐
+│ Encoder (U-Net style)              │
+│  - Extracts multi-scale features   │
+│  - 6 encoder stages                │
+│  - Deep supervision paths          │
+├────────────────────────────────────┤
+│ Decoder                            │
+│  - Reconstructs segmentation mask  │
+│  - Salient object detection        │
+└────────────────────────────────────┘
+
+Binary Mask (256 levels)
+
+Foreground (white) / Background (black)
+```
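The mask-based workflow above can be sketched end to end: given one stylized rendering for the foreground and one for the background, the 256-level mask acts as a per-pixel alpha. (Illustrative NumPy sketch; `composite_by_mask` is not a function from this repo.)

```python
import numpy as np

def composite_by_mask(fg_styled: np.ndarray,
                      bg_styled: np.ndarray,
                      mask: np.ndarray) -> np.ndarray:
    """Blend two stylized renderings of the same image using a
    grayscale segmentation mask (255 = foreground, 0 = background)."""
    # Normalize the 256-level mask to [0, 1] and add a channel axis
    # so it broadcasts over the RGB channels.
    alpha = (mask.astype(np.float32) / 255.0)[..., None]
    blended = alpha * fg_styled + (1.0 - alpha) * bg_styled
    return blended.astype(np.uint8)

# Toy example: 2x2 "images" with a hard top/bottom mask.
fg = np.full((2, 2, 3), 200, dtype=np.uint8)
bg = np.full((2, 2, 3), 50, dtype=np.uint8)
mask = np.array([[255, 255], [0, 0]], dtype=np.uint8)
out = composite_by_mask(fg, bg, mask)
```

Because the mask is continuous rather than strictly binary, soft edges (hair, fur) blend smoothly between the two styles instead of producing a hard seam.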
+
+**Technical Details**:
+- **Architecture**: U²-Net with a deep encoder-decoder structure
+- **Input**: RGB image of any size
+- **Output**: Grayscale mask where white = foreground, black = background
+- **Model Size**: ~176 MB pre-trained weights
+- **Inference Time**: ~200-500 ms per image (CPU), ~50-100 ms (GPU)
+
+**Why U²-Net?**
+- Trained on 20,000+ images with diverse subjects
+- Excellent at detecting humans, animals, objects, and products
+- Handles complex backgrounds and edges
+- Works without requiring bounding boxes or user input
+
+**Use Cases**:
+- **Portrait Photography**: Style the subject differently from the background
+- **Product Photography**: Apply artistic effects to products while keeping clean backgrounds
+- **Creative Composites**: Apply different artistic styles to foreground vs. background
+
+#### Gram Matrices: Representing Style
+
+The Gram matrix is computed from the feature activations:
+
+```
+F = feature map of shape (C, H, W), flattened to (C, H*W)
+Gram(F)[i,j] = Σ_k F[i,k] ⋅ F[j,k]
+```
+
+This captures:
+- **Texture information**: How features correlate spatially
+- **Color patterns**: Which colors appear together
+- **Brush strokes**: Directionality and scale of textures
+- **Style signature**: A unique fingerprint of the artistic style
+
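In code, the Gram computation is a single matrix product over the flattened spatial dimensions (a NumPy sketch for illustration; the app computes this on VGG feature tensors in PyTorch):

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (C, H, W) feature map.

    Flattens the spatial dims to (C, H*W), then takes F @ F.T so that
    entry [i, j] is the correlation between channels i and j.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T  # shape (C, C)

feats = np.random.rand(4, 8, 8).astype(np.float32)
g = gram_matrix(feats)
```

Style loss then compares Gram matrices of the generated and style images, typically normalized by C·H·W so the loss is independent of image size.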
+#### Fine-Tuning Process
+
+The system fine-tunes a pre-trained Fast Style Transfer model:
+
+1. **Load base model** (e.g., Udnie style)
+2. **Freeze early layers** (preserve low-level transformations)
+3. **Train on style loss** using the extracted Gram matrices
+4. **Iterate** with Adam optimizer (lr=0.001)
+5. **Save** as a reusable `.pth` file
+
+```
+Base Model  →  Extracted Style Features  →  Fine-tuned Model
+    ↓                    ↓                        ↓
+  Udnie            Starry Night         Custom "Starry Udnie"
+```
+
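The objective in step 3 can be illustrated without the full network: minimize the squared distance between the Gram matrix of some features and a target Gram by gradient descent. (Toy NumPy sketch with a hand-derived gradient; the app optimizes the real TransformerNet with Adam instead.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Target style signature: Gram matrix of some "style features"
# (C=4 channels, flattened spatial size 16).
style_f = rng.standard_normal((4, 16))
target_gram = style_f @ style_f.T

# Features we optimize so their Gram matches the style signature.
x = rng.standard_normal((4, 16))

def style_loss(x):
    diff = x @ x.T - target_gram
    return (diff ** 2).sum()

losses = []
lr = 1e-3  # same order of magnitude as the app's Adam learning rate
for _ in range(200):
    diff = x @ x.T - target_gram
    grad = 4.0 * diff @ x  # analytic gradient of ||X X^T - G||^2
    x -= lr * grad
    losses.append(style_loss(x))
```

More iterations drive the loss lower, which mirrors why the README notes that more training iterations give closer style matching.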
+**Training Time**:
+- 100 iterations: ~30-60 seconds (GPU)
+- 200 iterations: ~60-120 seconds (GPU)
+- More iterations = better style matching
+
+**Why VGG19?**
+- Pre-trained on ImageNet (1M+ images)
+- Learned rich feature representations
+- Standard in style transfer research (Gatys et al., Johnson et al.)
+- Captures both low-level (textures) and high-level (patterns) features
+
+---
+
 ## Technical Details
 
 ### Architecture
app.py CHANGED
@@ -70,46 +70,51 @@ except ImportError:
 # Device will be determined when needed within GPU tasks
 _SPACES_ZERO_GPU = SPACES_AVAILABLE  # From spaces import above
 
-# Create a device proxy that works like torch.device but lazy-loads on ZeroGPU
-class _DeviceProxy:
-    """Proxy for torch.device that lazy-loads CUDA on ZeroGPU"""
-
-    def __init__(self):
-        self._device = None
-
-    @property
-    def type(self):
-        self._ensure_device()
-        return self._device.type
+# Lazy device initialization for ZeroGPU compatibility
+_device_cache = None
+
+
+def get_device():
+    """
+    Get the current device (lazy-loaded on ZeroGPU).
+
+    On ZeroGPU, this must be called within a GPU task context to properly
+    initialize CUDA. Calling this at module level will cause errors.
+    """
+    global _device_cache
+    if _device_cache is None:
+        if torch.cuda.is_available():
+            _device_cache = torch.device('cuda')
+        else:
+            _device_cache = torch.device('cpu')
+    return _device_cache
+
+
+# For backwards compatibility, keep DEVICE as a property
+class _DeviceProperty:
+    """Property that returns the actual device when accessed."""
 
     def __str__(self):
-        self._ensure_device()
-        return str(self._device)
+        return str(get_device())
 
     def __repr__(self):
-        self._ensure_device()
-        return repr(self._device)
-
-    def _ensure_device(self):
-        """Lazy device initialization - only calls torch.cuda.is_available() when needed"""
-        if self._device is None:
-            if torch.cuda.is_available():
-                self._device = torch.device('cuda')
-            else:
-                self._device = torch.device('cpu')
+        return repr(get_device())
+
+    @property
+    def type(self):
+        return get_device().type
 
     def __eq__(self, other):
-        return str(self) == str(other)
+        return str(get_device()) == str(other)
 
 
-DEVICE = _DeviceProxy()
+DEVICE = _DeviceProperty()
 
 if _SPACES_ZERO_GPU:
     print(f"Device: Will use CUDA within GPU tasks (ZeroGPU mode)")
 else:
     # Only access device if not ZeroGPU to avoid CUDA init
-    DEVICE._ensure_device()
-    print(f"Device: {DEVICE}")
+    print(f"Device: {get_device()}")
 if SPACES_AVAILABLE:
     print("ZeroGPU support enabled")
 
@@ -287,7 +292,7 @@ def get_vgg_extractor():
     """Lazy load VGG feature extractor (with ZeroGPU support)"""
     global _vgg_extractor
     if _vgg_extractor is None:
-        _vgg_extractor = VGGFeatureExtractor().to(DEVICE)
+        _vgg_extractor = VGGFeatureExtractor().to(get_device())
     _vgg_extractor.eval()
     return _vgg_extractor
 
@@ -556,7 +561,7 @@ def load_model(style: str, backend: str = 'auto') -> TransformerNet:
     print(f"Loading {style} model with {backend} backend...")
     model_path = get_model_path(style)
 
-    model = TransformerNet(num_residual_blocks=5, backend=backend).to(DEVICE)
+    model = TransformerNet(num_residual_blocks=5, backend=backend).to(get_device())
     model.load_checkpoint(str(model_path))
     model.eval()
 
@@ -573,8 +578,7 @@ print("=" * 50)
 if _SPACES_ZERO_GPU:
     print("Device: CUDA (ZeroGPU mode - lazy initialization)")
 else:
-    DEVICE._ensure_device()
-    print(f"Device: {DEVICE.type.upper()}")
+    print(f"Device: {get_device().type.upper()}")
 print(f"CUDA Kernels: {'Available' if CUDA_KERNELS_AVAILABLE else 'Not Available (will compile on first GPU task)'}")
 
 # Skip model preloading on ZeroGPU to avoid CUDA init in main process
@@ -613,7 +617,7 @@ def blend_models(style1: str, style2: str, alpha: float, backend: str = 'auto')
     model2 = load_model(style2, backend)
 
     # Create new model
-    blended = TransformerNet(num_residual_blocks=5, backend=backend).to(DEVICE)
+    blended = TransformerNet(num_residual_blocks=5, backend=backend).to(get_device())
     blended.eval()
 
     # Blend weights
@@ -686,12 +690,12 @@ def apply_region_style(
     # Preprocess
     import torchvision.transforms as transforms
     transform = transforms.Compose([transforms.ToTensor()])
-    img_tensor = transform(image).unsqueeze(0).to(DEVICE)
+    img_tensor = transform(image).unsqueeze(0).to(get_device())
 
     # Convert mask to tensor
     mask_np = np.array(mask)
     mask_tensor = torch.from_numpy(mask_np).float() / 255.0
-    mask_tensor = mask_tensor.unsqueeze(0).unsqueeze(0).to(DEVICE)
+    mask_tensor = mask_tensor.unsqueeze(0).unsqueeze(0).to(get_device())
 
     # Stylize with both models
     with torch.no_grad():
@@ -917,7 +921,7 @@ def train_custom_style(
         transforms.ToTensor(),
         transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
     ])
-    style_tensor = style_transform(style_image).unsqueeze(0).to(DEVICE)
+    style_tensor = style_transform(style_image).unsqueeze(0).to(get_device())
 
     # Extract style features from multiple layers
     with torch.no_grad():
@@ -947,7 +951,7 @@ def train_custom_style(
             g = int(255 * x / 256)
             content_img.putpixel((x, y), (r, g, 128))
 
-    content_tensor = style_transform(content_img).unsqueeze(0).to(DEVICE)
+    content_tensor = style_transform(content_img).unsqueeze(0).to(get_device())
 
     # Training loop
     model.train()
@@ -1054,7 +1058,7 @@ def extract_style_from_image(
     ])
 
     # Process style image
-    style_tensor = transform(style_image).unsqueeze(0).to(DEVICE)
+    style_tensor = transform(style_image).unsqueeze(0).to(get_device())
 
     # Extract style features
     with torch.no_grad():
@@ -1066,7 +1070,7 @@ def extract_style_from_image(
     progress.append("Style features extracted. Creating style model...")
 
     # Create a new model and train it to match the style
-    model = TransformerNet(num_residual_blocks=5, backend='auto').to(DEVICE)
+    model = TransformerNet(num_residual_blocks=5, backend='auto').to(get_device())
 
     # Use a simple content image for training the transform
     if content_image is None:
@@ -1077,7 +1081,7 @@ def extract_style_from_image(
             content_image.putpixel((x, y), (x, y, 128))
 
     content_image = content_image.convert('RGB')
-    content_tensor = transform(content_image).unsqueeze(0).to(DEVICE)
+    content_tensor = transform(content_image).unsqueeze(0).to(get_device())
 
     # Extract content features
     with torch.no_grad():
@@ -1288,14 +1292,14 @@ def create_benchmark_comparison(style: str) -> str:
     for backend_name, backend_key in [('PyTorch', 'pytorch'), ('CUDA Kernels', 'cuda')]:
         try:
             model = load_model(style, backend_key)
-            test_tensor = preprocess_image(test_img).to(DEVICE)
+            test_tensor = preprocess_image(test_img).to(get_device())
 
             times = []
             for _ in range(3):
                 start = time.perf_counter()
                 with torch.no_grad():
                     _ = model(test_tensor)
-                if DEVICE.type == 'cuda':
+                if get_device().type == 'cuda':
                     torch.cuda.synchronize()
                 times.append((time.perf_counter() - start) * 1000)
 
@@ -1397,7 +1401,7 @@ def stylize_image_impl(
     style_display = STYLES.get(style, style)
 
     # Preprocess
-    input_tensor = preprocess_image(input_image).to(DEVICE)
+    input_tensor = preprocess_image(input_image).to(get_device())
 
     # Stylize with timing
     start = time.perf_counter()
@@ -1405,7 +1409,7 @@ def stylize_image_impl(
     with torch.no_grad():
         output_tensor = model(input_tensor)
 
-    if DEVICE.type == 'cuda':
+    if get_device().type == 'cuda':
         torch.cuda.synchronize()
 
     elapsed_ms = (time.perf_counter() - start) * 1000
@@ -1452,7 +1456,7 @@ def stylize_image_impl(
     | **Avg Time** | {stats['avg_ms']:.1f if stats else elapsed_ms:.1f} ms |
     | **Total Images** | {stats['total_inferences'] if stats else 1} |
    | **Size** | {width}x{height} |
-    | **Device** | {DEVICE.type.upper()} |
+    | **Device** | {get_device().type.upper()} |
 
     ---
     {perf_tracker.get_comparison()}
@@ -1514,12 +1518,12 @@ def process_webcam_frame(image: Image.Image, style: str, backend: str) -> Image.
     else:
         model = load_model(style, backend)
 
-    input_tensor = preprocess_image(image).to(DEVICE)
+    input_tensor = preprocess_image(image).to(get_device())
 
     with torch.no_grad():
        output_tensor = model(input_tensor)
 
-    if DEVICE.type == 'cuda':
+    if get_device().type == 'cuda':
        torch.cuda.synchronize()
 
     output_image = postprocess_tensor(output_tensor.cpu())
@@ -1639,14 +1643,14 @@ def run_backend_comparison(style: str) -> str:
     # Test PyTorch backend
     try:
         model = load_model(style, 'pytorch')
-        test_tensor = preprocess_image(test_img).to(DEVICE)
+        test_tensor = preprocess_image(test_img).to(get_device())
 
         times = []
        for _ in range(5):
             start = time.perf_counter()
             with torch.no_grad():
                 _ = model(test_tensor)
-            if DEVICE.type == 'cuda':
+            if get_device().type == 'cuda':
                 torch.cuda.synchronize()
             times.append((time.perf_counter() - start) * 1000)
 
@@ -1657,14 +1661,14 @@ def run_backend_comparison(style: str) -> str:
     # Test CUDA backend
     try:
         model = load_model(style, 'cuda')
-        test_tensor = preprocess_image(test_img).to(DEVICE)
+        test_tensor = preprocess_image(test_img).to(get_device())
 
         times = []
         for _ in range(5):
             start = time.perf_counter()
             with torch.no_grad():
                 _ = model(test_tensor)
-            if DEVICE.type == 'cuda':
+            if get_device().type == 'cuda':
                 torch.cuda.synchronize()
             times.append((time.perf_counter() - start) * 1000)
 
@@ -1711,7 +1715,7 @@ def create_style_blend_output(
     model = get_blended_model(style1, style2, alpha, backend)
 
     # Process
-    input_tensor = preprocess_image(input_image).to(DEVICE)
+    input_tensor = preprocess_image(input_image).to(get_device())
 
     with torch.no_grad():
         output_tensor = model(input_tensor)
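The `DEVICE` → `get_device()` change follows a general lazy-singleton pattern: defer an expensive or context-sensitive probe until first use, then cache the result. Stripped of torch, the pattern looks like this (illustrative names only; the real code probes `torch.cuda.is_available()` inside a ZeroGPU task):

```python
_cache = None

def probe_backend() -> str:
    """Stand-in for the expensive check (torch.cuda.is_available() in the app)."""
    return "cpu"

def get_backend() -> str:
    """Resolve the backend on first call and cache it afterwards."""
    global _cache
    if _cache is None:
        _cache = probe_backend()
    return _cache

class _BackendProxy:
    """Backwards-compatible object that resolves lazily on access,
    mirroring the diff's _DeviceProperty wrapper around get_device()."""
    def __str__(self):
        return get_backend()
    def __eq__(self, other):
        return get_backend() == str(other)

# Old call sites can keep comparing against this module-level name;
# importing the module triggers no probe.
BACKEND = _BackendProxy()
```

The key property is that module import stays side-effect free: the probe only runs when `get_backend()` (or the proxy) is first touched, which is exactly what ZeroGPU requires of CUDA initialization.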