Olivia committed on
Commit d3d412a · 1 Parent(s): fac30cc

info endpoint

Files changed (2):
  1. README.md +89 -0
  2. app.py +53 -49
README.md CHANGED
@@ -125,6 +125,95 @@ Monitor and compare inference performance across backends.
 
 ---
 
+## Deep Dive: New AI Features 🆕
+
+### AI-Powered Segmentation (U²-Net)
+
+**Overview**: StyleForge now uses the U²-Net (nested two-level U-structure) deep learning model for automatic foreground/background segmentation. This eliminates the need for manual masking when applying different styles to specific image regions.
+
+#### How U²-Net Works
+
+```
+Input Image (any size)
+
+┌────────────────────────────────────┐
+│ Encoder (U-Net style)              │
+│  - Extracts multi-scale features   │
+│  - 6 encoder stages                │
+│  - Deep supervision paths          │
+├────────────────────────────────────┤
+│ Decoder                            │
+│  - Reconstructs segmentation mask  │
+│  - Salient object detection        │
+└────────────────────────────────────┘
+
+Binary Mask (256 levels)
+
+Foreground (white) / Background (black)
+```
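The mask-based workflow above can be sketched end to end: given one stylized rendering for the foreground and one for the background, the 256-level mask acts as a per-pixel alpha. (Illustrative NumPy sketch; `composite_by_mask` is not a function from this repo.)

```python
import numpy as np

def composite_by_mask(fg_styled: np.ndarray,
                      bg_styled: np.ndarray,
                      mask: np.ndarray) -> np.ndarray:
    """Blend two stylized renderings of the same image using a
    grayscale segmentation mask (255 = foreground, 0 = background)."""
    # Normalize the 256-level mask to [0, 1] and add a channel axis
    # so it broadcasts over the RGB channels.
    alpha = (mask.astype(np.float32) / 255.0)[..., None]
    blended = alpha * fg_styled + (1.0 - alpha) * bg_styled
    return blended.astype(np.uint8)

# Toy example: 2x2 "images" with a hard top/bottom mask.
fg = np.full((2, 2, 3), 200, dtype=np.uint8)
bg = np.full((2, 2, 3), 50, dtype=np.uint8)
mask = np.array([[255, 255], [0, 0]], dtype=np.uint8)
out = composite_by_mask(fg, bg, mask)
```

Because the mask is continuous rather than strictly binary, soft edges (hair, fur) blend smoothly between the two styles instead of producing a hard seam.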
+
+**Technical Details**:
+- **Architecture**: U²-Net with a deep encoder-decoder structure
+- **Input**: RGB image of any size
+- **Output**: Grayscale mask where white = foreground, black = background
+- **Model Size**: ~176 MB pre-trained weights
+- **Inference Time**: ~200-500 ms per image (CPU), ~50-100 ms (GPU)
+
+**Why U²-Net?**
+- Trained on 20,000+ images with diverse subjects
+- Excellent at detecting humans, animals, objects, and products
+- Handles complex backgrounds and edges
+- Works without requiring bounding boxes or user input
+
+**Use Cases**:
+- **Portrait Photography**: Style the subject differently from the background
+- **Product Photography**: Apply artistic effects to products while keeping clean backgrounds
+- **Creative Composites**: Apply different artistic styles to foreground vs. background
+
+#### Gram Matrices: Representing Style
+
+The Gram matrix is computed from the feature activations:
+
+```
+F = feature map of shape (C, H, W), flattened to (C, H*W)
+Gram(F)[i,j] = Σ_k F[i,k] ⋅ F[j,k]
+```
+
+This captures:
+- **Texture information**: How features correlate spatially
+- **Color patterns**: Which colors appear together
+- **Brush strokes**: Directionality and scale of textures
+- **Style signature**: A unique fingerprint of the artistic style
+
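In code, the Gram computation is a single matrix product over the flattened spatial dimensions (a NumPy sketch for illustration; the app computes this on VGG feature tensors in PyTorch):

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (C, H, W) feature map.

    Flattens the spatial dims to (C, H*W), then takes F @ F.T so that
    entry [i, j] is the correlation between channels i and j.
    """
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T  # shape (C, C)

feats = np.random.rand(4, 8, 8).astype(np.float32)
g = gram_matrix(feats)
```

Style loss then compares Gram matrices of the generated and style images, typically normalized by C·H·W so the loss is independent of image size.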
+#### Fine-Tuning Process
+
+The system fine-tunes a pre-trained Fast Style Transfer model:
+
+1. **Load base model** (e.g., Udnie style)
+2. **Freeze early layers** (preserve low-level transformations)
+3. **Train on style loss** using the extracted Gram matrices
+4. **Iterate** with Adam optimizer (lr=0.001)
+5. **Save** as a reusable `.pth` file
+
+```
+Base Model  →  Extracted Style Features  →  Fine-tuned Model
+    ↓                    ↓                        ↓
+  Udnie            Starry Night         Custom "Starry Udnie"
+```
+
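The objective in step 3 can be illustrated without the full network: minimize the squared distance between the Gram matrix of some features and a target Gram by gradient descent. (Toy NumPy sketch with a hand-derived gradient; the app optimizes the real TransformerNet with Adam instead.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Target style signature: Gram matrix of some "style features"
# (C=4 channels, flattened spatial size 16).
style_f = rng.standard_normal((4, 16))
target_gram = style_f @ style_f.T

# Features we optimize so their Gram matches the style signature.
x = rng.standard_normal((4, 16))

def style_loss(x):
    diff = x @ x.T - target_gram
    return (diff ** 2).sum()

losses = []
lr = 1e-3  # same order of magnitude as the app's Adam learning rate
for _ in range(200):
    diff = x @ x.T - target_gram
    grad = 4.0 * diff @ x  # analytic gradient of ||X X^T - G||^2
    x -= lr * grad
    losses.append(style_loss(x))
```

More iterations drive the loss lower, which mirrors why the README notes that more training iterations give closer style matching.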
+**Training Time**:
+- 100 iterations: ~30-60 seconds (GPU)
+- 200 iterations: ~60-120 seconds (GPU)
+- More iterations = better style matching
+
+**Why VGG19?**
+- Pre-trained on ImageNet (1M+ images)
+- Learned rich feature representations
+- Standard in style transfer research (Gatys et al., Johnson et al.)
+- Captures both low-level (textures) and high-level (patterns) features
+
+---
+
 ## Technical Details
 
 ### Architecture
app.py CHANGED
@@ -70,46 +70,51 @@ except ImportError:
 # Device will be determined when needed within GPU tasks
 _SPACES_ZERO_GPU = SPACES_AVAILABLE  # From spaces import above
 
-# Create a device proxy that works like torch.device but lazy-loads on ZeroGPU
-class _DeviceProxy:
-    """Proxy for torch.device that lazy-loads CUDA on ZeroGPU"""
-
-    def __init__(self):
-        self._device = None
-
-    @property
-    def type(self):
-        self._ensure_device()
-        return self._device.type
+# Lazy device initialization for ZeroGPU compatibility
+_device_cache = None
+
+
+def get_device():
+    """
+    Get the current device (lazy-loaded on ZeroGPU).
+
+    On ZeroGPU, this must be called within a GPU task context to properly
+    initialize CUDA. Calling this at module level will cause errors.
+    """
+    global _device_cache
+    if _device_cache is None:
+        if torch.cuda.is_available():
+            _device_cache = torch.device('cuda')
+        else:
+            _device_cache = torch.device('cpu')
+    return _device_cache
+
+
+# For backwards compatibility, keep DEVICE as a property
+class _DeviceProperty:
+    """Property that returns the actual device when accessed."""
 
     def __str__(self):
-        self._ensure_device()
-        return str(self._device)
+        return str(get_device())
 
     def __repr__(self):
-        self._ensure_device()
-        return repr(self._device)
-
-    def _ensure_device(self):
-        """Lazy device initialization - only calls torch.cuda.is_available() when needed"""
-        if self._device is None:
-            if torch.cuda.is_available():
-                self._device = torch.device('cuda')
-            else:
-                self._device = torch.device('cpu')
+        return repr(get_device())
+
+    @property
+    def type(self):
+        return get_device().type
 
     def __eq__(self, other):
-        return str(self) == str(other)
+        return str(get_device()) == str(other)
 
 
-DEVICE = _DeviceProxy()
+DEVICE = _DeviceProperty()
 
 if _SPACES_ZERO_GPU:
     print(f"Device: Will use CUDA within GPU tasks (ZeroGPU mode)")
 else:
     # Only access device if not ZeroGPU to avoid CUDA init
-    DEVICE._ensure_device()
-    print(f"Device: {DEVICE}")
+    print(f"Device: {get_device()}")
 if SPACES_AVAILABLE:
     print("ZeroGPU support enabled")
 
@@ -287,7 +292,7 @@ def get_vgg_extractor():
     """Lazy load VGG feature extractor (with ZeroGPU support)"""
     global _vgg_extractor
     if _vgg_extractor is None:
-        _vgg_extractor = VGGFeatureExtractor().to(DEVICE)
+        _vgg_extractor = VGGFeatureExtractor().to(get_device())
     _vgg_extractor.eval()
     return _vgg_extractor
 
@@ -556,7 +561,7 @@ def load_model(style: str, backend: str = 'auto') -> TransformerNet:
     print(f"Loading {style} model with {backend} backend...")
     model_path = get_model_path(style)
 
-    model = TransformerNet(num_residual_blocks=5, backend=backend).to(DEVICE)
+    model = TransformerNet(num_residual_blocks=5, backend=backend).to(get_device())
     model.load_checkpoint(str(model_path))
     model.eval()
 
@@ -573,8 +578,7 @@ print("=" * 50)
 if _SPACES_ZERO_GPU:
     print("Device: CUDA (ZeroGPU mode - lazy initialization)")
 else:
-    DEVICE._ensure_device()
-    print(f"Device: {DEVICE.type.upper()}")
+    print(f"Device: {get_device().type.upper()}")
 print(f"CUDA Kernels: {'Available' if CUDA_KERNELS_AVAILABLE else 'Not Available (will compile on first GPU task)'}")
 
 # Skip model preloading on ZeroGPU to avoid CUDA init in main process
@@ -613,7 +617,7 @@ def blend_models(style1: str, style2: str, alpha: float, backend: str = 'auto')
     model2 = load_model(style2, backend)
 
     # Create new model
-    blended = TransformerNet(num_residual_blocks=5, backend=backend).to(DEVICE)
+    blended = TransformerNet(num_residual_blocks=5, backend=backend).to(get_device())
     blended.eval()
 
     # Blend weights
@@ -686,12 +690,12 @@ def apply_region_style(
     # Preprocess
     import torchvision.transforms as transforms
     transform = transforms.Compose([transforms.ToTensor()])
-    img_tensor = transform(image).unsqueeze(0).to(DEVICE)
+    img_tensor = transform(image).unsqueeze(0).to(get_device())
 
     # Convert mask to tensor
     mask_np = np.array(mask)
     mask_tensor = torch.from_numpy(mask_np).float() / 255.0
-    mask_tensor = mask_tensor.unsqueeze(0).unsqueeze(0).to(DEVICE)
+    mask_tensor = mask_tensor.unsqueeze(0).unsqueeze(0).to(get_device())
 
     # Stylize with both models
     with torch.no_grad():
@@ -917,7 +921,7 @@ def train_custom_style(
         transforms.ToTensor(),
         transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
     ])
-    style_tensor = style_transform(style_image).unsqueeze(0).to(DEVICE)
+    style_tensor = style_transform(style_image).unsqueeze(0).to(get_device())
 
     # Extract style features from multiple layers
     with torch.no_grad():
@@ -947,7 +951,7 @@ def train_custom_style(
             g = int(255 * x / 256)
             content_img.putpixel((x, y), (r, g, 128))
 
-    content_tensor = style_transform(content_img).unsqueeze(0).to(DEVICE)
+    content_tensor = style_transform(content_img).unsqueeze(0).to(get_device())
 
     # Training loop
     model.train()
@@ -1054,7 +1058,7 @@ def extract_style_from_image(
     ])
 
     # Process style image
-    style_tensor = transform(style_image).unsqueeze(0).to(DEVICE)
+    style_tensor = transform(style_image).unsqueeze(0).to(get_device())
 
     # Extract style features
     with torch.no_grad():
@@ -1066,7 +1070,7 @@ def extract_style_from_image(
     progress.append("Style features extracted. Creating style model...")
 
     # Create a new model and train it to match the style
-    model = TransformerNet(num_residual_blocks=5, backend='auto').to(DEVICE)
+    model = TransformerNet(num_residual_blocks=5, backend='auto').to(get_device())
 
     # Use a simple content image for training the transform
     if content_image is None:
@@ -1077,7 +1081,7 @@ def extract_style_from_image(
             content_image.putpixel((x, y), (x, y, 128))
 
     content_image = content_image.convert('RGB')
-    content_tensor = transform(content_image).unsqueeze(0).to(DEVICE)
+    content_tensor = transform(content_image).unsqueeze(0).to(get_device())
 
     # Extract content features
     with torch.no_grad():
@@ -1288,14 +1292,14 @@ def create_benchmark_comparison(style: str) -> str:
     for backend_name, backend_key in [('PyTorch', 'pytorch'), ('CUDA Kernels', 'cuda')]:
         try:
             model = load_model(style, backend_key)
-            test_tensor = preprocess_image(test_img).to(DEVICE)
+            test_tensor = preprocess_image(test_img).to(get_device())
 
             times = []
             for _ in range(3):
                 start = time.perf_counter()
                 with torch.no_grad():
                     _ = model(test_tensor)
-                if DEVICE.type == 'cuda':
+                if get_device().type == 'cuda':
                     torch.cuda.synchronize()
                 times.append((time.perf_counter() - start) * 1000)
 
@@ -1397,7 +1401,7 @@ def stylize_image_impl(
     style_display = STYLES.get(style, style)
 
     # Preprocess
-    input_tensor = preprocess_image(input_image).to(DEVICE)
+    input_tensor = preprocess_image(input_image).to(get_device())
 
     # Stylize with timing
     start = time.perf_counter()
@@ -1405,7 +1409,7 @@ def stylize_image_impl(
     with torch.no_grad():
         output_tensor = model(input_tensor)
 
-    if DEVICE.type == 'cuda':
+    if get_device().type == 'cuda':
         torch.cuda.synchronize()
 
     elapsed_ms = (time.perf_counter() - start) * 1000
@@ -1452,7 +1456,7 @@ def stylize_image_impl(
     | **Avg Time** | {stats['avg_ms']:.1f if stats else elapsed_ms:.1f} ms |
     | **Total Images** | {stats['total_inferences'] if stats else 1} |
    | **Size** | {width}x{height} |
-    | **Device** | {DEVICE.type.upper()} |
+    | **Device** | {get_device().type.upper()} |
 
     ---
     {perf_tracker.get_comparison()}
@@ -1514,12 +1518,12 @@ def process_webcam_frame(image: Image.Image, style: str, backend: str) -> Image.
     else:
         model = load_model(style, backend)
 
-    input_tensor = preprocess_image(image).to(DEVICE)
+    input_tensor = preprocess_image(image).to(get_device())
 
     with torch.no_grad():
        output_tensor = model(input_tensor)
 
-    if DEVICE.type == 'cuda':
+    if get_device().type == 'cuda':
        torch.cuda.synchronize()
 
     output_image = postprocess_tensor(output_tensor.cpu())
@@ -1639,14 +1643,14 @@ def run_backend_comparison(style: str) -> str:
     # Test PyTorch backend
     try:
         model = load_model(style, 'pytorch')
-        test_tensor = preprocess_image(test_img).to(DEVICE)
+        test_tensor = preprocess_image(test_img).to(get_device())
 
         times = []
        for _ in range(5):
             start = time.perf_counter()
             with torch.no_grad():
                 _ = model(test_tensor)
-            if DEVICE.type == 'cuda':
+            if get_device().type == 'cuda':
                 torch.cuda.synchronize()
             times.append((time.perf_counter() - start) * 1000)
 
@@ -1657,14 +1661,14 @@ def run_backend_comparison(style: str) -> str:
     # Test CUDA backend
     try:
         model = load_model(style, 'cuda')
-        test_tensor = preprocess_image(test_img).to(DEVICE)
+        test_tensor = preprocess_image(test_img).to(get_device())
 
         times = []
         for _ in range(5):
             start = time.perf_counter()
             with torch.no_grad():
                 _ = model(test_tensor)
-            if DEVICE.type == 'cuda':
+            if get_device().type == 'cuda':
                 torch.cuda.synchronize()
             times.append((time.perf_counter() - start) * 1000)
 
@@ -1711,7 +1715,7 @@ def create_style_blend_output(
     model = get_blended_model(style1, style2, alpha, backend)
 
     # Process
-    input_tensor = preprocess_image(input_image).to(DEVICE)
+    input_tensor = preprocess_image(input_image).to(get_device())
 
     with torch.no_grad():
         output_tensor = model(input_tensor)
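The `DEVICE` → `get_device()` change follows a general lazy-singleton pattern: defer an expensive or context-sensitive probe until first use, then cache the result. Stripped of torch, the pattern looks like this (illustrative names only; the real code probes `torch.cuda.is_available()` inside a ZeroGPU task):

```python
_cache = None

def probe_backend() -> str:
    """Stand-in for the expensive check (torch.cuda.is_available() in the app)."""
    return "cpu"

def get_backend() -> str:
    """Resolve the backend on first call and cache it afterwards."""
    global _cache
    if _cache is None:
        _cache = probe_backend()
    return _cache

class _BackendProxy:
    """Backwards-compatible object that resolves lazily on access,
    mirroring the diff's _DeviceProperty wrapper around get_device()."""
    def __str__(self):
        return get_backend()
    def __eq__(self, other):
        return get_backend() == str(other)

# Old call sites can keep comparing against this module-level name;
# importing the module triggers no probe.
BACKEND = _BackendProxy()
```

The key property is that module import stays side-effect free: the probe only runs when `get_backend()` (or the proxy) is first touched, which is exactly what ZeroGPU requires of CUDA initialization.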