Commit d3d412a · Olivia committed · "info endpoint"
Parent(s): fac30cc
README.md (CHANGED)

@@ -125,6 +125,95 @@ Monitor and compare inference performance across backends.

---

## Deep Dive: New AI Features 🆕

### AI-Powered Segmentation (U²-Net)

**Overview**: StyleForge now uses the U²-Net ("U-squared" network, a two-level nested U-structure) deep learning model for automatic foreground/background segmentation. This eliminates the need for manual masking when applying different styles to specific image regions.

#### How U²-Net Works

```
Input Image (any size)
          ↓
┌──────────────────────────────────┐
│ Encoder (U-Net style)            │
│ - Extracts multi-scale features  │
│ - 6 encoder stages               │
│ - Deep supervision paths         │
├──────────────────────────────────┤
│ Decoder                          │
│ - Reconstructs segmentation mask │
│ - Salient object detection       │
└──────────────────────────────────┘
          ↓
Grayscale Mask (256 levels)
          ↓
Foreground (white) / Background (black)
```

**Technical Details**:
- **Architecture**: U²-Net with a deep encoder-decoder structure
- **Input**: RGB image of any size
- **Output**: Grayscale mask where white = foreground, black = background
- **Model Size**: ~176 MB pre-trained weights
- **Inference Time**: ~200-500 ms per image (CPU), ~50-100 ms (GPU)
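
The grayscale output maps to the foreground/background split shown in the pipeline by thresholding. A toy sketch (the threshold value here is illustrative, not StyleForge's actual cutoff):

```python
def split_mask(mask_values, threshold=128):
    """Turn 0-255 saliency values into foreground (True) / background (False)."""
    return [v >= threshold for v in mask_values]

print(split_mask([0, 100, 200, 255]))  # [False, False, True, True]
```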

**Why U²-Net?**
- Trained on 20,000+ images with diverse subjects
- Excellent at detecting humans, animals, objects, and products
- Handles complex backgrounds and edges
- Works without requiring bounding boxes or user input

**Use Cases**:
- **Portrait Photography**: Style the subject differently from the background
- **Product Photography**: Apply artistic effects to products while keeping clean backgrounds
- **Creative Composites**: Apply different artistic styles to foreground vs background
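
Region styling reduces to a per-pixel blend: the mask weights the foreground-styled output against the background-styled one. A minimal, framework-free sketch with scalar stand-ins for pixel channels (the real pipeline does the same arithmetic on tensors):

```python
def composite(fg, bg, mask):
    """out = mask * foreground_style + (1 - mask) * background_style, per pixel."""
    return [m * f + (1.0 - m) * b for f, b, m in zip(fg, bg, mask)]

fg = [1.0, 1.0, 1.0]    # foreground-styled pixels
bg = [0.0, 0.0, 0.0]    # background-styled pixels
mask = [1.0, 0.5, 0.0]  # segmentation mask scaled to [0, 1]
print(composite(fg, bg, mask))  # [1.0, 0.5, 0.0]
```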

#### Gram Matrices: Representing Style

The Gram matrix is computed from the feature activations:

```
F = feature map of shape (C, H, W), flattened to (C, H·W)
Gram(F)[i,j] = Σ_k F[i,k] ⋅ F[j,k]
```

This captures:
- **Texture information**: How features correlate spatially
- **Color patterns**: Which colors appear together
- **Brush strokes**: Directionality and scale of textures
- **Style signature**: Unique fingerprint of the artistic style
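
A minimal sketch of this computation, with F already flattened to (C, H·W) and plain Python lists standing in for tensors:

```python
def gram_matrix(F):
    """Gram(F)[i,j] = sum_k F[i][k] * F[j][k] for a flattened (C, H*W) feature map."""
    C, K = len(F), len(F[0])
    return [[sum(F[i][k] * F[j][k] for k in range(K)) for j in range(C)]
            for i in range(C)]

F = [[1.0, 2.0],   # C = 2 channels,
     [3.0, 4.0]]   # H*W = 2 spatial positions
print(gram_matrix(F))  # [[5.0, 11.0], [11.0, 25.0]]
```

Note the result is symmetric and independent of spatial layout, which is why it captures texture rather than content.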

#### Fine-Tuning Process

The system fine-tunes a pre-trained Fast Style Transfer model:

1. **Load base model** (e.g., Udnie style)
2. **Freeze early layers** (preserve low-level transformations)
3. **Train on style loss** using the extracted Gram matrices
4. **Iterate** with the Adam optimizer (lr=0.001)
5. **Save** as a reusable `.pth` file
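
Step 3's style loss can be sketched as the mean squared difference between Gram matrices of the generated output and the target style. Illustrative 2×2 matrices below; the actual layer choices and weighting follow the model code:

```python
def style_loss(G_gen, G_style):
    """Mean squared element-wise difference between two Gram matrices."""
    n = len(G_gen) * len(G_gen[0])
    return sum((g - s) ** 2
               for row_g, row_s in zip(G_gen, G_style)
               for g, s in zip(row_g, row_s)) / n

G_gen = [[5.0, 11.0], [11.0, 25.0]]
G_style = [[4.0, 10.0], [10.0, 20.0]]
print(style_loss(G_gen, G_style))  # (1 + 1 + 1 + 25) / 4 = 7.0
```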

```
Base Model → Extracted Style Features → Fine-tuned Model
    ↓                  ↓                        ↓
  Udnie           Starry Night        Custom "Starry Udnie"
```

**Training Time**:
- 100 iterations: ~30-60 seconds (GPU)
- 200 iterations: ~60-120 seconds (GPU)
- More iterations = better style matching

**Why VGG19?**
- Pre-trained on ImageNet (1M+ images)
- Learned rich feature representations
- Standard in style transfer research (Gatys et al., Johnson et al.)
- Captures both low-level (textures) and high-level (patterns) features

---

## Technical Details

### Architecture
app.py (CHANGED)

```
@@ -70,46 +70,51 @@ except ImportError:
 # Device will be determined when needed within GPU tasks
 _SPACES_ZERO_GPU = SPACES_AVAILABLE  # From spaces import above
 
-    """Proxy for torch.device that lazy-loads CUDA on ZeroGPU"""
-    def __init__(self):
-        self._device = None
-        if self._device is None:
-            if torch.cuda.is_available():
-                self._device = torch.device('cuda')
-            else:
-                self._device = torch.device('cpu')
+# Lazy device initialization for ZeroGPU compatibility
+_device_cache = None
+
+def get_device():
+    """
+    Get the current device (lazy-loaded on ZeroGPU).
+
+    On ZeroGPU, this must be called within a GPU task context to properly
+    initialize CUDA. Calling this at module level will cause errors.
+    """
+    global _device_cache
+    if _device_cache is None:
+        if torch.cuda.is_available():
+            _device_cache = torch.device('cuda')
+        else:
+            _device_cache = torch.device('cpu')
+    return _device_cache
+
+
+# For backwards compatibility, keep DEVICE as a property
+class _DeviceProperty:
+    """Property that returns the actual device when accessed."""
 
     def __str__(self):
-        return str(self._device)
+        return str(get_device())
 
     def __repr__(self):
+        return repr(get_device())
+
+    @property
+    def type(self):
+        return get_device().type
 
     def __eq__(self, other):
-        return str(
+        return str(get_device()) == str(other)
 
-    DEVICE =
+
+DEVICE = _DeviceProperty()
 
 if _SPACES_ZERO_GPU:
     print(f"Device: Will use CUDA within GPU tasks (ZeroGPU mode)")
 else:
     # Only access device if not ZeroGPU to avoid CUDA init
-
-    print(f"Device: {DEVICE}")
+    print(f"Device: {get_device()}")
 if SPACES_AVAILABLE:
     print("ZeroGPU support enabled")

@@ -287,7 +292,7 @@ def get_vgg_extractor():
     """Lazy load VGG feature extractor (with ZeroGPU support)"""
     global _vgg_extractor
     if _vgg_extractor is None:
-        _vgg_extractor = VGGFeatureExtractor().to(
+        _vgg_extractor = VGGFeatureExtractor().to(get_device())
         _vgg_extractor.eval()
     return _vgg_extractor

@@ -556,7 +561,7 @@ def load_model(style: str, backend: str = 'auto') -> TransformerNet:
     print(f"Loading {style} model with {backend} backend...")
     model_path = get_model_path(style)
 
-    model = TransformerNet(num_residual_blocks=5, backend=backend).to(
+    model = TransformerNet(num_residual_blocks=5, backend=backend).to(get_device())
     model.load_checkpoint(str(model_path))
     model.eval()

@@ -573,8 +578,7 @@ print("=" * 50)
 if _SPACES_ZERO_GPU:
     print("Device: CUDA (ZeroGPU mode - lazy initialization)")
 else:
-
-    print(f"Device: {DEVICE.type.upper()}")
+    print(f"Device: {get_device().type.upper()}")
 print(f"CUDA Kernels: {'Available' if CUDA_KERNELS_AVAILABLE else 'Not Available (will compile on first GPU task)'}")
 
 # Skip model preloading on ZeroGPU to avoid CUDA init in main process

@@ -613,7 +617,7 @@ def blend_models(style1: str, style2: str, alpha: float, backend: str = 'auto')
     model2 = load_model(style2, backend)
 
     # Create new model
-    blended = TransformerNet(num_residual_blocks=5, backend=backend).to(
+    blended = TransformerNet(num_residual_blocks=5, backend=backend).to(get_device())
     blended.eval()
 
     # Blend weights

@@ -686,12 +690,12 @@ def apply_region_style(
     # Preprocess
     import torchvision.transforms as transforms
     transform = transforms.Compose([transforms.ToTensor()])
-    img_tensor = transform(image).unsqueeze(0).to(
+    img_tensor = transform(image).unsqueeze(0).to(get_device())
 
     # Convert mask to tensor
     mask_np = np.array(mask)
     mask_tensor = torch.from_numpy(mask_np).float() / 255.0
-    mask_tensor = mask_tensor.unsqueeze(0).unsqueeze(0).to(
+    mask_tensor = mask_tensor.unsqueeze(0).unsqueeze(0).to(get_device())
 
     # Stylize with both models
     with torch.no_grad():

@@ -917,7 +921,7 @@ def train_custom_style(
         transforms.ToTensor(),
         transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
     ])
-    style_tensor = style_transform(style_image).unsqueeze(0).to(
+    style_tensor = style_transform(style_image).unsqueeze(0).to(get_device())
 
     # Extract style features from multiple layers
     with torch.no_grad():

@@ -947,7 +951,7 @@ def train_custom_style(
             g = int(255 * x / 256)
             content_img.putpixel((x, y), (r, g, 128))
 
-    content_tensor = style_transform(content_img).unsqueeze(0).to(
+    content_tensor = style_transform(content_img).unsqueeze(0).to(get_device())
 
     # Training loop
     model.train()

@@ -1054,7 +1058,7 @@ def extract_style_from_image(
     ])
 
     # Process style image
-    style_tensor = transform(style_image).unsqueeze(0).to(
+    style_tensor = transform(style_image).unsqueeze(0).to(get_device())
 
     # Extract style features
     with torch.no_grad():

@@ -1066,7 +1070,7 @@ def extract_style_from_image(
     progress.append("Style features extracted. Creating style model...")
 
     # Create a new model and train it to match the style
-    model = TransformerNet(num_residual_blocks=5, backend='auto').to(
+    model = TransformerNet(num_residual_blocks=5, backend='auto').to(get_device())
 
     # Use a simple content image for training the transform
     if content_image is None:

@@ -1077,7 +1081,7 @@ def extract_style_from_image(
             content_image.putpixel((x, y), (x, y, 128))
 
     content_image = content_image.convert('RGB')
-    content_tensor = transform(content_image).unsqueeze(0).to(
+    content_tensor = transform(content_image).unsqueeze(0).to(get_device())
 
     # Extract content features
     with torch.no_grad():

@@ -1288,14 +1292,14 @@ def create_benchmark_comparison(style: str) -> str:
     for backend_name, backend_key in [('PyTorch', 'pytorch'), ('CUDA Kernels', 'cuda')]:
         try:
             model = load_model(style, backend_key)
-            test_tensor = preprocess_image(test_img).to(
+            test_tensor = preprocess_image(test_img).to(get_device())
 
             times = []
             for _ in range(3):
                 start = time.perf_counter()
                 with torch.no_grad():
                     _ = model(test_tensor)
-                if
+                if get_device().type == 'cuda':
                     torch.cuda.synchronize()
                 times.append((time.perf_counter() - start) * 1000)

@@ -1397,7 +1401,7 @@ def stylize_image_impl(
     style_display = STYLES.get(style, style)
 
     # Preprocess
-    input_tensor = preprocess_image(input_image).to(
+    input_tensor = preprocess_image(input_image).to(get_device())
 
     # Stylize with timing
     start = time.perf_counter()

@@ -1405,7 +1409,7 @@
     with torch.no_grad():
         output_tensor = model(input_tensor)
 
-    if
+    if get_device().type == 'cuda':
         torch.cuda.synchronize()
 
     elapsed_ms = (time.perf_counter() - start) * 1000

@@ -1452,7 +1456,7 @@ def stylize_image_impl(
 | **Avg Time** | {stats['avg_ms']:.1f if stats else elapsed_ms:.1f} ms |
 | **Total Images** | {stats['total_inferences'] if stats else 1} |
 | **Size** | {width}x{height} |
-| **Device** | {
+| **Device** | {get_device().type.upper()} |
 
 ---
 {perf_tracker.get_comparison()}

@@ -1514,12 +1518,12 @@ def process_webcam_frame(image: Image.Image, style: str, backend: str) -> Image.
     else:
         model = load_model(style, backend)
 
-    input_tensor = preprocess_image(image).to(
+    input_tensor = preprocess_image(image).to(get_device())
 
     with torch.no_grad():
         output_tensor = model(input_tensor)
 
-    if
+    if get_device().type == 'cuda':
         torch.cuda.synchronize()
 
     output_image = postprocess_tensor(output_tensor.cpu())

@@ -1639,14 +1643,14 @@ def run_backend_comparison(style: str) -> str:
     # Test PyTorch backend
     try:
         model = load_model(style, 'pytorch')
-        test_tensor = preprocess_image(test_img).to(
+        test_tensor = preprocess_image(test_img).to(get_device())
 
         times = []
         for _ in range(5):
             start = time.perf_counter()
             with torch.no_grad():
                 _ = model(test_tensor)
-            if
+            if get_device().type == 'cuda':
                 torch.cuda.synchronize()
             times.append((time.perf_counter() - start) * 1000)

@@ -1657,14 +1661,14 @@ def run_backend_comparison(style: str) -> str:
     # Test CUDA backend
     try:
         model = load_model(style, 'cuda')
-        test_tensor = preprocess_image(test_img).to(
+        test_tensor = preprocess_image(test_img).to(get_device())
 
         times = []
         for _ in range(5):
             start = time.perf_counter()
             with torch.no_grad():
                 _ = model(test_tensor)
-            if
+            if get_device().type == 'cuda':
                 torch.cuda.synchronize()
             times.append((time.perf_counter() - start) * 1000)

@@ -1711,7 +1715,7 @@ def create_style_blend_output(
     model = get_blended_model(style1, style2, alpha, backend)
 
     # Process
-    input_tensor = preprocess_image(input_image).to(
+    input_tensor = preprocess_image(input_image).to(get_device())
 
     with torch.no_grad():
         output_tensor = model(input_tensor)
```