---
title: StyleForge
emoji: 🎨
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# StyleForge: Real-Time Neural Style Transfer
Transform your photos into artwork using fast neural style transfer with custom CUDA kernel acceleration.
## Overview
StyleForge is a high-performance neural style transfer application that combines cutting-edge machine learning with custom GPU optimization. It demonstrates end-to-end ML pipeline development, from model architecture to CUDA kernel optimization and web deployment.
## Key Features
| Feature | Description |
|---|---|
| 4 Pre-trained Styles | Candy, Mosaic, Rain Princess, Udnie |
| AI-Powered Segmentation | Automatic foreground/background detection using U²-Net |
| VGG19 Style Extraction | Real style extraction using neural feature matching |
| Style Blending | Interpolate between styles in latent space |
| Region Transfer | Apply different styles to different image regions |
| Real-time Webcam | Live video style transformation |
| CUDA Acceleration | 8-9x faster with custom fused kernels |
| Performance Dashboard | Live charts comparing backends |
## Quick Start
- Upload any image (JPG, PNG, WebP)
- Select an artistic style
- Choose your backend (Auto recommended)
- Click "Stylize Image"
- Download your result!
## Features Guide
### 1. Quick Style Transfer
The fastest way to transform your images.
- Side-by-side comparison: See original and stylized versions together
- Watermark option: Add branding for social sharing
- Backend selection: Choose between CUDA Kernels (fastest) or PyTorch (compatible)
### 2. Style Blending
Mix two styles together to create unique artistic combinations.
How it works: Style blending interpolates between model weights in the latent space.
- Blend ratio 0% = Pure Style 1
- Blend ratio 50% = Equal mix of both styles
- Blend ratio 100% = Pure Style 2
This demonstrates that neural styles exist in a continuous manifold where you can navigate between artistic styles.
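As a minimal sketch of that interpolation, here is the idea with NumPy arrays standing in for the real PyTorch state dicts (`blend_weights` and the toy parameter names are illustrative, not the app's actual API):

```python
import numpy as np

def blend_weights(weights_a, weights_b, ratio):
    """Linearly interpolate two weight dicts: ratio=0 gives A, ratio=1 gives B."""
    return {name: (1 - ratio) * weights_a[name] + ratio * weights_b[name]
            for name in weights_a}

# Toy stand-ins for two style models' parameters
style1 = {"conv1.weight": np.zeros((3, 3)), "conv1.bias": np.zeros(3)}
style2 = {"conv1.weight": np.ones((3, 3)), "conv1.bias": np.ones(3)}

blended = blend_weights(style1, style2, 0.5)  # equal mix of both styles
```

Loading the blended dict into a single transformation network means a blend costs one forward pass, not two.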
### 3. Region Transfer
Apply different styles to different parts of your image using AI-powered segmentation.
Mask Types:
| Mask | Description | Use Case |
|---|---|---|
| AI: Foreground | Automatically detect main subject | Portraits, product photos |
| AI: Background | Automatically detect background | Sky replacement, effects |
| Horizontal Split | Top/bottom division | Sky vs landscape |
| Vertical Split | Left/right division | Portrait effects |
| Center Circle | Circular focus region | Spotlight subjects |
| Corner Box | Top-left quadrant only | Creative framing |
| Full | Entire image | Standard transfer |
AI Segmentation: Uses the U²-Net deep learning model for automatic subject detection without manual masking.
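Whatever the mask source, region transfer reduces to a per-pixel blend of two fully stylized outputs weighted by the mask. A sketch with NumPy (function and variable names are illustrative, not the app's internals):

```python
import numpy as np

def composite(styled_a, styled_b, mask):
    """Blend two stylized images: mask=1 selects styled_a, mask=0 selects styled_b."""
    m = mask[..., None].astype(np.float32)  # (H, W) -> (H, W, 1) for broadcasting
    return (m * styled_a + (1.0 - m) * styled_b).astype(styled_a.dtype)

h, w = 4, 4
styled_a = np.full((h, w, 3), 255, dtype=np.uint8)  # e.g. Candy applied to the whole frame
styled_b = np.zeros((h, w, 3), dtype=np.uint8)      # e.g. Mosaic applied to the whole frame
mask = np.zeros((h, w), dtype=np.float32)
mask[:, : w // 2] = 1.0                             # a "Vertical Split" mask

out = composite(styled_a, styled_b, mask)           # left half from A, right half from B
```

A soft (grayscale) mask, such as the one U²-Net produces, gives smooth transitions at the boundary instead of a hard seam.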
### 4. Create Style
Extract artistic style from any image using VGG19 neural feature matching.
How it works:
- Upload an artwork image (painting, illustration, photo with artistic style)
- VGG19 pre-trained network extracts style features (textures, colors, patterns)
- A transformation network is fine-tuned to match those features
- Your custom style model is saved and available in all tabs
This is real style extraction: the system learns the artistic characteristics of your image rather than copying an existing style.
Tips for best results:
- Use artwork with clear artistic direction (paintings, illustrations, stylized photos)
- Higher iterations = better style matching (but slower)
- GPU is recommended for training (100 iterations ≈ 30-60 seconds)
### 5. Webcam Live
Real-time style transfer on your webcam feed.
Requirements:
- Browser camera permissions
- Recommended: GPU device for smooth performance
Performance:
- GPU: 20-30 FPS
- CPU: 5-10 FPS
### 6. Performance Dashboard
Monitor and compare inference performance across backends.
Metrics tracked:
- Inference time per image
- Average/min/max times
- Backend comparison (CUDA vs PyTorch)
- Speedup calculations
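The speedup figure is simply the ratio of average inference times between backends. A minimal sketch with made-up timings:

```python
# Recorded per-image inference times in milliseconds (illustrative values)
times = {
    "pytorch": [82.1, 79.5, 80.4],
    "cuda": [10.2, 9.8, 10.0],
}

stats = {
    backend: {"avg": sum(t) / len(t), "min": min(t), "max": max(t)}
    for backend, t in times.items()
}
speedup = stats["pytorch"]["avg"] / stats["cuda"]["avg"]  # roughly 8x here
```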
## Deep Dive: New AI Features
### AI-Powered Segmentation (U²-Net)
Overview: StyleForge now uses U²-Net, a deep learning model with a two-level nested U-structure, for automatic foreground/background segmentation. This eliminates the need for manual masking when applying different styles to specific image regions.
#### How U²-Net Works

```
Input Image (any size)
        │
┌───────────────────────────────────┐
│ Encoder (U-Net style)             │
│  - Extracts multi-scale features  │
│  - 6 encoder stages               │
│  - Deep supervision paths         │
├───────────────────────────────────┤
│ Decoder                           │
│  - Reconstructs segmentation mask │
│  - Salient object detection       │
└───────────────────────────────────┘
        │
Grayscale Mask (256 levels)
        │
Foreground (white) / Background (black)
```
Technical Details:
- Architecture: U²-Net with a deep encoder-decoder structure
- Input: RGB image of any size
- Output: Grayscale mask where white = foreground, black = background
- Model Size: ~176 MB pre-trained weights
- Inference Time: ~200-500ms per image (CPU), ~50-100ms (GPU)
Why U²-Net?
- Trained on 20,000+ images with diverse subjects
- Excellent at detecting humans, animals, objects, and products
- Handles complex backgrounds and edges
- Works without requiring bounding boxes or user input
Use Cases:
- Portrait Photography: Style the subject differently from the background
- Product Photography: Apply artistic effects to products while keeping clean backgrounds
- Creative Composites: Apply different artistic styles to foreground vs background
### Gram Matrices: Representing Style
The Gram matrix is computed from the feature activations:

```
F = feature map of shape (C, H, W), flattened to (C, H·W)
Gram(F)[i, j] = Σ_k F[i, k] · F[j, k]
```
This captures:
- Texture information: How features correlate spatially
- Color patterns: Which colors appear together
- Brush strokes: Directionality and scale of textures
- Style signature: Unique fingerprint of the artistic style
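The formula above is a one-liner in NumPy once the spatial dimensions are flattened (function name is illustrative):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-to-channel correlations."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)  # flatten spatial dims so F[i, k] indexes (channel, position)
    return f @ f.T                  # implementations often also divide by C*H*W to normalize

feat = np.random.rand(8, 16, 16).astype(np.float32)
g = gram_matrix(feat)  # shape (8, 8), symmetric
```

Because the spatial index k is summed out, the Gram matrix is invariant to where textures appear in the image, which is exactly why it captures style rather than content.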
### Fine-Tuning Process
The system fine-tunes a pre-trained Fast Style Transfer model:
- Load base model (e.g., Udnie style)
- Freeze early layers (preserve low-level transformations)
- Train on style loss using the extracted Gram matrices
- Iterate with the Adam optimizer (lr=0.001)
- Save as a reusable `.pth` file
```
Base Model  +  Extracted Style Features  →  Fine-tuned Model
    │                    │                        │
  Udnie            Starry Night          Custom "Starry Udnie"
```
Training Time:
- 100 iterations: ~30-60 seconds (GPU)
- 200 iterations: ~60-120 seconds (GPU)
- More iterations = better style matching
### Why VGG19?
- Pre-trained on ImageNet (1M+ images)
- Learned rich feature representations
- Standard in style transfer research (Gatys et al., Johnson et al.)
- Captures both low-level (textures) and high-level (patterns) features
## Technical Details
### Architecture
StyleForge uses the Fast Neural Style Transfer architecture from Johnson et al.:
```
Input Image (3 × H × W)
        │
┌─────────────────────────────────────┐
│ Encoder (3 Conv + InstanceNorm)     │
├─────────────────────────────────────┤
│ Transformer (5 Residual Blocks)     │
├─────────────────────────────────────┤
│ Decoder (3 Upsample + InstanceNorm) │
└─────────────────────────────────────┘
        │
Output Image (3 × H × W)
```
Layers:
- `ConvLayer`: Conv2d → InstanceNorm → ReLU
- `ResidualBlock`: Two ConvLayers with a skip connection
- `UpsampleConvLayer`: Upsample → Conv2d → InstanceNorm → ReLU
### CUDA Kernel Optimization
Custom CUDA kernels provide 8-9x speedup over PyTorch baseline.
Fused InstanceNorm Kernel:
- Combines mean, variance, normalization, and affine transform into a single kernel
- Uses `float4` vectorized loads for 4x memory bandwidth
- Warp-level parallel reductions
- Shared memory tiling for reduced global memory traffic
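For reference, here is what the fused kernel computes, written as a plain NumPy sketch (the real kernel performs all four steps in one GPU pass; this is not the project's actual implementation):

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Reference for the fused kernel on an (N, C, H, W) tensor."""
    mean = x.mean(axis=(2, 3), keepdims=True)   # step 1: per-(sample, channel) mean
    var = x.var(axis=(2, 3), keepdims=True)     # step 2: per-(sample, channel) variance
    x_hat = (x - mean) / np.sqrt(var + eps)     # step 3: normalize
    # step 4: affine transform with learned per-channel scale and shift
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

x = np.random.rand(1, 4, 8, 8).astype(np.float32)
gamma, beta = np.ones(4, np.float32), np.zeros(4, np.float32)
y = instance_norm(x, gamma, beta)
```

Unfused, each of these steps is a separate pass over the tensor; fusing them means the data is read from and written to global memory once, which is where most of the 8-9x speedup comes from.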
Performance Comparison (512x512 image):
| Backend | Time | Speedup |
|---|---|---|
| PyTorch | ~80ms | 1.0x |
| CUDA Kernels | ~10ms | 8.0x |
## ML Concepts Demonstrated
| Concept | Implementation |
|---|---|
| Style Transfer | Neural artistic stylization |
| Latent Space | Style blending shows continuous style space |
| Conditional Generation | Region-based style application |
| Transfer Learning | Custom styles from base models |
| Performance Optimization | CUDA kernels, JIT compilation, caching |
| Model Deployment | Gradio web interface, CI/CD pipeline |
## Styles Gallery
| Style | Description | Best For |
|---|---|---|
| 🍬 Candy | Bright, colorful pop-art transformation | Portraits, vibrant scenes |
| 🎨 Mosaic | Fragmented tile-like reconstruction | Landscapes, architecture |
| 🌧️ Rain Princess | Moody impressionistic style | Atmospheric photos |
| 🖼️ Udnie | Bold abstract expressionist style | High-contrast images |
## Performance Benchmarks
### Inference Time (milliseconds)
| Resolution | CUDA | PyTorch | Speedup |
|---|---|---|---|
| 256x256 | 5ms | 40ms | 8.0x |
| 512x512 | 10ms | 80ms | 8.0x |
| 1024x1024 | 35ms | 280ms | 8.0x |
### FPS Performance (Webcam)
| Device | Resolution | FPS |
|---|---|---|
| NVIDIA GPU | 640x480 | 25-30 |
| CPU (Modern) | 640x480 | 5-10 |
## Run Locally
### Using pip

```bash
git clone https://github.com/olivialiau/StyleForge
cd StyleForge/huggingface-space
pip install -r requirements.txt
python app.py
```
### Using conda (recommended)

```bash
git clone https://github.com/olivialiau/StyleForge
cd StyleForge/huggingface-space
conda env create -f environment.yml
conda activate styleforge
python app.py
```
Open http://localhost:7860 in your browser.
## API Usage
You can use StyleForge programmatically:
```python
import base64
from io import BytesIO

import requests
from PIL import Image

# Prepare image: read and base64-encode it
with open("path/to/image.jpg", "rb") as f:
    b64_image = base64.b64encode(f.read()).decode("utf-8")

# Call the API
response = requests.post(
    "https://olivialiau-styleforge.hf.space/api/predict",
    json={
        "data": [
            {"name": "image.jpg", "data": b64_image},
            "candy",  # style
            "auto",   # backend
            False,    # show_comparison
            False,    # add_watermark
        ]
    },
)

result = response.json()
output_img = Image.open(BytesIO(base64.b64decode(result["data"][0])))
```
## Embed in Your Website
```html
<iframe
  src="https://olivialiau-styleforge.hf.space"
  frameborder="0"
  width="100%"
  height="850"
  allow="camera; microphone"
></iframe>
```
## Project Structure
```
StyleForge/
├── huggingface-space/
│   ├── app.py                     # Main Gradio application
│   ├── requirements.txt           # Python dependencies
│   ├── README.md                  # This file
│   ├── kernels/                   # Custom CUDA kernels
│   │   ├── __init__.py
│   │   ├── cuda_build.py          # JIT compilation utilities
│   │   ├── instance_norm_wrapper.py
│   │   └── instance_norm.cu       # CUDA source code
│   ├── models/                    # Model weights (auto-downloaded)
│   └── custom_styles/             # User-trained styles
├── .github/
│   └── workflows/
│       └── deploy-huggingface.yml # CI/CD pipeline
└── saved_models/                  # Local model cache
```
## Development
### CI/CD Pipeline
The project uses GitHub Actions for automatic deployment to Hugging Face Spaces:
```yaml
# .github/workflows/deploy-huggingface.yml
on:
  push:
    branches: [main]
    paths: ['huggingface-space/**']
```
Push to the main branch → auto-deploys to the Hugging Face Space.
### Adding New Styles
- Train a model using the original repo's training script
- Save the weights as a `.pth` file
- Add them to the `models/` directory, or update the URL map in `get_model_path()`
- Add entries to the `STYLES` and `STYLE_DESCRIPTIONS` dictionaries
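For instance, registering a hypothetical "Starry Night" style might look like this. The exact shapes of `STYLES` and `STYLE_DESCRIPTIONS` are assumptions here; check `app.py` for the real structures before editing:

```python
# Hypothetical entries -- the key/value shapes below are assumed,
# so mirror whatever the existing entries in app.py actually look like.
STYLES = {
    "candy": "models/candy.pth",
    "starry": "models/starry.pth",  # new style weights
}
STYLE_DESCRIPTIONS = {
    "candy": "Bright, colorful pop-art transformation",
    "starry": "Swirling night-sky brushwork",  # new description
}
```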
## FAQ
Q: How does the style extraction work?
A: The new VGG19-based style extraction uses a pre-trained neural network to analyze artistic features (textures, brush strokes, color patterns) from your artwork. It then fine-tunes a transformation network to reproduce those features. This is the same technique used in the original neural style transfer research.
Q: What's the difference between backends?
A:
- Auto: Uses CUDA if available, otherwise PyTorch
- CUDA Kernels: Fastest, requires GPU and compilation
- PyTorch: Compatible fallback, works on CPU
Q: Can I use this commercially?
A: Yes! StyleForge is MIT licensed. The pre-trained models are from the fast-neural-style-transfer repo.
Q: How large can my input image be?
A: Any size, but larger images take longer. Webcam mode auto-scales to 640px max dimension for performance.
Q: Why does compilation take time on first run?
A: CUDA kernels are JIT-compiled on first use. This only happens once per session.
## Acknowledgments
- Johnson et al. - Perceptual Losses for Real-Time Style Transfer
- yakhyo/fast-neural-style-transfer - Pre-trained model weights
- Rembg - AI background removal (U²-Net)
- VGG19 - Pre-trained feature extractor for style extraction
- Hugging Face - Spaces hosting platform
- Gradio - UI framework
- PyTorch - Deep learning framework
## Author
Olivia - USC Computer Science
## License
MIT License - see LICENSE for details.
Made with ❤️ and CUDA