---
title: StyleForge
emoji: 🎨
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# StyleForge: Real-Time Neural Style Transfer
Transform your photos into artwork using fast neural style transfer with custom CUDA kernel acceleration.
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/olivialiau/styleforge)
[![GitHub](https://img.shields.io/badge/GitHub-StyleForge-blue?logo=github)](https://github.com/olivialiau/StyleForge)
[![License: MIT](https://img.shields.io/badge/License-MIT-purple.svg)](https://opensource.org/licenses/MIT)
## Overview
StyleForge is a high-performance neural style transfer application that combines cutting-edge machine learning with custom GPU optimization. It demonstrates end-to-end ML pipeline development, from model architecture to CUDA kernel optimization and web deployment.
### Key Features
| Feature | Description |
|---------|-------------|
| **4 Pre-trained Styles** | Candy, Mosaic, Rain Princess, Udnie |
| **AI-Powered Segmentation** 🆕 | Automatic foreground/background detection using U²-Net |
| **VGG19 Style Extraction** 🆕 | Real style extraction using neural feature matching |
| **Style Blending** | Interpolate between styles in latent space |
| **Region Transfer** | Apply different styles to different image regions |
| **Real-time Webcam** | Live video style transformation |
| **CUDA Acceleration** | 8-9x faster with custom fused kernels |
| **Performance Dashboard** | Live charts comparing backends |
## Quick Start
1. **Upload** any image (JPG, PNG, WebP)
2. **Select** an artistic style
3. **Choose** your backend (Auto recommended)
4. **Click** "Stylize Image"
5. **Download** your result!
---
## Features Guide
### 1. Quick Style Transfer
The fastest way to transform your images.
- **Side-by-side comparison**: See original and stylized versions together
- **Watermark option**: Add branding for social sharing
- **Backend selection**: Choose between CUDA Kernels (fastest) and PyTorch (compatible)
### 2. Style Blending
Mix two styles together to create unique artistic combinations.
**How it works**: Style blending linearly interpolates between the two style models' weights, treating weight space as a continuous style space.
- Blend ratio 0% = Pure Style 1
- Blend ratio 50% = Equal mix of both styles
- Blend ratio 100% = Pure Style 2
This demonstrates that neural styles exist in a continuous manifold where you can navigate between artistic styles.
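A minimal sketch of how such weight-space blending can be implemented, assuming both checkpoints are plain state dicts for the same architecture (the paths are illustrative):
```python
import torch

def blend_state_dicts(state_a, state_b, alpha: float):
    """Linearly interpolate two compatible state dicts.

    alpha = 0.0 -> pure Style 1, alpha = 1.0 -> pure Style 2.
    Assumes all entries are float tensors with matching shapes.
    """
    return {
        key: (1.0 - alpha) * state_a[key] + alpha * state_b[key]
        for key in state_a
    }

# Illustrative checkpoint paths; the app resolves real ones internally.
blended = blend_state_dicts(
    torch.load("models/candy.pth", map_location="cpu"),
    torch.load("models/mosaic.pth", map_location="cpu"),
    alpha=0.5,  # equal mix of both styles
)
# Load `blended` into a fresh TransformerNet instance before inference.
```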
### 3. Region Transfer 🆕
Apply different styles to different parts of your image using **AI-powered segmentation**.
**Mask Types**:
| Mask | Description | Use Case |
|------|-------------|----------|
| **AI: Foreground** | Automatically detect main subject | Portraits, product photos |
| **AI: Background** | Automatically detect background | Sky replacement, effects |
| Horizontal Split | Top/bottom division | Sky vs landscape |
| Vertical Split | Left/right division | Portrait effects |
| Center Circle | Circular focus region | Spotlight subjects |
| Corner Box | Top-left quadrant only | Creative framing |
| Full | Entire image | Standard transfer |
**AI Segmentation**: Uses the U²-Net deep learning model for automatic subject detection without manual masking.
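For illustration, AI-masked region transfer can be sketched with the `rembg` library (which runs U²-Net under the hood). The `only_mask=True` flag assumes a recent rembg release, and the two "styles" here are simple PIL filters so the sketch stays self-contained:
```python
from PIL import Image
from rembg import remove  # U²-Net runs under the hood

# Extract a foreground mask (white = subject, black = background).
img = Image.open("portrait.jpg").convert("RGB")
mask = remove(img, only_mask=True).convert("L")

# Stand-in "styles"; the real app would run two style models instead.
fg_styled = img.point(lambda p: min(255, int(p * 1.3)))  # brightened
bg_styled = img.convert("L").convert("RGB")              # grayscale

# Composite: foreground style where the mask is white, background elsewhere.
result = Image.composite(fg_styled, bg_styled, mask)
result.save("region_transfer.png")
```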
### 4. Create Style 🆕
**Extract** artistic style from any image using **VGG19 neural feature matching**.
**How it works**:
1. Upload an artwork image (painting, illustration, photo with artistic style)
2. VGG19 pre-trained network extracts style features (textures, colors, patterns)
3. A transformation network is fine-tuned to match those features
4. Your custom style model is saved and available in all tabs
This is **real style extraction**: the system learns the artistic characteristics of your image rather than simply copying an existing style.
**Tips for best results**:
- Use artwork with clear artistic direction (paintings, illustrations, stylized photos)
- Higher iterations = better style matching (but slower)
- GPU is recommended for training (100 iterations ≈ 30-60 seconds)
### 5. Webcam Live
Real-time style transfer on your webcam feed.
**Requirements**:
- Browser camera permissions
- Recommended: GPU device for smooth performance
**Performance**:
- GPU: 20-30 FPS
- CPU: 5-10 FPS
### 6. Performance Dashboard
Monitor and compare inference performance across backends.
**Metrics tracked**:
- Inference time per image
- Average/min/max times
- Backend comparison (CUDA vs PyTorch)
- Speedup calculations
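These statistics boil down to simple wall-clock timing; a rough sketch of the idea (names are illustrative, the real logic lives in app.py):
```python
import statistics
import time

def benchmark(stylize_fn, image, runs=10):
    """Time repeated inference calls and report dashboard-style stats."""
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        stylize_fn(image)
        times_ms.append((time.perf_counter() - start) * 1000)
    return {
        "avg_ms": statistics.mean(times_ms),
        "min_ms": min(times_ms),
        "max_ms": max(times_ms),
    }

# Speedup is the ratio of average times:
# speedup = pytorch_stats["avg_ms"] / cuda_stats["avg_ms"]
```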
---
## Deep Dive: New AI Features 🆕
### AI-Powered Segmentation (U²-Net)
**Overview**: StyleForge now uses U²-Net (a nested, two-level U-structure network) for automatic foreground/background segmentation. This eliminates the need for manual masking when applying different styles to specific image regions.
#### How U²-Net Works
```
Input Image (any size)
         ↓
┌───────────────────────────────────┐
│ Encoder (U-Net style)             │
│ - Extracts multi-scale features   │
│ - 6 encoder stages                │
│ - Deep supervision paths          │
├───────────────────────────────────┤
│ Decoder                           │
│ - Reconstructs segmentation mask  │
│ - Salient object detection        │
└───────────────────────────────────┘
         ↓
Saliency mask (grayscale, 0-255)
         ↓
Foreground (white) / Background (black)
```
**Technical Details**:
- **Architecture**: U²-Net with a deep encoder-decoder structure
- **Input**: RGB image of any size
- **Output**: Grayscale mask where white = foreground, black = background
- **Model Size**: ~176 MB pre-trained weights
- **Inference Time**: ~200-500ms per image (CPU), ~50-100ms (GPU)
**Why U²-Net?**
- Trained on 20,000+ images with diverse subjects
- Excellent at detecting humans, animals, objects, and products
- Handles complex backgrounds and edges
- Works without requiring bounding boxes or user input
**Use Cases**:
- **Portrait Photography**: Style the subject differently from the background
- **Product Photography**: Apply artistic effects to products while keeping clean backgrounds
- **Creative Composites**: Apply different artistic styles to foreground vs background
### VGG19 Style Extraction
#### Gram Matrices: Representing Style
The Gram matrix is computed from the feature activations:
```
F = feature map of shape (C, H, W), reshaped to (C, H·W)
Gram(F)[i,j] = Σ_k F[i,k] ⋅ F[j,k]    (k runs over the H·W spatial positions)
```
This captures:
- **Texture information**: Which feature channels activate together, independent of position
- **Color patterns**: Which colors appear together
- **Brush strokes**: Directionality and scale of textures
- **Style signature**: Unique fingerprint of the artistic style
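For concreteness, a minimal PyTorch version of the Gram computation above (batched, with the usual normalization by tensor size):
```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Gram matrix of VGG activations: (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)          # flatten spatial dimensions
    gram = torch.bmm(f, f.transpose(1, 2))  # channel-channel correlations
    return gram / (c * h * w)               # normalize by tensor size
```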
#### Fine-Tuning Process
The system fine-tunes a pre-trained Fast Style Transfer model:
1. **Load base model** (e.g., Udnie style)
2. **Freeze early layers** (preserve low-level transformations)
3. **Train on style loss** using the extracted Gram matrices
4. **Iterate** with Adam optimizer (lr=0.001)
5. **Save** as a reusable `.pth` file
```
Base Model  →  Extracted Style Features  →  Fine-tuned Model
    ↓                    ↓                         ↓
  Udnie            Starry Night          Custom "Starry Udnie"
```
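A sketch of what such a loop can look like, reusing the `gram_matrix` helper above; `transformer`, `vgg_features`, `target_grams`, and `content_batches` are illustrative stand-ins, not the app's actual API:
```python
import torch
import torch.nn.functional as F

def finetune_style(transformer, vgg_features, target_grams,
                   content_batches, iterations=100, lr=1e-3):
    """Fit the transformation network to the extracted style Grams."""
    optimizer = torch.optim.Adam(transformer.parameters(), lr=lr)
    for _ in range(iterations):
        content = next(content_batches)      # (B, 3, H, W) image batch
        stylized = transformer(content)      # forward pass
        # Sum style losses over the chosen VGG19 layers.
        loss = sum(
            F.mse_loss(gram_matrix(act), target)
            for act, target in zip(vgg_features(stylized), target_grams)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return transformer
```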
**Training Time**:
- 100 iterations: ~30-60 seconds (GPU)
- 200 iterations: ~60-120 seconds (GPU)
- More iterations = better style matching
**Why VGG19?**
- Pre-trained on ImageNet (1M+ images)
- Learned rich feature representations
- Standard in style transfer research (Gatys et al., Johnson et al.)
- Captures both low-level (textures) and high-level (patterns) features
---
## Technical Details
### Architecture
StyleForge uses the **Fast Neural Style Transfer** architecture from Johnson et al.:
```
Input Image (3 x H x W)
          ↓
┌──────────────────────────────────────┐
│ Encoder (3 Conv + InstanceNorm)      │
├──────────────────────────────────────┤
│ Transformer (5 Residual Blocks)      │
├──────────────────────────────────────┤
│ Decoder (3 Upsample + InstanceNorm)  │
└──────────────────────────────────────┘
          ↓
Output Image (3 x H x W)
```
**Layers**:
- **ConvLayer**: Conv2d → InstanceNorm → ReLU
- **ResidualBlock**: Two ConvLayers with skip connection
- **UpsampleConvLayer**: Upsample → Conv2d → InstanceNorm → ReLU
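In PyTorch, these building blocks look roughly like the following sketch (the pattern is standard; exact channel counts and strides are the model's own configuration):
```python
import torch.nn as nn

class ConvLayer(nn.Module):
    """Conv2d -> InstanceNorm -> ReLU."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                      padding=kernel_size // 2),
            nn.InstanceNorm2d(out_ch, affine=True),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class ResidualBlock(nn.Module):
    """Two ConvLayers wrapped in a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = ConvLayer(channels, channels, kernel_size=3)
        self.conv2 = ConvLayer(channels, channels, kernel_size=3)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))
```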
### CUDA Kernel Optimization
Custom CUDA kernels provide an 8-9x speedup over the PyTorch baseline.
**Fused InstanceNorm Kernel**:
- Combines mean, variance, normalization, and affine transform into single kernel
- Uses `float4` vectorized loads (4 floats per transaction) for higher memory throughput
- Warp-level parallel reductions
- Shared memory tiling for reduced global memory traffic
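For reference, the unfused computation that the kernel collapses into a single pass looks like this in PyTorch terms:
```python
import torch

def instance_norm_reference(x, weight, bias, eps=1e-5):
    """Naive InstanceNorm: the four steps the fused CUDA kernel combines.

    x: (B, C, H, W); weight/bias: per-channel affine parameters.
    """
    mean = x.mean(dim=(2, 3), keepdim=True)                # 1. per-instance mean
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)  # 2. variance
    x_hat = (x - mean) / torch.sqrt(var + eps)             # 3. normalize
    return x_hat * weight.view(1, -1, 1, 1) + bias.view(1, -1, 1, 1)  # 4. affine
```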
**Performance Comparison** (512x512 image):
| Backend | Time | Speedup |
|---------|------|---------|
| PyTorch | ~80ms | 1.0x |
| CUDA Kernels | ~10ms | 8.0x |
### ML Concepts Demonstrated
| Concept | Implementation |
|---------|----------------|
| **Style Transfer** | Neural artistic stylization |
| **Latent Space** | Style blending shows continuous style space |
| **Conditional Generation** | Region-based style application |
| **Transfer Learning** | Custom styles from base models |
| **Performance Optimization** | CUDA kernels, JIT compilation, caching |
| **Model Deployment** | Gradio web interface, CI/CD pipeline |
---
## Styles Gallery
| Style | Description | Best For |
|-------|-------------|----------|
| 🍬 **Candy** | Bright, colorful pop-art transformation | Portraits, vibrant scenes |
| 🎨 **Mosaic** | Fragmented tile-like reconstruction | Landscapes, architecture |
| 🌧️ **Rain Princess** | Impressionistic, painterly style | Moody, atmospheric photos |
| 🖼️ **Udnie** | Bold abstract expressionist | High-contrast images |
---
## Performance Benchmarks
### Inference Time (milliseconds)
| Resolution | CUDA | PyTorch | Speedup |
|------------|------|---------|---------|
| 256x256 | 5ms | 40ms | 8.0x |
| 512x512 | 10ms | 80ms | 8.0x |
| 1024x1024 | 35ms | 280ms | 8.0x |
### FPS Performance (Webcam)
| Device | Resolution | FPS |
|--------|------------|-----|
| NVIDIA GPU | 640x480 | 25-30 |
| CPU (Modern) | 640x480 | 5-10 |
---
## Run Locally
### Using pip
```bash
git clone https://github.com/olivialiau/StyleForge
cd StyleForge/huggingface-space
pip install -r requirements.txt
python app.py
```
### Using conda (recommended)
```bash
git clone https://github.com/olivialiau/StyleForge
cd StyleForge/huggingface-space
conda env create -f environment.yml
conda activate styleforge
python app.py
```
Open http://localhost:7860 in your browser.
---
## API Usage
You can use StyleForge programmatically:
```python
import base64
from io import BytesIO

import requests
from PIL import Image

# Prepare image: read and base64-encode it for the request payload
with open("path/to/image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Call API
response = requests.post(
    "https://olivialiau-styleforge.hf.space/api/predict",
    json={
        "data": [
            {"name": "image.jpg", "data": encoded},
            "candy",  # style
            "auto",   # backend
            False,    # show_comparison
            False,    # add_watermark
        ]
    },
)

# Decode the stylized result
result = response.json()
output_img = Image.open(BytesIO(base64.b64decode(result["data"][0])))
```
---
## Embed in Your Website
```html
<iframe
src="https://olivialiau-styleforge.hf.space"
frameborder="0"
width="100%"
height="850"
allow="camera; microphone"
></iframe>
```
---
## Project Structure
```
StyleForge/
├── huggingface-space/
│   ├── app.py                      # Main Gradio application
│   ├── requirements.txt            # Python dependencies
│   ├── README.md                   # This file
│   ├── kernels/                    # Custom CUDA kernels
│   │   ├── __init__.py
│   │   ├── cuda_build.py           # JIT compilation utilities
│   │   ├── instance_norm_wrapper.py
│   │   └── instance_norm.cu        # CUDA source code
│   ├── models/                     # Model weights (auto-downloaded)
│   └── custom_styles/              # User-trained styles
├── .github/
│   └── workflows/
│       └── deploy-huggingface.yml  # CI/CD pipeline
└── saved_models/                   # Local model cache
```
---
## Development
### CI/CD Pipeline
The project uses GitHub Actions for automatic deployment to Hugging Face Spaces:
```yaml
# .github/workflows/deploy-huggingface.yml
on:
  push:
    branches: [main]
    paths: ['huggingface-space/**']
```
Push to `main` branch → Auto-deploys to Hugging Face Space.
### Adding New Styles
1. Train a model using the original repo's training script
2. Save weights as `.pth` file
3. Add to `models/` directory or update URL map in `get_model_path()`
4. Add entry to `STYLES` and `STYLE_DESCRIPTIONS` dictionaries
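Assuming the two dictionaries are plain name-to-path and name-to-description mappings (the exact keys live in app.py), the last step might look like:
```python
# Illustrative entries; match the naming convention used in app.py.
STYLES["starry_night"] = "models/starry_night.pth"
STYLE_DESCRIPTIONS["starry_night"] = "Swirling, Van Gogh-inspired brushwork"
```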
---
## FAQ
**Q: How does the style extraction work?**
A: The new VGG19-based style extraction uses a pre-trained neural network to analyze artistic features (textures, brush strokes, color patterns) from your artwork. It then fine-tunes a transformation network to reproduce those features. This is the same technique used in the original neural style transfer research.
**Q: What's the difference between backends?**
A:
- **Auto**: Uses CUDA if available, otherwise PyTorch
- **CUDA Kernels**: Fastest, requires GPU and compilation
- **PyTorch**: Compatible fallback, works on CPU
**Q: Can I use this commercially?**
A: Yes! StyleForge is MIT licensed. The pre-trained models are from the fast-neural-style-transfer repo.
**Q: How large can my input image be?**
A: Any size, but larger images take longer. Webcam mode auto-scales to 640px max dimension for performance.
**Q: Why does compilation take time on first run?**
A: CUDA kernels are JIT-compiled on first use. This only happens once per session.
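Under the hood this is standard PyTorch C++/CUDA extension loading; a simplified sketch (the real build logic, flags, and caching live in kernels/cuda_build.py):
```python
from torch.utils.cpp_extension import load

# Compiles on first call, then reuses the cached build artifacts.
fused_ops = load(
    name="fused_instance_norm",
    sources=["kernels/instance_norm.cu"],
    extra_cuda_cflags=["-O3"],
    verbose=True,
)
```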
---
## Acknowledgments
- [Johnson et al.](https://arxiv.org/abs/1603.08155) - Perceptual Losses for Real-Time Style Transfer
- [yakhyo/fast-neural-style-transfer](https://github.com/yakhyo/fast-neural-style-transfer) - Pre-trained model weights
- [Rembg](https://github.com/danielgatis/rembg) - AI background removal (U²-Net)
- [VGG19](https://pytorch.org/vision/stable/models.html) - Pre-trained feature extractor for style extraction
- [Hugging Face](https://huggingface.co) - Spaces hosting platform
- [Gradio](https://gradio.app) - UI framework
- [PyTorch](https://pytorch.org) - Deep learning framework
---
## Author
**Olivia** - USC Computer Science
[GitHub](https://github.com/olivialiau/StyleForge)
---
## License
MIT License - see [LICENSE](LICENSE) for details.
---
Made with ❤️ and CUDA