---
title: StyleForge
emoji: 🎨
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
---
# StyleForge: Real-Time Neural Style Transfer

Transform your photos into artwork using fast neural style transfer with custom CUDA kernel acceleration.

[Hugging Face Space](https://huggingface.co/spaces/olivialiau/styleforge) · [GitHub](https://github.com/olivialiau/StyleForge) · [License: MIT](https://opensource.org/licenses/MIT)
## Overview

StyleForge is a high-performance neural style transfer application that combines cutting-edge machine learning with custom GPU optimization. It demonstrates end-to-end ML pipeline development, from model architecture to CUDA kernel optimization and web deployment.

### Key Features

| Feature | Description |
|---------|-------------|
| **4 Pre-trained Styles** | Candy, Mosaic, Rain Princess, Udnie |
| **AI-Powered Segmentation** 🆕 | Automatic foreground/background detection using U²-Net |
| **VGG19 Style Extraction** 🆕 | Real style extraction using neural feature matching |
| **Style Blending** | Interpolate between styles in latent space |
| **Region Transfer** | Apply different styles to different image regions |
| **Real-time Webcam** | Live video style transformation |
| **CUDA Acceleration** | 8-9x faster with custom fused kernels |
| **Performance Dashboard** | Live charts comparing backends |
## Quick Start

1. **Upload** any image (JPG, PNG, WebP)
2. **Select** an artistic style
3. **Choose** your backend (Auto recommended)
4. **Click** "Stylize Image"
5. **Download** your result!

---

## Features Guide

### 1. Quick Style Transfer

The fastest way to transform your images.

- **Side-by-side comparison**: See original and stylized versions together
- **Watermark option**: Add branding for social sharing
- **Backend selection**: Choose between CUDA Kernels (fastest) or PyTorch (compatible)
### 2. Style Blending

Mix two styles together to create unique artistic combinations.

**How it works**: Style blending interpolates between model weights in the latent space.

- Blend ratio 0% = Pure Style 1
- Blend ratio 50% = Equal mix of both styles
- Blend ratio 100% = Pure Style 2

This demonstrates that neural styles exist in a continuous manifold where you can navigate between artistic styles.
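
The interpolation itself is simple to express. Below is a minimal sketch of the idea, assuming two checkpoints that share an architecture; `blend_styles` and the file names are illustrative, not the app's actual code:

```python
import torch

def blend_styles(state_dict_1, state_dict_2, alpha: float):
    """Linearly interpolate two style models' weights.

    alpha = 0.0 gives pure Style 1; alpha = 1.0 gives pure Style 2.
    Assumes both checkpoints share identical keys and tensor shapes.
    """
    return {
        key: (1.0 - alpha) * state_dict_1[key] + alpha * state_dict_2[key]
        for key in state_dict_1
    }

# Illustrative usage: a 50/50 blend of two pre-trained styles
w1 = torch.load("candy.pth", map_location="cpu")
w2 = torch.load("mosaic.pth", map_location="cpu")
blended = blend_styles(w1, w2, alpha=0.5)
```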
### 3. Region Transfer 🆕

Apply different styles to different parts of your image using **AI-powered segmentation**.

**Mask Types**:

| Mask | Description | Use Case |
|------|-------------|----------|
| **AI: Foreground** | Automatically detect main subject | Portraits, product photos |
| **AI: Background** | Automatically detect background | Sky replacement, effects |
| Horizontal Split | Top/bottom division | Sky vs landscape |
| Vertical Split | Left/right division | Portrait effects |
| Center Circle | Circular focus region | Spotlight subjects |
| Corner Box | Top-left quadrant only | Creative framing |
| Full | Entire image | Standard transfer |

**AI Segmentation**: Uses the U²-Net deep learning model for automatic subject detection without manual masking.
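
However the mask is produced, compositing works the same way. Here is a minimal sketch of the blending step, with hypothetical array names (the app's actual implementation may differ):

```python
import numpy as np

def composite_regions(styled_fg, styled_bg, mask):
    """Combine two stylized renderings using a soft mask.

    styled_fg, styled_bg: (H, W, 3) uint8 images styled with different models
    mask: (H, W) float array in [0, 1], where 1.0 marks the foreground
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    out = m * styled_fg.astype(np.float32) + (1.0 - m) * styled_bg.astype(np.float32)
    return out.clip(0, 255).astype(np.uint8)
```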
### 4. Create Style 🆕

**Extract** artistic style from any image using **VGG19 neural feature matching**.

**How it works**:

1. Upload an artwork image (painting, illustration, photo with artistic style)
2. A pre-trained VGG19 network extracts style features (textures, colors, patterns)
3. A transformation network is fine-tuned to match those features
4. Your custom style model is saved and available in all tabs

This is **real style extraction**: the system learns the artistic characteristics from your image rather than copying an existing style.

**Tips for best results**:

- Use artwork with clear artistic direction (paintings, illustrations, stylized photos)
- Higher iterations = better style matching (but slower)
- GPU is recommended for training (100 iterations ≈ 30-60 seconds)
### 5. Webcam Live

Real-time style transfer on your webcam feed.

**Requirements**:

- Browser camera permissions
- Recommended: GPU device for smooth performance

**Performance**:

- GPU: 20-30 FPS
- CPU: 5-10 FPS

### 6. Performance Dashboard

Monitor and compare inference performance across backends. A sketch of the underlying timing logic follows the list below.

**Metrics tracked**:

- Inference time per image
- Average/min/max times
- Backend comparison (CUDA vs PyTorch)
- Speedup calculations
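
This is how such metrics can be collected; `stylize_fn` is a stand-in for whichever backend is being measured, not the dashboard's actual code:

```python
import time
import statistics

def benchmark(stylize_fn, image, runs: int = 10):
    """Time repeated stylization calls and report per-image statistics in ms."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        stylize_fn(image)
        times.append((time.perf_counter() - start) * 1000)
    return {
        "avg_ms": statistics.mean(times),
        "min_ms": min(times),
        "max_ms": max(times),
    }

# Speedup is then the ratio of average times, e.g.
# speedup = pytorch_stats["avg_ms"] / cuda_stats["avg_ms"]
```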
---

## Deep Dive: New AI Features 🆕

### AI-Powered Segmentation (U²-Net)

**Overview**: StyleForge now uses U²-Net, a salient-object-detection model built as a nested, two-level U-structure (U-Net-like blocks inside a larger U-Net), for automatic foreground/background segmentation. This eliminates the need for manual masking when applying different styles to specific image regions.

#### How U²-Net Works
```
Input Image (any size)
          ↓
┌────────────────────────────────────┐
│ Encoder (U-Net style)              │
│  - Extracts multi-scale features   │
│  - 6 encoder stages                │
│  - Deep supervision paths          │
├────────────────────────────────────┤
│ Decoder                            │
│  - Reconstructs segmentation mask  │
│  - Salient object detection        │
└────────────────────────────────────┘
          ↓
Grayscale saliency mask (256 levels)
          ↓
Foreground (white) / Background (black)
```
**Technical Details**:

- **Architecture**: U²-Net with a deep encoder-decoder structure
- **Input**: RGB image of any size
- **Output**: Grayscale mask where white = foreground, black = background
- **Model Size**: ~176 MB pre-trained weights
- **Inference Time**: ~200-500 ms per image (CPU), ~50-100 ms (GPU)

**Why U²-Net?**

- Trained on 20,000+ images with diverse subjects
- Excellent at detecting humans, animals, objects, and products
- Handles complex backgrounds and edges
- Works without requiring bounding boxes or user input

**Use Cases**:

- **Portrait Photography**: Style the subject differently from the background
- **Product Photography**: Apply artistic effects to products while keeping clean backgrounds
- **Creative Composites**: Apply different artistic styles to foreground vs background
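
Since the app credits Rembg (which wraps U²-Net) for segmentation, mask extraction can be sketched roughly as follows; the `only_mask` flag is available in recent Rembg releases, though the app's exact invocation may differ:

```python
from PIL import Image
from rembg import remove  # Rembg bundles the pre-trained U²-Net weights

img = Image.open("portrait.jpg")

# only_mask=True returns the grayscale saliency mask instead of a cut-out:
# white pixels mark the detected foreground, black pixels the background.
mask = remove(img, only_mask=True)
mask.save("foreground_mask.png")
```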
### VGG19 Style Extraction

#### Gram Matrices: Representing Style

The Gram matrix is computed from the feature activations:

```
F = feature map of shape (C, H, W), flattened to (C, H·W)
Gram(F)[i,j] = Σ_k F[i,k] · F[j,k]
```

This captures:

- **Texture information**: How feature channels correlate across the image
- **Color patterns**: Which colors appear together
- **Brush strokes**: Directionality and scale of textures
- **Style signature**: Unique fingerprint of the artistic style
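
In PyTorch, the computation above takes only a few lines. A standard implementation, batched and normalized by the element count so the loss scale stays resolution-independent:

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-correlation (Gram) matrix of VGG feature maps.

    features: (B, C, H, W) activations from a VGG19 layer.
    Returns a (B, C, C) tensor of channel correlations.
    """
    b, c, h, w = features.size()
    f = features.view(b, c, h * w)          # flatten the spatial dimensions
    gram = torch.bmm(f, f.transpose(1, 2))  # (B, C, C) inner products
    return gram / (c * h * w)               # normalize by element count
```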
#### Fine-Tuning Process

The system fine-tunes a pre-trained Fast Style Transfer model:

1. **Load base model** (e.g., Udnie style)
2. **Freeze early layers** (preserve low-level transformations)
3. **Train on style loss** using the extracted Gram matrices
4. **Iterate** with the Adam optimizer (lr=0.001)
5. **Save** as a reusable `.pth` file

```
Base Model  →  Extracted Style Features  →  Fine-tuned Model
    ↓                    ↓                         ↓
  Udnie            Starry Night          Custom "Starry Udnie"
```
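
A condensed sketch of steps 1-5, reusing `gram_matrix` from above. `TransformerNet`, `vgg_features`, `style_grams`, and `content_batch` are assumed names standing in for the app's actual model, feature extractor, target Gram matrices, and training images:

```python
import torch
import torch.nn.functional as F

# Assumed: TransformerNet is the style network, vgg_features(x) returns a
# list of VGG19 activations, style_grams holds the target Gram matrices,
# and content_batch is a (B, 3, H, W) tensor of training images.
model = TransformerNet()
model.load_state_dict(torch.load("udnie.pth", map_location="cpu"))

# Step 2: freeze early layers to preserve low-level transformations
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# Steps 3-4: minimize the style loss against the extracted Gram matrices
for step in range(100):
    optimizer.zero_grad()
    output = model(content_batch)
    loss = sum(
        F.mse_loss(gram_matrix(feat), target)
        for feat, target in zip(vgg_features(output), style_grams)
    )
    loss.backward()
    optimizer.step()

# Step 5: save the custom style for reuse in all tabs
torch.save(model.state_dict(), "custom_styles/my_style.pth")
```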
**Training Time**:

- 100 iterations: ~30-60 seconds (GPU)
- 200 iterations: ~60-120 seconds (GPU)
- More iterations = better style matching

**Why VGG19?**

- Pre-trained on ImageNet (1M+ images)
- Learned rich feature representations
- Standard in style transfer research (Gatys et al., Johnson et al.)
- Captures both low-level (textures) and high-level (patterns) features
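
Loading VGG19 as a frozen feature extractor is straightforward with torchvision. The layer indices below are one common choice (roughly relu1_2/relu2_2/relu3_4/relu4_4); the app's exact tap points are not documented here:

```python
import torch
from torchvision import models

# ImageNet-pretrained VGG19, convolutional stack only, frozen for inference
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad = False

def extract_style_features(image: torch.Tensor, taps=(3, 8, 17, 26)):
    """Run the image through VGG19 and collect activations at chosen layers."""
    feats, x = [], image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in taps:
            feats.append(x)
        if i >= max(taps):
            break  # no need to run deeper layers
    return feats
```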
---

## Technical Details

### Architecture

StyleForge uses the **Fast Neural Style Transfer** architecture from Johnson et al.:

```
Input Image (3 x H x W)
          ↓
┌──────────────────────────────────────┐
│ Encoder (3 Conv + InstanceNorm)      │
├──────────────────────────────────────┤
│ Transformer (5 Residual Blocks)      │
├──────────────────────────────────────┤
│ Decoder (3 Upsample + InstanceNorm)  │
└──────────────────────────────────────┘
          ↓
Output Image (3 x H x W)
```

**Layers**:

- **ConvLayer**: Conv2d → InstanceNorm → ReLU
- **ResidualBlock**: Two ConvLayers with a skip connection
- **UpsampleConvLayer**: Upsample → Conv2d → InstanceNorm → ReLU
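
As an illustration of the three building blocks, here is a compact PyTorch sketch; details such as padding mode and activation placement vary between implementations, so treat this as representative rather than the app's exact code:

```python
import torch.nn as nn

class ConvLayer(nn.Module):
    """Conv2d → InstanceNorm → ReLU."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(kernel_size // 2),
            nn.Conv2d(in_ch, out_ch, kernel_size, stride),
            nn.InstanceNorm2d(out_ch, affine=True),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class ResidualBlock(nn.Module):
    """Two ConvLayers with a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = ConvLayer(ch, ch, kernel_size=3)
        self.conv2 = ConvLayer(ch, ch, kernel_size=3)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class UpsampleConvLayer(nn.Module):
    """Upsample → Conv2d → InstanceNorm → ReLU (avoids the checkerboard
    artifacts that transposed convolutions can introduce)."""
    def __init__(self, in_ch, out_ch, kernel_size, scale=2):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="nearest")
        self.conv = ConvLayer(in_ch, out_ch, kernel_size)

    def forward(self, x):
        return self.conv(self.upsample(x))
```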
### CUDA Kernel Optimization

Custom CUDA kernels provide an 8-9x speedup over the PyTorch baseline.

**Fused InstanceNorm Kernel**:

- Combines mean, variance, normalization, and affine transform into a single kernel
- Uses `float4` vectorized loads for 4x memory bandwidth
- Warp-level parallel reductions
- Shared memory tiling for reduced global memory traffic
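
On the Python side, loading such a kernel typically goes through `torch.utils.cpp_extension.load`, which compiles on first call and caches the result. A sketch, assuming `kernels/instance_norm.cu` exposes its own pybind bindings (the project's `cuda_build.py` handles the real build):

```python
from torch.utils.cpp_extension import load

# JIT-compile the fused kernel on first use; subsequent runs hit the cache.
# Assumes instance_norm.cu defines a PYBIND11_MODULE exposing `forward`.
fused_instance_norm = load(
    name="fused_instance_norm",
    sources=["kernels/instance_norm.cu"],
    extra_cuda_cflags=["-O3"],
    verbose=True,
)

# Hypothetical call: normalize activations with the fused kernel
# y = fused_instance_norm.forward(x, weight, bias, eps)
```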
**Performance Comparison** (512x512 image):

| Backend | Time | Speedup |
|---------|------|---------|
| PyTorch | ~80 ms | 1.0x |
| CUDA Kernels | ~10 ms | 8.0x |
### ML Concepts Demonstrated

| Concept | Implementation |
|---------|----------------|
| **Style Transfer** | Neural artistic stylization |
| **Latent Space** | Style blending shows a continuous style space |
| **Conditional Generation** | Region-based style application |
| **Transfer Learning** | Custom styles from base models |
| **Performance Optimization** | CUDA kernels, JIT compilation, caching |
| **Model Deployment** | Gradio web interface, CI/CD pipeline |
---

## Styles Gallery

| Style | Description | Best For |
|-------|-------------|----------|
| 🍬 **Candy** | Bright, colorful pop-art transformation | Portraits, vibrant scenes |
| 🎨 **Mosaic** | Fragmented tile-like reconstruction | Landscapes, architecture |
| 🌧️ **Rain Princess** | Moody impressionistic style | Atmospheric, rainy-day photos |
| 🖼️ **Udnie** | Bold abstract expressionism | High-contrast images |
## Performance Benchmarks

### Inference Time (milliseconds)

| Resolution | CUDA | PyTorch | Speedup |
|------------|------|---------|---------|
| 256x256 | 5 ms | 40 ms | 8.0x |
| 512x512 | 10 ms | 80 ms | 8.0x |
| 1024x1024 | 35 ms | 280 ms | 8.0x |

### FPS Performance (Webcam)

| Device | Resolution | FPS |
|--------|------------|-----|
| NVIDIA GPU | 640x480 | 25-30 |
| CPU (modern) | 640x480 | 5-10 |
## Run Locally

### Using pip

```bash
git clone https://github.com/olivialiau/StyleForge
cd StyleForge/huggingface-space
pip install -r requirements.txt
python app.py
```

### Using conda (recommended)

```bash
git clone https://github.com/olivialiau/StyleForge
cd StyleForge/huggingface-space
conda env create -f environment.yml
conda activate styleforge
python app.py
```

Open http://localhost:7860 in your browser.
---

## API Usage

You can use StyleForge programmatically:

```python
import base64
import requests
from PIL import Image
from io import BytesIO

# Read and base64-encode the input image
with open("path/to/image.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Call the API
response = requests.post(
    "https://olivialiau-styleforge.hf.space/api/predict",
    json={
        "data": [
            {"name": "image.jpg", "data": encoded},
            "candy",  # style
            "auto",   # backend
            False,    # show_comparison
            False,    # add_watermark
        ]
    },
)

# Decode the stylized image from the response
result = response.json()
output_img = Image.open(BytesIO(base64.b64decode(result["data"][0])))
```
---

## Embed in Your Website

```html
<iframe
  src="https://olivialiau-styleforge.hf.space"
  frameborder="0"
  width="100%"
  height="850"
  allow="camera; microphone"
></iframe>
```
---

## Project Structure

```
StyleForge/
├── huggingface-space/
│   ├── app.py                     # Main Gradio application
│   ├── requirements.txt           # Python dependencies
│   ├── README.md                  # This file
│   ├── kernels/                   # Custom CUDA kernels
│   │   ├── __init__.py
│   │   ├── cuda_build.py          # JIT compilation utilities
│   │   ├── instance_norm_wrapper.py
│   │   └── instance_norm.cu       # CUDA source code
│   ├── models/                    # Model weights (auto-downloaded)
│   └── custom_styles/             # User-trained styles
├── .github/
│   └── workflows/
│       └── deploy-huggingface.yml # CI/CD pipeline
└── saved_models/                  # Local model cache
```
---

## Development

### CI/CD Pipeline

The project uses GitHub Actions for automatic deployment to Hugging Face Spaces:

```yaml
# .github/workflows/deploy-huggingface.yml
on:
  push:
    branches: [main]
    paths: ['huggingface-space/**']
```

Push to the `main` branch → auto-deploys to the Hugging Face Space.

### Adding New Styles

1. Train a model using the original repo's training script
2. Save the weights as a `.pth` file
3. Add it to the `models/` directory or update the URL map in `get_model_path()`
4. Add entries to the `STYLES` and `STYLE_DESCRIPTIONS` dictionaries (see the sketch below)
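
For step 4, registration amounts to two dictionary entries. A sketch, assuming `STYLES` maps display names to weight paths and `STYLE_DESCRIPTIONS` holds the gallery text (the dictionaries' exact shapes may differ):

```python
# Hypothetical entries for a newly trained style
STYLES["Starry Night"] = "models/starry_night.pth"
STYLE_DESCRIPTIONS["Starry Night"] = "Swirling, Van Gogh-inspired brushwork"
```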
---

## FAQ

**Q: How does the style extraction work?**

A: The VGG19-based style extraction uses a pre-trained neural network to analyze artistic features (textures, brush strokes, color patterns) from your artwork. It then fine-tunes a transformation network to reproduce those features. This is the same technique used in the original neural style transfer research.

**Q: What's the difference between backends?**

A:

- **Auto**: Uses CUDA if available, otherwise PyTorch
- **CUDA Kernels**: Fastest; requires a GPU and compilation
- **PyTorch**: Compatible fallback; works on CPU

**Q: Can I use this commercially?**

A: Yes! StyleForge is MIT licensed. The pre-trained models are from the fast-neural-style-transfer repo.

**Q: How large can my input image be?**

A: Any size, but larger images take longer. Webcam mode auto-scales to a 640px max dimension for performance.

**Q: Why does compilation take time on first run?**

A: CUDA kernels are JIT-compiled on first use. This only happens once per session.
---

## Acknowledgments

- [Johnson et al.](https://arxiv.org/abs/1603.08155) - Perceptual Losses for Real-Time Style Transfer
- [yakhyo/fast-neural-style-transfer](https://github.com/yakhyo/fast-neural-style-transfer) - Pre-trained model weights
- [Rembg](https://github.com/danielgatis/rembg) - AI background removal (U²-Net)
- [VGG19](https://pytorch.org/vision/stable/models.html) - Pre-trained feature extractor for style extraction
- [Hugging Face](https://huggingface.co) - Spaces hosting platform
- [Gradio](https://gradio.app) - UI framework
- [PyTorch](https://pytorch.org) - Deep learning framework

---

## Author

**Olivia** - USC Computer Science

[GitHub](https://github.com/olivialiau/StyleForge)

---

## License

MIT License - see [LICENSE](LICENSE) for details.

---

Made with ❤️ and CUDA