---
license: mit
tags:
- image-to-image
- style-transfer
- pytorch
- beginner
- fast-inference
pipeline_tag: image-to-image
datasets:
- coco
metrics:
- perceptual-loss
---

# mini-style-transfer

A small, fast artistic style transfer model built with PyTorch as a learning project.
Applies 4 artistic styles to any photo in **under 1 second on CPU**.

Based on [Johnson et al. (2016): Perceptual Losses for Real-Time Style Transfer](https://arxiv.org/abs/1603.08155).

---

## What it does

| Input photo | + Style painting | → Output |
|---|---|---|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |

---

## Styles available

| File | Style |
|---|---|
| `starry_night.pth` | Van Gogh's *Starry Night* |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |

---

## Quick start

```python
import torch
from torchvision import transforms
from PIL import Image
from model import StyleNet

# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu"))
model.eval()

# 2. Prepare your image (ImageNet mean/std, matching training)
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)

# 3. Run inference
with torch.no_grad():
    output = model(tensor).squeeze(0).clamp(0, 1)

# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```
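
To try every bundled style on the same photo, the loop below reruns the pipeline once per checkpoint. It assumes all four `.pth` files share the same `StyleNet` architecture (as the model details table below suggests) and reuses `model` and `tensor` from the quick start; the output file names are illustrative.

```python
# Apply each bundled style to the photo prepared above.
for name in ["starry_night", "mosaic", "candy", "sketch"]:
    model.load_state_dict(torch.load(f"{name}.pth", map_location="cpu"))
    model.eval()
    with torch.no_grad():
        styled = model(tensor).squeeze(0).clamp(0, 1)
    transforms.ToPILImage()(styled).save(f"styled_{name}.jpg")
```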

Or use the included `run.py` script:

```bash
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```

---

## Model details

| Property | Value |
|---|---|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |
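
`model.py` is the source of truth for the architecture. For orientation only, here is a minimal sketch of what a Johnson-style network with this layout (Encoder → 5× ResBlock → Decoder, Instance Norm) could look like; the widths, kernel sizes, and Sigmoid output are assumptions that land in the same parameter ballpark, and the stride-2/upsample pairing assumes input dimensions divisible by 4.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convs with Instance Norm and an identity skip."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)

class StyleNet(nn.Module):
    """Fully convolutional Encoder -> 5x ResBlock -> Decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # widen channels while downsampling 4x
            nn.Conv2d(3, 16, 9, padding=4), nn.InstanceNorm2d(16, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.InstanceNorm2d(64, affine=True), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResBlock(64) for _ in range(5)])
        self.decoder = nn.Sequential(  # upsample back to the input resolution
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(64, 32, 3, padding=1), nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 16, 3, padding=1), nn.InstanceNorm2d(16, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(16, 3, 9, padding=4), nn.Sigmoid(),  # assumed: output in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.res(self.encoder(x)))
```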

---

## Training details

| Property | Value |
|---|---|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16): content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |
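
`train.py` is not reproduced here, but the loss row is worth unpacking. Below is a minimal sketch of how a VGG16 perceptual loss with Gram-matrix style terms is typically wired up. The tapped layers (relu1_2 … relu4_3) and the `style_grams` bookkeeping are illustrative assumptions; the content and style weights match the table above, and inputs are assumed already ImageNet-normalised.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG16 feature extractor: we only read activations, never update it.
vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {3, 8, 15, 22}   # relu1_2, relu2_2, relu3_3, relu4_3 (assumed)
CONTENT_LAYER = 15              # relu3_3 (assumed)

def gram(feat):
    """Gram matrix: channel-to-channel feature correlations, i.e. texture."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(output, content_img, style_grams,
                    content_weight=1.0, style_weight=1e5):
    """Content loss vs the input photo plus style loss vs precomputed Grams."""
    loss = torch.tensor(0.0, device=output.device)
    x, y = output, content_img
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i == CONTENT_LAYER:
            loss = loss + content_weight * F.mse_loss(x, y)
        if i in STYLE_LAYERS:
            g = gram(x)
            loss = loss + style_weight * F.mse_loss(g, style_grams[i].expand_as(g))
        if i >= max(STYLE_LAYERS):
            break  # deeper VGG layers are never used
    return loss
```

Here `style_grams` would map each style layer index to the Gram matrix of the style painting's features at that layer, computed once before training.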

---

## Repository structure

```
mini-style-transfer/
├── model.py          # StyleNet architecture
├── train.py          # Training script
├── run.py            # Inference script
├── starry_night.pth  # Trained weights (Starry Night style)
├── mosaic.pth        # Trained weights (mosaic style)
├── candy.pth         # Trained weights (candy style)
├── sketch.pth        # Trained weights (sketch style)
└── README.md         # This file
```

---

## Limitations

- Each style is a **separate model file**; there is no single multi-style model yet
- Works best on **natural photos** (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256 × 256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation

---

## What I learned building this

- How **convolutional encoders and decoders** work together
- How **Instance Normalisation** differs from Batch Normalisation (see the sketch below)
- How **Gram matrices** capture texture and style
- What **perceptual loss** is and why pixel-level loss looks bad for style transfer
- How to use a **pretrained VGG** network as a feature extractor without training it
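
The Instance vs Batch Normalisation difference is easy to see numerically. A tiny self-contained sketch (toy tensors, nothing from this repo): BatchNorm pools statistics over the whole batch, so one bright image shifts everyone, while InstanceNorm normalises each image on its own, which is why it suits style transfer.

```python
import torch
import torch.nn as nn

# Two images in one batch: the second is much brighter than the first.
x = torch.cat([torch.randn(1, 3, 8, 8),
               torch.randn(1, 3, 8, 8) + 5.0])

bn = nn.BatchNorm2d(3)        # statistics pooled over the whole batch
inorm = nn.InstanceNorm2d(3)  # statistics per image, per channel

# BatchNorm keeps the brightness gap between the two images;
# InstanceNorm resets each image to ~zero mean per channel.
print(bn(x).mean(dim=(2, 3)))     # per-image channel means still differ
print(inorm(x).mean(dim=(2, 3)))  # all approximately zero
```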

---

## References

- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155)
- Gatys, L., Ecker, A., & Bethge, M. (2015). [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)

---

*Built as a learning project. Feedback and suggestions welcome!*