---
license: mit
tags:
- image-to-image
- style-transfer
- pytorch
- beginner
- fast-inference
pipeline_tag: image-to-image
datasets:
- coco
metrics:
- perceptual-loss
---
# mini-style-transfer
A small, fast artistic style transfer model built with PyTorch as a learning project.
Applies one of 4 artistic styles to a photo in **under 1 second on CPU**.
Based on [Johnson et al. (2016): Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155).
---
## What it does
| Input photo | + Style painting | → Output |
|---|---|---|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |
---
## Styles available
| File | Style |
|---|---|
| `starry_night.pth` | Van Gogh, *The Starry Night* |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |
---
## Quick start
```python
import torch
from torchvision import transforms
from PIL import Image
from model import StyleNet
# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu", weights_only=True))
model.eval()
# 2. Prepare your image
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)  # add batch dimension: (1, 3, H, W)
# 3. Run inference
with torch.no_grad():
    # Assumes StyleNet outputs RGB values in [0, 1]; clamp removes overshoot
    output = model(tensor).squeeze(0).clamp(0, 1)
# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```
Or use the included `run.py` script:
```bash
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```
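The "under 1 second on CPU" figure will depend on your hardware and on the photo's resolution. A minimal way to check it yourself, assuming `model` and `tensor` from the Quick start above (`time_inference` is a hypothetical helper, not something `run.py` provides):

```python
import time

import torch


def time_inference(model: torch.nn.Module, x: torch.Tensor,
                   warmup: int = 1, repeats: int = 5) -> float:
    """Return the median wall-clock seconds for one forward pass of `model` on `x`."""
    model.eval()
    times = []
    with torch.no_grad():
        for _ in range(warmup):
            model(x)  # first call can include one-off setup cost, so it is not timed
        for _ in range(repeats):
            start = time.perf_counter()
            model(x)
            times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]  # median of the timed runs
```

For example, `time_inference(model, tensor)` returns the median of five timed forward passes in seconds. Taking the median rather than the mean keeps a single slow run (e.g. caused by a background process) from skewing the result.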
---
## Model details
| Property | Value |
|---|---|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |
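To make the "Encoder → 5× ResBlock → Decoder" layout concrete, here is a minimal sketch of that kind of network. It is **not** the contents of `model.py`: the class name `TinyStyleNet`, the channel widths (16/32/64), the 9×9 outer kernels, reflection padding, learnable Instance Normalisation and the decoder design are all assumptions, loosely inspired by Johnson et al. (2016). Johnson et al. used fractionally-strided (transposed) convolutions to upsample; this sketch swaps in nearest-neighbour upsampling followed by a convolution.

```python
import torch
import torch.nn as nn

# Hypothetical sketch, NOT the repo's StyleNet. Widths and kernel sizes are assumptions.


def conv_in_relu(c_in: int, c_out: int, k: int, stride: int = 1) -> nn.Sequential:
    """Conv -> InstanceNorm -> ReLU, with reflection padding to reduce border artefacts."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, padding_mode="reflect"),
        nn.InstanceNorm2d(c_out, affine=True),
        nn.ReLU(inplace=True),
    )


class ResBlock(nn.Module):
    """Two 3x3 convs with an identity skip connection; spatial size is unchanged."""

    def __init__(self, c: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, padding_mode="reflect"),
            nn.InstanceNorm2d(c, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1, padding_mode="reflect"),
            nn.InstanceNorm2d(c, affine=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)


class TinyStyleNet(nn.Module):
    def __init__(self, widths=(16, 32, 64), n_res: int = 5):
        super().__init__()
        a, b, c = widths
        self.encoder = nn.Sequential(
            conv_in_relu(3, a, 9),            # full resolution
            conv_in_relu(a, b, 3, stride=2),  # H/2
            conv_in_relu(b, c, 3, stride=2),  # H/4
        )
        self.res = nn.Sequential(*[ResBlock(c) for _ in range(n_res)])
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            conv_in_relu(c, b, 3),            # H/2
            nn.Upsample(scale_factor=2, mode="nearest"),
            conv_in_relu(b, a, 3),            # H
            nn.Conv2d(a, 3, 9, padding=4, padding_mode="reflect"),  # back to 3 RGB channels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.res(self.encoder(x)))
```

With these assumed widths the sketch has about 425K parameters, but that does not mean it matches the released weights. Because of the two stride-2 convolutions and the two 2× upsamplings, this sketch returns exactly the input size only when height and width are multiples of 4 (a 66×66 input comes back as 68×68), so inputs of other sizes would need cropping or padding.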
---
## Training details
| Property | Value |
|---|---|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16): content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |
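The loss rows above can be written out as follows. This is a minimal sketch, not `train.py`: the layer choice (relu2_2 for content; relu1_2, relu2_2, relu3_3 and relu4_3 for style, as in Johnson et al.), normalising each Gram matrix by C·H·W, using MSE for both terms, and the helper names are assumptions. Only the weights 1.0 and 1e5 come from the table. `features` is expected to be a frozen VGG16 feature stack.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed layers: indices into torchvision's vgg16().features
STYLE_LAYERS = (3, 8, 15, 22)  # relu1_2, relu2_2, relu3_3, relu4_3
CONTENT_LAYER = 8              # relu2_2


def extract_features(features: nn.Sequential, x: torch.Tensor, layers) -> dict:
    """Run x through `features`, keeping the outputs of the requested layer indices."""
    wanted = set(layers)
    last = max(wanted)
    out = {}
    for i, layer in enumerate(features):
        x = layer(x)
        if i in wanted:
            out[i] = x
        if i >= last:
            break  # no need to run deeper layers
    return out


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) -> (B, C, C) channel correlations, normalised by C*H*W."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def perceptual_loss(features, output, content, style_grams,
                    content_weight=1.0, style_weight=1e5):
    """content_weight * content loss + style_weight * style loss (Gram-matrix MSE)."""
    out_feats = extract_features(features, output, set(STYLE_LAYERS) | {CONTENT_LAYER})
    with torch.no_grad():  # the content target is fixed
        target = extract_features(features, content, [CONTENT_LAYER])[CONTENT_LAYER]
    content_loss = F.mse_loss(out_feats[CONTENT_LAYER], target)
    style_loss = output.new_zeros(())
    for i in STYLE_LAYERS:
        g = gram_matrix(out_feats[i])
        # style_grams[i] has shape (1, C, C) for one style image; broadcast over the batch
        style_loss = style_loss + F.mse_loss(g, style_grams[i].expand_as(g))
    return content_weight * content_loss + style_weight * style_loss
```

In practice `features` would be something like `torchvision.models.vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval().requires_grad_(False)`, and `style_grams` would be computed once from the style painting, e.g. `{i: gram_matrix(f) for i, f in extract_features(features, style_img, STYLE_LAYERS).items()}` inside `torch.no_grad()`.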
---
## Repository structure
```
mini-style-transfer/
├── model.py           ← StyleNet architecture
├── train.py           ← Training script
├── run.py             ← Inference script
├── starry_night.pth   ← Trained weights (starry night style)
├── mosaic.pth         ← Trained weights (mosaic style)
├── candy.pth          ← Trained weights (candy style)
├── sketch.pth         ← Trained weights (sketch style)
└── README.md          ← This file
```
---
## Limitations
- Each style is a **separate model file**; there is no single multi-style model yet
- Works best on **natural photos** (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256×256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation
---
## What I learned building this
- How **convolutional encoders and decoders** work together
- What **Instance Normalisation** does vs Batch Normalisation
- How **Gram matrices** capture texture and style
- What **perceptual loss** is and why a pixel-level loss gives poor results for style transfer
- How to use a **pretrained VGG** network as a feature extractor without training it
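The Instance vs Batch Normalisation point can be seen directly in a few lines. This toy example uses random data, not the model: it builds a batch in which each "image" has a different constant offset. `InstanceNorm2d` normalises every image's channels on their own, so each image ends up zero-mean, while `BatchNorm2d` in training mode only makes each channel zero-mean across the whole batch, so the per-image offsets survive.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy batch: 4 images, 8 channels, 16x16; image k is shifted by 5 * k
x = torch.randn(4, 8, 16, 16) + torch.arange(4.0).view(4, 1, 1, 1) * 5

inorm = nn.InstanceNorm2d(8)                      # per image, per channel statistics
bnorm = nn.BatchNorm2d(8, affine=False).train()   # per channel, over the whole batch

y_in = inorm(x)
y_bn = bnorm(x)

in_means = y_in.mean(dim=(2, 3))  # (4, 8): mean of each image's channels
bn_means = y_bn.mean(dim=(2, 3))

print(in_means.abs().max())  # ~0: each image normalised independently
print(bn_means.abs().max())  # well away from 0: per-image offsets remain
```

This per-image behaviour is one reason Instance Normalisation is commonly used in feed-forward style transfer networks: during training, the normalised output for one photo does not depend on which other photos share its batch.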
---
## References
- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155)
- Gatys, L., Ecker, A., & Bethge, M. (2015). [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)
---
*Built as a learning project. Feedback and suggestions welcome!*