---
license: mit
tags:
  - image-to-image
  - style-transfer
  - pytorch
  - beginner
  - fast-inference
pipeline_tag: image-to-image
datasets:
  - coco
metrics:
  - perceptual-loss
---

# mini-style-transfer

A small, fast artistic style transfer model built with PyTorch as a learning project.  
Applies 4 artistic styles to any photo in **under 1 second on CPU**.

Based on [Johnson et al. (2016), Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155).

---

## What it does

| Input photo | + Style painting | → Output |
|---|---|---|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |

---

## Styles available

| File | Style |
|---|---|
| `starry_night.pth` | Van Gogh, Starry Night |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |

---

## Quick start

```python
import torch
from torchvision import transforms
from PIL import Image
from model import StyleNet

# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu"))
model.eval()

# 2. Prepare your image
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)

# 3. Run inference
with torch.no_grad():
    output = model(tensor).squeeze(0).clamp(0, 1)

# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```

Or use the included `run.py` script:

```bash
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```

---

## Model details

| Property | Value |
|---|---|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |
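
To make the Encoder → 5× ResBlock → Decoder layout concrete, here is a minimal sketch in the style of Johnson et al. It is not the contents of `model.py`: the class name `StyleNetSketch`, the helper `conv_block`, the channel width, the kernel sizes, and the final resize step are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(c_in: int, c_out: int, kernel: int, stride: int) -> nn.Sequential:
    """Conv -> InstanceNorm -> ReLU, with reflection padding at the borders."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel, stride, padding=kernel // 2, padding_mode="reflect"),
        nn.InstanceNorm2d(c_out, affine=True),
        nn.ReLU(),
    )


class ResBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection around them."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        self.body = nn.Sequential(
            conv_block(channels, channels, 3, 1),
            nn.Conv2d(channels, channels, 3, padding=1, padding_mode="reflect"),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)


class StyleNetSketch(nn.Module):
    """Hypothetical stand-in for StyleNet; width=16 is an arbitrary choice."""

    def __init__(self, width: int = 16, num_blocks: int = 5) -> None:
        super().__init__()
        # Encoder: one full-resolution conv, then two stride-2 downsampling convs.
        self.encoder = nn.Sequential(
            conv_block(3, width, 9, 1),
            conv_block(width, width * 2, 3, 2),
            conv_block(width * 2, width * 4, 3, 2),
        )
        self.blocks = nn.Sequential(*[ResBlock(width * 4) for _ in range(num_blocks)])
        # Decoder: upsample-then-conv twice, then project back to 3 channels.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            conv_block(width * 4, width * 2, 3, 1),
            nn.Upsample(scale_factor=2, mode="nearest"),
            conv_block(width * 2, width, 3, 1),
            nn.Conv2d(width, 3, 9, padding=4, padding_mode="reflect"),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.decoder(self.blocks(self.encoder(x)))
        # Stride-2 convs round odd sizes up, so resize back to the input size.
        if out.shape[-2:] != x.shape[-2:]:
            out = F.interpolate(out, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return out
```

Because the network is fully convolutional, the same weights can be applied to images of different resolutions, which is what makes the "any resolution" input in the table possible.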

---

## Training details

| Property | Value |
|---|---|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16): content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |
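
The content and style terms in the table can be sketched as follows. This is an illustration under assumptions, not the code in `train.py`: the function names, the choice of content layer, and the Gram normalisation are hypothetical, and the feature lists are assumed to come from a frozen VGG16 applied to the stylised output and to the original photo.

```python
import torch
import torch.nn.functional as F


def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-to-channel correlations of a (B, C, H, W) feature map."""
    b, c, h, w = features.shape
    flat = features.reshape(b, c, h * w)
    # Dividing by C*H*W keeps the values comparable across layers (an assumed convention).
    return flat @ flat.transpose(1, 2) / (c * h * w)


def perceptual_loss(
    out_feats: list,
    content_feats: list,
    style_grams: list,
    content_layer: int = 1,
    content_weight: float = 1.0,
    style_weight: float = 1e5,
) -> torch.Tensor:
    """Weighted sum of a content term and a style term.

    out_feats / content_feats: activations at the same layers for the
    stylised output and the input photo.
    style_grams: Gram matrices of the style image, one (1, C, C) tensor per layer.
    """
    # Content: match activations at a single layer.
    content = F.mse_loss(out_feats[content_layer], content_feats[content_layer])
    # Style: match Gram matrices at every layer, broadcasting over the batch.
    style = sum(
        F.mse_loss(gram_matrix(f), g.expand(f.shape[0], -1, -1))
        for f, g in zip(out_feats, style_grams)
    )
    return content_weight * content + style_weight * style
```

The default weights mirror the table (1.0 and 1e5). The large style weight compensates for how small Gram-matrix differences are after normalisation.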

---

## Repository structure

```
mini-style-transfer/
├── model.py            ← StyleNet architecture
├── train.py            ← Training script
├── run.py              ← Inference script
├── starry_night.pth    ← Trained weights (starry night style)
├── mosaic.pth          ← Trained weights (mosaic style)
├── candy.pth           ← Trained weights (candy style)
├── sketch.pth          ← Trained weights (sketch style)
└── README.md           ← This file
```

---

## Limitations

- Each style is a **separate model file**; there is no single multi-style model yet
- Works best on **natural photos** (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256×256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation
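
One possible workaround for the high-resolution limitation is to shrink large photos before stylising them. The helper below is a sketch, not part of `run.py`: the name `cap_longest_side` and the 1024-pixel default are assumptions, and the best cap probably depends on the style.

```python
from PIL import Image


def cap_longest_side(img: Image.Image, max_side: int = 1024) -> Image.Image:
    """Return a copy whose longest side is at most max_side pixels.

    Aspect ratio is preserved and smaller images are returned unchanged in size.
    """
    out = img.copy()
    # thumbnail() resizes in place and never upscales.
    out.thumbnail((max_side, max_side))
    return out
```

In the quick-start code this would be applied right after `Image.open(...).convert("RGB")` and before the tensor conversion.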

---

## What I learned building this

- How **convolutional encoders and decoders** work together
- What **Instance Normalisation** does vs Batch Normalisation
- How **Gram matrices** capture texture and style
- What **perceptual loss** is and why pixel-level loss looks bad for style transfer
- How to use a **pretrained VGG** network as a feature extractor without training it
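
The Instance vs Batch Normalisation point can be seen in a small experiment. This is an illustrative sketch with made-up data: four random "images" with very different contrast are passed through both layers, and the spread of values in each image is measured afterwards.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Four random images whose contrast differs by up to 20x.
contrast = torch.tensor([1.0, 5.0, 10.0, 20.0]).view(4, 1, 1, 1)
x = torch.randn(4, 3, 8, 8) * contrast

inst = nn.InstanceNorm2d(3)       # statistics per image, per channel
batch = nn.BatchNorm2d(3).train()  # statistics per channel, shared across the batch

# Standard deviation of each image/channel after normalisation, shape (4, 3).
std_after_inst = inst(x).std(dim=(2, 3))
std_after_batch = batch(x).std(dim=(2, 3))

print("instance norm:", std_after_inst.mean(dim=1))
print("batch norm:   ", std_after_batch.mean(dim=1))
```

Instance normalisation brings every image to roughly unit spread regardless of its original contrast, while batch normalisation keeps the contrast differences between images in the batch. Removing per-image contrast is one commonly cited reason instance normalisation suits style transfer.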

---

## References

- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155)
- Gatys, L., Ecker, A., & Bethge, M. (2015). [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)

---

*Built as a learning project. Feedback and suggestions welcome!*