---
license: mit
tags:
- image-to-image
- style-transfer
- pytorch
- beginner
- fast-inference
pipeline_tag: image-to-image
datasets:
- coco
metrics:
- perceptual-loss
---
# mini-style-transfer
A small, fast artistic style-transfer model built with PyTorch as a learning project.
It applies any of four artistic styles to a photo in **under 1 second on CPU**.
Based on [Johnson et al. (2016), Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155).
---
## What it does
| Input photo | + Style painting | → Output |
|---|---|---|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |
---
## Styles available
| File | Style |
|---|---|
| `starry_night.pth` | Van Gogh's Starry Night |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |
---
## Quick start
```python
import torch
from torchvision import transforms
from PIL import Image
from model import StyleNet

# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu"))
model.eval()

# 2. Prepare your image
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)

# 3. Run inference
with torch.no_grad():
    output = model(tensor).squeeze(0).clamp(0, 1)

# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```
Or use the included `run.py` script:
```bash
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```
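
To try every style on the same photo, the Quick start can be wrapped in a loop. A minimal sketch, assuming the four `.pth` files sit in the working directory (the `styled_{name}.jpg` output naming is just an illustrative choice):

```python
import torch
from torchvision import transforms
from PIL import Image
from model import StyleNet

# Same preprocessing as in the Quick start
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)

# Reuse one StyleNet instance and swap in each set of weights
model = StyleNet()
for name in ["starry_night", "mosaic", "candy", "sketch"]:
    model.load_state_dict(torch.load(f"{name}.pth", map_location="cpu"))
    model.eval()
    with torch.no_grad():
        output = model(tensor).squeeze(0).clamp(0, 1)
    transforms.ToPILImage()(output).save(f"styled_{name}.jpg")
```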
---
## Model details
| Property | Value |
|---|---|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |
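
For orientation, a Johnson-style network with this layout could look roughly like the sketch below. This is a hypothetical reconstruction, not the actual `model.py`: the `StyleNetSketch` name, filter counts, kernel sizes, and the upsampling strategy are all assumptions, and the real StyleNet may differ.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convs with instance norm, identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)

class StyleNetSketch(nn.Module):
    """Illustrative Encoder -> 5x ResBlock -> Decoder layout."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 9, stride=1, padding=4),
            nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # downsample /2
            nn.InstanceNorm2d(64, affine=True), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResBlock(64) for _ in range(5)])
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),  # upsample x2
            nn.Conv2d(64, 32, 3, padding=1),
            nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 9, padding=4),
        )

    def forward(self, x):
        return self.decoder(self.res(self.encoder(x)))
```

Because the network is fully convolutional, it accepts any input resolution and returns an output of the same size, which matches the table above.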
---
## Training details
| Property | Value |
|---|---|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16): content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |
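
The loss combines a content term (feature distance at one VGG layer) with a style term (Gram-matrix distance at several layers), using the weights from the table above. A minimal sketch of that computation; the function names, the dict-of-activations interface, and layer names like `relu2_2` are illustrative assumptions, not the actual `train.py` API:

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a (B, C, H, W) feature map: channel-by-channel
    co-activations, which capture texture while discarding layout."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def perceptual_loss(out_feats, content_feats, style_feats,
                    content_weight=1.0, style_weight=1e5):
    """Inputs are dicts of VGG activations keyed by layer name."""
    # Content: match raw features at one mid-level layer.
    content = F.mse_loss(out_feats["relu2_2"], content_feats["relu2_2"])
    # Style: match Gram matrices across several layers.
    style = sum(F.mse_loss(gram_matrix(out_feats[k]),
                           gram_matrix(style_feats[k]))
                for k in style_feats)
    return content_weight * content + style_weight * style
```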
---
## Repository structure
```
mini-style-transfer/
├── model.py          ← StyleNet architecture
├── train.py          ← Training script
├── run.py            ← Inference script
├── starry_night.pth  ← Trained weights (Starry Night style)
├── mosaic.pth        ← Trained weights (mosaic style)
├── candy.pth         ← Trained weights (candy style)
├── sketch.pth        ← Trained weights (sketch style)
└── README.md         ← This file
```
---
## Limitations
- Each style is a **separate model file**; there is no single multi-style model yet
- Works best on **natural photos** (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256 × 256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation
---
## What I learned building this
- How **convolutional encoders and decoders** work together
- What **Instance Normalisation** does differently from Batch Normalisation
- How **Gram matrices** capture texture and style
- What **perceptual loss** is and why pixel-level loss looks bad for style transfer
- How to use a **pretrained VGG** network as a feature extractor without training it (see the sketch below)
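
As an illustration of that last point, a pretrained VGG16 can be frozen and used purely to extract features. The `extract_features` helper is a hypothetical name; the layer indices are the standard torchvision positions of relu1_2 through relu4_3:

```python
import torch
from torchvision import models

# Load pretrained VGG16 and freeze it: it only extracts features and is
# never updated by the optimiser.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract_features(x, layers=(3, 8, 15, 22)):
    """Collect activations after relu1_2, relu2_2, relu3_3, relu4_3."""
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats.append(x)
    return feats
```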
---
## References
- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155)
- Gatys, L., Ecker, A., & Bethge, M. (2015). [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)
---
*Built as a learning project. Feedback and suggestions welcome!*