---
license: mit
tags:
- image-to-image
- style-transfer
- pytorch
- beginner
- fast-inference
pipeline_tag: image-to-image
datasets:
- coco
metrics:
- perceptual-loss
---

# mini-style-transfer

A small, fast artistic style transfer model built with PyTorch as a learning project.
Applies 4 artistic styles to any photo in **under 1 second on CPU**.

Based on [Johnson et al. (2016): Perceptual Losses for Real-Time Style Transfer](https://arxiv.org/abs/1603.08155).

---

## What it does

| Input photo | + Style painting | → Output |
|---|---|---|
| Any photo (any size) | Starry Night / Mosaic / Candy / Sketch | Stylised version |

---

## Styles available

| File | Style |
|---|---|
| `starry_night.pth` | Van Gogh's *Starry Night* |
| `mosaic.pth` | Classic mosaic tile pattern |
| `candy.pth` | Bright candy colours |
| `sketch.pth` | Pencil sketch look |

---

## Quick start

```python
import torch
from torchvision import transforms
from PIL import Image
from model import StyleNet

# 1. Load model
model = StyleNet()
model.load_state_dict(torch.load("starry_night.pth", map_location="cpu"))
model.eval()

# 2. Prepare your image (ImageNet mean/std, matching training)
img = Image.open("my_photo.jpg").convert("RGB")
to_tensor = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
tensor = to_tensor(img).unsqueeze(0)

# 3. Run inference
with torch.no_grad():
    output = model(tensor).squeeze(0).clamp(0, 1)

# 4. Save result
result = transforms.ToPILImage()(output)
result.save("styled_output.jpg")
print("Done! Open styled_output.jpg")
```
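
To try every bundled style on the same photo, the loop below reruns the pipeline once per checkpoint. It assumes all four `.pth` files share the same `StyleNet` architecture (as the model details table below suggests) and reuses `model` and `tensor` from the quick start; the output file names are illustrative.

```python
# Apply each bundled style to the photo prepared above.
for name in ["starry_night", "mosaic", "candy", "sketch"]:
    model.load_state_dict(torch.load(f"{name}.pth", map_location="cpu"))
    model.eval()
    with torch.no_grad():
        styled = model(tensor).squeeze(0).clamp(0, 1)
    transforms.ToPILImage()(styled).save(f"styled_{name}.jpg")
```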

Or use the included `run.py` script:

```bash
python run.py --model starry_night.pth --input my_photo.jpg --output result.jpg
```

---

## Model details

| Property | Value |
|---|---|
| Architecture | Feed-forward CNN (Encoder → 5× ResBlock → Decoder) |
| Parameters | ~450K |
| Model size | ~1.7 MB per style |
| Input | Any RGB image, any resolution |
| Output | Same size as input, styled |
| Framework | PyTorch 2.x |
| Normalisation | ImageNet mean/std |
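
`model.py` is the source of truth for the architecture. For orientation only, here is a minimal sketch of what a Johnson-style network with this layout (Encoder → 5× ResBlock → Decoder, Instance Norm) could look like; the widths, kernel sizes, and Sigmoid output are assumptions that land in the same parameter ballpark, and the stride-2/upsample pairing assumes input dimensions divisible by 4.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convs with Instance Norm and an identity skip."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)

class StyleNet(nn.Module):
    """Fully convolutional Encoder -> 5x ResBlock -> Decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # widen channels while downsampling 4x
            nn.Conv2d(3, 16, 9, padding=4), nn.InstanceNorm2d(16, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.InstanceNorm2d(64, affine=True), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResBlock(64) for _ in range(5)])
        self.decoder = nn.Sequential(  # upsample back to the input resolution
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(64, 32, 3, padding=1), nn.InstanceNorm2d(32, affine=True), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 16, 3, padding=1), nn.InstanceNorm2d(16, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(16, 3, 9, padding=4), nn.Sigmoid(),  # assumed: output in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.res(self.encoder(x)))
```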

---

## Training details

| Property | Value |
|---|---|
| Content dataset | MS-COCO train2017 (subset) |
| Style images | 4 artwork images |
| Epochs | 2 per style |
| Batch size | 4 |
| Image size (training) | 256 × 256 |
| Optimizer | Adam, lr=1e-3 |
| Loss | Perceptual (VGG16): content + style |
| Content weight | 1.0 |
| Style weight | 1e5 |
| Training time | ~45 min per style (GPU) |
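
`train.py` is not reproduced here, but the loss row is worth unpacking. Below is a minimal sketch of how a VGG16 perceptual loss with Gram-matrix style terms is typically wired up. The tapped layers (relu1_2 … relu4_3) and the `style_grams` bookkeeping are illustrative assumptions; the content and style weights match the table above, and inputs are assumed already ImageNet-normalised.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG16 feature extractor: we only read activations, never update it.
vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {3, 8, 15, 22}   # relu1_2, relu2_2, relu3_3, relu4_3 (assumed)
CONTENT_LAYER = 15              # relu3_3 (assumed)

def gram(feat):
    """Gram matrix: channel-to-channel feature correlations, i.e. texture."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(output, content_img, style_grams,
                    content_weight=1.0, style_weight=1e5):
    """Content loss vs the input photo plus style loss vs precomputed Grams."""
    loss = torch.tensor(0.0, device=output.device)
    x, y = output, content_img
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i == CONTENT_LAYER:
            loss = loss + content_weight * F.mse_loss(x, y)
        if i in STYLE_LAYERS:
            g = gram(x)
            loss = loss + style_weight * F.mse_loss(g, style_grams[i].expand_as(g))
        if i >= max(STYLE_LAYERS):
            break  # deeper VGG layers are never used
    return loss
```

Here `style_grams` would map each style layer index to the Gram matrix of the style painting's features at that layer, computed once before training.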

---

## Repository structure

```
mini-style-transfer/
├── model.py          # StyleNet architecture
├── train.py          # Training script
├── run.py            # Inference script
├── starry_night.pth  # Trained weights (Starry Night style)
├── mosaic.pth        # Trained weights (mosaic style)
├── candy.pth         # Trained weights (candy style)
├── sketch.pth        # Trained weights (sketch style)
└── README.md         # This file
```

---

## Limitations

- Each style is a **separate model file**; there is no single multi-style model yet
- Works best on **natural photos** (landscapes, portraits, cities)
- Cartoons, diagrams, and text-heavy images may give unexpected results
- Training images were 256 × 256; very high-resolution outputs may look slightly blurry
- Not suitable for commercial use without further evaluation

---

## What I learned building this

- How **convolutional encoders and decoders** work together
- How **Instance Normalisation** differs from Batch Normalisation (see the sketch below)
- How **Gram matrices** capture texture and style
- What **perceptual loss** is and why pixel-level loss looks bad for style transfer
- How to use a **pretrained VGG** network as a feature extractor without training it
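
The Instance vs Batch Normalisation difference is easy to see numerically. A tiny self-contained sketch (toy tensors, nothing from this repo): BatchNorm pools statistics over the whole batch, so one bright image shifts everyone, while InstanceNorm normalises each image on its own, which is why it suits style transfer.

```python
import torch
import torch.nn as nn

# Two images in one batch: the second is much brighter than the first.
x = torch.cat([torch.randn(1, 3, 8, 8),
               torch.randn(1, 3, 8, 8) + 5.0])

bn = nn.BatchNorm2d(3)        # statistics pooled over the whole batch
inorm = nn.InstanceNorm2d(3)  # statistics per image, per channel

# BatchNorm keeps the brightness gap between the two images;
# InstanceNorm resets each image to ~zero mean per channel.
print(bn(x).mean(dim=(2, 3)))     # per-image channel means still differ
print(inorm(x).mean(dim=(2, 3)))  # all approximately zero
```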

---

## References

- Johnson, J., Alahi, A., & Fei-Fei, L. (2016). [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://arxiv.org/abs/1603.08155)
- Gatys, L., Ecker, A., & Bethge, M. (2015). [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)

---

*Built as a learning project. Feedback and suggestions welcome!*