# SD3 LoRA - Tilt-Shift Miniature Photography Style (tsmini)
## Model Details

### Model Description

This is a LoRA adapter trained on Stable Diffusion 3 Medium for the tilt-shift miniature photography style. The trigger word is `tsmini`.

- **Developed by:** Zeta Young
- **License:** CreativeML OpenRAIL-M
- **Base model:** Stable Diffusion 3 Medium (`stabilityai/stable-diffusion-3-medium-diffusers`)
- **Adapter type:** LoRA (rank=16, alpha=16)
## Training Details

### Training Data
- 26 high-quality tilt-shift miniature photography images
- Style: shallow depth of field, aerial/diorama perspective, miniature effect
- Resolution: 1024×1024
### Training Configuration
| Parameter | Value |
|---|---|
| Training script | `diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py` |
| Instance prompt | "in the style of tsmini" |
| Resolution | 1024 |
| Train batch size | 1 |
| Gradient accumulation steps | 4 |
| Learning rate | 1e-4 |
| LR scheduler | constant |
| Max train steps | 1000 |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Mixed precision | bf16 |
| Gradient checkpointing | enabled |
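The configuration above corresponds roughly to the following launch command (a sketch, not the exact command used for this run: the `--instance_data_dir` and `--output_dir` paths are placeholders, and the script derives LoRA alpha from `--rank` rather than exposing a separate alpha flag):

```shell
accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --instance_data_dir="./tsmini_images" \
  --instance_prompt="in the style of tsmini" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --max_train_steps=1000 \
  --rank=16 \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --output_dir="./sd3-lora-tsmini"
```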
### Training Hardware
- 2× NVIDIA RTX 4090 (24 GB each)
- Single-GPU training (the ~2B-parameter SD3 Medium fits on one card)
## How to Use
```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load the LoRA adapter, naming it so its scale can be adjusted later
pipe.load_lora_weights(
    "Zeta-Young/sd3-lora-tsmini",
    weight_name="pytorch_lora_weights.safetensors",
    adapter_name="tsmini",
)

# Generate with the style trigger word
image = pipe(
    "tsmini, tilt-shift miniature photography, miniature effect, aerial view, "
    "shallow depth of field, a city street at sunset with glowing streets",
    num_inference_steps=28,
    guidance_scale=7.0,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
```
## Results

### LoRA Scale Comparison
| Scale | Quality | Style Strength |
|---|---|---|
| 0.5 | Good | Weak |
| 0.75 | Good | Moderate |
| 1.0 | Best | Strong |
| 1.5 | Artifacts appear | Too strong |
| 2.0 | Severe artifacts | Overloaded |
See the `eval_images/` folder for full comparison images.
### Observations
- The LoRA successfully transfers the tilt-shift miniature style
- Most effective on: cityscapes, aerial views, scenes with clear depth layers
- Scale > 1.0 causes artifacts (noise, ghosting); this is inherent to LoRA scaling, not a data-quality issue
- Style strength can be adjusted via `pipe.set_adapters(["tsmini"], adapter_weights=[scale])`, assuming the adapter was loaded with `adapter_name="tsmini"`
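Mechanically, the adapter weight simply scales the low-rank delta that LoRA adds to each frozen base weight, W' = W + s·(α/r)·BA (for this adapter α = r = 16, so α/r = 1). A minimal numpy sketch with made-up shapes shows why the update grows linearly with the scale, which is consistent with the artifacts seen above 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
rank, d = 16, 64            # hypothetical LoRA rank and weight dimension
alpha = 16                  # this adapter uses alpha == rank, so alpha/rank == 1

W = rng.normal(size=(d, d))             # frozen base weight
A = rng.normal(size=(rank, d)) * 0.01   # LoRA down-projection
B = rng.normal(size=(d, rank)) * 0.01   # LoRA up-projection

def effective_weight(scale):
    # W' = W + scale * (alpha / rank) * B @ A
    return W + scale * (alpha / rank) * (B @ A)

# The magnitude of the weight update is linear in the adapter scale:
# doubling the scale doubles the deviation from the base model.
delta_1 = np.linalg.norm(effective_weight(1.0) - W)
delta_2 = np.linalg.norm(effective_weight(2.0) - W)
print(delta_2 / delta_1)  # -> ~2.0
```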
## Limitations
- Style effect is subtle on some prompts (e.g., close-up subjects)
- Works best with prompts describing scenes with depth/distance
- Trained on a small dataset (26 images), so it may not generalize to all scene types
## Key Learnings
- Official training scripts > custom scripts: SD3's multi-encoder + flow-matching details are easy to get wrong
- Timestep consistency is critical: the flow-matching timestep range must match between training and inference
- Data quality > data quantity: 26 consistent-style images were sufficient for a style LoRA
- Upscaling doesn't help for a style LoRA: it learns style features, not pixel details