# SD3 LoRA - Tilt-Shift Miniature Photography Style (tsmini)
## Model Details

### Model Description

This is a LoRA adapter trained on Stable Diffusion 3 Medium for the tilt-shift miniature photography style. The trigger word is `tsmini`.

- **Developed by:** Zeta Young
- **License:** CreativeML OpenRAIL-M
- **Base model:** Stable Diffusion 3 Medium (`stabilityai/stable-diffusion-3-medium-diffusers`)
- **Adapter type:** LoRA (rank=16, alpha=16)
## Training Details

### Training Data
- 26 high-quality tilt-shift miniature photography images
- Style: shallow depth of field, aerial/diorama perspective, miniature effect
- Resolution: 1024×1024
### Training Configuration
| Parameter | Value |
|---|---|
| Training script | `diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py` |
| Instance prompt | "in the style of tsmini" |
| Resolution | 1024 |
| Train batch size | 1 |
| Gradient accumulation steps | 4 |
| Learning rate | 1e-4 |
| LR scheduler | constant |
| Max train steps | 1000 |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Mixed precision | bf16 |
| Gradient checkpointing | enabled |
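The configuration above corresponds roughly to the following launch command (a sketch, not the exact command used for this run: the `--instance_data_dir` and `--output_dir` paths are placeholders, and the script derives LoRA alpha from `--rank` rather than exposing a separate alpha flag):

```shell
accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --instance_data_dir="./tsmini_images" \
  --instance_prompt="in the style of tsmini" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --max_train_steps=1000 \
  --rank=16 \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --output_dir="./sd3-lora-tsmini"
```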
### Training Hardware
- 2× NVIDIA RTX 4090 (24 GB each)
- Single-GPU training (the ~2B-parameter SD3 Medium fits on one card)
## How to Use
```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load the LoRA adapter, naming it so its scale can be adjusted later
pipe.load_lora_weights(
    "Zeta-Young/sd3-lora-tsmini",
    weight_name="pytorch_lora_weights.safetensors",
    adapter_name="tsmini",
)

# Generate with the style trigger word
image = pipe(
    "tsmini, tilt-shift miniature photography, miniature effect, aerial view, "
    "shallow depth of field, a city street at sunset with glowing streets",
    num_inference_steps=28,
    guidance_scale=7.0,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
```
## Results

### LoRA Scale Comparison
| Scale | Quality | Style Strength |
|---|---|---|
| 0.5 | Good | Weak |
| 0.75 | Good | Moderate |
| 1.0 | Best | Strong |
| 1.5 | Artifacts appear | Too strong |
| 2.0 | Severe artifacts | Overloaded |
See the `eval_images/` folder for full comparison images.
### Observations
- The LoRA successfully transfers the tilt-shift miniature style
- Most effective on: cityscapes, aerial views, scenes with clear depth layers
- Scale > 1.0 causes artifacts (noise, ghosting); this is inherent to LoRA scaling, not a data-quality issue
- Style strength can be adjusted via `pipe.set_adapters(["tsmini"], adapter_weights=[scale])`, assuming the adapter was loaded with `adapter_name="tsmini"`
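Mechanically, the adapter weight simply scales the low-rank delta that LoRA adds to each frozen base weight, W' = W + s·(α/r)·BA (for this adapter α = r = 16, so α/r = 1). A minimal numpy sketch with made-up shapes shows why the update grows linearly with the scale, which is consistent with the artifacts seen above 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
rank, d = 16, 64            # hypothetical LoRA rank and weight dimension
alpha = 16                  # this adapter uses alpha == rank, so alpha/rank == 1

W = rng.normal(size=(d, d))             # frozen base weight
A = rng.normal(size=(rank, d)) * 0.01   # LoRA down-projection
B = rng.normal(size=(d, rank)) * 0.01   # LoRA up-projection

def effective_weight(scale):
    # W' = W + scale * (alpha / rank) * B @ A
    return W + scale * (alpha / rank) * (B @ A)

# The magnitude of the weight update is linear in the adapter scale:
# doubling the scale doubles the deviation from the base model.
delta_1 = np.linalg.norm(effective_weight(1.0) - W)
delta_2 = np.linalg.norm(effective_weight(2.0) - W)
print(delta_2 / delta_1)  # -> ~2.0
```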
## Limitations
- Style effect is subtle on some prompts (e.g., close-up subjects)
- Works best with prompts describing scenes with depth/distance
- Trained on a small dataset (26 images), so it may not generalize to all scene types
## Key Learnings
- Official training scripts > custom scripts: SD3's multi-encoder + flow-matching details are easy to get wrong
- Timestep consistency is critical: the flow-matching timestep range must match between training and inference
- Data quality > data quantity: 26 consistent-style images were sufficient for a style LoRA
- Upscaling doesn't help for a style LoRA: it learns style features, not pixel details