SD3 LoRA - Tilt-Shift Miniature Photography Style (tsmini)

Model Details

Model Description

This is a LoRA adapter trained on Stable Diffusion 3 Medium for the tilt-shift miniature photography style. The trigger word is tsmini.

  • Developed by: Zeta Young
  • License: CreativeML OpenRAIL-M
  • Base model: Stable Diffusion 3 Medium (stabilityai/stable-diffusion-3-medium-diffusers)
  • Adapter type: LoRA (rank=16, alpha=16)
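
With rank = 16 and alpha = 16, the LoRA update is applied at full strength (alpha / rank = 1). A minimal NumPy sketch of how a LoRA delta modifies a frozen weight (illustrative shapes only, not the actual SD3 layer dimensions):

```python
import numpy as np

# Illustrative shapes only; SD3's attention layers are far larger.
d_out, d_in, rank, alpha = 64, 64, 16, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-initialized

scale = alpha / rank                 # = 1.0 for this card's config
W_adapted = W + scale * (B @ A)      # effective weight at inference

# With B initialized to zero, the adapter starts as a no-op.
assert np.allclose(W_adapted, W)
```

Because alpha equals rank here, the delta is neither amplified nor attenuated; the inference-time `adapter_weights` scale discussed below multiplies it further.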

Training Details

Training Data

  • 26 high-quality tilt-shift miniature photography images
  • Style: shallow depth of field, aerial/diorama perspective, miniature effect
  • Resolution: 1024×1024

Training Configuration

Parameter                    Value
---------------------------  -----------------------------------------------------------
Training script              diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py
Instance prompt              "in the style of tsmini"
Resolution                   1024
Train batch size             1
Gradient accumulation steps  4
Learning rate                1e-4
LR scheduler                 constant
Max train steps              1000
LoRA rank                    16
LoRA alpha                   16
Mixed precision              bf16
Gradient checkpointing       enabled
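
The configuration above corresponds to a launch command along these lines (a sketch: the dataset directory and output path are placeholders, and in the diffusers script versions I have seen, lora_alpha is tied to --rank rather than set separately):

```shell
accelerate launch diffusers/examples/dreambooth/train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --instance_data_dir="./tsmini_images" \
  --instance_prompt="in the style of tsmini" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --max_train_steps=1000 \
  --rank=16 \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --output_dir="./sd3-lora-tsmini"
```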

Training Hardware

  • 2× NVIDIA RTX 4090 (24 GB each)
  • Training ran on a single GPU; the ~2B-parameter SD3 Medium fits on one card

How to Use

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load LoRA
pipe.load_lora_weights("your-username/sd3-lora-tsmini", weight_name="pytorch_lora_weights.safetensors")

# Generate with style
image = pipe(
    "tsmini, tilt-shift miniature photography, miniature effect, aerial view, shallow depth of field, a city street at sunset with glowing streets",
    num_inference_steps=28,
    guidance_scale=7.0,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")

Results

LoRA Scale Comparison

Scale  Quality           Style strength
-----  ----------------  --------------
0.5    Good              Weak
0.75   Good              Moderate
1.0    Best              Strong
1.5    Artifacts appear  Too strong
2.0    Severe artifacts  Overloaded

See eval_images/ folder for full comparison images.

Observations

  • The LoRA successfully transfers the tilt-shift miniature style
  • Most effective on: cityscapes, aerial views, scenes with clear depth layers
  • Scale > 1.0 causes artifacts (noise, ghosting); this is inherent to LoRA scaling, not a data quality issue
  • Style strength can be adjusted via pipe.set_adapters(["tsmini"], adapter_weights=[scale])
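
The scale adjustment mentioned above uses diffusers' multi-adapter API; a sketch, assuming the LoRA is loaded under the adapter name "tsmini" (the name is chosen at load time, not fixed by the checkpoint):

```python
# Assumes `pipe` is the StableDiffusion3Pipeline from the usage example above.
pipe.load_lora_weights(
    "your-username/sd3-lora-tsmini",
    weight_name="pytorch_lora_weights.safetensors",
    adapter_name="tsmini",
)

# Dial the style strength down to 0.75 (1.0 = trained strength).
pipe.set_adapters(["tsmini"], adapter_weights=[0.75])
```

Per the scale comparison table, values between 0.5 and 1.0 stay artifact-free.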

Limitations

  • Style effect is subtle on some prompts (e.g., close-up subjects)
  • Works best with prompts describing scenes with depth/distance
  • Trained on a small dataset (26 images), may not generalize to all scene types

Key Learnings

  1. Official training scripts beat custom scripts: SD3's multi-encoder conditioning and flow-matching details are easy to get wrong
  2. Timestep consistency is critical: the flow-matching timestep range must match between training and inference
  3. Data quality beats data quantity: 26 consistent-style images were sufficient for a style LoRA
  4. Upscaling doesn't help for a style LoRA: the adapter learns style features, not pixel detail
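
Learning #2 refers to SD3's rectified-flow formulation. A toy NumPy sketch of the interpolation and velocity target (conventions simplified; the real pipeline handles timestep shifting and sigma schedules internally):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))     # clean latent
noise = rng.standard_normal((4, 4))  # Gaussian noise

def noisy_sample(t):
    # Rectified-flow interpolation: a straight line from data (t=0) to noise (t=1).
    return (1.0 - t) * x0 + t * noise

# The velocity target the model regresses; constant along the straight path.
target = noise - x0

# If training samples t in [0, 1] but inference uses a mismatched range,
# the model is queried at interpolation points it never saw during training.
assert np.allclose(noisy_sample(0.0), x0)
assert np.allclose(noisy_sample(1.0), noise)
```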