Mini Latent Diffusion Model
A ~46M parameter UNet trained from scratch for text-to-image generation in latent space (SD VAE + CLIP text encoder, both frozen).
Dataset: BitTranslate/Bittensor_subnet_19_06_04_24
Training steps: 1000
Image size: 256px → 32×32 latents
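The 256px → 32×32 mapping follows from the VAE's spatial compression. A minimal sketch of that arithmetic, assuming the usual Stable Diffusion v1.x VAE settings (8× downsampling per side, 4 latent channels); the helper name `latent_shape` is hypothetical and not part of this repository:

```python
# Assumption: SD v1.x VAE downsamples each side by 8 and produces 4 latent channels.
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4


def latent_shape(image_size):
    """Return (channels, height, width) of the latent for a square RGB image."""
    if image_size % VAE_DOWNSAMPLE != 0:
        raise ValueError("image size must be a multiple of the VAE downsample factor")
    side = image_size // VAE_DOWNSAMPLE
    return (LATENT_CHANNELS, side, side)


shape = latent_shape(256)
pixel_values = 3 * 256 * 256                      # RGB values per image
latent_values = shape[0] * shape[1] * shape[2]    # values per latent
print(shape)                                      # (4, 32, 32)
print(pixel_values // latent_values)              # 48
```

Under these assumptions the UNet works on 48× fewer values per sample than it would in pixel space.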
Architecture
- UNet base channels: 128
- Channel multipliers: (1, 2, 3, 4)
- Res blocks per level: 2
- Cross-attention (text conditioning): latent res (8,)
- VAE (frozen): runwayml/stable-diffusion-v1-5, vae subfolder
- Text encoder (frozen): CLIP ViT-L/14 from runwayml/stable-diffusion-v1-5
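To make the architecture settings listed above concrete, here is a rough sketch of how they could translate into per-level widths and resolutions. It assumes a DDPM-style UNet that halves the spatial resolution between consecutive levels and that "(8,)" names the feature-map resolution at which cross-attention is applied; neither is confirmed by the model card, and `level_layout` is a hypothetical helper, not the training script's code:

```python
BASE_CHANNELS = 128
CHANNEL_MULTS = (1, 2, 3, 4)
ATTN_RESOLUTIONS = (8,)   # assumed to mean feature-map side length
LATENT_SIZE = 32          # 256px image through the VAE


def level_layout(base, mults, latent_size, attn_resolutions):
    """List channel width, resolution, and attention flag for each UNet level."""
    levels = []
    res = latent_size
    for i, mult in enumerate(mults):
        levels.append({
            "level": i,
            "channels": base * mult,
            "resolution": res,
            "cross_attention": res in attn_resolutions,
        })
        # Assumption: downsample by 2 after every level except the last.
        if i < len(mults) - 1:
            res //= 2
    return levels


layout = level_layout(BASE_CHANNELS, CHANNEL_MULTS, LATENT_SIZE, ATTN_RESOLUTIONS)
for row in layout:
    print(row)
```

Under these assumptions the levels run 128, 256, 384, 512 channels at resolutions 32, 16, 8, 4, with text conditioning injected only at the 8×8 level.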
Inference
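# Requires train_mini_ldm.py to be on the import path.
from train_mini_ldm import generate

# Arguments: the text prompt, then the path to the trained model output.
imgs = generate("a sunset over the ocean", "./mini_ldm_output/final")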