Mini Latent Diffusion Model

A ~46M-parameter UNet trained from scratch for text-to-image generation in latent space. The Stable Diffusion VAE and the CLIP text encoder are both used frozen; only the UNet is trained.
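
The card does not state the training objective. Assuming the standard latent-diffusion noise-prediction loss used by Stable Diffusion, a training step works as follows. It encodes an image x with the frozen VAE encoder E to get z_0 = E(x) and embeds the caption y with the frozen CLIP text encoder τ to get c = τ(y). It then noises z_0 to a random timestep t, where ᾱ_t is the cumulative product of the noise schedule. Only the UNet ε_θ is updated, and it is trained to predict the added noise:

```latex
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I), \quad z_0 = \mathcal{E}(x), \quad c = \tau(y)

\mathcal{L}(\theta) = \mathbb{E}_{z_0,\, c,\, \epsilon,\, t}
  \left[ \left\lVert \epsilon - \epsilon_\theta(z_t, t, c) \right\rVert_2^2 \right]
```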

Dataset: BitTranslate/Bittensor_subnet_19_06_04_24
Training steps: 1000
Image size: 256×256 px → 32×32 latents
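
The 256px → 32×32 mapping comes from the Stable Diffusion v1.5 VAE. That VAE downsamples each spatial side by a factor of 8 and outputs 4 latent channels. The sketch below only does that arithmetic. The helper name latent_shape is hypothetical and is not part of the training script.

```python
def latent_shape(image_size: int, downsample: int = 8, latent_channels: int = 4) -> tuple[int, int, int]:
    """Return (channels, height, width) of the VAE latent for a square image.

    Assumes the SD v1.5 VAE: 8x spatial downsampling and 4 latent channels.
    """
    if image_size % downsample != 0:
        raise ValueError(f"image_size {image_size} is not divisible by {downsample}")
    side = image_size // downsample
    return (latent_channels, side, side)


print(latent_shape(256))  # (4, 32, 32)
```

Each 256×256×3 image (196,608 values) is therefore denoised as a 4×32×32 latent (4,096 values), which is 48× fewer values.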

Architecture

  • UNet base channels: 128
  • Channel multipliers: (1, 2, 3, 4)
  • Res blocks per level: 2
  • Cross-attention (text conditioning): only at the 8×8 latent resolution (setting: (8,))
  • VAE (frozen): vae subfolder of runwayml/stable-diffusion-v1-5
  • Text encoder (frozen): CLIP ViT-L/14, loaded from runwayml/stable-diffusion-v1-5
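
The per-level layout implied by these settings can be sketched as follows. The sketch makes two assumptions that the list above does not state explicitly. First, it assumes a 2× downsample between consecutive levels, as in common LDM UNets. Second, it reads the (8,) setting as the 8×8 feature-map resolution. The names unet_levels and UNetLevel are hypothetical. In SD v1.5, the CLIP ViT-L/14 text encoder produces 77 token embeddings of width 768, and these serve as the keys and values for cross-attention.

```python
from dataclasses import dataclass


@dataclass
class UNetLevel:
    index: int
    resolution: int       # spatial side of the feature map at this level
    channels: int         # base_channels * channel multiplier
    res_blocks: int
    cross_attention: bool


def unet_levels(latent_size: int = 32, base_channels: int = 128,
                channel_mult: tuple[int, ...] = (1, 2, 3, 4),
                num_res_blocks: int = 2,
                attn_resolutions: tuple[int, ...] = (8,)) -> list[UNetLevel]:
    """Encoder-side level layout, assuming a 2x downsample between consecutive levels."""
    levels = []
    res = latent_size
    for i, mult in enumerate(channel_mult):
        levels.append(UNetLevel(i, res, base_channels * mult, num_res_blocks,
                                res in attn_resolutions))
        if i < len(channel_mult) - 1:  # no downsample after the last level
            res //= 2
    return levels


# CLIP ViT-L/14 text encoder (SD v1.5): 77 tokens, hidden size 768
TEXT_TOKENS, TEXT_DIM = 77, 768

for lvl in unet_levels():
    line = (f"level {lvl.index}: {lvl.resolution}x{lvl.resolution}, "
            f"{lvl.channels} ch, {lvl.res_blocks} res blocks")
    if lvl.cross_attention:
        queries = lvl.resolution * lvl.resolution
        line += f", cross-attn: {queries} latent queries -> {TEXT_TOKENS} text tokens ({TEXT_DIM}-d)"
    print(line)
```

Under these assumptions, text conditioning enters only at the 8×8 resolution. At that resolution, each of the 64 spatial positions attends over the 77 CLIP tokens. The decoder half of the UNet typically mirrors this layout in reverse.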

Inference

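```python
# train_mini_ldm.py must be importable (e.g. in the current working directory)
from train_mini_ldm import generate
# Arguments: the text prompt and the path to the saved model directory
imgs = generate("a sunset over the ocean", "./mini_ldm_output/final")
```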

Model files

  • Format: Safetensors
  • Model size: 45.5M params
  • Tensor type: F32