NanoDiffusion-46M

A ~46M-parameter UNet trained from scratch for text-to-image generation in latent space (SD VAE and CLIP text encoder, both frozen).

Dataset: BitTranslate/Bittensor_subnet_19_06_04_24
Training steps: 500
Image size: 256px → 32×32 latents
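The 256px → 32×32 mapping follows from the SD v1.5 VAE, which downsamples each spatial side by a factor of 8 and produces 4-channel latents. A minimal sketch of that shape arithmetic, assuming the model uses the standard VAE settings unchanged (the card does not say otherwise); `latent_shape` is a hypothetical helper, not part of the released code:

```python
# Standard properties of the Stable Diffusion v1.5 VAE (assumed unchanged here).
VAE_DOWNSAMPLE_FACTOR = 8  # each latent cell covers an 8x8 patch of pixels
VAE_LATENT_CHANNELS = 4


def latent_shape(image_size: int) -> tuple:
    """Return (channels, height, width) of the VAE latent for a square RGB image."""
    if image_size % VAE_DOWNSAMPLE_FACTOR != 0:
        raise ValueError(f"image_size must be a multiple of {VAE_DOWNSAMPLE_FACTOR}")
    side = image_size // VAE_DOWNSAMPLE_FACTOR
    return (VAE_LATENT_CHANNELS, side, side)


if __name__ == "__main__":
    c, h, w = latent_shape(256)
    print(f"256px image -> {c}x{h}x{w} latent")
    # The UNet denoises c*h*w values instead of 3*256*256 RGB values.
    print(f"compression: {3 * 256 * 256 // (c * h * w)}x")
```

Working in this 4×32×32 space (4,096 values versus 196,608 for the RGB image) is what lets a UNet of this size train on 256px images at all.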

Architecture

  • UNet base channels: 128
  • Channel multipliers: (1, 2, 3, 4)
  • Res blocks per level: 2
  • Cross-attention (text conditioning): at the 8×8 latent resolution only (attention resolutions (8,))
  • VAE (frozen): runwayml/stable-diffusion-v1-5 vae subfolder
  • Text encoder (frozen): CLIP ViT-L/14 from runwayml/stable-diffusion-v1-5
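To see how these settings fit together, here is a sketch of the UNet's down path on a 32×32 latent. It assumes a conventional LDM-style layout (level i has base channels × multiplier i; resolution halves after every level except the last), which the card does not spell out. The text context shape (77 tokens × 768 dims) is the standard output of the SD v1.5 CLIP ViT-L/14 text encoder. `Level` and `encoder_levels` are hypothetical names for illustration:

```python
from dataclasses import dataclass

# Values from the card.
BASE_CHANNELS = 128
CHANNEL_MULT = (1, 2, 3, 4)
NUM_RES_BLOCKS = 2
ATTENTION_RESOLUTIONS = (8,)
LATENT_SIZE = 32  # 256px / VAE factor 8

# Output shape of the SD v1.5 CLIP ViT-L/14 text encoder.
TEXT_TOKENS = 77
TEXT_DIM = 768


@dataclass
class Level:
    index: int
    resolution: int
    channels: int
    res_blocks: int
    cross_attention: bool


def encoder_levels(latent_size: int = LATENT_SIZE) -> list:
    """Assumed layout: resolution halves after every level except the last."""
    levels = []
    res = latent_size
    for i, mult in enumerate(CHANNEL_MULT):
        levels.append(
            Level(i, res, BASE_CHANNELS * mult, NUM_RES_BLOCKS, res in ATTENTION_RESOLUTIONS)
        )
        if i < len(CHANNEL_MULT) - 1:
            res //= 2
    return levels


if __name__ == "__main__":
    for lvl in encoder_levels():
        attn = ""
        if lvl.cross_attention:
            queries = lvl.resolution ** 2  # one query per spatial position
            attn = f"  cross-attn: {queries} queries x {TEXT_TOKENS} text tokens ({TEXT_DIM}-d)"
        print(f"level {lvl.index}: {lvl.resolution}x{lvl.resolution}, {lvl.channels} ch{attn}")
```

Under these assumptions the levels run 32→16→8→4 with 128/256/384/512 channels, and only the 8×8, 384-channel level attends to the prompt. Restricting attention to one low-resolution level keeps the quadratic attention cost small (64 spatial queries) at the price of coarser text conditioning.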
Weights format: Safetensors
Model size: 45.5M params
Tensor type: F32
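The parameter count and tensor type together determine the size of the weight file. A back-of-the-envelope sketch, assuming the 45.5M figure is the rounded count of the UNet alone (the frozen VAE and text encoder appear to be loaded separately from runwayml/stable-diffusion-v1-5); the F16 line is a hypothetical half-precision cast for comparison, not something the card provides:

```python
PARAMS = 45_500_000  # rounded count from the card, not the exact number
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Size of the raw weight tensors, ignoring file headers and metadata."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in ("F32", "F16"):
        b = weight_bytes(PARAMS, dtype)
        print(f"{dtype}: {b / 1e6:.0f} MB ({b / 2**20:.1f} MiB)")
```

At 4 bytes per F32 parameter this comes to roughly 182 MB (about 174 MiB) of raw weights.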