NanoDiffusion-46M
A ~46M-parameter UNet trained from scratch for text-to-image generation in latent space (SD VAE + CLIP text encoder, both frozen).
Dataset: BitTranslate/Bittensor_subnet_19_06_04_24
Trained steps: 500
Image size: 256px → 32×32 latents
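The 256px → 32×32 mapping follows from the SD v1.x VAE, which downsamples by a factor of 8 and produces 4 latent channels. Neither number is stated in this card, so treat them as assumptions about the frozen VAE. The sketch below only does the shape arithmetic (assuming 3-channel RGB input) and does not load any model:

```python
# Shape arithmetic for an SD v1.x-style VAE (assumed: factor-8 downsampling, 4 latent channels).
VAE_DOWNSAMPLE = 8   # three stride-2 stages in the SD v1.x VAE encoder -> 2**3
LATENT_CHANNELS = 4  # SD v1.x VAE latent channel count
RGB_CHANNELS = 3     # assumed RGB input images


def latent_shape(image_size: int, batch: int = 1) -> tuple[int, int, int, int]:
    """Return the (B, C, H, W) latent shape for a square image of side `image_size`."""
    if image_size % VAE_DOWNSAMPLE != 0:
        raise ValueError(f"image_size must be a multiple of {VAE_DOWNSAMPLE}")
    side = image_size // VAE_DOWNSAMPLE
    return (batch, LATENT_CHANNELS, side, side)


def compression_ratio(image_size: int) -> float:
    """Number of pixel values per latent value."""
    _, c, h, w = latent_shape(image_size)
    return (RGB_CHANNELS * image_size * image_size) / (c * h * w)


print(latent_shape(256))       # (1, 4, 32, 32)
print(compression_ratio(256))  # 48.0
```

Under these assumptions the UNet works on 4 × 32 × 32 = 4,096 values per image instead of 3 × 256 × 256 = 196,608 pixels, a 48× reduction.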
Architecture
- UNet base channels: 128
- Channel multipliers: (1, 2, 3, 4)
- Res blocks per level: 2
- Cross-attention (text conditioning): at latent resolution 8, i.e. (8,)
- VAE (frozen): runwayml/stable-diffusion-v1-5, vae subfolder
- Text encoder (frozen): CLIP ViT-L/14 from runwayml/stable-diffusion-v1-5
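The UNet hyperparameters listed above can be expanded into a per-level layout. This is a sketch, assuming the common convention of one stride-2 downsample between consecutive levels (the card does not say how downsampling is done). `LevelSpec` and `unet_levels` are hypothetical names, and the sketch does not count parameters:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelSpec:
    level: int             # index of the UNet encoder level
    channels: int          # base_channels * channel multiplier
    resolution: int        # side length of the square latent feature map
    res_blocks: int        # residual blocks at this level
    cross_attention: bool  # whether text cross-attention is applied here


def unet_levels(
    base_channels: int = 128,
    channel_mult: tuple[int, ...] = (1, 2, 3, 4),
    num_res_blocks: int = 2,
    latent_size: int = 32,
    cross_attn_res: tuple[int, ...] = (8,),
) -> list[LevelSpec]:
    """Expand the card's hyperparameters into one spec per encoder level."""
    specs = []
    res = latent_size
    for i, mult in enumerate(channel_mult):
        specs.append(
            LevelSpec(i, base_channels * mult, res, num_res_blocks, res in cross_attn_res)
        )
        if i < len(channel_mult) - 1:
            res //= 2  # assumed stride-2 downsample between levels
    return specs


for spec in unet_levels():
    print(spec)
```

With these assumptions the levels have 128, 256, 384 and 512 channels at 32×32, 16×16, 8×8 and 4×4, so text cross-attention lands only on the 384-channel 8×8 level.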
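Text conditioning via cross-attention lets each latent position (query) attend over the CLIP token embeddings (keys/values). The block below is a generic sketch built on `torch.nn.MultiheadAttention`, not this model's actual layer. Assumptions: 384 query channels (128 × 3 at the 8×8 level, if downsampling is stride 2 per level), 8 heads, a GroupNorm with 32 groups, and a residual connection. The 77-token, 768-dim context matches the CLIP ViT-L/14 text encoder output. `LatentCrossAttention` is a hypothetical name:

```python
import torch
from torch import nn


class LatentCrossAttention(nn.Module):
    """Hypothetical text-conditioning block: latent positions attend to CLIP token embeddings."""

    def __init__(self, channels: int = 384, context_dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.norm = nn.GroupNorm(32, channels)  # assumed normalization before attention
        # kdim/vdim let keys and values come from the 768-dim text embeddings.
        self.attn = nn.MultiheadAttention(
            channels, num_heads, kdim=context_dim, vdim=context_dim, batch_first=True
        )

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.norm(x).flatten(2).transpose(1, 2)  # (B, H*W, C): one query per latent position
        out, _ = self.attn(q, context, context, need_weights=False)
        out = out.transpose(1, 2).reshape(b, c, h, w)  # back to (B, C, H, W)
        return x + out  # residual connection


x = torch.randn(2, 384, 8, 8)   # feature map at the assumed 8×8 level
ctx = torch.randn(2, 77, 768)   # stand-in for CLIP ViT-L/14 token embeddings
y = LatentCrossAttention()(x, ctx)
print(y.shape)  # torch.Size([2, 384, 8, 8])
```

Restricting cross-attention to the 8×8 level keeps the attention cost small: 64 queries per image against 77 text tokens.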
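Loading and freezing the two pretrained components might look like the sketch below. It uses the real `diffusers.AutoencoderKL` and `transformers.CLIPTextModel` / `CLIPTokenizer` loaders. The card names the repo and the `vae` subfolder; the `text_encoder` and `tokenizer` subfolder names are assumptions based on the standard SD v1.x repo layout. `freeze`, `load_frozen_components` and `encode_prompts` are hypothetical helpers, and `repo_id` is a parameter so another copy of the weights can be substituted:

```python
import torch
from torch import nn

REPO_ID = "runwayml/stable-diffusion-v1-5"  # repo named in the card


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients and switch to eval mode so only the UNet is trained."""
    module.requires_grad_(False)
    return module.eval()


def load_frozen_components(repo_id: str = REPO_ID, device: str = "cpu"):
    # Imported lazily so freeze() works without diffusers/transformers installed.
    from diffusers import AutoencoderKL
    from transformers import CLIPTextModel, CLIPTokenizer

    vae = freeze(AutoencoderKL.from_pretrained(repo_id, subfolder="vae")).to(device)
    tokenizer = CLIPTokenizer.from_pretrained(repo_id, subfolder="tokenizer")
    text_encoder = freeze(
        CLIPTextModel.from_pretrained(repo_id, subfolder="text_encoder")
    ).to(device)
    return vae, tokenizer, text_encoder


@torch.no_grad()
def encode_prompts(prompts: list[str], tokenizer, text_encoder) -> torch.Tensor:
    """Return per-token embeddings used as cross-attention context: (B, 77, 768) for ViT-L/14."""
    tokens = tokenizer(
        prompts,
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    return text_encoder(tokens.input_ids.to(text_encoder.device)).last_hidden_state
```

Calling `load_frozen_components()` downloads the weights. Because both modules are frozen, only the UNet's parameters should be passed to the optimizer.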