Sana 0.6B — ONNX for In-Browser WebGPU Inference

Generate images at 1024x1024, 2048x2048, or 4096x4096 entirely in the browser using WebGPU.

Requirements

Component	File or repository	Size	Precision
CLIP text encoder	onnx-community/clip-vit-large-patch14-ONNX	432 MB	uint8
DiT 1024	1024/sana_dit_1024.onnx + .data	2.3 GB	float32
DiT 2048	2048/sana_dit_2048.onnx + .data	2.3 GB	float32
DiT 4096	4096/sana_dit_4096.onnx + .data	2.3 GB	float32
VAE 1024	1024/sana_vae_1024.onnx + .data	608 MB	float32
VAE 2048	2048/sana_vae_2048.onnx + .data	608 MB	float32
VAE 4096	4096/sana_vae_4096.onnx + .data	608 MB	float32
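
The path pattern in the table suggests how a loader might pick files for a given resolution. The following is a minimal TypeScript sketch, not the project's actual loader. The names `Resolution`, `ModelFiles`, `filesFor`, and `approxDownloadMB` are hypothetical, and the size estimate assumes 2.3 GB can be treated as 2300 MB. Only the `.onnx` graph paths are built here; the name of the accompanying `.data` file is not assumed.

```typescript
// Hypothetical helper: these names are illustrative, not part of the repository.
type Resolution = 1024 | 2048 | 4096;

interface ModelFiles {
  textEncoder: string; // Hugging Face repository id from the table
  dit: string;         // path to the DiT graph (its external .data file sits alongside)
  vae: string;         // path to the VAE graph (its external .data file sits alongside)
}

// Sizes in MB as listed in the table (assumption: 2.3 GB is treated as 2300 MB).
const SIZE_MB = { textEncoder: 432, dit: 2300, vae: 608 } as const;

// Build the per-resolution paths following the table's "<res>/sana_<part>_<res>.onnx" pattern.
function filesFor(res: Resolution): ModelFiles {
  return {
    textEncoder: "onnx-community/clip-vit-large-patch14-ONNX",
    dit: `${res}/sana_dit_${res}.onnx`,
    vae: `${res}/sana_vae_${res}.onnx`,
  };
}

// Approximate download for one resolution: text encoder + one DiT + one VAE.
function approxDownloadMB(): number {
  return SIZE_MB.textEncoder + SIZE_MB.dit + SIZE_MB.vae;
}

console.log(filesFor(2048).dit); // "2048/sana_dit_2048.onnx"
console.log(approxDownloadMB()); // 3340
```

Under these assumptions, running a single resolution requires roughly 3.3 GB of downloads, since the listed sizes are the same at every resolution.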

Note: The DiT must run in float32, because Sana's linear attention produces NaN values in fp16.
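
This card does not state why fp16 fails. One plausible mechanism, offered here as an assumption, is overflow: fp16 has no finite values above 65504, and linear attention accumulates sums over all tokens before dividing by a normalizer. If both sums overflow to infinity, the division yields NaN. The TypeScript sketch below illustrates only that arithmetic with toy numbers. `toFp16ish`, `accumulate`, and `ratio` are hypothetical helpers, and `toFp16ish` models overflow only, not fp16 rounding.

```typescript
// Largest finite value representable in IEEE 754 half precision.
const FP16_MAX = 65504;

// Crude stand-in for an fp16 cast: models overflow to infinity only, not rounding.
function toFp16ish(x: number): number {
  if (Number.isNaN(x)) return NaN;
  if (x > FP16_MAX) return Infinity;
  if (x < -FP16_MAX) return -Infinity;
  return x;
}

// Add `term` to an accumulator `count` times, casting after every addition.
// The non-fp16 path uses Math.fround to round each partial sum to float32.
function accumulate(term: number, count: number, fp16: boolean): number {
  let acc = 0;
  for (let i = 0; i < count; i++) {
    acc = fp16 ? toFp16ish(acc + term) : Math.fround(acc + term);
  }
  return acc;
}

// Toy numbers (assumption): 4096 tokens, each adding 64 to the numerator
// and 32 to the normalizer, so the true ratio is 2.
function ratio(fp16: boolean): number {
  const tokens = 4096;
  const numerator = accumulate(64, tokens, fp16);
  const normalizer = accumulate(32, tokens, fp16);
  return numerator / normalizer;
}

console.log(ratio(false)); // 2
console.log(ratio(true));  // NaN, because Infinity / Infinity is NaN
```

In this toy case the float32 path stays finite because both sums (262144 and 131072) are far below the float32 maximum, while the simulated fp16 path saturates both sums to infinity.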
