| Using devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=2, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=3, process_index=0, coords=(1,1,0), core_on_chip=0)] |
| Device count 4 |
| Global device count 4 |
| Global Batch: 512 |
| Node Batch: 512 |
| Device Batch: 128 |
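The three batch figures fit an even data-parallel split. A single host process (every device reports `process_index=0`) drives four TPU devices, so the node batch equals the global batch and each device receives a quarter of it. Below is a minimal sketch of that arithmetic, assuming an even split is required. `split_batch` is a hypothetical helper, not a function from the training script.

```python
def split_batch(global_batch: int, num_processes: int, local_devices: int) -> tuple[int, int]:
    """Return (per-node batch, per-device batch) for an even data-parallel split."""
    total_devices = num_processes * local_devices
    if global_batch % total_devices != 0:
        raise ValueError(f"global batch {global_batch} is not divisible by {total_devices} devices")
    node_batch = global_batch // num_processes   # examples fed by each host process
    device_batch = node_batch // local_devices   # examples seen by each TPU device
    return node_batch, device_batch


# Values from the log: one host process driving four TPU devices.
print(split_batch(512, num_processes=1, local_devices=4))  # (512, 128)
```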
| /tmp/tmpsx4kqqyn |
| Loading dataset |
| Loading dataset |
| creating model |
| beta1: 0.9 |
| beta2: 0.999 |
| bootstrap_cfg: 1 |
| bootstrap_dt_bias: 0 |
| bootstrap_ema: 1 |
| bootstrap_every: 8 |
| cfg_scale: 1.5 |
| class_dropout_prob: 0.1 |
| denoise_timesteps: 128 |
| depth: 12 |
| dropout: 0.0 |
| dt_sampling: uniform |
| hidden_size: 768 |
| lr: 0.0001 |
| mlp_ratio: 4 |
| num_classes: 1000 |
| num_heads: 12 |
| patch_size: 2 |
| sharding: dp |
| t_sampling: discrete-dt |
| target_update_rate: 0.999 |
| train_type: naive |
| use_cosine: 0 |
| use_ema: 0 |
| use_stable_vae: 1 |
| warmup: 0 |
| weight_decay: 0.1 |
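Several tensor sizes later in the log follow directly from this configuration. The sketch below derives them under three assumptions:

- the DiT input is a 32×32 latent, as printed by the VAE encoder further down;
- the 4608-wide `Dense_0` in each `DiTBlock` of the summary table produces six 768-wide adaLN modulation vectors;
- the 1001-row label embedding reserves one extra row for the null label used when `class_dropout_prob` drops the class.

`derived_dims` is a hypothetical helper for illustration only.

```python
def derived_dims(hidden_size=768, num_heads=12, mlp_ratio=4, patch_size=2,
                 latent_hw=32, num_classes=1000):
    """Derive DiT tensor sizes from the config values printed above."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return {
        "head_dim": hidden_size // num_heads,          # 768 / 12
        "num_tokens": (latent_hw // patch_size) ** 2,  # (32 / 2)^2 patches
        "mlp_hidden": hidden_size * mlp_ratio,         # 768 * 4
        "adaln_width": 6 * hidden_size,                # assumed: shift/scale/gate for attention and MLP
        "label_rows": num_classes + 1,                 # assumed: one extra row for the null label
    }


print(derived_dims())
# {'head_dim': 64, 'num_tokens': 256, 'mlp_hidden': 3072, 'adaln_width': 4608, 'label_rows': 1001}
```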
|
|
| Total devices TPU_0(process=0,(0,0,0,0)) |
| Initializing encoder. |
| Incoming encoder shape (1, 256, 256, 3) |
| Encoder layer (1, 256, 256, 128) |
| doing downsample |
| Encoder layer (1, 128, 128, 128) |
| doing downsample |
| Encoder layer (1, 64, 64, 256) |
| doing downsample |
| Encoder layer (1, 32, 32, 512) |
| Encoder layer (1, 32, 32, 512) |
| Encoder layer final (1, 32, 32, 512) |
| Encoder layer final (1, 32, 32, 512) |
| Final embeddings are size (1, 32, 32, 8) |
| After quant (1, 32, 32, 4) |
| encode finished |
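The encoder trace shows three `doing downsample` steps taking 256×256 to 32×32, a factor of 2^3 = 8 per side. It also shows an 8-channel output that becomes 4 channels after quantization. The sketch below reproduces that shape bookkeeping. It assumes, as in Stable Diffusion-style KL autoencoders, that the 8 channels hold a 4-channel mean and a 4-channel log-variance and that the latent keeps the 4-channel half. `use_stable_vae: 1` hints at this, but the log does not confirm it. The helper name is hypothetical.

```python
def encoder_latent_shape(image_shape, num_downsamples=3, moment_channels=8):
    """Track NHWC shapes through the encoder trace: image -> moments -> latent."""
    n, h, w, _ = image_shape
    factor = 2 ** num_downsamples  # each "doing downsample" step halves H and W
    if h % factor or w % factor:
        raise ValueError(f"spatial size must be divisible by {factor}")
    moments = (n, h // factor, w // factor, moment_channels)
    # Assumption: the channels split into equal mean and log-variance halves,
    # and the latent keeps only one half (4 channels).
    latent = moments[:3] + (moment_channels // 2,)
    return moments, latent


print(encoder_latent_shape((1, 256, 256, 3)))
# ((1, 32, 32, 8), (1, 32, 32, 4))
```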
| Decoder incoming shape (1, 32, 32, 4) |
| Decoder input (1, 32, 32, 512) |
| Mid Block Decoder layer (1, 32, 32, 512) |
| Mid Block Decoder layer (1, 32, 32, 512) |
| Decoder layer (1, 64, 64, 512) |
| Decoder layer (1, 128, 128, 512) |
| Decoder layer (1, 256, 256, 256) |
| Decoder layer (1, 256, 256, 128) |
| Total num of VQVAE parameters: 67565323 |
| Disc shape (1, 128, 128, 128) |
| Disc shape (1, 64, 64, 256) |
| Disc shape (1, 32, 32, 512) |
| Disc shape (1, 16, 16, 512) |
| Disc shape (1, 8, 8, 512) |
| Disc shape (1, 4, 4, 512) |
| Total num of Discriminator parameters: 23998017 |
| Loaded checkpoint from 18291200 seconds ago. |
| Loaded model with step 447001 |
| ┌─────────────────────┐ |
| │        TPU 0        │ |
| ├─────────────────────┤ |
| │        TPU 1        │ |
| ├─────────────────────┤ |
| │        TPU 2        │ |
| ├─────────────────────┤ |
| │        TPU 3        │ |
| └─────────────────────┘ |
| returning model |
| model done |
| Input to vae (4, 1, 256, 256, 3) |
| encode image shape (1, 256, 256, 3) |
| Initializing encoder. |
| Incoming encoder shape (1, 256, 256, 3) |
| Encoder layer (1, 256, 256, 128) |
| doing downsample |
| Encoder layer (1, 128, 128, 128) |
| doing downsample |
| Encoder layer (1, 64, 64, 256) |
| doing downsample |
| Encoder layer (1, 32, 32, 512) |
| Encoder layer (1, 32, 32, 512) |
| Encoder layer final (1, 32, 32, 512) |
| Encoder layer final (1, 32, 32, 512) |
| Final embeddings are size (1, 32, 32, 8) |
| After quant (1, 32, 32, 4) |
| output example shape (4, 1, 32, 32, 4) |
| Test data shape (4, 256, 256, 3) |
| x shape (4, 1, 256, 256, 3) |
| encoded shape (4, 1, 32, 32, 4) |
| z_vectors shape (1, 32, 32, 4) |
| Decoder incoming shape (1, 32, 32, 4) |
| Decoder input (1, 32, 32, 512) |
| Mid Block Decoder layer (1, 32, 32, 512) |
| Mid Block Decoder layer (1, 32, 32, 512) |
| Decoder layer (1, 64, 64, 512) |
| Decoder layer (1, 128, 128, 512) |
| Decoder layer (1, 256, 256, 256) |
| Decoder layer (1, 256, 256, 128) |
| image shape (4, 1, 256, 256, 3) |
| decoded img shape (256, 256, 3) |
| obs shape (4, 32, 32, 4) |
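The leading 4 in `Input to vae`, `x shape` and `encoded shape` looks like a device axis. The four test images are reshaped to (devices, per-device batch, H, W, C) so that each TPU encodes one image, which matches `encode image shape (1, 256, 256, 3)`. The per-device results are then merged back into `obs shape (4, 32, 32, 4)`. The sketch below tracks only that shape bookkeeping on tuples, assuming this pmap-style layout. Both helper names are hypothetical.

```python
def shard_shape(shape, num_devices):
    """(B, ...) -> (num_devices, B // num_devices, ...): one leading slot per device."""
    batch, *rest = shape
    if batch % num_devices:
        raise ValueError("batch must divide evenly across devices")
    return (num_devices, batch // num_devices, *rest)


def unshard_shape(shape):
    """(D, b, ...) -> (D * b, ...): merge the device and per-device axes again."""
    devices, per_device, *rest = shape
    return (devices * per_device, *rest)


x_shape = shard_shape((4, 256, 256, 3), num_devices=4)  # (4, 1, 256, 256, 3), as in "x shape"
obs_shape = unshard_shape((4, 1, 32, 32, 4))            # (4, 32, 32, 4), as in "obs shape"
print(x_shape, obs_shape)
```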
| DiT: Input of shape (4, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (4, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (4, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (1, 768) dtype float32 |
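`DiT: After patch embed` shows the (4, 32, 32, 4) latent becoming 256 tokens of width 768. The summary table's `PatchEmbed_0/Conv_0` entry has kernel `[2,2,4,768]` and output `[4,16,16,768]`. That is consistent with a 2×2 convolution whose stride equals `patch_size`, followed by flattening the 16×16 grid. The stride is an assumption because the log does not print it. Under that assumption, the shapes and the 13,056-parameter count can be reproduced:

```python
def patch_embed_shapes(latent_shape=(4, 32, 32, 4), patch_size=2, hidden_size=768):
    """Shapes and parameter count of a non-overlapping patch-embedding conv."""
    n, h, w, c = latent_shape
    grid_h, grid_w = h // patch_size, w // patch_size
    conv_out = (n, grid_h, grid_w, hidden_size)          # Conv_0 output, NHWC
    tokens = (n, grid_h * grid_w, hidden_size)           # flattened to a token sequence
    kernel = patch_size * patch_size * c * hidden_size   # kernel [p, p, C, D]
    params = kernel + hidden_size                        # plus bias [D]
    return conv_out, tokens, params


print(patch_embed_shapes())  # ((4, 16, 16, 768), (4, 256, 768), 13056)
```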
|
|
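Each parameter count in the DiT summary table is a kernel plus a bias. Each byte size corresponds to float32 storage at 4 bytes per parameter, in decimal units. For example, 13,056 × 4 = 52,224 bytes, shown as 52.2 KB.

The sketch below reproduces the per-layer counts of one `DiTBlock` and their total. It counts only the layers the table lists with parameters; the `LayerNorm` and `Dropout` rows show none. The helper names are hypothetical.

```python
HIDDEN = 768


def dense_params(d_in: int, d_out: int) -> int:
    """Parameters of a Dense layer: kernel [d_in, d_out] plus bias [d_out]."""
    return d_in * d_out + d_out


def float32_size(num_params: int) -> str:
    """Storage at 4 bytes per parameter, in decimal KB/MB."""
    nbytes = 4 * num_params
    if nbytes >= 1_000_000:
        return f"{nbytes / 1e6:.1f} MB"
    return f"{nbytes / 1e3:.1f} KB"


# One DiTBlock, grouped by the layer shapes listed in the summary table.
block = {
    "Dense_0": dense_params(HIDDEN, 6 * HIDDEN),             # 768 -> 4608
    "Dense_1..Dense_4": 4 * dense_params(HIDDEN, HIDDEN),    # four 768 -> 768 layers
    "MlpBlock_0/Dense_0": dense_params(HIDDEN, 4 * HIDDEN),  # 768 -> 3072
    "MlpBlock_0/Dense_1": dense_params(4 * HIDDEN, HIDDEN),  # 3072 -> 768
}
per_block = sum(block.values())

print(block["Dense_0"], float32_size(block["Dense_0"]))  # 3543552 14.2 MB
print(per_block)                                          # 10628352
```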
| DiT Summary |
| ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ |
| ┃ path                            ┃ module           ┃ inputs                ┃ outputs               ┃ params                       ┃ |
| ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ |
| │ DiT                             │ DiT              │ - float32[4,32,32,4]  │ bfloat16[4,32,32,4]   │                              │ |
| │                                 │                  │ - float32[1]          │                       │                              │ |
| │                                 │                  │ - float32[1]          │                       │                              │ |
| │                                 │                  │ - int32[1]            │                       │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ PatchEmbed_0                    │ PatchEmbed       │ float32[4,32,32,4]    │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ PatchEmbed_0/Conv_0             │ Conv             │ float32[4,32,32,4]    │ bfloat16[4,16,16,768] │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[2,2,4,768]   │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 13,056 (52.2 KB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ TimestepEmbedder_0              │ TimestepEmbedder │ float32[1]            │ float32[1,768]        │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ TimestepEmbedder_0/Dense_0      │ Dense            │ bfloat16[1,256]       │ bfloat16[1,768]       │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[256,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 197,376 (789.5 KB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ TimestepEmbedder_0/Dense_1      │ Dense            │ bfloat16[1,768]       │ float32[1,768]        │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ TimestepEmbedder_1              │ TimestepEmbedder │ float32[1]            │ float32[1,768]        │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ TimestepEmbedder_1/Dense_0      │ Dense            │ bfloat16[1,256]       │ bfloat16[1,768]       │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[256,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 197,376 (789.5 KB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ TimestepEmbedder_1/Dense_1      │ Dense            │ bfloat16[1,768]       │ float32[1,768]        │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ LabelEmbedder_0                 │ LabelEmbedder    │ int32[1]              │ bfloat16[1,768]       │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ LabelEmbedder_0/Embed_0         │ Embed            │ int32[1]              │ bfloat16[1,768]       │ embedding: float32[1001,768] │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 768,768 (3.1 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0                      │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │ |
| │                                 │                  │ - float32[1,768]      │                       │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/Dense_0              │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,4608]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 3,543,552 (14.2 MB)          │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/LayerNorm_0          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/Dense_1              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/Dense_2              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/Dense_3              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/Dense_4              │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/LayerNorm_1          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/MlpBlock_0           │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/MlpBlock_0/Dense_0   │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,3072]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,362,368 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/MlpBlock_0/Dropout_0 │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/MlpBlock_0/Dense_1   │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[3072,768]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,360,064 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_0/MlpBlock_0/Dropout_1 │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1                      │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │ |
| │                                 │                  │ - float32[1,768]      │                       │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/Dense_0              │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,4608]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 3,543,552 (14.2 MB)          │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/LayerNorm_0          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/Dense_1              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/Dense_2              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/Dense_3              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/Dense_4              │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/LayerNorm_1          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/MlpBlock_0           │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/MlpBlock_0/Dense_0   │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,3072]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,362,368 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/MlpBlock_0/Dropout_0 │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/MlpBlock_0/Dense_1   │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[3072,768]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,360,064 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_1/MlpBlock_0/Dropout_1 │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2                      │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │ |
| │                                 │                  │ - float32[1,768]      │                       │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/Dense_0              │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,4608]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 3,543,552 (14.2 MB)          │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/LayerNorm_0          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/Dense_1              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/Dense_2              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/Dense_3              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/Dense_4              │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/LayerNorm_1          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/MlpBlock_0           │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/MlpBlock_0/Dense_0   │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,3072]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,362,368 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/MlpBlock_0/Dropout_0 │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/MlpBlock_0/Dense_1   │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[3072,768]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,360,064 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_2/MlpBlock_0/Dropout_1 │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3                      │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │ |
| │                                 │                  │ - float32[1,768]      │                       │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/Dense_0              │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,4608]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 3,543,552 (14.2 MB)          │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/LayerNorm_0          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/Dense_1              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/Dense_2              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/Dense_3              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/Dense_4              │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/LayerNorm_1          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/MlpBlock_0           │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/MlpBlock_0/Dense_0   │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,3072]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,362,368 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/MlpBlock_0/Dropout_0 │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/MlpBlock_0/Dense_1   │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[3072,768]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,360,064 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_3/MlpBlock_0/Dropout_1 │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4                      │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │ |
| │                                 │                  │ - float32[1,768]      │                       │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/Dense_0              │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,4608]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 3,543,552 (14.2 MB)          │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/LayerNorm_0          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/Dense_1              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/Dense_2              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/Dense_3              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/Dense_4              │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[768,768]     │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 590,592 (2.4 MB)             │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/LayerNorm_1          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/MlpBlock_0           │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/MlpBlock_0/Dense_0   │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │ |
| │                                 │                  │                       │                       │ kernel: float32[768,3072]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,362,368 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/MlpBlock_0/Dropout_0 │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/MlpBlock_0/Dense_1   │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │ |
| │                                 │                  │                       │                       │ kernel: float32[3072,768]    │ |
| │                                 │                  │                       │                       │                              │ |
| │                                 │                  │                       │                       │ 2,360,064 (9.4 MB)           │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_4/MlpBlock_0/Dropout_1 │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │ |
| ├─────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| β DiTBlock_5 β DiTBlock β - [2mbfloat16[0m[4,256,768] β [2mbfloat16[0m[4,256,768] β β |
| β β β - [2mfloat32[0m[1,768] β β β |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/LayerNorm_0 β LayerNorm β [2mbfloat16[0m[4,256,768] β [2mbfloat16[0m[4,256,768] β β |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/Dense_1 β Dense β [2mbfloat16[0m[4,256,768] β [2mbfloat16[0m[4,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_5/Dense_2 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_5/Dense_3 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_5/Dense_4 │ Dense │ float32[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_5/LayerNorm_1 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_5/MlpBlock_0 │ MlpBlock │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_5/MlpBlock_0/Dense_0 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,3072] │ bias: float32[3072]; kernel: float32[768,3072]; 2,362,368 (9.4 MB) │ |
| │ DiTBlock_5/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[4,256,3072] │ bfloat16[4,256,3072] │ │ |
| │ DiTBlock_5/MlpBlock_0/Dense_1 │ Dense │ bfloat16[4,256,3072] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[3072,768]; 2,360,064 (9.4 MB) │ |
| │ DiTBlock_5/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_6 │ DiTBlock │ bfloat16[4,256,768], float32[1,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_6/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608]; kernel: float32[768,4608]; 3,543,552 (14.2 MB) │ |
| │ DiTBlock_6/LayerNorm_0 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_6/Dense_1 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_6/Dense_2 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_6/Dense_3 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_6/Dense_4 │ Dense │ float32[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_6/LayerNorm_1 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_6/MlpBlock_0 │ MlpBlock │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_6/MlpBlock_0/Dense_0 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,3072] │ bias: float32[3072]; kernel: float32[768,3072]; 2,362,368 (9.4 MB) │ |
| │ DiTBlock_6/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[4,256,3072] │ bfloat16[4,256,3072] │ │ |
| │ DiTBlock_6/MlpBlock_0/Dense_1 │ Dense │ bfloat16[4,256,3072] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[3072,768]; 2,360,064 (9.4 MB) │ |
| │ DiTBlock_6/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_7 │ DiTBlock │ bfloat16[4,256,768], float32[1,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_7/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608]; kernel: float32[768,4608]; 3,543,552 (14.2 MB) │ |
| │ DiTBlock_7/LayerNorm_0 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_7/Dense_1 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_7/Dense_2 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_7/Dense_3 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_7/Dense_4 │ Dense │ float32[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_7/LayerNorm_1 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_7/MlpBlock_0 │ MlpBlock │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_7/MlpBlock_0/Dense_0 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,3072] │ bias: float32[3072]; kernel: float32[768,3072]; 2,362,368 (9.4 MB) │ |
| │ DiTBlock_7/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[4,256,3072] │ bfloat16[4,256,3072] │ │ |
| │ DiTBlock_7/MlpBlock_0/Dense_1 │ Dense │ bfloat16[4,256,3072] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[3072,768]; 2,360,064 (9.4 MB) │ |
| │ DiTBlock_7/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_8 │ DiTBlock │ bfloat16[4,256,768], float32[1,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_8/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608]; kernel: float32[768,4608]; 3,543,552 (14.2 MB) │ |
| │ DiTBlock_8/LayerNorm_0 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_8/Dense_1 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_8/Dense_2 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_8/Dense_3 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_8/Dense_4 │ Dense │ float32[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_8/LayerNorm_1 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_8/MlpBlock_0 │ MlpBlock │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_8/MlpBlock_0/Dense_0 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,3072] │ bias: float32[3072]; kernel: float32[768,3072]; 2,362,368 (9.4 MB) │ |
| │ DiTBlock_8/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[4,256,3072] │ bfloat16[4,256,3072] │ │ |
| │ DiTBlock_8/MlpBlock_0/Dense_1 │ Dense │ bfloat16[4,256,3072] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[3072,768]; 2,360,064 (9.4 MB) │ |
| │ DiTBlock_8/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_9 │ DiTBlock │ bfloat16[4,256,768], float32[1,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_9/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608]; kernel: float32[768,4608]; 3,543,552 (14.2 MB) │ |
| │ DiTBlock_9/LayerNorm_0 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_9/Dense_1 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_9/Dense_2 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_9/Dense_3 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_9/Dense_4 │ Dense │ float32[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_9/LayerNorm_1 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_9/MlpBlock_0 │ MlpBlock │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_9/MlpBlock_0/Dense_0 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,3072] │ bias: float32[3072]; kernel: float32[768,3072]; 2,362,368 (9.4 MB) │ |
| │ DiTBlock_9/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[4,256,3072] │ bfloat16[4,256,3072] │ │ |
| │ DiTBlock_9/MlpBlock_0/Dense_1 │ Dense │ bfloat16[4,256,3072] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[3072,768]; 2,360,064 (9.4 MB) │ |
| │ DiTBlock_9/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_10 │ DiTBlock │ bfloat16[4,256,768], float32[1,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_10/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608]; kernel: float32[768,4608]; 3,543,552 (14.2 MB) │ |
| │ DiTBlock_10/LayerNorm_0 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_10/Dense_1 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_10/Dense_2 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_10/Dense_3 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_10/Dense_4 │ Dense │ float32[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_10/LayerNorm_1 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_10/MlpBlock_0 │ MlpBlock │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_10/MlpBlock_0/Dense_0 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,3072] │ bias: float32[3072]; kernel: float32[768,3072]; 2,362,368 (9.4 MB) │ |
| │ DiTBlock_10/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[4,256,3072] │ bfloat16[4,256,3072] │ │ |
| │ DiTBlock_10/MlpBlock_0/Dense_1 │ Dense │ bfloat16[4,256,3072] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[3072,768]; 2,360,064 (9.4 MB) │ |
| │ DiTBlock_10/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ DiTBlock_11 │ DiTBlock │ bfloat16[4,256,768], float32[1,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_11/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608]; kernel: float32[768,4608]; 3,543,552 (14.2 MB) │ |
| │ DiTBlock_11/LayerNorm_0 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_11/Dense_1 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_11/Dense_2 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_11/Dense_3 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_11/Dense_4 │ Dense │ float32[4,256,768] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[768,768]; 590,592 (2.4 MB) │ |
| │ DiTBlock_11/LayerNorm_1 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_11/MlpBlock_0 │ MlpBlock │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ DiTBlock_11/MlpBlock_0/Dense_0 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,3072] │ bias: float32[3072]; kernel: float32[768,3072]; 2,362,368 (9.4 MB) │ |
| │ DiTBlock_11/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[4,256,3072] │ bfloat16[4,256,3072] │ │ |
| │ DiTBlock_11/MlpBlock_0/Dense_1 │ Dense │ bfloat16[4,256,3072] │ bfloat16[4,256,768] │ bias: float32[768]; kernel: float32[3072,768]; 2,360,064 (9.4 MB) │ |
| │ DiTBlock_11/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ FinalLayer_0 │ FinalLayer │ bfloat16[4,256,768], float32[1,768] │ bfloat16[4,256,16] │ │ |
| │ FinalLayer_0/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,1536] │ bias: float32[1536]; kernel: float32[768,1536]; 1,181,184 (4.7 MB) │ |
| │ FinalLayer_0/LayerNorm_0 │ LayerNorm │ bfloat16[4,256,768] │ bfloat16[4,256,768] │ │ |
| │ FinalLayer_0/Dense_1 │ Dense │ bfloat16[4,256,768] │ bfloat16[4,256,16] │ bias: float32[16]; kernel: float32[768,16]; 12,304 (49.2 KB) │ |
| │ Embed_0 │ Embed │ int32[1] │ float32[1,1] │ embedding: float32[256,1]; 256 (1.0 KB) │ |
| ├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤ |
| │ │ │ │ Total │ 131,091,728 (524.4 MB) │ |
| └──────────────────────────────────┴──────────────────┴───────────────────────┴───────────────────────┴──────────────────────────────┘ |
|
| Total Parameters: 131,091,728 (524.4 MB) |
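As a cross-check, the total above can be recomputed from layer shapes alone. The plain-Python tally below rests on a few assumptions read off this log:

- every Dense layer has a kernel plus a bias;
- the LayerNorms carry no parameters, since the table lists none;
- the only other parameterized modules are those in the parameter-shape listing: PatchEmbed_0, two TimestepEmbedders, LabelEmbedder_0, FinalLayer_0 and Embed_0.

The helper `dense_params` is a hypothetical name used only for illustration. The MB figures in the table are float32 bytes divided by 10^6.

```python
def dense_params(d_in, d_out, use_bias=True):
    # Hypothetical helper: kernel (d_in x d_out) plus an optional bias (d_out).
    return d_in * d_out + (d_out if use_bias else 0)

hidden, mlp_ratio, depth = 768, 4, 12      # hidden_size, mlp_ratio, depth from the config dump
patch, latent_ch, num_classes = 2, 4, 1000  # patch_size, latent channels, num_classes

# One DiTBlock: Dense_0 (768 -> 4608 = 6 * 768), Dense_1..Dense_4 (768 -> 768),
# and the two MlpBlock_0 Dense layers. LayerNorms add nothing here.
block = (
    dense_params(hidden, 6 * hidden)
    + 4 * dense_params(hidden, hidden)
    + dense_params(hidden, mlp_ratio * hidden)
    + dense_params(mlp_ratio * hidden, hidden)
)

patch_embed = patch * patch * latent_ch * hidden + hidden          # Conv kernel (2, 2, 4, 768) + bias
timestep_embedder = dense_params(256, hidden) + dense_params(hidden, hidden)
label_embedder = (num_classes + 1) * hidden                         # embedding table (1001, 768)
final_layer = dense_params(hidden, 2 * hidden) + dense_params(hidden, patch * patch * latent_ch)
embed_0 = 256 * 1                                                   # embedding (256, 1)

total = (depth * block + patch_embed + 2 * timestep_embedder
         + label_embedder + final_layer + embed_0)

print(f"per block: {block:,}")                                      # per block: 10,628,352
print(f"total: {total:,} ({total * 4 / 1e6:.1f} MB as float32)")   # total: 131,091,728 (524.4 MB as float32)
```

The tally reproduces 131,091,728 exactly. About 97% of it (12 × 10,628,352) sits in the DiT blocks, and the adaLN `Dense_0` projection is the single largest layer in each block. The extra row in the (1001, 768) label table is presumably the null class used when `class_dropout_prob` drops labels for classifier-free guidance. That reading is an assumption, not something this log states.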
|
|
|
|
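Each `DiTBlock_k/Dense_0` row in the table maps the (1, 768) conditioning vector to 4608 = 6 × 768 values, and `FinalLayer_0/Dense_0` maps it to 1536 = 2 × 768. Those sizes are consistent with the adaLN-Zero conditioning from the DiT paper:

- in each block, the projection is split into shift, scale and gate vectors for both the attention and MLP branches;
- in the final layer, it is split into shift and scale only.

The NumPy sketch below assumes that interpretation and the DiT reference chunk order, neither of which this log confirms. `attn_fn` and `mlp_fn` are hypothetical placeholders for the attention sub-layers (presumably `Dense_1` to `Dense_4`) and `MlpBlock_0`. The parameter-free LayerNorm matches the table, which lists no LayerNorm parameters.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Parameter-free LayerNorm over the feature axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def modulate(x, shift, scale):
    # x: (B, N, D); shift/scale: (B or 1, D), broadcast over the token axis.
    return x * (1 + scale[:, None, :]) + shift[:, None, :]

def adaln_zero_block(x, c, w_mod, b_mod, attn_fn, mlp_fn):
    # c: (B or 1, D) -> (B or 1, 6*D), the role DiTBlock_k/Dense_0 appears to play.
    mod = c @ w_mod + b_mod
    shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = np.split(mod, 6, axis=-1)
    x = x + gate_msa[:, None, :] * attn_fn(modulate(layer_norm(x), shift_msa, scale_msa))
    x = x + gate_mlp[:, None, :] * mlp_fn(modulate(layer_norm(x), shift_mlp, scale_mlp))
    return x

rng = np.random.default_rng(0)
D = 768
x = rng.standard_normal((4, 256, D)).astype(np.float32)  # token shape from the table
c = rng.standard_normal((1, D)).astype(np.float32)       # conditioning shape from the table

# With zero-initialized modulation weights every gate is zero, so the block is the identity.
w_zero = np.zeros((D, 6 * D), np.float32)
b_zero = np.zeros(6 * D, np.float32)
identity = lambda h: h  # stand-in for attention / MLP
y = adaln_zero_block(x, c, w_zero, b_zero, identity, identity)
print(y.shape, np.allclose(y, x))  # (4, 256, 768) True
```

The zero-gate case illustrates the "Zero" in adaLN-Zero: each residual block starts out as an identity mapping. Broadcasting the (1, 768) conditioning across the batch of 4 mirrors the shapes recorded in the table.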
| DiT: Input of shape (4, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (4, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (4, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (1, 768) dtype float32 |
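The shapes printed above follow from `patch_size: 2` in the config:

- a 32 × 32 × 4 latent split into 2 × 2 patches gives 16 × 16 = 256 tokens;
- each token is flattened to 2 · 2 · 4 = 16 values;
- the (2, 2, 4, 768) `PatchEmbed_0` kernel projects each token to 768.

The NumPy sketch below shows that shape flow, plus the inverse reshape that would turn the (4, 256, 16) `FinalLayer_0` output back into a (4, 32, 32, 4) latent. It assumes the patch-embedding conv uses stride 2 with no padding; the log shows only kernel and output shapes. `patchify`, `patch_embed` and `unpatchify` are illustrative names. Dtypes (the log's bfloat16) are ignored, and the row-major token and channel ordering is one common convention, not necessarily the one this codebase uses.

```python
import numpy as np

def patchify(x, p):
    # (B, H, W, C) -> (B, (H//p) * (W//p), p*p*C), tokens in row-major order over the patch grid.
    b, h, w, c = x.shape
    x = x.reshape(b, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, H/p, W/p, p, p, C)
    return x.reshape(b, (h // p) * (w // p), p * p * c)

def patch_embed(x, kernel, bias):
    # Same result as a stride-p, unpadded conv with kernel (p, p, C, D), flattened to tokens.
    p, _, c, d = kernel.shape
    return patchify(x, p) @ kernel.reshape(p * p * c, d) + bias

def unpatchify(tokens, p, c, h, w):
    # Inverse of patchify: (B, N, p*p*C) -> (B, H, W, C).
    b = tokens.shape[0]
    x = tokens.reshape(b, h // p, w // p, p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, H/p, p, W/p, p, C)
    return x.reshape(b, h, w, c)

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 32, 32, 4)).astype(np.float32)   # "DiT: Input of shape (4, 32, 32, 4)"
kernel = rng.standard_normal((2, 2, 4, 768)).astype(np.float32)   # PatchEmbed_0/Conv_0 kernel shape
bias = np.zeros(768, np.float32)

tokens = patch_embed(latent, kernel, bias)
print(tokens.shape)  # (4, 256, 768)

final_out = rng.standard_normal((4, 256, 2 * 2 * 4)).astype(np.float32)  # like FinalLayer_0's output
print(unpatchify(final_out, 2, 4, 32, 32).shape)  # (4, 32, 32, 4)
```

Reshaping the conv kernel to (p·p·C, D) makes the patch embedding a single matrix multiply over flattened patches. For a stride-p conv whose kernel size equals its stride, this is equivalent to the convolution itself.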
| Loaded checkpoint from 18039 seconds ago. |
|
|
| parameter shapes: |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_0', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1536,) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (16,) |
| ('Embed_0', 'embedding'): (256, 1) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('Embed_0', 'embedding'): (1, 256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('Embed_0', 'embedding'): (1, 256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('Embed_0', 'embedding'): (1, 256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) |
| ('Embed_0', 'embedding'): (1, 256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) |
| ('Embed_0', 'embedding'): (256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1536,) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (16,) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (768,) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768) |
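The unreplicated listing above gives every stored parameter shape in the DiT (DiTBlock_0 through DiTBlock_11, hidden width 768, MLP width 3072), so the model size can be tallied by hand. The sketch below assumes the listing is complete. The shapes are copied from the log. Anything not stored as a parameter, such as a fixed positional embedding if the model uses one, is not counted.

```python
from math import prod

# Shapes shared by every DiTBlock_k, copied from the unreplicated listing.
BLOCK_SHAPES = [
    (4608,), (768, 4608),   # Dense_0 (4608 = 6 * 768 outputs)
    (768,), (768, 768),     # Dense_1
    (768,), (768, 768),     # Dense_2
    (768,), (768, 768),     # Dense_3
    (768,), (768, 768),     # Dense_4
    (3072,), (768, 3072),   # MlpBlock_0 / Dense_0
    (768,), (3072, 768),    # MlpBlock_0 / Dense_1
]

# Parameters outside the DiT blocks.
OTHER_SHAPES = [
    (256, 1),                                  # Embed_0
    (1536,), (768, 1536),                      # FinalLayer_0 / Dense_0
    (16,), (768, 16),                          # FinalLayer_0 / Dense_1
    (1001, 768),                               # LabelEmbedder_0
    (768,), (2, 2, 4, 768),                    # PatchEmbed_0 / Conv_0
    (768,), (256, 768), (768,), (768, 768),    # TimestepEmbedder_0
    (768,), (256, 768), (768,), (768, 768),    # TimestepEmbedder_1
]

NUM_BLOCKS = 12  # DiTBlock_0 .. DiTBlock_11

per_block = sum(prod(s) for s in BLOCK_SHAPES)
total = NUM_BLOCKS * per_block + sum(prod(s) for s in OTHER_SHAPES)
print(f"per block: {per_block:,}")
print(f"total:     {total:,}")
```

This gives 10,628,352 parameters per block and 131,091,728 in total. The twelve DiT blocks account for roughly 97% of the total.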
| ┌────────────────────────────────────────────────┐ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| │                  TPU 0,1,2,3                   │ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| └────────────────────────────────────────────────┘ |
| ┌─────────────────────────────────────────────────────────────────────────┐ |
| │                                                                         │ |
| │                                                                         │ |
| │                                                                         │ |
| │                                                                         │ |
| │                               TPU 0,1,2,3                               │ |
| │                                                                         │ |
| │                                                                         │ |
| │                                                                         │ |
| │                                                                         │ |
| └─────────────────────────────────────────────────────────────────────────┘ |
| doing the else |
| (512, 256, 256, 3) |
| encode image shape (128, 256, 256, 3) |
| Initializing encoder. |
| Incoming encoder shape (128, 256, 256, 3) |
| Encoder layer (128, 256, 256, 128) |
| doing downsample |
| Encoder layer (128, 128, 128, 128) |
| doing downsample |
| Encoder layer (128, 64, 64, 256) |
| doing downsample |
| Encoder layer (128, 32, 32, 512) |
| Encoder layer (128, 32, 32, 512) |
| Encoder layer final (128, 32, 32, 512) |
| Encoder layer final (128, 32, 32, 512) |
| Final embeddings are size (128, 32, 32, 8) |
| After quant (128, 32, 32, 4) |
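Per device, the encoder maps 128 images of 256x256x3 (the global batch of 512 split over 4 devices) to a 32x32 grid after three downsamples, which is 8x per side. The encoder output has 8 channels, and "After quant" keeps 4. Since the run uses use_stable_vae: 1, one plausible reading follows the Stable Diffusion VAE convention, where the 8 channels hold the mean and log-variance of a 4-channel diagonal Gaussian. The NumPy sketch below illustrates that split. `split_moments` is a hypothetical helper. Whether this codebase samples from the Gaussian or takes its mean is an assumption.

```python
import numpy as np

def split_moments(moments, rng=None):
    """Hypothetical helper: turn an (..., 2C) encoder output into a C-channel latent.

    Assumes the first C channels are the mean and the last C the log-variance.
    Returns the mean when rng is None, otherwise a reparameterized sample.
    """
    mean, logvar = np.split(moments, 2, axis=-1)
    if rng is None:
        return mean
    std = np.exp(0.5 * np.clip(logvar, -30.0, 20.0))  # clamp log-variance for numerical safety
    return mean + std * rng.standard_normal(mean.shape)

# Three stride-2 downsamples: 256 -> 128 -> 64 -> 32 per side.
H = 256 // 2**3
moments = np.zeros((2, H, H, 8), dtype=np.float32)  # small stand-in for a (128, 32, 32, 8) batch
z = split_moments(moments, np.random.default_rng(0))
print(z.shape)
```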
| Calc FID for CFG 1.0 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| z_vectors shape (128, 32, 32, 4) |
| Decoder incoming shape (128, 32, 32, 4) |
| Decoder input (128, 32, 32, 512) |
| Mid Block Decoder layer (128, 32, 32, 512) |
| Mid Block Decoder layer (128, 32, 32, 512) |
| Decoder layer (128, 64, 64, 512) |
| Decoder layer (128, 128, 128, 512) |
| Decoder layer (128, 256, 256, 256) |
| Decoder layer (128, 256, 256, 128) |
| FID is 31.905973434448242 |
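The DiT trace shows the (512, 32, 32, 4) latent becoming 256 tokens of width 768. With patch_size 2, that is (32/2)^2 = 256 non-overlapping patches of 2*2*4 = 16 values each. This matches the PatchEmbed_0 kernel (2, 2, 4, 768) on the way in and the FinalLayer_0 Dense_1 kernel (768, 16) on the way out. A 2x2 convolution with stride 2 acts like one linear layer applied to each patch. The NumPy sketch below covers only the patch/unpatch bookkeeping and assumes a channels-last layout. The ordering of values inside a patch in the real model may differ.

```python
import numpy as np

PATCH, LATENT_HW, LATENT_C = 2, 32, 4  # patch_size and latent shape taken from the log

def patchify(x, p=PATCH):
    """(B, H, W, C) -> (B, (H/p)*(W/p), p*p*C): one row of values per non-overlapping patch."""
    b, h, w, c = x.shape
    x = x.reshape(b, h // p, p, w // p, p, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, (h // p) * (w // p), p * p * c)

def unpatchify(tokens, hw=LATENT_HW, c=LATENT_C, p=PATCH):
    """Inverse of patchify: (B, N, p*p*C) -> (B, H, W, C)."""
    b, g = tokens.shape[0], hw // p
    x = tokens.reshape(b, g, g, p, p, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, hw, hw, c)

x = np.random.default_rng(0).standard_normal((2, LATENT_HW, LATENT_HW, LATENT_C))
tokens = patchify(x)
print(tokens.shape)
```

In the model, PatchEmbed_0 presumably lifts each 16-value patch to 768 dimensions, and FinalLayer_0 projects each token back to 16 values before the latent is reassembled.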
| (512, 256, 256, 3) |
| Calc FID for CFG 1.0 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 32.27625274658203 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.0 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 33.54695510864258 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.0 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 37.894325256347656 |
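Each evaluation reruns generation with fewer denoise_timesteps, and FID worsens as the step count drops. At CFG 1.0 it goes from 31.9 at 128 steps to 32.3 at 64, 33.5 at 32, and 37.9 at 16. The sketch below shows the kind of fixed-step Euler sampler this corresponds to. It assumes a flow-matching convention in which t runs from 0 (noise) to 1 (data) and the model predicts a velocity. `velocity_fn` is a hypothetical stand-in for the model call. Its `dt` argument reflects the second TimestepEmbedder in the parameter listing, which presumably embeds the step size.

```python
import numpy as np

def euler_sample(velocity_fn, x_noise, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) in num_steps equal Euler steps.

    velocity_fn(x, t, dt) is a hypothetical stand-in for the model forward pass.
    """
    x = x_noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, dt)
    return x

# Toy check: on the straight path x_t = (1 - t) * x0 + t * x1 the true velocity is
# x1 - x0, which (x1 - x) / (1 - t) recovers from any point on that path.
rng = np.random.default_rng(0)
x0, x1 = rng.standard_normal((2, 2, 32, 32, 4))

def straight_velocity(x, t, dt):
    return (x1 - x) / (1.0 - t)

out = euler_sample(straight_velocity, x0, num_steps=8)
print(np.abs(out - x1).max())
```

On the straight toy path, any step count lands on the target. A learned velocity field is generally curved, so each large Euler step adds discretization error. This is consistent with the FID rising at low step counts.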
| (512, 256, 256, 3) |
| Calc FID for CFG 1.0 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 51.847389221191406 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.0 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 107.63591003417969 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.0 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 252.69888305664062 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.0 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 319.7226867675781 |
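The following sweeps repeat the evaluation with a larger guidance scale. LabelEmbedder_0 has 1001 rows, which fits the common DiT setup of 1000 classes plus one "null" label used when the class is dropped during training (class_dropout_prob 0.1). The NumPy sketch below shows classifier-free guidance under that assumption. `guided_velocity`, its `model` argument, and the choice of index 1000 as the null label are all hypothetical.

```python
import numpy as np

NUM_CLASSES = 1000
NULL_LABEL = NUM_CLASSES  # assumed: the 1001st row of the label table is the "no class" label

def cfg_combine(pred_cond, pred_uncond, scale):
    """Classifier-free guidance: go from the unconditional prediction toward
    (and, for scale > 1, beyond) the conditional one."""
    return pred_uncond + scale * (pred_cond - pred_uncond)

def guided_velocity(model, x, t, dt, labels, scale):
    """model(x, t, dt, labels) is a hypothetical stand-in for the DiT forward pass."""
    if scale == 1.0:
        # At scale 1 the combination equals the conditional prediction, so skip the extra pass.
        return model(x, t, dt, labels)
    null_labels = np.full_like(labels, NULL_LABEL)
    return cfg_combine(model(x, t, dt, labels), model(x, t, dt, null_labels), scale)

# Toy model: adds 1 for a real class label and 0 for the null label.
def toy_model(x, t, dt, labels):
    return x + (labels < NUM_CLASSES)[:, None]

x = np.zeros((2, 3))
labels = np.array([3, 7])
print(guided_velocity(toy_model, x, 0.0, 1.0, labels, scale=1.5))
```

At scale 1.0 the guided output is just the conditional prediction, so the CFG 1.0 sweep above is effectively unguided.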
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 18.814821243286133 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 19.11261558532715 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 20.127628326416016 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 23.594669342041016 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 35.4896125793457 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 85.51443481445312 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 240.59774780273438 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.25 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 305.35357666015625 |
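To compare the sweeps side by side, the (CFG, denoise_timesteps, FID) triples can be pulled out of the log text. The sketch below uses only the standard library. Its regular expressions assume the exact message formats printed in this log.

```python
import re

CALC_RE = re.compile(r"Calc FID for CFG ([\d.]+) and denoise_timesteps (\d+)")
FID_RE = re.compile(r"FID is ([\d.]+)")

def parse_fid_sweep(lines):
    """Map (cfg_scale, denoise_timesteps) -> FID by pairing each 'Calc FID' line
    with the next 'FID is' line."""
    results, pending = {}, None
    for line in lines:
        if m := CALC_RE.search(line):
            pending = (float(m.group(1)), int(m.group(2)))
        elif (m := FID_RE.search(line)) and pending is not None:
            results[pending] = float(m.group(1))
            pending = None
    return results

sample = """| Calc FID for CFG 1.0 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| FID is 31.905973434448242 |
| Calc FID for CFG 1.25 and denoise_timesteps 128 |
| FID is 18.814821243286133 |"""

res = parse_fid_sweep(sample.splitlines())
for (cfg, steps), fid in sorted(res.items()):
    print(f"CFG {cfg:<5} steps {steps:>3}  FID {fid:.2f}")
```

Tabulated this way, guidance at 1.25 lowers FID relative to 1.0 at every step count evaluated so far. For example, FID drops from 31.91 to 18.81 at 128 steps and from 51.85 to 35.49 at 8 steps.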
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 11.71961784362793 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 11.917760848999023 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 12.663106918334961 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 15.181784629821777 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 24.551116943359375 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 67.87528991699219 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 230.55465698242188 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.5 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 295.24188232421875 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 8.580363273620605 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 8.700770378112793 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 9.181977272033691 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 10.940503120422363 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 18.0069637298584 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 54.57537078857422 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 221.85202026367188 |
| (512, 256, 256, 3) |
| Calc FID for CFG 1.75 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 288.0120544433594 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 7.7056450843811035 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 7.759979248046875 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 7.997262954711914 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 9.159201622009277 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 14.285831451416016 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 44.61159133911133 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 214.304931640625 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.0 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 282.582275390625 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 7.957117557525635 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 7.965053081512451 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 8.088018417358398 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 8.782133102416992 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 12.382518768310547 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 37.3809928894043 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 207.85345458984375 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.25 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 278.45654296875 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 8.769401550292969 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 8.749847412109375 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 8.788768768310547 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 9.199583053588867 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 11.61952018737793 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 32.082252502441406 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 202.39967346191406 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.5 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 274.93511962890625 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 9.857498168945312 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 9.813886642456055 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 9.792471885681152 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 10.007205963134766 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 11.570734977722168 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 28.250038146972656 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 197.681396484375 |
| (512, 256, 256, 3) |
| Calc FID for CFG 2.75 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 271.9351501464844 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 128 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 11.016831398010254 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 64 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 10.961159706115723 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 32 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 10.917856216430664 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 16 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 10.987756729125977 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 8 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 11.926231384277344 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 4 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 25.491756439208984 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 2 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 193.66769409179688 |
| (512, 256, 256, 3) |
| Calc FID for CFG 3.0 and denoise_timesteps 1 |
| DiT: Input of shape (512, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (512, 768) dtype float32 |
| FID is 269.3216857910156 |
| wandb: |
| wandb: View run shortcut_imagenet256 at: https://wandb.ai/daniel-z-kaplan/shortcut/runs/shortcut_imagenet256_20250816_141408_345353_10 |
| wandb: Find logs at: ../../../tmp/tmpsx4kqqyn/wandb/run-20250816_141408-shortcut_imagenet256_20250816_141408_345353_10/logs |
|
|