Commit 35cbfdf by KublaiKhan1: "Upload folder using huggingface_hub"
Using devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=2, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=3, process_index=0, coords=(1,1,0), core_on_chip=0)]
Device count 4
Global device count 4
Global Batch: 512
Node Batch: 512
Device Batch: 128
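The batch figures above fit an even data-parallel split. There is one host (process) holding all 4 devices, so the node batch equals the global batch and each device gets 512 / 4 = 128 examples. Below is a minimal sketch of that arithmetic. It assumes a single process and an even split, and `split_batch` is a hypothetical helper, not a function from this codebase:

```python
def split_batch(global_batch: int, process_count: int, local_device_count: int):
    """Return (node_batch, device_batch) for an even data-parallel split (assumed)."""
    if global_batch % process_count:
        raise ValueError("global batch must divide evenly across hosts")
    node_batch = global_batch // process_count
    if node_batch % local_device_count:
        raise ValueError("node batch must divide evenly across local devices")
    device_batch = node_batch // local_device_count
    return node_batch, device_batch


# Values from the log: one process, 4 local TPU cores, global batch 512.
node_batch, device_batch = split_batch(global_batch=512, process_count=1, local_device_count=4)
print(node_batch, device_batch)  # 512 128
```

With more hosts the node batch would shrink proportionally. For example, `split_batch(512, 2, 4)` gives `(256, 64)`.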
/tmp/tmpsx4kqqyn
Loading dataset
Loading dataset
creating model
beta1: 0.9
beta2: 0.999
bootstrap_cfg: 1
bootstrap_dt_bias: 0
bootstrap_ema: 1
bootstrap_every: 8
cfg_scale: 1.5
class_dropout_prob: 0.1
denoise_timesteps: 128
depth: 12
dropout: 0.0
dt_sampling: uniform
hidden_size: 768
lr: 0.0001
mlp_ratio: 4
num_classes: 1000
num_heads: 12
patch_size: 2
sharding: dp
t_sampling: discrete-dt
target_update_rate: 0.999
train_type: naive
use_cosine: 0
use_ema: 0
use_stable_vae: 1
warmup: 0
weight_decay: 0.1
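Together, `class_dropout_prob: 0.1` and `cfg_scale: 1.5` suggest classifier-free guidance. During training, roughly 10% of labels would be replaced by a null class. This is consistent with the 1001-row label embedding (1000 classes plus one extra row) in the DiT summary below. At sampling time, the conditional and unconditional predictions are then mixed. The sketch below uses the standard CFG formula. Two points are assumptions: that this codebase applies the formula exactly this way, and that the null class sits at index 1000. `drop_labels` and `cfg_combine` are hypothetical names:

```python
import random

NUM_CLASSES = 1000        # num_classes from the config
NULL_CLASS = NUM_CLASSES  # assumed index of the extra "unconditional" embedding row


def drop_labels(labels, drop_prob=0.1, rng=random):
    """Replace each label with the null class with probability drop_prob."""
    return [NULL_CLASS if rng.random() < drop_prob else y for y in labels]


def cfg_combine(pred_cond, pred_uncond, cfg_scale=1.5):
    """Classifier-free guidance, elementwise: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]


print(cfg_combine([1.0, 2.0], [0.0, 1.0]))  # [1.5, 2.5]
```

A scale of 1.0 returns the conditional prediction unchanged. Values above 1 push the output further away from the unconditional prediction.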
Total devices TPU_0(process=0,(0,0,0,0))
Initializing encoder.
Incoming encoder shape (1, 256, 256, 3)
Encoder layer (1, 256, 256, 128)
doing downsample
Encoder layer (1, 128, 128, 128)
doing downsample
Encoder layer (1, 64, 64, 256)
doing downsample
Encoder layer (1, 32, 32, 512)
Encoder layer (1, 32, 32, 512)
Encoder layer final (1, 32, 32, 512)
Encoder layer final (1, 32, 32, 512)
Final embeddings are size (1, 32, 32, 8)
After quant (1, 32, 32, 4)
encode finished
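The encoder trace shows three stride-2 downsamples (256 → 128 → 64 → 32), then a projection to 8 channels, which become 4 after the quantization step. One plausible reading is a KL-style VAE whose 8 channels hold the mean and log-variance of a 4-channel latent, but the log does not confirm this. The sketch below reproduces the spatial arithmetic and the resulting compression ratio. `latent_shape` is a hypothetical helper:

```python
def latent_shape(image_shape, num_downsamples=3, latent_channels=4):
    """Latent (B, H, W, C) after num_downsamples stride-2 stages, NHWC layout."""
    b, h, w, _ = image_shape
    factor = 2 ** num_downsamples  # 8 for three downsamples
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (b, h // factor, w // factor, latent_channels)


img = (1, 256, 256, 3)
z = latent_shape(img)
# Ratio of input values to latent values per image.
ratio = (img[1] * img[2] * img[3]) / (z[1] * z[2] * z[3])
print(z, ratio)  # (1, 32, 32, 4) 48.0
```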
Decoder incoming shape (1, 32, 32, 4)
Decoder input (1, 32, 32, 512)
Mid Block Decoder layer (1, 32, 32, 512)
Mid Block Decoder layer (1, 32, 32, 512)
Decoder layer (1, 64, 64, 512)
Decoder layer (1, 128, 128, 512)
Decoder layer (1, 256, 256, 256)
Decoder layer (1, 256, 256, 128)
Total num of VQVAE parameters: 67565323
Disc shape (1, 128, 128, 128)
Disc shape (1, 64, 64, 256)
Disc shape (1, 32, 32, 512)
Disc shape (1, 16, 16, 512)
Disc shape (1, 8, 8, 512)
Disc shape (1, 4, 4, 512)
Total num of Discriminator parameters: 23998017
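Each discriminator stage in the trace halves the spatial resolution. The channel count grows to 512 and then stays fixed. The sketch below regenerates the printed shapes under two assumptions: the discriminator receives the 256 × 256 images (the log does not print its input shape), and every stage uses stride 2. `disc_feature_shapes` is a hypothetical helper, and its channel list is copied from the log:

```python
def disc_feature_shapes(image_size=256, channels=(128, 256, 512, 512, 512, 512)):
    """Per-stage (1, H, W, C) shapes if every stage halves H and W (assumed)."""
    shapes = []
    size = image_size
    for c in channels:
        size //= 2  # stride-2 stage
        shapes.append((1, size, size, c))
    return shapes


for shape in disc_feature_shapes():
    print("Disc shape", shape)
```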
Loaded checkpoint from 18291200 seconds ago.
Loaded model with step 447001
┌────────────────────────────────────────┐
│                 TPU 0                  │
├────────────────────────────────────────┤
│                 TPU 1                  │
├────────────────────────────────────────┤
│                 TPU 2                  │
├────────────────────────────────────────┤
│                 TPU 3                  │
└────────────────────────────────────────┘
returning model
model done
Input to vae (4, 1, 256, 256, 3)
encode image shape (1, 256, 256, 3)
Initializing encoder.
Incoming encoder shape (1, 256, 256, 3)
Encoder layer (1, 256, 256, 128)
doing downsample
Encoder layer (1, 128, 128, 128)
doing downsample
Encoder layer (1, 64, 64, 256)
doing downsample
Encoder layer (1, 32, 32, 512)
Encoder layer (1, 32, 32, 512)
Encoder layer final (1, 32, 32, 512)
Encoder layer final (1, 32, 32, 512)
Final embeddings are size (1, 32, 32, 8)
After quant (1, 32, 32, 4)
output example shape (4, 1, 32, 32, 4)
Test data shape (4, 256, 256, 3)
x shape (4, 1, 256, 256, 3)
encoded shape (4, 1, 32, 32, 4)
z_vectors shape (1, 32, 32, 4)
Decoder incoming shape (1, 32, 32, 4)
Decoder input (1, 32, 32, 512)
Mid Block Decoder layer (1, 32, 32, 512)
Mid Block Decoder layer (1, 32, 32, 512)
Decoder layer (1, 64, 64, 512)
Decoder layer (1, 128, 128, 512)
Decoder layer (1, 256, 256, 256)
Decoder layer (1, 256, 256, 128)
image shape (4, 1, 256, 256, 3)
decoded img shape (256, 256, 3)
obs shape (4, 32, 32, 4)
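The shapes in this trace follow a per-device layout:

- The 4 test images `(4, 256, 256, 3)` are reshaped to `(4, 1, 256, 256, 3)`, one image per device on a leading device axis.
- They are encoded to `(4, 1, 32, 32, 4)`.
- They are merged back to `(4, 32, 32, 4)` as the DiT input (`obs`).

The pure-Python sketch below tracks only the shape bookkeeping. It assumes the leading axis is the device axis and that the VAE downsamples by 8. `shard_leading_axis` and `unshard_leading_axis` are hypothetical names:

```python
DOWNSAMPLE = 8  # assumed VAE spatial factor (256 -> 32)


def shard_leading_axis(shape, num_devices):
    """(B, ...) -> (num_devices, B // num_devices, ...), as for a per-device map."""
    batch, *rest = shape
    if batch % num_devices:
        raise ValueError("batch must divide evenly across devices")
    return (num_devices, batch // num_devices, *rest)


def unshard_leading_axis(shape):
    """(D, b, ...) -> (D * b, ...): merge the device and per-device axes."""
    d, b, *rest = shape
    return (d * b, *rest)


x = shard_leading_axis((4, 256, 256, 3), num_devices=4)
encoded = (*x[:2], x[2] // DOWNSAMPLE, x[3] // DOWNSAMPLE, 4)  # 4 latent channels
obs = unshard_leading_axis(encoded)
print(x, encoded, obs)
```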
DiT: Input of shape (4, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (4, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (4, 256, 768) dtype bfloat16
DiT: Conditioning of shape (1, 768) dtype float32
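With `patch_size: 2`, the `(4, 32, 32, 4)` latent becomes a 16 × 16 grid of patches. That is 256 tokens of width `hidden_size` 768, matching the `(4, 256, 768)` patch embedding above. The parameter counts in the summary table are consistent with ordinary `in * out + out` dense and conv layers.

The sketch below recomputes a few of them, plus the per-block total. That total assumes every `DiTBlock` has the same layers as `DiTBlock_0`. Treating the 4608-wide `Dense_0` as 6 × 768 adaLN-style modulation outputs is also an assumption. Function names are hypothetical:

```python
def dense_params(fan_in, fan_out, bias=True):
    """Kernel plus optional bias for a dense layer."""
    return fan_in * fan_out + (fan_out if bias else 0)


def conv_params(kh, kw, c_in, c_out, bias=True):
    """Kernel plus optional bias for a 2D convolution."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


hidden, latent_hw, latent_c, patch, mlp_ratio = 768, 32, 4, 2, 4
num_tokens = (latent_hw // patch) ** 2                     # 16 * 16 patches

patch_embed = conv_params(patch, patch, latent_c, hidden)  # PatchEmbed_0/Conv_0
label_embed = (1000 + 1) * hidden                          # 1000 classes + 1 extra row

# Layers as listed for DiTBlock_0 (assumed identical in every block):
block = (
    dense_params(hidden, 6 * hidden)                # Dense_0: 768 -> 4608 (assumed modulation)
    + 4 * dense_params(hidden, hidden)              # Dense_1 .. Dense_4
    + dense_params(hidden, mlp_ratio * hidden)      # MlpBlock_0/Dense_0
    + dense_params(mlp_ratio * hidden, hidden)      # MlpBlock_0/Dense_1
)
print(num_tokens, patch_embed, label_embed, block)  # 256 13056 768768 10628352
```

The sizes in parentheses in the table match 4 bytes per float32 parameter in decimal units. For example, 13,056 × 4 B ≈ 52.2 KB. Under the identical-block assumption, the 12 blocks (`depth: 12`) hold about 127.5M parameters.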
                                                             DiT Summary
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ path                             ┃ module           ┃ inputs                ┃ outputs               ┃ params                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│                                  │ DiT              │ - float32[4,32,32,4]  │ bfloat16[4,32,32,4]   │                              │
│                                  │                  │ - float32[1]          │                       │                              │
│                                  │                  │ - float32[1]          │                       │                              │
│                                  │                  │ - int32[1]            │                       │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ PatchEmbed_0                     │ PatchEmbed       │ float32[4,32,32,4]    │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ PatchEmbed_0/Conv_0              │ Conv             │ float32[4,32,32,4]    │ bfloat16[4,16,16,768] │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[2,2,4,768]   │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 13,056 (52.2 KB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ TimestepEmbedder_0               │ TimestepEmbedder │ float32[1]            │ float32[1,768]        │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ TimestepEmbedder_0/Dense_0       │ Dense            │ bfloat16[1,256]       │ bfloat16[1,768]       │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[256,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 197,376 (789.5 KB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ TimestepEmbedder_0/Dense_1       │ Dense            │ bfloat16[1,768]       │ float32[1,768]        │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ TimestepEmbedder_1               │ TimestepEmbedder │ float32[1]            │ float32[1,768]        │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ TimestepEmbedder_1/Dense_0       │ Dense            │ bfloat16[1,256]       │ bfloat16[1,768]       │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[256,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 197,376 (789.5 KB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ TimestepEmbedder_1/Dense_1       │ Dense            │ bfloat16[1,768]       │ float32[1,768]        │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ LabelEmbedder_0                  │ LabelEmbedder    │ int32[1]              │ bfloat16[1,768]       │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ LabelEmbedder_0/Embed_0          │ Embed            │ int32[1]              │ bfloat16[1,768]       │ embedding: float32[1001,768] │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 768,768 (3.1 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │
│                                  │                  │ - float32[1,768]      │                       │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │
│                                  │                  │                       │                       │ kernel: float32[768,4608]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)          │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │
│                                  │                  │                       │                       │ kernel: float32[768,3072]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[3072,768]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_0/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │
│                                  │                  │ - float32[1,768]      │                       │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │
│                                  │                  │                       │                       │ kernel: float32[768,4608]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)          │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │
│                                  │                  │                       │                       │ kernel: float32[768,3072]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[3072,768]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_1/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                              │
│                                  │                  │ - float32[1,768]      │                       │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]          │
│                                  │                  │                       │                       │ kernel: float32[768,4608]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)          │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[768,768]     │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)             │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]          │
│                                  │                  │                       │                       │ kernel: float32[768,3072]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]           │
│                                  │                  │                       │                       │ kernel: float32[3072,768]    │
│                                  │                  │                       │                       │                              │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_2/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼──────────────────────────────┤
│ DiTBlock_3                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_3/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_4/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_5/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_6/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_7/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_8/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9                       │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/Dense_0               │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/LayerNorm_0           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/Dense_1               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/Dense_2               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/Dense_3               │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/Dense_4               │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/LayerNorm_1           │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/MlpBlock_0            │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/MlpBlock_0/Dense_0    │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/MlpBlock_0/Dropout_0  │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/MlpBlock_0/Dense_1    │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_9/MlpBlock_0/Dropout_1  │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10                      │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/Dense_0              │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/LayerNorm_0          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/Dense_1              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/Dense_2              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/Dense_3              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/Dense_4              │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/LayerNorm_1          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/MlpBlock_0           │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/MlpBlock_0/Dense_0   │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/MlpBlock_0/Dropout_0 │ Dropout          │ bfloat16[4,256,3072]  │ bfloat16[4,256,3072]  │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/MlpBlock_0/Dense_1   │ Dense            │ bfloat16[4,256,3072]  │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[3072,768]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,360,064 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_10/MlpBlock_0/Dropout_1 │ Dropout          │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11                      │ DiTBlock         │ - bfloat16[4,256,768] │ bfloat16[4,256,768]   │                               │
│                                  │                  │ - float32[1,768]      │                       │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/Dense_0              │ Dense            │ float32[1,768]        │ bfloat16[1,4608]      │ bias: float32[4608]           │
│                                  │                  │                       │                       │ kernel: float32[768,4608]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 3,543,552 (14.2 MB)           │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/LayerNorm_0          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/Dense_1              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/Dense_2              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/Dense_3              │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/Dense_4              │ Dense            │ float32[4,256,768]    │ bfloat16[4,256,768]   │ bias: float32[768]            │
│                                  │                  │                       │                       │ kernel: float32[768,768]      │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 590,592 (2.4 MB)              │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/LayerNorm_1          │ LayerNorm        │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/MlpBlock_0           │ MlpBlock         │ bfloat16[4,256,768]   │ bfloat16[4,256,768]   │                               │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
│ DiTBlock_11/MlpBlock_0/Dense_0   │ Dense            │ bfloat16[4,256,768]   │ bfloat16[4,256,3072]  │ bias: float32[3072]           │
│                                  │                  │                       │                       │ kernel: float32[768,3072]     │
│                                  │                  │                       │                       │                               │
│                                  │                  │                       │                       │ 2,362,368 (9.4 MB)            │
├──────────────────────────────────┼──────────────────┼───────────────────────┼───────────────────────┼───────────────────────────────┤
β”‚ DiTBlock_11/MlpBlock_0/Dropout_0 β”‚ Dropout β”‚ bfloat16[4,256,3072] β”‚ bfloat16[4,256,3072] β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ DiTBlock_11/MlpBlock_0/Dense_1 β”‚ Dense β”‚ bfloat16[4,256,3072] β”‚ bfloat16[4,256,768] β”‚ bias: float32[768] β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ kernel: float32[3072,768] β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ 2,360,064 (9.4 MB) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ DiTBlock_11/MlpBlock_0/Dropout_1 β”‚ Dropout β”‚ bfloat16[4,256,768] β”‚ bfloat16[4,256,768] β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ FinalLayer_0 β”‚ FinalLayer β”‚ - bfloat16[4,256,768] β”‚ bfloat16[4,256,16] β”‚ β”‚
β”‚ β”‚ β”‚ - float32[1,768] β”‚ β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ FinalLayer_0/Dense_0 β”‚ Dense β”‚ float32[1,768] β”‚ bfloat16[1,1536] β”‚ bias: float32[1536] β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ kernel: float32[768,1536] β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ 1,181,184 (4.7 MB) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ FinalLayer_0/LayerNorm_0 β”‚ LayerNorm β”‚ bfloat16[4,256,768] β”‚ bfloat16[4,256,768] β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ FinalLayer_0/Dense_1 β”‚ Dense β”‚ bfloat16[4,256,768] β”‚ bfloat16[4,256,16] β”‚ bias: float32[16] β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ kernel: float32[768,16] β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ 12,304 (49.2 KB) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Embed_0 β”‚ Embed β”‚ int32[1] β”‚ float32[1,1] β”‚ embedding: float32[256,1] β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚ β”‚ 256 (1.0 KB) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   β”‚   β”‚   β”‚  Total β”‚ 131,091,728 (524.4 MB)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
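The FinalLayer_0 rows above take two inputs (tokens `[4,256,768]` and a conditioning vector `[1,768]`), project the conditioning to 1536 = 2 x 768 values, apply a LayerNorm that carries no parameters, and project the tokens to 16 channels. A minimal NumPy sketch of one way those shapes fit together, assuming a DiT-style adaLN final layer: the split of Dense_0's output into `shift` and `scale` (and their order), the `eps` value, and all function names here are assumptions, not code taken from this run.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Parameter-free LayerNorm: the table lists no params for LayerNorm_0,
    # so no learned scale or bias is applied here (assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def final_layer(x, c, w_mod, b_mod, w_out, b_out):
    """Hypothetical adaLN-style final layer with the shapes from the table.

    x: tokens (B, N, D); c: conditioning (1, D) or (B, D)
    w_mod: (D, 2D), plays the role of FinalLayer_0/Dense_0
    w_out: (D, P), plays the role of FinalLayer_0/Dense_1 (P = 16 here)
    """
    mod = c @ w_mod + b_mod                   # (1 or B, 2D)
    shift, scale = np.split(mod, 2, axis=-1)  # each (1 or B, D); order assumed
    h = layer_norm(x) * (1.0 + scale[:, None, :]) + shift[:, None, :]
    return h @ w_out + b_out                  # (B, N, P)


rng = np.random.default_rng(0)
D = 768
x = rng.standard_normal((4, 256, D)).astype(np.float32)
c = rng.standard_normal((1, D)).astype(np.float32)
w_mod = rng.standard_normal((D, 2 * D)).astype(np.float32) * 0.01
b_mod = np.zeros(2 * D, dtype=np.float32)
w_out = rng.standard_normal((D, 16)).astype(np.float32) * 0.01
b_out = np.zeros(16, dtype=np.float32)

out = final_layer(x, c, w_mod, b_mod, w_out, b_out)
print(out.shape)  # (4, 256, 16)
```

Because the conditioning in the table has batch size 1 while the tokens have batch size 4, broadcasting over the leading axis is what lets the same modulation apply to every sample in this trace.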
 
 Total Parameters: 131,091,728 (524.4 MB) 
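The total can be reproduced from the parameter shapes listed after the forward-pass trace: each Dense layer holds `d_in * d_out` kernel weights plus `d_out` biases, the LayerNorms hold nothing, and the sizes in parentheses match float32 storage at 4 bytes per parameter, in decimal units. A small sketch of that arithmetic; the helper names are illustrative, and reading 4608 as six 768-wide adaLN modulation vectors (and 1001 as 1000 classes plus one extra label row) follows the standard DiT layout, which is an assumption about this codebase.

```python
def dense_params(d_in, d_out):
    # kernel (d_in, d_out) + bias (d_out,)
    return d_in * d_out + d_out


def mb(n, bytes_per_param=4):
    # float32 storage in decimal megabytes, as the summary table reports it
    return n * bytes_per_param / 1e6


D, MLP, DEPTH = 768, 4 * 768, 12   # hidden_size, mlp_ratio * hidden_size, depth

block = (dense_params(D, 6 * D)    # Dense_0: 4608 = 6 * 768 outputs
         + 4 * dense_params(D, D)  # Dense_1 .. Dense_4
         + dense_params(D, MLP)    # MlpBlock_0/Dense_0
         + dense_params(MLP, D))   # MlpBlock_0/Dense_1

total = (2 * 2 * 4 * D + D                                  # PatchEmbed_0/Conv_0
         + 2 * (dense_params(256, D) + dense_params(D, D))  # TimestepEmbedder_0, _1
         + 1001 * D                                         # LabelEmbedder_0
         + DEPTH * block                                    # DiTBlock_0 .. DiTBlock_11
         + dense_params(D, 2 * D) + dense_params(D, 16)     # FinalLayer_0
         + 256 * 1)                                         # Embed_0

print(f"{total:,} ({mb(total):.1f} MB)")  # 131,091,728 (524.4 MB)
```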
DiT: Input of shape (4, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (4, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (4, 256, 768) dtype bfloat16
DiT: Conditioning of shape (1, 768) dtype float32
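The shape trace is consistent with `patch_size: 2`: a 32 x 32 latent with 4 channels splits into 16 x 16 = 256 patches of 2 * 2 * 4 = 16 values each, which is why PatchEmbed_0/Conv_0 has a `(2, 2, 4, 768)` kernel and FinalLayer_0 emits 16 channels per token. A NumPy sketch, assuming the conv uses stride equal to the patch size with no padding; `patchify`, `unpatchify`, and `patch_embed_conv` are hypothetical helpers, and the patch ordering shown is one common convention rather than necessarily the one this code uses.

```python
import numpy as np


def patchify(x, p):
    """(B, H, W, C) -> (B, (H/p)*(W/p), p*p*C), patches in row-major order."""
    b, h, w, c = x.shape
    x = x.reshape(b, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, H/p, W/p, p, p, C)
    return x.reshape(b, (h // p) * (w // p), p * p * c)


def unpatchify(tokens, p, h, w, c):
    """Inverse of patchify: (B, N, p*p*C) -> (B, H, W, C)."""
    b = tokens.shape[0]
    x = tokens.reshape(b, h // p, w // p, p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, H/p, p, W/p, p, C)
    return x.reshape(b, h, w, c)


def patch_embed_conv(x, kernel):
    """Stride-p, kernel-p, unpadded conv with an HWIO kernel of shape (p, p, C, D)."""
    p, _, c, d = kernel.shape
    b, h, w, _ = x.shape
    xr = x.reshape(b, h // p, p, w // p, p, c)
    y = np.einsum("bipjqc,pqcd->bijd", xr, kernel)
    return y.reshape(b, (h // p) * (w // p), d)


rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 32, 32, 4))  # like "DiT: Input of shape (4, 32, 32, 4)"
kernel = rng.standard_normal((2, 2, 4, 768))   # like PatchEmbed_0/Conv_0 kernel

tokens = patchify(latents, p=2)
print(tokens.shape)                             # (4, 256, 16)
print(patch_embed_conv(latents, kernel).shape)  # (4, 256, 768)
```

The conv and a plain matrix multiply on the 16-value patches give the same result, so the patch embedding is just a shared linear map applied to each patch.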
Loaded checkpoint from 18039 seconds ago.
parameter shapes:
('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768)
('PatchEmbed_0', 'Conv_0', 'bias'): (768,)
('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768)
('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,)
('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768)
('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,)
('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768)
('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,)
('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768)
('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,)
('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768)
('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_0', 'Dense_0', 'bias'): (4608,)
('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_0', 'Dense_2', 'bias'): (768,)
('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_0', 'Dense_3', 'bias'): (768,)
('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_0', 'Dense_4', 'bias'): (768,)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_1', 'Dense_0', 'bias'): (4608,)
('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_1', 'Dense_1', 'bias'): (768,)
('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_1', 'Dense_2', 'bias'): (768,)
('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_1', 'Dense_3', 'bias'): (768,)
('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_1', 'Dense_4', 'bias'): (768,)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_2', 'Dense_0', 'bias'): (4608,)
('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_2', 'Dense_1', 'bias'): (768,)
('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_2', 'Dense_2', 'bias'): (768,)
('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_2', 'Dense_3', 'bias'): (768,)
('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_2', 'Dense_4', 'bias'): (768,)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_3', 'Dense_0', 'bias'): (4608,)
('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_3', 'Dense_1', 'bias'): (768,)
('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_3', 'Dense_2', 'bias'): (768,)
('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_3', 'Dense_3', 'bias'): (768,)
('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_3', 'Dense_4', 'bias'): (768,)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_4', 'Dense_0', 'bias'): (4608,)
('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_4', 'Dense_1', 'bias'): (768,)
('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_4', 'Dense_2', 'bias'): (768,)
('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_4', 'Dense_3', 'bias'): (768,)
('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_4', 'Dense_4', 'bias'): (768,)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_5', 'Dense_0', 'bias'): (4608,)
('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_5', 'Dense_1', 'bias'): (768,)
('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_5', 'Dense_2', 'bias'): (768,)
('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_5', 'Dense_3', 'bias'): (768,)
('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_5', 'Dense_4', 'bias'): (768,)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_6', 'Dense_0', 'bias'): (4608,)
('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_6', 'Dense_1', 'bias'): (768,)
('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_6', 'Dense_2', 'bias'): (768,)
('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_6', 'Dense_3', 'bias'): (768,)
('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_6', 'Dense_4', 'bias'): (768,)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_7', 'Dense_0', 'bias'): (4608,)
('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_7', 'Dense_1', 'bias'): (768,)
('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_7', 'Dense_2', 'bias'): (768,)
('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_7', 'Dense_3', 'bias'): (768,)
('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_7', 'Dense_4', 'bias'): (768,)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_8', 'Dense_0', 'bias'): (4608,)
('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_8', 'Dense_1', 'bias'): (768,)
('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_8', 'Dense_2', 'bias'): (768,)
('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_8', 'Dense_3', 'bias'): (768,)
('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_8', 'Dense_4', 'bias'): (768,)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_9', 'Dense_0', 'bias'): (4608,)
('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_9', 'Dense_1', 'bias'): (768,)
('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_9', 'Dense_2', 'bias'): (768,)
('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_9', 'Dense_3', 'bias'): (768,)
('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_9', 'Dense_4', 'bias'): (768,)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_10', 'Dense_0', 'bias'): (4608,)
('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_10', 'Dense_1', 'bias'): (768,)
('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_10', 'Dense_2', 'bias'): (768,)
('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_10', 'Dense_3', 'bias'): (768,)
('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_10', 'Dense_4', 'bias'): (768,)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_11', 'Dense_0', 'bias'): (4608,)
('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_11', 'Dense_1', 'bias'): (768,)
('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_11', 'Dense_2', 'bias'): (768,)
('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_11', 'Dense_3', 'bias'): (768,)
('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_11', 'Dense_4', 'bias'): (768,)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536)
('FinalLayer_0', 'Dense_0', 'bias'): (1536,)
('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16)
('FinalLayer_0', 'Dense_1', 'bias'): (16,)
('Embed_0', 'embedding'): (256, 1)
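Each TimestepEmbedder above maps a 256-wide input to 768 through two Dense layers. In the reference DiT design that input is a sinusoidal frequency embedding of a scalar time, followed by Dense, SiLU, Dense; whether this codebase uses exactly that recipe, how it scales time before embedding, and what the second embedder receives (for example a step size, given the `dt_sampling` option) are assumptions. The log alone also does not show how these embeddings are combined into the `(1, 768)` conditioning. A NumPy sketch with hypothetical names and randomly initialized weights:

```python
import math

import numpy as np


def timestep_embedding(t, dim=256, max_period=10000.0):
    """Sinusoidal features for a batch of scalar times: (B,) -> (B, dim)."""
    half = dim // 2
    freqs = np.exp(-math.log(max_period) * np.arange(half, dtype=np.float32) / half)
    args = np.asarray(t, dtype=np.float32)[:, None] * freqs[None, :]
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)


def silu(x):
    return x / (1.0 + np.exp(-x))


def timestep_embedder(t, params):
    """Dense(256 -> 768) -> SiLU -> Dense(768 -> 768), mirroring the listed shapes."""
    h = timestep_embedding(t) @ params["Dense_0"]["kernel"] + params["Dense_0"]["bias"]
    h = silu(h)
    return h @ params["Dense_1"]["kernel"] + params["Dense_1"]["bias"]


rng = np.random.default_rng(0)
params = {
    "Dense_0": {"kernel": rng.standard_normal((256, 768)) * 0.02, "bias": np.zeros(768)},
    "Dense_1": {"kernel": rng.standard_normal((768, 768)) * 0.02, "bias": np.zeros(768)},
}
t = np.array([0.0, 0.25, 1.0])
print(timestep_embedding(t).shape)         # (3, 256)
print(timestep_embedder(t, params).shape)  # (3, 768)
```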
parameter shapes:
('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('Embed_0', 'embedding'): (1, 256, 1)
('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
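The first dump lists parameters in module-definition order (kernel before bias, DiTBlock_0 through DiTBlock_11), while the later dumps list them in lexicographic key order (bias before kernel, and DiTBlock_10 and DiTBlock_11 between DiTBlock_1 and DiTBlock_2). That ordering is what sorting flattened tuple keys produces, as the sketch below shows; the helpers are hypothetical, since the logging code is not part of this output, and leaves are shape tuples standing in for arrays. The extra leading axis of size 1 in the later dumps is not explained by the log itself.

```python
def flatten(tree, prefix=()):
    """Flatten a nested dict into {('Module', ..., 'param'): leaf}."""
    flat = {}
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def shape_lines(tree, sort_keys):
    """Format one 'path: shape' line per leaf, optionally sorted by path."""
    flat = flatten(tree)
    keys = sorted(flat) if sort_keys else list(flat)
    return [f"{k}: {flat[k]}" for k in keys]


# Illustrative subset of the parameter tree; leaves are shape tuples.
params = {
    "DiTBlock_0": {"Dense_1": {"kernel": (768, 768), "bias": (768,)}},
    "DiTBlock_2": {"Dense_1": {"kernel": (768, 768), "bias": (768,)}},
    "DiTBlock_10": {"Dense_1": {"kernel": (768, 768), "bias": (768,)}},
    "Embed_0": {"embedding": (256, 1)},
}

print("parameter shapes:")
for line in shape_lines(params, sort_keys=True):
    print(line)
```

With `sort_keys=False` the lines come out in insertion order instead, matching the layout of the first dump.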
parameter shapes:
('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('Embed_0', 'embedding'): (1, 256, 1)
('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
parameter shapes:
('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('Embed_0', 'embedding'): (1, 256, 1)
('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
parameter shapes:
('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
('Embed_0', 'embedding'): (1, 256, 1)
('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
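The listings above give every leaf an extra leading axis of size 1. The listing that follows prints the same leaves without it. The log does not say what that axis is. A replicated or stacked copy of the parameter tree is one plausible reading, but the log does not confirm it. Below is a minimal pure-Python sketch of two things: how tuple-path listings like these can be produced from a nested dict, and how the singleton axis maps onto the plain shapes. `flatten_paths` and `drop_stack_axis` are hypothetical helpers. Real Flax code would more likely call `flax.traverse_util.flatten_dict` on the actual arrays.

```python
from typing import Any, Dict, Iterator, Tuple

Path = Tuple[str, ...]
Shape = Tuple[int, ...]


def flatten_paths(tree: Dict[str, Any], prefix: Path = ()) -> Iterator[Tuple[Path, Shape]]:
    """Yield (path, shape) pairs, visiting dict keys in sorted string order."""
    for key in sorted(tree):
        value = tree[key]
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten_paths(value, path)
        else:
            yield path, value


def drop_stack_axis(shape: Shape) -> Shape:
    """Remove a leading axis of size 1, e.g. (1, 768, 4608) -> (768, 4608)."""
    if not shape or shape[0] != 1:
        raise ValueError(f"expected a leading axis of size 1, got {shape}")
    return shape[1:]


# Hypothetical toy tree holding shapes only (a subset of the logged leaves).
stacked = {
    "PatchEmbed_0": {"Conv_0": {"bias": (1, 768), "kernel": (1, 2, 2, 4, 768)}},
    "Embed_0": {"embedding": (1, 256, 1)},
}
for path, shape in flatten_paths(stacked):
    print(f"{path}: {drop_stack_axis(shape)}")
```

Sorting keys as strings also explains the block order in every listing: `DiTBlock_10` and `DiTBlock_11` come between `DiTBlock_1` and `DiTBlock_2`.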
parameter shapes:
('DiTBlock_0', 'Dense_0', 'bias'): (4608,)
('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_0', 'Dense_2', 'bias'): (768,)
('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_0', 'Dense_3', 'bias'): (768,)
('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_0', 'Dense_4', 'bias'): (768,)
('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_1', 'Dense_0', 'bias'): (4608,)
('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_1', 'Dense_1', 'bias'): (768,)
('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_1', 'Dense_2', 'bias'): (768,)
('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_1', 'Dense_3', 'bias'): (768,)
('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_1', 'Dense_4', 'bias'): (768,)
('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_10', 'Dense_0', 'bias'): (4608,)
('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_10', 'Dense_1', 'bias'): (768,)
('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_10', 'Dense_2', 'bias'): (768,)
('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_10', 'Dense_3', 'bias'): (768,)
('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_10', 'Dense_4', 'bias'): (768,)
('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_11', 'Dense_0', 'bias'): (4608,)
('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_11', 'Dense_1', 'bias'): (768,)
('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_11', 'Dense_2', 'bias'): (768,)
('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_11', 'Dense_3', 'bias'): (768,)
('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_11', 'Dense_4', 'bias'): (768,)
('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_2', 'Dense_0', 'bias'): (4608,)
('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_2', 'Dense_1', 'bias'): (768,)
('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_2', 'Dense_2', 'bias'): (768,)
('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_2', 'Dense_3', 'bias'): (768,)
('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_2', 'Dense_4', 'bias'): (768,)
('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_3', 'Dense_0', 'bias'): (4608,)
('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_3', 'Dense_1', 'bias'): (768,)
('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_3', 'Dense_2', 'bias'): (768,)
('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_3', 'Dense_3', 'bias'): (768,)
('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_3', 'Dense_4', 'bias'): (768,)
('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_4', 'Dense_0', 'bias'): (4608,)
('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_4', 'Dense_1', 'bias'): (768,)
('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_4', 'Dense_2', 'bias'): (768,)
('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_4', 'Dense_3', 'bias'): (768,)
('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_4', 'Dense_4', 'bias'): (768,)
('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_5', 'Dense_0', 'bias'): (4608,)
('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_5', 'Dense_1', 'bias'): (768,)
('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_5', 'Dense_2', 'bias'): (768,)
('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_5', 'Dense_3', 'bias'): (768,)
('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_5', 'Dense_4', 'bias'): (768,)
('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_6', 'Dense_0', 'bias'): (4608,)
('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_6', 'Dense_1', 'bias'): (768,)
('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_6', 'Dense_2', 'bias'): (768,)
('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_6', 'Dense_3', 'bias'): (768,)
('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_6', 'Dense_4', 'bias'): (768,)
('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_7', 'Dense_0', 'bias'): (4608,)
('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_7', 'Dense_1', 'bias'): (768,)
('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_7', 'Dense_2', 'bias'): (768,)
('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_7', 'Dense_3', 'bias'): (768,)
('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_7', 'Dense_4', 'bias'): (768,)
('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_8', 'Dense_0', 'bias'): (4608,)
('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_8', 'Dense_1', 'bias'): (768,)
('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_8', 'Dense_2', 'bias'): (768,)
('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_8', 'Dense_3', 'bias'): (768,)
('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_8', 'Dense_4', 'bias'): (768,)
('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('DiTBlock_9', 'Dense_0', 'bias'): (4608,)
('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608)
('DiTBlock_9', 'Dense_1', 'bias'): (768,)
('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768)
('DiTBlock_9', 'Dense_2', 'bias'): (768,)
('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768)
('DiTBlock_9', 'Dense_3', 'bias'): (768,)
('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768)
('DiTBlock_9', 'Dense_4', 'bias'): (768,)
('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
('Embed_0', 'embedding'): (256, 1)
('FinalLayer_0', 'Dense_0', 'bias'): (1536,)
('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536)
('FinalLayer_0', 'Dense_1', 'bias'): (16,)
('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16)
('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768)
('PatchEmbed_0', 'Conv_0', 'bias'): (768,)
('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768)
('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,)
('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768)
('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,)
('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768)
('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,)
('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768)
('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,)
('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768)
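The parameter shapes listed above follow from the run config (hidden_size: 768, mlp_ratio: 4, patch_size: 2, num_classes: 1000) plus the 4-channel VAE latent. The sketch below recomputes them as a sanity check. The role given to each layer (Dense_0 as a 6-way adaLN modulation, Dense_1 to Dense_4 as q/k/v/output projections, FinalLayer_0/Dense_0 as shift and scale, the extra label row as the classifier-free-guidance null class) is an assumption inferred from the shapes, not read from the model code, and the helper names are hypothetical.

```python
from math import prod


def dit_block_shapes(hidden=768, mlp_ratio=4):
    """Expected shapes for one DiTBlock (hypothetical helper).

    ASSUMPTION: Dense_0 emits 6 modulation vectors (adaLN), Dense_1..Dense_4
    are the q/k/v/output projections, MlpBlock_0 is a 2-layer MLP.
    """
    mlp = hidden * mlp_ratio
    shapes = {
        ("Dense_0", "kernel"): (hidden, 6 * hidden),
        ("Dense_0", "bias"): (6 * hidden,),
    }
    for i in range(1, 5):
        shapes[(f"Dense_{i}", "kernel")] = (hidden, hidden)
        shapes[(f"Dense_{i}", "bias")] = (hidden,)
    shapes[("MlpBlock_0", "Dense_0", "kernel")] = (hidden, mlp)
    shapes[("MlpBlock_0", "Dense_0", "bias")] = (mlp,)
    shapes[("MlpBlock_0", "Dense_1", "kernel")] = (mlp, hidden)
    shapes[("MlpBlock_0", "Dense_1", "bias")] = (hidden,)
    return shapes


def head_shapes(hidden=768, patch=2, latent_ch=4, num_classes=1000, freq_dim=256):
    """Shapes of selected non-block layers (hypothetical helper).

    ASSUMPTION: the label table has one extra row for the CFG "null" class,
    and FinalLayer_0/Dense_0 produces a shift and a scale vector.
    """
    return {
        ("PatchEmbed_0", "Conv_0", "kernel"): (patch, patch, latent_ch, hidden),
        ("LabelEmbedder_0", "Embed_0", "embedding"): (num_classes + 1, hidden),
        ("TimestepEmbedder_0", "Dense_0", "kernel"): (freq_dim, hidden),
        ("FinalLayer_0", "Dense_0", "kernel"): (hidden, 2 * hidden),
        # Output projection: one value per latent channel per patch pixel.
        ("FinalLayer_0", "Dense_1", "kernel"): (hidden, patch * patch * latent_ch),
    }


def n_params(shapes):
    """Total number of scalars across a shape map."""
    return sum(prod(s) for s in shapes.values())


if __name__ == "__main__":
    block = dit_block_shapes()
    print("params per DiTBlock:", n_params(block))
    print("params in 12 blocks:", 12 * n_params(block))
```

Under these assumptions one DiTBlock holds 10,628,352 parameters, so the 12 blocks (depth: 12) account for 127,540,224 of the model's weights.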
┌─────────────────────────────────────────────────┐
│                                                 │
│                                                 │
│                                                 │
│                                                 │
│                   TPU 0,1,2,3                   │
│                                                 │
│                                                 │
│                                                 │
│                                                 │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│                                                                         │
│                                                                         │
│                                                                         │
│                               TPU 0,1,2,3                               │
│                                                                         │
│                                                                         │
│                                                                         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
doing the else
(512, 256, 256, 3)
encode image shape (128, 256, 256, 3)
Initializing encoder.
Incoming encoder shape (128, 256, 256, 3)
Encoder layer (128, 256, 256, 128)
doing downsample
Encoder layer (128, 128, 128, 128)
doing downsample
Encoder layer (128, 64, 64, 256)
doing downsample
Encoder layer (128, 32, 32, 512)
Encoder layer (128, 32, 32, 512)
Encoder layer final (128, 32, 32, 512)
Encoder layer final (128, 32, 32, 512)
Final embeddings are size (128, 32, 32, 8)
After quant (128, 32, 32, 4)
Calc FID for CFG 1.0 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
z_vectors shape (128, 32, 32, 4)
Decoder incoming shape (128, 32, 32, 4)
Decoder input (128, 32, 32, 512)
Mid Block Decoder layer (128, 32, 32, 512)
Mid Block Decoder layer (128, 32, 32, 512)
Decoder layer (128, 64, 64, 512)
Decoder layer (128, 128, 128, 512)
Decoder layer (128, 256, 256, 256)
Decoder layer (128, 256, 256, 128)
FID is 31.905973434448242
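The first evaluation pass above shows the full shape flow: 512 images of 256×256×3 are encoded in chunks of 128 (the device batch), the encoder downsamples by 8 to a 32×32 grid and emits 8 channels that become a 4-channel latent after quantization, and the DiT's 2×2 patch embedding turns each 32×32×4 latent into (32/2)² = 256 tokens of width 768. The hypothetical helper below reproduces that arithmetic; it assumes non-overlapping patches, an even split of the batch across devices, and that the 8 pre-quant channels are a mean/log-variance pair.

```python
def trace_shapes(batch=512, image=256, vae_down=8, latent_ch=4,
                 patch=2, hidden=768, n_devices=4):
    """Shapes of one evaluation batch at each stage (hypothetical helper).

    ASSUMPTIONS: the VAE downsamples by `vae_down` (256 -> 32 in this log),
    images are encoded in per-device chunks of batch // n_devices, and the
    DiT patchifies with non-overlapping `patch` x `patch` windows.
    """
    if batch % n_devices:
        raise ValueError("global batch must split evenly across devices")
    chunk = batch // n_devices          # 512 // 4 = 128 images per encode call
    side = image // vae_down            # 256 // 8 = 32 latent pixels per side
    tokens = (side // patch) ** 2       # (32 // 2) ** 2 = 256 tokens per image
    return {
        "images": (batch, image, image, 3),
        "encode_chunk": (chunk, image, image, 3),
        # Assumed mean/log-variance pair, hence 2 * latent_ch channels.
        "encoder_moments": (chunk, side, side, 2 * latent_ch),
        "latent": (batch, side, side, latent_ch),
        "dit_tokens": (batch, tokens, hidden),
    }


for stage, shape in trace_shapes().items():
    print(f"{stage:>16}: {shape}")
```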
(512, 256, 256, 3)
Calc FID for CFG 1.0 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 32.27625274658203
(512, 256, 256, 3)
Calc FID for CFG 1.0 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 33.54695510864258
(512, 256, 256, 3)
Calc FID for CFG 1.0 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 37.894325256347656
(512, 256, 256, 3)
Calc FID for CFG 1.0 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 51.847389221191406
(512, 256, 256, 3)
Calc FID for CFG 1.0 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 107.63591003417969
(512, 256, 256, 3)
Calc FID for CFG 1.0 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 252.69888305664062
(512, 256, 256, 3)
Calc FID for CFG 1.0 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 319.7226867675781
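The sweep varies two sampling settings: the classifier-free guidance scale and denoise_timesteps, the number of equal steps used to integrate from noise to data. The following is a minimal sketch of how those two settings typically enter a sampler, not the code that produced this log. It assumes a flow-matching convention (t = 0 is noise, t = 1 is data), plain Euler updates with dt = 1/denoise_timesteps, and a hypothetical `model(x, t, dt, label)` callable; passing dt to the model is how a shortcut model can be conditioned on the step size. Vectors are plain Python lists to keep the example dependency-free.

```python
def cfg_velocity(v_cond, v_uncond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    towards the conditional one. cfg_scale = 1.0 returns v_cond unchanged."""
    return [u + cfg_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def sample(model, x, label, null_label, denoise_timesteps, cfg_scale):
    """Euler integration from t = 0 (noise) to t = 1 (data) in equal steps.

    `model(x, t, dt, label)` is a HYPOTHETICAL callable returning a velocity
    (a list of floats). `null_label` is the assumed unconditional class id.
    """
    dt = 1.0 / denoise_timesteps
    for i in range(denoise_timesteps):
        t = i * dt
        v = cfg_velocity(model(x, t, dt, label),
                         model(x, t, dt, null_label),
                         cfg_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a CFG scale of 1.0 the guided velocity equals the conditional prediction, which corresponds to the first block of the sweep; larger scales push the update further away from the unconditional prediction.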
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 18.814821243286133
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 19.11261558532715
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 20.127628326416016
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 23.594669342041016
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 35.4896125793457
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 85.51443481445312
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 240.59774780273438
(512, 256, 256, 3)
Calc FID for CFG 1.25 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 305.35357666015625
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 11.71961784362793
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 11.917760848999023
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 12.663106918334961
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 15.181784629821777
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 24.551116943359375
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 67.87528991699219
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 230.55465698242188
(512, 256, 256, 3)
Calc FID for CFG 1.5 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 295.24188232421875
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 8.580363273620605
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 8.700770378112793
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 9.181977272033691
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 10.940503120422363
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 18.0069637298584
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 54.57537078857422
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 221.85202026367188
(512, 256, 256, 3)
Calc FID for CFG 1.75 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 288.0120544433594
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 7.7056450843811035
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 7.759979248046875
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 7.997262954711914
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 9.159201622009277
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 14.285831451416016
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 44.61159133911133
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 214.304931640625
(512, 256, 256, 3)
Calc FID for CFG 2.0 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 282.582275390625
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 7.957117557525635
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 7.965053081512451
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 8.088018417358398
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 8.782133102416992
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 12.382518768310547
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 37.3809928894043
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 207.85345458984375
(512, 256, 256, 3)
Calc FID for CFG 2.25 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 278.45654296875
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 8.769401550292969
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 8.749847412109375
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 8.788768768310547
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 9.199583053588867
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 11.61952018737793
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 32.082252502441406
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 202.39967346191406
(512, 256, 256, 3)
Calc FID for CFG 2.5 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 274.93511962890625
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 9.857498168945312
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 9.813886642456055
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 9.792471885681152
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 10.007205963134766
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 11.570734977722168
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 28.250038146972656
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 197.681396484375
(512, 256, 256, 3)
Calc FID for CFG 2.75 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 271.9351501464844
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 128
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 11.016831398010254
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 64
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 10.961159706115723
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 32
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 10.917856216430664
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 16
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 10.987756729125977
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 8
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 11.926231384277344
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 4
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 25.491756439208984
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 2
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 193.66769409179688
(512, 256, 256, 3)
Calc FID for CFG 3.0 and denoise_timesteps 1
DiT: Input of shape (512, 32, 32, 4) dtype float32
DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
DiT: Conditioning of shape (512, 768) dtype float32
FID is 269.3216857910156
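The sweep covers CFG scales from 1.0 to 3.0 in increments of 0.25 and denoise_timesteps from 128 down to 1. In this log the lowest FID is 7.71, at CFG 2.0 with 128 steps, and at 1 or 2 steps FID stays above 190 for every scale. To tabulate such a sweep, a small parser can be run over the log text. This is a sketch that assumes only the two line formats printed here ("Calc FID for CFG <scale> and denoise_timesteps <steps>" followed by "FID is <value>"); the function names are hypothetical.

```python
import re

CALC_RE = re.compile(r"Calc FID for CFG ([\d.]+) and denoise_timesteps (\d+)")
FID_RE = re.compile(r"FID is ([\d.]+)")


def parse_fid_sweep(lines):
    """Map (cfg_scale, denoise_timesteps) -> FID, pairing each
    'Calc FID ...' line with the next 'FID is ...' line."""
    results, current = {}, None
    for line in lines:
        if m := CALC_RE.search(line):
            current = (float(m.group(1)), int(m.group(2)))
        elif (m := FID_RE.search(line)) and current is not None:
            results[current] = float(m.group(1))
            current = None
    return results


def best_cfg_per_steps(results):
    """For each step count, return (cfg_scale, FID) with the lowest FID,
    ordered from most to fewest steps."""
    best = {}
    for (cfg, steps), fid in results.items():
        if steps not in best or fid < best[steps][1]:
            best[steps] = (cfg, fid)
    return dict(sorted(best.items(), reverse=True))
```

Read off this log, the best scale rises as the step count drops: 2.0 at 128, 64 and 32 steps, 2.25 at 16, 2.75 at 8, and 3.0 (the largest scale tried) at 4, 2 and 1 steps.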
wandb:
wandb: 🚀 View run shortcut_imagenet256 at: https://wandb.ai/daniel-z-kaplan/shortcut/runs/shortcut_imagenet256_20250816_141408_345353_10
wandb: Find logs at: ../../../tmp/tmpsx4kqqyn/wandb/run-20250816_141408-shortcut_imagenet256_20250816_141408_345353_10/logs