| Using devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=2, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=3, process_index=0, coords=(1,1,0), core_on_chip=0)] |
| Device count 4 |
| Global device count 4 |
| Global Batch: 256 |
| Node Batch: 256 |
| Device Batch: 64 |
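The three batch sizes logged above are consistent with an even data-parallel split. The following is a minimal sketch of that arithmetic, not the training script's actual code: the helper name `split_batch` is hypothetical, and the process count of 1 is inferred from the log (every device reports `process_index=0`, and the local and global device counts are both 4).

```python
def split_batch(global_batch: int, process_count: int, local_device_count: int):
    """Return (node_batch, device_batch) for an even data-parallel split.

    Hypothetical helper: assumes the global batch is divided evenly across
    processes (nodes) first, then across each process's local devices.
    """
    if global_batch % process_count != 0:
        raise ValueError("global batch must divide evenly across processes")
    node_batch = global_batch // process_count
    if node_batch % local_device_count != 0:
        raise ValueError("node batch must divide evenly across local devices")
    device_batch = node_batch // local_device_count
    return node_batch, device_batch


# Values taken from the log: global batch 256, 1 process, 4 TPU devices.
node_batch, device_batch = split_batch(256, process_count=1, local_device_count=4)
print(f"Node Batch: {node_batch}, Device Batch: {device_batch}")
```

With a single process the node batch equals the global batch, which is why both are reported as 256.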
| Loading dataset |
| Loading dataset |
| DiT: Input of shape (1, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (1, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (1, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (1, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (1, 256, 768) |
| (1, 768) |
| (1, 768) |
| (1, 256, 768) |
| (1, 768) |
| (1, 256, 768) |
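The shapes printed above can be cross-checked with a little arithmetic. This sketch assumes a patch size of 2, which is inferred from the shapes (a 32x32 input becoming 256 tokens) rather than stated in the log, and it assumes the hidden size is split evenly across attention heads. The helper name `token_grid` is hypothetical.

```python
def token_grid(height: int, width: int, patch_size: int):
    """Return the (rows, cols) token grid for non-overlapping square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    return height // patch_size, width // patch_size


# Input latent is (1, 32, 32, 4); patch size 2 is an inferred assumption.
grid_h, grid_w = token_grid(32, 32, patch_size=2)
num_tokens = grid_h * grid_w

# Hidden size 768 and 12 heads are taken from the debug prints.
hidden_size, num_heads = 768, 12
head_dim = hidden_size // num_heads

print(f"grid {grid_h}x{grid_w}, tokens {num_tokens}, head_dim {head_dim}")
```

The 16x16 grid gives the 256-token sequence seen in `(1, 256, 768)`, and 768 / 12 gives a per-head width of 64.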
|
|
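The parameter counts and sizes reported in the DiT summary that follows can be reproduced by hand. This is a small sketch under two assumptions: every layer carries a bias, and sizes are float32 (4 bytes per parameter) reported in decimal units. The helper names are hypothetical and are not part of Flax.

```python
from math import prod


def dense_params(in_features: int, out_features: int, use_bias: bool = True) -> int:
    """Kernel (in x out) plus an optional bias of length out."""
    return in_features * out_features + (out_features if use_bias else 0)


def conv_params(kernel_shape, use_bias: bool = True) -> int:
    """Kernel of shape (kh, kw, in, out) plus an optional bias of length out."""
    return prod(kernel_shape) + (kernel_shape[-1] if use_bias else 0)


def float32_megabytes(num_params: int) -> float:
    """Size in decimal megabytes, assuming 4 bytes per parameter."""
    return num_params * 4 / 1e6


print(conv_params((2, 2, 4, 768)))    # PatchEmbed Conv_0
print(dense_params(768, 768))         # 768 -> 768 Dense layers
print(dense_params(768, 4608))        # DiTBlock Dense_0
print(1001 * 768)                     # LabelEmbedder embedding table (no bias)
print(round(float32_megabytes(dense_params(768, 768)), 1))
```

The embedding table has no bias, so its count is simply rows times columns.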
| [3m DiT Summary [0m |
| βββββββββββββββββββββββββββββββββββ³ββββββββββββββββββββββββββββ³ββββββββββββββββββββββββ³ββββββββββββββββββββββββ³βββββββββββββββββββββββββββββββ |
| β[1m [0m[1mpath [0m[1m [0mβ[1m [0m[1mmodule [0m[1m [0mβ[1m [0m[1minputs [0m[1m [0mβ[1m [0m[1moutputs [0m[1m [0mβ[1m [0m[1mparams [0m[1m [0mβ |
| β‘βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ© |
| β β DiT β - [2mfloat32[0m[1,32,32,4] β [2mbfloat16[0m[1,32,32,4] β β |
| β β β - [2mfloat32[0m[1] β β β |
| β β β - [2mfloat32[0m[1] β β β |
| β β β - [2mint32[0m[1] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β PatchEmbed_0 β PatchEmbed β [2mfloat32[0m[1,32,32,4] β [2mbfloat16[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β PatchEmbed_0/Conv_0 β Conv β [2mfloat32[0m[1,32,32,4] β [2mbfloat16[0m[1,16,16,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2,2,4,768] β |
| β β β β β β |
| β β β β β [1m13,056 [0m[1;2m(52.2 KB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β PatchEmbed_1 β PatchEmbed β [2mfloat32[0m[1,32,32,4] β [2mbfloat16[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β PatchEmbed_1/Conv_0 β Conv β [2mfloat32[0m[1,32,32,4] β [2mbfloat16[0m[1,16,16,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2,2,4,768] β |
| β β β β β β |
| β β β β β [1m13,056 [0m[1;2m(52.2 KB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β TimestepEmbedder_0 β TimestepEmbedder β [2mfloat32[0m[1] β [2mfloat32[0m[1,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β TimestepEmbedder_0/Dense_0 β Dense β [2mbfloat16[0m[1,256] β [2mbfloat16[0m[1,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[256,768] β |
| β β β β β β |
| β β β β β [1m197,376 [0m[1;2m(789.5 KB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β TimestepEmbedder_0/Dense_1 β Dense β [2mbfloat16[0m[1,768] β [2mfloat32[0m[1,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β TimestepEmbedder_1 β TimestepEmbedder β [2mfloat32[0m[1] β [2mfloat32[0m[1,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β TimestepEmbedder_1/Dense_0 β Dense β [2mbfloat16[0m[1,256] β [2mbfloat16[0m[1,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[256,768] β |
| β β β β β β |
| β β β β β [1m197,376 [0m[1;2m(789.5 KB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β TimestepEmbedder_1/Dense_1 β Dense β [2mbfloat16[0m[1,768] β [2mfloat32[0m[1,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β LabelEmbedder_0 β LabelEmbedder β [2mint32[0m[1] β [2mbfloat16[0m[1,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β LabelEmbedder_0/Embed_0 β Embed β [2mint32[0m[1] β [2mbfloat16[0m[1,768] β embedding: [2mfloat32[0m[1001,768] β |
| β β β β β β |
| β β β β β [1m768,768 [0m[1;2m(3.1 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0 β DiTBlock β - [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β VisionRotaryEmbeddingFast_0 β VisionRotaryEmbeddingFast β [2mbfloat16[0m[1,256,12,64] β [2mfloat32[0m[1,256,12,64] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_0/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1 β DiTBlock β - [2mfloat32[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_1/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2 β DiTBlock β - [2mfloat32[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_2/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3 β DiTBlock β - [2mfloat32[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_3/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4 β DiTBlock β - [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,256,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/Dense_0 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_4/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5 β DiTBlock β - [2mfloat32[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,256,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/Dense_0 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_5/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6 β DiTBlock β - [2mfloat32[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,256,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/Dense_0 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_6/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7 β DiTBlock β - [2mfloat32[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| β β β - [2mfloat32[0m[1,256,768] β β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/Dense_0 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,4608] β bias: [2mfloat32[0m[4608] β |
| β β β β β kernel: [2mfloat32[0m[768,4608] β |
| β β β β β β |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[768,768] β |
| β β β β β β |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/SwiGLUFFN_0 β SwiGLUFFN β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,768] β β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/SwiGLUFFN_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mfloat32[0m[1,256,4096] β bias: [2mfloat32[0m[4096] β |
| β β β β β kernel: [2mfloat32[0m[768,4096] β |
| β β β β β β |
| β β β β β [1m3,149,824 [0m[1;2m(12.6 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| β DiTBlock_7/SwiGLUFFN_0/Dense_1 β Dense β [2mfloat32[0m[1,256,2048] β [2mfloat32[0m[1,256,768] β bias: [2mfloat32[0m[768] β |
| β β β β β kernel: [2mfloat32[0m[2048,768] β |
| β β β β β β |
| β β β β β [1m1,573,632 [0m[1;2m(6.3 MB)[0m β |
| βββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ |
| │ DiTBlock_8                      │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/Dense_0              │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/Dense_1              │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/Dense_2              │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/Dense_3              │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/Dense_4              │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/SwiGLUFFN_0          │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/SwiGLUFFN_0/Dense_0  │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_8/SwiGLUFFN_0/Dense_1  │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9                      │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/Dense_0              │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/Dense_1              │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/Dense_2              │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/Dense_3              │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/Dense_4              │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/SwiGLUFFN_0          │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/SwiGLUFFN_0/Dense_0  │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_9/SwiGLUFFN_0/Dense_1  │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10                     │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/Dense_0             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/Dense_1             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/Dense_2             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/Dense_3             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/Dense_4             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/SwiGLUFFN_0         │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/SwiGLUFFN_0/Dense_0 │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_10/SwiGLUFFN_0/Dense_1 │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11                     │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/Dense_0             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/Dense_1             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/Dense_2             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/Dense_3             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/Dense_4             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/SwiGLUFFN_0         │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/SwiGLUFFN_0/Dense_0 │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_11/SwiGLUFFN_0/Dense_1 │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12                     │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/Dense_0             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/Dense_1             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/Dense_2             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/Dense_3             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/Dense_4             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/SwiGLUFFN_0         │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/SwiGLUFFN_0/Dense_0 │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_12/SwiGLUFFN_0/Dense_1 │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13                     │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/Dense_0             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/Dense_1             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/Dense_2             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/Dense_3             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/Dense_4             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/SwiGLUFFN_0         │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/SwiGLUFFN_0/Dense_0 │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_13/SwiGLUFFN_0/Dense_1 │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14                     │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/Dense_0             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/Dense_1             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/Dense_2             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/Dense_3             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/Dense_4             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/SwiGLUFFN_0         │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/SwiGLUFFN_0/Dense_0 │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_14/SwiGLUFFN_0/Dense_1 │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15                     │ DiTBlock   │ - float32[1,256,768] │ float32[1,256,768]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/Dense_0             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,4608] │ bias: float32[4608]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4608] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,543,552 (14.2 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/Dense_1             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/Dense_2             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/Dense_3             │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/Dense_4             │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,768]  │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[768,768]  │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 590,592 (2.4 MB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/SwiGLUFFN_0         │ SwiGLUFFN  │ bfloat16[1,256,768]  │ float32[1,256,768]   │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/SwiGLUFFN_0/Dense_0 │ Dense      │ bfloat16[1,256,768]  │ float32[1,256,4096]  │ bias: float32[4096]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,4096] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 3,149,824 (12.6 MB)       │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ DiTBlock_15/SwiGLUFFN_0/Dense_1 │ Dense      │ float32[1,256,2048]  │ float32[1,256,768]   │ bias: float32[768]        │ |
| │                                 │            │                      │                      │ kernel: float32[2048,768] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,573,632 (6.3 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ FinalLayer_0                    │ FinalLayer │ - float32[1,256,768] │ bfloat16[1,256,16]   │                           │ |
| │                                 │            │ - float32[1,256,768] │                      │                           │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ FinalLayer_0/Dense_0            │ Dense      │ float32[1,256,768]   │ bfloat16[1,256,1536] │ bias: float32[1536]       │ |
| │                                 │            │                      │                      │ kernel: float32[768,1536] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 1,181,184 (4.7 MB)        │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ FinalLayer_0/Dense_1            │ Dense      │ bfloat16[1,256,768]  │ bfloat16[1,256,16]   │ bias: float32[16]         │ |
| │                                 │            │                      │                      │ kernel: float32[768,16]   │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 12,304 (49.2 KB)          │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │ Embed_0                         │ Embed      │ int32[1]             │ float32[1,1]         │ embedding: float32[256,1] │ |
| │                                 │            │                      │                      │                           │ |
| │                                 │            │                      │                      │ 256 (1.0 KB)              │ |
| ├─────────────────────────────────┼────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ |
| │                                 │            │                      │                Total │ 173,634,576 (694.5 MB)    │ |
| └─────────────────────────────────┴────────────┴──────────────────────┴──────────────────────┴───────────────────────────┘ |
|
| Total Parameters: 173,634,576 (694.5 MB) |
|
|
|
|
| DiT: Input of shape (1, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (1, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (1, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (1, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (1, 256, 768) |
| (1, 768) |
| (1, 768) |
| (1, 256, 768) |
| (1, 768) |
| (1, 256, 768) |
| Loaded checkpoint from 26446 seconds ago. |
|
|
| parameter shapes: |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (768,) |
| ('PatchEmbed_1', 'Conv_0', 'kernel'): (2, 2, 4, 768) |
| ('PatchEmbed_1', 'Conv_0', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_0', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_12', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_12', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_13', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_13', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_14', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_14', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_15', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_15', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1536,) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (16,) |
| ('Embed_0', 'embedding'): (256, 1) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_12', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_12', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_12', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_13', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_13', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_13', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_14', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_14', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_14', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_15', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_15', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_15', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('Embed_0', 'embedding'): (1, 256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('PatchEmbed_1', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_1', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_12', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_12', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_12', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_13', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_13', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_13', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_14', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_14', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_14', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_15', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_15', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_15', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('Embed_0', 'embedding'): (1, 256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('PatchEmbed_1', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_1', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_12', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_12', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_12', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_12', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_13', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_13', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_13', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_13', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_14', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_14', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_14', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_14', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_15', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_15', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_15', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_15', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (1, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (1, 768, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (1, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (1, 2048, 768) |
| ('Embed_0', 'embedding'): (1, 256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('PatchEmbed_1', 'Conv_0', 'bias'): (1, 768) |
| ('PatchEmbed_1', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) |
|
|
| parameter shapes: |
| ('DiTBlock_0', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_0', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_1', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_1', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_1', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_10', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_10', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_10', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_11', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_11', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_11', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_12', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_12', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_12', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_12', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_12', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_13', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_13', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_13', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_13', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_13', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_14', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_14', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_14', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_14', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_14', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_15', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_15', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_15', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_15', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_15', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_2', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_2', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_2', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_3', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_3', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_3', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_4', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_4', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_4', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_5', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_5', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_5', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_6', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_6', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_6', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_7', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_7', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_7', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_8', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_8', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_8', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('DiTBlock_9', 'Dense_0', 'bias'): (4608,) |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608) |
| ('DiTBlock_9', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_2', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_3', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'Dense_4', 'bias'): (768,) |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'bias'): (4096,) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_0', 'kernel'): (768, 4096) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'bias'): (768,) |
| ('DiTBlock_9', 'SwiGLUFFN_0', 'Dense_1', 'kernel'): (2048, 768) |
| ('Embed_0', 'embedding'): (256, 1) |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1536,) |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536) |
| ('FinalLayer_0', 'Dense_1', 'bias'): (16,) |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16) |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768) |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (768,) |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768) |
| ('PatchEmbed_1', 'Conv_0', 'bias'): (768,) |
| ('PatchEmbed_1', 'Conv_0', 'kernel'): (2, 2, 4, 768) |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,) |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768) |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768) |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,) |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768) |
| ┌────────────────────────────────────────────────┐ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| │                  TPU 0,1,2,3                   │ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| │                                                │ |
| └────────────────────────────────────────────────┘ |
| ┌────────────────────────────────────────────────────────────────────────┐ |
| │                                                                        │ |
| │                                                                        │ |
| │                                                                        │ |
| │                                                                        │ |
| │                              TPU 0,1,2,3                               │ |
| │                                                                        │ |
| │                                                                        │ |
| │                                                                        │ |
| │                                                                        │ |
| └────────────────────────────────────────────────────────────────────────┘ |
| doing the else |
| Calc FID for CFG 1.0 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 19.844388961791992 |
| Calc FID for CFG 1.0 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 7.540131092071533 |
| Calc FID for CFG 1.0 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 7.326756000518799 |
| Calc FID for CFG 1.0 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 8.09870719909668 |
| Calc FID for CFG 1.0 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 8.908642768859863 |
| Calc FID for CFG 1.0 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.321769714355469 |
| Calc FID for CFG 1.0 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 13.177682876586914 |
| Calc FID for CFG 1.0 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 21.720718383789062 |
| Calc FID for CFG 1.25 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 9.465641021728516 |
| Calc FID for CFG 1.25 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 5.619251251220703 |
| Calc FID for CFG 1.25 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 6.133846759796143 |
| Calc FID for CFG 1.25 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 6.547901153564453 |
| Calc FID for CFG 1.25 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 7.239920616149902 |
| Calc FID for CFG 1.25 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 8.199488639831543 |
| Calc FID for CFG 1.25 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.293099403381348 |
| Calc FID for CFG 1.25 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 16.286109924316406 |
| Calc FID for CFG 1.5 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 5.479815483093262 |
| Calc FID for CFG 1.5 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 6.8870391845703125 |
| Calc FID for CFG 1.5 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 7.545630931854248 |
| Calc FID for CFG 1.5 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 7.620797157287598 |
| Calc FID for CFG 1.5 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 7.973172664642334 |
| Calc FID for CFG 1.5 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 8.499159812927246 |
| Calc FID for CFG 1.5 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.150247573852539 |
| Calc FID for CFG 1.5 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 16.650300979614258 |
| Calc FID for CFG 1.75 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 4.810676574707031 |
| Calc FID for CFG 1.75 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 8.879629135131836 |
| Calc FID for CFG 1.75 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 9.475668907165527 |
| Calc FID for CFG 1.75 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 9.26155948638916 |
| Calc FID for CFG 1.75 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 9.309438705444336 |
| Calc FID for CFG 1.75 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 9.46221923828125 |
| Calc FID for CFG 1.75 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.858394622802734 |
| Calc FID for CFG 1.75 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 19.826522827148438 |
| Calc FID for CFG 2.0 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 5.595621109008789 |
| Calc FID for CFG 2.0 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.847318649291992 |
| Calc FID for CFG 2.0 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 11.344917297363281 |
| Calc FID for CFG 2.0 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.920357704162598 |
| Calc FID for CFG 2.0 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.659997940063477 |
| Calc FID for CFG 2.0 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 10.584415435791016 |
| Calc FID for CFG 2.0 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 12.026324272155762 |
| Calc FID for CFG 2.0 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 24.808246612548828 |
| Calc FID for CFG 2.25 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 6.935703277587891 |
| Calc FID for CFG 2.25 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 12.604620933532715 |
| Calc FID for CFG 2.25 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 12.945205688476562 |
| Calc FID for CFG 2.25 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 12.379002571105957 |
| Calc FID for CFG 2.25 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 11.994935989379883 |
| Calc FID for CFG 2.25 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 11.68898868560791 |
| Calc FID for CFG 2.25 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 13.530972480773926 |
| Calc FID for CFG 2.25 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 30.642826080322266 |
| Calc FID for CFG 2.5 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 8.419413566589355 |
| Calc FID for CFG 2.5 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 14.017271041870117 |
| Calc FID for CFG 2.5 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 14.307957649230957 |
| Calc FID for CFG 2.5 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 13.642496109008789 |
| Calc FID for CFG 2.5 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 13.212810516357422 |
| Calc FID for CFG 2.5 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 12.848617553710938 |
| Calc FID for CFG 2.5 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 15.422527313232422 |
| Calc FID for CFG 2.5 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 36.63753890991211 |
| Calc FID for CFG 2.75 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 9.82433032989502 |
| Calc FID for CFG 2.75 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 15.356572151184082 |
| Calc FID for CFG 2.75 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 15.466590881347656 |
| Calc FID for CFG 2.75 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 14.721940040588379 |
| Calc FID for CFG 2.75 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 14.276111602783203 |
| Calc FID for CFG 2.75 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 14.007651329040527 |
| Calc FID for CFG 2.75 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 17.719642639160156 |
| Calc FID for CFG 2.75 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 42.4148063659668 |
| Calc FID for CFG 3.0 and denoise_timesteps 128 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 11.155534744262695 |
| Calc FID for CFG 3.0 and denoise_timesteps 64 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 16.455982208251953 |
| Calc FID for CFG 3.0 and denoise_timesteps 32 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 16.489774703979492 |
| Calc FID for CFG 3.0 and denoise_timesteps 16 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 15.718584060668945 |
| Calc FID for CFG 3.0 and denoise_timesteps 8 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 15.273191452026367 |
| Calc FID for CFG 3.0 and denoise_timesteps 4 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 15.260807991027832 |
| Calc FID for CFG 3.0 and denoise_timesteps 2 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 20.481718063354492 |
| Calc FID for CFG 3.0 and denoise_timesteps 1 |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 |
| DiT: Conditioning of shape (256, 768) dtype float32 |
| selfh idden 768 |
| self heads 12 |
| hw_swq 16 |
| xshape (256, 256, 768) |
| (256, 768) |
| (256, 768) |
| (256, 256, 768) |
| (256, 768) |
| (256, 256, 768) |
| FID is 47.77876281738281 |