KublaiKhan1 committed on
Commit 550285f · verified · 1 Parent(s): fef4fe8

Upload folder using huggingface_hub

Files changed (1)
  1. 2e-5_no_sampling/log.txt +1669 -1
2e-5_no_sampling/log.txt CHANGED
@@ -67,7 +67,7 @@ Disc shape (1, 16, 16, 512)
67
  Disc shape (1, 8, 8, 512)
68
  Disc shape (1, 4, 4, 512)
69
  Total num of Discriminator parameters: 23998017
70
- Loaded checkpoint from 16591587 seconds ago.
71
  Loaded model with step 511001
72
  ┌──────────────────────────────────────────────────────────────────────────────┐
73
  │ TPU 0 │
@@ -768,3 +768,1671 @@ DiT: Input of shape (4, 32, 32, 4) dtype float32
768
  DiT: After patch embed, shape is (4, 256, 768) dtype bfloat16
769
  DiT: Patch Embed of shape (4, 256, 768) dtype bfloat16
770
  DiT: Conditioning of shape (1, 768) dtype float32
67
  Disc shape (1, 8, 8, 512)
68
  Disc shape (1, 4, 4, 512)
69
  Total num of Discriminator parameters: 23998017
70
+ Loaded checkpoint from 16697582 seconds ago.
71
  Loaded model with step 511001
72
  ┌──────────────────────────────────────────────────────────────────────────────┐
73
  │ TPU 0 │
 
768
  DiT: After patch embed, shape is (4, 256, 768) dtype bfloat16
769
  DiT: Patch Embed of shape (4, 256, 768) dtype bfloat16
770
  DiT: Conditioning of shape (1, 768) dtype float32
771
+ Loaded checkpoint from 153447 seconds ago.
772
+
773
+ parameter shapes:
774
+ ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768)
775
+ ('PatchEmbed_0', 'Conv_0', 'bias'): (768,)
776
+ ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768)
777
+ ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,)
778
+ ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768)
779
+ ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,)
780
+ ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768)
781
+ ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,)
782
+ ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768)
783
+ ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,)
784
+ ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768)
785
+ ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608)
786
+ ('DiTBlock_0', 'Dense_0', 'bias'): (4608,)
787
+ ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768)
788
+ ('DiTBlock_0', 'Dense_1', 'bias'): (768,)
789
+ ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768)
790
+ ('DiTBlock_0', 'Dense_2', 'bias'): (768,)
791
+ ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768)
792
+ ('DiTBlock_0', 'Dense_3', 'bias'): (768,)
793
+ ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768)
794
+ ('DiTBlock_0', 'Dense_4', 'bias'): (768,)
795
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
796
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
797
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
798
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
799
+ ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608)
800
+ ('DiTBlock_1', 'Dense_0', 'bias'): (4608,)
801
+ ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768)
802
+ ('DiTBlock_1', 'Dense_1', 'bias'): (768,)
803
+ ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768)
804
+ ('DiTBlock_1', 'Dense_2', 'bias'): (768,)
805
+ ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768)
806
+ ('DiTBlock_1', 'Dense_3', 'bias'): (768,)
807
+ ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768)
808
+ ('DiTBlock_1', 'Dense_4', 'bias'): (768,)
809
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
810
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
811
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
812
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
813
+ ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608)
814
+ ('DiTBlock_2', 'Dense_0', 'bias'): (4608,)
815
+ ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768)
816
+ ('DiTBlock_2', 'Dense_1', 'bias'): (768,)
817
+ ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768)
818
+ ('DiTBlock_2', 'Dense_2', 'bias'): (768,)
819
+ ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768)
820
+ ('DiTBlock_2', 'Dense_3', 'bias'): (768,)
821
+ ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768)
822
+ ('DiTBlock_2', 'Dense_4', 'bias'): (768,)
823
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
824
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
825
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
826
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
827
+ ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608)
828
+ ('DiTBlock_3', 'Dense_0', 'bias'): (4608,)
829
+ ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768)
830
+ ('DiTBlock_3', 'Dense_1', 'bias'): (768,)
831
+ ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768)
832
+ ('DiTBlock_3', 'Dense_2', 'bias'): (768,)
833
+ ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768)
834
+ ('DiTBlock_3', 'Dense_3', 'bias'): (768,)
835
+ ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768)
836
+ ('DiTBlock_3', 'Dense_4', 'bias'): (768,)
837
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
838
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
839
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
840
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
841
+ ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608)
842
+ ('DiTBlock_4', 'Dense_0', 'bias'): (4608,)
843
+ ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768)
844
+ ('DiTBlock_4', 'Dense_1', 'bias'): (768,)
845
+ ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768)
846
+ ('DiTBlock_4', 'Dense_2', 'bias'): (768,)
847
+ ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768)
848
+ ('DiTBlock_4', 'Dense_3', 'bias'): (768,)
849
+ ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768)
850
+ ('DiTBlock_4', 'Dense_4', 'bias'): (768,)
851
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
852
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
853
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
854
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
855
+ ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608)
856
+ ('DiTBlock_5', 'Dense_0', 'bias'): (4608,)
857
+ ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768)
858
+ ('DiTBlock_5', 'Dense_1', 'bias'): (768,)
859
+ ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768)
860
+ ('DiTBlock_5', 'Dense_2', 'bias'): (768,)
861
+ ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768)
862
+ ('DiTBlock_5', 'Dense_3', 'bias'): (768,)
863
+ ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768)
864
+ ('DiTBlock_5', 'Dense_4', 'bias'): (768,)
865
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
866
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
867
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
868
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
869
+ ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608)
870
+ ('DiTBlock_6', 'Dense_0', 'bias'): (4608,)
871
+ ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768)
872
+ ('DiTBlock_6', 'Dense_1', 'bias'): (768,)
873
+ ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768)
874
+ ('DiTBlock_6', 'Dense_2', 'bias'): (768,)
875
+ ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768)
876
+ ('DiTBlock_6', 'Dense_3', 'bias'): (768,)
877
+ ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768)
878
+ ('DiTBlock_6', 'Dense_4', 'bias'): (768,)
879
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
880
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
881
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
882
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
883
+ ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608)
884
+ ('DiTBlock_7', 'Dense_0', 'bias'): (4608,)
885
+ ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768)
886
+ ('DiTBlock_7', 'Dense_1', 'bias'): (768,)
887
+ ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768)
888
+ ('DiTBlock_7', 'Dense_2', 'bias'): (768,)
889
+ ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768)
890
+ ('DiTBlock_7', 'Dense_3', 'bias'): (768,)
891
+ ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768)
892
+ ('DiTBlock_7', 'Dense_4', 'bias'): (768,)
893
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
894
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
895
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
896
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
897
+ ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608)
898
+ ('DiTBlock_8', 'Dense_0', 'bias'): (4608,)
899
+ ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768)
900
+ ('DiTBlock_8', 'Dense_1', 'bias'): (768,)
901
+ ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768)
902
+ ('DiTBlock_8', 'Dense_2', 'bias'): (768,)
903
+ ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768)
904
+ ('DiTBlock_8', 'Dense_3', 'bias'): (768,)
905
+ ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768)
906
+ ('DiTBlock_8', 'Dense_4', 'bias'): (768,)
907
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
908
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
909
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
910
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
911
+ ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608)
912
+ ('DiTBlock_9', 'Dense_0', 'bias'): (4608,)
913
+ ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768)
914
+ ('DiTBlock_9', 'Dense_1', 'bias'): (768,)
915
+ ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768)
916
+ ('DiTBlock_9', 'Dense_2', 'bias'): (768,)
917
+ ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768)
918
+ ('DiTBlock_9', 'Dense_3', 'bias'): (768,)
919
+ ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768)
920
+ ('DiTBlock_9', 'Dense_4', 'bias'): (768,)
921
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
922
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
923
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
924
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
925
+ ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608)
926
+ ('DiTBlock_10', 'Dense_0', 'bias'): (4608,)
927
+ ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768)
928
+ ('DiTBlock_10', 'Dense_1', 'bias'): (768,)
929
+ ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768)
930
+ ('DiTBlock_10', 'Dense_2', 'bias'): (768,)
931
+ ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768)
932
+ ('DiTBlock_10', 'Dense_3', 'bias'): (768,)
933
+ ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768)
934
+ ('DiTBlock_10', 'Dense_4', 'bias'): (768,)
935
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
936
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
937
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
938
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
939
+ ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608)
940
+ ('DiTBlock_11', 'Dense_0', 'bias'): (4608,)
941
+ ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768)
942
+ ('DiTBlock_11', 'Dense_1', 'bias'): (768,)
943
+ ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768)
944
+ ('DiTBlock_11', 'Dense_2', 'bias'): (768,)
945
+ ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768)
946
+ ('DiTBlock_11', 'Dense_3', 'bias'): (768,)
947
+ ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768)
948
+ ('DiTBlock_11', 'Dense_4', 'bias'): (768,)
949
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
950
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
951
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
952
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
953
+ ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536)
954
+ ('FinalLayer_0', 'Dense_0', 'bias'): (1536,)
955
+ ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16)
956
+ ('FinalLayer_0', 'Dense_1', 'bias'): (16,)
957
+ ('Embed_0', 'embedding'): (256, 1)
958
+
959
+ parameter shapes:
960
+ ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
961
+ ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
962
+ ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
963
+ ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
964
+ ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
965
+ ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
966
+ ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
967
+ ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
968
+ ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
969
+ ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
970
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
971
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
972
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
973
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
974
+ ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
975
+ ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
976
+ ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
977
+ ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
978
+ ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
979
+ ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
980
+ ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
981
+ ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
982
+ ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
983
+ ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
984
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
985
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
986
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
987
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
988
+ ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
989
+ ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
990
+ ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
991
+ ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
992
+ ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
993
+ ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
994
+ ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
995
+ ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
996
+ ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
997
+ ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
998
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
999
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1000
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1001
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1002
+ ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
1003
+ ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
1004
+ ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
1005
+ ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
1006
+ ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
1007
+ ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
1008
+ ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
1009
+ ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
1010
+ ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
1011
+ ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
1012
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1013
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1014
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1015
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1016
+ ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
1017
+ ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
1018
+ ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
1019
+ ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
1020
+ ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
1021
+ ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
1022
+ ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
1023
+ ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
1024
+ ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
1025
+ ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
1026
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1027
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1028
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1029
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1030
+ ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
1031
+ ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
1032
+ ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
1033
+ ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
1034
+ ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
1035
+ ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
1036
+ ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
1037
+ ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
1038
+ ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
1039
+ ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
1040
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1041
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1042
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1043
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1044
+ ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
1045
+ ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
1046
+ ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
1047
+ ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
1048
+ ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
1049
+ ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
1050
+ ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
1051
+ ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
1052
+ ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
1053
+ ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
1054
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1055
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1056
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1057
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1058
+ ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
1059
+ ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
1060
+ ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
1061
+ ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
1062
+ ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
1063
+ ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
1064
+ ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
1065
+ ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
1066
+ ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
1067
+ ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
1068
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1069
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1070
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1071
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1072
+ ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
1073
+ ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
1074
+ ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
1075
+ ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
1076
+ ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
1077
+ ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
1078
+ ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
1079
+ ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
1080
+ ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
1081
+ ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
1082
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1083
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1084
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1085
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1086
+ ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
1087
+ ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
1088
+ ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
1089
+ ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
1090
+ ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
1091
+ ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
1092
+ ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
1093
+ ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
1094
+ ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
1095
+ ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
1096
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1097
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1098
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1099
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1100
+ ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
1101
+ ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
1102
+ ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
1103
+ ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
1104
+ ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
1105
+ ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
1106
+ ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
1107
+ ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
1108
+ ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
1109
+ ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
1110
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1111
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1112
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1113
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1114
+ ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
1115
+ ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
1116
+ ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
1117
+ ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
1118
+ ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
1119
+ ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
1120
+ ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
1121
+ ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
1122
+ ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
1123
+ ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
1124
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1125
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1126
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1127
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1128
+ ('Embed_0', 'embedding'): (1, 256, 1)
1129
+ ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
1130
+ ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
1131
+ ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
1132
+ ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
1133
+ ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
1134
+ ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
1135
+ ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
1136
+ ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
1137
+ ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
1138
+ ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
1139
+ ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
1140
+ ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
1141
+ ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
1142
+ ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
1143
+ ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
1144
+
1145
+ parameter shapes:
1146
+ ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
1147
+ ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
1148
+ ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
1149
+ ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
1150
+ ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
1151
+ ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
1152
+ ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
1153
+ ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
1154
+ ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
1155
+ ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
1156
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1157
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1158
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1159
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1160
+ ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
1161
+ ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
1162
+ ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
1163
+ ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
1164
+ ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
1165
+ ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
1166
+ ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
1167
+ ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
1168
+ ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
1169
+ ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
1170
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1171
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1172
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1173
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1174
+ ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
1175
+ ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
1176
+ ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
1177
+ ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
1178
+ ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
1179
+ ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
1180
+ ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
1181
+ ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
1182
+ ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
1183
+ ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
1184
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1185
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1186
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1187
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1188
+ ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
1189
+ ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
1190
+ ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
1191
+ ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
1192
+ ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
1193
+ ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
1194
+ ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
1195
+ ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
1196
+ ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
1197
+ ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
1198
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1199
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1200
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1201
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1202
+ ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
1203
+ ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
1204
+ ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
1205
+ ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
1206
+ ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
1207
+ ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
1208
+ ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
1209
+ ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
1210
+ ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
1211
+ ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
1212
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1213
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1214
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1215
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1216
+ ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
1217
+ ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
1218
+ ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
1219
+ ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
1220
+ ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
1221
+ ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
1222
+ ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
1223
+ ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
1224
+ ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
1225
+ ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
1226
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1227
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1228
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1229
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1230
+ ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
1231
+ ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
1232
+ ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
1233
+ ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
1234
+ ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
1235
+ ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
1236
+ ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
1237
+ ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
1238
+ ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
1239
+ ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
1240
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1241
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1242
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1243
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1244
+ ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
1245
+ ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
1246
+ ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
1247
+ ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
1248
+ ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
1249
+ ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
1250
+ ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
1251
+ ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
1252
+ ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
1253
+ ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
1254
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1255
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1256
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1257
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1258
+ ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
1259
+ ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
1260
+ ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
1261
+ ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
1262
+ ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
1263
+ ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
1264
+ ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
1265
+ ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
1266
+ ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
1267
+ ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
1268
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1269
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1270
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1271
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1272
+ ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
1273
+ ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
1274
+ ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
1275
+ ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
1276
+ ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
1277
+ ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
1278
+ ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
1279
+ ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
1280
+ ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
1281
+ ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
1282
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1283
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1284
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1285
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1286
+ ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
1287
+ ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
1288
+ ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
1289
+ ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
1290
+ ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
1291
+ ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
1292
+ ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
1293
+ ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
1294
+ ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
1295
+ ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
1296
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1297
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1298
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1299
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1300
+ ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
1301
+ ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
1302
+ ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
1303
+ ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
1304
+ ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
1305
+ ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
1306
+ ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
1307
+ ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
1308
+ ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
1309
+ ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
1310
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1311
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1312
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1313
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1314
+ ('Embed_0', 'embedding'): (1, 256, 1)
1315
+ ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
1316
+ ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
1317
+ ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
1318
+ ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
1319
+ ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
1320
+ ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
1321
+ ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
1322
+ ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
1323
+ ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
1324
+ ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
1325
+ ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
1326
+ ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
1327
+ ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
1328
+ ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
1329
+ ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
1330
+
1331
+ parameter shapes:
1332
+ ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
1333
+ ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
1334
+ ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
1335
+ ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
1336
+ ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
1337
+ ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
1338
+ ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
1339
+ ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
1340
+ ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
1341
+ ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
1342
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1343
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1344
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1345
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1346
+ ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
1347
+ ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
1348
+ ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
1349
+ ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
1350
+ ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
1351
+ ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
1352
+ ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
1353
+ ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
1354
+ ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
1355
+ ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
1356
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1357
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1358
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1359
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1360
+ ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
1361
+ ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
1362
+ ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
1363
+ ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
1364
+ ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
1365
+ ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
1366
+ ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
1367
+ ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
1368
+ ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
1369
+ ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
1370
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1371
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1372
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1373
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1374
+ ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
1375
+ ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
1376
+ ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
1377
+ ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
1378
+ ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
1379
+ ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
1380
+ ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
1381
+ ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
1382
+ ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
1383
+ ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
1384
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1385
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1386
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1387
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1388
+ ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
1389
+ ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
1390
+ ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
1391
+ ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
1392
+ ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
1393
+ ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
1394
+ ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
1395
+ ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
1396
+ ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
1397
+ ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
1398
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1399
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1400
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1401
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1402
+ ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
1403
+ ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
1404
+ ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
1405
+ ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
1406
+ ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
1407
+ ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
1408
+ ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
1409
+ ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
1410
+ ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
1411
+ ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
1412
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1413
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1414
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1415
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1416
+ ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
1417
+ ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
1418
+ ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
1419
+ ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
1420
+ ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
1421
+ ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
1422
+ ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
1423
+ ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
1424
+ ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
1425
+ ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
1426
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1427
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1428
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1429
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1430
+ ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
1431
+ ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
1432
+ ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
1433
+ ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
1434
+ ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
1435
+ ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
1436
+ ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
1437
+ ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
1438
+ ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
1439
+ ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
1440
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1441
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1442
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1443
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1444
+ ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
1445
+ ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
1446
+ ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
1447
+ ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
1448
+ ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
1449
+ ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
1450
+ ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
1451
+ ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
1452
+ ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
1453
+ ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
1454
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1455
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1456
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1457
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1458
+ ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
1459
+ ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
1460
+ ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
1461
+ ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
1462
+ ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
1463
+ ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
1464
+ ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
1465
+ ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
1466
+ ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
1467
+ ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
1468
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1469
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1470
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1471
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1472
+ ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
1473
+ ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
1474
+ ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
1475
+ ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
1476
+ ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
1477
+ ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
1478
+ ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
1479
+ ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
1480
+ ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
1481
+ ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
1482
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1483
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1484
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1485
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1486
+ ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
1487
+ ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
1488
+ ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
1489
+ ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
1490
+ ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
1491
+ ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
1492
+ ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
1493
+ ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
1494
+ ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
1495
+ ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
1496
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1497
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1498
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1499
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1500
+ ('Embed_0', 'embedding'): (1, 256, 1)
1501
+ ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
1502
+ ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
1503
+ ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
1504
+ ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
1505
+ ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
1506
+ ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
1507
+ ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
1508
+ ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
1509
+ ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
1510
+ ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
1511
+ ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
1512
+ ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
1513
+ ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
1514
+ ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
1515
+ ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
1516
+
1517
+ parameter shapes:
1518
+ ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608)
1519
+ ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608)
1520
+ ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768)
1521
+ ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768)
1522
+ ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768)
1523
+ ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768)
1524
+ ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768)
1525
+ ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768)
1526
+ ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768)
1527
+ ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768)
1528
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1529
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1530
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1531
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1532
+ ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608)
1533
+ ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608)
1534
+ ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768)
1535
+ ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768)
1536
+ ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768)
1537
+ ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768)
1538
+ ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768)
1539
+ ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768)
1540
+ ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768)
1541
+ ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768)
1542
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1543
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1544
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1545
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1546
+ ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608)
1547
+ ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608)
1548
+ ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768)
1549
+ ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768)
1550
+ ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768)
1551
+ ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768)
1552
+ ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768)
1553
+ ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768)
1554
+ ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768)
1555
+ ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768)
1556
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1557
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1558
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1559
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1560
+ ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608)
1561
+ ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608)
1562
+ ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768)
1563
+ ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768)
1564
+ ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768)
1565
+ ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768)
1566
+ ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768)
1567
+ ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768)
1568
+ ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768)
1569
+ ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768)
1570
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1571
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1572
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1573
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1574
+ ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608)
1575
+ ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608)
1576
+ ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768)
1577
+ ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768)
1578
+ ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768)
1579
+ ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768)
1580
+ ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768)
1581
+ ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768)
1582
+ ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768)
1583
+ ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768)
1584
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1585
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1586
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1587
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1588
+ ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608)
1589
+ ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608)
1590
+ ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768)
1591
+ ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768)
1592
+ ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768)
1593
+ ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768)
1594
+ ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768)
1595
+ ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768)
1596
+ ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768)
1597
+ ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768)
1598
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1599
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1600
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1601
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1602
+ ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608)
1603
+ ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608)
1604
+ ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768)
1605
+ ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768)
1606
+ ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768)
1607
+ ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768)
1608
+ ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768)
1609
+ ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768)
1610
+ ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768)
1611
+ ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768)
1612
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1613
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1614
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1615
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1616
+ ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608)
1617
+ ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608)
1618
+ ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768)
1619
+ ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768)
1620
+ ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768)
1621
+ ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768)
1622
+ ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768)
1623
+ ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768)
1624
+ ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768)
1625
+ ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768)
1626
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1627
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1628
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1629
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1630
+ ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608)
1631
+ ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608)
1632
+ ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768)
1633
+ ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768)
1634
+ ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768)
1635
+ ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768)
1636
+ ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768)
1637
+ ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768)
1638
+ ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768)
1639
+ ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768)
1640
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1641
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1642
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1643
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1644
+ ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608)
1645
+ ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608)
1646
+ ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768)
1647
+ ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768)
1648
+ ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768)
1649
+ ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768)
1650
+ ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768)
1651
+ ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768)
1652
+ ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768)
1653
+ ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768)
1654
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1655
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1656
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1657
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1658
+ ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608)
1659
+ ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608)
1660
+ ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768)
1661
+ ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768)
1662
+ ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768)
1663
+ ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768)
1664
+ ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768)
1665
+ ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768)
1666
+ ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768)
1667
+ ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768)
1668
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1669
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1670
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1671
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1672
+ ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608)
1673
+ ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608)
1674
+ ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768)
1675
+ ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768)
1676
+ ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768)
1677
+ ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768)
1678
+ ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768)
1679
+ ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768)
1680
+ ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768)
1681
+ ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768)
1682
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072)
1683
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072)
1684
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768)
1685
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768)
1686
+ ('Embed_0', 'embedding'): (1, 256, 1)
1687
+ ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536)
1688
+ ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536)
1689
+ ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16)
1690
+ ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16)
1691
+ ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768)
1692
+ ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768)
1693
+ ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768)
1694
+ ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768)
1695
+ ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768)
1696
+ ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768)
1697
+ ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768)
1698
+ ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768)
1699
+ ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768)
1700
+ ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768)
1701
+ ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768)
1702
+
1703
+ parameter shapes:
1704
+ ('DiTBlock_0', 'Dense_0', 'bias'): (4608,)
1705
+ ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608)
1706
+ ('DiTBlock_0', 'Dense_1', 'bias'): (768,)
1707
+ ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768)
1708
+ ('DiTBlock_0', 'Dense_2', 'bias'): (768,)
1709
+ ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768)
1710
+ ('DiTBlock_0', 'Dense_3', 'bias'): (768,)
1711
+ ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768)
1712
+ ('DiTBlock_0', 'Dense_4', 'bias'): (768,)
1713
+ ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768)
1714
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1715
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1716
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1717
+ ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1718
+ ('DiTBlock_1', 'Dense_0', 'bias'): (4608,)
1719
+ ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608)
1720
+ ('DiTBlock_1', 'Dense_1', 'bias'): (768,)
1721
+ ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768)
1722
+ ('DiTBlock_1', 'Dense_2', 'bias'): (768,)
1723
+ ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768)
1724
+ ('DiTBlock_1', 'Dense_3', 'bias'): (768,)
1725
+ ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768)
1726
+ ('DiTBlock_1', 'Dense_4', 'bias'): (768,)
1727
+ ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768)
1728
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1729
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1730
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1731
+ ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1732
+ ('DiTBlock_10', 'Dense_0', 'bias'): (4608,)
1733
+ ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608)
1734
+ ('DiTBlock_10', 'Dense_1', 'bias'): (768,)
1735
+ ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768)
1736
+ ('DiTBlock_10', 'Dense_2', 'bias'): (768,)
1737
+ ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768)
1738
+ ('DiTBlock_10', 'Dense_3', 'bias'): (768,)
1739
+ ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768)
1740
+ ('DiTBlock_10', 'Dense_4', 'bias'): (768,)
1741
+ ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768)
1742
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1743
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1744
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1745
+ ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1746
+ ('DiTBlock_11', 'Dense_0', 'bias'): (4608,)
1747
+ ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608)
1748
+ ('DiTBlock_11', 'Dense_1', 'bias'): (768,)
1749
+ ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768)
1750
+ ('DiTBlock_11', 'Dense_2', 'bias'): (768,)
1751
+ ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768)
1752
+ ('DiTBlock_11', 'Dense_3', 'bias'): (768,)
1753
+ ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768)
1754
+ ('DiTBlock_11', 'Dense_4', 'bias'): (768,)
1755
+ ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768)
1756
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1757
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1758
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1759
+ ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1760
+ ('DiTBlock_2', 'Dense_0', 'bias'): (4608,)
1761
+ ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608)
1762
+ ('DiTBlock_2', 'Dense_1', 'bias'): (768,)
1763
+ ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768)
1764
+ ('DiTBlock_2', 'Dense_2', 'bias'): (768,)
1765
+ ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768)
1766
+ ('DiTBlock_2', 'Dense_3', 'bias'): (768,)
1767
+ ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768)
1768
+ ('DiTBlock_2', 'Dense_4', 'bias'): (768,)
1769
+ ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768)
1770
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1771
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1772
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1773
+ ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1774
+ ('DiTBlock_3', 'Dense_0', 'bias'): (4608,)
1775
+ ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608)
1776
+ ('DiTBlock_3', 'Dense_1', 'bias'): (768,)
1777
+ ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768)
1778
+ ('DiTBlock_3', 'Dense_2', 'bias'): (768,)
1779
+ ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768)
1780
+ ('DiTBlock_3', 'Dense_3', 'bias'): (768,)
1781
+ ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768)
1782
+ ('DiTBlock_3', 'Dense_4', 'bias'): (768,)
1783
+ ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768)
1784
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1785
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1786
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1787
+ ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1788
+ ('DiTBlock_4', 'Dense_0', 'bias'): (4608,)
1789
+ ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608)
1790
+ ('DiTBlock_4', 'Dense_1', 'bias'): (768,)
1791
+ ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768)
1792
+ ('DiTBlock_4', 'Dense_2', 'bias'): (768,)
1793
+ ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768)
1794
+ ('DiTBlock_4', 'Dense_3', 'bias'): (768,)
1795
+ ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768)
1796
+ ('DiTBlock_4', 'Dense_4', 'bias'): (768,)
1797
+ ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768)
1798
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1799
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1800
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1801
+ ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1802
+ ('DiTBlock_5', 'Dense_0', 'bias'): (4608,)
1803
+ ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608)
1804
+ ('DiTBlock_5', 'Dense_1', 'bias'): (768,)
1805
+ ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768)
1806
+ ('DiTBlock_5', 'Dense_2', 'bias'): (768,)
1807
+ ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768)
1808
+ ('DiTBlock_5', 'Dense_3', 'bias'): (768,)
1809
+ ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768)
1810
+ ('DiTBlock_5', 'Dense_4', 'bias'): (768,)
1811
+ ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768)
1812
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1813
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1814
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1815
+ ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1816
+ ('DiTBlock_6', 'Dense_0', 'bias'): (4608,)
1817
+ ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608)
1818
+ ('DiTBlock_6', 'Dense_1', 'bias'): (768,)
1819
+ ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768)
1820
+ ('DiTBlock_6', 'Dense_2', 'bias'): (768,)
1821
+ ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768)
1822
+ ('DiTBlock_6', 'Dense_3', 'bias'): (768,)
1823
+ ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768)
1824
+ ('DiTBlock_6', 'Dense_4', 'bias'): (768,)
1825
+ ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768)
1826
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1827
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1828
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1829
+ ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1830
+ ('DiTBlock_7', 'Dense_0', 'bias'): (4608,)
1831
+ ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608)
1832
+ ('DiTBlock_7', 'Dense_1', 'bias'): (768,)
1833
+ ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768)
1834
+ ('DiTBlock_7', 'Dense_2', 'bias'): (768,)
1835
+ ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768)
1836
+ ('DiTBlock_7', 'Dense_3', 'bias'): (768,)
1837
+ ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768)
1838
+ ('DiTBlock_7', 'Dense_4', 'bias'): (768,)
1839
+ ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768)
1840
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1841
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1842
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1843
+ ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1844
+ ('DiTBlock_8', 'Dense_0', 'bias'): (4608,)
1845
+ ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608)
1846
+ ('DiTBlock_8', 'Dense_1', 'bias'): (768,)
1847
+ ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768)
1848
+ ('DiTBlock_8', 'Dense_2', 'bias'): (768,)
1849
+ ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768)
1850
+ ('DiTBlock_8', 'Dense_3', 'bias'): (768,)
1851
+ ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768)
1852
+ ('DiTBlock_8', 'Dense_4', 'bias'): (768,)
1853
+ ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768)
1854
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1855
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1856
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1857
+ ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1858
+ ('DiTBlock_9', 'Dense_0', 'bias'): (4608,)
1859
+ ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608)
1860
+ ('DiTBlock_9', 'Dense_1', 'bias'): (768,)
1861
+ ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768)
1862
+ ('DiTBlock_9', 'Dense_2', 'bias'): (768,)
1863
+ ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768)
1864
+ ('DiTBlock_9', 'Dense_3', 'bias'): (768,)
1865
+ ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768)
1866
+ ('DiTBlock_9', 'Dense_4', 'bias'): (768,)
1867
+ ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768)
1868
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,)
1869
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072)
1870
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,)
1871
+ ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768)
1872
+ ('Embed_0', 'embedding'): (256, 1)
1873
+ ('FinalLayer_0', 'Dense_0', 'bias'): (1536,)
1874
+ ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536)
1875
+ ('FinalLayer_0', 'Dense_1', 'bias'): (16,)
1876
+ ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16)
1877
+ ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768)
1878
+ ('PatchEmbed_0', 'Conv_0', 'bias'): (768,)
1879
+ ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768)
1880
+ ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,)
1881
+ ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768)
1882
+ ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,)
1883
+ ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768)
1884
+ ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,)
1885
+ ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768)
1886
+ ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,)
1887
+ ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768)
1888
+ ┌────────────────────────────────────────────────┐
1889
+ │                                                │
1890
+ │                                                │
1891
+ │                                                │
1892
+ │                                                │
1893
+ │                  TPU 0,1,2,3                   │
1894
+ │                                                │
1895
+ │                                                │
1896
+ │                                                │
1897
+ │                                                │
1898
+ └────────────────────────────────────────────────┘
1899
+ ┌─────────────────────────────────────────────────────────────────────────┐
1900
+ │                                                                         │
1901
+ │                                                                         │
1902
+ │                                                                         │
1903
+ │                                                                         │
1904
+ │                               TPU 0,1,2,3                               │
1905
+ │                                                                         │
1906
+ │                                                                         │
1907
+ │                                                                         │
1908
+ │                                                                         │
1909
+ └─────────────────────────────────────────────────────────────────────────┘
1910
+ doing the else
1911
+ (512, 256, 256, 3)
1912
+ encode image shape (128, 256, 256, 3)
1913
+ Initializing encoder.
1914
+ Incoming encoder shape (128, 256, 256, 3)
1915
+ Encoder layer (128, 256, 256, 128)
1916
+ doing downsample
1917
+ Encoder layer (128, 128, 128, 128)
1918
+ doing downsample
1919
+ Encoder layer (128, 64, 64, 256)
1920
+ doing downsample
1921
+ Encoder layer (128, 32, 32, 512)
1922
+ Encoder layer (128, 32, 32, 512)
1923
+ Encoder layer final (128, 32, 32, 512)
1924
+ Encoder layer final (128, 32, 32, 512)
1925
+ Final embeddings are size (128, 32, 32, 8)
1926
+ After quant (128, 32, 32, 4)
1927
+ Calc FID for CFG 1.0 and denoise_timesteps 128
1928
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1929
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1930
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1931
+ DiT: Conditioning of shape (512, 768) dtype float32
1932
+ z_vectors shape (128, 32, 32, 4)
1933
+ Decoder incoming shape (128, 32, 32, 4)
1934
+ Decoder input (128, 32, 32, 512)
1935
+ Mid Block Decoder layer (128, 32, 32, 512)
1936
+ Mid Block Decoder layer (128, 32, 32, 512)
1937
+ Decoder layer (128, 64, 64, 512)
1938
+ Decoder layer (128, 128, 128, 512)
1939
+ Decoder layer (128, 256, 256, 256)
1940
+ Decoder layer (128, 256, 256, 128)
1941
+ FID is 36.056663513183594
1942
+ (512, 256, 256, 3)
1943
+ Calc FID for CFG 1.0 and denoise_timesteps 64
1944
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1945
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1946
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1947
+ DiT: Conditioning of shape (512, 768) dtype float32
1948
+ FID is 36.78165054321289
1949
+ (512, 256, 256, 3)
1950
+ Calc FID for CFG 1.0 and denoise_timesteps 32
1951
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1952
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1953
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1954
+ DiT: Conditioning of shape (512, 768) dtype float32
1955
+ FID is 38.99095153808594
1956
+ (512, 256, 256, 3)
1957
+ Calc FID for CFG 1.0 and denoise_timesteps 16
1958
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1959
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1960
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1961
+ DiT: Conditioning of shape (512, 768) dtype float32
1962
+ FID is 46.122859954833984
1963
+ (512, 256, 256, 3)
1964
+ Calc FID for CFG 1.0 and denoise_timesteps 8
1965
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1966
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1967
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1968
+ DiT: Conditioning of shape (512, 768) dtype float32
1969
+ FID is 69.97744750976562
1970
+ (512, 256, 256, 3)
1971
+ Calc FID for CFG 1.0 and denoise_timesteps 4
1972
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1973
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1974
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1975
+ DiT: Conditioning of shape (512, 768) dtype float32
1976
+ FID is 140.8319854736328
1977
+ (512, 256, 256, 3)
1978
+ Calc FID for CFG 1.0 and denoise_timesteps 2
1979
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1980
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1981
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1982
+ DiT: Conditioning of shape (512, 768) dtype float32
1983
+ FID is 271.42181396484375
1984
+ (512, 256, 256, 3)
1985
+ Calc FID for CFG 1.0 and denoise_timesteps 1
1986
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1987
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1988
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1989
+ DiT: Conditioning of shape (512, 768) dtype float32
1990
+ FID is 264.41375732421875
1991
+ (512, 256, 256, 3)
1992
+ Calc FID for CFG 1.25 and denoise_timesteps 128
1993
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
1994
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
1995
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
1996
+ DiT: Conditioning of shape (512, 768) dtype float32
1997
+ FID is 22.577829360961914
1998
+ (512, 256, 256, 3)
1999
+ Calc FID for CFG 1.25 and denoise_timesteps 64
2000
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2001
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2002
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2003
+ DiT: Conditioning of shape (512, 768) dtype float32
2004
+ FID is 23.178266525268555
2005
+ (512, 256, 256, 3)
2006
+ Calc FID for CFG 1.25 and denoise_timesteps 32
2007
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2008
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2009
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2010
+ DiT: Conditioning of shape (512, 768) dtype float32
2011
+ FID is 25.00735092163086
2012
+ (512, 256, 256, 3)
2013
+ Calc FID for CFG 1.25 and denoise_timesteps 16
2014
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2015
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2016
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2017
+ DiT: Conditioning of shape (512, 768) dtype float32
2018
+ FID is 31.010696411132812
2019
+ (512, 256, 256, 3)
2020
+ Calc FID for CFG 1.25 and denoise_timesteps 8
2021
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2022
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2023
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2024
+ DiT: Conditioning of shape (512, 768) dtype float32
2025
+ FID is 52.40938949584961
2026
+ (512, 256, 256, 3)
2027
+ Calc FID for CFG 1.25 and denoise_timesteps 4
2028
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2029
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2030
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2031
+ DiT: Conditioning of shape (512, 768) dtype float32
2032
+ FID is 119.3035659790039
2033
+ (512, 256, 256, 3)
2034
+ Calc FID for CFG 1.25 and denoise_timesteps 2
2035
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2036
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2037
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2038
+ DiT: Conditioning of shape (512, 768) dtype float32
2039
+ FID is 262.1265869140625
2040
+ (512, 256, 256, 3)
2041
+ Calc FID for CFG 1.25 and denoise_timesteps 1
2042
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2043
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2044
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2045
+ DiT: Conditioning of shape (512, 768) dtype float32
2046
+ FID is 255.10650634765625
2047
+ (512, 256, 256, 3)
2048
+ Calc FID for CFG 1.5 and denoise_timesteps 128
2049
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2050
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2051
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2052
+ DiT: Conditioning of shape (512, 768) dtype float32
2053
+ FID is 14.719727516174316
2054
+ (512, 256, 256, 3)
2055
+ Calc FID for CFG 1.5 and denoise_timesteps 64
2056
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2057
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2058
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2059
+ DiT: Conditioning of shape (512, 768) dtype float32
2060
+ FID is 15.140058517456055
2061
+ (512, 256, 256, 3)
2062
+ Calc FID for CFG 1.5 and denoise_timesteps 32
2063
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2064
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2065
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2066
+ DiT: Conditioning of shape (512, 768) dtype float32
2067
+ FID is 16.466432571411133
2068
+ (512, 256, 256, 3)
2069
+ Calc FID for CFG 1.5 and denoise_timesteps 16
2070
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2071
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2072
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2073
+ DiT: Conditioning of shape (512, 768) dtype float32
2074
+ FID is 21.103317260742188
2075
+ (512, 256, 256, 3)
2076
+ Calc FID for CFG 1.5 and denoise_timesteps 8
2077
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2078
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2079
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2080
+ DiT: Conditioning of shape (512, 768) dtype float32
2081
+ FID is 38.82719421386719
2082
+ (512, 256, 256, 3)
2083
+ Calc FID for CFG 1.5 and denoise_timesteps 4
2084
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2085
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2086
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2087
+ DiT: Conditioning of shape (512, 768) dtype float32
2088
+ FID is 100.574951171875
2089
+ (512, 256, 256, 3)
2090
+ Calc FID for CFG 1.5 and denoise_timesteps 2
2091
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2092
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2093
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2094
+ DiT: Conditioning of shape (512, 768) dtype float32
2095
+ FID is 254.09597778320312
2096
+ (512, 256, 256, 3)
2097
+ Calc FID for CFG 1.5 and denoise_timesteps 1
2098
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2099
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2100
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2101
+ DiT: Conditioning of shape (512, 768) dtype float32
2102
+ FID is 248.49542236328125
2103
+ (512, 256, 256, 3)
2104
+ Calc FID for CFG 1.75 and denoise_timesteps 128
2105
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2106
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2107
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2108
+ DiT: Conditioning of shape (512, 768) dtype float32
2109
+ FID is 10.582630157470703
2110
+ (512, 256, 256, 3)
2111
+ Calc FID for CFG 1.75 and denoise_timesteps 64
2112
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2113
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2114
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2115
+ DiT: Conditioning of shape (512, 768) dtype float32
2116
+ FID is 10.867171287536621
2117
+ (512, 256, 256, 3)
2118
+ Calc FID for CFG 1.75 and denoise_timesteps 32
2119
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2120
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2121
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2122
+ DiT: Conditioning of shape (512, 768) dtype float32
2123
+ FID is 11.793257713317871
2124
+ (512, 256, 256, 3)
2125
+ Calc FID for CFG 1.75 and denoise_timesteps 16
2126
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2127
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2128
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2129
+ DiT: Conditioning of shape (512, 768) dtype float32
2130
+ FID is 15.195892333984375
2131
+ (512, 256, 256, 3)
2132
+ Calc FID for CFG 1.75 and denoise_timesteps 8
2133
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2134
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2135
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2136
+ DiT: Conditioning of shape (512, 768) dtype float32
2137
+ FID is 29.16132926940918
2138
+ (512, 256, 256, 3)
2139
+ Calc FID for CFG 1.75 and denoise_timesteps 4
2140
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2141
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2142
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2143
+ DiT: Conditioning of shape (512, 768) dtype float32
2144
+ FID is 84.88493347167969
2145
+ (512, 256, 256, 3)
2146
+ Calc FID for CFG 1.75 and denoise_timesteps 2
2147
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2148
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2149
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2150
+ DiT: Conditioning of shape (512, 768) dtype float32
2151
+ FID is 247.1053924560547
2152
+ (512, 256, 256, 3)
2153
+ Calc FID for CFG 1.75 and denoise_timesteps 1
2154
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2155
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2156
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2157
+ DiT: Conditioning of shape (512, 768) dtype float32
2158
+ FID is 243.9935302734375
2159
+ (512, 256, 256, 3)
2160
+ Calc FID for CFG 2.0 and denoise_timesteps 128
2161
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2162
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2163
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2164
+ DiT: Conditioning of shape (512, 768) dtype float32
2165
+ FID is 8.849721908569336
2166
+ (512, 256, 256, 3)
2167
+ Calc FID for CFG 2.0 and denoise_timesteps 64
2168
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2169
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2170
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2171
+ DiT: Conditioning of shape (512, 768) dtype float32
2172
+ FID is 9.026944160461426
2173
+ (512, 256, 256, 3)
2174
+ Calc FID for CFG 2.0 and denoise_timesteps 32
2175
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2176
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2177
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2178
+ DiT: Conditioning of shape (512, 768) dtype float32
2179
+ FID is 9.616308212280273
2180
+ (512, 256, 256, 3)
2181
+ Calc FID for CFG 2.0 and denoise_timesteps 16
2182
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2183
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2184
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2185
+ DiT: Conditioning of shape (512, 768) dtype float32
2186
+ FID is 12.015625953674316
2187
+ (512, 256, 256, 3)
2188
+ Calc FID for CFG 2.0 and denoise_timesteps 8
2189
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2190
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2191
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2192
+ DiT: Conditioning of shape (512, 768) dtype float32
2193
+ FID is 22.730276107788086
2194
+ (512, 256, 256, 3)
2195
+ Calc FID for CFG 2.0 and denoise_timesteps 4
2196
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2197
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2198
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2199
+ DiT: Conditioning of shape (512, 768) dtype float32
2200
+ FID is 72.25479125976562
2201
+ (512, 256, 256, 3)
2202
+ Calc FID for CFG 2.0 and denoise_timesteps 2
2203
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2204
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2205
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2206
+ DiT: Conditioning of shape (512, 768) dtype float32
2207
+ FID is 241.37213134765625
2208
+ (512, 256, 256, 3)
2209
+ Calc FID for CFG 2.0 and denoise_timesteps 1
2210
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2211
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2212
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2213
+ DiT: Conditioning of shape (512, 768) dtype float32
2214
+ FID is 241.18011474609375
2215
+ (512, 256, 256, 3)
2216
+ Calc FID for CFG 2.25 and denoise_timesteps 128
2217
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2218
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2219
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2220
+ DiT: Conditioning of shape (512, 768) dtype float32
2221
+ FID is 8.385416030883789
2222
+ (512, 256, 256, 3)
2223
+ Calc FID for CFG 2.25 and denoise_timesteps 64
2224
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2225
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2226
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2227
+ DiT: Conditioning of shape (512, 768) dtype float32
2228
+ FID is 8.469977378845215
2229
+ (512, 256, 256, 3)
2230
+ Calc FID for CFG 2.25 and denoise_timesteps 32
2231
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2232
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2233
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2234
+ DiT: Conditioning of shape (512, 768) dtype float32
2235
+ FID is 8.870479583740234
2236
+ (512, 256, 256, 3)
2237
+ Calc FID for CFG 2.25 and denoise_timesteps 16
2238
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2239
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2240
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2241
+ DiT: Conditioning of shape (512, 768) dtype float32
2242
+ FID is 10.516737937927246
2243
+ (512, 256, 256, 3)
2244
+ Calc FID for CFG 2.25 and denoise_timesteps 8
2245
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2246
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2247
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2248
+ DiT: Conditioning of shape (512, 768) dtype float32
2249
+ FID is 18.650209426879883
2250
+ (512, 256, 256, 3)
2251
+ Calc FID for CFG 2.25 and denoise_timesteps 4
2252
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2253
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2254
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2255
+ DiT: Conditioning of shape (512, 768) dtype float32
2256
+ FID is 62.06416320800781
2257
+ (512, 256, 256, 3)
2258
+ Calc FID for CFG 2.25 and denoise_timesteps 2
2259
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2260
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2261
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2262
+ DiT: Conditioning of shape (512, 768) dtype float32
2263
+ FID is 236.56796264648438
2264
+ (512, 256, 256, 3)
2265
+ Calc FID for CFG 2.25 and denoise_timesteps 1
2266
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2267
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2268
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2269
+ DiT: Conditioning of shape (512, 768) dtype float32
2270
+ FID is 238.99615478515625
2271
+ (512, 256, 256, 3)
2272
+ Calc FID for CFG 2.5 and denoise_timesteps 128
2273
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2274
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2275
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2276
+ DiT: Conditioning of shape (512, 768) dtype float32
2277
+ FID is 8.667655944824219
2278
+ (512, 256, 256, 3)
2279
+ Calc FID for CFG 2.5 and denoise_timesteps 64
2280
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2281
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2282
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2283
+ DiT: Conditioning of shape (512, 768) dtype float32
2284
+ FID is 8.719772338867188
2285
+ (512, 256, 256, 3)
2286
+ Calc FID for CFG 2.5 and denoise_timesteps 32
2287
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2288
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2289
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2290
+ DiT: Conditioning of shape (512, 768) dtype float32
2291
+ FID is 8.941210746765137
2292
+ (512, 256, 256, 3)
2293
+ Calc FID for CFG 2.5 and denoise_timesteps 16
2294
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2295
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2296
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2297
+ DiT: Conditioning of shape (512, 768) dtype float32
2298
+ FID is 10.057480812072754
2299
+ (512, 256, 256, 3)
2300
+ Calc FID for CFG 2.5 and denoise_timesteps 8
2301
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2302
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2303
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2304
+ DiT: Conditioning of shape (512, 768) dtype float32
2305
+ FID is 16.115034103393555
2306
+ (512, 256, 256, 3)
2307
+ Calc FID for CFG 2.5 and denoise_timesteps 4
2308
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2309
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2310
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2311
+ DiT: Conditioning of shape (512, 768) dtype float32
2312
+ FID is 53.955543518066406
2313
+ (512, 256, 256, 3)
2314
+ Calc FID for CFG 2.5 and denoise_timesteps 2
2315
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2316
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2317
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2318
+ DiT: Conditioning of shape (512, 768) dtype float32
2319
+ FID is 232.41696166992188
2320
+ (512, 256, 256, 3)
2321
+ Calc FID for CFG 2.5 and denoise_timesteps 1
2322
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2323
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2324
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2325
+ DiT: Conditioning of shape (512, 768) dtype float32
2326
+ FID is 237.0272216796875
2327
+ (512, 256, 256, 3)
2328
+ Calc FID for CFG 2.75 and denoise_timesteps 128
2329
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2330
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2331
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2332
+ DiT: Conditioning of shape (512, 768) dtype float32
2333
+ FID is 9.384163856506348
2334
+ (512, 256, 256, 3)
2335
+ Calc FID for CFG 2.75 and denoise_timesteps 64
2336
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2337
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2338
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2339
+ DiT: Conditioning of shape (512, 768) dtype float32
2340
+ FID is 9.374807357788086
2341
+ (512, 256, 256, 3)
2342
+ Calc FID for CFG 2.75 and denoise_timesteps 32
2343
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2344
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2345
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2346
+ DiT: Conditioning of shape (512, 768) dtype float32
2347
+ FID is 9.470893859863281
2348
+ (512, 256, 256, 3)
2349
+ Calc FID for CFG 2.75 and denoise_timesteps 16
2350
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2351
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2352
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2353
+ DiT: Conditioning of shape (512, 768) dtype float32
2354
+ FID is 10.200791358947754
2355
+ (512, 256, 256, 3)
2356
+ Calc FID for CFG 2.75 and denoise_timesteps 8
2357
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2358
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2359
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2360
+ DiT: Conditioning of shape (512, 768) dtype float32
2361
+ FID is 14.65302848815918
2362
+ (512, 256, 256, 3)
2363
+ Calc FID for CFG 2.75 and denoise_timesteps 4
2364
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2365
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2366
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2367
+ DiT: Conditioning of shape (512, 768) dtype float32
2368
+ FID is 47.41338348388672
2369
+ (512, 256, 256, 3)
2370
+ Calc FID for CFG 2.75 and denoise_timesteps 2
2371
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2372
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2373
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2374
+ DiT: Conditioning of shape (512, 768) dtype float32
2375
+ FID is 228.976318359375
2376
+ (512, 256, 256, 3)
2377
+ Calc FID for CFG 2.75 and denoise_timesteps 1
2378
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2379
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2380
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2381
+ DiT: Conditioning of shape (512, 768) dtype float32
2382
+ FID is 235.353271484375
2383
+ (512, 256, 256, 3)
2384
+ Calc FID for CFG 3.0 and denoise_timesteps 128
2385
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2386
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2387
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2388
+ DiT: Conditioning of shape (512, 768) dtype float32
2389
+ FID is 10.227725982666016
2390
+ (512, 256, 256, 3)
2391
+ Calc FID for CFG 3.0 and denoise_timesteps 64
2392
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2393
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2394
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2395
+ DiT: Conditioning of shape (512, 768) dtype float32
2396
+ FID is 10.197022438049316
2397
+ (512, 256, 256, 3)
2398
+ Calc FID for CFG 3.0 and denoise_timesteps 32
2399
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2400
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2401
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2402
+ DiT: Conditioning of shape (512, 768) dtype float32
2403
+ FID is 10.22225570678711
2404
+ (512, 256, 256, 3)
2405
+ Calc FID for CFG 3.0 and denoise_timesteps 16
2406
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2407
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2408
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2409
+ DiT: Conditioning of shape (512, 768) dtype float32
2410
+ FID is 10.676050186157227
2411
+ (512, 256, 256, 3)
2412
+ Calc FID for CFG 3.0 and denoise_timesteps 8
2413
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2414
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2415
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2416
+ DiT: Conditioning of shape (512, 768) dtype float32
2417
+ FID is 13.905902862548828
2418
+ (512, 256, 256, 3)
2419
+ Calc FID for CFG 3.0 and denoise_timesteps 4
2420
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2421
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2422
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2423
+ DiT: Conditioning of shape (512, 768) dtype float32
2424
+ FID is 42.176517486572266
2425
+ (512, 256, 256, 3)
2426
+ Calc FID for CFG 3.0 and denoise_timesteps 2
2427
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2428
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2429
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2430
+ DiT: Conditioning of shape (512, 768) dtype float32
2431
+ FID is 226.26129150390625
2432
+ (512, 256, 256, 3)
2433
+ Calc FID for CFG 3.0 and denoise_timesteps 1
2434
+ DiT: Input of shape (512, 32, 32, 4) dtype float32
2435
+ DiT: After patch embed, shape is (512, 256, 768) dtype bfloat16
2436
+ DiT: Patch Embed of shape (512, 256, 768) dtype bfloat16
2437
+ DiT: Conditioning of shape (512, 768) dtype float32
2438
+ FID is 233.85470581054688