perorina committed 847e2b9 (parent: 3ed91a5)

Create UNetStructureStr.txt
input_blocks
ModuleList(
(0): TimestepEmbedSequential(
(0): Conv2d(4, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1-2): 2 x TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 320, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=320, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 320, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Identity()
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 320, eps=1e-06, affine=True)
(proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=320, out_features=320, bias=False)
(to_k): Linear(in_features=320, out_features=320, bias=False)
(to_v): Linear(in_features=320, out_features=320, bias=False)
(to_out): Sequential(
(0): Linear(in_features=320, out_features=320, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=320, out_features=2560, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=1280, out_features=320, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=320, out_features=320, bias=False)
(to_k): Linear(in_features=768, out_features=320, bias=False)
(to_v): Linear(in_features=768, out_features=320, bias=False)
(to_out): Sequential(
(0): Linear(in_features=320, out_features=320, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
)
)
(3): TimestepEmbedSequential(
(0): Downsample(
(op): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
)
)
(4): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 320, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(320, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=640, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(320, 640, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 640, eps=1e-06, affine=True)
(proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=640, out_features=640, bias=False)
(to_v): Linear(in_features=640, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=640, out_features=5120, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=2560, out_features=640, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=768, out_features=640, bias=False)
(to_v): Linear(in_features=768, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
)
)
(5): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=640, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Identity()
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 640, eps=1e-06, affine=True)
(proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=640, out_features=640, bias=False)
(to_v): Linear(in_features=640, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=640, out_features=5120, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=2560, out_features=640, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=768, out_features=640, bias=False)
(to_v): Linear(in_features=768, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
)
)
(6): TimestepEmbedSequential(
(0): Downsample(
(op): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
)
)
(7): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(640, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=1280, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(640, 1280, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
(proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=1280, out_features=1280, bias=False)
(to_v): Linear(in_features=1280, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=1280, out_features=10240, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=5120, out_features=1280, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=768, out_features=1280, bias=False)
(to_v): Linear(in_features=768, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
)
)
(8): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=1280, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Identity()
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
(proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=1280, out_features=1280, bias=False)
(to_v): Linear(in_features=1280, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=1280, out_features=10240, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=5120, out_features=1280, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=768, out_features=1280, bias=False)
(to_v): Linear(in_features=768, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
)
)
(9): TimestepEmbedSequential(
(0): Downsample(
(op): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
)
)
(10-11): 2 x TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=1280, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Identity()
)
)
)
349
+
350
+
351
+ middle_block
352
+ TimestepEmbedSequential(
353
+ (0): ResBlock(
354
+ (in_layers): Sequential(
355
+ (0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
356
+ (1): SiLU()
357
+ (2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
358
+ )
359
+ (h_upd): Identity()
360
+ (x_upd): Identity()
361
+ (emb_layers): Sequential(
362
+ (0): SiLU()
363
+ (1): Linear(in_features=1280, out_features=1280, bias=True)
364
+ )
365
+ (out_layers): Sequential(
366
+ (0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
367
+ (1): SiLU()
368
+ (2): Dropout(p=0, inplace=False)
369
+ (3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
370
+ )
371
+ (skip_connection): Identity()
372
+ )
373
+ (1): SpatialTransformer(
374
+ (norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
375
+ (proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
376
+ (transformer_blocks): ModuleList(
377
+ (0): BasicTransformerBlock(
378
+ (attn1): CrossAttention(
379
+ (to_q): Linear(in_features=1280, out_features=1280, bias=False)
380
+ (to_k): Linear(in_features=1280, out_features=1280, bias=False)
381
+ (to_v): Linear(in_features=1280, out_features=1280, bias=False)
382
+ (to_out): Sequential(
383
+ (0): Linear(in_features=1280, out_features=1280, bias=True)
384
+ (1): Dropout(p=0.0, inplace=False)
385
+ )
386
+ )
387
+ (ff): FeedForward(
388
+ (net): Sequential(
389
+ (0): GEGLU(
390
+ (proj): Linear(in_features=1280, out_features=10240, bias=True)
391
+ )
392
+ (1): Dropout(p=0.0, inplace=False)
393
+ (2): Linear(in_features=5120, out_features=1280, bias=True)
394
+ )
395
+ )
396
+ (attn2): CrossAttention(
397
+ (to_q): Linear(in_features=1280, out_features=1280, bias=False)
398
+ (to_k): Linear(in_features=768, out_features=1280, bias=False)
399
+ (to_v): Linear(in_features=768, out_features=1280, bias=False)
400
+ (to_out): Sequential(
401
+ (0): Linear(in_features=1280, out_features=1280, bias=True)
402
+ (1): Dropout(p=0.0, inplace=False)
403
+ )
404
+ )
405
+ (norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
406
+ (norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
407
+ (norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
408
+ )
409
+ )
410
+ (proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
411
+ )
412
+ (2): ResBlock(
413
+ (in_layers): Sequential(
414
+ (0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
415
+ (1): SiLU()
416
+ (2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
417
+ )
418
+ (h_upd): Identity()
419
+ (x_upd): Identity()
420
+ (emb_layers): Sequential(
421
+ (0): SiLU()
422
+ (1): Linear(in_features=1280, out_features=1280, bias=True)
423
+ )
424
+ (out_layers): Sequential(
425
+ (0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
426
+ (1): SiLU()
427
+ (2): Dropout(p=0, inplace=False)
428
+ (3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
429
+ )
430
+ (skip_connection): Identity()
431
+ )
432
+ )

output_blocks
ModuleList(
(0-1): 2 x TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 2560, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=1280, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
)
)
(2): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 2560, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=1280, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
)
(1): Upsample(
(conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(3-4): 2 x TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 2560, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=1280, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
(proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=1280, out_features=1280, bias=False)
(to_v): Linear(in_features=1280, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=1280, out_features=10240, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=5120, out_features=1280, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=768, out_features=1280, bias=False)
(to_v): Linear(in_features=768, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
)
)
(5): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 1920, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(1920, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=1280, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(1920, 1280, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
(proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=1280, out_features=1280, bias=False)
(to_v): Linear(in_features=1280, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=1280, out_features=10240, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=5120, out_features=1280, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=1280, out_features=1280, bias=False)
(to_k): Linear(in_features=768, out_features=1280, bias=False)
(to_v): Linear(in_features=768, out_features=1280, bias=False)
(to_out): Sequential(
(0): Linear(in_features=1280, out_features=1280, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
)
(2): Upsample(
(conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(6): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 1920, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(1920, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=640, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(1920, 640, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 640, eps=1e-06, affine=True)
(proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=640, out_features=640, bias=False)
(to_v): Linear(in_features=640, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=640, out_features=5120, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=2560, out_features=640, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=768, out_features=640, bias=False)
(to_v): Linear(in_features=768, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
)
)
(7): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 1280, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(1280, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=640, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 640, eps=1e-06, affine=True)
(proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=640, out_features=640, bias=False)
(to_v): Linear(in_features=640, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=640, out_features=5120, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=2560, out_features=640, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=768, out_features=640, bias=False)
(to_v): Linear(in_features=768, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
)
)
(8): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 960, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(960, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=640, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(960, 640, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 640, eps=1e-06, affine=True)
(proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=640, out_features=640, bias=False)
(to_v): Linear(in_features=640, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=640, out_features=5120, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=2560, out_features=640, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=640, out_features=640, bias=False)
(to_k): Linear(in_features=768, out_features=640, bias=False)
(to_v): Linear(in_features=768, out_features=640, bias=False)
(to_out): Sequential(
(0): Linear(in_features=640, out_features=640, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
)
(2): Upsample(
(conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(9): TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 960, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(960, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=1280, out_features=320, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 320, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0, inplace=False)
(3): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1))
)
(1): SpatialTransformer(
(norm): GroupNorm(32, 320, eps=1e-06, affine=True)
(proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
(transformer_blocks): ModuleList(
(0): BasicTransformerBlock(
(attn1): CrossAttention(
(to_q): Linear(in_features=320, out_features=320, bias=False)
(to_k): Linear(in_features=320, out_features=320, bias=False)
(to_v): Linear(in_features=320, out_features=320, bias=False)
(to_out): Sequential(
(0): Linear(in_features=320, out_features=320, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(ff): FeedForward(
(net): Sequential(
(0): GEGLU(
(proj): Linear(in_features=320, out_features=2560, bias=True)
)
(1): Dropout(p=0.0, inplace=False)
(2): Linear(in_features=1280, out_features=320, bias=True)
)
)
(attn2): CrossAttention(
(to_q): Linear(in_features=320, out_features=320, bias=False)
(to_k): Linear(in_features=768, out_features=320, bias=False)
(to_v): Linear(in_features=768, out_features=320, bias=False)
(to_out): Sequential(
(0): Linear(in_features=320, out_features=320, bias=True)
(1): Dropout(p=0.0, inplace=False)
)
)
(norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
)
)
(proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
)
)
(10-11): 2 x TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 640, eps=1e-05, affine=True)
(1): SiLU()
(2): Conv2d(640, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
867
+ (0): SiLU()
868
+ (1): Linear(in_features=1280, out_features=320, bias=True)
869
+ )
870
+ (out_layers): Sequential(
871
+ (0): GroupNorm32(32, 320, eps=1e-05, affine=True)
872
+ (1): SiLU()
873
+ (2): Dropout(p=0, inplace=False)
874
+ (3): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
875
+ )
876
+ (skip_connection): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1))
877
+ )
878
+ (1): SpatialTransformer(
879
+ (norm): GroupNorm(32, 320, eps=1e-06, affine=True)
880
+ (proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
881
+ (transformer_blocks): ModuleList(
882
+ (0): BasicTransformerBlock(
883
+ (attn1): CrossAttention(
884
+ (to_q): Linear(in_features=320, out_features=320, bias=False)
885
+ (to_k): Linear(in_features=320, out_features=320, bias=False)
886
+ (to_v): Linear(in_features=320, out_features=320, bias=False)
887
+ (to_out): Sequential(
888
+ (0): Linear(in_features=320, out_features=320, bias=True)
889
+ (1): Dropout(p=0.0, inplace=False)
890
+ )
891
+ )
892
+ (ff): FeedForward(
893
+ (net): Sequential(
894
+ (0): GEGLU(
895
+ (proj): Linear(in_features=320, out_features=2560, bias=True)
896
+ )
897
+ (1): Dropout(p=0.0, inplace=False)
898
+ (2): Linear(in_features=1280, out_features=320, bias=True)
899
+ )
900
+ )
901
+ (attn2): CrossAttention(
902
+ (to_q): Linear(in_features=320, out_features=320, bias=False)
903
+ (to_k): Linear(in_features=768, out_features=320, bias=False)
904
+ (to_v): Linear(in_features=768, out_features=320, bias=False)
905
+ (to_out): Sequential(
906
+ (0): Linear(in_features=320, out_features=320, bias=True)
907
+ (1): Dropout(p=0.0, inplace=False)
908
+ )
909
+ )
910
+ (norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
911
+ (norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
912
+ (norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
913
+ )
914
+ )
915
+ (proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
916
+ )
917
+ )
918
+ )
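
A quick sanity check on the feed-forward dimensions printed above. In each BasicTransformerBlock, the GEGLU layer projects to twice the hidden width and then chunks the result into a value half and a gate half, so only half of the projected features survive the gate; that is why `proj` outputs 5120 features in the 640-channel blocks while the following Linear consumes only 2560. The sketch below redoes this arithmetic in plain Python. The `mult=4` expansion factor is an assumption (the customary default), not something the dump states directly; attn2's `in_features=768` for `to_k`/`to_v` comes from the 768-wide text-encoder context.

```python
# GEGLU feed-forward dimension bookkeeping (assumption: mult=4 expansion).
# proj widens dim -> 2 * dim * mult, which is then split into a value half
# and a gate half of dim * mult each; the closing Linear maps dim * mult
# back down to dim.
def geglu_ff_dims(dim, mult=4):
    inner = dim * mult        # hidden width after the gate is applied
    proj_out = inner * 2      # value half + gate half before chunking
    return proj_out, inner

# 640-channel blocks: Linear(640 -> 5120), then Linear(2560 -> 640)
assert geglu_ff_dims(640) == (5120, 2560)
# 320-channel blocks: Linear(320 -> 2560), then Linear(1280 -> 320)
assert geglu_ff_dims(320) == (2560, 1280)
```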


out
Sequential(
  (0): GroupNorm32(32, 320, eps=1e-05, affine=True)
  (1): SiLU()
  (2): Conv2d(320, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
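
The `out` head above is plain shape bookkeeping: GroupNorm32 and SiLU leave the tensor shape unchanged, and the 3x3 convolution (stride 1, padding 1) preserves height and width while mapping the 320 feature channels down to the 4 latent channels. A minimal sketch of that arithmetic, with hypothetical helper names of our own:

```python
# Standard Conv2d output-size formula: out = (in + 2*pad - k) // stride + 1.
def conv2d_out(h, w, k=3, stride=1, pad=1):
    return ((h + 2 * pad - k) // stride + 1,
            (w + 2 * pad - k) // stride + 1)

# Shape through the final head: norm and activation are shape-preserving,
# the 3x3/stride-1/pad-1 conv keeps H and W and maps 320 -> 4 channels.
def out_head_shape(n, c, h, w):
    assert c == 320                # channels entering the head, per the dump
    oh, ow = conv2d_out(h, w)
    return (n, 4, oh, ow)          # 4-channel latent prediction

assert out_head_shape(1, 320, 64, 64) == (1, 4, 64, 64)
```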