Ryann829 committed on
Commit b8e4e54 · verified · 1 Parent(s): 67a8d88

Upload folder using huggingface_hub

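The commit message says the folder was pushed with `huggingface_hub`. A minimal sketch of such an upload; the local folder path and repo id are placeholders, not taken from this commit:

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` or the HF_TOKEN env var
api.upload_folder(
    folder_path="./scone_export",   # hypothetical local folder containing the files below
    repo_id="Ryann829/Scone",       # hypothetical repo id (config.json below only names "Scone")
    repo_type="model",
    commit_message="Upload folder using huggingface_hub",
)
```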
.DS_Store ADDED
Binary file (6.15 kB).
 
ae.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:afc8e28272cd15db3919bacdb6918ce9c1ed22e96cb12c4d5ed0fba823529e38
+ size 335304388
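The three lines above are a Git LFS pointer, not the tensor data itself: the `oid` is the SHA-256 of the real ~335 MB file stored on the LFS backend. A sketch of fetching and opening the actual file with `huggingface_hub` and `safetensors`, assuming a hypothetical repo id:

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Downloads the binary that the LFS pointer refers to (about 335 MB).
path = hf_hub_download(repo_id="Ryann829/Scone", filename="ae.safetensors")  # repo id is an assumption
state_dict = load_file(path)          # dict of tensor name -> torch.Tensor
print(len(state_dict), "tensors loaded")
```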
ae_metadata.json ADDED
@@ -0,0 +1,1676 @@
1
+ {
2
+ "decoder.conv_in.bias": {
3
+ "shape": [
4
+ 512
5
+ ],
6
+ "dtype": "torch.float32"
7
+ },
8
+ "decoder.conv_in.weight": {
9
+ "shape": [
10
+ 512,
11
+ 16,
12
+ 3,
13
+ 3
14
+ ],
15
+ "dtype": "torch.float32"
16
+ },
17
+ "decoder.conv_out.bias": {
18
+ "shape": [
19
+ 3
20
+ ],
21
+ "dtype": "torch.float32"
22
+ },
23
+ "decoder.conv_out.weight": {
24
+ "shape": [
25
+ 3,
26
+ 128,
27
+ 3,
28
+ 3
29
+ ],
30
+ "dtype": "torch.float32"
31
+ },
32
+ "decoder.mid.attn_1.k.bias": {
33
+ "shape": [
34
+ 512
35
+ ],
36
+ "dtype": "torch.float32"
37
+ },
38
+ "decoder.mid.attn_1.k.weight": {
39
+ "shape": [
40
+ 512,
41
+ 512,
42
+ 1,
43
+ 1
44
+ ],
45
+ "dtype": "torch.float32"
46
+ },
47
+ "decoder.mid.attn_1.norm.bias": {
48
+ "shape": [
49
+ 512
50
+ ],
51
+ "dtype": "torch.float32"
52
+ },
53
+ "decoder.mid.attn_1.norm.weight": {
54
+ "shape": [
55
+ 512
56
+ ],
57
+ "dtype": "torch.float32"
58
+ },
59
+ "decoder.mid.attn_1.proj_out.bias": {
60
+ "shape": [
61
+ 512
62
+ ],
63
+ "dtype": "torch.float32"
64
+ },
65
+ "decoder.mid.attn_1.proj_out.weight": {
66
+ "shape": [
67
+ 512,
68
+ 512,
69
+ 1,
70
+ 1
71
+ ],
72
+ "dtype": "torch.float32"
73
+ },
74
+ "decoder.mid.attn_1.q.bias": {
75
+ "shape": [
76
+ 512
77
+ ],
78
+ "dtype": "torch.float32"
79
+ },
80
+ "decoder.mid.attn_1.q.weight": {
81
+ "shape": [
82
+ 512,
83
+ 512,
84
+ 1,
85
+ 1
86
+ ],
87
+ "dtype": "torch.float32"
88
+ },
89
+ "decoder.mid.attn_1.v.bias": {
90
+ "shape": [
91
+ 512
92
+ ],
93
+ "dtype": "torch.float32"
94
+ },
95
+ "decoder.mid.attn_1.v.weight": {
96
+ "shape": [
97
+ 512,
98
+ 512,
99
+ 1,
100
+ 1
101
+ ],
102
+ "dtype": "torch.float32"
103
+ },
104
+ "decoder.mid.block_1.conv1.bias": {
105
+ "shape": [
106
+ 512
107
+ ],
108
+ "dtype": "torch.float32"
109
+ },
110
+ "decoder.mid.block_1.conv1.weight": {
111
+ "shape": [
112
+ 512,
113
+ 512,
114
+ 3,
115
+ 3
116
+ ],
117
+ "dtype": "torch.float32"
118
+ },
119
+ "decoder.mid.block_1.conv2.bias": {
120
+ "shape": [
121
+ 512
122
+ ],
123
+ "dtype": "torch.float32"
124
+ },
125
+ "decoder.mid.block_1.conv2.weight": {
126
+ "shape": [
127
+ 512,
128
+ 512,
129
+ 3,
130
+ 3
131
+ ],
132
+ "dtype": "torch.float32"
133
+ },
134
+ "decoder.mid.block_1.norm1.bias": {
135
+ "shape": [
136
+ 512
137
+ ],
138
+ "dtype": "torch.float32"
139
+ },
140
+ "decoder.mid.block_1.norm1.weight": {
141
+ "shape": [
142
+ 512
143
+ ],
144
+ "dtype": "torch.float32"
145
+ },
146
+ "decoder.mid.block_1.norm2.bias": {
147
+ "shape": [
148
+ 512
149
+ ],
150
+ "dtype": "torch.float32"
151
+ },
152
+ "decoder.mid.block_1.norm2.weight": {
153
+ "shape": [
154
+ 512
155
+ ],
156
+ "dtype": "torch.float32"
157
+ },
158
+ "decoder.mid.block_2.conv1.bias": {
159
+ "shape": [
160
+ 512
161
+ ],
162
+ "dtype": "torch.float32"
163
+ },
164
+ "decoder.mid.block_2.conv1.weight": {
165
+ "shape": [
166
+ 512,
167
+ 512,
168
+ 3,
169
+ 3
170
+ ],
171
+ "dtype": "torch.float32"
172
+ },
173
+ "decoder.mid.block_2.conv2.bias": {
174
+ "shape": [
175
+ 512
176
+ ],
177
+ "dtype": "torch.float32"
178
+ },
179
+ "decoder.mid.block_2.conv2.weight": {
180
+ "shape": [
181
+ 512,
182
+ 512,
183
+ 3,
184
+ 3
185
+ ],
186
+ "dtype": "torch.float32"
187
+ },
188
+ "decoder.mid.block_2.norm1.bias": {
189
+ "shape": [
190
+ 512
191
+ ],
192
+ "dtype": "torch.float32"
193
+ },
194
+ "decoder.mid.block_2.norm1.weight": {
195
+ "shape": [
196
+ 512
197
+ ],
198
+ "dtype": "torch.float32"
199
+ },
200
+ "decoder.mid.block_2.norm2.bias": {
201
+ "shape": [
202
+ 512
203
+ ],
204
+ "dtype": "torch.float32"
205
+ },
206
+ "decoder.mid.block_2.norm2.weight": {
207
+ "shape": [
208
+ 512
209
+ ],
210
+ "dtype": "torch.float32"
211
+ },
212
+ "decoder.norm_out.bias": {
213
+ "shape": [
214
+ 128
215
+ ],
216
+ "dtype": "torch.float32"
217
+ },
218
+ "decoder.norm_out.weight": {
219
+ "shape": [
220
+ 128
221
+ ],
222
+ "dtype": "torch.float32"
223
+ },
224
+ "decoder.up.0.block.0.conv1.bias": {
225
+ "shape": [
226
+ 128
227
+ ],
228
+ "dtype": "torch.float32"
229
+ },
230
+ "decoder.up.0.block.0.conv1.weight": {
231
+ "shape": [
232
+ 128,
233
+ 256,
234
+ 3,
235
+ 3
236
+ ],
237
+ "dtype": "torch.float32"
238
+ },
239
+ "decoder.up.0.block.0.conv2.bias": {
240
+ "shape": [
241
+ 128
242
+ ],
243
+ "dtype": "torch.float32"
244
+ },
245
+ "decoder.up.0.block.0.conv2.weight": {
246
+ "shape": [
247
+ 128,
248
+ 128,
249
+ 3,
250
+ 3
251
+ ],
252
+ "dtype": "torch.float32"
253
+ },
254
+ "decoder.up.0.block.0.nin_shortcut.bias": {
255
+ "shape": [
256
+ 128
257
+ ],
258
+ "dtype": "torch.float32"
259
+ },
260
+ "decoder.up.0.block.0.nin_shortcut.weight": {
261
+ "shape": [
262
+ 128,
263
+ 256,
264
+ 1,
265
+ 1
266
+ ],
267
+ "dtype": "torch.float32"
268
+ },
269
+ "decoder.up.0.block.0.norm1.bias": {
270
+ "shape": [
271
+ 256
272
+ ],
273
+ "dtype": "torch.float32"
274
+ },
275
+ "decoder.up.0.block.0.norm1.weight": {
276
+ "shape": [
277
+ 256
278
+ ],
279
+ "dtype": "torch.float32"
280
+ },
281
+ "decoder.up.0.block.0.norm2.bias": {
282
+ "shape": [
283
+ 128
284
+ ],
285
+ "dtype": "torch.float32"
286
+ },
287
+ "decoder.up.0.block.0.norm2.weight": {
288
+ "shape": [
289
+ 128
290
+ ],
291
+ "dtype": "torch.float32"
292
+ },
293
+ "decoder.up.0.block.1.conv1.bias": {
294
+ "shape": [
295
+ 128
296
+ ],
297
+ "dtype": "torch.float32"
298
+ },
299
+ "decoder.up.0.block.1.conv1.weight": {
300
+ "shape": [
301
+ 128,
302
+ 128,
303
+ 3,
304
+ 3
305
+ ],
306
+ "dtype": "torch.float32"
307
+ },
308
+ "decoder.up.0.block.1.conv2.bias": {
309
+ "shape": [
310
+ 128
311
+ ],
312
+ "dtype": "torch.float32"
313
+ },
314
+ "decoder.up.0.block.1.conv2.weight": {
315
+ "shape": [
316
+ 128,
317
+ 128,
318
+ 3,
319
+ 3
320
+ ],
321
+ "dtype": "torch.float32"
322
+ },
323
+ "decoder.up.0.block.1.norm1.bias": {
324
+ "shape": [
325
+ 128
326
+ ],
327
+ "dtype": "torch.float32"
328
+ },
329
+ "decoder.up.0.block.1.norm1.weight": {
330
+ "shape": [
331
+ 128
332
+ ],
333
+ "dtype": "torch.float32"
334
+ },
335
+ "decoder.up.0.block.1.norm2.bias": {
336
+ "shape": [
337
+ 128
338
+ ],
339
+ "dtype": "torch.float32"
340
+ },
341
+ "decoder.up.0.block.1.norm2.weight": {
342
+ "shape": [
343
+ 128
344
+ ],
345
+ "dtype": "torch.float32"
346
+ },
347
+ "decoder.up.0.block.2.conv1.bias": {
348
+ "shape": [
349
+ 128
350
+ ],
351
+ "dtype": "torch.float32"
352
+ },
353
+ "decoder.up.0.block.2.conv1.weight": {
354
+ "shape": [
355
+ 128,
356
+ 128,
357
+ 3,
358
+ 3
359
+ ],
360
+ "dtype": "torch.float32"
361
+ },
362
+ "decoder.up.0.block.2.conv2.bias": {
363
+ "shape": [
364
+ 128
365
+ ],
366
+ "dtype": "torch.float32"
367
+ },
368
+ "decoder.up.0.block.2.conv2.weight": {
369
+ "shape": [
370
+ 128,
371
+ 128,
372
+ 3,
373
+ 3
374
+ ],
375
+ "dtype": "torch.float32"
376
+ },
377
+ "decoder.up.0.block.2.norm1.bias": {
378
+ "shape": [
379
+ 128
380
+ ],
381
+ "dtype": "torch.float32"
382
+ },
383
+ "decoder.up.0.block.2.norm1.weight": {
384
+ "shape": [
385
+ 128
386
+ ],
387
+ "dtype": "torch.float32"
388
+ },
389
+ "decoder.up.0.block.2.norm2.bias": {
390
+ "shape": [
391
+ 128
392
+ ],
393
+ "dtype": "torch.float32"
394
+ },
395
+ "decoder.up.0.block.2.norm2.weight": {
396
+ "shape": [
397
+ 128
398
+ ],
399
+ "dtype": "torch.float32"
400
+ },
401
+ "decoder.up.1.block.0.conv1.bias": {
402
+ "shape": [
403
+ 256
404
+ ],
405
+ "dtype": "torch.float32"
406
+ },
407
+ "decoder.up.1.block.0.conv1.weight": {
408
+ "shape": [
409
+ 256,
410
+ 512,
411
+ 3,
412
+ 3
413
+ ],
414
+ "dtype": "torch.float32"
415
+ },
416
+ "decoder.up.1.block.0.conv2.bias": {
417
+ "shape": [
418
+ 256
419
+ ],
420
+ "dtype": "torch.float32"
421
+ },
422
+ "decoder.up.1.block.0.conv2.weight": {
423
+ "shape": [
424
+ 256,
425
+ 256,
426
+ 3,
427
+ 3
428
+ ],
429
+ "dtype": "torch.float32"
430
+ },
431
+ "decoder.up.1.block.0.nin_shortcut.bias": {
432
+ "shape": [
433
+ 256
434
+ ],
435
+ "dtype": "torch.float32"
436
+ },
437
+ "decoder.up.1.block.0.nin_shortcut.weight": {
438
+ "shape": [
439
+ 256,
440
+ 512,
441
+ 1,
442
+ 1
443
+ ],
444
+ "dtype": "torch.float32"
445
+ },
446
+ "decoder.up.1.block.0.norm1.bias": {
447
+ "shape": [
448
+ 512
449
+ ],
450
+ "dtype": "torch.float32"
451
+ },
452
+ "decoder.up.1.block.0.norm1.weight": {
453
+ "shape": [
454
+ 512
455
+ ],
456
+ "dtype": "torch.float32"
457
+ },
458
+ "decoder.up.1.block.0.norm2.bias": {
459
+ "shape": [
460
+ 256
461
+ ],
462
+ "dtype": "torch.float32"
463
+ },
464
+ "decoder.up.1.block.0.norm2.weight": {
465
+ "shape": [
466
+ 256
467
+ ],
468
+ "dtype": "torch.float32"
469
+ },
470
+ "decoder.up.1.block.1.conv1.bias": {
471
+ "shape": [
472
+ 256
473
+ ],
474
+ "dtype": "torch.float32"
475
+ },
476
+ "decoder.up.1.block.1.conv1.weight": {
477
+ "shape": [
478
+ 256,
479
+ 256,
480
+ 3,
481
+ 3
482
+ ],
483
+ "dtype": "torch.float32"
484
+ },
485
+ "decoder.up.1.block.1.conv2.bias": {
486
+ "shape": [
487
+ 256
488
+ ],
489
+ "dtype": "torch.float32"
490
+ },
491
+ "decoder.up.1.block.1.conv2.weight": {
492
+ "shape": [
493
+ 256,
494
+ 256,
495
+ 3,
496
+ 3
497
+ ],
498
+ "dtype": "torch.float32"
499
+ },
500
+ "decoder.up.1.block.1.norm1.bias": {
501
+ "shape": [
502
+ 256
503
+ ],
504
+ "dtype": "torch.float32"
505
+ },
506
+ "decoder.up.1.block.1.norm1.weight": {
507
+ "shape": [
508
+ 256
509
+ ],
510
+ "dtype": "torch.float32"
511
+ },
512
+ "decoder.up.1.block.1.norm2.bias": {
513
+ "shape": [
514
+ 256
515
+ ],
516
+ "dtype": "torch.float32"
517
+ },
518
+ "decoder.up.1.block.1.norm2.weight": {
519
+ "shape": [
520
+ 256
521
+ ],
522
+ "dtype": "torch.float32"
523
+ },
524
+ "decoder.up.1.block.2.conv1.bias": {
525
+ "shape": [
526
+ 256
527
+ ],
528
+ "dtype": "torch.float32"
529
+ },
530
+ "decoder.up.1.block.2.conv1.weight": {
531
+ "shape": [
532
+ 256,
533
+ 256,
534
+ 3,
535
+ 3
536
+ ],
537
+ "dtype": "torch.float32"
538
+ },
539
+ "decoder.up.1.block.2.conv2.bias": {
540
+ "shape": [
541
+ 256
542
+ ],
543
+ "dtype": "torch.float32"
544
+ },
545
+ "decoder.up.1.block.2.conv2.weight": {
546
+ "shape": [
547
+ 256,
548
+ 256,
549
+ 3,
550
+ 3
551
+ ],
552
+ "dtype": "torch.float32"
553
+ },
554
+ "decoder.up.1.block.2.norm1.bias": {
555
+ "shape": [
556
+ 256
557
+ ],
558
+ "dtype": "torch.float32"
559
+ },
560
+ "decoder.up.1.block.2.norm1.weight": {
561
+ "shape": [
562
+ 256
563
+ ],
564
+ "dtype": "torch.float32"
565
+ },
566
+ "decoder.up.1.block.2.norm2.bias": {
567
+ "shape": [
568
+ 256
569
+ ],
570
+ "dtype": "torch.float32"
571
+ },
572
+ "decoder.up.1.block.2.norm2.weight": {
573
+ "shape": [
574
+ 256
575
+ ],
576
+ "dtype": "torch.float32"
577
+ },
578
+ "decoder.up.1.upsample.conv.bias": {
579
+ "shape": [
580
+ 256
581
+ ],
582
+ "dtype": "torch.float32"
583
+ },
584
+ "decoder.up.1.upsample.conv.weight": {
585
+ "shape": [
586
+ 256,
587
+ 256,
588
+ 3,
589
+ 3
590
+ ],
591
+ "dtype": "torch.float32"
592
+ },
593
+ "decoder.up.2.block.0.conv1.bias": {
594
+ "shape": [
595
+ 512
596
+ ],
597
+ "dtype": "torch.float32"
598
+ },
599
+ "decoder.up.2.block.0.conv1.weight": {
600
+ "shape": [
601
+ 512,
602
+ 512,
603
+ 3,
604
+ 3
605
+ ],
606
+ "dtype": "torch.float32"
607
+ },
608
+ "decoder.up.2.block.0.conv2.bias": {
609
+ "shape": [
610
+ 512
611
+ ],
612
+ "dtype": "torch.float32"
613
+ },
614
+ "decoder.up.2.block.0.conv2.weight": {
615
+ "shape": [
616
+ 512,
617
+ 512,
618
+ 3,
619
+ 3
620
+ ],
621
+ "dtype": "torch.float32"
622
+ },
623
+ "decoder.up.2.block.0.norm1.bias": {
624
+ "shape": [
625
+ 512
626
+ ],
627
+ "dtype": "torch.float32"
628
+ },
629
+ "decoder.up.2.block.0.norm1.weight": {
630
+ "shape": [
631
+ 512
632
+ ],
633
+ "dtype": "torch.float32"
634
+ },
635
+ "decoder.up.2.block.0.norm2.bias": {
636
+ "shape": [
637
+ 512
638
+ ],
639
+ "dtype": "torch.float32"
640
+ },
641
+ "decoder.up.2.block.0.norm2.weight": {
642
+ "shape": [
643
+ 512
644
+ ],
645
+ "dtype": "torch.float32"
646
+ },
647
+ "decoder.up.2.block.1.conv1.bias": {
648
+ "shape": [
649
+ 512
650
+ ],
651
+ "dtype": "torch.float32"
652
+ },
653
+ "decoder.up.2.block.1.conv1.weight": {
654
+ "shape": [
655
+ 512,
656
+ 512,
657
+ 3,
658
+ 3
659
+ ],
660
+ "dtype": "torch.float32"
661
+ },
662
+ "decoder.up.2.block.1.conv2.bias": {
663
+ "shape": [
664
+ 512
665
+ ],
666
+ "dtype": "torch.float32"
667
+ },
668
+ "decoder.up.2.block.1.conv2.weight": {
669
+ "shape": [
670
+ 512,
671
+ 512,
672
+ 3,
673
+ 3
674
+ ],
675
+ "dtype": "torch.float32"
676
+ },
677
+ "decoder.up.2.block.1.norm1.bias": {
678
+ "shape": [
679
+ 512
680
+ ],
681
+ "dtype": "torch.float32"
682
+ },
683
+ "decoder.up.2.block.1.norm1.weight": {
684
+ "shape": [
685
+ 512
686
+ ],
687
+ "dtype": "torch.float32"
688
+ },
689
+ "decoder.up.2.block.1.norm2.bias": {
690
+ "shape": [
691
+ 512
692
+ ],
693
+ "dtype": "torch.float32"
694
+ },
695
+ "decoder.up.2.block.1.norm2.weight": {
696
+ "shape": [
697
+ 512
698
+ ],
699
+ "dtype": "torch.float32"
700
+ },
701
+ "decoder.up.2.block.2.conv1.bias": {
702
+ "shape": [
703
+ 512
704
+ ],
705
+ "dtype": "torch.float32"
706
+ },
707
+ "decoder.up.2.block.2.conv1.weight": {
708
+ "shape": [
709
+ 512,
710
+ 512,
711
+ 3,
712
+ 3
713
+ ],
714
+ "dtype": "torch.float32"
715
+ },
716
+ "decoder.up.2.block.2.conv2.bias": {
717
+ "shape": [
718
+ 512
719
+ ],
720
+ "dtype": "torch.float32"
721
+ },
722
+ "decoder.up.2.block.2.conv2.weight": {
723
+ "shape": [
724
+ 512,
725
+ 512,
726
+ 3,
727
+ 3
728
+ ],
729
+ "dtype": "torch.float32"
730
+ },
731
+ "decoder.up.2.block.2.norm1.bias": {
732
+ "shape": [
733
+ 512
734
+ ],
735
+ "dtype": "torch.float32"
736
+ },
737
+ "decoder.up.2.block.2.norm1.weight": {
738
+ "shape": [
739
+ 512
740
+ ],
741
+ "dtype": "torch.float32"
742
+ },
743
+ "decoder.up.2.block.2.norm2.bias": {
744
+ "shape": [
745
+ 512
746
+ ],
747
+ "dtype": "torch.float32"
748
+ },
749
+ "decoder.up.2.block.2.norm2.weight": {
750
+ "shape": [
751
+ 512
752
+ ],
753
+ "dtype": "torch.float32"
754
+ },
755
+ "decoder.up.2.upsample.conv.bias": {
756
+ "shape": [
757
+ 512
758
+ ],
759
+ "dtype": "torch.float32"
760
+ },
761
+ "decoder.up.2.upsample.conv.weight": {
762
+ "shape": [
763
+ 512,
764
+ 512,
765
+ 3,
766
+ 3
767
+ ],
768
+ "dtype": "torch.float32"
769
+ },
770
+ "decoder.up.3.block.0.conv1.bias": {
771
+ "shape": [
772
+ 512
773
+ ],
774
+ "dtype": "torch.float32"
775
+ },
776
+ "decoder.up.3.block.0.conv1.weight": {
777
+ "shape": [
778
+ 512,
779
+ 512,
780
+ 3,
781
+ 3
782
+ ],
783
+ "dtype": "torch.float32"
784
+ },
785
+ "decoder.up.3.block.0.conv2.bias": {
786
+ "shape": [
787
+ 512
788
+ ],
789
+ "dtype": "torch.float32"
790
+ },
791
+ "decoder.up.3.block.0.conv2.weight": {
792
+ "shape": [
793
+ 512,
794
+ 512,
795
+ 3,
796
+ 3
797
+ ],
798
+ "dtype": "torch.float32"
799
+ },
800
+ "decoder.up.3.block.0.norm1.bias": {
801
+ "shape": [
802
+ 512
803
+ ],
804
+ "dtype": "torch.float32"
805
+ },
806
+ "decoder.up.3.block.0.norm1.weight": {
807
+ "shape": [
808
+ 512
809
+ ],
810
+ "dtype": "torch.float32"
811
+ },
812
+ "decoder.up.3.block.0.norm2.bias": {
813
+ "shape": [
814
+ 512
815
+ ],
816
+ "dtype": "torch.float32"
817
+ },
818
+ "decoder.up.3.block.0.norm2.weight": {
819
+ "shape": [
820
+ 512
821
+ ],
822
+ "dtype": "torch.float32"
823
+ },
824
+ "decoder.up.3.block.1.conv1.bias": {
825
+ "shape": [
826
+ 512
827
+ ],
828
+ "dtype": "torch.float32"
829
+ },
830
+ "decoder.up.3.block.1.conv1.weight": {
831
+ "shape": [
832
+ 512,
833
+ 512,
834
+ 3,
835
+ 3
836
+ ],
837
+ "dtype": "torch.float32"
838
+ },
839
+ "decoder.up.3.block.1.conv2.bias": {
840
+ "shape": [
841
+ 512
842
+ ],
843
+ "dtype": "torch.float32"
844
+ },
845
+ "decoder.up.3.block.1.conv2.weight": {
846
+ "shape": [
847
+ 512,
848
+ 512,
849
+ 3,
850
+ 3
851
+ ],
852
+ "dtype": "torch.float32"
853
+ },
854
+ "decoder.up.3.block.1.norm1.bias": {
855
+ "shape": [
856
+ 512
857
+ ],
858
+ "dtype": "torch.float32"
859
+ },
860
+ "decoder.up.3.block.1.norm1.weight": {
861
+ "shape": [
862
+ 512
863
+ ],
864
+ "dtype": "torch.float32"
865
+ },
866
+ "decoder.up.3.block.1.norm2.bias": {
867
+ "shape": [
868
+ 512
869
+ ],
870
+ "dtype": "torch.float32"
871
+ },
872
+ "decoder.up.3.block.1.norm2.weight": {
873
+ "shape": [
874
+ 512
875
+ ],
876
+ "dtype": "torch.float32"
877
+ },
878
+ "decoder.up.3.block.2.conv1.bias": {
879
+ "shape": [
880
+ 512
881
+ ],
882
+ "dtype": "torch.float32"
883
+ },
884
+ "decoder.up.3.block.2.conv1.weight": {
885
+ "shape": [
886
+ 512,
887
+ 512,
888
+ 3,
889
+ 3
890
+ ],
891
+ "dtype": "torch.float32"
892
+ },
893
+ "decoder.up.3.block.2.conv2.bias": {
894
+ "shape": [
895
+ 512
896
+ ],
897
+ "dtype": "torch.float32"
898
+ },
899
+ "decoder.up.3.block.2.conv2.weight": {
900
+ "shape": [
901
+ 512,
902
+ 512,
903
+ 3,
904
+ 3
905
+ ],
906
+ "dtype": "torch.float32"
907
+ },
908
+ "decoder.up.3.block.2.norm1.bias": {
909
+ "shape": [
910
+ 512
911
+ ],
912
+ "dtype": "torch.float32"
913
+ },
914
+ "decoder.up.3.block.2.norm1.weight": {
915
+ "shape": [
916
+ 512
917
+ ],
918
+ "dtype": "torch.float32"
919
+ },
920
+ "decoder.up.3.block.2.norm2.bias": {
921
+ "shape": [
922
+ 512
923
+ ],
924
+ "dtype": "torch.float32"
925
+ },
926
+ "decoder.up.3.block.2.norm2.weight": {
927
+ "shape": [
928
+ 512
929
+ ],
930
+ "dtype": "torch.float32"
931
+ },
932
+ "decoder.up.3.upsample.conv.bias": {
933
+ "shape": [
934
+ 512
935
+ ],
936
+ "dtype": "torch.float32"
937
+ },
938
+ "decoder.up.3.upsample.conv.weight": {
939
+ "shape": [
940
+ 512,
941
+ 512,
942
+ 3,
943
+ 3
944
+ ],
945
+ "dtype": "torch.float32"
946
+ },
947
+ "encoder.conv_in.bias": {
948
+ "shape": [
949
+ 128
950
+ ],
951
+ "dtype": "torch.float32"
952
+ },
953
+ "encoder.conv_in.weight": {
954
+ "shape": [
955
+ 128,
956
+ 3,
957
+ 3,
958
+ 3
959
+ ],
960
+ "dtype": "torch.float32"
961
+ },
962
+ "encoder.conv_out.bias": {
963
+ "shape": [
964
+ 32
965
+ ],
966
+ "dtype": "torch.float32"
967
+ },
968
+ "encoder.conv_out.weight": {
969
+ "shape": [
970
+ 32,
971
+ 512,
972
+ 3,
973
+ 3
974
+ ],
975
+ "dtype": "torch.float32"
976
+ },
977
+ "encoder.down.0.block.0.conv1.bias": {
978
+ "shape": [
979
+ 128
980
+ ],
981
+ "dtype": "torch.float32"
982
+ },
983
+ "encoder.down.0.block.0.conv1.weight": {
984
+ "shape": [
985
+ 128,
986
+ 128,
987
+ 3,
988
+ 3
989
+ ],
990
+ "dtype": "torch.float32"
991
+ },
992
+ "encoder.down.0.block.0.conv2.bias": {
993
+ "shape": [
994
+ 128
995
+ ],
996
+ "dtype": "torch.float32"
997
+ },
998
+ "encoder.down.0.block.0.conv2.weight": {
999
+ "shape": [
1000
+ 128,
1001
+ 128,
1002
+ 3,
1003
+ 3
1004
+ ],
1005
+ "dtype": "torch.float32"
1006
+ },
1007
+ "encoder.down.0.block.0.norm1.bias": {
1008
+ "shape": [
1009
+ 128
1010
+ ],
1011
+ "dtype": "torch.float32"
1012
+ },
1013
+ "encoder.down.0.block.0.norm1.weight": {
1014
+ "shape": [
1015
+ 128
1016
+ ],
1017
+ "dtype": "torch.float32"
1018
+ },
1019
+ "encoder.down.0.block.0.norm2.bias": {
1020
+ "shape": [
1021
+ 128
1022
+ ],
1023
+ "dtype": "torch.float32"
1024
+ },
1025
+ "encoder.down.0.block.0.norm2.weight": {
1026
+ "shape": [
1027
+ 128
1028
+ ],
1029
+ "dtype": "torch.float32"
1030
+ },
1031
+ "encoder.down.0.block.1.conv1.bias": {
1032
+ "shape": [
1033
+ 128
1034
+ ],
1035
+ "dtype": "torch.float32"
1036
+ },
1037
+ "encoder.down.0.block.1.conv1.weight": {
1038
+ "shape": [
1039
+ 128,
1040
+ 128,
1041
+ 3,
1042
+ 3
1043
+ ],
1044
+ "dtype": "torch.float32"
1045
+ },
1046
+ "encoder.down.0.block.1.conv2.bias": {
1047
+ "shape": [
1048
+ 128
1049
+ ],
1050
+ "dtype": "torch.float32"
1051
+ },
1052
+ "encoder.down.0.block.1.conv2.weight": {
1053
+ "shape": [
1054
+ 128,
1055
+ 128,
1056
+ 3,
1057
+ 3
1058
+ ],
1059
+ "dtype": "torch.float32"
1060
+ },
1061
+ "encoder.down.0.block.1.norm1.bias": {
1062
+ "shape": [
1063
+ 128
1064
+ ],
1065
+ "dtype": "torch.float32"
1066
+ },
1067
+ "encoder.down.0.block.1.norm1.weight": {
1068
+ "shape": [
1069
+ 128
1070
+ ],
1071
+ "dtype": "torch.float32"
1072
+ },
1073
+ "encoder.down.0.block.1.norm2.bias": {
1074
+ "shape": [
1075
+ 128
1076
+ ],
1077
+ "dtype": "torch.float32"
1078
+ },
1079
+ "encoder.down.0.block.1.norm2.weight": {
1080
+ "shape": [
1081
+ 128
1082
+ ],
1083
+ "dtype": "torch.float32"
1084
+ },
1085
+ "encoder.down.0.downsample.conv.bias": {
1086
+ "shape": [
1087
+ 128
1088
+ ],
1089
+ "dtype": "torch.float32"
1090
+ },
1091
+ "encoder.down.0.downsample.conv.weight": {
1092
+ "shape": [
1093
+ 128,
1094
+ 128,
1095
+ 3,
1096
+ 3
1097
+ ],
1098
+ "dtype": "torch.float32"
1099
+ },
1100
+ "encoder.down.1.block.0.conv1.bias": {
1101
+ "shape": [
1102
+ 256
1103
+ ],
1104
+ "dtype": "torch.float32"
1105
+ },
1106
+ "encoder.down.1.block.0.conv1.weight": {
1107
+ "shape": [
1108
+ 256,
1109
+ 128,
1110
+ 3,
1111
+ 3
1112
+ ],
1113
+ "dtype": "torch.float32"
1114
+ },
1115
+ "encoder.down.1.block.0.conv2.bias": {
1116
+ "shape": [
1117
+ 256
1118
+ ],
1119
+ "dtype": "torch.float32"
1120
+ },
1121
+ "encoder.down.1.block.0.conv2.weight": {
1122
+ "shape": [
1123
+ 256,
1124
+ 256,
1125
+ 3,
1126
+ 3
1127
+ ],
1128
+ "dtype": "torch.float32"
1129
+ },
1130
+ "encoder.down.1.block.0.nin_shortcut.bias": {
1131
+ "shape": [
1132
+ 256
1133
+ ],
1134
+ "dtype": "torch.float32"
1135
+ },
1136
+ "encoder.down.1.block.0.nin_shortcut.weight": {
1137
+ "shape": [
1138
+ 256,
1139
+ 128,
1140
+ 1,
1141
+ 1
1142
+ ],
1143
+ "dtype": "torch.float32"
1144
+ },
1145
+ "encoder.down.1.block.0.norm1.bias": {
1146
+ "shape": [
1147
+ 128
1148
+ ],
1149
+ "dtype": "torch.float32"
1150
+ },
1151
+ "encoder.down.1.block.0.norm1.weight": {
1152
+ "shape": [
1153
+ 128
1154
+ ],
1155
+ "dtype": "torch.float32"
1156
+ },
1157
+ "encoder.down.1.block.0.norm2.bias": {
1158
+ "shape": [
1159
+ 256
1160
+ ],
1161
+ "dtype": "torch.float32"
1162
+ },
1163
+ "encoder.down.1.block.0.norm2.weight": {
1164
+ "shape": [
1165
+ 256
1166
+ ],
1167
+ "dtype": "torch.float32"
1168
+ },
1169
+ "encoder.down.1.block.1.conv1.bias": {
1170
+ "shape": [
1171
+ 256
1172
+ ],
1173
+ "dtype": "torch.float32"
1174
+ },
1175
+ "encoder.down.1.block.1.conv1.weight": {
1176
+ "shape": [
1177
+ 256,
1178
+ 256,
1179
+ 3,
1180
+ 3
1181
+ ],
1182
+ "dtype": "torch.float32"
1183
+ },
1184
+ "encoder.down.1.block.1.conv2.bias": {
1185
+ "shape": [
1186
+ 256
1187
+ ],
1188
+ "dtype": "torch.float32"
1189
+ },
1190
+ "encoder.down.1.block.1.conv2.weight": {
1191
+ "shape": [
1192
+ 256,
1193
+ 256,
1194
+ 3,
1195
+ 3
1196
+ ],
1197
+ "dtype": "torch.float32"
1198
+ },
1199
+ "encoder.down.1.block.1.norm1.bias": {
1200
+ "shape": [
1201
+ 256
1202
+ ],
1203
+ "dtype": "torch.float32"
1204
+ },
1205
+ "encoder.down.1.block.1.norm1.weight": {
1206
+ "shape": [
1207
+ 256
1208
+ ],
1209
+ "dtype": "torch.float32"
1210
+ },
1211
+ "encoder.down.1.block.1.norm2.bias": {
1212
+ "shape": [
1213
+ 256
1214
+ ],
1215
+ "dtype": "torch.float32"
1216
+ },
1217
+ "encoder.down.1.block.1.norm2.weight": {
1218
+ "shape": [
1219
+ 256
1220
+ ],
1221
+ "dtype": "torch.float32"
1222
+ },
1223
+ "encoder.down.1.downsample.conv.bias": {
1224
+ "shape": [
1225
+ 256
1226
+ ],
1227
+ "dtype": "torch.float32"
1228
+ },
1229
+ "encoder.down.1.downsample.conv.weight": {
1230
+ "shape": [
1231
+ 256,
1232
+ 256,
1233
+ 3,
1234
+ 3
1235
+ ],
1236
+ "dtype": "torch.float32"
1237
+ },
1238
+ "encoder.down.2.block.0.conv1.bias": {
1239
+ "shape": [
1240
+ 512
1241
+ ],
1242
+ "dtype": "torch.float32"
1243
+ },
1244
+ "encoder.down.2.block.0.conv1.weight": {
1245
+ "shape": [
1246
+ 512,
1247
+ 256,
1248
+ 3,
1249
+ 3
1250
+ ],
1251
+ "dtype": "torch.float32"
1252
+ },
1253
+ "encoder.down.2.block.0.conv2.bias": {
1254
+ "shape": [
1255
+ 512
1256
+ ],
1257
+ "dtype": "torch.float32"
1258
+ },
1259
+ "encoder.down.2.block.0.conv2.weight": {
1260
+ "shape": [
1261
+ 512,
1262
+ 512,
1263
+ 3,
1264
+ 3
1265
+ ],
1266
+ "dtype": "torch.float32"
1267
+ },
1268
+ "encoder.down.2.block.0.nin_shortcut.bias": {
1269
+ "shape": [
1270
+ 512
1271
+ ],
1272
+ "dtype": "torch.float32"
1273
+ },
1274
+ "encoder.down.2.block.0.nin_shortcut.weight": {
1275
+ "shape": [
1276
+ 512,
1277
+ 256,
1278
+ 1,
1279
+ 1
1280
+ ],
1281
+ "dtype": "torch.float32"
1282
+ },
1283
+ "encoder.down.2.block.0.norm1.bias": {
1284
+ "shape": [
1285
+ 256
1286
+ ],
1287
+ "dtype": "torch.float32"
1288
+ },
1289
+ "encoder.down.2.block.0.norm1.weight": {
1290
+ "shape": [
1291
+ 256
1292
+ ],
1293
+ "dtype": "torch.float32"
1294
+ },
1295
+ "encoder.down.2.block.0.norm2.bias": {
1296
+ "shape": [
1297
+ 512
1298
+ ],
1299
+ "dtype": "torch.float32"
1300
+ },
1301
+ "encoder.down.2.block.0.norm2.weight": {
1302
+ "shape": [
1303
+ 512
1304
+ ],
1305
+ "dtype": "torch.float32"
1306
+ },
1307
+ "encoder.down.2.block.1.conv1.bias": {
1308
+ "shape": [
1309
+ 512
1310
+ ],
1311
+ "dtype": "torch.float32"
1312
+ },
1313
+ "encoder.down.2.block.1.conv1.weight": {
1314
+ "shape": [
1315
+ 512,
1316
+ 512,
1317
+ 3,
1318
+ 3
1319
+ ],
1320
+ "dtype": "torch.float32"
1321
+ },
1322
+ "encoder.down.2.block.1.conv2.bias": {
1323
+ "shape": [
1324
+ 512
1325
+ ],
1326
+ "dtype": "torch.float32"
1327
+ },
1328
+ "encoder.down.2.block.1.conv2.weight": {
1329
+ "shape": [
1330
+ 512,
1331
+ 512,
1332
+ 3,
1333
+ 3
1334
+ ],
1335
+ "dtype": "torch.float32"
1336
+ },
1337
+ "encoder.down.2.block.1.norm1.bias": {
1338
+ "shape": [
1339
+ 512
1340
+ ],
1341
+ "dtype": "torch.float32"
1342
+ },
1343
+ "encoder.down.2.block.1.norm1.weight": {
1344
+ "shape": [
1345
+ 512
1346
+ ],
1347
+ "dtype": "torch.float32"
1348
+ },
1349
+ "encoder.down.2.block.1.norm2.bias": {
1350
+ "shape": [
1351
+ 512
1352
+ ],
1353
+ "dtype": "torch.float32"
1354
+ },
1355
+ "encoder.down.2.block.1.norm2.weight": {
1356
+ "shape": [
1357
+ 512
1358
+ ],
1359
+ "dtype": "torch.float32"
1360
+ },
1361
+ "encoder.down.2.downsample.conv.bias": {
1362
+ "shape": [
1363
+ 512
1364
+ ],
1365
+ "dtype": "torch.float32"
1366
+ },
1367
+ "encoder.down.2.downsample.conv.weight": {
1368
+ "shape": [
1369
+ 512,
1370
+ 512,
1371
+ 3,
1372
+ 3
1373
+ ],
1374
+ "dtype": "torch.float32"
1375
+ },
1376
+ "encoder.down.3.block.0.conv1.bias": {
1377
+ "shape": [
1378
+ 512
1379
+ ],
1380
+ "dtype": "torch.float32"
1381
+ },
1382
+ "encoder.down.3.block.0.conv1.weight": {
1383
+ "shape": [
1384
+ 512,
1385
+ 512,
1386
+ 3,
1387
+ 3
1388
+ ],
1389
+ "dtype": "torch.float32"
1390
+ },
1391
+ "encoder.down.3.block.0.conv2.bias": {
1392
+ "shape": [
1393
+ 512
1394
+ ],
1395
+ "dtype": "torch.float32"
1396
+ },
1397
+ "encoder.down.3.block.0.conv2.weight": {
1398
+ "shape": [
1399
+ 512,
1400
+ 512,
1401
+ 3,
1402
+ 3
1403
+ ],
1404
+ "dtype": "torch.float32"
1405
+ },
1406
+ "encoder.down.3.block.0.norm1.bias": {
1407
+ "shape": [
1408
+ 512
1409
+ ],
1410
+ "dtype": "torch.float32"
1411
+ },
1412
+ "encoder.down.3.block.0.norm1.weight": {
1413
+ "shape": [
1414
+ 512
1415
+ ],
1416
+ "dtype": "torch.float32"
1417
+ },
1418
+ "encoder.down.3.block.0.norm2.bias": {
1419
+ "shape": [
1420
+ 512
1421
+ ],
1422
+ "dtype": "torch.float32"
1423
+ },
1424
+ "encoder.down.3.block.0.norm2.weight": {
1425
+ "shape": [
1426
+ 512
1427
+ ],
1428
+ "dtype": "torch.float32"
1429
+ },
1430
+ "encoder.down.3.block.1.conv1.bias": {
1431
+ "shape": [
1432
+ 512
1433
+ ],
1434
+ "dtype": "torch.float32"
1435
+ },
1436
+ "encoder.down.3.block.1.conv1.weight": {
1437
+ "shape": [
1438
+ 512,
1439
+ 512,
1440
+ 3,
1441
+ 3
1442
+ ],
1443
+ "dtype": "torch.float32"
1444
+ },
1445
+ "encoder.down.3.block.1.conv2.bias": {
1446
+ "shape": [
1447
+ 512
1448
+ ],
1449
+ "dtype": "torch.float32"
1450
+ },
1451
+ "encoder.down.3.block.1.conv2.weight": {
1452
+ "shape": [
1453
+ 512,
1454
+ 512,
1455
+ 3,
1456
+ 3
1457
+ ],
1458
+ "dtype": "torch.float32"
1459
+ },
1460
+ "encoder.down.3.block.1.norm1.bias": {
1461
+ "shape": [
1462
+ 512
1463
+ ],
1464
+ "dtype": "torch.float32"
1465
+ },
1466
+ "encoder.down.3.block.1.norm1.weight": {
1467
+ "shape": [
1468
+ 512
1469
+ ],
1470
+ "dtype": "torch.float32"
1471
+ },
1472
+ "encoder.down.3.block.1.norm2.bias": {
1473
+ "shape": [
1474
+ 512
1475
+ ],
1476
+ "dtype": "torch.float32"
1477
+ },
1478
+ "encoder.down.3.block.1.norm2.weight": {
1479
+ "shape": [
1480
+ 512
1481
+ ],
1482
+ "dtype": "torch.float32"
1483
+ },
1484
+ "encoder.mid.attn_1.k.bias": {
1485
+ "shape": [
1486
+ 512
1487
+ ],
1488
+ "dtype": "torch.float32"
1489
+ },
1490
+ "encoder.mid.attn_1.k.weight": {
1491
+ "shape": [
1492
+ 512,
1493
+ 512,
1494
+ 1,
1495
+ 1
1496
+ ],
1497
+ "dtype": "torch.float32"
1498
+ },
1499
+ "encoder.mid.attn_1.norm.bias": {
1500
+ "shape": [
1501
+ 512
1502
+ ],
1503
+ "dtype": "torch.float32"
1504
+ },
1505
+ "encoder.mid.attn_1.norm.weight": {
1506
+ "shape": [
1507
+ 512
1508
+ ],
1509
+ "dtype": "torch.float32"
1510
+ },
1511
+ "encoder.mid.attn_1.proj_out.bias": {
1512
+ "shape": [
1513
+ 512
1514
+ ],
1515
+ "dtype": "torch.float32"
1516
+ },
1517
+ "encoder.mid.attn_1.proj_out.weight": {
1518
+ "shape": [
1519
+ 512,
1520
+ 512,
1521
+ 1,
1522
+ 1
1523
+ ],
1524
+ "dtype": "torch.float32"
1525
+ },
1526
+ "encoder.mid.attn_1.q.bias": {
1527
+ "shape": [
1528
+ 512
1529
+ ],
1530
+ "dtype": "torch.float32"
1531
+ },
1532
+ "encoder.mid.attn_1.q.weight": {
1533
+ "shape": [
1534
+ 512,
1535
+ 512,
1536
+ 1,
1537
+ 1
1538
+ ],
1539
+ "dtype": "torch.float32"
1540
+ },
1541
+ "encoder.mid.attn_1.v.bias": {
1542
+ "shape": [
1543
+ 512
1544
+ ],
1545
+ "dtype": "torch.float32"
1546
+ },
1547
+ "encoder.mid.attn_1.v.weight": {
1548
+ "shape": [
1549
+ 512,
1550
+ 512,
1551
+ 1,
1552
+ 1
1553
+ ],
1554
+ "dtype": "torch.float32"
1555
+ },
1556
+ "encoder.mid.block_1.conv1.bias": {
1557
+ "shape": [
1558
+ 512
1559
+ ],
1560
+ "dtype": "torch.float32"
1561
+ },
1562
+ "encoder.mid.block_1.conv1.weight": {
1563
+ "shape": [
1564
+ 512,
1565
+ 512,
1566
+ 3,
1567
+ 3
1568
+ ],
1569
+ "dtype": "torch.float32"
1570
+ },
1571
+ "encoder.mid.block_1.conv2.bias": {
1572
+ "shape": [
1573
+ 512
1574
+ ],
1575
+ "dtype": "torch.float32"
1576
+ },
1577
+ "encoder.mid.block_1.conv2.weight": {
1578
+ "shape": [
1579
+ 512,
1580
+ 512,
1581
+ 3,
1582
+ 3
1583
+ ],
1584
+ "dtype": "torch.float32"
1585
+ },
1586
+ "encoder.mid.block_1.norm1.bias": {
1587
+ "shape": [
1588
+ 512
1589
+ ],
1590
+ "dtype": "torch.float32"
1591
+ },
1592
+ "encoder.mid.block_1.norm1.weight": {
1593
+ "shape": [
1594
+ 512
1595
+ ],
1596
+ "dtype": "torch.float32"
1597
+ },
1598
+ "encoder.mid.block_1.norm2.bias": {
1599
+ "shape": [
1600
+ 512
1601
+ ],
1602
+ "dtype": "torch.float32"
1603
+ },
1604
+ "encoder.mid.block_1.norm2.weight": {
1605
+ "shape": [
1606
+ 512
1607
+ ],
1608
+ "dtype": "torch.float32"
1609
+ },
1610
+ "encoder.mid.block_2.conv1.bias": {
1611
+ "shape": [
1612
+ 512
1613
+ ],
1614
+ "dtype": "torch.float32"
1615
+ },
1616
+ "encoder.mid.block_2.conv1.weight": {
1617
+ "shape": [
1618
+ 512,
1619
+ 512,
1620
+ 3,
1621
+ 3
1622
+ ],
1623
+ "dtype": "torch.float32"
1624
+ },
1625
+ "encoder.mid.block_2.conv2.bias": {
1626
+ "shape": [
1627
+ 512
1628
+ ],
1629
+ "dtype": "torch.float32"
1630
+ },
1631
+ "encoder.mid.block_2.conv2.weight": {
1632
+ "shape": [
1633
+ 512,
1634
+ 512,
1635
+ 3,
1636
+ 3
1637
+ ],
1638
+ "dtype": "torch.float32"
1639
+ },
1640
+ "encoder.mid.block_2.norm1.bias": {
1641
+ "shape": [
1642
+ 512
1643
+ ],
1644
+ "dtype": "torch.float32"
1645
+ },
1646
+ "encoder.mid.block_2.norm1.weight": {
1647
+ "shape": [
1648
+ 512
1649
+ ],
1650
+ "dtype": "torch.float32"
1651
+ },
1652
+ "encoder.mid.block_2.norm2.bias": {
1653
+ "shape": [
1654
+ 512
1655
+ ],
1656
+ "dtype": "torch.float32"
1657
+ },
1658
+ "encoder.mid.block_2.norm2.weight": {
1659
+ "shape": [
1660
+ 512
1661
+ ],
1662
+ "dtype": "torch.float32"
1663
+ },
1664
+ "encoder.norm_out.bias": {
1665
+ "shape": [
1666
+ 512
1667
+ ],
1668
+ "dtype": "torch.float32"
1669
+ },
1670
+ "encoder.norm_out.weight": {
1671
+ "shape": [
1672
+ 512
1673
+ ],
1674
+ "dtype": "torch.float32"
1675
+ }
1676
+ }
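`ae_metadata.json` is a per-tensor record of shape and dtype for the autoencoder weights in `ae.safetensors`. A sketch of how such a file can be regenerated from the weights; the filenames match this commit, everything else is an assumption:

```python
import json
from safetensors.torch import load_file

state_dict = load_file("ae.safetensors")
metadata = {
    name: {"shape": list(tensor.shape), "dtype": str(tensor.dtype)}  # e.g. "torch.float32"
    for name, tensor in state_dict.items()
}
with open("ae_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2, sort_keys=True)  # keys in the committed file appear sorted
```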
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+ "name": [
+ "Scone"
+ ]
+ }
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "bos_token_id": 151643,
+ "pad_token_id": 151643,
+ "do_sample": true,
+ "eos_token_id": [
+ 151645,
+ 151643
+ ],
+ "repetition_penalty": 1.05,
+ "temperature": 0.7,
+ "top_p": 0.8,
+ "top_k": 20,
+ "transformers_version": "4.37.0"
+ }
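`generation_config.json` carries the default sampling settings for the language model. A sketch of how `transformers` would consume them, with the same values written out explicitly; the local checkout path is a placeholder:

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    bos_token_id=151643,
    pad_token_id=151643,
    eos_token_id=[151645, 151643],
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05,
)
# Equivalent to reading the file from a local checkout of the repo:
# gen_cfg = GenerationConfig.from_pretrained("path/to/local/checkout")
```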
gitattributes.txt ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
llm_config.json ADDED
@@ -0,0 +1,27 @@
+ {
+ "architectures": [
+ "Qwen2ForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "eos_token_id": 151645,
+ "hidden_act": "silu",
+ "hidden_size": 3584,
+ "initializer_range": 0.02,
+ "intermediate_size": 18944,
+ "max_position_embeddings": 32768,
+ "max_window_layers": 28,
+ "model_type": "qwen2",
+ "num_attention_heads": 28,
+ "num_hidden_layers": 28,
+ "num_key_value_heads": 4,
+ "rms_norm_eps": 1e-06,
+ "rope_theta": 1000000.0,
+ "sliding_window": 131072,
+ "tie_word_embeddings": false,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.43.1",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 152064
+ }
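`llm_config.json` is a standard Qwen2 text-decoder config: 28 layers, hidden size 3584, 28 query heads over 4 KV heads (grouped-query attention with 7 query heads per KV head, head dim 3584 / 28 = 128). A sketch of rebuilding the same config programmatically; the values are copied from the file above:

```python
from transformers import Qwen2Config

config = Qwen2Config(
    hidden_size=3584,
    intermediate_size=18944,
    num_hidden_layers=28,
    num_attention_heads=28,
    num_key_value_heads=4,          # grouped-query attention: 28 / 4 = 7 query heads per KV head
    max_position_embeddings=32768,
    rope_theta=1000000.0,
    rms_norm_eps=1e-06,
    vocab_size=152064,
    tie_word_embeddings=False,
    bos_token_id=151643,
    eos_token_id=151645,
)
print(config.hidden_size // config.num_attention_heads)  # head dim: 128
```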
merges.txt ADDED
The diff for this file is too large to render.
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba91a0ab78b58fbea412eb2248e64f9a590c6db5fb1e3d9fa537d07c5cd8f7bc
+ size 29214685336
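`model.safetensors` is a ~29 GB LFS object, so it is cheaper to inspect than to load. A sketch of listing its tensor names without reading any weights, which is essentially how a key dump like `model_keys.txt` below can be produced; the local path is an assumption:

```python
from safetensors import safe_open

# safe_open memory-maps the file; no tensor data is read until get_tensor() is called.
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    keys = sorted(f.keys())

with open("model_keys.txt", "w") as out:
    out.write("\n".join(keys) + "\n")

print(len(keys), "tensor names")  # the model_keys.txt hunk below lists 1223 of them
```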
model_keys.txt ADDED
@@ -0,0 +1,1223 @@
1
+ connector.fc1.bias
2
+ connector.fc1.weight
3
+ connector.fc2.bias
4
+ connector.fc2.weight
5
+ language_model.lm_head.weight
6
+ language_model.model.embed_tokens.weight
7
+ language_model.model.layers.0.input_layernorm.weight
8
+ language_model.model.layers.0.input_layernorm_moe_gen.weight
9
+ language_model.model.layers.0.mlp.down_proj.weight
10
+ language_model.model.layers.0.mlp.gate_proj.weight
11
+ language_model.model.layers.0.mlp.up_proj.weight
12
+ language_model.model.layers.0.mlp_moe_gen.down_proj.weight
13
+ language_model.model.layers.0.mlp_moe_gen.gate_proj.weight
14
+ language_model.model.layers.0.mlp_moe_gen.up_proj.weight
15
+ language_model.model.layers.0.post_attention_layernorm.weight
16
+ language_model.model.layers.0.post_attention_layernorm_moe_gen.weight
17
+ language_model.model.layers.0.self_attn.k_norm.weight
18
+ language_model.model.layers.0.self_attn.k_norm_moe_gen.weight
19
+ language_model.model.layers.0.self_attn.k_proj.bias
20
+ language_model.model.layers.0.self_attn.k_proj.weight
21
+ language_model.model.layers.0.self_attn.k_proj_moe_gen.bias
22
+ language_model.model.layers.0.self_attn.k_proj_moe_gen.weight
23
+ language_model.model.layers.0.self_attn.o_proj.weight
24
+ language_model.model.layers.0.self_attn.o_proj_moe_gen.weight
25
+ language_model.model.layers.0.self_attn.q_norm.weight
26
+ language_model.model.layers.0.self_attn.q_norm_moe_gen.weight
27
+ language_model.model.layers.0.self_attn.q_proj.bias
28
+ language_model.model.layers.0.self_attn.q_proj.weight
29
+ language_model.model.layers.0.self_attn.q_proj_moe_gen.bias
30
+ language_model.model.layers.0.self_attn.q_proj_moe_gen.weight
31
+ language_model.model.layers.0.self_attn.v_proj.bias
32
+ language_model.model.layers.0.self_attn.v_proj.weight
33
+ language_model.model.layers.0.self_attn.v_proj_moe_gen.bias
34
+ language_model.model.layers.0.self_attn.v_proj_moe_gen.weight
35
+ language_model.model.layers.1.input_layernorm.weight
36
+ language_model.model.layers.1.input_layernorm_moe_gen.weight
37
+ language_model.model.layers.1.mlp.down_proj.weight
38
+ language_model.model.layers.1.mlp.gate_proj.weight
39
+ language_model.model.layers.1.mlp.up_proj.weight
40
+ language_model.model.layers.1.mlp_moe_gen.down_proj.weight
41
+ language_model.model.layers.1.mlp_moe_gen.gate_proj.weight
42
+ language_model.model.layers.1.mlp_moe_gen.up_proj.weight
43
+ language_model.model.layers.1.post_attention_layernorm.weight
44
+ language_model.model.layers.1.post_attention_layernorm_moe_gen.weight
45
+ language_model.model.layers.1.self_attn.k_norm.weight
46
+ language_model.model.layers.1.self_attn.k_norm_moe_gen.weight
47
+ language_model.model.layers.1.self_attn.k_proj.bias
48
+ language_model.model.layers.1.self_attn.k_proj.weight
49
+ language_model.model.layers.1.self_attn.k_proj_moe_gen.bias
50
+ language_model.model.layers.1.self_attn.k_proj_moe_gen.weight
51
+ language_model.model.layers.1.self_attn.o_proj.weight
52
+ language_model.model.layers.1.self_attn.o_proj_moe_gen.weight
53
+ language_model.model.layers.1.self_attn.q_norm.weight
54
+ language_model.model.layers.1.self_attn.q_norm_moe_gen.weight
55
+ language_model.model.layers.1.self_attn.q_proj.bias
56
+ language_model.model.layers.1.self_attn.q_proj.weight
57
+ language_model.model.layers.1.self_attn.q_proj_moe_gen.bias
58
+ language_model.model.layers.1.self_attn.q_proj_moe_gen.weight
59
+ language_model.model.layers.1.self_attn.v_proj.bias
60
+ language_model.model.layers.1.self_attn.v_proj.weight
61
+ language_model.model.layers.1.self_attn.v_proj_moe_gen.bias
62
+ language_model.model.layers.1.self_attn.v_proj_moe_gen.weight
63
+ language_model.model.layers.10.input_layernorm.weight
64
+ language_model.model.layers.10.input_layernorm_moe_gen.weight
65
+ language_model.model.layers.10.mlp.down_proj.weight
66
+ language_model.model.layers.10.mlp.gate_proj.weight
67
+ language_model.model.layers.10.mlp.up_proj.weight
68
+ language_model.model.layers.10.mlp_moe_gen.down_proj.weight
69
+ language_model.model.layers.10.mlp_moe_gen.gate_proj.weight
70
+ language_model.model.layers.10.mlp_moe_gen.up_proj.weight
71
+ language_model.model.layers.10.post_attention_layernorm.weight
72
+ language_model.model.layers.10.post_attention_layernorm_moe_gen.weight
73
+ language_model.model.layers.10.self_attn.k_norm.weight
74
+ language_model.model.layers.10.self_attn.k_norm_moe_gen.weight
75
+ language_model.model.layers.10.self_attn.k_proj.bias
76
+ language_model.model.layers.10.self_attn.k_proj.weight
77
+ language_model.model.layers.10.self_attn.k_proj_moe_gen.bias
78
+ language_model.model.layers.10.self_attn.k_proj_moe_gen.weight
79
+ language_model.model.layers.10.self_attn.o_proj.weight
80
+ language_model.model.layers.10.self_attn.o_proj_moe_gen.weight
81
+ language_model.model.layers.10.self_attn.q_norm.weight
82
+ language_model.model.layers.10.self_attn.q_norm_moe_gen.weight
83
+ language_model.model.layers.10.self_attn.q_proj.bias
84
+ language_model.model.layers.10.self_attn.q_proj.weight
85
+ language_model.model.layers.10.self_attn.q_proj_moe_gen.bias
86
+ language_model.model.layers.10.self_attn.q_proj_moe_gen.weight
87
+ language_model.model.layers.10.self_attn.v_proj.bias
88
+ language_model.model.layers.10.self_attn.v_proj.weight
89
+ language_model.model.layers.10.self_attn.v_proj_moe_gen.bias
90
+ language_model.model.layers.10.self_attn.v_proj_moe_gen.weight
91
+ language_model.model.layers.11.input_layernorm.weight
92
+ language_model.model.layers.11.input_layernorm_moe_gen.weight
93
+ language_model.model.layers.11.mlp.down_proj.weight
94
+ language_model.model.layers.11.mlp.gate_proj.weight
95
+ language_model.model.layers.11.mlp.up_proj.weight
96
+ language_model.model.layers.11.mlp_moe_gen.down_proj.weight
97
+ language_model.model.layers.11.mlp_moe_gen.gate_proj.weight
98
+ language_model.model.layers.11.mlp_moe_gen.up_proj.weight
99
+ language_model.model.layers.11.post_attention_layernorm.weight
100
+ language_model.model.layers.11.post_attention_layernorm_moe_gen.weight
101
+ language_model.model.layers.11.self_attn.k_norm.weight
102
+ language_model.model.layers.11.self_attn.k_norm_moe_gen.weight
103
+ language_model.model.layers.11.self_attn.k_proj.bias
104
+ language_model.model.layers.11.self_attn.k_proj.weight
105
+ language_model.model.layers.11.self_attn.k_proj_moe_gen.bias
106
+ language_model.model.layers.11.self_attn.k_proj_moe_gen.weight
107
+ language_model.model.layers.11.self_attn.o_proj.weight
108
+ language_model.model.layers.11.self_attn.o_proj_moe_gen.weight
109
+ language_model.model.layers.11.self_attn.q_norm.weight
110
+ language_model.model.layers.11.self_attn.q_norm_moe_gen.weight
111
+ language_model.model.layers.11.self_attn.q_proj.bias
112
+ language_model.model.layers.11.self_attn.q_proj.weight
113
+ language_model.model.layers.11.self_attn.q_proj_moe_gen.bias
114
+ language_model.model.layers.11.self_attn.q_proj_moe_gen.weight
115
+ language_model.model.layers.11.self_attn.v_proj.bias
116
+ language_model.model.layers.11.self_attn.v_proj.weight
117
+ language_model.model.layers.11.self_attn.v_proj_moe_gen.bias
118
+ language_model.model.layers.11.self_attn.v_proj_moe_gen.weight
119
+ language_model.model.layers.12.input_layernorm.weight
120
+ language_model.model.layers.12.input_layernorm_moe_gen.weight
121
+ language_model.model.layers.12.mlp.down_proj.weight
122
+ language_model.model.layers.12.mlp.gate_proj.weight
123
+ language_model.model.layers.12.mlp.up_proj.weight
124
+ language_model.model.layers.12.mlp_moe_gen.down_proj.weight
125
+ language_model.model.layers.12.mlp_moe_gen.gate_proj.weight
126
+ language_model.model.layers.12.mlp_moe_gen.up_proj.weight
127
+ language_model.model.layers.12.post_attention_layernorm.weight
128
+ language_model.model.layers.12.post_attention_layernorm_moe_gen.weight
129
+ language_model.model.layers.12.self_attn.k_norm.weight
130
+ language_model.model.layers.12.self_attn.k_norm_moe_gen.weight
131
+ language_model.model.layers.12.self_attn.k_proj.bias
132
+ language_model.model.layers.12.self_attn.k_proj.weight
133
+ language_model.model.layers.12.self_attn.k_proj_moe_gen.bias
134
+ language_model.model.layers.12.self_attn.k_proj_moe_gen.weight
135
+ language_model.model.layers.12.self_attn.o_proj.weight
136
+ language_model.model.layers.12.self_attn.o_proj_moe_gen.weight
137
+ language_model.model.layers.12.self_attn.q_norm.weight
138
+ language_model.model.layers.12.self_attn.q_norm_moe_gen.weight
139
+ language_model.model.layers.12.self_attn.q_proj.bias
140
+ language_model.model.layers.12.self_attn.q_proj.weight
141
+ language_model.model.layers.12.self_attn.q_proj_moe_gen.bias
142
+ language_model.model.layers.12.self_attn.q_proj_moe_gen.weight
143
+ language_model.model.layers.12.self_attn.v_proj.bias
144
+ language_model.model.layers.12.self_attn.v_proj.weight
145
+ language_model.model.layers.12.self_attn.v_proj_moe_gen.bias
146
+ language_model.model.layers.12.self_attn.v_proj_moe_gen.weight
147
+ language_model.model.layers.13.input_layernorm.weight
148
+ language_model.model.layers.13.input_layernorm_moe_gen.weight
149
+ language_model.model.layers.13.mlp.down_proj.weight
150
+ language_model.model.layers.13.mlp.gate_proj.weight
151
+ language_model.model.layers.13.mlp.up_proj.weight
152
+ language_model.model.layers.13.mlp_moe_gen.down_proj.weight
153
+ language_model.model.layers.13.mlp_moe_gen.gate_proj.weight
154
+ language_model.model.layers.13.mlp_moe_gen.up_proj.weight
155
+ language_model.model.layers.13.post_attention_layernorm.weight
156
+ language_model.model.layers.13.post_attention_layernorm_moe_gen.weight
157
+ language_model.model.layers.13.self_attn.k_norm.weight
158
+ language_model.model.layers.13.self_attn.k_norm_moe_gen.weight
159
+ language_model.model.layers.13.self_attn.k_proj.bias
160
+ language_model.model.layers.13.self_attn.k_proj.weight
161
+ language_model.model.layers.13.self_attn.k_proj_moe_gen.bias
162
+ language_model.model.layers.13.self_attn.k_proj_moe_gen.weight
163
+ language_model.model.layers.13.self_attn.o_proj.weight
164
+ language_model.model.layers.13.self_attn.o_proj_moe_gen.weight
165
+ language_model.model.layers.13.self_attn.q_norm.weight
166
+ language_model.model.layers.13.self_attn.q_norm_moe_gen.weight
167
+ language_model.model.layers.13.self_attn.q_proj.bias
168
+ language_model.model.layers.13.self_attn.q_proj.weight
169
+ language_model.model.layers.13.self_attn.q_proj_moe_gen.bias
170
+ language_model.model.layers.13.self_attn.q_proj_moe_gen.weight
171
+ language_model.model.layers.13.self_attn.v_proj.bias
172
+ language_model.model.layers.13.self_attn.v_proj.weight
173
+ language_model.model.layers.13.self_attn.v_proj_moe_gen.bias
174
+ language_model.model.layers.13.self_attn.v_proj_moe_gen.weight
175
+ language_model.model.layers.14.input_layernorm.weight
176
+ language_model.model.layers.14.input_layernorm_moe_gen.weight
177
+ language_model.model.layers.14.mlp.down_proj.weight
178
+ language_model.model.layers.14.mlp.gate_proj.weight
179
+ language_model.model.layers.14.mlp.up_proj.weight
180
+ language_model.model.layers.14.mlp_moe_gen.down_proj.weight
181
+ language_model.model.layers.14.mlp_moe_gen.gate_proj.weight
182
+ language_model.model.layers.14.mlp_moe_gen.up_proj.weight
183
+ language_model.model.layers.14.post_attention_layernorm.weight
184
+ language_model.model.layers.14.post_attention_layernorm_moe_gen.weight
185
+ language_model.model.layers.14.self_attn.k_norm.weight
186
+ language_model.model.layers.14.self_attn.k_norm_moe_gen.weight
187
+ language_model.model.layers.14.self_attn.k_proj.bias
188
+ language_model.model.layers.14.self_attn.k_proj.weight
189
+ language_model.model.layers.14.self_attn.k_proj_moe_gen.bias
190
+ language_model.model.layers.14.self_attn.k_proj_moe_gen.weight
191
+ language_model.model.layers.14.self_attn.o_proj.weight
192
+ language_model.model.layers.14.self_attn.o_proj_moe_gen.weight
193
+ language_model.model.layers.14.self_attn.q_norm.weight
194
+ language_model.model.layers.14.self_attn.q_norm_moe_gen.weight
195
+ language_model.model.layers.14.self_attn.q_proj.bias
196
+ language_model.model.layers.14.self_attn.q_proj.weight
197
+ language_model.model.layers.14.self_attn.q_proj_moe_gen.bias
198
+ language_model.model.layers.14.self_attn.q_proj_moe_gen.weight
199
+ language_model.model.layers.14.self_attn.v_proj.bias
200
+ language_model.model.layers.14.self_attn.v_proj.weight
201
+ language_model.model.layers.14.self_attn.v_proj_moe_gen.bias
202
+ language_model.model.layers.14.self_attn.v_proj_moe_gen.weight
203
+ language_model.model.layers.15.input_layernorm.weight
204
+ language_model.model.layers.15.input_layernorm_moe_gen.weight
205
+ language_model.model.layers.15.mlp.down_proj.weight
206
+ language_model.model.layers.15.mlp.gate_proj.weight
207
+ language_model.model.layers.15.mlp.up_proj.weight
208
+ language_model.model.layers.15.mlp_moe_gen.down_proj.weight
209
+ language_model.model.layers.15.mlp_moe_gen.gate_proj.weight
210
+ language_model.model.layers.15.mlp_moe_gen.up_proj.weight
211
+ language_model.model.layers.15.post_attention_layernorm.weight
212
+ language_model.model.layers.15.post_attention_layernorm_moe_gen.weight
213
+ language_model.model.layers.15.self_attn.k_norm.weight
214
+ language_model.model.layers.15.self_attn.k_norm_moe_gen.weight
215
+ language_model.model.layers.15.self_attn.k_proj.bias
216
+ language_model.model.layers.15.self_attn.k_proj.weight
217
+ language_model.model.layers.15.self_attn.k_proj_moe_gen.bias
218
+ language_model.model.layers.15.self_attn.k_proj_moe_gen.weight
219
+ language_model.model.layers.15.self_attn.o_proj.weight
220
+ language_model.model.layers.15.self_attn.o_proj_moe_gen.weight
221
+ language_model.model.layers.15.self_attn.q_norm.weight
222
+ language_model.model.layers.15.self_attn.q_norm_moe_gen.weight
223
+ language_model.model.layers.15.self_attn.q_proj.bias
224
+ language_model.model.layers.15.self_attn.q_proj.weight
225
+ language_model.model.layers.15.self_attn.q_proj_moe_gen.bias
226
+ language_model.model.layers.15.self_attn.q_proj_moe_gen.weight
227
+ language_model.model.layers.15.self_attn.v_proj.bias
228
+ language_model.model.layers.15.self_attn.v_proj.weight
229
+ language_model.model.layers.15.self_attn.v_proj_moe_gen.bias
230
+ language_model.model.layers.15.self_attn.v_proj_moe_gen.weight
231
+ language_model.model.layers.16.input_layernorm.weight
232
+ language_model.model.layers.16.input_layernorm_moe_gen.weight
233
+ language_model.model.layers.16.mlp.down_proj.weight
234
+ language_model.model.layers.16.mlp.gate_proj.weight
235
+ language_model.model.layers.16.mlp.up_proj.weight
236
+ language_model.model.layers.16.mlp_moe_gen.down_proj.weight
237
+ language_model.model.layers.16.mlp_moe_gen.gate_proj.weight
238
+ language_model.model.layers.16.mlp_moe_gen.up_proj.weight
239
+ language_model.model.layers.16.post_attention_layernorm.weight
240
+ language_model.model.layers.16.post_attention_layernorm_moe_gen.weight
241
+ language_model.model.layers.16.self_attn.k_norm.weight
242
+ language_model.model.layers.16.self_attn.k_norm_moe_gen.weight
243
+ language_model.model.layers.16.self_attn.k_proj.bias
244
+ language_model.model.layers.16.self_attn.k_proj.weight
245
+ language_model.model.layers.16.self_attn.k_proj_moe_gen.bias
246
+ language_model.model.layers.16.self_attn.k_proj_moe_gen.weight
247
+ language_model.model.layers.16.self_attn.o_proj.weight
248
+ language_model.model.layers.16.self_attn.o_proj_moe_gen.weight
249
+ language_model.model.layers.16.self_attn.q_norm.weight
250
+ language_model.model.layers.16.self_attn.q_norm_moe_gen.weight
251
+ language_model.model.layers.16.self_attn.q_proj.bias
252
+ language_model.model.layers.16.self_attn.q_proj.weight
253
+ language_model.model.layers.16.self_attn.q_proj_moe_gen.bias
254
+ language_model.model.layers.16.self_attn.q_proj_moe_gen.weight
255
+ language_model.model.layers.16.self_attn.v_proj.bias
256
+ language_model.model.layers.16.self_attn.v_proj.weight
257
+ language_model.model.layers.16.self_attn.v_proj_moe_gen.bias
258
+ language_model.model.layers.16.self_attn.v_proj_moe_gen.weight
259
+ language_model.model.layers.17.input_layernorm.weight
260
+ language_model.model.layers.17.input_layernorm_moe_gen.weight
261
+ language_model.model.layers.17.mlp.down_proj.weight
262
+ language_model.model.layers.17.mlp.gate_proj.weight
263
+ language_model.model.layers.17.mlp.up_proj.weight
264
+ language_model.model.layers.17.mlp_moe_gen.down_proj.weight
265
+ language_model.model.layers.17.mlp_moe_gen.gate_proj.weight
266
+ language_model.model.layers.17.mlp_moe_gen.up_proj.weight
267
+ language_model.model.layers.17.post_attention_layernorm.weight
268
+ language_model.model.layers.17.post_attention_layernorm_moe_gen.weight
269
+ language_model.model.layers.17.self_attn.k_norm.weight
270
+ language_model.model.layers.17.self_attn.k_norm_moe_gen.weight
271
+ language_model.model.layers.17.self_attn.k_proj.bias
272
+ language_model.model.layers.17.self_attn.k_proj.weight
273
+ language_model.model.layers.17.self_attn.k_proj_moe_gen.bias
274
+ language_model.model.layers.17.self_attn.k_proj_moe_gen.weight
275
+ language_model.model.layers.17.self_attn.o_proj.weight
276
+ language_model.model.layers.17.self_attn.o_proj_moe_gen.weight
277
+ language_model.model.layers.17.self_attn.q_norm.weight
278
+ language_model.model.layers.17.self_attn.q_norm_moe_gen.weight
279
+ language_model.model.layers.17.self_attn.q_proj.bias
280
+ language_model.model.layers.17.self_attn.q_proj.weight
281
+ language_model.model.layers.17.self_attn.q_proj_moe_gen.bias
282
+ language_model.model.layers.17.self_attn.q_proj_moe_gen.weight
283
+ language_model.model.layers.17.self_attn.v_proj.bias
284
+ language_model.model.layers.17.self_attn.v_proj.weight
285
+ language_model.model.layers.17.self_attn.v_proj_moe_gen.bias
286
+ language_model.model.layers.17.self_attn.v_proj_moe_gen.weight
287
+ language_model.model.layers.18.input_layernorm.weight
288
+ language_model.model.layers.18.input_layernorm_moe_gen.weight
289
+ language_model.model.layers.18.mlp.down_proj.weight
290
+ language_model.model.layers.18.mlp.gate_proj.weight
291
+ language_model.model.layers.18.mlp.up_proj.weight
292
+ language_model.model.layers.18.mlp_moe_gen.down_proj.weight
293
+ language_model.model.layers.18.mlp_moe_gen.gate_proj.weight
294
+ language_model.model.layers.18.mlp_moe_gen.up_proj.weight
295
+ language_model.model.layers.18.post_attention_layernorm.weight
296
+ language_model.model.layers.18.post_attention_layernorm_moe_gen.weight
297
+ language_model.model.layers.18.self_attn.k_norm.weight
298
+ language_model.model.layers.18.self_attn.k_norm_moe_gen.weight
299
+ language_model.model.layers.18.self_attn.k_proj.bias
300
+ language_model.model.layers.18.self_attn.k_proj.weight
301
+ language_model.model.layers.18.self_attn.k_proj_moe_gen.bias
302
+ language_model.model.layers.18.self_attn.k_proj_moe_gen.weight
303
+ language_model.model.layers.18.self_attn.o_proj.weight
304
+ language_model.model.layers.18.self_attn.o_proj_moe_gen.weight
305
+ language_model.model.layers.18.self_attn.q_norm.weight
306
+ language_model.model.layers.18.self_attn.q_norm_moe_gen.weight
307
+ language_model.model.layers.18.self_attn.q_proj.bias
308
+ language_model.model.layers.18.self_attn.q_proj.weight
309
+ language_model.model.layers.18.self_attn.q_proj_moe_gen.bias
310
+ language_model.model.layers.18.self_attn.q_proj_moe_gen.weight
311
+ language_model.model.layers.18.self_attn.v_proj.bias
312
+ language_model.model.layers.18.self_attn.v_proj.weight
313
+ language_model.model.layers.18.self_attn.v_proj_moe_gen.bias
314
+ language_model.model.layers.18.self_attn.v_proj_moe_gen.weight
315
+ language_model.model.layers.19.input_layernorm.weight
316
+ language_model.model.layers.19.input_layernorm_moe_gen.weight
317
+ language_model.model.layers.19.mlp.down_proj.weight
318
+ language_model.model.layers.19.mlp.gate_proj.weight
319
+ language_model.model.layers.19.mlp.up_proj.weight
320
+ language_model.model.layers.19.mlp_moe_gen.down_proj.weight
321
+ language_model.model.layers.19.mlp_moe_gen.gate_proj.weight
322
+ language_model.model.layers.19.mlp_moe_gen.up_proj.weight
323
+ language_model.model.layers.19.post_attention_layernorm.weight
324
+ language_model.model.layers.19.post_attention_layernorm_moe_gen.weight
325
+ language_model.model.layers.19.self_attn.k_norm.weight
326
+ language_model.model.layers.19.self_attn.k_norm_moe_gen.weight
327
+ language_model.model.layers.19.self_attn.k_proj.bias
328
+ language_model.model.layers.19.self_attn.k_proj.weight
329
+ language_model.model.layers.19.self_attn.k_proj_moe_gen.bias
330
+ language_model.model.layers.19.self_attn.k_proj_moe_gen.weight
331
+ language_model.model.layers.19.self_attn.o_proj.weight
332
+ language_model.model.layers.19.self_attn.o_proj_moe_gen.weight
333
+ language_model.model.layers.19.self_attn.q_norm.weight
334
+ language_model.model.layers.19.self_attn.q_norm_moe_gen.weight
335
+ language_model.model.layers.19.self_attn.q_proj.bias
336
+ language_model.model.layers.19.self_attn.q_proj.weight
337
+ language_model.model.layers.19.self_attn.q_proj_moe_gen.bias
338
+ language_model.model.layers.19.self_attn.q_proj_moe_gen.weight
339
+ language_model.model.layers.19.self_attn.v_proj.bias
340
+ language_model.model.layers.19.self_attn.v_proj.weight
341
+ language_model.model.layers.19.self_attn.v_proj_moe_gen.bias
342
+ language_model.model.layers.19.self_attn.v_proj_moe_gen.weight
343
+ language_model.model.layers.2.input_layernorm.weight
344
+ language_model.model.layers.2.input_layernorm_moe_gen.weight
345
+ language_model.model.layers.2.mlp.down_proj.weight
346
+ language_model.model.layers.2.mlp.gate_proj.weight
347
+ language_model.model.layers.2.mlp.up_proj.weight
348
+ language_model.model.layers.2.mlp_moe_gen.down_proj.weight
349
+ language_model.model.layers.2.mlp_moe_gen.gate_proj.weight
350
+ language_model.model.layers.2.mlp_moe_gen.up_proj.weight
351
+ language_model.model.layers.2.post_attention_layernorm.weight
352
+ language_model.model.layers.2.post_attention_layernorm_moe_gen.weight
353
+ language_model.model.layers.2.self_attn.k_norm.weight
354
+ language_model.model.layers.2.self_attn.k_norm_moe_gen.weight
355
+ language_model.model.layers.2.self_attn.k_proj.bias
356
+ language_model.model.layers.2.self_attn.k_proj.weight
357
+ language_model.model.layers.2.self_attn.k_proj_moe_gen.bias
358
+ language_model.model.layers.2.self_attn.k_proj_moe_gen.weight
359
+ language_model.model.layers.2.self_attn.o_proj.weight
360
+ language_model.model.layers.2.self_attn.o_proj_moe_gen.weight
361
+ language_model.model.layers.2.self_attn.q_norm.weight
362
+ language_model.model.layers.2.self_attn.q_norm_moe_gen.weight
363
+ language_model.model.layers.2.self_attn.q_proj.bias
364
+ language_model.model.layers.2.self_attn.q_proj.weight
365
+ language_model.model.layers.2.self_attn.q_proj_moe_gen.bias
366
+ language_model.model.layers.2.self_attn.q_proj_moe_gen.weight
367
+ language_model.model.layers.2.self_attn.v_proj.bias
368
+ language_model.model.layers.2.self_attn.v_proj.weight
369
+ language_model.model.layers.2.self_attn.v_proj_moe_gen.bias
370
+ language_model.model.layers.2.self_attn.v_proj_moe_gen.weight
371
+ language_model.model.layers.20.input_layernorm.weight
372
+ language_model.model.layers.20.input_layernorm_moe_gen.weight
373
+ language_model.model.layers.20.mlp.down_proj.weight
374
+ language_model.model.layers.20.mlp.gate_proj.weight
375
+ language_model.model.layers.20.mlp.up_proj.weight
376
+ language_model.model.layers.20.mlp_moe_gen.down_proj.weight
377
+ language_model.model.layers.20.mlp_moe_gen.gate_proj.weight
378
+ language_model.model.layers.20.mlp_moe_gen.up_proj.weight
379
+ language_model.model.layers.20.post_attention_layernorm.weight
380
+ language_model.model.layers.20.post_attention_layernorm_moe_gen.weight
381
+ language_model.model.layers.20.self_attn.k_norm.weight
382
+ language_model.model.layers.20.self_attn.k_norm_moe_gen.weight
383
+ language_model.model.layers.20.self_attn.k_proj.bias
384
+ language_model.model.layers.20.self_attn.k_proj.weight
385
+ language_model.model.layers.20.self_attn.k_proj_moe_gen.bias
386
+ language_model.model.layers.20.self_attn.k_proj_moe_gen.weight
387
+ language_model.model.layers.20.self_attn.o_proj.weight
388
+ language_model.model.layers.20.self_attn.o_proj_moe_gen.weight
389
+ language_model.model.layers.20.self_attn.q_norm.weight
390
+ language_model.model.layers.20.self_attn.q_norm_moe_gen.weight
391
+ language_model.model.layers.20.self_attn.q_proj.bias
392
+ language_model.model.layers.20.self_attn.q_proj.weight
393
+ language_model.model.layers.20.self_attn.q_proj_moe_gen.bias
394
+ language_model.model.layers.20.self_attn.q_proj_moe_gen.weight
395
+ language_model.model.layers.20.self_attn.v_proj.bias
396
+ language_model.model.layers.20.self_attn.v_proj.weight
397
+ language_model.model.layers.20.self_attn.v_proj_moe_gen.bias
398
+ language_model.model.layers.20.self_attn.v_proj_moe_gen.weight
399
+ language_model.model.layers.21.input_layernorm.weight
400
+ language_model.model.layers.21.input_layernorm_moe_gen.weight
401
+ language_model.model.layers.21.mlp.down_proj.weight
402
+ language_model.model.layers.21.mlp.gate_proj.weight
403
+ language_model.model.layers.21.mlp.up_proj.weight
404
+ language_model.model.layers.21.mlp_moe_gen.down_proj.weight
405
+ language_model.model.layers.21.mlp_moe_gen.gate_proj.weight
406
+ language_model.model.layers.21.mlp_moe_gen.up_proj.weight
407
+ language_model.model.layers.21.post_attention_layernorm.weight
408
+ language_model.model.layers.21.post_attention_layernorm_moe_gen.weight
409
+ language_model.model.layers.21.self_attn.k_norm.weight
410
+ language_model.model.layers.21.self_attn.k_norm_moe_gen.weight
411
+ language_model.model.layers.21.self_attn.k_proj.bias
412
+ language_model.model.layers.21.self_attn.k_proj.weight
413
+ language_model.model.layers.21.self_attn.k_proj_moe_gen.bias
414
+ language_model.model.layers.21.self_attn.k_proj_moe_gen.weight
415
+ language_model.model.layers.21.self_attn.o_proj.weight
416
+ language_model.model.layers.21.self_attn.o_proj_moe_gen.weight
417
+ language_model.model.layers.21.self_attn.q_norm.weight
418
+ language_model.model.layers.21.self_attn.q_norm_moe_gen.weight
419
+ language_model.model.layers.21.self_attn.q_proj.bias
420
+ language_model.model.layers.21.self_attn.q_proj.weight
421
+ language_model.model.layers.21.self_attn.q_proj_moe_gen.bias
422
+ language_model.model.layers.21.self_attn.q_proj_moe_gen.weight
423
+ language_model.model.layers.21.self_attn.v_proj.bias
424
+ language_model.model.layers.21.self_attn.v_proj.weight
425
+ language_model.model.layers.21.self_attn.v_proj_moe_gen.bias
426
+ language_model.model.layers.21.self_attn.v_proj_moe_gen.weight
427
+ language_model.model.layers.22.input_layernorm.weight
428
+ language_model.model.layers.22.input_layernorm_moe_gen.weight
429
+ language_model.model.layers.22.mlp.down_proj.weight
430
+ language_model.model.layers.22.mlp.gate_proj.weight
431
+ language_model.model.layers.22.mlp.up_proj.weight
432
+ language_model.model.layers.22.mlp_moe_gen.down_proj.weight
433
+ language_model.model.layers.22.mlp_moe_gen.gate_proj.weight
434
+ language_model.model.layers.22.mlp_moe_gen.up_proj.weight
435
+ language_model.model.layers.22.post_attention_layernorm.weight
436
+ language_model.model.layers.22.post_attention_layernorm_moe_gen.weight
437
+ language_model.model.layers.22.self_attn.k_norm.weight
438
+ language_model.model.layers.22.self_attn.k_norm_moe_gen.weight
439
+ language_model.model.layers.22.self_attn.k_proj.bias
440
+ language_model.model.layers.22.self_attn.k_proj.weight
441
+ language_model.model.layers.22.self_attn.k_proj_moe_gen.bias
442
+ language_model.model.layers.22.self_attn.k_proj_moe_gen.weight
443
+ language_model.model.layers.22.self_attn.o_proj.weight
444
+ language_model.model.layers.22.self_attn.o_proj_moe_gen.weight
445
+ language_model.model.layers.22.self_attn.q_norm.weight
446
+ language_model.model.layers.22.self_attn.q_norm_moe_gen.weight
447
+ language_model.model.layers.22.self_attn.q_proj.bias
448
+ language_model.model.layers.22.self_attn.q_proj.weight
449
+ language_model.model.layers.22.self_attn.q_proj_moe_gen.bias
450
+ language_model.model.layers.22.self_attn.q_proj_moe_gen.weight
451
+ language_model.model.layers.22.self_attn.v_proj.bias
452
+ language_model.model.layers.22.self_attn.v_proj.weight
453
+ language_model.model.layers.22.self_attn.v_proj_moe_gen.bias
454
+ language_model.model.layers.22.self_attn.v_proj_moe_gen.weight
455
+ language_model.model.layers.23.input_layernorm.weight
456
+ language_model.model.layers.23.input_layernorm_moe_gen.weight
457
+ language_model.model.layers.23.mlp.down_proj.weight
458
+ language_model.model.layers.23.mlp.gate_proj.weight
459
+ language_model.model.layers.23.mlp.up_proj.weight
460
+ language_model.model.layers.23.mlp_moe_gen.down_proj.weight
461
+ language_model.model.layers.23.mlp_moe_gen.gate_proj.weight
462
+ language_model.model.layers.23.mlp_moe_gen.up_proj.weight
463
+ language_model.model.layers.23.post_attention_layernorm.weight
464
+ language_model.model.layers.23.post_attention_layernorm_moe_gen.weight
465
+ language_model.model.layers.23.self_attn.k_norm.weight
466
+ language_model.model.layers.23.self_attn.k_norm_moe_gen.weight
467
+ language_model.model.layers.23.self_attn.k_proj.bias
468
+ language_model.model.layers.23.self_attn.k_proj.weight
469
+ language_model.model.layers.23.self_attn.k_proj_moe_gen.bias
470
+ language_model.model.layers.23.self_attn.k_proj_moe_gen.weight
471
+ language_model.model.layers.23.self_attn.o_proj.weight
472
+ language_model.model.layers.23.self_attn.o_proj_moe_gen.weight
473
+ language_model.model.layers.23.self_attn.q_norm.weight
474
+ language_model.model.layers.23.self_attn.q_norm_moe_gen.weight
475
+ language_model.model.layers.23.self_attn.q_proj.bias
476
+ language_model.model.layers.23.self_attn.q_proj.weight
477
+ language_model.model.layers.23.self_attn.q_proj_moe_gen.bias
478
+ language_model.model.layers.23.self_attn.q_proj_moe_gen.weight
479
+ language_model.model.layers.23.self_attn.v_proj.bias
480
+ language_model.model.layers.23.self_attn.v_proj.weight
481
+ language_model.model.layers.23.self_attn.v_proj_moe_gen.bias
482
+ language_model.model.layers.23.self_attn.v_proj_moe_gen.weight
483
+ language_model.model.layers.24.input_layernorm.weight
484
+ language_model.model.layers.24.input_layernorm_moe_gen.weight
485
+ language_model.model.layers.24.mlp.down_proj.weight
486
+ language_model.model.layers.24.mlp.gate_proj.weight
487
+ language_model.model.layers.24.mlp.up_proj.weight
488
+ language_model.model.layers.24.mlp_moe_gen.down_proj.weight
489
+ language_model.model.layers.24.mlp_moe_gen.gate_proj.weight
490
+ language_model.model.layers.24.mlp_moe_gen.up_proj.weight
491
+ language_model.model.layers.24.post_attention_layernorm.weight
492
+ language_model.model.layers.24.post_attention_layernorm_moe_gen.weight
493
+ language_model.model.layers.24.self_attn.k_norm.weight
494
+ language_model.model.layers.24.self_attn.k_norm_moe_gen.weight
495
+ language_model.model.layers.24.self_attn.k_proj.bias
496
+ language_model.model.layers.24.self_attn.k_proj.weight
497
+ language_model.model.layers.24.self_attn.k_proj_moe_gen.bias
498
+ language_model.model.layers.24.self_attn.k_proj_moe_gen.weight
499
+ language_model.model.layers.24.self_attn.o_proj.weight
500
+ language_model.model.layers.24.self_attn.o_proj_moe_gen.weight
501
+ language_model.model.layers.24.self_attn.q_norm.weight
502
+ language_model.model.layers.24.self_attn.q_norm_moe_gen.weight
503
+ language_model.model.layers.24.self_attn.q_proj.bias
504
+ language_model.model.layers.24.self_attn.q_proj.weight
505
+ language_model.model.layers.24.self_attn.q_proj_moe_gen.bias
506
+ language_model.model.layers.24.self_attn.q_proj_moe_gen.weight
507
+ language_model.model.layers.24.self_attn.v_proj.bias
508
+ language_model.model.layers.24.self_attn.v_proj.weight
509
+ language_model.model.layers.24.self_attn.v_proj_moe_gen.bias
510
+ language_model.model.layers.24.self_attn.v_proj_moe_gen.weight
511
+ language_model.model.layers.25.input_layernorm.weight
512
+ language_model.model.layers.25.input_layernorm_moe_gen.weight
513
+ language_model.model.layers.25.mlp.down_proj.weight
514
+ language_model.model.layers.25.mlp.gate_proj.weight
515
+ language_model.model.layers.25.mlp.up_proj.weight
516
+ language_model.model.layers.25.mlp_moe_gen.down_proj.weight
517
+ language_model.model.layers.25.mlp_moe_gen.gate_proj.weight
518
+ language_model.model.layers.25.mlp_moe_gen.up_proj.weight
519
+ language_model.model.layers.25.post_attention_layernorm.weight
520
+ language_model.model.layers.25.post_attention_layernorm_moe_gen.weight
521
+ language_model.model.layers.25.self_attn.k_norm.weight
522
+ language_model.model.layers.25.self_attn.k_norm_moe_gen.weight
523
+ language_model.model.layers.25.self_attn.k_proj.bias
524
+ language_model.model.layers.25.self_attn.k_proj.weight
525
+ language_model.model.layers.25.self_attn.k_proj_moe_gen.bias
526
+ language_model.model.layers.25.self_attn.k_proj_moe_gen.weight
527
+ language_model.model.layers.25.self_attn.o_proj.weight
528
+ language_model.model.layers.25.self_attn.o_proj_moe_gen.weight
529
+ language_model.model.layers.25.self_attn.q_norm.weight
530
+ language_model.model.layers.25.self_attn.q_norm_moe_gen.weight
531
+ language_model.model.layers.25.self_attn.q_proj.bias
532
+ language_model.model.layers.25.self_attn.q_proj.weight
533
+ language_model.model.layers.25.self_attn.q_proj_moe_gen.bias
534
+ language_model.model.layers.25.self_attn.q_proj_moe_gen.weight
535
+ language_model.model.layers.25.self_attn.v_proj.bias
536
+ language_model.model.layers.25.self_attn.v_proj.weight
537
+ language_model.model.layers.25.self_attn.v_proj_moe_gen.bias
538
+ language_model.model.layers.25.self_attn.v_proj_moe_gen.weight
539
+ language_model.model.layers.26.input_layernorm.weight
540
+ language_model.model.layers.26.input_layernorm_moe_gen.weight
541
+ language_model.model.layers.26.mlp.down_proj.weight
542
+ language_model.model.layers.26.mlp.gate_proj.weight
543
+ language_model.model.layers.26.mlp.up_proj.weight
544
+ language_model.model.layers.26.mlp_moe_gen.down_proj.weight
545
+ language_model.model.layers.26.mlp_moe_gen.gate_proj.weight
546
+ language_model.model.layers.26.mlp_moe_gen.up_proj.weight
547
+ language_model.model.layers.26.post_attention_layernorm.weight
548
+ language_model.model.layers.26.post_attention_layernorm_moe_gen.weight
549
+ language_model.model.layers.26.self_attn.k_norm.weight
550
+ language_model.model.layers.26.self_attn.k_norm_moe_gen.weight
551
+ language_model.model.layers.26.self_attn.k_proj.bias
552
+ language_model.model.layers.26.self_attn.k_proj.weight
553
+ language_model.model.layers.26.self_attn.k_proj_moe_gen.bias
554
+ language_model.model.layers.26.self_attn.k_proj_moe_gen.weight
555
+ language_model.model.layers.26.self_attn.o_proj.weight
556
+ language_model.model.layers.26.self_attn.o_proj_moe_gen.weight
557
+ language_model.model.layers.26.self_attn.q_norm.weight
558
+ language_model.model.layers.26.self_attn.q_norm_moe_gen.weight
559
+ language_model.model.layers.26.self_attn.q_proj.bias
560
+ language_model.model.layers.26.self_attn.q_proj.weight
561
+ language_model.model.layers.26.self_attn.q_proj_moe_gen.bias
562
+ language_model.model.layers.26.self_attn.q_proj_moe_gen.weight
563
+ language_model.model.layers.26.self_attn.v_proj.bias
564
+ language_model.model.layers.26.self_attn.v_proj.weight
565
+ language_model.model.layers.26.self_attn.v_proj_moe_gen.bias
566
+ language_model.model.layers.26.self_attn.v_proj_moe_gen.weight
567
+ language_model.model.layers.27.input_layernorm.weight
568
+ language_model.model.layers.27.input_layernorm_moe_gen.weight
569
+ language_model.model.layers.27.mlp.down_proj.weight
570
+ language_model.model.layers.27.mlp.gate_proj.weight
571
+ language_model.model.layers.27.mlp.up_proj.weight
572
+ language_model.model.layers.27.mlp_moe_gen.down_proj.weight
573
+ language_model.model.layers.27.mlp_moe_gen.gate_proj.weight
574
+ language_model.model.layers.27.mlp_moe_gen.up_proj.weight
575
+ language_model.model.layers.27.post_attention_layernorm.weight
576
+ language_model.model.layers.27.post_attention_layernorm_moe_gen.weight
577
+ language_model.model.layers.27.self_attn.k_norm.weight
578
+ language_model.model.layers.27.self_attn.k_norm_moe_gen.weight
579
+ language_model.model.layers.27.self_attn.k_proj.bias
580
+ language_model.model.layers.27.self_attn.k_proj.weight
581
+ language_model.model.layers.27.self_attn.k_proj_moe_gen.bias
582
+ language_model.model.layers.27.self_attn.k_proj_moe_gen.weight
583
+ language_model.model.layers.27.self_attn.o_proj.weight
584
+ language_model.model.layers.27.self_attn.o_proj_moe_gen.weight
585
+ language_model.model.layers.27.self_attn.q_norm.weight
586
+ language_model.model.layers.27.self_attn.q_norm_moe_gen.weight
587
+ language_model.model.layers.27.self_attn.q_proj.bias
588
+ language_model.model.layers.27.self_attn.q_proj.weight
589
+ language_model.model.layers.27.self_attn.q_proj_moe_gen.bias
590
+ language_model.model.layers.27.self_attn.q_proj_moe_gen.weight
591
+ language_model.model.layers.27.self_attn.v_proj.bias
592
+ language_model.model.layers.27.self_attn.v_proj.weight
593
+ language_model.model.layers.27.self_attn.v_proj_moe_gen.bias
594
+ language_model.model.layers.27.self_attn.v_proj_moe_gen.weight
595
+ language_model.model.layers.3.input_layernorm.weight
596
+ language_model.model.layers.3.input_layernorm_moe_gen.weight
597
+ language_model.model.layers.3.mlp.down_proj.weight
598
+ language_model.model.layers.3.mlp.gate_proj.weight
599
+ language_model.model.layers.3.mlp.up_proj.weight
600
+ language_model.model.layers.3.mlp_moe_gen.down_proj.weight
601
+ language_model.model.layers.3.mlp_moe_gen.gate_proj.weight
602
+ language_model.model.layers.3.mlp_moe_gen.up_proj.weight
603
+ language_model.model.layers.3.post_attention_layernorm.weight
604
+ language_model.model.layers.3.post_attention_layernorm_moe_gen.weight
605
+ language_model.model.layers.3.self_attn.k_norm.weight
606
+ language_model.model.layers.3.self_attn.k_norm_moe_gen.weight
607
+ language_model.model.layers.3.self_attn.k_proj.bias
608
+ language_model.model.layers.3.self_attn.k_proj.weight
609
+ language_model.model.layers.3.self_attn.k_proj_moe_gen.bias
610
+ language_model.model.layers.3.self_attn.k_proj_moe_gen.weight
611
+ language_model.model.layers.3.self_attn.o_proj.weight
612
+ language_model.model.layers.3.self_attn.o_proj_moe_gen.weight
613
+ language_model.model.layers.3.self_attn.q_norm.weight
614
+ language_model.model.layers.3.self_attn.q_norm_moe_gen.weight
615
+ language_model.model.layers.3.self_attn.q_proj.bias
616
+ language_model.model.layers.3.self_attn.q_proj.weight
617
+ language_model.model.layers.3.self_attn.q_proj_moe_gen.bias
618
+ language_model.model.layers.3.self_attn.q_proj_moe_gen.weight
619
+ language_model.model.layers.3.self_attn.v_proj.bias
620
+ language_model.model.layers.3.self_attn.v_proj.weight
621
+ language_model.model.layers.3.self_attn.v_proj_moe_gen.bias
622
+ language_model.model.layers.3.self_attn.v_proj_moe_gen.weight
623
+ language_model.model.layers.4.input_layernorm.weight
624
+ language_model.model.layers.4.input_layernorm_moe_gen.weight
625
+ language_model.model.layers.4.mlp.down_proj.weight
626
+ language_model.model.layers.4.mlp.gate_proj.weight
627
+ language_model.model.layers.4.mlp.up_proj.weight
628
+ language_model.model.layers.4.mlp_moe_gen.down_proj.weight
629
+ language_model.model.layers.4.mlp_moe_gen.gate_proj.weight
630
+ language_model.model.layers.4.mlp_moe_gen.up_proj.weight
631
+ language_model.model.layers.4.post_attention_layernorm.weight
632
+ language_model.model.layers.4.post_attention_layernorm_moe_gen.weight
633
+ language_model.model.layers.4.self_attn.k_norm.weight
634
+ language_model.model.layers.4.self_attn.k_norm_moe_gen.weight
635
+ language_model.model.layers.4.self_attn.k_proj.bias
636
+ language_model.model.layers.4.self_attn.k_proj.weight
637
+ language_model.model.layers.4.self_attn.k_proj_moe_gen.bias
638
+ language_model.model.layers.4.self_attn.k_proj_moe_gen.weight
639
+ language_model.model.layers.4.self_attn.o_proj.weight
640
+ language_model.model.layers.4.self_attn.o_proj_moe_gen.weight
641
+ language_model.model.layers.4.self_attn.q_norm.weight
642
+ language_model.model.layers.4.self_attn.q_norm_moe_gen.weight
643
+ language_model.model.layers.4.self_attn.q_proj.bias
644
+ language_model.model.layers.4.self_attn.q_proj.weight
645
+ language_model.model.layers.4.self_attn.q_proj_moe_gen.bias
646
+ language_model.model.layers.4.self_attn.q_proj_moe_gen.weight
647
+ language_model.model.layers.4.self_attn.v_proj.bias
648
+ language_model.model.layers.4.self_attn.v_proj.weight
649
+ language_model.model.layers.4.self_attn.v_proj_moe_gen.bias
650
+ language_model.model.layers.4.self_attn.v_proj_moe_gen.weight
651
+ language_model.model.layers.5.input_layernorm.weight
652
+ language_model.model.layers.5.input_layernorm_moe_gen.weight
653
+ language_model.model.layers.5.mlp.down_proj.weight
654
+ language_model.model.layers.5.mlp.gate_proj.weight
655
+ language_model.model.layers.5.mlp.up_proj.weight
656
+ language_model.model.layers.5.mlp_moe_gen.down_proj.weight
657
+ language_model.model.layers.5.mlp_moe_gen.gate_proj.weight
658
+ language_model.model.layers.5.mlp_moe_gen.up_proj.weight
659
+ language_model.model.layers.5.post_attention_layernorm.weight
660
+ language_model.model.layers.5.post_attention_layernorm_moe_gen.weight
661
+ language_model.model.layers.5.self_attn.k_norm.weight
662
+ language_model.model.layers.5.self_attn.k_norm_moe_gen.weight
663
+ language_model.model.layers.5.self_attn.k_proj.bias
664
+ language_model.model.layers.5.self_attn.k_proj.weight
665
+ language_model.model.layers.5.self_attn.k_proj_moe_gen.bias
666
+ language_model.model.layers.5.self_attn.k_proj_moe_gen.weight
667
+ language_model.model.layers.5.self_attn.o_proj.weight
668
+ language_model.model.layers.5.self_attn.o_proj_moe_gen.weight
669
+ language_model.model.layers.5.self_attn.q_norm.weight
670
+ language_model.model.layers.5.self_attn.q_norm_moe_gen.weight
671
+ language_model.model.layers.5.self_attn.q_proj.bias
672
+ language_model.model.layers.5.self_attn.q_proj.weight
673
+ language_model.model.layers.5.self_attn.q_proj_moe_gen.bias
674
+ language_model.model.layers.5.self_attn.q_proj_moe_gen.weight
675
+ language_model.model.layers.5.self_attn.v_proj.bias
676
+ language_model.model.layers.5.self_attn.v_proj.weight
677
+ language_model.model.layers.5.self_attn.v_proj_moe_gen.bias
678
+ language_model.model.layers.5.self_attn.v_proj_moe_gen.weight
679
+ language_model.model.layers.6.input_layernorm.weight
680
+ language_model.model.layers.6.input_layernorm_moe_gen.weight
681
+ language_model.model.layers.6.mlp.down_proj.weight
682
+ language_model.model.layers.6.mlp.gate_proj.weight
683
+ language_model.model.layers.6.mlp.up_proj.weight
684
+ language_model.model.layers.6.mlp_moe_gen.down_proj.weight
685
+ language_model.model.layers.6.mlp_moe_gen.gate_proj.weight
686
+ language_model.model.layers.6.mlp_moe_gen.up_proj.weight
687
+ language_model.model.layers.6.post_attention_layernorm.weight
688
+ language_model.model.layers.6.post_attention_layernorm_moe_gen.weight
689
+ language_model.model.layers.6.self_attn.k_norm.weight
690
+ language_model.model.layers.6.self_attn.k_norm_moe_gen.weight
691
+ language_model.model.layers.6.self_attn.k_proj.bias
692
+ language_model.model.layers.6.self_attn.k_proj.weight
693
+ language_model.model.layers.6.self_attn.k_proj_moe_gen.bias
694
+ language_model.model.layers.6.self_attn.k_proj_moe_gen.weight
695
+ language_model.model.layers.6.self_attn.o_proj.weight
696
+ language_model.model.layers.6.self_attn.o_proj_moe_gen.weight
697
+ language_model.model.layers.6.self_attn.q_norm.weight
698
+ language_model.model.layers.6.self_attn.q_norm_moe_gen.weight
699
+ language_model.model.layers.6.self_attn.q_proj.bias
700
+ language_model.model.layers.6.self_attn.q_proj.weight
701
+ language_model.model.layers.6.self_attn.q_proj_moe_gen.bias
702
+ language_model.model.layers.6.self_attn.q_proj_moe_gen.weight
703
+ language_model.model.layers.6.self_attn.v_proj.bias
704
+ language_model.model.layers.6.self_attn.v_proj.weight
705
+ language_model.model.layers.6.self_attn.v_proj_moe_gen.bias
706
+ language_model.model.layers.6.self_attn.v_proj_moe_gen.weight
707
+ language_model.model.layers.7.input_layernorm.weight
708
+ language_model.model.layers.7.input_layernorm_moe_gen.weight
709
+ language_model.model.layers.7.mlp.down_proj.weight
710
+ language_model.model.layers.7.mlp.gate_proj.weight
711
+ language_model.model.layers.7.mlp.up_proj.weight
712
+ language_model.model.layers.7.mlp_moe_gen.down_proj.weight
713
+ language_model.model.layers.7.mlp_moe_gen.gate_proj.weight
714
+ language_model.model.layers.7.mlp_moe_gen.up_proj.weight
715
+ language_model.model.layers.7.post_attention_layernorm.weight
716
+ language_model.model.layers.7.post_attention_layernorm_moe_gen.weight
717
+ language_model.model.layers.7.self_attn.k_norm.weight
718
+ language_model.model.layers.7.self_attn.k_norm_moe_gen.weight
719
+ language_model.model.layers.7.self_attn.k_proj.bias
720
+ language_model.model.layers.7.self_attn.k_proj.weight
721
+ language_model.model.layers.7.self_attn.k_proj_moe_gen.bias
722
+ language_model.model.layers.7.self_attn.k_proj_moe_gen.weight
723
+ language_model.model.layers.7.self_attn.o_proj.weight
724
+ language_model.model.layers.7.self_attn.o_proj_moe_gen.weight
725
+ language_model.model.layers.7.self_attn.q_norm.weight
726
+ language_model.model.layers.7.self_attn.q_norm_moe_gen.weight
727
+ language_model.model.layers.7.self_attn.q_proj.bias
728
+ language_model.model.layers.7.self_attn.q_proj.weight
729
+ language_model.model.layers.7.self_attn.q_proj_moe_gen.bias
730
+ language_model.model.layers.7.self_attn.q_proj_moe_gen.weight
731
+ language_model.model.layers.7.self_attn.v_proj.bias
732
+ language_model.model.layers.7.self_attn.v_proj.weight
733
+ language_model.model.layers.7.self_attn.v_proj_moe_gen.bias
734
+ language_model.model.layers.7.self_attn.v_proj_moe_gen.weight
735
+ language_model.model.layers.8.input_layernorm.weight
736
+ language_model.model.layers.8.input_layernorm_moe_gen.weight
737
+ language_model.model.layers.8.mlp.down_proj.weight
738
+ language_model.model.layers.8.mlp.gate_proj.weight
739
+ language_model.model.layers.8.mlp.up_proj.weight
740
+ language_model.model.layers.8.mlp_moe_gen.down_proj.weight
741
+ language_model.model.layers.8.mlp_moe_gen.gate_proj.weight
742
+ language_model.model.layers.8.mlp_moe_gen.up_proj.weight
743
+ language_model.model.layers.8.post_attention_layernorm.weight
744
+ language_model.model.layers.8.post_attention_layernorm_moe_gen.weight
745
+ language_model.model.layers.8.self_attn.k_norm.weight
746
+ language_model.model.layers.8.self_attn.k_norm_moe_gen.weight
747
+ language_model.model.layers.8.self_attn.k_proj.bias
748
+ language_model.model.layers.8.self_attn.k_proj.weight
749
+ language_model.model.layers.8.self_attn.k_proj_moe_gen.bias
750
+ language_model.model.layers.8.self_attn.k_proj_moe_gen.weight
751
+ language_model.model.layers.8.self_attn.o_proj.weight
752
+ language_model.model.layers.8.self_attn.o_proj_moe_gen.weight
753
+ language_model.model.layers.8.self_attn.q_norm.weight
754
+ language_model.model.layers.8.self_attn.q_norm_moe_gen.weight
755
+ language_model.model.layers.8.self_attn.q_proj.bias
756
+ language_model.model.layers.8.self_attn.q_proj.weight
757
+ language_model.model.layers.8.self_attn.q_proj_moe_gen.bias
758
+ language_model.model.layers.8.self_attn.q_proj_moe_gen.weight
759
+ language_model.model.layers.8.self_attn.v_proj.bias
760
+ language_model.model.layers.8.self_attn.v_proj.weight
761
+ language_model.model.layers.8.self_attn.v_proj_moe_gen.bias
762
+ language_model.model.layers.8.self_attn.v_proj_moe_gen.weight
763
+ language_model.model.layers.9.input_layernorm.weight
764
+ language_model.model.layers.9.input_layernorm_moe_gen.weight
765
+ language_model.model.layers.9.mlp.down_proj.weight
766
+ language_model.model.layers.9.mlp.gate_proj.weight
767
+ language_model.model.layers.9.mlp.up_proj.weight
768
+ language_model.model.layers.9.mlp_moe_gen.down_proj.weight
769
+ language_model.model.layers.9.mlp_moe_gen.gate_proj.weight
770
+ language_model.model.layers.9.mlp_moe_gen.up_proj.weight
771
+ language_model.model.layers.9.post_attention_layernorm.weight
772
+ language_model.model.layers.9.post_attention_layernorm_moe_gen.weight
773
+ language_model.model.layers.9.self_attn.k_norm.weight
774
+ language_model.model.layers.9.self_attn.k_norm_moe_gen.weight
775
+ language_model.model.layers.9.self_attn.k_proj.bias
776
+ language_model.model.layers.9.self_attn.k_proj.weight
777
+ language_model.model.layers.9.self_attn.k_proj_moe_gen.bias
778
+ language_model.model.layers.9.self_attn.k_proj_moe_gen.weight
779
+ language_model.model.layers.9.self_attn.o_proj.weight
780
+ language_model.model.layers.9.self_attn.o_proj_moe_gen.weight
781
+ language_model.model.layers.9.self_attn.q_norm.weight
782
+ language_model.model.layers.9.self_attn.q_norm_moe_gen.weight
783
+ language_model.model.layers.9.self_attn.q_proj.bias
784
+ language_model.model.layers.9.self_attn.q_proj.weight
785
+ language_model.model.layers.9.self_attn.q_proj_moe_gen.bias
786
+ language_model.model.layers.9.self_attn.q_proj_moe_gen.weight
787
+ language_model.model.layers.9.self_attn.v_proj.bias
788
+ language_model.model.layers.9.self_attn.v_proj.weight
789
+ language_model.model.layers.9.self_attn.v_proj_moe_gen.bias
790
+ language_model.model.layers.9.self_attn.v_proj_moe_gen.weight
791
+ language_model.model.norm.weight
792
+ language_model.model.norm_moe_gen.weight
793
+ latent_pos_embed.pos_embed
794
+ llm2vae.bias
795
+ llm2vae.weight
796
+ time_embedder.mlp.0.bias
797
+ time_embedder.mlp.0.weight
798
+ time_embedder.mlp.2.bias
799
+ time_embedder.mlp.2.weight
800
+ vae2llm.bias
801
+ vae2llm.weight
802
+ vit_model.vision_model.embeddings.patch_embedding.bias
803
+ vit_model.vision_model.embeddings.patch_embedding.weight
804
+ vit_model.vision_model.embeddings.position_embedding.weight
805
+ vit_model.vision_model.encoder.layers.0.layer_norm1.bias
806
+ vit_model.vision_model.encoder.layers.0.layer_norm1.weight
807
+ vit_model.vision_model.encoder.layers.0.layer_norm2.bias
808
+ vit_model.vision_model.encoder.layers.0.layer_norm2.weight
809
+ vit_model.vision_model.encoder.layers.0.mlp.fc1.bias
810
+ vit_model.vision_model.encoder.layers.0.mlp.fc1.weight
811
+ vit_model.vision_model.encoder.layers.0.mlp.fc2.bias
812
+ vit_model.vision_model.encoder.layers.0.mlp.fc2.weight
813
+ vit_model.vision_model.encoder.layers.0.self_attn.k_proj.bias
814
+ vit_model.vision_model.encoder.layers.0.self_attn.k_proj.weight
815
+ vit_model.vision_model.encoder.layers.0.self_attn.out_proj.bias
816
+ vit_model.vision_model.encoder.layers.0.self_attn.out_proj.weight
817
+ vit_model.vision_model.encoder.layers.0.self_attn.q_proj.bias
818
+ vit_model.vision_model.encoder.layers.0.self_attn.q_proj.weight
819
+ vit_model.vision_model.encoder.layers.0.self_attn.v_proj.bias
820
+ vit_model.vision_model.encoder.layers.0.self_attn.v_proj.weight
821
+ vit_model.vision_model.encoder.layers.1.layer_norm1.bias
822
+ vit_model.vision_model.encoder.layers.1.layer_norm1.weight
823
+ vit_model.vision_model.encoder.layers.1.layer_norm2.bias
824
+ vit_model.vision_model.encoder.layers.1.layer_norm2.weight
825
+ vit_model.vision_model.encoder.layers.1.mlp.fc1.bias
826
+ vit_model.vision_model.encoder.layers.1.mlp.fc1.weight
827
+ vit_model.vision_model.encoder.layers.1.mlp.fc2.bias
828
+ vit_model.vision_model.encoder.layers.1.mlp.fc2.weight
829
+ vit_model.vision_model.encoder.layers.1.self_attn.k_proj.bias
830
+ vit_model.vision_model.encoder.layers.1.self_attn.k_proj.weight
831
+ vit_model.vision_model.encoder.layers.1.self_attn.out_proj.bias
832
+ vit_model.vision_model.encoder.layers.1.self_attn.out_proj.weight
833
+ vit_model.vision_model.encoder.layers.1.self_attn.q_proj.bias
834
+ vit_model.vision_model.encoder.layers.1.self_attn.q_proj.weight
835
+ vit_model.vision_model.encoder.layers.1.self_attn.v_proj.bias
836
+ vit_model.vision_model.encoder.layers.1.self_attn.v_proj.weight
837
+ vit_model.vision_model.encoder.layers.10.layer_norm1.bias
838
+ vit_model.vision_model.encoder.layers.10.layer_norm1.weight
839
+ vit_model.vision_model.encoder.layers.10.layer_norm2.bias
840
+ vit_model.vision_model.encoder.layers.10.layer_norm2.weight
841
+ vit_model.vision_model.encoder.layers.10.mlp.fc1.bias
842
+ vit_model.vision_model.encoder.layers.10.mlp.fc1.weight
843
+ vit_model.vision_model.encoder.layers.10.mlp.fc2.bias
844
+ vit_model.vision_model.encoder.layers.10.mlp.fc2.weight
845
+ vit_model.vision_model.encoder.layers.10.self_attn.k_proj.bias
846
+ vit_model.vision_model.encoder.layers.10.self_attn.k_proj.weight
847
+ vit_model.vision_model.encoder.layers.10.self_attn.out_proj.bias
848
+ vit_model.vision_model.encoder.layers.10.self_attn.out_proj.weight
849
+ vit_model.vision_model.encoder.layers.10.self_attn.q_proj.bias
850
+ vit_model.vision_model.encoder.layers.10.self_attn.q_proj.weight
851
+ vit_model.vision_model.encoder.layers.10.self_attn.v_proj.bias
852
+ vit_model.vision_model.encoder.layers.10.self_attn.v_proj.weight
853
+ vit_model.vision_model.encoder.layers.11.layer_norm1.bias
854
+ vit_model.vision_model.encoder.layers.11.layer_norm1.weight
855
+ vit_model.vision_model.encoder.layers.11.layer_norm2.bias
856
+ vit_model.vision_model.encoder.layers.11.layer_norm2.weight
857
+ vit_model.vision_model.encoder.layers.11.mlp.fc1.bias
858
+ vit_model.vision_model.encoder.layers.11.mlp.fc1.weight
859
+ vit_model.vision_model.encoder.layers.11.mlp.fc2.bias
860
+ vit_model.vision_model.encoder.layers.11.mlp.fc2.weight
861
+ vit_model.vision_model.encoder.layers.11.self_attn.k_proj.bias
862
+ vit_model.vision_model.encoder.layers.11.self_attn.k_proj.weight
863
+ vit_model.vision_model.encoder.layers.11.self_attn.out_proj.bias
864
+ vit_model.vision_model.encoder.layers.11.self_attn.out_proj.weight
865
+ vit_model.vision_model.encoder.layers.11.self_attn.q_proj.bias
866
+ vit_model.vision_model.encoder.layers.11.self_attn.q_proj.weight
867
+ vit_model.vision_model.encoder.layers.11.self_attn.v_proj.bias
868
+ vit_model.vision_model.encoder.layers.11.self_attn.v_proj.weight
869
+ vit_model.vision_model.encoder.layers.12.layer_norm1.bias
870
+ vit_model.vision_model.encoder.layers.12.layer_norm1.weight
871
+ vit_model.vision_model.encoder.layers.12.layer_norm2.bias
872
+ vit_model.vision_model.encoder.layers.12.layer_norm2.weight
873
+ vit_model.vision_model.encoder.layers.12.mlp.fc1.bias
874
+ vit_model.vision_model.encoder.layers.12.mlp.fc1.weight
875
+ vit_model.vision_model.encoder.layers.12.mlp.fc2.bias
876
+ vit_model.vision_model.encoder.layers.12.mlp.fc2.weight
877
+ vit_model.vision_model.encoder.layers.12.self_attn.k_proj.bias
878
+ vit_model.vision_model.encoder.layers.12.self_attn.k_proj.weight
879
+ vit_model.vision_model.encoder.layers.12.self_attn.out_proj.bias
880
+ vit_model.vision_model.encoder.layers.12.self_attn.out_proj.weight
881
+ vit_model.vision_model.encoder.layers.12.self_attn.q_proj.bias
882
+ vit_model.vision_model.encoder.layers.12.self_attn.q_proj.weight
883
+ vit_model.vision_model.encoder.layers.12.self_attn.v_proj.bias
884
+ vit_model.vision_model.encoder.layers.12.self_attn.v_proj.weight
885
+ vit_model.vision_model.encoder.layers.13.layer_norm1.bias
886
+ vit_model.vision_model.encoder.layers.13.layer_norm1.weight
887
+ vit_model.vision_model.encoder.layers.13.layer_norm2.bias
888
+ vit_model.vision_model.encoder.layers.13.layer_norm2.weight
889
+ vit_model.vision_model.encoder.layers.13.mlp.fc1.bias
890
+ vit_model.vision_model.encoder.layers.13.mlp.fc1.weight
891
+ vit_model.vision_model.encoder.layers.13.mlp.fc2.bias
892
+ vit_model.vision_model.encoder.layers.13.mlp.fc2.weight
893
+ vit_model.vision_model.encoder.layers.13.self_attn.k_proj.bias
894
+ vit_model.vision_model.encoder.layers.13.self_attn.k_proj.weight
895
+ vit_model.vision_model.encoder.layers.13.self_attn.out_proj.bias
896
+ vit_model.vision_model.encoder.layers.13.self_attn.out_proj.weight
897
+ vit_model.vision_model.encoder.layers.13.self_attn.q_proj.bias
898
+ vit_model.vision_model.encoder.layers.13.self_attn.q_proj.weight
899
+ vit_model.vision_model.encoder.layers.13.self_attn.v_proj.bias
900
+ vit_model.vision_model.encoder.layers.13.self_attn.v_proj.weight
901
+ vit_model.vision_model.encoder.layers.14.layer_norm1.bias
902
+ vit_model.vision_model.encoder.layers.14.layer_norm1.weight
903
+ vit_model.vision_model.encoder.layers.14.layer_norm2.bias
904
+ vit_model.vision_model.encoder.layers.14.layer_norm2.weight
905
+ vit_model.vision_model.encoder.layers.14.mlp.fc1.bias
906
+ vit_model.vision_model.encoder.layers.14.mlp.fc1.weight
907
+ vit_model.vision_model.encoder.layers.14.mlp.fc2.bias
908
+ vit_model.vision_model.encoder.layers.14.mlp.fc2.weight
909
+ vit_model.vision_model.encoder.layers.14.self_attn.k_proj.bias
910
+ vit_model.vision_model.encoder.layers.14.self_attn.k_proj.weight
911
+ vit_model.vision_model.encoder.layers.14.self_attn.out_proj.bias
912
+ vit_model.vision_model.encoder.layers.14.self_attn.out_proj.weight
913
+ vit_model.vision_model.encoder.layers.14.self_attn.q_proj.bias
914
+ vit_model.vision_model.encoder.layers.14.self_attn.q_proj.weight
915
+ vit_model.vision_model.encoder.layers.14.self_attn.v_proj.bias
916
+ vit_model.vision_model.encoder.layers.14.self_attn.v_proj.weight
917
+ vit_model.vision_model.encoder.layers.15.layer_norm1.bias
918
+ vit_model.vision_model.encoder.layers.15.layer_norm1.weight
919
+ vit_model.vision_model.encoder.layers.15.layer_norm2.bias
920
+ vit_model.vision_model.encoder.layers.15.layer_norm2.weight
921
+ vit_model.vision_model.encoder.layers.15.mlp.fc1.bias
922
+ vit_model.vision_model.encoder.layers.15.mlp.fc1.weight
923
+ vit_model.vision_model.encoder.layers.15.mlp.fc2.bias
924
+ vit_model.vision_model.encoder.layers.15.mlp.fc2.weight
925
+ vit_model.vision_model.encoder.layers.15.self_attn.k_proj.bias
926
+ vit_model.vision_model.encoder.layers.15.self_attn.k_proj.weight
927
+ vit_model.vision_model.encoder.layers.15.self_attn.out_proj.bias
928
+ vit_model.vision_model.encoder.layers.15.self_attn.out_proj.weight
929
+ vit_model.vision_model.encoder.layers.15.self_attn.q_proj.bias
930
+ vit_model.vision_model.encoder.layers.15.self_attn.q_proj.weight
931
+ vit_model.vision_model.encoder.layers.15.self_attn.v_proj.bias
932
+ vit_model.vision_model.encoder.layers.15.self_attn.v_proj.weight
933
+ vit_model.vision_model.encoder.layers.16.layer_norm1.bias
934
+ vit_model.vision_model.encoder.layers.16.layer_norm1.weight
935
+ vit_model.vision_model.encoder.layers.16.layer_norm2.bias
936
+ vit_model.vision_model.encoder.layers.16.layer_norm2.weight
937
+ vit_model.vision_model.encoder.layers.16.mlp.fc1.bias
938
+ vit_model.vision_model.encoder.layers.16.mlp.fc1.weight
939
+ vit_model.vision_model.encoder.layers.16.mlp.fc2.bias
940
+ vit_model.vision_model.encoder.layers.16.mlp.fc2.weight
941
+ vit_model.vision_model.encoder.layers.16.self_attn.k_proj.bias
942
+ vit_model.vision_model.encoder.layers.16.self_attn.k_proj.weight
943
+ vit_model.vision_model.encoder.layers.16.self_attn.out_proj.bias
944
+ vit_model.vision_model.encoder.layers.16.self_attn.out_proj.weight
945
+ vit_model.vision_model.encoder.layers.16.self_attn.q_proj.bias
946
+ vit_model.vision_model.encoder.layers.16.self_attn.q_proj.weight
947
+ vit_model.vision_model.encoder.layers.16.self_attn.v_proj.bias
948
+ vit_model.vision_model.encoder.layers.16.self_attn.v_proj.weight
949
+ vit_model.vision_model.encoder.layers.17.layer_norm1.bias
950
+ vit_model.vision_model.encoder.layers.17.layer_norm1.weight
951
+ vit_model.vision_model.encoder.layers.17.layer_norm2.bias
952
+ vit_model.vision_model.encoder.layers.17.layer_norm2.weight
953
+ vit_model.vision_model.encoder.layers.17.mlp.fc1.bias
954
+ vit_model.vision_model.encoder.layers.17.mlp.fc1.weight
955
+ vit_model.vision_model.encoder.layers.17.mlp.fc2.bias
956
+ vit_model.vision_model.encoder.layers.17.mlp.fc2.weight
957
+ vit_model.vision_model.encoder.layers.17.self_attn.k_proj.bias
958
+ vit_model.vision_model.encoder.layers.17.self_attn.k_proj.weight
959
+ vit_model.vision_model.encoder.layers.17.self_attn.out_proj.bias
960
+ vit_model.vision_model.encoder.layers.17.self_attn.out_proj.weight
961
+ vit_model.vision_model.encoder.layers.17.self_attn.q_proj.bias
962
+ vit_model.vision_model.encoder.layers.17.self_attn.q_proj.weight
963
+ vit_model.vision_model.encoder.layers.17.self_attn.v_proj.bias
964
+ vit_model.vision_model.encoder.layers.17.self_attn.v_proj.weight
965
+ vit_model.vision_model.encoder.layers.18.layer_norm1.bias
966
+ vit_model.vision_model.encoder.layers.18.layer_norm1.weight
967
+ vit_model.vision_model.encoder.layers.18.layer_norm2.bias
968
+ vit_model.vision_model.encoder.layers.18.layer_norm2.weight
969
+ vit_model.vision_model.encoder.layers.18.mlp.fc1.bias
970
+ vit_model.vision_model.encoder.layers.18.mlp.fc1.weight
971
+ vit_model.vision_model.encoder.layers.18.mlp.fc2.bias
972
+ vit_model.vision_model.encoder.layers.18.mlp.fc2.weight
973
+ vit_model.vision_model.encoder.layers.18.self_attn.k_proj.bias
974
+ vit_model.vision_model.encoder.layers.18.self_attn.k_proj.weight
975
+ vit_model.vision_model.encoder.layers.18.self_attn.out_proj.bias
976
+ vit_model.vision_model.encoder.layers.18.self_attn.out_proj.weight
977
+ vit_model.vision_model.encoder.layers.18.self_attn.q_proj.bias
978
+ vit_model.vision_model.encoder.layers.18.self_attn.q_proj.weight
979
+ vit_model.vision_model.encoder.layers.18.self_attn.v_proj.bias
980
+ vit_model.vision_model.encoder.layers.18.self_attn.v_proj.weight
981
+ vit_model.vision_model.encoder.layers.19.layer_norm1.bias
982
+ vit_model.vision_model.encoder.layers.19.layer_norm1.weight
983
+ vit_model.vision_model.encoder.layers.19.layer_norm2.bias
984
+ vit_model.vision_model.encoder.layers.19.layer_norm2.weight
985
+ vit_model.vision_model.encoder.layers.19.mlp.fc1.bias
986
+ vit_model.vision_model.encoder.layers.19.mlp.fc1.weight
987
+ vit_model.vision_model.encoder.layers.19.mlp.fc2.bias
988
+ vit_model.vision_model.encoder.layers.19.mlp.fc2.weight
989
+ vit_model.vision_model.encoder.layers.19.self_attn.k_proj.bias
990
+ vit_model.vision_model.encoder.layers.19.self_attn.k_proj.weight
991
+ vit_model.vision_model.encoder.layers.19.self_attn.out_proj.bias
992
+ vit_model.vision_model.encoder.layers.19.self_attn.out_proj.weight
993
+ vit_model.vision_model.encoder.layers.19.self_attn.q_proj.bias
994
+ vit_model.vision_model.encoder.layers.19.self_attn.q_proj.weight
995
+ vit_model.vision_model.encoder.layers.19.self_attn.v_proj.bias
996
+ vit_model.vision_model.encoder.layers.19.self_attn.v_proj.weight
997
+ vit_model.vision_model.encoder.layers.2.layer_norm1.bias
998
+ vit_model.vision_model.encoder.layers.2.layer_norm1.weight
999
+ vit_model.vision_model.encoder.layers.2.layer_norm2.bias
1000
+ vit_model.vision_model.encoder.layers.2.layer_norm2.weight
1001
+ vit_model.vision_model.encoder.layers.2.mlp.fc1.bias
1002
+ vit_model.vision_model.encoder.layers.2.mlp.fc1.weight
1003
+ vit_model.vision_model.encoder.layers.2.mlp.fc2.bias
1004
+ vit_model.vision_model.encoder.layers.2.mlp.fc2.weight
1005
+ vit_model.vision_model.encoder.layers.2.self_attn.k_proj.bias
1006
+ vit_model.vision_model.encoder.layers.2.self_attn.k_proj.weight
1007
+ vit_model.vision_model.encoder.layers.2.self_attn.out_proj.bias
1008
+ vit_model.vision_model.encoder.layers.2.self_attn.out_proj.weight
1009
+ vit_model.vision_model.encoder.layers.2.self_attn.q_proj.bias
1010
+ vit_model.vision_model.encoder.layers.2.self_attn.q_proj.weight
1011
+ vit_model.vision_model.encoder.layers.2.self_attn.v_proj.bias
1012
+ vit_model.vision_model.encoder.layers.2.self_attn.v_proj.weight
1013
+ vit_model.vision_model.encoder.layers.20.layer_norm1.bias
1014
+ vit_model.vision_model.encoder.layers.20.layer_norm1.weight
1015
+ vit_model.vision_model.encoder.layers.20.layer_norm2.bias
1016
+ vit_model.vision_model.encoder.layers.20.layer_norm2.weight
1017
+ vit_model.vision_model.encoder.layers.20.mlp.fc1.bias
1018
+ vit_model.vision_model.encoder.layers.20.mlp.fc1.weight
1019
+ vit_model.vision_model.encoder.layers.20.mlp.fc2.bias
1020
+ vit_model.vision_model.encoder.layers.20.mlp.fc2.weight
1021
+ vit_model.vision_model.encoder.layers.20.self_attn.k_proj.bias
1022
+ vit_model.vision_model.encoder.layers.20.self_attn.k_proj.weight
1023
+ vit_model.vision_model.encoder.layers.20.self_attn.out_proj.bias
1024
+ vit_model.vision_model.encoder.layers.20.self_attn.out_proj.weight
1025
+ vit_model.vision_model.encoder.layers.20.self_attn.q_proj.bias
1026
+ vit_model.vision_model.encoder.layers.20.self_attn.q_proj.weight
1027
+ vit_model.vision_model.encoder.layers.20.self_attn.v_proj.bias
1028
+ vit_model.vision_model.encoder.layers.20.self_attn.v_proj.weight
1029
+ vit_model.vision_model.encoder.layers.21.layer_norm1.bias
1030
+ vit_model.vision_model.encoder.layers.21.layer_norm1.weight
1031
+ vit_model.vision_model.encoder.layers.21.layer_norm2.bias
1032
+ vit_model.vision_model.encoder.layers.21.layer_norm2.weight
1033
+ vit_model.vision_model.encoder.layers.21.mlp.fc1.bias
1034
+ vit_model.vision_model.encoder.layers.21.mlp.fc1.weight
1035
+ vit_model.vision_model.encoder.layers.21.mlp.fc2.bias
1036
+ vit_model.vision_model.encoder.layers.21.mlp.fc2.weight
1037
+ vit_model.vision_model.encoder.layers.21.self_attn.k_proj.bias
1038
+ vit_model.vision_model.encoder.layers.21.self_attn.k_proj.weight
1039
+ vit_model.vision_model.encoder.layers.21.self_attn.out_proj.bias
1040
+ vit_model.vision_model.encoder.layers.21.self_attn.out_proj.weight
1041
+ vit_model.vision_model.encoder.layers.21.self_attn.q_proj.bias
1042
+ vit_model.vision_model.encoder.layers.21.self_attn.q_proj.weight
1043
+ vit_model.vision_model.encoder.layers.21.self_attn.v_proj.bias
1044
+ vit_model.vision_model.encoder.layers.21.self_attn.v_proj.weight
1045
+ vit_model.vision_model.encoder.layers.22.layer_norm1.bias
1046
+ vit_model.vision_model.encoder.layers.22.layer_norm1.weight
1047
+ vit_model.vision_model.encoder.layers.22.layer_norm2.bias
1048
+ vit_model.vision_model.encoder.layers.22.layer_norm2.weight
1049
+ vit_model.vision_model.encoder.layers.22.mlp.fc1.bias
1050
+ vit_model.vision_model.encoder.layers.22.mlp.fc1.weight
1051
+ vit_model.vision_model.encoder.layers.22.mlp.fc2.bias
1052
+ vit_model.vision_model.encoder.layers.22.mlp.fc2.weight
1053
+ vit_model.vision_model.encoder.layers.22.self_attn.k_proj.bias
1054
+ vit_model.vision_model.encoder.layers.22.self_attn.k_proj.weight
1055
+ vit_model.vision_model.encoder.layers.22.self_attn.out_proj.bias
1056
+ vit_model.vision_model.encoder.layers.22.self_attn.out_proj.weight
1057
+ vit_model.vision_model.encoder.layers.22.self_attn.q_proj.bias
1058
+ vit_model.vision_model.encoder.layers.22.self_attn.q_proj.weight
1059
+ vit_model.vision_model.encoder.layers.22.self_attn.v_proj.bias
1060
+ vit_model.vision_model.encoder.layers.22.self_attn.v_proj.weight
1061
+ vit_model.vision_model.encoder.layers.23.layer_norm1.bias
1062
+ vit_model.vision_model.encoder.layers.23.layer_norm1.weight
1063
+ vit_model.vision_model.encoder.layers.23.layer_norm2.bias
1064
+ vit_model.vision_model.encoder.layers.23.layer_norm2.weight
1065
+ vit_model.vision_model.encoder.layers.23.mlp.fc1.bias
1066
+ vit_model.vision_model.encoder.layers.23.mlp.fc1.weight
1067
+ vit_model.vision_model.encoder.layers.23.mlp.fc2.bias
1068
+ vit_model.vision_model.encoder.layers.23.mlp.fc2.weight
1069
+ vit_model.vision_model.encoder.layers.23.self_attn.k_proj.bias
1070
+ vit_model.vision_model.encoder.layers.23.self_attn.k_proj.weight
1071
+ vit_model.vision_model.encoder.layers.23.self_attn.out_proj.bias
1072
+ vit_model.vision_model.encoder.layers.23.self_attn.out_proj.weight
1073
+ vit_model.vision_model.encoder.layers.23.self_attn.q_proj.bias
1074
+ vit_model.vision_model.encoder.layers.23.self_attn.q_proj.weight
1075
+ vit_model.vision_model.encoder.layers.23.self_attn.v_proj.bias
1076
+ vit_model.vision_model.encoder.layers.23.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.24.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.24.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.24.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.24.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.24.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.24.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.24.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.24.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.24.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.24.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.24.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.24.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.24.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.24.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.24.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.24.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.25.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.25.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.25.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.25.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.25.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.25.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.25.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.25.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.25.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.25.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.25.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.25.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.25.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.25.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.25.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.25.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.3.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.3.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.3.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.3.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.3.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.3.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.3.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.3.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.3.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.3.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.3.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.3.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.3.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.3.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.3.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.3.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.4.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.4.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.4.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.4.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.4.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.4.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.4.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.4.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.4.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.4.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.4.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.4.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.4.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.4.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.4.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.4.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.5.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.5.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.5.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.5.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.5.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.5.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.5.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.5.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.5.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.5.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.5.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.5.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.5.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.5.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.5.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.5.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.6.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.6.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.6.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.6.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.6.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.6.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.6.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.6.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.6.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.6.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.6.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.6.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.6.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.6.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.6.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.6.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.7.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.7.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.7.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.7.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.7.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.7.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.7.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.7.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.7.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.7.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.7.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.7.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.7.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.7.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.7.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.7.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.8.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.8.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.8.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.8.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.8.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.8.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.8.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.8.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.8.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.8.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.8.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.8.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.8.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.8.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.8.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.8.self_attn.v_proj.weight
+ vit_model.vision_model.encoder.layers.9.layer_norm1.bias
+ vit_model.vision_model.encoder.layers.9.layer_norm1.weight
+ vit_model.vision_model.encoder.layers.9.layer_norm2.bias
+ vit_model.vision_model.encoder.layers.9.layer_norm2.weight
+ vit_model.vision_model.encoder.layers.9.mlp.fc1.bias
+ vit_model.vision_model.encoder.layers.9.mlp.fc1.weight
+ vit_model.vision_model.encoder.layers.9.mlp.fc2.bias
+ vit_model.vision_model.encoder.layers.9.mlp.fc2.weight
+ vit_model.vision_model.encoder.layers.9.self_attn.k_proj.bias
+ vit_model.vision_model.encoder.layers.9.self_attn.k_proj.weight
+ vit_model.vision_model.encoder.layers.9.self_attn.out_proj.bias
+ vit_model.vision_model.encoder.layers.9.self_attn.out_proj.weight
+ vit_model.vision_model.encoder.layers.9.self_attn.q_proj.bias
+ vit_model.vision_model.encoder.layers.9.self_attn.q_proj.weight
+ vit_model.vision_model.encoder.layers.9.self_attn.v_proj.bias
+ vit_model.vision_model.encoder.layers.9.self_attn.v_proj.weight
+ vit_model.vision_model.post_layernorm.bias
+ vit_model.vision_model.post_layernorm.weight
+ vit_pos_embed.pos_embed
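The names above are per-layer ViT encoder parameters (layer norms, MLP projections, and attention q/k/v/out projections). As a reference point only, here is a minimal sketch of how such keys can be inspected in a safetensors checkpoint with the `safetensors` library; the filename `ae.safetensors` and the assumption that the metadata names match the tensor keys stored in that file are assumptions, not something this commit states.

```python
# Minimal sketch: list tensor names and shapes stored in a safetensors file.
# Assumption: "ae.safetensors" in this repo uses keys like the ones listed above
# (e.g. "vit_model.vision_model.encoder.layers.24.mlp.fc1.weight").
from safetensors import safe_open

with safe_open("ae.safetensors", framework="pt", device="cpu") as f:
    vit_keys = [k for k in f.keys() if k.startswith("vit_model.")]
    for name in sorted(vit_keys):
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape))
```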
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
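The file above is a standard Qwen2 tokenizer configuration: ChatML-style control tokens registered in `added_tokens_decoder`, vision/tool markers as additional special tokens, and an embedded Jinja chat template. As a hedged reference, the sketch below shows how such a config is typically consumed with Hugging Face `transformers`; loading from the repo's local folder (`"./"`) is an assumption for illustration.

```python
# Minimal sketch: load the tokenizer defined by tokenizer_config.json /
# tokenizer.json / vocab.json and render a prompt with its chat template.
# Assumption: the tokenizer files live in the current directory ("./").
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Describe this image."},
]

# Uses the "chat_template" field shown above; add_generation_prompt appends
# the trailing "<|im_start|>assistant\n" so the model continues as assistant.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
print("eos:", tokenizer.eos_token, "pad:", tokenizer.pad_token)
```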
vit_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "hidden_size": 1152,
+ "image_size": 980,
+ "intermediate_size": 4304,
+ "model_type": "siglip_vision_model",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 27,
+ "patch_size": 14
+ }
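The vit_config.json above describes a SigLIP-style vision tower (27 encoder layers, hidden size 1152, 14-pixel patches at 980x980 input), which matches the `vit_model.vision_model.*` parameter names listed earlier. Below is a minimal sketch of instantiating an encoder with those dimensions via `transformers`' SiglipVisionConfig; whether this repo's own code builds the tower exactly this way is an assumption, not stated in the commit.

```python
# Minimal sketch: build a SigLIP vision encoder from the fields in vit_config.json.
# Assumption (not verified here): the repo's ViT is compatible with
# transformers' SiglipVisionModel; values are copied from the file above.
import json
from transformers import SiglipVisionConfig, SiglipVisionModel

with open("vit_config.json") as f:
    cfg = json.load(f)

config = SiglipVisionConfig(
    hidden_size=cfg["hidden_size"],              # 1152
    intermediate_size=cfg["intermediate_size"],  # 4304
    num_hidden_layers=cfg["num_hidden_layers"],  # 27
    num_attention_heads=cfg["num_attention_heads"],  # 16
    image_size=cfg["image_size"],                # 980 -> 70x70 patch grid
    patch_size=cfg["patch_size"],                # 14
)
vit = SiglipVisionModel(config)
print(sum(p.numel() for p in vit.parameters()))  # rough parameter count
```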
vocab.json ADDED
The diff for this file is too large to render. See raw diff