OpenTransformer committed on
Commit cc7ab1e · verified · 1 Parent(s): 834d8e0

Add files using upload-large-folder tool

deepseek-r1-1.5b-gunary/manifest.json ADDED
@@ -0,0 +1,1221 @@
{
  "unary": {
    "model.layers.0.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.0.self_attn.k_proj.weight": [256, 1536],
    "model.layers.0.self_attn.v_proj.weight": [256, 1536],
    "model.layers.0.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.0.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.0.mlp.up_proj.weight": [8960, 1536],
    "model.layers.0.mlp.down_proj.weight": [1536, 8960],
    "model.layers.1.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.1.self_attn.k_proj.weight": [256, 1536],
    "model.layers.1.self_attn.v_proj.weight": [256, 1536],
    "model.layers.1.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.1.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.1.mlp.up_proj.weight": [8960, 1536],
    "model.layers.1.mlp.down_proj.weight": [1536, 8960],
    "model.layers.2.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.2.self_attn.k_proj.weight": [256, 1536],
    "model.layers.2.self_attn.v_proj.weight": [256, 1536],
    "model.layers.2.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.2.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.2.mlp.up_proj.weight": [8960, 1536],
    "model.layers.2.mlp.down_proj.weight": [1536, 8960],
    "model.layers.3.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.3.self_attn.k_proj.weight": [256, 1536],
    "model.layers.3.self_attn.v_proj.weight": [256, 1536],
    "model.layers.3.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.3.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.3.mlp.up_proj.weight": [8960, 1536],
    "model.layers.3.mlp.down_proj.weight": [1536, 8960],
    "model.layers.4.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.4.self_attn.k_proj.weight": [256, 1536],
    "model.layers.4.self_attn.v_proj.weight": [256, 1536],
    "model.layers.4.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.4.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.4.mlp.up_proj.weight": [8960, 1536],
    "model.layers.4.mlp.down_proj.weight": [1536, 8960],
    "model.layers.5.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.5.self_attn.k_proj.weight": [256, 1536],
    "model.layers.5.self_attn.v_proj.weight": [256, 1536],
    "model.layers.5.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.5.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.5.mlp.up_proj.weight": [8960, 1536],
    "model.layers.5.mlp.down_proj.weight": [1536, 8960],
    "model.layers.6.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.6.self_attn.k_proj.weight": [256, 1536],
    "model.layers.6.self_attn.v_proj.weight": [256, 1536],
    "model.layers.6.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.6.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.6.mlp.up_proj.weight": [8960, 1536],
    "model.layers.6.mlp.down_proj.weight": [1536, 8960],
    "model.layers.7.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.7.self_attn.k_proj.weight": [256, 1536],
    "model.layers.7.self_attn.v_proj.weight": [256, 1536],
    "model.layers.7.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.7.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.7.mlp.up_proj.weight": [8960, 1536],
    "model.layers.7.mlp.down_proj.weight": [1536, 8960],
    "model.layers.8.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.8.self_attn.k_proj.weight": [256, 1536],
    "model.layers.8.self_attn.v_proj.weight": [256, 1536],
    "model.layers.8.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.8.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.8.mlp.up_proj.weight": [8960, 1536],
    "model.layers.8.mlp.down_proj.weight": [1536, 8960],
    "model.layers.9.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.9.self_attn.k_proj.weight": [256, 1536],
    "model.layers.9.self_attn.v_proj.weight": [256, 1536],
    "model.layers.9.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.9.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.9.mlp.up_proj.weight": [8960, 1536],
    "model.layers.9.mlp.down_proj.weight": [1536, 8960],
    "model.layers.10.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.10.self_attn.k_proj.weight": [256, 1536],
    "model.layers.10.self_attn.v_proj.weight": [256, 1536],
    "model.layers.10.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.10.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.10.mlp.up_proj.weight": [8960, 1536],
    "model.layers.10.mlp.down_proj.weight": [1536, 8960],
    "model.layers.11.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.11.self_attn.k_proj.weight": [256, 1536],
    "model.layers.11.self_attn.v_proj.weight": [256, 1536],
    "model.layers.11.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.11.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.11.mlp.up_proj.weight": [8960, 1536],
    "model.layers.11.mlp.down_proj.weight": [1536, 8960],
    "model.layers.12.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.12.self_attn.k_proj.weight": [256, 1536],
    "model.layers.12.self_attn.v_proj.weight": [256, 1536],
    "model.layers.12.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.12.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.12.mlp.up_proj.weight": [8960, 1536],
    "model.layers.12.mlp.down_proj.weight": [1536, 8960],
    "model.layers.13.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.13.self_attn.k_proj.weight": [256, 1536],
    "model.layers.13.self_attn.v_proj.weight": [256, 1536],
    "model.layers.13.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.13.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.13.mlp.up_proj.weight": [8960, 1536],
    "model.layers.13.mlp.down_proj.weight": [1536, 8960],
    "model.layers.14.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.14.self_attn.k_proj.weight": [256, 1536],
    "model.layers.14.self_attn.v_proj.weight": [256, 1536],
    "model.layers.14.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.14.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.14.mlp.up_proj.weight": [8960, 1536],
    "model.layers.14.mlp.down_proj.weight": [1536, 8960],
    "model.layers.15.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.15.self_attn.k_proj.weight": [256, 1536],
    "model.layers.15.self_attn.v_proj.weight": [256, 1536],
    "model.layers.15.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.15.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.15.mlp.up_proj.weight": [8960, 1536],
    "model.layers.15.mlp.down_proj.weight": [1536, 8960],
    "model.layers.16.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.16.self_attn.k_proj.weight": [256, 1536],
    "model.layers.16.self_attn.v_proj.weight": [256, 1536],
    "model.layers.16.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.16.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.16.mlp.up_proj.weight": [8960, 1536],
    "model.layers.16.mlp.down_proj.weight": [1536, 8960],
    "model.layers.17.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.17.self_attn.k_proj.weight": [256, 1536],
    "model.layers.17.self_attn.v_proj.weight": [256, 1536],
    "model.layers.17.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.17.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.17.mlp.up_proj.weight": [8960, 1536],
    "model.layers.17.mlp.down_proj.weight": [1536, 8960],
    "model.layers.18.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.18.self_attn.k_proj.weight": [256, 1536],
    "model.layers.18.self_attn.v_proj.weight": [256, 1536],
    "model.layers.18.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.18.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.18.mlp.up_proj.weight": [8960, 1536],
    "model.layers.18.mlp.down_proj.weight": [1536, 8960],
    "model.layers.19.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.19.self_attn.k_proj.weight": [256, 1536],
    "model.layers.19.self_attn.v_proj.weight": [256, 1536],
    "model.layers.19.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.19.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.19.mlp.up_proj.weight": [8960, 1536],
    "model.layers.19.mlp.down_proj.weight": [1536, 8960],
    "model.layers.20.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.20.self_attn.k_proj.weight": [256, 1536],
    "model.layers.20.self_attn.v_proj.weight": [256, 1536],
    "model.layers.20.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.20.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.20.mlp.up_proj.weight": [8960, 1536],
    "model.layers.20.mlp.down_proj.weight": [1536, 8960],
    "model.layers.21.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.21.self_attn.k_proj.weight": [256, 1536],
    "model.layers.21.self_attn.v_proj.weight": [256, 1536],
    "model.layers.21.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.21.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.21.mlp.up_proj.weight": [8960, 1536],
    "model.layers.21.mlp.down_proj.weight": [1536, 8960],
    "model.layers.22.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.22.self_attn.k_proj.weight": [256, 1536],
    "model.layers.22.self_attn.v_proj.weight": [256, 1536],
    "model.layers.22.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.22.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.22.mlp.up_proj.weight": [8960, 1536],
    "model.layers.22.mlp.down_proj.weight": [1536, 8960],
    "model.layers.23.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.23.self_attn.k_proj.weight": [256, 1536],
    "model.layers.23.self_attn.v_proj.weight": [256, 1536],
    "model.layers.23.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.23.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.23.mlp.up_proj.weight": [8960, 1536],
    "model.layers.23.mlp.down_proj.weight": [1536, 8960],
    "model.layers.24.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.24.self_attn.k_proj.weight": [256, 1536],
    "model.layers.24.self_attn.v_proj.weight": [256, 1536],
    "model.layers.24.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.24.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.24.mlp.up_proj.weight": [8960, 1536],
    "model.layers.24.mlp.down_proj.weight": [1536, 8960],
    "model.layers.25.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.25.self_attn.k_proj.weight": [256, 1536],
    "model.layers.25.self_attn.v_proj.weight": [256, 1536],
    "model.layers.25.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.25.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.25.mlp.up_proj.weight": [8960, 1536],
    "model.layers.25.mlp.down_proj.weight": [1536, 8960],
    "model.layers.26.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.26.self_attn.k_proj.weight": [256, 1536],
    "model.layers.26.self_attn.v_proj.weight": [256, 1536],
    "model.layers.26.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.26.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.26.mlp.up_proj.weight": [8960, 1536],
    "model.layers.26.mlp.down_proj.weight": [1536, 8960],
    "model.layers.27.self_attn.q_proj.weight": [1536, 1536],
    "model.layers.27.self_attn.k_proj.weight": [256, 1536],
    "model.layers.27.self_attn.v_proj.weight": [256, 1536],
    "model.layers.27.self_attn.o_proj.weight": [1536, 1536],
    "model.layers.27.mlp.gate_proj.weight": [8960, 1536],
    "model.layers.27.mlp.up_proj.weight": [8960, 1536],
    "model.layers.27.mlp.down_proj.weight": [1536, 8960]
  },
  "fp16": {
    "model.embed_tokens.weight": [151936, 1536],
    "model.layers.0.self_attn.q_proj.bias": [1536],
    "model.layers.0.self_attn.k_proj.bias": [256],
    "model.layers.0.self_attn.v_proj.bias": [256],
    "model.layers.0.input_layernorm.weight": [1536],
    "model.layers.0.post_attention_layernorm.weight": [1536],
    "model.layers.1.self_attn.q_proj.bias": [1536],
    "model.layers.1.self_attn.k_proj.bias": [256],
    "model.layers.1.self_attn.v_proj.bias": [256],
    "model.layers.1.input_layernorm.weight": [1536],
    "model.layers.1.post_attention_layernorm.weight": [1536],
    "model.layers.2.self_attn.q_proj.bias": [1536],
    "model.layers.2.self_attn.k_proj.bias": [256],
    "model.layers.2.self_attn.v_proj.bias": [256],
    "model.layers.2.input_layernorm.weight": [1536],
    "model.layers.2.post_attention_layernorm.weight": [1536],
    "model.layers.3.self_attn.q_proj.bias": [1536],
    "model.layers.3.self_attn.k_proj.bias": [256],
    "model.layers.3.self_attn.v_proj.bias": [256],
    "model.layers.3.input_layernorm.weight": [1536],
    "model.layers.3.post_attention_layernorm.weight": [1536],
    "model.layers.4.self_attn.q_proj.bias": [1536],
    "model.layers.4.self_attn.k_proj.bias": [256],
    "model.layers.4.self_attn.v_proj.bias": [256],
    "model.layers.4.input_layernorm.weight": [1536],
    "model.layers.4.post_attention_layernorm.weight": [1536],
    "model.layers.5.self_attn.q_proj.bias": [1536],
    "model.layers.5.self_attn.k_proj.bias": [256],
    "model.layers.5.self_attn.v_proj.bias": [256],
    "model.layers.5.input_layernorm.weight": [1536],
    "model.layers.5.post_attention_layernorm.weight": [1536],
    "model.layers.6.self_attn.q_proj.bias": [1536],
    "model.layers.6.self_attn.k_proj.bias": [256],
    "model.layers.6.self_attn.v_proj.bias": [256],
    "model.layers.6.input_layernorm.weight": [1536],
    "model.layers.6.post_attention_layernorm.weight": [1536],
    "model.layers.7.self_attn.q_proj.bias": [1536],
    "model.layers.7.self_attn.k_proj.bias": [256],
    "model.layers.7.self_attn.v_proj.bias": [256],
    "model.layers.7.input_layernorm.weight": [1536],
    "model.layers.7.post_attention_layernorm.weight": [1536],
    "model.layers.8.self_attn.q_proj.bias": [1536],
    "model.layers.8.self_attn.k_proj.bias": [256],
    "model.layers.8.self_attn.v_proj.bias": [256],
    "model.layers.8.input_layernorm.weight": [1536],
    "model.layers.8.post_attention_layernorm.weight": [1536],
    "model.layers.9.self_attn.q_proj.bias": [1536],
    "model.layers.9.self_attn.k_proj.bias": [256],
    "model.layers.9.self_attn.v_proj.bias": [256],
    "model.layers.9.input_layernorm.weight": [1536],
    "model.layers.9.post_attention_layernorm.weight": [1536],
    "model.layers.10.self_attn.q_proj.bias": [1536],
    "model.layers.10.self_attn.k_proj.bias": [256],
    "model.layers.10.self_attn.v_proj.bias": [256],
    "model.layers.10.input_layernorm.weight": [1536],
    "model.layers.10.post_attention_layernorm.weight": [1536],
    "model.layers.11.self_attn.q_proj.bias": [1536],
    "model.layers.11.self_attn.k_proj.bias": [256],
    "model.layers.11.self_attn.v_proj.bias": [256],
    "model.layers.11.input_layernorm.weight": [1536],
    "model.layers.11.post_attention_layernorm.weight": [1536],
    "model.layers.12.self_attn.q_proj.bias": [1536],
    "model.layers.12.self_attn.k_proj.bias": [256],
    "model.layers.12.self_attn.v_proj.bias": [256],
    "model.layers.12.input_layernorm.weight": [1536],
    "model.layers.12.post_attention_layernorm.weight": [1536],
    "model.layers.13.self_attn.q_proj.bias": [1536],
    "model.layers.13.self_attn.k_proj.bias": [256],
    "model.layers.13.self_attn.v_proj.bias": [256],
    "model.layers.13.input_layernorm.weight": [1536],
    "model.layers.13.post_attention_layernorm.weight": [1536],
    "model.layers.14.self_attn.q_proj.bias": [1536],
    "model.layers.14.self_attn.k_proj.bias": [256],
    "model.layers.14.self_attn.v_proj.bias": [256],
    "model.layers.14.input_layernorm.weight": [1536],
    "model.layers.14.post_attention_layernorm.weight": [1536],
    "model.layers.15.self_attn.q_proj.bias": [1536],
    "model.layers.15.self_attn.k_proj.bias": [256],
    "model.layers.15.self_attn.v_proj.bias": [256],
    "model.layers.15.input_layernorm.weight": [1536],
    "model.layers.15.post_attention_layernorm.weight": [1536],
    "model.layers.16.self_attn.q_proj.bias": [1536],
    "model.layers.16.self_attn.k_proj.bias": [256],
    "model.layers.16.self_attn.v_proj.bias": [256],
    "model.layers.16.input_layernorm.weight": [1536],
    "model.layers.16.post_attention_layernorm.weight": [1536],
    "model.layers.17.self_attn.q_proj.bias": [1536],
    "model.layers.17.self_attn.k_proj.bias": [256],
    "model.layers.17.self_attn.v_proj.bias": [256],
    "model.layers.17.input_layernorm.weight": [1536],
    "model.layers.17.post_attention_layernorm.weight": [1536],
    "model.layers.18.self_attn.q_proj.bias": [1536],
    "model.layers.18.self_attn.k_proj.bias": [256],
    "model.layers.18.self_attn.v_proj.bias": [256],
    "model.layers.18.input_layernorm.weight": [1536],
    "model.layers.18.post_attention_layernorm.weight": [1536],
    "model.layers.19.self_attn.q_proj.bias": [1536],
    "model.layers.19.self_attn.k_proj.bias": [256],
    "model.layers.19.self_attn.v_proj.bias": [256],
    "model.layers.19.input_layernorm.weight": [1536],
    "model.layers.19.post_attention_layernorm.weight": [1536],
    "model.layers.20.self_attn.q_proj.bias": [1536],
    "model.layers.20.self_attn.k_proj.bias": [256],
    "model.layers.20.self_attn.v_proj.bias": [256],
    "model.layers.20.input_layernorm.weight": [1536],
    "model.layers.20.post_attention_layernorm.weight": [1536],
    "model.layers.21.self_attn.q_proj.bias": [1536],
    "model.layers.21.self_attn.k_proj.bias": [256],
    "model.layers.21.self_attn.v_proj.bias": [256],
    "model.layers.21.input_layernorm.weight": [1536],
    "model.layers.21.post_attention_layernorm.weight": [1536],
    "model.layers.22.self_attn.q_proj.bias": [1536],
    "model.layers.22.self_attn.k_proj.bias": [256],
    "model.layers.22.self_attn.v_proj.bias": [256],
    "model.layers.22.input_layernorm.weight": [1536],
    "model.layers.22.post_attention_layernorm.weight": [1536],
    "model.layers.23.self_attn.q_proj.bias": [1536],
    "model.layers.23.self_attn.k_proj.bias": [256],
    "model.layers.23.self_attn.v_proj.bias": [256],
    "model.layers.23.input_layernorm.weight": [1536],
    "model.layers.23.post_attention_layernorm.weight": [1536],
    "model.layers.24.self_attn.q_proj.bias": [1536],
    "model.layers.24.self_attn.k_proj.bias": [256],
    "model.layers.24.self_attn.v_proj.bias": [256],
    "model.layers.24.input_layernorm.weight": [1536],
    "model.layers.24.post_attention_layernorm.weight": [1536],
    "model.layers.25.self_attn.q_proj.bias": [1536],
    "model.layers.25.self_attn.k_proj.bias": [256],
    "model.layers.25.self_attn.v_proj.bias": [256],
    "model.layers.25.input_layernorm.weight": [1536],
    "model.layers.25.post_attention_layernorm.weight": [1536],
    "model.layers.26.self_attn.q_proj.bias": [1536],
    "model.layers.26.self_attn.k_proj.bias": [256],
    "model.layers.26.self_attn.v_proj.bias": [256],
    "model.layers.26.input_layernorm.weight": [1536],
    "model.layers.26.post_attention_layernorm.weight": [1536],
    "model.layers.27.self_attn.q_proj.bias": [1536],
    "model.layers.27.self_attn.k_proj.bias": [256],
    "model.layers.27.self_attn.v_proj.bias": [256],
    "model.layers.27.input_layernorm.weight": [1536],
    "model.layers.27.post_attention_layernorm.weight": [1536],
    "model.norm.weight": [1536],
    "lm_head.weight": [151936, 1536]
  }
}
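
The manifest is plain JSON: each top-level group ("unary", "fp16") maps tensor names to their shapes. As a minimal, hypothetical sketch of how it could be consumed (standard library only; this loader is not part of the repo):

# Hypothetical sketch -- assumes only the manifest structure shown above:
# {"unary": {tensor_name: shape, ...}, "fp16": {tensor_name: shape, ...}}.
import json
from math import prod

with open("deepseek-r1-1.5b-gunary/manifest.json") as f:
    manifest = json.load(f)

for group, tensors in manifest.items():
    # Total element count per precision group, from the listed shapes.
    n_params = sum(prod(shape) for shape in tensors.values())
    print(f"{group}: {len(tensors)} tensors, {n_params:,} parameters")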
deepseek-r1-1.5b-gunary/model_layers_14_self_attn_k_proj_bias.fp16 ADDED
Binary file (512 Bytes).
 
deepseek-r1-1.5b-gunary/model_layers_15_input_layernorm_weight.fp16 ADDED
Binary file (3.07 kB).
 
deepseek-r1-1.5b-gunary/model_layers_22_self_attn_v_proj_bias.fp16 ADDED
Binary file (512 Bytes).
 
deepseek-r1-1.5b-gunary/model_layers_4_self_attn_v_proj_weight.sign ADDED
Binary file (49.2 kB).
 
deepseek-r1-1.5b-gunary/model_layers_8_self_attn_k_proj_bias.fp16 ADDED
Binary file (512 Bytes).
 
deepseek-r1-1.5b-unary/model_layers_14_self_attn_q_proj_bias.fp16 ADDED
Binary file (3.07 kB).
 
deepseek-r1-1.5b-unary/model_layers_16_self_attn_v_proj_bias.fp16 ADDED
Binary file (512 Bytes).
 
deepseek-r1-1.5b-unary/model_layers_17_input_layernorm_weight.fp16 ADDED
Binary file (3.07 kB).
 
deepseek-r1-1.5b-unary/model_layers_17_self_attn_v_proj_weight.scales ADDED
Binary file (1.02 kB).
 
deepseek-r1-1.5b-unary/model_layers_17_self_attn_v_proj_weight.sign ADDED
Binary file (49.2 kB).
 
deepseek-r1-1.5b-unary/model_layers_22_mlp_gate_proj_weight.scales ADDED
Binary file (35.8 kB).
 
deepseek-r1-1.5b-unary/model_layers_24_self_attn_v_proj_bias.fp16 ADDED
Binary file (512 Bytes).
 
deepseek-r1-1.5b-unary/model_layers_3_self_attn_v_proj_weight.scales ADDED
Binary file (1.02 kB).
 
deepseek-r1-1.5b-unary/model_layers_4_post_attention_layernorm_weight.fp16 ADDED
Binary file (3.07 kB).
 
deepseek-r1-1.5b-unary/model_layers_5_self_attn_k_proj_weight.sign ADDED
Binary file (49.2 kB).
 
deepseek-r1-1.5b-unary/model_layers_5_self_attn_o_proj_weight.scales ADDED
Binary file (6.14 kB).
 
deepseek-r1-1.5b-unary/model_layers_5_self_attn_v_proj_bias.fp16 ADDED
Binary file (512 Bytes).
 
deepseek-r1-1.5b-unary/model_layers_6_post_attention_layernorm_weight.fp16 ADDED
Binary file (3.07 kB).
 
deepseek-r1-1.5b-unary/model_layers_7_self_attn_q_proj_bias.fp16 ADDED
Binary file (3.07 kB).
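
Judging from this listing, each tensor in the manifest appears to be stored as its own file, named by replacing the dots in the tensor name with underscores and appending a component suffix: .fp16 for half-precision tensors, and .sign plus .scales for the "unary" projection weights. The sizes are consistent with that reading: a 512-byte .fp16 file matches 256 half-precision values (the k/v biases), and a 49.2 kB .sign file matches one bit per element of a 256 x 1536 projection (256 * 1536 / 8 = 49 152 bytes). A hypothetical path helper under that inferred convention (not a documented API of this repo):

def tensor_path(repo_dir: str, tensor_name: str, suffix: str) -> str:
    # suffix is "fp16", "sign", or "scales", per the files listed above.
    return f"{repo_dir}/{tensor_name.replace('.', '_')}.{suffix}"

# Reproduces e.g. deepseek-r1-1.5b-unary/model_layers_17_self_attn_v_proj_weight.sign
print(tensor_path("deepseek-r1-1.5b-unary", "model.layers.17.self_attn.v_proj.weight", "sign"))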