Add library name, pipeline tag, and link to GitHub

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +1297 -3
README.md CHANGED
@@ -1,13 +1,1307 @@
  ---
- language:
- - en
  base_model:
  - meta-llama/Llama-3.1-8B-Instruct
  tags:
  - peft
  - lora
  - moe
  - moa
  ---

- arxiv: https://arxiv.org/abs/2506.05928
  ---
  base_model:
  - meta-llama/Llama-3.1-8B-Instruct
+ language:
+ - en
  tags:
  - peft
  - lora
  - moe
  - moa
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

+ arxiv: https://arxiv.org/abs/2506.05928
+
+ This repository contains the model weights for the paper [MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models](https://huggingface.co/papers/2506.05928).
+ For code and more details, please see the [GitHub repository](https://github.com/dmis-lab/MoA).
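The checkpoint keys in this repository's trainable.json (listed below) use the MoA codebase's own module naming (`llama.layers.N.attention.lora_Q`, `llama.layers.N.p_adapter`, ...), not transformers' `model.layers.N.self_attn.q_proj` naming, so the adapters are loaded through the linked repository rather than directly via `from_pretrained`. As a purely illustrative sketch, one could map the attention-LoRA keys onto transformers' Llama module names; the Q/K/V/O to `q_proj`/`k_proj`/`v_proj`/`o_proj` correspondence is an assumption here, not something stated in this repo:

```python
import re

# Hypothetical mapping from MoA's module names (as seen in trainable.json)
# to Hugging Face transformers' Llama module names. The Q/K/V/O ->
# q_proj/k_proj/v_proj/o_proj correspondence is an assumption.
PROJ = {"Q": "q_proj", "K": "k_proj", "V": "v_proj", "O": "o_proj"}

def hf_target(name: str):
    """Return the transformers module a MoA attention-LoRA key targets, else None."""
    m = re.match(r"llama\.layers\.(\d+)\.attention\.lora_([QKVO])\.", name)
    if not m:
        return None  # prompt, p_adapter, and router tensors have no direct counterpart
    return f"model.layers.{m.group(1)}.self_attn.{PROJ[m.group(2)]}"

print(hf_target("llama.layers.0.attention.lora_Q.lora_A.weight"))
# model.layers.0.self_attn.q_proj
```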
+
+ # File information
+
+ The repository contains the following file information:
+
+ Filename: trainable.json
+ Content:
+ {
+   "trainable_params": 24520288,
+   "trainable_params_kv": [
+     ["llama.layers.0.attention.prompt_gate", [1, 32, 1, 1]],
+     ["llama.layers.0.attention.lora_Q.lora_A.weight", [8, 4096]],
+     ["llama.layers.0.attention.lora_Q.lora_B.weight", [4096, 8]],
+     ["llama.layers.0.attention.lora_K.lora_A.weight", [8, 4096]],
+     ["llama.layers.0.attention.lora_K.lora_B.weight", [1024, 8]],
+     ["llama.layers.0.attention.lora_V.lora_A.weight", [8, 4096]],
+     ["llama.layers.0.attention.lora_V.lora_B.weight", [1024, 8]],
+     ["llama.layers.0.attention.lora_O.lora_A.weight", [8, 4096]],
+     ["llama.layers.0.attention.lora_O.lora_B.weight", [4096, 8]],
+     ["llama.layers.0.attention.prompt.weight", [10, 4096]],
+     ["llama.layers.0.feed_forward.lora_DOWN.lora_A.weight", [8, 14336]],
+     ["llama.layers.0.feed_forward.lora_DOWN.lora_B.weight", [4096, 8]],
+     ["llama.layers.0.p_adapter.down_proj.weight", [16, 4096]],
+     ["llama.layers.0.p_adapter.down_proj.bias", [16]],
+     ["llama.layers.0.p_adapter.up_proj.weight", [4096, 16]],
+     ["llama.layers.0.p_adapter.up_proj.bias", [4096]],
+     ["llama.layers.0.adapter_type_router.w1.weight", [28, 4096]],
+     ["llama.layers.0.adapter_type_router.w1.bias", [28]],
+     ["llama.layers.0.adapter_type_router.w2.weight", [7, 28]],
+     ["llama.layers.0.adapter_type_router.w2.bias", [7]],
+     ["llama.layers.0.adapter_type_router.w3.weight", [28, 4096]],
+     ["llama.layers.0.adapter_type_router.w3.bias", [28]],
+     ...
+   ]
+ }
+ (The entries for llama.layers.1 onward repeat the same 22-tensor pattern per layer; the diff excerpt is truncated at llama.layers.8.attention.prompt.weight.)
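The per-layer shapes in trainable.json fully account for the reported `trainable_params`: each decoder layer contributes the same 22 tensors, and Llama-3.1-8B has 32 decoder layers (the layer count is taken from the base model's architecture, not from this file). A quick sanity check:

```python
import math

# Trainable tensor shapes for one decoder layer, as listed in trainable.json
# for llama.layers.0. Every other layer repeats the same pattern.
LAYER0_SHAPES = [
    [1, 32, 1, 1],   # attention.prompt_gate
    [8, 4096],       # attention.lora_Q.lora_A
    [4096, 8],       # attention.lora_Q.lora_B
    [8, 4096],       # attention.lora_K.lora_A
    [1024, 8],       # attention.lora_K.lora_B (K/V output 1024 dims under GQA)
    [8, 4096],       # attention.lora_V.lora_A
    [1024, 8],       # attention.lora_V.lora_B
    [8, 4096],       # attention.lora_O.lora_A
    [4096, 8],       # attention.lora_O.lora_B
    [10, 4096],      # attention.prompt
    [8, 14336],      # feed_forward.lora_DOWN.lora_A
    [4096, 8],       # feed_forward.lora_DOWN.lora_B
    [16, 4096],      # p_adapter.down_proj.weight
    [16],            # p_adapter.down_proj.bias
    [4096, 16],      # p_adapter.up_proj.weight
    [4096],          # p_adapter.up_proj.bias
    [28, 4096],      # adapter_type_router.w1.weight
    [28],            # adapter_type_router.w1.bias
    [7, 28],         # adapter_type_router.w2.weight
    [7],             # adapter_type_router.w2.bias
    [28, 4096],      # adapter_type_router.w3.weight
    [28],            # adapter_type_router.w3.bias
]

per_layer = sum(math.prod(shape) for shape in LAYER0_SHAPES)
total = per_layer * 32  # Llama-3.1-8B has 32 decoder layers
print(per_layer, total)  # 766259 24520288 -- matches "trainable_params"
```

Multiplying out to exactly 24,520,288 confirms that the truncated listing above continues identically through all 32 layers.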