Akicou committed 718e4c3 (verified) · 1 parent: d9fc33b

Upload PACER merged Qwen coder models

README.md ADDED
@@ -0,0 +1,55 @@
+ ---
+ library_name: transformers
+ tags:
+ - pacer
+ - model-merging
+ - merged-model
+ - moe
+ license: apache-2.0
+ ---
+
+ # pacer-merge
+
+ This model was created with **PACER (Permutation-Aligned Consensus Expert Routing)**.
+
+ ## Model Details
+
+ **Merge Type:** PACER (base-free, interference-aware)
+
+ **Source Models:**
+ - `fluently/FluentlyQwen3-Coder-4B-0909`
+ - `SamuelBang/AesCoder-4B`
+
+ **Merge Configuration:**
+ - Interference Threshold: `0.35`
+ - Top-K Experts: `2`
+ - Merged Layers: `0` (every MLP projection exceeded the interference threshold)
+ - MoE Layers: `108` (36 decoder layers × 3 MLP projections)
+
+ ## How PACER Works
+
+ PACER is a model-merging framework that:
+ 1. **Aligns models geometrically** using Git Re-Basin
+ 2. **Computes a Consensus Barycenter** as a synthetic base
+ 3. **Analyzes interference** per layer
+ 4. **Merges low-interference layers** using DARE-TIES
+ 5. **Upcycles high-interference layers** to Mixture-of-Experts
+
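The per-layer routing in steps 3–5 can be sketched as follows. The report does not specify the interference metric, so `interference_score` below is a hypothetical stand-in (normalized weight disagreement); only the thresholding logic is taken from the configuration above (`threshold = 0.35`, high-interference layers become MoE experts).

```python
import numpy as np

def interference_score(w_a: np.ndarray, w_b: np.ndarray) -> float:
    # Hypothetical proxy, not necessarily PACER's metric: normalized
    # disagreement between the two source models' weights (0.0 = identical).
    num = float(np.linalg.norm(w_a - w_b))
    den = float(np.linalg.norm(w_a) + np.linalg.norm(w_b)) + 1e-12
    return num / den

def decide(score: float, threshold: float = 0.35) -> str:
    # Below the threshold the layer is merged densely (DARE-TIES);
    # at or above it, both source weights are kept as MoE experts.
    return "merge_dare_ties" if score < threshold else "upcycle_moe"

# The report's score for model.layers.0.mlp.gate_proj.weight:
print(decide(0.9758567679673433))  # upcycle_moe, matching the report

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
print(decide(interference_score(w, w)))  # identical weights -> merge_dare_ties
```

With every score in this merge near 0.98, all 108 MLP projections land on the `upcycle_moe` branch, which is why `merge_layers` is 0.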
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("pacer-merge")
+ tokenizer = AutoTokenizer.from_pretrained("pacer-merge")
+
+ # Generate a short completion and decode it
+ inputs = tokenizer("Hello, world!", return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Created With
+
+ [PacerKit](https://github.com/yourusername/pacerkit) - PACER Model Merging Framework
+
+ **Created:** 2025-12-09 21:46:52
config.json ADDED
@@ -0,0 +1,70 @@
+ {
+ "architectures": [
+ "Qwen3ForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "dtype": "bfloat16",
+ "eos_token_id": 151645,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 2560,
+ "initializer_range": 0.02,
+ "intermediate_size": 9728,
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "max_position_embeddings": 40960,
+ "max_window_layers": 36,
+ "model_type": "qwen3",
+ "num_attention_heads": 32,
+ "num_hidden_layers": 36,
+ "num_key_value_heads": 8,
+ "pad_token_id": 151643,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000,
+ "sliding_window": null,
+ "tie_word_embeddings": true,
+ "transformers_version": "4.57.3",
+ "unsloth_version": "2025.9.2",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 151936
+ }
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 151643,
+ "eos_token_id": 151645,
+ "max_length": 40960,
+ "pad_token_id": 151643,
+ "transformers_version": "4.57.3"
+ }
merge_config.json ADDED
@@ -0,0 +1,20 @@
+ {
+ "project_name": "pacer-merge",
+ "models": [
+ "fluently/FluentlyQwen3-Coder-4B-0909",
+ "SamuelBang/AesCoder-4B"
+ ],
+ "pacer": {
+ "interference_threshold": 0.35,
+ "top_k_experts": 2
+ },
+ "summary": {
+ "total_layers": 108,
+ "merge_layers": 0,
+ "moe_layers": 108,
+ "avg_interference": 0.9777785492716012,
+ "max_interference": 0.9826418738812208,
+ "min_interference": 0.9674257636070251,
+ "threshold": 0.35
+ }
+ }
merge_report.json ADDED
@@ -0,0 +1,888 @@
1
+ {
2
+ "config": {
3
+ "project_name": "pacer-merge",
4
+ "models": [
5
+ "fluently/FluentlyQwen3-Coder-4B-0909",
6
+ "SamuelBang/AesCoder-4B"
7
+ ],
8
+ "interference_threshold": 0.35,
9
+ "top_k_experts": 2
10
+ },
11
+ "summary": {
12
+ "total_layers": 108,
13
+ "merge_layers": 0,
14
+ "moe_layers": 108,
15
+ "avg_interference": 0.9777785492716012,
16
+ "max_interference": 0.9826418738812208,
17
+ "min_interference": 0.9674257636070251,
18
+ "threshold": 0.35
19
+ },
20
+ "layer_decisions": {
21
+ "model.layers.0.mlp.gate_proj.weight": {
22
+ "score": 0.9758567679673433,
23
+ "decision": "upcycle_moe"
24
+ },
25
+ "model.layers.0.mlp.up_proj.weight": {
26
+ "score": 0.9773118272423744,
27
+ "decision": "upcycle_moe"
28
+ },
29
+ "model.layers.0.mlp.down_proj.weight": {
30
+ "score": 0.9756715409457684,
31
+ "decision": "upcycle_moe"
32
+ },
33
+ "model.layers.1.mlp.gate_proj.weight": {
34
+ "score": 0.9674257636070251,
35
+ "decision": "upcycle_moe"
36
+ },
37
+ "model.layers.1.mlp.up_proj.weight": {
38
+ "score": 0.9758508205413818,
39
+ "decision": "upcycle_moe"
40
+ },
41
+ "model.layers.1.mlp.down_proj.weight": {
42
+ "score": 0.9768250994384289,
43
+ "decision": "upcycle_moe"
44
+ },
45
+ "model.layers.2.mlp.gate_proj.weight": {
46
+ "score": 0.973158223554492,
47
+ "decision": "upcycle_moe"
48
+ },
49
+ "model.layers.2.mlp.up_proj.weight": {
50
+ "score": 0.9724544435739517,
51
+ "decision": "upcycle_moe"
52
+ },
53
+ "model.layers.2.mlp.down_proj.weight": {
54
+ "score": 0.975229199975729,
55
+ "decision": "upcycle_moe"
56
+ },
57
+ "model.layers.3.mlp.gate_proj.weight": {
58
+ "score": 0.9705729000270367,
59
+ "decision": "upcycle_moe"
60
+ },
61
+ "model.layers.3.mlp.up_proj.weight": {
62
+ "score": 0.9761323314160109,
63
+ "decision": "upcycle_moe"
64
+ },
65
+ "model.layers.3.mlp.down_proj.weight": {
66
+ "score": 0.9766243025660515,
67
+ "decision": "upcycle_moe"
68
+ },
69
+ "model.layers.4.mlp.gate_proj.weight": {
70
+ "score": 0.9693492259830236,
71
+ "decision": "upcycle_moe"
72
+ },
73
+ "model.layers.4.mlp.up_proj.weight": {
74
+ "score": 0.9771679863333702,
75
+ "decision": "upcycle_moe"
76
+ },
77
+ "model.layers.4.mlp.down_proj.weight": {
78
+ "score": 0.9777354244142771,
79
+ "decision": "upcycle_moe"
80
+ },
81
+ "model.layers.5.mlp.gate_proj.weight": {
82
+ "score": 0.9719736948609352,
83
+ "decision": "upcycle_moe"
84
+ },
85
+ "model.layers.5.mlp.up_proj.weight": {
86
+ "score": 0.9768269564956427,
87
+ "decision": "upcycle_moe"
88
+ },
89
+ "model.layers.5.mlp.down_proj.weight": {
90
+ "score": 0.9775361530482769,
91
+ "decision": "upcycle_moe"
92
+ },
93
+ "model.layers.6.mlp.gate_proj.weight": {
94
+ "score": 0.9740530084818602,
95
+ "decision": "upcycle_moe"
96
+ },
97
+ "model.layers.6.mlp.up_proj.weight": {
98
+ "score": 0.9770794659852982,
99
+ "decision": "upcycle_moe"
100
+ },
101
+ "model.layers.6.mlp.down_proj.weight": {
102
+ "score": 0.9774630032479763,
103
+ "decision": "upcycle_moe"
104
+ },
105
+ "model.layers.7.mlp.gate_proj.weight": {
106
+ "score": 0.9748863540589809,
107
+ "decision": "upcycle_moe"
108
+ },
109
+ "model.layers.7.mlp.up_proj.weight": {
110
+ "score": 0.9771392066031694,
111
+ "decision": "upcycle_moe"
112
+ },
113
+ "model.layers.7.mlp.down_proj.weight": {
114
+ "score": 0.9776077885180712,
115
+ "decision": "upcycle_moe"
116
+ },
117
+ "model.layers.8.mlp.gate_proj.weight": {
118
+ "score": 0.9771762136369944,
119
+ "decision": "upcycle_moe"
120
+ },
121
+ "model.layers.8.mlp.up_proj.weight": {
122
+ "score": 0.9769909419119358,
123
+ "decision": "upcycle_moe"
124
+ },
125
+ "model.layers.8.mlp.down_proj.weight": {
126
+ "score": 0.9772236570715904,
127
+ "decision": "upcycle_moe"
128
+ },
129
+ "model.layers.9.mlp.gate_proj.weight": {
130
+ "score": 0.9760845918208361,
131
+ "decision": "upcycle_moe"
132
+ },
133
+ "model.layers.9.mlp.up_proj.weight": {
134
+ "score": 0.9766342304646969,
135
+ "decision": "upcycle_moe"
136
+ },
137
+ "model.layers.9.mlp.down_proj.weight": {
138
+ "score": 0.9773884564638138,
139
+ "decision": "upcycle_moe"
140
+ },
141
+ "model.layers.10.mlp.gate_proj.weight": {
142
+ "score": 0.977036003023386,
143
+ "decision": "upcycle_moe"
144
+ },
145
+ "model.layers.10.mlp.up_proj.weight": {
146
+ "score": 0.9762268159538507,
147
+ "decision": "upcycle_moe"
148
+ },
149
+ "model.layers.10.mlp.down_proj.weight": {
150
+ "score": 0.9767854642122984,
151
+ "decision": "upcycle_moe"
152
+ },
153
+ "model.layers.11.mlp.gate_proj.weight": {
154
+ "score": 0.9778869729489088,
155
+ "decision": "upcycle_moe"
156
+ },
157
+ "model.layers.11.mlp.up_proj.weight": {
158
+ "score": 0.9764058664441109,
159
+ "decision": "upcycle_moe"
160
+ },
161
+ "model.layers.11.mlp.down_proj.weight": {
162
+ "score": 0.9770964700728655,
163
+ "decision": "upcycle_moe"
164
+ },
165
+ "model.layers.12.mlp.gate_proj.weight": {
166
+ "score": 0.9788605384528637,
167
+ "decision": "upcycle_moe"
168
+ },
169
+ "model.layers.12.mlp.up_proj.weight": {
170
+ "score": 0.9766901917755604,
171
+ "decision": "upcycle_moe"
172
+ },
173
+ "model.layers.12.mlp.down_proj.weight": {
174
+ "score": 0.9772241190075874,
175
+ "decision": "upcycle_moe"
176
+ },
177
+ "model.layers.13.mlp.gate_proj.weight": {
178
+ "score": 0.9798912685364485,
179
+ "decision": "upcycle_moe"
180
+ },
181
+ "model.layers.13.mlp.up_proj.weight": {
182
+ "score": 0.9771493915468454,
183
+ "decision": "upcycle_moe"
184
+ },
185
+ "model.layers.13.mlp.down_proj.weight": {
186
+ "score": 0.9775336850434542,
187
+ "decision": "upcycle_moe"
188
+ },
189
+ "model.layers.14.mlp.gate_proj.weight": {
190
+ "score": 0.9807573985308409,
191
+ "decision": "upcycle_moe"
192
+ },
193
+ "model.layers.14.mlp.up_proj.weight": {
194
+ "score": 0.9772995170205832,
195
+ "decision": "upcycle_moe"
196
+ },
197
+ "model.layers.14.mlp.down_proj.weight": {
198
+ "score": 0.9775482192635536,
199
+ "decision": "upcycle_moe"
200
+ },
201
+ "model.layers.15.mlp.gate_proj.weight": {
202
+ "score": 0.9814920481294394,
203
+ "decision": "upcycle_moe"
204
+ },
205
+ "model.layers.15.mlp.up_proj.weight": {
206
+ "score": 0.978249479085207,
207
+ "decision": "upcycle_moe"
208
+ },
209
+ "model.layers.15.mlp.down_proj.weight": {
210
+ "score": 0.9782819077372551,
211
+ "decision": "upcycle_moe"
212
+ },
213
+ "model.layers.16.mlp.gate_proj.weight": {
214
+ "score": 0.9815784655511379,
215
+ "decision": "upcycle_moe"
216
+ },
217
+ "model.layers.16.mlp.up_proj.weight": {
218
+ "score": 0.9783014710992575,
219
+ "decision": "upcycle_moe"
220
+ },
221
+ "model.layers.16.mlp.down_proj.weight": {
222
+ "score": 0.9782586451619864,
223
+ "decision": "upcycle_moe"
224
+ },
225
+ "model.layers.17.mlp.gate_proj.weight": {
226
+ "score": 0.9818551931530237,
227
+ "decision": "upcycle_moe"
228
+ },
229
+ "model.layers.17.mlp.up_proj.weight": {
230
+ "score": 0.978881973773241,
231
+ "decision": "upcycle_moe"
232
+ },
233
+ "model.layers.17.mlp.down_proj.weight": {
234
+ "score": 0.9785361550748348,
235
+ "decision": "upcycle_moe"
236
+ },
237
+ "model.layers.18.mlp.gate_proj.weight": {
238
+ "score": 0.9821842219680548,
239
+ "decision": "upcycle_moe"
240
+ },
241
+ "model.layers.18.mlp.up_proj.weight": {
242
+ "score": 0.9791757967323065,
243
+ "decision": "upcycle_moe"
244
+ },
245
+ "model.layers.18.mlp.down_proj.weight": {
246
+ "score": 0.9788669720292091,
247
+ "decision": "upcycle_moe"
248
+ },
249
+ "model.layers.19.mlp.gate_proj.weight": {
250
+ "score": 0.9823547545820475,
251
+ "decision": "upcycle_moe"
252
+ },
253
+ "model.layers.19.mlp.up_proj.weight": {
254
+ "score": 0.9793595056980848,
255
+ "decision": "upcycle_moe"
256
+ },
257
+ "model.layers.19.mlp.down_proj.weight": {
258
+ "score": 0.9791094567626715,
259
+ "decision": "upcycle_moe"
260
+ },
261
+ "model.layers.20.mlp.gate_proj.weight": {
262
+ "score": 0.9826161675155163,
263
+ "decision": "upcycle_moe"
264
+ },
265
+ "model.layers.20.mlp.up_proj.weight": {
266
+ "score": 0.9792643412947655,
267
+ "decision": "upcycle_moe"
268
+ },
269
+ "model.layers.20.mlp.down_proj.weight": {
270
+ "score": 0.9788958225399256,
271
+ "decision": "upcycle_moe"
272
+ },
273
+ "model.layers.21.mlp.gate_proj.weight": {
274
+ "score": 0.9826418738812208,
275
+ "decision": "upcycle_moe"
276
+ },
277
+ "model.layers.21.mlp.up_proj.weight": {
278
+ "score": 0.9789138380438089,
279
+ "decision": "upcycle_moe"
280
+ },
281
+ "model.layers.21.mlp.down_proj.weight": {
282
+ "score": 0.9783939477056265,
283
+ "decision": "upcycle_moe"
284
+ },
285
+ "model.layers.22.mlp.gate_proj.weight": {
286
+ "score": 0.9819891918450594,
287
+ "decision": "upcycle_moe"
288
+ },
289
+ "model.layers.22.mlp.up_proj.weight": {
290
+ "score": 0.9783588405698538,
291
+ "decision": "upcycle_moe"
292
+ },
293
+ "model.layers.22.mlp.down_proj.weight": {
294
+ "score": 0.9774981644004583,
295
+ "decision": "upcycle_moe"
296
+ },
297
+ "model.layers.23.mlp.gate_proj.weight": {
298
+ "score": 0.9811321999877691,
299
+ "decision": "upcycle_moe"
300
+ },
301
+ "model.layers.23.mlp.up_proj.weight": {
302
+ "score": 0.9782727509737015,
303
+ "decision": "upcycle_moe"
304
+ },
305
+ "model.layers.23.mlp.down_proj.weight": {
306
+ "score": 0.9773894734680653,
307
+ "decision": "upcycle_moe"
308
+ },
309
+ "model.layers.24.mlp.gate_proj.weight": {
310
+ "score": 0.9806375782936811,
311
+ "decision": "upcycle_moe"
312
+ },
313
+ "model.layers.24.mlp.up_proj.weight": {
314
+ "score": 0.9782112184911966,
315
+ "decision": "upcycle_moe"
316
+ },
317
+ "model.layers.24.mlp.down_proj.weight": {
318
+ "score": 0.9772476609796286,
319
+ "decision": "upcycle_moe"
320
+ },
321
+ "model.layers.25.mlp.gate_proj.weight": {
322
+ "score": 0.9800966791808605,
323
+ "decision": "upcycle_moe"
324
+ },
325
+ "model.layers.25.mlp.up_proj.weight": {
326
+ "score": 0.9781745038926601,
327
+ "decision": "upcycle_moe"
328
+ },
329
+ "model.layers.25.mlp.down_proj.weight": {
330
+ "score": 0.9775232560932636,
331
+ "decision": "upcycle_moe"
332
+ },
333
+ "model.layers.26.mlp.gate_proj.weight": {
334
+ "score": 0.9799431785941124,
335
+ "decision": "upcycle_moe"
336
+ },
337
+ "model.layers.26.mlp.up_proj.weight": {
338
+ "score": 0.9781098868697882,
339
+ "decision": "upcycle_moe"
340
+ },
341
+ "model.layers.26.mlp.down_proj.weight": {
342
+ "score": 0.9775767754763365,
343
+ "decision": "upcycle_moe"
344
+ },
345
+ "model.layers.27.mlp.gate_proj.weight": {
346
+ "score": 0.9798190761357546,
347
+ "decision": "upcycle_moe"
348
+ },
349
+ "model.layers.27.mlp.up_proj.weight": {
350
+ "score": 0.9779539816081524,
351
+ "decision": "upcycle_moe"
352
+ },
353
+ "model.layers.27.mlp.down_proj.weight": {
354
+ "score": 0.9775818083435297,
355
+ "decision": "upcycle_moe"
356
+ },
357
+ "model.layers.28.mlp.gate_proj.weight": {
358
+ "score": 0.9798130877315998,
359
+ "decision": "upcycle_moe"
360
+ },
361
+ "model.layers.28.mlp.up_proj.weight": {
362
+ "score": 0.9780505858361721,
363
+ "decision": "upcycle_moe"
364
+ },
365
+ "model.layers.28.mlp.down_proj.weight": {
366
+ "score": 0.9777195602655411,
367
+ "decision": "upcycle_moe"
368
+ },
369
+ "model.layers.29.mlp.gate_proj.weight": {
370
+ "score": 0.97960414737463,
371
+ "decision": "upcycle_moe"
372
+ },
373
+ "model.layers.29.mlp.up_proj.weight": {
374
+ "score": 0.9777578953653574,
375
+ "decision": "upcycle_moe"
376
+ },
377
+ "model.layers.29.mlp.down_proj.weight": {
378
+ "score": 0.9774074014276266,
379
+ "decision": "upcycle_moe"
380
+ },
381
+ "model.layers.30.mlp.gate_proj.weight": {
382
+ "score": 0.979683106765151,
383
+ "decision": "upcycle_moe"
384
+ },
385
+ "model.layers.30.mlp.up_proj.weight": {
386
+ "score": 0.9778437279164791,
387
+ "decision": "upcycle_moe"
388
+ },
389
+ "model.layers.30.mlp.down_proj.weight": {
390
+ "score": 0.9775861166417599,
391
+ "decision": "upcycle_moe"
392
+ },
393
+ "model.layers.31.mlp.gate_proj.weight": {
394
+ "score": 0.9799774046987295,
395
+ "decision": "upcycle_moe"
396
+ },
397
+ "model.layers.31.mlp.up_proj.weight": {
398
+ "score": 0.9778894502669573,
399
+ "decision": "upcycle_moe"
400
+ },
401
+ "model.layers.31.mlp.down_proj.weight": {
402
+ "score": 0.9775977656245232,
403
+ "decision": "upcycle_moe"
404
+ },
405
+ "model.layers.32.mlp.gate_proj.weight": {
406
+ "score": 0.9808168206363916,
407
+ "decision": "upcycle_moe"
408
+ },
409
+ "model.layers.32.mlp.up_proj.weight": {
410
+ "score": 0.9781559966504574,
411
+ "decision": "upcycle_moe"
412
+ },
413
+ "model.layers.32.mlp.down_proj.weight": {
414
+ "score": 0.9776150044053793,
415
+ "decision": "upcycle_moe"
416
+ },
417
+ "model.layers.33.mlp.gate_proj.weight": {
418
+ "score": 0.9813535250723362,
419
+ "decision": "upcycle_moe"
420
+ },
421
+ "model.layers.33.mlp.up_proj.weight": {
422
+ "score": 0.9779957178980112,
423
+ "decision": "upcycle_moe"
424
+ },
425
+ "model.layers.33.mlp.down_proj.weight": {
426
+ "score": 0.9773503355681896,
427
+ "decision": "upcycle_moe"
428
+ },
429
+ "model.layers.34.mlp.gate_proj.weight": {
430
+ "score": 0.9805151708424091,
431
+ "decision": "upcycle_moe"
432
+ },
433
+ "model.layers.34.mlp.up_proj.weight": {
434
+ "score": 0.9770685620605946,
435
+ "decision": "upcycle_moe"
436
+ },
437
+ "model.layers.34.mlp.down_proj.weight": {
438
+ "score": 0.9765571355819702,
439
+ "decision": "upcycle_moe"
440
+ },
441
+ "model.layers.35.mlp.gate_proj.weight": {
442
+ "score": 0.9779612477868795,
443
+ "decision": "upcycle_moe"
444
+ },
445
+ "model.layers.35.mlp.up_proj.weight": {
446
+ "score": 0.9755715448409319,
447
+ "decision": "upcycle_moe"
448
+ },
449
+ "model.layers.35.mlp.down_proj.weight": {
450
+ "score": 0.9756422452628613,
451
+ "decision": "upcycle_moe"
452
+ }
453
+ },
454
+ "moe_layers": {
455
+ "model.layers.0.mlp.gate_proj.weight": {
456
+ "num_experts": 2,
457
+ "score": 0.9758567679673433
458
+ },
459
+ "model.layers.0.mlp.up_proj.weight": {
460
+ "num_experts": 2,
461
+ "score": 0.9773118272423744
462
+ },
463
+ "model.layers.0.mlp.down_proj.weight": {
464
+ "num_experts": 2,
465
+ "score": 0.9756715409457684
466
+ },
467
+ "model.layers.1.mlp.gate_proj.weight": {
468
+ "num_experts": 2,
469
+ "score": 0.9674257636070251
470
+ },
471
+ "model.layers.1.mlp.up_proj.weight": {
472
+ "num_experts": 2,
473
+ "score": 0.9758508205413818
474
+ },
475
+ "model.layers.1.mlp.down_proj.weight": {
476
+ "num_experts": 2,
477
+ "score": 0.9768250994384289
478
+ },
479
+ "model.layers.2.mlp.gate_proj.weight": {
480
+ "num_experts": 2,
481
+ "score": 0.973158223554492
482
+ },
483
+ "model.layers.2.mlp.up_proj.weight": {
484
+ "num_experts": 2,
485
+ "score": 0.9724544435739517
486
+ },
487
+ "model.layers.2.mlp.down_proj.weight": {
488
+ "num_experts": 2,
489
+ "score": 0.975229199975729
490
+ },
491
+ "model.layers.3.mlp.gate_proj.weight": {
492
+ "num_experts": 2,
493
+ "score": 0.9705729000270367
494
+ },
495
+ "model.layers.3.mlp.up_proj.weight": {
496
+ "num_experts": 2,
497
+ "score": 0.9761323314160109
498
+ },
499
+ "model.layers.3.mlp.down_proj.weight": {
500
+ "num_experts": 2,
501
+ "score": 0.9766243025660515
502
+ },
503
+ "model.layers.4.mlp.gate_proj.weight": {
504
+ "num_experts": 2,
505
+ "score": 0.9693492259830236
506
+ },
507
+ "model.layers.4.mlp.up_proj.weight": {
508
+ "num_experts": 2,
509
+ "score": 0.9771679863333702
510
+ },
511
+ "model.layers.4.mlp.down_proj.weight": {
512
+ "num_experts": 2,
513
+ "score": 0.9777354244142771
514
+ },
515
+ "model.layers.5.mlp.gate_proj.weight": {
516
+ "num_experts": 2,
517
+ "score": 0.9719736948609352
518
+ },
519
+ "model.layers.5.mlp.up_proj.weight": {
520
+ "num_experts": 2,
521
+ "score": 0.9768269564956427
522
+ },
523
+ "model.layers.5.mlp.down_proj.weight": {
524
+ "num_experts": 2,
525
+ "score": 0.9775361530482769
526
+ },
527
+ "model.layers.6.mlp.gate_proj.weight": {
528
+ "num_experts": 2,
529
+ "score": 0.9740530084818602
530
+ },
531
+ "model.layers.6.mlp.up_proj.weight": {
532
+ "num_experts": 2,
533
+ "score": 0.9770794659852982
534
+ },
535
+ "model.layers.6.mlp.down_proj.weight": {
536
+ "num_experts": 2,
537
+ "score": 0.9774630032479763
538
+ },
539
+ "model.layers.7.mlp.gate_proj.weight": {
540
+ "num_experts": 2,
541
+ "score": 0.9748863540589809
542
+ },
543
+ "model.layers.7.mlp.up_proj.weight": {
544
+ "num_experts": 2,
545
+ "score": 0.9771392066031694
546
+ },
547
+ "model.layers.7.mlp.down_proj.weight": {
548
+ "num_experts": 2,
549
+ "score": 0.9776077885180712
550
+ },
551
+ "model.layers.8.mlp.gate_proj.weight": {
552
+ "num_experts": 2,
553
+ "score": 0.9771762136369944
554
+ },
555
+ "model.layers.8.mlp.up_proj.weight": {
556
+ "num_experts": 2,
557
+ "score": 0.9769909419119358
558
+ },
559
+ "model.layers.8.mlp.down_proj.weight": {
560
+ "num_experts": 2,
561
+ "score": 0.9772236570715904
562
+ },
563
+ "model.layers.9.mlp.gate_proj.weight": {
564
+ "num_experts": 2,
565
+ "score": 0.9760845918208361
566
+ },
567
+ "model.layers.9.mlp.up_proj.weight": {
568
+ "num_experts": 2,
569
+ "score": 0.9766342304646969
570
+ },
571
+ "model.layers.9.mlp.down_proj.weight": {
572
+ "num_experts": 2,
573
+ "score": 0.9773884564638138
574
+ },
575
+ "model.layers.10.mlp.gate_proj.weight": {
576
+ "num_experts": 2,
577
+ "score": 0.977036003023386
578
+ },
579
+ "model.layers.10.mlp.up_proj.weight": {
580
+ "num_experts": 2,
581
+ "score": 0.9762268159538507
582
+ },
583
+ "model.layers.10.mlp.down_proj.weight": {
584
+ "num_experts": 2,
585
+ "score": 0.9767854642122984
586
+ },
587
+ "model.layers.11.mlp.gate_proj.weight": {
588
+ "num_experts": 2,
589
+ "score": 0.9778869729489088
590
+ },
591
+ "model.layers.11.mlp.up_proj.weight": {
592
+ "num_experts": 2,
593
+ "score": 0.9764058664441109
594
+ },
595
+ "model.layers.11.mlp.down_proj.weight": {
596
+ "num_experts": 2,
597
+ "score": 0.9770964700728655
598
+ },
599
+ "model.layers.12.mlp.gate_proj.weight": {
600
+ "num_experts": 2,
601
+ "score": 0.9788605384528637
602
+ },
603
+ "model.layers.12.mlp.up_proj.weight": {
604
+ "num_experts": 2,
605
+ "score": 0.9766901917755604
606
+ },
607
+ "model.layers.12.mlp.down_proj.weight": {
608
+ "num_experts": 2,
609
+ "score": 0.9772241190075874
610
+ },
611
+ "model.layers.13.mlp.gate_proj.weight": {
612
+ "num_experts": 2,
613
+ "score": 0.9798912685364485
614
+ },
615
+ "model.layers.13.mlp.up_proj.weight": {
616
+ "num_experts": 2,
617
+ "score": 0.9771493915468454
618
+ },
619
+ "model.layers.13.mlp.down_proj.weight": {
620
+ "num_experts": 2,
621
+ "score": 0.9775336850434542
622
+ },
623
+ "model.layers.14.mlp.gate_proj.weight": {
624
+ "num_experts": 2,
625
+ "score": 0.9807573985308409
626
+ },
627
+ "model.layers.14.mlp.up_proj.weight": {
628
+ "num_experts": 2,
629
+ "score": 0.9772995170205832
630
+ },
631
+ "model.layers.14.mlp.down_proj.weight": {
632
+ "num_experts": 2,
633
+ "score": 0.9775482192635536
634
+ },
635
+ "model.layers.15.mlp.gate_proj.weight": {
636
+ "num_experts": 2,
637
+ "score": 0.9814920481294394
638
+ },
639
+ "model.layers.15.mlp.up_proj.weight": {
640
+ "num_experts": 2,
641
+ "score": 0.978249479085207
642
+ },
643
+ "model.layers.15.mlp.down_proj.weight": {
644
+ "num_experts": 2,
645
+ "score": 0.9782819077372551
646
+ },
647
+ "model.layers.16.mlp.gate_proj.weight": {
648
+ "num_experts": 2,
649
+ "score": 0.9815784655511379
650
+ },
651
+ "model.layers.16.mlp.up_proj.weight": {
652
+ "num_experts": 2,
653
+ "score": 0.9783014710992575
654
+ },
655
+ "model.layers.16.mlp.down_proj.weight": {
656
+ "num_experts": 2,
657
+ "score": 0.9782586451619864
658
+ },
659
+ "model.layers.17.mlp.gate_proj.weight": {
660
+ "num_experts": 2,
661
+ "score": 0.9818551931530237
662
+ },
663
+ "model.layers.17.mlp.up_proj.weight": {
664
+ "num_experts": 2,
665
+ "score": 0.978881973773241
666
+ },
667
+ "model.layers.17.mlp.down_proj.weight": {
668
+ "num_experts": 2,
669
+ "score": 0.9785361550748348
670
+ },
671
+ "model.layers.18.mlp.gate_proj.weight": {
672
+ "num_experts": 2,
673
+ "score": 0.9821842219680548
674
+ },
675
+ "model.layers.18.mlp.up_proj.weight": {
676
+ "num_experts": 2,
677
+ "score": 0.9791757967323065
678
+ },
679
+ "model.layers.18.mlp.down_proj.weight": {
680
+ "num_experts": 2,
681
+ "score": 0.9788669720292091
682
+ },
683
+ "model.layers.19.mlp.gate_proj.weight": {
684
+ "num_experts": 2,
685
+ "score": 0.9823547545820475
686
+ },
687
+ "model.layers.19.mlp.up_proj.weight": {
688
+ "num_experts": 2,
689
+ "score": 0.9793595056980848
690
+ },
691
+ "model.layers.19.mlp.down_proj.weight": {
692
+ "num_experts": 2,
693
+ "score": 0.9791094567626715
694
+ },
695
+ "model.layers.20.mlp.gate_proj.weight": {
696
+ "num_experts": 2,
697
+ "score": 0.9826161675155163
698
+ },
699
+ "model.layers.20.mlp.up_proj.weight": {
700
+ "num_experts": 2,
701
+ "score": 0.9792643412947655
702
+ },
703
+ "model.layers.20.mlp.down_proj.weight": {
704
+ "num_experts": 2,
705
+ "score": 0.9788958225399256
706
+ },
707
+ "model.layers.21.mlp.gate_proj.weight": {
708
+ "num_experts": 2,
709
+ "score": 0.9826418738812208
710
+ },
711
+ "model.layers.21.mlp.up_proj.weight": {
712
+ "num_experts": 2,
713
+ "score": 0.9789138380438089
714
+ },
715
+ "model.layers.21.mlp.down_proj.weight": {
716
+ "num_experts": 2,
717
+ "score": 0.9783939477056265
718
+ },
719
+ "model.layers.22.mlp.gate_proj.weight": {
720
+ "num_experts": 2,
721
+ "score": 0.9819891918450594
722
+ },
723
+ "model.layers.22.mlp.up_proj.weight": {
724
+ "num_experts": 2,
725
+ "score": 0.9783588405698538
726
+ },
727
+ "model.layers.22.mlp.down_proj.weight": {
728
+ "num_experts": 2,
729
+ "score": 0.9774981644004583
730
+ },
731
+ "model.layers.23.mlp.gate_proj.weight": {
732
+ "num_experts": 2,
733
+ "score": 0.9811321999877691
734
+ },
735
+ "model.layers.23.mlp.up_proj.weight": {
736
+ "num_experts": 2,
737
+ "score": 0.9782727509737015
738
+ },
739
+ "model.layers.23.mlp.down_proj.weight": {
740
+ "num_experts": 2,
741
+ "score": 0.9773894734680653
742
+ },
743
+ "model.layers.24.mlp.gate_proj.weight": {
744
+ "num_experts": 2,
745
+ "score": 0.9806375782936811
746
+ },
747
+ "model.layers.24.mlp.up_proj.weight": {
748
+ "num_experts": 2,
749
+ "score": 0.9782112184911966
750
+ },
751
+ "model.layers.24.mlp.down_proj.weight": {
752
+ "num_experts": 2,
753
+ "score": 0.9772476609796286
754
+ },
755
+ "model.layers.25.mlp.gate_proj.weight": {
756
+ "num_experts": 2,
757
+ "score": 0.9800966791808605
758
+ },
759
+ "model.layers.25.mlp.up_proj.weight": {
760
+ "num_experts": 2,
761
+ "score": 0.9781745038926601
762
+ },
763
+ "model.layers.25.mlp.down_proj.weight": {
764
+ "num_experts": 2,
765
+ "score": 0.9775232560932636
766
+ },
767
+ "model.layers.26.mlp.gate_proj.weight": {
768
+ "num_experts": 2,
769
+ "score": 0.9799431785941124
770
+ },
771
+ "model.layers.26.mlp.up_proj.weight": {
772
+ "num_experts": 2,
773
+ "score": 0.9781098868697882
774
+ },
775
+ "model.layers.26.mlp.down_proj.weight": {
776
+ "num_experts": 2,
777
+ "score": 0.9775767754763365
778
+ },
779
+ "model.layers.27.mlp.gate_proj.weight": {
780
+ "num_experts": 2,
781
+ "score": 0.9798190761357546
782
+ },
783
+ "model.layers.27.mlp.up_proj.weight": {
784
+ "num_experts": 2,
785
+ "score": 0.9779539816081524
786
+ },
787
+ "model.layers.27.mlp.down_proj.weight": {
788
+ "num_experts": 2,
789
+ "score": 0.9775818083435297
790
+ },
791
+ "model.layers.28.mlp.gate_proj.weight": {
792
+ "num_experts": 2,
793
+ "score": 0.9798130877315998
794
+ },
795
+ "model.layers.28.mlp.up_proj.weight": {
796
+ "num_experts": 2,
797
+ "score": 0.9780505858361721
798
+ },
799
+ "model.layers.28.mlp.down_proj.weight": {
800
+ "num_experts": 2,
801
+ "score": 0.9777195602655411
802
+ },
803
+ "model.layers.29.mlp.gate_proj.weight": {
804
+ "num_experts": 2,
805
+ "score": 0.97960414737463
806
+ },
807
+ "model.layers.29.mlp.up_proj.weight": {
808
+ "num_experts": 2,
809
+ "score": 0.9777578953653574
810
+ },
811
+ "model.layers.29.mlp.down_proj.weight": {
812
+ "num_experts": 2,
813
+ "score": 0.9774074014276266
814
+ },
815
+ "model.layers.30.mlp.gate_proj.weight": {
816
+ "num_experts": 2,
817
+ "score": 0.979683106765151
818
+ },
819
+ "model.layers.30.mlp.up_proj.weight": {
820
+ "num_experts": 2,
821
+ "score": 0.9778437279164791
822
+ },
823
+ "model.layers.30.mlp.down_proj.weight": {
824
+ "num_experts": 2,
825
+ "score": 0.9775861166417599
826
+ },
827
+ "model.layers.31.mlp.gate_proj.weight": {
828
+ "num_experts": 2,
829
+ "score": 0.9799774046987295
830
+ },
831
+ "model.layers.31.mlp.up_proj.weight": {
832
+ "num_experts": 2,
833
+ "score": 0.9778894502669573
834
+ },
835
+ "model.layers.31.mlp.down_proj.weight": {
836
+ "num_experts": 2,
837
+ "score": 0.9775977656245232
838
+ },
839
+ "model.layers.32.mlp.gate_proj.weight": {
840
+ "num_experts": 2,
841
+ "score": 0.9808168206363916
842
+ },
843
+ "model.layers.32.mlp.up_proj.weight": {
844
+ "num_experts": 2,
845
+ "score": 0.9781559966504574
846
+ },
847
+ "model.layers.32.mlp.down_proj.weight": {
848
+ "num_experts": 2,
849
+ "score": 0.9776150044053793
850
+ },
851
+ "model.layers.33.mlp.gate_proj.weight": {
852
+ "num_experts": 2,
853
+ "score": 0.9813535250723362
854
+ },
855
+ "model.layers.33.mlp.up_proj.weight": {
856
+ "num_experts": 2,
857
+ "score": 0.9779957178980112
858
+ },
859
+ "model.layers.33.mlp.down_proj.weight": {
860
+ "num_experts": 2,
861
+ "score": 0.9773503355681896
862
+ },
863
+ "model.layers.34.mlp.gate_proj.weight": {
864
+ "num_experts": 2,
865
+ "score": 0.9805151708424091
866
+ },
867
+ "model.layers.34.mlp.up_proj.weight": {
868
+ "num_experts": 2,
869
+ "score": 0.9770685620605946
870
+ },
871
+ "model.layers.34.mlp.down_proj.weight": {
872
+ "num_experts": 2,
873
+ "score": 0.9765571355819702
874
+ },
875
+ "model.layers.35.mlp.gate_proj.weight": {
876
+ "num_experts": 2,
877
+ "score": 0.9779612477868795
878
+ },
879
+ "model.layers.35.mlp.up_proj.weight": {
880
+ "num_experts": 2,
881
+ "score": 0.9755715448409319
882
+ },
883
+ "model.layers.35.mlp.down_proj.weight": {
884
+ "num_experts": 2,
885
+ "score": 0.9756422452628613
886
+ }
887
+ }
888
+ }
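The interference report above stores, for each MoE-upcycled weight, the number of experts and an interference score. A minimal sketch of how such a report could be consumed once loaded as JSON (the dict literal below copies two entries from the report; the variable names are illustrative, not part of PacerKit):

```python
# Two entries copied verbatim from the interference report above;
# a real consumer would load the full report with json.load().
report = {
    "model.layers.35.mlp.gate_proj.weight": {"num_experts": 2, "score": 0.9779612477868795},
    "model.layers.35.mlp.down_proj.weight": {"num_experts": 2, "score": 0.9756422452628613},
}

# Rank the upcycled weights by their interference score, highest first.
moe_weights = sorted(report.items(), key=lambda kv: kv[1]["score"], reverse=True)
for name, info in moe_weights:
    print(f"{name}: {info['num_experts']} experts (score {info['score']:.4f})")
```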
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b89c47f8598b12a217beabdb6a895b03e0583934e605ae233e230d838c15cf75
+ size 4967215360
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88cde30d6f41c57b2d874f48465956b7b75944efacc91174f6f1316b21c85951
+ size 3077766632
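The two `.safetensors` files above are stored as Git LFS pointers in the exact three-line format of the LFS v1 spec. A minimal sketch of parsing such a pointer (the `pointer_text` literal copies the second pointer above; field handling is an assumption based on the spec, not PacerKit code):

```python
# Parse a Git LFS pointer file (format per https://git-lfs.github.com/spec/v1,
# as written out for the two model shards above).
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:88cde30d6f41c57b2d874f48465956b7b75944efacc91174f6f1316b21c85951
size 3077766632
"""

# Each line is "key value"; split on the first space only.
fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)

print(f"LFS object {algo}:{digest[:8]}… ({int(fields['size'])} bytes)")
```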
model.safetensors.index.json ADDED
@@ -0,0 +1,406 @@
+ {
+ "metadata": {
+ "total_parameters": 4022468096,
+ "total_size": 8044936192
+ },
+ "weight_map": {
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
151
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
152
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
153
+ "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
154
+ "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
155
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
156
+ "model.layers.20.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
157
+ "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
158
+ "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
159
+ "model.layers.20.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
161
+ "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
162
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
163
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
164
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
165
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
166
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
167
+ "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
168
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
169
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
170
+ "model.layers.21.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
171
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
172
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
173
+ "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
174
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
175
+ "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
176
+ "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
177
+ "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
178
+ "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
179
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
180
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
181
+ "model.layers.22.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
182
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
183
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
184
+ "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
185
+ "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
186
+ "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
187
+ "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
188
+ "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
189
+ "model.layers.23.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
190
+ "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
191
+ "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
192
+ "model.layers.23.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
193
+ "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
194
+ "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
195
+ "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
196
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
197
+ "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
198
+ "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
199
+ "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
200
+ "model.layers.24.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
201
+ "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
202
+ "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
203
+ "model.layers.24.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
204
+ "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
205
+ "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
382
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
383
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
384
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
385
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
386
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
387
+ "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
388
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
389
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
390
+ "model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
391
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
392
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
393
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
394
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
395
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
396
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
397
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
398
+ "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
399
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
400
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
401
+ "model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
402
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
403
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
404
+ "model.norm.weight": "model-00002-of-00002.safetensors"
405
+ }
406
+ }