loulou2 committed
Commit 9efa5b2 · verified · 1 parent(s): 29bd162

Add files using upload-large-folder tool

Files changed (5)
  1. README.md +88 -0
  2. config.json +1284 -0
  3. generation_config.json +12 -0
  4. model.safetensors +3 -0
  5. smash_config.json +39 -0
README.md ADDED
@@ -0,0 +1,88 @@
---
library_name: transformers
tags:
- safetensors
- pruna_pro-ai
- pruna-ai
---

# Model Card for loulou2/tiny_llama_higgs

This model was created using the [pruna](https://github.com/PrunaAI/pruna) library. Pruna is a model optimization framework built for developers, enabling you to deliver more efficient models with minimal implementation overhead.

## Usage

First, install the pruna_pro library:

```bash
pip install pruna_pro
```
You can [use the transformers library to load the model](https://huggingface.co/loulou2/tiny_llama_higgs?library=transformers), but this may not include all optimizations by default.
To ensure that all optimizations are applied, load the model with the pruna library instead:

```python
from pruna_pro import PrunaProModel

loaded_model = PrunaProModel.from_pretrained(
    "loulou2/tiny_llama_higgs"
)
# we can then run inference using the methods supported by the base model
```

For more information, visit [the Pruna documentation](https://docs.pruna.ai/en/stable/).

## Smash Configuration

The compression configuration of the model is stored in the `smash_config.json` file, which describes the optimization methods that were applied to the model.

```json
{
  "batcher": null,
  "cacher": null,
  "compiler": null,
  "distiller": null,
  "distributer": null,
  "enhancer": null,
  "factorizer": null,
  "kernel": null,
  "pruner": null,
  "quantizer": "higgs",
  "recoverer": null,
  "higgs_group_size": 256,
  "higgs_hadamard_size": 1024,
  "higgs_p": 2,
  "higgs_weight_bits": 4,
  "batch_size": 1,
  "device": "cuda",
  "device_map": null,
  "save_fns": [
    "transformers_higgs"
  ],
  "load_fns": [
    "transformers_higgs"
  ],
  "reapply_after_load": {
    "factorizer": null,
    "pruner": null,
    "quantizer": null,
    "distiller": null,
    "kernel": null,
    "cacher": null,
    "recoverer": null,
    "distributer": null,
    "compiler": null,
    "batcher": null,
    "enhancer": null
  }
}
```
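In this configuration, only the `quantizer` stage is set (to `"higgs"`); all other stages are `null`, i.e. unused. A minimal sketch of how you might inspect such a file programmatically, using only the standard library and an inline copy of the fields shown above:

```python
import json

# Sketch: load a smash-style config and list which optimization stages
# were actually applied (the non-null entries). The snippet below copies
# a subset of the fields from the smash_config.json shown above.
smash_config = json.loads("""
{
  "batcher": null,
  "cacher": null,
  "compiler": null,
  "distiller": null,
  "pruner": null,
  "quantizer": "higgs",
  "higgs_weight_bits": 4,
  "higgs_group_size": 256
}
""")

stages = ["batcher", "cacher", "compiler", "distiller", "pruner", "quantizer"]
applied = {k: v for k, v in smash_config.items() if k in stages and v is not None}
print(applied)  # {'quantizer': 'higgs'}
```

For the real file, replace the inline string with `json.load(open("smash_config.json"))`.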
## 🌍 Join the Pruna AI community!

[![Twitter](https://img.shields.io/twitter/follow/PrunaAI?style=social)](https://twitter.com/PrunaAI)
[![GitHub](https://img.shields.io/github/followers/PrunaAI?label=Follow%20%40PrunaAI&style=social)](https://github.com/PrunaAI)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue)](https://www.linkedin.com/company/93832878/admin/feed/posts/?feedType=following)
[![Discord](https://img.shields.io/badge/Discord-Join%20Us-blue?style=social&logo=discord)](https://discord.gg/JFQmtFKCjd)
[![Reddit](https://img.shields.io/reddit/subreddit-subscribers/PrunaAI?style=social)](https://www.reddit.com/r/PrunaAI/)
config.json ADDED
@@ -0,0 +1,1284 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "bits": 4,
    "example_batch_size": 1,
    "group_size": 256,
    "hadamard_size": 1024,
    "modules_to_not_convert": [
      "lm_head"
    ],
    "p": 2,
    "quant_method": "higgs",
    "tune_metadata": {
      "model.layers.0.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.0.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.0.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.0.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.0.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.0.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.0.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.1.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.1.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.1.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.1.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.1.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.1.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.1.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.10.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.10.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.10.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.10.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.10.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.10.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.10.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.11.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.11.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.11.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.11.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.11.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.11.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.11.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.12.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.12.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.12.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.12.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.12.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.12.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.12.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.13.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.13.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.13.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.13.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.13.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.13.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.13.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.14.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.14.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.14.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.14.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.14.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.14.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.14.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.15.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.15.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.15.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.15.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.15.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.15.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.15.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.2.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.2.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.2.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.2.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.2.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.2.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.2.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.3.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.3.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.3.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.3.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.3.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.3.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.3.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.4.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.4.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.4.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.4.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.4.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.4.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.4.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.5.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.5.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.5.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.5.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.5.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.5.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.5.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.6.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.6.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.6.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.6.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.6.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.6.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.6.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.7.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.7.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.7.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.7.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.7.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.7.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.7.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.8.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.8.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.8.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.8.self_attn.k_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.8.self_attn.o_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.8.self_attn.q_proj": {"K": 2048, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 88},
      "model.layers.8.self_attn.v_proj": {"K": 2048, "M": 1, "N": 512, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 36},
      "model.layers.9.mlp.down_proj": {"K": 8192, "M": 1, "N": 2048, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.9.mlp.gate_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114, "template_id": 136},
      "model.layers.9.mlp.up_proj": {"K": 2048, "M": 1, "N": 8192, "device": "cuda:0", "dtype": "torch.bfloat16", "group_size": 256, "num_bits": 4, "num_sms": 114,
1222
+ "template_id": 136
1223
+ },
1224
+ "model.layers.9.self_attn.k_proj": {
1225
+ "K": 2048,
1226
+ "M": 1,
1227
+ "N": 512,
1228
+ "device": "cuda:0",
1229
+ "dtype": "torch.bfloat16",
1230
+ "group_size": 256,
1231
+ "num_bits": 4,
1232
+ "num_sms": 114,
1233
+ "template_id": 36
1234
+ },
1235
+ "model.layers.9.self_attn.o_proj": {
1236
+ "K": 2048,
1237
+ "M": 1,
1238
+ "N": 2048,
1239
+ "device": "cuda:0",
1240
+ "dtype": "torch.bfloat16",
1241
+ "group_size": 256,
1242
+ "num_bits": 4,
1243
+ "num_sms": 114,
1244
+ "template_id": 88
1245
+ },
1246
+ "model.layers.9.self_attn.q_proj": {
1247
+ "K": 2048,
1248
+ "M": 1,
1249
+ "N": 2048,
1250
+ "device": "cuda:0",
1251
+ "dtype": "torch.bfloat16",
1252
+ "group_size": 256,
1253
+ "num_bits": 4,
1254
+ "num_sms": 114,
1255
+ "template_id": 88
1256
+ },
1257
+ "model.layers.9.self_attn.v_proj": {
1258
+ "K": 2048,
1259
+ "M": 1,
1260
+ "N": 512,
1261
+ "device": "cuda:0",
1262
+ "dtype": "torch.bfloat16",
1263
+ "group_size": 256,
1264
+ "num_bits": 4,
1265
+ "num_sms": 114,
1266
+ "template_id": 36
1267
+ }
1268
+ }
1269
+ },
1270
+ "rms_norm_eps": 1e-05,
1271
+ "rope_scaling": {
1272
+ "factor": 32.0,
1273
+ "high_freq_factor": 4.0,
1274
+ "low_freq_factor": 1.0,
1275
+ "original_max_position_embeddings": 8192,
1276
+ "rope_type": "llama3"
1277
+ },
1278
+ "rope_theta": 500000.0,
1279
+ "tie_word_embeddings": true,
1280
+ "torch_dtype": "bfloat16",
1281
+ "transformers_version": "4.52.4",
1282
+ "use_cache": true,
1283
+ "vocab_size": 128256
1284
+ }
generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "bos_token_id": 128000,
+   "do_sample": true,
+   "eos_token_id": [
+     128001,
+     128008,
+     128009
+   ],
+   "temperature": 0.6,
+   "top_p": 0.9,
+   "transformers_version": "4.52.4"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c594ec0bd527139a6e77dbcb624029500e83007ada4bbb1c0cf18fb4f6936dc5
+ size 556243632
smash_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "batcher": null,
+   "cacher": null,
+   "compiler": null,
+   "distiller": null,
+   "distributer": null,
+   "enhancer": null,
+   "factorizer": null,
+   "kernel": null,
+   "pruner": null,
+   "quantizer": "higgs",
+   "recoverer": null,
+   "higgs_group_size": 256,
+   "higgs_hadamard_size": 1024,
+   "higgs_p": 2,
+   "higgs_weight_bits": 4,
+   "batch_size": 1,
+   "device": "cuda",
+   "device_map": null,
+   "save_fns": [
+     "transformers_higgs"
+   ],
+   "load_fns": [
+     "transformers_higgs"
+   ],
+   "reapply_after_load": {
+     "factorizer": null,
+     "pruner": null,
+     "quantizer": null,
+     "distiller": null,
+     "kernel": null,
+     "cacher": null,
+     "recoverer": null,
+     "distributer": null,
+     "compiler": null,
+     "batcher": null,
+     "enhancer": null
+   }
+ }