Commit 76b5ff7 by codelion (verified) · 1 Parent(s): 093b7c8

humanizer-1B-OptIQ-4bit v0.1.4: stacked SFT + DPO LoRAs on MiniCPM5-1B-OptIQ-4bit

README.md ADDED
@@ -0,0 +1,128 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: mlx
+ tags:
+ - text-generation
+ - humanizer
+ - ai-detection
+ - lora
+ - mlx
+ - mlx-optiq
+ - apple-silicon
+ base_model: mlx-community/MiniCPM5-1B-OptiQ-4bit
+ pipeline_tag: text-generation
+ ---
+
+ # humanizer-1B-OptIQ-4bit
+
+ **A 1 B model that matches human writing on the RADAR AI detector.** Stacked SFT + DPO LoRA adapters on top of `mlx-community/MiniCPM5-1B-OptIQ-4bit` close 100 % of the gap to the human reference on a 200-draft held-out evaluation.
+
+ | System | P(AI) (RADAR-Vicuna-7B) |
+ | --- | ---: |
+ | Source AI drafts (Qwen3.5-4B + Gemma-4-e4b output) | 0.51 |
+ | **`humanizer-1B-OptIQ-4bit` (SFT + DPO stacked)** | **0.37** |
+ | Human reference (EditLens ICLR 2026, n=200) | 0.37 |
+
+ Build, recipe, and discussion: <https://mlx-optiq.com/blog/humanizer-stacked-lora>
+
+ ## What's in this repo
+
+ ```
+ humanizer-1B-OptIQ-4bit/
+ ├── model.safetensors + config.json + tokenizer*    base MiniCPM5-1B-OptIQ-4bit
+ ├── optiq_metadata.json                              per-layer bit assignments
+ └── adapters/
+     ├── humanizer-sft/                               SFT humanizer LoRA
+     │   ├── adapters.safetensors
+     │   ├── adapter_config.json
+     │   └── optiq_lora_config.json
+     └── humanizer-dpo/                               DPO continuation LoRA
+         ├── adapters.safetensors
+         ├── adapter_config.json
+         └── optiq_lora_config.json
+ ```
+
+ - **Base**: `mlx-community/MiniCPM5-1B-OptiQ-4bit`, the OptIQ mixed-precision quant of `openbmb/MiniCPM5-1B`. 875 MB on disk, Capability Score 30.28.
+ - **SFT adapter**: trained on canonical SFT data derived from the EditLens ICLR 2026 corpus. `--preset large` (ranks 32-64 with `by_bits` overlay), 600 iters, `mask_prompt=True`.
+ - **DPO adapter**: trained as a *delta* on top of the SFT adapter via `optiq lora train --method dpo --mount-adapter`. The reference KL is anchored against base + SFT (the textbook SFT → DPO continuation), and the saved adapter contains only the DPO delta. 300 iters, β=0.1, LR 5e-5 with linear warmup → cosine decay (the OptIQ DPO defaults).
+
+ The DPO adapter is meaningful **only when applied alongside the SFT adapter**: it is a delta from the SFT distribution, not a standalone LoRA. Apply both at inference for the headline result.
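The DPO bullet above anchors the reference model at base + SFT and trains only a delta on top of it. A minimal sketch of the standard per-pair DPO objective makes that dependency concrete. It assumes the usual published DPO formulation, not a confirmed view of OptIQ's implementation; the function and argument names are hypothetical, and the log-probabilities are plain floats standing in for summed token log-probs.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    policy_* : log-probs under base + SFT + DPO delta (the model being trained)
    ref_*    : log-probs under base + SFT (the frozen reference)
    beta     : 0.1 in this release, per the README
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), written in a numerically stable softplus form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# With a zero DPO delta the policy equals the reference, so the loss is log(2).
print(dpo_pair_loss(-10.0, -10.0, -10.0, -10.0))
```

Because every term is measured relative to the base + SFT reference, the learned delta only has meaning on top of that same reference, which is why the DPO adapter is not applied on its own.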
+
+ ## Use
+
+ You need `mlx-optiq >= 0.1.4` for the multi-LoRA serving and stacking syntax:
+
+ ```bash
+ pip install 'mlx-optiq>=0.1.4'
+
+ # Download the repo
+ huggingface-cli download mlx-community/humanizer-1B-OptIQ-4bit \
+   --local-dir ./humanizer-1B-OptIQ-4bit
+
+ # Serve with both adapters mounted
+ optiq serve \
+   --model ./humanizer-1B-OptIQ-4bit \
+   --adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-sft \
+   --adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-dpo \
+   --port 8080
+ ```
+
+ Then send requests with both adapters active via the `+` stacking syntax in the request body:
+
+ ```bash
+ curl http://localhost:8080/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "./humanizer-1B-OptIQ-4bit",
+     "adapter": "humanizer-sft+humanizer-dpo",
+     "messages": [
+       {"role": "system", "content": "Rewrite AI-generated drafts into natural human-style prose, preserving meaning, facts, names, numbers, citations, URLs, quotes, and formatting."},
+       {"role": "user", "content": "STYLE: direct technical blog\nTONE: analytical, clear, non-corporate\nLENGTH: preserve within 15%\n\nDraft to rewrite:\n\n[your AI-generated draft here]"}
+     ],
+     "temperature": 0.4,
+     "max_tokens": 1600,
+     "chat_template_kwargs": {"enable_thinking": false}
+   }'
+ ```
+
+ The OpenAI-compatible endpoint is a drop-in for Open WebUI, Continue, Cursor, your own scripts, etc. Send `"adapter": "humanizer-sft"` to use SFT alone, `"adapter": "base"` to bypass adapters entirely (handy for A/B comparisons).
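For scripted use, the curl request above might be mirrored with a small standard-library Python client. This is a sketch under assumptions: the `build_payload` and `humanize` helpers are hypothetical names, the payload fields are copied from the README's curl example, and the response is assumed to follow the usual OpenAI chat-completions shape (`choices[0].message.content`), which the README implies but does not show.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "Rewrite AI-generated drafts into natural human-style prose, preserving "
    "meaning, facts, names, numbers, citations, URLs, quotes, and formatting."
)


def build_payload(draft, adapter="humanizer-sft+humanizer-dpo",
                  style="direct technical blog",
                  tone="analytical, clear, non-corporate"):
    """Build the request body; `adapter` may also be "humanizer-sft" or "base"."""
    user = (
        f"STYLE: {style}\nTONE: {tone}\nLENGTH: preserve within 15%\n\n"
        f"Draft to rewrite:\n\n{draft}"
    )
    return {
        "model": "./humanizer-1B-OptIQ-4bit",
        "adapter": adapter,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user},
        ],
        "temperature": 0.4,
        "max_tokens": 1600,
        "chat_template_kwargs": {"enable_thinking": False},
    }


def humanize(draft, url="http://localhost:8080/v1/chat/completions", **kwargs):
    """POST the draft to a running `optiq serve` instance and return the rewrite."""
    body = json.dumps(build_payload(draft, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumed OpenAI-style response layout.
    return reply["choices"][0]["message"]["content"]
```

Calling `humanize(draft, adapter="base")` and `humanize(draft)` on the same draft gives the A/B comparison the README mentions.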
+
+ ## Held-out evaluation
+
+ The evaluation uses 200 AI-generated drafts from the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) held-out set. Each system rewrites the drafts and [RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) scores the output. Lower P(AI) is more human-like.
+
+ | Pipeline | P(AI) | Delta vs source | Slop / 1 K tokens |
+ | --- | ---: | ---: | ---: |
+ | Source AI draft (Qwen3.5-4B + Gemma-4-e4b) | 0.51 | n/a | 0.6 |
+ | SFT humanizer alone | 0.50 | -0.01 | 0.2 |
+ | **SFT + DPO stacked (this repo)** | **0.37** | **-0.14** | **0.0** |
+ | Human reference (target) | 0.37 | -0.14 | 0.1 |
+
+ The stacked pipeline produces fewer slop phrases per 1 K tokens (0.0) than the human reference itself (0.1).
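The "delta vs source" column and the headline gap-closure figure can be recomputed from the table. The sketch below assumes gap closure is defined as the share of the source-to-human distance that a system covers; the README does not spell out its formula, so treat `gap_closure` as a hypothetical helper.

```python
def gap_closure(source, system, human):
    """Fraction of the source-to-human P(AI) gap closed by `system`."""
    gap = source - human
    if gap == 0:
        raise ValueError("source and human reference scores are identical")
    return (source - system) / gap


SOURCE, HUMAN = 0.51, 0.37  # P(AI) values from the table above
SYSTEMS = {"SFT humanizer alone": 0.50, "SFT + DPO stacked": 0.37}

for name, p_ai in SYSTEMS.items():
    delta = p_ai - SOURCE
    print(f"{name}: delta {delta:+.2f}, gap closure {gap_closure(SOURCE, p_ai, HUMAN):.0%}")
```

On the rounded table values this gives roughly 7 % for SFT alone and 100 % for the stacked pipeline; the scores are reported to two decimals, so the percentages carry that rounding.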
+
+ ## Intended use & limitations
+
+ - **Intended use**: rewriting AI-generated drafts (blog posts, articles, reports) into more natural-sounding prose. Preserves facts, names, numbers, URLs, citations.
+ - **Trained on**: the EditLens ICLR 2026 corpus filtered through the OptIQ Labs dataset-building pipeline (Qwen3.5-4B and Gemma-4-e4b as the source AI models; the original EditLens human-written prose as target).
+ - **AI-detector caveat**: RADAR-Vicuna-7B is one detector among many. Matching the human reference on RADAR means the rewrites land at the same point on RADAR's scale as the EditLens human-written set; other detectors will give different numbers, and detector arms races mean any specific score has a shelf life. The reproducible claim is the **delta from source** and the **gap closure against a fixed human reference**, both held up across the entire 200-draft held-out set.
+ - **Length**: the rewrites tend to over-generate (length ratio about 3-4x the source). Apply a max-tokens or post-truncation step if you need length-faithful output.
+ - **Capability outside humanization**: this LoRA stack is heavily specialized for the rewrite-this-AI-draft format. Out-of-format prompts will degrade behavior. Serve `"adapter": "base"` for general MiniCPM5-1B inference.
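The length caveat above suggests a post-truncation step. One possible shape for it is sketched here: keep whole sentences from the rewrite until a word budget derived from the source is used up. The helper name, the word-count proxy for length, and the simple punctuation-based sentence split are all assumptions; the 1.15 default mirrors the "preserve within 15%" instruction in the request prompt.

```python
import re


def truncate_to_ratio(source, rewrite, max_ratio=1.15):
    """Trim `rewrite` at a sentence boundary to about max_ratio x the source length.

    Length is measured in whitespace-separated words. The first sentence is
    always kept so the result is never empty for a non-empty rewrite.
    """
    budget = int(len(source.split()) * max_ratio)
    sentences = re.split(r"(?<=[.!?])\s+", rewrite.strip())
    kept, used = [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if kept and used + words > budget:
            break
        kept.append(sentence)
        used += words
    return " ".join(kept)


print(truncate_to_ratio("one two three four.", "A b c. D e f. G h i."))
```

Cutting at sentence boundaries avoids mid-sentence stops, at the cost of occasionally undershooting the budget; a `max_tokens` cap on the request is the blunter alternative.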
+
+ ## License
+
+ - Base model: `openbmb/MiniCPM5-1B` (Apache-2.0).
+ - LoRA adapters: Apache-2.0, this release.
+ - Training data: derived from [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) (research use).
+
+ ## Citation
+
+ ```bibtex
+ @misc{mlxoptiq2026humanizer1b,
+   title = {humanizer-1B-OptIQ-4bit: a stacked SFT + DPO LoRA on a 1 B model that matches human writing on RADAR},
+   author = {{mlx-optiq team}},
+   year = {2026},
+   url = {https://huggingface.co/mlx-community/humanizer-1B-OptIQ-4bit},
+ }
+ ```
adapters/humanizer-dpo/adapter_config.json ADDED
@@ -0,0 +1,206 @@
+ {
+   "fine_tune_type": "lora",
+   "num_layers": -1,
+   "lora_parameters": {
+     "rank": 32,
+     "scale": 1.0,
+     "dropout": 0.0,
+     "keys": null
+   },
+   "base_model_name_or_path": "optiq_output/openbmb_MiniCPM5-1B/optiq_mixed",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": false,
+   "init_lora_weights": true,
+   "layers_to_transform": null,
+   "layers_pattern": null,
+   "lora_alpha": 32,
+   "lora_dropout": 0.0,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 32,
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "k_proj",
+     "v_proj",
+     "o_proj",
+     "gate_proj",
+     "up_proj",
+     "down_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "optiq": {
+     "rank_scaling": "by_bits",
+     "applied_ranks": {
+       "layer_0.mlp.up_proj": 64,
+       "layer_0.mlp.down_proj": 64,
+       "layer_0.mlp.gate_proj": 64,
+       "layer_0.self_attn.o_proj": 64,
+       "layer_0.self_attn.v_proj": 64,
+       "layer_0.self_attn.k_proj": 64,
+       "layer_0.self_attn.q_proj": 64,
+       "layer_1.mlp.up_proj": 32,
+       "layer_1.mlp.down_proj": 64,
+       "layer_1.mlp.gate_proj": 32,
+       "layer_1.self_attn.o_proj": 64,
+       "layer_1.self_attn.v_proj": 64,
+       "layer_1.self_attn.k_proj": 32,
+       "layer_1.self_attn.q_proj": 32,
+       "layer_2.mlp.up_proj": 32,
+       "layer_2.mlp.down_proj": 32,
+       "layer_2.mlp.gate_proj": 32,
+       "layer_2.self_attn.o_proj": 64,
+       "layer_2.self_attn.v_proj": 64,
+       "layer_2.self_attn.k_proj": 32,
+       "layer_2.self_attn.q_proj": 32,
+       "layer_3.mlp.up_proj": 32,
+       "layer_3.mlp.down_proj": 32,
+       "layer_3.mlp.gate_proj": 32,
+       "layer_3.self_attn.o_proj": 64,
+       "layer_3.self_attn.v_proj": 64,
+       "layer_3.self_attn.k_proj": 32,
+       "layer_3.self_attn.q_proj": 32,
+       "layer_4.mlp.up_proj": 32,
+       "layer_4.mlp.down_proj": 64,
+       "layer_4.mlp.gate_proj": 32,
+       "layer_4.self_attn.o_proj": 64,
+       "layer_4.self_attn.v_proj": 64,
+       "layer_4.self_attn.k_proj": 32,
+       "layer_4.self_attn.q_proj": 32,
+       "layer_5.mlp.up_proj": 32,
+       "layer_5.mlp.down_proj": 32,
+       "layer_5.mlp.gate_proj": 32,
+       "layer_5.self_attn.o_proj": 32,
+       "layer_5.self_attn.v_proj": 64,
+       "layer_5.self_attn.k_proj": 32,
+       "layer_5.self_attn.q_proj": 64,
+       "layer_6.mlp.up_proj": 32,
+       "layer_6.mlp.down_proj": 32,
+       "layer_6.mlp.gate_proj": 32,
+       "layer_6.self_attn.o_proj": 64,
+       "layer_6.self_attn.v_proj": 64,
+       "layer_6.self_attn.k_proj": 32,
+       "layer_6.self_attn.q_proj": 32,
+       "layer_7.mlp.up_proj": 64,
+       "layer_7.mlp.down_proj": 32,
+       "layer_7.mlp.gate_proj": 32,
+       "layer_7.self_attn.o_proj": 64,
+       "layer_7.self_attn.v_proj": 32,
+       "layer_7.self_attn.k_proj": 32,
+       "layer_7.self_attn.q_proj": 64,
+       "layer_8.mlp.up_proj": 32,
+       "layer_8.mlp.down_proj": 32,
+       "layer_8.mlp.gate_proj": 32,
+       "layer_8.self_attn.o_proj": 64,
+       "layer_8.self_attn.v_proj": 32,
+       "layer_8.self_attn.k_proj": 32,
+       "layer_8.self_attn.q_proj": 64,
+       "layer_9.mlp.up_proj": 32,
+       "layer_9.mlp.down_proj": 32,
+       "layer_9.mlp.gate_proj": 32,
+       "layer_9.self_attn.o_proj": 64,
+       "layer_9.self_attn.v_proj": 64,
+       "layer_9.self_attn.k_proj": 32,
+       "layer_9.self_attn.q_proj": 32,
+       "layer_10.mlp.up_proj": 64,
+       "layer_10.mlp.down_proj": 32,
+       "layer_10.mlp.gate_proj": 32,
+       "layer_10.self_attn.o_proj": 64,
+       "layer_10.self_attn.v_proj": 64,
+       "layer_10.self_attn.k_proj": 32,
+       "layer_10.self_attn.q_proj": 32,
+       "layer_11.mlp.up_proj": 32,
+       "layer_11.mlp.down_proj": 32,
+       "layer_11.mlp.gate_proj": 32,
+       "layer_11.self_attn.o_proj": 64,
+       "layer_11.self_attn.v_proj": 64,
+       "layer_11.self_attn.k_proj": 32,
+       "layer_11.self_attn.q_proj": 32,
+       "layer_12.mlp.up_proj": 32,
+       "layer_12.mlp.down_proj": 32,
+       "layer_12.mlp.gate_proj": 32,
+       "layer_12.self_attn.o_proj": 64,
+       "layer_12.self_attn.v_proj": 64,
+       "layer_12.self_attn.k_proj": 32,
+       "layer_12.self_attn.q_proj": 32,
+       "layer_13.mlp.up_proj": 64,
+       "layer_13.mlp.down_proj": 32,
+       "layer_13.mlp.gate_proj": 32,
+       "layer_13.self_attn.o_proj": 64,
+       "layer_13.self_attn.v_proj": 64,
+       "layer_13.self_attn.k_proj": 32,
+       "layer_13.self_attn.q_proj": 32,
+       "layer_14.mlp.up_proj": 32,
+       "layer_14.mlp.down_proj": 32,
+       "layer_14.mlp.gate_proj": 32,
+       "layer_14.self_attn.o_proj": 64,
+       "layer_14.self_attn.v_proj": 64,
+       "layer_14.self_attn.k_proj": 32,
+       "layer_14.self_attn.q_proj": 32,
+       "layer_15.mlp.up_proj": 32,
+       "layer_15.mlp.down_proj": 32,
+       "layer_15.mlp.gate_proj": 32,
+       "layer_15.self_attn.o_proj": 64,
+       "layer_15.self_attn.v_proj": 64,
+       "layer_15.self_attn.k_proj": 32,
+       "layer_15.self_attn.q_proj": 32,
+       "layer_16.mlp.up_proj": 64,
+       "layer_16.mlp.down_proj": 32,
+       "layer_16.mlp.gate_proj": 32,
+       "layer_16.self_attn.o_proj": 64,
+       "layer_16.self_attn.v_proj": 64,
+       "layer_16.self_attn.k_proj": 32,
+       "layer_16.self_attn.q_proj": 32,
+       "layer_17.mlp.up_proj": 32,
+       "layer_17.mlp.down_proj": 32,
+       "layer_17.mlp.gate_proj": 32,
+       "layer_17.self_attn.o_proj": 64,
+       "layer_17.self_attn.v_proj": 64,
+       "layer_17.self_attn.k_proj": 32,
+       "layer_17.self_attn.q_proj": 32,
+       "layer_18.mlp.up_proj": 32,
+       "layer_18.mlp.down_proj": 32,
+       "layer_18.mlp.gate_proj": 32,
+       "layer_18.self_attn.o_proj": 64,
+       "layer_18.self_attn.v_proj": 64,
+       "layer_18.self_attn.k_proj": 32,
+       "layer_18.self_attn.q_proj": 32,
+       "layer_19.mlp.up_proj": 64,
+       "layer_19.mlp.down_proj": 32,
+       "layer_19.mlp.gate_proj": 32,
+       "layer_19.self_attn.o_proj": 64,
+       "layer_19.self_attn.v_proj": 64,
+       "layer_19.self_attn.k_proj": 32,
+       "layer_19.self_attn.q_proj": 32,
+       "layer_20.mlp.up_proj": 32,
+       "layer_20.mlp.down_proj": 32,
+       "layer_20.mlp.gate_proj": 32,
+       "layer_20.self_attn.o_proj": 32,
+       "layer_20.self_attn.v_proj": 64,
+       "layer_20.self_attn.k_proj": 32,
+       "layer_20.self_attn.q_proj": 64,
+       "layer_21.mlp.up_proj": 32,
+       "layer_21.mlp.down_proj": 32,
+       "layer_21.mlp.gate_proj": 32,
+       "layer_21.self_attn.o_proj": 64,
+       "layer_21.self_attn.v_proj": 64,
+       "layer_21.self_attn.k_proj": 32,
+       "layer_21.self_attn.q_proj": 32,
+       "layer_22.mlp.up_proj": 64,
+       "layer_22.mlp.down_proj": 32,
+       "layer_22.mlp.gate_proj": 32,
+       "layer_22.self_attn.o_proj": 64,
+       "layer_22.self_attn.v_proj": 64,
+       "layer_22.self_attn.k_proj": 32,
+       "layer_22.self_attn.q_proj": 32,
+       "layer_23.mlp.up_proj": 64,
+       "layer_23.mlp.down_proj": 64,
+       "layer_23.mlp.gate_proj": 64,
+       "layer_23.self_attn.o_proj": 64,
+       "layer_23.self_attn.v_proj": 64,
+       "layer_23.self_attn.k_proj": 64,
+       "layer_23.self_attn.q_proj": 64
+     }
+   }
+ }
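The `applied_ranks` map above is long enough that a summary is easier to read than the raw entries. The sketch below counts modules per rank and per projection type. The helper names are hypothetical; the lookup handles both layouts seen in this commit (nested under `"optiq"` in `adapter_config.json`, top-level in `optiq_lora_config.json`), and the `excerpt` dict holds three entries copied from the file rather than the full map.

```python
from collections import Counter


def applied_ranks(config):
    """Return the per-module rank map from either config layout."""
    if "applied_ranks" in config:
        return config["applied_ranks"]
    return config.get("optiq", {}).get("applied_ranks", {})


def rank_histogram(config):
    """Count how many modules were assigned each LoRA rank."""
    return dict(Counter(applied_ranks(config).values()))


def ranks_by_projection(config):
    """Group rank counts by projection name (text after the last dot)."""
    grouped = {}
    for name, rank in applied_ranks(config).items():
        projection = name.rsplit(".", 1)[-1]
        grouped.setdefault(projection, Counter())[rank] += 1
    return {projection: dict(counts) for projection, counts in grouped.items()}


excerpt = {
    "optiq": {
        "applied_ranks": {
            "layer_1.mlp.up_proj": 32,
            "layer_1.mlp.down_proj": 64,
            "layer_1.self_attn.k_proj": 32,
        }
    }
}
print(rank_histogram(excerpt))
print(ranks_by_projection(excerpt))
```

To summarize the real file, load it with `json.load` and pass the resulting dict to the same helpers.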
adapters/humanizer-dpo/adapters.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:101a8d2833a7ad5f0610a112aadeedfc1571eacb9809464c1ce1f088ab2e269b
+ size 119050013
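The three lines above are a Git LFS pointer, not the weights themselves: they record the hash and byte size of the real `adapters.safetensors`. After downloading the repo, a file can be checked against its pointer with a short script like the one below. The helper names are hypothetical and only the Python standard library is used.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True when the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algorithm"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


DPO_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:101a8d2833a7ad5f0610a112aadeedfc1571eacb9809464c1ce1f088ab2e269b
size 119050013
"""
print(parse_lfs_pointer(DPO_POINTER)["size"])
```

A mismatch usually means an interrupted download or that the pointer file itself was fetched instead of the LFS object.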
adapters/humanizer-dpo/optiq_lora_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+   "rank": 32,
+   "scale": 1.0,
+   "dropout": 0.0,
+   "rank_scaling": "by_bits",
+   "method": "dpo",
+   "dpo_beta": 0.1,
+   "dpo_learning_rate": 5e-05,
+   "dpo_warmup_iters": null,
+   "dpo_lr_schedule": "cosine",
+   "target_modules": [
+     "q_proj",
+     "k_proj",
+     "v_proj",
+     "o_proj",
+     "gate_proj",
+     "up_proj",
+     "down_proj"
+   ],
+   "num_layers": -1,
+   "use_dora": false,
+   "mask_prompt": true,
+   "batch_size": 1,
+   "iters": 300,
+   "learning_rate": 5e-05,
+   "max_seq_length": 2048,
+   "grad_accumulation_steps": 1,
+   "grad_checkpoint": true,
+   "val_batches": 25,
+   "steps_per_report": 10,
+   "steps_per_eval": 200,
+   "steps_per_save": 100,
+   "adapter_path": "adapters/humanizer-dpo-minicpm5-1b",
+   "mount_adapter": "adapters/humanizer-minicpm5-1b",
+   "clear_cache_threshold": 0,
+   "applied_ranks": {
+     "layer_0.mlp.up_proj": 64,
+     "layer_0.mlp.down_proj": 64,
+     "layer_0.mlp.gate_proj": 64,
+     "layer_0.self_attn.o_proj": 64,
+     "layer_0.self_attn.v_proj": 64,
+     "layer_0.self_attn.k_proj": 64,
+     "layer_0.self_attn.q_proj": 64,
+     "layer_1.mlp.up_proj": 32,
+     "layer_1.mlp.down_proj": 64,
+     "layer_1.mlp.gate_proj": 32,
+     "layer_1.self_attn.o_proj": 64,
+     "layer_1.self_attn.v_proj": 64,
+     "layer_1.self_attn.k_proj": 32,
+     "layer_1.self_attn.q_proj": 32,
+     "layer_2.mlp.up_proj": 32,
+     "layer_2.mlp.down_proj": 32,
+     "layer_2.mlp.gate_proj": 32,
+     "layer_2.self_attn.o_proj": 64,
+     "layer_2.self_attn.v_proj": 64,
+     "layer_2.self_attn.k_proj": 32,
+     "layer_2.self_attn.q_proj": 32,
+     "layer_3.mlp.up_proj": 32,
+     "layer_3.mlp.down_proj": 32,
+     "layer_3.mlp.gate_proj": 32,
+     "layer_3.self_attn.o_proj": 64,
+     "layer_3.self_attn.v_proj": 64,
+     "layer_3.self_attn.k_proj": 32,
+     "layer_3.self_attn.q_proj": 32,
+     "layer_4.mlp.up_proj": 32,
+     "layer_4.mlp.down_proj": 64,
+     "layer_4.mlp.gate_proj": 32,
+     "layer_4.self_attn.o_proj": 64,
+     "layer_4.self_attn.v_proj": 64,
+     "layer_4.self_attn.k_proj": 32,
+     "layer_4.self_attn.q_proj": 32,
+     "layer_5.mlp.up_proj": 32,
+     "layer_5.mlp.down_proj": 32,
+     "layer_5.mlp.gate_proj": 32,
+     "layer_5.self_attn.o_proj": 32,
+     "layer_5.self_attn.v_proj": 64,
+     "layer_5.self_attn.k_proj": 32,
+     "layer_5.self_attn.q_proj": 64,
+     "layer_6.mlp.up_proj": 32,
+     "layer_6.mlp.down_proj": 32,
+     "layer_6.mlp.gate_proj": 32,
+     "layer_6.self_attn.o_proj": 64,
+     "layer_6.self_attn.v_proj": 64,
+     "layer_6.self_attn.k_proj": 32,
+     "layer_6.self_attn.q_proj": 32,
+     "layer_7.mlp.up_proj": 64,
+     "layer_7.mlp.down_proj": 32,
+     "layer_7.mlp.gate_proj": 32,
+     "layer_7.self_attn.o_proj": 64,
+     "layer_7.self_attn.v_proj": 32,
+     "layer_7.self_attn.k_proj": 32,
+     "layer_7.self_attn.q_proj": 64,
+     "layer_8.mlp.up_proj": 32,
+     "layer_8.mlp.down_proj": 32,
+     "layer_8.mlp.gate_proj": 32,
+     "layer_8.self_attn.o_proj": 64,
+     "layer_8.self_attn.v_proj": 32,
+     "layer_8.self_attn.k_proj": 32,
+     "layer_8.self_attn.q_proj": 64,
+     "layer_9.mlp.up_proj": 32,
+     "layer_9.mlp.down_proj": 32,
+     "layer_9.mlp.gate_proj": 32,
+     "layer_9.self_attn.o_proj": 64,
+     "layer_9.self_attn.v_proj": 64,
+     "layer_9.self_attn.k_proj": 32,
+     "layer_9.self_attn.q_proj": 32,
+     "layer_10.mlp.up_proj": 64,
+     "layer_10.mlp.down_proj": 32,
+     "layer_10.mlp.gate_proj": 32,
+     "layer_10.self_attn.o_proj": 64,
+     "layer_10.self_attn.v_proj": 64,
+     "layer_10.self_attn.k_proj": 32,
+     "layer_10.self_attn.q_proj": 32,
+     "layer_11.mlp.up_proj": 32,
+     "layer_11.mlp.down_proj": 32,
+     "layer_11.mlp.gate_proj": 32,
+     "layer_11.self_attn.o_proj": 64,
+     "layer_11.self_attn.v_proj": 64,
+     "layer_11.self_attn.k_proj": 32,
+     "layer_11.self_attn.q_proj": 32,
+     "layer_12.mlp.up_proj": 32,
+     "layer_12.mlp.down_proj": 32,
+     "layer_12.mlp.gate_proj": 32,
+     "layer_12.self_attn.o_proj": 64,
+     "layer_12.self_attn.v_proj": 64,
+     "layer_12.self_attn.k_proj": 32,
+     "layer_12.self_attn.q_proj": 32,
+     "layer_13.mlp.up_proj": 64,
+     "layer_13.mlp.down_proj": 32,
+     "layer_13.mlp.gate_proj": 32,
+     "layer_13.self_attn.o_proj": 64,
+     "layer_13.self_attn.v_proj": 64,
+     "layer_13.self_attn.k_proj": 32,
+     "layer_13.self_attn.q_proj": 32,
+     "layer_14.mlp.up_proj": 32,
+     "layer_14.mlp.down_proj": 32,
+     "layer_14.mlp.gate_proj": 32,
+     "layer_14.self_attn.o_proj": 64,
+     "layer_14.self_attn.v_proj": 64,
+     "layer_14.self_attn.k_proj": 32,
+     "layer_14.self_attn.q_proj": 32,
+     "layer_15.mlp.up_proj": 32,
+     "layer_15.mlp.down_proj": 32,
+     "layer_15.mlp.gate_proj": 32,
+     "layer_15.self_attn.o_proj": 64,
+     "layer_15.self_attn.v_proj": 64,
+     "layer_15.self_attn.k_proj": 32,
+     "layer_15.self_attn.q_proj": 32,
+     "layer_16.mlp.up_proj": 64,
+     "layer_16.mlp.down_proj": 32,
+     "layer_16.mlp.gate_proj": 32,
+     "layer_16.self_attn.o_proj": 64,
+     "layer_16.self_attn.v_proj": 64,
+     "layer_16.self_attn.k_proj": 32,
+     "layer_16.self_attn.q_proj": 32,
+     "layer_17.mlp.up_proj": 32,
+     "layer_17.mlp.down_proj": 32,
+     "layer_17.mlp.gate_proj": 32,
+     "layer_17.self_attn.o_proj": 64,
+     "layer_17.self_attn.v_proj": 64,
+     "layer_17.self_attn.k_proj": 32,
+     "layer_17.self_attn.q_proj": 32,
+     "layer_18.mlp.up_proj": 32,
+     "layer_18.mlp.down_proj": 32,
+     "layer_18.mlp.gate_proj": 32,
+     "layer_18.self_attn.o_proj": 64,
+     "layer_18.self_attn.v_proj": 64,
+     "layer_18.self_attn.k_proj": 32,
+     "layer_18.self_attn.q_proj": 32,
+     "layer_19.mlp.up_proj": 64,
+     "layer_19.mlp.down_proj": 32,
+     "layer_19.mlp.gate_proj": 32,
+     "layer_19.self_attn.o_proj": 64,
+     "layer_19.self_attn.v_proj": 64,
+     "layer_19.self_attn.k_proj": 32,
+     "layer_19.self_attn.q_proj": 32,
+     "layer_20.mlp.up_proj": 32,
+     "layer_20.mlp.down_proj": 32,
+     "layer_20.mlp.gate_proj": 32,
+     "layer_20.self_attn.o_proj": 32,
+     "layer_20.self_attn.v_proj": 64,
+     "layer_20.self_attn.k_proj": 32,
+     "layer_20.self_attn.q_proj": 64,
+     "layer_21.mlp.up_proj": 32,
+     "layer_21.mlp.down_proj": 32,
+     "layer_21.mlp.gate_proj": 32,
+     "layer_21.self_attn.o_proj": 64,
+     "layer_21.self_attn.v_proj": 64,
+     "layer_21.self_attn.k_proj": 32,
+     "layer_21.self_attn.q_proj": 32,
+     "layer_22.mlp.up_proj": 64,
+     "layer_22.mlp.down_proj": 32,
+     "layer_22.mlp.gate_proj": 32,
+     "layer_22.self_attn.o_proj": 64,
+     "layer_22.self_attn.v_proj": 64,
+     "layer_22.self_attn.k_proj": 32,
+     "layer_22.self_attn.q_proj": 32,
+     "layer_23.mlp.up_proj": 64,
+     "layer_23.mlp.down_proj": 64,
+     "layer_23.mlp.gate_proj": 64,
+     "layer_23.self_attn.o_proj": 64,
+     "layer_23.self_attn.v_proj": 64,
+     "layer_23.self_attn.k_proj": 64,
+     "layer_23.self_attn.q_proj": 64
+   },
+   "source_model": "optiq_output/openbmb_MiniCPM5-1B/optiq_mixed"
+ }
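The config above records a peak DPO learning rate of 5e-05, a `"cosine"` schedule and 300 iterations, and the README describes the shape as linear warmup followed by cosine decay. The sketch below shows what such a schedule could look like. It is an illustration, not OptIQ's scheduler: `dpo_warmup_iters` is `null` in the config, so the 30-step warmup here is a hypothetical value, as are the decay-to-zero floor and the function name.

```python
import math


def lr_at(step, peak_lr=5e-5, total_iters=300, warmup_iters=30, final_lr=0.0):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    warmup_iters=30 and final_lr=0.0 are assumed for illustration only.
    """
    if warmup_iters > 0 and step < warmup_iters:
        # Linear ramp from 0 up to the peak.
        return peak_lr * step / warmup_iters
    span = max(total_iters - warmup_iters, 1)
    progress = min((step - warmup_iters) / span, 1.0)
    # Half-cosine from peak_lr down to final_lr.
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 15, 30, 165, 300):
    print(step, f"{lr_at(step):.2e}")
```

The warmup keeps early DPO updates small while the policy is still identical to the base + SFT reference, and the cosine tail reduces the step size as the 300 iterations run out.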
adapters/humanizer-sft/adapter_config.json ADDED
@@ -0,0 +1,206 @@
+ {
+   "fine_tune_type": "lora",
+   "num_layers": -1,
+   "lora_parameters": {
+     "rank": 32,
+     "scale": 1.0,
+     "dropout": 0.0,
+     "keys": null
+   },
+   "base_model_name_or_path": "optiq_output/openbmb_MiniCPM5-1B/optiq_mixed",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": false,
+   "init_lora_weights": true,
+   "layers_to_transform": null,
+   "layers_pattern": null,
+   "lora_alpha": 32,
+   "lora_dropout": 0.0,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 32,
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "k_proj",
+     "v_proj",
+     "o_proj",
+     "gate_proj",
+     "up_proj",
+     "down_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "optiq": {
+     "rank_scaling": "by_bits",
+     "applied_ranks": {
+       "layer_0.mlp.up_proj": 64,
+       "layer_0.mlp.down_proj": 64,
+       "layer_0.mlp.gate_proj": 64,
+       "layer_0.self_attn.o_proj": 64,
+       "layer_0.self_attn.v_proj": 64,
+       "layer_0.self_attn.k_proj": 64,
+       "layer_0.self_attn.q_proj": 64,
+       "layer_1.mlp.up_proj": 32,
+       "layer_1.mlp.down_proj": 64,
+       "layer_1.mlp.gate_proj": 32,
+       "layer_1.self_attn.o_proj": 64,
+       "layer_1.self_attn.v_proj": 64,
+       "layer_1.self_attn.k_proj": 32,
+       "layer_1.self_attn.q_proj": 32,
+       "layer_2.mlp.up_proj": 32,
+       "layer_2.mlp.down_proj": 32,
+       "layer_2.mlp.gate_proj": 32,
+       "layer_2.self_attn.o_proj": 64,
+       "layer_2.self_attn.v_proj": 64,
+       "layer_2.self_attn.k_proj": 32,
+       "layer_2.self_attn.q_proj": 32,
+       "layer_3.mlp.up_proj": 32,
+       "layer_3.mlp.down_proj": 32,
+       "layer_3.mlp.gate_proj": 32,
+       "layer_3.self_attn.o_proj": 64,
+       "layer_3.self_attn.v_proj": 64,
+       "layer_3.self_attn.k_proj": 32,
+       "layer_3.self_attn.q_proj": 32,
+       "layer_4.mlp.up_proj": 32,
+       "layer_4.mlp.down_proj": 64,
+       "layer_4.mlp.gate_proj": 32,
+       "layer_4.self_attn.o_proj": 64,
+       "layer_4.self_attn.v_proj": 64,
+       "layer_4.self_attn.k_proj": 32,
+       "layer_4.self_attn.q_proj": 32,
+       "layer_5.mlp.up_proj": 32,
+       "layer_5.mlp.down_proj": 32,
+       "layer_5.mlp.gate_proj": 32,
+       "layer_5.self_attn.o_proj": 32,
+       "layer_5.self_attn.v_proj": 64,
+       "layer_5.self_attn.k_proj": 32,
+       "layer_5.self_attn.q_proj": 64,
+       "layer_6.mlp.up_proj": 32,
+       "layer_6.mlp.down_proj": 32,
+       "layer_6.mlp.gate_proj": 32,
+       "layer_6.self_attn.o_proj": 64,
+       "layer_6.self_attn.v_proj": 64,
+       "layer_6.self_attn.k_proj": 32,
+       "layer_6.self_attn.q_proj": 32,
+       "layer_7.mlp.up_proj": 64,
+       "layer_7.mlp.down_proj": 32,
+       "layer_7.mlp.gate_proj": 32,
+       "layer_7.self_attn.o_proj": 64,
+       "layer_7.self_attn.v_proj": 32,
+       "layer_7.self_attn.k_proj": 32,
+       "layer_7.self_attn.q_proj": 64,
+       "layer_8.mlp.up_proj": 32,
+       "layer_8.mlp.down_proj": 32,
+       "layer_8.mlp.gate_proj": 32,
+       "layer_8.self_attn.o_proj": 64,
+       "layer_8.self_attn.v_proj": 32,
+       "layer_8.self_attn.k_proj": 32,
+       "layer_8.self_attn.q_proj": 64,
+       "layer_9.mlp.up_proj": 32,
+       "layer_9.mlp.down_proj": 32,
+       "layer_9.mlp.gate_proj": 32,
+       "layer_9.self_attn.o_proj": 64,
+       "layer_9.self_attn.v_proj": 64,
+       "layer_9.self_attn.k_proj": 32,
+       "layer_9.self_attn.q_proj": 32,
+       "layer_10.mlp.up_proj": 64,
+       "layer_10.mlp.down_proj": 32,
+       "layer_10.mlp.gate_proj": 32,
+       "layer_10.self_attn.o_proj": 64,
+       "layer_10.self_attn.v_proj": 64,
+       "layer_10.self_attn.k_proj": 32,
+       "layer_10.self_attn.q_proj": 32,
+       "layer_11.mlp.up_proj": 32,
+       "layer_11.mlp.down_proj": 32,
+       "layer_11.mlp.gate_proj": 32,
+       "layer_11.self_attn.o_proj": 64,
+       "layer_11.self_attn.v_proj": 64,
+       "layer_11.self_attn.k_proj": 32,
+       "layer_11.self_attn.q_proj": 32,
+       "layer_12.mlp.up_proj": 32,
+       "layer_12.mlp.down_proj": 32,
+       "layer_12.mlp.gate_proj": 32,
+       "layer_12.self_attn.o_proj": 64,
+       "layer_12.self_attn.v_proj": 64,
+       "layer_12.self_attn.k_proj": 32,
+       "layer_12.self_attn.q_proj": 32,
+       "layer_13.mlp.up_proj": 64,
+       "layer_13.mlp.down_proj": 32,
+       "layer_13.mlp.gate_proj": 32,
+       "layer_13.self_attn.o_proj": 64,
+       "layer_13.self_attn.v_proj": 64,
+       "layer_13.self_attn.k_proj": 32,
+       "layer_13.self_attn.q_proj": 32,
+       "layer_14.mlp.up_proj": 32,
+       "layer_14.mlp.down_proj": 32,
+       "layer_14.mlp.gate_proj": 32,
+       "layer_14.self_attn.o_proj": 64,
+       "layer_14.self_attn.v_proj": 64,
+       "layer_14.self_attn.k_proj": 32,
+       "layer_14.self_attn.q_proj": 32,
+       "layer_15.mlp.up_proj": 32,
+       "layer_15.mlp.down_proj": 32,
+       "layer_15.mlp.gate_proj": 32,
+       "layer_15.self_attn.o_proj": 64,
+       "layer_15.self_attn.v_proj": 64,
+       "layer_15.self_attn.k_proj": 32,
+       "layer_15.self_attn.q_proj": 32,
+       "layer_16.mlp.up_proj": 64,
+       "layer_16.mlp.down_proj": 32,
+       "layer_16.mlp.gate_proj": 32,
+       "layer_16.self_attn.o_proj": 64,
+       "layer_16.self_attn.v_proj": 64,
+       "layer_16.self_attn.k_proj": 32,
+       "layer_16.self_attn.q_proj": 32,
+       "layer_17.mlp.up_proj": 32,
+       "layer_17.mlp.down_proj": 32,
+       "layer_17.mlp.gate_proj": 32,
+       "layer_17.self_attn.o_proj": 64,
+       "layer_17.self_attn.v_proj": 64,
+       "layer_17.self_attn.k_proj": 32,
+       "layer_17.self_attn.q_proj": 32,
+       "layer_18.mlp.up_proj": 32,
+       "layer_18.mlp.down_proj": 32,
+       "layer_18.mlp.gate_proj": 32,
+       "layer_18.self_attn.o_proj": 64,
+       "layer_18.self_attn.v_proj": 64,
+       "layer_18.self_attn.k_proj": 32,
+       "layer_18.self_attn.q_proj": 32,
+       "layer_19.mlp.up_proj": 64,
+       "layer_19.mlp.down_proj": 32,
+       "layer_19.mlp.gate_proj": 32,
+       "layer_19.self_attn.o_proj": 64,
+       "layer_19.self_attn.v_proj": 64,
+       "layer_19.self_attn.k_proj": 32,
+       "layer_19.self_attn.q_proj": 32,
+       "layer_20.mlp.up_proj": 32,
+       "layer_20.mlp.down_proj": 32,
+       "layer_20.mlp.gate_proj": 32,
+       "layer_20.self_attn.o_proj": 32,
+       "layer_20.self_attn.v_proj": 64,
+       "layer_20.self_attn.k_proj": 32,
+       "layer_20.self_attn.q_proj": 64,
+       "layer_21.mlp.up_proj": 32,
+       "layer_21.mlp.down_proj": 32,
+       "layer_21.mlp.gate_proj": 32,
+       "layer_21.self_attn.o_proj": 64,
+       "layer_21.self_attn.v_proj": 64,
+       "layer_21.self_attn.k_proj": 32,
+       "layer_21.self_attn.q_proj": 32,
+       "layer_22.mlp.up_proj": 64,
+       "layer_22.mlp.down_proj": 32,
+       "layer_22.mlp.gate_proj": 32,
+       "layer_22.self_attn.o_proj": 64,
+       "layer_22.self_attn.v_proj": 64,
+       "layer_22.self_attn.k_proj": 32,
+       "layer_22.self_attn.q_proj": 32,
+       "layer_23.mlp.up_proj": 64,
+       "layer_23.mlp.down_proj": 64,
+       "layer_23.mlp.gate_proj": 64,
+       "layer_23.self_attn.o_proj": 64,
+       "layer_23.self_attn.v_proj": 64,
+       "layer_23.self_attn.k_proj": 64,
+       "layer_23.self_attn.q_proj": 64
+     }
+   }
+ }
adapters/humanizer-sft/adapters.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2dd9685946a90b65dcd123a83a1f401b5b47ccbd6a68c09b19d8b6d2fe98e650
+ size 119206006
adapters/humanizer-sft/optiq_lora_config.json ADDED
@@ -0,0 +1,206 @@
1
+ {
2
+ "rank": 32,
3
+ "scale": 1.0,
4
+ "dropout": 0.0,
5
+ "rank_scaling": "by_bits",
6
+ "method": "sft",
7
+ "dpo_beta": 0.1,
8
+ "dpo_learning_rate": 5e-05,
9
+ "dpo_warmup_iters": null,
10
+ "dpo_lr_schedule": "cosine",
11
+ "target_modules": [
12
+ "q_proj",
13
+ "k_proj",
14
+ "v_proj",
15
+ "o_proj",
16
+ "gate_proj",
17
+ "up_proj",
18
+ "down_proj"
19
+ ],
20
+ "num_layers": -1,
21
+ "use_dora": false,
22
+ "mask_prompt": true,
23
+ "batch_size": 1,
24
+ "iters": 600,
25
+ "learning_rate": 0.0002,
26
+ "max_seq_length": 2048,
27
+ "grad_accumulation_steps": 1,
28
+ "grad_checkpoint": true,
29
+ "val_batches": 25,
30
+ "steps_per_report": 10,
31
+ "steps_per_eval": 200,
32
+ "steps_per_save": 100,
33
+ "adapter_path": "adapters/humanizer-minicpm5-1b",
34
+ "clear_cache_threshold": 0,
35
+ "applied_ranks": {
+ "layer_0.mlp.up_proj": 64,
+ "layer_0.mlp.down_proj": 64,
+ "layer_0.mlp.gate_proj": 64,
+ "layer_0.self_attn.o_proj": 64,
+ "layer_0.self_attn.v_proj": 64,
+ "layer_0.self_attn.k_proj": 64,
+ "layer_0.self_attn.q_proj": 64,
+ "layer_1.mlp.up_proj": 32,
+ "layer_1.mlp.down_proj": 64,
+ "layer_1.mlp.gate_proj": 32,
+ "layer_1.self_attn.o_proj": 64,
+ "layer_1.self_attn.v_proj": 64,
+ "layer_1.self_attn.k_proj": 32,
+ "layer_1.self_attn.q_proj": 32,
+ "layer_2.mlp.up_proj": 32,
+ "layer_2.mlp.down_proj": 32,
+ "layer_2.mlp.gate_proj": 32,
+ "layer_2.self_attn.o_proj": 64,
+ "layer_2.self_attn.v_proj": 64,
+ "layer_2.self_attn.k_proj": 32,
+ "layer_2.self_attn.q_proj": 32,
+ "layer_3.mlp.up_proj": 32,
+ "layer_3.mlp.down_proj": 32,
+ "layer_3.mlp.gate_proj": 32,
+ "layer_3.self_attn.o_proj": 64,
+ "layer_3.self_attn.v_proj": 64,
+ "layer_3.self_attn.k_proj": 32,
+ "layer_3.self_attn.q_proj": 32,
+ "layer_4.mlp.up_proj": 32,
+ "layer_4.mlp.down_proj": 64,
+ "layer_4.mlp.gate_proj": 32,
+ "layer_4.self_attn.o_proj": 64,
+ "layer_4.self_attn.v_proj": 64,
+ "layer_4.self_attn.k_proj": 32,
+ "layer_4.self_attn.q_proj": 32,
+ "layer_5.mlp.up_proj": 32,
+ "layer_5.mlp.down_proj": 32,
+ "layer_5.mlp.gate_proj": 32,
+ "layer_5.self_attn.o_proj": 32,
+ "layer_5.self_attn.v_proj": 64,
+ "layer_5.self_attn.k_proj": 32,
+ "layer_5.self_attn.q_proj": 64,
+ "layer_6.mlp.up_proj": 32,
+ "layer_6.mlp.down_proj": 32,
+ "layer_6.mlp.gate_proj": 32,
+ "layer_6.self_attn.o_proj": 64,
+ "layer_6.self_attn.v_proj": 64,
+ "layer_6.self_attn.k_proj": 32,
+ "layer_6.self_attn.q_proj": 32,
+ "layer_7.mlp.up_proj": 64,
+ "layer_7.mlp.down_proj": 32,
+ "layer_7.mlp.gate_proj": 32,
+ "layer_7.self_attn.o_proj": 64,
+ "layer_7.self_attn.v_proj": 32,
+ "layer_7.self_attn.k_proj": 32,
+ "layer_7.self_attn.q_proj": 64,
+ "layer_8.mlp.up_proj": 32,
+ "layer_8.mlp.down_proj": 32,
+ "layer_8.mlp.gate_proj": 32,
+ "layer_8.self_attn.o_proj": 64,
+ "layer_8.self_attn.v_proj": 32,
+ "layer_8.self_attn.k_proj": 32,
+ "layer_8.self_attn.q_proj": 64,
+ "layer_9.mlp.up_proj": 32,
+ "layer_9.mlp.down_proj": 32,
+ "layer_9.mlp.gate_proj": 32,
+ "layer_9.self_attn.o_proj": 64,
+ "layer_9.self_attn.v_proj": 64,
+ "layer_9.self_attn.k_proj": 32,
+ "layer_9.self_attn.q_proj": 32,
+ "layer_10.mlp.up_proj": 64,
+ "layer_10.mlp.down_proj": 32,
+ "layer_10.mlp.gate_proj": 32,
+ "layer_10.self_attn.o_proj": 64,
+ "layer_10.self_attn.v_proj": 64,
+ "layer_10.self_attn.k_proj": 32,
+ "layer_10.self_attn.q_proj": 32,
+ "layer_11.mlp.up_proj": 32,
+ "layer_11.mlp.down_proj": 32,
+ "layer_11.mlp.gate_proj": 32,
+ "layer_11.self_attn.o_proj": 64,
+ "layer_11.self_attn.v_proj": 64,
+ "layer_11.self_attn.k_proj": 32,
+ "layer_11.self_attn.q_proj": 32,
+ "layer_12.mlp.up_proj": 32,
+ "layer_12.mlp.down_proj": 32,
+ "layer_12.mlp.gate_proj": 32,
+ "layer_12.self_attn.o_proj": 64,
+ "layer_12.self_attn.v_proj": 64,
+ "layer_12.self_attn.k_proj": 32,
+ "layer_12.self_attn.q_proj": 32,
+ "layer_13.mlp.up_proj": 64,
+ "layer_13.mlp.down_proj": 32,
+ "layer_13.mlp.gate_proj": 32,
+ "layer_13.self_attn.o_proj": 64,
+ "layer_13.self_attn.v_proj": 64,
+ "layer_13.self_attn.k_proj": 32,
+ "layer_13.self_attn.q_proj": 32,
+ "layer_14.mlp.up_proj": 32,
+ "layer_14.mlp.down_proj": 32,
+ "layer_14.mlp.gate_proj": 32,
+ "layer_14.self_attn.o_proj": 64,
+ "layer_14.self_attn.v_proj": 64,
+ "layer_14.self_attn.k_proj": 32,
+ "layer_14.self_attn.q_proj": 32,
+ "layer_15.mlp.up_proj": 32,
+ "layer_15.mlp.down_proj": 32,
+ "layer_15.mlp.gate_proj": 32,
+ "layer_15.self_attn.o_proj": 64,
+ "layer_15.self_attn.v_proj": 64,
+ "layer_15.self_attn.k_proj": 32,
+ "layer_15.self_attn.q_proj": 32,
+ "layer_16.mlp.up_proj": 64,
+ "layer_16.mlp.down_proj": 32,
+ "layer_16.mlp.gate_proj": 32,
+ "layer_16.self_attn.o_proj": 64,
+ "layer_16.self_attn.v_proj": 64,
+ "layer_16.self_attn.k_proj": 32,
+ "layer_16.self_attn.q_proj": 32,
+ "layer_17.mlp.up_proj": 32,
+ "layer_17.mlp.down_proj": 32,
+ "layer_17.mlp.gate_proj": 32,
+ "layer_17.self_attn.o_proj": 64,
+ "layer_17.self_attn.v_proj": 64,
+ "layer_17.self_attn.k_proj": 32,
+ "layer_17.self_attn.q_proj": 32,
+ "layer_18.mlp.up_proj": 32,
+ "layer_18.mlp.down_proj": 32,
+ "layer_18.mlp.gate_proj": 32,
+ "layer_18.self_attn.o_proj": 64,
+ "layer_18.self_attn.v_proj": 64,
+ "layer_18.self_attn.k_proj": 32,
+ "layer_18.self_attn.q_proj": 32,
+ "layer_19.mlp.up_proj": 64,
+ "layer_19.mlp.down_proj": 32,
+ "layer_19.mlp.gate_proj": 32,
+ "layer_19.self_attn.o_proj": 64,
+ "layer_19.self_attn.v_proj": 64,
+ "layer_19.self_attn.k_proj": 32,
+ "layer_19.self_attn.q_proj": 32,
+ "layer_20.mlp.up_proj": 32,
+ "layer_20.mlp.down_proj": 32,
+ "layer_20.mlp.gate_proj": 32,
+ "layer_20.self_attn.o_proj": 32,
+ "layer_20.self_attn.v_proj": 64,
+ "layer_20.self_attn.k_proj": 32,
+ "layer_20.self_attn.q_proj": 64,
+ "layer_21.mlp.up_proj": 32,
+ "layer_21.mlp.down_proj": 32,
+ "layer_21.mlp.gate_proj": 32,
+ "layer_21.self_attn.o_proj": 64,
+ "layer_21.self_attn.v_proj": 64,
+ "layer_21.self_attn.k_proj": 32,
+ "layer_21.self_attn.q_proj": 32,
+ "layer_22.mlp.up_proj": 64,
+ "layer_22.mlp.down_proj": 32,
+ "layer_22.mlp.gate_proj": 32,
+ "layer_22.self_attn.o_proj": 64,
+ "layer_22.self_attn.v_proj": 64,
+ "layer_22.self_attn.k_proj": 32,
+ "layer_22.self_attn.q_proj": 32,
+ "layer_23.mlp.up_proj": 64,
+ "layer_23.mlp.down_proj": 64,
+ "layer_23.mlp.gate_proj": 64,
+ "layer_23.self_attn.o_proj": 64,
+ "layer_23.self_attn.v_proj": 64,
+ "layer_23.self_attn.k_proj": 64,
+ "layer_23.self_attn.q_proj": 64
+ },
+ "source_model": "optiq_output/openbmb_MiniCPM5-1B/optiq_mixed"
+ }
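With `"rank": 32` and `"rank_scaling": "by_bits"`, the `applied_ranks` map lines up with the per-module bit widths in `config.json`: every 8-bit projection gets rank 64 and every 4-bit projection gets rank 32. The sketch below reproduces that pattern under the assumption that the rule is `rank * bits / 4`; the function names are hypothetical and the actual mlx-optiq implementation may differ.

```python
def rank_for_bits(base_rank: int, bits: int, base_bits: int = 4) -> int:
    """Assumed rule: scale the LoRA rank linearly with the quantization bit width."""
    return base_rank * bits // base_bits


def lora_key(quant_key: str) -> str:
    """Map 'model.layers.5.self_attn.q_proj' to 'layer_5.self_attn.q_proj'."""
    parts = quant_key.split(".")
    return f"layer_{parts[2]}." + ".".join(parts[3:])


# A few bit widths copied from the "quantization" block of config.json.
quant_bits = {
    "model.layers.0.self_attn.q_proj": 8,
    "model.layers.1.mlp.up_proj": 4,
    "model.layers.1.mlp.down_proj": 8,
    "model.layers.5.self_attn.o_proj": 4,
}

applied = {lora_key(k): rank_for_bits(32, b) for k, b in quant_bits.items()}
```

The resulting values agree with the corresponding `applied_ranks` entries above (64, 32, 64, and 32 respectively).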
chat_template.jinja ADDED
@@ -0,0 +1,179 @@
+ {{- bos_token }}{%- if tools %}
+ {%- set tool_definitions %}
+ {{- "# Tools\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson(ensure_ascii=False) }}
+ {%- endfor %}
+ {{- '\n</tools>\n\nTool usage guidelines:\n- You may call zero or more functions. If no function calls are needed, just answer normally and do not include any <function ... </function>.\n- When calling a function, return an XML object within <function ... </function> using:\n<function name="function-name"><param name="param-name">param-value</param></function>\n- param-value may be multi-line. If it contains <, & or newline characters, wrap it in a CDATA block: <param name="param-name"><![CDATA[...multi-line value...]]></param>' }}
+ {%- endset %}
+
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0].role == 'system' %}
+ {%- if '<tool_def_sep>' in messages[0].content %}
+ {{- messages[0].content.replace('<tool_def_sep>', tool_definitions) }}
+ {%- else %}
+ {{- messages[0].content + '\n\n' + tool_definitions }}
+ {%- endif %}
+ {%- else %}
+ {{- tool_definitions.lstrip() }}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endfor %}
+ {%- for message in messages %}
+ {%- if message.content is string %}
+ {%- set content = message.content %}
+ {%- else %}
+ {%- set content = '' %}
+ {%- endif %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+
+ {%- if message.tool_calls %}
+ {%- set content_parts = content.split('<tool_sep>') %}
+ {%- set processed_content = content_parts[0] %}
+ {%- set tool_calls_count = message.tool_calls|length %}
+ {%- set tool_sep_count = content_parts|length - 1 %}
+ {%- set min_count = [tool_calls_count, tool_sep_count]|min %}
+
+ {%- for i in range(1, content_parts|length) %}
+ {%- set tool_index = i - 1 %}
+ {%- if tool_index < tool_calls_count %}
+ {%- set tool_call = message.tool_calls[tool_index] %}
+ {%- if tool_call.function %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- set single_tool_xml %}
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
+ {%- if tool_call.arguments %}
+ {%- set args_dict = tool_call.arguments %}
+ {%- for param_name, param_value in args_dict.items() %}
+ {{- '<param name="' ~ param_name ~ '">' }}
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
+ {{- '<![CDATA[' + param_value + ']]>' }}
+ {%- else %}
+ {{- param_value }}
+ {%- endif %}
+ {{- '</param>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>' }}
+ {%- endset %}
+ {%- set processed_content = processed_content + single_tool_xml + content_parts[i] %}
+ {%- else %}
+ {%- set processed_content = processed_content + content_parts[i] %}
+ {%- endif %}
+ {%- endfor %}
+
+ {%- if tool_calls_count > tool_sep_count %}
+ {%- for remaining_index in range(tool_sep_count, tool_calls_count) %}
+ {%- set tool_call = message.tool_calls[remaining_index] %}
+ {%- if tool_call.function %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- set remaining_tool_xml %}
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
+ {%- if tool_call.arguments %}
+ {%- set args_dict = tool_call.arguments %}
+ {%- for param_name, param_value in args_dict.items() %}
+ {{- '<param name="' ~ param_name ~ '">' }}
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
+ {{- '<![CDATA[' + param_value + ']]>' }}
+ {%- else %}
+ {{- param_value }}
+ {%- endif %}
+ {{- '</param>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>' }}
+ {%- endset %}
+ {%- set processed_content = processed_content + remaining_tool_xml %}
+ {%- endfor %}
+ {%- endif %}
+
+ {%- set content = processed_content %}
+ {%- endif %}
+
+ {%- if loop.index0 > ns.last_query_index %}
+ {%- if reasoning_content %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+
+ {%- if message.tool_calls and not has_tool_sep %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if (loop.first and content) or (not loop.first) %}
+ {{- '\n' }}
+ {%- endif %}
+ {%- if tool_call.function %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
+ {%- if tool_call.arguments %}
+ {%- set args_dict = tool_call.arguments %}
+ {%- for param_name, param_value in args_dict.items() %}
+ {{- '<param name="' ~ param_name ~ '">' }}
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
+ {{- '<![CDATA[' + param_value + ']]>' }}
+ {%- else %}
+ {{- param_value }}
+ {%- endif %}
+ {{- '</param>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {%- if message.content is string %}
+ {{- content }}
+ {%- else %}
+ {{- message.content | tojson(ensure_ascii=False) }}
+ {%- endif %}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined %}
+ {%- if enable_thinking is false %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- elif enable_thinking is true %}
+ {{- '<think>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endif %}
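For a plain conversation with no tools and only system and user turns, the template above reduces to a simple ChatML-style concatenation. The sketch below mirrors that path in plain Python, assuming `add_generation_prompt` is true and every `content` is a string; `render_prompt` is a hypothetical helper, and the `"<s>"` passed as `bos_token` in the usage note is a placeholder, since the real value comes from the tokenizer config and is not shown here.

```python
from typing import Optional


def render_prompt(
    messages: list,
    bos_token: str,
    enable_thinking: Optional[bool] = None,
) -> str:
    """Mirror the no-tools, system/user-only path of the chat template."""
    out = [bos_token]
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role not in ("system", "user"):
            # Assistant and tool turns need the reasoning and tool-call
            # handling from the template, which this sketch leaves out.
            raise ValueError(f"unsupported role in this sketch: {role}")
        out.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Generation prompt, as emitted when add_generation_prompt is true.
    out.append("<|im_start|>assistant\n")
    if enable_thinking is False:
        out.append("<think>\n\n</think>\n\n")  # pre-closed, empty think block
    elif enable_thinking is True:
        out.append("<think>\n")  # open think block for the model to continue
    return "".join(out)
```

Calling `render_prompt([{"role": "user", "content": "Hi"}], "<s>", enable_thinking=False)` yields a prompt that ends with an empty `<think>` block, which is how the template signals that reasoning is switched off; leaving `enable_thinking` unset adds no think marker at all.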
config.json ADDED
@@ -0,0 +1,1399 @@
+ {
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "bos_token_id": 0,
+ "eos_token_id": [
+ 1,
+ 130073
+ ],
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 1536,
+ "initializer_range": 0.02,
+ "intermediate_size": 4608,
+ "max_position_embeddings": 131072,
+ "model_type": "llama",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "num_key_value_heads": 2,
+ "pad_token_id": 1,
+ "quantization": {
+ "group_size": 64,
+ "bits": 4,
+ "mode": "affine",
+ "model.embed_tokens": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.0.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.0.self_attn.k_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.0.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.0.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.0.mlp.gate_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.0.mlp.down_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.0.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.1.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.1.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.1.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.1.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.1.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.1.mlp.down_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.1.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.2.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.2.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.2.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.2.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.2.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.2.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.2.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.3.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.3.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.3.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.3.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.3.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.3.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.3.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.4.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.4.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.4.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.4.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.4.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.4.mlp.down_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.4.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.5.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.5.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.5.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.5.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.5.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.5.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.5.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.6.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.6.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.6.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.6.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.6.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.6.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.6.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.7.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.7.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.7.self_attn.v_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.7.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.7.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.7.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.7.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.8.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.8.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.8.self_attn.v_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.8.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.8.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.8.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.8.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.9.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.9.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.9.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.9.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.9.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.9.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.9.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.10.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.10.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.10.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.10.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.10.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.10.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.10.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.11.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.11.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.11.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.11.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.11.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.11.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.11.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.12.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.13.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.14.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.15.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.16.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.17.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.18.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.19.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.21.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.22.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.k_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.mlp.gate_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.mlp.down_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "lm_head": {
+ "bits": 8,
+ "group_size": 64
+ }
+ },
706
+ "quantization_config": {
707
+ "group_size": 64,
708
+ "bits": 4,
709
+ "mode": "affine",
710
+ "model.embed_tokens": {
711
+ "bits": 8,
712
+ "group_size": 64
713
+ },
714
+ "model.layers.0.self_attn.q_proj": {
715
+ "bits": 8,
716
+ "group_size": 64
717
+ },
718
+ "model.layers.0.self_attn.k_proj": {
719
+ "bits": 8,
720
+ "group_size": 64
721
+ },
722
+ "model.layers.0.self_attn.v_proj": {
723
+ "bits": 8,
724
+ "group_size": 64
725
+ },
726
+ "model.layers.0.self_attn.o_proj": {
727
+ "bits": 8,
728
+ "group_size": 64
729
+ },
730
+ "model.layers.0.mlp.gate_proj": {
731
+ "bits": 8,
732
+ "group_size": 64
733
+ },
734
+ "model.layers.0.mlp.down_proj": {
735
+ "bits": 8,
736
+ "group_size": 64
737
+ },
738
+ "model.layers.0.mlp.up_proj": {
739
+ "bits": 8,
740
+ "group_size": 64
741
+ },
742
+ "model.layers.1.self_attn.q_proj": {
743
+ "bits": 4,
744
+ "group_size": 64
745
+ },
746
+ "model.layers.1.self_attn.k_proj": {
747
+ "bits": 4,
748
+ "group_size": 64
749
+ },
750
+ "model.layers.1.self_attn.v_proj": {
751
+ "bits": 8,
752
+ "group_size": 64
753
+ },
754
+ "model.layers.1.self_attn.o_proj": {
755
+ "bits": 8,
756
+ "group_size": 64
757
+ },
758
+ "model.layers.1.mlp.gate_proj": {
759
+ "bits": 4,
760
+ "group_size": 64
761
+ },
762
+ "model.layers.1.mlp.down_proj": {
763
+ "bits": 8,
764
+ "group_size": 64
765
+ },
766
+ "model.layers.1.mlp.up_proj": {
767
+ "bits": 4,
768
+ "group_size": 64
769
+ },
770
+ "model.layers.2.self_attn.q_proj": {
771
+ "bits": 4,
772
+ "group_size": 64
773
+ },
774
+ "model.layers.2.self_attn.k_proj": {
775
+ "bits": 4,
776
+ "group_size": 64
777
+ },
778
+ "model.layers.2.self_attn.v_proj": {
779
+ "bits": 8,
780
+ "group_size": 64
781
+ },
782
+ "model.layers.2.self_attn.o_proj": {
783
+ "bits": 8,
784
+ "group_size": 64
785
+ },
786
+ "model.layers.2.mlp.gate_proj": {
787
+ "bits": 4,
788
+ "group_size": 64
789
+ },
790
+ "model.layers.2.mlp.down_proj": {
791
+ "bits": 4,
792
+ "group_size": 64
793
+ },
794
+ "model.layers.2.mlp.up_proj": {
795
+ "bits": 4,
796
+ "group_size": 64
797
+ },
798
+ "model.layers.3.self_attn.q_proj": {
799
+ "bits": 4,
800
+ "group_size": 64
801
+ },
802
+ "model.layers.3.self_attn.k_proj": {
803
+ "bits": 4,
804
+ "group_size": 64
805
+ },
806
+ "model.layers.3.self_attn.v_proj": {
807
+ "bits": 8,
808
+ "group_size": 64
809
+ },
810
+ "model.layers.3.self_attn.o_proj": {
811
+ "bits": 8,
812
+ "group_size": 64
813
+ },
814
+ "model.layers.3.mlp.gate_proj": {
815
+ "bits": 4,
816
+ "group_size": 64
817
+ },
818
+ "model.layers.3.mlp.down_proj": {
819
+ "bits": 4,
820
+ "group_size": 64
821
+ },
822
+ "model.layers.3.mlp.up_proj": {
823
+ "bits": 4,
824
+ "group_size": 64
825
+ },
826
+ "model.layers.4.self_attn.q_proj": {
827
+ "bits": 4,
828
+ "group_size": 64
829
+ },
830
+ "model.layers.4.self_attn.k_proj": {
831
+ "bits": 4,
832
+ "group_size": 64
833
+ },
834
+ "model.layers.4.self_attn.v_proj": {
835
+ "bits": 8,
836
+ "group_size": 64
837
+ },
838
+ "model.layers.4.self_attn.o_proj": {
839
+ "bits": 8,
840
+ "group_size": 64
841
+ },
842
+ "model.layers.4.mlp.gate_proj": {
843
+ "bits": 4,
844
+ "group_size": 64
845
+ },
846
+ "model.layers.4.mlp.down_proj": {
847
+ "bits": 8,
848
+ "group_size": 64
849
+ },
850
+ "model.layers.4.mlp.up_proj": {
851
+ "bits": 4,
852
+ "group_size": 64
853
+ },
854
+ "model.layers.5.self_attn.q_proj": {
855
+ "bits": 8,
856
+ "group_size": 64
857
+ },
858
+ "model.layers.5.self_attn.k_proj": {
859
+ "bits": 4,
860
+ "group_size": 64
861
+ },
862
+ "model.layers.5.self_attn.v_proj": {
863
+ "bits": 8,
864
+ "group_size": 64
865
+ },
866
+ "model.layers.5.self_attn.o_proj": {
867
+ "bits": 4,
868
+ "group_size": 64
869
+ },
870
+ "model.layers.5.mlp.gate_proj": {
871
+ "bits": 4,
872
+ "group_size": 64
873
+ },
874
+ "model.layers.5.mlp.down_proj": {
875
+ "bits": 4,
876
+ "group_size": 64
877
+ },
878
+ "model.layers.5.mlp.up_proj": {
879
+ "bits": 4,
880
+ "group_size": 64
881
+ },
882
+ "model.layers.6.self_attn.q_proj": {
883
+ "bits": 4,
884
+ "group_size": 64
885
+ },
886
+ "model.layers.6.self_attn.k_proj": {
887
+ "bits": 4,
888
+ "group_size": 64
889
+ },
890
+ "model.layers.6.self_attn.v_proj": {
891
+ "bits": 8,
892
+ "group_size": 64
893
+ },
894
+ "model.layers.6.self_attn.o_proj": {
895
+ "bits": 8,
896
+ "group_size": 64
897
+ },
898
+ "model.layers.6.mlp.gate_proj": {
899
+ "bits": 4,
900
+ "group_size": 64
901
+ },
902
+ "model.layers.6.mlp.down_proj": {
903
+ "bits": 4,
904
+ "group_size": 64
905
+ },
906
+ "model.layers.6.mlp.up_proj": {
907
+ "bits": 4,
908
+ "group_size": 64
909
+ },
910
+ "model.layers.7.self_attn.q_proj": {
911
+ "bits": 8,
912
+ "group_size": 64
913
+ },
914
+ "model.layers.7.self_attn.k_proj": {
915
+ "bits": 4,
916
+ "group_size": 64
917
+ },
918
+ "model.layers.7.self_attn.v_proj": {
919
+ "bits": 4,
920
+ "group_size": 64
921
+ },
922
+ "model.layers.7.self_attn.o_proj": {
923
+ "bits": 8,
924
+ "group_size": 64
925
+ },
926
+ "model.layers.7.mlp.gate_proj": {
927
+ "bits": 4,
928
+ "group_size": 64
929
+ },
930
+ "model.layers.7.mlp.down_proj": {
931
+ "bits": 4,
932
+ "group_size": 64
933
+ },
934
+ "model.layers.7.mlp.up_proj": {
935
+ "bits": 8,
936
+ "group_size": 64
937
+ },
938
+ "model.layers.8.self_attn.q_proj": {
939
+ "bits": 8,
940
+ "group_size": 64
941
+ },
942
+ "model.layers.8.self_attn.k_proj": {
943
+ "bits": 4,
944
+ "group_size": 64
945
+ },
946
+ "model.layers.8.self_attn.v_proj": {
947
+ "bits": 4,
948
+ "group_size": 64
949
+ },
950
+ "model.layers.8.self_attn.o_proj": {
951
+ "bits": 8,
952
+ "group_size": 64
953
+ },
954
+ "model.layers.8.mlp.gate_proj": {
955
+ "bits": 4,
956
+ "group_size": 64
957
+ },
958
+ "model.layers.8.mlp.down_proj": {
959
+ "bits": 4,
960
+ "group_size": 64
961
+ },
962
+ "model.layers.8.mlp.up_proj": {
963
+ "bits": 4,
964
+ "group_size": 64
965
+ },
966
+ "model.layers.9.self_attn.q_proj": {
967
+ "bits": 4,
968
+ "group_size": 64
969
+ },
970
+ "model.layers.9.self_attn.k_proj": {
971
+ "bits": 4,
972
+ "group_size": 64
973
+ },
974
+ "model.layers.9.self_attn.v_proj": {
975
+ "bits": 8,
976
+ "group_size": 64
977
+ },
978
+ "model.layers.9.self_attn.o_proj": {
979
+ "bits": 8,
980
+ "group_size": 64
981
+ },
982
+ "model.layers.9.mlp.gate_proj": {
983
+ "bits": 4,
984
+ "group_size": 64
985
+ },
986
+ "model.layers.9.mlp.down_proj": {
987
+ "bits": 4,
988
+ "group_size": 64
989
+ },
990
+ "model.layers.9.mlp.up_proj": {
991
+ "bits": 4,
992
+ "group_size": 64
993
+ },
994
+ "model.layers.10.self_attn.q_proj": {
995
+ "bits": 4,
996
+ "group_size": 64
997
+ },
998
+ "model.layers.10.self_attn.k_proj": {
999
+ "bits": 4,
1000
+ "group_size": 64
1001
+ },
1002
+ "model.layers.10.self_attn.v_proj": {
1003
+ "bits": 8,
1004
+ "group_size": 64
1005
+ },
1006
+ "model.layers.10.self_attn.o_proj": {
1007
+ "bits": 8,
1008
+ "group_size": 64
1009
+ },
1010
+ "model.layers.10.mlp.gate_proj": {
1011
+ "bits": 4,
1012
+ "group_size": 64
1013
+ },
1014
+ "model.layers.10.mlp.down_proj": {
1015
+ "bits": 4,
1016
+ "group_size": 64
1017
+ },
1018
+ "model.layers.10.mlp.up_proj": {
1019
+ "bits": 8,
1020
+ "group_size": 64
1021
+ },
1022
+ "model.layers.11.self_attn.q_proj": {
1023
+ "bits": 4,
1024
+ "group_size": 64
1025
+ },
1026
+ "model.layers.11.self_attn.k_proj": {
1027
+ "bits": 4,
1028
+ "group_size": 64
1029
+ },
1030
+ "model.layers.11.self_attn.v_proj": {
1031
+ "bits": 8,
1032
+ "group_size": 64
1033
+ },
1034
+ "model.layers.11.self_attn.o_proj": {
1035
+ "bits": 8,
1036
+ "group_size": 64
1037
+ },
1038
+ "model.layers.11.mlp.gate_proj": {
1039
+ "bits": 4,
1040
+ "group_size": 64
1041
+ },
1042
+ "model.layers.11.mlp.down_proj": {
1043
+ "bits": 4,
1044
+ "group_size": 64
1045
+ },
1046
+ "model.layers.11.mlp.up_proj": {
1047
+ "bits": 4,
1048
+ "group_size": 64
1049
+ },
1050
+ "model.layers.12.self_attn.q_proj": {
1051
+ "bits": 4,
1052
+ "group_size": 64
1053
+ },
1054
+ "model.layers.12.self_attn.k_proj": {
1055
+ "bits": 4,
1056
+ "group_size": 64
1057
+ },
1058
+ "model.layers.12.self_attn.v_proj": {
1059
+ "bits": 8,
1060
+ "group_size": 64
1061
+ },
1062
+ "model.layers.12.self_attn.o_proj": {
1063
+ "bits": 8,
1064
+ "group_size": 64
1065
+ },
1066
+ "model.layers.12.mlp.gate_proj": {
1067
+ "bits": 4,
1068
+ "group_size": 64
1069
+ },
1070
+ "model.layers.12.mlp.down_proj": {
1071
+ "bits": 4,
1072
+ "group_size": 64
1073
+ },
1074
+ "model.layers.12.mlp.up_proj": {
1075
+ "bits": 4,
1076
+ "group_size": 64
1077
+ },
1078
+ "model.layers.13.self_attn.q_proj": {
1079
+ "bits": 4,
1080
+ "group_size": 64
1081
+ },
1082
+ "model.layers.13.self_attn.k_proj": {
1083
+ "bits": 4,
1084
+ "group_size": 64
1085
+ },
1086
+ "model.layers.13.self_attn.v_proj": {
1087
+ "bits": 8,
1088
+ "group_size": 64
1089
+ },
1090
+ "model.layers.13.self_attn.o_proj": {
1091
+ "bits": 8,
1092
+ "group_size": 64
1093
+ },
1094
+ "model.layers.13.mlp.gate_proj": {
1095
+ "bits": 4,
1096
+ "group_size": 64
1097
+ },
1098
+ "model.layers.13.mlp.down_proj": {
1099
+ "bits": 4,
1100
+ "group_size": 64
1101
+ },
1102
+ "model.layers.13.mlp.up_proj": {
1103
+ "bits": 8,
1104
+ "group_size": 64
1105
+ },
1106
+ "model.layers.14.self_attn.q_proj": {
1107
+ "bits": 4,
1108
+ "group_size": 64
1109
+ },
1110
+ "model.layers.14.self_attn.k_proj": {
1111
+ "bits": 4,
1112
+ "group_size": 64
1113
+ },
1114
+ "model.layers.14.self_attn.v_proj": {
1115
+ "bits": 8,
1116
+ "group_size": 64
1117
+ },
1118
+ "model.layers.14.self_attn.o_proj": {
1119
+ "bits": 8,
1120
+ "group_size": 64
1121
+ },
1122
+ "model.layers.14.mlp.gate_proj": {
1123
+ "bits": 4,
1124
+ "group_size": 64
1125
+ },
1126
+ "model.layers.14.mlp.down_proj": {
1127
+ "bits": 4,
1128
+ "group_size": 64
1129
+ },
1130
+ "model.layers.14.mlp.up_proj": {
1131
+ "bits": 4,
1132
+ "group_size": 64
1133
+ },
1134
+ "model.layers.15.self_attn.q_proj": {
1135
+ "bits": 4,
1136
+ "group_size": 64
1137
+ },
1138
+ "model.layers.15.self_attn.k_proj": {
1139
+ "bits": 4,
1140
+ "group_size": 64
1141
+ },
1142
+ "model.layers.15.self_attn.v_proj": {
1143
+ "bits": 8,
1144
+ "group_size": 64
1145
+ },
1146
+ "model.layers.15.self_attn.o_proj": {
1147
+ "bits": 8,
1148
+ "group_size": 64
1149
+ },
1150
+ "model.layers.15.mlp.gate_proj": {
1151
+ "bits": 4,
1152
+ "group_size": 64
1153
+ },
1154
+ "model.layers.15.mlp.down_proj": {
1155
+ "bits": 4,
1156
+ "group_size": 64
1157
+ },
1158
+ "model.layers.15.mlp.up_proj": {
1159
+ "bits": 4,
1160
+ "group_size": 64
1161
+ },
1162
+ "model.layers.16.self_attn.q_proj": {
1163
+ "bits": 4,
1164
+ "group_size": 64
1165
+ },
1166
+ "model.layers.16.self_attn.k_proj": {
1167
+ "bits": 4,
1168
+ "group_size": 64
1169
+ },
1170
+ "model.layers.16.self_attn.v_proj": {
1171
+ "bits": 8,
1172
+ "group_size": 64
1173
+ },
1174
+ "model.layers.16.self_attn.o_proj": {
1175
+ "bits": 8,
1176
+ "group_size": 64
1177
+ },
1178
+ "model.layers.16.mlp.gate_proj": {
1179
+ "bits": 4,
1180
+ "group_size": 64
1181
+ },
1182
+ "model.layers.16.mlp.down_proj": {
1183
+ "bits": 4,
1184
+ "group_size": 64
1185
+ },
1186
+ "model.layers.16.mlp.up_proj": {
1187
+ "bits": 8,
1188
+ "group_size": 64
1189
+ },
1190
+ "model.layers.17.self_attn.q_proj": {
1191
+ "bits": 4,
1192
+ "group_size": 64
1193
+ },
1194
+ "model.layers.17.self_attn.k_proj": {
1195
+ "bits": 4,
1196
+ "group_size": 64
1197
+ },
1198
+ "model.layers.17.self_attn.v_proj": {
1199
+ "bits": 8,
1200
+ "group_size": 64
1201
+ },
1202
+ "model.layers.17.self_attn.o_proj": {
1203
+ "bits": 8,
1204
+ "group_size": 64
1205
+ },
1206
+ "model.layers.17.mlp.gate_proj": {
1207
+ "bits": 4,
1208
+ "group_size": 64
1209
+ },
1210
+ "model.layers.17.mlp.down_proj": {
1211
+ "bits": 4,
1212
+ "group_size": 64
1213
+ },
1214
+ "model.layers.17.mlp.up_proj": {
1215
+ "bits": 4,
1216
+ "group_size": 64
1217
+ },
1218
+ "model.layers.18.self_attn.q_proj": {
1219
+ "bits": 4,
1220
+ "group_size": 64
1221
+ },
1222
+ "model.layers.18.self_attn.k_proj": {
1223
+ "bits": 4,
1224
+ "group_size": 64
1225
+ },
1226
+ "model.layers.18.self_attn.v_proj": {
1227
+ "bits": 8,
1228
+ "group_size": 64
1229
+ },
1230
+ "model.layers.18.self_attn.o_proj": {
1231
+ "bits": 8,
1232
+ "group_size": 64
1233
+ },
1234
+ "model.layers.18.mlp.gate_proj": {
1235
+ "bits": 4,
1236
+ "group_size": 64
1237
+ },
1238
+ "model.layers.18.mlp.down_proj": {
1239
+ "bits": 4,
1240
+ "group_size": 64
1241
+ },
1242
+ "model.layers.18.mlp.up_proj": {
1243
+ "bits": 4,
1244
+ "group_size": 64
1245
+ },
1246
+ "model.layers.19.self_attn.q_proj": {
1247
+ "bits": 4,
1248
+ "group_size": 64
1249
+ },
1250
+ "model.layers.19.self_attn.k_proj": {
1251
+ "bits": 4,
1252
+ "group_size": 64
1253
+ },
1254
+ "model.layers.19.self_attn.v_proj": {
1255
+ "bits": 8,
1256
+ "group_size": 64
1257
+ },
1258
+ "model.layers.19.self_attn.o_proj": {
1259
+ "bits": 8,
1260
+ "group_size": 64
1261
+ },
1262
+ "model.layers.19.mlp.gate_proj": {
1263
+ "bits": 4,
1264
+ "group_size": 64
1265
+ },
1266
+ "model.layers.19.mlp.down_proj": {
1267
+ "bits": 4,
1268
+ "group_size": 64
1269
+ },
1270
+ "model.layers.19.mlp.up_proj": {
1271
+ "bits": 8,
1272
+ "group_size": 64
1273
+ },
1274
+ "model.layers.20.self_attn.q_proj": {
1275
+ "bits": 8,
1276
+ "group_size": 64
1277
+ },
1278
+ "model.layers.20.self_attn.k_proj": {
1279
+ "bits": 4,
1280
+ "group_size": 64
1281
+ },
1282
+ "model.layers.20.self_attn.v_proj": {
1283
+ "bits": 8,
1284
+ "group_size": 64
1285
+ },
1286
+ "model.layers.20.self_attn.o_proj": {
1287
+ "bits": 4,
1288
+ "group_size": 64
1289
+ },
1290
+ "model.layers.20.mlp.gate_proj": {
1291
+ "bits": 4,
1292
+ "group_size": 64
1293
+ },
1294
+ "model.layers.20.mlp.down_proj": {
1295
+ "bits": 4,
1296
+ "group_size": 64
1297
+ },
1298
+ "model.layers.20.mlp.up_proj": {
1299
+ "bits": 4,
1300
+ "group_size": 64
1301
+ },
1302
+ "model.layers.21.self_attn.q_proj": {
1303
+ "bits": 4,
1304
+ "group_size": 64
1305
+ },
1306
+ "model.layers.21.self_attn.k_proj": {
1307
+ "bits": 4,
1308
+ "group_size": 64
1309
+ },
1310
+ "model.layers.21.self_attn.v_proj": {
1311
+ "bits": 8,
1312
+ "group_size": 64
1313
+ },
1314
+ "model.layers.21.self_attn.o_proj": {
1315
+ "bits": 8,
1316
+ "group_size": 64
1317
+ },
1318
+ "model.layers.21.mlp.gate_proj": {
1319
+ "bits": 4,
1320
+ "group_size": 64
1321
+ },
1322
+ "model.layers.21.mlp.down_proj": {
1323
+ "bits": 4,
1324
+ "group_size": 64
1325
+ },
1326
+ "model.layers.21.mlp.up_proj": {
1327
+ "bits": 4,
1328
+ "group_size": 64
1329
+ },
1330
+ "model.layers.22.self_attn.q_proj": {
1331
+ "bits": 4,
1332
+ "group_size": 64
1333
+ },
1334
+ "model.layers.22.self_attn.k_proj": {
1335
+ "bits": 4,
1336
+ "group_size": 64
1337
+ },
1338
+ "model.layers.22.self_attn.v_proj": {
1339
+ "bits": 8,
1340
+ "group_size": 64
1341
+ },
1342
+ "model.layers.22.self_attn.o_proj": {
1343
+ "bits": 8,
1344
+ "group_size": 64
1345
+ },
1346
+ "model.layers.22.mlp.gate_proj": {
1347
+ "bits": 4,
1348
+ "group_size": 64
1349
+ },
1350
+ "model.layers.22.mlp.down_proj": {
1351
+ "bits": 4,
1352
+ "group_size": 64
1353
+ },
1354
+ "model.layers.22.mlp.up_proj": {
1355
+ "bits": 8,
1356
+ "group_size": 64
1357
+ },
1358
+ "model.layers.23.self_attn.q_proj": {
1359
+ "bits": 8,
1360
+ "group_size": 64
1361
+ },
1362
+ "model.layers.23.self_attn.k_proj": {
1363
+ "bits": 8,
1364
+ "group_size": 64
1365
+ },
1366
+ "model.layers.23.self_attn.v_proj": {
1367
+ "bits": 8,
1368
+ "group_size": 64
1369
+ },
1370
+ "model.layers.23.self_attn.o_proj": {
1371
+ "bits": 8,
1372
+ "group_size": 64
1373
+ },
1374
+ "model.layers.23.mlp.gate_proj": {
1375
+ "bits": 8,
1376
+ "group_size": 64
1377
+ },
1378
+ "model.layers.23.mlp.down_proj": {
1379
+ "bits": 8,
1380
+ "group_size": 64
1381
+ },
1382
+ "model.layers.23.mlp.up_proj": {
1383
+ "bits": 8,
1384
+ "group_size": 64
1385
+ },
1386
+ "lm_head": {
1387
+ "bits": 8,
1388
+ "group_size": 64
1389
+ }
1390
+ },
1391
+ "rms_norm_eps": 1e-06,
1392
+ "rope_scaling": null,
1393
+ "rope_theta": 5000000,
1394
+ "tie_word_embeddings": false,
1395
+ "torch_dtype": "bfloat16",
1396
+ "transformers_version": "5.6.2",
1397
+ "use_cache": true,
1398
+ "vocab_size": 130560
1399
+ }
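The per-module entries above mix 4-bit and 8-bit overrides on top of the global defaults (`"bits": 4`, `"group_size": 64`, `"mode": "affine"`). The following is a minimal sketch, not part of the commit, of how such a map could be tallied by bit width. The helper name `summarize_bits` and the three-entry `sample` excerpt are illustrative assumptions; in practice one would load the full `config.json` and pass its `quantization_config` object.

```python
import json
from collections import Counter


def summarize_bits(quant_cfg):
    """Count per-module overrides by bit width.

    Hypothetical helper: scalar defaults such as "bits", "group_size",
    and "mode" are skipped because their values are not dicts.
    """
    counts = Counter()
    for name, value in quant_cfg.items():
        if isinstance(value, dict) and "bits" in value:
            counts[value["bits"]] += 1
    return counts


# Small excerpt copied from the config above, for illustration only.
sample = json.loads("""
{
  "group_size": 64,
  "bits": 4,
  "mode": "affine",
  "model.layers.1.self_attn.q_proj": {"bits": 4, "group_size": 64},
  "model.layers.23.mlp.up_proj": {"bits": 8, "group_size": 64},
  "lm_head": {"bits": 8, "group_size": 64}
}
""")

counts = summarize_bits(sample)
for bits, n in sorted(counts.items()):
    print(f"{bits}-bit modules: {n}")
```

Running the same function over the full map would show how many projections were kept at 8 bits versus quantized to 4.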
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": [
+     1,
+     130073
+   ],
+   "pad_token_id": 1,
+   "do_sample": true,
+   "temperature": 0.9,
+   "top_p": 0.95,
+   "transformers_version": "5.6.2"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88bb686ed4a28f7c2065e27aabef7669f84961ac47c83efe2d003436c179e2e4
+ size 906870787
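The three lines above are a Git LFS pointer rather than the weights themselves: each line is a key, a space, and a value. As a rough sketch (the function name `parse_lfs_pointer` is an assumption for illustration, not part of any library), the pointer could be read like this:

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into a dict of its key/value lines.

    Hypothetical helper: assumes each non-empty line is "<key> <value>".
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The size field is a byte count, so convert it to an integer.
    fields["size"] = int(fields["size"])
    return fields


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:88bb686ed4a28f7c2065e27aabef7669f84961ac47c83efe2d003436c179e2e4
size 906870787
"""

info = parse_lfs_pointer(pointer)
algo, _, digest = info["oid"].partition(":")
print(algo, digest[:12], info["size"], "bytes")
```

The digest can then be compared against a locally computed SHA-256 of the downloaded `model.safetensors` to confirm the file is intact.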
model.safetensors.index.json ADDED
@@ -0,0 +1,567 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 906808320,
4
+ "total_parameters": 1080632832
5
+ },
6
+ "weight_map": {
7
+ "lm_head.biases": "model.safetensors",
8
+ "lm_head.scales": "model.safetensors",
9
+ "lm_head.weight": "model.safetensors",
10
+ "model.embed_tokens.biases": "model.safetensors",
11
+ "model.embed_tokens.scales": "model.safetensors",
12
+ "model.embed_tokens.weight": "model.safetensors",
13
+ "model.layers.0.input_layernorm.weight": "model.safetensors",
14
+ "model.layers.0.mlp.down_proj.biases": "model.safetensors",
15
+ "model.layers.0.mlp.down_proj.scales": "model.safetensors",
16
+ "model.layers.0.mlp.down_proj.weight": "model.safetensors",
17
+ "model.layers.0.mlp.gate_proj.biases": "model.safetensors",
18
+ "model.layers.0.mlp.gate_proj.scales": "model.safetensors",
19
+ "model.layers.0.mlp.gate_proj.weight": "model.safetensors",
20
+ "model.layers.0.mlp.up_proj.biases": "model.safetensors",
21
+ "model.layers.0.mlp.up_proj.scales": "model.safetensors",
22
+ "model.layers.0.mlp.up_proj.weight": "model.safetensors",
23
+ "model.layers.0.post_attention_layernorm.weight": "model.safetensors",
24
+ "model.layers.0.self_attn.k_proj.biases": "model.safetensors",
25
+ "model.layers.0.self_attn.k_proj.scales": "model.safetensors",
26
+ "model.layers.0.self_attn.k_proj.weight": "model.safetensors",
27
+ "model.layers.0.self_attn.o_proj.biases": "model.safetensors",
28
+ "model.layers.0.self_attn.o_proj.scales": "model.safetensors",
29
+ "model.layers.0.self_attn.o_proj.weight": "model.safetensors",
30
+ "model.layers.0.self_attn.q_proj.biases": "model.safetensors",
31
+ "model.layers.0.self_attn.q_proj.scales": "model.safetensors",
32
+ "model.layers.0.self_attn.q_proj.weight": "model.safetensors",
33
+ "model.layers.0.self_attn.v_proj.biases": "model.safetensors",
34
+ "model.layers.0.self_attn.v_proj.scales": "model.safetensors",
35
+ "model.layers.0.self_attn.v_proj.weight": "model.safetensors",
36
+ "model.layers.1.input_layernorm.weight": "model.safetensors",
37
+ "model.layers.1.mlp.down_proj.biases": "model.safetensors",
38
+ "model.layers.1.mlp.down_proj.scales": "model.safetensors",
39
+ "model.layers.1.mlp.down_proj.weight": "model.safetensors",
40
+ "model.layers.1.mlp.gate_proj.biases": "model.safetensors",
41
+ "model.layers.1.mlp.gate_proj.scales": "model.safetensors",
42
+ "model.layers.1.mlp.gate_proj.weight": "model.safetensors",
43
+ "model.layers.1.mlp.up_proj.biases": "model.safetensors",
44
+ "model.layers.1.mlp.up_proj.scales": "model.safetensors",
45
+ "model.layers.1.mlp.up_proj.weight": "model.safetensors",
46
+ "model.layers.1.post_attention_layernorm.weight": "model.safetensors",
47
+ "model.layers.1.self_attn.k_proj.biases": "model.safetensors",
48
+ "model.layers.1.self_attn.k_proj.scales": "model.safetensors",
49
+ "model.layers.1.self_attn.k_proj.weight": "model.safetensors",
50
+ "model.layers.1.self_attn.o_proj.biases": "model.safetensors",
51
+ "model.layers.1.self_attn.o_proj.scales": "model.safetensors",
52
+ "model.layers.1.self_attn.o_proj.weight": "model.safetensors",
53
+ "model.layers.1.self_attn.q_proj.biases": "model.safetensors",
54
+ "model.layers.1.self_attn.q_proj.scales": "model.safetensors",
55
+ "model.layers.1.self_attn.q_proj.weight": "model.safetensors",
56
+ "model.layers.1.self_attn.v_proj.biases": "model.safetensors",
57
+ "model.layers.1.self_attn.v_proj.scales": "model.safetensors",
58
+ "model.layers.1.self_attn.v_proj.weight": "model.safetensors",
59
+ "model.layers.10.input_layernorm.weight": "model.safetensors",
60
+ "model.layers.10.mlp.down_proj.biases": "model.safetensors",
61
+ "model.layers.10.mlp.down_proj.scales": "model.safetensors",
62
+ "model.layers.10.mlp.down_proj.weight": "model.safetensors",
63
+ "model.layers.10.mlp.gate_proj.biases": "model.safetensors",
64
+ "model.layers.10.mlp.gate_proj.scales": "model.safetensors",
65
+ "model.layers.10.mlp.gate_proj.weight": "model.safetensors",
66
+ "model.layers.10.mlp.up_proj.biases": "model.safetensors",
67
+ "model.layers.10.mlp.up_proj.scales": "model.safetensors",
68
+ "model.layers.10.mlp.up_proj.weight": "model.safetensors",
69
+ "model.layers.10.post_attention_layernorm.weight": "model.safetensors",
70
+ "model.layers.10.self_attn.k_proj.biases": "model.safetensors",
71
+ "model.layers.10.self_attn.k_proj.scales": "model.safetensors",
72
+ "model.layers.10.self_attn.k_proj.weight": "model.safetensors",
73
+ "model.layers.10.self_attn.o_proj.biases": "model.safetensors",
74
+ "model.layers.10.self_attn.o_proj.scales": "model.safetensors",
75
+ "model.layers.10.self_attn.o_proj.weight": "model.safetensors",
76
+ "model.layers.10.self_attn.q_proj.biases": "model.safetensors",
77
+ "model.layers.10.self_attn.q_proj.scales": "model.safetensors",
78
+ "model.layers.10.self_attn.q_proj.weight": "model.safetensors",
79
+ "model.layers.10.self_attn.v_proj.biases": "model.safetensors",
80
+ "model.layers.10.self_attn.v_proj.scales": "model.safetensors",
81
+ "model.layers.10.self_attn.v_proj.weight": "model.safetensors",
82
+ "model.layers.11.input_layernorm.weight": "model.safetensors",
83
+ "model.layers.11.mlp.down_proj.biases": "model.safetensors",
84
+ "model.layers.11.mlp.down_proj.scales": "model.safetensors",
85
+ "model.layers.11.mlp.down_proj.weight": "model.safetensors",
86
+ "model.layers.11.mlp.gate_proj.biases": "model.safetensors",
87
+ "model.layers.11.mlp.gate_proj.scales": "model.safetensors",
88
+ "model.layers.11.mlp.gate_proj.weight": "model.safetensors",
89
+ "model.layers.11.mlp.up_proj.biases": "model.safetensors",
90
+ "model.layers.11.mlp.up_proj.scales": "model.safetensors",
91
+ "model.layers.11.mlp.up_proj.weight": "model.safetensors",
92
+ "model.layers.11.post_attention_layernorm.weight": "model.safetensors",
93
+ "model.layers.11.self_attn.k_proj.biases": "model.safetensors",
94
+ "model.layers.11.self_attn.k_proj.scales": "model.safetensors",
95
+ "model.layers.11.self_attn.k_proj.weight": "model.safetensors",
96
+ "model.layers.11.self_attn.o_proj.biases": "model.safetensors",
97
+ "model.layers.11.self_attn.o_proj.scales": "model.safetensors",
98
+ "model.layers.11.self_attn.o_proj.weight": "model.safetensors",
99
+ "model.layers.11.self_attn.q_proj.biases": "model.safetensors",
100
+ "model.layers.11.self_attn.q_proj.scales": "model.safetensors",
101
+ "model.layers.11.self_attn.q_proj.weight": "model.safetensors",
102
+ "model.layers.11.self_attn.v_proj.biases": "model.safetensors",
103
+ "model.layers.11.self_attn.v_proj.scales": "model.safetensors",
104
+ "model.layers.11.self_attn.v_proj.weight": "model.safetensors",
105
+ "model.layers.12.input_layernorm.weight": "model.safetensors",
106
+ "model.layers.12.mlp.down_proj.biases": "model.safetensors",
107
+ "model.layers.12.mlp.down_proj.scales": "model.safetensors",
108
+ "model.layers.12.mlp.down_proj.weight": "model.safetensors",
109
+ "model.layers.12.mlp.gate_proj.biases": "model.safetensors",
110
+ "model.layers.12.mlp.gate_proj.scales": "model.safetensors",
111
+ "model.layers.12.mlp.gate_proj.weight": "model.safetensors",
112
+ "model.layers.12.mlp.up_proj.biases": "model.safetensors",
113
+ "model.layers.12.mlp.up_proj.scales": "model.safetensors",
114
+ "model.layers.12.mlp.up_proj.weight": "model.safetensors",
115
+ "model.layers.12.post_attention_layernorm.weight": "model.safetensors",
116
+ "model.layers.12.self_attn.k_proj.biases": "model.safetensors",
117
+ "model.layers.12.self_attn.k_proj.scales": "model.safetensors",
118
+ "model.layers.12.self_attn.k_proj.weight": "model.safetensors",
119
+ "model.layers.12.self_attn.o_proj.biases": "model.safetensors",
120
+ "model.layers.12.self_attn.o_proj.scales": "model.safetensors",
121
+ "model.layers.12.self_attn.o_proj.weight": "model.safetensors",
122
+ "model.layers.12.self_attn.q_proj.biases": "model.safetensors",
123
+ "model.layers.12.self_attn.q_proj.scales": "model.safetensors",
124
+ "model.layers.12.self_attn.q_proj.weight": "model.safetensors",
125
+ "model.layers.12.self_attn.v_proj.biases": "model.safetensors",
126
+ "model.layers.12.self_attn.v_proj.scales": "model.safetensors",
127
+ "model.layers.12.self_attn.v_proj.weight": "model.safetensors",
128
+ "model.layers.13.input_layernorm.weight": "model.safetensors",
129
+ "model.layers.13.mlp.down_proj.biases": "model.safetensors",
130
+ "model.layers.13.mlp.down_proj.scales": "model.safetensors",
131
+ "model.layers.13.mlp.down_proj.weight": "model.safetensors",
132
+ "model.layers.13.mlp.gate_proj.biases": "model.safetensors",
133
+ "model.layers.13.mlp.gate_proj.scales": "model.safetensors",
134
+ "model.layers.13.mlp.gate_proj.weight": "model.safetensors",
135
+ "model.layers.13.mlp.up_proj.biases": "model.safetensors",
136
+ "model.layers.13.mlp.up_proj.scales": "model.safetensors",
137
+ "model.layers.13.mlp.up_proj.weight": "model.safetensors",
138
+ "model.layers.13.post_attention_layernorm.weight": "model.safetensors",
139
+ "model.layers.13.self_attn.k_proj.biases": "model.safetensors",
140
+ "model.layers.13.self_attn.k_proj.scales": "model.safetensors",
141
+ "model.layers.13.self_attn.k_proj.weight": "model.safetensors",
142
+ "model.layers.13.self_attn.o_proj.biases": "model.safetensors",
143
+ "model.layers.13.self_attn.o_proj.scales": "model.safetensors",
144
+ "model.layers.13.self_attn.o_proj.weight": "model.safetensors",
145
+ "model.layers.13.self_attn.q_proj.biases": "model.safetensors",
146
+ "model.layers.13.self_attn.q_proj.scales": "model.safetensors",
147
+ "model.layers.13.self_attn.q_proj.weight": "model.safetensors",
148
+ "model.layers.13.self_attn.v_proj.biases": "model.safetensors",
149
+ "model.layers.13.self_attn.v_proj.scales": "model.safetensors",
150
+ "model.layers.13.self_attn.v_proj.weight": "model.safetensors",
151
+ "model.layers.14.input_layernorm.weight": "model.safetensors",
152
+ "model.layers.14.mlp.down_proj.biases": "model.safetensors",
153
+ "model.layers.14.mlp.down_proj.scales": "model.safetensors",
154
+ "model.layers.14.mlp.down_proj.weight": "model.safetensors",
155
+ "model.layers.14.mlp.gate_proj.biases": "model.safetensors",
156
+ "model.layers.14.mlp.gate_proj.scales": "model.safetensors",
157
+ "model.layers.14.mlp.gate_proj.weight": "model.safetensors",
158
+ "model.layers.14.mlp.up_proj.biases": "model.safetensors",
159
+ "model.layers.14.mlp.up_proj.scales": "model.safetensors",
160
+ "model.layers.14.mlp.up_proj.weight": "model.safetensors",
161
+ "model.layers.14.post_attention_layernorm.weight": "model.safetensors",
162
+ "model.layers.14.self_attn.k_proj.biases": "model.safetensors",
163
+ "model.layers.14.self_attn.k_proj.scales": "model.safetensors",
164
+ "model.layers.14.self_attn.k_proj.weight": "model.safetensors",
165
+ "model.layers.14.self_attn.o_proj.biases": "model.safetensors",
166
+ "model.layers.14.self_attn.o_proj.scales": "model.safetensors",
167
+ "model.layers.14.self_attn.o_proj.weight": "model.safetensors",
168
+ "model.layers.14.self_attn.q_proj.biases": "model.safetensors",
169
+ "model.layers.14.self_attn.q_proj.scales": "model.safetensors",
170
+ "model.layers.14.self_attn.q_proj.weight": "model.safetensors",
171
+ "model.layers.14.self_attn.v_proj.biases": "model.safetensors",
172
+ "model.layers.14.self_attn.v_proj.scales": "model.safetensors",
173
+ "model.layers.14.self_attn.v_proj.weight": "model.safetensors",
174
+ "model.layers.15.input_layernorm.weight": "model.safetensors",
175
+ "model.layers.15.mlp.down_proj.biases": "model.safetensors",
176
+ "model.layers.15.mlp.down_proj.scales": "model.safetensors",
177
+ "model.layers.15.mlp.down_proj.weight": "model.safetensors",
178
+ "model.layers.15.mlp.gate_proj.biases": "model.safetensors",
179
+ "model.layers.15.mlp.gate_proj.scales": "model.safetensors",
180
+ "model.layers.15.mlp.gate_proj.weight": "model.safetensors",
181
+ "model.layers.15.mlp.up_proj.biases": "model.safetensors",
182
+ "model.layers.15.mlp.up_proj.scales": "model.safetensors",
183
+ "model.layers.15.mlp.up_proj.weight": "model.safetensors",
184
+ "model.layers.15.post_attention_layernorm.weight": "model.safetensors",
185
+ "model.layers.15.self_attn.k_proj.biases": "model.safetensors",
186
+ "model.layers.15.self_attn.k_proj.scales": "model.safetensors",
187
+ "model.layers.15.self_attn.k_proj.weight": "model.safetensors",
188
+ "model.layers.15.self_attn.o_proj.biases": "model.safetensors",
189
+ "model.layers.15.self_attn.o_proj.scales": "model.safetensors",
190
+ "model.layers.15.self_attn.o_proj.weight": "model.safetensors",
191
+ "model.layers.15.self_attn.q_proj.biases": "model.safetensors",
192
+ "model.layers.15.self_attn.q_proj.scales": "model.safetensors",
193
+ "model.layers.15.self_attn.q_proj.weight": "model.safetensors",
194
+ "model.layers.15.self_attn.v_proj.biases": "model.safetensors",
195
+ "model.layers.15.self_attn.v_proj.scales": "model.safetensors",
196
+ "model.layers.15.self_attn.v_proj.weight": "model.safetensors",
197
+ "model.layers.16.input_layernorm.weight": "model.safetensors",
198
+ "model.layers.16.mlp.down_proj.biases": "model.safetensors",
199
+ "model.layers.16.mlp.down_proj.scales": "model.safetensors",
200
+ "model.layers.16.mlp.down_proj.weight": "model.safetensors",
201
+ "model.layers.16.mlp.gate_proj.biases": "model.safetensors",
202
+ "model.layers.16.mlp.gate_proj.scales": "model.safetensors",
203
+ "model.layers.16.mlp.gate_proj.weight": "model.safetensors",
204
+ "model.layers.16.mlp.up_proj.biases": "model.safetensors",
205
+ "model.layers.16.mlp.up_proj.scales": "model.safetensors",
206
+ "model.layers.16.mlp.up_proj.weight": "model.safetensors",
207
+ "model.layers.16.post_attention_layernorm.weight": "model.safetensors",
208
+ "model.layers.16.self_attn.k_proj.biases": "model.safetensors",
209
+ "model.layers.16.self_attn.k_proj.scales": "model.safetensors",
210
+ "model.layers.16.self_attn.k_proj.weight": "model.safetensors",
211
+ "model.layers.16.self_attn.o_proj.biases": "model.safetensors",
212
+ "model.layers.16.self_attn.o_proj.scales": "model.safetensors",
213
+ "model.layers.16.self_attn.o_proj.weight": "model.safetensors",
214
+ "model.layers.16.self_attn.q_proj.biases": "model.safetensors",
215
+ "model.layers.16.self_attn.q_proj.scales": "model.safetensors",
216
+ "model.layers.16.self_attn.q_proj.weight": "model.safetensors",
217
+ "model.layers.16.self_attn.v_proj.biases": "model.safetensors",
218
+ "model.layers.16.self_attn.v_proj.scales": "model.safetensors",
219
+ "model.layers.16.self_attn.v_proj.weight": "model.safetensors",
220
+ "model.layers.17.input_layernorm.weight": "model.safetensors",
221
+ "model.layers.17.mlp.down_proj.biases": "model.safetensors",
222
+ "model.layers.17.mlp.down_proj.scales": "model.safetensors",
223
+ "model.layers.17.mlp.down_proj.weight": "model.safetensors",
224
+ "model.layers.17.mlp.gate_proj.biases": "model.safetensors",
225
+ "model.layers.17.mlp.gate_proj.scales": "model.safetensors",
226
+ "model.layers.17.mlp.gate_proj.weight": "model.safetensors",
227
+ "model.layers.17.mlp.up_proj.biases": "model.safetensors",
228
+ "model.layers.17.mlp.up_proj.scales": "model.safetensors",
229
+ "model.layers.17.mlp.up_proj.weight": "model.safetensors",
230
+ "model.layers.17.post_attention_layernorm.weight": "model.safetensors",
231
+ "model.layers.17.self_attn.k_proj.biases": "model.safetensors",
232
+ "model.layers.17.self_attn.k_proj.scales": "model.safetensors",
233
+ "model.layers.17.self_attn.k_proj.weight": "model.safetensors",
234
+ "model.layers.17.self_attn.o_proj.biases": "model.safetensors",
235
+ "model.layers.17.self_attn.o_proj.scales": "model.safetensors",
236
+ "model.layers.17.self_attn.o_proj.weight": "model.safetensors",
237
+ "model.layers.17.self_attn.q_proj.biases": "model.safetensors",
238
+ "model.layers.17.self_attn.q_proj.scales": "model.safetensors",
239
+ "model.layers.17.self_attn.q_proj.weight": "model.safetensors",
240
+ "model.layers.17.self_attn.v_proj.biases": "model.safetensors",
241
+ "model.layers.17.self_attn.v_proj.scales": "model.safetensors",
242
+ "model.layers.17.self_attn.v_proj.weight": "model.safetensors",
243
+ "model.layers.18.input_layernorm.weight": "model.safetensors",
244
+ "model.layers.18.mlp.down_proj.biases": "model.safetensors",
245
+ "model.layers.18.mlp.down_proj.scales": "model.safetensors",
246
+ "model.layers.18.mlp.down_proj.weight": "model.safetensors",
247
+ "model.layers.18.mlp.gate_proj.biases": "model.safetensors",
248
+ "model.layers.18.mlp.gate_proj.scales": "model.safetensors",
249
+ "model.layers.18.mlp.gate_proj.weight": "model.safetensors",
250
+ "model.layers.18.mlp.up_proj.biases": "model.safetensors",
251
+ "model.layers.18.mlp.up_proj.scales": "model.safetensors",
252
+ "model.layers.18.mlp.up_proj.weight": "model.safetensors",
253
+ "model.layers.18.post_attention_layernorm.weight": "model.safetensors",
254
+ "model.layers.18.self_attn.k_proj.biases": "model.safetensors",
255
+ "model.layers.18.self_attn.k_proj.scales": "model.safetensors",
256
+ "model.layers.18.self_attn.k_proj.weight": "model.safetensors",
257
+ "model.layers.18.self_attn.o_proj.biases": "model.safetensors",
258
+ "model.layers.18.self_attn.o_proj.scales": "model.safetensors",
259
+ "model.layers.18.self_attn.o_proj.weight": "model.safetensors",
260
+ "model.layers.18.self_attn.q_proj.biases": "model.safetensors",
261
+ "model.layers.18.self_attn.q_proj.scales": "model.safetensors",
262
+ "model.layers.18.self_attn.q_proj.weight": "model.safetensors",
263
+ "model.layers.18.self_attn.v_proj.biases": "model.safetensors",
264
+ "model.layers.18.self_attn.v_proj.scales": "model.safetensors",
265
+ "model.layers.18.self_attn.v_proj.weight": "model.safetensors",
266
+ "model.layers.19.input_layernorm.weight": "model.safetensors",
267
+ "model.layers.19.mlp.down_proj.biases": "model.safetensors",
268
+ "model.layers.19.mlp.down_proj.scales": "model.safetensors",
269
+ "model.layers.19.mlp.down_proj.weight": "model.safetensors",
270
+ "model.layers.19.mlp.gate_proj.biases": "model.safetensors",
271
+ "model.layers.19.mlp.gate_proj.scales": "model.safetensors",
272
+ "model.layers.19.mlp.gate_proj.weight": "model.safetensors",
273
+ "model.layers.19.mlp.up_proj.biases": "model.safetensors",
274
+ "model.layers.19.mlp.up_proj.scales": "model.safetensors",
275
+ "model.layers.19.mlp.up_proj.weight": "model.safetensors",
276
+ "model.layers.19.post_attention_layernorm.weight": "model.safetensors",
277
+ "model.layers.19.self_attn.k_proj.biases": "model.safetensors",
278
+ "model.layers.19.self_attn.k_proj.scales": "model.safetensors",
279
+ "model.layers.19.self_attn.k_proj.weight": "model.safetensors",
280
+ "model.layers.19.self_attn.o_proj.biases": "model.safetensors",
281
+ "model.layers.19.self_attn.o_proj.scales": "model.safetensors",
282
+ "model.layers.19.self_attn.o_proj.weight": "model.safetensors",
283
+ "model.layers.19.self_attn.q_proj.biases": "model.safetensors",
284
+ "model.layers.19.self_attn.q_proj.scales": "model.safetensors",
285
+ "model.layers.19.self_attn.q_proj.weight": "model.safetensors",
286
+ "model.layers.19.self_attn.v_proj.biases": "model.safetensors",
287
+ "model.layers.19.self_attn.v_proj.scales": "model.safetensors",
288
+ "model.layers.19.self_attn.v_proj.weight": "model.safetensors",
289
+ "model.layers.2.input_layernorm.weight": "model.safetensors",
290
+ "model.layers.2.mlp.down_proj.biases": "model.safetensors",
291
+ "model.layers.2.mlp.down_proj.scales": "model.safetensors",
292
+ "model.layers.2.mlp.down_proj.weight": "model.safetensors",
293
+ "model.layers.2.mlp.gate_proj.biases": "model.safetensors",
294
+ "model.layers.2.mlp.gate_proj.scales": "model.safetensors",
295
+ "model.layers.2.mlp.gate_proj.weight": "model.safetensors",
296
+ "model.layers.2.mlp.up_proj.biases": "model.safetensors",
297
+ "model.layers.2.mlp.up_proj.scales": "model.safetensors",
298
+ "model.layers.2.mlp.up_proj.weight": "model.safetensors",
299
+ "model.layers.2.post_attention_layernorm.weight": "model.safetensors",
300
+ "model.layers.2.self_attn.k_proj.biases": "model.safetensors",
301
+ "model.layers.2.self_attn.k_proj.scales": "model.safetensors",
302
+ "model.layers.2.self_attn.k_proj.weight": "model.safetensors",
303
+ "model.layers.2.self_attn.o_proj.biases": "model.safetensors",
304
+ "model.layers.2.self_attn.o_proj.scales": "model.safetensors",
305
+ "model.layers.2.self_attn.o_proj.weight": "model.safetensors",
306
+ "model.layers.2.self_attn.q_proj.biases": "model.safetensors",
307
+ "model.layers.2.self_attn.q_proj.scales": "model.safetensors",
308
+ "model.layers.2.self_attn.q_proj.weight": "model.safetensors",
309
+ "model.layers.2.self_attn.v_proj.biases": "model.safetensors",
310
+ "model.layers.2.self_attn.v_proj.scales": "model.safetensors",
311
+ "model.layers.2.self_attn.v_proj.weight": "model.safetensors",
312
+ "model.layers.20.input_layernorm.weight": "model.safetensors",
313
+ "model.layers.20.mlp.down_proj.biases": "model.safetensors",
314
+ "model.layers.20.mlp.down_proj.scales": "model.safetensors",
315
+ "model.layers.20.mlp.down_proj.weight": "model.safetensors",
316
+ "model.layers.20.mlp.gate_proj.biases": "model.safetensors",
317
+ "model.layers.20.mlp.gate_proj.scales": "model.safetensors",
318
+ "model.layers.20.mlp.gate_proj.weight": "model.safetensors",
319
+ "model.layers.20.mlp.up_proj.biases": "model.safetensors",
320
+ "model.layers.20.mlp.up_proj.scales": "model.safetensors",
321
+ "model.layers.20.mlp.up_proj.weight": "model.safetensors",
322
+ "model.layers.20.post_attention_layernorm.weight": "model.safetensors",
323
+ "model.layers.20.self_attn.k_proj.biases": "model.safetensors",
324
+ "model.layers.20.self_attn.k_proj.scales": "model.safetensors",
325
+ "model.layers.20.self_attn.k_proj.weight": "model.safetensors",
326
+ "model.layers.20.self_attn.o_proj.biases": "model.safetensors",
327
+ "model.layers.20.self_attn.o_proj.scales": "model.safetensors",
328
+ "model.layers.20.self_attn.o_proj.weight": "model.safetensors",
329
+ "model.layers.20.self_attn.q_proj.biases": "model.safetensors",
330
+ "model.layers.20.self_attn.q_proj.scales": "model.safetensors",
331
+ "model.layers.20.self_attn.q_proj.weight": "model.safetensors",
332
+ "model.layers.20.self_attn.v_proj.biases": "model.safetensors",
333
+ "model.layers.20.self_attn.v_proj.scales": "model.safetensors",
334
+ "model.layers.20.self_attn.v_proj.weight": "model.safetensors",
335
+ "model.layers.21.input_layernorm.weight": "model.safetensors",
336
+ "model.layers.21.mlp.down_proj.biases": "model.safetensors",
337
+ "model.layers.21.mlp.down_proj.scales": "model.safetensors",
338
+ "model.layers.21.mlp.down_proj.weight": "model.safetensors",
339
+ "model.layers.21.mlp.gate_proj.biases": "model.safetensors",
340
+ "model.layers.21.mlp.gate_proj.scales": "model.safetensors",
341
+ "model.layers.21.mlp.gate_proj.weight": "model.safetensors",
342
+ "model.layers.21.mlp.up_proj.biases": "model.safetensors",
343
+ "model.layers.21.mlp.up_proj.scales": "model.safetensors",
344
+ "model.layers.21.mlp.up_proj.weight": "model.safetensors",
345
+ "model.layers.21.post_attention_layernorm.weight": "model.safetensors",
346
+ "model.layers.21.self_attn.k_proj.biases": "model.safetensors",
347
+ "model.layers.21.self_attn.k_proj.scales": "model.safetensors",
348
+ "model.layers.21.self_attn.k_proj.weight": "model.safetensors",
349
+ "model.layers.21.self_attn.o_proj.biases": "model.safetensors",
350
+ "model.layers.21.self_attn.o_proj.scales": "model.safetensors",
351
+ "model.layers.21.self_attn.o_proj.weight": "model.safetensors",
352
+ "model.layers.21.self_attn.q_proj.biases": "model.safetensors",
353
+ "model.layers.21.self_attn.q_proj.scales": "model.safetensors",
354
+ "model.layers.21.self_attn.q_proj.weight": "model.safetensors",
355
+ "model.layers.21.self_attn.v_proj.biases": "model.safetensors",
356
+ "model.layers.21.self_attn.v_proj.scales": "model.safetensors",
357
+ "model.layers.21.self_attn.v_proj.weight": "model.safetensors",
358
+ "model.layers.22.input_layernorm.weight": "model.safetensors",
359
+ "model.layers.22.mlp.down_proj.biases": "model.safetensors",
360
+ "model.layers.22.mlp.down_proj.scales": "model.safetensors",
361
+ "model.layers.22.mlp.down_proj.weight": "model.safetensors",
362
+ "model.layers.22.mlp.gate_proj.biases": "model.safetensors",
363
+ "model.layers.22.mlp.gate_proj.scales": "model.safetensors",
364
+ "model.layers.22.mlp.gate_proj.weight": "model.safetensors",
365
+ "model.layers.22.mlp.up_proj.biases": "model.safetensors",
366
+ "model.layers.22.mlp.up_proj.scales": "model.safetensors",
367
+ "model.layers.22.mlp.up_proj.weight": "model.safetensors",
368
+ "model.layers.22.post_attention_layernorm.weight": "model.safetensors",
369
+ "model.layers.22.self_attn.k_proj.biases": "model.safetensors",
370
+ "model.layers.22.self_attn.k_proj.scales": "model.safetensors",
371
+ "model.layers.22.self_attn.k_proj.weight": "model.safetensors",
372
+ "model.layers.22.self_attn.o_proj.biases": "model.safetensors",
373
+ "model.layers.22.self_attn.o_proj.scales": "model.safetensors",
374
+ "model.layers.22.self_attn.o_proj.weight": "model.safetensors",
375
+ "model.layers.22.self_attn.q_proj.biases": "model.safetensors",
376
+ "model.layers.22.self_attn.q_proj.scales": "model.safetensors",
377
+ "model.layers.22.self_attn.q_proj.weight": "model.safetensors",
378
+ "model.layers.22.self_attn.v_proj.biases": "model.safetensors",
379
+ "model.layers.22.self_attn.v_proj.scales": "model.safetensors",
380
+ "model.layers.22.self_attn.v_proj.weight": "model.safetensors",
381
+ "model.layers.23.input_layernorm.weight": "model.safetensors",
382
+ "model.layers.23.mlp.down_proj.biases": "model.safetensors",
383
+ "model.layers.23.mlp.down_proj.scales": "model.safetensors",
384
+ "model.layers.23.mlp.down_proj.weight": "model.safetensors",
385
+ "model.layers.23.mlp.gate_proj.biases": "model.safetensors",
386
+ "model.layers.23.mlp.gate_proj.scales": "model.safetensors",
387
+ "model.layers.23.mlp.gate_proj.weight": "model.safetensors",
388
+ "model.layers.23.mlp.up_proj.biases": "model.safetensors",
389
+ "model.layers.23.mlp.up_proj.scales": "model.safetensors",
390
+ "model.layers.23.mlp.up_proj.weight": "model.safetensors",
391
+ "model.layers.23.post_attention_layernorm.weight": "model.safetensors",
392
+ "model.layers.23.self_attn.k_proj.biases": "model.safetensors",
393
+ "model.layers.23.self_attn.k_proj.scales": "model.safetensors",
394
+ "model.layers.23.self_attn.k_proj.weight": "model.safetensors",
395
+ "model.layers.23.self_attn.o_proj.biases": "model.safetensors",
396
+ "model.layers.23.self_attn.o_proj.scales": "model.safetensors",
397
+ "model.layers.23.self_attn.o_proj.weight": "model.safetensors",
398
+ "model.layers.23.self_attn.q_proj.biases": "model.safetensors",
399
+ "model.layers.23.self_attn.q_proj.scales": "model.safetensors",
400
+ "model.layers.23.self_attn.q_proj.weight": "model.safetensors",
401
+ "model.layers.23.self_attn.v_proj.biases": "model.safetensors",
402
+ "model.layers.23.self_attn.v_proj.scales": "model.safetensors",
403
+ "model.layers.23.self_attn.v_proj.weight": "model.safetensors",
404
+ "model.layers.3.input_layernorm.weight": "model.safetensors",
405
+ "model.layers.3.mlp.down_proj.biases": "model.safetensors",
406
+ "model.layers.3.mlp.down_proj.scales": "model.safetensors",
407
+ "model.layers.3.mlp.down_proj.weight": "model.safetensors",
408
+ "model.layers.3.mlp.gate_proj.biases": "model.safetensors",
409
+ "model.layers.3.mlp.gate_proj.scales": "model.safetensors",
410
+ "model.layers.3.mlp.gate_proj.weight": "model.safetensors",
411
+ "model.layers.3.mlp.up_proj.biases": "model.safetensors",
412
+ "model.layers.3.mlp.up_proj.scales": "model.safetensors",
413
+ "model.layers.3.mlp.up_proj.weight": "model.safetensors",
414
+ "model.layers.3.post_attention_layernorm.weight": "model.safetensors",
415
+ "model.layers.3.self_attn.k_proj.biases": "model.safetensors",
416
+ "model.layers.3.self_attn.k_proj.scales": "model.safetensors",
417
+ "model.layers.3.self_attn.k_proj.weight": "model.safetensors",
418
+ "model.layers.3.self_attn.o_proj.biases": "model.safetensors",
419
+ "model.layers.3.self_attn.o_proj.scales": "model.safetensors",
420
+ "model.layers.3.self_attn.o_proj.weight": "model.safetensors",
421
+ "model.layers.3.self_attn.q_proj.biases": "model.safetensors",
422
+ "model.layers.3.self_attn.q_proj.scales": "model.safetensors",
423
+ "model.layers.3.self_attn.q_proj.weight": "model.safetensors",
424
+ "model.layers.3.self_attn.v_proj.biases": "model.safetensors",
425
+ "model.layers.3.self_attn.v_proj.scales": "model.safetensors",
426
+ "model.layers.3.self_attn.v_proj.weight": "model.safetensors",
427
+ "model.layers.4.input_layernorm.weight": "model.safetensors",
428
+ "model.layers.4.mlp.down_proj.biases": "model.safetensors",
429
+ "model.layers.4.mlp.down_proj.scales": "model.safetensors",
430
+ "model.layers.4.mlp.down_proj.weight": "model.safetensors",
431
+ "model.layers.4.mlp.gate_proj.biases": "model.safetensors",
432
+ "model.layers.4.mlp.gate_proj.scales": "model.safetensors",
433
+ "model.layers.4.mlp.gate_proj.weight": "model.safetensors",
434
+ "model.layers.4.mlp.up_proj.biases": "model.safetensors",
435
+ "model.layers.4.mlp.up_proj.scales": "model.safetensors",
436
+ "model.layers.4.mlp.up_proj.weight": "model.safetensors",
437
+ "model.layers.4.post_attention_layernorm.weight": "model.safetensors",
438
+ "model.layers.4.self_attn.k_proj.biases": "model.safetensors",
439
+ "model.layers.4.self_attn.k_proj.scales": "model.safetensors",
440
+ "model.layers.4.self_attn.k_proj.weight": "model.safetensors",
441
+ "model.layers.4.self_attn.o_proj.biases": "model.safetensors",
442
+ "model.layers.4.self_attn.o_proj.scales": "model.safetensors",
443
+ "model.layers.4.self_attn.o_proj.weight": "model.safetensors",
444
+ "model.layers.4.self_attn.q_proj.biases": "model.safetensors",
445
+ "model.layers.4.self_attn.q_proj.scales": "model.safetensors",
446
+ "model.layers.4.self_attn.q_proj.weight": "model.safetensors",
447
+ "model.layers.4.self_attn.v_proj.biases": "model.safetensors",
448
+ "model.layers.4.self_attn.v_proj.scales": "model.safetensors",
449
+ "model.layers.4.self_attn.v_proj.weight": "model.safetensors",
450
+ "model.layers.5.input_layernorm.weight": "model.safetensors",
451
+ "model.layers.5.mlp.down_proj.biases": "model.safetensors",
452
+ "model.layers.5.mlp.down_proj.scales": "model.safetensors",
453
+ "model.layers.5.mlp.down_proj.weight": "model.safetensors",
454
+ "model.layers.5.mlp.gate_proj.biases": "model.safetensors",
455
+ "model.layers.5.mlp.gate_proj.scales": "model.safetensors",
456
+ "model.layers.5.mlp.gate_proj.weight": "model.safetensors",
457
+ "model.layers.5.mlp.up_proj.biases": "model.safetensors",
458
+ "model.layers.5.mlp.up_proj.scales": "model.safetensors",
459
+ "model.layers.5.mlp.up_proj.weight": "model.safetensors",
460
+ "model.layers.5.post_attention_layernorm.weight": "model.safetensors",
461
+ "model.layers.5.self_attn.k_proj.biases": "model.safetensors",
462
+ "model.layers.5.self_attn.k_proj.scales": "model.safetensors",
463
+ "model.layers.5.self_attn.k_proj.weight": "model.safetensors",
464
+ "model.layers.5.self_attn.o_proj.biases": "model.safetensors",
465
+ "model.layers.5.self_attn.o_proj.scales": "model.safetensors",
466
+ "model.layers.5.self_attn.o_proj.weight": "model.safetensors",
467
+ "model.layers.5.self_attn.q_proj.biases": "model.safetensors",
468
+ "model.layers.5.self_attn.q_proj.scales": "model.safetensors",
469
+ "model.layers.5.self_attn.q_proj.weight": "model.safetensors",
470
+ "model.layers.5.self_attn.v_proj.biases": "model.safetensors",
471
+ "model.layers.5.self_attn.v_proj.scales": "model.safetensors",
472
+ "model.layers.5.self_attn.v_proj.weight": "model.safetensors",
473
+ "model.layers.6.input_layernorm.weight": "model.safetensors",
474
+ "model.layers.6.mlp.down_proj.biases": "model.safetensors",
475
+ "model.layers.6.mlp.down_proj.scales": "model.safetensors",
476
+ "model.layers.6.mlp.down_proj.weight": "model.safetensors",
477
+ "model.layers.6.mlp.gate_proj.biases": "model.safetensors",
478
+ "model.layers.6.mlp.gate_proj.scales": "model.safetensors",
479
+ "model.layers.6.mlp.gate_proj.weight": "model.safetensors",
480
+ "model.layers.6.mlp.up_proj.biases": "model.safetensors",
481
+ "model.layers.6.mlp.up_proj.scales": "model.safetensors",
482
+ "model.layers.6.mlp.up_proj.weight": "model.safetensors",
483
+ "model.layers.6.post_attention_layernorm.weight": "model.safetensors",
484
+ "model.layers.6.self_attn.k_proj.biases": "model.safetensors",
485
+ "model.layers.6.self_attn.k_proj.scales": "model.safetensors",
486
+ "model.layers.6.self_attn.k_proj.weight": "model.safetensors",
487
+ "model.layers.6.self_attn.o_proj.biases": "model.safetensors",
488
+ "model.layers.6.self_attn.o_proj.scales": "model.safetensors",
489
+ "model.layers.6.self_attn.o_proj.weight": "model.safetensors",
490
+ "model.layers.6.self_attn.q_proj.biases": "model.safetensors",
491
+ "model.layers.6.self_attn.q_proj.scales": "model.safetensors",
492
+ "model.layers.6.self_attn.q_proj.weight": "model.safetensors",
493
+ "model.layers.6.self_attn.v_proj.biases": "model.safetensors",
494
+ "model.layers.6.self_attn.v_proj.scales": "model.safetensors",
495
+ "model.layers.6.self_attn.v_proj.weight": "model.safetensors",
496
+ "model.layers.7.input_layernorm.weight": "model.safetensors",
497
+ "model.layers.7.mlp.down_proj.biases": "model.safetensors",
498
+ "model.layers.7.mlp.down_proj.scales": "model.safetensors",
499
+ "model.layers.7.mlp.down_proj.weight": "model.safetensors",
500
+ "model.layers.7.mlp.gate_proj.biases": "model.safetensors",
501
+ "model.layers.7.mlp.gate_proj.scales": "model.safetensors",
502
+ "model.layers.7.mlp.gate_proj.weight": "model.safetensors",
503
+ "model.layers.7.mlp.up_proj.biases": "model.safetensors",
504
+ "model.layers.7.mlp.up_proj.scales": "model.safetensors",
505
+ "model.layers.7.mlp.up_proj.weight": "model.safetensors",
506
+ "model.layers.7.post_attention_layernorm.weight": "model.safetensors",
507
+ "model.layers.7.self_attn.k_proj.biases": "model.safetensors",
508
+ "model.layers.7.self_attn.k_proj.scales": "model.safetensors",
509
+ "model.layers.7.self_attn.k_proj.weight": "model.safetensors",
510
+ "model.layers.7.self_attn.o_proj.biases": "model.safetensors",
511
+ "model.layers.7.self_attn.o_proj.scales": "model.safetensors",
512
+ "model.layers.7.self_attn.o_proj.weight": "model.safetensors",
513
+ "model.layers.7.self_attn.q_proj.biases": "model.safetensors",
514
+ "model.layers.7.self_attn.q_proj.scales": "model.safetensors",
515
+ "model.layers.7.self_attn.q_proj.weight": "model.safetensors",
516
+ "model.layers.7.self_attn.v_proj.biases": "model.safetensors",
517
+ "model.layers.7.self_attn.v_proj.scales": "model.safetensors",
518
+ "model.layers.7.self_attn.v_proj.weight": "model.safetensors",
519
+ "model.layers.8.input_layernorm.weight": "model.safetensors",
520
+ "model.layers.8.mlp.down_proj.biases": "model.safetensors",
521
+ "model.layers.8.mlp.down_proj.scales": "model.safetensors",
522
+ "model.layers.8.mlp.down_proj.weight": "model.safetensors",
523
+ "model.layers.8.mlp.gate_proj.biases": "model.safetensors",
524
+ "model.layers.8.mlp.gate_proj.scales": "model.safetensors",
525
+ "model.layers.8.mlp.gate_proj.weight": "model.safetensors",
526
+ "model.layers.8.mlp.up_proj.biases": "model.safetensors",
527
+ "model.layers.8.mlp.up_proj.scales": "model.safetensors",
528
+ "model.layers.8.mlp.up_proj.weight": "model.safetensors",
529
+ "model.layers.8.post_attention_layernorm.weight": "model.safetensors",
530
+ "model.layers.8.self_attn.k_proj.biases": "model.safetensors",
531
+ "model.layers.8.self_attn.k_proj.scales": "model.safetensors",
532
+ "model.layers.8.self_attn.k_proj.weight": "model.safetensors",
533
+ "model.layers.8.self_attn.o_proj.biases": "model.safetensors",
534
+ "model.layers.8.self_attn.o_proj.scales": "model.safetensors",
535
+ "model.layers.8.self_attn.o_proj.weight": "model.safetensors",
536
+ "model.layers.8.self_attn.q_proj.biases": "model.safetensors",
537
+ "model.layers.8.self_attn.q_proj.scales": "model.safetensors",
538
+ "model.layers.8.self_attn.q_proj.weight": "model.safetensors",
539
+ "model.layers.8.self_attn.v_proj.biases": "model.safetensors",
540
+ "model.layers.8.self_attn.v_proj.scales": "model.safetensors",
541
+ "model.layers.8.self_attn.v_proj.weight": "model.safetensors",
542
+ "model.layers.9.input_layernorm.weight": "model.safetensors",
543
+ "model.layers.9.mlp.down_proj.biases": "model.safetensors",
544
+ "model.layers.9.mlp.down_proj.scales": "model.safetensors",
545
+ "model.layers.9.mlp.down_proj.weight": "model.safetensors",
546
+ "model.layers.9.mlp.gate_proj.biases": "model.safetensors",
547
+ "model.layers.9.mlp.gate_proj.scales": "model.safetensors",
548
+ "model.layers.9.mlp.gate_proj.weight": "model.safetensors",
549
+ "model.layers.9.mlp.up_proj.biases": "model.safetensors",
550
+ "model.layers.9.mlp.up_proj.scales": "model.safetensors",
551
+ "model.layers.9.mlp.up_proj.weight": "model.safetensors",
552
+ "model.layers.9.post_attention_layernorm.weight": "model.safetensors",
553
+ "model.layers.9.self_attn.k_proj.biases": "model.safetensors",
554
+ "model.layers.9.self_attn.k_proj.scales": "model.safetensors",
555
+ "model.layers.9.self_attn.k_proj.weight": "model.safetensors",
556
+ "model.layers.9.self_attn.o_proj.biases": "model.safetensors",
557
+ "model.layers.9.self_attn.o_proj.scales": "model.safetensors",
558
+ "model.layers.9.self_attn.o_proj.weight": "model.safetensors",
559
+ "model.layers.9.self_attn.q_proj.biases": "model.safetensors",
560
+ "model.layers.9.self_attn.q_proj.scales": "model.safetensors",
561
+ "model.layers.9.self_attn.q_proj.weight": "model.safetensors",
562
+ "model.layers.9.self_attn.v_proj.biases": "model.safetensors",
563
+ "model.layers.9.self_attn.v_proj.scales": "model.safetensors",
564
+ "model.layers.9.self_attn.v_proj.weight": "model.safetensors",
565
+ "model.norm.weight": "model.safetensors"
566
+ }
567
+ }
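A minimal sketch of how a weight map of this shape can be inspected, assuming (as the entries above suggest) that each quantized projection is stored as a `weight`/`scales`/`biases` triple while norm layers carry only a `weight`. It works on a small inline sample rather than the real file, and the helper name `quantized_modules` is hypothetical, not part of any library.

```python
from collections import defaultdict

# Inline sample in the same shape as the weight map above (not the full file).
weight_map = {
    "model.layers.9.self_attn.v_proj.biases": "model.safetensors",
    "model.layers.9.self_attn.v_proj.scales": "model.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model.safetensors",
    "model.norm.weight": "model.safetensors",
}


def quantized_modules(weight_map):
    """Return sorted module paths that have weight, scales and biases tensors.

    Hypothetical helper: it treats the presence of all three tensors as the
    sign of a quantized module, which is an assumption about this layout.
    """
    suffixes = defaultdict(set)
    for tensor_name in weight_map:
        # Split "a.b.c.weight" into module path "a.b.c" and suffix "weight".
        module, _, suffix = tensor_name.rpartition(".")
        suffixes[module].add(suffix)
    required = {"weight", "scales", "biases"}
    return sorted(m for m, s in suffixes.items() if required <= s)


print(quantized_modules(weight_map))
```

On the sample, only the `v_proj` module is reported; `model.norm` is skipped because it has no `scales` or `biases` entry.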
optiq_metadata.json ADDED
@@ -0,0 +1,688 @@
+ {
+ "method": "optiq_mixed_precision",
+ "base_model": "openbmb/MiniCPM5-1B",
+ "reference": "bf16",
+ "target_bpw": 5.0,
+ "achieved_bpw": 5.805183199285076,
+ "n_high_bits": 67,
+ "n_low_bits": 102,
+ "threshold": 0.0,
+ "per_layer": {
+ "lm_head": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.mlp.down_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.mlp.gate_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.k_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.23.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.22.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.22.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.22.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.21.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.o_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.20.self_attn.q_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.19.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.19.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.19.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.18.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.17.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.16.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.16.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.15.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.14.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.mlp.up_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.13.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.13.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.o_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.v_proj": {
+ "bits": 8,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.k_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.12.self_attn.q_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.11.mlp.up_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.11.mlp.down_proj": {
+ "bits": 4,
+ "group_size": 64
+ },
+ "model.layers.11.mlp.gate_proj": {
+ "bits": 4,
+ "group_size": 64
362
+ },
363
+ "model.layers.11.self_attn.o_proj": {
364
+ "bits": 8,
365
+ "group_size": 64
366
+ },
367
+ "model.layers.11.self_attn.v_proj": {
368
+ "bits": 8,
369
+ "group_size": 64
370
+ },
371
+ "model.layers.11.self_attn.k_proj": {
372
+ "bits": 4,
373
+ "group_size": 64
374
+ },
375
+ "model.layers.11.self_attn.q_proj": {
376
+ "bits": 4,
377
+ "group_size": 64
378
+ },
379
+ "model.layers.10.mlp.up_proj": {
380
+ "bits": 8,
381
+ "group_size": 64
382
+ },
383
+ "model.layers.10.mlp.down_proj": {
384
+ "bits": 4,
385
+ "group_size": 64
386
+ },
387
+ "model.layers.10.mlp.gate_proj": {
388
+ "bits": 4,
389
+ "group_size": 64
390
+ },
391
+ "model.layers.10.self_attn.o_proj": {
392
+ "bits": 8,
393
+ "group_size": 64
394
+ },
395
+ "model.layers.10.self_attn.v_proj": {
396
+ "bits": 8,
397
+ "group_size": 64
398
+ },
399
+ "model.layers.10.self_attn.k_proj": {
400
+ "bits": 4,
401
+ "group_size": 64
402
+ },
403
+ "model.layers.10.self_attn.q_proj": {
404
+ "bits": 4,
405
+ "group_size": 64
406
+ },
407
+ "model.layers.9.mlp.up_proj": {
408
+ "bits": 4,
409
+ "group_size": 64
410
+ },
411
+ "model.layers.9.mlp.down_proj": {
412
+ "bits": 4,
413
+ "group_size": 64
414
+ },
415
+ "model.layers.9.mlp.gate_proj": {
416
+ "bits": 4,
417
+ "group_size": 64
418
+ },
419
+ "model.layers.9.self_attn.o_proj": {
420
+ "bits": 8,
421
+ "group_size": 64
422
+ },
423
+ "model.layers.9.self_attn.v_proj": {
424
+ "bits": 8,
425
+ "group_size": 64
426
+ },
427
+ "model.layers.9.self_attn.k_proj": {
428
+ "bits": 4,
429
+ "group_size": 64
430
+ },
431
+ "model.layers.9.self_attn.q_proj": {
432
+ "bits": 4,
433
+ "group_size": 64
434
+ },
435
+ "model.layers.8.mlp.up_proj": {
436
+ "bits": 4,
437
+ "group_size": 64
438
+ },
439
+ "model.layers.8.mlp.down_proj": {
440
+ "bits": 4,
441
+ "group_size": 64
442
+ },
443
+ "model.layers.8.mlp.gate_proj": {
444
+ "bits": 4,
445
+ "group_size": 64
446
+ },
447
+ "model.layers.8.self_attn.o_proj": {
448
+ "bits": 8,
449
+ "group_size": 64
450
+ },
451
+ "model.layers.8.self_attn.v_proj": {
452
+ "bits": 4,
453
+ "group_size": 64
454
+ },
455
+ "model.layers.8.self_attn.k_proj": {
456
+ "bits": 4,
457
+ "group_size": 64
458
+ },
459
+ "model.layers.8.self_attn.q_proj": {
460
+ "bits": 8,
461
+ "group_size": 64
462
+ },
463
+ "model.layers.7.mlp.up_proj": {
464
+ "bits": 8,
465
+ "group_size": 64
466
+ },
467
+ "model.layers.7.mlp.down_proj": {
468
+ "bits": 4,
469
+ "group_size": 64
470
+ },
471
+ "model.layers.7.mlp.gate_proj": {
472
+ "bits": 4,
473
+ "group_size": 64
474
+ },
475
+ "model.layers.7.self_attn.o_proj": {
476
+ "bits": 8,
477
+ "group_size": 64
478
+ },
479
+ "model.layers.7.self_attn.v_proj": {
480
+ "bits": 4,
481
+ "group_size": 64
482
+ },
483
+ "model.layers.7.self_attn.k_proj": {
484
+ "bits": 4,
485
+ "group_size": 64
486
+ },
487
+ "model.layers.7.self_attn.q_proj": {
488
+ "bits": 8,
489
+ "group_size": 64
490
+ },
491
+ "model.layers.6.mlp.up_proj": {
492
+ "bits": 4,
493
+ "group_size": 64
494
+ },
495
+ "model.layers.6.mlp.down_proj": {
496
+ "bits": 4,
497
+ "group_size": 64
498
+ },
499
+ "model.layers.6.mlp.gate_proj": {
500
+ "bits": 4,
501
+ "group_size": 64
502
+ },
503
+ "model.layers.6.self_attn.o_proj": {
504
+ "bits": 8,
505
+ "group_size": 64
506
+ },
507
+ "model.layers.6.self_attn.v_proj": {
508
+ "bits": 8,
509
+ "group_size": 64
510
+ },
511
+ "model.layers.6.self_attn.k_proj": {
512
+ "bits": 4,
513
+ "group_size": 64
514
+ },
515
+ "model.layers.6.self_attn.q_proj": {
516
+ "bits": 4,
517
+ "group_size": 64
518
+ },
519
+ "model.layers.5.mlp.up_proj": {
520
+ "bits": 4,
521
+ "group_size": 64
522
+ },
523
+ "model.layers.5.mlp.down_proj": {
524
+ "bits": 4,
525
+ "group_size": 64
526
+ },
527
+ "model.layers.5.mlp.gate_proj": {
528
+ "bits": 4,
529
+ "group_size": 64
530
+ },
531
+ "model.layers.5.self_attn.o_proj": {
532
+ "bits": 4,
533
+ "group_size": 64
534
+ },
535
+ "model.layers.5.self_attn.v_proj": {
536
+ "bits": 8,
537
+ "group_size": 64
538
+ },
539
+ "model.layers.5.self_attn.k_proj": {
540
+ "bits": 4,
541
+ "group_size": 64
542
+ },
543
+ "model.layers.5.self_attn.q_proj": {
544
+ "bits": 8,
545
+ "group_size": 64
546
+ },
547
+ "model.layers.4.mlp.up_proj": {
548
+ "bits": 4,
549
+ "group_size": 64
550
+ },
551
+ "model.layers.4.mlp.down_proj": {
552
+ "bits": 8,
553
+ "group_size": 64
554
+ },
555
+ "model.layers.4.mlp.gate_proj": {
556
+ "bits": 4,
557
+ "group_size": 64
558
+ },
559
+ "model.layers.4.self_attn.o_proj": {
560
+ "bits": 8,
561
+ "group_size": 64
562
+ },
563
+ "model.layers.4.self_attn.v_proj": {
564
+ "bits": 8,
565
+ "group_size": 64
566
+ },
567
+ "model.layers.4.self_attn.k_proj": {
568
+ "bits": 4,
569
+ "group_size": 64
570
+ },
571
+ "model.layers.4.self_attn.q_proj": {
572
+ "bits": 4,
573
+ "group_size": 64
574
+ },
575
+ "model.layers.3.mlp.up_proj": {
576
+ "bits": 4,
577
+ "group_size": 64
578
+ },
579
+ "model.layers.3.mlp.down_proj": {
580
+ "bits": 4,
581
+ "group_size": 64
582
+ },
583
+ "model.layers.3.mlp.gate_proj": {
584
+ "bits": 4,
585
+ "group_size": 64
586
+ },
587
+ "model.layers.3.self_attn.o_proj": {
588
+ "bits": 8,
589
+ "group_size": 64
590
+ },
591
+ "model.layers.3.self_attn.v_proj": {
592
+ "bits": 8,
593
+ "group_size": 64
594
+ },
595
+ "model.layers.3.self_attn.k_proj": {
596
+ "bits": 4,
597
+ "group_size": 64
598
+ },
599
+ "model.layers.3.self_attn.q_proj": {
600
+ "bits": 4,
601
+ "group_size": 64
602
+ },
603
+ "model.layers.2.mlp.up_proj": {
604
+ "bits": 4,
605
+ "group_size": 64
606
+ },
607
+ "model.layers.2.mlp.down_proj": {
608
+ "bits": 4,
609
+ "group_size": 64
610
+ },
611
+ "model.layers.2.mlp.gate_proj": {
612
+ "bits": 4,
613
+ "group_size": 64
614
+ },
615
+ "model.layers.2.self_attn.o_proj": {
616
+ "bits": 8,
617
+ "group_size": 64
618
+ },
619
+ "model.layers.2.self_attn.v_proj": {
620
+ "bits": 8,
621
+ "group_size": 64
622
+ },
623
+ "model.layers.2.self_attn.k_proj": {
624
+ "bits": 4,
625
+ "group_size": 64
626
+ },
627
+ "model.layers.2.self_attn.q_proj": {
628
+ "bits": 4,
629
+ "group_size": 64
630
+ },
631
+ "model.layers.1.mlp.up_proj": {
632
+ "bits": 4,
633
+ "group_size": 64
634
+ },
635
+ "model.layers.1.mlp.down_proj": {
636
+ "bits": 8,
637
+ "group_size": 64
638
+ },
639
+ "model.layers.1.mlp.gate_proj": {
640
+ "bits": 4,
641
+ "group_size": 64
642
+ },
643
+ "model.layers.1.self_attn.o_proj": {
644
+ "bits": 8,
645
+ "group_size": 64
646
+ },
647
+ "model.layers.1.self_attn.v_proj": {
648
+ "bits": 8,
649
+ "group_size": 64
650
+ },
651
+ "model.layers.1.self_attn.k_proj": {
652
+ "bits": 4,
653
+ "group_size": 64
654
+ },
655
+ "model.layers.1.self_attn.q_proj": {
656
+ "bits": 4,
657
+ "group_size": 64
658
+ },
659
+ "model.layers.0.mlp.up_proj": {
660
+ "bits": 8,
661
+ "group_size": 64
662
+ },
663
+ "model.layers.0.mlp.down_proj": {
664
+ "bits": 8,
665
+ "group_size": 64
666
+ },
667
+ "model.layers.0.mlp.gate_proj": {
668
+ "bits": 8,
669
+ "group_size": 64
670
+ },
671
+ "model.layers.0.self_attn.o_proj": {
672
+ "bits": 8,
673
+ "group_size": 64
674
+ },
675
+ "model.layers.0.self_attn.v_proj": {
676
+ "bits": 8,
677
+ "group_size": 64
678
+ },
679
+ "model.layers.0.self_attn.k_proj": {
680
+ "bits": 8,
681
+ "group_size": 64
682
+ },
683
+ "model.layers.0.self_attn.q_proj": {
684
+ "bits": 8,
685
+ "group_size": 64
686
+ }
687
+ }
688
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "add_prefix_space": null,
+   "backend": "tokenizers",
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "is_local": true,
+   "legacy": true,
+   "local_files_only": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }