LifeWiki-ai / LifeWiki

Safetensors · qwen3

ethanlshen committed on

Commit febb9f1 · 0 Parent(s):

Duplicate from allenai/SERA-32B

Co-authored-by: Ethan Shen <ethanlshen@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,192 @@
+ ---
+ license: apache-2.0
+ ---
+ # SERA-32B
+
+ **SERA-32B** is the first model in Ai2's [Open Coding Agents](https://huggingface.co/collections/allenai/open-coding-agents) series. It is a state-of-the-art open-source coding agent that achieves **49.5%** on SWE-bench Verified, matching open-weight models such as Devstral-Small-2 (24B) and much larger models such as GLM-4.5-Air (110B). SERA-32B was trained with Soft Verified Generation (SVG), a simple, efficient method that reaches equivalent performance at **26x lower cost than reinforcement learning** and **57x lower cost than previous synthetic-data methods**. The total cost for data generation and training is approximately $2,000 (40 GPU-days).
+
+ - **Paper:** [https://allenai.org/papers/opencodingagents](https://allenai.org/papers/opencodingagents)
+ - **Code:** [https://github.com/allenai/SERA](https://github.com/allenai/SERA)
+ - **CLI:** [https://github.com/allenai/sera-cli](https://github.com/allenai/sera-cli) | [PyPI](https://pypi.org/project/ai2-sera-cli/)
+ - **Collection:** [https://huggingface.co/collections/allenai/open-coding-agents](https://huggingface.co/collections/allenai/open-coding-agents)
+ - **Dataset:** 25,000 trajectories from [https://huggingface.co/datasets/allenai/Sera-4.6-Lite-T2](https://huggingface.co/datasets/allenai/Sera-4.6-Lite-T2)
+
+ ## Model Variants
+
+ | Model | HuggingFace | Base | Teacher | SWE-bench Verified |
+ |-------|-------------|------|---------|--------------------|
+ | **SERA-32B** | [allenai/SERA-32B](https://huggingface.co/allenai/SERA-32B) | Qwen 3-32B | GLM-4.6 | 49.5% ± 1.9% |
+ | SERA-32B-GA | [allenai/SERA-32B-GA](https://huggingface.co/allenai/SERA-32B-GA) | Qwen 3-32B | GLM-4.5-Air | 46.6% ± 0.7% |
+ | SERA-8B | [allenai/SERA-8B](https://huggingface.co/allenai/SERA-8B) | Qwen 3-8B | GLM-4.6 | 31.7% ± 0.9% |
+ | SERA-8B-GA | [allenai/SERA-8B-GA](https://huggingface.co/allenai/SERA-8B-GA) | Qwen 3-8B | GLM-4.5-Air | 31.7% ± 0.4% |
+
+ All results evaluated at 32K context length. Standard deviations computed over 3 random seeds.
+
+ ## Performance
+
+ ### SWE-bench Verified (32K Context)
+
+ | Model | Type | Resolve Rate |
+ |-------|------|--------------|
+ | SkyRL-8B | Open-source | 9.4% |
+ | Nex-N1-8B | Open-source | 20.3% |
+ | **SERA-8B** | **Open-source** | **31.7%** |
+ | Qwen 3-32B (base) | Open-weight | 24.4% |
+ | SWE-smith | Open-source | 32.6% |
+ | SkyRL-Agent | Open-source | 39.4% |
+ | DeepSWE | Open-source | 42.2% |
+ | **SERA-32B** | **Open-source** | **49.5%** |
+ | Devstral-Small-2 (24B) | Open-weight | 50.0% |
+ | GLM-4.5-Air (110B) | Open-weight | 50.5% |
+
+ *Open-source: code, model weights, and data publicly available. Open-weight: model weights available but training data/code not fully released.*
+
+ ## Quickstart
+
+ The easiest way to use SERA is with the `sera` CLI, which provides seamless integration with Claude Code:
+
+ ```bash
+ # Install the CLI
+ uv tool install ai2-sera-cli
+
+ # Option 1: Deploy on Modal (recommended for trying out)
+ modal setup # one-time setup
+ sera --modal
+
+ # Option 2: Use an existing endpoint
+ export SERA_API_KEY=<your_api_key>
+ sera --endpoint <endpoint_url>
+ ```
+
+ The first run with `--modal` takes approximately 10 minutes to download the model (~65GB) and compile. Subsequent runs start in 1-2 minutes.
+
+ For more deployment options, see the [sera-cli documentation](https://github.com/allenai/sera-cli).
+
+ ## Self Hosting
+
+ ```bash
+ vllm serve allenai/SERA-32B --port 8001 \
+     --tensor-parallel-size 4 \
+     --max-model-len 32768 \
+     --trust-remote-code \
+     --enable-auto-tool-choice \
+     --tool-call-parser hermes \
+     --enforce-eager \
+     --seed 42 \
+     --disable-cascade-attn
+ ```
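+
+ Once the server is up it exposes vLLM's OpenAI-compatible API, so any OpenAI-style client can drive the model. Below is a minimal sketch using the `openai` Python client; the endpoint matches the command above, but the `read_file` tool definition is purely illustrative and not something shipped with the model:
+
+ ```python
+ # Minimal sketch: query a self-hosted SERA-32B through vLLM's
+ # OpenAI-compatible API. The `read_file` tool is a hypothetical
+ # example for illustration only.
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
+
+ tools = [{
+     "type": "function",
+     "function": {
+         "name": "read_file",  # hypothetical tool, for illustration
+         "description": "Read a file from the repository.",
+         "parameters": {
+             "type": "object",
+             "properties": {"path": {"type": "string"}},
+             "required": ["path"],
+         },
+     },
+ }]
+
+ response = client.chat.completions.create(
+     model="allenai/SERA-32B",
+     messages=[{"role": "user", "content": "Fix the failing test in utils/parse.py"}],
+     tools=tools,
+ )
+ message = response.choices[0].message
+ # With --enable-auto-tool-choice, tool invocations arrive as structured calls.
+ print(message.tool_calls or message.content)
+ ```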
+
+ ## Model Details
+
+ | | |
+ |---|---|
+ | **Developer** | Allen Institute for AI (Ai2) |
+ | **Authors** | Ethan Shen, Daniel Tormoen, Saurabh Shah, Ali Farhadi, Tim Dettmers |
+ | **Base Model** | Qwen 3-32B |
+ | **Teacher Model** | GLM-4.6 (357B) |
+ | **Model Type** | Coding agent / software engineering agent |
+ | **Training Method** | Supervised fine-tuning on synthetic agent trajectories |
+ | **Context Length** | 32K tokens |
+ | **License** | Apache 2.0 |
+
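+ For local experimentation without a server, the checkpoint should also load as a standard `Qwen3ForCausalLM` through `transformers` (see `config.json` in this repository). A minimal sketch; the prompt and generation settings are illustrative:
+
+ ```python
+ # Minimal sketch: load SERA-32B directly with transformers.
+ # In bfloat16 the weights alone take roughly 65 GB of GPU memory.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("allenai/SERA-32B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "allenai/SERA-32B", torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ messages = [{"role": "user", "content": "Write a unit test for a binary search."}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ output = model.generate(inputs, max_new_tokens=512)
+ print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```
+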
+ ### Training Configuration
+
+ | | |
+ |---|---|
+ | **Epochs** | 3 |
+ | **Learning Rate** | 1e-5 |
+ | **Weight Decay** | 0.01 |
+ | **Max Sequence Length** | 32,768 tokens |
+ | **Training Framework** | Axolotl |
+ | **Inference Framework** | vLLM |
+ | **Compute** | 40 GPU-days (~$2,000) |
+
+ ## Training Data
+
+ SERA-32B is trained on 25,000 synthetic coding-agent trajectories generated using **Soft Verified Generation (SVG)**. SVG is a two-rollout pipeline:
+
+ 1. **First rollout:** A teacher model makes a change to a codebase, starting from a randomly selected function.
+ 2. **Synthetic PR:** The trajectory is converted into a pull request description.
+ 3. **Second rollout:** The teacher attempts to reproduce the change given only the PR description.
+ 4. **Soft verification:** The two patches are compared using line-level recall, with no test execution required (see the sketch below).
+
+ This approach removes the need for test infrastructure and enables data generation from any repository.
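+
+ The soft verifier in step 4 only needs the two patches. As a rough illustration of a line-level recall check; the exact metric and acceptance threshold are defined in the paper, so treat this as a sketch rather than the released implementation:
+
+ ```python
+ # Illustrative line-level recall between two unified diffs: the fraction
+ # of lines added by the reference patch that also appear among the lines
+ # added by the reproduced patch. The real SVG metric is defined in the paper.
+ def added_lines(patch: str) -> set[str]:
+     return {
+         line[1:].strip()
+         for line in patch.splitlines()
+         if line.startswith("+") and not line.startswith("+++")
+     }
+
+ def line_recall(reference_patch: str, reproduced_patch: str) -> float:
+     ref = added_lines(reference_patch)
+     if not ref:
+         return 0.0
+     return len(ref & added_lines(reproduced_patch)) / len(ref)
+
+ # Toy example: one of the two reference lines is reproduced, so recall is 0.5.
+ ref = "+++ b/a.py\n+x = 1\n+y = 2\n"
+ rep = "+++ b/a.py\n+x = 1\n+z = 3\n"
+ assert line_recall(ref, rep) == 0.5
+ ```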
+
+ - **Source Repositories:** 121 Python codebases
+ - **Teacher Model:** GLM-4.6 (357B)
+ - **Dataset:** [Coming soon]
+
+ ## Intended Use
+
+ - **Automated software engineering:** Bug fixes, feature implementation, refactoring
+ - **Repository specialization:** Fine-tune on private codebases to create specialized coding agents (~8,000 trajectories / ~$1,300)
+ - **Research:** Studying coding agents, data generation methods, and agent behavior
+
+ ## Limitations
+
+ - **SWE-bench training artifact:** The model was trained on SWE-bench-style tasks and may attempt to call a nonexistent `submit` tool when it finishes editing. The sera-cli proxy handles this automatically (see the sketch after this list).
+ - **Evaluation scope:** Only validated on SWE-bench Verified (Python repositories). Performance on other languages or benchmarks is unknown.
+ - **Teacher bound:** Performance is largely bounded by the capability of the teacher model (GLM-4.6).
+ - **Statistical variance:** Results are computed over 3 seeds; effects smaller than 2-3% should be interpreted with caution.
+ - **Model-specific:** Experiments use Qwen 3 as the base model. Generalization to other model families has not been validated.
+
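+ The guard such a proxy needs is conceptually tiny. A hedged sketch (not sera-cli's actual code) of dropping a stray `submit` call before it reaches the tool runtime:
+
+ ```python
+ # Hedged sketch (not sera-cli's implementation): filter out a stray
+ # `submit` tool call, which only exists inside SWE-bench-style harnesses.
+ def filter_tool_calls(tool_calls: list[dict]) -> list[dict]:
+     return [
+         call for call in tool_calls
+         if call.get("function", {}).get("name") != "submit"
+     ]
+ ```
+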
+ ## Bias, Risks, and Limitations
+
+ SERA-32B is released for research and educational purposes without safety filtering or safety tuning; like any such model, it can be prompted to generate harmful or insecure code and is not suitable for real-world use without significant human oversight. Users should be aware of the following risks:
+
+ - **Code security:** May generate code with security vulnerabilities (e.g., injection attacks, insecure defaults). All generated code should be reviewed before deployment.
+ - **Accuracy:** May produce incorrect or buggy code. Outputs should be tested and verified.
+ - **Inherited biases:** May reflect biases present in the Qwen 3-32B base model and the GLM-4.6 teacher model.
+ - **Misuse potential:** Could be used to generate malicious code or to identify vulnerabilities for exploitation.
+ - **Agent security:** As with other coding agents, prompt injection and data leakage are risks; verify outputs and manage context windows to avoid disclosing sensitive data.
+
+ ## Responsible Use
+
+ This model is intended for research and educational use. Users should adhere to Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). Key principles include:
+
+ - Use the model for beneficial purposes
+ - Review and test all generated code before deployment
+ - Do not use the model to generate malicious software or exploit vulnerabilities
+ - Consider the potential impact of automated code generation in your context
+
+ ## Hardware Requirements
+
+ | Configuration | GPU | Notes |
+ |---------------|-----|-------|
+ | Minimum | 1× 80GB GPU (A100, H100) | 32K context |
+ | Recommended | 1× H100 | Best performance |
+
+ Quantization (AWQ, GPTQ) can reduce memory requirements if needed.
+
+ ## License
+
+ This model is licensed under **Apache 2.0**. It is intended for research and educational use and may be used commercially in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
+
+ ## Citation
+
+ ```bibtex
+ @misc{shen2026serasoftverifiedefficientrepository,
+       title={SERA: Soft-Verified Efficient Repository Agents},
+       author={Ethan Shen and Danny Tormoen and Saurabh Shah and Ali Farhadi and Tim Dettmers},
+       year={2026},
+       eprint={2601.20789},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2601.20789},
+ }
+ ```
+
+ ## Contact
+
+ - **Email:** ethans03@cs.washington.edu, dettmers@cmu.edu
+ - **Issues:** [GitHub Issues](https://github.com/allenai/SERA/issues)
182
+
183
+ SERA / Open Coding Agents - Disclaimer Text
184
+
185
+ Bias, Risks, and Limitations
186
+ SERA-32B/SERA-8B is an open coding agent model released for research and educational purposes without any safety filtering or safety tuning. As a research artifact, this model is not suitable for real-world use without significant human oversight. Like other coding agents, this model may propagate biases present in training data or generate incorrect or insecure code. Security risks include prompt injection and data leakage. Always verify code outputs and manage context windows to avoid disclosing sensitive data or information.
187
+
188
+
189
+ Bias, Risks, and Limitations
190
+ Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, many statements from OLMo or any LLM are often inaccurate, so facts should be verified.
191
+ License
192
+ This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,96 @@
+ {
+   "architectures": [
+     "Qwen3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "dtype": "bfloat16",
+   "eos_token_id": 151645,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 25600,
+   "layer_types": [
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 40960,
+   "max_window_layers": 64,
+   "model_type": "qwen3",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 64,
+   "num_key_value_heads": 8,
+   "pad_token_id": 151643,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "transformers_version": "4.57.1",
+   "use_cache": false,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
debug.log ADDED
@@ -0,0 +1 @@
+
generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "do_sample": false,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "temperature": 0.0,
+   "top_k": 20,
+   "top_p": 0.95,
+   "transformers_version": "4.57.1"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ab311717862c6eef6fafea1f7de952f432c6381674f0872d3286a675b1c9953
+ size 4932307552
model-00002-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfaadb1126993c7ba0b7993faeb394d1a726194e4f81cbb61c31098b53b8fd53
+ size 4875989664
model-00003-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44b10889c9bdb988fe841138b9320b42d1d9ad473552d9c9fff70193803c1323
+ size 4875989688
model-00004-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73322127538df641e60822c12e2996ab153a7dd93566ba7c050fd62ae8584374
+ size 4875989720
model-00005-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e05da331e90094e2ee206e721eaec9e30a2a6f66c368c89e7d78698f1b70de3
+ size 4875989720
model-00006-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8efda6aa45ff8fe828eb732b27a1dd46105e8c59ccd1b43fff30e125d3cf5250
+ size 4875989720
model-00007-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4cc175e32d609124526fc39e3fbe34ed266eef05687f73378a4f36c2a3e1ec6
+ size 4875989720
model-00008-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3faa4dc109f06d9b7898111af54b63cb85caef66e0e6a54a7261de436ab706c
+ size 4875989720
model-00009-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f4d176613a9bce786d4deb0714fd75a0f1e0c119534cb33d3ecf94ec06f6a94
+ size 4875989720
model-00010-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba0d5efe3ef79626b93ea99a448abb581e33db0207c96e431711001b5fbce22f
+ size 4875989720
model-00011-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42a132c8e6cacb9c0f47ccf149f2585af9bbc7fb5d6f4336a6db5513fbf508d1
+ size 4875989720
model-00012-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a377e81b82892e813a5e72f10fdeda9db3824453871cd88644998416708a76a7
+ size 4875989720
model-00013-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d92453ee91f5cd85542ab83748becf762da21d902e0e73eb8ab08ba205166c9
+ size 4875989720
model-00014-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:676b49f85e4b13e1bca2931bf964650f05bdc1d11bf46cf5ec140b305e460220
+ size 2080144016
model.safetensors.index.json ADDED
@@ -0,0 +1,715 @@
+ {
+   "metadata": {
+     "total_parameters": 676864,
+     "total_size": 65524246528
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00014-of-00014.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.13.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.13.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.18.self_attn.k_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.18.self_attn.q_norm.weight": "model-00004-of-00014.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.23.self_attn.k_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.23.self_attn.q_norm.weight": "model-00005-of-00014.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.28.self_attn.k_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.28.self_attn.q_norm.weight": "model-00006-of-00014.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00002-of-00014.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
+     "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00014.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.input_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.mlp.up_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.32.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.33.input_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.33.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.33.mlp.gate_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.33.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.33.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.33.self_attn.k_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.33.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.33.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.33.self_attn.q_norm.weight": "model-00007-of-00014.safetensors",
+     "model.layers.33.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.33.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
+     "model.layers.34.input_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.34.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.input_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.35.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.input_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.36.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.input_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.mlp.up_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.37.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.38.input_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.38.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.38.mlp.gate_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.38.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.38.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.38.self_attn.k_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.38.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.38.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.38.self_attn.q_norm.weight": "model-00008-of-00014.safetensors",
+     "model.layers.38.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.38.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
+     "model.layers.39.input_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.39.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
+     "model.layers.40.input_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.40.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.input_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.41.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.input_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.mlp.up_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.42.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.43.input_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.43.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.43.mlp.gate_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.43.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.43.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.43.self_attn.k_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.43.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.43.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.43.self_attn.q_norm.weight": "model-00009-of-00014.safetensors",
+     "model.layers.43.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.43.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
+     "model.layers.44.input_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.44.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.input_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.45.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.input_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.46.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
+     "model.layers.47.input_layernorm.weight": "model-00010-of-00014.safetensors",
+     "model.layers.47.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
473
+ "model.layers.47.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
474
+ "model.layers.47.mlp.up_proj.weight": "model-00010-of-00014.safetensors",
475
+ "model.layers.47.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
476
+ "model.layers.47.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
477
+ "model.layers.47.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
478
+ "model.layers.47.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
479
+ "model.layers.47.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
480
+ "model.layers.47.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
481
+ "model.layers.47.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
482
+ "model.layers.48.input_layernorm.weight": "model-00011-of-00014.safetensors",
483
+ "model.layers.48.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
484
+ "model.layers.48.mlp.gate_proj.weight": "model-00010-of-00014.safetensors",
485
+ "model.layers.48.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
486
+ "model.layers.48.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
487
+ "model.layers.48.self_attn.k_norm.weight": "model-00010-of-00014.safetensors",
488
+ "model.layers.48.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
489
+ "model.layers.48.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
490
+ "model.layers.48.self_attn.q_norm.weight": "model-00010-of-00014.safetensors",
491
+ "model.layers.48.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
492
+ "model.layers.48.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
493
+ "model.layers.49.input_layernorm.weight": "model-00011-of-00014.safetensors",
494
+ "model.layers.49.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
495
+ "model.layers.49.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
496
+ "model.layers.49.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
497
+ "model.layers.49.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
498
+ "model.layers.49.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
499
+ "model.layers.49.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
500
+ "model.layers.49.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
501
+ "model.layers.49.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
502
+ "model.layers.49.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
503
+ "model.layers.49.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
504
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00014.safetensors",
505
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
506
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
507
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
508
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
509
+ "model.layers.5.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
510
+ "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
511
+ "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
512
+ "model.layers.5.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
513
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
514
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
515
+ "model.layers.50.input_layernorm.weight": "model-00011-of-00014.safetensors",
516
+ "model.layers.50.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
517
+ "model.layers.50.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
518
+ "model.layers.50.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
519
+ "model.layers.50.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
520
+ "model.layers.50.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
521
+ "model.layers.50.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
522
+ "model.layers.50.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
523
+ "model.layers.50.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
524
+ "model.layers.50.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
525
+ "model.layers.50.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
526
+ "model.layers.51.input_layernorm.weight": "model-00011-of-00014.safetensors",
527
+ "model.layers.51.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
528
+ "model.layers.51.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
529
+ "model.layers.51.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
530
+ "model.layers.51.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
531
+ "model.layers.51.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
532
+ "model.layers.51.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
533
+ "model.layers.51.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
534
+ "model.layers.51.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
535
+ "model.layers.51.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
536
+ "model.layers.51.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
537
+ "model.layers.52.input_layernorm.weight": "model-00011-of-00014.safetensors",
538
+ "model.layers.52.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
539
+ "model.layers.52.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
540
+ "model.layers.52.mlp.up_proj.weight": "model-00011-of-00014.safetensors",
541
+ "model.layers.52.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
542
+ "model.layers.52.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
543
+ "model.layers.52.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
544
+ "model.layers.52.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
545
+ "model.layers.52.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
546
+ "model.layers.52.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
547
+ "model.layers.52.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
548
+ "model.layers.53.input_layernorm.weight": "model-00012-of-00014.safetensors",
549
+ "model.layers.53.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
550
+ "model.layers.53.mlp.gate_proj.weight": "model-00011-of-00014.safetensors",
551
+ "model.layers.53.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
552
+ "model.layers.53.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
553
+ "model.layers.53.self_attn.k_norm.weight": "model-00011-of-00014.safetensors",
554
+ "model.layers.53.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
555
+ "model.layers.53.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
556
+ "model.layers.53.self_attn.q_norm.weight": "model-00011-of-00014.safetensors",
557
+ "model.layers.53.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
558
+ "model.layers.53.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
559
+ "model.layers.54.input_layernorm.weight": "model-00012-of-00014.safetensors",
560
+ "model.layers.54.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
561
+ "model.layers.54.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
562
+ "model.layers.54.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
563
+ "model.layers.54.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
564
+ "model.layers.54.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
565
+ "model.layers.54.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
566
+ "model.layers.54.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
567
+ "model.layers.54.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
568
+ "model.layers.54.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
569
+ "model.layers.54.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
570
+ "model.layers.55.input_layernorm.weight": "model-00012-of-00014.safetensors",
571
+ "model.layers.55.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
572
+ "model.layers.55.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
573
+ "model.layers.55.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
574
+ "model.layers.55.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
575
+ "model.layers.55.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
576
+ "model.layers.55.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
577
+ "model.layers.55.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
578
+ "model.layers.55.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
579
+ "model.layers.55.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
580
+ "model.layers.55.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
581
+ "model.layers.56.input_layernorm.weight": "model-00012-of-00014.safetensors",
582
+ "model.layers.56.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
583
+ "model.layers.56.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
584
+ "model.layers.56.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
585
+ "model.layers.56.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
586
+ "model.layers.56.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
587
+ "model.layers.56.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
588
+ "model.layers.56.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
589
+ "model.layers.56.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
590
+ "model.layers.56.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
591
+ "model.layers.56.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
592
+ "model.layers.57.input_layernorm.weight": "model-00012-of-00014.safetensors",
593
+ "model.layers.57.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
594
+ "model.layers.57.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
595
+ "model.layers.57.mlp.up_proj.weight": "model-00012-of-00014.safetensors",
596
+ "model.layers.57.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
597
+ "model.layers.57.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
598
+ "model.layers.57.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
599
+ "model.layers.57.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
600
+ "model.layers.57.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
601
+ "model.layers.57.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
602
+ "model.layers.57.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
603
+ "model.layers.58.input_layernorm.weight": "model-00013-of-00014.safetensors",
604
+ "model.layers.58.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
605
+ "model.layers.58.mlp.gate_proj.weight": "model-00012-of-00014.safetensors",
606
+ "model.layers.58.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
607
+ "model.layers.58.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
608
+ "model.layers.58.self_attn.k_norm.weight": "model-00012-of-00014.safetensors",
609
+ "model.layers.58.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
610
+ "model.layers.58.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
611
+ "model.layers.58.self_attn.q_norm.weight": "model-00012-of-00014.safetensors",
612
+ "model.layers.58.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
613
+ "model.layers.58.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
614
+ "model.layers.59.input_layernorm.weight": "model-00013-of-00014.safetensors",
615
+ "model.layers.59.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
616
+ "model.layers.59.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
617
+ "model.layers.59.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
618
+ "model.layers.59.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
619
+ "model.layers.59.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
620
+ "model.layers.59.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
621
+ "model.layers.59.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
622
+ "model.layers.59.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
623
+ "model.layers.59.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
624
+ "model.layers.59.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
625
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00014.safetensors",
626
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
627
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
628
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
629
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
630
+ "model.layers.6.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
631
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
632
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
633
+ "model.layers.6.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
634
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
635
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
636
+ "model.layers.60.input_layernorm.weight": "model-00013-of-00014.safetensors",
637
+ "model.layers.60.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
638
+ "model.layers.60.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
639
+ "model.layers.60.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
640
+ "model.layers.60.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
641
+ "model.layers.60.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
642
+ "model.layers.60.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
643
+ "model.layers.60.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
644
+ "model.layers.60.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
645
+ "model.layers.60.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
646
+ "model.layers.60.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
647
+ "model.layers.61.input_layernorm.weight": "model-00013-of-00014.safetensors",
648
+ "model.layers.61.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
649
+ "model.layers.61.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
650
+ "model.layers.61.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
651
+ "model.layers.61.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
652
+ "model.layers.61.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
653
+ "model.layers.61.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
654
+ "model.layers.61.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
655
+ "model.layers.61.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
656
+ "model.layers.61.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
657
+ "model.layers.61.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
658
+ "model.layers.62.input_layernorm.weight": "model-00013-of-00014.safetensors",
659
+ "model.layers.62.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
660
+ "model.layers.62.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
661
+ "model.layers.62.mlp.up_proj.weight": "model-00013-of-00014.safetensors",
662
+ "model.layers.62.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
663
+ "model.layers.62.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
664
+ "model.layers.62.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
665
+ "model.layers.62.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
666
+ "model.layers.62.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
667
+ "model.layers.62.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
668
+ "model.layers.62.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
669
+ "model.layers.63.input_layernorm.weight": "model-00014-of-00014.safetensors",
670
+ "model.layers.63.mlp.down_proj.weight": "model-00014-of-00014.safetensors",
671
+ "model.layers.63.mlp.gate_proj.weight": "model-00013-of-00014.safetensors",
672
+ "model.layers.63.mlp.up_proj.weight": "model-00014-of-00014.safetensors",
673
+ "model.layers.63.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
674
+ "model.layers.63.self_attn.k_norm.weight": "model-00013-of-00014.safetensors",
675
+ "model.layers.63.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
676
+ "model.layers.63.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
677
+ "model.layers.63.self_attn.q_norm.weight": "model-00013-of-00014.safetensors",
678
+ "model.layers.63.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
679
+ "model.layers.63.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
680
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00014.safetensors",
681
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
682
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
683
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00014.safetensors",
684
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
685
+ "model.layers.7.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
686
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
687
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
688
+ "model.layers.7.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
689
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
690
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
691
+ "model.layers.8.input_layernorm.weight": "model-00003-of-00014.safetensors",
692
+ "model.layers.8.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
693
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00014.safetensors",
694
+ "model.layers.8.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
695
+ "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
696
+ "model.layers.8.self_attn.k_norm.weight": "model-00002-of-00014.safetensors",
697
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
698
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
699
+ "model.layers.8.self_attn.q_norm.weight": "model-00002-of-00014.safetensors",
700
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
701
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
702
+ "model.layers.9.input_layernorm.weight": "model-00003-of-00014.safetensors",
703
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
704
+ "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00014.safetensors",
705
+ "model.layers.9.mlp.up_proj.weight": "model-00003-of-00014.safetensors",
706
+ "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
707
+ "model.layers.9.self_attn.k_norm.weight": "model-00003-of-00014.safetensors",
708
+ "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
709
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
710
+ "model.layers.9.self_attn.q_norm.weight": "model-00003-of-00014.safetensors",
711
+ "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
712
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
713
+ "model.norm.weight": "model-00014-of-00014.safetensors"
714
+ }
715
+ }
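This completes the `model.safetensors.index.json` weight map: every tensor name resolves to one of the 14 safetensors shards, so a loader only opens the shard that actually holds a given weight. `transformers` consumes this index automatically in `from_pretrained`; below is a minimal sketch of the same lookup done by hand, assuming the repo files have already been downloaded to the working directory (paths are illustrative).

```python
# Minimal sketch: resolve one tensor to its shard via the index above.
import json
from safetensors import safe_open

with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]          # tensor name -> shard filename
shard = weight_map["model.norm.weight"]   # "model-00014-of-00014.safetensors"

# Open only the shard that holds the tensor and read just that tensor.
with safe_open(shard, framework="pt") as st:
    norm_weight = st.get_tensor("model.norm.weight")

print(norm_weight.shape)
```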
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
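`special_tokens_map.json` declares the ChatML-style control tokens: `<|im_end|>` serves as the end-of-sequence token and `<|endoftext|>` as padding. A minimal sketch of checking this after loading the tokenizer (the repo id below is assumed from this page):

```python
# Minimal sketch: confirm the eos/pad tokens declared above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("LifeWiki-ai/SERA-32B")  # repo id assumed
print(tok.eos_token)  # expected: <|im_end|>
print(tok.pad_token)  # expected: <|endoftext|>
```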
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+ size 11422654
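`tokenizer.json` is tracked through Git LFS, so the repository stores only this pointer: the `oid` is the SHA-256 digest of the real ~11 MB file that `git lfs pull` (or the Hub download client) fetches. A minimal sketch of verifying a downloaded copy against the pointer:

```python
# Minimal sketch: verify a downloaded tokenizer.json against the LFS pointer above.
import hashlib

EXPECTED_OID = "aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4"

with open("tokenizer.json", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == EXPECTED_OID, "tokenizer.json does not match the LFS oid"
```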
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151665": {
+ "content": "<tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151666": {
+ "content": "</tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151667": {
+ "content": "<think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151668": {
+ "content": "</think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and 
enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
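The `chat_template` above implements ChatML with tool calling and optional reasoning blocks: tool signatures are serialized into `<tools>` tags in the system turn, tool results are wrapped in `<tool_response>`, and passing `enable_thinking=False` emits an empty `<think></think>` block in the generation prompt. A minimal sketch of rendering it (repo id assumed from this page; extra keyword arguments to `apply_chat_template` are forwarded into the template context):

```python
# Minimal sketch: render the ChatML-style template defined above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("LifeWiki-ai/SERA-32B")  # repo id assumed
messages = [
    {"role": "system", "content": "You are a helpful coding agent."},
    {"role": "user", "content": "List the files in the current directory."},
]
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # inserts the empty <think>...</think> block
)
print(prompt)
```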
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:430a720d1a9e8d8d349d353cb07b4a20f2c968358669b9be50bfbb40bbef7cd4
+ size 9336
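`training_args.bin` is the pickled `TrainingArguments` object that the Hugging Face `Trainer` saves alongside checkpoints; it is informational and not needed for inference. A minimal sketch of inspecting it (unpickling assumes `transformers` is importable; `weights_only=False` is required on recent PyTorch, and you should only unpickle files you trust):

```python
# Minimal sketch: inspect the pickled Trainer arguments.
import torch

args = torch.load("training_args.bin", weights_only=False)
print(type(args).__name__)  # typically "TrainingArguments"
print(args.learning_rate)
```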
vocab.json ADDED
The diff for this file is too large to render. See raw diff