HeAAAAA committed on
Commit c38e10f · verified · 1 Parent(s): f95662a

Upload LlamaForCausalLM
README.md CHANGED
@@ -47,171 +47,104 @@ language:
 
 # 1. Introduction
 
- This is a fine-tuned evaluator for role-playing tasks. The training dataset is available in the [Annotated Role-playing Evaluation Dataset](https://huggingface.co/datasets/HeAAAAA/Crab-manually-annotated-role-playing-evaluation-dataset). More details can be found in [Crab](https://huggingface.co/HeAAAAA/Crab).
 
- # 2. Six Aspect-specific Metrics
 
- - **Language Fluency**: Pertains to a natural and fluent communication style, independent of grammatical strictness or contextual background.
 
- - **Language Relevance**: Focuses on the ability to stay on topic and respond appropriately, essentially testing the capacity to follow instructions.
-
- - **Role Language**: Evaluates whether the text reflects the vocabulary and tone specific to roles, including appropriate actions.
 
- - **Role Knowledge**: Involves a deep understanding of both general knowledge and information specific to the roles, ensuring accurate and informed role portrayal.
 
- - **Emotional Expression**: Reviews the suitability of emotions, emotional intelligence, and empathy expressed in context with the role's traits.
 
- - **Interactive Engagement**: Measures the text's ability to draw the user in, encouraging ongoing interaction and contributing dynamically to the dialogue.
 
 # 3. Performance
 
 <div align="center">
 
 <!-- Logo image -->
- <img src="https://cdn-uploads.huggingface.co/production/uploads/650add6348983c90ab688b6e/22mf5uc1zpYLD0vGLtWM_.png" width="500" style="border-radius: 20px;"/>
-
- Figure 1: The Spearman and Pearson correlations with human evaluations for the proposed RoleRM, ChatGPT, PairEval, G-Eval, and GPTScore. Scores are averaged over all aspects.
 
 </div>
 
- <div align="center">
 
- <!-- Logo image -->
- <img src="https://cdn-uploads.huggingface.co/production/uploads/650add6348983c90ab688b6e/VKwol9kSa6qJPFGZ-B1jR.png" width="500" style="border-radius: 20px;"/>
 
- Figure 2: A comparison between RoleRM and ChatGPT. We compute the MAE against Human Annotations for both RoleRM and ChatGPT to illustrate the gaps.
 
- </div>
 
- # 4. Usage
- 
- <pre lang="python">
- from transformers import AutoTokenizer, AutoModelForCausalLM
- 
- bot_name = "Hermione"
- bot_personality = "Intelligent, curious, respectful, and eager to learn"
- bot_description = "Hermione and Hagrid were in the Forbidden Forest, walking on a narrow path surrounded by trees. Hermione looked around carefully, fascinated by the dense forest. Hagrid was leading the way, pointing out various creatures and telling her about their habits and characteristics."
- cp = "None"
- 
- user_name = "Hagrid"
- user_description = "Hagrid is the Care of Magical Creatures teacher at Hogwarts. He is a half-giant with a great love for all creatures, magical or not."
- relation = "Teacher and student"
- scene = "Hermione and Hagrid are in the Forbidden Forest, exploring and learning about the various magical creatures that live there."
- 
- current_Dialogue1 = "(round1) human: Now, this here is a Bowtruckle, Hermione. They're very small, only about the size of a twig, and they're very shy. They usually live in trees and are very good at camouflaging themselves. You have to be very careful when handling them because they have very sharp fingers. Hermione, do you like them? \nbot: (Hermione shook her head) No, not really. I'm sorry, Hagrid. I don't mean to offend you."
- current_Dialogue2 = "(round2) human: (Hagrid looked slightly disappointed but continued) That's alright, Hermione. Everyone has different tastes. Let's move on. This here is an Acromantula. They're giant spiders, Hermione. Very dangerous if you get too close. They can grow up to ten feet in diameter and have eight sharp legs. \nbot: (Hermione shuddered at the sight) You're brave, Hagrid. I couldn't get anywhere near that thing."
- 
- def build_inputs_prompt(current_Dialogue):
-     inputs_prompt = f"""
- # Role: Dialogue Quality Evaluation Expert
- ## Goal: You need to score the utterance of the bot in the current dialogue based on the following 6 targets:
- 1. Language Fluency: This score evaluates the fluency and naturalness of the language, making the text feel organic, lifelike, and not rigid or stilted. The focus here is solely on the overall smoothness and flow of the language, without considering the specific content. The goal is to evaluate how natural and conversational the language sounds, irrespective of grammatical correctness. However, the bot is allowed to be syntactically incoherent when engaging in everyday colloquialisms or expressing emotions such as excitement and nervousness.
- 2. Language Relevance: This score evaluates how well the bot responds to the current topic, staying focused and relevant without introducing irrelevant information. The key consideration is whether the bot's response correctly addresses the specific instructions or questions posed, regardless of the content or quality of the response itself. For example, if the bot's answer is irrelevant to the topic of the current conversation, or the answer is too long-winded, it should be given a low score.
- 3. Role Language: This score evaluates how well the language used by the bot in the dialogue matches their established personality and traits. The focus is on whether the bot speaks in a style consistent with their individual personality, creating a natural and authentic conversation. This rating considers only the overall language style, not the content or accuracy of the responses. For example, if the bot exhibits everyday colloquial expressions that fit the style of the character, it should be given a high score; if the bot uses formal language in everyday conversations, it should be given a low score.
- 4. Role Knowledge: This score evaluates the bot's understanding and use of common sense (basic knowledge) and role knowledge (as well as related background). If the bot speaks against what they are supposed to know, they should be scored low.
- 5. Emotional Expression: This score evaluates how well the bot's emotional responses, including expressions of empathy and emotional intelligence, align with their established personality and the context of the dialogue. If the bot's emotional responses (actions or expressions) are inappropriate/stiff or out of character, it should be given a low score.
- 6. Interactive Engagement: This score evaluates how engaging and motivating the bot's dialogue is, encouraging the user to continue the conversation. The focus is on the overall conversational flow and interactivity, without considering the use of specialized vocabulary or any mismatches in communication styles. If the bot ends the dialogue with a question, it should receive a high score.
- 
- The scoring criteria for the above six targets are as follows:
- 0 - Negative, poor performance, long-winded
- 1 - Dialogue does not reflect the indicator or does not quite meet the standards
- 2 - More in line with standards but still has some defects
- 3 - Perfectly meets the criteria
- 
- ## The information of the bot is as follows: bot's name: {bot_name}
- bot personality: {bot_personality}
- bot description: {bot_description}
- Reference speaking style: {cp}
- 
- ## Current scenario Interlocutor: {user_name}, {user_description} Relationship with bot: {relation} Scene: {scene}
- ## The historical dialogue is as follows:
- history
- Please score the above six targets (with a range of 0-3, separated by spaces) in response to {bot_name} (i.e. bot)'s utterance in the current dialogue.
- ## Current Dialogue: {current_Dialogue}
- """
-     return inputs_prompt
- 
- 
- path = "HeAAAAA/RoleRM"
- 
- model = AutoModelForCausalLM.from_pretrained(path).to("cuda")
- tokenizer = AutoTokenizer.from_pretrained(path)
- 
- for inputs_prompt in (build_inputs_prompt(current_Dialogue1), build_inputs_prompt(current_Dialogue2)):
-     # Keep the inputs on the same device as the model.
-     inputs = tokenizer(inputs_prompt, return_tensors="pt").to("cuda")
-     outputs = model.generate(
-         **inputs,
-         max_new_tokens=200,
-         do_sample=True,
-         temperature=0.7,
-         top_p=0.9,
-         top_k=50,
-         repetition_penalty=1.1,
-         eos_token_id=tokenizer.eos_token_id,
-     )
-     new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
-     print(tokenizer.decode(new_tokens, skip_special_tokens=True))
- </pre>
- 
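The six scores emitted by the evaluator (e.g. `2 3 2 1 2 2`) can be mapped back to the aspect names. A minimal sketch, assuming the model outputs six space-separated 0-3 integers as the prompt requests; the helper name `parse_scores` is ours, not part of the repository:

```python
import re

# Aspect order follows the six targets listed in the evaluation prompt.
ASPECTS = [
    "Language Fluency", "Language Relevance", "Role Language",
    "Role Knowledge", "Emotional Expression", "Interactive Engagement",
]

def parse_scores(text: str) -> dict:
    """Extract the first six standalone 0-3 integers from the generated text."""
    scores = [int(s) for s in re.findall(r"\b[0-3]\b", text)[:6]]
    if len(scores) != 6:
        raise ValueError(f"expected 6 scores, got {scores!r}")
    return dict(zip(ASPECTS, scores))

print(parse_scores("2 3 2 1 2 2"))
```

The regex-based extraction is deliberately lenient, since sampled generations may wrap the scores in extra text.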
 
- # 5. Citation
 
 ```bibtex
 @misc{kimiteam2025kimivltechnicalreport,
 
 # 1. Introduction
 
+ We introduce Crab, a novel Configurable Role-Playing (RP) LLM with an Assessing Benchmark, consisting of Role-Centric Dataset Curation, Persona-Embodying LLM Construction, and Comprehensive Benchmark Creation for RP dialogue generation.
+ Unlike traditional RP models that rely on a handful of preset roles, Crab enables dynamic configuration of desired roles, enhancing flexibility and adaptability.
+ To effectively train RP-LLMs, we curated the largest RP training dataset to date.
+ The dataset provides a detailed role overview for each dialogue, including a character profile, conversation scenario, and tagged topic, capturing a broad range of role-based behaviors, emotions, and interactions.
+ We also found that current benchmarks lack both proper evaluation standards and methods.
+ Thus, to validate the effectiveness of RP-LLMs, we introduce a new benchmark comprising an evaluation standard, a test dataset with manual annotations, and a reward model, RoleRM, designed to automatically assess specific aspects of RP while aligning with human perception.
+ Extensive experiments reveal that RoleRM significantly outperforms ChatGPT and other evaluation methods in conducting fine-grained evaluations of RP.
+ RP-LLMs powered by Crab also demonstrate superior performance across various fine-grained aspects.
+ 
+ More details can be found on GitHub: https://github.com/KaiHe-better/Crab?tab=readme-ov-file
 
+ # 2. Configurable Role-Playing LLM
 
+ <div align="center">
 
+ <!-- Logo image -->
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/650add6348983c90ab688b6e/fDaDq8tzBBUuEteND8N53.png" width="500" style="border-radius: 20px;"/>
 
+ </div>
 
+ Unlike existing RP-LLMs, where a single role is trained with numerous dialogues, our approach introduces a diverse range of roles with detailed configuration information while keeping the number of dialogues per role minimal. This enables LLMs to generate dialogues dynamically from configurations rather than memorizing specific roles, enhancing flexibility and adaptability. Additionally, we propose RoleRM in our benchmark to address the challenge of evaluating RP performance.
 
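The configuration-driven setup described above amounts to a structured role record that is rendered into the prompt. An illustrative sketch only; the field names are ours and loosely mirror the attribute groups named in the ablation study (base role information, references, scene), not the dataset's exact schema:

```python
# Hypothetical role configuration record (field names are illustrative).
role_config = {
    "name": "Hermione",
    "personality": "Intelligent, curious, respectful, and eager to learn",
    "catchphrases": [],  # reference attributes: catchphrases and knowledge
    "scene": {
        "interlocutor": "Hagrid",
        "relation": "Teacher and student",
        "tags": ["magical creatures"],
    },
}

def render_profile(cfg: dict) -> str:
    """Flatten a role configuration into prompt text."""
    return (
        f"bot's name: {cfg['name']}\n"
        f"bot personality: {cfg['personality']}\n"
        f"Interlocutor: {cfg['scene']['interlocutor']} "
        f"(relationship: {cfg['scene']['relation']})"
    )

print(render_profile(role_config))
```

Because the role is data rather than trained-in behavior, swapping the record swaps the persona without retraining.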
 # 3. Performance
 
+ | Models | Overall | Language Fluency | Language Relevance | Role Language | Role Knowledge | Emotional Expression | Interactive Engagement |
+ |-----------------------|----------|----------|----------|----------|----------|----------|----------|
+ | Llama-2-7B | 1.57 | 2.19 | 1.83 | 1.63 | 1.37 | 1.21 | 1.21 |
+ | Llama-3-8B | 1.99 | 2.56 | 2.36 | 2.09 | 1.78 | 1.56 | 1.60 |
+ | Llama-3.1-8B | 1.94 | 2.52 | 2.30 | 2.01 | 1.75 | 1.47 | 1.57 |
+ | Llama-2-7B-Crab | 2.14 | 2.73 | 2.35 | 2.07 | 1.88 | 1.69 | 2.12 |
+ | Llama-3-8B-Crab | 2.22 | 2.81 | 2.51 | 2.16 | 1.95 | 1.77 | 2.13 |
+ | **Llama-3.1-8B-Crab** | **2.23** | **2.87** | **2.56** | **2.17** | **1.95** | **1.76** | **2.09** |
+ | GPT3.5 | 1.66 | 2.35 | 2.11 | 1.72 | 1.50 | 1.11 | 1.17 |
+ | GPT4o | 1.86 | 2.44 | 2.27 | 1.90 | 1.69 | 1.33 | 1.51 |
+ | GPT4 | 2.13 | 2.73 | 2.53 | 2.18 | 1.90 | 1.62 | 1.86 |
+ | CharacterGLM-6B | 1.83 | 2.37 | 1.96 | 1.80 | 1.60 | 1.39 | 1.86 |
+ | Pygmalion-2-7B | 2.11 | 2.82 | 2.49 | 2.01 | 1.86 | 1.58 | 1.91 |
+ | Haruhi-Zero-7B | 2.17 | 2.80 | 2.49 | 2.12 | 2.00 | 1.74 | 1.86 |
+ 
+ Table 1: Evaluation results on the test data of our benchmark; the listed scores are from our RoleRM. Bold indicates the best results.
+ 
 <div align="center">
 
 <!-- Logo image -->
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/650add6348983c90ab688b6e/veerjA_MP5ZXmOxjAdZRO.png" width="500" style="border-radius: 20px;"/>
 
 </div>
+ Figure 2: Human evaluation comparing Crab, GPT-3.5, and Pygmalion-2-7B. We selected a general LLM and a well-known RP-LLM to compare their generations against our Crab. For the same dialogue, annotators ranked the responses from the three LLMs.
 
+ | Models | Overall | Language Fluency | Language Relevance | Role Language | Role Knowledge | Emotional Expression | Interactive Engagement |
+ |--------------------|----------|----------|----------|----------|----------|----------|----------|
+ | **Crab (sampled)** | **2.20** | **2.71** | **2.45** | **2.15** | **1.95** | **1.84** | **2.12** |
+ | w/o base | 2.17 | 2.72 | 2.41 | 2.07 | 1.89 | 1.79 | 2.11 |
+ | w/o ref. | 2.15 | 2.70 | 2.40 | 2.01 | 1.85 | 1.82 | 2.11 |
+ | w/o scene | 2.15 | 2.69 | 2.39 | 2.10 | 1.90 | 1.81 | 1.98 |
+ 
+ Table 2: Ablation study for Crab. Because some instances in our dataset have missing attributes, we sampled 1,000 fully attributed instances as a sub-test set for the ablation experiments, referred to as Crab (sampled). "w/o base" means training RP-LLMs without base role information (age, gender, personality, description, and expression); "w/o ref." means without catchphrases and knowledge; "w/o scene" means without interlocutor, relation, scenario, and tags.
 
+ <br>
 
+ # 4. Three Datasets
+ We publish three datasets: the Crab role-playing train set, the Crab role-playing evaluation benchmark, and a manually annotated role-playing evaluation dataset (which can be used to train a role-playing evaluation model).
 
+ ## 4.1 Crab role-playing train set
+ https://huggingface.co/datasets/HeAAAAA/Crab-role-playing-train-set
+ 
+ ## 4.2 Crab role-playing evaluation benchmark
+ https://huggingface.co/datasets/HeAAAAA/Crab-role-playing-evaluation-benchmark
+ 
+ ## 4.3 Crab manually annotated role-playing evaluation dataset
+ https://huggingface.co/datasets/HeAAAAA/Crab-manually-annotated-role-playing-evaluation-dataset
+ 
+ <br>
+ 
+ # 5. Fine-tuned Role-playing Model
+ We release a fine-tuned model for configurable role-playing tasks.
+ https://huggingface.co/HeAAAAA/Crab
+ 
+ <br>
+ 
+ # 6. Role-playing Evaluation Model
+ We release a trained model to automate the evaluation of role-playing tasks.
+ https://huggingface.co/HeAAAAA/RoleRM
+ 
+ <br>
+ 
+ # 7. Citation
 
 ```bibtex
 @misc{kimiteam2025kimivltechnicalreport,
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_name_or_path": "/raid/hpc/hekai/WorkShop/My_project/Crab/rm_models/model",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 128000,
+   "eos_token_id": 128009,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 500000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "use_cache": true,
+   "vocab_size": 128256
+ }
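As a sanity check, the architecture numbers in this config reproduce the checkpoint size: about 8.03B parameters, which at float32 (4 bytes each, per `torch_dtype`) matches the 32,121,044,992-byte `total_size` recorded in the safetensors index below. A sketch of the arithmetic:

```python
# Parameter count derived from the config.json fields above.
hidden, inter, vocab = 4096, 14336, 128256
layers, heads, kv_heads, head_dim = 32, 32, 8, 128

embed = vocab * hidden                    # input embeddings
lm_head = vocab * hidden                  # untied output head (tie_word_embeddings: false)
per_layer = (
    hidden * (heads * head_dim)           # q_proj
    + 2 * hidden * (kv_heads * head_dim)  # k_proj, v_proj (grouped-query attention)
    + (heads * head_dim) * hidden         # o_proj
    + 3 * hidden * inter                  # gate_proj, up_proj, down_proj
    + 2 * hidden                          # the two per-layer RMSNorm weights
)
total = embed + lm_head + layers * per_layer + hidden  # + final norm
print(total)      # 8030261248 parameters
print(total * 4)  # 32121044992 bytes in float32, matching the shard index
```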
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "eos_token_id": [
+     128001,
+     128009
+   ],
+   "transformers_version": "4.48.3"
+ }
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:efd22df38b88d548854925fcf41218883e284fd25d4b171d128baf9d33396065
+ size 4886466168
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eee8fbb6020502c26e489a176a00c0d0500126dbcc4a1f237aba1086a1e07810
+ size 4832007448
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d14b931f3c791a78a77add08a255d3776f67375d7ee1e2ea28c46373eee5403a
+ size 4999813112
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30e86e284058ee94b04ad71a4fbbea07b6a6d1d8ce10d4cc3faf7f2bf4d98b81
+ size 4999813128
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cef1c5e79b22b353785b4a43466d48bfb006b44efa24fcab460c50f99db54d06
+ size 4832007496
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a042fa71db01fdfbce39d23180fd547e3fb5ab10e92fcbf8022c96ca0d61b1dc
+ size 4999813120
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1864dd3b47934dc279c4c173b1de0522508a3e82350f289e0ec6e95629d3486c
+ size 2571158184
model.safetensors.index.json ADDED
@@ -0,0 +1,298 @@
+ {
+   "metadata": {
+     "total_size": 32121044992
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00007-of-00007.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
203
+ "model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
204
+ "model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
205
+ "model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
206
+ "model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
207
+ "model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
208
+ "model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
209
+ "model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
210
+ "model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
211
+ "model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
212
+ "model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
213
+ "model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
214
+ "model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
215
+ "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
216
+ "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
217
+ "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
218
+ "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
219
+ "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
220
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
221
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
222
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
223
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
224
+ "model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
225
+ "model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
226
+ "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
227
+ "model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
228
+ "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
229
+ "model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
230
+ "model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
231
+ "model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
232
+ "model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
233
+ "model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
234
+ "model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
235
+ "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
236
+ "model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
237
+ "model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
238
+ "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
239
+ "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
240
+ "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
241
+ "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
242
+ "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
243
+ "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
244
+ "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
245
+ "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
246
+ "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
247
+ "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
248
+ "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
249
+ "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
250
+ "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
251
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
252
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
253
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
254
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
255
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
256
+ "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
257
+ "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
258
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
259
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
260
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
261
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
262
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
263
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
264
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
265
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
266
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
267
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
268
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
269
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
270
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
271
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
272
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
273
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
274
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
275
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
276
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
277
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
278
+ "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
279
+ "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
280
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
281
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
282
+ "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
283
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
284
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
285
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
286
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
287
+ "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
288
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
289
+ "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
290
+ "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
291
+ "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
292
+ "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
293
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
294
+ "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
295
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
296
+ "model.norm.weight": "model-00007-of-00007.safetensors"
297
+ }
298
+ }
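
The `weight_map` added above tells a loader which shard file holds each tensor of the sharded checkpoint. A minimal sketch of how such an index is consumed — parsing the JSON and grouping tensors by shard so each file is opened once — using a small inline fragment of the map above (the file path and loader logic are illustrative, not the exact `transformers` implementation):

```python
import json
from collections import defaultdict

# A fragment of the weight_map from model.safetensors.index.json:
# each tensor name maps to the shard file that stores it.
index_text = """
{
  "weight_map": {
    "model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
    "model.norm.weight": "model-00007-of-00007.safetensors"
  }
}
"""
weight_map = json.loads(index_text)["weight_map"]

# Group tensor names by shard so a loader opens each shard file only once.
tensors_by_shard = defaultdict(list)
for name, shard in weight_map.items():
    tensors_by_shard[shard].append(name)

for shard, names in sorted(tensors_by_shard.items()):
    print(shard, names)
```

In practice `transformers` reads this index automatically when you call `from_pretrained` on the repo; the sketch only shows what the mapping encodes.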