JinNian0072 committed
Commit 1bb59a9 · verified · 1 Parent(s): c81e813

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ checkpoint-1149/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ checkpoint-383/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen2.5-Coder-7B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen2.5-Coder-7B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "Qwen/Qwen2.5-Coder-7B",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.1",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "down_proj",
+ "v_proj",
+ "k_proj",
+ "gate_proj",
+ "o_proj",
+ "q_proj",
+ "up_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
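The key LoRA hyperparameters in this adapter config are `r: 16` and `lora_alpha: 32` with 0.05 dropout, applied to all seven attention and MLP projection modules; at inference, PEFT's `PeftModel.from_pretrained` reads this file to attach the adapter to the base model. A minimal stdlib sketch (it embeds an abridged copy of the config above rather than reading the file from disk) that derives the effective LoRA scaling factor alpha/r:

```python
import json

# Abridged copy of the adapter_config.json fields used below.
config = json.loads("""
{
  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-7B",
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "use_rslora": false,
  "target_modules": ["down_proj", "v_proj", "k_proj", "gate_proj",
                     "o_proj", "q_proj", "up_proj"]
}
""")

# Classic LoRA scales the low-rank update BA by alpha / r
# (with use_rslora it would be alpha / sqrt(r) instead).
scaling = config["lora_alpha"] / config["r"]
print(config["peft_type"], scaling, len(config["target_modules"]))  # LORA 2.0 7
```

With alpha/r = 2.0, each adapted projection receives twice the raw low-rank update, a common default pairing (alpha = 2r).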
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecc472753be5246dce3068d4e2f267aa3e7ab2e1d509a3c9bbed869a9625ab76
+ size 161533192
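The three lines above are not the adapter weights themselves but a Git LFS pointer file: a tiny text stub committed in place of the real ~162 MB safetensors blob, holding its SHA-256 digest and byte size. The spec-v1 layout is simple space-separated key/value lines, so it can be parsed with the stdlib alone:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs spec v1 pointer file into a dict of its keys."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ecc472753be5246dce3068d4e2f267aa3e7ab2e1d509a3c9bbed869a9625ab76
size 161533192"""

info = parse_lfs_pointer(pointer)
algo, digest = info["oid"].split(":")
size_mb = int(info["size"]) / 1e6
print(algo, len(digest), round(size_mb))  # sha256 64 162
```

Tools that clone the repo without `git lfs` installed will see only these pointer stubs, which is why the `.gitattributes` entries at the top of this commit matter.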
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
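The 22 added tokens above occupy one contiguous id block (151643–151664) appended on top of the base Qwen2 BPE vocabulary. A small sketch (the mapping is copied from the JSON above) that verifies the contiguity and builds the reverse id-to-token lookup:

```python
# The id mapping from added_tokens.json above.
added_tokens = {
    "<|endoftext|>": 151643, "<|im_start|>": 151644, "<|im_end|>": 151645,
    "<|object_ref_start|>": 151646, "<|object_ref_end|>": 151647,
    "<|box_start|>": 151648, "<|box_end|>": 151649,
    "<|quad_start|>": 151650, "<|quad_end|>": 151651,
    "<|vision_start|>": 151652, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|image_pad|>": 151655, "<|video_pad|>": 151656,
    "<tool_call>": 151657, "</tool_call>": 151658,
    "<|fim_prefix|>": 151659, "<|fim_middle|>": 151660, "<|fim_suffix|>": 151661,
    "<|fim_pad|>": 151662, "<|repo_name|>": 151663, "<|file_sep|>": 151664,
}

ids = sorted(added_tokens.values())
# The 22 added tokens form one contiguous id block at the top of the vocab.
assert ids == list(range(151643, 151665))
id_to_token = {v: k for k, v in added_tokens.items()}
print(len(ids), id_to_token[151643])  # 22 <|endoftext|>
```

This block includes the ChatML markers (`<|im_start|>`, `<|im_end|>`) used by the chat template and the FIM/repo tokens specific to the Coder models.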
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
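In the no-tools branch, this Jinja template renders plain ChatML: a system block (falling back to the default Qwen persona), one `<|im_start|>role ... <|im_end|>` block per message, and an open `<|im_start|>assistant` block when `add_generation_prompt` is set. Normally `tokenizer.apply_chat_template` executes the template for you; the stdlib sketch below re-implements just that no-tools branch (it does not cover the tool-call or tool-response paths):

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_chatml(messages, add_generation_prompt=True):
    """Mirror the no-tools branch of the Jinja chat template above."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message is emitted once; the loop below skips it.
        parts.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")
        messages = messages[1:]
    else:
        parts.append("<|im_start|>system\n" + DEFAULT_SYSTEM + "<|im_end|>\n")
    for m in messages:
        parts.append("<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant block open so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "Write hello world in C++"}])
print(prompt)
```

The resulting string ends with the open assistant header, which is exactly where the model starts generating.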
checkpoint-1149/README.md ADDED
@@ -0,0 +1,207 @@
(content identical to README.md above)
checkpoint-1149/adapter_config.json ADDED
@@ -0,0 +1,46 @@
(content identical to adapter_config.json above)
checkpoint-1149/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed1053fe7af1aba1da4b8c18a7022055c9f30a4b16b2ec4a961fe5299ab28667
+ size 161533192
checkpoint-1149/added_tokens.json ADDED
@@ -0,0 +1,24 @@
(content identical to added_tokens.json above)
checkpoint-1149/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
(content identical to chat_template.jinja above)
checkpoint-1149/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1149/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f017f9f308bc0b93958eb7983b184a92e88a2a4203473f16e0894cefcc5c5469
+ size 323296891
checkpoint-1149/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b0d381a1c1a48abe941fe8cb028183aee25436577c15ee1dd0c28a1ba1234db
+ size 14645
checkpoint-1149/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8310c1156da2fce414e1ac4bfa75f72c674ba2a250f122a4ead559e9b9c45f16
+ size 1465
checkpoint-1149/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1149/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:913950e4971737031da511cdd1b410daae4566f62eb845b3975bca5a102323d8
+ size 11421995
checkpoint-1149/tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 32768,
+ "pad_token": "<|endoftext|>",
+ "padding_side": "right",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
checkpoint-1149/trainer_state.json ADDED
@@ -0,0 +1,856 @@
+ {
+ "best_global_step": 383,
+ "best_metric": 0.3038630783557892,
+ "best_model_checkpoint": "./lora_qwen7b_cpp_abdiff_v1/checkpoint-383",
+ "epoch": 3.0,
+ "eval_steps": 500,
+ "global_step": 1149,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.02611818478615736,
+ "grad_norm": 0.272135853767395,
+ "learning_rate": 2.347826086956522e-06,
+ "loss": 0.747,
+ "step": 10
+ },
+ {
+ "epoch": 0.05223636957231472,
+ "grad_norm": 0.2994903028011322,
+ "learning_rate": 4.956521739130435e-06,
+ "loss": 0.729,
+ "step": 20
+ },
+ {
+ "epoch": 0.07835455435847209,
+ "grad_norm": 0.3347424566745758,
+ "learning_rate": 7.5652173913043475e-06,
+ "loss": 0.7694,
+ "step": 30
+ },
+ {
+ "epoch": 0.10447273914462944,
+ "grad_norm": 0.24008171260356903,
+ "learning_rate": 1.017391304347826e-05,
+ "loss": 0.6931,
+ "step": 40
+ },
+ {
+ "epoch": 0.1305909239307868,
+ "grad_norm": 0.31324484944343567,
+ "learning_rate": 1.2782608695652173e-05,
+ "loss": 0.6601,
+ "step": 50
+ },
+ {
+ "epoch": 0.15670910871694418,
+ "grad_norm": 0.3652394115924835,
+ "learning_rate": 1.5391304347826088e-05,
+ "loss": 0.4995,
+ "step": 60
+ },
+ {
+ "epoch": 0.18282729350310153,
+ "grad_norm": 0.18758858740329742,
+ "learning_rate": 1.8e-05,
+ "loss": 0.3874,
+ "step": 70
+ },
+ {
+ "epoch": 0.20894547828925888,
+ "grad_norm": 0.11186491698026657,
+ "learning_rate": 2.0608695652173913e-05,
+ "loss": 0.3579,
+ "step": 80
+ },
+ {
+ "epoch": 0.23506366307541626,
+ "grad_norm": 0.0821973979473114,
+ "learning_rate": 2.3217391304347826e-05,
+ "loss": 0.3375,
+ "step": 90
+ },
+ {
+ "epoch": 0.2611818478615736,
+ "grad_norm": 0.10780682414770126,
+ "learning_rate": 2.582608695652174e-05,
+ "loss": 0.3286,
+ "step": 100
+ },
+ {
+ "epoch": 0.28730003264773096,
+ "grad_norm": 0.09552709758281708,
+ "learning_rate": 2.8434782608695652e-05,
+ "loss": 0.3135,
+ "step": 110
+ },
+ {
+ "epoch": 0.31341821743388837,
+ "grad_norm": 0.1099899634718895,
+ "learning_rate": 2.988394584139265e-05,
+ "loss": 0.2828,
+ "step": 120
+ },
+ {
+ "epoch": 0.3395364022200457,
+ "grad_norm": 0.08481885492801666,
+ "learning_rate": 2.9593810444874276e-05,
+ "loss": 0.2884,
+ "step": 130
+ },
+ {
+ "epoch": 0.36565458700620307,
+ "grad_norm": 0.0872187465429306,
+ "learning_rate": 2.93036750483559e-05,
+ "loss": 0.2856,
+ "step": 140
+ },
+ {
+ "epoch": 0.3917727717923604,
+ "grad_norm": 0.09415256232023239,
+ "learning_rate": 2.9013539651837528e-05,
+ "loss": 0.2958,
+ "step": 150
+ },
+ {
+ "epoch": 0.41789095657851777,
+ "grad_norm": 0.09529490023851395,
+ "learning_rate": 2.872340425531915e-05,
+ "loss": 0.2832,
+ "step": 160
+ },
+ {
+ "epoch": 0.4440091413646752,
+ "grad_norm": 0.12695518136024475,
+ "learning_rate": 2.8433268858800773e-05,
+ "loss": 0.2852,
+ "step": 170
+ },
+ {
+ "epoch": 0.4701273261508325,
+ "grad_norm": 0.09822621941566467,
+ "learning_rate": 2.81431334622824e-05,
+ "loss": 0.2647,
+ "step": 180
+ },
+ {
+ "epoch": 0.4962455109369899,
+ "grad_norm": 0.11181768029928207,
+ "learning_rate": 2.785299806576402e-05,
+ "loss": 0.2718,
+ "step": 190
+ },
+ {
+ "epoch": 0.5223636957231472,
+ "grad_norm": 0.0999976173043251,
+ "learning_rate": 2.7562862669245647e-05,
+ "loss": 0.287,
+ "step": 200
+ },
+ {
+ "epoch": 0.5484818805093046,
+ "grad_norm": 0.11232498288154602,
+ "learning_rate": 2.7272727272727273e-05,
+ "loss": 0.2657,
+ "step": 210
+ },
+ {
+ "epoch": 0.5746000652954619,
+ "grad_norm": 0.09556713700294495,
+ "learning_rate": 2.69825918762089e-05,
+ "loss": 0.2728,
+ "step": 220
+ },
+ {
+ "epoch": 0.6007182500816193,
+ "grad_norm": 0.09838665276765823,
+ "learning_rate": 2.669245647969052e-05,
+ "loss": 0.2679,
+ "step": 230
+ },
+ {
+ "epoch": 0.6268364348677767,
+ "grad_norm": 0.12542720139026642,
+ "learning_rate": 2.6402321083172148e-05,
+ "loss": 0.2717,
+ "step": 240
+ },
+ {
+ "epoch": 0.6529546196539341,
+ "grad_norm": 0.12866875529289246,
+ "learning_rate": 2.6112185686653773e-05,
+ "loss": 0.2722,
+ "step": 250
+ },
+ {
+ "epoch": 0.6790728044400914,
+ "grad_norm": 0.12063395977020264,
+ "learning_rate": 2.5822050290135396e-05,
+ "loss": 0.2665,
+ "step": 260
+ },
+ {
+ "epoch": 0.7051909892262488,
+ "grad_norm": 0.12060956656932831,
+ "learning_rate": 2.5531914893617022e-05,
+ "loss": 0.2498,
+ "step": 270
+ },
+ {
+ "epoch": 0.7313091740124061,
+ "grad_norm": 0.1269434541463852,
+ "learning_rate": 2.5241779497098648e-05,
+ "loss": 0.2685,
+ "step": 280
+ },
+ {
+ "epoch": 0.7574273587985635,
+ "grad_norm": 0.13224200904369354,
+ "learning_rate": 2.495164410058027e-05,
+ "loss": 0.2562,
+ "step": 290
+ },
+ {
+ "epoch": 0.7835455435847208,
+ "grad_norm": 0.12492657452821732,
+ "learning_rate": 2.4661508704061896e-05,
+ "loss": 0.2436,
+ "step": 300
+ },
+ {
+ "epoch": 0.8096637283708782,
+ "grad_norm": 0.19281449913978577,
+ "learning_rate": 2.4371373307543522e-05,
+ "loss": 0.2593,
+ "step": 310
+ },
+ {
+ "epoch": 0.8357819131570355,
+ "grad_norm": 0.13644647598266602,
+ "learning_rate": 2.408123791102515e-05,
+ "loss": 0.2678,
+ "step": 320
+ },
+ {
+ "epoch": 0.861900097943193,
+ "grad_norm": 0.13078241050243378,
+ "learning_rate": 2.379110251450677e-05,
+ "loss": 0.2586,
+ "step": 330
+ },
+ {
+ "epoch": 0.8880182827293504,
+ "grad_norm": 0.16267353296279907,
+ "learning_rate": 2.3500967117988397e-05,
+ "loss": 0.2441,
+ "step": 340
+ },
+ {
+ "epoch": 0.9141364675155077,
+ "grad_norm": 0.14218953251838684,
+ "learning_rate": 2.321083172147002e-05,
+ "loss": 0.2561,
+ "step": 350
+ },
+ {
+ "epoch": 0.940254652301665,
+ "grad_norm": 0.14463242888450623,
+ "learning_rate": 2.2920696324951642e-05,
+ "loss": 0.2437,
+ "step": 360
+ },
+ {
+ "epoch": 0.9663728370878224,
+ "grad_norm": 0.15065905451774597,
+ "learning_rate": 2.2630560928433268e-05,
+ "loss": 0.2448,
+ "step": 370
+ },
+ {
+ "epoch": 0.9924910218739798,
+ "grad_norm": 0.1440647393465042,
+ "learning_rate": 2.2340425531914894e-05,
+ "loss": 0.2407,
+ "step": 380
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 0.3038630783557892,
+ "eval_runtime": 46.9955,
+ "eval_samples_per_second": 19.534,
+ "eval_steps_per_second": 9.767,
+ "step": 383
+ },
+ {
+ "epoch": 1.01828272935031,
+ "grad_norm": 0.18325522541999817,
+ "learning_rate": 2.2050290135396516e-05,
+ "loss": 0.2475,
+ "step": 390
+ },
+ {
+ "epoch": 1.0444009141364674,
+ "grad_norm": 0.19586290419101715,
+ "learning_rate": 2.1760154738878142e-05,
+ "loss": 0.2269,
+ "step": 400
+ },
+ {
+ "epoch": 1.070519098922625,
+ "grad_norm": 0.1957836151123047,
+ "learning_rate": 2.1470019342359768e-05,
+ "loss": 0.2383,
+ "step": 410
+ },
+ {
+ "epoch": 1.0966372837087823,
+ "grad_norm": 0.20044924318790436,
+ "learning_rate": 2.1179883945841394e-05,
+ "loss": 0.2277,
+ "step": 420
+ },
+ {
+ "epoch": 1.1227554684949397,
+ "grad_norm": 0.17480885982513428,
+ "learning_rate": 2.0889748549323017e-05,
+ "loss": 0.2364,
+ "step": 430
+ },
+ {
+ "epoch": 1.148873653281097,
+ "grad_norm": 0.22346200048923492,
+ "learning_rate": 2.0599613152804643e-05,
+ "loss": 0.2337,
+ "step": 440
+ },
+ {
+ "epoch": 1.1749918380672544,
+ "grad_norm": 0.18232356011867523,
+ "learning_rate": 2.030947775628627e-05,
+ "loss": 0.2391,
+ "step": 450
+ },
+ {
+ "epoch": 1.2011100228534117,
+ "grad_norm": 0.21770867705345154,
+ "learning_rate": 2.001934235976789e-05,
+ "loss": 0.2219,
+ "step": 460
+ },
+ {
+ "epoch": 1.227228207639569,
+ "grad_norm": 0.23520149290561676,
+ "learning_rate": 1.9729206963249517e-05,
+ "loss": 0.2204,
+ "step": 470
+ },
+ {
+ "epoch": 1.2533463924257264,
+ "grad_norm": 0.2378666251897812,
+ "learning_rate": 1.9439071566731143e-05,
+ "loss": 0.2185,
+ "step": 480
+ },
+ {
+ "epoch": 1.2794645772118838,
+ "grad_norm": 0.23565010726451874,
+ "learning_rate": 1.914893617021277e-05,
+ "loss": 0.2013,
+ "step": 490
+ },
+ {
+ "epoch": 1.3055827619980411,
+ "grad_norm": 0.2177935689687729,
+ "learning_rate": 1.885880077369439e-05,
+ "loss": 0.2321,
+ "step": 500
+ },
+ {
+ "epoch": 1.3317009467841985,
+ "grad_norm": 0.2423669397830963,
+ "learning_rate": 1.8568665377176018e-05,
+ "loss": 0.2155,
+ "step": 510
+ },
+ {
+ "epoch": 1.3578191315703558,
+ "grad_norm": 0.2234184592962265,
+ "learning_rate": 1.8278529980657643e-05,
+ "loss": 0.207,
+ "step": 520
+ },
+ {
+ "epoch": 1.3839373163565132,
+ "grad_norm": 0.27478086948394775,
+ "learning_rate": 1.7988394584139263e-05,
+ "loss": 0.202,
+ "step": 530
+ },
+ {
+ "epoch": 1.4100555011426705,
+ "grad_norm": 0.23157715797424316,
+ "learning_rate": 1.769825918762089e-05,
+ "loss": 0.2132,
+ "step": 540
+ },
+ {
+ "epoch": 1.4361736859288279,
+ "grad_norm": 0.2137601226568222,
+ "learning_rate": 1.7408123791102515e-05,
+ "loss": 0.2206,
+ "step": 550
+ },
+ {
+ "epoch": 1.4622918707149852,
+ "grad_norm": 0.25221434235572815,
+ "learning_rate": 1.7117988394584137e-05,
+ "loss": 0.1936,
+ "step": 560
+ },
+ {
+ "epoch": 1.4884100555011428,
+ "grad_norm": 0.265802800655365,
+ "learning_rate": 1.6827852998065763e-05,
+ "loss": 0.1927,
+ "step": 570
+ },
+ {
+ "epoch": 1.5145282402873002,
+ "grad_norm": 0.29695677757263184,
+ "learning_rate": 1.653771760154739e-05,
+ "loss": 0.1901,
+ "step": 580
+ },
+ {
+ "epoch": 1.5406464250734575,
+ "grad_norm": 0.28603777289390564,
+ "learning_rate": 1.6247582205029015e-05,
+ "loss": 0.1817,
+ "step": 590
+ },
+ {
+ "epoch": 1.5667646098596149,
+ "grad_norm": 0.27603891491889954,
+ "learning_rate": 1.5957446808510637e-05,
+ "loss": 0.1796,
+ "step": 600
+ },
+ {
+ "epoch": 1.5928827946457722,
+ "grad_norm": 0.23388804495334625,
+ "learning_rate": 1.5667311411992263e-05,
+ "loss": 0.2099,
+ "step": 610
+ },
+ {
+ "epoch": 1.6190009794319296,
+ "grad_norm": 0.3027153015136719,
+ "learning_rate": 1.537717601547389e-05,
+ "loss": 0.1874,
+ "step": 620
+ },
+ {
+ "epoch": 1.645119164218087,
+ "grad_norm": 0.23569999635219574,
+ "learning_rate": 1.5087040618955514e-05,
+ "loss": 0.2092,
+ "step": 630
+ },
+ {
+ "epoch": 1.6712373490042443,
+ "grad_norm": 0.28446313738822937,
+ "learning_rate": 1.4796905222437138e-05,
+ "loss": 0.1928,
+ "step": 640
+ },
+ {
+ "epoch": 1.6973555337904016,
+ "grad_norm": 0.33438780903816223,
+ "learning_rate": 1.4506769825918764e-05,
+ "loss": 0.2076,
+ "step": 650
+ },
+ {
+ "epoch": 1.723473718576559,
+ "grad_norm": 0.24610307812690735,
+ "learning_rate": 1.4216634429400386e-05,
+ "loss": 0.1834,
+ "step": 660
+ },
+ {
+ "epoch": 1.7495919033627163,
+ "grad_norm": 0.3221604526042938,
+ "learning_rate": 1.392649903288201e-05,
+ "loss": 0.1886,
+ "step": 670
+ },
+ {
+ "epoch": 1.7757100881488737,
+ "grad_norm": 0.30149781703948975,
+ "learning_rate": 1.3636363636363637e-05,
+ "loss": 0.1903,
+ "step": 680
+ },
+ {
+ "epoch": 1.801828272935031,
+ "grad_norm": 0.26918601989746094,
+ "learning_rate": 1.334622823984526e-05,
+ "loss": 0.1847,
+ "step": 690
+ },
+ {
+ "epoch": 1.8279464577211884,
+ "grad_norm": 0.36477166414260864,
+ "learning_rate": 1.3056092843326887e-05,
+ "loss": 0.1873,
+ "step": 700
+ },
+ {
+ "epoch": 1.8540646425073457,
+ "grad_norm": 0.2950615882873535,
+ "learning_rate": 1.2765957446808511e-05,
+ "loss": 0.1884,
+ "step": 710
+ },
+ {
+ "epoch": 1.880182827293503,
+ "grad_norm": 0.29690101742744446,
+ "learning_rate": 1.2475822050290135e-05,
+ "loss": 0.1876,
+ "step": 720
+ },
+ {
+ "epoch": 1.9063010120796604,
+ "grad_norm": 0.3099982738494873,
+ "learning_rate": 1.2185686653771761e-05,
+ "loss": 0.1778,
+ "step": 730
+ },
+ {
+ "epoch": 1.9324191968658178,
+ "grad_norm": 0.32279762625694275,
+ "learning_rate": 1.1895551257253385e-05,
+ "loss": 0.1854,
+ "step": 740
+ },
+ {
+ "epoch": 1.958537381651975,
+ "grad_norm": 0.2992270588874817,
+ "learning_rate": 1.160541586073501e-05,
+ "loss": 0.1747,
+ "step": 750
+ },
+ {
+ "epoch": 1.9846555664381325,
+ "grad_norm": 0.27671152353286743,
+ "learning_rate": 1.1315280464216634e-05,
+ "loss": 0.1624,
+ "step": 760
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 0.31829941272735596,
+ "eval_runtime": 46.8149,
+ "eval_samples_per_second": 19.609,
+ "eval_steps_per_second": 9.805,
+ "step": 766
+ },
+ {
+ "epoch": 2.0104472739144628,
+ "grad_norm": 0.302716463804245,
+ "learning_rate": 1.1025145067698258e-05,
+ "loss": 0.1923,
+ "step": 770
+ },
+ {
+ "epoch": 2.03656545870062,
+ "grad_norm": 0.2851350009441376,
+ "learning_rate": 1.0735009671179884e-05,
+ "loss": 0.1759,
+ "step": 780
+ },
+ {
+ "epoch": 2.0626836434867775,
+ "grad_norm": 0.3431122303009033,
+ "learning_rate": 1.0444874274661508e-05,
+ "loss": 0.1682,
+ "step": 790
+ },
+ {
+ "epoch": 2.088801828272935,
+ "grad_norm": 0.33571285009384155,
+ "learning_rate": 1.0154738878143134e-05,
+ "loss": 0.1568,
+ "step": 800
+ },
+ {
+ "epoch": 2.1149200130590926,
+ "grad_norm": 0.38898637890815735,
+ "learning_rate": 9.864603481624759e-06,
+ "loss": 0.1543,
+ "step": 810
+ },
+ {
+ "epoch": 2.14103819784525,
+ "grad_norm": 0.30346909165382385,
+ "learning_rate": 9.574468085106385e-06,
+ "loss": 0.1592,
+ "step": 820
+ },
+ {
+ "epoch": 2.1671563826314073,
+ "grad_norm": 0.33506202697753906,
+ "learning_rate": 9.284332688588009e-06,
+ "loss": 0.1853,
+ "step": 830
+ },
+ {
+ "epoch": 2.1932745674175647,
+ "grad_norm": 0.30120134353637695,
+ "learning_rate": 8.994197292069631e-06,
+ "loss": 0.1621,
+ "step": 840
+ },
+ {
+ "epoch": 2.219392752203722,
+ "grad_norm": 0.35409367084503174,
+ "learning_rate": 8.704061895551257e-06,
+ "loss": 0.1733,
+ "step": 850
+ },
+ {
+ "epoch": 2.2455109369898794,
+ "grad_norm": 0.4092079699039459,
+ "learning_rate": 8.413926499032882e-06,
+ "loss": 0.1616,
+ "step": 860
+ },
+ {
+ "epoch": 2.2716291217760367,
+ "grad_norm": 0.3036758005619049,
+ "learning_rate": 8.123791102514507e-06,
+ "loss": 0.1703,
+ "step": 870
+ },
+ {
+ "epoch": 2.297747306562194,
+ "grad_norm": 0.40276363492012024,
+ "learning_rate": 7.833655705996132e-06,
+ "loss": 0.1591,
+ "step": 880
+ },
+ {
+ "epoch": 2.3238654913483514,
+ "grad_norm": 0.3477386236190796,
+ "learning_rate": 7.543520309477757e-06,
+ "loss": 0.1848,
+ "step": 890
+ },
+ {
+ "epoch": 2.3499836761345088,
+ "grad_norm": 0.3512537479400635,
+ "learning_rate": 7.253384912959382e-06,
+ "loss": 0.1668,
+ "step": 900
+ },
+ {
+ "epoch": 2.376101860920666,
+ "grad_norm": 0.3957723081111908,
+ "learning_rate": 6.963249516441005e-06,
+ "loss": 0.1568,
+ "step": 910
+ },
+ {
+ "epoch": 2.4022200457068235,
+ "grad_norm": 0.36369815468788147,
+ "learning_rate": 6.67311411992263e-06,
+ "loss": 0.164,
+ "step": 920
+ },
+ {
+ "epoch": 2.428338230492981,
+ "grad_norm": 0.37673720717430115,
+ "learning_rate": 6.3829787234042555e-06,
+ "loss": 0.1635,
+ "step": 930
+ },
+ {
+ "epoch": 2.454456415279138,
+ "grad_norm": 0.42920976877212524,
+ "learning_rate": 6.092843326885881e-06,
+ "loss": 0.1608,
+ "step": 940
+ },
+ {
+ "epoch": 2.4805746000652955,
+ "grad_norm": 0.42978808283805847,
+ "learning_rate": 5.802707930367505e-06,
+ "loss": 0.1638,
+ "step": 950
+ },
+ {
+ "epoch": 2.506692784851453,
+ "grad_norm": 0.4102988839149475,
+ "learning_rate": 5.512572533849129e-06,
+ "loss": 0.1654,
+ "step": 960
+ },
+ {
+ "epoch": 2.53281096963761,
+ "grad_norm": 0.33200573921203613,
+ "learning_rate": 5.222437137330754e-06,
+ "loss": 0.1587,
+ "step": 970
+ },
+ {
+ "epoch": 2.5589291544237676,
+ "grad_norm": 0.34544673562049866,
+ "learning_rate": 4.932301740812379e-06,
+ "loss": 0.1422,
+ "step": 980
+ },
+ {
+ "epoch": 2.585047339209925,
+ "grad_norm": 0.3770700693130493,
+ "learning_rate": 4.642166344294004e-06,
+ "loss": 0.1533,
+ "step": 990
+ },
+ {
+ "epoch": 2.6111655239960823,
+ "grad_norm": 0.4285246431827545,
+ "learning_rate": 4.352030947775629e-06,
+ "loss": 0.1667,
+ "step": 1000
+ },
+ {
+ "epoch": 2.6372837087822396,
+ "grad_norm": 0.3781305253505707,
+ "learning_rate": 4.061895551257254e-06,
+ "loss": 0.1545,
+ "step": 1010
+ },
+ {
+ "epoch": 2.663401893568397,
+ "grad_norm": 0.41318264603614807,
+ "learning_rate": 3.7717601547388784e-06,
+ "loss": 0.1679,
+ "step": 1020
+ },
+ {
+ "epoch": 2.6895200783545543,
+ "grad_norm": 0.40817004442214966,
+ "learning_rate": 3.4816247582205027e-06,
+ "loss": 0.1573,
+ "step": 1030
+ },
+ {
+ "epoch": 2.7156382631407117,
+ "grad_norm": 0.41028645634651184,
+ "learning_rate": 3.1914893617021277e-06,
+ "loss": 0.1474,
+ "step": 1040
+ },
+ {
+ "epoch": 2.741756447926869,
+ "grad_norm": 0.3509976267814636,
+ "learning_rate": 2.9013539651837524e-06,
+ "loss": 0.1578,
+ "step": 1050
+ },
+ {
+ "epoch": 2.7678746327130264,
+ "grad_norm": 0.4071323573589325,
+ "learning_rate": 2.611218568665377e-06,
+ "loss": 0.1496,
+ "step": 1060
+ },
+ {
+ "epoch": 2.7939928174991837,
+ "grad_norm": 0.2635524570941925,
+ "learning_rate": 2.321083172147002e-06,
+ "loss": 0.1696,
+ "step": 1070
+ },
+ {
+ "epoch": 2.820111002285341,
+ "grad_norm": 0.3947674632072449,
+ "learning_rate": 2.030947775628627e-06,
+ "loss": 0.1664,
+ "step": 1080
+ },
+ {
+ "epoch": 2.8462291870714984,
+ "grad_norm": 0.47258490324020386,
+ "learning_rate": 1.7408123791102513e-06,
+ "loss": 0.1617,
+ "step": 1090
+ },
+ {
+ "epoch": 2.8723473718576558,
+ "grad_norm": 0.3031919300556183,
+ "learning_rate": 1.4506769825918762e-06,
+ "loss": 0.1454,
+ "step": 1100
+ },
+ {
+ "epoch": 2.898465556643813,
+ "grad_norm": 0.3442898988723755,
+ "learning_rate": 1.160541586073501e-06,
+ "loss": 0.1584,
+ "step": 1110
+ },
+ {
+ "epoch": 2.9245837414299705,
+ "grad_norm": 0.46089524030685425,
+ "learning_rate": 8.704061895551257e-07,
+ "loss": 0.1607,
+ "step": 1120
+ },
+ {
+ "epoch": 2.950701926216128,
+ "grad_norm": 0.35179299116134644,
+ "learning_rate": 5.802707930367505e-07,
+ "loss": 0.1668,
+ "step": 1130
+ },
+ {
+ "epoch": 2.9768201110022856,
+ "grad_norm": 0.40915292501449585,
+ "learning_rate": 2.901353965183753e-07,
+ "loss": 0.1522,
+ "step": 1140
+ },
+ {
+ "epoch": 3.0,
+ "eval_loss": 0.3278330862522125,
+ "eval_runtime": 46.5303,
+ "eval_samples_per_second": 19.729,
+ "eval_steps_per_second": 9.865,
+ "step": 1149
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1149,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.1050463737748521e+18,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
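A note on the trainer state above: `eval_loss` is lowest after epoch 1 (0.3039 at step 383) and rises in epochs 2 and 3 (0.3183, 0.3278), which is why `best_model_checkpoint` points at `checkpoint-383` rather than the final step. A minimal sketch for reading those fields programmatically; the inline `state` dict stands in for `json.load`-ing the actual `trainer_state.json`:

```python
import json

def best_checkpoint_info(state: dict) -> tuple[str, float]:
    """Return (best_model_checkpoint, best_metric) from a Trainer state dict."""
    return state["best_model_checkpoint"], state["best_metric"]

# In practice: state = json.load(open("checkpoint-1149/trainer_state.json"))
state = {
    "best_model_checkpoint": "./lora_qwen7b_cpp_abdiff_v1/checkpoint-383",
    "best_metric": 0.3038630783557892,
}
path, metric = best_checkpoint_info(state)
print(path, metric)
```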
checkpoint-1149/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0dcc7897cff0517627b4255462f4819ca6de3aa7097d571954aeb56a80da70f
+ size 5841
checkpoint-1149/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-383/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen2.5-Coder-7B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen2.5-Coder-7B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.1
checkpoint-383/adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "Qwen/Qwen2.5-Coder-7B",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.1",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "down_proj",
+ "v_proj",
+ "k_proj",
+ "gate_proj",
+ "o_proj",
+ "q_proj",
+ "up_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
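The adapter config above applies rank-16 LoRA (alpha 32, dropout 0.05) to all seven projection matrices of each decoder layer. As a rough cross-check against the ~161.5 MB `adapter_model.safetensors` in this checkpoint, here is a sketch of the trainable-parameter count. The layer shapes and layer count are assumptions based on Qwen2.5-7B's published architecture (hidden size 3584, intermediate size 18944, 28 layers, 4 KV heads of head dim 128); they are not stated anywhere in this diff:

```python
# LoRA adds two matrices per target weight: A (r x fan_in) and B (fan_out x r),
# i.e. r * (fan_in + fan_out) trainable parameters per projection.
R = 16  # "r" from adapter_config.json

# Assumed (fan_in, fan_out) shapes per Qwen2.5-7B decoder layer.
SHAPES = {
    "q_proj": (3584, 3584),
    "k_proj": (3584, 512),   # 4 KV heads * head_dim 128
    "v_proj": (3584, 512),
    "o_proj": (3584, 3584),
    "gate_proj": (3584, 18944),
    "up_proj": (3584, 18944),
    "down_proj": (18944, 3584),
}
NUM_LAYERS = 28

def lora_params(r: int, shapes: dict, num_layers: int) -> int:
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers

total = lora_params(R, SHAPES, NUM_LAYERS)
print(total)      # 40370176 trainable parameters
print(total * 4)  # 161480704 bytes if stored in fp32
```

At 4 bytes per parameter this lands within ~50 KB of the 161,533,192-byte safetensors file (the remainder being the safetensors header), which suggests the adapter weights were saved in fp32.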
checkpoint-383/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ecc472753be5246dce3068d4e2f267aa3e7ab2e1d509a3c9bbed869a9625ab76
3
+ size 161533192
checkpoint-383/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
checkpoint-383/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
checkpoint-383/merges.txt ADDED
The diff for this file is too large to render. See raw diff
checkpoint-383/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5059d7c6ca8da1076c3405bdb7d85b50f493098407671ea56adb26d96c440b7
+ size 323296891
checkpoint-383/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9957b24788596f61d86971133b3996b0d0b8777a9784fe8551daa67d1e57cb95
+ size 14645
checkpoint-383/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:443804228bfaed061367c6ee5d8062aa183eb4a06f8087b03c0cc956e1cebbaa
+ size 1465
checkpoint-383/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-383/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:913950e4971737031da511cdd1b410daae4566f62eb845b3975bca5a102323d8
+ size 11421995
checkpoint-383/tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 32768,
+ "pad_token": "<|endoftext|>",
+ "padding_side": "right",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
checkpoint-383/trainer_state.json ADDED
@@ -0,0 +1,308 @@
+ {
+ "best_global_step": 383,
+ "best_metric": 0.3038630783557892,
+ "best_model_checkpoint": "./lora_qwen7b_cpp_abdiff_v1/checkpoint-383",
+ "epoch": 1.0,
+ "eval_steps": 500,
+ "global_step": 383,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.02611818478615736,
+ "grad_norm": 0.272135853767395,
+ "learning_rate": 2.347826086956522e-06,
+ "loss": 0.747,
+ "step": 10
+ },
+ {
+ "epoch": 0.05223636957231472,
+ "grad_norm": 0.2994903028011322,
+ "learning_rate": 4.956521739130435e-06,
+ "loss": 0.729,
+ "step": 20
+ },
+ {
+ "epoch": 0.07835455435847209,
+ "grad_norm": 0.3347424566745758,
+ "learning_rate": 7.5652173913043475e-06,
+ "loss": 0.7694,
+ "step": 30
+ },
+ {
+ "epoch": 0.10447273914462944,
+ "grad_norm": 0.24008171260356903,
+ "learning_rate": 1.017391304347826e-05,
+ "loss": 0.6931,
+ "step": 40
+ },
+ {
+ "epoch": 0.1305909239307868,
+ "grad_norm": 0.31324484944343567,
+ "learning_rate": 1.2782608695652173e-05,
+ "loss": 0.6601,
+ "step": 50
+ },
+ {
+ "epoch": 0.15670910871694418,
+ "grad_norm": 0.3652394115924835,
+ "learning_rate": 1.5391304347826088e-05,
+ "loss": 0.4995,
+ "step": 60
+ },
+ {
+ "epoch": 0.18282729350310153,
+ "grad_norm": 0.18758858740329742,
+ "learning_rate": 1.8e-05,
+ "loss": 0.3874,
+ "step": 70
+ },
+ {
+ "epoch": 0.20894547828925888,
+ "grad_norm": 0.11186491698026657,
+ "learning_rate": 2.0608695652173913e-05,
+ "loss": 0.3579,
+ "step": 80
+ },
+ {
+ "epoch": 0.23506366307541626,
+ "grad_norm": 0.0821973979473114,
+ "learning_rate": 2.3217391304347826e-05,
+ "loss": 0.3375,
+ "step": 90
+ },
+ {
+ "epoch": 0.2611818478615736,
+ "grad_norm": 0.10780682414770126,
+ "learning_rate": 2.582608695652174e-05,
+ "loss": 0.3286,
+ "step": 100
+ },
+ {
+ "epoch": 0.28730003264773096,
+ "grad_norm": 0.09552709758281708,
+ "learning_rate": 2.8434782608695652e-05,
+ "loss": 0.3135,
+ "step": 110
+ },
+ {
+ "epoch": 0.31341821743388837,
+ "grad_norm": 0.1099899634718895,
+ "learning_rate": 2.988394584139265e-05,
+ "loss": 0.2828,
+ "step": 120
+ },
+ {
+ "epoch": 0.3395364022200457,
+ "grad_norm": 0.08481885492801666,
+ "learning_rate": 2.9593810444874276e-05,
+ "loss": 0.2884,
+ "step": 130
+ },
+ {
+ "epoch": 0.36565458700620307,
+ "grad_norm": 0.0872187465429306,
+ "learning_rate": 2.93036750483559e-05,
+ "loss": 0.2856,
+ "step": 140
+ },
+ {
+ "epoch": 0.3917727717923604,
+ "grad_norm": 0.09415256232023239,
+ "learning_rate": 2.9013539651837528e-05,
+ "loss": 0.2958,
+ "step": 150
+ },
+ {
+ "epoch": 0.41789095657851777,
+ "grad_norm": 0.09529490023851395,
+ "learning_rate": 2.872340425531915e-05,
+ "loss": 0.2832,
+ "step": 160
+ },
+ {
+ "epoch": 0.4440091413646752,
+ "grad_norm": 0.12695518136024475,
+ "learning_rate": 2.8433268858800773e-05,
+ "loss": 0.2852,
+ "step": 170
+ },
+ {
+ "epoch": 0.4701273261508325,
+ "grad_norm": 0.09822621941566467,
+ "learning_rate": 2.81431334622824e-05,
+ "loss": 0.2647,
+ "step": 180
+ },
+ {
+ "epoch": 0.4962455109369899,
+ "grad_norm": 0.11181768029928207,
+ "learning_rate": 2.785299806576402e-05,
+ "loss": 0.2718,
+ "step": 190
+ },
+ {
+ "epoch": 0.5223636957231472,
+ "grad_norm": 0.0999976173043251,
+ "learning_rate": 2.7562862669245647e-05,
+ "loss": 0.287,
+ "step": 200
+ },
+ {
+ "epoch": 0.5484818805093046,
+ "grad_norm": 0.11232498288154602,
+ "learning_rate": 2.7272727272727273e-05,
+ "loss": 0.2657,
+ "step": 210
+ },
+ {
+ "epoch": 0.5746000652954619,
+ "grad_norm": 0.09556713700294495,
+ "learning_rate": 2.69825918762089e-05,
+ "loss": 0.2728,
+ "step": 220
+ },
+ {
+ "epoch": 0.6007182500816193,
+ "grad_norm": 0.09838665276765823,
+ "learning_rate": 2.669245647969052e-05,
+ "loss": 0.2679,
+ "step": 230
+ },
+ {
+ "epoch": 0.6268364348677767,
+ "grad_norm": 0.12542720139026642,
+ "learning_rate": 2.6402321083172148e-05,
+ "loss": 0.2717,
+ "step": 240
+ },
+ {
+ "epoch": 0.6529546196539341,
+ "grad_norm": 0.12866875529289246,
+ "learning_rate": 2.6112185686653773e-05,
+ "loss": 0.2722,
+ "step": 250
+ },
+ {
+ "epoch": 0.6790728044400914,
+ "grad_norm": 0.12063395977020264,
+ "learning_rate": 2.5822050290135396e-05,
+ "loss": 0.2665,
+ "step": 260
+ },
+ {
+ "epoch": 0.7051909892262488,
+ "grad_norm": 0.12060956656932831,
+ "learning_rate": 2.5531914893617022e-05,
+ "loss": 0.2498,
+ "step": 270
+ },
+ {
+ "epoch": 0.7313091740124061,
+ "grad_norm": 0.1269434541463852,
+ "learning_rate": 2.5241779497098648e-05,
+ "loss": 0.2685,
+ "step": 280
+ },
+ {
+ "epoch": 0.7574273587985635,
+ "grad_norm": 0.13224200904369354,
+ "learning_rate": 2.495164410058027e-05,
+ "loss": 0.2562,
+ "step": 290
+ },
+ {
+ "epoch": 0.7835455435847208,
+ "grad_norm": 0.12492657452821732,
+ "learning_rate": 2.4661508704061896e-05,
+ "loss": 0.2436,
+ "step": 300
+ },
+ {
+ "epoch": 0.8096637283708782,
+ "grad_norm": 0.19281449913978577,
+ "learning_rate": 2.4371373307543522e-05,
+ "loss": 0.2593,
+ "step": 310
+ },
+ {
+ "epoch": 0.8357819131570355,
+ "grad_norm": 0.13644647598266602,
+ "learning_rate": 2.408123791102515e-05,
+ "loss": 0.2678,
+ "step": 320
+ },
+ {
+ "epoch": 0.861900097943193,
+ "grad_norm": 0.13078241050243378,
+ "learning_rate": 2.379110251450677e-05,
+ "loss": 0.2586,
+ "step": 330
+ },
+ {
+ "epoch": 0.8880182827293504,
+ "grad_norm": 0.16267353296279907,
+ "learning_rate": 2.3500967117988397e-05,
+ "loss": 0.2441,
+ "step": 340
+ },
+ {
+ "epoch": 0.9141364675155077,
+ "grad_norm": 0.14218953251838684,
+ "learning_rate": 2.321083172147002e-05,
+ "loss": 0.2561,
+ "step": 350
+ },
+ {
+ "epoch": 0.940254652301665,
+ "grad_norm": 0.14463242888450623,
+ "learning_rate": 2.2920696324951642e-05,
+ "loss": 0.2437,
+ "step": 360
+ },
+ {
+ "epoch": 0.9663728370878224,
+ "grad_norm": 0.15065905451774597,
+ "learning_rate": 2.2630560928433268e-05,
+ "loss": 0.2448,
+ "step": 370
+ },
+ {
+ "epoch": 0.9924910218739798,
+ "grad_norm": 0.1440647393465042,
+ "learning_rate": 2.2340425531914894e-05,
+ "loss": 0.2407,
+ "step": 380
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 0.3038630783557892,
+ "eval_runtime": 46.9955,
+ "eval_samples_per_second": 19.534,
+ "eval_steps_per_second": 9.767,
+ "step": 383
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1149,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.6853700492339405e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
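The trainer state above logs one training entry every 10 steps and an end-of-epoch eval at step 383, whose `eval_loss` is also recorded as `best_metric`. A minimal sketch of reading such a file (using an inline stand-in with a few values copied from the diff above, rather than loading the actual checkpoint file):

```python
import json

# Inline stand-in for checkpoint-383/trainer_state.json; values copied from the diff.
trainer_state = json.loads("""
{
  "best_metric": 0.3038630783557892,
  "best_model_checkpoint": "./lora_qwen7b_cpp_abdiff_v1/checkpoint-383",
  "log_history": [
    {"epoch": 0.0261, "loss": 0.747, "step": 10},
    {"epoch": 0.9925, "loss": 0.2407, "step": 380},
    {"epoch": 1.0, "eval_loss": 0.3038630783557892, "step": 383}
  ]
}
""")

# Training-loss entries carry "loss"; evaluation entries carry "eval_loss".
train_losses = [e["loss"] for e in trainer_state["log_history"] if "loss" in e]
eval_losses = [e["eval_loss"] for e in trainer_state["log_history"] if "eval_loss" in e]
print(train_losses[-1])  # last logged training loss (0.2407 at step 380)
print(min(eval_losses))  # lowest eval loss, which equals best_metric here
```

Separating the two entry types matters because `Trainer` mixes them in one `log_history` list; filtering on the key present in each dict is the usual way to split them.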
checkpoint-383/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0dcc7897cff0517627b4255462f4819ca6de3aa7097d571954aeb56a80da70f
+ size 5841
checkpoint-383/vocab.json ADDED
The diff for this file is too large to render. See raw diff
merges.txt ADDED
The diff for this file is too large to render. See raw diff
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:913950e4971737031da511cdd1b410daae4566f62eb845b3975bca5a102323d8
+ size 11421995
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 32768,
+ "pad_token": "<|endoftext|>",
+ "padding_side": "right",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0dcc7897cff0517627b4255462f4819ca6de3aa7097d571954aeb56a80da70f
+ size 5841
vocab.json ADDED
The diff for this file is too large to render. See raw diff