lekssays committed on
Commit 6a0bda8 · verified · 1 Parent(s): 48c4403

Initial model upload
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ checkpoint-500/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ checkpoint-568/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,202 @@
- ---
- license: apache-2.0
- ---
+ ---
+ base_model: unsloth/qwq-32b-preview-bnb-4bit
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
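No usage snippet is filled in above, so here is a minimal, hedged loading sketch. The repo id `your-username/your-adapter-repo` is a placeholder (the card does not state the actual Hub id); actually calling `load_model()` requires `peft`, `transformers`, and `bitsandbytes` installed, plus a GPU able to hold the 4-bit 32B base.

```python
BASE_MODEL = "unsloth/qwq-32b-preview-bnb-4bit"  # from the card's front matter
ADAPTER_REPO = "your-username/your-adapter-repo"  # placeholder -- substitute the real Hub id

def load_model():
    """Load the 4-bit base model and attach this LoRA adapter with PEFT.

    Imports are local so the sketch can be read without the libraries
    installed; calling this needs peft/transformers/bitsandbytes and a GPU.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(ADAPTER_REPO)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_REPO)
    return tokenizer, model
```

This is a sketch, not the author's documented usage; the adapter repo ships its own tokenizer files, which is why the tokenizer is loaded from the adapter repo rather than the base.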
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.14.0
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "unsloth/qwq-32b-preview-bnb-4bit",
+ "bias": "none",
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "o_proj",
+ "v_proj",
+ "gate_proj",
+ "down_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": true
+ }
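The adapter config enables `use_rslora`, i.e. rank-stabilized LoRA, which scales the adapter update by `lora_alpha / sqrt(r)` instead of the classic `lora_alpha / r`. With this repo's `lora_alpha = 32` and `r = 64`, that is an 8x larger effective scale than standard LoRA would give:

```python
import math

lora_alpha, r = 32, 64  # values from adapter_config.json above

classic_scale = lora_alpha / r            # standard LoRA scaling factor
rslora_scale = lora_alpha / math.sqrt(r)  # rank-stabilized LoRA (use_rslora: true)

print(classic_scale)  # 0.5
print(rslora_scale)   # 4.0
```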
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5cdb6133af159aecdfd884d89b887938f2de9fcd125838697f1697d34a4aecab
+ size 2147605960
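The weight files in this commit are Git LFS pointer files, not the weights themselves: three `key value` lines identifying the real blob by SHA-256 and size. A tiny stdlib-only parser for that format (illustrative, not part of this repo):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The adapter_model.safetensors pointer from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5cdb6133af159aecdfd884d89b887938f2de9fcd125838697f1697d34a4aecab
size 2147605960"""

info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e9)  # adapter weights are ~2.15 GB
```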
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
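The 22 added tokens above occupy a single contiguous id block starting at 151643, directly above the base BPE vocabulary. A quick sanity check over the mapping:

```python
# added_tokens.json contents, transcribed from above.
added_tokens = {
    "</tool_call>": 151658, "<tool_call>": 151657, "<|box_end|>": 151649,
    "<|box_start|>": 151648, "<|endoftext|>": 151643, "<|file_sep|>": 151664,
    "<|fim_middle|>": 151660, "<|fim_pad|>": 151662, "<|fim_prefix|>": 151659,
    "<|fim_suffix|>": 151661, "<|im_end|>": 151645, "<|im_start|>": 151644,
    "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651,
    "<|quad_start|>": 151650, "<|repo_name|>": 151663, "<|video_pad|>": 151656,
    "<|vision_end|>": 151653, "<|vision_pad|>": 151654,
    "<|vision_start|>": 151652,
}

ids = sorted(added_tokens.values())
assert ids == list(range(151643, 151665))  # 22 contiguous ids, no gaps
```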
checkpoint-500/README.md ADDED
@@ -0,0 +1,202 @@
(identical to the top-level README.md added above)
checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,37 @@
(identical to the top-level adapter_config.json added above)
checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5cdb6133af159aecdfd884d89b887938f2de9fcd125838697f1697d34a4aecab
+ size 2147605960
checkpoint-500/added_tokens.json ADDED
@@ -0,0 +1,24 @@
(identical to the top-level added_tokens.json added above)
checkpoint-500/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a901a9a5d0b82d24fd529f22334f48775a959ca301e59768510ecc03599210d
+ size 1091573332
checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e1c0fb523914a5a4fd99843319c531ac1b76196088779c16816fd1dd6a05915
+ size 14244
checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:183b296603218453f024f97991de116afc302207a9c1d93500eb201e53da7e60
+ size 1064
checkpoint-500/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-500/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
checkpoint-500/tokenizer_config.json ADDED
@@ -0,0 +1,209 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 32768,
+ "pad_token": "<|vision_pad|>",
+ "padding_side": "right",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
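The `chat_template` above is the Qwen ChatML template. For plain conversations without tools it reduces to `<|im_start|>{role}\n{content}<|im_end|>\n` blocks, with a default system prompt injected when none is supplied. A minimal Python re-implementation of just that no-tools path (a sketch; the real Jinja template also handles tool calls and tool responses):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages the way the no-tools branch of the chat template does."""
    out = ""
    if messages and messages[0]["role"] == "system":
        # A caller-supplied system message is emitted once, up front.
        out += f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n"
        messages = messages[1:]
    else:
        # The template falls back to a default system prompt.
        out += "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        out += "<|im_start|>assistant\n"
    return out

text = render_chatml([{"role": "user", "content": "Hello"}])
print(text)
```

In practice one would call `tokenizer.apply_chat_template(...)` and let the stored template do this; the sketch is only to make the wire format visible.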
checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,3573 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "best_metric": 0.03437558189034462,
3
+ "best_model_checkpoint": "./models/QwQ-32B-Preview_All-Label-Avengers/checkpoint-500",
4
+ "epoch": 1.7557117750439368,
5
+ "eval_steps": 100,
6
+ "global_step": 500,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0035149384885764497,
13
+ "grad_norm": 2.4306211471557617,
14
+ "learning_rate": 2.325581395348837e-06,
15
+ "loss": 1.7264,
16
+ "step": 1
17
+ },
18
+ {
19
+ "epoch": 0.007029876977152899,
20
+ "grad_norm": 2.4965059757232666,
21
+ "learning_rate": 4.651162790697674e-06,
22
+ "loss": 1.7642,
23
+ "step": 2
24
+ },
25
+ {
26
+ "epoch": 0.01054481546572935,
27
+ "grad_norm": 2.1308212280273438,
28
+ "learning_rate": 6.976744186046512e-06,
29
+ "loss": 1.7182,
30
+ "step": 3
31
+ },
32
+ {
33
+ "epoch": 0.014059753954305799,
34
+ "grad_norm": 1.6225489377975464,
35
+ "learning_rate": 9.302325581395349e-06,
36
+ "loss": 1.7998,
37
+ "step": 4
38
+ },
39
+ {
40
+ "epoch": 0.01757469244288225,
41
+ "grad_norm": 1.573882818222046,
42
+ "learning_rate": 1.1627906976744187e-05,
43
+ "loss": 1.769,
44
+ "step": 5
45
+ },
46
+ {
47
+ "epoch": 0.0210896309314587,
48
+ "grad_norm": 1.4130934476852417,
49
+ "learning_rate": 1.3953488372093024e-05,
50
+ "loss": 1.6766,
51
+ "step": 6
52
+ },
53
+ {
54
+ "epoch": 0.02460456942003515,
55
+ "grad_norm": 1.290319800376892,
56
+ "learning_rate": 1.6279069767441862e-05,
57
+ "loss": 1.6666,
58
+ "step": 7
59
+ },
60
+ {
61
+ "epoch": 0.028119507908611598,
62
+ "grad_norm": 1.12405526638031,
63
+ "learning_rate": 1.8604651162790697e-05,
64
+ "loss": 1.5581,
65
+ "step": 8
66
+ },
67
+ {
68
+ "epoch": 0.03163444639718805,
69
+ "grad_norm": 1.1816198825836182,
70
+ "learning_rate": 2.0930232558139536e-05,
71
+ "loss": 1.4704,
72
+ "step": 9
73
+ },
74
+ {
75
+ "epoch": 0.0351493848857645,
76
+ "grad_norm": 1.293810486793518,
77
+ "learning_rate": 2.3255813953488374e-05,
78
+ "loss": 1.4473,
79
+ "step": 10
80
+ },
81
+ {
82
+ "epoch": 0.03866432337434095,
83
+ "grad_norm": 1.3082643747329712,
84
+ "learning_rate": 2.5581395348837212e-05,
85
+ "loss": 1.1881,
86
+ "step": 11
87
+ },
88
+ {
89
+ "epoch": 0.0421792618629174,
90
+ "grad_norm": 1.2528855800628662,
91
+ "learning_rate": 2.7906976744186048e-05,
92
+ "loss": 1.0205,
93
+ "step": 12
94
+ },
95
+ {
96
+ "epoch": 0.04569420035149385,
97
+ "grad_norm": 1.2669192552566528,
98
+ "learning_rate": 3.0232558139534883e-05,
99
+ "loss": 0.8347,
100
+ "step": 13
101
+ },
102
+ {
103
+ "epoch": 0.0492091388400703,
104
+ "grad_norm": 1.3900455236434937,
105
+ "learning_rate": 3.2558139534883724e-05,
106
+ "loss": 0.6902,
107
+ "step": 14
108
+ },
109
+ {
110
+ "epoch": 0.05272407732864675,
111
+ "grad_norm": 1.1759059429168701,
112
+ "learning_rate": 3.488372093023256e-05,
113
+ "loss": 0.5728,
114
+ "step": 15
115
+ },
116
+ {
117
+ "epoch": 0.056239015817223195,
118
+ "grad_norm": 1.1620585918426514,
119
+ "learning_rate": 3.7209302325581394e-05,
120
+ "loss": 0.3972,
121
+ "step": 16
122
+ },
123
+ {
124
+ "epoch": 0.05975395430579965,
125
+ "grad_norm": 0.7304664254188538,
126
+ "learning_rate": 3.953488372093023e-05,
127
+ "loss": 0.3352,
128
+ "step": 17
129
+ },
130
+ {
131
+ "epoch": 0.0632688927943761,
132
+ "grad_norm": 0.46442729234695435,
133
+ "learning_rate": 4.186046511627907e-05,
134
+ "loss": 0.2574,
135
+ "step": 18
136
+ },
137
+ {
138
+ "epoch": 0.06678383128295255,
139
+ "grad_norm": 0.30402088165283203,
140
+ "learning_rate": 4.418604651162791e-05,
141
+ "loss": 0.206,
142
+ "step": 19
143
+ },
144
+ {
145
+ "epoch": 0.070298769771529,
146
+ "grad_norm": 0.32389989495277405,
147
+ "learning_rate": 4.651162790697675e-05,
148
+ "loss": 0.1669,
149
+ "step": 20
150
+ },
151
+ {
152
+ "epoch": 0.07381370826010544,
153
+ "grad_norm": 0.20291510224342346,
154
+ "learning_rate": 4.883720930232558e-05,
155
+ "loss": 0.2767,
156
+ "step": 21
157
+ },
158
+ {
159
+ "epoch": 0.0773286467486819,
160
+ "grad_norm": 0.19848257303237915,
161
+ "learning_rate": 5.1162790697674425e-05,
162
+ "loss": 0.1934,
163
+ "step": 22
164
+ },
165
+ {
166
+ "epoch": 0.08084358523725835,
167
+ "grad_norm": 0.26607176661491394,
168
+ "learning_rate": 5.348837209302326e-05,
169
+ "loss": 0.2551,
170
+ "step": 23
171
+ },
172
+ {
173
+ "epoch": 0.0843585237258348,
174
+ "grad_norm": 0.24385203421115875,
175
+ "learning_rate": 5.5813953488372095e-05,
176
+ "loss": 0.2151,
177
+ "step": 24
178
+ },
179
+ {
180
+ "epoch": 0.08787346221441125,
181
+ "grad_norm": 0.2142629325389862,
182
+ "learning_rate": 5.8139534883720933e-05,
183
+ "loss": 0.2451,
184
+ "step": 25
185
+ },
186
+ {
187
+ "epoch": 0.0913884007029877,
188
+ "grad_norm": 0.18275344371795654,
189
+ "learning_rate": 6.0465116279069765e-05,
190
+ "loss": 0.2199,
191
+ "step": 26
192
+ },
193
+ {
194
+ "epoch": 0.09490333919156414,
195
+ "grad_norm": 0.18780425190925598,
196
+ "learning_rate": 6.27906976744186e-05,
197
+ "loss": 0.1993,
198
+ "step": 27
199
+ },
200
+ {
201
+ "epoch": 0.0984182776801406,
202
+ "grad_norm": 0.15310600399971008,
203
+ "learning_rate": 6.511627906976745e-05,
204
+ "loss": 0.1828,
205
+ "step": 28
206
+ },
207
+ {
208
+ "epoch": 0.10193321616871705,
209
+ "grad_norm": 0.18113793432712555,
210
+ "learning_rate": 6.744186046511628e-05,
211
+ "loss": 0.2343,
212
+ "step": 29
213
+ },
214
+ {
215
+ "epoch": 0.1054481546572935,
216
+ "grad_norm": 0.1769862323999405,
217
+ "learning_rate": 6.976744186046513e-05,
218
+ "loss": 0.2475,
219
+ "step": 30
220
+ },
221
+ {
222
+ "epoch": 0.10896309314586995,
223
+ "grad_norm": 0.15000343322753906,
224
+ "learning_rate": 7.209302325581396e-05,
225
+ "loss": 0.1947,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.11247803163444639,
230
+ "grad_norm": 0.13723662495613098,
231
+ "learning_rate": 7.441860465116279e-05,
232
+ "loss": 0.1622,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.11599297012302284,
237
+ "grad_norm": 0.3588716983795166,
238
+ "learning_rate": 7.674418604651163e-05,
239
+ "loss": 0.2407,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.1195079086115993,
244
+ "grad_norm": 0.13395386934280396,
245
+ "learning_rate": 7.906976744186047e-05,
246
+ "loss": 0.1658,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.12302284710017575,
251
+ "grad_norm": 0.14775735139846802,
252
+ "learning_rate": 8.139534883720931e-05,
253
+ "loss": 0.1826,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.1265377855887522,
258
+ "grad_norm": 0.1646275818347931,
259
+ "learning_rate": 8.372093023255814e-05,
260
+ "loss": 0.248,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.13005272407732865,
265
+ "grad_norm": 0.1475340873003006,
266
+ "learning_rate": 8.604651162790697e-05,
267
+ "loss": 0.1973,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.1335676625659051,
272
+ "grad_norm": 0.1472110003232956,
273
+ "learning_rate": 8.837209302325582e-05,
274
+ "loss": 0.1749,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.13708260105448156,
279
+ "grad_norm": 0.175991490483284,
280
+ "learning_rate": 9.069767441860465e-05,
281
+ "loss": 0.2015,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.140597539543058,
286
+ "grad_norm": 0.15504464507102966,
287
+ "learning_rate": 9.30232558139535e-05,
288
+ "loss": 0.1953,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.14411247803163443,
293
+ "grad_norm": 0.1356693059206009,
294
+ "learning_rate": 9.534883720930233e-05,
295
+ "loss": 0.1514,
296
+ "step": 41
297
+ },
298
+ {
299
+ "epoch": 0.14762741652021089,
300
+ "grad_norm": 0.1488504558801651,
301
+ "learning_rate": 9.767441860465116e-05,
302
+ "loss": 0.2224,
303
+ "step": 42
304
+ },
305
+ {
306
+ "epoch": 0.15114235500878734,
307
+ "grad_norm": 0.15970350801944733,
308
+ "learning_rate": 0.0001,
309
+ "loss": 0.204,
310
+ "step": 43
311
+ },
312
+ {
313
+ "epoch": 0.1546572934973638,
314
+ "grad_norm": 0.15022525191307068,
315
+ "learning_rate": 0.00010232558139534885,
316
+ "loss": 0.1918,
317
+ "step": 44
318
+ },
319
+ {
320
+ "epoch": 0.15817223198594024,
321
+ "grad_norm": 0.15593580901622772,
322
+ "learning_rate": 0.00010465116279069768,
323
+ "loss": 0.2139,
324
+ "step": 45
325
+ },
326
+ {
327
+ "epoch": 0.1616871704745167,
328
+ "grad_norm": 0.18487761914730072,
329
+ "learning_rate": 0.00010697674418604651,
330
+ "loss": 0.221,
331
+ "step": 46
332
+ },
333
+ {
334
+ "epoch": 0.16520210896309315,
335
+ "grad_norm": 0.14810211956501007,
336
+ "learning_rate": 0.00010930232558139534,
337
+ "loss": 0.1725,
338
+ "step": 47
339
+ },
340
+ {
341
+ "epoch": 0.1687170474516696,
342
+ "grad_norm": 0.14104707539081573,
343
+ "learning_rate": 0.00011162790697674419,
344
+ "loss": 0.1917,
345
+ "step": 48
346
+ },
347
+ {
348
+ "epoch": 0.17223198594024605,
349
+ "grad_norm": 0.14195498824119568,
350
+ "learning_rate": 0.00011395348837209304,
351
+ "loss": 0.1238,
352
+ "step": 49
353
+ },
354
+ {
355
+ "epoch": 0.1757469244288225,
356
+ "grad_norm": 0.17054127156734467,
357
+ "learning_rate": 0.00011627906976744187,
358
+ "loss": 0.1907,
359
+ "step": 50
360
+ },
361
+ {
362
+ "epoch": 0.17926186291739896,
363
+ "grad_norm": 0.14932486414909363,
364
+ "learning_rate": 0.00011860465116279071,
365
+ "loss": 0.2198,
366
+ "step": 51
367
+ },
368
+ {
369
+ "epoch": 0.1827768014059754,
370
+ "grad_norm": 0.1489359587430954,
371
+ "learning_rate": 0.00012093023255813953,
372
+ "loss": 0.15,
373
+ "step": 52
374
+ },
375
+ {
376
+ "epoch": 0.18629173989455183,
377
+ "grad_norm": 0.18924278020858765,
378
+ "learning_rate": 0.00012325581395348836,
379
+ "loss": 0.2287,
380
+ "step": 53
381
+ },
382
+ {
383
+ "epoch": 0.18980667838312829,
384
+ "grad_norm": 0.16668835282325745,
385
+ "learning_rate": 0.0001255813953488372,
386
+ "loss": 0.17,
387
+ "step": 54
388
+ },
389
+ {
390
+ "epoch": 0.19332161687170474,
391
+ "grad_norm": 0.16550259292125702,
392
+ "learning_rate": 0.00012790697674418605,
393
+ "loss": 0.2501,
394
+ "step": 55
395
+ },
396
+ {
397
+ "epoch": 0.1968365553602812,
398
+ "grad_norm": 0.18516315519809723,
399
+ "learning_rate": 0.0001302325581395349,
400
+ "loss": 0.221,
401
+ "step": 56
402
+ },
403
+ {
404
+ "epoch": 0.20035149384885764,
405
+ "grad_norm": 0.1466783583164215,
406
+ "learning_rate": 0.00013255813953488372,
407
+ "loss": 0.165,
408
+ "step": 57
409
+ },
410
+ {
411
+ "epoch": 0.2038664323374341,
412
+ "grad_norm": 0.16772057116031647,
413
+ "learning_rate": 0.00013488372093023256,
414
+ "loss": 0.173,
415
+ "step": 58
416
+ },
417
+ {
418
+ "epoch": 0.20738137082601055,
419
+ "grad_norm": 0.13877983391284943,
420
+ "learning_rate": 0.0001372093023255814,
421
+ "loss": 0.1387,
422
+ "step": 59
423
+ },
424
+ {
425
+ "epoch": 0.210896309314587,
426
+ "grad_norm": 0.1386060118675232,
427
+ "learning_rate": 0.00013953488372093025,
428
+ "loss": 0.1249,
429
+ "step": 60
430
+ },
431
+ {
432
+ "epoch": 0.21441124780316345,
433
+ "grad_norm": 0.14806397259235382,
434
+ "learning_rate": 0.0001418604651162791,
435
+ "loss": 0.1471,
436
+ "step": 61
437
+ },
438
+ {
439
+ "epoch": 0.2179261862917399,
440
+ "grad_norm": 0.14199328422546387,
441
+ "learning_rate": 0.00014418604651162791,
442
+ "loss": 0.1218,
443
+ "step": 62
444
+ },
445
+ {
446
+ "epoch": 0.22144112478031636,
447
+ "grad_norm": 0.2156495600938797,
448
+ "learning_rate": 0.00014651162790697673,
449
+ "loss": 0.1707,
450
+ "step": 63
451
+ },
452
+ {
453
+ "epoch": 0.22495606326889278,
454
+ "grad_norm": 0.1671278327703476,
455
+ "learning_rate": 0.00014883720930232558,
456
+ "loss": 0.176,
457
+ "step": 64
458
+ },
459
+ {
460
+ "epoch": 0.22847100175746923,
461
+ "grad_norm": 0.23461486399173737,
462
+ "learning_rate": 0.00015116279069767442,
463
+ "loss": 0.1741,
464
+ "step": 65
465
+ },
466
+ {
467
+ "epoch": 0.23198594024604569,
468
+ "grad_norm": 0.234146386384964,
469
+ "learning_rate": 0.00015348837209302327,
470
+ "loss": 0.2728,
471
+ "step": 66
472
+ },
473
+ {
474
+ "epoch": 0.23550087873462214,
475
+ "grad_norm": 0.17772763967514038,
476
+ "learning_rate": 0.0001558139534883721,
477
+ "loss": 0.1268,
478
+ "step": 67
479
+ },
480
+ {
481
+ "epoch": 0.2390158172231986,
482
+ "grad_norm": 0.17169110476970673,
483
+ "learning_rate": 0.00015813953488372093,
484
+ "loss": 0.1236,
485
+ "step": 68
486
+ },
487
+ {
488
+ "epoch": 0.24253075571177504,
489
+ "grad_norm": 0.208359956741333,
490
+ "learning_rate": 0.00016046511627906978,
491
+ "loss": 0.1933,
492
+ "step": 69
493
+ },
494
+ {
495
+ "epoch": 0.2460456942003515,
496
+ "grad_norm": 0.2004474252462387,
497
+ "learning_rate": 0.00016279069767441862,
498
+ "loss": 0.1925,
499
+ "step": 70
500
+ },
501
+ {
502
+ "epoch": 0.24956063268892795,
503
+ "grad_norm": 0.22541354596614838,
504
+ "learning_rate": 0.00016511627906976747,
505
+ "loss": 0.1802,
506
+ "step": 71
507
+ },
508
+ {
509
+ "epoch": 0.2530755711775044,
510
+ "grad_norm": 0.23796068131923676,
511
+ "learning_rate": 0.00016744186046511629,
512
+ "loss": 0.203,
513
+ "step": 72
514
+ },
515
+ {
516
+ "epoch": 0.2565905096660808,
517
+ "grad_norm": 0.1686377376317978,
518
+ "learning_rate": 0.0001697674418604651,
519
+ "loss": 0.1693,
520
+ "step": 73
521
+ },
522
+ {
523
+ "epoch": 0.2601054481546573,
524
+ "grad_norm": 0.14461028575897217,
525
+ "learning_rate": 0.00017209302325581395,
526
+ "loss": 0.1203,
527
+ "step": 74
528
+ },
529
+ {
530
+ "epoch": 0.26362038664323373,
531
+ "grad_norm": 0.1455416977405548,
532
+ "learning_rate": 0.0001744186046511628,
533
+ "loss": 0.0993,
534
+ "step": 75
535
+ },
536
+ {
537
+ "epoch": 0.2671353251318102,
538
+ "grad_norm": 0.15028974413871765,
539
+ "learning_rate": 0.00017674418604651164,
540
+ "loss": 0.1035,
541
+ "step": 76
542
+ },
543
+ {
544
+ "epoch": 0.27065026362038663,
545
+ "grad_norm": 0.2451164275407791,
546
+ "learning_rate": 0.00017906976744186048,
547
+ "loss": 0.1948,
548
+ "step": 77
549
+ },
550
+ {
551
+ "epoch": 0.2741652021089631,
552
+ "grad_norm": 0.159187912940979,
553
+ "learning_rate": 0.0001813953488372093,
554
+ "loss": 0.1251,
555
+ "step": 78
556
+ },
557
+ {
558
+ "epoch": 0.27768014059753954,
559
+ "grad_norm": 0.25709378719329834,
560
+ "learning_rate": 0.00018372093023255815,
561
+ "loss": 0.0994,
562
+ "step": 79
563
+ },
564
+ {
565
+ "epoch": 0.281195079086116,
566
+ "grad_norm": 0.1876499205827713,
567
+ "learning_rate": 0.000186046511627907,
568
+ "loss": 0.1344,
569
+ "step": 80
570
+ },
571
+ {
572
+ "epoch": 0.28471001757469244,
573
+ "grad_norm": 0.17002429068088531,
574
+ "learning_rate": 0.00018837209302325584,
575
+ "loss": 0.1226,
576
+ "step": 81
577
+ },
578
+ {
579
+ "epoch": 0.28822495606326887,
580
+ "grad_norm": 0.19951826333999634,
581
+ "learning_rate": 0.00019069767441860466,
582
+ "loss": 0.181,
583
+ "step": 82
584
+ },
585
+ {
586
+ "epoch": 0.29173989455184535,
587
+ "grad_norm": 0.2241663634777069,
588
+ "learning_rate": 0.0001930232558139535,
589
+ "loss": 0.1659,
590
+ "step": 83
591
+ },
592
+ {
593
+ "epoch": 0.29525483304042177,
594
+ "grad_norm": 0.19991622865200043,
595
+ "learning_rate": 0.00019534883720930232,
596
+ "loss": 0.1638,
597
+ "step": 84
598
+ },
599
+ {
600
+ "epoch": 0.29876977152899825,
601
+ "grad_norm": 0.17116616666316986,
602
+ "learning_rate": 0.00019767441860465116,
603
+ "loss": 0.0865,
604
+ "step": 85
605
+ },
606
+ {
607
+ "epoch": 0.3022847100175747,
608
+ "grad_norm": 0.2004881054162979,
609
+ "learning_rate": 0.0002,
610
+ "loss": 0.1426,
611
+ "step": 86
612
+ },
613
+ {
614
+ "epoch": 0.30579964850615116,
615
+ "grad_norm": 0.1808985322713852,
616
+ "learning_rate": 0.0001995850622406639,
617
+ "loss": 0.1523,
618
+ "step": 87
619
+ },
620
+ {
621
+ "epoch": 0.3093145869947276,
622
+ "grad_norm": 0.25665614008903503,
623
+ "learning_rate": 0.0001991701244813278,
624
+ "loss": 0.1296,
625
+ "step": 88
626
+ },
627
+ {
628
+ "epoch": 0.31282952548330406,
629
+ "grad_norm": 0.1651581972837448,
630
+ "learning_rate": 0.0001987551867219917,
631
+ "loss": 0.1194,
632
+ "step": 89
633
+ },
634
+ {
635
+ "epoch": 0.3163444639718805,
636
+ "grad_norm": 0.19588658213615417,
637
+ "learning_rate": 0.00019834024896265561,
638
+ "loss": 0.167,
639
+ "step": 90
640
+ },
641
+ {
642
+ "epoch": 0.31985940246045697,
643
+ "grad_norm": 0.23157978057861328,
644
+ "learning_rate": 0.00019792531120331952,
645
+ "loss": 0.1484,
646
+ "step": 91
647
+ },
648
+ {
649
+ "epoch": 0.3233743409490334,
650
+ "grad_norm": 0.19277921319007874,
651
+ "learning_rate": 0.00019751037344398342,
652
+ "loss": 0.1622,
653
+ "step": 92
654
+ },
655
+ {
656
+ "epoch": 0.3268892794376098,
657
+ "grad_norm": 0.2064240425825119,
658
+ "learning_rate": 0.00019709543568464732,
659
+ "loss": 0.1529,
660
+ "step": 93
661
+ },
662
+ {
663
+ "epoch": 0.3304042179261863,
664
+ "grad_norm": 0.3631608784198761,
665
+ "learning_rate": 0.0001966804979253112,
666
+ "loss": 0.12,
667
+ "step": 94
668
+ },
669
+ {
670
+ "epoch": 0.3339191564147627,
671
+ "grad_norm": 0.3188731372356415,
672
+ "learning_rate": 0.0001962655601659751,
673
+ "loss": 0.1911,
674
+ "step": 95
675
+ },
676
+ {
677
+ "epoch": 0.3374340949033392,
678
+ "grad_norm": 0.18162627518177032,
679
+ "learning_rate": 0.000195850622406639,
680
+ "loss": 0.1048,
681
+ "step": 96
682
+ },
683
+ {
684
+ "epoch": 0.3409490333919156,
685
+ "grad_norm": 0.13265462219715118,
686
+ "learning_rate": 0.0001954356846473029,
687
+ "loss": 0.0898,
688
+ "step": 97
689
+ },
690
+ {
691
+ "epoch": 0.3444639718804921,
692
+ "grad_norm": 0.18490265309810638,
693
+ "learning_rate": 0.0001950207468879668,
694
+ "loss": 0.1274,
695
+ "step": 98
696
+ },
697
+ {
698
+ "epoch": 0.34797891036906853,
699
+ "grad_norm": 0.13423575460910797,
700
+ "learning_rate": 0.0001946058091286307,
701
+ "loss": 0.0941,
702
+ "step": 99
703
+ },
704
+ {
705
+ "epoch": 0.351493848857645,
706
+ "grad_norm": 0.16704542934894562,
707
+ "learning_rate": 0.00019419087136929463,
708
+ "loss": 0.1202,
709
+ "step": 100
710
+ },
711
+ {
712
+ "epoch": 0.351493848857645,
713
+ "eval_loss": 0.12555669248104095,
714
+ "eval_runtime": 178.7564,
715
+ "eval_samples_per_second": 2.831,
716
+ "eval_steps_per_second": 0.358,
717
+ "step": 100
718
+ },
719
+ {
720
+ "epoch": 0.35500878734622143,
721
+ "grad_norm": 0.16820575296878815,
722
+ "learning_rate": 0.00019377593360995853,
723
+ "loss": 0.107,
724
+ "step": 101
725
+ },
726
+ {
727
+ "epoch": 0.3585237258347979,
728
+ "grad_norm": 0.30647799372673035,
729
+ "learning_rate": 0.00019336099585062243,
730
+ "loss": 0.1739,
731
+ "step": 102
732
+ },
733
+ {
734
+ "epoch": 0.36203866432337434,
735
+ "grad_norm": 0.24310201406478882,
736
+ "learning_rate": 0.00019294605809128633,
737
+ "loss": 0.136,
738
+ "step": 103
739
+ },
740
+ {
741
+ "epoch": 0.3655536028119508,
742
+ "grad_norm": 0.22280918061733246,
743
+ "learning_rate": 0.00019253112033195023,
744
+ "loss": 0.1606,
745
+ "step": 104
746
+ },
747
+ {
748
+ "epoch": 0.36906854130052724,
749
+ "grad_norm": 0.17549653351306915,
750
+ "learning_rate": 0.00019211618257261413,
751
+ "loss": 0.0676,
752
+ "step": 105
753
+ },
754
+ {
755
+ "epoch": 0.37258347978910367,
756
+ "grad_norm": 0.15003140270709991,
757
+ "learning_rate": 0.00019170124481327803,
758
+ "loss": 0.1123,
759
+ "step": 106
760
+ },
761
+ {
762
+ "epoch": 0.37609841827768015,
763
+ "grad_norm": 0.17405907809734344,
764
+ "learning_rate": 0.00019128630705394194,
765
+ "loss": 0.1145,
766
+ "step": 107
767
+ },
768
+ {
769
+ "epoch": 0.37961335676625657,
770
+ "grad_norm": 0.22116738557815552,
771
+ "learning_rate": 0.00019087136929460584,
772
+ "loss": 0.2581,
773
+ "step": 108
774
+ },
775
+ {
776
+ "epoch": 0.38312829525483305,
777
+ "grad_norm": 0.17248253524303436,
778
+ "learning_rate": 0.0001904564315352697,
779
+ "loss": 0.116,
780
+ "step": 109
781
+ },
782
+ {
783
+ "epoch": 0.3866432337434095,
784
+ "grad_norm": 0.1622823178768158,
785
+ "learning_rate": 0.0001900414937759336,
786
+ "loss": 0.1112,
787
+ "step": 110
788
+ },
789
+ {
790
+ "epoch": 0.39015817223198596,
791
+ "grad_norm": 0.19119834899902344,
792
+ "learning_rate": 0.0001896265560165975,
793
+ "loss": 0.1545,
794
+ "step": 111
795
+ },
796
+ {
797
+ "epoch": 0.3936731107205624,
798
+ "grad_norm": 0.21934115886688232,
799
+ "learning_rate": 0.00018921161825726141,
800
+ "loss": 0.1259,
801
+ "step": 112
802
+ },
803
+ {
804
+ "epoch": 0.39718804920913886,
805
+ "grad_norm": 0.2224389910697937,
806
+ "learning_rate": 0.00018879668049792532,
807
+ "loss": 0.1125,
808
+ "step": 113
809
+ },
810
+ {
811
+ "epoch": 0.4007029876977153,
812
+ "grad_norm": 0.4230864346027374,
813
+ "learning_rate": 0.00018838174273858922,
814
+ "loss": 0.1372,
815
+ "step": 114
816
+ },
817
+ {
818
+ "epoch": 0.40421792618629176,
819
+ "grad_norm": 0.22001534700393677,
820
+ "learning_rate": 0.00018796680497925312,
821
+ "loss": 0.1052,
822
+ "step": 115
823
+ },
824
+ {
825
+ "epoch": 0.4077328646748682,
826
+ "grad_norm": 0.18826138973236084,
827
+ "learning_rate": 0.00018755186721991702,
828
+ "loss": 0.109,
829
+ "step": 116
830
+ },
831
+ {
832
+ "epoch": 0.4112478031634446,
833
+ "grad_norm": 0.18136151134967804,
834
+ "learning_rate": 0.00018713692946058092,
835
+ "loss": 0.1017,
836
+ "step": 117
837
+ },
838
+ {
839
+ "epoch": 0.4147627416520211,
840
+ "grad_norm": 0.1845865249633789,
841
+ "learning_rate": 0.00018672199170124482,
842
+ "loss": 0.1208,
843
+ "step": 118
844
+ },
845
+ {
846
+ "epoch": 0.4182776801405975,
847
+ "grad_norm": 0.2266140878200531,
848
+ "learning_rate": 0.00018630705394190872,
849
+ "loss": 0.1312,
850
+ "step": 119
851
+ },
852
+ {
853
+ "epoch": 0.421792618629174,
854
+ "grad_norm": 0.19567352533340454,
855
+ "learning_rate": 0.00018589211618257262,
856
+ "loss": 0.1188,
857
+ "step": 120
858
+ },
859
+ {
860
+ "epoch": 0.4253075571177504,
861
+ "grad_norm": 0.19902321696281433,
862
+ "learning_rate": 0.00018547717842323653,
863
+ "loss": 0.1031,
864
+ "step": 121
865
+ },
866
+ {
867
+ "epoch": 0.4288224956063269,
868
+ "grad_norm": 0.17255137860774994,
869
+ "learning_rate": 0.00018506224066390043,
870
+ "loss": 0.1095,
871
+ "step": 122
872
+ },
873
+ {
874
+ "epoch": 0.43233743409490333,
875
+ "grad_norm": 0.20506910979747772,
876
+ "learning_rate": 0.00018464730290456433,
877
+ "loss": 0.1176,
878
+ "step": 123
879
+ },
880
+ {
881
+ "epoch": 0.4358523725834798,
882
+ "grad_norm": 0.128911554813385,
883
+ "learning_rate": 0.0001842323651452282,
884
+ "loss": 0.0805,
885
+ "step": 124
886
+ },
887
+ {
888
+ "epoch": 0.43936731107205623,
889
+ "grad_norm": 0.21368758380413055,
890
+ "learning_rate": 0.0001838174273858921,
891
+ "loss": 0.1099,
892
+ "step": 125
893
+ },
894
+ {
895
+ "epoch": 0.4428822495606327,
896
+ "grad_norm": 0.19906243681907654,
897
+ "learning_rate": 0.000183402489626556,
898
+ "loss": 0.13,
899
+ "step": 126
900
+ },
901
+ {
902
+ "epoch": 0.44639718804920914,
903
+ "grad_norm": 0.2857668101787567,
904
+ "learning_rate": 0.0001829875518672199,
905
+ "loss": 0.1051,
906
+ "step": 127
907
+ },
908
+ {
909
+ "epoch": 0.44991212653778556,
910
+ "grad_norm": 0.18914097547531128,
911
+ "learning_rate": 0.00018257261410788383,
912
+ "loss": 0.1271,
913
+ "step": 128
914
+ },
915
+ {
916
+ "epoch": 0.45342706502636204,
917
+ "grad_norm": 0.17243094742298126,
918
+ "learning_rate": 0.00018215767634854774,
919
+ "loss": 0.087,
920
+ "step": 129
921
+ },
922
+ {
923
+ "epoch": 0.45694200351493847,
924
+ "grad_norm": 0.1896534413099289,
925
+ "learning_rate": 0.00018174273858921164,
926
+ "loss": 0.0635,
927
+ "step": 130
928
+ },
929
+ {
930
+ "epoch": 0.46045694200351495,
931
+ "grad_norm": 0.2188025712966919,
932
+ "learning_rate": 0.00018132780082987554,
933
+ "loss": 0.123,
934
+ "step": 131
935
+ },
936
+ {
937
+ "epoch": 0.46397188049209137,
938
+ "grad_norm": 0.14551542699337006,
939
+ "learning_rate": 0.00018091286307053944,
940
+ "loss": 0.0932,
941
+ "step": 132
942
+ },
943
+ {
944
+ "epoch": 0.46748681898066785,
945
+ "grad_norm": 0.17265719175338745,
946
+ "learning_rate": 0.00018049792531120334,
947
+ "loss": 0.0855,
948
+ "step": 133
949
+ },
950
+ {
951
+ "epoch": 0.4710017574692443,
952
+ "grad_norm": 0.15373249351978302,
953
+ "learning_rate": 0.00018008298755186724,
954
+ "loss": 0.0797,
955
+ "step": 134
956
+ },
957
+ {
958
+ "epoch": 0.47451669595782076,
959
+ "grad_norm": 0.2197037786245346,
960
+ "learning_rate": 0.00017966804979253114,
961
+ "loss": 0.1079,
962
+ "step": 135
963
+ },
964
+ {
965
+ "epoch": 0.4780316344463972,
966
+ "grad_norm": 0.13843557238578796,
967
+ "learning_rate": 0.00017925311203319504,
968
+ "loss": 0.0893,
969
+ "step": 136
970
+ },
971
+ {
972
+ "epoch": 0.48154657293497366,
973
+ "grad_norm": 0.1315803974866867,
974
+ "learning_rate": 0.00017883817427385895,
975
+ "loss": 0.0798,
976
+ "step": 137
977
+ },
978
+ {
979
+ "epoch": 0.4850615114235501,
980
+ "grad_norm": 0.16765527427196503,
981
+ "learning_rate": 0.00017842323651452285,
982
+ "loss": 0.1129,
983
+ "step": 138
984
+ },
985
+ {
986
+ "epoch": 0.48857644991212656,
987
+ "grad_norm": 0.1512288898229599,
988
+ "learning_rate": 0.00017800829875518672,
989
+ "loss": 0.1077,
990
+ "step": 139
991
+ },
992
+ {
993
+ "epoch": 0.492091388400703,
994
+ "grad_norm": 0.12558338046073914,
995
+ "learning_rate": 0.00017759336099585062,
996
+ "loss": 0.0757,
997
+ "step": 140
998
+ },
999
+ {
1000
+ "epoch": 0.4956063268892794,
1001
+ "grad_norm": 0.148188054561615,
1002
+ "learning_rate": 0.00017717842323651452,
1003
+ "loss": 0.0838,
1004
+ "step": 141
1005
+ },
1006
+ {
1007
+ "epoch": 0.4991212653778559,
1008
+ "grad_norm": 0.11713378876447678,
1009
+ "learning_rate": 0.00017676348547717842,
1010
+ "loss": 0.0671,
1011
+ "step": 142
1012
+ },
1013
+ {
1014
+ "epoch": 0.5026362038664324,
1015
+ "grad_norm": 0.1130850538611412,
1016
+ "learning_rate": 0.00017634854771784233,
1017
+ "loss": 0.0651,
1018
+ "step": 143
1019
+ },
1020
+ {
1021
+ "epoch": 0.5061511423550088,
1022
+ "grad_norm": 0.2193579077720642,
1023
+ "learning_rate": 0.00017593360995850623,
1024
+ "loss": 0.1009,
1025
+ "step": 144
1026
+ },
1027
+ {
1028
+ "epoch": 0.5096660808435852,
1029
+ "grad_norm": 0.16929981112480164,
1030
+ "learning_rate": 0.00017551867219917013,
1031
+ "loss": 0.0726,
1032
+ "step": 145
1033
+ },
1034
+ {
1035
+ "epoch": 0.5131810193321616,
1036
+ "grad_norm": 0.250143438577652,
1037
+ "learning_rate": 0.00017510373443983403,
1038
+ "loss": 0.0946,
1039
+ "step": 146
1040
+ },
1041
+ {
1042
+ "epoch": 0.5166959578207382,
1043
+ "grad_norm": 0.18434955179691315,
1044
+ "learning_rate": 0.00017468879668049793,
1045
+ "loss": 0.0928,
1046
+ "step": 147
1047
+ },
1048
+ {
1049
+ "epoch": 0.5202108963093146,
1050
+ "grad_norm": 0.24621042609214783,
1051
+ "learning_rate": 0.00017427385892116183,
1052
+ "loss": 0.0915,
1053
+ "step": 148
1054
+ },
1055
+ {
1056
+ "epoch": 0.523725834797891,
1057
+ "grad_norm": 0.14161203801631927,
1058
+ "learning_rate": 0.00017385892116182573,
1059
+ "loss": 0.0754,
1060
+ "step": 149
1061
+ },
1062
+ {
1063
+ "epoch": 0.5272407732864675,
1064
+ "grad_norm": 0.22755476832389832,
1065
+ "learning_rate": 0.00017344398340248963,
1066
+ "loss": 0.1233,
1067
+ "step": 150
1068
+ },
1069
+ {
1070
+ "epoch": 0.5307557117750439,
1071
+ "grad_norm": 0.14949020743370056,
1072
+ "learning_rate": 0.00017302904564315354,
1073
+ "loss": 0.102,
1074
+ "step": 151
1075
+ },
1076
+ {
1077
+ "epoch": 0.5342706502636204,
1078
+ "grad_norm": 0.14558053016662598,
1079
+ "learning_rate": 0.00017261410788381744,
1080
+ "loss": 0.0649,
1081
+ "step": 152
1082
+ },
1083
+ {
1084
+ "epoch": 0.5377855887521968,
1085
+ "grad_norm": 0.13403621315956116,
+ "learning_rate": 0.00017219917012448134,
+ "loss": 0.0769,
+ "step": 153
+ },
+ {
+ "epoch": 0.5413005272407733,
+ "grad_norm": 0.1492697149515152,
+ "learning_rate": 0.0001717842323651452,
+ "loss": 0.0695,
+ "step": 154
+ },
+ {
+ "epoch": 0.5448154657293497,
+ "grad_norm": 0.176810160279274,
+ "learning_rate": 0.00017136929460580914,
+ "loss": 0.0914,
+ "step": 155
+ },
+ {
+ "epoch": 0.5483304042179262,
+ "grad_norm": 0.1693807691335678,
+ "learning_rate": 0.00017095435684647304,
+ "loss": 0.1186,
+ "step": 156
+ },
+ {
+ "epoch": 0.5518453427065027,
+ "grad_norm": 0.16816498339176178,
+ "learning_rate": 0.00017053941908713694,
+ "loss": 0.0891,
+ "step": 157
+ },
+ {
+ "epoch": 0.5553602811950791,
+ "grad_norm": 0.1630387306213379,
+ "learning_rate": 0.00017012448132780084,
+ "loss": 0.0817,
+ "step": 158
+ },
+ {
+ "epoch": 0.5588752196836555,
+ "grad_norm": 0.2269333004951477,
+ "learning_rate": 0.00016970954356846475,
+ "loss": 0.1311,
+ "step": 159
+ },
+ {
+ "epoch": 0.562390158172232,
+ "grad_norm": 0.16005942225456238,
+ "learning_rate": 0.00016929460580912865,
+ "loss": 0.077,
+ "step": 160
+ },
+ {
+ "epoch": 0.5659050966608085,
+ "grad_norm": 0.22128961980342865,
+ "learning_rate": 0.00016887966804979255,
+ "loss": 0.0663,
+ "step": 161
+ },
+ {
+ "epoch": 0.5694200351493849,
+ "grad_norm": 0.24515265226364136,
+ "learning_rate": 0.00016846473029045645,
+ "loss": 0.0891,
+ "step": 162
+ },
+ {
+ "epoch": 0.5729349736379613,
+ "grad_norm": 0.15797489881515503,
+ "learning_rate": 0.00016804979253112035,
+ "loss": 0.0604,
+ "step": 163
+ },
+ {
+ "epoch": 0.5764499121265377,
+ "grad_norm": 0.11231053620576859,
+ "learning_rate": 0.00016763485477178425,
+ "loss": 0.0637,
+ "step": 164
+ },
+ {
+ "epoch": 0.5799648506151143,
+ "grad_norm": 0.1319078505039215,
+ "learning_rate": 0.00016721991701244815,
+ "loss": 0.061,
+ "step": 165
+ },
+ {
+ "epoch": 0.5834797891036907,
+ "grad_norm": 0.15379250049591064,
+ "learning_rate": 0.00016680497925311205,
+ "loss": 0.0727,
+ "step": 166
+ },
+ {
+ "epoch": 0.5869947275922671,
+ "grad_norm": 0.12770119309425354,
+ "learning_rate": 0.00016639004149377596,
+ "loss": 0.0749,
+ "step": 167
+ },
+ {
+ "epoch": 0.5905096660808435,
+ "grad_norm": 0.17939890921115875,
+ "learning_rate": 0.00016597510373443986,
+ "loss": 0.0904,
+ "step": 168
+ },
+ {
+ "epoch": 0.5940246045694201,
+ "grad_norm": 0.15956297516822815,
+ "learning_rate": 0.00016556016597510373,
+ "loss": 0.0892,
+ "step": 169
+ },
+ {
+ "epoch": 0.5975395430579965,
+ "grad_norm": 0.17730525135993958,
+ "learning_rate": 0.00016514522821576763,
+ "loss": 0.0783,
+ "step": 170
+ },
+ {
+ "epoch": 0.6010544815465729,
+ "grad_norm": 0.32799553871154785,
+ "learning_rate": 0.00016473029045643153,
+ "loss": 0.082,
+ "step": 171
+ },
+ {
+ "epoch": 0.6045694200351494,
+ "grad_norm": 0.1100563257932663,
+ "learning_rate": 0.00016431535269709543,
+ "loss": 0.0633,
+ "step": 172
+ },
+ {
+ "epoch": 0.6080843585237259,
+ "grad_norm": 0.12468238919973373,
+ "learning_rate": 0.00016390041493775934,
+ "loss": 0.0506,
+ "step": 173
+ },
+ {
+ "epoch": 0.6115992970123023,
+ "grad_norm": 0.11538554728031158,
+ "learning_rate": 0.00016348547717842324,
+ "loss": 0.0614,
+ "step": 174
+ },
+ {
+ "epoch": 0.6151142355008787,
+ "grad_norm": 0.15802019834518433,
+ "learning_rate": 0.00016307053941908714,
+ "loss": 0.0806,
+ "step": 175
+ },
+ {
+ "epoch": 0.6186291739894552,
+ "grad_norm": 0.17224474251270294,
+ "learning_rate": 0.00016265560165975104,
+ "loss": 0.0588,
+ "step": 176
+ },
+ {
+ "epoch": 0.6221441124780316,
+ "grad_norm": 0.2495221644639969,
+ "learning_rate": 0.00016224066390041494,
+ "loss": 0.1109,
+ "step": 177
+ },
+ {
+ "epoch": 0.6256590509666081,
+ "grad_norm": 0.1328081488609314,
+ "learning_rate": 0.00016182572614107884,
+ "loss": 0.0572,
+ "step": 178
+ },
+ {
+ "epoch": 0.6291739894551845,
+ "grad_norm": 0.18458999693393707,
+ "learning_rate": 0.00016141078838174274,
+ "loss": 0.0975,
+ "step": 179
+ },
+ {
+ "epoch": 0.632688927943761,
+ "grad_norm": 0.1648002564907074,
+ "learning_rate": 0.00016099585062240664,
+ "loss": 0.065,
+ "step": 180
+ },
+ {
+ "epoch": 0.6362038664323374,
+ "grad_norm": 0.20207300782203674,
+ "learning_rate": 0.00016058091286307055,
+ "loss": 0.1027,
+ "step": 181
+ },
+ {
+ "epoch": 0.6397188049209139,
+ "grad_norm": 0.17972172796726227,
+ "learning_rate": 0.00016016597510373445,
+ "loss": 0.1069,
+ "step": 182
+ },
+ {
+ "epoch": 0.6432337434094904,
+ "grad_norm": 0.15856166183948517,
+ "learning_rate": 0.00015975103734439835,
+ "loss": 0.0948,
+ "step": 183
+ },
+ {
+ "epoch": 0.6467486818980668,
+ "grad_norm": 0.21730534732341766,
+ "learning_rate": 0.00015933609958506225,
+ "loss": 0.1264,
+ "step": 184
+ },
+ {
+ "epoch": 0.6502636203866432,
+ "grad_norm": 0.10988626629114151,
+ "learning_rate": 0.00015892116182572615,
+ "loss": 0.0439,
+ "step": 185
+ },
+ {
+ "epoch": 0.6537785588752196,
+ "grad_norm": 0.17182111740112305,
+ "learning_rate": 0.00015850622406639005,
+ "loss": 0.1122,
+ "step": 186
+ },
+ {
+ "epoch": 0.6572934973637962,
+ "grad_norm": 0.13410013914108276,
+ "learning_rate": 0.00015809128630705395,
+ "loss": 0.0551,
+ "step": 187
+ },
+ {
+ "epoch": 0.6608084358523726,
+ "grad_norm": 0.11806394904851913,
+ "learning_rate": 0.00015767634854771785,
+ "loss": 0.0782,
+ "step": 188
+ },
+ {
+ "epoch": 0.664323374340949,
+ "grad_norm": 0.18627063930034637,
+ "learning_rate": 0.00015726141078838176,
+ "loss": 0.0787,
+ "step": 189
+ },
+ {
+ "epoch": 0.6678383128295254,
+ "grad_norm": 0.26391828060150146,
+ "learning_rate": 0.00015684647302904566,
+ "loss": 0.0614,
+ "step": 190
+ },
+ {
+ "epoch": 0.671353251318102,
+ "grad_norm": 0.1657516062259674,
+ "learning_rate": 0.00015643153526970956,
+ "loss": 0.0798,
+ "step": 191
+ },
+ {
+ "epoch": 0.6748681898066784,
+ "grad_norm": 0.15479570627212524,
+ "learning_rate": 0.00015601659751037346,
+ "loss": 0.0811,
+ "step": 192
+ },
+ {
+ "epoch": 0.6783831282952548,
+ "grad_norm": 0.16939130425453186,
+ "learning_rate": 0.00015560165975103736,
+ "loss": 0.0749,
+ "step": 193
+ },
+ {
+ "epoch": 0.6818980667838312,
+ "grad_norm": 0.17529913783073425,
+ "learning_rate": 0.00015518672199170126,
+ "loss": 0.0517,
+ "step": 194
+ },
+ {
+ "epoch": 0.6854130052724078,
+ "grad_norm": 0.1807660162448883,
+ "learning_rate": 0.00015477178423236516,
+ "loss": 0.0928,
+ "step": 195
+ },
+ {
+ "epoch": 0.6889279437609842,
+ "grad_norm": 0.13066497445106506,
+ "learning_rate": 0.00015435684647302906,
+ "loss": 0.0525,
+ "step": 196
+ },
+ {
+ "epoch": 0.6924428822495606,
+ "grad_norm": 0.17132331430912018,
+ "learning_rate": 0.00015394190871369297,
+ "loss": 0.1139,
+ "step": 197
+ },
+ {
+ "epoch": 0.6959578207381371,
+ "grad_norm": 0.1522899866104126,
+ "learning_rate": 0.00015352697095435687,
+ "loss": 0.0818,
+ "step": 198
+ },
+ {
+ "epoch": 0.6994727592267135,
+ "grad_norm": 0.18056978285312653,
+ "learning_rate": 0.00015311203319502074,
+ "loss": 0.0698,
+ "step": 199
+ },
+ {
+ "epoch": 0.70298769771529,
+ "grad_norm": 0.1712590903043747,
+ "learning_rate": 0.00015269709543568464,
+ "loss": 0.0743,
+ "step": 200
+ },
+ {
+ "epoch": 0.70298769771529,
+ "eval_loss": 0.07312215119600296,
+ "eval_runtime": 178.6909,
+ "eval_samples_per_second": 2.832,
+ "eval_steps_per_second": 0.358,
+ "step": 200
+ },
+ {
+ "epoch": 0.7065026362038664,
+ "grad_norm": 0.14959929883480072,
+ "learning_rate": 0.00015228215767634854,
+ "loss": 0.0838,
+ "step": 201
+ },
+ {
+ "epoch": 0.7100175746924429,
+ "grad_norm": 0.18013989925384521,
+ "learning_rate": 0.00015186721991701244,
+ "loss": 0.0768,
+ "step": 202
+ },
+ {
+ "epoch": 0.7135325131810193,
+ "grad_norm": 0.1600325107574463,
+ "learning_rate": 0.00015145228215767635,
+ "loss": 0.0934,
+ "step": 203
+ },
+ {
+ "epoch": 0.7170474516695958,
+ "grad_norm": 0.1611802875995636,
+ "learning_rate": 0.00015103734439834025,
+ "loss": 0.0675,
+ "step": 204
+ },
+ {
+ "epoch": 0.7205623901581723,
+ "grad_norm": 0.1379013955593109,
+ "learning_rate": 0.00015062240663900415,
+ "loss": 0.0883,
+ "step": 205
+ },
+ {
+ "epoch": 0.7240773286467487,
+ "grad_norm": 0.1827850192785263,
+ "learning_rate": 0.00015020746887966805,
+ "loss": 0.0686,
+ "step": 206
+ },
+ {
+ "epoch": 0.7275922671353251,
+ "grad_norm": 0.14171870052814484,
+ "learning_rate": 0.00014979253112033195,
+ "loss": 0.0624,
+ "step": 207
+ },
+ {
+ "epoch": 0.7311072056239016,
+ "grad_norm": 0.15266627073287964,
+ "learning_rate": 0.00014937759336099585,
+ "loss": 0.0797,
+ "step": 208
+ },
+ {
+ "epoch": 0.7346221441124781,
+ "grad_norm": 0.14795458316802979,
+ "learning_rate": 0.00014896265560165975,
+ "loss": 0.0713,
+ "step": 209
+ },
+ {
+ "epoch": 0.7381370826010545,
+ "grad_norm": 0.12821845710277557,
+ "learning_rate": 0.00014854771784232365,
+ "loss": 0.0592,
+ "step": 210
+ },
+ {
+ "epoch": 0.7416520210896309,
+ "grad_norm": 0.12268483638763428,
+ "learning_rate": 0.00014813278008298756,
+ "loss": 0.0484,
+ "step": 211
+ },
+ {
+ "epoch": 0.7451669595782073,
+ "grad_norm": 0.12236201018095016,
+ "learning_rate": 0.00014771784232365146,
+ "loss": 0.0642,
+ "step": 212
+ },
+ {
+ "epoch": 0.7486818980667839,
+ "grad_norm": 0.1301528811454773,
+ "learning_rate": 0.00014730290456431536,
+ "loss": 0.0696,
+ "step": 213
+ },
+ {
+ "epoch": 0.7521968365553603,
+ "grad_norm": 0.1912907063961029,
+ "learning_rate": 0.00014688796680497926,
+ "loss": 0.1043,
+ "step": 214
+ },
+ {
+ "epoch": 0.7557117750439367,
+ "grad_norm": 0.11695796251296997,
+ "learning_rate": 0.00014647302904564316,
+ "loss": 0.0463,
+ "step": 215
+ },
+ {
+ "epoch": 0.7592267135325131,
+ "grad_norm": 0.18601001799106598,
+ "learning_rate": 0.00014605809128630706,
+ "loss": 0.0643,
+ "step": 216
+ },
+ {
+ "epoch": 0.7627416520210897,
+ "grad_norm": 0.1472391039133072,
+ "learning_rate": 0.00014564315352697096,
+ "loss": 0.0691,
+ "step": 217
+ },
+ {
+ "epoch": 0.7662565905096661,
+ "grad_norm": 0.1806488037109375,
+ "learning_rate": 0.00014522821576763486,
+ "loss": 0.0429,
+ "step": 218
+ },
+ {
+ "epoch": 0.7697715289982425,
+ "grad_norm": 0.11694241315126419,
+ "learning_rate": 0.00014481327800829877,
+ "loss": 0.0604,
+ "step": 219
+ },
+ {
+ "epoch": 0.773286467486819,
+ "grad_norm": 0.21174874901771545,
+ "learning_rate": 0.00014439834024896267,
+ "loss": 0.0691,
+ "step": 220
+ },
+ {
+ "epoch": 0.7768014059753954,
+ "grad_norm": 0.16953590512275696,
+ "learning_rate": 0.00014398340248962657,
+ "loss": 0.0712,
+ "step": 221
+ },
+ {
+ "epoch": 0.7803163444639719,
+ "grad_norm": 0.14451764523983002,
+ "learning_rate": 0.00014356846473029047,
+ "loss": 0.0423,
+ "step": 222
+ },
+ {
+ "epoch": 0.7838312829525483,
+ "grad_norm": 0.20399795472621918,
+ "learning_rate": 0.00014315352697095437,
+ "loss": 0.0685,
+ "step": 223
+ },
+ {
+ "epoch": 0.7873462214411248,
+ "grad_norm": 0.14048655331134796,
+ "learning_rate": 0.00014273858921161827,
+ "loss": 0.0675,
+ "step": 224
+ },
+ {
+ "epoch": 0.7908611599297012,
+ "grad_norm": 0.2338792383670807,
+ "learning_rate": 0.00014232365145228217,
+ "loss": 0.063,
+ "step": 225
+ },
+ {
+ "epoch": 0.7943760984182777,
+ "grad_norm": 0.14961326122283936,
+ "learning_rate": 0.00014190871369294607,
+ "loss": 0.0651,
+ "step": 226
+ },
+ {
+ "epoch": 0.7978910369068541,
+ "grad_norm": 0.15257729589939117,
+ "learning_rate": 0.00014149377593360998,
+ "loss": 0.0428,
+ "step": 227
+ },
+ {
+ "epoch": 0.8014059753954306,
+ "grad_norm": 0.24161341786384583,
+ "learning_rate": 0.00014107883817427388,
+ "loss": 0.0764,
+ "step": 228
+ },
+ {
+ "epoch": 0.804920913884007,
+ "grad_norm": 0.1564214676618576,
+ "learning_rate": 0.00014066390041493778,
+ "loss": 0.0528,
+ "step": 229
+ },
+ {
+ "epoch": 0.8084358523725835,
+ "grad_norm": 0.184456467628479,
+ "learning_rate": 0.00014024896265560165,
+ "loss": 0.0553,
+ "step": 230
+ },
+ {
+ "epoch": 0.81195079086116,
+ "grad_norm": 0.1617126762866974,
+ "learning_rate": 0.00013983402489626555,
+ "loss": 0.0632,
+ "step": 231
+ },
+ {
+ "epoch": 0.8154657293497364,
+ "grad_norm": 0.13109862804412842,
+ "learning_rate": 0.00013941908713692945,
+ "loss": 0.0475,
+ "step": 232
+ },
+ {
+ "epoch": 0.8189806678383128,
+ "grad_norm": 0.13069158792495728,
+ "learning_rate": 0.00013900414937759336,
+ "loss": 0.0388,
+ "step": 233
+ },
+ {
+ "epoch": 0.8224956063268892,
+ "grad_norm": 0.13061407208442688,
+ "learning_rate": 0.00013858921161825726,
+ "loss": 0.0472,
+ "step": 234
+ },
+ {
+ "epoch": 0.8260105448154658,
+ "grad_norm": 0.27340853214263916,
+ "learning_rate": 0.00013817427385892116,
+ "loss": 0.1715,
+ "step": 235
+ },
+ {
+ "epoch": 0.8295254833040422,
+ "grad_norm": 0.11120892316102982,
+ "learning_rate": 0.00013775933609958506,
+ "loss": 0.0393,
+ "step": 236
+ },
+ {
+ "epoch": 0.8330404217926186,
+ "grad_norm": 0.1893947571516037,
+ "learning_rate": 0.00013734439834024896,
+ "loss": 0.0897,
+ "step": 237
+ },
+ {
+ "epoch": 0.836555360281195,
+ "grad_norm": 0.11574364453554153,
+ "learning_rate": 0.00013692946058091286,
+ "loss": 0.053,
+ "step": 238
+ },
+ {
+ "epoch": 0.8400702987697716,
+ "grad_norm": 0.13650314509868622,
+ "learning_rate": 0.00013651452282157676,
+ "loss": 0.0718,
+ "step": 239
+ },
+ {
+ "epoch": 0.843585237258348,
+ "grad_norm": 0.19434796273708344,
+ "learning_rate": 0.00013609958506224066,
+ "loss": 0.0606,
+ "step": 240
+ },
+ {
+ "epoch": 0.8471001757469244,
+ "grad_norm": 0.17645402252674103,
+ "learning_rate": 0.00013568464730290457,
+ "loss": 0.0995,
+ "step": 241
+ },
+ {
+ "epoch": 0.8506151142355008,
+ "grad_norm": 0.13868428766727448,
+ "learning_rate": 0.00013526970954356847,
+ "loss": 0.0768,
+ "step": 242
+ },
+ {
+ "epoch": 0.8541300527240774,
+ "grad_norm": 0.1283743679523468,
+ "learning_rate": 0.00013485477178423237,
+ "loss": 0.0587,
+ "step": 243
+ },
+ {
+ "epoch": 0.8576449912126538,
+ "grad_norm": 0.1996167004108429,
+ "learning_rate": 0.00013443983402489627,
+ "loss": 0.0562,
+ "step": 244
+ },
+ {
+ "epoch": 0.8611599297012302,
+ "grad_norm": 0.16913233697414398,
+ "learning_rate": 0.00013402489626556017,
+ "loss": 0.1163,
+ "step": 245
+ },
+ {
+ "epoch": 0.8646748681898067,
+ "grad_norm": 0.18035849928855896,
+ "learning_rate": 0.00013360995850622407,
+ "loss": 0.063,
+ "step": 246
+ },
+ {
+ "epoch": 0.8681898066783831,
+ "grad_norm": 0.13519661128520966,
+ "learning_rate": 0.00013319502074688797,
+ "loss": 0.082,
+ "step": 247
+ },
+ {
+ "epoch": 0.8717047451669596,
+ "grad_norm": 0.12751896679401398,
+ "learning_rate": 0.00013278008298755187,
+ "loss": 0.0594,
+ "step": 248
+ },
+ {
+ "epoch": 0.875219683655536,
+ "grad_norm": 0.10623247176408768,
+ "learning_rate": 0.00013236514522821578,
+ "loss": 0.0635,
+ "step": 249
+ },
+ {
+ "epoch": 0.8787346221441125,
+ "grad_norm": 0.16478785872459412,
+ "learning_rate": 0.00013195020746887968,
+ "loss": 0.0646,
+ "step": 250
+ },
+ {
+ "epoch": 0.8822495606326889,
+ "grad_norm": 0.14821447432041168,
+ "learning_rate": 0.00013153526970954358,
+ "loss": 0.0574,
+ "step": 251
+ },
+ {
+ "epoch": 0.8857644991212654,
+ "grad_norm": 0.14830711483955383,
+ "learning_rate": 0.00013112033195020748,
+ "loss": 0.0569,
+ "step": 252
+ },
+ {
+ "epoch": 0.8892794376098418,
+ "grad_norm": 0.15182024240493774,
+ "learning_rate": 0.00013070539419087138,
+ "loss": 0.0486,
+ "step": 253
+ },
+ {
+ "epoch": 0.8927943760984183,
+ "grad_norm": 0.13804112374782562,
+ "learning_rate": 0.00013029045643153528,
+ "loss": 0.0341,
+ "step": 254
+ },
+ {
+ "epoch": 0.8963093145869947,
+ "grad_norm": 0.23451949656009674,
+ "learning_rate": 0.00012987551867219918,
+ "loss": 0.104,
+ "step": 255
+ },
+ {
+ "epoch": 0.8998242530755711,
+ "grad_norm": 0.19352230429649353,
+ "learning_rate": 0.00012946058091286308,
+ "loss": 0.1153,
+ "step": 256
+ },
+ {
+ "epoch": 0.9033391915641477,
+ "grad_norm": 0.1840514838695526,
+ "learning_rate": 0.00012904564315352699,
+ "loss": 0.0661,
+ "step": 257
+ },
+ {
+ "epoch": 0.9068541300527241,
+ "grad_norm": 0.17361341416835785,
+ "learning_rate": 0.0001286307053941909,
+ "loss": 0.0473,
+ "step": 258
+ },
+ {
+ "epoch": 0.9103690685413005,
+ "grad_norm": 0.24270959198474884,
+ "learning_rate": 0.0001282157676348548,
+ "loss": 0.0578,
+ "step": 259
+ },
+ {
+ "epoch": 0.9138840070298769,
+ "grad_norm": 0.2252267301082611,
+ "learning_rate": 0.00012780082987551866,
+ "loss": 0.0804,
+ "step": 260
+ },
+ {
+ "epoch": 0.9173989455184535,
+ "grad_norm": 0.13661521673202515,
+ "learning_rate": 0.00012738589211618256,
+ "loss": 0.0433,
+ "step": 261
+ },
+ {
+ "epoch": 0.9209138840070299,
+ "grad_norm": 0.14729367196559906,
+ "learning_rate": 0.00012697095435684646,
+ "loss": 0.0506,
+ "step": 262
+ },
+ {
+ "epoch": 0.9244288224956063,
+ "grad_norm": 0.12483599036931992,
+ "learning_rate": 0.00012655601659751037,
+ "loss": 0.0361,
+ "step": 263
+ },
+ {
+ "epoch": 0.9279437609841827,
+ "grad_norm": 0.14307105541229248,
+ "learning_rate": 0.00012614107883817427,
+ "loss": 0.0423,
+ "step": 264
+ },
+ {
+ "epoch": 0.9314586994727593,
+ "grad_norm": 0.18808116018772125,
+ "learning_rate": 0.00012572614107883817,
+ "loss": 0.0585,
+ "step": 265
+ },
+ {
+ "epoch": 0.9349736379613357,
+ "grad_norm": 0.37760576605796814,
+ "learning_rate": 0.00012531120331950207,
+ "loss": 0.0689,
+ "step": 266
+ },
+ {
+ "epoch": 0.9384885764499121,
+ "grad_norm": 0.12569907307624817,
+ "learning_rate": 0.00012489626556016597,
+ "loss": 0.0376,
+ "step": 267
+ },
+ {
+ "epoch": 0.9420035149384886,
+ "grad_norm": 0.13358008861541748,
+ "learning_rate": 0.00012448132780082987,
+ "loss": 0.045,
+ "step": 268
+ },
+ {
+ "epoch": 0.945518453427065,
+ "grad_norm": 0.2513868808746338,
+ "learning_rate": 0.00012406639004149377,
+ "loss": 0.0848,
+ "step": 269
+ },
+ {
+ "epoch": 0.9490333919156415,
+ "grad_norm": 0.13180561363697052,
+ "learning_rate": 0.00012365145228215767,
+ "loss": 0.068,
+ "step": 270
+ },
+ {
+ "epoch": 0.9525483304042179,
+ "grad_norm": 0.18591004610061646,
+ "learning_rate": 0.00012323651452282158,
+ "loss": 0.0657,
+ "step": 271
+ },
+ {
+ "epoch": 0.9560632688927944,
+ "grad_norm": 0.14035774767398834,
+ "learning_rate": 0.00012282157676348548,
+ "loss": 0.0483,
+ "step": 272
+ },
+ {
+ "epoch": 0.9595782073813708,
+ "grad_norm": 0.160274475812912,
+ "learning_rate": 0.0001224066390041494,
+ "loss": 0.0361,
+ "step": 273
+ },
+ {
+ "epoch": 0.9630931458699473,
+ "grad_norm": 0.11012961715459824,
+ "learning_rate": 0.0001219917012448133,
+ "loss": 0.0394,
+ "step": 274
+ },
+ {
+ "epoch": 0.9666080843585237,
+ "grad_norm": 0.14294235408306122,
+ "learning_rate": 0.00012157676348547717,
+ "loss": 0.0677,
+ "step": 275
+ },
+ {
+ "epoch": 0.9701230228471002,
+ "grad_norm": 0.18198156356811523,
+ "learning_rate": 0.00012116182572614108,
+ "loss": 0.0617,
+ "step": 276
+ },
+ {
+ "epoch": 0.9736379613356766,
+ "grad_norm": 0.13320007920265198,
+ "learning_rate": 0.00012074688796680498,
+ "loss": 0.0432,
+ "step": 277
+ },
+ {
+ "epoch": 0.9771528998242531,
+ "grad_norm": 0.1330510377883911,
+ "learning_rate": 0.00012033195020746888,
+ "loss": 0.0443,
+ "step": 278
+ },
+ {
+ "epoch": 0.9806678383128296,
+ "grad_norm": 0.16604888439178467,
+ "learning_rate": 0.00011991701244813279,
+ "loss": 0.0614,
+ "step": 279
+ },
+ {
+ "epoch": 0.984182776801406,
+ "grad_norm": 0.13113927841186523,
+ "learning_rate": 0.00011950207468879669,
+ "loss": 0.0593,
+ "step": 280
+ },
+ {
+ "epoch": 0.9876977152899824,
+ "grad_norm": 0.14142243564128876,
+ "learning_rate": 0.00011908713692946059,
+ "loss": 0.0473,
+ "step": 281
+ },
+ {
+ "epoch": 0.9912126537785588,
+ "grad_norm": 0.1621742993593216,
+ "learning_rate": 0.00011867219917012449,
+ "loss": 0.0555,
+ "step": 282
+ },
+ {
+ "epoch": 0.9947275922671354,
+ "grad_norm": 0.16650210320949554,
+ "learning_rate": 0.00011825726141078839,
+ "loss": 0.0678,
+ "step": 283
+ },
+ {
+ "epoch": 0.9982425307557118,
+ "grad_norm": 0.1241079717874527,
+ "learning_rate": 0.00011784232365145229,
+ "loss": 0.0773,
+ "step": 284
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 0.5624362826347351,
+ "learning_rate": 0.0001174273858921162,
+ "loss": 0.0893,
+ "step": 285
+ },
+ {
+ "epoch": 1.0035149384885764,
+ "grad_norm": 0.1310419738292694,
+ "learning_rate": 0.0001170124481327801,
+ "loss": 0.0323,
+ "step": 286
+ },
+ {
+ "epoch": 1.0070298769771528,
+ "grad_norm": 0.2076292186975479,
+ "learning_rate": 0.000116597510373444,
+ "loss": 0.0864,
+ "step": 287
+ },
+ {
+ "epoch": 1.0105448154657293,
+ "grad_norm": 0.16203393042087555,
+ "learning_rate": 0.0001161825726141079,
+ "loss": 0.0411,
+ "step": 288
+ },
+ {
+ "epoch": 1.0140597539543057,
+ "grad_norm": 0.18476437032222748,
+ "learning_rate": 0.0001157676348547718,
+ "loss": 0.0596,
+ "step": 289
+ },
+ {
+ "epoch": 1.0175746924428823,
+ "grad_norm": 0.1175275593996048,
+ "learning_rate": 0.00011535269709543569,
+ "loss": 0.0335,
+ "step": 290
+ },
+ {
+ "epoch": 1.0210896309314588,
+ "grad_norm": 0.14564698934555054,
+ "learning_rate": 0.00011493775933609959,
+ "loss": 0.0372,
+ "step": 291
+ },
+ {
+ "epoch": 1.0246045694200352,
+ "grad_norm": 0.11464350670576096,
+ "learning_rate": 0.00011452282157676349,
+ "loss": 0.0361,
+ "step": 292
+ },
+ {
+ "epoch": 1.0281195079086116,
+ "grad_norm": 0.17061354219913483,
+ "learning_rate": 0.00011410788381742739,
+ "loss": 0.0319,
+ "step": 293
+ },
+ {
+ "epoch": 1.031634446397188,
+ "grad_norm": 0.12375995516777039,
+ "learning_rate": 0.00011369294605809129,
+ "loss": 0.0295,
+ "step": 294
+ },
+ {
+ "epoch": 1.0351493848857645,
+ "grad_norm": 0.1310289353132248,
+ "learning_rate": 0.00011327800829875519,
+ "loss": 0.0335,
+ "step": 295
+ },
+ {
+ "epoch": 1.038664323374341,
+ "grad_norm": 0.11135981231927872,
+ "learning_rate": 0.0001128630705394191,
+ "loss": 0.0249,
+ "step": 296
+ },
+ {
+ "epoch": 1.0421792618629173,
+ "grad_norm": 0.15976910293102264,
+ "learning_rate": 0.000112448132780083,
+ "loss": 0.0341,
+ "step": 297
+ },
+ {
+ "epoch": 1.0456942003514937,
+ "grad_norm": 0.13744553923606873,
+ "learning_rate": 0.0001120331950207469,
+ "loss": 0.0314,
+ "step": 298
+ },
+ {
+ "epoch": 1.0492091388400704,
+ "grad_norm": 0.13137714564800262,
+ "learning_rate": 0.0001116182572614108,
+ "loss": 0.0337,
+ "step": 299
+ },
+ {
+ "epoch": 1.0527240773286468,
+ "grad_norm": 0.16147193312644958,
+ "learning_rate": 0.0001112033195020747,
+ "loss": 0.0437,
+ "step": 300
+ },
+ {
+ "epoch": 1.0527240773286468,
+ "eval_loss": 0.05115514248609543,
+ "eval_runtime": 178.6438,
+ "eval_samples_per_second": 2.832,
+ "eval_steps_per_second": 0.358,
+ "step": 300
+ },
+ {
+ "epoch": 1.0562390158172232,
+ "grad_norm": 0.12608520686626434,
+ "learning_rate": 0.0001107883817427386,
+ "loss": 0.0458,
+ "step": 301
+ },
+ {
+ "epoch": 1.0597539543057997,
+ "grad_norm": 0.1479218602180481,
+ "learning_rate": 0.0001103734439834025,
+ "loss": 0.0324,
+ "step": 302
+ },
+ {
+ "epoch": 1.063268892794376,
+ "grad_norm": 0.11473022401332855,
+ "learning_rate": 0.0001099585062240664,
+ "loss": 0.0313,
+ "step": 303
+ },
+ {
+ "epoch": 1.0667838312829525,
+ "grad_norm": 0.11279737949371338,
+ "learning_rate": 0.0001095435684647303,
+ "loss": 0.0334,
+ "step": 304
+ },
+ {
+ "epoch": 1.070298769771529,
+ "grad_norm": 0.11454727500677109,
+ "learning_rate": 0.00010912863070539419,
+ "loss": 0.0346,
+ "step": 305
+ },
+ {
+ "epoch": 1.0738137082601054,
+ "grad_norm": 0.16063125431537628,
+ "learning_rate": 0.00010871369294605809,
+ "loss": 0.0445,
+ "step": 306
+ },
+ {
+ "epoch": 1.0773286467486818,
+ "grad_norm": 0.08446641266345978,
+ "learning_rate": 0.000108298755186722,
+ "loss": 0.0233,
+ "step": 307
+ },
+ {
+ "epoch": 1.0808435852372584,
+ "grad_norm": 0.12478985637426376,
+ "learning_rate": 0.0001078838174273859,
+ "loss": 0.0454,
+ "step": 308
+ },
+ {
+ "epoch": 1.0843585237258349,
+ "grad_norm": 0.10559240728616714,
+ "learning_rate": 0.0001074688796680498,
+ "loss": 0.0286,
+ "step": 309
+ },
+ {
+ "epoch": 1.0878734622144113,
+ "grad_norm": 0.08633790910243988,
+ "learning_rate": 0.0001070539419087137,
+ "loss": 0.0286,
+ "step": 310
+ },
+ {
+ "epoch": 1.0913884007029877,
+ "grad_norm": 0.1345328986644745,
+ "learning_rate": 0.0001066390041493776,
+ "loss": 0.0387,
+ "step": 311
+ },
+ {
+ "epoch": 1.0949033391915641,
+ "grad_norm": 0.1071149930357933,
+ "learning_rate": 0.0001062240663900415,
+ "loss": 0.03,
+ "step": 312
+ },
+ {
+ "epoch": 1.0984182776801406,
+ "grad_norm": 0.1703241467475891,
+ "learning_rate": 0.0001058091286307054,
+ "loss": 0.0527,
+ "step": 313
+ },
+ {
+ "epoch": 1.101933216168717,
+ "grad_norm": 0.09247270226478577,
+ "learning_rate": 0.0001053941908713693,
+ "loss": 0.0323,
+ "step": 314
+ },
+ {
+ "epoch": 1.1054481546572934,
+ "grad_norm": 0.15563809871673584,
+ "learning_rate": 0.0001049792531120332,
+ "loss": 0.0377,
+ "step": 315
+ },
+ {
+ "epoch": 1.10896309314587,
+ "grad_norm": 0.09453997761011124,
+ "learning_rate": 0.0001045643153526971,
+ "loss": 0.0245,
+ "step": 316
+ },
+ {
+ "epoch": 1.1124780316344465,
+ "grad_norm": 0.10154609382152557,
+ "learning_rate": 0.000104149377593361,
+ "loss": 0.0366,
+ "step": 317
+ },
+ {
+ "epoch": 1.115992970123023,
+ "grad_norm": 0.09299053996801376,
+ "learning_rate": 0.00010373443983402491,
+ "loss": 0.0255,
+ "step": 318
+ },
+ {
+ "epoch": 1.1195079086115993,
+ "grad_norm": 0.11212610453367233,
+ "learning_rate": 0.00010331950207468881,
+ "loss": 0.0244,
+ "step": 319
+ },
+ {
+ "epoch": 1.1230228471001757,
+ "grad_norm": 0.08826769888401031,
+ "learning_rate": 0.0001029045643153527,
+ "loss": 0.025,
+ "step": 320
+ },
+ {
+ "epoch": 1.1265377855887522,
+ "grad_norm": 0.16080456972122192,
+ "learning_rate": 0.0001024896265560166,
+ "loss": 0.0333,
+ "step": 321
+ },
+ {
+ "epoch": 1.1300527240773286,
+ "grad_norm": 0.11305706948041916,
+ "learning_rate": 0.0001020746887966805,
+ "loss": 0.0262,
+ "step": 322
+ },
+ {
+ "epoch": 1.133567662565905,
+ "grad_norm": 0.1509881168603897,
+ "learning_rate": 0.0001016597510373444,
+ "loss": 0.0389,
+ "step": 323
+ },
+ {
+ "epoch": 1.1370826010544817,
+ "grad_norm": 0.11454585194587708,
+ "learning_rate": 0.0001012448132780083,
+ "loss": 0.0323,
+ "step": 324
+ },
+ {
+ "epoch": 1.140597539543058,
+ "grad_norm": 0.16684749722480774,
+ "learning_rate": 0.0001008298755186722,
+ "loss": 0.0441,
+ "step": 325
+ },
+ {
+ "epoch": 1.1441124780316345,
+ "grad_norm": 0.09474905580282211,
+ "learning_rate": 0.0001004149377593361,
+ "loss": 0.0307,
+ "step": 326
+ },
+ {
+ "epoch": 1.147627416520211,
+ "grad_norm": 0.1257026344537735,
+ "learning_rate": 0.0001,
+ "loss": 0.0382,
+ "step": 327
+ },
+ {
+ "epoch": 1.1511423550087874,
+ "grad_norm": 0.16471615433692932,
+ "learning_rate": 9.95850622406639e-05,
+ "loss": 0.0472,
+ "step": 328
+ },
+ {
+ "epoch": 1.1546572934973638,
+ "grad_norm": 0.16340135037899017,
+ "learning_rate": 9.917012448132781e-05,
+ "loss": 0.0502,
+ "step": 329
+ },
+ {
+ "epoch": 1.1581722319859402,
+ "grad_norm": 0.09674856066703796,
+ "learning_rate": 9.875518672199171e-05,
+ "loss": 0.029,
+ "step": 330
+ },
+ {
+ "epoch": 1.1616871704745166,
+ "grad_norm": 0.11328493803739548,
+ "learning_rate": 9.83402489626556e-05,
+ "loss": 0.0313,
+ "step": 331
+ },
+ {
+ "epoch": 1.165202108963093,
+ "grad_norm": 0.14702650904655457,
+ "learning_rate": 9.79253112033195e-05,
+ "loss": 0.0563,
+ "step": 332
+ },
+ {
+ "epoch": 1.1687170474516697,
+ "grad_norm": 0.1175096407532692,
+ "learning_rate": 9.75103734439834e-05,
+ "loss": 0.0324,
+ "step": 333
+ },
+ {
+ "epoch": 1.1722319859402461,
+ "grad_norm": 0.1501559168100357,
+ "learning_rate": 9.709543568464731e-05,
+ "loss": 0.0433,
+ "step": 334
+ },
+ {
+ "epoch": 1.1757469244288226,
+ "grad_norm": 0.1217423677444458,
+ "learning_rate": 9.668049792531121e-05,
+ "loss": 0.0325,
+ "step": 335
+ },
+ {
+ "epoch": 1.179261862917399,
+ "grad_norm": 0.21164333820343018,
+ "learning_rate": 9.626556016597512e-05,
+ "loss": 0.0331,
+ "step": 336
+ },
+ {
+ "epoch": 1.1827768014059754,
+ "grad_norm": 0.08103303611278534,
+ "learning_rate": 9.585062240663902e-05,
+ "loss": 0.0242,
+ "step": 337
+ },
+ {
+ "epoch": 1.1862917398945518,
+ "grad_norm": 0.09643135219812393,
+ "learning_rate": 9.543568464730292e-05,
+ "loss": 0.0301,
+ "step": 338
+ },
+ {
+ "epoch": 1.1898066783831283,
+ "grad_norm": 0.11211568117141724,
+ "learning_rate": 9.50207468879668e-05,
+ "loss": 0.0355,
+ "step": 339
+ },
+ {
+ "epoch": 1.1933216168717047,
+ "grad_norm": 0.08113870769739151,
+ "learning_rate": 9.460580912863071e-05,
+ "loss": 0.0255,
+ "step": 340
+ },
+ {
+ "epoch": 1.196836555360281,
+ "grad_norm": 0.10726438462734222,
+ "learning_rate": 9.419087136929461e-05,
+ "loss": 0.035,
+ "step": 341
+ },
+ {
+ "epoch": 1.2003514938488578,
+ "grad_norm": 0.09150339663028717,
+ "learning_rate": 9.377593360995851e-05,
+ "loss": 0.0335,
+ "step": 342
+ },
+ {
+ "epoch": 1.2038664323374342,
+ "grad_norm": 0.10595715790987015,
+ "learning_rate": 9.336099585062241e-05,
+ "loss": 0.0379,
+ "step": 343
+ },
+ {
+ "epoch": 1.2073813708260106,
+ "grad_norm": 0.0731506273150444,
+ "learning_rate": 9.294605809128631e-05,
+ "loss": 0.0232,
+ "step": 344
+ },
+ {
+ "epoch": 1.210896309314587,
+ "grad_norm": 0.08540864288806915,
+ "learning_rate": 9.253112033195021e-05,
2447
+ "loss": 0.0223,
2448
+ "step": 345
2449
+ },
2450
+ {
2451
+ "epoch": 1.2144112478031635,
2452
+ "grad_norm": 0.07775469869375229,
2453
+ "learning_rate": 9.21161825726141e-05,
2454
+ "loss": 0.0204,
2455
+ "step": 346
2456
+ },
2457
+ {
2458
+ "epoch": 1.2179261862917399,
2459
+ "grad_norm": 0.10598568618297577,
2460
+ "learning_rate": 9.1701244813278e-05,
2461
+ "loss": 0.0312,
2462
+ "step": 347
2463
+ },
2464
+ {
2465
+ "epoch": 1.2214411247803163,
2466
+ "grad_norm": 0.1299615204334259,
2467
+ "learning_rate": 9.128630705394192e-05,
2468
+ "loss": 0.0308,
2469
+ "step": 348
2470
+ },
2471
+ {
2472
+ "epoch": 1.2249560632688927,
2473
+ "grad_norm": 0.10963129252195358,
2474
+ "learning_rate": 9.087136929460582e-05,
2475
+ "loss": 0.0267,
2476
+ "step": 349
2477
+ },
2478
+ {
2479
+ "epoch": 1.2284710017574691,
2480
+ "grad_norm": 0.24104978144168854,
2481
+ "learning_rate": 9.045643153526972e-05,
2482
+ "loss": 0.0614,
2483
+ "step": 350
2484
+ },
2485
+ {
2486
+ "epoch": 1.2319859402460458,
2487
+ "grad_norm": 0.060414571315050125,
2488
+ "learning_rate": 9.004149377593362e-05,
2489
+ "loss": 0.0197,
2490
+ "step": 351
2491
+ },
2492
+ {
2493
+ "epoch": 1.2355008787346222,
2494
+ "grad_norm": 0.10535873472690582,
2495
+ "learning_rate": 8.962655601659752e-05,
2496
+ "loss": 0.0348,
2497
+ "step": 352
2498
+ },
2499
+ {
2500
+ "epoch": 1.2390158172231986,
2501
+ "grad_norm": 0.11544504016637802,
2502
+ "learning_rate": 8.921161825726142e-05,
2503
+ "loss": 0.0266,
2504
+ "step": 353
2505
+ },
2506
+ {
2507
+ "epoch": 1.242530755711775,
2508
+ "grad_norm": 0.10720725357532501,
2509
+ "learning_rate": 8.879668049792531e-05,
2510
+ "loss": 0.0345,
2511
+ "step": 354
2512
+ },
2513
+ {
2514
+ "epoch": 1.2460456942003515,
2515
+ "grad_norm": 0.1170039027929306,
2516
+ "learning_rate": 8.838174273858921e-05,
2517
+ "loss": 0.043,
2518
+ "step": 355
2519
+ },
2520
+ {
2521
+ "epoch": 1.249560632688928,
2522
+ "grad_norm": 0.08738044649362564,
2523
+ "learning_rate": 8.796680497925311e-05,
2524
+ "loss": 0.0277,
2525
+ "step": 356
2526
+ },
2527
+ {
2528
+ "epoch": 1.2530755711775043,
2529
+ "grad_norm": 0.11182297021150589,
2530
+ "learning_rate": 8.755186721991701e-05,
2531
+ "loss": 0.0273,
2532
+ "step": 357
2533
+ },
2534
+ {
2535
+ "epoch": 1.2565905096660808,
2536
+ "grad_norm": 0.09249255061149597,
2537
+ "learning_rate": 8.713692946058092e-05,
2538
+ "loss": 0.0337,
2539
+ "step": 358
2540
+ },
2541
+ {
2542
+ "epoch": 1.2601054481546572,
2543
+ "grad_norm": 0.10308247059583664,
2544
+ "learning_rate": 8.672199170124482e-05,
2545
+ "loss": 0.034,
2546
+ "step": 359
2547
+ },
2548
+ {
2549
+ "epoch": 1.2636203866432338,
2550
+ "grad_norm": 0.0934724286198616,
2551
+ "learning_rate": 8.630705394190872e-05,
2552
+ "loss": 0.0274,
2553
+ "step": 360
2554
+ },
2555
+ {
2556
+ "epoch": 1.2671353251318103,
2557
+ "grad_norm": 0.08682486414909363,
2558
+ "learning_rate": 8.58921161825726e-05,
2559
+ "loss": 0.034,
2560
+ "step": 361
2561
+ },
2562
+ {
2563
+ "epoch": 1.2706502636203867,
2564
+ "grad_norm": 0.17702794075012207,
2565
+ "learning_rate": 8.547717842323652e-05,
2566
+ "loss": 0.0846,
2567
+ "step": 362
2568
+ },
2569
+ {
2570
+ "epoch": 1.2741652021089631,
2571
+ "grad_norm": 0.13790243864059448,
2572
+ "learning_rate": 8.506224066390042e-05,
2573
+ "loss": 0.0346,
2574
+ "step": 363
2575
+ },
2576
+ {
2577
+ "epoch": 1.2776801405975395,
2578
+ "grad_norm": 0.08539144694805145,
2579
+ "learning_rate": 8.464730290456432e-05,
2580
+ "loss": 0.0283,
2581
+ "step": 364
2582
+ },
2583
+ {
2584
+ "epoch": 1.281195079086116,
2585
+ "grad_norm": 0.09345310926437378,
2586
+ "learning_rate": 8.423236514522822e-05,
2587
+ "loss": 0.0283,
2588
+ "step": 365
2589
+ },
2590
+ {
2591
+ "epoch": 1.2847100175746924,
2592
+ "grad_norm": 0.1070915013551712,
2593
+ "learning_rate": 8.381742738589213e-05,
2594
+ "loss": 0.0334,
2595
+ "step": 366
2596
+ },
2597
+ {
2598
+ "epoch": 1.2882249560632688,
2599
+ "grad_norm": 0.1502905786037445,
2600
+ "learning_rate": 8.340248962655603e-05,
2601
+ "loss": 0.0511,
2602
+ "step": 367
2603
+ },
2604
+ {
2605
+ "epoch": 1.2917398945518452,
2606
+ "grad_norm": 0.09904105961322784,
2607
+ "learning_rate": 8.298755186721993e-05,
2608
+ "loss": 0.0291,
2609
+ "step": 368
2610
+ },
2611
+ {
2612
+ "epoch": 1.2952548330404219,
2613
+ "grad_norm": 0.1428239345550537,
2614
+ "learning_rate": 8.257261410788382e-05,
2615
+ "loss": 0.0339,
2616
+ "step": 369
2617
+ },
2618
+ {
2619
+ "epoch": 1.2987697715289983,
2620
+ "grad_norm": 0.08671260625123978,
2621
+ "learning_rate": 8.215767634854772e-05,
2622
+ "loss": 0.0288,
2623
+ "step": 370
2624
+ },
2625
+ {
2626
+ "epoch": 1.3022847100175747,
2627
+ "grad_norm": 0.1576804518699646,
2628
+ "learning_rate": 8.174273858921162e-05,
2629
+ "loss": 0.0707,
2630
+ "step": 371
2631
+ },
2632
+ {
2633
+ "epoch": 1.3057996485061512,
2634
+ "grad_norm": 0.10015079379081726,
2635
+ "learning_rate": 8.132780082987552e-05,
2636
+ "loss": 0.0354,
2637
+ "step": 372
2638
+ },
2639
+ {
2640
+ "epoch": 1.3093145869947276,
2641
+ "grad_norm": 0.11030863225460052,
2642
+ "learning_rate": 8.091286307053942e-05,
2643
+ "loss": 0.0349,
2644
+ "step": 373
2645
+ },
2646
+ {
2647
+ "epoch": 1.312829525483304,
2648
+ "grad_norm": 0.09071624279022217,
2649
+ "learning_rate": 8.049792531120332e-05,
2650
+ "loss": 0.0315,
2651
+ "step": 374
2652
+ },
2653
+ {
2654
+ "epoch": 1.3163444639718804,
2655
+ "grad_norm": 0.12986351549625397,
2656
+ "learning_rate": 8.008298755186722e-05,
2657
+ "loss": 0.0415,
2658
+ "step": 375
2659
+ },
2660
+ {
2661
+ "epoch": 1.319859402460457,
2662
+ "grad_norm": 0.09092254191637039,
2663
+ "learning_rate": 7.966804979253112e-05,
2664
+ "loss": 0.0327,
2665
+ "step": 376
2666
+ },
2667
+ {
2668
+ "epoch": 1.3233743409490333,
2669
+ "grad_norm": 0.10521815717220306,
2670
+ "learning_rate": 7.925311203319503e-05,
2671
+ "loss": 0.0288,
2672
+ "step": 377
2673
+ },
2674
+ {
2675
+ "epoch": 1.32688927943761,
2676
+ "grad_norm": 0.1093856543302536,
2677
+ "learning_rate": 7.883817427385893e-05,
2678
+ "loss": 0.0323,
2679
+ "step": 378
2680
+ },
2681
+ {
2682
+ "epoch": 1.3304042179261863,
2683
+ "grad_norm": 0.08131673187017441,
2684
+ "learning_rate": 7.842323651452283e-05,
2685
+ "loss": 0.031,
2686
+ "step": 379
2687
+ },
2688
+ {
2689
+ "epoch": 1.3339191564147628,
2690
+ "grad_norm": 0.11505994945764542,
2691
+ "learning_rate": 7.800829875518673e-05,
2692
+ "loss": 0.0389,
2693
+ "step": 380
2694
+ },
2695
+ {
2696
+ "epoch": 1.3374340949033392,
2697
+ "grad_norm": 0.11734279245138168,
2698
+ "learning_rate": 7.759336099585063e-05,
2699
+ "loss": 0.0347,
2700
+ "step": 381
2701
+ },
2702
+ {
2703
+ "epoch": 1.3409490333919156,
2704
+ "grad_norm": 0.1097937524318695,
2705
+ "learning_rate": 7.717842323651453e-05,
2706
+ "loss": 0.0345,
2707
+ "step": 382
2708
+ },
2709
+ {
2710
+ "epoch": 1.344463971880492,
2711
+ "grad_norm": 0.08438119292259216,
2712
+ "learning_rate": 7.676348547717843e-05,
2713
+ "loss": 0.0214,
2714
+ "step": 383
2715
+ },
2716
+ {
2717
+ "epoch": 1.3479789103690685,
2718
+ "grad_norm": 0.08465591818094254,
2719
+ "learning_rate": 7.634854771784232e-05,
2720
+ "loss": 0.0306,
2721
+ "step": 384
2722
+ },
2723
+ {
2724
+ "epoch": 1.3514938488576451,
2725
+ "grad_norm": 0.08609631657600403,
2726
+ "learning_rate": 7.593360995850622e-05,
2727
+ "loss": 0.0214,
2728
+ "step": 385
2729
+ },
2730
+ {
2731
+ "epoch": 1.3550087873462213,
2732
+ "grad_norm": 0.10714844614267349,
2733
+ "learning_rate": 7.551867219917012e-05,
2734
+ "loss": 0.0272,
2735
+ "step": 386
2736
+ },
2737
+ {
2738
+ "epoch": 1.358523725834798,
2739
+ "grad_norm": 0.0996679738163948,
2740
+ "learning_rate": 7.510373443983402e-05,
2741
+ "loss": 0.0151,
2742
+ "step": 387
2743
+ },
2744
+ {
2745
+ "epoch": 1.3620386643233744,
2746
+ "grad_norm": 0.1393178105354309,
2747
+ "learning_rate": 7.468879668049793e-05,
2748
+ "loss": 0.0475,
2749
+ "step": 388
2750
+ },
2751
+ {
2752
+ "epoch": 1.3655536028119508,
2753
+ "grad_norm": 0.1370774358510971,
2754
+ "learning_rate": 7.427385892116183e-05,
2755
+ "loss": 0.0249,
2756
+ "step": 389
2757
+ },
2758
+ {
2759
+ "epoch": 1.3690685413005272,
2760
+ "grad_norm": 0.08210641145706177,
2761
+ "learning_rate": 7.385892116182573e-05,
2762
+ "loss": 0.0256,
2763
+ "step": 390
2764
+ },
2765
+ {
2766
+ "epoch": 1.3725834797891037,
2767
+ "grad_norm": 0.10344867408275604,
2768
+ "learning_rate": 7.344398340248963e-05,
2769
+ "loss": 0.0368,
2770
+ "step": 391
2771
+ },
2772
+ {
2773
+ "epoch": 1.37609841827768,
2774
+ "grad_norm": 0.11747119575738907,
2775
+ "learning_rate": 7.302904564315353e-05,
2776
+ "loss": 0.0438,
2777
+ "step": 392
2778
+ },
2779
+ {
2780
+ "epoch": 1.3796133567662565,
2781
+ "grad_norm": 0.0993000715970993,
2782
+ "learning_rate": 7.261410788381743e-05,
2783
+ "loss": 0.026,
2784
+ "step": 393
2785
+ },
2786
+ {
2787
+ "epoch": 1.3831282952548332,
2788
+ "grad_norm": 0.104093037545681,
2789
+ "learning_rate": 7.219917012448133e-05,
2790
+ "loss": 0.0276,
2791
+ "step": 394
2792
+ },
2793
+ {
2794
+ "epoch": 1.3866432337434094,
2795
+ "grad_norm": 0.11269046366214752,
2796
+ "learning_rate": 7.178423236514523e-05,
2797
+ "loss": 0.0242,
2798
+ "step": 395
2799
+ },
2800
+ {
2801
+ "epoch": 1.390158172231986,
2802
+ "grad_norm": 0.10255884379148483,
2803
+ "learning_rate": 7.136929460580914e-05,
2804
+ "loss": 0.0256,
2805
+ "step": 396
2806
+ },
2807
+ {
2808
+ "epoch": 1.3936731107205624,
2809
+ "grad_norm": 0.1722956746816635,
2810
+ "learning_rate": 7.095435684647304e-05,
2811
+ "loss": 0.0498,
2812
+ "step": 397
2813
+ },
2814
+ {
2815
+ "epoch": 1.3971880492091389,
2816
+ "grad_norm": 0.11679506301879883,
2817
+ "learning_rate": 7.053941908713694e-05,
2818
+ "loss": 0.038,
2819
+ "step": 398
2820
+ },
2821
+ {
2822
+ "epoch": 1.4007029876977153,
2823
+ "grad_norm": 0.08600778132677078,
2824
+ "learning_rate": 7.012448132780083e-05,
2825
+ "loss": 0.0279,
2826
+ "step": 399
2827
+ },
2828
+ {
2829
+ "epoch": 1.4042179261862917,
2830
+ "grad_norm": 0.07060538977384567,
2831
+ "learning_rate": 6.970954356846473e-05,
2832
+ "loss": 0.0233,
2833
+ "step": 400
2834
+ },
2835
+ {
2836
+ "epoch": 1.4042179261862917,
2837
+ "eval_loss": 0.0394415408372879,
2838
+ "eval_runtime": 178.4848,
2839
+ "eval_samples_per_second": 2.835,
2840
+ "eval_steps_per_second": 0.359,
2841
+ "step": 400
2842
+ },
2843
+ {
2844
+ "epoch": 1.4077328646748681,
2845
+ "grad_norm": 0.09761733561754227,
2846
+ "learning_rate": 6.929460580912863e-05,
2847
+ "loss": 0.0322,
2848
+ "step": 401
2849
+ },
2850
+ {
2851
+ "epoch": 1.4112478031634446,
2852
+ "grad_norm": 0.08879027515649796,
2853
+ "learning_rate": 6.887966804979253e-05,
2854
+ "loss": 0.0288,
2855
+ "step": 402
2856
+ },
2857
+ {
2858
+ "epoch": 1.4147627416520212,
2859
+ "grad_norm": 0.11523272097110748,
2860
+ "learning_rate": 6.846473029045643e-05,
2861
+ "loss": 0.0291,
2862
+ "step": 403
2863
+ },
2864
+ {
2865
+ "epoch": 1.4182776801405974,
2866
+ "grad_norm": 0.09919934719800949,
2867
+ "learning_rate": 6.804979253112033e-05,
2868
+ "loss": 0.0381,
2869
+ "step": 404
2870
+ },
2871
+ {
2872
+ "epoch": 1.421792618629174,
2873
+ "grad_norm": 0.07385391741991043,
2874
+ "learning_rate": 6.763485477178423e-05,
2875
+ "loss": 0.0272,
2876
+ "step": 405
2877
+ },
2878
+ {
2879
+ "epoch": 1.4253075571177505,
2880
+ "grad_norm": 0.07892516255378723,
2881
+ "learning_rate": 6.721991701244813e-05,
2882
+ "loss": 0.0254,
2883
+ "step": 406
2884
+ },
2885
+ {
2886
+ "epoch": 1.428822495606327,
2887
+ "grad_norm": 0.1717812418937683,
2888
+ "learning_rate": 6.680497925311204e-05,
2889
+ "loss": 0.0598,
2890
+ "step": 407
2891
+ },
2892
+ {
2893
+ "epoch": 1.4323374340949033,
2894
+ "grad_norm": 0.10333540290594101,
2895
+ "learning_rate": 6.639004149377594e-05,
2896
+ "loss": 0.0321,
2897
+ "step": 408
2898
+ },
2899
+ {
2900
+ "epoch": 1.4358523725834798,
2901
+ "grad_norm": 0.10540790855884552,
2902
+ "learning_rate": 6.597510373443984e-05,
2903
+ "loss": 0.0316,
2904
+ "step": 409
2905
+ },
2906
+ {
2907
+ "epoch": 1.4393673110720562,
2908
+ "grad_norm": 0.09513910859823227,
2909
+ "learning_rate": 6.556016597510374e-05,
2910
+ "loss": 0.0378,
2911
+ "step": 410
2912
+ },
2913
+ {
2914
+ "epoch": 1.4428822495606326,
2915
+ "grad_norm": 0.13851504027843475,
2916
+ "learning_rate": 6.514522821576764e-05,
2917
+ "loss": 0.0336,
2918
+ "step": 411
2919
+ },
2920
+ {
2921
+ "epoch": 1.4463971880492092,
2922
+ "grad_norm": 0.09247998148202896,
2923
+ "learning_rate": 6.473029045643154e-05,
2924
+ "loss": 0.0362,
2925
+ "step": 412
2926
+ },
2927
+ {
2928
+ "epoch": 1.4499121265377855,
2929
+ "grad_norm": 0.087465301156044,
2930
+ "learning_rate": 6.431535269709544e-05,
2931
+ "loss": 0.0301,
2932
+ "step": 413
2933
+ },
2934
+ {
2935
+ "epoch": 1.453427065026362,
2936
+ "grad_norm": 0.06486547738313675,
2937
+ "learning_rate": 6.390041493775933e-05,
2938
+ "loss": 0.0218,
2939
+ "step": 414
2940
+ },
2941
+ {
2942
+ "epoch": 1.4569420035149385,
2943
+ "grad_norm": 0.07497644424438477,
2944
+ "learning_rate": 6.348547717842323e-05,
2945
+ "loss": 0.0279,
2946
+ "step": 415
2947
+ },
2948
+ {
2949
+ "epoch": 1.460456942003515,
2950
+ "grad_norm": 0.09684702008962631,
2951
+ "learning_rate": 6.307053941908713e-05,
2952
+ "loss": 0.0382,
2953
+ "step": 416
2954
+ },
2955
+ {
2956
+ "epoch": 1.4639718804920914,
2957
+ "grad_norm": 0.10948526114225388,
2958
+ "learning_rate": 6.265560165975103e-05,
2959
+ "loss": 0.0336,
2960
+ "step": 417
2961
+ },
2962
+ {
2963
+ "epoch": 1.4674868189806678,
2964
+ "grad_norm": 0.08034105598926544,
2965
+ "learning_rate": 6.224066390041494e-05,
2966
+ "loss": 0.024,
2967
+ "step": 418
2968
+ },
2969
+ {
2970
+ "epoch": 1.4710017574692442,
2971
+ "grad_norm": 0.04268701747059822,
2972
+ "learning_rate": 6.182572614107884e-05,
2973
+ "loss": 0.0157,
2974
+ "step": 419
2975
+ },
2976
+ {
2977
+ "epoch": 1.4745166959578206,
2978
+ "grad_norm": 0.0965370312333107,
2979
+ "learning_rate": 6.141078838174274e-05,
2980
+ "loss": 0.0287,
2981
+ "step": 420
2982
+ },
2983
+ {
2984
+ "epoch": 1.4780316344463973,
2985
+ "grad_norm": 0.08190871775150299,
2986
+ "learning_rate": 6.099585062240665e-05,
2987
+ "loss": 0.0298,
2988
+ "step": 421
2989
+ },
2990
+ {
2991
+ "epoch": 1.4815465729349737,
2992
+ "grad_norm": 0.08082269132137299,
2993
+ "learning_rate": 6.058091286307054e-05,
2994
+ "loss": 0.0262,
2995
+ "step": 422
2996
+ },
2997
+ {
2998
+ "epoch": 1.4850615114235501,
2999
+ "grad_norm": 0.09048648923635483,
3000
+ "learning_rate": 6.016597510373444e-05,
3001
+ "loss": 0.0256,
3002
+ "step": 423
3003
+ },
3004
+ {
3005
+ "epoch": 1.4885764499121266,
3006
+ "grad_norm": 0.07813292741775513,
3007
+ "learning_rate": 5.9751037344398344e-05,
3008
+ "loss": 0.0329,
3009
+ "step": 424
3010
+ },
3011
+ {
3012
+ "epoch": 1.492091388400703,
3013
+ "grad_norm": 0.09163984656333923,
3014
+ "learning_rate": 5.9336099585062245e-05,
3015
+ "loss": 0.025,
3016
+ "step": 425
3017
+ },
3018
+ {
3019
+ "epoch": 1.4956063268892794,
3020
+ "grad_norm": 0.08940015733242035,
3021
+ "learning_rate": 5.8921161825726146e-05,
3022
+ "loss": 0.0238,
3023
+ "step": 426
3024
+ },
3025
+ {
3026
+ "epoch": 1.4991212653778558,
3027
+ "grad_norm": 0.11653712391853333,
3028
+ "learning_rate": 5.850622406639005e-05,
3029
+ "loss": 0.0277,
3030
+ "step": 427
3031
+ },
3032
+ {
3033
+ "epoch": 1.5026362038664325,
3034
+ "grad_norm": 0.10306468605995178,
3035
+ "learning_rate": 5.809128630705395e-05,
3036
+ "loss": 0.0293,
3037
+ "step": 428
3038
+ },
3039
+ {
3040
+ "epoch": 1.5061511423550087,
3041
+ "grad_norm": 0.10990747809410095,
3042
+ "learning_rate": 5.767634854771784e-05,
3043
+ "loss": 0.0283,
3044
+ "step": 429
3045
+ },
3046
+ {
3047
+ "epoch": 1.5096660808435853,
3048
+ "grad_norm": 0.08972524851560593,
3049
+ "learning_rate": 5.7261410788381744e-05,
3050
+ "loss": 0.0221,
3051
+ "step": 430
3052
+ },
3053
+ {
3054
+ "epoch": 1.5131810193321615,
3055
+ "grad_norm": 0.08488809317350388,
3056
+ "learning_rate": 5.6846473029045646e-05,
3057
+ "loss": 0.0225,
3058
+ "step": 431
3059
+ },
3060
+ {
3061
+ "epoch": 1.5166959578207382,
3062
+ "grad_norm": 0.10268191993236542,
3063
+ "learning_rate": 5.643153526970955e-05,
3064
+ "loss": 0.0259,
3065
+ "step": 432
3066
+ },
3067
+ {
3068
+ "epoch": 1.5202108963093146,
3069
+ "grad_norm": 0.0738985687494278,
3070
+ "learning_rate": 5.601659751037345e-05,
3071
+ "loss": 0.0246,
3072
+ "step": 433
3073
+ },
3074
+ {
3075
+ "epoch": 1.523725834797891,
3076
+ "grad_norm": 0.09742774814367294,
3077
+ "learning_rate": 5.560165975103735e-05,
3078
+ "loss": 0.0262,
3079
+ "step": 434
3080
+ },
3081
+ {
3082
+ "epoch": 1.5272407732864675,
3083
+ "grad_norm": 0.18726953864097595,
3084
+ "learning_rate": 5.518672199170125e-05,
3085
+ "loss": 0.0661,
3086
+ "step": 435
3087
+ },
3088
+ {
3089
+ "epoch": 1.5307557117750439,
3090
+ "grad_norm": 0.08196069300174713,
3091
+ "learning_rate": 5.477178423236515e-05,
3092
+ "loss": 0.0251,
3093
+ "step": 436
3094
+ },
3095
+ {
3096
+ "epoch": 1.5342706502636205,
3097
+ "grad_norm": 0.11763795465230942,
3098
+ "learning_rate": 5.4356846473029046e-05,
3099
+ "loss": 0.0365,
3100
+ "step": 437
3101
+ },
3102
+ {
3103
+ "epoch": 1.5377855887521967,
3104
+ "grad_norm": 0.08862517774105072,
3105
+ "learning_rate": 5.394190871369295e-05,
3106
+ "loss": 0.0281,
3107
+ "step": 438
3108
+ },
3109
+ {
3110
+ "epoch": 1.5413005272407734,
3111
+ "grad_norm": 0.08350022882223129,
3112
+ "learning_rate": 5.352697095435685e-05,
3113
+ "loss": 0.025,
3114
+ "step": 439
3115
+ },
3116
+ {
3117
+ "epoch": 1.5448154657293496,
3118
+ "grad_norm": 0.07232296466827393,
3119
+ "learning_rate": 5.311203319502075e-05,
3120
+ "loss": 0.0229,
3121
+ "step": 440
3122
+ },
3123
+ {
3124
+ "epoch": 1.5483304042179262,
3125
+ "grad_norm": 0.07918655872344971,
3126
+ "learning_rate": 5.269709543568465e-05,
3127
+ "loss": 0.0222,
3128
+ "step": 441
3129
+ },
3130
+ {
3131
+ "epoch": 1.5518453427065027,
3132
+ "grad_norm": 0.1173887774348259,
3133
+ "learning_rate": 5.228215767634855e-05,
3134
+ "loss": 0.0313,
3135
+ "step": 442
3136
+ },
3137
+ {
3138
+ "epoch": 1.555360281195079,
3139
+ "grad_norm": 0.08697450160980225,
3140
+ "learning_rate": 5.1867219917012454e-05,
3141
+ "loss": 0.0237,
3142
+ "step": 443
3143
+ },
3144
+ {
3145
+ "epoch": 1.5588752196836555,
3146
+ "grad_norm": 0.10145950317382812,
3147
+ "learning_rate": 5.145228215767635e-05,
3148
+ "loss": 0.027,
3149
+ "step": 444
3150
+ },
3151
+ {
3152
+ "epoch": 1.562390158172232,
3153
+ "grad_norm": 0.13053980469703674,
3154
+ "learning_rate": 5.103734439834025e-05,
3155
+ "loss": 0.0372,
3156
+ "step": 445
3157
+ },
3158
+ {
3159
+ "epoch": 1.5659050966608086,
3160
+ "grad_norm": 0.11387127637863159,
3161
+ "learning_rate": 5.062240663900415e-05,
3162
+ "loss": 0.037,
3163
+ "step": 446
3164
+ },
3165
+ {
3166
+ "epoch": 1.5694200351493848,
3167
+ "grad_norm": 0.09816371649503708,
3168
+ "learning_rate": 5.020746887966805e-05,
3169
+ "loss": 0.0319,
3170
+ "step": 447
3171
+ },
3172
+ {
3173
+ "epoch": 1.5729349736379614,
3174
+ "grad_norm": 0.07378123700618744,
3175
+ "learning_rate": 4.979253112033195e-05,
3176
+ "loss": 0.0258,
3177
+ "step": 448
3178
+ },
3179
+ {
3180
+ "epoch": 1.5764499121265376,
3181
+ "grad_norm": 0.0998382717370987,
3182
+ "learning_rate": 4.9377593360995854e-05,
3183
+ "loss": 0.0402,
3184
+ "step": 449
3185
+ },
3186
+ {
3187
+ "epoch": 1.5799648506151143,
3188
+ "grad_norm": 0.053047847002744675,
3189
+ "learning_rate": 4.896265560165975e-05,
3190
+ "loss": 0.0193,
3191
+ "step": 450
3192
+ },
3193
+ {
3194
+ "epoch": 1.5834797891036907,
3195
+ "grad_norm": 0.12353768944740295,
3196
+ "learning_rate": 4.854771784232366e-05,
3197
+ "loss": 0.038,
3198
+ "step": 451
3199
+ },
3200
+ {
3201
+ "epoch": 1.5869947275922671,
3202
+ "grad_norm": 0.10450875759124756,
3203
+ "learning_rate": 4.813278008298756e-05,
3204
+ "loss": 0.0353,
3205
+ "step": 452
3206
+ },
3207
+ {
3208
+ "epoch": 1.5905096660808435,
3209
+ "grad_norm": 0.11111406981945038,
3210
+ "learning_rate": 4.771784232365146e-05,
3211
+ "loss": 0.0361,
3212
+ "step": 453
3213
+ },
3214
+ {
3215
+ "epoch": 1.59402460456942,
3216
+ "grad_norm": 0.08408033102750778,
3217
+ "learning_rate": 4.7302904564315354e-05,
3218
+ "loss": 0.0239,
3219
+ "step": 454
3220
+ },
3221
+ {
3222
+ "epoch": 1.5975395430579966,
3223
+ "grad_norm": 0.08122161030769348,
3224
+ "learning_rate": 4.6887966804979255e-05,
3225
+ "loss": 0.0271,
3226
+ "step": 455
3227
+ },
3228
+ {
3229
+ "epoch": 1.6010544815465728,
3230
+ "grad_norm": 0.1017850786447525,
3231
+ "learning_rate": 4.6473029045643156e-05,
3232
+ "loss": 0.0358,
3233
+ "step": 456
3234
+ },
3235
+ {
3236
+ "epoch": 1.6045694200351495,
3237
+ "grad_norm": 0.09576894342899323,
3238
+ "learning_rate": 4.605809128630705e-05,
3239
+ "loss": 0.0307,
3240
+ "step": 457
3241
+ },
3242
+ {
3243
+ "epoch": 1.6080843585237259,
3244
+ "grad_norm": 0.1270962357521057,
3245
+ "learning_rate": 4.564315352697096e-05,
3246
+ "loss": 0.0323,
3247
+ "step": 458
3248
+ },
3249
+ {
3250
+ "epoch": 1.6115992970123023,
3251
+ "grad_norm": 0.09580868482589722,
3252
+ "learning_rate": 4.522821576763486e-05,
3253
+ "loss": 0.0379,
3254
+ "step": 459
3255
+ },
3256
+ {
3257
+ "epoch": 1.6151142355008787,
3258
+ "grad_norm": 0.10440625250339508,
3259
+ "learning_rate": 4.481327800829876e-05,
3260
+ "loss": 0.041,
3261
+ "step": 460
3262
+ },
3263
+ {
3264
+ "epoch": 1.6186291739894552,
3265
+ "grad_norm": 0.12036912888288498,
3266
+ "learning_rate": 4.4398340248962656e-05,
3267
+ "loss": 0.0365,
3268
+ "step": 461
3269
+ },
3270
+ {
3271
+ "epoch": 1.6221441124780316,
3272
+ "grad_norm": 0.06830941140651703,
3273
+ "learning_rate": 4.398340248962656e-05,
3274
+ "loss": 0.0224,
3275
+ "step": 462
3276
+ },
3277
+ {
3278
+ "epoch": 1.625659050966608,
3279
+ "grad_norm": 0.08766771107912064,
3280
+ "learning_rate": 4.356846473029046e-05,
3281
+ "loss": 0.0349,
3282
+ "step": 463
3283
+ },
3284
+ {
3285
+ "epoch": 1.6291739894551847,
3286
+ "grad_norm": 0.12244472652673721,
3287
+ "learning_rate": 4.315352697095436e-05,
3288
+ "loss": 0.0305,
3289
+ "step": 464
3290
+ },
3291
+ {
3292
+ "epoch": 1.6326889279437609,
3293
+ "grad_norm": 0.06699530780315399,
3294
+ "learning_rate": 4.273858921161826e-05,
3295
+ "loss": 0.0213,
3296
+ "step": 465
3297
+ },
3298
+ {
3299
+ "epoch": 1.6362038664323375,
3300
+ "grad_norm": 0.04710953310132027,
3301
+ "learning_rate": 4.232365145228216e-05,
3302
+ "loss": 0.0172,
3303
+ "step": 466
3304
+ },
3305
+ {
3306
+ "epoch": 1.639718804920914,
3307
+ "grad_norm": 0.03541887179017067,
3308
+ "learning_rate": 4.190871369294606e-05,
3309
+ "loss": 0.0145,
3310
+ "step": 467
3311
+ },
3312
+ {
3313
+ "epoch": 1.6432337434094904,
3314
+ "grad_norm": 0.08300907164812088,
3315
+ "learning_rate": 4.1493775933609964e-05,
3316
+ "loss": 0.0186,
3317
+ "step": 468
3318
+ },
3319
+ {
3320
+ "epoch": 1.6467486818980668,
3321
+ "grad_norm": 0.08546904474496841,
3322
+ "learning_rate": 4.107883817427386e-05,
3323
+ "loss": 0.0263,
3324
+ "step": 469
3325
+ },
3326
+ {
3327
+ "epoch": 1.6502636203866432,
3328
+ "grad_norm": 0.0735994502902031,
3329
+ "learning_rate": 4.066390041493776e-05,
3330
+ "loss": 0.0216,
3331
+ "step": 470
3332
+ },
3333
+ {
3334
+ "epoch": 1.6537785588752196,
3335
+ "grad_norm": 0.0836733877658844,
3336
+ "learning_rate": 4.024896265560166e-05,
3337
+ "loss": 0.0274,
3338
+ "step": 471
3339
+ },
3340
+ {
3341
+ "epoch": 1.657293497363796,
3342
+ "grad_norm": 0.07491277158260345,
3343
+ "learning_rate": 3.983402489626556e-05,
3344
+ "loss": 0.0225,
3345
+ "step": 472
3346
+ },
3347
+ {
3348
+ "epoch": 1.6608084358523727,
3349
+ "grad_norm": 0.10139133036136627,
3350
+ "learning_rate": 3.9419087136929464e-05,
3351
+ "loss": 0.0344,
3352
+ "step": 473
3353
+ },
3354
+ {
3355
+ "epoch": 1.664323374340949,
3356
+ "grad_norm": 0.06060206517577171,
3357
+ "learning_rate": 3.9004149377593365e-05,
3358
+ "loss": 0.0222,
3359
+ "step": 474
3360
+ },
3361
+ {
3362
+ "epoch": 1.6678383128295255,
3363
+ "grad_norm": 0.08016759902238846,
3364
+ "learning_rate": 3.8589211618257266e-05,
3365
+ "loss": 0.0199,
3366
+ "step": 475
3367
+ },
3368
+ {
3369
+ "epoch": 1.671353251318102,
3370
+ "grad_norm": 0.06554888933897018,
3371
+ "learning_rate": 3.817427385892116e-05,
3372
+ "loss": 0.0233,
3373
+ "step": 476
3374
+ },
3375
+ {
3376
+ "epoch": 1.6748681898066784,
3377
+ "grad_norm": 0.08745160698890686,
3378
+ "learning_rate": 3.775933609958506e-05,
3379
+ "loss": 0.0287,
3380
+ "step": 477
3381
+ },
3382
+ {
3383
+ "epoch": 1.6783831282952548,
3384
+ "grad_norm": 0.0867169126868248,
3385
+ "learning_rate": 3.734439834024896e-05,
3386
+ "loss": 0.0315,
3387
+ "step": 478
3388
+ },
3389
+ {
3390
+ "epoch": 1.6818980667838312,
3391
+ "grad_norm": 0.07407302409410477,
3392
+ "learning_rate": 3.6929460580912864e-05,
3393
+ "loss": 0.0253,
3394
+ "step": 479
3395
+ },
3396
+ {
3397
+ "epoch": 1.685413005272408,
3398
+ "grad_norm": 0.05431722104549408,
3399
+ "learning_rate": 3.6514522821576766e-05,
3400
+ "loss": 0.0167,
3401
+ "step": 480
3402
+ },
3403
+ {
3404
+ "epoch": 1.688927943760984,
3405
+ "grad_norm": 0.08641435205936432,
3406
+ "learning_rate": 3.609958506224067e-05,
3407
+ "loss": 0.0284,
3408
+ "step": 481
3409
+ },
3410
+ {
3411
+ "epoch": 1.6924428822495607,
3412
+ "grad_norm": 0.05755087360739708,
3413
+ "learning_rate": 3.568464730290457e-05,
3414
+ "loss": 0.0177,
3415
+ "step": 482
3416
+ },
3417
+ {
3418
+ "epoch": 1.695957820738137,
3419
+ "grad_norm": 0.057126596570014954,
3420
+ "learning_rate": 3.526970954356847e-05,
3421
+ "loss": 0.0187,
3422
+ "step": 483
3423
+ },
3424
+ {
3425
+ "epoch": 1.6994727592267136,
3426
+ "grad_norm": 0.13521793484687805,
3427
+ "learning_rate": 3.4854771784232364e-05,
3428
+ "loss": 0.0423,
3429
+ "step": 484
3430
+ },
3431
+ {
3432
+ "epoch": 1.70298769771529,
3433
+ "grad_norm": 0.06800301373004913,
3434
+ "learning_rate": 3.4439834024896265e-05,
3435
+ "loss": 0.0218,
3436
+ "step": 485
3437
+ },
3438
+ {
3439
+ "epoch": 1.7065026362038664,
3440
+ "grad_norm": 0.11221302300691605,
3441
+ "learning_rate": 3.4024896265560166e-05,
3442
+ "loss": 0.0274,
3443
+ "step": 486
3444
+ },
3445
+ {
3446
+ "epoch": 1.7100175746924429,
3447
+ "grad_norm": 0.07233195751905441,
3448
+ "learning_rate": 3.360995850622407e-05,
3449
+ "loss": 0.0251,
3450
+ "step": 487
3451
+ },
3452
+ {
3453
+ "epoch": 1.7135325131810193,
3454
+ "grad_norm": 0.08482959866523743,
3455
+ "learning_rate": 3.319502074688797e-05,
3456
+ "loss": 0.0183,
3457
+ "step": 488
3458
+ },
3459
+ {
3460
+ "epoch": 1.717047451669596,
3461
+ "grad_norm": 0.05164050683379173,
3462
+ "learning_rate": 3.278008298755187e-05,
3463
+ "loss": 0.0172,
3464
+ "step": 489
3465
+ },
3466
+ {
3467
+ "epoch": 1.7205623901581721,
3468
+ "grad_norm": 0.09971071034669876,
3469
+ "learning_rate": 3.236514522821577e-05,
3470
+ "loss": 0.0263,
3471
+ "step": 490
3472
+ },
3473
+ {
3474
+ "epoch": 1.7240773286467488,
3475
+ "grad_norm": 0.0788310170173645,
3476
+ "learning_rate": 3.1950207468879666e-05,
3477
+ "loss": 0.0256,
3478
+ "step": 491
3479
+ },
3480
+ {
3481
+ "epoch": 1.727592267135325,
3482
+ "grad_norm": 0.08644759654998779,
3483
+ "learning_rate": 3.153526970954357e-05,
3484
+ "loss": 0.0253,
3485
+ "step": 492
3486
+ },
3487
+ {
3488
+ "epoch": 1.7311072056239016,
3489
+ "grad_norm": 0.0552949421107769,
3490
+ "learning_rate": 3.112033195020747e-05,
3491
+ "loss": 0.0201,
3492
+ "step": 493
3493
+ },
3494
+ {
3495
+ "epoch": 1.734622144112478,
3496
+ "grad_norm": 0.12068801373243332,
3497
+ "learning_rate": 3.070539419087137e-05,
3498
+ "loss": 0.0324,
3499
+ "step": 494
3500
+ },
3501
+ {
3502
+ "epoch": 1.7381370826010545,
3503
+ "grad_norm": 0.0852043628692627,
3504
+ "learning_rate": 3.029045643153527e-05,
3505
+ "loss": 0.0251,
3506
+ "step": 495
3507
+ },
3508
+ {
3509
+ "epoch": 1.741652021089631,
3510
+ "grad_norm": 0.08057104051113129,
3511
+ "learning_rate": 2.9875518672199172e-05,
3512
+ "loss": 0.0225,
3513
+ "step": 496
3514
+ },
3515
+ {
3516
+ "epoch": 1.7451669595782073,
3517
+ "grad_norm": 0.06668426841497421,
3518
+ "learning_rate": 2.9460580912863073e-05,
3519
+ "loss": 0.0235,
3520
+ "step": 497
3521
+ },
3522
+ {
3523
+ "epoch": 1.748681898066784,
3524
+ "grad_norm": 0.046700526028871536,
3525
+ "learning_rate": 2.9045643153526974e-05,
3526
+ "loss": 0.0169,
3527
+ "step": 498
3528
+ },
3529
+ {
3530
+ "epoch": 1.7521968365553602,
3531
+ "grad_norm": 0.10106682032346725,
3532
+ "learning_rate": 2.8630705394190872e-05,
3533
+ "loss": 0.0271,
3534
+ "step": 499
3535
+ },
3536
+ {
3537
+ "epoch": 1.7557117750439368,
3538
+ "grad_norm": 0.0915568470954895,
3539
+ "learning_rate": 2.8215767634854773e-05,
3540
+ "loss": 0.0275,
3541
+ "step": 500
3542
+ },
3543
+ {
3544
+ "epoch": 1.7557117750439368,
3545
+ "eval_loss": 0.03437558189034462,
3546
+ "eval_runtime": 178.6138,
3547
+ "eval_samples_per_second": 2.833,
3548
+ "eval_steps_per_second": 0.358,
3549
+ "step": 500
3550
+ }
3551
+ ],
3552
+ "logging_steps": 1,
3553
+ "max_steps": 568,
3554
+ "num_input_tokens_seen": 0,
3555
+ "num_train_epochs": 2,
3556
+ "save_steps": 100,
3557
+ "stateful_callbacks": {
3558
+ "TrainerControl": {
3559
+ "args": {
3560
+ "should_epoch_stop": false,
3561
+ "should_evaluate": false,
3562
+ "should_log": false,
3563
+ "should_save": true,
3564
+ "should_training_stop": false
3565
+ },
3566
+ "attributes": {}
3567
+ }
3568
+ },
3569
+ "total_flos": 1.5333795292618875e+18,
3570
+ "train_batch_size": 8,
3571
+ "trial_name": null,
3572
+ "trial_params": null
3573
+ }
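The `learning_rate` values in this trainer state fall by a constant 2e-4/482 per step (e.g. 1.0e-4 at step 327, 9.9585e-5 at step 328) and would reach zero exactly at `max_steps` 568, which is consistent with a plain linear schedule. A minimal sketch that reproduces the logged values — the peak rate of 2e-4 and the 86-step warmup are inferred from the slope, not stated anywhere in the log:

```python
def linear_lr(step, peak_lr=2e-4, warmup_steps=86, total_steps=568):
    """Linear warmup followed by linear decay to zero.

    All hyperparameters here are assumptions inferred from the logged
    values: the decay slope is peak_lr / (total_steps - warmup_steps),
    i.e. 2e-4 / 482 per step, which matches the log entries above.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Consistent with the log: step 327 -> ~1.0e-4, step 328 -> ~9.9585e-5
```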
checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5317ae7c8fd1fdde83f11b34f1e8a0bbac31dca36c4ce0659dee88523cebf08b
3
+ size 5624
checkpoint-500/vocab.json ADDED
The diff for this file is too large to render. See raw diff
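The `trainer_state.json` files uploaded with each checkpoint carry the full `log_history` shown above, so the train/eval loss curves can be pulled out with a few lines of code. A minimal sketch, assuming only the layout the Hugging Face `Trainer` writes (the path argument is hypothetical):

```python
import json

def load_loss_curves(path):
    """Split a Trainer state file's log_history into {step: loss} maps,
    one for training loss and one for eval loss."""
    with open(path) as f:
        state = json.load(f)
    train, evals = {}, {}
    for entry in state.get("log_history", []):
        if "loss" in entry:
            train[entry["step"]] = entry["loss"]
        if "eval_loss" in entry:
            evals[entry["step"]] = entry["eval_loss"]
    return train, evals
```

For example, `load_loss_curves("checkpoint-500/trainer_state.json")` on this run would yield eval losses at steps 400 and 500 (0.0394 and 0.0344).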
 
checkpoint-568/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: unsloth/qwq-32b-preview-bnb-4bit
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.14.0
checkpoint-568/adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "unsloth/qwq-32b-preview-bnb-4bit",
+ "bias": "none",
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "o_proj",
+ "v_proj",
+ "gate_proj",
+ "down_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": true
+ }
checkpoint-568/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a89415edce0bc8e5e1d7061c79b5016468b445c5df4f6a0e4854de6b4abdb566
+ size 2147605960
checkpoint-568/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
checkpoint-568/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-568/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:74b59b0f81060a6bcb2d995e4705440aaaf41ab896964087829f1dadbf1ffa4b
+ size 1091573332
checkpoint-568/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e1c0fb523914a5a4fd99843319c531ac1b76196088779c16816fd1dd6a05915
+ size 14244
checkpoint-568/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c46b8151cf74e639931a65476d42e59be513c3e428f58e4cc322c87abf981835
+ size 1064
checkpoint-568/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-568/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
checkpoint-568/tokenizer_config.json ADDED
@@ -0,0 +1,209 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 32768,
+ "pad_token": "<|vision_pad|>",
+ "padding_side": "right",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
checkpoint-568/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-568/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5317ae7c8fd1fdde83f11b34f1e8a0bbac31dca36c4ce0659dee88523cebf08b
+ size 5624
checkpoint-568/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,209 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 32768,
+ "pad_token": "<|vision_pad|>",
+ "padding_side": "right",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff