tony24254 committed
Commit eb4f7d4 · verified · 1 Parent(s): 6407baa

Add files using upload-large-folder tool

Files changed (50):
  1. .gitkeep +1 -0
  2. Math_QA/group_09/adapter/README.md +202 -0
  3. Math_QA/group_09/adapter/adapter_config.json +34 -0
  4. Math_QA/group_09/adapter/added_tokens.json +24 -0
  5. Math_QA/group_09/adapter/chat_template.jinja +54 -0
  6. Math_QA/group_09/adapter/merges.txt +0 -0
  7. Math_QA/group_09/adapter/special_tokens_map.json +31 -0
  8. Math_QA/group_09/adapter/tokenizer_config.json +207 -0
  9. Math_QA/group_09/adapter/vocab.json +0 -0
  10. Math_QA/group_09/checkpoints/checkpoint-1200/adapter_config.json +34 -0
  11. Math_QA/group_09/checkpoints/checkpoint-1200/added_tokens.json +24 -0
  12. Math_QA/group_09/checkpoints/checkpoint-1200/chat_template.jinja +54 -0
  13. Math_QA/group_09/checkpoints/checkpoint-1200/merges.txt +0 -0
  14. Math_QA/group_09/checkpoints/checkpoint-1200/special_tokens_map.json +31 -0
  15. Math_QA/group_09/checkpoints/checkpoint-1500/added_tokens.json +24 -0
  16. Math_QA/group_09/checkpoints/checkpoint-1800/README.md +202 -0
  17. Math_QA/group_09/checkpoints/checkpoint-1800/adapter_config.json +34 -0
  18. Math_QA/group_09/checkpoints/checkpoint-1800/added_tokens.json +24 -0
  19. Math_QA/group_09/checkpoints/checkpoint-1800/chat_template.jinja +54 -0
  20. Math_QA/group_09/checkpoints/checkpoint-1800/merges.txt +0 -0
  21. Math_QA/group_09/checkpoints/checkpoint-1800/special_tokens_map.json +31 -0
  22. Math_QA/group_09/checkpoints/checkpoint-1800/tokenizer_config.json +207 -0
  23. Math_QA/group_09/checkpoints/checkpoint-1800/trainer_state.json +2561 -0
  24. Math_QA/group_09/checkpoints/checkpoint-1800/vocab.json +0 -0
  25. Math_QA/group_09/checkpoints/checkpoint-300/README.md +202 -0
  26. Math_QA/group_09/checkpoints/checkpoint-300/adapter_config.json +34 -0
  27. Math_QA/group_09/checkpoints/checkpoint-300/added_tokens.json +24 -0
  28. Math_QA/group_09/checkpoints/checkpoint-300/chat_template.jinja +54 -0
  29. Math_QA/group_09/checkpoints/checkpoint-300/special_tokens_map.json +31 -0
  30. Math_QA/group_09/checkpoints/checkpoint-300/tokenizer_config.json +207 -0
  31. Math_QA/group_09/checkpoints/checkpoint-300/trainer_state.json +461 -0
  32. Math_QA/group_09/checkpoints/checkpoint-300/vocab.json +0 -0
  33. Math_QA/group_09/checkpoints/checkpoint-600/README.md +202 -0
  34. Math_QA/group_09/checkpoints/checkpoint-600/adapter_config.json +34 -0
  35. Math_QA/group_09/checkpoints/checkpoint-600/added_tokens.json +24 -0
  36. Math_QA/group_09/checkpoints/checkpoint-600/chat_template.jinja +54 -0
  37. Math_QA/group_09/checkpoints/checkpoint-600/merges.txt +0 -0
  38. Math_QA/group_09/checkpoints/checkpoint-600/special_tokens_map.json +31 -0
  39. Math_QA/group_09/checkpoints/checkpoint-600/tokenizer_config.json +207 -0
  40. Math_QA/group_09/checkpoints/checkpoint-600/trainer_state.json +881 -0
  41. Math_QA/group_09/checkpoints/checkpoint-600/vocab.json +0 -0
  42. Math_QA/group_09/metadata.json +2718 -0
  43. Math_QA/group_09/prompt_group.json +613 -0
  44. Math_QA/group_09/tokenizer/added_tokens.json +24 -0
  45. Math_QA/group_09/tokenizer/chat_template.jinja +54 -0
  46. Math_QA/group_09/tokenizer/merges.txt +0 -0
  47. Math_QA/group_09/tokenizer/special_tokens_map.json +31 -0
  48. Math_QA/group_09/tokenizer/tokenizer_config.json +207 -0
  49. Math_QA/group_09/tokenizer/vocab.json +0 -0
  50. README.md +223 -0
.gitkeep ADDED
@@ -0,0 +1 @@
+
Math_QA/group_09/adapter/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.12.0
Math_QA/group_09/adapter/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
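
The adapter config above pins `r = 64` and `lora_alpha = 128`, so the effective LoRA scaling applied to every targeted projection is `alpha / r = 2.0`, and all seven attention/MLP projections of the Qwen2 block are adapted. A minimal stdlib-only sketch that parses an excerpt of the config and derives this (the `lora_scaling` helper is ours, not part of the repo; in practice the adapter itself would be loaded with peft's `PeftModel.from_pretrained(base_model, adapter_dir)`, which needs the base weights and is omitted here):

```python
import json

# Excerpt of Math_QA/group_09/adapter/adapter_config.json, trimmed to the
# fields used below; values copied verbatim from the diff above.
ADAPTER_CONFIG = """
{
  "lora_alpha": 128,
  "lora_dropout": 0.05,
  "peft_type": "LORA",
  "r": 64,
  "target_modules": ["gate_proj", "o_proj", "k_proj", "q_proj",
                     "up_proj", "down_proj", "v_proj"],
  "task_type": "CAUSAL_LM"
}
"""

def lora_scaling(cfg: dict) -> float:
    """LoRA applies the update as W + (alpha / r) * B @ A; this is that factor."""
    return cfg["lora_alpha"] / cfg["r"]

cfg = json.loads(ADAPTER_CONFIG)
print(lora_scaling(cfg))              # -> 2.0
print(sorted(cfg["target_modules"]))  # all seven attention/MLP projections
```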
Math_QA/group_09/adapter/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
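
`added_tokens.json` maps the 22 Qwen2.5 control tokens onto a single contiguous ID block starting at 151643 (the base BPE vocabulary is assumed to end just below that ID). A small stdlib sanity check with the mapping inlined from the file above; a gap in the block would indicate a mismatched tokenizer/embedding pairing:

```python
import json

# Contents of Math_QA/group_09/adapter/added_tokens.json, inlined verbatim.
ADDED_TOKENS = json.loads("""
{
  "</tool_call>": 151658, "<tool_call>": 151657,
  "<|box_end|>": 151649, "<|box_start|>": 151648,
  "<|endoftext|>": 151643, "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645, "<|im_start|>": 151644,
  "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646, "<|quad_end|>": 151651,
  "<|quad_start|>": 151650, "<|repo_name|>": 151663,
  "<|video_pad|>": 151656, "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654, "<|vision_start|>": 151652
}
""")

ids = sorted(ADDED_TOKENS.values())
# The block must be contiguous: 151643 .. 151664 with no holes.
assert ids == list(range(151643, 151665))
print(len(ids))  # -> 22
```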
Math_QA/group_09/adapter/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
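
For a plain conversation (no `tools`), the template above reduces to the ChatML layout: each message wrapped in `<|im_start|>role ... <|im_end|>`, a default Qwen system prompt injected when none is given, and a trailing `<|im_start|>assistant` when `add_generation_prompt` is set. A stdlib re-implementation of just that branch, for illustration (normally you would call `tokenizer.apply_chat_template` and let the Jinja template do this):

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_chatml(messages, add_generation_prompt=True):
    """Mirror of the template's no-tools branch for system/user/assistant turns."""
    out = []
    # The template injects the default system prompt only when the first
    # message is not already a system message.
    if not messages or messages[0]["role"] != "system":
        out.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

print(render_chatml([{"role": "user", "content": "What is 2+2?"}]))
```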
Math_QA/group_09/adapter/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/adapter/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Math_QA/group_09/adapter/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
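
In the `added_tokens_decoder` above, the ChatML and vision markers (IDs 151643-151656) carry `"special": true`, while the tool-call, FIM, and repo tokens (151657-151664) do not, so the latter survive decoding with Hugging Face's `skip_special_tokens=True`. A tiny sketch of that split, with the flag values copied from the config:

```python
# "special" flags per added-token ID, copied from tokenizer_config.json above:
# IDs up to 151656 are special, IDs from 151657 on are ordinary text tokens.
SPECIAL = {tid: tid <= 151656 for tid in range(151643, 151665)}

assert SPECIAL[151645]      # <|im_end|> is special (it is also the eos_token)
assert not SPECIAL[151657]  # <tool_call> is not, so it is kept when
                            # decoding with skip_special_tokens=True
print(sum(SPECIAL.values()))  # -> 14 special tokens
```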
Math_QA/group_09/adapter/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-1200/adapter_config.json ADDED
@@ -0,0 +1,34 @@
(34 added lines, identical to Math_QA/group_09/adapter/adapter_config.json above.)
Math_QA/group_09/checkpoints/checkpoint-1200/added_tokens.json ADDED
@@ -0,0 +1,24 @@
(24 added lines, identical to Math_QA/group_09/adapter/added_tokens.json above.)
Math_QA/group_09/checkpoints/checkpoint-1200/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
(54 added lines, identical to Math_QA/group_09/adapter/chat_template.jinja above.)
Math_QA/group_09/checkpoints/checkpoint-1200/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-1200/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
(31 added lines, identical to Math_QA/group_09/adapter/special_tokens_map.json above.)
Math_QA/group_09/checkpoints/checkpoint-1500/added_tokens.json ADDED
@@ -0,0 +1,24 @@
(24 added lines, identical to Math_QA/group_09/adapter/added_tokens.json above.)
Math_QA/group_09/checkpoints/checkpoint-1800/README.md ADDED
@@ -0,0 +1,202 @@
(202 added lines, identical to Math_QA/group_09/adapter/README.md above.)
Math_QA/group_09/checkpoints/checkpoint-1800/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
Math_QA/group_09/checkpoints/checkpoint-1800/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-1800/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Math_QA/group_09/checkpoints/checkpoint-1800/trainer_state.json ADDED
@@ -0,0 +1,2561 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 450.0,
+ "eval_steps": 500,
+ "global_step": 1800,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.32,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "loss": 1.9528,
+ "step": 1
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "loss": 1.7919,
+ "step": 5
+ },
+ {
+ "epoch": 2.64,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 1.6625,
+ "step": 10
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "loss": 1.2009,
+ "step": 15
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "loss": 0.8544,
+ "step": 20
+ },
+ {
+ "epoch": 6.32,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "loss": 0.7449,
+ "step": 25
+ },
+ {
+ "epoch": 7.64,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "loss": 0.5884,
+ "step": 30
+ },
+ {
+ "epoch": 8.96,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "loss": 0.6349,
+ "step": 35
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.5088,
+ "step": 40
+ },
+ {
+ "epoch": 11.32,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "loss": 0.4279,
+ "step": 45
+ },
+ {
+ "epoch": 12.64,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "loss": 0.2689,
+ "step": 50
+ },
+ {
+ "epoch": 13.96,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "loss": 0.1985,
+ "step": 55
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "loss": 0.1346,
+ "step": 60
+ },
+ {
+ "epoch": 16.32,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "loss": 0.0959,
+ "step": 65
+ },
+ {
+ "epoch": 17.64,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "loss": 0.0617,
+ "step": 70
+ },
+ {
+ "epoch": 18.96,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "loss": 0.0428,
+ "step": 75
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "loss": 0.0416,
+ "step": 80
+ },
+ {
+ "epoch": 21.32,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "loss": 0.0334,
+ "step": 85
+ },
+ {
+ "epoch": 22.64,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "loss": 0.0347,
+ "step": 90
+ },
+ {
+ "epoch": 23.96,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "loss": 0.0336,
+ "step": 95
+ },
+ {
+ "epoch": 25.0,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "loss": 0.0325,
+ "step": 100
+ },
+ {
+ "epoch": 26.32,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "loss": 0.0317,
+ "step": 105
+ },
+ {
+ "epoch": 27.64,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "loss": 0.0322,
+ "step": 110
+ },
+ {
+ "epoch": 28.96,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "loss": 0.029,
+ "step": 115
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "loss": 0.0301,
+ "step": 120
+ },
+ {
+ "epoch": 31.32,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "loss": 0.029,
+ "step": 125
+ },
+ {
+ "epoch": 32.64,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "loss": 0.0287,
+ "step": 130
+ },
+ {
+ "epoch": 33.96,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "loss": 0.029,
+ "step": 135
+ },
+ {
+ "epoch": 35.0,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "loss": 0.0272,
+ "step": 140
+ },
+ {
+ "epoch": 36.32,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "loss": 0.0289,
+ "step": 145
+ },
+ {
+ "epoch": 37.64,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "loss": 0.0295,
+ "step": 150
+ },
+ {
+ "epoch": 38.96,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "loss": 0.0265,
+ "step": 155
+ },
+ {
+ "epoch": 40.0,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "loss": 0.0319,
+ "step": 160
+ },
+ {
+ "epoch": 41.32,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "loss": 0.0277,
+ "step": 165
+ },
+ {
+ "epoch": 42.64,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "loss": 0.0264,
+ "step": 170
+ },
+ {
+ "epoch": 43.96,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "loss": 0.026,
+ "step": 175
+ },
+ {
+ "epoch": 45.0,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "loss": 0.0237,
+ "step": 180
+ },
+ {
+ "epoch": 46.32,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "loss": 0.0265,
+ "step": 185
+ },
+ {
+ "epoch": 47.64,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "loss": 0.0279,
+ "step": 190
+ },
+ {
+ "epoch": 48.96,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "loss": 0.0257,
+ "step": 195
+ },
+ {
+ "epoch": 50.0,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "loss": 0.0249,
+ "step": 200
+ },
+ {
+ "epoch": 51.32,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "loss": 0.0245,
+ "step": 205
+ },
+ {
+ "epoch": 52.64,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "loss": 0.0257,
+ "step": 210
+ },
+ {
+ "epoch": 53.96,
+ "grad_norm": 0.04665664583444595,
+ "learning_rate": 9.083619702176404e-05,
+ "loss": 0.0246,
+ "step": 215
+ },
+ {
+ "epoch": 55.0,
+ "grad_norm": 0.18880419433116913,
+ "learning_rate": 9.054982817869416e-05,
+ "loss": 0.0267,
+ "step": 220
+ },
+ {
+ "epoch": 56.32,
+ "grad_norm": 0.05329155549407005,
+ "learning_rate": 9.026345933562429e-05,
+ "loss": 0.0258,
+ "step": 225
+ },
+ {
+ "epoch": 57.64,
+ "grad_norm": 0.05351603031158447,
+ "learning_rate": 8.997709049255442e-05,
+ "loss": 0.0264,
+ "step": 230
+ },
+ {
+ "epoch": 58.96,
+ "grad_norm": 0.05472696200013161,
+ "learning_rate": 8.969072164948454e-05,
+ "loss": 0.0266,
+ "step": 235
+ },
+ {
+ "epoch": 60.0,
+ "grad_norm": 0.17182305455207825,
+ "learning_rate": 8.940435280641467e-05,
+ "loss": 0.0255,
+ "step": 240
+ },
+ {
+ "epoch": 61.32,
+ "grad_norm": 0.05441403388977051,
+ "learning_rate": 8.91179839633448e-05,
+ "loss": 0.0259,
+ "step": 245
+ },
+ {
+ "epoch": 62.64,
+ "grad_norm": 0.05443132296204567,
+ "learning_rate": 8.883161512027491e-05,
+ "loss": 0.025,
+ "step": 250
+ },
+ {
+ "epoch": 63.96,
+ "grad_norm": 0.05410757660865784,
+ "learning_rate": 8.854524627720504e-05,
+ "loss": 0.0261,
+ "step": 255
+ },
+ {
+ "epoch": 65.0,
+ "grad_norm": 0.16327381134033203,
+ "learning_rate": 8.825887743413516e-05,
+ "loss": 0.0265,
+ "step": 260
+ },
+ {
+ "epoch": 66.32,
+ "grad_norm": 0.05516252666711807,
+ "learning_rate": 8.797250859106529e-05,
+ "loss": 0.0251,
+ "step": 265
+ },
+ {
+ "epoch": 67.64,
+ "grad_norm": 0.0483415424823761,
+ "learning_rate": 8.768613974799542e-05,
+ "loss": 0.0255,
+ "step": 270
+ },
+ {
+ "epoch": 68.96,
+ "grad_norm": 0.062226541340351105,
+ "learning_rate": 8.739977090492554e-05,
+ "loss": 0.0247,
+ "step": 275
+ },
+ {
+ "epoch": 70.0,
+ "grad_norm": 0.20358847081661224,
+ "learning_rate": 8.711340206185567e-05,
+ "loss": 0.0267,
+ "step": 280
+ },
+ {
+ "epoch": 71.32,
+ "grad_norm": 0.04628003016114235,
+ "learning_rate": 8.682703321878581e-05,
+ "loss": 0.0255,
+ "step": 285
+ },
+ {
+ "epoch": 72.64,
+ "grad_norm": 0.06483373790979385,
+ "learning_rate": 8.654066437571594e-05,
+ "loss": 0.0257,
+ "step": 290
+ },
+ {
+ "epoch": 73.96,
+ "grad_norm": 0.04926105588674545,
+ "learning_rate": 8.625429553264606e-05,
+ "loss": 0.0244,
+ "step": 295
+ },
+ {
+ "epoch": 75.0,
+ "grad_norm": 0.1988091617822647,
+ "learning_rate": 8.596792668957619e-05,
+ "loss": 0.0239,
+ "step": 300
+ },
+ {
+ "epoch": 76.32,
+ "grad_norm": 0.04305023327469826,
+ "learning_rate": 8.56815578465063e-05,
+ "loss": 0.0248,
+ "step": 305
+ },
+ {
+ "epoch": 77.64,
+ "grad_norm": 0.04323578625917435,
+ "learning_rate": 8.539518900343643e-05,
+ "loss": 0.0254,
+ "step": 310
+ },
+ {
+ "epoch": 78.96,
+ "grad_norm": 0.04426678270101547,
+ "learning_rate": 8.510882016036655e-05,
+ "loss": 0.0254,
+ "step": 315
+ },
+ {
+ "epoch": 80.0,
+ "grad_norm": 0.14689449965953827,
+ "learning_rate": 8.482245131729668e-05,
+ "loss": 0.0259,
+ "step": 320
+ },
+ {
+ "epoch": 81.32,
+ "grad_norm": 0.04256561025977135,
+ "learning_rate": 8.453608247422681e-05,
+ "loss": 0.0256,
+ "step": 325
+ },
+ {
+ "epoch": 82.64,
+ "grad_norm": 0.03943061828613281,
+ "learning_rate": 8.424971363115693e-05,
+ "loss": 0.0235,
+ "step": 330
+ },
+ {
+ "epoch": 83.96,
+ "grad_norm": 0.041899990290403366,
+ "learning_rate": 8.396334478808706e-05,
+ "loss": 0.0249,
+ "step": 335
+ },
+ {
+ "epoch": 85.0,
+ "grad_norm": 0.151236429810524,
+ "learning_rate": 8.367697594501719e-05,
+ "loss": 0.0255,
+ "step": 340
+ },
+ {
+ "epoch": 86.32,
+ "grad_norm": 0.042102884501218796,
+ "learning_rate": 8.339060710194731e-05,
+ "loss": 0.0244,
+ "step": 345
+ },
+ {
+ "epoch": 87.64,
+ "grad_norm": 0.04723669961094856,
+ "learning_rate": 8.310423825887744e-05,
+ "loss": 0.0251,
+ "step": 350
+ },
+ {
+ "epoch": 88.96,
+ "grad_norm": 0.0578082799911499,
+ "learning_rate": 8.281786941580757e-05,
+ "loss": 0.0261,
+ "step": 355
+ },
+ {
+ "epoch": 90.0,
+ "grad_norm": 0.10269813239574432,
+ "learning_rate": 8.253150057273768e-05,
+ "loss": 0.0225,
+ "step": 360
+ },
+ {
+ "epoch": 91.32,
+ "grad_norm": 0.046400491148233414,
+ "learning_rate": 8.224513172966782e-05,
+ "loss": 0.0262,
+ "step": 365
+ },
+ {
+ "epoch": 92.64,
+ "grad_norm": 0.04183673858642578,
+ "learning_rate": 8.195876288659795e-05,
+ "loss": 0.0239,
+ "step": 370
+ },
+ {
+ "epoch": 93.96,
+ "grad_norm": 0.04400316998362541,
+ "learning_rate": 8.167239404352807e-05,
+ "loss": 0.0263,
+ "step": 375
+ },
+ {
+ "epoch": 95.0,
+ "grad_norm": 0.10862386226654053,
+ "learning_rate": 8.13860252004582e-05,
+ "loss": 0.025,
+ "step": 380
+ },
+ {
+ "epoch": 96.32,
+ "grad_norm": 0.05308162048459053,
+ "learning_rate": 8.109965635738833e-05,
+ "loss": 0.0248,
+ "step": 385
+ },
+ {
+ "epoch": 97.64,
+ "grad_norm": 0.04261139780282974,
+ "learning_rate": 8.081328751431845e-05,
+ "loss": 0.0244,
+ "step": 390
+ },
+ {
+ "epoch": 98.96,
+ "grad_norm": 0.05337546020746231,
+ "learning_rate": 8.052691867124858e-05,
+ "loss": 0.0253,
+ "step": 395
+ },
+ {
+ "epoch": 100.0,
+ "grad_norm": 0.15639856457710266,
+ "learning_rate": 8.02405498281787e-05,
+ "loss": 0.0243,
+ "step": 400
+ },
+ {
+ "epoch": 101.32,
+ "grad_norm": 0.04450729116797447,
+ "learning_rate": 7.995418098510883e-05,
+ "loss": 0.0258,
+ "step": 405
+ },
+ {
+ "epoch": 102.64,
+ "grad_norm": 0.042327046394348145,
+ "learning_rate": 7.966781214203894e-05,
+ "loss": 0.0244,
+ "step": 410
+ },
+ {
+ "epoch": 103.96,
+ "grad_norm": 0.04105006903409958,
+ "learning_rate": 7.938144329896907e-05,
+ "loss": 0.0253,
+ "step": 415
+ },
+ {
+ "epoch": 105.0,
+ "grad_norm": 0.17930248379707336,
+ "learning_rate": 7.90950744558992e-05,
+ "loss": 0.0261,
+ "step": 420
+ },
+ {
+ "epoch": 106.32,
+ "grad_norm": 0.04404031112790108,
+ "learning_rate": 7.880870561282932e-05,
+ "loss": 0.0241,
+ "step": 425
+ },
+ {
+ "epoch": 107.64,
+ "grad_norm": 0.04142986983060837,
+ "learning_rate": 7.852233676975945e-05,
+ "loss": 0.0245,
+ "step": 430
+ },
+ {
+ "epoch": 108.96,
+ "grad_norm": 0.041959185153245926,
+ "learning_rate": 7.823596792668958e-05,
+ "loss": 0.0254,
+ "step": 435
+ },
+ {
+ "epoch": 110.0,
+ "grad_norm": 0.27740439772605896,
+ "learning_rate": 7.79495990836197e-05,
+ "loss": 0.0292,
+ "step": 440
+ },
+ {
+ "epoch": 111.32,
+ "grad_norm": 0.03657572343945503,
+ "learning_rate": 7.766323024054983e-05,
+ "loss": 0.026,
+ "step": 445
+ },
+ {
+ "epoch": 112.64,
+ "grad_norm": 0.042320434004068375,
+ "learning_rate": 7.737686139747996e-05,
+ "loss": 0.0251,
+ "step": 450
+ },
+ {
+ "epoch": 113.96,
+ "grad_norm": 0.0473681204020977,
+ "learning_rate": 7.709049255441008e-05,
+ "loss": 0.026,
+ "step": 455
+ },
+ {
+ "epoch": 115.0,
+ "grad_norm": 0.1326676607131958,
+ "learning_rate": 7.680412371134021e-05,
+ "loss": 0.0241,
+ "step": 460
+ },
+ {
+ "epoch": 116.32,
+ "grad_norm": 0.04483647271990776,
+ "learning_rate": 7.651775486827034e-05,
+ "loss": 0.0236,
+ "step": 465
+ },
+ {
+ "epoch": 117.64,
+ "grad_norm": 0.038961004465818405,
+ "learning_rate": 7.623138602520046e-05,
+ "loss": 0.0235,
+ "step": 470
+ },
+ {
+ "epoch": 118.96,
+ "grad_norm": 0.042134907096624374,
+ "learning_rate": 7.594501718213059e-05,
+ "loss": 0.0252,
+ "step": 475
+ },
+ {
+ "epoch": 120.0,
+ "grad_norm": 0.13292020559310913,
+ "learning_rate": 7.565864833906071e-05,
+ "loss": 0.024,
+ "step": 480
+ },
+ {
+ "epoch": 121.32,
+ "grad_norm": 0.03745294362306595,
+ "learning_rate": 7.537227949599084e-05,
+ "loss": 0.025,
+ "step": 485
+ },
+ {
+ "epoch": 122.64,
+ "grad_norm": 0.035545315593481064,
+ "learning_rate": 7.508591065292097e-05,
+ "loss": 0.0253,
+ "step": 490
+ },
+ {
+ "epoch": 123.96,
+ "grad_norm": 0.03991984575986862,
+ "learning_rate": 7.47995418098511e-05,
709
+ "loss": 0.026,
710
+ "step": 495
711
+ },
712
+ {
713
+ "epoch": 125.0,
714
+ "grad_norm": 0.1339961290359497,
715
+ "learning_rate": 7.451317296678122e-05,
716
+ "loss": 0.0246,
717
+ "step": 500
718
+ },
719
+ {
720
+ "epoch": 126.32,
721
+ "grad_norm": 0.04381132498383522,
722
+ "learning_rate": 7.422680412371135e-05,
723
+ "loss": 0.0235,
724
+ "step": 505
725
+ },
726
+ {
727
+ "epoch": 127.64,
728
+ "grad_norm": 0.048515841364860535,
729
+ "learning_rate": 7.394043528064147e-05,
730
+ "loss": 0.0242,
731
+ "step": 510
732
+ },
733
+ {
734
+ "epoch": 128.96,
735
+ "grad_norm": 0.04145604744553566,
736
+ "learning_rate": 7.36540664375716e-05,
737
+ "loss": 0.0249,
738
+ "step": 515
739
+ },
740
+ {
741
+ "epoch": 130.0,
742
+ "grad_norm": 0.14400818943977356,
743
+ "learning_rate": 7.336769759450171e-05,
744
+ "loss": 0.0247,
745
+ "step": 520
746
+ },
747
+ {
748
+ "epoch": 131.32,
749
+ "grad_norm": 0.04025031998753548,
750
+ "learning_rate": 7.308132875143184e-05,
751
+ "loss": 0.0241,
752
+ "step": 525
753
+ },
754
+ {
755
+ "epoch": 132.64,
756
+ "grad_norm": 0.037277135998010635,
757
+ "learning_rate": 7.279495990836197e-05,
758
+ "loss": 0.0242,
759
+ "step": 530
760
+ },
761
+ {
762
+ "epoch": 133.96,
763
+ "grad_norm": 0.03666083887219429,
764
+ "learning_rate": 7.250859106529209e-05,
765
+ "loss": 0.0251,
766
+ "step": 535
767
+ },
768
+ {
769
+ "epoch": 135.0,
770
+ "grad_norm": 0.09921745210886002,
771
+ "learning_rate": 7.222222222222222e-05,
772
+ "loss": 0.0241,
773
+ "step": 540
774
+ },
775
+ {
776
+ "epoch": 136.32,
777
+ "grad_norm": 0.0382193848490715,
778
+ "learning_rate": 7.193585337915235e-05,
779
+ "loss": 0.0247,
780
+ "step": 545
781
+ },
782
+ {
783
+ "epoch": 137.64,
784
+ "grad_norm": 0.0314810685813427,
785
+ "learning_rate": 7.164948453608247e-05,
786
+ "loss": 0.0239,
787
+ "step": 550
788
+ },
789
+ {
790
+ "epoch": 138.96,
791
+ "grad_norm": 0.04278745502233505,
792
+ "learning_rate": 7.136311569301261e-05,
793
+ "loss": 0.0243,
794
+ "step": 555
795
+ },
796
+ {
797
+ "epoch": 140.0,
798
+ "grad_norm": 0.09295342862606049,
799
+ "learning_rate": 7.107674684994274e-05,
800
+ "loss": 0.0234,
801
+ "step": 560
802
+ },
803
+ {
804
+ "epoch": 141.32,
805
+ "grad_norm": 0.03429599106311798,
806
+ "learning_rate": 7.079037800687286e-05,
807
+ "loss": 0.0248,
808
+ "step": 565
809
+ },
810
+ {
811
+ "epoch": 142.64,
812
+ "grad_norm": 0.03622185438871384,
813
+ "learning_rate": 7.050400916380299e-05,
814
+ "loss": 0.0234,
815
+ "step": 570
816
+ },
817
+ {
818
+ "epoch": 143.96,
819
+ "grad_norm": 0.042615506798028946,
820
+ "learning_rate": 7.02176403207331e-05,
821
+ "loss": 0.0242,
822
+ "step": 575
823
+ },
824
+ {
825
+ "epoch": 145.0,
826
+ "grad_norm": 0.13792142271995544,
827
+ "learning_rate": 6.993127147766323e-05,
828
+ "loss": 0.0268,
829
+ "step": 580
830
+ },
831
+ {
832
+ "epoch": 146.32,
833
+ "grad_norm": 0.035664405673742294,
834
+ "learning_rate": 6.964490263459336e-05,
835
+ "loss": 0.0231,
836
+ "step": 585
837
+ },
838
+ {
839
+ "epoch": 147.64,
840
+ "grad_norm": 0.033511932939291,
841
+ "learning_rate": 6.935853379152348e-05,
842
+ "loss": 0.0258,
843
+ "step": 590
844
+ },
845
+ {
846
+ "epoch": 148.96,
847
+ "grad_norm": 0.036591917276382446,
848
+ "learning_rate": 6.907216494845361e-05,
849
+ "loss": 0.0248,
850
+ "step": 595
851
+ },
852
+ {
853
+ "epoch": 150.0,
854
+ "grad_norm": 0.11892726272344589,
855
+ "learning_rate": 6.878579610538374e-05,
856
+ "loss": 0.0257,
857
+ "step": 600
858
+ },
859
+ {
860
+ "epoch": 151.32,
861
+ "grad_norm": 0.03532181680202484,
862
+ "learning_rate": 6.849942726231386e-05,
863
+ "loss": 0.0246,
864
+ "step": 605
865
+ },
866
+ {
867
+ "epoch": 152.64,
868
+ "grad_norm": 0.039349090307950974,
869
+ "learning_rate": 6.821305841924399e-05,
870
+ "loss": 0.0244,
871
+ "step": 610
872
+ },
873
+ {
874
+ "epoch": 153.96,
875
+ "grad_norm": 0.03686106950044632,
876
+ "learning_rate": 6.792668957617412e-05,
877
+ "loss": 0.0247,
878
+ "step": 615
879
+ },
880
+ {
881
+ "epoch": 155.0,
882
+ "grad_norm": 0.08257201313972473,
883
+ "learning_rate": 6.764032073310424e-05,
884
+ "loss": 0.0231,
885
+ "step": 620
886
+ },
887
+ {
888
+ "epoch": 156.32,
889
+ "grad_norm": 0.035335343331098557,
890
+ "learning_rate": 6.735395189003437e-05,
891
+ "loss": 0.0243,
892
+ "step": 625
893
+ },
894
+ {
895
+ "epoch": 157.64,
896
+ "grad_norm": 0.030693387612700462,
897
+ "learning_rate": 6.706758304696448e-05,
898
+ "loss": 0.0239,
899
+ "step": 630
900
+ },
901
+ {
902
+ "epoch": 158.96,
903
+ "grad_norm": 0.031573694199323654,
904
+ "learning_rate": 6.678121420389462e-05,
905
+ "loss": 0.0236,
906
+ "step": 635
907
+ },
908
+ {
909
+ "epoch": 160.0,
910
+ "grad_norm": 0.11772840470075607,
911
+ "learning_rate": 6.649484536082475e-05,
912
+ "loss": 0.0247,
913
+ "step": 640
914
+ },
915
+ {
916
+ "epoch": 161.32,
917
+ "grad_norm": 0.03553156182169914,
918
+ "learning_rate": 6.620847651775487e-05,
919
+ "loss": 0.0231,
920
+ "step": 645
921
+ },
922
+ {
923
+ "epoch": 162.64,
924
+ "grad_norm": 0.04065680876374245,
925
+ "learning_rate": 6.5922107674685e-05,
926
+ "loss": 0.0247,
927
+ "step": 650
928
+ },
929
+ {
930
+ "epoch": 163.96,
931
+ "grad_norm": 0.03680557757616043,
932
+ "learning_rate": 6.563573883161513e-05,
933
+ "loss": 0.0244,
934
+ "step": 655
935
+ },
936
+ {
937
+ "epoch": 165.0,
938
+ "grad_norm": 0.1432940512895584,
939
+ "learning_rate": 6.534936998854525e-05,
940
+ "loss": 0.0254,
941
+ "step": 660
942
+ },
943
+ {
944
+ "epoch": 166.32,
945
+ "grad_norm": 0.0374530591070652,
946
+ "learning_rate": 6.506300114547538e-05,
947
+ "loss": 0.024,
948
+ "step": 665
949
+ },
950
+ {
951
+ "epoch": 167.64,
952
+ "grad_norm": 0.039093125611543655,
953
+ "learning_rate": 6.477663230240551e-05,
954
+ "loss": 0.0242,
955
+ "step": 670
956
+ },
957
+ {
958
+ "epoch": 168.96,
959
+ "grad_norm": 0.03439056873321533,
960
+ "learning_rate": 6.449026345933563e-05,
961
+ "loss": 0.0238,
962
+ "step": 675
963
+ },
964
+ {
965
+ "epoch": 170.0,
966
+ "grad_norm": 0.07211510837078094,
967
+ "learning_rate": 6.420389461626576e-05,
968
+ "loss": 0.0224,
969
+ "step": 680
970
+ },
971
+ {
972
+ "epoch": 171.32,
973
+ "grad_norm": 0.03178408369421959,
974
+ "learning_rate": 6.391752577319587e-05,
975
+ "loss": 0.0246,
976
+ "step": 685
977
+ },
978
+ {
979
+ "epoch": 172.64,
980
+ "grad_norm": 0.02913156896829605,
981
+ "learning_rate": 6.3631156930126e-05,
982
+ "loss": 0.0255,
983
+ "step": 690
984
+ },
985
+ {
986
+ "epoch": 173.96,
987
+ "grad_norm": 0.03487716615200043,
988
+ "learning_rate": 6.334478808705613e-05,
989
+ "loss": 0.0257,
990
+ "step": 695
991
+ },
992
+ {
993
+ "epoch": 175.0,
994
+ "grad_norm": 0.12451174110174179,
995
+ "learning_rate": 6.305841924398625e-05,
996
+ "loss": 0.0253,
997
+ "step": 700
998
+ },
999
+ {
1000
+ "epoch": 176.32,
1001
+ "grad_norm": 0.0366508811712265,
1002
+ "learning_rate": 6.277205040091638e-05,
1003
+ "loss": 0.0241,
1004
+ "step": 705
1005
+ },
1006
+ {
1007
+ "epoch": 177.64,
1008
+ "grad_norm": 0.03491870313882828,
1009
+ "learning_rate": 6.24856815578465e-05,
1010
+ "loss": 0.0242,
1011
+ "step": 710
1012
+ },
1013
+ {
1014
+ "epoch": 178.96,
1015
+ "grad_norm": 0.03027982823550701,
1016
+ "learning_rate": 6.219931271477663e-05,
1017
+ "loss": 0.0257,
1018
+ "step": 715
1019
+ },
1020
+ {
1021
+ "epoch": 180.0,
1022
+ "grad_norm": 0.08150530606508255,
1023
+ "learning_rate": 6.191294387170676e-05,
1024
+ "loss": 0.0236,
1025
+ "step": 720
1026
+ },
1027
+ {
1028
+ "epoch": 181.32,
1029
+ "grad_norm": 0.03483245149254799,
1030
+ "learning_rate": 6.162657502863689e-05,
1031
+ "loss": 0.0232,
1032
+ "step": 725
1033
+ },
1034
+ {
1035
+ "epoch": 182.64,
1036
+ "grad_norm": 0.034706421196460724,
1037
+ "learning_rate": 6.134020618556701e-05,
1038
+ "loss": 0.0241,
1039
+ "step": 730
1040
+ },
1041
+ {
1042
+ "epoch": 183.96,
1043
+ "grad_norm": 0.03622004762291908,
1044
+ "learning_rate": 6.105383734249714e-05,
1045
+ "loss": 0.0233,
1046
+ "step": 735
1047
+ },
1048
+ {
1049
+ "epoch": 185.0,
1050
+ "grad_norm": 0.10144224017858505,
1051
+ "learning_rate": 6.076746849942726e-05,
1052
+ "loss": 0.0249,
1053
+ "step": 740
1054
+ },
1055
+ {
1056
+ "epoch": 186.32,
1057
+ "grad_norm": 0.03530497848987579,
1058
+ "learning_rate": 6.0481099656357384e-05,
1059
+ "loss": 0.0238,
1060
+ "step": 745
1061
+ },
1062
+ {
1063
+ "epoch": 187.64,
1064
+ "grad_norm": 0.034086182713508606,
1065
+ "learning_rate": 6.019473081328752e-05,
1066
+ "loss": 0.0245,
1067
+ "step": 750
1068
+ },
1069
+ {
1070
+ "epoch": 188.96,
1071
+ "grad_norm": 0.039041388779878616,
1072
+ "learning_rate": 5.9908361970217644e-05,
1073
+ "loss": 0.0243,
1074
+ "step": 755
1075
+ },
1076
+ {
1077
+ "epoch": 190.0,
1078
+ "grad_norm": 0.1247899979352951,
1079
+ "learning_rate": 5.962199312714777e-05,
1080
+ "loss": 0.0245,
1081
+ "step": 760
1082
+ },
1083
+ {
1084
+ "epoch": 191.32,
1085
+ "grad_norm": 0.035458508878946304,
1086
+ "learning_rate": 5.93356242840779e-05,
1087
+ "loss": 0.0238,
1088
+ "step": 765
1089
+ },
1090
+ {
1091
+ "epoch": 192.64,
1092
+ "grad_norm": 0.03673034906387329,
1093
+ "learning_rate": 5.904925544100802e-05,
1094
+ "loss": 0.0244,
1095
+ "step": 770
1096
+ },
1097
+ {
1098
+ "epoch": 193.96,
1099
+ "grad_norm": 0.03364979103207588,
1100
+ "learning_rate": 5.876288659793815e-05,
1101
+ "loss": 0.0239,
1102
+ "step": 775
1103
+ },
1104
+ {
1105
+ "epoch": 195.0,
1106
+ "grad_norm": 0.09387586265802383,
1107
+ "learning_rate": 5.8476517754868276e-05,
1108
+ "loss": 0.0235,
1109
+ "step": 780
1110
+ },
1111
+ {
1112
+ "epoch": 196.32,
1113
+ "grad_norm": 0.03462570905685425,
1114
+ "learning_rate": 5.81901489117984e-05,
1115
+ "loss": 0.0248,
1116
+ "step": 785
1117
+ },
1118
+ {
1119
+ "epoch": 197.64,
1120
+ "grad_norm": 0.03342005982995033,
1121
+ "learning_rate": 5.790378006872853e-05,
1122
+ "loss": 0.0246,
1123
+ "step": 790
1124
+ },
1125
+ {
1126
+ "epoch": 198.96,
1127
+ "grad_norm": 0.041909925639629364,
1128
+ "learning_rate": 5.761741122565865e-05,
1129
+ "loss": 0.0246,
1130
+ "step": 795
1131
+ },
1132
+ {
1133
+ "epoch": 200.0,
1134
+ "grad_norm": 0.15439164638519287,
1135
+ "learning_rate": 5.7331042382588775e-05,
1136
+ "loss": 0.0258,
1137
+ "step": 800
1138
+ },
1139
+ {
1140
+ "epoch": 201.32,
1141
+ "grad_norm": 0.02883634716272354,
1142
+ "learning_rate": 5.70446735395189e-05,
1143
+ "loss": 0.0236,
1144
+ "step": 805
1145
+ },
1146
+ {
1147
+ "epoch": 202.64,
1148
+ "grad_norm": 0.029865020886063576,
1149
+ "learning_rate": 5.675830469644903e-05,
1150
+ "loss": 0.0235,
1151
+ "step": 810
1152
+ },
1153
+ {
1154
+ "epoch": 203.96,
1155
+ "grad_norm": 0.030608315020799637,
1156
+ "learning_rate": 5.6471935853379155e-05,
1157
+ "loss": 0.024,
1158
+ "step": 815
1159
+ },
1160
+ {
1161
+ "epoch": 205.0,
1162
+ "grad_norm": 0.07783036679029465,
1163
+ "learning_rate": 5.618556701030928e-05,
1164
+ "loss": 0.0224,
1165
+ "step": 820
1166
+ },
1167
+ {
1168
+ "epoch": 206.32,
1169
+ "grad_norm": 0.035508111119270325,
1170
+ "learning_rate": 5.589919816723941e-05,
1171
+ "loss": 0.0233,
1172
+ "step": 825
1173
+ },
1174
+ {
1175
+ "epoch": 207.64,
1176
+ "grad_norm": 0.03703364357352257,
1177
+ "learning_rate": 5.5612829324169534e-05,
1178
+ "loss": 0.0242,
1179
+ "step": 830
1180
+ },
1181
+ {
1182
+ "epoch": 208.96,
1183
+ "grad_norm": 0.030922846868634224,
1184
+ "learning_rate": 5.532646048109966e-05,
1185
+ "loss": 0.0239,
1186
+ "step": 835
1187
+ },
1188
+ {
1189
+ "epoch": 210.0,
1190
+ "grad_norm": 0.11316124349832535,
1191
+ "learning_rate": 5.504009163802979e-05,
1192
+ "loss": 0.0236,
1193
+ "step": 840
1194
+ },
1195
+ {
1196
+ "epoch": 211.32,
1197
+ "grad_norm": 0.032941922545433044,
1198
+ "learning_rate": 5.4753722794959914e-05,
1199
+ "loss": 0.0237,
1200
+ "step": 845
1201
+ },
1202
+ {
1203
+ "epoch": 212.64,
1204
+ "grad_norm": 0.028119860216975212,
1205
+ "learning_rate": 5.4467353951890033e-05,
1206
+ "loss": 0.0235,
1207
+ "step": 850
1208
+ },
1209
+ {
1210
+ "epoch": 213.96,
1211
+ "grad_norm": 0.03130020201206207,
1212
+ "learning_rate": 5.418098510882016e-05,
1213
+ "loss": 0.023,
1214
+ "step": 855
1215
+ },
1216
+ {
1217
+ "epoch": 215.0,
1218
+ "grad_norm": 0.06978127360343933,
1219
+ "learning_rate": 5.3894616265750286e-05,
1220
+ "loss": 0.0226,
1221
+ "step": 860
1222
+ },
1223
+ {
1224
+ "epoch": 216.32,
1225
+ "grad_norm": 0.030422938987612724,
1226
+ "learning_rate": 5.360824742268041e-05,
1227
+ "loss": 0.0231,
1228
+ "step": 865
1229
+ },
1230
+ {
1231
+ "epoch": 217.64,
1232
+ "grad_norm": 0.028223881497979164,
1233
+ "learning_rate": 5.332187857961054e-05,
1234
+ "loss": 0.0238,
1235
+ "step": 870
1236
+ },
1237
+ {
1238
+ "epoch": 218.96,
1239
+ "grad_norm": 0.029208194464445114,
1240
+ "learning_rate": 5.3035509736540666e-05,
1241
+ "loss": 0.0243,
1242
+ "step": 875
1243
+ },
1244
+ {
1245
+ "epoch": 220.0,
1246
+ "grad_norm": 0.16511231660842896,
1247
+ "learning_rate": 5.274914089347079e-05,
1248
+ "loss": 0.0271,
1249
+ "step": 880
1250
+ },
1251
+ {
1252
+ "epoch": 221.32,
1253
+ "grad_norm": 0.03705955296754837,
1254
+ "learning_rate": 5.246277205040092e-05,
1255
+ "loss": 0.0243,
1256
+ "step": 885
1257
+ },
1258
+ {
1259
+ "epoch": 222.64,
1260
+ "grad_norm": 0.030203381553292274,
1261
+ "learning_rate": 5.2176403207331045e-05,
1262
+ "loss": 0.0241,
1263
+ "step": 890
1264
+ },
1265
+ {
1266
+ "epoch": 223.96,
1267
+ "grad_norm": 0.027039049193263054,
1268
+ "learning_rate": 5.189003436426118e-05,
1269
+ "loss": 0.0234,
1270
+ "step": 895
1271
+ },
1272
+ {
1273
+ "epoch": 225.0,
1274
+ "grad_norm": 0.11282758414745331,
1275
+ "learning_rate": 5.1603665521191305e-05,
1276
+ "loss": 0.0254,
1277
+ "step": 900
1278
+ },
1279
+ {
1280
+ "epoch": 226.32,
1281
+ "grad_norm": 0.03700408712029457,
1282
+ "learning_rate": 5.131729667812142e-05,
1283
+ "loss": 0.0236,
1284
+ "step": 905
1285
+ },
1286
+ {
1287
+ "epoch": 227.64,
1288
+ "grad_norm": 0.030705822631716728,
1289
+ "learning_rate": 5.1030927835051544e-05,
1290
+ "loss": 0.024,
1291
+ "step": 910
1292
+ },
1293
+ {
1294
+ "epoch": 228.96,
1295
+ "grad_norm": 0.03678268566727638,
1296
+ "learning_rate": 5.074455899198167e-05,
1297
+ "loss": 0.0238,
1298
+ "step": 915
1299
+ },
1300
+ {
1301
+ "epoch": 230.0,
1302
+ "grad_norm": 0.12632058560848236,
1303
+ "learning_rate": 5.04581901489118e-05,
1304
+ "loss": 0.0269,
1305
+ "step": 920
1306
+ },
1307
+ {
1308
+ "epoch": 231.32,
1309
+ "grad_norm": 0.030165374279022217,
1310
+ "learning_rate": 5.0171821305841924e-05,
1311
+ "loss": 0.0244,
1312
+ "step": 925
1313
+ },
1314
+ {
1315
+ "epoch": 232.64,
1316
+ "grad_norm": 0.029971277341246605,
1317
+ "learning_rate": 4.988545246277205e-05,
1318
+ "loss": 0.0239,
1319
+ "step": 930
1320
+ },
1321
+ {
1322
+ "epoch": 233.96,
1323
+ "grad_norm": 0.033762127161026,
1324
+ "learning_rate": 4.9599083619702184e-05,
1325
+ "loss": 0.0237,
1326
+ "step": 935
1327
+ },
1328
+ {
1329
+ "epoch": 235.0,
1330
+ "grad_norm": 0.09928340464830399,
1331
+ "learning_rate": 4.931271477663231e-05,
1332
+ "loss": 0.0236,
1333
+ "step": 940
1334
+ },
1335
+ {
1336
+ "epoch": 236.32,
1337
+ "grad_norm": 0.030009057372808456,
1338
+ "learning_rate": 4.902634593356243e-05,
1339
+ "loss": 0.0238,
1340
+ "step": 945
1341
+ },
1342
+ {
1343
+ "epoch": 237.64,
1344
+ "grad_norm": 0.03369998559355736,
1345
+ "learning_rate": 4.8739977090492556e-05,
1346
+ "loss": 0.0239,
1347
+ "step": 950
1348
+ },
1349
+ {
1350
+ "epoch": 238.96,
1351
+ "grad_norm": 0.03107636794447899,
1352
+ "learning_rate": 4.845360824742268e-05,
1353
+ "loss": 0.0251,
1354
+ "step": 955
1355
+ },
1356
+ {
1357
+ "epoch": 240.0,
1358
+ "grad_norm": 0.10390744358301163,
1359
+ "learning_rate": 4.816723940435281e-05,
1360
+ "loss": 0.0227,
1361
+ "step": 960
1362
+ },
1363
+ {
1364
+ "epoch": 241.32,
1365
+ "grad_norm": 0.03572176396846771,
1366
+ "learning_rate": 4.7880870561282936e-05,
1367
+ "loss": 0.0242,
1368
+ "step": 965
1369
+ },
1370
+ {
1371
+ "epoch": 242.64,
1372
+ "grad_norm": 0.03051804192364216,
1373
+ "learning_rate": 4.7594501718213055e-05,
1374
+ "loss": 0.0232,
1375
+ "step": 970
1376
+ },
1377
+ {
1378
+ "epoch": 243.96,
1379
+ "grad_norm": 0.031635165214538574,
1380
+ "learning_rate": 4.730813287514318e-05,
1381
+ "loss": 0.0241,
1382
+ "step": 975
1383
+ },
1384
+ {
1385
+ "epoch": 245.0,
1386
+ "grad_norm": 0.0863058865070343,
1387
+ "learning_rate": 4.7021764032073315e-05,
1388
+ "loss": 0.0231,
1389
+ "step": 980
1390
+ },
1391
+ {
1392
+ "epoch": 246.32,
1393
+ "grad_norm": 0.03220526874065399,
1394
+ "learning_rate": 4.673539518900344e-05,
1395
+ "loss": 0.0237,
1396
+ "step": 985
1397
+ },
1398
+ {
1399
+ "epoch": 247.64,
1400
+ "grad_norm": 0.030770031735301018,
1401
+ "learning_rate": 4.644902634593357e-05,
1402
+ "loss": 0.0229,
1403
+ "step": 990
1404
+ },
1405
+ {
1406
+ "epoch": 248.96,
1407
+ "grad_norm": 0.036592498421669006,
1408
+ "learning_rate": 4.6162657502863694e-05,
1409
+ "loss": 0.0233,
1410
+ "step": 995
1411
+ },
1412
+ {
1413
+ "epoch": 250.0,
1414
+ "grad_norm": 0.09140961617231369,
1415
+ "learning_rate": 4.5876288659793814e-05,
1416
+ "loss": 0.0233,
1417
+ "step": 1000
1418
+ },
1419
+ {
1420
+ "epoch": 251.32,
1421
+ "grad_norm": 0.03191279247403145,
1422
+ "learning_rate": 4.558991981672394e-05,
1423
+ "loss": 0.0234,
1424
+ "step": 1005
1425
+ },
1426
+ {
1427
+ "epoch": 252.64,
1428
+ "grad_norm": 0.02950333058834076,
1429
+ "learning_rate": 4.530355097365407e-05,
1430
+ "loss": 0.024,
1431
+ "step": 1010
1432
+ },
1433
+ {
1434
+ "epoch": 253.96,
1435
+ "grad_norm": 0.031532324850559235,
1436
+ "learning_rate": 4.5017182130584194e-05,
1437
+ "loss": 0.0233,
1438
+ "step": 1015
1439
+ },
1440
+ {
1441
+ "epoch": 255.0,
1442
+ "grad_norm": 0.10817220062017441,
1443
+ "learning_rate": 4.473081328751432e-05,
1444
+ "loss": 0.0228,
1445
+ "step": 1020
1446
+ },
1447
+ {
1448
+ "epoch": 256.32,
1449
+ "grad_norm": 0.03229045867919922,
1450
+ "learning_rate": 4.4444444444444447e-05,
1451
+ "loss": 0.0249,
1452
+ "step": 1025
1453
+ },
1454
+ {
1455
+ "epoch": 257.64,
1456
+ "grad_norm": 0.027881359681487083,
1457
+ "learning_rate": 4.415807560137457e-05,
1458
+ "loss": 0.0236,
1459
+ "step": 1030
1460
+ },
1461
+ {
1462
+ "epoch": 258.96,
1463
+ "grad_norm": 0.027970343828201294,
1464
+ "learning_rate": 4.38717067583047e-05,
1465
+ "loss": 0.0248,
1466
+ "step": 1035
1467
+ },
1468
+ {
1469
+ "epoch": 260.0,
1470
+ "grad_norm": 0.0961368978023529,
1471
+ "learning_rate": 4.3585337915234826e-05,
1472
+ "loss": 0.0236,
1473
+ "step": 1040
1474
+ },
1475
+ {
1476
+ "epoch": 261.32,
1477
+ "grad_norm": 0.03192312270402908,
1478
+ "learning_rate": 4.329896907216495e-05,
1479
+ "loss": 0.0231,
1480
+ "step": 1045
1481
+ },
1482
+ {
1483
+ "epoch": 262.64,
1484
+ "grad_norm": 0.03287699446082115,
1485
+ "learning_rate": 4.301260022909508e-05,
1486
+ "loss": 0.0244,
1487
+ "step": 1050
1488
+ },
1489
+ {
1490
+ "epoch": 263.96,
1491
+ "grad_norm": 0.03482283651828766,
1492
+ "learning_rate": 4.27262313860252e-05,
1493
+ "loss": 0.0231,
1494
+ "step": 1055
1495
+ },
1496
+ {
1497
+ "epoch": 265.0,
1498
+ "grad_norm": 0.12014977633953094,
1499
+ "learning_rate": 4.2439862542955325e-05,
1500
+ "loss": 0.0246,
1501
+ "step": 1060
1502
+ },
1503
+ {
1504
+ "epoch": 266.32,
1505
+ "grad_norm": 0.030348435044288635,
1506
+ "learning_rate": 4.215349369988545e-05,
1507
+ "loss": 0.0235,
1508
+ "step": 1065
1509
+ },
1510
+ {
1511
+ "epoch": 267.64,
1512
+ "grad_norm": 0.027197284623980522,
1513
+ "learning_rate": 4.1867124856815585e-05,
1514
+ "loss": 0.0238,
1515
+ "step": 1070
1516
+ },
1517
+ {
1518
+ "epoch": 268.96,
1519
+ "grad_norm": 0.03164960816502571,
1520
+ "learning_rate": 4.158075601374571e-05,
1521
+ "loss": 0.024,
1522
+ "step": 1075
1523
+ },
1524
+ {
1525
+ "epoch": 270.0,
1526
+ "grad_norm": 0.09021521359682083,
1527
+ "learning_rate": 4.129438717067583e-05,
1528
+ "loss": 0.0237,
1529
+ "step": 1080
1530
+ },
1531
+ {
1532
+ "epoch": 271.32,
1533
+ "grad_norm": 0.03432054817676544,
1534
+ "learning_rate": 4.100801832760596e-05,
1535
+ "loss": 0.024,
1536
+ "step": 1085
1537
+ },
1538
+ {
1539
+ "epoch": 272.64,
1540
+ "grad_norm": 0.029961712658405304,
1541
+ "learning_rate": 4.0721649484536084e-05,
1542
+ "loss": 0.0224,
1543
+ "step": 1090
1544
+ },
1545
+ {
1546
+ "epoch": 273.96,
1547
+ "grad_norm": 0.02801748737692833,
1548
+ "learning_rate": 4.043528064146621e-05,
1549
+ "loss": 0.0245,
1550
+ "step": 1095
1551
+ },
1552
+ {
1553
+ "epoch": 275.0,
1554
+ "grad_norm": 0.09304305166006088,
1555
+ "learning_rate": 4.014891179839634e-05,
1556
+ "loss": 0.0229,
1557
+ "step": 1100
1558
+ },
1559
+ {
1560
+ "epoch": 276.32,
1561
+ "grad_norm": 0.03154018521308899,
1562
+ "learning_rate": 3.9862542955326463e-05,
1563
+ "loss": 0.0242,
1564
+ "step": 1105
1565
+ },
1566
+ {
1567
+ "epoch": 277.64,
1568
+ "grad_norm": 0.029925866052508354,
1569
+ "learning_rate": 3.957617411225659e-05,
1570
+ "loss": 0.024,
1571
+ "step": 1110
1572
+ },
1573
+ {
1574
+ "epoch": 278.96,
1575
+ "grad_norm": 0.032234761863946915,
1576
+ "learning_rate": 3.9289805269186716e-05,
1577
+ "loss": 0.0232,
1578
+ "step": 1115
1579
+ },
1580
+ {
1581
+ "epoch": 280.0,
1582
+ "grad_norm": 0.09113281220197678,
1583
+ "learning_rate": 3.900343642611684e-05,
1584
+ "loss": 0.0238,
1585
+ "step": 1120
1586
+ },
1587
+ {
1588
+ "epoch": 281.32,
1589
+ "grad_norm": 0.03371744975447655,
1590
+ "learning_rate": 3.871706758304697e-05,
1591
+ "loss": 0.0242,
1592
+ "step": 1125
1593
+ },
1594
+ {
1595
+ "epoch": 282.64,
1596
+ "grad_norm": 0.033525336533784866,
1597
+ "learning_rate": 3.8430698739977096e-05,
1598
+ "loss": 0.0234,
1599
+ "step": 1130
1600
+ },
1601
+ {
1602
+ "epoch": 283.96,
1603
+ "grad_norm": 0.030558524653315544,
1604
+ "learning_rate": 3.8144329896907216e-05,
1605
+ "loss": 0.0237,
1606
+ "step": 1135
1607
+ },
1608
+ {
1609
+ "epoch": 285.0,
1610
+ "grad_norm": 0.07060851901769638,
1611
+ "learning_rate": 3.785796105383734e-05,
1612
+ "loss": 0.022,
1613
+ "step": 1140
1614
+ },
1615
+ {
1616
+ "epoch": 286.32,
1617
+ "grad_norm": 0.02952047996222973,
1618
+ "learning_rate": 3.757159221076747e-05,
1619
+ "loss": 0.0238,
1620
+ "step": 1145
1621
+ },
1622
+ {
1623
+ "epoch": 287.64,
1624
+ "grad_norm": 0.030197326093912125,
1625
+ "learning_rate": 3.7285223367697595e-05,
1626
+ "loss": 0.0227,
1627
+ "step": 1150
1628
+ },
1629
+ {
1630
+ "epoch": 288.96,
1631
+ "grad_norm": 0.028898609802126884,
1632
+ "learning_rate": 3.699885452462772e-05,
1633
+ "loss": 0.0232,
1634
+ "step": 1155
1635
+ },
1636
+ {
1637
+ "epoch": 290.0,
1638
+ "grad_norm": 0.10391610860824585,
1639
+ "learning_rate": 3.671248568155785e-05,
1640
+ "loss": 0.0236,
1641
+ "step": 1160
1642
+ },
1643
+ {
1644
+ "epoch": 291.32,
1645
+ "grad_norm": 0.0285499207675457,
1646
+ "learning_rate": 3.6426116838487974e-05,
1647
+ "loss": 0.0238,
1648
+ "step": 1165
1649
+ },
1650
+ {
1651
+ "epoch": 292.64,
1652
+ "grad_norm": 0.028268715366721153,
1653
+ "learning_rate": 3.61397479954181e-05,
1654
+ "loss": 0.0229,
1655
+ "step": 1170
1656
+ },
1657
+ {
1658
+ "epoch": 293.96,
1659
+ "grad_norm": 0.02961159311234951,
1660
+ "learning_rate": 3.585337915234823e-05,
1661
+ "loss": 0.0247,
1662
+ "step": 1175
1663
+ },
1664
+ {
1665
+ "epoch": 295.0,
1666
+ "grad_norm": 0.08803751319646835,
1667
+ "learning_rate": 3.5567010309278354e-05,
1668
+ "loss": 0.0226,
1669
+ "step": 1180
1670
+ },
1671
+ {
1672
+ "epoch": 296.32,
1673
+ "grad_norm": 0.03452374413609505,
1674
+ "learning_rate": 3.528064146620848e-05,
1675
+ "loss": 0.0244,
1676
+ "step": 1185
1677
+ },
1678
+ {
1679
+ "epoch": 297.64,
1680
+ "grad_norm": 0.028895270079374313,
1681
+ "learning_rate": 3.49942726231386e-05,
1682
+ "loss": 0.023,
1683
+ "step": 1190
1684
+ },
1685
+ {
1686
+ "epoch": 298.96,
1687
+ "grad_norm": 0.029182473197579384,
1688
+ "learning_rate": 3.4707903780068726e-05,
1689
+ "loss": 0.0234,
1690
+ "step": 1195
1691
+ },
1692
+ {
1693
+ "epoch": 300.0,
1694
+ "grad_norm": 0.11874058097600937,
1695
+ "learning_rate": 3.442153493699885e-05,
1696
+ "loss": 0.0235,
1697
+ "step": 1200
1698
+ },
1699
+ {
1700
+ "epoch": 301.32,
1701
+ "grad_norm": 0.030481066554784775,
1702
+ "learning_rate": 3.4135166093928986e-05,
1703
+ "loss": 0.0237,
1704
+ "step": 1205
1705
+ },
1706
+ {
1707
+ "epoch": 302.64,
1708
+ "grad_norm": 0.03108309395611286,
1709
+ "learning_rate": 3.384879725085911e-05,
1710
+ "loss": 0.023,
1711
+ "step": 1210
1712
+ },
1713
+ {
1714
+ "epoch": 303.96,
1715
+ "grad_norm": 0.03036290407180786,
1716
+ "learning_rate": 3.356242840778923e-05,
1717
+ "loss": 0.0228,
1718
+ "step": 1215
1719
+ },
1720
+ {
1721
+ "epoch": 305.0,
1722
+ "grad_norm": 0.07720436155796051,
1723
+ "learning_rate": 3.327605956471936e-05,
1724
+ "loss": 0.0223,
1725
+ "step": 1220
1726
+ },
1727
+ {
1728
+ "epoch": 306.32,
1729
+ "grad_norm": 0.03028162382543087,
1730
+ "learning_rate": 3.2989690721649485e-05,
1731
+ "loss": 0.0235,
1732
+ "step": 1225
1733
+ },
1734
+ {
1735
+ "epoch": 307.64,
1736
+ "grad_norm": 0.033151157200336456,
1737
+ "learning_rate": 3.270332187857961e-05,
1738
+ "loss": 0.0226,
1739
+ "step": 1230
1740
+ },
1741
+ {
1742
+ "epoch": 308.96,
1743
+ "grad_norm": 0.02951214276254177,
1744
+ "learning_rate": 3.241695303550974e-05,
1745
+ "loss": 0.0235,
1746
+ "step": 1235
1747
+ },
1748
+ {
1749
+ "epoch": 310.0,
1750
+ "grad_norm": 0.09070917963981628,
1751
+ "learning_rate": 3.2130584192439865e-05,
1752
+ "loss": 0.0257,
1753
+ "step": 1240
1754
+ },
1755
+ {
1756
+ "epoch": 311.32,
1757
+ "grad_norm": 0.03337477520108223,
1758
+ "learning_rate": 3.184421534936999e-05,
1759
+ "loss": 0.0248,
1760
+ "step": 1245
1761
+ },
1762
+ {
1763
+ "epoch": 312.64,
1764
+ "grad_norm": 0.03151268512010574,
1765
+ "learning_rate": 3.155784650630012e-05,
1766
+ "loss": 0.0226,
1767
+ "step": 1250
1768
+ },
1769
+ {
1770
+ "epoch": 313.96,
1771
+ "grad_norm": 0.030940482392907143,
1772
+ "learning_rate": 3.1271477663230244e-05,
1773
+ "loss": 0.024,
1774
+ "step": 1255
1775
+ },
1776
+ {
1777
+ "epoch": 315.0,
1778
+ "grad_norm": 0.09032298624515533,
1779
+ "learning_rate": 3.098510882016037e-05,
1780
+ "loss": 0.0236,
1781
+ "step": 1260
1782
+ },
1783
+ {
1784
+ "epoch": 316.32,
1785
+ "grad_norm": 0.029143668711185455,
1786
+ "learning_rate": 3.06987399770905e-05,
1787
+ "loss": 0.0222,
1788
+ "step": 1265
1789
+ },
1790
+ {
1791
+ "epoch": 317.64,
1792
+ "grad_norm": 0.029851289466023445,
1793
+ "learning_rate": 3.0412371134020617e-05,
1794
+ "loss": 0.0246,
1795
+ "step": 1270
1796
+ },
1797
+ {
1798
+ "epoch": 318.96,
1799
+ "grad_norm": 0.03257305920124054,
1800
+ "learning_rate": 3.0126002290950743e-05,
1801
+ "loss": 0.023,
1802
+ "step": 1275
1803
+ },
1804
+ {
1805
+ "epoch": 320.0,
1806
+ "grad_norm": 0.10195237398147583,
1807
+ "learning_rate": 2.983963344788087e-05,
1808
+ "loss": 0.0242,
1809
+ "step": 1280
1810
+ },
1811
+ {
1812
+ "epoch": 321.32,
1813
+ "grad_norm": 0.03116573579609394,
1814
+ "learning_rate": 2.9553264604811e-05,
1815
+ "loss": 0.0237,
1816
+ "step": 1285
1817
+ },
1818
+ {
1819
+ "epoch": 322.64,
1820
+ "grad_norm": 0.033235374838113785,
1821
+ "learning_rate": 2.9266895761741126e-05,
1822
+ "loss": 0.0253,
1823
+ "step": 1290
1824
+ },
1825
+ {
1826
+ "epoch": 323.96,
1827
+ "grad_norm": 0.03546692803502083,
1828
+ "learning_rate": 2.8980526918671253e-05,
1829
+ "loss": 0.0234,
1830
+ "step": 1295
1831
+ },
1832
+ {
1833
+ "epoch": 325.0,
1834
+ "grad_norm": 0.0778215229511261,
1835
+ "learning_rate": 2.8694158075601372e-05,
1836
+ "loss": 0.0231,
1837
+ "step": 1300
1838
+ },
1839
+ {
1840
+ "epoch": 326.32,
1841
+ "grad_norm": 0.029815560206770897,
1842
+ "learning_rate": 2.8407789232531502e-05,
1843
+ "loss": 0.0241,
1844
+ "step": 1305
1845
+ },
1846
+ {
1847
+ "epoch": 327.64,
1848
+ "grad_norm": 0.03497137874364853,
1849
+ "learning_rate": 2.812142038946163e-05,
+ "loss": 0.0233,
+ "step": 1310
+ },
+ {
+ "epoch": 328.96,
+ "grad_norm": 0.030050713568925858,
+ "learning_rate": 2.7835051546391755e-05,
+ "loss": 0.0244,
+ "step": 1315
+ },
+ {
+ "epoch": 330.0,
+ "grad_norm": 0.09748966246843338,
+ "learning_rate": 2.754868270332188e-05,
+ "loss": 0.0231,
+ "step": 1320
+ },
+ {
+ "epoch": 331.32,
+ "grad_norm": 0.0319872722029686,
+ "learning_rate": 2.7262313860252005e-05,
+ "loss": 0.0235,
+ "step": 1325
+ },
+ {
+ "epoch": 332.64,
+ "grad_norm": 0.029525283724069595,
+ "learning_rate": 2.697594501718213e-05,
+ "loss": 0.0243,
+ "step": 1330
+ },
+ {
+ "epoch": 333.96,
+ "grad_norm": 0.029868364334106445,
+ "learning_rate": 2.6689576174112258e-05,
+ "loss": 0.0245,
+ "step": 1335
+ },
+ {
+ "epoch": 335.0,
+ "grad_norm": 0.07746418565511703,
+ "learning_rate": 2.6403207331042384e-05,
+ "loss": 0.0212,
+ "step": 1340
+ },
+ {
+ "epoch": 336.32,
+ "grad_norm": 0.02571861259639263,
+ "learning_rate": 2.611683848797251e-05,
+ "loss": 0.0233,
+ "step": 1345
+ },
+ {
+ "epoch": 337.64,
+ "grad_norm": 0.0320206955075264,
+ "learning_rate": 2.5830469644902637e-05,
+ "loss": 0.0238,
+ "step": 1350
+ },
+ {
+ "epoch": 338.96,
+ "grad_norm": 0.03084505721926689,
+ "learning_rate": 2.554410080183276e-05,
+ "loss": 0.024,
+ "step": 1355
+ },
+ {
+ "epoch": 340.0,
+ "grad_norm": 0.1282522976398468,
+ "learning_rate": 2.5257731958762887e-05,
+ "loss": 0.0237,
+ "step": 1360
+ },
+ {
+ "epoch": 341.32,
+ "grad_norm": 0.03159436210989952,
+ "learning_rate": 2.4971363115693013e-05,
+ "loss": 0.0239,
+ "step": 1365
+ },
+ {
+ "epoch": 342.64,
+ "grad_norm": 0.03368183225393295,
+ "learning_rate": 2.468499427262314e-05,
+ "loss": 0.023,
+ "step": 1370
+ },
+ {
+ "epoch": 343.96,
+ "grad_norm": 0.02871900610625744,
+ "learning_rate": 2.4398625429553266e-05,
+ "loss": 0.0232,
+ "step": 1375
+ },
+ {
+ "epoch": 345.0,
+ "grad_norm": 0.06527750939130783,
+ "learning_rate": 2.4112256586483393e-05,
+ "loss": 0.0216,
+ "step": 1380
+ },
+ {
+ "epoch": 346.32,
+ "grad_norm": 0.029657971113920212,
+ "learning_rate": 2.3825887743413516e-05,
+ "loss": 0.0246,
+ "step": 1385
+ },
+ {
+ "epoch": 347.64,
+ "grad_norm": 0.029672225937247276,
+ "learning_rate": 2.3539518900343642e-05,
+ "loss": 0.0226,
+ "step": 1390
+ },
+ {
+ "epoch": 348.96,
+ "grad_norm": 0.032295338809490204,
+ "learning_rate": 2.3253150057273772e-05,
+ "loss": 0.0234,
+ "step": 1395
+ },
+ {
+ "epoch": 350.0,
+ "grad_norm": 0.12228602916002274,
+ "learning_rate": 2.2966781214203895e-05,
+ "loss": 0.0246,
+ "step": 1400
+ },
+ {
+ "epoch": 351.32,
+ "grad_norm": 0.031152470037341118,
+ "learning_rate": 2.268041237113402e-05,
+ "loss": 0.0244,
+ "step": 1405
+ },
+ {
+ "epoch": 352.64,
+ "grad_norm": 0.03246377035975456,
+ "learning_rate": 2.2394043528064148e-05,
+ "loss": 0.0241,
+ "step": 1410
+ },
+ {
+ "epoch": 353.96,
+ "grad_norm": 0.03664344921708107,
+ "learning_rate": 2.210767468499427e-05,
+ "loss": 0.0236,
+ "step": 1415
+ },
+ {
+ "epoch": 355.0,
+ "grad_norm": 0.12599903345108032,
+ "learning_rate": 2.18213058419244e-05,
+ "loss": 0.0242,
+ "step": 1420
+ },
+ {
+ "epoch": 356.32,
+ "grad_norm": 0.03213375434279442,
+ "learning_rate": 2.1534936998854528e-05,
+ "loss": 0.023,
+ "step": 1425
+ },
+ {
+ "epoch": 357.64,
+ "grad_norm": 0.029569735750555992,
+ "learning_rate": 2.124856815578465e-05,
+ "loss": 0.0242,
+ "step": 1430
+ },
+ {
+ "epoch": 358.96,
+ "grad_norm": 0.030345458537340164,
+ "learning_rate": 2.0962199312714777e-05,
+ "loss": 0.0237,
+ "step": 1435
+ },
+ {
+ "epoch": 360.0,
+ "grad_norm": 0.07442766427993774,
+ "learning_rate": 2.0675830469644904e-05,
+ "loss": 0.0225,
+ "step": 1440
+ },
+ {
+ "epoch": 361.32,
+ "grad_norm": 0.03161914646625519,
+ "learning_rate": 2.038946162657503e-05,
+ "loss": 0.0247,
+ "step": 1445
+ },
+ {
+ "epoch": 362.64,
+ "grad_norm": 0.03342209383845329,
+ "learning_rate": 2.0103092783505157e-05,
+ "loss": 0.0224,
+ "step": 1450
+ },
+ {
+ "epoch": 363.96,
+ "grad_norm": 0.029506616294384003,
+ "learning_rate": 1.981672394043528e-05,
+ "loss": 0.0226,
+ "step": 1455
+ },
+ {
+ "epoch": 365.0,
+ "grad_norm": 0.13045279681682587,
+ "learning_rate": 1.9530355097365406e-05,
+ "loss": 0.0251,
+ "step": 1460
+ },
+ {
+ "epoch": 366.32,
+ "grad_norm": 0.03303099796175957,
+ "learning_rate": 1.9243986254295536e-05,
+ "loss": 0.0239,
+ "step": 1465
+ },
+ {
+ "epoch": 367.64,
+ "grad_norm": 0.02956564724445343,
+ "learning_rate": 1.895761741122566e-05,
+ "loss": 0.0221,
+ "step": 1470
+ },
+ {
+ "epoch": 368.96,
+ "grad_norm": 0.03200279548764229,
+ "learning_rate": 1.8671248568155786e-05,
+ "loss": 0.0234,
+ "step": 1475
+ },
+ {
+ "epoch": 370.0,
+ "grad_norm": 0.12507398426532745,
+ "learning_rate": 1.8384879725085912e-05,
+ "loss": 0.0235,
+ "step": 1480
+ },
+ {
+ "epoch": 371.32,
+ "grad_norm": 0.03214867785573006,
+ "learning_rate": 1.809851088201604e-05,
+ "loss": 0.0233,
+ "step": 1485
+ },
+ {
+ "epoch": 372.64,
+ "grad_norm": 0.03199266269803047,
+ "learning_rate": 1.7812142038946165e-05,
+ "loss": 0.023,
+ "step": 1490
+ },
+ {
+ "epoch": 373.96,
+ "grad_norm": 0.027682902291417122,
+ "learning_rate": 1.7525773195876288e-05,
+ "loss": 0.0246,
+ "step": 1495
+ },
+ {
+ "epoch": 375.0,
+ "grad_norm": 0.10432948172092438,
+ "learning_rate": 1.7239404352806415e-05,
+ "loss": 0.0228,
+ "step": 1500
+ },
+ {
+ "epoch": 376.32,
+ "grad_norm": 0.03665570914745331,
+ "learning_rate": 1.695303550973654e-05,
+ "loss": 0.0235,
+ "step": 1505
+ },
+ {
+ "epoch": 377.64,
+ "grad_norm": 0.03269299864768982,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 0.0228,
+ "step": 1510
+ },
+ {
+ "epoch": 378.96,
+ "grad_norm": 0.030298851430416107,
+ "learning_rate": 1.6380297823596794e-05,
+ "loss": 0.0232,
+ "step": 1515
+ },
+ {
+ "epoch": 380.0,
+ "grad_norm": 0.1330370008945465,
+ "learning_rate": 1.609392898052692e-05,
+ "loss": 0.024,
+ "step": 1520
+ },
+ {
+ "epoch": 381.32,
+ "grad_norm": 0.026194848120212555,
+ "learning_rate": 1.5807560137457044e-05,
+ "loss": 0.0232,
+ "step": 1525
+ },
+ {
+ "epoch": 382.64,
+ "grad_norm": 0.030696984380483627,
+ "learning_rate": 1.5521191294387173e-05,
+ "loss": 0.024,
+ "step": 1530
+ },
+ {
+ "epoch": 383.96,
+ "grad_norm": 0.03159346804022789,
+ "learning_rate": 1.5234822451317298e-05,
+ "loss": 0.0237,
+ "step": 1535
+ },
+ {
+ "epoch": 385.0,
+ "grad_norm": 0.0895160585641861,
+ "learning_rate": 1.4948453608247423e-05,
+ "loss": 0.024,
+ "step": 1540
+ },
+ {
+ "epoch": 386.32,
+ "grad_norm": 0.030342400074005127,
+ "learning_rate": 1.466208476517755e-05,
+ "loss": 0.0226,
+ "step": 1545
+ },
+ {
+ "epoch": 387.64,
+ "grad_norm": 0.03451743721961975,
+ "learning_rate": 1.4375715922107674e-05,
+ "loss": 0.0241,
+ "step": 1550
+ },
+ {
+ "epoch": 388.96,
+ "grad_norm": 0.034534044563770294,
+ "learning_rate": 1.40893470790378e-05,
+ "loss": 0.0224,
+ "step": 1555
+ },
+ {
+ "epoch": 390.0,
+ "grad_norm": 0.11649748682975769,
+ "learning_rate": 1.3802978235967929e-05,
+ "loss": 0.024,
+ "step": 1560
+ },
+ {
+ "epoch": 391.32,
+ "grad_norm": 0.02730483002960682,
+ "learning_rate": 1.3516609392898052e-05,
+ "loss": 0.0232,
+ "step": 1565
+ },
+ {
+ "epoch": 392.64,
+ "grad_norm": 0.03302980959415436,
+ "learning_rate": 1.323024054982818e-05,
+ "loss": 0.0245,
+ "step": 1570
+ },
+ {
+ "epoch": 393.96,
+ "grad_norm": 0.030424287542700768,
+ "learning_rate": 1.2943871706758307e-05,
+ "loss": 0.0231,
+ "step": 1575
+ },
+ {
+ "epoch": 395.0,
+ "grad_norm": 0.09190870821475983,
+ "learning_rate": 1.2657502863688431e-05,
+ "loss": 0.0233,
+ "step": 1580
+ },
+ {
+ "epoch": 396.32,
+ "grad_norm": 0.03016272746026516,
+ "learning_rate": 1.2371134020618558e-05,
+ "loss": 0.0237,
+ "step": 1585
+ },
+ {
+ "epoch": 397.64,
+ "grad_norm": 0.029102135449647903,
+ "learning_rate": 1.2084765177548683e-05,
+ "loss": 0.0237,
+ "step": 1590
+ },
+ {
+ "epoch": 398.96,
+ "grad_norm": 0.030849164351820946,
+ "learning_rate": 1.1798396334478809e-05,
+ "loss": 0.0238,
+ "step": 1595
+ },
+ {
+ "epoch": 400.0,
+ "grad_norm": 0.09185610711574554,
+ "learning_rate": 1.1512027491408934e-05,
+ "loss": 0.0223,
+ "step": 1600
+ },
+ {
+ "epoch": 401.32,
+ "grad_norm": 0.030718082562088966,
+ "learning_rate": 1.1225658648339062e-05,
+ "loss": 0.0228,
+ "step": 1605
+ },
+ {
+ "epoch": 402.64,
+ "grad_norm": 0.028845084831118584,
+ "learning_rate": 1.0939289805269187e-05,
+ "loss": 0.0238,
+ "step": 1610
+ },
+ {
+ "epoch": 403.96,
+ "grad_norm": 0.03036542609333992,
+ "learning_rate": 1.0652920962199313e-05,
+ "loss": 0.0241,
+ "step": 1615
+ },
+ {
+ "epoch": 405.0,
+ "grad_norm": 0.10246625542640686,
+ "learning_rate": 1.036655211912944e-05,
+ "loss": 0.0234,
+ "step": 1620
+ },
+ {
+ "epoch": 406.32,
+ "grad_norm": 0.03127530962228775,
+ "learning_rate": 1.0080183276059566e-05,
+ "loss": 0.0238,
+ "step": 1625
+ },
+ {
+ "epoch": 407.64,
+ "grad_norm": 0.036298803985118866,
+ "learning_rate": 9.793814432989691e-06,
+ "loss": 0.0226,
+ "step": 1630
+ },
+ {
+ "epoch": 408.96,
+ "grad_norm": 0.028423035517334938,
+ "learning_rate": 9.507445589919818e-06,
+ "loss": 0.0231,
+ "step": 1635
+ },
+ {
+ "epoch": 410.0,
+ "grad_norm": 0.07871800661087036,
+ "learning_rate": 9.221076746849944e-06,
+ "loss": 0.0218,
+ "step": 1640
+ },
+ {
+ "epoch": 411.32,
+ "grad_norm": 0.0336175374686718,
+ "learning_rate": 8.934707903780069e-06,
+ "loss": 0.0242,
+ "step": 1645
+ },
+ {
+ "epoch": 412.64,
+ "grad_norm": 0.03624117374420166,
+ "learning_rate": 8.648339060710195e-06,
+ "loss": 0.0227,
+ "step": 1650
+ },
+ {
+ "epoch": 413.96,
+ "grad_norm": 0.03119911253452301,
+ "learning_rate": 8.36197021764032e-06,
+ "loss": 0.0218,
+ "step": 1655
+ },
+ {
+ "epoch": 415.0,
+ "grad_norm": 0.09461841732263565,
+ "learning_rate": 8.075601374570448e-06,
+ "loss": 0.0227,
+ "step": 1660
+ },
+ {
+ "epoch": 416.32,
+ "grad_norm": 0.02897919900715351,
+ "learning_rate": 7.789232531500573e-06,
+ "loss": 0.0233,
+ "step": 1665
+ },
+ {
+ "epoch": 417.64,
+ "grad_norm": 0.03222072497010231,
+ "learning_rate": 7.502863688430699e-06,
+ "loss": 0.0234,
+ "step": 1670
+ },
+ {
+ "epoch": 418.96,
+ "grad_norm": 0.02793605998158455,
+ "learning_rate": 7.216494845360824e-06,
+ "loss": 0.0231,
+ "step": 1675
+ },
+ {
+ "epoch": 420.0,
+ "grad_norm": 0.10282719135284424,
+ "learning_rate": 6.930126002290952e-06,
+ "loss": 0.0242,
+ "step": 1680
+ },
+ {
+ "epoch": 421.32,
+ "grad_norm": 0.029103396460413933,
+ "learning_rate": 6.643757159221077e-06,
+ "loss": 0.0228,
+ "step": 1685
+ },
+ {
+ "epoch": 422.64,
+ "grad_norm": 0.027615424245595932,
+ "learning_rate": 6.357388316151203e-06,
+ "loss": 0.0229,
+ "step": 1690
+ },
+ {
+ "epoch": 423.96,
+ "grad_norm": 0.03273004665970802,
+ "learning_rate": 6.071019473081329e-06,
+ "loss": 0.023,
+ "step": 1695
+ },
+ {
+ "epoch": 425.0,
+ "grad_norm": 0.088851198554039,
+ "learning_rate": 5.784650630011455e-06,
+ "loss": 0.0242,
+ "step": 1700
+ },
+ {
+ "epoch": 426.32,
+ "grad_norm": 0.031545545905828476,
+ "learning_rate": 5.498281786941581e-06,
+ "loss": 0.0236,
+ "step": 1705
+ },
+ {
+ "epoch": 427.64,
+ "grad_norm": 0.03436841815710068,
+ "learning_rate": 5.211912943871707e-06,
+ "loss": 0.0231,
+ "step": 1710
+ },
+ {
+ "epoch": 428.96,
+ "grad_norm": 0.03470204398036003,
+ "learning_rate": 4.925544100801833e-06,
+ "loss": 0.023,
+ "step": 1715
+ },
+ {
+ "epoch": 430.0,
+ "grad_norm": 0.0859316810965538,
+ "learning_rate": 4.639175257731959e-06,
+ "loss": 0.0233,
+ "step": 1720
+ },
+ {
+ "epoch": 431.32,
+ "grad_norm": 0.02714327722787857,
+ "learning_rate": 4.352806414662085e-06,
+ "loss": 0.0215,
+ "step": 1725
+ },
+ {
+ "epoch": 432.64,
+ "grad_norm": 0.03115593083202839,
+ "learning_rate": 4.066437571592211e-06,
+ "loss": 0.0233,
+ "step": 1730
+ },
+ {
+ "epoch": 433.96,
+ "grad_norm": 0.03160055726766586,
+ "learning_rate": 3.7800687285223365e-06,
+ "loss": 0.0222,
+ "step": 1735
+ },
+ {
+ "epoch": 435.0,
+ "grad_norm": 0.10642414540052414,
+ "learning_rate": 3.493699885452463e-06,
+ "loss": 0.0221,
+ "step": 1740
+ },
+ {
+ "epoch": 436.32,
+ "grad_norm": 0.029918361455202103,
+ "learning_rate": 3.2073310423825886e-06,
+ "loss": 0.024,
+ "step": 1745
+ },
+ {
+ "epoch": 437.64,
+ "grad_norm": 0.030128490179777145,
+ "learning_rate": 2.920962199312715e-06,
+ "loss": 0.023,
+ "step": 1750
+ },
+ {
+ "epoch": 438.96,
+ "grad_norm": 0.03472098708152771,
+ "learning_rate": 2.6345933562428407e-06,
+ "loss": 0.0231,
+ "step": 1755
+ },
+ {
+ "epoch": 440.0,
+ "grad_norm": 0.10841913521289825,
+ "learning_rate": 2.3482245131729668e-06,
+ "loss": 0.0238,
+ "step": 1760
+ },
+ {
+ "epoch": 441.32,
+ "grad_norm": 0.03282919153571129,
+ "learning_rate": 2.061855670103093e-06,
+ "loss": 0.0241,
+ "step": 1765
+ },
+ {
+ "epoch": 442.64,
+ "grad_norm": 0.030162909999489784,
+ "learning_rate": 1.7754868270332189e-06,
+ "loss": 0.0234,
+ "step": 1770
+ },
+ {
+ "epoch": 443.96,
+ "grad_norm": 0.032848529517650604,
+ "learning_rate": 1.4891179839633447e-06,
+ "loss": 0.0225,
+ "step": 1775
+ },
+ {
+ "epoch": 445.0,
+ "grad_norm": 0.09595301747322083,
+ "learning_rate": 1.202749140893471e-06,
+ "loss": 0.023,
+ "step": 1780
+ },
+ {
+ "epoch": 446.32,
+ "grad_norm": 0.027366334572434425,
+ "learning_rate": 9.163802978235968e-07,
+ "loss": 0.0242,
+ "step": 1785
+ },
+ {
+ "epoch": 447.64,
+ "grad_norm": 0.029810229316353798,
+ "learning_rate": 6.300114547537229e-07,
+ "loss": 0.0243,
+ "step": 1790
+ },
+ {
+ "epoch": 448.96,
+ "grad_norm": 0.03164233639836311,
+ "learning_rate": 3.436426116838488e-07,
+ "loss": 0.0238,
+ "step": 1795
+ },
+ {
+ "epoch": 450.0,
+ "grad_norm": 0.11831732094287872,
+ "learning_rate": 5.72737686139748e-08,
+ "loss": 0.0246,
+ "step": 1800
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1800,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 450,
+ "save_steps": 300,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.5308141101056e+18,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
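The logged learning rates above are consistent with Transformers' linear schedule with warmup: a peak LR of 1e-4 and 54 warmup steps (warmup_ratio 0.03 of the 1,800 total steps). Note these peak/warmup values are back-fitted from the log, not stated in trainer_state.json. A minimal sketch of that reconstruction:

```python
# Reconstruct the (inferred) linear warmup/decay LR schedule behind the log.
# peak_lr and warmup_steps are assumptions fitted to the logged values,
# not fields of trainer_state.json itself.
peak_lr = 1e-4
max_steps = 1800
warmup_steps = 54  # assumed: warmup_ratio 0.03 * 1800 steps

def lr_at(step: int) -> float:
    """LR applied at optimizer step `step` (1-indexed): linear ramp, then linear decay to 0."""
    if step <= warmup_steps:
        return peak_lr * (step - 1) / warmup_steps
    return peak_lr * (max_steps - step + 1) / (max_steps - warmup_steps)

# Logged values: step 1315 -> 2.7835051546391755e-05, step 1800 -> 5.72737686139748e-08
print(lr_at(1315), lr_at(1800))
```

The decay branch reproduces every logged rate in this segment to float precision, which is how the peak and warmup values were inferred.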
Math_QA/group_09/checkpoints/checkpoint-1800/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-300/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.12.0
Math_QA/group_09/checkpoints/checkpoint-300/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
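With `lora_alpha: 128` and `r: 64` above, and rsLoRA disabled, the low-rank update BA added to each of the seven target projections is scaled by alpha / r = 2.0. A quick sketch of how PEFT derives that factor from this config:

```python
import math

# Values copied from the adapter_config.json above.
lora_alpha = 128
r = 64
use_rslora = False

# Standard LoRA scales the update BA by alpha / r;
# rsLoRA (disabled in this config) would use alpha / sqrt(r) instead.
scaling = lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r
print(scaling)  # 2.0
```

Since the scaling multiplies the learned update, alpha = 2r (as here) doubles the adapter's effective contribution relative to the alpha = r default.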
Math_QA/group_09/checkpoints/checkpoint-300/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
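The 22 added tokens occupy one contiguous ID block starting at 151643, appended directly after the base BPE vocabulary. A small check, with the mapping copied from the file above:

```python
# added_tokens.json contents, copied verbatim from the checkpoint above.
added_tokens = {
    "</tool_call>": 151658, "<tool_call>": 151657,
    "<|box_end|>": 151649, "<|box_start|>": 151648,
    "<|endoftext|>": 151643, "<|file_sep|>": 151664,
    "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661,
    "<|im_end|>": 151645, "<|im_start|>": 151644,
    "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651,
    "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

ids = sorted(added_tokens.values())
# 22 consecutive IDs: 151643 .. 151664, no gaps and no overlap with the base vocab.
assert ids == list(range(151643, 151665))
print(len(ids), ids[0], ids[-1])
```

Contiguity matters in practice: it means the model's embedding matrix only needs to extend past index 151642 to cover every special token.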
Math_QA/group_09/checkpoints/checkpoint-300/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
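For the common no-tools case, the template above reduces to plain ChatML framing: a system turn (the message's own, or the Qwen default), one `<|im_start|>role ... <|im_end|>` block per message, and an optional open assistant header. A plain-Python sketch of just that path (the tool-call and tool-response branches are omitted):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Mirror of the chat template's no-tools path (ChatML with a default system prompt)."""
    default_system = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
    if messages and messages[0]["role"] == "system":
        out = f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n"
    else:
        out = f"<|im_start|>system\n{default_system}<|im_end|>\n"
    for i, m in enumerate(messages):
        # A leading system message was already emitted above, so skip it here
        # (the template does this via `not loop.first`).
        if m["role"] == "user" or m["role"] == "assistant" or (m["role"] == "system" and i > 0):
            out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(render_chatml([{"role": "user", "content": "What is 2+2?"}]))
```

This is a readability aid only; at inference time the Jinja template itself is what `tokenizer.apply_chat_template` executes.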
Math_QA/group_09/checkpoints/checkpoint-300/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
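Note that end-of-sequence and padding are deliberately distinct here: `<|im_end|>` terminates chat turns, while `<|endoftext|>` is reused as the pad token. A minimal check, with the IDs taken from this checkpoint's added_tokens.json:

```python
# Token strings from special_tokens_map.json; IDs from added_tokens.json above.
eos_token = "<|im_end|>"
pad_token = "<|endoftext|>"
token_ids = {"<|im_end|>": 151645, "<|endoftext|>": 151643}

# Keeping pad != eos means attention/label masking on padding
# never accidentally masks the turn terminator the model must learn to emit.
assert token_ids[eos_token] != token_ids[pad_token]
print(token_ids[eos_token], token_ids[pad_token])
```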
Math_QA/group_09/checkpoints/checkpoint-300/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Math_QA/group_09/checkpoints/checkpoint-300/trainer_state.json ADDED
@@ -0,0 +1,461 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 75.0,
+ "eval_steps": 500,
+ "global_step": 300,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.32,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "loss": 1.9528,
+ "step": 1
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "loss": 1.7919,
+ "step": 5
+ },
+ {
+ "epoch": 2.64,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 1.6625,
+ "step": 10
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "loss": 1.2009,
+ "step": 15
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "loss": 0.8544,
+ "step": 20
+ },
+ {
+ "epoch": 6.32,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "loss": 0.7449,
+ "step": 25
+ },
+ {
+ "epoch": 7.64,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "loss": 0.5884,
+ "step": 30
+ },
+ {
+ "epoch": 8.96,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "loss": 0.6349,
+ "step": 35
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.5088,
+ "step": 40
+ },
+ {
+ "epoch": 11.32,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "loss": 0.4279,
+ "step": 45
+ },
+ {
+ "epoch": 12.64,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "loss": 0.2689,
+ "step": 50
+ },
+ {
+ "epoch": 13.96,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "loss": 0.1985,
+ "step": 55
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "loss": 0.1346,
+ "step": 60
+ },
+ {
+ "epoch": 16.32,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "loss": 0.0959,
+ "step": 65
+ },
+ {
+ "epoch": 17.64,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "loss": 0.0617,
+ "step": 70
+ },
+ {
+ "epoch": 18.96,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "loss": 0.0428,
+ "step": 75
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "loss": 0.0416,
+ "step": 80
+ },
+ {
+ "epoch": 21.32,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "loss": 0.0334,
+ "step": 85
+ },
+ {
+ "epoch": 22.64,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "loss": 0.0347,
+ "step": 90
+ },
+ {
+ "epoch": 23.96,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "loss": 0.0336,
+ "step": 95
+ },
+ {
+ "epoch": 25.0,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "loss": 0.0325,
+ "step": 100
+ },
+ {
+ "epoch": 26.32,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "loss": 0.0317,
+ "step": 105
+ },
+ {
+ "epoch": 27.64,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "loss": 0.0322,
+ "step": 110
+ },
+ {
+ "epoch": 28.96,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "loss": 0.029,
+ "step": 115
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "loss": 0.0301,
+ "step": 120
+ },
+ {
+ "epoch": 31.32,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "loss": 0.029,
+ "step": 125
+ },
+ {
+ "epoch": 32.64,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "loss": 0.0287,
+ "step": 130
+ },
+ {
+ "epoch": 33.96,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "loss": 0.029,
+ "step": 135
+ },
+ {
+ "epoch": 35.0,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "loss": 0.0272,
+ "step": 140
+ },
+ {
+ "epoch": 36.32,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "loss": 0.0289,
+ "step": 145
+ },
+ {
+ "epoch": 37.64,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "loss": 0.0295,
+ "step": 150
+ },
+ {
+ "epoch": 38.96,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "loss": 0.0265,
+ "step": 155
+ },
+ {
+ "epoch": 40.0,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "loss": 0.0319,
+ "step": 160
+ },
+ {
+ "epoch": 41.32,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "loss": 0.0277,
+ "step": 165
+ },
+ {
+ "epoch": 42.64,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "loss": 0.0264,
+ "step": 170
+ },
+ {
+ "epoch": 43.96,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "loss": 0.026,
+ "step": 175
+ },
+ {
+ "epoch": 45.0,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "loss": 0.0237,
+ "step": 180
+ },
+ {
+ "epoch": 46.32,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "loss": 0.0265,
+ "step": 185
+ },
+ {
+ "epoch": 47.64,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "loss": 0.0279,
+ "step": 190
+ },
+ {
+ "epoch": 48.96,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "loss": 0.0257,
+ "step": 195
+ },
+ {
+ "epoch": 50.0,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "loss": 0.0249,
+ "step": 200
+ },
+ {
+ "epoch": 51.32,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "loss": 0.0245,
+ "step": 205
+ },
+ {
+ "epoch": 52.64,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "loss": 0.0257,
+ "step": 210
+ },
+ {
+ "epoch": 53.96,
+ "grad_norm": 0.04665664583444595,
+ "learning_rate": 9.083619702176404e-05,
+ "loss": 0.0246,
+ "step": 215
+ },
+ {
+ "epoch": 55.0,
+ "grad_norm": 0.18880419433116913,
+ "learning_rate": 9.054982817869416e-05,
+ "loss": 0.0267,
+ "step": 220
+ },
+ {
+ "epoch": 56.32,
+ "grad_norm": 0.05329155549407005,
+ "learning_rate": 9.026345933562429e-05,
+ "loss": 0.0258,
+ "step": 225
+ },
+ {
+ "epoch": 57.64,
+ "grad_norm": 0.05351603031158447,
+ "learning_rate": 8.997709049255442e-05,
+ "loss": 0.0264,
+ "step": 230
+ },
+ {
+ "epoch": 58.96,
+ "grad_norm": 0.05472696200013161,
+ "learning_rate": 8.969072164948454e-05,
+ "loss": 0.0266,
+ "step": 235
+ },
+ {
+ "epoch": 60.0,
+ "grad_norm": 0.17182305455207825,
+ "learning_rate": 8.940435280641467e-05,
+ "loss": 0.0255,
+ "step": 240
+ },
+ {
+ "epoch": 61.32,
+ "grad_norm": 0.05441403388977051,
+ "learning_rate": 8.91179839633448e-05,
+ "loss": 0.0259,
+ "step": 245
+ },
+ {
+ "epoch": 62.64,
+ "grad_norm": 0.05443132296204567,
+ "learning_rate": 8.883161512027491e-05,
+ "loss": 0.025,
+ "step": 250
+ },
+ {
+ "epoch": 63.96,
+ "grad_norm": 0.05410757660865784,
+ "learning_rate": 8.854524627720504e-05,
+ "loss": 0.0261,
+ "step": 255
+ },
+ {
+ "epoch": 65.0,
+ "grad_norm": 0.16327381134033203,
+ "learning_rate": 8.825887743413516e-05,
+ "loss": 0.0265,
+ "step": 260
+ },
+ {
+ "epoch": 66.32,
+ "grad_norm": 0.05516252666711807,
+ "learning_rate": 8.797250859106529e-05,
+ "loss": 0.0251,
+ "step": 265
+ },
+ {
+ "epoch": 67.64,
+ "grad_norm": 0.0483415424823761,
+ "learning_rate": 8.768613974799542e-05,
+ "loss": 0.0255,
+ "step": 270
+ },
+ {
+ "epoch": 68.96,
+ "grad_norm": 0.062226541340351105,
+ "learning_rate": 8.739977090492554e-05,
+ "loss": 0.0247,
+ "step": 275
+ },
+ {
+ "epoch": 70.0,
+ "grad_norm": 0.20358847081661224,
+ "learning_rate": 8.711340206185567e-05,
+ "loss": 0.0267,
+ "step": 280
+ },
+ {
+ "epoch": 71.32,
+ "grad_norm": 0.04628003016114235,
+ "learning_rate": 8.682703321878581e-05,
+ "loss": 0.0255,
+ "step": 285
+ },
+ {
+ "epoch": 72.64,
+ "grad_norm": 0.06483373790979385,
+ "learning_rate": 8.654066437571594e-05,
+ "loss": 0.0257,
+ "step": 290
+ },
+ {
+ "epoch": 73.96,
+ "grad_norm": 0.04926105588674545,
+ "learning_rate": 8.625429553264606e-05,
+ "loss": 0.0244,
+ "step": 295
+ },
+ {
+ "epoch": 75.0,
+ "grad_norm": 0.1988091617822647,
+ "learning_rate": 8.596792668957619e-05,
+ "loss": 0.0239,
+ "step": 300
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1800,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 450,
+ "save_steps": 300,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.551356850176e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
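Note for reviewers: the `log_history` list above is a flat record of loss, learning rate, and gradient norm at each logging step, which makes it easy to inspect convergence without TensorBoard. A minimal sketch, reproducing only the first and last records from checkpoint-300 inline (in practice you would `json.load()` the `trainer_state.json` file itself):

```python
import json

# Two records copied from checkpoint-300's log_history above; the full file
# contains one record every `logging_steps` (5) steps.
trainer_state = {
    "global_step": 300,
    "log_history": [
        {"epoch": 0.32, "grad_norm": 10.95508098602295,
         "learning_rate": 0.0, "loss": 1.9528, "step": 1},
        {"epoch": 75.0, "grad_norm": 0.1988091617822647,
         "learning_rate": 8.596792668957619e-05, "loss": 0.0239, "step": 300},
    ],
}

# Extract a (step, loss) series suitable for plotting.
steps = [rec["step"] for rec in trainer_state["log_history"]]
losses = [rec["loss"] for rec in trainer_state["log_history"]]
print(f"loss went from {losses[0]} at step {steps[0]} "
      f"to {losses[-1]} at step {steps[-1]}")
```

The same pattern applies to any checkpoint's `trainer_state.json`, since every checkpoint carries the full history up to its `global_step`.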
Math_QA/group_09/checkpoints/checkpoint-300/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-600/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.12.0
Math_QA/group_09/checkpoints/checkpoint-600/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/\u4e0d\u51bb\u7ed3Qwen\u8bad\u7ec3/models/Qwen2.5-1.5B-Instruct",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 128,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 64,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "gate_proj",
+ "o_proj",
+ "k_proj",
+ "q_proj",
+ "up_proj",
+ "down_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
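Note for reviewers: this `adapter_config.json` describes a rank-64 LoRA adapter over all attention and MLP projections. A minimal sketch of how the two key hyperparameters relate, using only the fields copied from the file above (PEFT applies the low-rank update scaled by `lora_alpha / r`):

```python
import json

# Relevant fields copied from adapter_config.json above.
adapter_config = json.loads("""
{
  "lora_alpha": 128,
  "lora_dropout": 0.05,
  "r": 64,
  "use_rslora": false
}
""")

# With use_rslora false, the standard LoRA scaling is lora_alpha / r,
# so this adapter's update is applied with a factor of 2.0.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(f"rank={adapter_config['r']}, scaling={scaling}")
```

Loading the adapter itself would go through `peft.PeftModel.from_pretrained` with the base model named in `base_model_name_or_path`.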
Math_QA/group_09/checkpoints/checkpoint-600/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
Math_QA/group_09/checkpoints/checkpoint-600/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0]['role'] == 'system' %}
4
+ {{- messages[0]['content'] }}
5
+ {%- else %}
6
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
7
+ {%- endif %}
8
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
9
+ {%- for tool in tools %}
10
+ {{- "\n" }}
11
+ {{- tool | tojson }}
12
+ {%- endfor %}
13
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
14
+ {%- else %}
15
+ {%- if messages[0]['role'] == 'system' %}
16
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
17
+ {%- else %}
18
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
19
+ {%- endif %}
20
+ {%- endif %}
21
+ {%- for message in messages %}
22
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
23
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
24
+ {%- elif message.role == "assistant" %}
25
+ {{- '<|im_start|>' + message.role }}
26
+ {%- if message.content %}
27
+ {{- '\n' + message.content }}
28
+ {%- endif %}
29
+ {%- for tool_call in message.tool_calls %}
30
+ {%- if tool_call.function is defined %}
31
+ {%- set tool_call = tool_call.function %}
32
+ {%- endif %}
33
+ {{- '\n<tool_call>\n{"name": "' }}
34
+ {{- tool_call.name }}
35
+ {{- '", "arguments": ' }}
36
+ {{- tool_call.arguments | tojson }}
37
+ {{- '}\n</tool_call>' }}
38
+ {%- endfor %}
39
+ {{- '<|im_end|>\n' }}
40
+ {%- elif message.role == "tool" %}
41
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
42
+ {{- '<|im_start|>user' }}
43
+ {%- endif %}
44
+ {{- '\n<tool_response>\n' }}
45
+ {{- message.content }}
46
+ {{- '\n</tool_response>' }}
47
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
48
+ {{- '<|im_end|>\n' }}
49
+ {%- endif %}
50
+ {%- endif %}
51
+ {%- endfor %}
52
+ {%- if add_generation_prompt %}
53
+ {{- '<|im_start|>assistant\n' }}
54
+ {%- endif %}
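Note for reviewers: the no-tools branch of this Jinja template produces standard ChatML prompts. As a sketch, the snippet below re-implements just that branch in plain Python so the expected output shape is visible without a Jinja engine; in practice the template is rendered by `tokenizer.apply_chat_template`, and this helper function is purely illustrative.

```python
# Illustrative re-implementation of the template's no-tools branch:
# a default system header if none is given, then one ChatML block per
# message, then an open assistant block when add_generation_prompt is set.
def render_chatml(messages, add_generation_prompt=True):
    parts = []
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        messages = messages[1:]
    else:
        parts.append("<|im_start|>system\nYou are Qwen, created by Alibaba "
                     "Cloud. You are a helpful assistant.<|im_end|>\n")
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "What is 2 + 2?"}])
print(prompt)
```

The tool-calling branches (`<tool_call>`, `<tool_response>`) are omitted here; they wrap assistant tool calls and tool results in the extra tags defined by the template above.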
Math_QA/group_09/checkpoints/checkpoint-600/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/checkpoints/checkpoint-600/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Math_QA/group_09/checkpoints/checkpoint-600/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Math_QA/group_09/checkpoints/checkpoint-600/trainer_state.json ADDED
@@ -0,0 +1,881 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 150.0,
+ "eval_steps": 500,
+ "global_step": 600,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.32,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "loss": 1.9528,
+ "step": 1
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "loss": 1.7919,
+ "step": 5
+ },
+ {
+ "epoch": 2.64,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 1.6625,
+ "step": 10
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "loss": 1.2009,
+ "step": 15
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "loss": 0.8544,
+ "step": 20
+ },
+ {
+ "epoch": 6.32,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "loss": 0.7449,
+ "step": 25
+ },
+ {
+ "epoch": 7.64,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "loss": 0.5884,
+ "step": 30
+ },
+ {
+ "epoch": 8.96,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "loss": 0.6349,
+ "step": 35
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.5088,
+ "step": 40
+ },
+ {
+ "epoch": 11.32,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "loss": 0.4279,
+ "step": 45
+ },
+ {
+ "epoch": 12.64,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "loss": 0.2689,
+ "step": 50
+ },
+ {
+ "epoch": 13.96,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "loss": 0.1985,
+ "step": 55
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "loss": 0.1346,
+ "step": 60
+ },
+ {
+ "epoch": 16.32,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "loss": 0.0959,
+ "step": 65
+ },
+ {
+ "epoch": 17.64,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "loss": 0.0617,
+ "step": 70
+ },
+ {
+ "epoch": 18.96,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "loss": 0.0428,
+ "step": 75
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "loss": 0.0416,
+ "step": 80
+ },
+ {
+ "epoch": 21.32,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "loss": 0.0334,
+ "step": 85
+ },
+ {
+ "epoch": 22.64,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "loss": 0.0347,
+ "step": 90
+ },
+ {
+ "epoch": 23.96,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "loss": 0.0336,
+ "step": 95
+ },
+ {
+ "epoch": 25.0,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "loss": 0.0325,
+ "step": 100
+ },
+ {
+ "epoch": 26.32,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "loss": 0.0317,
+ "step": 105
+ },
+ {
+ "epoch": 27.64,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "loss": 0.0322,
+ "step": 110
+ },
+ {
+ "epoch": 28.96,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "loss": 0.029,
+ "step": 115
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "loss": 0.0301,
+ "step": 120
+ },
+ {
+ "epoch": 31.32,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "loss": 0.029,
+ "step": 125
+ },
+ {
+ "epoch": 32.64,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "loss": 0.0287,
+ "step": 130
+ },
+ {
+ "epoch": 33.96,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "loss": 0.029,
+ "step": 135
+ },
+ {
+ "epoch": 35.0,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "loss": 0.0272,
+ "step": 140
+ },
+ {
+ "epoch": 36.32,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "loss": 0.0289,
+ "step": 145
+ },
+ {
+ "epoch": 37.64,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "loss": 0.0295,
+ "step": 150
+ },
+ {
+ "epoch": 38.96,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "loss": 0.0265,
+ "step": 155
+ },
+ {
+ "epoch": 40.0,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "loss": 0.0319,
+ "step": 160
+ },
+ {
+ "epoch": 41.32,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "loss": 0.0277,
+ "step": 165
+ },
+ {
+ "epoch": 42.64,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "loss": 0.0264,
+ "step": 170
+ },
+ {
+ "epoch": 43.96,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "loss": 0.026,
+ "step": 175
+ },
+ {
+ "epoch": 45.0,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "loss": 0.0237,
+ "step": 180
+ },
+ {
+ "epoch": 46.32,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "loss": 0.0265,
+ "step": 185
+ },
+ {
+ "epoch": 47.64,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "loss": 0.0279,
+ "step": 190
+ },
+ {
+ "epoch": 48.96,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "loss": 0.0257,
+ "step": 195
+ },
+ {
+ "epoch": 50.0,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "loss": 0.0249,
+ "step": 200
+ },
+ {
+ "epoch": 51.32,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "loss": 0.0245,
+ "step": 205
+ },
+ {
+ "epoch": 52.64,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "loss": 0.0257,
+ "step": 210
+ },
+ {
+ "epoch": 53.96,
+ "grad_norm": 0.04665664583444595,
+ "learning_rate": 9.083619702176404e-05,
+ "loss": 0.0246,
+ "step": 215
+ },
+ {
+ "epoch": 55.0,
+ "grad_norm": 0.18880419433116913,
+ "learning_rate": 9.054982817869416e-05,
+ "loss": 0.0267,
+ "step": 220
+ },
+ {
+ "epoch": 56.32,
+ "grad_norm": 0.05329155549407005,
+ "learning_rate": 9.026345933562429e-05,
+ "loss": 0.0258,
+ "step": 225
+ },
+ {
+ "epoch": 57.64,
+ "grad_norm": 0.05351603031158447,
+ "learning_rate": 8.997709049255442e-05,
+ "loss": 0.0264,
+ "step": 230
+ },
+ {
+ "epoch": 58.96,
+ "grad_norm": 0.05472696200013161,
+ "learning_rate": 8.969072164948454e-05,
+ "loss": 0.0266,
+ "step": 235
+ },
+ {
+ "epoch": 60.0,
+ "grad_norm": 0.17182305455207825,
+ "learning_rate": 8.940435280641467e-05,
+ "loss": 0.0255,
+ "step": 240
+ },
+ {
+ "epoch": 61.32,
+ "grad_norm": 0.05441403388977051,
+ "learning_rate": 8.91179839633448e-05,
+ "loss": 0.0259,
+ "step": 245
+ },
+ {
+ "epoch": 62.64,
+ "grad_norm": 0.05443132296204567,
+ "learning_rate": 8.883161512027491e-05,
+ "loss": 0.025,
+ "step": 250
+ },
+ {
+ "epoch": 63.96,
+ "grad_norm": 0.05410757660865784,
+ "learning_rate": 8.854524627720504e-05,
+ "loss": 0.0261,
+ "step": 255
+ },
+ {
+ "epoch": 65.0,
+ "grad_norm": 0.16327381134033203,
+ "learning_rate": 8.825887743413516e-05,
+ "loss": 0.0265,
+ "step": 260
+ },
+ {
+ "epoch": 66.32,
+ "grad_norm": 0.05516252666711807,
+ "learning_rate": 8.797250859106529e-05,
+ "loss": 0.0251,
+ "step": 265
+ },
+ {
+ "epoch": 67.64,
+ "grad_norm": 0.0483415424823761,
+ "learning_rate": 8.768613974799542e-05,
+ "loss": 0.0255,
+ "step": 270
+ },
+ {
+ "epoch": 68.96,
+ "grad_norm": 0.062226541340351105,
+ "learning_rate": 8.739977090492554e-05,
+ "loss": 0.0247,
+ "step": 275
+ },
+ {
+ "epoch": 70.0,
+ "grad_norm": 0.20358847081661224,
+ "learning_rate": 8.711340206185567e-05,
+ "loss": 0.0267,
+ "step": 280
+ },
+ {
+ "epoch": 71.32,
+ "grad_norm": 0.04628003016114235,
+ "learning_rate": 8.682703321878581e-05,
+ "loss": 0.0255,
+ "step": 285
+ },
+ {
+ "epoch": 72.64,
+ "grad_norm": 0.06483373790979385,
+ "learning_rate": 8.654066437571594e-05,
+ "loss": 0.0257,
+ "step": 290
+ },
+ {
+ "epoch": 73.96,
+ "grad_norm": 0.04926105588674545,
+ "learning_rate": 8.625429553264606e-05,
+ "loss": 0.0244,
+ "step": 295
+ },
+ {
+ "epoch": 75.0,
+ "grad_norm": 0.1988091617822647,
+ "learning_rate": 8.596792668957619e-05,
+ "loss": 0.0239,
+ "step": 300
+ },
+ {
+ "epoch": 76.32,
+ "grad_norm": 0.04305023327469826,
+ "learning_rate": 8.56815578465063e-05,
+ "loss": 0.0248,
+ "step": 305
+ },
+ {
+ "epoch": 77.64,
+ "grad_norm": 0.04323578625917435,
+ "learning_rate": 8.539518900343643e-05,
+ "loss": 0.0254,
+ "step": 310
+ },
+ {
+ "epoch": 78.96,
+ "grad_norm": 0.04426678270101547,
+ "learning_rate": 8.510882016036655e-05,
+ "loss": 0.0254,
+ "step": 315
+ },
+ {
+ "epoch": 80.0,
+ "grad_norm": 0.14689449965953827,
+ "learning_rate": 8.482245131729668e-05,
+ "loss": 0.0259,
+ "step": 320
+ },
+ {
+ "epoch": 81.32,
+ "grad_norm": 0.04256561025977135,
+ "learning_rate": 8.453608247422681e-05,
+ "loss": 0.0256,
+ "step": 325
+ },
+ {
+ "epoch": 82.64,
+ "grad_norm": 0.03943061828613281,
+ "learning_rate": 8.424971363115693e-05,
+ "loss": 0.0235,
+ "step": 330
+ },
+ {
+ "epoch": 83.96,
+ "grad_norm": 0.041899990290403366,
+ "learning_rate": 8.396334478808706e-05,
+ "loss": 0.0249,
+ "step": 335
+ },
+ {
+ "epoch": 85.0,
+ "grad_norm": 0.151236429810524,
+ "learning_rate": 8.367697594501719e-05,
+ "loss": 0.0255,
+ "step": 340
+ },
+ {
+ "epoch": 86.32,
+ "grad_norm": 0.042102884501218796,
+ "learning_rate": 8.339060710194731e-05,
+ "loss": 0.0244,
+ "step": 345
+ },
+ {
+ "epoch": 87.64,
+ "grad_norm": 0.04723669961094856,
+ "learning_rate": 8.310423825887744e-05,
+ "loss": 0.0251,
+ "step": 350
+ },
+ {
+ "epoch": 88.96,
+ "grad_norm": 0.0578082799911499,
+ "learning_rate": 8.281786941580757e-05,
+ "loss": 0.0261,
+ "step": 355
+ },
+ {
+ "epoch": 90.0,
+ "grad_norm": 0.10269813239574432,
+ "learning_rate": 8.253150057273768e-05,
+ "loss": 0.0225,
+ "step": 360
+ },
+ {
+ "epoch": 91.32,
+ "grad_norm": 0.046400491148233414,
+ "learning_rate": 8.224513172966782e-05,
+ "loss": 0.0262,
+ "step": 365
+ },
+ {
+ "epoch": 92.64,
+ "grad_norm": 0.04183673858642578,
+ "learning_rate": 8.195876288659795e-05,
+ "loss": 0.0239,
+ "step": 370
+ },
+ {
+ "epoch": 93.96,
+ "grad_norm": 0.04400316998362541,
+ "learning_rate": 8.167239404352807e-05,
+ "loss": 0.0263,
+ "step": 375
+ },
+ {
+ "epoch": 95.0,
+ "grad_norm": 0.10862386226654053,
+ "learning_rate": 8.13860252004582e-05,
+ "loss": 0.025,
+ "step": 380
+ },
+ {
+ "epoch": 96.32,
+ "grad_norm": 0.05308162048459053,
+ "learning_rate": 8.109965635738833e-05,
+ "loss": 0.0248,
+ "step": 385
+ },
+ {
+ "epoch": 97.64,
+ "grad_norm": 0.04261139780282974,
+ "learning_rate": 8.081328751431845e-05,
+ "loss": 0.0244,
+ "step": 390
+ },
+ {
+ "epoch": 98.96,
+ "grad_norm": 0.05337546020746231,
+ "learning_rate": 8.052691867124858e-05,
+ "loss": 0.0253,
+ "step": 395
+ },
+ {
+ "epoch": 100.0,
+ "grad_norm": 0.15639856457710266,
+ "learning_rate": 8.02405498281787e-05,
+ "loss": 0.0243,
+ "step": 400
+ },
+ {
+ "epoch": 101.32,
+ "grad_norm": 0.04450729116797447,
+ "learning_rate": 7.995418098510883e-05,
+ "loss": 0.0258,
+ "step": 405
+ },
+ {
+ "epoch": 102.64,
+ "grad_norm": 0.042327046394348145,
+ "learning_rate": 7.966781214203894e-05,
+ "loss": 0.0244,
+ "step": 410
+ },
+ {
+ "epoch": 103.96,
+ "grad_norm": 0.04105006903409958,
+ "learning_rate": 7.938144329896907e-05,
+ "loss": 0.0253,
+ "step": 415
+ },
+ {
+ "epoch": 105.0,
+ "grad_norm": 0.17930248379707336,
+ "learning_rate": 7.90950744558992e-05,
+ "loss": 0.0261,
+ "step": 420
+ },
+ {
+ "epoch": 106.32,
+ "grad_norm": 0.04404031112790108,
+ "learning_rate": 7.880870561282932e-05,
+ "loss": 0.0241,
+ "step": 425
+ },
+ {
+ "epoch": 107.64,
+ "grad_norm": 0.04142986983060837,
+ "learning_rate": 7.852233676975945e-05,
+ "loss": 0.0245,
+ "step": 430
+ },
+ {
+ "epoch": 108.96,
+ "grad_norm": 0.041959185153245926,
+ "learning_rate": 7.823596792668958e-05,
+ "loss": 0.0254,
+ "step": 435
+ },
+ {
+ "epoch": 110.0,
+ "grad_norm": 0.27740439772605896,
+ "learning_rate": 7.79495990836197e-05,
+ "loss": 0.0292,
+ "step": 440
+ },
+ {
+ "epoch": 111.32,
+ "grad_norm": 0.03657572343945503,
+ "learning_rate": 7.766323024054983e-05,
+ "loss": 0.026,
+ "step": 445
+ },
+ {
+ "epoch": 112.64,
+ "grad_norm": 0.042320434004068375,
+ "learning_rate": 7.737686139747996e-05,
+ "loss": 0.0251,
+ "step": 450
+ },
+ {
+ "epoch": 113.96,
+ "grad_norm": 0.0473681204020977,
+ "learning_rate": 7.709049255441008e-05,
+ "loss": 0.026,
+ "step": 455
+ },
+ {
+ "epoch": 115.0,
+ "grad_norm": 0.1326676607131958,
+ "learning_rate": 7.680412371134021e-05,
+ "loss": 0.0241,
+ "step": 460
+ },
+ {
+ "epoch": 116.32,
+ "grad_norm": 0.04483647271990776,
+ "learning_rate": 7.651775486827034e-05,
+ "loss": 0.0236,
+ "step": 465
+ },
+ {
+ "epoch": 117.64,
+ "grad_norm": 0.038961004465818405,
+ "learning_rate": 7.623138602520046e-05,
+ "loss": 0.0235,
+ "step": 470
+ },
+ {
+ "epoch": 118.96,
+ "grad_norm": 0.042134907096624374,
+ "learning_rate": 7.594501718213059e-05,
+ "loss": 0.0252,
+ "step": 475
+ },
+ {
+ "epoch": 120.0,
+ "grad_norm": 0.13292020559310913,
+ "learning_rate": 7.565864833906071e-05,
+ "loss": 0.024,
+ "step": 480
+ },
+ {
+ "epoch": 121.32,
+ "grad_norm": 0.03745294362306595,
+ "learning_rate": 7.537227949599084e-05,
+ "loss": 0.025,
+ "step": 485
+ },
+ {
+ "epoch": 122.64,
+ "grad_norm": 0.035545315593481064,
+ "learning_rate": 7.508591065292097e-05,
+ "loss": 0.0253,
+ "step": 490
+ },
+ {
+ "epoch": 123.96,
+ "grad_norm": 0.03991984575986862,
+ "learning_rate": 7.47995418098511e-05,
+ "loss": 0.026,
+ "step": 495
+ },
+ {
+ "epoch": 125.0,
+ "grad_norm": 0.1339961290359497,
+ "learning_rate": 7.451317296678122e-05,
+ "loss": 0.0246,
+ "step": 500
+ },
+ {
+ "epoch": 126.32,
+ "grad_norm": 0.04381132498383522,
+ "learning_rate": 7.422680412371135e-05,
+ "loss": 0.0235,
+ "step": 505
+ },
+ {
+ "epoch": 127.64,
+ "grad_norm": 0.048515841364860535,
+ "learning_rate": 7.394043528064147e-05,
+ "loss": 0.0242,
+ "step": 510
+ },
+ {
+ "epoch": 128.96,
+ "grad_norm": 0.04145604744553566,
+ "learning_rate": 7.36540664375716e-05,
+ "loss": 0.0249,
+ "step": 515
+ },
+ {
+ "epoch": 130.0,
+ "grad_norm": 0.14400818943977356,
+ "learning_rate": 7.336769759450171e-05,
+ "loss": 0.0247,
+ "step": 520
+ },
+ {
+ "epoch": 131.32,
+ "grad_norm": 0.04025031998753548,
+ "learning_rate": 7.308132875143184e-05,
+ "loss": 0.0241,
+ "step": 525
+ },
+ {
+ "epoch": 132.64,
+ "grad_norm": 0.037277135998010635,
+ "learning_rate": 7.279495990836197e-05,
+ "loss": 0.0242,
+ "step": 530
+ },
+ {
+ "epoch": 133.96,
+ "grad_norm": 0.03666083887219429,
+ "learning_rate": 7.250859106529209e-05,
+ "loss": 0.0251,
+ "step": 535
+ },
+ {
+ "epoch": 135.0,
+ "grad_norm": 0.09921745210886002,
+ "learning_rate": 7.222222222222222e-05,
+ "loss": 0.0241,
+ "step": 540
+ },
+ {
+ "epoch": 136.32,
+ "grad_norm": 0.0382193848490715,
+ "learning_rate": 7.193585337915235e-05,
+ "loss": 0.0247,
+ "step": 545
+ },
+ {
+ "epoch": 137.64,
+ "grad_norm": 0.0314810685813427,
+ "learning_rate": 7.164948453608247e-05,
+ "loss": 0.0239,
+ "step": 550
+ },
+ {
+ "epoch": 138.96,
+ "grad_norm": 0.04278745502233505,
+ "learning_rate": 7.136311569301261e-05,
+ "loss": 0.0243,
+ "step": 555
+ },
+ {
+ "epoch": 140.0,
+ "grad_norm": 0.09295342862606049,
+ "learning_rate": 7.107674684994274e-05,
+ "loss": 0.0234,
+ "step": 560
+ },
+ {
+ "epoch": 141.32,
+ "grad_norm": 0.03429599106311798,
+ "learning_rate": 7.079037800687286e-05,
+ "loss": 0.0248,
+ "step": 565
+ },
+ {
+ "epoch": 142.64,
+ "grad_norm": 0.03622185438871384,
+ "learning_rate": 7.050400916380299e-05,
+ "loss": 0.0234,
+ "step": 570
+ },
+ {
+ "epoch": 143.96,
+ "grad_norm": 0.042615506798028946,
+ "learning_rate": 7.02176403207331e-05,
+ "loss": 0.0242,
+ "step": 575
+ },
+ {
+ "epoch": 145.0,
+ "grad_norm": 0.13792142271995544,
+ "learning_rate": 6.993127147766323e-05,
+ "loss": 0.0268,
+ "step": 580
+ },
+ {
+ "epoch": 146.32,
+ "grad_norm": 0.035664405673742294,
+ "learning_rate": 6.964490263459336e-05,
+ "loss": 0.0231,
+ "step": 585
+ },
+ {
+ "epoch": 147.64,
+ "grad_norm": 0.033511932939291,
+ "learning_rate": 6.935853379152348e-05,
+ "loss": 0.0258,
+ "step": 590
+ },
+ {
+ "epoch": 148.96,
+ "grad_norm": 0.036591917276382446,
+ "learning_rate": 6.907216494845361e-05,
+ "loss": 0.0248,
+ "step": 595
+ },
+ {
+ "epoch": 150.0,
+ "grad_norm": 0.11892726272344589,
+ "learning_rate": 6.878579610538374e-05,
+ "loss": 0.0257,
+ "step": 600
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1800,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 450,
+ "save_steps": 300,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 5.102713700352e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
Math_QA/group_09/checkpoints/checkpoint-600/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/metadata.json ADDED
@@ -0,0 +1,2718 @@
+ {
+ "dataset_name": "Math_QA",
+ "group_index": 9,
+ "prompt_group_file": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/prompt_groups/Math_QA/group_09.json",
+ "output_dir": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/outputs/Math_QA/group_09",
+ "checkpoint_root": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/outputs/Math_QA/group_09/checkpoints",
+ "generated_at": "2025-11-06T12:12:07Z",
+ "train_loss": 0.049091512378719115,
+ "metrics": {
+ "train_runtime": 30940.5798,
+ "train_samples_per_second": 1.862,
+ "train_steps_per_second": 0.058,
+ "total_flos": 1.5308141101056e+18,
+ "train_loss": 0.049091512378719115,
+ "epoch": 450.0
+ },
+ "trainer_state": [
+ {
+ "loss": 1.9528,
+ "grad_norm": 10.95508098602295,
+ "learning_rate": 0.0,
+ "epoch": 0.32,
+ "step": 1
+ },
+ {
+ "loss": 1.7919,
+ "grad_norm": 6.976505279541016,
+ "learning_rate": 7.4074074074074075e-06,
+ "epoch": 1.32,
+ "step": 5
+ },
+ {
+ "loss": 1.6625,
+ "grad_norm": 3.288942575454712,
+ "learning_rate": 1.6666666666666667e-05,
+ "epoch": 2.64,
+ "step": 10
+ },
+ {
+ "loss": 1.2009,
+ "grad_norm": 2.111987829208374,
+ "learning_rate": 2.5925925925925925e-05,
+ "epoch": 3.96,
+ "step": 15
+ },
+ {
+ "loss": 0.8544,
+ "grad_norm": 2.4555912017822266,
+ "learning_rate": 3.518518518518519e-05,
+ "epoch": 5.0,
+ "step": 20
+ },
+ {
+ "loss": 0.7449,
+ "grad_norm": 0.656902015209198,
+ "learning_rate": 4.4444444444444447e-05,
+ "epoch": 6.32,
+ "step": 25
+ },
+ {
+ "loss": 0.5884,
+ "grad_norm": 0.5291489958763123,
+ "learning_rate": 5.370370370370371e-05,
+ "epoch": 7.64,
+ "step": 30
+ },
+ {
+ "loss": 0.6349,
+ "grad_norm": 0.5356371998786926,
+ "learning_rate": 6.296296296296296e-05,
+ "epoch": 8.96,
+ "step": 35
+ },
+ {
+ "loss": 0.5088,
+ "grad_norm": 1.3841232061386108,
+ "learning_rate": 7.222222222222222e-05,
+ "epoch": 10.0,
+ "step": 40
+ },
+ {
+ "loss": 0.4279,
+ "grad_norm": 0.6104851365089417,
+ "learning_rate": 8.148148148148148e-05,
+ "epoch": 11.32,
+ "step": 45
+ },
+ {
+ "loss": 0.2689,
+ "grad_norm": 0.6166547536849976,
+ "learning_rate": 9.074074074074075e-05,
+ "epoch": 12.64,
+ "step": 50
+ },
+ {
+ "loss": 0.1985,
+ "grad_norm": 1.70536470413208,
+ "learning_rate": 0.0001,
+ "epoch": 13.96,
+ "step": 55
+ },
+ {
+ "loss": 0.1346,
+ "grad_norm": 3.289710760116577,
+ "learning_rate": 9.971363115693013e-05,
+ "epoch": 15.0,
+ "step": 60
+ },
+ {
+ "loss": 0.0959,
+ "grad_norm": 0.7723984122276306,
+ "learning_rate": 9.942726231386026e-05,
+ "epoch": 16.32,
+ "step": 65
+ },
+ {
+ "loss": 0.0617,
+ "grad_norm": 0.8176506161689758,
+ "learning_rate": 9.914089347079038e-05,
+ "epoch": 17.64,
+ "step": 70
+ },
+ {
+ "loss": 0.0428,
+ "grad_norm": 0.5226219892501831,
+ "learning_rate": 9.885452462772051e-05,
+ "epoch": 18.96,
+ "step": 75
+ },
+ {
+ "loss": 0.0416,
+ "grad_norm": 2.8831839561462402,
+ "learning_rate": 9.856815578465064e-05,
+ "epoch": 20.0,
+ "step": 80
+ },
+ {
+ "loss": 0.0334,
+ "grad_norm": 0.26046544313430786,
+ "learning_rate": 9.828178694158075e-05,
+ "epoch": 21.32,
+ "step": 85
+ },
+ {
+ "loss": 0.0347,
+ "grad_norm": 0.5656669735908508,
+ "learning_rate": 9.799541809851088e-05,
+ "epoch": 22.64,
+ "step": 90
+ },
+ {
+ "loss": 0.0336,
+ "grad_norm": 0.5219624042510986,
+ "learning_rate": 9.7709049255441e-05,
+ "epoch": 23.96,
+ "step": 95
+ },
+ {
+ "loss": 0.0325,
+ "grad_norm": 1.2479528188705444,
+ "learning_rate": 9.742268041237114e-05,
+ "epoch": 25.0,
+ "step": 100
+ },
+ {
+ "loss": 0.0317,
+ "grad_norm": 0.3272712826728821,
+ "learning_rate": 9.713631156930127e-05,
+ "epoch": 26.32,
+ "step": 105
+ },
+ {
+ "loss": 0.0322,
+ "grad_norm": 0.4236655831336975,
+ "learning_rate": 9.68499427262314e-05,
+ "epoch": 27.64,
+ "step": 110
+ },
+ {
+ "loss": 0.029,
+ "grad_norm": 0.23534469306468964,
+ "learning_rate": 9.656357388316152e-05,
+ "epoch": 28.96,
+ "step": 115
+ },
+ {
+ "loss": 0.0301,
+ "grad_norm": 0.530704140663147,
+ "learning_rate": 9.627720504009165e-05,
+ "epoch": 30.0,
+ "step": 120
+ },
+ {
+ "loss": 0.029,
+ "grad_norm": 0.08252622932195663,
+ "learning_rate": 9.599083619702178e-05,
+ "epoch": 31.32,
+ "step": 125
+ },
+ {
+ "loss": 0.0287,
+ "grad_norm": 0.2679576277732849,
+ "learning_rate": 9.57044673539519e-05,
+ "epoch": 32.64,
+ "step": 130
+ },
+ {
+ "loss": 0.029,
+ "grad_norm": 0.30863121151924133,
+ "learning_rate": 9.541809851088203e-05,
+ "epoch": 33.96,
+ "step": 135
+ },
+ {
+ "loss": 0.0272,
+ "grad_norm": 0.27921056747436523,
+ "learning_rate": 9.513172966781214e-05,
+ "epoch": 35.0,
+ "step": 140
+ },
+ {
+ "loss": 0.0289,
+ "grad_norm": 0.15001249313354492,
+ "learning_rate": 9.484536082474227e-05,
+ "epoch": 36.32,
+ "step": 145
+ },
+ {
+ "loss": 0.0295,
+ "grad_norm": 0.391609787940979,
+ "learning_rate": 9.45589919816724e-05,
+ "epoch": 37.64,
+ "step": 150
+ },
+ {
+ "loss": 0.0265,
+ "grad_norm": 0.24230684340000153,
+ "learning_rate": 9.427262313860252e-05,
+ "epoch": 38.96,
+ "step": 155
+ },
+ {
+ "loss": 0.0319,
+ "grad_norm": 2.2498250007629395,
+ "learning_rate": 9.398625429553265e-05,
+ "epoch": 40.0,
+ "step": 160
+ },
+ {
+ "loss": 0.0277,
+ "grad_norm": 0.14986856281757355,
+ "learning_rate": 9.369988545246277e-05,
+ "epoch": 41.32,
+ "step": 165
+ },
+ {
+ "loss": 0.0264,
+ "grad_norm": 0.14574986696243286,
+ "learning_rate": 9.34135166093929e-05,
+ "epoch": 42.64,
+ "step": 170
+ },
+ {
+ "loss": 0.026,
+ "grad_norm": 0.11353456974029541,
+ "learning_rate": 9.312714776632303e-05,
+ "epoch": 43.96,
+ "step": 175
+ },
+ {
+ "loss": 0.0237,
+ "grad_norm": 0.19234131276607513,
+ "learning_rate": 9.284077892325315e-05,
+ "epoch": 45.0,
+ "step": 180
+ },
+ {
+ "loss": 0.0265,
+ "grad_norm": 0.058677107095718384,
+ "learning_rate": 9.255441008018328e-05,
+ "epoch": 46.32,
+ "step": 185
+ },
+ {
+ "loss": 0.0279,
+ "grad_norm": 0.2846521735191345,
+ "learning_rate": 9.22680412371134e-05,
+ "epoch": 47.64,
+ "step": 190
+ },
+ {
+ "loss": 0.0257,
+ "grad_norm": 0.06889114528894424,
+ "learning_rate": 9.198167239404353e-05,
+ "epoch": 48.96,
+ "step": 195
+ },
+ {
+ "loss": 0.0249,
+ "grad_norm": 0.1600271314382553,
+ "learning_rate": 9.169530355097366e-05,
+ "epoch": 50.0,
+ "step": 200
+ },
+ {
+ "loss": 0.0245,
+ "grad_norm": 0.06680695712566376,
+ "learning_rate": 9.140893470790379e-05,
+ "epoch": 51.32,
+ "step": 205
+ },
+ {
+ "loss": 0.0257,
+ "grad_norm": 0.06898869574069977,
+ "learning_rate": 9.112256586483391e-05,
+ "epoch": 52.64,
+ "step": 210
+ },
+ {
+ "loss": 0.0246,
321
+ "grad_norm": 0.04665664583444595,
322
+ "learning_rate": 9.083619702176404e-05,
323
+ "epoch": 53.96,
324
+ "step": 215
325
+ },
326
+ {
327
+ "loss": 0.0267,
328
+ "grad_norm": 0.18880419433116913,
329
+ "learning_rate": 9.054982817869416e-05,
330
+ "epoch": 55.0,
331
+ "step": 220
332
+ },
333
+ {
334
+ "loss": 0.0258,
335
+ "grad_norm": 0.05329155549407005,
336
+ "learning_rate": 9.026345933562429e-05,
337
+ "epoch": 56.32,
338
+ "step": 225
339
+ },
340
+ {
341
+ "loss": 0.0264,
342
+ "grad_norm": 0.05351603031158447,
343
+ "learning_rate": 8.997709049255442e-05,
344
+ "epoch": 57.64,
345
+ "step": 230
346
+ },
347
+ {
348
+ "loss": 0.0266,
349
+ "grad_norm": 0.05472696200013161,
350
+ "learning_rate": 8.969072164948454e-05,
351
+ "epoch": 58.96,
352
+ "step": 235
353
+ },
354
+ {
355
+ "loss": 0.0255,
356
+ "grad_norm": 0.17182305455207825,
357
+ "learning_rate": 8.940435280641467e-05,
358
+ "epoch": 60.0,
359
+ "step": 240
360
+ },
361
+ {
362
+ "loss": 0.0259,
363
+ "grad_norm": 0.05441403388977051,
364
+ "learning_rate": 8.91179839633448e-05,
365
+ "epoch": 61.32,
366
+ "step": 245
367
+ },
368
+ {
369
+ "loss": 0.025,
370
+ "grad_norm": 0.05443132296204567,
371
+ "learning_rate": 8.883161512027491e-05,
372
+ "epoch": 62.64,
373
+ "step": 250
374
+ },
375
+ {
376
+ "loss": 0.0261,
377
+ "grad_norm": 0.05410757660865784,
378
+ "learning_rate": 8.854524627720504e-05,
379
+ "epoch": 63.96,
380
+ "step": 255
381
+ },
382
+ {
383
+ "loss": 0.0265,
384
+ "grad_norm": 0.16327381134033203,
385
+ "learning_rate": 8.825887743413516e-05,
386
+ "epoch": 65.0,
387
+ "step": 260
388
+ },
389
+ {
390
+ "loss": 0.0251,
391
+ "grad_norm": 0.05516252666711807,
392
+ "learning_rate": 8.797250859106529e-05,
393
+ "epoch": 66.32,
394
+ "step": 265
395
+ },
396
+ {
397
+ "loss": 0.0255,
398
+ "grad_norm": 0.0483415424823761,
399
+ "learning_rate": 8.768613974799542e-05,
400
+ "epoch": 67.64,
401
+ "step": 270
402
+ },
403
+ {
404
+ "loss": 0.0247,
405
+ "grad_norm": 0.062226541340351105,
406
+ "learning_rate": 8.739977090492554e-05,
407
+ "epoch": 68.96,
408
+ "step": 275
409
+ },
410
+ {
411
+ "loss": 0.0267,
412
+ "grad_norm": 0.20358847081661224,
413
+ "learning_rate": 8.711340206185567e-05,
414
+ "epoch": 70.0,
415
+ "step": 280
416
+ },
417
+ {
418
+ "loss": 0.0255,
419
+ "grad_norm": 0.04628003016114235,
420
+ "learning_rate": 8.682703321878581e-05,
421
+ "epoch": 71.32,
422
+ "step": 285
423
+ },
424
+ {
425
+ "loss": 0.0257,
426
+ "grad_norm": 0.06483373790979385,
427
+ "learning_rate": 8.654066437571594e-05,
428
+ "epoch": 72.64,
429
+ "step": 290
430
+ },
431
+ {
432
+ "loss": 0.0244,
433
+ "grad_norm": 0.04926105588674545,
434
+ "learning_rate": 8.625429553264606e-05,
435
+ "epoch": 73.96,
436
+ "step": 295
437
+ },
438
+ {
439
+ "loss": 0.0239,
440
+ "grad_norm": 0.1988091617822647,
441
+ "learning_rate": 8.596792668957619e-05,
442
+ "epoch": 75.0,
443
+ "step": 300
444
+ },
445
+ {
446
+ "loss": 0.0248,
447
+ "grad_norm": 0.04305023327469826,
448
+ "learning_rate": 8.56815578465063e-05,
449
+ "epoch": 76.32,
450
+ "step": 305
451
+ },
452
+ {
453
+ "loss": 0.0254,
454
+ "grad_norm": 0.04323578625917435,
455
+ "learning_rate": 8.539518900343643e-05,
456
+ "epoch": 77.64,
457
+ "step": 310
458
+ },
459
+ {
460
+ "loss": 0.0254,
461
+ "grad_norm": 0.04426678270101547,
462
+ "learning_rate": 8.510882016036655e-05,
463
+ "epoch": 78.96,
464
+ "step": 315
465
+ },
466
+ {
467
+ "loss": 0.0259,
468
+ "grad_norm": 0.14689449965953827,
469
+ "learning_rate": 8.482245131729668e-05,
470
+ "epoch": 80.0,
471
+ "step": 320
472
+ },
473
+ {
474
+ "loss": 0.0256,
475
+ "grad_norm": 0.04256561025977135,
476
+ "learning_rate": 8.453608247422681e-05,
477
+ "epoch": 81.32,
478
+ "step": 325
479
+ },
480
+ {
481
+ "loss": 0.0235,
482
+ "grad_norm": 0.03943061828613281,
483
+ "learning_rate": 8.424971363115693e-05,
484
+ "epoch": 82.64,
485
+ "step": 330
486
+ },
487
+ {
488
+ "loss": 0.0249,
489
+ "grad_norm": 0.041899990290403366,
490
+ "learning_rate": 8.396334478808706e-05,
491
+ "epoch": 83.96,
492
+ "step": 335
493
+ },
494
+ {
495
+ "loss": 0.0255,
496
+ "grad_norm": 0.151236429810524,
497
+ "learning_rate": 8.367697594501719e-05,
498
+ "epoch": 85.0,
499
+ "step": 340
500
+ },
501
+ {
502
+ "loss": 0.0244,
503
+ "grad_norm": 0.042102884501218796,
504
+ "learning_rate": 8.339060710194731e-05,
505
+ "epoch": 86.32,
506
+ "step": 345
507
+ },
508
+ {
509
+ "loss": 0.0251,
510
+ "grad_norm": 0.04723669961094856,
511
+ "learning_rate": 8.310423825887744e-05,
512
+ "epoch": 87.64,
513
+ "step": 350
514
+ },
515
+ {
516
+ "loss": 0.0261,
517
+ "grad_norm": 0.0578082799911499,
518
+ "learning_rate": 8.281786941580757e-05,
519
+ "epoch": 88.96,
520
+ "step": 355
521
+ },
522
+ {
523
+ "loss": 0.0225,
524
+ "grad_norm": 0.10269813239574432,
525
+ "learning_rate": 8.253150057273768e-05,
526
+ "epoch": 90.0,
527
+ "step": 360
528
+ },
529
+ {
530
+ "loss": 0.0262,
531
+ "grad_norm": 0.046400491148233414,
532
+ "learning_rate": 8.224513172966782e-05,
533
+ "epoch": 91.32,
534
+ "step": 365
535
+ },
536
+ {
537
+ "loss": 0.0239,
538
+ "grad_norm": 0.04183673858642578,
539
+ "learning_rate": 8.195876288659795e-05,
540
+ "epoch": 92.64,
541
+ "step": 370
542
+ },
543
+ {
544
+ "loss": 0.0263,
545
+ "grad_norm": 0.04400316998362541,
546
+ "learning_rate": 8.167239404352807e-05,
547
+ "epoch": 93.96,
548
+ "step": 375
549
+ },
550
+ {
551
+ "loss": 0.025,
552
+ "grad_norm": 0.10862386226654053,
553
+ "learning_rate": 8.13860252004582e-05,
554
+ "epoch": 95.0,
555
+ "step": 380
556
+ },
557
+ {
558
+ "loss": 0.0248,
559
+ "grad_norm": 0.05308162048459053,
560
+ "learning_rate": 8.109965635738833e-05,
561
+ "epoch": 96.32,
562
+ "step": 385
563
+ },
564
+ {
565
+ "loss": 0.0244,
566
+ "grad_norm": 0.04261139780282974,
567
+ "learning_rate": 8.081328751431845e-05,
568
+ "epoch": 97.64,
569
+ "step": 390
570
+ },
571
+ {
572
+ "loss": 0.0253,
573
+ "grad_norm": 0.05337546020746231,
574
+ "learning_rate": 8.052691867124858e-05,
575
+ "epoch": 98.96,
576
+ "step": 395
577
+ },
578
+ {
579
+ "loss": 0.0243,
580
+ "grad_norm": 0.15639856457710266,
581
+ "learning_rate": 8.02405498281787e-05,
582
+ "epoch": 100.0,
583
+ "step": 400
584
+ },
585
+ {
586
+ "loss": 0.0258,
587
+ "grad_norm": 0.04450729116797447,
588
+ "learning_rate": 7.995418098510883e-05,
589
+ "epoch": 101.32,
590
+ "step": 405
591
+ },
592
+ {
593
+ "loss": 0.0244,
594
+ "grad_norm": 0.042327046394348145,
595
+ "learning_rate": 7.966781214203894e-05,
596
+ "epoch": 102.64,
597
+ "step": 410
598
+ },
599
+ {
600
+ "loss": 0.0253,
601
+ "grad_norm": 0.04105006903409958,
602
+ "learning_rate": 7.938144329896907e-05,
603
+ "epoch": 103.96,
604
+ "step": 415
605
+ },
606
+ {
607
+ "loss": 0.0261,
608
+ "grad_norm": 0.17930248379707336,
609
+ "learning_rate": 7.90950744558992e-05,
610
+ "epoch": 105.0,
611
+ "step": 420
612
+ },
613
+ {
614
+ "loss": 0.0241,
615
+ "grad_norm": 0.04404031112790108,
616
+ "learning_rate": 7.880870561282932e-05,
617
+ "epoch": 106.32,
618
+ "step": 425
619
+ },
620
+ {
621
+ "loss": 0.0245,
622
+ "grad_norm": 0.04142986983060837,
623
+ "learning_rate": 7.852233676975945e-05,
624
+ "epoch": 107.64,
625
+ "step": 430
626
+ },
627
+ {
628
+ "loss": 0.0254,
629
+ "grad_norm": 0.041959185153245926,
630
+ "learning_rate": 7.823596792668958e-05,
631
+ "epoch": 108.96,
632
+ "step": 435
633
+ },
634
+ {
635
+ "loss": 0.0292,
636
+ "grad_norm": 0.27740439772605896,
637
+ "learning_rate": 7.79495990836197e-05,
638
+ "epoch": 110.0,
639
+ "step": 440
640
+ },
641
+ {
642
+ "loss": 0.026,
643
+ "grad_norm": 0.03657572343945503,
644
+ "learning_rate": 7.766323024054983e-05,
645
+ "epoch": 111.32,
646
+ "step": 445
647
+ },
648
+ {
649
+ "loss": 0.0251,
650
+ "grad_norm": 0.042320434004068375,
651
+ "learning_rate": 7.737686139747996e-05,
652
+ "epoch": 112.64,
653
+ "step": 450
654
+ },
655
+ {
656
+ "loss": 0.026,
657
+ "grad_norm": 0.0473681204020977,
658
+ "learning_rate": 7.709049255441008e-05,
659
+ "epoch": 113.96,
660
+ "step": 455
661
+ },
662
+ {
663
+ "loss": 0.0241,
664
+ "grad_norm": 0.1326676607131958,
665
+ "learning_rate": 7.680412371134021e-05,
666
+ "epoch": 115.0,
667
+ "step": 460
668
+ },
669
+ {
670
+ "loss": 0.0236,
671
+ "grad_norm": 0.04483647271990776,
672
+ "learning_rate": 7.651775486827034e-05,
673
+ "epoch": 116.32,
674
+ "step": 465
675
+ },
676
+ {
677
+ "loss": 0.0235,
678
+ "grad_norm": 0.038961004465818405,
679
+ "learning_rate": 7.623138602520046e-05,
680
+ "epoch": 117.64,
681
+ "step": 470
682
+ },
683
+ {
684
+ "loss": 0.0252,
685
+ "grad_norm": 0.042134907096624374,
686
+ "learning_rate": 7.594501718213059e-05,
687
+ "epoch": 118.96,
688
+ "step": 475
689
+ },
690
+ {
691
+ "loss": 0.024,
692
+ "grad_norm": 0.13292020559310913,
693
+ "learning_rate": 7.565864833906071e-05,
694
+ "epoch": 120.0,
695
+ "step": 480
696
+ },
697
+ {
698
+ "loss": 0.025,
699
+ "grad_norm": 0.03745294362306595,
700
+ "learning_rate": 7.537227949599084e-05,
701
+ "epoch": 121.32,
702
+ "step": 485
703
+ },
704
+ {
705
+ "loss": 0.0253,
706
+ "grad_norm": 0.035545315593481064,
707
+ "learning_rate": 7.508591065292097e-05,
708
+ "epoch": 122.64,
709
+ "step": 490
710
+ },
711
+ {
712
+ "loss": 0.026,
713
+ "grad_norm": 0.03991984575986862,
714
+ "learning_rate": 7.47995418098511e-05,
715
+ "epoch": 123.96,
716
+ "step": 495
717
+ },
718
+ {
719
+ "loss": 0.0246,
720
+ "grad_norm": 0.1339961290359497,
721
+ "learning_rate": 7.451317296678122e-05,
722
+ "epoch": 125.0,
723
+ "step": 500
724
+ },
725
+ {
726
+ "loss": 0.0235,
727
+ "grad_norm": 0.04381132498383522,
728
+ "learning_rate": 7.422680412371135e-05,
729
+ "epoch": 126.32,
730
+ "step": 505
731
+ },
732
+ {
733
+ "loss": 0.0242,
734
+ "grad_norm": 0.048515841364860535,
735
+ "learning_rate": 7.394043528064147e-05,
736
+ "epoch": 127.64,
737
+ "step": 510
738
+ },
739
+ {
740
+ "loss": 0.0249,
741
+ "grad_norm": 0.04145604744553566,
742
+ "learning_rate": 7.36540664375716e-05,
743
+ "epoch": 128.96,
744
+ "step": 515
745
+ },
746
+ {
747
+ "loss": 0.0247,
748
+ "grad_norm": 0.14400818943977356,
749
+ "learning_rate": 7.336769759450171e-05,
750
+ "epoch": 130.0,
751
+ "step": 520
752
+ },
753
+ {
754
+ "loss": 0.0241,
755
+ "grad_norm": 0.04025031998753548,
756
+ "learning_rate": 7.308132875143184e-05,
757
+ "epoch": 131.32,
758
+ "step": 525
759
+ },
760
+ {
761
+ "loss": 0.0242,
762
+ "grad_norm": 0.037277135998010635,
763
+ "learning_rate": 7.279495990836197e-05,
764
+ "epoch": 132.64,
765
+ "step": 530
766
+ },
767
+ {
768
+ "loss": 0.0251,
769
+ "grad_norm": 0.03666083887219429,
770
+ "learning_rate": 7.250859106529209e-05,
771
+ "epoch": 133.96,
772
+ "step": 535
773
+ },
774
+ {
775
+ "loss": 0.0241,
776
+ "grad_norm": 0.09921745210886002,
777
+ "learning_rate": 7.222222222222222e-05,
778
+ "epoch": 135.0,
779
+ "step": 540
780
+ },
781
+ {
782
+ "loss": 0.0247,
783
+ "grad_norm": 0.0382193848490715,
784
+ "learning_rate": 7.193585337915235e-05,
785
+ "epoch": 136.32,
786
+ "step": 545
787
+ },
788
+ {
789
+ "loss": 0.0239,
790
+ "grad_norm": 0.0314810685813427,
791
+ "learning_rate": 7.164948453608247e-05,
792
+ "epoch": 137.64,
793
+ "step": 550
794
+ },
795
+ {
796
+ "loss": 0.0243,
797
+ "grad_norm": 0.04278745502233505,
798
+ "learning_rate": 7.136311569301261e-05,
799
+ "epoch": 138.96,
800
+ "step": 555
801
+ },
802
+ {
803
+ "loss": 0.0234,
804
+ "grad_norm": 0.09295342862606049,
805
+ "learning_rate": 7.107674684994274e-05,
806
+ "epoch": 140.0,
807
+ "step": 560
808
+ },
809
+ {
810
+ "loss": 0.0248,
811
+ "grad_norm": 0.03429599106311798,
812
+ "learning_rate": 7.079037800687286e-05,
813
+ "epoch": 141.32,
814
+ "step": 565
815
+ },
816
+ {
817
+ "loss": 0.0234,
818
+ "grad_norm": 0.03622185438871384,
819
+ "learning_rate": 7.050400916380299e-05,
820
+ "epoch": 142.64,
821
+ "step": 570
822
+ },
823
+ {
824
+ "loss": 0.0242,
825
+ "grad_norm": 0.042615506798028946,
826
+ "learning_rate": 7.02176403207331e-05,
827
+ "epoch": 143.96,
828
+ "step": 575
829
+ },
830
+ {
831
+ "loss": 0.0268,
832
+ "grad_norm": 0.13792142271995544,
833
+ "learning_rate": 6.993127147766323e-05,
834
+ "epoch": 145.0,
835
+ "step": 580
836
+ },
837
+ {
838
+ "loss": 0.0231,
839
+ "grad_norm": 0.035664405673742294,
840
+ "learning_rate": 6.964490263459336e-05,
841
+ "epoch": 146.32,
842
+ "step": 585
843
+ },
844
+ {
845
+ "loss": 0.0258,
846
+ "grad_norm": 0.033511932939291,
847
+ "learning_rate": 6.935853379152348e-05,
848
+ "epoch": 147.64,
849
+ "step": 590
850
+ },
851
+ {
852
+ "loss": 0.0248,
853
+ "grad_norm": 0.036591917276382446,
854
+ "learning_rate": 6.907216494845361e-05,
855
+ "epoch": 148.96,
856
+ "step": 595
857
+ },
858
+ {
859
+ "loss": 0.0257,
860
+ "grad_norm": 0.11892726272344589,
861
+ "learning_rate": 6.878579610538374e-05,
862
+ "epoch": 150.0,
863
+ "step": 600
864
+ },
865
+ {
866
+ "loss": 0.0246,
867
+ "grad_norm": 0.03532181680202484,
868
+ "learning_rate": 6.849942726231386e-05,
869
+ "epoch": 151.32,
870
+ "step": 605
871
+ },
872
+ {
873
+ "loss": 0.0244,
874
+ "grad_norm": 0.039349090307950974,
875
+ "learning_rate": 6.821305841924399e-05,
876
+ "epoch": 152.64,
877
+ "step": 610
878
+ },
879
+ {
880
+ "loss": 0.0247,
881
+ "grad_norm": 0.03686106950044632,
882
+ "learning_rate": 6.792668957617412e-05,
883
+ "epoch": 153.96,
884
+ "step": 615
885
+ },
886
+ {
887
+ "loss": 0.0231,
888
+ "grad_norm": 0.08257201313972473,
889
+ "learning_rate": 6.764032073310424e-05,
890
+ "epoch": 155.0,
891
+ "step": 620
892
+ },
893
+ {
894
+ "loss": 0.0243,
895
+ "grad_norm": 0.035335343331098557,
896
+ "learning_rate": 6.735395189003437e-05,
897
+ "epoch": 156.32,
898
+ "step": 625
899
+ },
900
+ {
901
+ "loss": 0.0239,
902
+ "grad_norm": 0.030693387612700462,
903
+ "learning_rate": 6.706758304696448e-05,
904
+ "epoch": 157.64,
905
+ "step": 630
906
+ },
907
+ {
908
+ "loss": 0.0236,
909
+ "grad_norm": 0.031573694199323654,
910
+ "learning_rate": 6.678121420389462e-05,
911
+ "epoch": 158.96,
912
+ "step": 635
913
+ },
914
+ {
915
+ "loss": 0.0247,
916
+ "grad_norm": 0.11772840470075607,
917
+ "learning_rate": 6.649484536082475e-05,
918
+ "epoch": 160.0,
919
+ "step": 640
920
+ },
921
+ {
922
+ "loss": 0.0231,
923
+ "grad_norm": 0.03553156182169914,
924
+ "learning_rate": 6.620847651775487e-05,
925
+ "epoch": 161.32,
926
+ "step": 645
927
+ },
928
+ {
929
+ "loss": 0.0247,
930
+ "grad_norm": 0.04065680876374245,
931
+ "learning_rate": 6.5922107674685e-05,
932
+ "epoch": 162.64,
933
+ "step": 650
934
+ },
935
+ {
936
+ "loss": 0.0244,
937
+ "grad_norm": 0.03680557757616043,
938
+ "learning_rate": 6.563573883161513e-05,
939
+ "epoch": 163.96,
940
+ "step": 655
941
+ },
942
+ {
943
+ "loss": 0.0254,
944
+ "grad_norm": 0.1432940512895584,
945
+ "learning_rate": 6.534936998854525e-05,
946
+ "epoch": 165.0,
947
+ "step": 660
948
+ },
949
+ {
950
+ "loss": 0.024,
951
+ "grad_norm": 0.0374530591070652,
952
+ "learning_rate": 6.506300114547538e-05,
953
+ "epoch": 166.32,
954
+ "step": 665
955
+ },
956
+ {
957
+ "loss": 0.0242,
958
+ "grad_norm": 0.039093125611543655,
959
+ "learning_rate": 6.477663230240551e-05,
960
+ "epoch": 167.64,
961
+ "step": 670
962
+ },
963
+ {
964
+ "loss": 0.0238,
965
+ "grad_norm": 0.03439056873321533,
966
+ "learning_rate": 6.449026345933563e-05,
967
+ "epoch": 168.96,
968
+ "step": 675
969
+ },
970
+ {
971
+ "loss": 0.0224,
972
+ "grad_norm": 0.07211510837078094,
973
+ "learning_rate": 6.420389461626576e-05,
974
+ "epoch": 170.0,
975
+ "step": 680
976
+ },
977
+ {
978
+ "loss": 0.0246,
979
+ "grad_norm": 0.03178408369421959,
980
+ "learning_rate": 6.391752577319587e-05,
981
+ "epoch": 171.32,
982
+ "step": 685
983
+ },
984
+ {
985
+ "loss": 0.0255,
986
+ "grad_norm": 0.02913156896829605,
987
+ "learning_rate": 6.3631156930126e-05,
988
+ "epoch": 172.64,
989
+ "step": 690
990
+ },
991
+ {
992
+ "loss": 0.0257,
993
+ "grad_norm": 0.03487716615200043,
994
+ "learning_rate": 6.334478808705613e-05,
995
+ "epoch": 173.96,
996
+ "step": 695
997
+ },
998
+ {
999
+ "loss": 0.0253,
1000
+ "grad_norm": 0.12451174110174179,
1001
+ "learning_rate": 6.305841924398625e-05,
1002
+ "epoch": 175.0,
1003
+ "step": 700
1004
+ },
1005
+ {
1006
+ "loss": 0.0241,
1007
+ "grad_norm": 0.0366508811712265,
1008
+ "learning_rate": 6.277205040091638e-05,
1009
+ "epoch": 176.32,
1010
+ "step": 705
1011
+ },
1012
+ {
1013
+ "loss": 0.0242,
1014
+ "grad_norm": 0.03491870313882828,
1015
+ "learning_rate": 6.24856815578465e-05,
1016
+ "epoch": 177.64,
1017
+ "step": 710
1018
+ },
1019
+ {
1020
+ "loss": 0.0257,
1021
+ "grad_norm": 0.03027982823550701,
1022
+ "learning_rate": 6.219931271477663e-05,
1023
+ "epoch": 178.96,
1024
+ "step": 715
1025
+ },
1026
+ {
1027
+ "loss": 0.0236,
1028
+ "grad_norm": 0.08150530606508255,
1029
+ "learning_rate": 6.191294387170676e-05,
1030
+ "epoch": 180.0,
1031
+ "step": 720
1032
+ },
1033
+ {
1034
+ "loss": 0.0232,
1035
+ "grad_norm": 0.03483245149254799,
1036
+ "learning_rate": 6.162657502863689e-05,
1037
+ "epoch": 181.32,
1038
+ "step": 725
1039
+ },
1040
+ {
1041
+ "loss": 0.0241,
1042
+ "grad_norm": 0.034706421196460724,
1043
+ "learning_rate": 6.134020618556701e-05,
1044
+ "epoch": 182.64,
1045
+ "step": 730
1046
+ },
1047
+ {
1048
+ "loss": 0.0233,
1049
+ "grad_norm": 0.03622004762291908,
1050
+ "learning_rate": 6.105383734249714e-05,
1051
+ "epoch": 183.96,
1052
+ "step": 735
1053
+ },
1054
+ {
1055
+ "loss": 0.0249,
1056
+ "grad_norm": 0.10144224017858505,
1057
+ "learning_rate": 6.076746849942726e-05,
1058
+ "epoch": 185.0,
1059
+ "step": 740
1060
+ },
1061
+ {
1062
+ "loss": 0.0238,
1063
+ "grad_norm": 0.03530497848987579,
1064
+ "learning_rate": 6.0481099656357384e-05,
1065
+ "epoch": 186.32,
1066
+ "step": 745
1067
+ },
1068
+ {
1069
+ "loss": 0.0245,
1070
+ "grad_norm": 0.034086182713508606,
1071
+ "learning_rate": 6.019473081328752e-05,
1072
+ "epoch": 187.64,
1073
+ "step": 750
1074
+ },
1075
+ {
1076
+ "loss": 0.0243,
1077
+ "grad_norm": 0.039041388779878616,
1078
+ "learning_rate": 5.9908361970217644e-05,
1079
+ "epoch": 188.96,
1080
+ "step": 755
1081
+ },
1082
+ {
1083
+ "loss": 0.0245,
1084
+ "grad_norm": 0.1247899979352951,
1085
+ "learning_rate": 5.962199312714777e-05,
1086
+ "epoch": 190.0,
1087
+ "step": 760
1088
+ },
1089
+ {
1090
+ "loss": 0.0238,
1091
+ "grad_norm": 0.035458508878946304,
1092
+ "learning_rate": 5.93356242840779e-05,
1093
+ "epoch": 191.32,
1094
+ "step": 765
1095
+ },
1096
+ {
1097
+ "loss": 0.0244,
1098
+ "grad_norm": 0.03673034906387329,
1099
+ "learning_rate": 5.904925544100802e-05,
1100
+ "epoch": 192.64,
1101
+ "step": 770
1102
+ },
1103
+ {
1104
+ "loss": 0.0239,
1105
+ "grad_norm": 0.03364979103207588,
1106
+ "learning_rate": 5.876288659793815e-05,
1107
+ "epoch": 193.96,
1108
+ "step": 775
1109
+ },
1110
+ {
1111
+ "loss": 0.0235,
1112
+ "grad_norm": 0.09387586265802383,
1113
+ "learning_rate": 5.8476517754868276e-05,
1114
+ "epoch": 195.0,
1115
+ "step": 780
1116
+ },
1117
+ {
1118
+ "loss": 0.0248,
1119
+ "grad_norm": 0.03462570905685425,
1120
+ "learning_rate": 5.81901489117984e-05,
1121
+ "epoch": 196.32,
1122
+ "step": 785
1123
+ },
1124
+ {
1125
+ "loss": 0.0246,
1126
+ "grad_norm": 0.03342005982995033,
1127
+ "learning_rate": 5.790378006872853e-05,
1128
+ "epoch": 197.64,
1129
+ "step": 790
1130
+ },
1131
+ {
1132
+ "loss": 0.0246,
1133
+ "grad_norm": 0.041909925639629364,
1134
+ "learning_rate": 5.761741122565865e-05,
1135
+ "epoch": 198.96,
1136
+ "step": 795
1137
+ },
1138
+ {
1139
+ "loss": 0.0258,
1140
+ "grad_norm": 0.15439164638519287,
1141
+ "learning_rate": 5.7331042382588775e-05,
1142
+ "epoch": 200.0,
1143
+ "step": 800
1144
+ },
1145
+ {
1146
+ "loss": 0.0236,
1147
+ "grad_norm": 0.02883634716272354,
1148
+ "learning_rate": 5.70446735395189e-05,
1149
+ "epoch": 201.32,
1150
+ "step": 805
1151
+ },
1152
+ {
1153
+ "loss": 0.0235,
1154
+ "grad_norm": 0.029865020886063576,
1155
+ "learning_rate": 5.675830469644903e-05,
1156
+ "epoch": 202.64,
1157
+ "step": 810
1158
+ },
1159
+ {
1160
+ "loss": 0.024,
1161
+ "grad_norm": 0.030608315020799637,
1162
+ "learning_rate": 5.6471935853379155e-05,
1163
+ "epoch": 203.96,
1164
+ "step": 815
1165
+ },
1166
+ {
1167
+ "loss": 0.0224,
1168
+ "grad_norm": 0.07783036679029465,
1169
+ "learning_rate": 5.618556701030928e-05,
1170
+ "epoch": 205.0,
1171
+ "step": 820
1172
+ },
1173
+ {
1174
+ "loss": 0.0233,
1175
+ "grad_norm": 0.035508111119270325,
1176
+ "learning_rate": 5.589919816723941e-05,
1177
+ "epoch": 206.32,
1178
+ "step": 825
1179
+ },
1180
+ {
1181
+ "loss": 0.0242,
1182
+ "grad_norm": 0.03703364357352257,
1183
+ "learning_rate": 5.5612829324169534e-05,
1184
+ "epoch": 207.64,
1185
+ "step": 830
1186
+ },
1187
+ {
1188
+ "loss": 0.0239,
1189
+ "grad_norm": 0.030922846868634224,
1190
+ "learning_rate": 5.532646048109966e-05,
1191
+ "epoch": 208.96,
1192
+ "step": 835
1193
+ },
1194
+ {
1195
+ "loss": 0.0236,
1196
+ "grad_norm": 0.11316124349832535,
1197
+ "learning_rate": 5.504009163802979e-05,
1198
+ "epoch": 210.0,
1199
+ "step": 840
1200
+ },
1201
+ {
1202
+ "loss": 0.0237,
1203
+ "grad_norm": 0.032941922545433044,
1204
+ "learning_rate": 5.4753722794959914e-05,
1205
+ "epoch": 211.32,
1206
+ "step": 845
1207
+ },
1208
+ {
1209
+ "loss": 0.0235,
1210
+ "grad_norm": 0.028119860216975212,
1211
+ "learning_rate": 5.4467353951890033e-05,
1212
+ "epoch": 212.64,
1213
+ "step": 850
1214
+ },
1215
+ {
1216
+ "loss": 0.023,
1217
+ "grad_norm": 0.03130020201206207,
1218
+ "learning_rate": 5.418098510882016e-05,
1219
+ "epoch": 213.96,
1220
+ "step": 855
1221
+ },
1222
+ {
1223
+ "loss": 0.0226,
1224
+ "grad_norm": 0.06978127360343933,
1225
+ "learning_rate": 5.3894616265750286e-05,
1226
+ "epoch": 215.0,
1227
+ "step": 860
1228
+ },
1229
+ {
1230
+ "loss": 0.0231,
1231
+ "grad_norm": 0.030422938987612724,
1232
+ "learning_rate": 5.360824742268041e-05,
1233
+ "epoch": 216.32,
1234
+ "step": 865
1235
+ },
1236
+ {
1237
+ "loss": 0.0238,
1238
+ "grad_norm": 0.028223881497979164,
1239
+ "learning_rate": 5.332187857961054e-05,
1240
+ "epoch": 217.64,
1241
+ "step": 870
1242
+ },
1243
+ {
1244
+ "loss": 0.0243,
1245
+ "grad_norm": 0.029208194464445114,
1246
+ "learning_rate": 5.3035509736540666e-05,
1247
+ "epoch": 218.96,
1248
+ "step": 875
1249
+ },
1250
+ {
1251
+ "loss": 0.0271,
1252
+ "grad_norm": 0.16511231660842896,
1253
+ "learning_rate": 5.274914089347079e-05,
1254
+ "epoch": 220.0,
1255
+ "step": 880
1256
+ },
1257
+ {
1258
+ "loss": 0.0243,
1259
+ "grad_norm": 0.03705955296754837,
1260
+ "learning_rate": 5.246277205040092e-05,
1261
+ "epoch": 221.32,
1262
+ "step": 885
1263
+ },
1264
+ {
1265
+ "loss": 0.0241,
1266
+ "grad_norm": 0.030203381553292274,
1267
+ "learning_rate": 5.2176403207331045e-05,
1268
+ "epoch": 222.64,
1269
+ "step": 890
1270
+ },
1271
+ {
1272
+ "loss": 0.0234,
1273
+ "grad_norm": 0.027039049193263054,
1274
+ "learning_rate": 5.189003436426118e-05,
1275
+ "epoch": 223.96,
1276
+ "step": 895
1277
+ },
1278
+ {
1279
+ "loss": 0.0254,
1280
+ "grad_norm": 0.11282758414745331,
1281
+ "learning_rate": 5.1603665521191305e-05,
1282
+ "epoch": 225.0,
1283
+ "step": 900
1284
+ },
1285
+ {
1286
+ "loss": 0.0236,
1287
+ "grad_norm": 0.03700408712029457,
1288
+ "learning_rate": 5.131729667812142e-05,
1289
+ "epoch": 226.32,
1290
+ "step": 905
1291
+ },
1292
+ {
1293
+ "loss": 0.024,
1294
+ "grad_norm": 0.030705822631716728,
1295
+ "learning_rate": 5.1030927835051544e-05,
1296
+ "epoch": 227.64,
1297
+ "step": 910
1298
+ },
1299
+ {
1300
+ "loss": 0.0238,
1301
+ "grad_norm": 0.03678268566727638,
1302
+ "learning_rate": 5.074455899198167e-05,
1303
+ "epoch": 228.96,
1304
+ "step": 915
1305
+ },
1306
+ {
1307
+ "loss": 0.0269,
1308
+ "grad_norm": 0.12632058560848236,
1309
+ "learning_rate": 5.04581901489118e-05,
1310
+ "epoch": 230.0,
1311
+ "step": 920
1312
+ },
1313
+ {
1314
+ "loss": 0.0244,
1315
+ "grad_norm": 0.030165374279022217,
1316
+ "learning_rate": 5.0171821305841924e-05,
1317
+ "epoch": 231.32,
1318
+ "step": 925
1319
+ },
1320
+ {
1321
+ "loss": 0.0239,
1322
+ "grad_norm": 0.029971277341246605,
1323
+ "learning_rate": 4.988545246277205e-05,
1324
+ "epoch": 232.64,
1325
+ "step": 930
1326
+ },
1327
+ {
1328
+ "loss": 0.0237,
1329
+ "grad_norm": 0.033762127161026,
1330
+ "learning_rate": 4.9599083619702184e-05,
1331
+ "epoch": 233.96,
1332
+ "step": 935
1333
+ },
1334
+ {
1335
+ "loss": 0.0236,
1336
+ "grad_norm": 0.09928340464830399,
1337
+ "learning_rate": 4.931271477663231e-05,
1338
+ "epoch": 235.0,
1339
+ "step": 940
1340
+ },
1341
+ {
1342
+ "loss": 0.0238,
1343
+ "grad_norm": 0.030009057372808456,
1344
+ "learning_rate": 4.902634593356243e-05,
1345
+ "epoch": 236.32,
1346
+ "step": 945
1347
+ },
1348
+ {
1349
+ "loss": 0.0239,
1350
+ "grad_norm": 0.03369998559355736,
1351
+ "learning_rate": 4.8739977090492556e-05,
1352
+ "epoch": 237.64,
1353
+ "step": 950
1354
+ },
1355
+ {
1356
+ "loss": 0.0251,
1357
+ "grad_norm": 0.03107636794447899,
1358
+ "learning_rate": 4.845360824742268e-05,
1359
+ "epoch": 238.96,
1360
+ "step": 955
1361
+ },
1362
+ {
1363
+ "loss": 0.0227,
1364
+ "grad_norm": 0.10390744358301163,
1365
+ "learning_rate": 4.816723940435281e-05,
1366
+ "epoch": 240.0,
1367
+ "step": 960
1368
+ },
1369
+ {
1370
+ "loss": 0.0242,
1371
+ "grad_norm": 0.03572176396846771,
1372
+ "learning_rate": 4.7880870561282936e-05,
1373
+ "epoch": 241.32,
1374
+ "step": 965
1375
+ },
1376
+ {
1377
+ "loss": 0.0232,
1378
+ "grad_norm": 0.03051804192364216,
1379
+ "learning_rate": 4.7594501718213055e-05,
1380
+ "epoch": 242.64,
1381
+ "step": 970
1382
+ },
1383
+ {
1384
+ "loss": 0.0241,
1385
+ "grad_norm": 0.031635165214538574,
1386
+ "learning_rate": 4.730813287514318e-05,
1387
+ "epoch": 243.96,
1388
+ "step": 975
1389
+ },
1390
+ {
1391
+ "loss": 0.0231,
1392
+ "grad_norm": 0.0863058865070343,
1393
+ "learning_rate": 4.7021764032073315e-05,
1394
+ "epoch": 245.0,
1395
+ "step": 980
1396
+ },
1397
+ {
1398
+ "loss": 0.0237,
1399
+ "grad_norm": 0.03220526874065399,
1400
+ "learning_rate": 4.673539518900344e-05,
1401
+ "epoch": 246.32,
1402
+ "step": 985
1403
+ },
1404
+ {
1405
+ "loss": 0.0229,
1406
+ "grad_norm": 0.030770031735301018,
1407
+ "learning_rate": 4.644902634593357e-05,
1408
+ "epoch": 247.64,
1409
+ "step": 990
1410
+ },
1411
+ {
1412
+ "loss": 0.0233,
1413
+ "grad_norm": 0.036592498421669006,
1414
+ "learning_rate": 4.6162657502863694e-05,
1415
+ "epoch": 248.96,
1416
+ "step": 995
1417
+ },
1418
+ {
1419
+ "loss": 0.0233,
1420
+ "grad_norm": 0.09140961617231369,
1421
+ "learning_rate": 4.5876288659793814e-05,
1422
+ "epoch": 250.0,
1423
+ "step": 1000
1424
+ },
1425
+ {
1426
+ "loss": 0.0234,
1427
+ "grad_norm": 0.03191279247403145,
1428
+ "learning_rate": 4.558991981672394e-05,
1429
+ "epoch": 251.32,
1430
+ "step": 1005
1431
+ },
1432
+ {
1433
+ "loss": 0.024,
1434
+ "grad_norm": 0.02950333058834076,
1435
+ "learning_rate": 4.530355097365407e-05,
1436
+ "epoch": 252.64,
1437
+ "step": 1010
1438
+ },
1439
+ {
1440
+ "loss": 0.0233,
1441
+ "grad_norm": 0.031532324850559235,
1442
+ "learning_rate": 4.5017182130584194e-05,
1443
+ "epoch": 253.96,
1444
+ "step": 1015
1445
+ },
1446
+ {
1447
+ "loss": 0.0228,
1448
+ "grad_norm": 0.10817220062017441,
1449
+ "learning_rate": 4.473081328751432e-05,
1450
+ "epoch": 255.0,
1451
+ "step": 1020
1452
+ },
1453
+ {
1454
+ "loss": 0.0249,
1455
+ "grad_norm": 0.03229045867919922,
1456
+ "learning_rate": 4.4444444444444447e-05,
1457
+ "epoch": 256.32,
1458
+ "step": 1025
1459
+ },
1460
+ {
1461
+ "loss": 0.0236,
1462
+ "grad_norm": 0.027881359681487083,
1463
+ "learning_rate": 4.415807560137457e-05,
1464
+ "epoch": 257.64,
1465
+ "step": 1030
1466
+ },
1467
+ {
1468
+ "loss": 0.0248,
1469
+ "grad_norm": 0.027970343828201294,
1470
+ "learning_rate": 4.38717067583047e-05,
1471
+ "epoch": 258.96,
1472
+ "step": 1035
1473
+ },
1474
+ {
1475
+ "loss": 0.0236,
1476
+ "grad_norm": 0.0961368978023529,
1477
+ "learning_rate": 4.3585337915234826e-05,
1478
+ "epoch": 260.0,
1479
+ "step": 1040
1480
+ },
1481
+ {
1482
+ "loss": 0.0231,
1483
+ "grad_norm": 0.03192312270402908,
1484
+ "learning_rate": 4.329896907216495e-05,
1485
+ "epoch": 261.32,
1486
+ "step": 1045
1487
+ },
1488
+ {
1489
+ "loss": 0.0244,
1490
+ "grad_norm": 0.03287699446082115,
1491
+ "learning_rate": 4.301260022909508e-05,
1492
+ "epoch": 262.64,
1493
+ "step": 1050
1494
+ },
1495
+ {
1496
+ "loss": 0.0231,
1497
+ "grad_norm": 0.03482283651828766,
1498
+ "learning_rate": 4.27262313860252e-05,
1499
+ "epoch": 263.96,
1500
+ "step": 1055
1501
+ },
1502
+ {
1503
+ "loss": 0.0246,
1504
+ "grad_norm": 0.12014977633953094,
1505
+ "learning_rate": 4.2439862542955325e-05,
1506
+ "epoch": 265.0,
1507
+ "step": 1060
1508
+ },
1509
+ {
1510
+ "loss": 0.0235,
1511
+ "grad_norm": 0.030348435044288635,
1512
+ "learning_rate": 4.215349369988545e-05,
1513
+ "epoch": 266.32,
1514
+ "step": 1065
1515
+ },
1516
+ {
1517
+ "loss": 0.0238,
1518
+ "grad_norm": 0.027197284623980522,
1519
+ "learning_rate": 4.1867124856815585e-05,
1520
+ "epoch": 267.64,
1521
+ "step": 1070
1522
+ },
1523
+ {
1524
+ "loss": 0.024,
1525
+ "grad_norm": 0.03164960816502571,
1526
+ "learning_rate": 4.158075601374571e-05,
1527
+ "epoch": 268.96,
1528
+ "step": 1075
1529
+ },
1530
+ {
1531
+ "loss": 0.0237,
1532
+ "grad_norm": 0.09021521359682083,
1533
+ "learning_rate": 4.129438717067583e-05,
1534
+ "epoch": 270.0,
1535
+ "step": 1080
1536
+ },
1537
+ {
1538
+ "loss": 0.024,
1539
+ "grad_norm": 0.03432054817676544,
1540
+ "learning_rate": 4.100801832760596e-05,
1541
+ "epoch": 271.32,
1542
+ "step": 1085
1543
+ },
1544
+ {
1545
+ "loss": 0.0224,
1546
+ "grad_norm": 0.029961712658405304,
1547
+ "learning_rate": 4.0721649484536084e-05,
1548
+ "epoch": 272.64,
1549
+ "step": 1090
1550
+ },
1551
+ {
1552
+ "loss": 0.0245,
1553
+ "grad_norm": 0.02801748737692833,
1554
+ "learning_rate": 4.043528064146621e-05,
1555
+ "epoch": 273.96,
1556
+ "step": 1095
1557
+ },
1558
+ {
1559
+ "loss": 0.0229,
1560
+ "grad_norm": 0.09304305166006088,
1561
+ "learning_rate": 4.014891179839634e-05,
1562
+ "epoch": 275.0,
1563
+ "step": 1100
1564
+ },
1565
+ {
1566
+ "loss": 0.0242,
1567
+ "grad_norm": 0.03154018521308899,
1568
+ "learning_rate": 3.9862542955326463e-05,
1569
+ "epoch": 276.32,
1570
+ "step": 1105
1571
+ },
1572
+ {
1573
+ "loss": 0.024,
1574
+ "grad_norm": 0.029925866052508354,
1575
+ "learning_rate": 3.957617411225659e-05,
1576
+ "epoch": 277.64,
1577
+ "step": 1110
1578
+ },
1579
+ {
1580
+ "loss": 0.0232,
1581
+ "grad_norm": 0.032234761863946915,
1582
+ "learning_rate": 3.9289805269186716e-05,
1583
+ "epoch": 278.96,
1584
+ "step": 1115
1585
+ },
1586
+ {
1587
+ "loss": 0.0238,
1588
+ "grad_norm": 0.09113281220197678,
1589
+ "learning_rate": 3.900343642611684e-05,
1590
+ "epoch": 280.0,
1591
+ "step": 1120
1592
+ },
1593
+ {
1594
+ "loss": 0.0242,
1595
+ "grad_norm": 0.03371744975447655,
1596
+ "learning_rate": 3.871706758304697e-05,
1597
+ "epoch": 281.32,
1598
+ "step": 1125
1599
+ },
1600
+ {
1601
+ "loss": 0.0234,
1602
+ "grad_norm": 0.033525336533784866,
1603
+ "learning_rate": 3.8430698739977096e-05,
1604
+ "epoch": 282.64,
1605
+ "step": 1130
1606
+ },
1607
+ {
1608
+ "loss": 0.0237,
1609
+ "grad_norm": 0.030558524653315544,
1610
+ "learning_rate": 3.8144329896907216e-05,
1611
+ "epoch": 283.96,
1612
+ "step": 1135
1613
+ },
1614
+ {
1615
+ "loss": 0.022,
1616
+ "grad_norm": 0.07060851901769638,
1617
+ "learning_rate": 3.785796105383734e-05,
1618
+ "epoch": 285.0,
1619
+ "step": 1140
1620
+ },
1621
+ {
1622
+ "loss": 0.0238,
1623
+ "grad_norm": 0.02952047996222973,
1624
+ "learning_rate": 3.757159221076747e-05,
1625
+ "epoch": 286.32,
1626
+ "step": 1145
1627
+ },
1628
+ {
1629
+ "loss": 0.0227,
1630
+ "grad_norm": 0.030197326093912125,
1631
+ "learning_rate": 3.7285223367697595e-05,
1632
+ "epoch": 287.64,
1633
+ "step": 1150
1634
+ },
1635
+ {
1636
+ "loss": 0.0232,
1637
+ "grad_norm": 0.028898609802126884,
1638
+ "learning_rate": 3.699885452462772e-05,
1639
+ "epoch": 288.96,
1640
+ "step": 1155
1641
+ },
1642
+ {
1643
+ "loss": 0.0236,
1644
+ "grad_norm": 0.10391610860824585,
1645
+ "learning_rate": 3.671248568155785e-05,
1646
+ "epoch": 290.0,
1647
+ "step": 1160
1648
+ },
1649
+ {
1650
+ "loss": 0.0238,
1651
+ "grad_norm": 0.0285499207675457,
1652
+ "learning_rate": 3.6426116838487974e-05,
1653
+ "epoch": 291.32,
1654
+ "step": 1165
1655
+ },
1656
+ {
1657
+ "loss": 0.0229,
1658
+ "grad_norm": 0.028268715366721153,
1659
+ "learning_rate": 3.61397479954181e-05,
1660
+ "epoch": 292.64,
1661
+ "step": 1170
1662
+ },
1663
+ {
1664
+ "loss": 0.0247,
1665
+ "grad_norm": 0.02961159311234951,
1666
+ "learning_rate": 3.585337915234823e-05,
1667
+ "epoch": 293.96,
1668
+ "step": 1175
1669
+ },
1670
+ {
1671
+ "loss": 0.0226,
1672
+ "grad_norm": 0.08803751319646835,
1673
+ "learning_rate": 3.5567010309278354e-05,
1674
+ "epoch": 295.0,
1675
+ "step": 1180
1676
+ },
1677
+ {
1678
+ "loss": 0.0244,
1679
+ "grad_norm": 0.03452374413609505,
1680
+ "learning_rate": 3.528064146620848e-05,
1681
+ "epoch": 296.32,
1682
+ "step": 1185
1683
+ },
1684
+ {
1685
+ "loss": 0.023,
1686
+ "grad_norm": 0.028895270079374313,
1687
+ "learning_rate": 3.49942726231386e-05,
1688
+ "epoch": 297.64,
1689
+ "step": 1190
1690
+ },
1691
+ {
1692
+ "loss": 0.0234,
1693
+ "grad_norm": 0.029182473197579384,
1694
+ "learning_rate": 3.4707903780068726e-05,
1695
+ "epoch": 298.96,
1696
+ "step": 1195
1697
+ },
1698
+ {
1699
+ "loss": 0.0235,
1700
+ "grad_norm": 0.11874058097600937,
1701
+ "learning_rate": 3.442153493699885e-05,
1702
+ "epoch": 300.0,
1703
+ "step": 1200
1704
+ },
1705
+ {
1706
+ "loss": 0.0237,
1707
+ "grad_norm": 0.030481066554784775,
1708
+ "learning_rate": 3.4135166093928986e-05,
1709
+ "epoch": 301.32,
1710
+ "step": 1205
1711
+ },
1712
+ {
1713
+ "loss": 0.023,
1714
+ "grad_norm": 0.03108309395611286,
1715
+ "learning_rate": 3.384879725085911e-05,
1716
+ "epoch": 302.64,
1717
+ "step": 1210
1718
+ },
1719
+ {
1720
+ "loss": 0.0228,
1721
+ "grad_norm": 0.03036290407180786,
1722
+ "learning_rate": 3.356242840778923e-05,
1723
+ "epoch": 303.96,
1724
+ "step": 1215
1725
+ },
1726
+ {
1727
+ "loss": 0.0223,
1728
+ "grad_norm": 0.07720436155796051,
1729
+ "learning_rate": 3.327605956471936e-05,
1730
+ "epoch": 305.0,
1731
+ "step": 1220
1732
+ },
1733
+ {
1734
+ "loss": 0.0235,
1735
+ "grad_norm": 0.03028162382543087,
1736
+ "learning_rate": 3.2989690721649485e-05,
1737
+ "epoch": 306.32,
1738
+ "step": 1225
1739
+ },
1740
+ {
1741
+ "loss": 0.0226,
1742
+ "grad_norm": 0.033151157200336456,
1743
+ "learning_rate": 3.270332187857961e-05,
1744
+ "epoch": 307.64,
1745
+ "step": 1230
1746
+ },
1747
+ {
1748
+ "loss": 0.0235,
1749
+ "grad_norm": 0.02951214276254177,
1750
+ "learning_rate": 3.241695303550974e-05,
1751
+ "epoch": 308.96,
1752
+ "step": 1235
1753
+ },
1754
+ {
1755
+ "loss": 0.0257,
1756
+ "grad_norm": 0.09070917963981628,
1757
+ "learning_rate": 3.2130584192439865e-05,
1758
+ "epoch": 310.0,
1759
+ "step": 1240
1760
+ },
1761
+ {
1762
+ "loss": 0.0248,
1763
+ "grad_norm": 0.03337477520108223,
1764
+ "learning_rate": 3.184421534936999e-05,
1765
+ "epoch": 311.32,
1766
+ "step": 1245
1767
+ },
1768
+ {
1769
+ "loss": 0.0226,
1770
+ "grad_norm": 0.03151268512010574,
1771
+ "learning_rate": 3.155784650630012e-05,
1772
+ "epoch": 312.64,
1773
+ "step": 1250
1774
+ },
1775
+ {
1776
+ "loss": 0.024,
1777
+ "grad_norm": 0.030940482392907143,
1778
+ "learning_rate": 3.1271477663230244e-05,
1779
+ "epoch": 313.96,
1780
+ "step": 1255
1781
+ },
1782
+ {
1783
+ "loss": 0.0236,
1784
+ "grad_norm": 0.09032298624515533,
1785
+ "learning_rate": 3.098510882016037e-05,
1786
+ "epoch": 315.0,
1787
+ "step": 1260
1788
+ },
1789
+ {
1790
+ "loss": 0.0222,
1791
+ "grad_norm": 0.029143668711185455,
1792
+ "learning_rate": 3.06987399770905e-05,
1793
+ "epoch": 316.32,
1794
+ "step": 1265
1795
+ },
1796
+ {
1797
+ "loss": 0.0246,
1798
+ "grad_norm": 0.029851289466023445,
1799
+ "learning_rate": 3.0412371134020617e-05,
1800
+ "epoch": 317.64,
1801
+ "step": 1270
1802
+ },
1803
+ {
1804
+ "loss": 0.023,
1805
+ "grad_norm": 0.03257305920124054,
1806
+ "learning_rate": 3.0126002290950743e-05,
1807
+ "epoch": 318.96,
1808
+ "step": 1275
1809
+ },
1810
+ {
1811
+ "loss": 0.0242,
1812
+ "grad_norm": 0.10195237398147583,
1813
+ "learning_rate": 2.983963344788087e-05,
1814
+ "epoch": 320.0,
1815
+ "step": 1280
1816
+ },
1817
+ {
1818
+ "loss": 0.0237,
1819
+ "grad_norm": 0.03116573579609394,
1820
+ "learning_rate": 2.9553264604811e-05,
1821
+ "epoch": 321.32,
1822
+ "step": 1285
1823
+ },
1824
+ {
1825
+ "loss": 0.0253,
1826
+ "grad_norm": 0.033235374838113785,
1827
+ "learning_rate": 2.9266895761741126e-05,
1828
+ "epoch": 322.64,
1829
+ "step": 1290
1830
+ },
1831
+ {
1832
+ "loss": 0.0234,
1833
+ "grad_norm": 0.03546692803502083,
1834
+ "learning_rate": 2.8980526918671253e-05,
1835
+ "epoch": 323.96,
1836
+ "step": 1295
1837
+ },
1838
+ {
1839
+ "loss": 0.0231,
1840
+ "grad_norm": 0.0778215229511261,
1841
+ "learning_rate": 2.8694158075601372e-05,
1842
+ "epoch": 325.0,
1843
+ "step": 1300
1844
+ },
1845
+ {
1846
+ "loss": 0.0241,
1847
+ "grad_norm": 0.029815560206770897,
1848
+ "learning_rate": 2.8407789232531502e-05,
1849
+ "epoch": 326.32,
1850
+ "step": 1305
1851
+ },
1852
+ {
1853
+ "loss": 0.0233,
1854
+ "grad_norm": 0.03497137874364853,
1855
+ "learning_rate": 2.812142038946163e-05,
1856
+ "epoch": 327.64,
1857
+ "step": 1310
1858
+ },
1859
+ {
1860
+ "loss": 0.0244,
1861
+ "grad_norm": 0.030050713568925858,
1862
+ "learning_rate": 2.7835051546391755e-05,
1863
+ "epoch": 328.96,
1864
+ "step": 1315
1865
+ },
1866
+ {
1867
+ "loss": 0.0231,
1868
+ "grad_norm": 0.09748966246843338,
1869
+ "learning_rate": 2.754868270332188e-05,
1870
+ "epoch": 330.0,
1871
+ "step": 1320
1872
+ },
1873
+ {
1874
+ "loss": 0.0235,
1875
+ "grad_norm": 0.0319872722029686,
1876
+ "learning_rate": 2.7262313860252005e-05,
1877
+ "epoch": 331.32,
1878
+ "step": 1325
1879
+ },
1880
+ {
1881
+ "loss": 0.0243,
1882
+ "grad_norm": 0.029525283724069595,
1883
+ "learning_rate": 2.697594501718213e-05,
1884
+ "epoch": 332.64,
1885
+ "step": 1330
1886
+ },
1887
+ {
1888
+ "loss": 0.0245,
1889
+ "grad_norm": 0.029868364334106445,
1890
+ "learning_rate": 2.6689576174112258e-05,
1891
+ "epoch": 333.96,
1892
+ "step": 1335
1893
+ },
1894
+ {
1895
+ "loss": 0.0212,
1896
+ "grad_norm": 0.07746418565511703,
1897
+ "learning_rate": 2.6403207331042384e-05,
1898
+ "epoch": 335.0,
1899
+ "step": 1340
1900
+ },
1901
+ {
1902
+ "loss": 0.0233,
1903
+ "grad_norm": 0.02571861259639263,
1904
+ "learning_rate": 2.611683848797251e-05,
1905
+ "epoch": 336.32,
1906
+ "step": 1345
1907
+ },
1908
+ {
1909
+ "loss": 0.0238,
1910
+ "grad_norm": 0.0320206955075264,
1911
+ "learning_rate": 2.5830469644902637e-05,
1912
+ "epoch": 337.64,
1913
+ "step": 1350
1914
+ },
1915
+ {
1916
+ "loss": 0.024,
1917
+ "grad_norm": 0.03084505721926689,
1918
+ "learning_rate": 2.554410080183276e-05,
1919
+ "epoch": 338.96,
1920
+ "step": 1355
1921
+ },
1922
+ {
1923
+ "loss": 0.0237,
1924
+ "grad_norm": 0.1282522976398468,
1925
+ "learning_rate": 2.5257731958762887e-05,
1926
+ "epoch": 340.0,
1927
+ "step": 1360
1928
+ },
1929
+ {
1930
+ "loss": 0.0239,
1931
+ "grad_norm": 0.03159436210989952,
1932
+ "learning_rate": 2.4971363115693013e-05,
1933
+ "epoch": 341.32,
1934
+ "step": 1365
1935
+ },
1936
+ {
1937
+ "loss": 0.023,
1938
+ "grad_norm": 0.03368183225393295,
1939
+ "learning_rate": 2.468499427262314e-05,
1940
+ "epoch": 342.64,
1941
+ "step": 1370
1942
+ },
1943
+ {
1944
+ "loss": 0.0232,
1945
+ "grad_norm": 0.02871900610625744,
1946
+ "learning_rate": 2.4398625429553266e-05,
1947
+ "epoch": 343.96,
1948
+ "step": 1375
1949
+ },
1950
+ {
1951
+ "loss": 0.0216,
1952
+ "grad_norm": 0.06527750939130783,
1953
+ "learning_rate": 2.4112256586483393e-05,
1954
+ "epoch": 345.0,
1955
+ "step": 1380
1956
+ },
1957
+ {
1958
+ "loss": 0.0246,
1959
+ "grad_norm": 0.029657971113920212,
1960
+ "learning_rate": 2.3825887743413516e-05,
1961
+ "epoch": 346.32,
1962
+ "step": 1385
1963
+ },
1964
+ {
1965
+ "loss": 0.0226,
1966
+ "grad_norm": 0.029672225937247276,
1967
+ "learning_rate": 2.3539518900343642e-05,
1968
+ "epoch": 347.64,
1969
+ "step": 1390
1970
+ },
1971
+ {
1972
+ "loss": 0.0234,
1973
+ "grad_norm": 0.032295338809490204,
1974
+ "learning_rate": 2.3253150057273772e-05,
1975
+ "epoch": 348.96,
1976
+ "step": 1395
1977
+ },
1978
+ {
1979
+ "loss": 0.0246,
1980
+ "grad_norm": 0.12228602916002274,
1981
+ "learning_rate": 2.2966781214203895e-05,
1982
+ "epoch": 350.0,
1983
+ "step": 1400
1984
+ },
1985
+ {
1986
+ "loss": 0.0244,
1987
+ "grad_norm": 0.031152470037341118,
1988
+ "learning_rate": 2.268041237113402e-05,
1989
+ "epoch": 351.32,
1990
+ "step": 1405
1991
+ },
1992
+ {
1993
+ "loss": 0.0241,
1994
+ "grad_norm": 0.03246377035975456,
1995
+ "learning_rate": 2.2394043528064148e-05,
1996
+ "epoch": 352.64,
1997
+ "step": 1410
1998
+ },
1999
+ {
2000
+ "loss": 0.0236,
2001
+ "grad_norm": 0.03664344921708107,
2002
+ "learning_rate": 2.210767468499427e-05,
2003
+ "epoch": 353.96,
2004
+ "step": 1415
2005
+ },
2006
+ {
2007
+ "loss": 0.0242,
2008
+ "grad_norm": 0.12599903345108032,
2009
+ "learning_rate": 2.18213058419244e-05,
2010
+ "epoch": 355.0,
2011
+ "step": 1420
2012
+ },
2013
+ {
2014
+ "loss": 0.023,
2015
+ "grad_norm": 0.03213375434279442,
2016
+ "learning_rate": 2.1534936998854528e-05,
2017
+ "epoch": 356.32,
2018
+ "step": 1425
2019
+ },
2020
+ {
2021
+ "loss": 0.0242,
2022
+ "grad_norm": 0.029569735750555992,
2023
+ "learning_rate": 2.124856815578465e-05,
2024
+ "epoch": 357.64,
2025
+ "step": 1430
2026
+ },
2027
+ {
2028
+ "loss": 0.0237,
2029
+ "grad_norm": 0.030345458537340164,
2030
+ "learning_rate": 2.0962199312714777e-05,
2031
+ "epoch": 358.96,
2032
+ "step": 1435
2033
+ },
2034
+ {
2035
+ "loss": 0.0225,
2036
+ "grad_norm": 0.07442766427993774,
2037
+ "learning_rate": 2.0675830469644904e-05,
2038
+ "epoch": 360.0,
2039
+ "step": 1440
2040
+ },
2041
+ {
2042
+ "loss": 0.0247,
2043
+ "grad_norm": 0.03161914646625519,
2044
+ "learning_rate": 2.038946162657503e-05,
2045
+ "epoch": 361.32,
2046
+ "step": 1445
2047
+ },
2048
+ {
2049
+ "loss": 0.0224,
2050
+ "grad_norm": 0.03342209383845329,
2051
+ "learning_rate": 2.0103092783505157e-05,
2052
+ "epoch": 362.64,
2053
+ "step": 1450
2054
+ },
2055
+ {
2056
+ "loss": 0.0226,
2057
+ "grad_norm": 0.029506616294384003,
2058
+ "learning_rate": 1.981672394043528e-05,
2059
+ "epoch": 363.96,
2060
+ "step": 1455
2061
+ },
2062
+ {
2063
+ "loss": 0.0251,
2064
+ "grad_norm": 0.13045279681682587,
2065
+ "learning_rate": 1.9530355097365406e-05,
2066
+ "epoch": 365.0,
2067
+ "step": 1460
2068
+ },
2069
+ {
2070
+ "loss": 0.0239,
2071
+ "grad_norm": 0.03303099796175957,
2072
+ "learning_rate": 1.9243986254295536e-05,
2073
+ "epoch": 366.32,
2074
+ "step": 1465
2075
+ },
2076
+ {
2077
+ "loss": 0.0221,
2078
+ "grad_norm": 0.02956564724445343,
2079
+ "learning_rate": 1.895761741122566e-05,
2080
+ "epoch": 367.64,
2081
+ "step": 1470
2082
+ },
2083
+ {
2084
+ "loss": 0.0234,
2085
+ "grad_norm": 0.03200279548764229,
2086
+ "learning_rate": 1.8671248568155786e-05,
2087
+ "epoch": 368.96,
2088
+ "step": 1475
2089
+ },
2090
+ {
2091
+ "loss": 0.0235,
2092
+ "grad_norm": 0.12507398426532745,
2093
+ "learning_rate": 1.8384879725085912e-05,
2094
+ "epoch": 370.0,
2095
+ "step": 1480
2096
+ },
2097
+ {
2098
+ "loss": 0.0233,
2099
+ "grad_norm": 0.03214867785573006,
2100
+ "learning_rate": 1.809851088201604e-05,
2101
+ "epoch": 371.32,
2102
+ "step": 1485
2103
+ },
2104
+ {
2105
+ "loss": 0.023,
2106
+ "grad_norm": 0.03199266269803047,
2107
+ "learning_rate": 1.7812142038946165e-05,
2108
+ "epoch": 372.64,
2109
+ "step": 1490
2110
+ },
2111
+ {
2112
+ "loss": 0.0246,
2113
+ "grad_norm": 0.027682902291417122,
2114
+ "learning_rate": 1.7525773195876288e-05,
2115
+ "epoch": 373.96,
2116
+ "step": 1495
2117
+ },
2118
+ {
2119
+ "loss": 0.0228,
2120
+ "grad_norm": 0.10432948172092438,
2121
+ "learning_rate": 1.7239404352806415e-05,
2122
+ "epoch": 375.0,
2123
+ "step": 1500
2124
+ },
2125
+ {
2126
+ "loss": 0.0235,
2127
+ "grad_norm": 0.03665570914745331,
2128
+ "learning_rate": 1.695303550973654e-05,
2129
+ "epoch": 376.32,
2130
+ "step": 1505
2131
+ },
2132
+ {
2133
+ "loss": 0.0228,
2134
+ "grad_norm": 0.03269299864768982,
2135
+ "learning_rate": 1.6666666666666667e-05,
2136
+ "epoch": 377.64,
2137
+ "step": 1510
2138
+ },
2139
+ {
2140
+ "loss": 0.0232,
2141
+ "grad_norm": 0.030298851430416107,
2142
+ "learning_rate": 1.6380297823596794e-05,
2143
+ "epoch": 378.96,
2144
+ "step": 1515
2145
+ },
2146
+ {
2147
+ "loss": 0.024,
2148
+ "grad_norm": 0.1330370008945465,
2149
+ "learning_rate": 1.609392898052692e-05,
2150
+ "epoch": 380.0,
2151
+ "step": 1520
2152
+ },
2153
+ {
2154
+ "loss": 0.0232,
2155
+ "grad_norm": 0.026194848120212555,
2156
+ "learning_rate": 1.5807560137457044e-05,
2157
+ "epoch": 381.32,
2158
+ "step": 1525
2159
+ },
2160
+ {
2161
+ "loss": 0.024,
2162
+ "grad_norm": 0.030696984380483627,
2163
+ "learning_rate": 1.5521191294387173e-05,
2164
+ "epoch": 382.64,
2165
+ "step": 1530
2166
+ },
2167
+ {
2168
+ "loss": 0.0237,
2169
+ "grad_norm": 0.03159346804022789,
2170
+ "learning_rate": 1.5234822451317298e-05,
2171
+ "epoch": 383.96,
2172
+ "step": 1535
2173
+ },
2174
+ {
2175
+ "loss": 0.024,
2176
+ "grad_norm": 0.0895160585641861,
2177
+ "learning_rate": 1.4948453608247423e-05,
2178
+ "epoch": 385.0,
2179
+ "step": 1540
2180
+ },
2181
+ {
2182
+ "loss": 0.0226,
2183
+ "grad_norm": 0.030342400074005127,
2184
+ "learning_rate": 1.466208476517755e-05,
2185
+ "epoch": 386.32,
2186
+ "step": 1545
2187
+ },
2188
+ {
2189
+ "loss": 0.0241,
2190
+ "grad_norm": 0.03451743721961975,
2191
+ "learning_rate": 1.4375715922107674e-05,
2192
+ "epoch": 387.64,
2193
+ "step": 1550
2194
+ },
2195
+ {
2196
+ "loss": 0.0224,
2197
+ "grad_norm": 0.034534044563770294,
2198
+ "learning_rate": 1.40893470790378e-05,
2199
+ "epoch": 388.96,
2200
+ "step": 1555
2201
+ },
2202
+ {
2203
+ "loss": 0.024,
2204
+ "grad_norm": 0.11649748682975769,
2205
+ "learning_rate": 1.3802978235967929e-05,
2206
+ "epoch": 390.0,
2207
+ "step": 1560
2208
+ },
2209
+ {
2210
+ "loss": 0.0232,
2211
+ "grad_norm": 0.02730483002960682,
2212
+ "learning_rate": 1.3516609392898052e-05,
2213
+ "epoch": 391.32,
2214
+ "step": 1565
2215
+ },
2216
+ {
2217
+ "loss": 0.0245,
2218
+ "grad_norm": 0.03302980959415436,
2219
+ "learning_rate": 1.323024054982818e-05,
2220
+ "epoch": 392.64,
2221
+ "step": 1570
2222
+ },
2223
+ {
2224
+ "loss": 0.0231,
2225
+ "grad_norm": 0.030424287542700768,
2226
+ "learning_rate": 1.2943871706758307e-05,
2227
+ "epoch": 393.96,
2228
+ "step": 1575
2229
+ },
2230
+ {
2231
+ "loss": 0.0233,
2232
+ "grad_norm": 0.09190870821475983,
2233
+ "learning_rate": 1.2657502863688431e-05,
2234
+ "epoch": 395.0,
2235
+ "step": 1580
2236
+ },
2237
+ {
2238
+ "loss": 0.0237,
2239
+ "grad_norm": 0.03016272746026516,
2240
+ "learning_rate": 1.2371134020618558e-05,
2241
+ "epoch": 396.32,
2242
+ "step": 1585
2243
+ },
2244
+ {
2245
+ "loss": 0.0237,
2246
+ "grad_norm": 0.029102135449647903,
2247
+ "learning_rate": 1.2084765177548683e-05,
2248
+ "epoch": 397.64,
2249
+ "step": 1590
2250
+ },
2251
+ {
2252
+ "loss": 0.0238,
2253
+ "grad_norm": 0.030849164351820946,
2254
+ "learning_rate": 1.1798396334478809e-05,
2255
+ "epoch": 398.96,
2256
+ "step": 1595
2257
+ },
2258
+ {
2259
+ "loss": 0.0223,
2260
+ "grad_norm": 0.09185610711574554,
2261
+ "learning_rate": 1.1512027491408934e-05,
2262
+ "epoch": 400.0,
2263
+ "step": 1600
2264
+ },
2265
+ {
2266
+ "loss": 0.0228,
2267
+ "grad_norm": 0.030718082562088966,
2268
+ "learning_rate": 1.1225658648339062e-05,
2269
+ "epoch": 401.32,
2270
+ "step": 1605
2271
+ },
2272
+ {
2273
+ "loss": 0.0238,
2274
+ "grad_norm": 0.028845084831118584,
2275
+ "learning_rate": 1.0939289805269187e-05,
2276
+ "epoch": 402.64,
2277
+ "step": 1610
2278
+ },
2279
+ {
2280
+ "loss": 0.0241,
2281
+ "grad_norm": 0.03036542609333992,
2282
+ "learning_rate": 1.0652920962199313e-05,
2283
+ "epoch": 403.96,
2284
+ "step": 1615
2285
+ },
2286
+ {
2287
+ "loss": 0.0234,
2288
+ "grad_norm": 0.10246625542640686,
2289
+ "learning_rate": 1.036655211912944e-05,
2290
+ "epoch": 405.0,
2291
+ "step": 1620
2292
+ },
2293
+ {
2294
+ "loss": 0.0238,
2295
+ "grad_norm": 0.03127530962228775,
2296
+ "learning_rate": 1.0080183276059566e-05,
2297
+ "epoch": 406.32,
2298
+ "step": 1625
2299
+ },
2300
+ {
2301
+ "loss": 0.0226,
2302
+ "grad_norm": 0.036298803985118866,
2303
+ "learning_rate": 9.793814432989691e-06,
2304
+ "epoch": 407.64,
2305
+ "step": 1630
2306
+ },
2307
+ {
2308
+ "loss": 0.0231,
2309
+ "grad_norm": 0.028423035517334938,
2310
+ "learning_rate": 9.507445589919818e-06,
2311
+ "epoch": 408.96,
2312
+ "step": 1635
2313
+ },
2314
+ {
2315
+ "loss": 0.0218,
2316
+ "grad_norm": 0.07871800661087036,
2317
+ "learning_rate": 9.221076746849944e-06,
2318
+ "epoch": 410.0,
2319
+ "step": 1640
2320
+ },
2321
+ {
2322
+ "loss": 0.0242,
2323
+ "grad_norm": 0.0336175374686718,
2324
+ "learning_rate": 8.934707903780069e-06,
2325
+ "epoch": 411.32,
2326
+ "step": 1645
2327
+ },
2328
+ {
2329
+ "loss": 0.0227,
2330
+ "grad_norm": 0.03624117374420166,
2331
+ "learning_rate": 8.648339060710195e-06,
2332
+ "epoch": 412.64,
2333
+ "step": 1650
2334
+ },
2335
+ {
2336
+ "loss": 0.0218,
2337
+ "grad_norm": 0.03119911253452301,
2338
+ "learning_rate": 8.36197021764032e-06,
2339
+ "epoch": 413.96,
2340
+ "step": 1655
2341
+ },
2342
+ {
2343
+ "loss": 0.0227,
2344
+ "grad_norm": 0.09461841732263565,
2345
+ "learning_rate": 8.075601374570448e-06,
2346
+ "epoch": 415.0,
2347
+ "step": 1660
2348
+ },
2349
+ {
2350
+ "loss": 0.0233,
2351
+ "grad_norm": 0.02897919900715351,
2352
+ "learning_rate": 7.789232531500573e-06,
2353
+ "epoch": 416.32,
2354
+ "step": 1665
2355
+ },
2356
+ {
2357
+ "loss": 0.0234,
2358
+ "grad_norm": 0.03222072497010231,
2359
+ "learning_rate": 7.502863688430699e-06,
2360
+ "epoch": 417.64,
2361
+ "step": 1670
2362
+ },
2363
+ {
2364
+ "loss": 0.0231,
2365
+ "grad_norm": 0.02793605998158455,
2366
+ "learning_rate": 7.216494845360824e-06,
2367
+ "epoch": 418.96,
2368
+ "step": 1675
2369
+ },
2370
+ {
2371
+ "loss": 0.0242,
2372
+ "grad_norm": 0.10282719135284424,
2373
+ "learning_rate": 6.930126002290952e-06,
2374
+ "epoch": 420.0,
2375
+ "step": 1680
2376
+ },
2377
+ {
2378
+ "loss": 0.0228,
2379
+ "grad_norm": 0.029103396460413933,
2380
+ "learning_rate": 6.643757159221077e-06,
2381
+ "epoch": 421.32,
2382
+ "step": 1685
2383
+ },
2384
+ {
2385
+ "loss": 0.0229,
2386
+ "grad_norm": 0.027615424245595932,
2387
+ "learning_rate": 6.357388316151203e-06,
2388
+ "epoch": 422.64,
2389
+ "step": 1690
2390
+ },
2391
+ {
2392
+ "loss": 0.023,
2393
+ "grad_norm": 0.03273004665970802,
2394
+ "learning_rate": 6.071019473081329e-06,
2395
+ "epoch": 423.96,
2396
+ "step": 1695
2397
+ },
2398
+ {
2399
+ "loss": 0.0242,
2400
+ "grad_norm": 0.088851198554039,
2401
+ "learning_rate": 5.784650630011455e-06,
2402
+ "epoch": 425.0,
2403
+ "step": 1700
2404
+ },
2405
+ {
2406
+ "loss": 0.0236,
2407
+ "grad_norm": 0.031545545905828476,
2408
+ "learning_rate": 5.498281786941581e-06,
2409
+ "epoch": 426.32,
2410
+ "step": 1705
2411
+ },
2412
+ {
2413
+ "loss": 0.0231,
2414
+ "grad_norm": 0.03436841815710068,
2415
+ "learning_rate": 5.211912943871707e-06,
2416
+ "epoch": 427.64,
2417
+ "step": 1710
2418
+ },
2419
+ {
2420
+ "loss": 0.023,
2421
+ "grad_norm": 0.03470204398036003,
2422
+ "learning_rate": 4.925544100801833e-06,
2423
+ "epoch": 428.96,
2424
+ "step": 1715
2425
+ },
2426
+ {
2427
+ "loss": 0.0233,
2428
+ "grad_norm": 0.0859316810965538,
2429
+ "learning_rate": 4.639175257731959e-06,
2430
+ "epoch": 430.0,
2431
+ "step": 1720
2432
+ },
2433
+ {
2434
+ "loss": 0.0215,
2435
+ "grad_norm": 0.02714327722787857,
2436
+ "learning_rate": 4.352806414662085e-06,
2437
+ "epoch": 431.32,
2438
+ "step": 1725
2439
+ },
2440
+ {
2441
+ "loss": 0.0233,
2442
+ "grad_norm": 0.03115593083202839,
2443
+ "learning_rate": 4.066437571592211e-06,
2444
+ "epoch": 432.64,
2445
+ "step": 1730
2446
+ },
2447
+ {
2448
+ "loss": 0.0222,
2449
+ "grad_norm": 0.03160055726766586,
2450
+ "learning_rate": 3.7800687285223365e-06,
2451
+ "epoch": 433.96,
2452
+ "step": 1735
2453
+ },
2454
+ {
2455
+ "loss": 0.0221,
2456
+ "grad_norm": 0.10642414540052414,
2457
+ "learning_rate": 3.493699885452463e-06,
2458
+ "epoch": 435.0,
2459
+ "step": 1740
2460
+ },
2461
+ {
2462
+ "loss": 0.024,
2463
+ "grad_norm": 0.029918361455202103,
2464
+ "learning_rate": 3.2073310423825886e-06,
2465
+ "epoch": 436.32,
2466
+ "step": 1745
2467
+ },
2468
+ {
2469
+ "loss": 0.023,
2470
+ "grad_norm": 0.030128490179777145,
2471
+ "learning_rate": 2.920962199312715e-06,
2472
+ "epoch": 437.64,
2473
+ "step": 1750
2474
+ },
2475
+ {
2476
+ "loss": 0.0231,
2477
+ "grad_norm": 0.03472098708152771,
2478
+ "learning_rate": 2.6345933562428407e-06,
2479
+ "epoch": 438.96,
2480
+ "step": 1755
2481
+ },
2482
+ {
2483
+ "loss": 0.0238,
2484
+ "grad_norm": 0.10841913521289825,
2485
+ "learning_rate": 2.3482245131729668e-06,
2486
+ "epoch": 440.0,
2487
+ "step": 1760
2488
+ },
2489
+ {
2490
+ "loss": 0.0241,
2491
+ "grad_norm": 0.03282919153571129,
2492
+ "learning_rate": 2.061855670103093e-06,
2493
+ "epoch": 441.32,
2494
+ "step": 1765
2495
+ },
2496
+ {
2497
+ "loss": 0.0234,
2498
+ "grad_norm": 0.030162909999489784,
2499
+ "learning_rate": 1.7754868270332189e-06,
2500
+ "epoch": 442.64,
2501
+ "step": 1770
2502
+ },
2503
+ {
2504
+ "loss": 0.0225,
2505
+ "grad_norm": 0.032848529517650604,
2506
+ "learning_rate": 1.4891179839633447e-06,
2507
+ "epoch": 443.96,
2508
+ "step": 1775
2509
+ },
2510
+ {
2511
+ "loss": 0.023,
2512
+ "grad_norm": 0.09595301747322083,
2513
+ "learning_rate": 1.202749140893471e-06,
2514
+ "epoch": 445.0,
2515
+ "step": 1780
2516
+ },
2517
+ {
2518
+ "loss": 0.0242,
2519
+ "grad_norm": 0.027366334572434425,
2520
+ "learning_rate": 9.163802978235968e-07,
2521
+ "epoch": 446.32,
2522
+ "step": 1785
2523
+ },
2524
+ {
2525
+ "loss": 0.0243,
2526
+ "grad_norm": 0.029810229316353798,
2527
+ "learning_rate": 6.300114547537229e-07,
2528
+ "epoch": 447.64,
2529
+ "step": 1790
2530
+ },
2531
+ {
2532
+ "loss": 0.0238,
2533
+ "grad_norm": 0.03164233639836311,
2534
+ "learning_rate": 3.436426116838488e-07,
2535
+ "epoch": 448.96,
2536
+ "step": 1795
2537
+ },
2538
+ {
2539
+ "loss": 0.0246,
2540
+ "grad_norm": 0.11831732094287872,
2541
+ "learning_rate": 5.72737686139748e-08,
2542
+ "epoch": 450.0,
2543
+ "step": 1800
2544
+ },
2545
+ {
2546
+ "train_runtime": 30940.5798,
2547
+ "train_samples_per_second": 1.862,
2548
+ "train_steps_per_second": 0.058,
2549
+ "total_flos": 1.5308141101056e+18,
2550
+ "train_loss": 0.049091512378719115,
2551
+ "epoch": 450.0,
2552
+ "step": 1800
2553
+ }
2554
+ ],
2555
+ "training_args": {
2556
+ "output_dir": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/outputs/Math_QA/group_09/checkpoints",
2557
+ "overwrite_output_dir": false,
2558
+ "do_train": false,
2559
+ "do_eval": false,
2560
+ "do_predict": false,
2561
+ "eval_strategy": "no",
2562
+ "prediction_loss_only": false,
2563
+ "per_device_train_batch_size": 2,
2564
+ "per_device_eval_batch_size": 8,
2565
+ "per_gpu_train_batch_size": null,
2566
+ "per_gpu_eval_batch_size": null,
2567
+ "gradient_accumulation_steps": 16,
2568
+ "eval_accumulation_steps": null,
2569
+ "eval_delay": 0,
2570
+ "torch_empty_cache_steps": null,
2571
+ "learning_rate": 0.0001,
2572
+ "weight_decay": 0.01,
2573
+ "adam_beta1": 0.9,
2574
+ "adam_beta2": 0.999,
2575
+ "adam_epsilon": 1e-08,
2576
+ "max_grad_norm": 1.0,
2577
+ "num_train_epochs": 12,
2578
+ "max_steps": 1800,
2579
+ "lr_scheduler_type": "linear",
2580
+ "lr_scheduler_kwargs": {},
2581
+ "warmup_ratio": 0.03,
2582
+ "warmup_steps": 0,
2583
+ "log_level": "passive",
2584
+ "log_level_replica": "warning",
2585
+ "log_on_each_node": true,
2586
+ "logging_dir": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora/logs/Math_QA/group_09",
2587
+ "logging_strategy": "steps",
2588
+ "logging_first_step": true,
2589
+ "logging_steps": 5,
2590
+ "logging_nan_inf_filter": true,
2591
+ "save_strategy": "steps",
2592
+ "save_steps": 300,
2593
+ "save_total_limit": 6,
2594
+ "save_safetensors": true,
2595
+ "save_on_each_node": false,
2596
+ "save_only_model": false,
2597
+ "restore_callback_states_from_checkpoint": false,
2598
+ "no_cuda": false,
2599
+ "use_cpu": false,
2600
+ "use_mps_device": false,
2601
+ "seed": 42,
2602
+ "data_seed": null,
2603
+ "jit_mode_eval": false,
2604
+ "bf16": true,
2605
+ "fp16": false,
2606
+ "fp16_opt_level": "O1",
2607
+ "half_precision_backend": "auto",
2608
+ "bf16_full_eval": false,
2609
+ "fp16_full_eval": false,
2610
+ "tf32": null,
2611
+ "local_rank": 0,
2612
+ "ddp_backend": null,
2613
+ "tpu_num_cores": null,
2614
+ "tpu_metrics_debug": false,
2615
+ "debug": [],
2616
+ "dataloader_drop_last": false,
2617
+ "eval_steps": null,
2618
+ "dataloader_num_workers": 0,
2619
+ "dataloader_prefetch_factor": null,
2620
+ "past_index": -1,
2621
+ "run_name": null,
2622
+ "disable_tqdm": false,
2623
+ "remove_unused_columns": true,
2624
+ "label_names": null,
2625
+ "load_best_model_at_end": false,
2626
+ "metric_for_best_model": null,
2627
+ "greater_is_better": null,
2628
+ "ignore_data_skip": false,
2629
+ "fsdp": [],
2630
+ "fsdp_min_num_params": 0,
2631
+ "fsdp_config": {
2632
+ "min_num_params": 0,
2633
+ "xla": false,
2634
+ "xla_fsdp_v2": false,
2635
+ "xla_fsdp_grad_ckpt": false
2636
+ },
2637
+ "fsdp_transformer_layer_cls_to_wrap": null,
2638
+ "accelerator_config": {
2639
+ "split_batches": false,
2640
+ "dispatch_batches": null,
2641
+ "even_batches": true,
2642
+ "use_seedable_sampler": true,
2643
+ "non_blocking": false,
2644
+ "gradient_accumulation_kwargs": null
2645
+ },
2646
+ "parallelism_config": null,
2647
+ "deepspeed": null,
2648
+ "label_smoothing_factor": 0.0,
2649
+ "optim": "adamw_torch",
2650
+ "optim_args": null,
2651
+ "adafactor": false,
2652
+ "group_by_length": false,
2653
+ "length_column_name": "length",
2654
+ "report_to": [],
2655
+ "project": "huggingface",
2656
+ "trackio_space_id": "trackio",
2657
+ "ddp_find_unused_parameters": null,
2658
+ "ddp_bucket_cap_mb": null,
2659
+ "ddp_broadcast_buffers": null,
2660
+ "dataloader_pin_memory": true,
2661
+ "dataloader_persistent_workers": false,
2662
+ "skip_memory_metrics": true,
2663
+ "use_legacy_prediction_loop": false,
2664
+ "push_to_hub": false,
2665
+ "resume_from_checkpoint": null,
2666
+ "hub_model_id": null,
2667
+ "hub_strategy": "every_save",
2668
+ "hub_token": "<HUB_TOKEN>",
2669
+ "hub_private_repo": null,
2670
+ "hub_always_push": false,
2671
+ "hub_revision": null,
2672
+ "gradient_checkpointing": true,
2673
+ "gradient_checkpointing_kwargs": null,
2674
+ "include_inputs_for_metrics": false,
2675
+ "include_for_metrics": [],
2676
+ "eval_do_concat_batches": true,
2677
+ "fp16_backend": "auto",
2678
+ "push_to_hub_model_id": null,
2679
+ "push_to_hub_organization": null,
2680
+ "push_to_hub_token": "<PUSH_TO_HUB_TOKEN>",
2681
+ "mp_parameters": "",
2682
+ "auto_find_batch_size": false,
2683
+ "full_determinism": false,
2684
+ "torchdynamo": null,
2685
+ "ray_scope": "last",
2686
+ "ddp_timeout": 1800,
2687
+ "torch_compile": false,
2688
+ "torch_compile_backend": null,
2689
+ "torch_compile_mode": null,
2690
+ "include_tokens_per_second": false,
2691
+ "include_num_input_tokens_seen": "no",
2692
+ "neftune_noise_alpha": null,
2693
+ "optim_target_modules": null,
2694
+ "batch_eval_metrics": false,
2695
+ "eval_on_start": false,
2696
+ "use_liger_kernel": false,
2697
+ "liger_kernel_config": null,
2698
+ "eval_use_gather_object": false,
2699
+ "average_tokens_across_devices": true
2700
+ },
2701
+ "lora_config": {
2702
+ "r": 64,
2703
+ "alpha": 128,
2704
+ "dropout": 0.05,
2705
+ "target_modules": [
2706
+ "q_proj",
2707
+ "k_proj",
2708
+ "v_proj",
2709
+ "o_proj",
2710
+ "gate_proj",
2711
+ "up_proj",
2712
+ "down_proj"
2713
+ ]
2714
+ },
2715
+ "effective_batch_size": 32,
2716
+ "world_size": 1,
2717
+ "git_commit": ""
2718
+ }
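The learning rates in the log above follow from the recorded hyperparameters (`learning_rate` 1e-4, `lr_scheduler_type` "linear", `warmup_ratio` 0.03, `max_steps` 1800). A minimal sketch, assuming the usual Hugging Face convention of `warmup_steps = ceil(max_steps * warmup_ratio)` and that the logged rate at step N is the one applied during that step (i.e. the schedule value at N-1); `linear_lr` is an illustrative helper, not part of the log:

```python
import math

# Values recorded in training_args above.
BASE_LR = 1.0e-4
MAX_STEPS = 1800
WARMUP_RATIO = 0.03

# Assumed HF convention: warmup steps rounded up from the ratio.
WARMUP_STEPS = math.ceil(MAX_STEPS * WARMUP_RATIO)  # 54

def linear_lr(step: int) -> float:
    """Scheduled learning rate after `step` completed optimizer steps:
    linear ramp over the warmup, then linear decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)

# Effective batch size = per-device batch * grad-accum steps * world size,
# matching the "effective_batch_size" field above.
effective_batch = 2 * 16 * 1  # 32
```

Under these assumptions the sketch reproduces the logged values, e.g. `linear_lr(1044)` gives the 4.3299e-05 recorded at step 1045, and `linear_lr(1799)` gives the 5.727e-08 recorded at the final step 1800.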
Math_QA/group_09/prompt_group.json ADDED
@@ -0,0 +1,613 @@
+ {
+ "dataset_name": "Math_QA",
+ "group_index": 9,
+ "source_file": "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/prepare/data/math/Math_QA.json",
+ "selected_indices": [
+ 142,
+ 181,
+ 659,
+ 708,
+ 738,
+ 820,
+ 997,
+ 1097,
+ 1185,
+ 1268,
+ 1293,
+ 1454,
+ 1463,
+ 1552,
+ 1649,
+ 1692,
+ 1903,
+ 2038,
+ 2076,
+ 2098,
+ 2107,
+ 2389,
+ 2401,
+ 2452,
+ 2577,
+ 2616,
+ 2619,
+ 2646,
+ 2739,
+ 2823,
+ 2993,
+ 3049,
+ 3183,
+ 3378,
+ 3386,
+ 3454,
+ 3517,
+ 3526,
+ 3539,
+ 3711,
+ 3887,
+ 3994,
+ 4085,
+ 4113,
+ 4149,
+ 4251,
+ 4330,
+ 4449,
+ 4521,
+ 4557,
+ 4623,
+ 4646,
+ 4651,
+ 4680,
+ 4854,
+ 4870,
+ 4927,
+ 4937,
+ 5025,
+ 5039,
+ 5118,
+ 5131,
+ 5198,
+ 5213,
+ 5241,
+ 5311,
+ 5437,
+ 5441,
+ 5839,
+ 5908,
+ 5918,
+ 6051,
+ 6096,
+ 6113,
+ 6340,
+ 6345,
+ 6395,
+ 6424,
+ 6742,
+ 6796,
+ 6812,
+ 6847,
+ 6879,
+ 7107,
+ 7129,
+ 7652,
+ 7854,
+ 8037,
+ 8177,
+ 8246,
+ 8337,
+ 9033,
+ 9248,
+ 9274,
+ 9379,
+ 9483,
+ 9523,
+ 9581,
+ 9777,
+ 9816
+ ],
+ "total_records": 10000,
+ "sample_count": 100,
+ "generated_at": "2025-11-05T12:58:28Z",
+ "seed": 760231939,
+ "samples": [
+ {
+ "prompt": "what is the 26 th digit to the right of the decimal point in the decimal form of 6 / 11 ?\na ) 3 , b ) 4 , c ) 5 , d ) 6 , e ) 7",
+ "response": "\"6 / 11 = 0.54545454 . . . the even numbered positions in the decimal expansion are all 4 . the answer is b .\"",
+ "system": ""
+ },
+ {
+ "prompt": "arjun started a business with rs . 40,000 and is joined afterwards by anoop with rs . 80 , 000 . after how many months did anoop join if the profits at the end of the year are divided equally ?\na ) 3 , b ) 4 , c ) 5 , d ) 6 , e ) 7",
+ "response": "\"suppose anoop joined after 3 months . then , 40000 * 12 = 80000 * ( 12 – x ) = > x = 6 . answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "if log 2 = 0.3010 and log 3 = 0.4771 , the value of log 5 ( 512 )\na ) 3.876 , b ) 2.967 , c ) 2.87 , d ) 3.912 , e ) 1.9",
+ "response": "\"log 5 ( 512 ) = log ( 512 ) / log 5 = log 2 ^ 9 / log ( 10 / 2 ) = 9 log 2 / ( log 10 - log 2 ) = ( 9 x 0.3010 ) / ( 1 - 0.3010 ) = 2.709 / 0.699 = 2709 / 699 = 3.876 answer is a .\"",
+ "system": ""
+ },
+ {
+ "prompt": "4,25 , 49,121 , 169,289 , 361,529 ,\na ) 149 , b ) 169 , c ) 841 , d ) 920 , e ) 219",
+ "response": "\"29 ^ 2 = 841 because follow sequence of square of the prime numbers answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "a survey of employers found that during 1993 employment costs rose 3.5 percent , where employment costs consist of salary costs and fringe - benefit costs . if salary costs rose 3 percent and fringe - benefit costs rose 6.5 percent during 1993 , then fringe - benefit costs represented what percent of employment costs at the beginning of 1993 ?\na ) 16.5 % , b ) 14 % , c ) 35 % , d ) 55 % , e ) 65 %",
+ "response": "the amount by which employment costs rose is equal to 0.035 ( salary costs + fringe benefit costs ) ; on the other hand the amount by which employment costs rose is equal to 0.03 * salary costs + 0.065 * fringe benefit costs ; so , 35 ( s + f ) = 30 s + 65 f - - > s = 6 f - - > f / s = 1 / 6 - - > f / ( s + f ) = 1 / ( 1 + 6 ) = 1 / 7 = 0.14 . answer : b .",
+ "system": ""
+ },
+ {
+ "prompt": "if x and y are both odd prime numbers and x < y , how many distinct positive integer w factors does 2 xy have ?\na ) 3 , b ) 4 , c ) 6 , d ) 8 , e ) 12",
+ "response": "since 2 xy prime w factors are x ^ 1 * y ^ 1 * 2 ^ 1 , its total number or factors must be ( 1 + 1 ) ( 1 + 1 ) ( 1 + 1 ) = 2 ^ 3 = 8 . thus , i think d would be the correct answer .",
+ "system": ""
+ },
+ {
+ "prompt": "p alone can complete a job in 5 days . the work done by q alone in one day is equal to one - fourth of the work done by p alone in one day . in how many days can the work be completed if p and q work together ?\na ) 2.5 , b ) 3.0 , c ) 3.5 , d ) 4.0 , e ) 4.5",
+ "response": "\"p ' s rate is 1 / 5 q ' s rate is 1 / 20 the combined rate is 1 / 5 + 1 / 20 = 1 / 4 if they work together , the job will take 4 days . the answer is d .\"",
+ "system": ""
+ },
+ {
+ "prompt": "a library branch originally contained 18360 volumes , 30 % of which were fiction novels . 1 / 3 of the volumes were transferred to another location and 1 / 3 of the volumes transferred were fiction novels . what percent of the remaining collection was fiction novels ?\na ) 2.5 % , b ) 17.67 % , c ) 28.3 % , d ) 45.2 % , e ) 73.6 %",
+ "response": "\"as everything is either fraction or percentage , the given figure 18360 is just a false alarm . we can do this by assuming that originally the branch had 100 volumes . originally : total - 100 fiction - 30 transferred : total - 33 ( one third of original total ) fiction - 11 ( one third of those transferred ) remaining : total - 100 − 33 = 67100 − 33 = 67 fiction - 30 − 11 = 1930 − 11 = 19 to find : 19 is what percent of 67 28.3 option c\"",
+ "system": ""
+ },
+ {
+ "prompt": "of the goose eggs laid at a certain pond , 1 / 4 hatched and 4 / 5 of the geese that hatched from those eggs survived the first month . of the geese that survived the first month , 2 / 5 did not survive the first year . if 120 geese survived the first year and if no more than one goose hatched from each egg , how many goose eggs were laid at the pond ?\na ) 600 , b ) 700 , c ) 800 , d ) 900 , e ) 1000",
+ "response": "\"let x be the number of eggs that were laid . ( 3 / 5 ) ( 4 / 5 ) ( 1 / 4 ) x = 120 ( 12 / 100 ) x = 120 x = 1000 the answer is e .\"",
+ "system": ""
+ },
+ {
+ "prompt": "solution x is 40 % chemical a and 60 % chemical b by volume . solution y is 50 % chemical a and 50 % chemical b by volume . if a mixture of x and y is 47 % chemical a , what percent of the mixture is solution x ?\na ) 15 % , b ) 20 % , c ) 25 % , d ) 30 % , e ) 35 %",
+ "response": "\"the volume of the mixture be x + y . 0.4 x + 0.5 y = 0.47 ( x + y ) x = 3 y / 7 x / ( x + y ) = ( 3 y / 7 ) / ( 10 y / 7 ) = 3 / 10 = 30 % . the answer is d .\"",
+ "system": ""
+ },
+ {
+ "prompt": "find the fraction which has the same ratio to 2 / 6 that 3 / 4 has to 1 / 2\na ) 1 / 2 , b ) 2 / 5 , c ) 6 / 8 , d ) 9 / 4 , e ) 7 / 5",
+ "response": "\"p : 2 / 6 = 3 / 4 : 1 / 2 as the product of the means is equal to the product of the extremes . p * 1 / 2 = 2 / 6 * 3 / 4 p * 1 / 2 = 6 / 24 p = 1 / 2 = > p = 1 / 2 answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "a man invests in a 16 % stock at 128 . the interest obtained by him is\na ) 22.5 % , b ) 42.5 % , c ) 12.5 % , d ) 62.5 % , e ) 82.5 %",
+ "response": "by investing rs 128 , income derived = rs . 16 by investing rs . 100 , income derived = = rs . 12.5 interest obtained = 12.5 % answer : c",
+ "system": ""
+ },
+ {
+ "prompt": "13 different biology books and 8 different chemistry books lie on a shelf . in how many ways can a student pick 2 books of each type ?\na ) 80 , b ) 160 , c ) 720 , d ) 1100 , e ) 2184",
+ "response": "\"no . of ways of picking 2 biology books ( from 13 books ) = 13 c 2 = ( 13 * 12 ) / 2 = 78 no . of ways of picking 2 chemistry books ( from 8 books ) = 8 c 2 = ( 8 * 7 ) / 2 = 28 total ways of picking 2 books of each type = 78 * 28 = 2184 ( option e )\"",
+ "system": ""
+ },
+ {
+ "prompt": "there is food for 760 men for 22 days . how many more men should join after two days so that the same food may last for 16 days more ?\na ) 122 , b ) 140 , c ) 199 , d ) 188 , e ) 190",
+ "response": "\"760 - - - - 22 760 - - - - 20 x - - - - - 16 x * 16 = 760 * 20 x = 950 760 - - - - - - - 190 answer : e\"",
+ "system": ""
+ },
+ {
+ "prompt": "if albert ’ s monthly earnings rise by 20 % , he would earn $ 560 . if , instead , his earnings rise by only 21 % , how much ( in $ ) would he earn this month ?\na ) 643 , b ) 652 , c ) 660 , d ) 564 , e ) 693",
+ "response": "\"= 560 / 1.2 ∗ 1.21 = 564 = 564 answer is d\"",
+ "system": ""
+ },
+ {
+ "prompt": "10 is subtracted from 50 % of a number , the result is 25 . find the number ?\na ) 75 , b ) 70 , c ) 35 , d ) 170 , e ) 50",
+ "response": "\"( 50 / 100 ) * x – 10 = 25 5 x = 350 x = 70 answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "if the average ( arithmetic mean ) of x and y is 60 , and z – x = 80 , what is the average of y and z ?\na ) 100 , b ) 120 , c ) 125 , d ) 115 , e ) 90",
+ "response": "\"x + y / 2 = 60 = > x + y = 120 x = z - 80 . . . sub this value z - 80 + y = 120 = > z + y = 200 = > z + y / 2 = 100 answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the speed of the stream if a canoe rows upstream at 3 km / hr and downstream at 12 km / hr\na ) 1 kmph , b ) 4 kmph , c ) 3 kmph , d ) 2 kmph , e ) 4.5 kmph",
+ "response": "\"sol . speed of stream = 1 / 2 ( 12 - 3 ) kmph = 4.5 kmph . answer e\"",
+ "system": ""
+ },
+ {
+ "prompt": "among all sales staff at listco corporation , college graduates and those without college degrees are equally represented . each sales staff member is either a level - 1 or level - 2 employee . level - 1 college graduates account for 15 % of listco ' s sales staff . listco employs 60 level - 1 employees , 30 of whom are college graduates . how many sales staff members without college degrees are level - 2 employees ?\na ) 46 , b ) 42 , c ) 56 , d ) 70 , e ) 58",
+ "response": "i ' m going in on this one . so let ' s say that we have the following so we know that l 1 = 60 and that c and l 1 = 0.15 x , we should set up a double set matrix btw but anyways , i ' m just explaining the point with this problem . now we are told that 0.15 x = 30 , therefore the grand total is 200 . now we know that l 2 is 200 - 60 = 140 . we also learn that c and no c are equally represented thus 100 each . therefore no c and no l 2 will be 100 - 30 = 70 . thus d is the correct answer choice",
+ "system": ""
+ },
+ {
+ "prompt": "if x is divided by 7 , the remainder is 5 . what is the remainder if 4 x is divided by 7 ?\na ) 1 , b ) 2 , c ) 4 , d ) 6 , e ) 8",
+ "response": "x = 7 q + 5 4 x = 7 * 4 q + 20 4 x = 7 * 4 q + 7 * 2 + 6 4 x = 7 ( 4 q + 2 ) + 6 4 x = 7 k + 6 ( k = 4 q + 2 ) answer d",
+ "system": ""
+ },
+ {
+ "prompt": "a rectangular field is to be fenced on three sides leaving a side of 20 feet uncovered . if the area of the field is 650 sq . feet , how many feet of fencing will be required ?\na ) 34 , b ) 40 , c ) 85 , d ) 88 , e ) none",
+ "response": "\"explanation we have : l = 20 ft and lb = 650 sq . ft . so , b = 32.5 ft . length of fencing = ( l + 2 b ) = ( 20 + 65 ) ft = 85 ft . answer c\"",
+ "system": ""
+ },
+ {
+ "prompt": "a straight line in the xy - plane has slope 2 . on this line the x - coordinate of the point is 300 and y - coordinate is 900 then what is the y intercept of the plane ?\na ) 200 , b ) 250 , c ) 100 , d ) 300 , e ) 220",
+ "response": "\"eq of line = y = mx + c m = 2 x = 300 y = 300 * 2 + c , substitute y by 900 as given in question . 900 = 600 + c , c = 200 correct option is a\"",
+ "system": ""
+ },
+ {
+ "prompt": "a cistern has a leak which would empty the cistern in 20 minutes . a tap is turned on which admits 2 liters a minute into the cistern , and it is emptied in 24 minutes . how many liters does the cistern hold ?\na ) 480 , b ) 240 , c ) 289 , d ) 270 , e ) 927",
+ "response": "\"1 / x - 1 / 20 = - 1 / 24 x = 120 120 * 2 = 240 answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "the cash difference between the selling prices of an article at a profit of 6 % and 8 % is rs 3 . the ratio of two selling prices is\na ) 51 : 52 , b ) 52 : 53 , c ) 53 : 54 , d ) 54 : 55 , e ) none of these",
+ "response": "\"explanation : let the cost price of article is rs . x required ratio = ( 106 % of x ) / ( 108 % of x ) = 106 / 108 = 53 / 54 = 53 : 54 . answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "there are 20 poles with a constant distance between each pole . a car takes 22 second to reach the 12 th pole . how much will it take to reach the last pole .\na ) 38 , b ) 41 , c ) 28 , d ) 88 , e ) 22",
+ "response": "\"assuming the car starts at the first pole . to reach the 12 th pole , the car need to travel 11 poles ( the first pole does n ' t count , as the car is already there ) . 11 poles 22 seconds 1 pole ( 22 / 11 ) seconds to reach the last ( 20 th ) pole , the car needs to travel 19 poles . 19 pole 19 x ( 22 / 11 ) seconds = 38 seconds answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "right now , al and eliot have bank accounts , and al has more money than eliot . the difference between their two accounts is 1 / 12 of the sum of their two accounts . if al ’ s account were to increase by 10 % and eliot ’ s account were to increase by 15 % , then al would have exactly $ 22 more than eliot in his account . how much money does eliot have in his account right now ?\na ) $ 146.6 , b ) $ 120 , c ) $ 180 , d ) $ 220 , e ) $ 260",
+ "response": "lets assume al have amount a in his bank account and eliot ' s bank account got e amount . we can form an equation from the first condition . a - e = 1 / 12 * ( a + e ) = = > 11 a = 13 e - - - - - - - - - - - - ( 1 ) second condition gives two different amounts , al ' s amount = 1.1 a and eliot ' s amount = 1.2 e 1.1 a = 22 + 1.15 e = = > 11 a = 220 + 11.5 e - - - - - - - ( 2 ) substituting ( 1 ) in ( 2 ) : 13 e = 220 + 11.5 e = = > 1.5 e = 220 or e = 440 / 3 = 146.6 a",
+ "system": ""
+ },
+ {
+ "prompt": "in a urban village of india named ` ` owlna ' ' , 80 % people have refrigerator , 82 % people have television , 70 % people got computers and 75 % got air - conditionor . how many people ( minimum ) got all these luxury .\na ) 3 % , b ) 8 % , c ) 7 % , d ) 10 % , e ) 15 %",
+ "response": "\"c 7 % 100 - [ ( 100 - 80 ) + ( 100 - 82 ) + ( 100 - 70 ) + ( 100 - 75 ) ] = 100 - ( 20 + 18 + 30 + 25 ) = 100 - 93\"",
+ "system": ""
+ },
+ {
+ "prompt": "a fruit seller had some apples . he sells 40 % apples and still has 420 apples . originally , he had :\na ) 588 apples , b ) 742 apples , c ) 750 apples , d ) 600 apples , e ) 700 apples",
+ "response": "\"suppose originally he had x apples . then , ( 100 - 40 ) % of x = 420 . 60 / 100 x x = 420 x = ( 420 x 100 ) / 60 = 700 . answer e\"",
+ "system": ""
+ },
+ {
+ "prompt": "pencils , pens and exercise books in a shop are in the ratio of 14 : 4 : 3 . if there are 140 pencils , the number of exercise books in the shop is :\na ) 30 , b ) 27 , c ) 35 , d ) 33 , e ) 37",
+ "response": "explanation : let pencils = 14 x , pens = 4 x & exercise books = 3 x . now , 14 x = 140 hence x = 10 number of exercise books = 3 x = 30 answer : a",
+ "system": ""
+ },
+ {
+ "prompt": "a sum was put at simple interest at a certain rate for 4 years had it been put at 2 % higher rate , it would have fetched 56 more . find the sum .\na ) 500 , b ) 600 , c ) 700 , d ) 800 , e ) none of these",
+ "response": "\"difference in s . i . = p × t / 100 ( r 1 − r 2 ) ⇒ 56 = p × 4 × 2 / 100 ( ∵ r 1 - r 2 = 2 ) ⇒ p = 56 × 100 / 4 × 2 = 700 answer c\"",
+ "system": ""
+ },
+ {
+ "prompt": "every day daniel drives 96 miles back from work . on sunday , daniel drove all the way back from work at a constant speed of x miles per hour . on monday , daniel drove the first 32 miles back from work at ( 2 x ) miles per hour , and the rest of the way at ( x / 2 ) miles per hour . the time it took daniel to drive back from work on monday is longer than the time it took him to drive back from work on sunday by what percent ?\na ) 10 % , b ) 20 % , c ) 30 % , d ) 40 % , e ) 50 %",
+ "response": "\"let ' s test x = 4 . . . . on sunday , daniel drove 96 miles at 4 miles / hour . d = ( r ) ( t ) 96 = ( 4 ) ( t ) 96 / 4 = 24 = t it takes 24 hours to drive home on monday , daniel drove the first 32 miles at ( 2 ) ( 4 ) = 8 miles / hour and the rest of the way ( 64 miles ) at 4 / 2 = 2 miles / hour d = ( r ) ( t ) 32 = ( 8 ) ( t ) 32 / 8 = 4 = t it takes 4 hours for the first part d = ( r ) ( t ) 64 = ( 2 ) ( t ) 64 / 2 = 32 = t it takes 32 hours for the second part total time to drive home on monday = 4 + 32 = 36 hours we ' re asked by what percent 36 hours is greater than 32 hours . 36 / 32 = 1.5 , so it is 50 % greater . e\"",
+ "system": ""
+ },
+ {
+ "prompt": "the true discount on a bill due 9 months hence at 16 % per annum is rs . 189 . the amount of the bill is :\na ) rs . 1386 , b ) rs . 1764 , c ) rs . 1575 , d ) rs . 2268 , e ) none of these",
+ "response": "\"solution 32.5 let p . w . be rs . x . then , s . i . on rs . x at 16 % for 9 months = rs . 189 . ∴ x 16 x 9 / 12 x 1 / 100 = 189 or x = 1575 . ∴ p . w . = rs . 1575 . ∴ sum due = p . w . + t . d . = rs . ( 1575 + 189 ) = rs . 1764 . answer b\"",
+ "system": ""
+ },
+ {
+ "prompt": "two employees m and n are paid a total of $ 583 per week by their employer . if m is paid 120 percent of the salary paid to n , how much is n paid per week ?\na ) $ 245 , b ) $ 255 , c ) $ 265 , d ) $ 275 , e ) $ 285",
+ "response": "\"1.2 n + n = 583 2.2 n = 583 n = 265 the answer is c .\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the dividend . divisor 15 , the quotient is 9 and the remainder is 5 ?\na ) a ) 140 , b ) b ) 134 , c ) c ) 148 , d ) d ) 158 , e ) e ) 160",
+ "response": "\"d = d * q + r d = 15 * 9 + 5 d = 135 + 5 d = 140 answer a\"",
+ "system": ""
+ },
+ {
+ "prompt": "how many pounds of salt at 70 cents / lb must be mixed with 45 lbs of salt that costs 40 cents / lb so that a merchant will get 20 % profit by selling the mixture at 48 cents / lb ?\na ) 5 , b ) 9 , c ) 40 , d ) 50 , e ) 25",
+ "response": "\"selling price is 48 cents / lb for a 20 % profit , cost price should be 40 cents / lb ( cp * 6 / 5 = 48 ) basically , you need to mix 40 cents / lb ( salt 1 ) with 70 cents / lb ( salt 2 ) to get a mixture costing 45 cents / lb ( salt avg ) weight of salt 1 / weight of salt 2 = ( salt 2 - saltavg ) / ( saltavg - salt 1 ) = ( 70 - 45 ) / ( 45 - 40 ) = 5 / 1 we know that weight of salt 1 is 45 lbs . weight of salt 2 must be 9 lbs . answer ( b )\"",
+ "system": ""
+ },
+ {
+ "prompt": "when n is divided by 27 , the remainder is 4 . what is the remainder when n + 16 is divided by 7 ?\na ) 2 , b ) 3 , c ) 4 , d ) 5 , e ) 6",
+ "response": "\"assume n = 23 remainder ( n / 27 ) = 4 n + 16 = 39 remainder ( 39 / 7 ) = 4 option c\"",
+ "system": ""
+ },
+ {
+ "prompt": "working together at their respective constant rates , machine a and machine b can produce 600 units in 8 hours . working alone , machine b would complete that same output in 50 % more time . if machine a were to work on its own for an 8 - hour shift , what percent of the 600 unit total would it produce ?\na ) 25 , b ) 37 , c ) 50 , d ) 30 , e ) 75",
+ "response": "\"1 / a + 1 / b = 1 / t 1 / a + 1 / 12 = 1 / 8 ( 50 % more of 8 is 12 ) 1 / a = 1 / 24 machine a can produce 600 units in 24 hrs , so it can produce 600 * 8 / 24 = 200 units is 8 hrs . 200 is 30 % of 600 . d is the answer\"",
+ "system": ""
+ },
+ {
+ "prompt": "from the sale of sleeping bags , a retailer made a gross profit of 12 % of the wholesale cost . if each sleeping bag was sold for $ 28 , what was the wholesale cost per bag ?\na ) 3.0 , b ) 3.36 , c ) 24.64 , d ) 25.0 , e ) 31.36",
+ "response": "cost price * 1.12 = selling price - - > cost price * 1.12 = $ 28 - - > cost price = $ 25 . answer : d . actually even without any math only c and d make any sense , but since 24.64 * 1.12 wo n ' t be an integer ( $ 28 ) then only answer choice d remains .",
+ "system": ""
+ },
+ {
+ "prompt": "a certain car dealership sells economy cars , luxury cars , and sport utility vehicles . the ratio of economy to luxury cars is 5 : 2 . the ratio of economy cars to sport utility vehicles is 4 : 3 . what is the ratio of luxury cars to sport utility vehicles ?\na ) 9 : 8 , b ) 8 : 15 , c ) 3 : 2 , d ) 2 : 3 , e ) 1 : 2",
+ "response": "\"the ratio of economy to luxury cars is 5 : 2 - - > e : l = 5 : 2 = 20 : 8 . the ratio of economy cars to sport utility vehicles is 4 : 3 - - > e : s = 4 : 3 = 20 : 15 . thus , l : s = 8 : 15 . answer : b .\"",
+ "system": ""
+ },
+ {
+ "prompt": "nitin borrowed some money at the rate of 6 % p . a . for the first 3 years , 9 % p . a . for the next 5 years and 13 % p . a . for the period beyond 8 years . if the total interest paid by him at the end of 11 years is rs . 8160 , how much money did he borrow ?\na ) 8000 , b ) 2787 , c ) 27766 , d ) 9976 , e ) 21671",
+ "response": "let the sum be rs . x . then , [ ( x * 6 * 3 ) / 100 ] + [ ( x * 9 * 5 ) / 100 ] + [ ( x * 13 * 3 ) / 100 ] = 8160 18 x + 45 x + 39 x = ( 8160 * 100 ) 102 x = 816000 = > x = 8000 . answer : a",
+ "system": ""
+ },
+ {
+ "prompt": "find the average of all numbers between 6 and 36 which are divisible by 7\na ) 20 , b ) 15 , c ) 25 , d ) 30 , e ) 35",
+ "response": "explanation : average = ( 7 + 14 + 21 + 28 + 35 ) / 7 = 105 / 7 = 15 option b",
+ "system": ""
+ },
+ {
+ "prompt": "linda bought 3 notebooks at $ 1.20 each ; a box of pencils at $ 1.50 and a box of pens at $ 1.70 . how much did linda spend ?\na ) $ 6.80 , b ) $ 8.40 , c ) $ 7.70 , d ) $ 4.70 , e ) $ 3.90",
+ "response": "linda spent 1.20 × 3 = $ 3.60 on notebooks the total amount of money that linda spent is equal to 3.60 + 1.50 + 1.70 = $ 6.80 correct answer a",
+ "system": ""
+ },
+ {
+ "prompt": "a basketball is dropped from a height of 40 feet . if it bounces back up to a height that is exactly half of its previous height , and it stops bouncing after hitting the ground for the fourth time , then how many total feet will the ball have traveled after 3 full bounces .\na ) 50 , b ) 55 , c ) 110 , d ) 75 , e ) 80",
+ "response": "\"initial distance = 40 feet first bounce = 20 feet up + 20 feet down = 40 feet second bouche = 10 feet up + 10 feet down = 20 feet third bounce = 5 feet up and 5 feet down = 10 feet total distance covered = 40 + 40 + 20 + 10 = 110 answer is c\"",
+ "system": ""
+ },
+ {
+ "prompt": "evaluate : | 7 - 8 ( 3 - 12 ) | - | 5 - 11 | = ?\na ) 40 , b ) 50 , c ) 73 , d ) 70 , e ) 80",
+ "response": "\"according to order of operations , inner brackets first . hence | 7 - 8 ( 3 - 12 ) | - | 5 - 11 | = | 7 - 8 * ( - 9 ) | - | 5 - 11 | according to order of operations , multiplication within absolute value signs ( which may be considered as brackets when it comes to order of operations ) next . hence = | 7 + 72 | - | 5 - 11 | = | 79 | - | - 6 | = 79 - 6 = 73 correct answer c ) 73\"",
+ "system": ""
+ },
+ {
+ "prompt": "in a certain pond , 50 fish were caught , tagged , and returned to the pond . a few days later , 50 fish were caught again , of which 2 were found to have been tagged . if the percent of tagged fish in the second catch approximates the percent of tagged fish in the pond , what is the approximate number of fish in the pond ?\na ) 400 , b ) 625 , c ) 1,250 , d ) 2,500 , e ) 10,000",
+ "response": "\"total fish = x percentage of second catch = ( 2 / 50 ) * 100 = 4 % so , x * 4 % = 50 x = 1250 answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "the product of 4 consecutive even numbers is always divisible by :\na ) 384 , b ) 350 , c ) 400 , d ) 200 , e ) 250",
+ "response": "the product of 4 consecutive numbers is always divisible by 4 ! . since , we have 4 even numbers , we have an additional 2 available with each number . now , using both the facts , we can say that the product of 4 consecutive even numbers is always divisible by , 2 ^ 4 * 4 ! 16 * 24 = 384 answer a",
+ "system": ""
+ },
+ {
+ "prompt": "a certain lab experiments with white and brown mice only . in one experiment , 2 / 3 of the mice are white . if there are 14 white mice in the experiment , how many brown mice are in the experiment ?\na ) 12 , b ) 8 , c ) 28 , d ) 7 , e ) 27",
+ "response": "let total number of mice = m number of white mice = 2 / 3 m = 14 m = 21 number of brown mice = 1 / 3 m = 1 / 3 * 21 = > brown mice = 7 answer d",
+ "system": ""
+ },
+ {
+ "prompt": "a cube of edge 8 cm is cut into cubes each of edge 1 cm . the ratio of the total surface area of one of the small cubes to that of the large cube is equal to :\na ) 1 : 25 , b ) 1 : 22 , c ) 1 : 52 , d ) 1 : 64 , e ) none",
+ "response": "\"sol . required ratio = 6 * 1 * 1 / 6 * 8 * 8 = 1 / 64 = 1 : 64 . answer d\"",
+ "system": ""
+ },
+ {
+ "prompt": "a can give b 70 meters start and c 200 meters start in a kilometer race . how much start can b give c in a kilometer race ?\na ) 139.78 , b ) 139.13 , c ) 139.22 , d ) 111.0 , e ) 111.12",
+ "response": "\"a runs 1000 m while b runs 930 m and c runs 800 m . the number of meters that c runs when b runs 1000 m , = ( 1000 * 800 ) / 930 = 860.21 m . b can give c = 1000 - 860.21 = 139.78 m . answer : a\"",
+ "system": ""
+ },
+ {
+ "prompt": "what least number should be added to 1053 , so that the sum is completely divisible by 23\na ) a ) 4 , b ) b ) 1 , c ) c ) 2 , d ) d ) 3 , e ) e ) 5",
+ "response": "\"explanation : ( 1053 / 23 ) gives remainder 18 18 + 5 = 23 , so we need to add 5 answer : option e\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the sum of the integers 45 through 175 inclusive ?\na ) 12,295 , b ) 13,000 , c ) 14,300 , d ) 14,410 , e ) 28,820",
+ "response": "\"sum of n consecutive positive integers = n ( n + 1 ) / 2 . . in one case n = 44 and other 175 . . . subtract the sum to get the answer sum of first 175 + ive numbers = 175 * 176 / 2 = 15400 . . sum of first 45 + i ' ve numbers = 45 * 44 / 2 = 990 . . answer = 15400 - 990 = 14410 answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "the circumferences of two circles are 528 meters and 704 meters . find the difference between the areas of the larger and the smaller circles ?\na ) 29963 sq m , b ) 28937 sq m , c ) 43162 sq m , d ) 27688 sq m , e ) 17248 sq m",
+ "response": "\"let the radii of the smaller and the larger circles be s m and l m respectively . 2 ∏ s = 528 and 2 ∏ l = 704 s = 528 / 2 ∏ and l = 704 / 2 ∏ difference between the areas = ∏ l ^ 2 - ∏ s ^ 2 = ∏ { 264 ^ 2 / ∏ ^ 2 - 352 ^ 2 / ∏ ^ 2 } = 264 ^ 2 / ∏ - 352 ^ 2 / ∏ = ( 264 - 352 ) ( 264 + 352 ) / ∏ = ( 88 ) ( 616 ) / ( 22 / 7 ) = 17248 sq m answer : e\"",
+ "system": ""
+ },
+ {
+ "prompt": "if a randomly selected non - negative single digit integer is added to { 2 , 3 , 4 , 9 } . what is the probability that the median of the set will increase but the range still remains the same ?\na ) 0.2 , b ) 0.3 , c ) 0.4 , d ) 0.5 , e ) 0.6",
+ "response": "\"we are selecting from non - negative single digit integers , so from { 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 } . these 10 digits represent the total number of outcomes . hence , the total number of outcomes is 10 . we need to find the probability that the median of the set will increase but the range still remains the same . the median of the set is ( 3 + 4 ) / 2 = 3.5 , thus the number selected must be 4 or greater . for the range to remain the same , the number must be between 2 and 9 inclusive . to satisfy both conditions , the number selected must be 4 , 5 , 6 , 7 , 8 , or 9 . the probability is 6 / 10 = 0.6 the answer is e .\"",
+ "system": ""
+ },
+ {
+ "prompt": "a man rows his boat 90 km downstream and 55 km upstream , taking 3 hours each time . find the speed of the stream ?\na ) 76 kmph , b ) 6 kmph , c ) 14 kmph , d ) 8 kmph , e ) 4 kmph",
+ "response": "\"speed downstream = d / t = 90 / ( 3 ) = 30 kmph speed upstream = d / t = 55 / ( 3 ) = 18 kmph the speed of the stream = ( 30 - 18 ) / 2 = 6 kmph answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "in a class , 12 students like to play basketball and 8 like to play cricket . 3 students like to play on both basketball and cricket . how many students like to play basketball or cricket or both ?\na ) 12 , b ) 15 , c ) 17 , d ) 18 , e ) 22",
+ "response": "\"draw a venn diagram yourself ! b + c - bc = number of students that play either basketball or cricket 12 + 8 - 3 = 17 c )\"",
+ "system": ""
+ },
+ {
+ "prompt": "the area of one square is x ^ 2 + 12 x + 36 and the area of another square is 4 x ^ 2 − 12 x + 9 . if the sum of the perimeters of both squares is 64 , what is the value of x ?\na ) 0 , b ) 4.3 , c ) 2.5 , d ) 4.67 , e ) 10",
+ "response": "spotting the pattern of equations both are in form of ( x + c ) ^ 2 so a 1 = ( x + 6 ) ^ 2 a 2 = ( 2 x - 3 ) ^ 2 l 1 = x + 6 l 2 = 2 x - 3 p 1 = 4 ( x + 6 ) p 2 = 4 ( 2 x - 3 ) p 1 + p 2 = 64 4 ( x + 6 ) + 4 ( 2 x - 3 ) = 64 . . . . . . . . . . . . . . > x = 4.3 answer : b",
+ "system": ""
+ },
+ {
+ "prompt": "the length of the bridge , which a train 160 meters long and travelling at 45 km / hr can cross in 30 seconds , is ?\na ) 766 m , b ) 156 m , c ) 215 m , d ) 156 m , e ) 156 m",
+ "response": "\"speed = ( 45 * 5 / 18 ) m / sec = ( 25 / 2 ) m / sec . time = 30 sec . let the length of bridge be x meters . then , ( 160 + x ) / 30 = 25 / 2 = = > 2 ( 160 + x ) = 750 = = > x = 215 m . answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "how many 4 - digit positive integers are there , where each digit is positive , and no 4 adjacent digits are same ?\na ) 1236 , b ) 3024 , c ) 4096 , d ) 4608 , e ) 6561",
+ "response": "\"first digit . . 9 posibilities second digit , 8 possibilities third digit , 7 possibilities fourth digit , 6 possibilities . 9 * 8 * 7 * 6 = 3024 . b\"",
+ "system": ""
+ },
+ {
+ "prompt": "if 2 men or 3 women can reap a field in 5 days how long will 5 men and 6 women take to reap it ?\na ) 2 / 24 , b ) 6 / 18 , c ) 2 / 22 , d ) 5 / 12 , e ) 9 / 10",
+ "response": "\"explanation : 2 men reap 2 / 5 field in 1 day 1 man reap 1 / ( 2 x 5 ) 3 women reap 1 / 43 field in 1 day 1 woman reap 1 / ( 5 x 3 ) 5 men and 6 women reap ( 5 / ( 2 x 5 ) + 6 / ( 3 x 5 ) ) = 9 / 10 in 1 day 5 men and 6 women will reap the field in 9 / 10 days answer : option e\"",
+ "system": ""
+ },
+ {
+ "prompt": "a train 120 m long is running with a speed of 66 km / hr . in what time will it pass a man who is roller skating at 12 km / hr in the direction opposite to that in which the train is going ?\na ) 4.92 , b ) 6.92 , c ) 7.92 , d ) 4.92 , e ) 2.92",
+ "response": "\"speed of train relative to man = 66 + 12 = 78 km / hr . = 78 * 5 / 18 = 65 / 3 m / sec . time taken to pass the man = 150 * 3 / 65 = 6.92 sec . answer : b\"",
+ "system": ""
+ },
+ {
+ "prompt": "how many different positive integers exist between 10 ^ 6 and 10 ^ 7 , the sum of whose digits is equal to 2 ?\na ) 5 , b ) 6 , c ) 7 , d ) 8 , e ) 18",
+ "response": "\"total cases = > 1000000 = > 6 cases for 1 being present at any of the 6 zero and last case 2000000 hence & cases answer : c\"",
+ "system": ""
+ },
+ {
+ "prompt": "find the average of all numbers between 3 and 86 which are divisible by 5\na ) 15 , b ) 20 , c ) 25 , d ) 30 , e ) 45",
+ "response": "\"explanation : average = ( 5 + 10 + 15 + 20 + 25 + 30 + 35 + 40 + 45 + 50 + 55 + 60 + 65 + 70 + 75 + 80 + 85 ) / 17 = 765 / 17 = 45 answer : option e\"",
+ "system": ""
+ },
+ {
+ "prompt": "what is the remainder when 8 ^ 1 + 8 ^ 2 + 8 ^ 3 + . . . + 8 ^ 9 is divided by 2 ?\na ) 4 , b ) 3 , c ) 2 , d ) none of the above , e ) 5",
+ "response": "\"notice that in the brackets we have the sum of 9 even multiples of 2 , which yields remainder of 0 upon division by 2 . answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "milk contains 5 % water . what content of pure milk should be added to 10 liters of milk to reduce this to 2 % ?\na ) 10 liters , b ) 15 liters , c ) 20 liters , d ) 18 liters , e ) 22 liters",
+ "response": "\"quantity of water in 10 liters = 5 % of 10 liters = 0.5 liters let x liters of pure milk be added . then , 0.5 / ( 10 + x ) = 2 / 100 2 x = 30 x = 15 liters answer is b\"",
+ "system": ""
+ },
+ {
+ "prompt": "a student scored an average of 75 marks in 3 subjects : physics , chemistry and mathematics . if the average marks in physics and mathematics is 90 and that in physics and chemistry is 70 , what are the marks in physics ?\na ) 86 , b ) 16 , c ) 76 , d ) 95 , e ) 26",
+ "response": "\"given m + p + c = 75 * 3 = 225 - - - ( 1 ) m + p = 90 * 2 = 180 - - - ( 2 ) p + c = 70 * 2 = 140 - - - ( 3 ) where m , p and c are marks obtained by the student in mathematics , physics and chemistry . p = ( 2 ) + ( 3 ) - ( 1 ) = 180 + 140 - 225 = 95 answer : d\"",
+ "system": ""
+ },
+ {
+ "prompt": "a question paper has 2 parts , a & b , each containing 10 questions . if a student has to choose 8 from part a & 5 from part b , in how many ways can he choose the questions ?\na ) 1100 , b ) 1200 , c ) 1235 , d ) 1354 , e ) 1140",
+ "response": "there 10 questions in part a out of which 8 question can be chosen as = 10 c 8 . similarly , 5 questions can be chosen from 10 questions of part b as = 10 c 5 . hence , total number of ways , = 10 c 8 * 10 c 5 = [ 10 ! / ( 2 ! 8 ! ) ] * [ 10 ! / ( 5 ! * 5 ) ] = { 10 * 9 / 2 } * { 10 * 9 * 8 * 7 * 6 / ( 5 * 4 * 3 * 2 * 1 ) } = 1140 . e",
+ "system": ""
+ },
+ {
+ "prompt": "16 is what % of 90 ?\na ) 16 , b ) 17.78 , c ) 17 , d ) 18.5 , e ) 18.23",
+ "response": "we assume that 90 is 100 % assume ' x ' is value we looking for here , 90 = 100 % and x % = 16 therefore , 100 / x = 90 / 16 100 / x = 5.625 x = 17.78 b",
+ "system": ""
+ },
+ {
+ "prompt": "marla starts running around a circular track at the same time nick starts walking around the same circular track . marla completes 28 laps around the track per hour and nick completes 13 laps around the track per hour . how many minutes after marla and nick begin moving will marla have completed 4 more laps around the track than nick ?\na ) 5 , b ) 8 , c ) 12 , d ) 16 , e ) 20",
+ "prompt": "marla starts running around a circular track at the same time nick starts walking around the same circular track . marla completes 28 laps around the track per hour and nick completes 13 laps around the track per hour . how many minutes after marla and nick begin moving will marla have completed 4 more laps around the track than nick ?\na ) 5 , b ) 8 , c ) 12 , d ) 16 , e ) 20",
449
+ "response": "\"maria ' s rate - 28 laps per hour - - > 28 / 60 laps / min nick ' s rate - 13 laps per hour - - > 13 / 60 laps / min lets set equations : 28 / 60 * t = 4 ( since maria had to run 4 laps before nick would start ) 13 / 60 * t = 0 ( hick has just started and has n ' t run any lap yet ) ( 28 / 60 - 13 / 60 ) * t = 4 - 0 ( since nick was chasing maria ) t = 16 min needed maria to run 4 laps answer : d\"",
450
+ "system": ""
451
+ },
452
+ {
453
+ "prompt": "what is the least common multiple of 15 , 18 , and 34 ?\na ) 60 , b ) 120 , c ) 240 , d ) 360 , e ) 1530",
454
+ "response": "\"let us first write the numbers in the form of prime factors : 15 = 3 * 5 18 = 2 * 3 ^ 2 34 = 2 * 17 ^ 1 the lcm would be the largest powers of the prime numbers from all these three numbers . hence lcm = 1530 option e\"",
455
+ "system": ""
456
+ },
457
+ {
458
+ "prompt": "divide rs . 800 among a , b and c so that a receives 1 / 3 as much as b and c together and b receives 2 / 3 as a and c together . a ' s share is ?\na ) s . 800 , b ) s . 200 , c ) s . 600 , d ) s . 500 , e ) s . 900",
459
+ "response": "\"a + b + c = 800 a = 1 / 3 ( b + c ) ; b = 2 / 3 ( a + c ) a / ( b + c ) = 1 / 3 a = 1 / 4 * 800 = > 200 answer : b\"",
460
+ "system": ""
461
+ },
462
+ {
463
+ "prompt": "what is the smallest number which when increased by 3 is divisible by 12 , 15 , 35 , and 40 ?\na ) 837 , b ) 947 , c ) 1027 , d ) 1155 , e ) 1231",
464
+ "response": "\"factor each of the numbers 8 , 15 , 35 , and 40 into primes : 12 = 2 * 2 * 3 ; 15 = 3 * 5 ; 35 = 5 * 7 ; 40 = 2 * 2 * 2 * 5 the smallest number divisible by all of them is thus 2 * 2 * 2 * 3 * 5 * 7 = 840 837 + 3 = 840 a\"",
465
+ "system": ""
466
+ },
467
+ {
468
+ "prompt": "if the range w of the 6 numbers 4 , 314 , 710 and x is 12 , what is the difference between the greatest possible value of x and least possible value of x ?\na ) 0 , b ) 2 , c ) 12 , d ) 13 , e ) 15",
469
+ "response": "the range w of a set is the difference between the largest and smallest elements of a set . without x , the difference between the largest and smallest elements of a set is 14 - 3 = 11 < 12 , which means that in order 12 to be the range of the set x must be either the smallest element so that 14 - x = 12 - - - > x = 2 or x must the largest element so that x - 3 = 12 - - > x = 15 . the the difference between the greatest possible value of x and least possible value of x is 15 - 2 = 13 . answer : d .",
470
+ "system": ""
471
+ },
472
+ {
473
+ "prompt": "in the xy - coordinate plane , the graph of y = - x ^ 2 + 9 intersects line l at ( p , 5 ) and ( t , - 8 ) . what is the least possible value of the slope of line l ?\na ) - 6.5 , b ) 2 , c ) - 2 , d ) - 6 , e ) - 10",
474
+ "response": "\"we need to find out the value of p and l to get to the slope . line l and graph y intersect at point ( p , 5 ) . hence , x = p and y = 5 should sactisfy the graph . soliving 5 = - p 2 + 9 p 2 = 4 p = + or - 2 simillarly point ( t , - 8 ) should satisfy the equation . hence x = t and y = - 8 . - 7 = - t 2 + 9 t = + or - 4 considering p = - 2 and t = 4 , the least slope is ( - 8 - 5 ) / ( 4 - 2 ) = - 6.5 imo option a is correct answer .\"",
475
+ "system": ""
476
+ },
477
+ {
478
+ "prompt": "how many 9 - digits number are palindromic numbers ? a palindromic number reads the same forward and backward , example 123454321 .\na ) 100 , b ) 610 , c ) 729 , d ) 900 , e ) 90000",
479
+ "response": "take the task of building palindromes and break it intostages . stage 1 : select the 9 th digit we can choose 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , or 9 so , we can complete stage 1 in 9 ways stage 2 : select the 8 th , 7 th , 6 th , 5 th digit we can choose 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , or 9 so , we can complete stage 2 in 10 ways important : at this point , the remaining digits are alreadylocked in . stage 3 : select the 4 th , 3 rd , 2 nd , 1 st digit so , we can complete this stage in 1 way . by thefundamental counting principle ( fcp ) , we can complete all 5 stages ( and thus build a 9 - digit palindrome ) in ( 9 ) ( 10 ) ( 10 ) ( 10 ) ( 10 ) ( 1 ) ( 1 ) ( 1 ) ( 1 ) ways ( = 90000 ways ) answer : e",
480
+ "system": ""
481
+ },
482
+ {
483
+ "prompt": "if taxi fares were $ 1.00 for the first 1 / 5 mile and $ 0.30 for each 1 / 5 mile there after , then the taxi fare for a 3 - mile ride was\na ) $ 1.56 , b ) $ 2.40 , c ) $ 3.80 , d ) $ 5.20 , e ) $ 2.80",
484
+ "response": "\"in 3 miles , initial 1 / 5 mile charge is $ 1 rest of the distance = 3 - ( 1 / 5 ) = 14 / 5 rest of the distance charge = 14 ( 0.3 ) = $ 4.2 ( as the charge is 0.3 for every 1 / 5 mile ) = > total charge for 3 miles = 1 + 4.2 = 5.2 answer is d .\"",
485
+ "system": ""
486
+ },
487
+ {
488
+ "prompt": "find the largest 6 digit number which is exactly divisible by 88 ?\na ) 998765 , b ) 998907 , c ) 999944 , d ) 999954 , e ) 999990",
489
+ "response": "largest 6 digit number is 999999 after doing 999999 ÷ 88 we get remainder 55 hence largest 6 digit number exactly divisible by 88 = 999999 - 55 = 999944 c",
490
+ "system": ""
491
+ },
492
+ {
493
+ "prompt": "if n divided by 7 has a remainder of 2 , what is the remainder when 6 times n is divided by 7 ?\na ) 1 , b ) 2 , c ) 3 , d ) 5 , e ) 6",
494
+ "response": "\"as per question = > n = 7 p + 2 for some integer p hence 6 n = > 42 q + 12 = > remainder = > 12 for some integer q alternatively = > n = 2 > 6 n = > 12 = > 12 divided by 7 will leave a remainder 5 hence d\"",
495
+ "system": ""
496
+ },
497
+ {
498
+ "prompt": "in a certain egg - processing plant , every egg must be inspected , and is either accepted for processing or rejected . for every 96 eggs accepted for processing , 4 eggs are rejected . if , on a particular day , 12 additional eggs were accepted , but the overall number of eggs inspected remained the same , the ratio of those accepted to those rejected would be 99 to 1 . how many w eggs does the plant process per day ?\na ) 100 , b ) 300 , c ) 400 , d ) 3,000 , e ) 4,000",
499
+ "response": "\"straight pluggin in for me . as usual , i started with c and got the answer . lets ' back calculate and see what we get let us consider eggs processed each day to be 400 so initial ratio of eggs processed and rejected is 96 : 4 or 24 : 1 so out of 400 eggs , there will be 384 eggs processed and 16 rejected . now if the no . of eggs inspected remain and 12 more eggs get accepted that means there w = 384 + 12 = 396 eggs accepted and 4 rejected . . . and the ratio will be 99 : 1 bingo . . . this is what the questions says . . . . its always a good idea to start with c .\"",
500
+ "system": ""
501
+ },
502
+ {
503
+ "prompt": "if 0.5 % of a = 80 paise , then the value of a is ?\na ) rs . 170 , b ) rs . 160 , c ) rs . 1.70 , d ) rs . 4.25 , e ) none",
504
+ "response": "\"answer ∵ 0.5 / 100 of a = 80 / 100 ∴ a = rs . ( 80 / 0.5 ) = rs . 160 correct option : b\"",
505
+ "system": ""
506
+ },
507
+ {
508
+ "prompt": "wink , inc . follows a certain procedure that requires two tasks to be finished independently in order for a job to be done . on any given day , there is a 7 / 8 probability that task 1 will be completed on time , and a 1 / 5 probability that task 2 will be completed on time . on a certain day , what is the probability that task 1 will be completed on time , but task 2 will not ?\na ) 1 / 20 , b ) 3 / 40 , c ) 13 / 40 , d ) 7 / 10 , e ) 13 / 22",
509
+ "response": "\"p ( 1 and not 2 ) = 7 / 8 * ( 1 - 1 / 5 ) = 7 / 10 . answer : d .\"",
510
+ "system": ""
511
+ },
512
+ {
513
+ "prompt": "each of the integers from 0 to 9 , inclusive , is written on a separate slip of blank paper and the ten slips are dropped into a hat . if 4 of the slips are the drawn , without replacement , what is the probability that all 4 have a odd number written on it ?\na ) 1 / 12 , b ) 1 / 10 , c ) 1 / 8 , d ) 1 / 42 , e ) 5 / 9",
514
+ "response": "\"key is that there is no replacement , so each successive choice will become more skewed towards picking a neg ( i . e . the pool of positives decreases , while the pool of negatives stay the same ) p ( + on 1 st pick ) = 5 / 10 p ( + on 2 nd pick ) = 4 / 9 p ( + on 3 rd pick ) = 3 / 8 p ( + on 4 rd pick ) = 2 / 7 5 / 10 * 4 / 9 * 3 / 8 * 2 / 7 = 1 / 42 d\"",
515
+ "system": ""
516
+ },
517
+ {
518
+ "prompt": "in a 1000 m race , a beats b by 200 meters or 25 seconds . find the speed of b ?\na ) 5 m / s , b ) 8 m / s , c ) 9 m / s , d ) 4 m / s , e ) 2 m / s",
519
+ "response": "b 8 m / s since a beats b by 200 m or 25 seconds , i t implies that b covers 200 m in 25 seconds . hence speed of b = 200 / 25 = 8 m / s .",
520
+ "system": ""
521
+ },
522
+ {
523
+ "prompt": "a ’ s speed is 20 / 19 times that of b . if a and b run a race , what part of the length of the race should a give b as a head start , so that the race ends in a dead heat ?\na ) 1 / 19 , b ) 3 / 19 , c ) 1 / 10 , d ) 1 / 20 , e ) 3 / 10",
524
+ "response": "\"let d be the full distance . let x be the fraction of the distance that b runs . let v be the speed at which b runs . the time should be the same for both runners . time = d / ( 20 v / 19 ) = xd / v ( 19 / 20 ) * d / v = x * d / v x = 19 / 20 b should have a head start of 1 / 20 of the full distance . the answer is d .\"",
525
+ "system": ""
526
+ },
527
+ {
528
+ "prompt": "p and q started a business investing rs . 85,000 and rs . 35,000 respectively . in what ratio the profit earned after 2 years be divided between p and q respectively ?\na ) 17 : 6 , b ) 17 : 0 , c ) 17 : 7 , d ) 17 : 2 , e ) 17 : 3",
529
+ "response": "p : q = 85000 : 35000 = 17 : 7 . answer : c",
530
+ "system": ""
531
+ },
532
+ {
533
+ "prompt": "the product of two numbers is 266 and their difference is 5 . what is the bigger number ?\na ) 13 , b ) 15 , c ) 19 , d ) 24 , e ) none of these",
534
+ "response": "\"explanation : let the two numbers be a and b , here a > b ab = 266 b = 266 / a - - - - - - - - - - - - - - - - - ( i ) given , a – b = 5 - - - - - - - - - - - ( ii ) substitute from ( i ) in ( ii ) , we get a – 266 / a = 5 a 2 – 5 a + 266 = 0 ( a – 19 ) ( a – 14 ) = 0 therefore , a = 19 or a = 14 hence , bigger number = a = 19 answer : c\"",
535
+ "system": ""
536
+ },
537
+ {
538
+ "prompt": "vijay sells a cupboard at 12 % below cost price . had he got rs . 1500 more , he would have made a profit of 12 % . what is the cost price of the cupboard ?\na ) 7450 , b ) 14900 , c ) 6250 , d ) 6000 , e ) none of these",
539
+ "response": "\"explanation : cost price = 1500 / ( 0.12 + 0.12 ) = 1500 / 0.24 = rs . 6250 answer c\"",
540
+ "system": ""
541
+ },
542
+ {
543
+ "prompt": "in the junior basketball league there are 18 teams , 2 / 3 of them are bad and ½ are rich . what ca n ' t be the number of teams that are rich and bad ?\na ) 4 . , b ) 6 . , c ) 7 . , d ) 8 . , e ) 10",
544
+ "response": "otal teams = 18 bad teams = ( 2 / 3 ) * 18 = 12 rich teams = 9 so maximum value that the both rich and bad can take will be 9 . so e = 10 can not be that value . answer : e",
545
+ "system": ""
546
+ },
547
+ {
548
+ "prompt": "if i walk at 5 km / h , i miss the bus by 9 minutes . if i walk at 3 km / h , i reach 6 minutes before the arrival of the bus . how far i walk to reach the bus stand ?\na ) 1.99 km , b ) 1.55 km , c ) 1.82 km , d ) 2.87 km , e ) 1.87 km",
549
+ "response": "\"d = product of speed difference of time / difference of speed d = 5 x 3 / 60 [ 9 â ˆ ’ ( â ˆ ’ 6 ) / 5 - 3 ] [ here , â € “ ve sign indicates before the schedule time ] â ‡ ’ d = 1.87 km answer e\"",
550
+ "system": ""
551
+ },
552
+ {
553
+ "prompt": "3 * 12 + 3 * 13 + 3 * 16 + 11 = ?\na ) 122 , b ) 126 , c ) 134 , d ) 148 , e ) 151",
554
+ "response": "3 * 12 + 3 * 13 + 3 * 16 + 11 = 36 + 39 + 48 + 11 = 134 the answer is c .",
555
+ "system": ""
556
+ },
557
+ {
558
+ "prompt": "37 . if the cost price of 15 tables be equal to the selling price of 20 tables , the loss per cent is ?\na ) 20 % , b ) 30 % , c ) 25 % , d ) 37.5 % , e ) 38 %",
559
+ "response": "let c . p . of each table = re . 1 c . p . of 20 tables = rs . 20 s . p . of 20 table = c . p . of 15 tables = rs . 15 loss = ( 5 / 20 ) x 100 % = 25 % answer : c",
560
+ "system": ""
561
+ },
562
+ {
563
+ "prompt": "a man can row his boat with the stream at 25 km / h and against the stream in 13 km / h . the man ' s rate is ?\na ) 1 kmph , b ) 2 kmph , c ) 6 kmph , d ) 8 kmph , e ) 3 kmph",
564
+ "response": "\"ds = 25 us = 13 s = ? s = ( 25 - 13 ) / 2 = 6 kmph answer : c\"",
565
+ "system": ""
566
+ },
567
+ {
568
+ "prompt": "in what ratio must tea at rs . 65 per kg be mixed with tea at rs . 70 per kg so that the mixture must be worth rs . 6 per kg ?\na ) 1 : 1 , b ) 3 : 2 , c ) 4 : 3 , d ) 5 : 3 , e ) none",
569
+ "response": "\"required ratio = 500 : 500 = 1 : 1 answer a\"",
570
+ "system": ""
571
+ },
572
+ {
573
+ "prompt": "in a certain animal population , for each of the first 3 months of life , the probability that an animal will die during that month is 1 / 10 . for a group of 700 newborn members of the population , approximately how many would be expected to survive the first 3 months of life ?\na ) 511 , b ) 546 , c ) 552 , d ) 562 , e ) 570",
574
+ "response": "\"number of newborns that can die in first month = 1 / 10 * 700 = 70 survived = 630 number of newborns that can die in second month = 1 / 10 * 630 = 63 survived = 567 number of newborns that can die in third month = 1 / 10 * 567 = 56 survived = 511 answer : a\"",
575
+ "system": ""
576
+ },
577
+ {
578
+ "prompt": "if 60 percent of 500 is 50 percent of x , then x = ?\na ) 600 , b ) 620 , c ) 650 , d ) 700 , e ) 720",
579
+ "response": "\"0.6 * 500 = 0.5 * x x = 6 / 5 * 500 = 600\"",
580
+ "system": ""
581
+ },
582
+ {
583
+ "prompt": "a train running at the speed of 60 km / hr crosses a pole in 12 seconds . what is the length of the train ?\na ) 187 m , b ) 278 m , c ) 876 m , d ) 200 m , e ) 267 m",
584
+ "response": "\"speed = ( 60 * 5 / 18 ) m / sec = ( 50 / 3 ) m / sec length of the train = ( speed x time ) = ( 50 / 3 * 12 ) m = 200 m . answer : d\"",
585
+ "system": ""
586
+ },
587
+ {
588
+ "prompt": "when 242 is divided by a certain divisor the remainder obtained is 6 . when 698 is divided by the same divisor the remainder obtained is 13 . however , when the sum of the two numbers 242 and 698 is divided by the divisor , the remainder obtained is 5 . what is the value of the divisor ?\na ) 11 , b ) 14 , c ) 18 , d ) 23 , e ) none of these",
589
+ "response": "\"let that divisor be x since remainder is 6 or 13 it means divisor is greater than 13 . now 242 - 6 = 236 = kx ( k is an integer and 234 is divisble by x ) similarly 698 - 13 = 685 = lx ( l is an integer and 689 is divisible by x ) adding both 698 and 242 = ( 236 + 685 ) + 6 + 13 = x ( k + l ) + 19 when we divide this number by x then remainder will be equal to remainder of ( 19 divided by x ) = 5 hence x = 19 - 5 = 14 hence b\"",
590
+ "system": ""
591
+ },
592
+ {
593
+ "prompt": "two trains are moving in the same direction at 72 kmph and 36 kmph . the faster train crosses a man in the slower train in 15 seconds . find the length of the faster train ?\na ) 270 , b ) 277 , c ) 187 , d ) 257 , e ) 150",
594
+ "response": "\"relative speed = ( 72 - 36 ) * 5 / 18 = 2 * 5 = 10 mps . distance covered in 15 sec = 15 * 10 = 150 m . the length of the faster train = 150 m . answer : e\"",
595
+ "system": ""
596
+ },
597
+ {
598
+ "prompt": "the speed at which a man can row a boat in still water is 20 kmph . if he rows downstream , where the speed of current is 3 kmph , what time will he take to cover 60 metres ?\na ) 16 seconds , b ) 76 seconds , c ) 26 seconds , d ) 9.4 seconds , e ) 18 seconds",
599
+ "response": "\"speed of the boat downstream = 20 + 3 = 23 kmph = 23 * 5 / 18 = 115 / 18 m / s hence time taken to cover 60 m = 60 * 18 / 115 = 9.4 seconds . answer : d\"",
600
+ "system": ""
601
+ },
602
+ {
603
+ "prompt": "how many trucks are there if each truck carrying 70 packages and total of 490 packages ?\na ) a ) 7 , b ) b ) 6 , c ) c ) 9 , d ) d ) 11 , e ) e ) none of the above",
604
+ "response": "sol . total packages 490 each truck carries 70 packages = 490 / 70 = 7 answer : a",
605
+ "system": ""
606
+ },
607
+ {
608
+ "prompt": "tough and tricky questions : statistics . set x consists of prime numbers { 3 , 11 , 7 , a , 17 , 19 } . if integer y represents the product of all elements in set x and if 11 y is an even number , what is the range of set x ?\na ) 14 , b ) 16 , c ) 17 , d ) 20 , e ) 26",
609
+ "response": "since 11 y = even therefore y has to beevensince 11 is a odd integer ( even * odd = even ) similarly , y is the product of all integers in set x but all integers in set x are odd except the unknown a and since x contains only prime numbers , a has to equal to 2 . . . ( 2 is the only even prime number and the product of all prime numbers in set x has to be even , even * odd = even ) since you know value of a you can calculate the range = largest integer in the set minus smallest integer in the set = 19 - 2 = 17 answer is c",
610
+ "system": ""
611
+ }
612
+ ]
613
+ }
Math_QA/group_09/tokenizer/added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
Math_QA/group_09/tokenizer/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0]['role'] == 'system' %}
4
+ {{- messages[0]['content'] }}
5
+ {%- else %}
6
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
7
+ {%- endif %}
8
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
9
+ {%- for tool in tools %}
10
+ {{- "\n" }}
11
+ {{- tool | tojson }}
12
+ {%- endfor %}
13
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
14
+ {%- else %}
15
+ {%- if messages[0]['role'] == 'system' %}
16
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
17
+ {%- else %}
18
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
19
+ {%- endif %}
20
+ {%- endif %}
21
+ {%- for message in messages %}
22
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
23
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
24
+ {%- elif message.role == "assistant" %}
25
+ {{- '<|im_start|>' + message.role }}
26
+ {%- if message.content %}
27
+ {{- '\n' + message.content }}
28
+ {%- endif %}
29
+ {%- for tool_call in message.tool_calls %}
30
+ {%- if tool_call.function is defined %}
31
+ {%- set tool_call = tool_call.function %}
32
+ {%- endif %}
33
+ {{- '\n<tool_call>\n{"name": "' }}
34
+ {{- tool_call.name }}
35
+ {{- '", "arguments": ' }}
36
+ {{- tool_call.arguments | tojson }}
37
+ {{- '}\n</tool_call>' }}
38
+ {%- endfor %}
39
+ {{- '<|im_end|>\n' }}
40
+ {%- elif message.role == "tool" %}
41
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
42
+ {{- '<|im_start|>user' }}
43
+ {%- endif %}
44
+ {{- '\n<tool_response>\n' }}
45
+ {{- message.content }}
46
+ {{- '\n</tool_response>' }}
47
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
48
+ {{- '<|im_end|>\n' }}
49
+ {%- endif %}
50
+ {%- endif %}
51
+ {%- endfor %}
52
+ {%- if add_generation_prompt %}
53
+ {{- '<|im_start|>assistant\n' }}
54
+ {%- endif %}
Math_QA/group_09/tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Math_QA/group_09/tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
Math_QA/group_09/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "clean_up_tokenization_spaces": false,
199
+ "eos_token": "<|im_end|>",
200
+ "errors": "replace",
201
+ "extra_special_tokens": {},
202
+ "model_max_length": 131072,
203
+ "pad_token": "<|endoftext|>",
204
+ "split_special_tokens": false,
205
+ "tokenizer_class": "Qwen2Tokenizer",
206
+ "unk_token": null
207
+ }
Math_QA/group_09/tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
README.md ADDED
@@ -0,0 +1,223 @@
1
+ # Qwen2.5-1.5B Math LoRA Collection
2
+
3
+ This directory aggregates all LoRA checkpoints produced by the `train_lora` pipeline. Every subfolder corresponds to one math dataset and contains 10 independent 100-shot LoRA runs (groups `00`–`09`) trained on **Qwen2.5-1.5B-Instruct** with identical hyperparameters. The adapters here are the source of truth for downstream evaluation (`../评估体系`) and for the `parameter_generator` project, which learns to map prompts to LoRA weights.
4
+
5
+ If you are new to the project, this document explains **where the data comes from, how the LoRAs are produced, and how you can reuse them for inference, evaluation, or further training**.
6
+
7
+ ## Provenance
8
+
9
+ - **Base model:** `/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct`
10
+ - **Datasets:** sampled from `../../prepare/data/math/*.json`. Each JSON file is a list of `{prompt, response, system?}` records. `dataset_sampler.py` draws 10 disjoint groups of 100 samples per dataset, seeded deterministically from the dataset name; if a dataset has fewer than 1 000 examples, sampling with replacement keeps the group size fixed.
11
+ - **Training recipe (from `config/default.yaml`):**
12
+ - sequence length 4 096; LoRA `r=64`, `alpha=128`, `dropout=0.05`, target modules = `{q,k,v,o,gate,up,down}_proj`
13
+ - 12 epochs / max 1 800 steps, learning rate `1e-4`, batch size per device `2`, gradient accumulation `16`, BF16 training, gradient checkpointing on, weight decay `0.01`, warmup ratio `0.03`, checkpoints saved every 300 steps (keeping at most 6) plus a final adapter export
14
+ - Tokenizers are cloned from the base model (pad token defaults to EOS if missing)
15
+ - **Monitoring & reproducibility:**
16
+ - Trainer logs (loss, LR, throughput) are in `../logs/<dataset>/group_xx/`.
17
+ - Slurm stdout/err for each shard live in `../logs/slurm/`.
18
+ - `metadata.json` captures the git commit (if `GIT_COMMIT` was set), timestamps, seeds, and the effective batch size so any experiment can be repeated exactly.
19
+
20
+ ### End-to-end data flow
21
+
22
+ 1. **Raw JSON data** comes from `../../prepare/data/math`. Each file is a list of dict objects with keys:
23
+ ```json
24
+ {
25
+ "prompt": "...question...",
26
+ "response": "...reference answer...",
27
+ "system": "optional system message"
28
+ }
29
+ ```
30
+ 2. `python -m train_lora.dataset_sampler --config config/default.yaml` reads every dataset, filters out `GSM8K_test.json`, and deterministically samples 10×100 items per dataset. The samples plus metadata (indices, seeds, timestamps) are written to `../prompt_groups/<dataset>/group_xx.json`.
31
+ 3. `python -m train_lora.run_tasks --run` (or the Slurm array) iterates dataset/group pairs, loads the corresponding prompt group, and performs LoRA fine-tuning with Hugging Face `Trainer`.
32
+ 4. After training finishes, the following artifacts land in `outputs/<dataset>/group_xx/`:
33
+ - a ready-to-use LoRA adapter (`adapter/`)
34
+ - intermediate checkpoints for analysis/resume
35
+ - tokenizers and metadata
36
+ 5. The evaluation stacks (`../评估体系`, `../parameter_generator/评估`) and the LoRA parameter generator both consume these directories directly.
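
Each record maps naturally onto chat messages before the tokenizer's chat template is applied. A minimal sketch of that conversion (hypothetical helper; the exact logic in the trainer may differ):

```python
def to_messages(record: dict) -> list[dict]:
    """Convert one {prompt, response, system?} record into chat messages.

    The training code applies the tokenizer's chat template to messages
    shaped like these before tokenization.
    """
    messages = []
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    messages.append({"role": "user", "content": record["prompt"]})
    messages.append({"role": "assistant", "content": record["response"]})
    return messages

example = {"prompt": "Solve 3x + 7 = 22.", "response": "x = 5"}
```

The optional `system` key is why the schema is written as `{prompt, response, system?}`: it is only prepended when present.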
37

## Directory layout

```
outputs/
├── Competition_Math/
├── GSM8K_train/
├── MATH/
├── Math-IIO-68K-Mini/
├── Math-Plus/
├── Math_QA/
├── Mu-Math/
└── ToT-Math-V1/
```

Each dataset directory contains `group_00` … `group_09`. Inside every group:

| Item | Description |
| --- | --- |
| `adapter/` | Final LoRA export (`adapter_model.safetensors`, `adapter_config.json`, tokenizer + chat template snapshots, and HF `training_args.bin`). This is the folder you will load for inference. |
| `checkpoints/checkpoint-xxxx/` | Intermediate Trainer checkpoints saved every 300 steps (300–1800). They include optimizer, scheduler, RNG state, and tokenizer copies for resuming or studying training dynamics. |
| `tokenizer/` | Standalone tokenizer snapshot identical to the one used during training; useful if you need a self-contained deployment without referencing the base model directory. |
| `prompt_group.json` | The exact 100-shot dataset used for this training run (a copy of `prompt_groups/<dataset>/group_xx.json`). Contains metadata such as sampled indices, original source file, and timestamp. |
| `metadata.json` | Provenance record with training loss, Trainer metrics, LoRA config, effective batch size/world size, timestamps, git commit (if exported), and file paths. |
| `metadata.json` → `trainer_state` | Full training log history (per-step metrics). Disable via `metadata.save_training_state: false` if you want lighter metadata. |

> **Tip:** Use `metadata.json` to find the latest checkpoint, to confirm which base model/tokenizer were used, or to drive automated uploads/evaluations.
64

## Dataset overview

| Dataset dir | Source file (relative to `prepare/data/math`) | Notes |
| --- | --- | --- |
| `Competition_Math` | `Competition_Math.json` | 100-shot groups drawn from Competition Math practice problems. |
| `GSM8K_train` | `GSM8K_train.json` | Standard GSM8K train split; the public test set (`GSM8K_test.json`) was filtered out. |
| `MATH` | `MATH.json` | High-school & olympiad math benchmark. |
| `Math-IIO-68K-Mini` | `Math-IIO-68K-Mini.json` | Mini version of the Math-IIO dataset. |
| `Math-Plus` | `Math-Plus.json` | Composed of challenging math word problems. |
| `Math_QA` | `Math_QA.json` | Multiple-choice MathQA dataset reformatted as open-ended QA. |
| `Mu-Math` | `Mu-Math.json` | MuSR-style math reasoning set. |
| `ToT-Math-V1` | `ToT-Math-V1.json` | Tree-of-Thought-flavored math prompts. |

All datasets follow the same JSON schema, so swapping between them only changes topical coverage.
80

## How to navigate a single group

```
Math_QA/
└── group_00/
    ├── adapter/
    │   ├── adapter_config.json
    │   ├── adapter_model.safetensors
    │   ├── tokenizer/… (extra copies of merges, vocab, chat_template.jinja)
    │   └── training_args.bin
    ├── checkpoints/
    │   ├── checkpoint-300/
    │   ├── checkpoint-600/
    │   └── …
    ├── tokenizer/          # same as the base tokenizer but pinned to this run
    ├── prompt_group.json   # 100-shot data
    └── metadata.json
```

When inspecting or sharing a run, the **minimum** file set is `adapter/` + `prompt_group.json` + `metadata.json`. Everything else speeds up resuming or auditing.
100

## Using the adapters

### 0. Environment prerequisites

- Python ≥ 3.10, `transformers >= 4.37`, `peft >= 0.8`, `accelerate`, `safetensors`, `torch` (GPU build).
- The base model directory must be accessible; otherwise download `Qwen2.5-1.5B-Instruct` from Hugging Face and update the `base_model` path.
- Optional: set `HF_HOME` / `TRANSFORMERS_CACHE` to avoid repeated downloads.

### 0.5. Reproduce the training pipeline (optional)

To regenerate any adapter from scratch:

```bash
cd /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/train_lora
python -m train_lora.dataset_sampler --overwrite   # regenerates prompt groups
python -m train_lora.train_single --dataset Math_QA --group 0
# or run the full queue
python -m train_lora.run_tasks --run
```

These commands rebuild `prompt_groups/` and `outputs/` with exactly the same seeds and configuration documented above. Slurm users should submit `sbatch run_lora_multinode.sh`.
122

### 1. Load adapter with PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "/hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/不冻结Qwen训练/models/Qwen2.5-1.5B-Instruct"
adapter_dir = "outputs/Math_QA/group_00/adapter"

tokenizer = AutoTokenizer.from_pretrained(adapter_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_dir)

prompt = "Solve 3x + 7 = 22."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Notes:
- Loading the tokenizer from `adapter/` ensures an identical chat template and additional tokens (if any). You can also point to the base tokenizer path if you prefer.
- For batch inference, call `model.merge_and_unload()` if you need a single combined set of weights (at the cost of losing LoRA toggling).
- If you want maximal throughput on a single GPU, also call `model.half()` or `model.to(torch.bfloat16)` depending on your hardware; the adapters were trained with BF16, so keeping BF16 is the safest choice.
152

### 2. Resume or continue training

```bash
python -m train_lora.train_single \
  --dataset Math_QA \
  --group 0 \
  --group-file outputs/Math_QA/group_00/prompt_group.json
```

Set `--group-file` to reuse the same 100 samples, and point the `Trainer` at `checkpoints/checkpoint-XXXX` via `resume_from_checkpoint` (set on `TrainingArguments` or passed to `trainer.train()`). This reproduces a group or lets you extend its training steps.

To resume manually:

```python
trainer.train(resume_from_checkpoint="outputs/Math_QA/group_00/checkpoints/checkpoint-1500")
```
169

### 3. Evaluate with Math-Verify

The evaluation stacks in `../评估体系` and `../parameter_generator/评估` expect this directory layout. Example:

```bash
cd /hkfs/work/workspace/scratch/tum_fmp0582-dndworkspace/自己训练lora/评估体系
python scripts/run_all_evals.py \
  --config configs/eval_config.yaml \
  --datasets Math_QA \
  --groups 0 1
```
181

### 4. Packaging for distribution

- Upload only `adapter/` and `metadata.json` when sharing publicly (e.g., on Hugging Face) to avoid huge checkpoint directories.
- Keep `prompt_group.json` if you want consumers to understand the training data or to regenerate LoRA weights from the same samples.
- When exporting, include a README snippet that references this document so downstream users know the provenance.
- Suggested Hugging Face layout:
  ```
  Math_QA/
    group_00/
      adapter/
      prompt_group.json
      metadata.json
  README.md   (copy the sections describing provenance + usage)
  ```
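
Assembling that minimal layout can be scripted. A sketch, where `pack_group` is a hypothetical helper and the paths are examples:

```python
import shutil
from pathlib import Path

# The minimum shareable artifacts, per the group layout above.
MINIMAL = ("adapter", "prompt_group.json", "metadata.json")

def pack_group(group_dir: str, out_dir: str) -> None:
    """Copy only the minimal shareable artifacts of one group run."""
    src, dst = Path(group_dir), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for name in MINIMAL:
        item = src / name
        if item.is_dir():
            shutil.copytree(item, dst / name, dirs_exist_ok=True)
        elif item.is_file():
            shutil.copy2(item, dst / name)

# Example: pack_group("outputs/Math_QA/group_00", "upload/Math_QA/group_00")
```

Checkpoints and tokenizer snapshots are deliberately excluded, which keeps uploads small while preserving provenance.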
196

## File reference (`metadata.json`)

Key fields you may want to automate against:

| Field | Meaning |
| --- | --- |
| `dataset_name`, `group_index` | Identify the run. |
| `prompt_group_file` | Absolute path back to the sampled dataset. |
| `checkpoint_root` | Where all intermediate checkpoints live. |
| `train_loss`, `metrics` | Final loss and the Trainer metrics dict. |
| `trainer_state` | Full log history (can be large; disable via `metadata.save_training_state`). |
| `training_args` | Exact HF `TrainingArguments` snapshot. |
| `lora_config` | Copy of the LoRA hyperparameters used. |
| `effective_batch_size` | `world_size × per_device_batch_size × grad_accum`; useful for scaling comparisons. |
| `git_commit` | Populated if the `GIT_COMMIT` env var was set before training. |
| `metrics.train_runtime`, `metrics.train_samples_per_second` | Throughput stats. |
| `generated_at` | UTC timestamp when the metadata was written. |
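
The `effective_batch_size` field can be recomputed from the recipe as a sanity check; with the defaults (per-device batch 2, gradient accumulation 16) on a single GPU it comes out to 32:

```python
def effective_batch_size(world_size: int, per_device: int, grad_accum: int) -> int:
    # Mirrors the formula recorded in metadata.json:
    # world_size × per_device_batch_size × grad_accum.
    return world_size * per_device * grad_accum

print(effective_batch_size(1, 2, 16))  # → 32
```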
214

## Best practices

- Always match BF16 or FP16 settings between base-model loading and adapter training; these adapters were trained in BF16.
- If you edit files inside this directory, keep the structure intact; other scripts rely on relative paths (`adapter`, `tokenizer`, `metadata.json`).
- Before deploying a new LoRA, verify it with the evaluation suite, and consider merging multiple groups (e.g., ensemble or checkpoint averaging) only after confirming stability.
- Use `prompt_group.json` and `metadata.json` as documentation when presenting results; they already include seeds, sample indices, and environment details.
- If you build new LoRAs with different configs (e.g., higher rank, more steps), add a sibling directory (e.g., `outputs_v2/`) or annotate the README so collaborators know which adapters correspond to which experiment.

Happy finetuning! If you extend this collection (new datasets, extra groups, or different hyperparameters), add another section here describing the changes so downstream consumers stay informed.