MiaMao committed
Commit b843574 · 1 Parent(s): 300c2ab

Add LoRA checkpoints (without PNG loss curves)

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. .gitignore +2 -0
  2. Personality/checkpoint-32899/README.md +206 -0
  3. Personality/checkpoint-32899/adapter_config.json +45 -0
  4. Personality/checkpoint-32899/adapter_model.safetensors +3 -0
  5. Personality/checkpoint-32899/added_tokens.json +24 -0
  6. Personality/checkpoint-32899/chat_template.jinja +54 -0
  7. Personality/checkpoint-32899/merges.txt +0 -0
  8. Personality/checkpoint-32899/optimizer.pt +3 -0
  9. Personality/checkpoint-32899/rng_state.pth +3 -0
  10. Personality/checkpoint-32899/scheduler.pt +3 -0
  11. Personality/checkpoint-32899/special_tokens_map.json +31 -0
  12. Personality/checkpoint-32899/tokenizer_config.json +207 -0
  13. Personality/checkpoint-32899/trainer_state.json +0 -0
  14. Personality/checkpoint-32899/training_args.bin +3 -0
  15. Personality/checkpoint-32899/vocab.json +0 -0
  16. Scene/checkpoint-35821/README.md +206 -0
  17. Scene/checkpoint-35821/adapter_config.json +45 -0
  18. Scene/checkpoint-35821/adapter_model.safetensors +3 -0
  19. Scene/checkpoint-35821/added_tokens.json +24 -0
  20. Scene/checkpoint-35821/chat_template.jinja +54 -0
  21. Scene/checkpoint-35821/merges.txt +0 -0
  22. Scene/checkpoint-35821/optimizer.pt +3 -0
  23. Scene/checkpoint-35821/rng_state.pth +3 -0
  24. Scene/checkpoint-35821/scheduler.pt +3 -0
  25. Scene/checkpoint-35821/special_tokens_map.json +31 -0
  26. Scene/checkpoint-35821/tokenizer_config.json +207 -0
  27. Scene/checkpoint-35821/trainer_state.json +0 -0
  28. Scene/checkpoint-35821/training_args.bin +3 -0
  29. Scene/checkpoint-35821/vocab.json +0 -0
  30. Topic/Strategy/checkpoint-26250/README.md +206 -0
  31. Topic/Strategy/checkpoint-26250/adapter_config.json +45 -0
  32. Topic/Strategy/checkpoint-26250/adapter_model.safetensors +3 -0
  33. Topic/Strategy/checkpoint-26250/added_tokens.json +24 -0
  34. Topic/Strategy/checkpoint-26250/chat_template.jinja +54 -0
  35. Topic/Strategy/checkpoint-26250/merges.txt +0 -0
  36. Topic/Strategy/checkpoint-26250/optimizer.pt +3 -0
  37. Topic/Strategy/checkpoint-26250/plot.py +44 -0
  38. Topic/Strategy/checkpoint-26250/rng_state.pth +3 -0
  39. Topic/Strategy/checkpoint-26250/scheduler.pt +3 -0
  40. Topic/Strategy/checkpoint-26250/special_tokens_map.json +31 -0
  41. Topic/Strategy/checkpoint-26250/tokenizer_config.json +207 -0
  42. Topic/Strategy/checkpoint-26250/trainer_state.json +3919 -0
  43. Topic/Strategy/checkpoint-26250/training_args.bin +3 -0
  44. Topic/Strategy/checkpoint-26250/vocab.json +0 -0
  45. Topic/Willingness/checkpoint-2500/README.md +206 -0
  46. Topic/Willingness/checkpoint-2500/adapter_config.json +45 -0
  47. Topic/Willingness/checkpoint-2500/adapter_model.safetensors +3 -0
  48. Topic/Willingness/checkpoint-2500/added_tokens.json +24 -0
  49. Topic/Willingness/checkpoint-2500/chat_template.jinja +54 -0
  50. Topic/Willingness/checkpoint-2500/merges.txt +0 -0
.gitignore ADDED
@@ -0,0 +1,2 @@
+
+ Scene/**/loss_curve.png
Personality/checkpoint-32899/README.md ADDED
@@ -0,0 +1,206 @@
+ ---
+ base_model: D:\LLM\Qwen2.5-7B-Instruct
+ library_name: peft
+ tags:
+ - base_model:adapter:D:\LLM\Qwen2.5-7B-Instruct
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
Personality/checkpoint-32899/adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "D:\\LLM\\Qwen2.5-7B-Instruct",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": [
+ "classifier",
+ "score"
+ ],
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "up_proj",
+ "down_proj",
+ "k_proj",
+ "q_proj",
+ "v_proj",
+ "gate_proj",
+ "o_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "SEQ_CLS",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
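The adapter configuration above fully determines how the LoRA update is applied: rank-16 adapters on all seven attention and MLP projections, with a sequence-classification head (`task_type: SEQ_CLS`) whose `score`/`classifier` modules are saved in full. A small stdlib-only sketch that reads the key fields and derives the effective scaling factor (the JSON fragment below is hand-copied from the diff above; the checkpoint file itself is the source of truth):

```python
import json

# Key fields copied by hand from the adapter_config.json diff above,
# for illustration only.
adapter_config = json.loads("""
{
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "use_rslora": false,
  "task_type": "SEQ_CLS",
  "target_modules": ["up_proj", "down_proj", "k_proj", "q_proj",
                     "v_proj", "gate_proj", "o_proj"]
}
""")

# With use_rslora = false, PEFT scales each low-rank update BA by lora_alpha / r.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]
print(f"LoRA scaling factor: {scaling}")                                      # 2.0
print(f"Adapted modules per layer: {len(adapter_config['target_modules'])}")  # 7
```

Here the scaling works out to 32 / 16 = 2.0, and every decoder layer gets adapters on all seven projections.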
Personality/checkpoint-32899/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af5930e48cc77d7662426819125321a06660b998db5ee01ed1fa5e20df707238
+ size 80800144
Personality/checkpoint-32899/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
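The 22 added tokens above extend the base vocabulary with a contiguous block of IDs appended at the end. A quick sanity check with the IDs copied from the added_tokens.json diff:

```python
# All 22 token IDs from added_tokens.json above (order as listed in the file).
added_token_ids = [
    151658, 151657, 151649, 151648, 151643, 151664, 151660, 151662,
    151659, 151661, 151645, 151644, 151655, 151647, 151646, 151651,
    151650, 151663, 151656, 151653, 151654, 151652,
]

ids = sorted(added_token_ids)
assert ids == list(range(151643, 151665))  # 22 consecutive IDs, no gaps
print(f"{len(ids)} added tokens, IDs {ids[0]}..{ids[-1]}")
```

Contiguity matters here: the embedding matrix only needs to be resized once to cover the whole block, and the same block reappears in `added_tokens_decoder` of tokenizer_config.json below.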
Personality/checkpoint-32899/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
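For the plain (no-tools) branch, the template above wraps every turn in ChatML-style `<|im_start|>`/`<|im_end|>` markers and injects a default system prompt when none is given. A minimal Python sketch of that branch, built only from the string constants visible in the template (an illustration; in practice `tokenizer.apply_chat_template` renders the Jinja source directly):

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_no_tools(messages, add_generation_prompt=True):
    # Mirrors the template's else-branch: emit the system block first...
    if messages and messages[0]["role"] == "system":
        out = f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n"
        rest = messages[1:]  # the loop skips the first system message
    else:
        out = f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n"
        rest = messages
    # ...then wrap each remaining turn in <|im_start|>role ... <|im_end|>.
    for m in rest:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

print(render_no_tools([{"role": "user", "content": "Hi"}]))
```

For a single user turn this yields the system block, the user block, and a trailing `<|im_start|>assistant\n` that cues the model to generate.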
Personality/checkpoint-32899/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Personality/checkpoint-32899/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:863fd26e34ab8f27d5e4fcd8933aacca30ede9c6c42363498ee864506ed11f73
+ size 161825517
Personality/checkpoint-32899/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4f01ce3966ed9a31240d334f88c6224d31c1deee278389817bb71d446a93938
+ size 14244
Personality/checkpoint-32899/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45f02365e14f6f94a66275ee1956c749894b798545103a599937b918bf481b17
+ size 1064
Personality/checkpoint-32899/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
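Note from the map above that EOS (`<|im_end|>`) and padding (`<|endoftext|>`) are distinct tokens. That distinction typically matters for a SEQ_CLS checkpoint like this one, where the classification head reads the hidden state at the last non-pad position, so pad must not alias the end-of-turn marker. A trivial check with the strings from the JSON:

```python
# Values copied from special_tokens_map.json above.
eos_token = "<|im_end|>"
pad_token = "<|endoftext|>"

# If pad aliased EOS, masking out padding would also mask the
# end-of-turn marker the classifier position depends on.
assert eos_token != pad_token
print("eos:", eos_token, "| pad:", pad_token)
```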
Personality/checkpoint-32899/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Personality/checkpoint-32899/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
Personality/checkpoint-32899/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a2d5b8ea3c799429af1fb7f8aca3e13a5a4c23bca58dbccf559cfdd3d2317e8
+ size 5496
Personality/checkpoint-32899/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Scene/checkpoint-35821/README.md ADDED
@@ -0,0 +1,206 @@
+ ---
+ base_model: D:\LLM\Qwen2.5-7B-Instruct
+ library_name: peft
+ tags:
+ - base_model:adapter:D:\LLM\Qwen2.5-7B-Instruct
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
Scene/checkpoint-35821/adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "D:\\LLM\\Qwen2.5-7B-Instruct",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": [
+ "classifier",
+ "score"
+ ],
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "k_proj",
+ "down_proj",
+ "o_proj",
+ "v_proj",
+ "up_proj",
+ "q_proj",
+ "gate_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "SEQ_CLS",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
Scene/checkpoint-35821/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4fcf5253dc918a91db69102cb176d33d6906decc1b4e906e63d12ff3cc6bd6e7
+ size 80800144
Scene/checkpoint-35821/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
Scene/checkpoint-35821/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0]['role'] == 'system' %}
4
+ {{- messages[0]['content'] }}
5
+ {%- else %}
6
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
7
+ {%- endif %}
8
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
9
+ {%- for tool in tools %}
10
+ {{- "\n" }}
11
+ {{- tool | tojson }}
12
+ {%- endfor %}
13
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
14
+ {%- else %}
15
+ {%- if messages[0]['role'] == 'system' %}
16
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
17
+ {%- else %}
18
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
19
+ {%- endif %}
20
+ {%- endif %}
21
+ {%- for message in messages %}
22
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
23
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
24
+ {%- elif message.role == "assistant" %}
25
+ {{- '<|im_start|>' + message.role }}
26
+ {%- if message.content %}
27
+ {{- '\n' + message.content }}
28
+ {%- endif %}
29
+ {%- for tool_call in message.tool_calls %}
30
+ {%- if tool_call.function is defined %}
31
+ {%- set tool_call = tool_call.function %}
32
+ {%- endif %}
33
+ {{- '\n<tool_call>\n{"name": "' }}
34
+ {{- tool_call.name }}
35
+ {{- '", "arguments": ' }}
36
+ {{- tool_call.arguments | tojson }}
37
+ {{- '}\n</tool_call>' }}
38
+ {%- endfor %}
39
+ {{- '<|im_end|>\n' }}
40
+ {%- elif message.role == "tool" %}
41
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
42
+ {{- '<|im_start|>user' }}
43
+ {%- endif %}
44
+ {{- '\n<tool_response>\n' }}
45
+ {{- message.content }}
46
+ {{- '\n</tool_response>' }}
47
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
48
+ {{- '<|im_end|>\n' }}
49
+ {%- endif %}
50
+ {%- endif %}
51
+ {%- endfor %}
52
+ {%- if add_generation_prompt %}
53
+ {{- '<|im_start|>assistant\n' }}
54
+ {%- endif %}
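For illustration, the no-tools path of the Jinja chat template above can be mirrored in plain Python. This is a simplified sketch (it ignores tool calls and tool responses), not the template itself; the real prompt is produced by `tokenizer.apply_chat_template`:

```python
# Sketch of the no-tools branch of the Qwen chat template above, reimplemented
# in plain Python. Tool-call and tool-response turns are intentionally omitted.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."

def render_chatml(messages, add_generation_prompt=True):
    parts = []
    # Emit the system turn first, falling back to the default system prompt.
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    # Then each user/assistant turn, wrapped in <|im_start|> ... <|im_end|>.
    for m in messages:
        if m["role"] in ("user", "assistant"):
            parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Finally an open assistant turn so the model generates the reply.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "Hello"}])
```

The `<|im_start|>`/`<|im_end|>` markers here are exactly the special tokens registered in `added_tokens.json` above, which is why the tokenizer files and the template must be shipped together with the adapter.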
Scene/checkpoint-35821/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Scene/checkpoint-35821/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79ee259503abdfa8dc95799bbec2dd417b80bde057550bd5ccd312e13affd5e1
+ size 161825517
Scene/checkpoint-35821/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c9e74e6748d86233ea3548a2a6aaeea66ce8b0c98c9cf2d2874275fb6535d16
+ size 14244
Scene/checkpoint-35821/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10618b9497da7cdb3c1bceedfdcb1166e7ef0a30bff5e38d15feaea24e7d5965
+ size 1064
Scene/checkpoint-35821/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Scene/checkpoint-35821/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
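In the `added_tokens_decoder` above, the ChatML and vision markers carry `"special": true` (skipped on decode by default), while the tool-call and FIM markers are `"special": false` (decoded as normal text). A small sketch of splitting entries by that flag, using an abridged four-entry dict mirroring the config:

```python
# Split added tokens by their "special" flag, as in tokenizer_config.json.
# Abridged to four entries from the full 151643-151664 range for illustration.
added_tokens_decoder = {
    "151643": {"content": "<|endoftext|>", "special": True},
    "151644": {"content": "<|im_start|>", "special": True},
    "151657": {"content": "<tool_call>", "special": False},
    "151658": {"content": "</tool_call>", "special": False},
}

special = sorted(t["content"] for t in added_tokens_decoder.values() if t["special"])
regular = sorted(t["content"] for t in added_tokens_decoder.values() if not t["special"])
```

This distinction matters at inference time: `skip_special_tokens=True` strips `<|im_end|>` and friends from generations but leaves `<tool_call>`/`</tool_call>` intact for downstream parsing.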
Scene/checkpoint-35821/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
Scene/checkpoint-35821/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed9d84e724cd357c9a26e067cf1694dd83fcc9174a01bb1e2dfcf6bbe46bf9ad
+ size 5496
Scene/checkpoint-35821/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Topic/Strategy/checkpoint-26250/README.md ADDED
@@ -0,0 +1,206 @@
+ ---
+ base_model: D:\LLM\Qwen2.5-7B-Instruct
+ library_name: peft
+ tags:
+ - base_model:adapter:D:\LLM\Qwen2.5-7B-Instruct
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.17.1
Topic/Strategy/checkpoint-26250/adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "D:\\LLM\\Qwen2.5-7B-Instruct",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": [
+ "classifier",
+ "score"
+ ],
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "o_proj",
+ "up_proj",
+ "k_proj",
+ "q_proj",
+ "gate_proj",
+ "down_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "SEQ_CLS",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
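The config above (`r=16`, LoRA on all seven attention/MLP projections) lets us sanity-check the adapter size with a back-of-the-envelope count. The model dimensions used here are assumptions taken from the public Qwen2.5-7B config (hidden size 3584, intermediate size 18944, 28 layers, 4 KV heads of head_dim 128), not from this repo:

```python
# Back-of-the-envelope LoRA parameter count for the adapter configured above.
# ASSUMED model dimensions (from the public Qwen2.5-7B config, not this repo):
# hidden 3584, intermediate 18944, 28 layers, 4 KV heads x head_dim 128.
r = 16  # "r" in adapter_config.json
hidden, inter, layers, kv_dim = 3584, 18944, 28, 4 * 128

# (in_features, out_features) of each target module in one decoder layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, inter),
    "up_proj": (hidden, inter),
    "down_proj": (inter, hidden),
}

# Each LoRA pair adds A (r x in) and B (out x r): r * (in + out) parameters.
per_layer = sum(r * (i + o) for i, o in shapes.values())
total = per_layer * layers
```

Under these assumptions the adapter holds about 40.4M LoRA parameters, roughly 80.7 MB at bf16 (2 bytes each), which is consistent with the ~80.8 MB `adapter_model.safetensors` below (the remainder plausibly being the saved `score` classification head from `modules_to_save` plus safetensors metadata).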
Topic/Strategy/checkpoint-26250/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c49d9c70dfa8d68314e9699ab6264f87a3b21120e1eaab5b166346f34aaee8f5
+ size 80814480
Topic/Strategy/checkpoint-26250/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
Topic/Strategy/checkpoint-26250/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
Topic/Strategy/checkpoint-26250/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Topic/Strategy/checkpoint-26250/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0cb1fa488b7e86855f27adf3f98dfa27664bcc4d4ec09c474f66bdbd6e48c62f
+ size 161854189
Topic/Strategy/checkpoint-26250/plot.py ADDED
@@ -0,0 +1,44 @@
+ import json
+ import matplotlib.pyplot as plt
+ from pathlib import Path
+
+ # TODO: replace with your own checkpoint path
+ ckpt_dir = Path(r"D:\Task_design\Topic\strategy_train\outputs\qwen7b-lora-topic_strategy\checkpoint-26250")
+ state_path = ckpt_dir / "trainer_state.json"
+
+ with open(state_path, "r", encoding="utf-8") as f:
+     state = json.load(f)
+
+ log_history = state["log_history"]
+
+ train_steps, train_loss = [], []
+ eval_steps, eval_loss = [], []
+
+ for log in log_history:
+     # training-loss entries logged during training
+     if "loss" in log and "step" in log:
+         train_steps.append(log["step"])
+         train_loss.append(log["loss"])
+     # eval_loss entries recorded on the dev/validation set
+     if "eval_loss" in log and "step" in log:
+         eval_steps.append(log["step"])
+         eval_loss.append(log["eval_loss"])
+
+ plt.figure()
+ if train_steps:
+     plt.plot(train_steps, train_loss, label="train_loss")
+ if eval_steps:
+     plt.plot(eval_steps, eval_loss, label="eval_loss")
+
+ plt.xlabel("step")
+ plt.ylabel("loss")
+ plt.title("Training / Eval Loss Curve")
+ plt.legend()
+ plt.grid(True)
+
+ plt.ylim(0, 10)  # clamp the y-axis so early loss spikes don't flatten the curve
+
+ # save the curve as a PNG next to the checkpoint
+ out_path = ckpt_dir / "loss_curve.png"
+ plt.savefig(out_path, dpi=200)
+ print(f"Saved loss curve to: {out_path}")
Topic/Strategy/checkpoint-26250/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e07af4cff68e8dfc429600ad02684407d151274b8f2529f17d4fe32bbc07eefe
+ size 14244
Topic/Strategy/checkpoint-26250/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e01cf43cce8612f270f510d4e4e5279edc23410a1d257e1b3131f184f9467e3
+ size 1064
Topic/Strategy/checkpoint-26250/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Topic/Strategy/checkpoint-26250/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
Topic/Strategy/checkpoint-26250/trainer_state.json ADDED
@@ -0,0 +1,3919 @@
The diff for this file is too large to render. See raw diff
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ {
+ "best_global_step": 26250,
+ "best_metric": 0.9502699810655684,
+ "best_model_checkpoint": "D:\\Task_design\\Topic\\strategy_train\\outputs\\qwen7b-lora-topic_strategy\\checkpoint-26250",
+ "epoch": 0.7518231711901361,
+ "eval_steps": 1250,
+ "global_step": 26250,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0014320441356002593,
+ "grad_norm": 608.0,
+ "learning_rate": 9.351145038167939e-06,
+ "loss": 28.2021,
+ "step": 50
+ },
+ {
+ "epoch": 0.0028640882712005185,
+ "grad_norm": 326.0,
+ "learning_rate": 1.8893129770992367e-05,
+ "loss": 13.4193,
+ "step": 100
+ },
+ {
+ "epoch": 0.004296132406800777,
+ "grad_norm": 600.0,
+ "learning_rate": 2.8435114503816796e-05,
+ "loss": 8.5573,
+ "step": 150
+ },
+ {
+ "epoch": 0.005728176542401037,
+ "grad_norm": 146.0,
+ "learning_rate": 3.797709923664122e-05,
+ "loss": 4.1909,
+ "step": 200
+ },
+ {
+ "epoch": 0.007160220678001296,
+ "grad_norm": 372.0,
+ "learning_rate": 4.751908396946565e-05,
+ "loss": 3.6133,
+ "step": 250
+ },
+ {
+ "epoch": 0.008592264813601555,
+ "grad_norm": 36.75,
+ "learning_rate": 5.7061068702290074e-05,
+ "loss": 2.8409,
+ "step": 300
+ },
+ {
+ "epoch": 0.010024308949201815,
+ "grad_norm": 210.0,
+ "learning_rate": 6.66030534351145e-05,
+ "loss": 2.9306,
+ "step": 350
+ },
+ {
+ "epoch": 0.011456353084802074,
+ "grad_norm": 0.00037384033203125,
+ "learning_rate": 7.614503816793893e-05,
+ "loss": 4.0291,
+ "step": 400
+ },
+ {
+ "epoch": 0.012888397220402333,
+ "grad_norm": 24.875,
+ "learning_rate": 8.568702290076335e-05,
+ "loss": 4.0621,
+ "step": 450
+ },
+ {
+ "epoch": 0.014320441356002592,
+ "grad_norm": 378.0,
+ "learning_rate": 9.522900763358779e-05,
+ "loss": 5.0668,
+ "step": 500
+ },
+ {
+ "epoch": 0.015752485491602852,
+ "grad_norm": 20.5,
+ "learning_rate": 0.00010477099236641222,
+ "loss": 3.8491,
+ "step": 550
+ },
+ {
+ "epoch": 0.01718452962720311,
+ "grad_norm": 478.0,
+ "learning_rate": 0.00011431297709923666,
+ "loss": 2.9158,
+ "step": 600
+ },
+ {
+ "epoch": 0.01861657376280337,
+ "grad_norm": 53.5,
+ "learning_rate": 0.00012385496183206106,
+ "loss": 4.3353,
+ "step": 650
+ },
+ {
+ "epoch": 0.02004861789840363,
+ "grad_norm": 0.0010986328125,
+ "learning_rate": 0.0001333969465648855,
+ "loss": 4.2307,
+ "step": 700
+ },
+ {
+ "epoch": 0.021480662034003888,
+ "grad_norm": 44.75,
+ "learning_rate": 0.0001429389312977099,
+ "loss": 3.1633,
+ "step": 750
+ },
+ {
+ "epoch": 0.022912706169604148,
+ "grad_norm": 396.0,
+ "learning_rate": 0.00015248091603053436,
+ "loss": 4.8827,
+ "step": 800
+ },
+ {
+ "epoch": 0.024344750305204405,
+ "grad_norm": 3.838539123535156e-05,
+ "learning_rate": 0.0001620229007633588,
+ "loss": 3.0475,
+ "step": 850
+ },
+ {
+ "epoch": 0.025776794440804666,
+ "grad_norm": 12.625,
+ "learning_rate": 0.0001715648854961832,
+ "loss": 4.3648,
+ "step": 900
+ },
+ {
+ "epoch": 0.027208838576404926,
+ "grad_norm": 0.009033203125,
+ "learning_rate": 0.00018110687022900764,
+ "loss": 3.8295,
+ "step": 950
+ },
+ {
+ "epoch": 0.028640882712005183,
+ "grad_norm": 304.0,
+ "learning_rate": 0.00019064885496183207,
+ "loss": 5.2518,
+ "step": 1000
+ },
+ {
+ "epoch": 0.030072926847605444,
+ "grad_norm": 872.0,
+ "learning_rate": 0.0001999940947206803,
+ "loss": 17.7632,
+ "step": 1050
+ },
+ {
+ "epoch": 0.031504970983205705,
+ "grad_norm": 328.0,
+ "learning_rate": 0.00019969883075469472,
+ "loss": 16.0223,
+ "step": 1100
+ },
+ {
+ "epoch": 0.03293701511880596,
+ "grad_norm": 468.0,
+ "learning_rate": 0.0001994035667887091,
+ "loss": 10.9938,
+ "step": 1150
+ },
+ {
+ "epoch": 0.03436905925440622,
+ "grad_norm": 462.0,
+ "learning_rate": 0.00019910830282272353,
+ "loss": 9.1089,
+ "step": 1200
+ },
+ {
+ "epoch": 0.03580110339000648,
+ "grad_norm": 43.25,
+ "learning_rate": 0.00019881303885673795,
+ "loss": 8.8076,
+ "step": 1250
+ },
+ {
+ "epoch": 0.03580110339000648,
+ "eval_accuracy": 0.473,
+ "eval_loss": 1.0368720293045044,
+ "eval_macro_f1": 0.3829550887916939,
+ "eval_runtime": 172.7823,
+ "eval_samples_per_second": 11.575,
+ "eval_steps_per_second": 11.575,
+ "step": 1250
+ },
+ {
+ "epoch": 0.03723314752560674,
+ "grad_norm": 276.0,
+ "learning_rate": 0.00019851777489075234,
+ "loss": 8.0924,
+ "step": 1300
+ },
+ {
+ "epoch": 0.038665191661207,
+ "grad_norm": 150.0,
+ "learning_rate": 0.00019822251092476676,
+ "loss": 7.168,
+ "step": 1350
+ },
+ {
+ "epoch": 0.04009723579680726,
+ "grad_norm": 282.0,
+ "learning_rate": 0.00019792724695878115,
+ "loss": 6.6729,
+ "step": 1400
+ },
+ {
+ "epoch": 0.041529279932407515,
+ "grad_norm": 155.0,
+ "learning_rate": 0.00019763198299279557,
+ "loss": 6.2658,
+ "step": 1450
+ },
+ {
+ "epoch": 0.042961324068007775,
+ "grad_norm": 95.0,
+ "learning_rate": 0.00019733671902680996,
+ "loss": 4.2749,
+ "step": 1500
+ },
+ {
+ "epoch": 0.044393368203608036,
+ "grad_norm": 121.5,
+ "learning_rate": 0.00019704145506082438,
+ "loss": 6.0376,
+ "step": 1550
+ },
+ {
+ "epoch": 0.045825412339208296,
+ "grad_norm": 484.0,
+ "learning_rate": 0.0001967461910948388,
+ "loss": 5.4624,
+ "step": 1600
+ },
+ {
+ "epoch": 0.04725745647480856,
+ "grad_norm": 83.0,
+ "learning_rate": 0.00019645092712885321,
+ "loss": 4.8571,
+ "step": 1650
+ },
+ {
+ "epoch": 0.04868950061040881,
+ "grad_norm": 86.0,
+ "learning_rate": 0.00019615566316286763,
+ "loss": 5.2631,
+ "step": 1700
+ },
+ {
+ "epoch": 0.05012154474600907,
+ "grad_norm": 0.35546875,
+ "learning_rate": 0.00019586039919688202,
+ "loss": 4.2013,
+ "step": 1750
+ },
+ {
+ "epoch": 0.05155358888160933,
+ "grad_norm": 58.0,
+ "learning_rate": 0.00019556513523089644,
+ "loss": 5.4813,
+ "step": 1800
+ },
+ {
+ "epoch": 0.05298563301720959,
+ "grad_norm": 868.0,
+ "learning_rate": 0.00019526987126491083,
+ "loss": 4.6324,
+ "step": 1850
+ },
+ {
+ "epoch": 0.05441767715280985,
+ "grad_norm": 169.0,
+ "learning_rate": 0.00019497460729892525,
+ "loss": 3.9849,
+ "step": 1900
+ },
+ {
+ "epoch": 0.055849721288410106,
+ "grad_norm": 4.28125,
+ "learning_rate": 0.00019467934333293967,
+ "loss": 3.2505,
+ "step": 1950
+ },
+ {
+ "epoch": 0.05728176542401037,
+ "grad_norm": 11.0625,
+ "learning_rate": 0.00019438407936695406,
+ "loss": 3.7368,
+ "step": 2000
+ },
+ {
+ "epoch": 0.05871380955961063,
+ "grad_norm": 40.75,
+ "learning_rate": 0.00019408881540096848,
+ "loss": 4.2252,
+ "step": 2050
+ },
+ {
+ "epoch": 0.06014585369521089,
+ "grad_norm": 2240.0,
+ "learning_rate": 0.00019379355143498287,
+ "loss": 3.8708,
+ "step": 2100
+ },
+ {
+ "epoch": 0.06157789783081115,
+ "grad_norm": 318.0,
+ "learning_rate": 0.0001934982874689973,
+ "loss": 3.7427,
+ "step": 2150
+ },
+ {
+ "epoch": 0.06300994196641141,
+ "grad_norm": 79.0,
+ "learning_rate": 0.00019320302350301168,
+ "loss": 2.5798,
+ "step": 2200
+ },
+ {
+ "epoch": 0.06444198610201167,
+ "grad_norm": 188.0,
+ "learning_rate": 0.0001929077595370261,
+ "loss": 3.2888,
+ "step": 2250
+ },
+ {
+ "epoch": 0.06587403023761192,
+ "grad_norm": 255.0,
+ "learning_rate": 0.00019261249557104052,
+ "loss": 3.5956,
+ "step": 2300
+ },
+ {
+ "epoch": 0.06730607437321218,
+ "grad_norm": 111.5,
+ "learning_rate": 0.0001923172316050549,
+ "loss": 2.6906,
+ "step": 2350
+ },
+ {
+ "epoch": 0.06873811850881244,
+ "grad_norm": 8.8125,
+ "learning_rate": 0.00019202196763906933,
+ "loss": 2.9821,
+ "step": 2400
+ },
+ {
+ "epoch": 0.0701701626444127,
+ "grad_norm": 62.25,
+ "learning_rate": 0.00019172670367308375,
+ "loss": 2.9432,
+ "step": 2450
+ },
+ {
+ "epoch": 0.07160220678001296,
+ "grad_norm": 268.0,
+ "learning_rate": 0.00019143143970709817,
+ "loss": 5.8543,
+ "step": 2500
+ },
+ {
+ "epoch": 0.07160220678001296,
+ "eval_accuracy": 0.8855,
+ "eval_loss": 0.4594672918319702,
+ "eval_macro_f1": 0.8847948863660271,
+ "eval_runtime": 174.3198,
+ "eval_samples_per_second": 11.473,
+ "eval_steps_per_second": 11.473,
+ "step": 2500
+ },
+ {
+ "epoch": 0.07303425091561322,
+ "grad_norm": 1.703125,
+ "learning_rate": 0.00019113617574111256,
+ "loss": 4.3718,
+ "step": 2550
+ },
+ {
+ "epoch": 0.07446629505121348,
+ "grad_norm": 6.125,
+ "learning_rate": 0.00019084091177512698,
+ "loss": 5.6269,
+ "step": 2600
+ },
+ {
+ "epoch": 0.07589833918681374,
+ "grad_norm": 2.796875,
+ "learning_rate": 0.0001905456478091414,
+ "loss": 4.2341,
+ "step": 2650
+ },
+ {
+ "epoch": 0.077330383322414,
+ "grad_norm": 608.0,
+ "learning_rate": 0.0001902503838431558,
+ "loss": 3.3186,
+ "step": 2700
+ },
+ {
+ "epoch": 0.07876242745801426,
+ "grad_norm": 140.0,
+ "learning_rate": 0.0001899551198771702,
+ "loss": 5.9126,
+ "step": 2750
+ },
+ {
+ "epoch": 0.08019447159361452,
+ "grad_norm": 824.0,
+ "learning_rate": 0.0001896598559111846,
+ "loss": 5.0582,
+ "step": 2800
+ },
+ {
+ "epoch": 0.08162651572921477,
+ "grad_norm": 165.0,
+ "learning_rate": 0.00018936459194519902,
+ "loss": 3.5105,
+ "step": 2850
+ },
+ {
+ "epoch": 0.08305855986481503,
+ "grad_norm": 0.0284423828125,
+ "learning_rate": 0.0001890693279792134,
+ "loss": 4.6236,
+ "step": 2900
+ },
+ {
+ "epoch": 0.08449060400041529,
+ "grad_norm": 116.5,
+ "learning_rate": 0.00018877406401322783,
+ "loss": 3.9021,
+ "step": 2950
+ },
+ {
+ "epoch": 0.08592264813601555,
+ "grad_norm": 0.251953125,
+ "learning_rate": 0.00018847880004724225,
+ "loss": 3.883,
+ "step": 3000
+ },
+ {
+ "epoch": 0.08735469227161581,
+ "grad_norm": 0.0260009765625,
+ "learning_rate": 0.00018818353608125664,
+ "loss": 3.9736,
+ "step": 3050
+ },
+ {
+ "epoch": 0.08878673640721607,
+ "grad_norm": 290.0,
+ "learning_rate": 0.00018788827211527106,
+ "loss": 5.218,
+ "step": 3100
+ },
+ {
+ "epoch": 0.09021878054281633,
+ "grad_norm": 1720.0,
+ "learning_rate": 0.00018759300814928548,
+ "loss": 3.2961,
+ "step": 3150
+ },
+ {
+ "epoch": 0.09165082467841659,
+ "grad_norm": 0.26953125,
+ "learning_rate": 0.0001872977441832999,
+ "loss": 3.4482,
+ "step": 3200
+ },
+ {
+ "epoch": 0.09308286881401685,
+ "grad_norm": 13.375,
+ "learning_rate": 0.0001870024802173143,
+ "loss": 2.928,
+ "step": 3250
+ },
+ {
+ "epoch": 0.09451491294961711,
+ "grad_norm": 73.0,
+ "learning_rate": 0.0001867072162513287,
+ "loss": 3.4569,
+ "step": 3300
+ },
+ {
+ "epoch": 0.09594695708521736,
+ "grad_norm": 9.25,
+ "learning_rate": 0.00018641195228534313,
+ "loss": 3.8492,
+ "step": 3350
+ },
+ {
+ "epoch": 0.09737900122081762,
+ "grad_norm": 274.0,
+ "learning_rate": 0.00018611668831935752,
+ "loss": 3.4008,
+ "step": 3400
+ },
+ {
+ "epoch": 0.09881104535641788,
+ "grad_norm": 158.0,
+ "learning_rate": 0.00018582142435337194,
+ "loss": 3.6703,
+ "step": 3450
+ },
+ {
+ "epoch": 0.10024308949201814,
+ "grad_norm": 264.0,
+ "learning_rate": 0.00018552616038738633,
+ "loss": 3.4321,
+ "step": 3500
+ },
+ {
+ "epoch": 0.1016751336276184,
+ "grad_norm": 2.015625,
+ "learning_rate": 0.00018523089642140075,
+ "loss": 2.4367,
+ "step": 3550
+ },
+ {
+ "epoch": 0.10310717776321866,
+ "grad_norm": 270.0,
+ "learning_rate": 0.00018493563245541514,
+ "loss": 3.6473,
+ "step": 3600
+ },
+ {
+ "epoch": 0.10453922189881892,
+ "grad_norm": 0.0478515625,
+ "learning_rate": 0.00018464036848942956,
+ "loss": 2.3759,
+ "step": 3650
+ },
+ {
+ "epoch": 0.10597126603441918,
+ "grad_norm": 282.0,
+ "learning_rate": 0.00018434510452344395,
+ "loss": 2.5434,
+ "step": 3700
+ },
+ {
+ "epoch": 0.10740331017001944,
+ "grad_norm": 100.0,
+ "learning_rate": 0.00018404984055745837,
+ "loss": 2.4411,
+ "step": 3750
+ },
+ {
+ "epoch": 0.10740331017001944,
+ "eval_accuracy": 0.911,
+ "eval_loss": 0.6009318232536316,
+ "eval_macro_f1": 0.9109307309196768,
+ "eval_runtime": 173.1976,
+ "eval_samples_per_second": 11.548,
+ "eval_steps_per_second": 11.548,
+ "step": 3750
+ },
+ {
+ "epoch": 0.1088353543056197,
+ "grad_norm": 8.875,
+ "learning_rate": 0.00018375457659147279,
+ "loss": 3.7952,
+ "step": 3800
+ },
+ {
+ "epoch": 0.11026739844121995,
+ "grad_norm": 408.0,
+ "learning_rate": 0.00018345931262548718,
+ "loss": 2.7528,
+ "step": 3850
+ },
+ {
+ "epoch": 0.11169944257682021,
+ "grad_norm": 4.65625,
+ "learning_rate": 0.0001831640486595016,
+ "loss": 3.0934,
+ "step": 3900
+ },
+ {
+ "epoch": 0.11313148671242047,
+ "grad_norm": 0.0274658203125,
+ "learning_rate": 0.00018286878469351601,
+ "loss": 3.3618,
+ "step": 3950
+ },
+ {
+ "epoch": 0.11456353084802073,
+ "grad_norm": 93.0,
+ "learning_rate": 0.00018257352072753043,
+ "loss": 3.635,
+ "step": 4000
+ },
+ {
+ "epoch": 0.115995574983621,
+ "grad_norm": 159.0,
+ "learning_rate": 0.00018227825676154482,
+ "loss": 2.3589,
+ "step": 4050
+ },
+ {
+ "epoch": 0.11742761911922125,
+ "grad_norm": 290.0,
+ "learning_rate": 0.00018198299279555924,
+ "loss": 3.9717,
+ "step": 4100
+ },
+ {
+ "epoch": 0.11885966325482152,
+ "grad_norm": 8.375,
+ "learning_rate": 0.00018168772882957366,
+ "loss": 3.0616,
+ "step": 4150
+ },
+ {
+ "epoch": 0.12029170739042178,
+ "grad_norm": 264.0,
+ "learning_rate": 0.00018139246486358805,
+ "loss": 3.4315,
+ "step": 4200
+ },
+ {
+ "epoch": 0.12172375152602204,
+ "grad_norm": 206.0,
+ "learning_rate": 0.00018109720089760247,
+ "loss": 3.3353,
+ "step": 4250
+ },
+ {
+ "epoch": 0.1231557956616223,
+ "grad_norm": 0.37109375,
+ "learning_rate": 0.00018080193693161686,
+ "loss": 2.7568,
+ "step": 4300
+ },
+ {
+ "epoch": 0.12458783979722254,
+ "grad_norm": 314.0,
+ "learning_rate": 0.00018050667296563128,
+ "loss": 3.0107,
+ "step": 4350
+ },
+ {
+ "epoch": 0.12601988393282282,
+ "grad_norm": 992.0,
+ "learning_rate": 0.00018021140899964567,
+ "loss": 2.8247,
+ "step": 4400
+ },
+ {
+ "epoch": 0.12745192806842306,
+ "grad_norm": 225.0,
+ "learning_rate": 0.0001799161450336601,
+ "loss": 3.3408,
+ "step": 4450
+ },
+ {
+ "epoch": 0.12888397220402334,
+ "grad_norm": 0.703125,
+ "learning_rate": 0.0001796208810676745,
+ "loss": 2.8974,
+ "step": 4500
+ },
+ {
+ "epoch": 0.13031601633962359,
+ "grad_norm": 280.0,
+ "learning_rate": 0.0001793256171016889,
+ "loss": 2.8223,
+ "step": 4550
+ },
+ {
+ "epoch": 0.13174806047522383,
+ "grad_norm": 180.0,
+ "learning_rate": 0.00017903035313570332,
+ "loss": 3.7603,
+ "step": 4600
+ },
+ {
+ "epoch": 0.1331801046108241,
+ "grad_norm": 6.28125,
+ "learning_rate": 0.00017873508916971774,
+ "loss": 4.2271,
+ "step": 4650
+ },
+ {
+ "epoch": 0.13461214874642435,
+ "grad_norm": 338.0,
+ "learning_rate": 0.00017843982520373216,
+ "loss": 3.2114,
+ "step": 4700
+ },
+ {
+ "epoch": 0.13604419288202463,
+ "grad_norm": 0.6796875,
+ "learning_rate": 0.00017814456123774655,
+ "loss": 3.4457,
+ "step": 4750
+ },
+ {
+ "epoch": 0.13747623701762487,
+ "grad_norm": 2.34375,
+ "learning_rate": 0.00017784929727176097,
+ "loss": 2.2643,
+ "step": 4800
+ },
+ {
+ "epoch": 0.13890828115322515,
+ "grad_norm": 288.0,
+ "learning_rate": 0.0001775540333057754,
+ "loss": 3.0672,
+ "step": 4850
+ },
+ {
+ "epoch": 0.1403403252888254,
+ "grad_norm": 1896.0,
+ "learning_rate": 0.00017725876933978978,
+ "loss": 2.8551,
+ "step": 4900
+ },
+ {
+ "epoch": 0.14177236942442567,
+ "grad_norm": 88.0,
+ "learning_rate": 0.0001769635053738042,
+ "loss": 3.5021,
+ "step": 4950
+ },
+ {
+ "epoch": 0.14320441356002592,
+ "grad_norm": 94.0,
+ "learning_rate": 0.0001766682414078186,
+ "loss": 2.1413,
+ "step": 5000
+ },
+ {
+ "epoch": 0.14320441356002592,
+ "eval_accuracy": 0.917,
+ "eval_loss": 0.3995007872581482,
+ "eval_macro_f1": 0.9161602620439439,
+ "eval_runtime": 179.9592,
+ "eval_samples_per_second": 11.114,
+ "eval_steps_per_second": 11.114,
+ "step": 5000
+ },
+ {
+ "epoch": 0.1446364576956262,
+ "grad_norm": 0.97265625,
+ "learning_rate": 0.000176372977441833,
+ "loss": 2.3626,
+ "step": 5050
+ },
+ {
+ "epoch": 0.14606850183122644,
+ "grad_norm": 266.0,
+ "learning_rate": 0.0001760777134758474,
+ "loss": 3.3284,
+ "step": 5100
+ },
+ {
+ "epoch": 0.14750054596682668,
+ "grad_norm": 0.2314453125,
+ "learning_rate": 0.00017578244950986182,
+ "loss": 2.2628,
+ "step": 5150
+ },
+ {
+ "epoch": 0.14893259010242696,
+ "grad_norm": 237.0,
+ "learning_rate": 0.00017548718554387624,
+ "loss": 2.5359,
+ "step": 5200
+ },
+ {
+ "epoch": 0.1503646342380272,
+ "grad_norm": 65.5,
+ "learning_rate": 0.00017519192157789063,
+ "loss": 2.5109,
+ "step": 5250
+ },
+ {
+ "epoch": 0.15179667837362748,
+ "grad_norm": 0.2197265625,
+ "learning_rate": 0.00017489665761190505,
+ "loss": 3.4319,
+ "step": 5300
+ },
+ {
+ "epoch": 0.15322872250922773,
+ "grad_norm": 140.0,
+ "learning_rate": 0.00017460139364591944,
+ "loss": 2.149,
+ "step": 5350
+ },
+ {
+ "epoch": 0.154660766644828,
+ "grad_norm": 74.0,
+ "learning_rate": 0.00017430612967993386,
+ "loss": 3.3437,
+ "step": 5400
+ },
+ {
+ "epoch": 0.15609281078042825,
+ "grad_norm": 160.0,
+ "learning_rate": 0.00017401086571394828,
+ "loss": 3.2952,
+ "step": 5450
+ },
+ {
+ "epoch": 0.15752485491602852,
+ "grad_norm": 0.451171875,
+ "learning_rate": 0.0001737156017479627,
+ "loss": 2.6442,
+ "step": 5500
+ },
+ {
+ "epoch": 0.15895689905162877,
+ "grad_norm": 246.0,
+ "learning_rate": 0.00017342033778197712,
+ "loss": 2.1805,
+ "step": 5550
+ },
+ {
+ "epoch": 0.16038894318722904,
+ "grad_norm": 0.1669921875,
+ "learning_rate": 0.0001731250738159915,
+ "loss": 2.957,
+ "step": 5600
+ },
+ {
+ "epoch": 0.1618209873228293,
+ "grad_norm": 11.4375,
+ "learning_rate": 0.00017282980985000593,
+ "loss": 3.791,
+ "step": 5650
+ },
+ {
+ "epoch": 0.16325303145842954,
+ "grad_norm": 241.0,
+ "learning_rate": 0.00017253454588402032,
+ "loss": 2.3945,
+ "step": 5700
+ },
+ {
+ "epoch": 0.1646850755940298,
+ "grad_norm": 0.1572265625,
+ "learning_rate": 0.00017223928191803474,
+ "loss": 2.3927,
+ "step": 5750
+ },
+ {
+ "epoch": 0.16611711972963006,
+ "grad_norm": 0.06494140625,
+ "learning_rate": 0.00017194401795204913,
+ "loss": 2.4573,
+ "step": 5800
+ },
+ {
+ "epoch": 0.16754916386523033,
+ "grad_norm": 0.34375,
+ "learning_rate": 0.00017164875398606355,
+ "loss": 2.6829,
+ "step": 5850
+ },
+ {
+ "epoch": 0.16898120800083058,
+ "grad_norm": 7.46875,
+ "learning_rate": 0.00017135349002007797,
+ "loss": 3.0156,
+ "step": 5900
+ },
+ {
+ "epoch": 0.17041325213643085,
+ "grad_norm": 0.2470703125,
+ "learning_rate": 0.00017105822605409236,
+ "loss": 2.5155,
+ "step": 5950
+ },
+ {
+ "epoch": 0.1718452962720311,
+ "grad_norm": 3.65625,
+ "learning_rate": 0.00017076296208810678,
+ "loss": 2.5886,
+ "step": 6000
+ },
+ {
+ "epoch": 0.17327734040763138,
+ "grad_norm": 420.0,
+ "learning_rate": 0.00017046769812212117,
+ "loss": 3.7327,
+ "step": 6050
+ },
+ {
+ "epoch": 0.17470938454323162,
+ "grad_norm": 88.0,
+ "learning_rate": 0.00017017243415613559,
+ "loss": 4.1712,
+ "step": 6100
+ },
+ {
+ "epoch": 0.17614142867883187,
+ "grad_norm": 1864.0,
+ "learning_rate": 0.00016987717019015,
+ "loss": 3.0617,
+ "step": 6150
+ },
+ {
+ "epoch": 0.17757347281443214,
+ "grad_norm": 56.25,
+ "learning_rate": 0.00016958190622416442,
+ "loss": 2.6603,
+ "step": 6200
+ },
+ {
+ "epoch": 0.1790055169500324,
+ "grad_norm": 25.25,
+ "learning_rate": 0.00016928664225817884,
+ "loss": 2.7308,
+ "step": 6250
+ },
+ {
+ "epoch": 0.1790055169500324,
+ "eval_accuracy": 0.9195,
+ "eval_loss": 0.47414371371269226,
+ "eval_macro_f1": 0.9193664539192946,
+ "eval_runtime": 182.0886,
+ "eval_samples_per_second": 10.984,
+ "eval_steps_per_second": 10.984,
+ "step": 6250
+ },
+ {
+ "epoch": 0.18043756108563266,
+ "grad_norm": 66.5,
+ "learning_rate": 0.00016899137829219323,
+ "loss": 2.9805,
+ "step": 6300
+ },
+ {
+ "epoch": 0.1818696052212329,
+ "grad_norm": 119.0,
+ "learning_rate": 0.00016869611432620765,
+ "loss": 2.343,
+ "step": 6350
+ },
+ {
+ "epoch": 0.18330164935683319,
+ "grad_norm": 0.14453125,
+ "learning_rate": 0.00016840085036022204,
+ "loss": 2.5346,
+ "step": 6400
+ },
+ {
+ "epoch": 0.18473369349243343,
+ "grad_norm": 67.5,
+ "learning_rate": 0.00016810558639423646,
+ "loss": 2.6565,
+ "step": 6450
+ },
+ {
+ "epoch": 0.1861657376280337,
+ "grad_norm": 14.0,
+ "learning_rate": 0.00016781032242825085,
+ "loss": 3.2329,
+ "step": 6500
+ },
+ {
+ "epoch": 0.18759778176363395,
+ "grad_norm": 1408.0,
+ "learning_rate": 0.00016751505846226527,
+ "loss": 2.7886,
+ "step": 6550
+ },
+ {
+ "epoch": 0.18902982589923423,
+ "grad_norm": 23.0,
+ "learning_rate": 0.0001672197944962797,
+ "loss": 2.2165,
+ "step": 6600
+ },
+ {
+ "epoch": 0.19046187003483447,
+ "grad_norm": 88.0,
+ "learning_rate": 0.00016692453053029408,
+ "loss": 2.826,
+ "step": 6650
+ },
+ {
+ "epoch": 0.19189391417043472,
+ "grad_norm": 7.28125,
+ "learning_rate": 0.0001666292665643085,
+ "loss": 2.6884,
+ "step": 6700
+ },
+ {
+ "epoch": 0.193325958306035,
+ "grad_norm": 4.3125,
+ "learning_rate": 0.0001663340025983229,
+ "loss": 2.3811,
+ "step": 6750
+ },
+ {
+ "epoch": 0.19475800244163524,
+ "grad_norm": 2.78125,
+ "learning_rate": 0.0001660387386323373,
+ "loss": 2.1648,
+ "step": 6800
+ },
+ {
+ "epoch": 0.19619004657723552,
+ "grad_norm": 2.65625,
+ "learning_rate": 0.0001657434746663517,
+ "loss": 2.0769,
+ "step": 6850
+ },
+ {
+ "epoch": 0.19762209071283576,
+ "grad_norm": 0.337890625,
+ "learning_rate": 0.00016544821070036612,
+ "loss": 3.2644,
+ "step": 6900
+ },
+ {
+ "epoch": 0.19905413484843604,
+ "grad_norm": 5.15625,
+ "learning_rate": 0.00016515294673438054,
+ "loss": 3.1548,
+ "step": 6950
+ },
+ {
+ "epoch": 0.20048617898403628,
+ "grad_norm": 52.75,
+ "learning_rate": 0.00016485768276839496,
+ "loss": 2.3094,
+ "step": 7000
+ },
+ {
+ "epoch": 0.20191822311963656,
+ "grad_norm": 0.15625,
+ "learning_rate": 0.00016456241880240938,
+ "loss": 2.2522,
+ "step": 7050
+ },
+ {
+ "epoch": 0.2033502672552368,
+ "grad_norm": 0.09521484375,
+ "learning_rate": 0.00016426715483642377,
+ "loss": 2.1453,
+ "step": 7100
+ },
+ {
+ "epoch": 0.20478231139083705,
+ "grad_norm": 274.0,
+ "learning_rate": 0.0001639718908704382,
+ "loss": 2.8386,
+ "step": 7150
+ },
+ {
+ "epoch": 0.20621435552643733,
+ "grad_norm": 274.0,
+ "learning_rate": 0.00016367662690445258,
+ "loss": 3.5395,
+ "step": 7200
+ },
+ {
+ "epoch": 0.20764639966203757,
+ "grad_norm": 81.0,
+ "learning_rate": 0.000163381362938467,
+ "loss": 2.668,
+ "step": 7250
+ },
+ {
+ "epoch": 0.20907844379763785,
+ "grad_norm": 0.1162109375,
+ "learning_rate": 0.00016308609897248142,
+ "loss": 2.2543,
+ "step": 7300
+ },
+ {
+ "epoch": 0.2105104879332381,
+ "grad_norm": 0.05517578125,
+ "learning_rate": 0.0001627908350064958,
+ "loss": 2.4399,
+ "step": 7350
+ },
+ {
+ "epoch": 0.21194253206883837,
+ "grad_norm": 0.283203125,
+ "learning_rate": 0.00016249557104051023,
+ "loss": 2.0814,
+ "step": 7400
+ },
+ {
+ "epoch": 0.21337457620443862,
+ "grad_norm": 79.0,
+ "learning_rate": 0.00016220030707452462,
+ "loss": 3.2041,
+ "step": 7450
+ },
+ {
+ "epoch": 0.2148066203400389,
+ "grad_norm": 144.0,
+ "learning_rate": 0.00016190504310853904,
+ "loss": 1.962,
+ "step": 7500
+ },
+ {
+ "epoch": 0.2148066203400389,
+ "eval_accuracy": 0.93,
+ "eval_loss": 0.3529609441757202,
+ "eval_macro_f1": 0.9295120271109343,
+ "eval_runtime": 175.7548,
+ "eval_samples_per_second": 11.379,
+ "eval_steps_per_second": 11.379,
+ "step": 7500
+ },
+ {
+ "epoch": 0.21623866447563914,
+ "grad_norm": 2592.0,
+ "learning_rate": 0.00016160977914255343,
+ "loss": 2.7684,
+ "step": 7550
+ },
+ {
+ "epoch": 0.2176707086112394,
+ "grad_norm": 0.03271484375,
+ "learning_rate": 0.00016131451517656785,
+ "loss": 2.5066,
+ "step": 7600
+ },
+ {
+ "epoch": 0.21910275274683966,
+ "grad_norm": 0.09423828125,
+ "learning_rate": 0.00016101925121058227,
+ "loss": 2.6791,
+ "step": 7650
+ },
+ {
+ "epoch": 0.2205347968824399,
+ "grad_norm": 536.0,
+ "learning_rate": 0.0001607239872445967,
+ "loss": 3.3268,
+ "step": 7700
+ },
+ {
+ "epoch": 0.22196684101804018,
+ "grad_norm": 0.01007080078125,
+ "learning_rate": 0.0001604287232786111,
+ "loss": 2.2916,
+ "step": 7750
+ },
+ {
+ "epoch": 0.22339888515364043,
+ "grad_norm": 0.1396484375,
+ "learning_rate": 0.0001601334593126255,
+ "loss": 2.8402,
+ "step": 7800
+ },
+ {
+ "epoch": 0.2248309292892407,
+ "grad_norm": 93.5,
+ "learning_rate": 0.00015983819534663992,
+ "loss": 2.5527,
+ "step": 7850
+ },
+ {
+ "epoch": 0.22626297342484095,
+ "grad_norm": 0.318359375,
+ "learning_rate": 0.0001595429313806543,
+ "loss": 3.0559,
+ "step": 7900
+ },
+ {
+ "epoch": 0.22769501756044122,
+ "grad_norm": 276.0,
+ "learning_rate": 0.00015924766741466873,
+ "loss": 1.8897,
+ "step": 7950
+ },
+ {
+ "epoch": 0.22912706169604147,
+ "grad_norm": 1.7421875,
+ "learning_rate": 0.00015895240344868315,
+ "loss": 1.9342,
+ "step": 8000
+ },
+ {
+ "epoch": 0.23055910583164174,
+ "grad_norm": 0.036865234375,
+ "learning_rate": 0.00015865713948269754,
+ "loss": 2.0979,
+ "step": 8050
+ },
+ {
+ "epoch": 0.231991149967242,
+ "grad_norm": 164.0,
+ "learning_rate": 0.00015836187551671196,
+ "loss": 2.2929,
+ "step": 8100
+ },
+ {
+ "epoch": 0.23342319410284226,
1208
+ "grad_norm": 88.5,
1209
+ "learning_rate": 0.00015806661155072635,
1210
+ "loss": 3.0427,
1211
+ "step": 8150
1212
+ },
1213
+ {
1214
+ "epoch": 0.2348552382384425,
1215
+ "grad_norm": 1104.0,
1216
+ "learning_rate": 0.00015777134758474077,
1217
+ "loss": 2.8966,
1218
+ "step": 8200
1219
+ },
1220
+ {
1221
+ "epoch": 0.23628728237404276,
1222
+ "grad_norm": 520.0,
1223
+ "learning_rate": 0.00015747608361875516,
1224
+ "loss": 2.0752,
1225
+ "step": 8250
1226
+ },
1227
+ {
1228
+ "epoch": 0.23771932650964303,
1229
+ "grad_norm": 0.07568359375,
1230
+ "learning_rate": 0.00015718081965276958,
1231
+ "loss": 1.7808,
1232
+ "step": 8300
1233
+ },
1234
+ {
1235
+ "epoch": 0.23915137064524328,
1236
+ "grad_norm": 0.06982421875,
1237
+ "learning_rate": 0.000156885555686784,
1238
+ "loss": 2.9426,
1239
+ "step": 8350
1240
+ },
1241
+ {
1242
+ "epoch": 0.24058341478084355,
1243
+ "grad_norm": 242.0,
1244
+ "learning_rate": 0.00015659029172079839,
1245
+ "loss": 2.3159,
1246
+ "step": 8400
1247
+ },
1248
+ {
1249
+ "epoch": 0.2420154589164438,
1250
+ "grad_norm": 7.5,
1251
+ "learning_rate": 0.0001562950277548128,
1252
+ "loss": 2.6197,
1253
+ "step": 8450
1254
+ },
1255
+ {
1256
+ "epoch": 0.24344750305204407,
1257
+ "grad_norm": 57.25,
1258
+ "learning_rate": 0.00015599976378882722,
1259
+ "loss": 2.6834,
1260
+ "step": 8500
1261
+ },
1262
+ {
1263
+ "epoch": 0.24487954718764432,
1264
+ "grad_norm": 4.0,
1265
+ "learning_rate": 0.00015570449982284164,
1266
+ "loss": 2.116,
1267
+ "step": 8550
1268
+ },
1269
+ {
1270
+ "epoch": 0.2463115913232446,
1271
+ "grad_norm": 0.11181640625,
1272
+ "learning_rate": 0.00015540923585685603,
1273
+ "loss": 3.5668,
1274
+ "step": 8600
1275
+ },
1276
+ {
1277
+ "epoch": 0.24774363545884484,
1278
+ "grad_norm": 240.0,
1279
+ "learning_rate": 0.00015511397189087045,
1280
+ "loss": 3.1473,
1281
+ "step": 8650
1282
+ },
1283
+ {
1284
+ "epoch": 0.2491756795944451,
1285
+ "grad_norm": 117.5,
1286
+ "learning_rate": 0.00015481870792488487,
1287
+ "loss": 2.4813,
1288
+ "step": 8700
1289
+ },
1290
+ {
1291
+ "epoch": 0.25060772373004536,
1292
+ "grad_norm": 7.46875,
1293
+ "learning_rate": 0.00015452344395889926,
1294
+ "loss": 1.8936,
1295
+ "step": 8750
1296
+ },
1297
+ {
1298
+ "epoch": 0.25060772373004536,
1299
+ "eval_accuracy": 0.9365,
1300
+ "eval_loss": 0.328545480966568,
1301
+ "eval_macro_f1": 0.9360277798015127,
1302
+ "eval_runtime": 178.5517,
1303
+ "eval_samples_per_second": 11.201,
1304
+ "eval_steps_per_second": 11.201,
1305
+ "step": 8750
1306
+ },
1307
+ {
1308
+ "epoch": 0.25203976786564564,
1309
+ "grad_norm": 126.0,
1310
+ "learning_rate": 0.00015422817999291368,
1311
+ "loss": 2.649,
1312
+ "step": 8800
1313
+ },
1314
+ {
1315
+ "epoch": 0.25347181200124586,
1316
+ "grad_norm": 0.06884765625,
1317
+ "learning_rate": 0.00015393291602692807,
1318
+ "loss": 2.8102,
1319
+ "step": 8850
1320
+ },
1321
+ {
1322
+ "epoch": 0.25490385613684613,
1323
+ "grad_norm": 0.6015625,
1324
+ "learning_rate": 0.0001536376520609425,
1325
+ "loss": 2.4762,
1326
+ "step": 8900
1327
+ },
1328
+ {
1329
+ "epoch": 0.2563359002724464,
1330
+ "grad_norm": 370.0,
1331
+ "learning_rate": 0.00015334238809495688,
1332
+ "loss": 2.1245,
1333
+ "step": 8950
1334
+ },
1335
+ {
1336
+ "epoch": 0.2577679444080467,
1337
+ "grad_norm": 238.0,
1338
+ "learning_rate": 0.0001530471241289713,
1339
+ "loss": 1.4588,
1340
+ "step": 9000
1341
+ },
1342
+ {
1343
+ "epoch": 0.2591999885436469,
1344
+ "grad_norm": 8.5625,
1345
+ "learning_rate": 0.00015275186016298572,
1346
+ "loss": 2.7869,
1347
+ "step": 9050
1348
+ },
1349
+ {
1350
+ "epoch": 0.26063203267924717,
1351
+ "grad_norm": 118.5,
1352
+ "learning_rate": 0.0001524565961970001,
1353
+ "loss": 2.1987,
1354
+ "step": 9100
1355
+ },
1356
+ {
1357
+ "epoch": 0.26206407681484745,
1358
+ "grad_norm": 37.25,
1359
+ "learning_rate": 0.00015216133223101453,
1360
+ "loss": 2.8539,
1361
+ "step": 9150
1362
+ },
1363
+ {
1364
+ "epoch": 0.26349612095044767,
1365
+ "grad_norm": 0.62109375,
1366
+ "learning_rate": 0.00015186606826502895,
1367
+ "loss": 2.6421,
1368
+ "step": 9200
1369
+ },
1370
+ {
1371
+ "epoch": 0.26492816508604794,
1372
+ "grad_norm": 0.404296875,
1373
+ "learning_rate": 0.00015157080429904337,
1374
+ "loss": 3.3623,
1375
+ "step": 9250
1376
+ },
1377
+ {
1378
+ "epoch": 0.2663602092216482,
1379
+ "grad_norm": 0.09130859375,
1380
+ "learning_rate": 0.00015127554033305776,
1381
+ "loss": 2.6995,
1382
+ "step": 9300
1383
+ },
1384
+ {
1385
+ "epoch": 0.2677922533572485,
1386
+ "grad_norm": 82.5,
1387
+ "learning_rate": 0.00015098027636707218,
1388
+ "loss": 1.8874,
1389
+ "step": 9350
1390
+ },
1391
+ {
1392
+ "epoch": 0.2692242974928487,
1393
+ "grad_norm": 4416.0,
1394
+ "learning_rate": 0.0001506850124010866,
1395
+ "loss": 2.2107,
1396
+ "step": 9400
1397
+ },
1398
+ {
1399
+ "epoch": 0.270656341628449,
1400
+ "grad_norm": 18.125,
1401
+ "learning_rate": 0.000150389748435101,
1402
+ "loss": 3.2056,
1403
+ "step": 9450
1404
+ },
1405
+ {
1406
+ "epoch": 0.27208838576404926,
1407
+ "grad_norm": 3.09375,
1408
+ "learning_rate": 0.0001500944844691154,
1409
+ "loss": 2.9934,
1410
+ "step": 9500
1411
+ },
1412
+ {
1413
+ "epoch": 0.27352042989964953,
1414
+ "grad_norm": 280.0,
1415
+ "learning_rate": 0.0001497992205031298,
1416
+ "loss": 2.2205,
1417
+ "step": 9550
1418
+ },
1419
+ {
1420
+ "epoch": 0.27495247403524975,
1421
+ "grad_norm": 94.5,
1422
+ "learning_rate": 0.00014950395653714422,
1423
+ "loss": 2.5102,
1424
+ "step": 9600
1425
+ },
1426
+ {
1427
+ "epoch": 0.27638451817085,
1428
+ "grad_norm": 0.2021484375,
1429
+ "learning_rate": 0.0001492086925711586,
1430
+ "loss": 2.0138,
1431
+ "step": 9650
1432
+ },
1433
+ {
1434
+ "epoch": 0.2778165623064503,
1435
+ "grad_norm": 1.3359375,
1436
+ "learning_rate": 0.00014891342860517303,
1437
+ "loss": 1.556,
1438
+ "step": 9700
1439
+ },
1440
+ {
1441
+ "epoch": 0.2792486064420505,
1442
+ "grad_norm": 0.494140625,
1443
+ "learning_rate": 0.00014861816463918745,
1444
+ "loss": 2.7351,
1445
+ "step": 9750
1446
+ },
1447
+ {
1448
+ "epoch": 0.2806806505776508,
1449
+ "grad_norm": 0.1953125,
1450
+ "learning_rate": 0.00014832290067320184,
1451
+ "loss": 2.0641,
1452
+ "step": 9800
1453
+ },
1454
+ {
1455
+ "epoch": 0.28211269471325107,
1456
+ "grad_norm": 0.421875,
1457
+ "learning_rate": 0.00014802763670721626,
1458
+ "loss": 2.642,
1459
+ "step": 9850
1460
+ },
1461
+ {
1462
+ "epoch": 0.28354473884885134,
1463
+ "grad_norm": 292.0,
1464
+ "learning_rate": 0.00014773237274123065,
1465
+ "loss": 2.5676,
1466
+ "step": 9900
1467
+ },
1468
+ {
1469
+ "epoch": 0.28497678298445156,
1470
+ "grad_norm": 4.40625,
1471
+ "learning_rate": 0.00014743710877524507,
1472
+ "loss": 2.5438,
1473
+ "step": 9950
1474
+ },
1475
+ {
1476
+ "epoch": 0.28640882712005183,
1477
+ "grad_norm": 129.0,
1478
+ "learning_rate": 0.0001471418448092595,
1479
+ "loss": 3.0776,
1480
+ "step": 10000
1481
+ },
1482
+ {
1483
+ "epoch": 0.28640882712005183,
1484
+ "eval_accuracy": 0.9335,
1485
+ "eval_loss": 0.34245818853378296,
1486
+ "eval_macro_f1": 0.9327568911653952,
1487
+ "eval_runtime": 181.524,
1488
+ "eval_samples_per_second": 11.018,
1489
+ "eval_steps_per_second": 11.018,
1490
+ "step": 10000
1491
+ },
1492
+ {
1493
+ "epoch": 0.2878408712556521,
1494
+ "grad_norm": 9.25,
1495
+ "learning_rate": 0.0001468465808432739,
1496
+ "loss": 2.2061,
1497
+ "step": 10050
1498
+ },
1499
+ {
1500
+ "epoch": 0.2892729153912524,
1501
+ "grad_norm": 0.10693359375,
1502
+ "learning_rate": 0.00014655131687728832,
1503
+ "loss": 2.6087,
1504
+ "step": 10100
1505
+ },
1506
+ {
1507
+ "epoch": 0.2907049595268526,
1508
+ "grad_norm": 0.10107421875,
1509
+ "learning_rate": 0.00014625605291130272,
1510
+ "loss": 2.4579,
1511
+ "step": 10150
1512
+ },
1513
+ {
1514
+ "epoch": 0.2921370036624529,
1515
+ "grad_norm": 0.33203125,
1516
+ "learning_rate": 0.00014596078894531714,
1517
+ "loss": 2.2037,
1518
+ "step": 10200
1519
+ },
1520
+ {
1521
+ "epoch": 0.29356904779805315,
1522
+ "grad_norm": 0.1513671875,
1523
+ "learning_rate": 0.00014566552497933153,
1524
+ "loss": 2.3772,
1525
+ "step": 10250
1526
+ },
1527
+ {
1528
+ "epoch": 0.29500109193365337,
1529
+ "grad_norm": 80.5,
1530
+ "learning_rate": 0.00014537026101334595,
1531
+ "loss": 2.2901,
1532
+ "step": 10300
1533
+ },
1534
+ {
1535
+ "epoch": 0.29643313606925364,
1536
+ "grad_norm": 21.0,
1537
+ "learning_rate": 0.00014507499704736034,
1538
+ "loss": 2.1736,
1539
+ "step": 10350
1540
+ },
1541
+ {
1542
+ "epoch": 0.2978651802048539,
1543
+ "grad_norm": 168.0,
1544
+ "learning_rate": 0.00014477973308137476,
1545
+ "loss": 2.3493,
1546
+ "step": 10400
1547
+ },
1548
+ {
1549
+ "epoch": 0.2992972243404542,
1550
+ "grad_norm": 229.0,
1551
+ "learning_rate": 0.00014448446911538915,
1552
+ "loss": 2.4933,
1553
+ "step": 10450
1554
+ },
1555
+ {
1556
+ "epoch": 0.3007292684760544,
1557
+ "grad_norm": 274.0,
1558
+ "learning_rate": 0.00014418920514940357,
1559
+ "loss": 2.6936,
1560
+ "step": 10500
1561
+ },
1562
+ {
1563
+ "epoch": 0.3021613126116547,
1564
+ "grad_norm": 272.0,
1565
+ "learning_rate": 0.00014389394118341799,
1566
+ "loss": 3.3879,
1567
+ "step": 10550
1568
+ },
1569
+ {
1570
+ "epoch": 0.30359335674725496,
1571
+ "grad_norm": 0.0103759765625,
1572
+ "learning_rate": 0.00014359867721743238,
1573
+ "loss": 1.6445,
1574
+ "step": 10600
1575
+ },
1576
+ {
1577
+ "epoch": 0.30502540088285524,
1578
+ "grad_norm": 0.02880859375,
1579
+ "learning_rate": 0.0001433034132514468,
1580
+ "loss": 1.5567,
1581
+ "step": 10650
1582
+ },
1583
+ {
1584
+ "epoch": 0.30645744501845545,
1585
+ "grad_norm": 292.0,
1586
+ "learning_rate": 0.00014300814928546121,
1587
+ "loss": 2.5947,
1588
+ "step": 10700
1589
+ },
1590
+ {
1591
+ "epoch": 0.30788948915405573,
1592
+ "grad_norm": 97.5,
1593
+ "learning_rate": 0.00014271288531947563,
1594
+ "loss": 2.7865,
1595
+ "step": 10750
1596
+ },
1597
+ {
1598
+ "epoch": 0.309321533289656,
1599
+ "grad_norm": 79.5,
1600
+ "learning_rate": 0.00014241762135349002,
1601
+ "loss": 2.1275,
1602
+ "step": 10800
1603
+ },
1604
+ {
1605
+ "epoch": 0.3107535774252562,
1606
+ "grad_norm": 298.0,
1607
+ "learning_rate": 0.00014212235738750444,
1608
+ "loss": 2.0145,
1609
+ "step": 10850
1610
+ },
1611
+ {
1612
+ "epoch": 0.3121856215608565,
1613
+ "grad_norm": 0.040771484375,
1614
+ "learning_rate": 0.00014182709342151886,
1615
+ "loss": 1.8322,
1616
+ "step": 10900
1617
+ },
1618
+ {
1619
+ "epoch": 0.31361766569645677,
1620
+ "grad_norm": 142.0,
1621
+ "learning_rate": 0.00014153182945553325,
1622
+ "loss": 1.3864,
1623
+ "step": 10950
1624
+ },
1625
+ {
1626
+ "epoch": 0.31504970983205705,
1627
+ "grad_norm": 1.7578125,
1628
+ "learning_rate": 0.00014123656548954767,
1629
+ "loss": 2.7755,
1630
+ "step": 11000
1631
+ },
1632
+ {
1633
+ "epoch": 0.31648175396765726,
1634
+ "grad_norm": 82.0,
1635
+ "learning_rate": 0.00014094130152356206,
1636
+ "loss": 2.5528,
1637
+ "step": 11050
1638
+ },
1639
+ {
1640
+ "epoch": 0.31791379810325754,
1641
+ "grad_norm": 80.0,
1642
+ "learning_rate": 0.00014064603755757648,
1643
+ "loss": 2.5284,
1644
+ "step": 11100
1645
+ },
1646
+ {
1647
+ "epoch": 0.3193458422388578,
1648
+ "grad_norm": 0.05224609375,
1649
+ "learning_rate": 0.00014035077359159087,
1650
+ "loss": 2.8708,
1651
+ "step": 11150
1652
+ },
1653
+ {
1654
+ "epoch": 0.3207778863744581,
1655
+ "grad_norm": 0.140625,
1656
+ "learning_rate": 0.0001400555096256053,
1657
+ "loss": 3.5295,
1658
+ "step": 11200
1659
+ },
1660
+ {
1661
+ "epoch": 0.3222099305100583,
1662
+ "grad_norm": 0.050048828125,
1663
+ "learning_rate": 0.0001397602456596197,
1664
+ "loss": 3.325,
1665
+ "step": 11250
1666
+ },
1667
+ {
1668
+ "epoch": 0.3222099305100583,
1669
+ "eval_accuracy": 0.94,
1670
+ "eval_loss": 0.2819044888019562,
1671
+ "eval_macro_f1": 0.9395225640341313,
1672
+ "eval_runtime": 173.501,
1673
+ "eval_samples_per_second": 11.527,
1674
+ "eval_steps_per_second": 11.527,
1675
+ "step": 11250
1676
+ },
1677
+ {
1678
+ "epoch": 0.3236419746456586,
1679
+ "grad_norm": 1.078125,
1680
+ "learning_rate": 0.0001394649816936341,
1681
+ "loss": 3.0985,
1682
+ "step": 11300
1683
+ },
1684
+ {
1685
+ "epoch": 0.32507401878125886,
1686
+ "grad_norm": 116.0,
1687
+ "learning_rate": 0.00013916971772764852,
1688
+ "loss": 2.5793,
1689
+ "step": 11350
1690
+ },
1691
+ {
1692
+ "epoch": 0.3265060629168591,
1693
+ "grad_norm": 0.1865234375,
1694
+ "learning_rate": 0.00013887445376166291,
1695
+ "loss": 2.5646,
1696
+ "step": 11400
1697
+ },
1698
+ {
1699
+ "epoch": 0.32793810705245935,
1700
+ "grad_norm": 306.0,
1701
+ "learning_rate": 0.00013857918979567733,
1702
+ "loss": 1.9864,
1703
+ "step": 11450
1704
+ },
1705
+ {
1706
+ "epoch": 0.3293701511880596,
1707
+ "grad_norm": 274.0,
1708
+ "learning_rate": 0.00013828392582969175,
1709
+ "loss": 1.8868,
1710
+ "step": 11500
1711
+ },
1712
+ {
1713
+ "epoch": 0.3308021953236599,
1714
+ "grad_norm": 0.08984375,
1715
+ "learning_rate": 0.00013798866186370617,
1716
+ "loss": 2.5106,
1717
+ "step": 11550
1718
+ },
1719
+ {
1720
+ "epoch": 0.3322342394592601,
1721
+ "grad_norm": 270.0,
1722
+ "learning_rate": 0.0001376933978977206,
1723
+ "loss": 1.8537,
1724
+ "step": 11600
1725
+ },
1726
+ {
1727
+ "epoch": 0.3336662835948604,
1728
+ "grad_norm": 270.0,
1729
+ "learning_rate": 0.00013739813393173498,
1730
+ "loss": 2.3735,
1731
+ "step": 11650
1732
+ },
1733
+ {
1734
+ "epoch": 0.33509832773046067,
1735
+ "grad_norm": 0.416015625,
1736
+ "learning_rate": 0.0001371028699657494,
1737
+ "loss": 2.0794,
1738
+ "step": 11700
1739
+ },
1740
+ {
1741
+ "epoch": 0.3365303718660609,
1742
+ "grad_norm": 0.32421875,
1743
+ "learning_rate": 0.0001368076059997638,
1744
+ "loss": 2.5114,
1745
+ "step": 11750
1746
+ },
1747
+ {
1748
+ "epoch": 0.33796241600166116,
1749
+ "grad_norm": 0.0595703125,
1750
+ "learning_rate": 0.0001365123420337782,
1751
+ "loss": 2.199,
1752
+ "step": 11800
1753
+ },
1754
+ {
1755
+ "epoch": 0.33939446013726143,
1756
+ "grad_norm": 61.75,
1757
+ "learning_rate": 0.0001362170780677926,
1758
+ "loss": 2.64,
1759
+ "step": 11850
1760
+ },
1761
+ {
1762
+ "epoch": 0.3408265042728617,
1763
+ "grad_norm": 9.0,
1764
+ "learning_rate": 0.00013592181410180702,
1765
+ "loss": 1.8553,
1766
+ "step": 11900
1767
+ },
1768
+ {
1769
+ "epoch": 0.3422585484084619,
1770
+ "grad_norm": 177.0,
1771
+ "learning_rate": 0.00013562655013582144,
1772
+ "loss": 1.6963,
1773
+ "step": 11950
1774
+ },
1775
+ {
1776
+ "epoch": 0.3436905925440622,
1777
+ "grad_norm": 326.0,
1778
+ "learning_rate": 0.00013533128616983583,
1779
+ "loss": 3.007,
1780
+ "step": 12000
1781
+ },
1782
+ {
1783
+ "epoch": 0.3451226366796625,
1784
+ "grad_norm": 0.06787109375,
1785
+ "learning_rate": 0.00013503602220385025,
1786
+ "loss": 1.6731,
1787
+ "step": 12050
1788
+ },
1789
+ {
1790
+ "epoch": 0.34655468081526275,
1791
+ "grad_norm": 5.28125,
1792
+ "learning_rate": 0.00013474075823786464,
1793
+ "loss": 2.5167,
1794
+ "step": 12100
1795
+ },
1796
+ {
1797
+ "epoch": 0.34798672495086297,
1798
+ "grad_norm": 0.11865234375,
1799
+ "learning_rate": 0.00013444549427187906,
1800
+ "loss": 3.4208,
1801
+ "step": 12150
1802
+ },
1803
+ {
1804
+ "epoch": 0.34941876908646324,
1805
+ "grad_norm": 0.1337890625,
1806
+ "learning_rate": 0.00013415023030589348,
1807
+ "loss": 1.3073,
1808
+ "step": 12200
1809
+ },
1810
+ {
1811
+ "epoch": 0.3508508132220635,
1812
+ "grad_norm": 82.5,
1813
+ "learning_rate": 0.0001338549663399079,
1814
+ "loss": 2.3618,
1815
+ "step": 12250
1816
+ },
1817
+ {
1818
+ "epoch": 0.35228285735766374,
1819
+ "grad_norm": 1.546875,
1820
+ "learning_rate": 0.00013355970237392232,
1821
+ "loss": 2.0756,
1822
+ "step": 12300
1823
+ },
1824
+ {
1825
+ "epoch": 0.353714901493264,
1826
+ "grad_norm": 120.0,
1827
+ "learning_rate": 0.0001332644384079367,
1828
+ "loss": 2.2016,
1829
+ "step": 12350
1830
+ },
1831
+ {
1832
+ "epoch": 0.3551469456288643,
1833
+ "grad_norm": 5.34375,
1834
+ "learning_rate": 0.00013296917444195113,
1835
+ "loss": 2.7446,
1836
+ "step": 12400
1837
+ },
1838
+ {
1839
+ "epoch": 0.35657898976446456,
1840
+ "grad_norm": 0.057861328125,
1841
+ "learning_rate": 0.00013267391047596552,
1842
+ "loss": 2.3892,
1843
+ "step": 12450
1844
+ },
1845
+ {
1846
+ "epoch": 0.3580110339000648,
1847
+ "grad_norm": 1.6640625,
1848
+ "learning_rate": 0.00013237864650997994,
1849
+ "loss": 2.3038,
1850
+ "step": 12500
1851
+ },
1852
+ {
1853
+ "epoch": 0.3580110339000648,
1854
+ "eval_accuracy": 0.943,
1855
+ "eval_loss": 0.2820850610733032,
1856
+ "eval_macro_f1": 0.9423954094372721,
1857
+ "eval_runtime": 180.0229,
1858
+ "eval_samples_per_second": 11.11,
1859
+ "eval_steps_per_second": 11.11,
1860
+ "step": 12500
1861
+ },
1862
+ {
1863
+ "epoch": 0.35944307803566505,
1864
+ "grad_norm": 0.15625,
1865
+ "learning_rate": 0.00013208338254399433,
1866
+ "loss": 1.6647,
1867
+ "step": 12550
1868
+ },
1869
+ {
1870
+ "epoch": 0.36087512217126533,
1871
+ "grad_norm": 0.06640625,
1872
+ "learning_rate": 0.00013178811857800875,
1873
+ "loss": 2.2751,
1874
+ "step": 12600
1875
+ },
1876
+ {
1877
+ "epoch": 0.3623071663068656,
1878
+ "grad_norm": 0.984375,
1879
+ "learning_rate": 0.00013149285461202316,
1880
+ "loss": 2.7458,
1881
+ "step": 12650
1882
+ },
1883
+ {
1884
+ "epoch": 0.3637392104424658,
1885
+ "grad_norm": 420.0,
1886
+ "learning_rate": 0.00013119759064603756,
1887
+ "loss": 2.2766,
1888
+ "step": 12700
1889
+ },
1890
+ {
1891
+ "epoch": 0.3651712545780661,
1892
+ "grad_norm": 0.1103515625,
1893
+ "learning_rate": 0.00013090232668005198,
1894
+ "loss": 1.4627,
1895
+ "step": 12750
1896
+ },
1897
+ {
1898
+ "epoch": 0.36660329871366637,
1899
+ "grad_norm": 0.25390625,
1900
+ "learning_rate": 0.00013060706271406637,
1901
+ "loss": 1.9492,
1902
+ "step": 12800
1903
+ },
1904
+ {
1905
+ "epoch": 0.3680353428492666,
1906
+ "grad_norm": 0.232421875,
1907
+ "learning_rate": 0.00013031179874808079,
1908
+ "loss": 1.8883,
1909
+ "step": 12850
1910
+ },
1911
+ {
1912
+ "epoch": 0.36946738698486686,
1913
+ "grad_norm": 0.205078125,
1914
+ "learning_rate": 0.00013001653478209518,
1915
+ "loss": 2.058,
1916
+ "step": 12900
1917
+ },
1918
+ {
1919
+ "epoch": 0.37089943112046714,
1920
+ "grad_norm": 0.53515625,
1921
+ "learning_rate": 0.0001297212708161096,
1922
+ "loss": 2.7257,
1923
+ "step": 12950
1924
+ },
1925
+ {
1926
+ "epoch": 0.3723314752560674,
1927
+ "grad_norm": 764.0,
1928
+ "learning_rate": 0.00012942600685012401,
1929
+ "loss": 2.4923,
1930
+ "step": 13000
1931
+ },
1932
+ {
1933
+ "epoch": 0.37376351939166763,
1934
+ "grad_norm": 298.0,
1935
+ "learning_rate": 0.00012913074288413843,
1936
+ "loss": 1.8667,
1937
+ "step": 13050
1938
+ },
1939
+ {
1940
+ "epoch": 0.3751955635272679,
1941
+ "grad_norm": 282.0,
1942
+ "learning_rate": 0.00012883547891815285,
1943
+ "loss": 2.4477,
1944
+ "step": 13100
1945
+ },
1946
+ {
1947
+ "epoch": 0.3766276076628682,
1948
+ "grad_norm": 0.02587890625,
1949
+ "learning_rate": 0.00012854021495216724,
1950
+ "loss": 1.0699,
1951
+ "step": 13150
1952
+ },
1953
+ {
1954
+ "epoch": 0.37805965179846845,
1955
+ "grad_norm": 160.0,
1956
+ "learning_rate": 0.00012824495098618166,
1957
+ "loss": 2.8487,
1958
+ "step": 13200
1959
+ },
1960
+ {
1961
+ "epoch": 0.3794916959340687,
1962
+ "grad_norm": 216.0,
1963
+ "learning_rate": 0.00012794968702019605,
1964
+ "loss": 1.8886,
1965
+ "step": 13250
1966
+ },
1967
+ {
1968
+ "epoch": 0.38092374006966895,
1969
+ "grad_norm": 302.0,
1970
+ "learning_rate": 0.00012765442305421047,
1971
+ "loss": 2.4619,
1972
+ "step": 13300
1973
+ },
1974
+ {
1975
+ "epoch": 0.3823557842052692,
1976
+ "grad_norm": 172.0,
1977
+ "learning_rate": 0.0001273591590882249,
1978
+ "loss": 2.8237,
1979
+ "step": 13350
1980
+ },
1981
+ {
1982
+ "epoch": 0.38378782834086944,
1983
+ "grad_norm": 91.5,
1984
+ "learning_rate": 0.00012706389512223928,
1985
+ "loss": 2.7431,
1986
+ "step": 13400
1987
+ },
1988
+ {
1989
+ "epoch": 0.3852198724764697,
1990
+ "grad_norm": 0.275390625,
1991
+ "learning_rate": 0.0001267686311562537,
1992
+ "loss": 1.9888,
1993
+ "step": 13450
1994
+ },
1995
+ {
1996
+ "epoch": 0.38665191661207,
1997
+ "grad_norm": 194.0,
1998
+ "learning_rate": 0.0001264733671902681,
1999
+ "loss": 2.5123,
2000
+ "step": 13500
2001
+ },
2002
+ {
2003
+ "epoch": 0.38808396074767026,
2004
+ "grad_norm": 396.0,
2005
+ "learning_rate": 0.0001261781032242825,
2006
+ "loss": 1.9384,
2007
+ "step": 13550
2008
+ },
2009
+ {
2010
+ "epoch": 0.3895160048832705,
2011
+ "grad_norm": 54.75,
2012
+ "learning_rate": 0.0001258828392582969,
2013
+ "loss": 1.9031,
2014
+ "step": 13600
2015
+ },
2016
+ {
2017
+ "epoch": 0.39094804901887076,
2018
+ "grad_norm": 0.828125,
2019
+ "learning_rate": 0.00012558757529231132,
2020
+ "loss": 2.65,
2021
+ "step": 13650
2022
+ },
2023
+ {
2024
+ "epoch": 0.39238009315447103,
2025
+ "grad_norm": 36.25,
2026
+ "learning_rate": 0.00012529231132632574,
2027
+ "loss": 2.2794,
2028
+ "step": 13700
2029
+ },
2030
+ {
2031
+ "epoch": 0.3938121372900713,
2032
+ "grad_norm": 0.11962890625,
2033
+ "learning_rate": 0.00012499704736034016,
2034
+ "loss": 2.1189,
2035
+ "step": 13750
2036
+ },
2037
+ {
2038
+ "epoch": 0.3938121372900713,
2039
+ "eval_accuracy": 0.9465,
2040
+ "eval_loss": 0.31273505091667175,
2041
+ "eval_macro_f1": 0.9457595736365828,
2042
+ "eval_runtime": 172.8312,
2043
+ "eval_samples_per_second": 11.572,
2044
+ "eval_steps_per_second": 11.572,
2045
+ "step": 13750
2046
+ },
2047
+ {
2048
+ "epoch": 0.3952441814256715,
2049
+ "grad_norm": 398.0,
2050
+ "learning_rate": 0.00012470178339435458,
2051
+ "loss": 2.5979,
2052
+ "step": 13800
2053
+ },
2054
+ {
2055
+ "epoch": 0.3966762255612718,
2056
+ "grad_norm": 266.0,
2057
+ "learning_rate": 0.00012440651942836897,
2058
+ "loss": 2.8401,
2059
+ "step": 13850
2060
+ },
2061
+ {
2062
+ "epoch": 0.3981082696968721,
2063
+ "grad_norm": 0.1884765625,
2064
+ "learning_rate": 0.0001241112554623834,
2065
+ "loss": 1.9365,
2066
+ "step": 13900
2067
+ },
2068
+ {
2069
+ "epoch": 0.3995403138324723,
2070
+ "grad_norm": 0.055908203125,
2071
+ "learning_rate": 0.00012381599149639778,
2072
+ "loss": 0.9845,
2073
+ "step": 13950
2074
+ },
2075
+ {
2076
+ "epoch": 0.40097235796807257,
2077
+ "grad_norm": 130.0,
2078
+ "learning_rate": 0.0001235207275304122,
2079
+ "loss": 2.9002,
2080
+ "step": 14000
2081
+ },
2082
+ {
2083
+ "epoch": 0.40240440210367284,
2084
+ "grad_norm": 0.0045166015625,
2085
+ "learning_rate": 0.00012322546356442662,
2086
+ "loss": 2.1528,
2087
+ "step": 14050
2088
+ },
2089
+ {
2090
+ "epoch": 0.4038364462392731,
2091
+ "grad_norm": 0.39453125,
2092
+ "learning_rate": 0.000122930199598441,
2093
+ "loss": 2.4575,
2094
+ "step": 14100
2095
+ },
2096
+ {
2097
+ "epoch": 0.40526849037487334,
2098
+ "grad_norm": 241.0,
2099
+ "learning_rate": 0.00012263493563245543,
2100
+ "loss": 2.475,
2101
+ "step": 14150
2102
+ },
2103
+ {
2104
+ "epoch": 0.4067005345104736,
2105
+ "grad_norm": 37.5,
2106
+ "learning_rate": 0.00012233967166646982,
2107
+ "loss": 1.6529,
2108
+ "step": 14200
2109
+ },
2110
+ {
2111
+ "epoch": 0.4081325786460739,
2112
+ "grad_norm": 272.0,
2113
+ "learning_rate": 0.00012204440770048424,
2114
+ "loss": 2.922,
2115
+ "step": 14250
2116
+ },
2117
+ {
2118
+ "epoch": 0.4095646227816741,
2119
+ "grad_norm": 3.734375,
2120
+ "learning_rate": 0.00012174914373449864,
2121
+ "loss": 2.2361,
2122
+ "step": 14300
2123
+ },
2124
+ {
2125
+ "epoch": 0.4109966669172744,
2126
+ "grad_norm": 338.0,
2127
+ "learning_rate": 0.00012145387976851306,
2128
+ "loss": 1.5299,
2129
+ "step": 14350
2130
+ },
2131
+ {
2132
+ "epoch": 0.41242871105287465,
2133
+ "grad_norm": 0.043212890625,
2134
+ "learning_rate": 0.00012115861580252748,
2135
+ "loss": 2.5728,
2136
+ "step": 14400
2137
+ },
2138
+ {
2139
+ "epoch": 0.4138607551884749,
2140
+ "grad_norm": 0.15625,
2141
+ "learning_rate": 0.00012086335183654187,
2142
+ "loss": 2.122,
2143
+ "step": 14450
2144
+ },
2145
+ {
2146
+ "epoch": 0.41529279932407515,
2147
+ "grad_norm": 0.008056640625,
2148
+ "learning_rate": 0.00012056808787055629,
2149
+ "loss": 2.4863,
2150
+ "step": 14500
2151
+ },
2152
+ {
2153
+ "epoch": 0.4167248434596754,
2154
+ "grad_norm": 32.5,
2155
+ "learning_rate": 0.00012027282390457068,
2156
+ "loss": 3.3401,
2157
+ "step": 14550
2158
+ },
2159
+ {
2160
+ "epoch": 0.4181568875952757,
2161
+ "grad_norm": 0.2490234375,
2162
+ "learning_rate": 0.0001199775599385851,
2163
+ "loss": 2.0398,
2164
+ "step": 14600
2165
+ },
2166
+ {
2167
+ "epoch": 0.41958893173087597,
2168
+ "grad_norm": 0.326171875,
2169
+ "learning_rate": 0.00011968229597259951,
2170
+ "loss": 2.8364,
2171
+ "step": 14650
2172
+ },
2173
+ {
2174
+ "epoch": 0.4210209758664762,
2175
+ "grad_norm": 0.15625,
2176
+ "learning_rate": 0.00011938703200661391,
2177
+ "loss": 1.4864,
2178
+ "step": 14700
2179
+ },
2180
+ {
2181
+ "epoch": 0.42245302000207646,
2182
+ "grad_norm": 80.0,
2183
+ "learning_rate": 0.00011909176804062833,
2184
+ "loss": 2.3397,
2185
+ "step": 14750
2186
+ },
2187
+ {
2188
+ "epoch": 0.42388506413767674,
2189
+ "grad_norm": 101.5,
2190
+ "learning_rate": 0.00011879650407464274,
2191
+ "loss": 3.3186,
2192
+ "step": 14800
2193
+ },
2194
+ {
2195
+ "epoch": 0.42531710827327696,
2196
+ "grad_norm": 2.6875,
2197
+ "learning_rate": 0.00011850124010865716,
2198
+ "loss": 2.7605,
2199
+ "step": 14850
2200
+ },
2201
+ {
2202
+ "epoch": 0.42674915240887723,
2203
+ "grad_norm": 95.0,
2204
+ "learning_rate": 0.00011820597614267155,
2205
+ "loss": 1.9215,
2206
+ "step": 14900
2207
+ },
2208
+ {
2209
+ "epoch": 0.4281811965444775,
2210
+ "grad_norm": 0.65625,
2211
+ "learning_rate": 0.00011791071217668597,
2212
+ "loss": 2.0689,
2213
+ "step": 14950
2214
+ },
2215
+ {
2216
+ "epoch": 0.4296132406800778,
2217
+ "grad_norm": 0.66015625,
2218
+ "learning_rate": 0.00011761544821070036,
2219
+ "loss": 3.5633,
2220
+ "step": 15000
2221
+ },
2222
+ {
2223
+ "epoch": 0.4296132406800778,
2224
+ "eval_accuracy": 0.942,
2225
+ "eval_loss": 0.27190178632736206,
2226
+ "eval_macro_f1": 0.9413541591870516,
2227
+ "eval_runtime": 181.3524,
2228
+ "eval_samples_per_second": 11.028,
2229
+ "eval_steps_per_second": 11.028,
2230
+ "step": 15000
2231
+ },
2232
+ {
2233
+ "epoch": 0.431045284815678,
2234
+ "grad_norm": 42.25,
2235
+ "learning_rate": 0.00011732018424471478,
2236
+ "loss": 1.8073,
2237
+ "step": 15050
2238
+ },
2239
+ {
2240
+ "epoch": 0.4324773289512783,
2241
+ "grad_norm": 0.22265625,
2242
+ "learning_rate": 0.0001170249202787292,
2243
+ "loss": 1.777,
2244
+ "step": 15100
2245
+ },
2246
+ {
2247
+ "epoch": 0.43390937308687855,
2248
+ "grad_norm": 0.11328125,
2249
+ "learning_rate": 0.0001167296563127436,
2250
+ "loss": 1.9598,
2251
+ "step": 15150
2252
+ },
2253
+ {
2254
+ "epoch": 0.4353414172224788,
2255
+ "grad_norm": 0.095703125,
2256
+ "learning_rate": 0.00011643439234675802,
2257
+ "loss": 2.7789,
2258
+ "step": 15200
2259
+ },
2260
+ {
2261
+ "epoch": 0.43677346135807904,
2262
+ "grad_norm": 0.8125,
2263
+ "learning_rate": 0.00011613912838077241,
2264
+ "loss": 2.3985,
2265
+ "step": 15250
2266
+ },
2267
+ {
2268
+ "epoch": 0.4382055054936793,
2269
+ "grad_norm": 0.22265625,
2270
+ "learning_rate": 0.00011584386441478683,
2271
+ "loss": 1.6076,
2272
+ "step": 15300
2273
+ },
2274
+ {
2275
+ "epoch": 0.4396375496292796,
2276
+ "grad_norm": 0.578125,
2277
+ "learning_rate": 0.00011554860044880122,
2278
+ "loss": 2.9266,
2279
+ "step": 15350
2280
+ },
2281
+ {
2282
+ "epoch": 0.4410695937648798,
2283
+ "grad_norm": 0.0130615234375,
2284
+ "learning_rate": 0.00011525333648281564,
2285
+ "loss": 1.388,
2286
+ "step": 15400
2287
+ },
2288
+ {
2289
+ "epoch": 0.4425016379004801,
2290
+ "grad_norm": 88.5,
2291
+ "learning_rate": 0.00011495807251683006,
2292
+ "loss": 2.6264,
2293
+ "step": 15450
2294
+ },
2295
+ {
2296
+ "epoch": 0.44393368203608036,
2297
+ "grad_norm": 0.123046875,
2298
+ "learning_rate": 0.00011466280855084446,
2299
+ "loss": 1.9447,
2300
+ "step": 15500
2301
+ },
2302
+ {
2303
+ "epoch": 0.44536572617168063,
2304
+ "grad_norm": 318.0,
2305
+ "learning_rate": 0.00011436754458485888,
+ "loss": 1.4494,
+ "step": 15550
+ },
+ {
+ "epoch": 0.44679777030728085,
+ "grad_norm": 0.123046875,
+ "learning_rate": 0.00011407228061887327,
+ "loss": 2.4483,
+ "step": 15600
+ },
+ {
+ "epoch": 0.4482298144428811,
+ "grad_norm": 260.0,
+ "learning_rate": 0.00011377701665288769,
+ "loss": 2.8899,
+ "step": 15650
+ },
+ {
+ "epoch": 0.4496618585784814,
+ "grad_norm": 0.07666015625,
+ "learning_rate": 0.00011348175268690208,
+ "loss": 2.0935,
+ "step": 15700
+ },
+ {
+ "epoch": 0.4510939027140817,
+ "grad_norm": 0.047119140625,
+ "learning_rate": 0.0001131864887209165,
+ "loss": 3.0298,
+ "step": 15750
+ },
+ {
+ "epoch": 0.4525259468496819,
+ "grad_norm": 18.25,
+ "learning_rate": 0.00011289122475493092,
+ "loss": 1.8288,
+ "step": 15800
+ },
+ {
+ "epoch": 0.45395799098528217,
+ "grad_norm": 11.75,
+ "learning_rate": 0.00011259596078894533,
+ "loss": 2.9182,
+ "step": 15850
+ },
+ {
+ "epoch": 0.45539003512088244,
+ "grad_norm": 0.1416015625,
+ "learning_rate": 0.00011230069682295975,
+ "loss": 1.9045,
+ "step": 15900
+ },
+ {
+ "epoch": 0.45682207925648266,
+ "grad_norm": 0.1982421875,
+ "learning_rate": 0.00011200543285697414,
+ "loss": 1.7933,
+ "step": 15950
+ },
+ {
+ "epoch": 0.45825412339208293,
+ "grad_norm": 201.0,
+ "learning_rate": 0.00011171016889098856,
+ "loss": 2.4752,
+ "step": 16000
+ },
+ {
+ "epoch": 0.4596861675276832,
+ "grad_norm": 284.0,
+ "learning_rate": 0.00011141490492500295,
+ "loss": 1.7394,
+ "step": 16050
+ },
+ {
+ "epoch": 0.4611182116632835,
+ "grad_norm": 0.1162109375,
+ "learning_rate": 0.00011111964095901737,
+ "loss": 1.3612,
+ "step": 16100
+ },
+ {
+ "epoch": 0.4625502557988837,
+ "grad_norm": 0.30859375,
+ "learning_rate": 0.00011082437699303178,
+ "loss": 2.6066,
+ "step": 16150
+ },
+ {
+ "epoch": 0.463982299934484,
+ "grad_norm": 23.75,
+ "learning_rate": 0.00011052911302704618,
+ "loss": 2.6868,
+ "step": 16200
+ },
+ {
+ "epoch": 0.46541434407008425,
+ "grad_norm": 1.5546875,
+ "learning_rate": 0.0001102338490610606,
+ "loss": 2.6526,
+ "step": 16250
+ },
+ {
+ "epoch": 0.46541434407008425,
+ "eval_accuracy": 0.9445,
+ "eval_loss": 0.2859993577003479,
+ "eval_macro_f1": 0.9439942443541156,
+ "eval_runtime": 175.6054,
+ "eval_samples_per_second": 11.389,
+ "eval_steps_per_second": 11.389,
+ "step": 16250
+ },
+ {
+ "epoch": 0.4668463882056845,
+ "grad_norm": 0.11181640625,
+ "learning_rate": 0.000109938585095075,
+ "loss": 2.041,
+ "step": 16300
+ },
+ {
+ "epoch": 0.46827843234128474,
+ "grad_norm": 8.3125,
+ "learning_rate": 0.00010964332112908942,
+ "loss": 2.3374,
+ "step": 16350
+ },
+ {
+ "epoch": 0.469710476476885,
+ "grad_norm": 268.0,
+ "learning_rate": 0.00010934805716310381,
+ "loss": 3.461,
+ "step": 16400
+ },
+ {
+ "epoch": 0.4711425206124853,
+ "grad_norm": 270.0,
+ "learning_rate": 0.00010905279319711823,
+ "loss": 1.5168,
+ "step": 16450
+ },
+ {
+ "epoch": 0.4725745647480855,
+ "grad_norm": 1.6171875,
+ "learning_rate": 0.00010875752923113265,
+ "loss": 1.9156,
+ "step": 16500
+ },
+ {
+ "epoch": 0.4740066088836858,
+ "grad_norm": 3.25,
+ "learning_rate": 0.00010846226526514704,
+ "loss": 1.9988,
+ "step": 16550
+ },
+ {
+ "epoch": 0.47543865301928606,
+ "grad_norm": 266.0,
+ "learning_rate": 0.00010816700129916146,
+ "loss": 1.9911,
+ "step": 16600
+ },
+ {
+ "epoch": 0.47687069715488634,
+ "grad_norm": 0.45703125,
+ "learning_rate": 0.00010787173733317586,
+ "loss": 2.2805,
+ "step": 16650
+ },
+ {
+ "epoch": 0.47830274129048655,
+ "grad_norm": 0.08740234375,
+ "learning_rate": 0.00010757647336719028,
+ "loss": 2.3786,
+ "step": 16700
+ },
+ {
+ "epoch": 0.47973478542608683,
+ "grad_norm": 0.0263671875,
+ "learning_rate": 0.00010728120940120467,
+ "loss": 2.0964,
+ "step": 16750
+ },
+ {
+ "epoch": 0.4811668295616871,
+ "grad_norm": 288.0,
+ "learning_rate": 0.00010698594543521909,
+ "loss": 2.5816,
+ "step": 16800
+ },
+ {
+ "epoch": 0.4825988736972873,
+ "grad_norm": 0.2138671875,
+ "learning_rate": 0.00010669068146923348,
+ "loss": 1.2136,
+ "step": 16850
+ },
+ {
+ "epoch": 0.4840309178328876,
+ "grad_norm": 1.1796875,
+ "learning_rate": 0.0001063954175032479,
+ "loss": 2.2321,
+ "step": 16900
+ },
+ {
+ "epoch": 0.48546296196848787,
+ "grad_norm": 0.259765625,
+ "learning_rate": 0.00010610015353726232,
+ "loss": 2.7485,
+ "step": 16950
+ },
+ {
+ "epoch": 0.48689500610408815,
+ "grad_norm": 75.5,
+ "learning_rate": 0.00010580488957127673,
+ "loss": 3.0909,
+ "step": 17000
+ },
+ {
+ "epoch": 0.48832705023968836,
+ "grad_norm": 86.0,
+ "learning_rate": 0.00010550962560529115,
+ "loss": 2.0178,
+ "step": 17050
+ },
+ {
+ "epoch": 0.48975909437528864,
+ "grad_norm": 0.98046875,
+ "learning_rate": 0.00010521436163930554,
+ "loss": 2.0193,
+ "step": 17100
+ },
+ {
+ "epoch": 0.4911911385108889,
+ "grad_norm": 68.0,
+ "learning_rate": 0.00010491909767331996,
+ "loss": 2.178,
+ "step": 17150
+ },
+ {
+ "epoch": 0.4926231826464892,
+ "grad_norm": 0.375,
+ "learning_rate": 0.00010462383370733435,
+ "loss": 2.4366,
+ "step": 17200
+ },
+ {
+ "epoch": 0.4940552267820894,
+ "grad_norm": 0.322265625,
+ "learning_rate": 0.00010432856974134877,
+ "loss": 2.7989,
+ "step": 17250
+ },
+ {
+ "epoch": 0.4954872709176897,
+ "grad_norm": 0.1513671875,
+ "learning_rate": 0.00010403330577536318,
+ "loss": 1.8681,
+ "step": 17300
+ },
+ {
+ "epoch": 0.49691931505328996,
+ "grad_norm": 0.08935546875,
+ "learning_rate": 0.00010373804180937759,
+ "loss": 1.9697,
+ "step": 17350
+ },
+ {
+ "epoch": 0.4983513591888902,
+ "grad_norm": 0.1328125,
+ "learning_rate": 0.00010344277784339201,
+ "loss": 1.7415,
+ "step": 17400
+ },
+ {
+ "epoch": 0.49978340332449045,
+ "grad_norm": 0.65234375,
+ "learning_rate": 0.0001031475138774064,
+ "loss": 1.6849,
+ "step": 17450
+ },
+ {
+ "epoch": 0.5012154474600907,
+ "grad_norm": 86.5,
+ "learning_rate": 0.00010285224991142082,
+ "loss": 2.1474,
+ "step": 17500
+ },
+ {
+ "epoch": 0.5012154474600907,
+ "eval_accuracy": 0.947,
+ "eval_loss": 0.2907390892505646,
+ "eval_macro_f1": 0.9463529866080697,
+ "eval_runtime": 172.6878,
+ "eval_samples_per_second": 11.582,
+ "eval_steps_per_second": 11.582,
+ "step": 17500
+ },
+ {
+ "epoch": 0.502647491595691,
+ "grad_norm": 472.0,
+ "learning_rate": 0.00010255698594543521,
+ "loss": 1.9173,
+ "step": 17550
+ },
+ {
+ "epoch": 0.5040795357312913,
+ "grad_norm": 1.3828125,
+ "learning_rate": 0.00010226172197944963,
+ "loss": 3.1869,
+ "step": 17600
+ },
+ {
+ "epoch": 0.5055115798668915,
+ "grad_norm": 145.0,
+ "learning_rate": 0.00010196645801346405,
+ "loss": 2.5482,
+ "step": 17650
+ },
+ {
+ "epoch": 0.5069436240024917,
+ "grad_norm": 0.58203125,
+ "learning_rate": 0.00010167119404747844,
+ "loss": 2.8567,
+ "step": 17700
+ },
+ {
+ "epoch": 0.508375668138092,
+ "grad_norm": 0.0888671875,
+ "learning_rate": 0.00010137593008149286,
+ "loss": 1.7268,
+ "step": 17750
+ },
+ {
+ "epoch": 0.5098077122736923,
+ "grad_norm": 286.0,
+ "learning_rate": 0.00010108066611550726,
+ "loss": 2.2268,
+ "step": 17800
+ },
+ {
+ "epoch": 0.5112397564092925,
+ "grad_norm": 169.0,
+ "learning_rate": 0.00010078540214952168,
+ "loss": 1.8245,
+ "step": 17850
+ },
+ {
+ "epoch": 0.5126718005448928,
+ "grad_norm": 0.1923828125,
+ "learning_rate": 0.00010049013818353607,
+ "loss": 2.3801,
+ "step": 17900
+ },
+ {
+ "epoch": 0.5141038446804931,
+ "grad_norm": 77.0,
+ "learning_rate": 0.00010019487421755049,
+ "loss": 2.3412,
+ "step": 17950
+ },
+ {
+ "epoch": 0.5155358888160934,
+ "grad_norm": 328.0,
+ "learning_rate": 9.98996102515649e-05,
+ "loss": 3.1564,
+ "step": 18000
+ },
+ {
+ "epoch": 0.5169679329516935,
+ "grad_norm": 0.57421875,
+ "learning_rate": 9.96043462855793e-05,
+ "loss": 2.0409,
+ "step": 18050
+ },
+ {
+ "epoch": 0.5183999770872938,
+ "grad_norm": 0.8828125,
+ "learning_rate": 9.930908231959372e-05,
+ "loss": 1.8093,
+ "step": 18100
+ },
+ {
+ "epoch": 0.5198320212228941,
+ "grad_norm": 0.263671875,
+ "learning_rate": 9.901381835360814e-05,
+ "loss": 2.1228,
+ "step": 18150
+ },
+ {
+ "epoch": 0.5212640653584943,
+ "grad_norm": 13.625,
+ "learning_rate": 9.871855438762255e-05,
+ "loss": 1.6072,
+ "step": 18200
+ },
+ {
+ "epoch": 0.5226961094940946,
+ "grad_norm": 176.0,
+ "learning_rate": 9.842329042163695e-05,
+ "loss": 2.1088,
+ "step": 18250
+ },
+ {
+ "epoch": 0.5241281536296949,
+ "grad_norm": 0.09228515625,
+ "learning_rate": 9.812802645565136e-05,
+ "loss": 2.2985,
+ "step": 18300
+ },
+ {
+ "epoch": 0.5255601977652952,
+ "grad_norm": 5.9375,
+ "learning_rate": 9.783276248966576e-05,
+ "loss": 2.8687,
+ "step": 18350
+ },
+ {
+ "epoch": 0.5269922419008953,
+ "grad_norm": 0.1142578125,
+ "learning_rate": 9.753749852368017e-05,
+ "loss": 1.9855,
+ "step": 18400
+ },
+ {
+ "epoch": 0.5284242860364956,
+ "grad_norm": 1.03125,
+ "learning_rate": 9.724223455769457e-05,
+ "loss": 2.5827,
+ "step": 18450
+ },
+ {
+ "epoch": 0.5298563301720959,
+ "grad_norm": 270.0,
+ "learning_rate": 9.694697059170899e-05,
+ "loss": 1.9905,
+ "step": 18500
+ },
+ {
+ "epoch": 0.5312883743076962,
+ "grad_norm": 0.2392578125,
+ "learning_rate": 9.665170662572341e-05,
+ "loss": 1.9516,
+ "step": 18550
+ },
+ {
+ "epoch": 0.5327204184432964,
+ "grad_norm": 116.5,
+ "learning_rate": 9.635644265973781e-05,
+ "loss": 1.7887,
+ "step": 18600
+ },
+ {
+ "epoch": 0.5341524625788967,
+ "grad_norm": 0.0306396484375,
+ "learning_rate": 9.606117869375222e-05,
+ "loss": 1.8686,
+ "step": 18650
+ },
+ {
+ "epoch": 0.535584506714497,
+ "grad_norm": 0.7109375,
+ "learning_rate": 9.576591472776662e-05,
+ "loss": 1.7828,
+ "step": 18700
+ },
+ {
+ "epoch": 0.5370165508500973,
+ "grad_norm": 0.92578125,
+ "learning_rate": 9.547065076178103e-05,
+ "loss": 1.8761,
+ "step": 18750
+ },
+ {
+ "epoch": 0.5370165508500973,
+ "eval_accuracy": 0.9495,
+ "eval_loss": 0.26102420687675476,
+ "eval_macro_f1": 0.9488861373782008,
+ "eval_runtime": 172.7705,
+ "eval_samples_per_second": 11.576,
+ "eval_steps_per_second": 11.576,
+ "step": 18750
+ },
+ {
+ "epoch": 0.5384485949856974,
+ "grad_norm": 0.1416015625,
+ "learning_rate": 9.517538679579543e-05,
+ "loss": 2.3765,
+ "step": 18800
+ },
+ {
+ "epoch": 0.5398806391212977,
+ "grad_norm": 258.0,
+ "learning_rate": 9.488012282980985e-05,
+ "loss": 1.5944,
+ "step": 18850
+ },
+ {
+ "epoch": 0.541312683256898,
+ "grad_norm": 106.0,
+ "learning_rate": 9.458485886382427e-05,
+ "loss": 2.4606,
+ "step": 18900
+ },
+ {
+ "epoch": 0.5427447273924982,
+ "grad_norm": 92.5,
+ "learning_rate": 9.428959489783868e-05,
+ "loss": 2.6863,
+ "step": 18950
+ },
+ {
+ "epoch": 0.5441767715280985,
+ "grad_norm": 207.0,
+ "learning_rate": 9.399433093185308e-05,
+ "loss": 1.9462,
+ "step": 19000
+ },
+ {
+ "epoch": 0.5456088156636988,
+ "grad_norm": 82.5,
+ "learning_rate": 9.369906696586749e-05,
+ "loss": 2.8243,
+ "step": 19050
+ },
+ {
+ "epoch": 0.5470408597992991,
+ "grad_norm": 0.1572265625,
+ "learning_rate": 9.340380299988189e-05,
+ "loss": 1.8158,
+ "step": 19100
+ },
+ {
+ "epoch": 0.5484729039348992,
+ "grad_norm": 0.4609375,
+ "learning_rate": 9.31085390338963e-05,
+ "loss": 2.7108,
+ "step": 19150
+ },
+ {
+ "epoch": 0.5499049480704995,
+ "grad_norm": 0.62109375,
+ "learning_rate": 9.281327506791072e-05,
+ "loss": 1.9835,
+ "step": 19200
+ },
+ {
+ "epoch": 0.5513369922060998,
+ "grad_norm": 0.039794921875,
+ "learning_rate": 9.251801110192512e-05,
+ "loss": 2.0608,
+ "step": 19250
+ },
+ {
+ "epoch": 0.5527690363417,
+ "grad_norm": 1.265625,
+ "learning_rate": 9.222274713593954e-05,
+ "loss": 1.9135,
+ "step": 19300
+ },
+ {
+ "epoch": 0.5542010804773003,
+ "grad_norm": 0.030517578125,
+ "learning_rate": 9.192748316995395e-05,
+ "loss": 2.5224,
+ "step": 19350
+ },
+ {
+ "epoch": 0.5556331246129006,
+ "grad_norm": 332.0,
+ "learning_rate": 9.163221920396835e-05,
+ "loss": 3.2123,
+ "step": 19400
+ },
+ {
+ "epoch": 0.5570651687485009,
+ "grad_norm": 0.046142578125,
+ "learning_rate": 9.133695523798276e-05,
+ "loss": 1.9807,
+ "step": 19450
+ },
+ {
+ "epoch": 0.558497212884101,
+ "grad_norm": 0.036376953125,
+ "learning_rate": 9.104169127199716e-05,
+ "loss": 2.8211,
+ "step": 19500
+ },
+ {
+ "epoch": 0.5599292570197013,
+ "grad_norm": 0.2412109375,
+ "learning_rate": 9.074642730601158e-05,
+ "loss": 2.7913,
+ "step": 19550
+ },
+ {
+ "epoch": 0.5613613011553016,
+ "grad_norm": 0.087890625,
+ "learning_rate": 9.045116334002599e-05,
+ "loss": 1.5528,
+ "step": 19600
+ },
+ {
+ "epoch": 0.5627933452909019,
+ "grad_norm": 0.21484375,
+ "learning_rate": 9.01558993740404e-05,
+ "loss": 2.3024,
+ "step": 19650
+ },
+ {
+ "epoch": 0.5642253894265021,
+ "grad_norm": 0.271484375,
+ "learning_rate": 8.986063540805481e-05,
+ "loss": 1.305,
+ "step": 19700
+ },
+ {
+ "epoch": 0.5656574335621024,
+ "grad_norm": 0.373046875,
+ "learning_rate": 8.956537144206921e-05,
+ "loss": 2.1656,
+ "step": 19750
+ },
+ {
+ "epoch": 0.5670894776977027,
+ "grad_norm": 84.5,
+ "learning_rate": 8.927010747608362e-05,
+ "loss": 1.6671,
+ "step": 19800
+ },
+ {
+ "epoch": 0.568521521833303,
+ "grad_norm": 112.0,
+ "learning_rate": 8.897484351009802e-05,
+ "loss": 2.4715,
+ "step": 19850
+ },
+ {
+ "epoch": 0.5699535659689031,
+ "grad_norm": 0.130859375,
+ "learning_rate": 8.867957954411244e-05,
+ "loss": 1.7577,
+ "step": 19900
+ },
+ {
+ "epoch": 0.5713856101045034,
+ "grad_norm": 0.1357421875,
+ "learning_rate": 8.838431557812685e-05,
+ "loss": 1.6778,
+ "step": 19950
+ },
+ {
+ "epoch": 0.5728176542401037,
+ "grad_norm": 0.07080078125,
+ "learning_rate": 8.808905161214125e-05,
+ "loss": 1.4789,
+ "step": 20000
+ },
+ {
+ "epoch": 0.5728176542401037,
+ "eval_accuracy": 0.949,
+ "eval_loss": 0.2858292758464813,
+ "eval_macro_f1": 0.9484543460104051,
+ "eval_runtime": 172.7421,
+ "eval_samples_per_second": 11.578,
+ "eval_steps_per_second": 11.578,
+ "step": 20000
+ },
+ {
+ "epoch": 0.5742496983757039,
+ "grad_norm": 0.140625,
+ "learning_rate": 8.779378764615567e-05,
+ "loss": 1.6797,
+ "step": 20050
+ },
+ {
+ "epoch": 0.5756817425113042,
+ "grad_norm": 0.115234375,
+ "learning_rate": 8.749852368017008e-05,
+ "loss": 1.5026,
+ "step": 20100
+ },
+ {
+ "epoch": 0.5771137866469045,
+ "grad_norm": 0.02880859375,
+ "learning_rate": 8.720325971418448e-05,
+ "loss": 2.1316,
+ "step": 20150
+ },
+ {
+ "epoch": 0.5785458307825048,
+ "grad_norm": 0.08544921875,
+ "learning_rate": 8.690799574819889e-05,
+ "loss": 1.7517,
+ "step": 20200
+ },
+ {
+ "epoch": 0.5799778749181049,
+ "grad_norm": 82.0,
+ "learning_rate": 8.661273178221331e-05,
+ "loss": 2.4167,
+ "step": 20250
+ },
+ {
+ "epoch": 0.5814099190537052,
+ "grad_norm": 0.138671875,
+ "learning_rate": 8.631746781622771e-05,
+ "loss": 1.8565,
+ "step": 20300
+ },
+ {
+ "epoch": 0.5828419631893055,
+ "grad_norm": 34.25,
+ "learning_rate": 8.602220385024212e-05,
+ "loss": 1.9747,
+ "step": 20350
+ },
+ {
+ "epoch": 0.5842740073249058,
+ "grad_norm": 91.0,
+ "learning_rate": 8.572693988425654e-05,
+ "loss": 2.3284,
+ "step": 20400
+ },
+ {
+ "epoch": 0.585706051460506,
+ "grad_norm": 0.80859375,
+ "learning_rate": 8.543167591827094e-05,
+ "loss": 2.1788,
+ "step": 20450
+ },
+ {
+ "epoch": 0.5871380955961063,
+ "grad_norm": 446.0,
+ "learning_rate": 8.513641195228535e-05,
+ "loss": 1.6187,
+ "step": 20500
+ },
+ {
+ "epoch": 0.5885701397317066,
+ "grad_norm": 43.75,
+ "learning_rate": 8.484114798629975e-05,
+ "loss": 1.6338,
+ "step": 20550
+ },
+ {
+ "epoch": 0.5900021838673067,
+ "grad_norm": 0.91015625,
+ "learning_rate": 8.454588402031417e-05,
+ "loss": 2.424,
+ "step": 20600
+ },
+ {
+ "epoch": 0.591434228002907,
+ "grad_norm": 0.44921875,
+ "learning_rate": 8.425062005432858e-05,
+ "loss": 2.3043,
+ "step": 20650
+ },
+ {
+ "epoch": 0.5928662721385073,
+ "grad_norm": 268.0,
+ "learning_rate": 8.395535608834298e-05,
+ "loss": 2.5707,
+ "step": 20700
+ },
+ {
+ "epoch": 0.5942983162741076,
+ "grad_norm": 4.125,
+ "learning_rate": 8.366009212235739e-05,
+ "loss": 1.9577,
+ "step": 20750
+ },
+ {
+ "epoch": 0.5957303604097078,
+ "grad_norm": 5.0,
+ "learning_rate": 8.33648281563718e-05,
+ "loss": 0.7482,
+ "step": 20800
+ },
+ {
+ "epoch": 0.5971624045453081,
+ "grad_norm": 0.04052734375,
+ "learning_rate": 8.306956419038621e-05,
+ "loss": 1.5055,
+ "step": 20850
+ },
+ {
+ "epoch": 0.5985944486809084,
+ "grad_norm": 1.109375,
+ "learning_rate": 8.277430022440061e-05,
+ "loss": 3.3671,
+ "step": 20900
+ },
+ {
+ "epoch": 0.6000264928165085,
+ "grad_norm": 0.162109375,
+ "learning_rate": 8.247903625841503e-05,
+ "loss": 2.0574,
+ "step": 20950
+ },
+ {
+ "epoch": 0.6014585369521088,
+ "grad_norm": 0.03173828125,
+ "learning_rate": 8.218377229242944e-05,
+ "loss": 2.1942,
+ "step": 21000
+ },
+ {
+ "epoch": 0.6028905810877091,
+ "grad_norm": 0.251953125,
+ "learning_rate": 8.188850832644384e-05,
+ "loss": 1.6319,
+ "step": 21050
+ },
+ {
+ "epoch": 0.6043226252233094,
+ "grad_norm": 86.5,
+ "learning_rate": 8.159324436045825e-05,
+ "loss": 2.1558,
+ "step": 21100
+ },
+ {
+ "epoch": 0.6057546693589096,
+ "grad_norm": 2.984375,
+ "learning_rate": 8.129798039447267e-05,
+ "loss": 2.2353,
+ "step": 21150
+ },
+ {
+ "epoch": 0.6071867134945099,
+ "grad_norm": 0.0908203125,
+ "learning_rate": 8.100271642848707e-05,
+ "loss": 1.4975,
+ "step": 21200
+ },
+ {
+ "epoch": 0.6086187576301102,
+ "grad_norm": 98.0,
+ "learning_rate": 8.070745246250148e-05,
+ "loss": 2.5975,
+ "step": 21250
+ },
+ {
+ "epoch": 0.6086187576301102,
+ "eval_accuracy": 0.948,
+ "eval_loss": 0.2741381525993347,
+ "eval_macro_f1": 0.9473973559594594,
+ "eval_runtime": 172.6111,
+ "eval_samples_per_second": 11.587,
+ "eval_steps_per_second": 11.587,
+ "step": 21250
+ },
+ {
+ "epoch": 0.6100508017657105,
+ "grad_norm": 328.0,
+ "learning_rate": 8.04121884965159e-05,
+ "loss": 2.4534,
+ "step": 21300
+ },
+ {
+ "epoch": 0.6114828459013106,
+ "grad_norm": 11.125,
+ "learning_rate": 8.01169245305303e-05,
+ "loss": 2.1319,
+ "step": 21350
+ },
+ {
+ "epoch": 0.6129148900369109,
+ "grad_norm": 0.08642578125,
+ "learning_rate": 7.982166056454471e-05,
+ "loss": 2.4199,
+ "step": 21400
+ },
+ {
+ "epoch": 0.6143469341725112,
+ "grad_norm": 0.18359375,
+ "learning_rate": 7.952639659855911e-05,
+ "loss": 1.7527,
+ "step": 21450
+ },
+ {
+ "epoch": 0.6157789783081115,
+ "grad_norm": 0.458984375,
+ "learning_rate": 7.923113263257352e-05,
+ "loss": 2.4992,
+ "step": 21500
+ },
+ {
+ "epoch": 0.6172110224437117,
+ "grad_norm": 0.671875,
+ "learning_rate": 7.893586866658794e-05,
+ "loss": 2.5082,
+ "step": 21550
+ },
+ {
+ "epoch": 0.618643066579312,
+ "grad_norm": 272.0,
+ "learning_rate": 7.864060470060234e-05,
+ "loss": 2.2187,
+ "step": 21600
+ },
+ {
+ "epoch": 0.6200751107149123,
+ "grad_norm": 310.0,
+ "learning_rate": 7.834534073461675e-05,
+ "loss": 3.006,
+ "step": 21650
+ },
+ {
+ "epoch": 0.6215071548505124,
+ "grad_norm": 536.0,
+ "learning_rate": 7.805007676863117e-05,
+ "loss": 2.4535,
+ "step": 21700
+ },
+ {
+ "epoch": 0.6229391989861127,
+ "grad_norm": 408.0,
+ "learning_rate": 7.775481280264557e-05,
+ "loss": 2.376,
+ "step": 21750
+ },
+ {
+ "epoch": 0.624371243121713,
+ "grad_norm": 0.123046875,
+ "learning_rate": 7.745954883665998e-05,
+ "loss": 1.3044,
+ "step": 21800
+ },
+ {
+ "epoch": 0.6258032872573133,
+ "grad_norm": 0.70703125,
+ "learning_rate": 7.716428487067438e-05,
+ "loss": 1.9046,
+ "step": 21850
+ },
+ {
+ "epoch": 0.6272353313929135,
+ "grad_norm": 0.189453125,
+ "learning_rate": 7.68690209046888e-05,
+ "loss": 1.8825,
+ "step": 21900
+ },
+ {
+ "epoch": 0.6286673755285138,
+ "grad_norm": 0.251953125,
+ "learning_rate": 7.65737569387032e-05,
+ "loss": 2.353,
+ "step": 21950
+ },
+ {
+ "epoch": 0.6300994196641141,
+ "grad_norm": 0.0301513671875,
+ "learning_rate": 7.627849297271761e-05,
+ "loss": 1.7222,
+ "step": 22000
+ },
+ {
+ "epoch": 0.6315314637997143,
+ "grad_norm": 0.126953125,
+ "learning_rate": 7.598322900673203e-05,
+ "loss": 2.4583,
+ "step": 22050
+ },
+ {
+ "epoch": 0.6329635079353145,
+ "grad_norm": 1.0234375,
+ "learning_rate": 7.568796504074643e-05,
+ "loss": 1.9643,
+ "step": 22100
+ },
+ {
+ "epoch": 0.6343955520709148,
+ "grad_norm": 70.5,
+ "learning_rate": 7.539270107476084e-05,
+ "loss": 1.6712,
+ "step": 22150
+ },
+ {
+ "epoch": 0.6358275962065151,
+ "grad_norm": 8.0,
+ "learning_rate": 7.509743710877524e-05,
+ "loss": 2.1964,
+ "step": 22200
+ },
+ {
+ "epoch": 0.6372596403421154,
+ "grad_norm": 11.625,
+ "learning_rate": 7.480217314278965e-05,
+ "loss": 2.0319,
+ "step": 22250
+ },
+ {
+ "epoch": 0.6386916844777156,
+ "grad_norm": 0.09033203125,
+ "learning_rate": 7.450690917680407e-05,
+ "loss": 3.1062,
+ "step": 22300
+ },
+ {
+ "epoch": 0.6401237286133159,
+ "grad_norm": 490.0,
+ "learning_rate": 7.421164521081847e-05,
+ "loss": 2.028,
+ "step": 22350
+ },
+ {
+ "epoch": 0.6415557727489162,
+ "grad_norm": 688.0,
+ "learning_rate": 7.391638124483289e-05,
+ "loss": 1.6743,
+ "step": 22400
+ },
+ {
+ "epoch": 0.6429878168845163,
+ "grad_norm": 0.2578125,
+ "learning_rate": 7.36211172788473e-05,
+ "loss": 1.3926,
+ "step": 22450
+ },
+ {
+ "epoch": 0.6444198610201166,
+ "grad_norm": 278.0,
+ "learning_rate": 7.33258533128617e-05,
+ "loss": 2.073,
+ "step": 22500
+ },
+ {
+ "epoch": 0.6444198610201166,
+ "eval_accuracy": 0.9495,
+ "eval_loss": 0.2617259919643402,
+ "eval_macro_f1": 0.9489699460568645,
+ "eval_runtime": 172.6662,
+ "eval_samples_per_second": 11.583,
+ "eval_steps_per_second": 11.583,
+ "step": 22500
+ },
+ {
+ "epoch": 0.6458519051557169,
+ "grad_norm": 0.07373046875,
+ "learning_rate": 7.303058934687611e-05,
+ "loss": 2.099,
+ "step": 22550
+ },
+ {
+ "epoch": 0.6472839492913172,
+ "grad_norm": 0.5859375,
+ "learning_rate": 7.273532538089051e-05,
+ "loss": 2.2826,
+ "step": 22600
+ },
+ {
+ "epoch": 0.6487159934269174,
+ "grad_norm": 79.5,
+ "learning_rate": 7.244006141490493e-05,
+ "loss": 1.377,
+ "step": 22650
+ },
+ {
+ "epoch": 0.6501480375625177,
+ "grad_norm": 2.265625,
+ "learning_rate": 7.214479744891934e-05,
+ "loss": 1.9826,
+ "step": 22700
+ },
+ {
+ "epoch": 0.651580081698118,
+ "grad_norm": 0.37109375,
+ "learning_rate": 7.184953348293376e-05,
+ "loss": 2.2446,
+ "step": 22750
+ },
+ {
+ "epoch": 0.6530121258337181,
+ "grad_norm": 4.25,
+ "learning_rate": 7.155426951694816e-05,
+ "loss": 2.0254,
+ "step": 22800
+ },
+ {
+ "epoch": 0.6544441699693184,
+ "grad_norm": 0.03564453125,
+ "learning_rate": 7.125900555096257e-05,
+ "loss": 2.0871,
+ "step": 22850
+ },
+ {
+ "epoch": 0.6558762141049187,
+ "grad_norm": 0.390625,
+ "learning_rate": 7.096374158497697e-05,
+ "loss": 2.9276,
+ "step": 22900
+ },
+ {
+ "epoch": 0.657308258240519,
+ "grad_norm": 0.384765625,
+ "learning_rate": 7.066847761899138e-05,
+ "loss": 1.0622,
+ "step": 22950
+ },
+ {
+ "epoch": 0.6587403023761192,
+ "grad_norm": 4576.0,
+ "learning_rate": 7.037321365300578e-05,
+ "loss": 3.0808,
+ "step": 23000
+ },
+ {
+ "epoch": 0.6601723465117195,
+ "grad_norm": 372.0,
+ "learning_rate": 7.00779496870202e-05,
+ "loss": 1.8306,
+ "step": 23050
+ },
+ {
+ "epoch": 0.6616043906473198,
+ "grad_norm": 0.208984375,
+ "learning_rate": 6.978268572103462e-05,
+ "loss": 2.1282,
+ "step": 23100
+ },
+ {
+ "epoch": 0.66303643478292,
+ "grad_norm": 5.65625,
+ "learning_rate": 6.948742175504902e-05,
+ "loss": 1.8392,
+ "step": 23150
+ },
+ {
+ "epoch": 0.6644684789185202,
+ "grad_norm": 0.24609375,
+ "learning_rate": 6.919215778906343e-05,
+ "loss": 2.594,
+ "step": 23200
+ },
+ {
+ "epoch": 0.6659005230541205,
+ "grad_norm": 0.123046875,
+ "learning_rate": 6.889689382307783e-05,
+ "loss": 2.4234,
+ "step": 23250
+ },
+ {
+ "epoch": 0.6673325671897208,
+ "grad_norm": 268.0,
+ "learning_rate": 6.860162985709224e-05,
+ "loss": 2.3424,
+ "step": 23300
+ },
+ {
+ "epoch": 0.6687646113253211,
+ "grad_norm": 0.427734375,
+ "learning_rate": 6.830636589110664e-05,
+ "loss": 2.3216,
+ "step": 23350
+ },
+ {
+ "epoch": 0.6701966554609213,
+ "grad_norm": 0.9296875,
+ "learning_rate": 6.801110192512106e-05,
+ "loss": 2.4566,
+ "step": 23400
+ },
+ {
+ "epoch": 0.6716286995965216,
+ "grad_norm": 88.5,
+ "learning_rate": 6.771583795913548e-05,
+ "loss": 1.3767,
+ "step": 23450
+ },
+ {
+ "epoch": 0.6730607437321218,
+ "grad_norm": 488.0,
+ "learning_rate": 6.742057399314989e-05,
+ "loss": 2.343,
+ "step": 23500
+ },
+ {
+ "epoch": 0.674492787867722,
+ "grad_norm": 296.0,
+ "learning_rate": 6.712531002716429e-05,
+ "loss": 1.4841,
+ "step": 23550
+ },
+ {
+ "epoch": 0.6759248320033223,
+ "grad_norm": 218.0,
+ "learning_rate": 6.68300460611787e-05,
+ "loss": 2.4037,
+ "step": 23600
+ },
+ {
+ "epoch": 0.6773568761389226,
+ "grad_norm": 0.466796875,
+ "learning_rate": 6.65347820951931e-05,
+ "loss": 1.4982,
+ "step": 23650
+ },
+ {
+ "epoch": 0.6787889202745229,
+ "grad_norm": 49.25,
+ "learning_rate": 6.623951812920751e-05,
+ "loss": 2.2085,
+ "step": 23700
+ },
+ {
+ "epoch": 0.6802209644101231,
+ "grad_norm": 0.30078125,
+ "learning_rate": 6.594425416322191e-05,
+ "loss": 1.7055,
+ "step": 23750
+ },
+ {
+ "epoch": 0.6802209644101231,
+ "eval_accuracy": 0.9505,
+ "eval_loss": 0.26270824670791626,
+ "eval_macro_f1": 0.9498080478089564,
+ "eval_runtime": 172.6664,
+ "eval_samples_per_second": 11.583,
+ "eval_steps_per_second": 11.583,
+ "step": 23750
+ },
+ {
+ "epoch": 0.6816530085457234,
+ "grad_norm": 95.5,
+ "learning_rate": 6.564899019723633e-05,
+ "loss": 2.5024,
+ "step": 23800
+ },
+ {
+ "epoch": 0.6830850526813237,
+ "grad_norm": 0.30078125,
+ "learning_rate": 6.535372623125075e-05,
+ "loss": 2.2518,
+ "step": 23850
+ },
+ {
+ "epoch": 0.6845170968169239,
+ "grad_norm": 116.5,
+ "learning_rate": 6.505846226526516e-05,
+ "loss": 2.0539,
+ "step": 23900
+ },
+ {
+ "epoch": 0.6859491409525241,
+ "grad_norm": 264.0,
+ "learning_rate": 6.476319829927956e-05,
+ "loss": 2.5857,
+ "step": 23950
+ },
+ {
+ "epoch": 0.6873811850881244,
+ "grad_norm": 302.0,
+ "learning_rate": 6.446793433329397e-05,
+ "loss": 2.1408,
+ "step": 24000
+ },
+ {
+ "epoch": 0.6888132292237247,
+ "grad_norm": 0.5390625,
+ "learning_rate": 6.417267036730837e-05,
+ "loss": 1.9618,
+ "step": 24050
+ },
+ {
+ "epoch": 0.690245273359325,
+ "grad_norm": 164.0,
+ "learning_rate": 6.387740640132278e-05,
+ "loss": 2.0112,
+ "step": 24100
+ },
+ {
+ "epoch": 0.6916773174949252,
+ "grad_norm": 14.0625,
+ "learning_rate": 6.35821424353372e-05,
+ "loss": 2.7256,
+ "step": 24150
+ },
+ {
+ "epoch": 0.6931093616305255,
+ "grad_norm": 164.0,
+ "learning_rate": 6.328687846935161e-05,
+ "loss": 0.8362,
+ "step": 24200
+ },
+ {
+ "epoch": 0.6945414057661257,
+ "grad_norm": 0.271484375,
+ "learning_rate": 6.299161450336602e-05,
+ "loss": 2.2874,
+ "step": 24250
+ },
+ {
+ "epoch": 0.6959734499017259,
+ "grad_norm": 110.5,
+ "learning_rate": 6.269635053738042e-05,
+ "loss": 1.5674,
+ "step": 24300
+ },
+ {
+ "epoch": 0.6974054940373262,
+ "grad_norm": 0.1630859375,
+ "learning_rate": 6.240108657139483e-05,
+ "loss": 2.5817,
+ "step": 24350
+ },
+ {
+ "epoch": 0.6988375381729265,
+ "grad_norm": 0.99609375,
+ "learning_rate": 6.210582260540923e-05,
+ "loss": 1.2537,
+ "step": 24400
+ },
+ {
+ "epoch": 0.7002695823085268,
+ "grad_norm": 266.0,
+ "learning_rate": 6.181055863942364e-05,
+ "loss": 2.4499,
+ "step": 24450
+ },
+ {
+ "epoch": 0.701701626444127,
+ "grad_norm": 1.59375,
+ "learning_rate": 6.151529467343806e-05,
+ "loss": 2.8047,
+ "step": 24500
+ },
+ {
+ "epoch": 0.7031336705797273,
+ "grad_norm": 0.08447265625,
+ "learning_rate": 6.122003070745246e-05,
+ "loss": 1.6917,
+ "step": 24550
+ },
+ {
+ "epoch": 0.7045657147153275,
+ "grad_norm": 296.0,
+ "learning_rate": 6.0924766741466875e-05,
+ "loss": 2.2486,
+ "step": 24600
+ },
+ {
+ "epoch": 0.7059977588509277,
+ "grad_norm": 183.0,
+ "learning_rate": 6.062950277548128e-05,
+ "loss": 2.5183,
+ "step": 24650
+ },
+ {
+ "epoch": 0.707429802986528,
+ "grad_norm": 0.123046875,
+ "learning_rate": 6.033423880949569e-05,
+ "loss": 2.2984,
+ "step": 24700
+ },
+ {
+ "epoch": 0.7088618471221283,
+ "grad_norm": 0.37890625,
+ "learning_rate": 6.00389748435101e-05,
+ "loss": 2.4598,
+ "step": 24750
+ },
+ {
+ "epoch": 0.7102938912577286,
+ "grad_norm": 6.25,
+ "learning_rate": 5.97437108775245e-05,
+ "loss": 2.0554,
+ "step": 24800
+ },
+ {
+ "epoch": 0.7117259353933288,
+ "grad_norm": 1.5234375,
+ "learning_rate": 5.9448446911538915e-05,
+ "loss": 1.3688,
+ "step": 24850
+ },
+ {
+ "epoch": 0.7131579795289291,
+ "grad_norm": 83.5,
+ "learning_rate": 5.9153182945553334e-05,
+ "loss": 2.6434,
+ "step": 24900
+ },
+ {
+ "epoch": 0.7145900236645294,
+ "grad_norm": 0.8046875,
+ "learning_rate": 5.885791897956774e-05,
+ "loss": 1.1703,
+ "step": 24950
+ },
+ {
+ "epoch": 0.7160220678001296,
+ "grad_norm": 0.76953125,
+ "learning_rate": 5.8562655013582144e-05,
+ "loss": 1.7433,
+ "step": 25000
+ },
+ {
+ "epoch": 0.7160220678001296,
+ "eval_accuracy": 0.9475,
+ "eval_loss": 0.2805185317993164,
+ "eval_macro_f1": 0.9469725724830536,
+ "eval_runtime": 172.6365,
+ "eval_samples_per_second": 11.585,
+ "eval_steps_per_second": 11.585,
+ "step": 25000
+ },
+ {
+ "epoch": 0.7174541119357298,
+ "grad_norm": 268.0,
+ "learning_rate": 5.8267391047596556e-05,
+ "loss": 2.7963,
+ "step": 25050
3718
+ },
3719
+ {
3720
+ "epoch": 0.7188861560713301,
3721
+ "grad_norm": 294.0,
3722
+ "learning_rate": 5.797212708161096e-05,
3723
+ "loss": 2.3253,
3724
+ "step": 25100
3725
+ },
3726
+ {
3727
+ "epoch": 0.7203182002069304,
3728
+ "grad_norm": 0.11181640625,
3729
+ "learning_rate": 5.7676863115625366e-05,
3730
+ "loss": 1.0165,
3731
+ "step": 25150
3732
+ },
3733
+ {
3734
+ "epoch": 0.7217502443425307,
3735
+ "grad_norm": 756.0,
3736
+ "learning_rate": 5.738159914963978e-05,
3737
+ "loss": 1.4844,
3738
+ "step": 25200
3739
+ },
3740
+ {
3741
+ "epoch": 0.7231822884781309,
3742
+ "grad_norm": 0.10107421875,
3743
+ "learning_rate": 5.70863351836542e-05,
3744
+ "loss": 2.7171,
3745
+ "step": 25250
3746
+ },
3747
+ {
3748
+ "epoch": 0.7246143326137312,
3749
+ "grad_norm": 336.0,
3750
+ "learning_rate": 5.67910712176686e-05,
3751
+ "loss": 3.1605,
3752
+ "step": 25300
3753
+ },
3754
+ {
3755
+ "epoch": 0.7260463767493314,
3756
+ "grad_norm": 177.0,
3757
+ "learning_rate": 5.649580725168301e-05,
3758
+ "loss": 1.9816,
3759
+ "step": 25350
3760
+ },
3761
+ {
3762
+ "epoch": 0.7274784208849316,
3763
+ "grad_norm": 1.78125,
3764
+ "learning_rate": 5.620054328569741e-05,
3765
+ "loss": 1.8129,
3766
+ "step": 25400
3767
+ },
3768
+ {
3769
+ "epoch": 0.7289104650205319,
3770
+ "grad_norm": 0.040771484375,
3771
+ "learning_rate": 5.5905279319711824e-05,
3772
+ "loss": 1.3484,
3773
+ "step": 25450
3774
+ },
3775
+ {
3776
+ "epoch": 0.7303425091561322,
3777
+ "grad_norm": 8.375,
3778
+ "learning_rate": 5.561001535372623e-05,
3779
+ "loss": 2.1354,
3780
+ "step": 25500
3781
+ },
3782
+ {
3783
+ "epoch": 0.7317745532917325,
3784
+ "grad_norm": 0.0308837890625,
3785
+ "learning_rate": 5.5314751387740635e-05,
3786
+ "loss": 1.747,
3787
+ "step": 25550
3788
+ },
3789
+ {
3790
+ "epoch": 0.7332065974273327,
3791
+ "grad_norm": 808.0,
3792
+ "learning_rate": 5.5019487421755053e-05,
3793
+ "loss": 2.6803,
3794
+ "step": 25600
3795
+ },
3796
+ {
3797
+ "epoch": 0.734638641562933,
3798
+ "grad_norm": 0.96484375,
3799
+ "learning_rate": 5.4724223455769465e-05,
3800
+ "loss": 2.2422,
3801
+ "step": 25650
3802
+ },
3803
+ {
3804
+ "epoch": 0.7360706856985332,
3805
+ "grad_norm": 10.0,
3806
+ "learning_rate": 5.442895948978387e-05,
3807
+ "loss": 2.0731,
3808
+ "step": 25700
3809
+ },
3810
+ {
3811
+ "epoch": 0.7375027298341335,
3812
+ "grad_norm": 0.27734375,
3813
+ "learning_rate": 5.4133695523798276e-05,
3814
+ "loss": 2.9622,
3815
+ "step": 25750
3816
+ },
3817
+ {
3818
+ "epoch": 0.7389347739697337,
3819
+ "grad_norm": 200.0,
3820
+ "learning_rate": 5.383843155781269e-05,
3821
+ "loss": 2.179,
3822
+ "step": 25800
3823
+ },
3824
+ {
3825
+ "epoch": 0.740366818105334,
3826
+ "grad_norm": 118.5,
3827
+ "learning_rate": 5.354316759182709e-05,
3828
+ "loss": 2.4152,
3829
+ "step": 25850
3830
+ },
3831
+ {
3832
+ "epoch": 0.7417988622409343,
3833
+ "grad_norm": 0.2470703125,
3834
+ "learning_rate": 5.32479036258415e-05,
3835
+ "loss": 1.4274,
3836
+ "step": 25900
3837
+ },
3838
+ {
3839
+ "epoch": 0.7432309063765346,
3840
+ "grad_norm": 6.71875,
3841
+ "learning_rate": 5.295263965985592e-05,
3842
+ "loss": 2.0263,
3843
+ "step": 25950
3844
+ },
3845
+ {
3846
+ "epoch": 0.7446629505121348,
3847
+ "grad_norm": 0.28125,
3848
+ "learning_rate": 5.265737569387033e-05,
3849
+ "loss": 1.8231,
3850
+ "step": 26000
3851
+ },
3852
+ {
3853
+ "epoch": 0.746094994647735,
3854
+ "grad_norm": 0.8046875,
3855
+ "learning_rate": 5.2362111727884734e-05,
3856
+ "loss": 1.7974,
3857
+ "step": 26050
3858
+ },
3859
+ {
3860
+ "epoch": 0.7475270387833353,
3861
+ "grad_norm": 102.5,
3862
+ "learning_rate": 5.206684776189914e-05,
3863
+ "loss": 2.5667,
3864
+ "step": 26100
3865
+ },
3866
+ {
3867
+ "epoch": 0.7489590829189355,
3868
+ "grad_norm": 380.0,
3869
+ "learning_rate": 5.1771583795913544e-05,
3870
+ "loss": 1.8334,
3871
+ "step": 26150
3872
+ },
3873
+ {
3874
+ "epoch": 0.7503911270545358,
3875
+ "grad_norm": 0.74609375,
3876
+ "learning_rate": 5.1476319829927956e-05,
3877
+ "loss": 1.2929,
3878
+ "step": 26200
3879
+ },
3880
+ {
3881
+ "epoch": 0.7518231711901361,
3882
+ "grad_norm": 7.3125,
3883
+ "learning_rate": 5.118105586394236e-05,
3884
+ "loss": 2.2943,
3885
+ "step": 26250
3886
+ },
3887
+ {
3888
+ "epoch": 0.7518231711901361,
3889
+ "eval_accuracy": 0.951,
3890
+ "eval_loss": 0.2633407413959503,
3891
+ "eval_macro_f1": 0.9502699810655684,
3892
+ "eval_runtime": 172.7789,
3893
+ "eval_samples_per_second": 11.575,
3894
+ "eval_steps_per_second": 11.575,
3895
+ "step": 26250
3896
+ }
3897
+ ],
3898
+ "logging_steps": 50,
3899
+ "max_steps": 34916,
3900
+ "num_input_tokens_seen": 0,
3901
+ "num_train_epochs": 1,
3902
+ "save_steps": 1250,
3903
+ "stateful_callbacks": {
3904
+ "TrainerControl": {
3905
+ "args": {
3906
+ "should_epoch_stop": false,
3907
+ "should_evaluate": false,
3908
+ "should_log": false,
3909
+ "should_save": true,
3910
+ "should_training_stop": false
3911
+ },
3912
+ "attributes": {}
3913
+ }
3914
+ },
3915
+ "total_flos": 2.1179332952064e+18,
3916
+ "train_batch_size": 1,
3917
+ "trial_name": null,
3918
+ "trial_params": null
3919
+ }
Topic/Strategy/checkpoint-26250/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c312fa9ef2c2fd587c80c33d22bca8b57769510fc82b4a9d1b7772152d2b2bb4
+ size 5496
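The three lines above are a Git LFS pointer, not the binary itself: Git stores only the `version`/`oid`/`size` stub and fetches the real `training_args.bin` from LFS storage. A tiny sketch of parsing a pointer of this shape (the pointer text is copied from the diff above):

```python
def parse_lfs_pointer(text):
    """Split an LFS pointer file into its key/value fields."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c312fa9ef2c2fd587c80c33d22bca8b57769510fc82b4a9d1b7772152d2b2bb4\n"
    "size 5496\n"
)
print(ptr["size"])  # the real training_args.bin is 5496 bytes
```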
Topic/Strategy/checkpoint-26250/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Topic/Willingness/checkpoint-2500/README.md ADDED
@@ -0,0 +1,206 @@
+ ---
+ base_model: D:\LLM\Qwen2.5-7B-Instruct
+ library_name: peft
+ tags:
+ - base_model:adapter:D:\LLM\Qwen2.5-7B-Instruct
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
Topic/Willingness/checkpoint-2500/adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "D:\\LLM\\Qwen2.5-7B-Instruct",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": [
+ "classifier",
+ "score"
+ ],
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "up_proj",
+ "gate_proj",
+ "down_proj",
+ "o_proj",
+ "k_proj",
+ "v_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "SEQ_CLS",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
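The `adapter_config.json` above fixes the LoRA geometry: rank 16, alpha 32, all seven linear projections in each transformer block as targets, and a `SEQ_CLS` head kept trainable via `modules_to_save`. A small sanity-check sketch of the quantities this implies (values copied from the config; with `use_rslora` false, PEFT scales each LoRA update by `lora_alpha / r`):

```python
import json

# Fields copied from the adapter_config.json above (subset for illustration).
cfg = json.loads("""{
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "task_type": "SEQ_CLS",
  "target_modules": ["q_proj", "up_proj", "gate_proj", "down_proj",
                     "o_proj", "k_proj", "v_proj"]
}""")

# With use_rslora false, each LoRA delta is scaled by lora_alpha / r.
scaling = cfg["lora_alpha"] / cfg["r"]
print(scaling)                     # 2.0
print(len(cfg["target_modules"]))  # 7: every linear projection per block
```

Actually attaching the adapter would go through `peft.PeftModel.from_pretrained(base, checkpoint_dir)`, where `base` is Qwen2.5-7B-Instruct loaded as a sequence-classification model so the `SEQ_CLS` task type and saved `score` head line up.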
Topic/Willingness/checkpoint-2500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43890d6d5f36ad8aca965ad57a653b2ad3794ab1fac0fd3f5665a6ab580ce6e8
+ size 80800144
Topic/Willingness/checkpoint-2500/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
Topic/Willingness/checkpoint-2500/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
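The Jinja template above is what `tokenizer.apply_chat_template` renders at inference time. For a plain no-tools exchange its output can be mimicked with string formatting; this stand-in (a sketch of the no-tools branch only, not the real renderer) makes the expected ChatML layout explicit:

```python
def render(messages, add_generation_prompt=True):
    """Tiny stand-in for the no-tools branch of the Jinja template above."""
    out = ""
    # Default system turn is injected only when the caller didn't supply one.
    if messages[0]["role"] != "system":
        out += ("<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. "
                "You are a helpful assistant.<|im_end|>\n")
    # Every message becomes an <|im_start|>role ... <|im_end|> block.
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # The generation prompt opens an unterminated assistant turn.
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

text = render([{"role": "user", "content": "Hi"}])
print(text)
```

The classifier checkpoints in this commit use the `<|im_start|>`/`<|im_end|>` special tokens registered in `added_tokens.json`, so prompts fed to them should follow this same layout.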
Topic/Willingness/checkpoint-2500/merges.txt ADDED
The diff for this file is too large to render. See raw diff