vishnuOI committed on
Commit 7fc60f6 · verified · 1 Parent(s): 4c3642f

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,209 @@
+ ---
+ base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen3-Coder-30B-A3B-Instruct
+ - lora
+ - sft
+ - transformers
+ - trl
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
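The card leaves this section blank. As a placeholder, here is a minimal sketch of loading a PEFT LoRA adapter on top of the stated base model; the adapter repo id is left as a parameter because the card does not name one, and the `load_adapter` helper is purely illustrative:

```python
def load_adapter(adapter_id: str):
    """Load the Qwen3-Coder base model and attach a LoRA adapter.

    Imports are deferred so the sketch can be read without
    transformers/peft installed; `adapter_id` is the Hub repo id of
    this adapter (not stated in the card, so passed in by the caller).
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    return model, tokenizer
```

Usage would be `model, tok = load_adapter("<user>/<adapter-repo>")`, after which generation proceeds as with any causal LM.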
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+ ### Framework versions
+
+ - PEFT 0.18.1
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.1",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "o_proj",
+     "q_proj",
+     "gate_proj",
+     "v_proj",
+     "k_proj",
+     "up_proj",
+     "down_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
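The config above specifies `r: 16` and `lora_alpha: 32`, i.e. a scaling factor of alpha/r = 2 applied to the low-rank update on every listed projection. A tiny pure-Python sketch of the LoRA update rule this describes (the 2×2 matrices here are toy values, not the adapter's real weights):

```python
# LoRA merges the adapter into a frozen weight as:
#   W_eff = W + (lora_alpha / r) * (B @ A)
# where B is (out, r) and A is (r, in). Values below are toys.
r, lora_alpha = 16, 32          # from adapter_config.json
scaling = lora_alpha / r        # = 2.0

def matmul(X, Y):
    """Plain-Python matrix multiply, enough for the toy example."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W = [[1.0, 0.0], [0.0, 1.0]]    # frozen base weight (toy identity)
B = [[0.5], [0.0]]              # low-rank factor, rank 1 for the toy
A = [[0.0, 1.0]]

delta = matmul(B, A)            # [[0.0, 0.5], [0.0, 0.0]]
W_eff = [[W[i][j] + scaling * delta[i][j] for j in range(2)]
         for i in range(2)]     # [[1.0, 1.0], [0.0, 1.0]]
```

With `use_rslora: false`, the scaling is exactly alpha/r; rsLoRA would use alpha/sqrt(r) instead.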
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ec62876de622d14dedb4da963455955477c39c9268502906c582db292796dfc
+ size 53528920
chat_template.jinja ADDED
@@ -0,0 +1,117 @@
+ {% macro render_extra_keys(json_dict, handled_keys) %}
+ {%- if json_dict is mapping %}
+ {%- for json_key in json_dict if json_key not in handled_keys %}
+ {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
+ {%- else %}
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
+ {%- endif %}
+ {%- endfor %}
+ {%- endif %}
+ {% endmacro %}
+
+ {%- if messages[0]["role"] == "system" %}
+ {%- set system_message = messages[0]["content"] %}
+ {%- set loop_messages = messages[1:] %}
+ {%- else %}
+ {%- set loop_messages = messages %}
+ {%- endif %}
+
+ {%- if not tools is defined %}
+ {%- set tools = [] %}
+ {%- endif %}
+
+ {%- if system_message is defined %}
+ {{- "<|im_start|>system\n" + system_message }}
+ {%- else %}
+ {%- if tools is iterable and tools | length > 0 %}
+ {{- "<|im_start|>system\nYou are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
+ {%- endif %}
+ {%- endif %}
+ {%- if tools is iterable and tools | length > 0 %}
+ {{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
+ {{- "<tools>" }}
+ {%- for tool in tools %}
+ {%- if tool.function is defined %}
+ {%- set tool = tool.function %}
+ {%- endif %}
+ {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
+ {%- if tool.description is defined %}
+ {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
+ {%- endif %}
+ {{- '\n<parameters>' }}
+ {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
+ {%- for param_name, param_fields in tool.parameters.properties|items %}
+ {{- '\n<parameter>' }}
+ {{- '\n<name>' ~ param_name ~ '</name>' }}
+ {%- if param_fields.type is defined %}
+ {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
+ {%- endif %}
+ {%- if param_fields.description is defined %}
+ {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
+ {%- endif %}
+ {%- set handled_keys = ['name', 'type', 'description'] %}
+ {{- render_extra_keys(param_fields, handled_keys) }}
+ {{- '\n</parameter>' }}
+ {%- endfor %}
+ {%- endif %}
+ {% set handled_keys = ['type', 'properties'] %}
+ {{- render_extra_keys(tool.parameters, handled_keys) }}
+ {{- '\n</parameters>' }}
+ {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
+ {{- render_extra_keys(tool, handled_keys) }}
+ {{- '\n</function>' }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- endif %}
+ {%- if system_message is defined %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if tools is iterable and tools | length > 0 %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in loop_messages %}
+ {%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
+ {{- '\n' + message.content | trim + '\n' }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- if tool_call.arguments is defined %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user\n' }}
+ {%- endif %}
+ {{- '<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>\n' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
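For plain system/user/assistant turns with no tools, the template above reduces to the familiar ChatML-style `<|im_start|>…<|im_end|>` layout. A minimal pure-Python sketch of that reduced case (an approximation for illustration, not a substitute for rendering the actual Jinja template, which also handles tools and tool responses):

```python
def render_plain(messages, add_generation_prompt=True):
    """Approximate the template's output for tool-free chats:
    each message becomes '<|im_start|>{role}\\n{content}<|im_end|>\\n',
    optionally followed by an open assistant turn."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_plain([
    {"role": "system", "content": "You are Qwen."},
    {"role": "user", "content": "Hi"},
])
```

In practice one would call `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` and let this Jinja file do the work.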
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66efe6d162462dccc7ba946f706e7453625246b79b78e9f8b890eb06e8ce8379
+ size 27614166
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bbe6c9dc1ffcec3fabe9e60a4178bdb8df2583340910df5d99436b9812079d05
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b215a27550f7e6212a372d3c82b0189b243e058c2f2aefe6a4194883d88ccdb6
+ size 1064
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
+ size 11422650
tokenizer_config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": null,
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "is_local": false,
+   "model_max_length": 2048,
+   "pad_token": "<|im_end|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
trainer_state.json ADDED
@@ -0,0 +1,1075 @@
+ {
+   "best_global_step": 1038,
+   "best_metric": 1.032158374786377,
+   "best_model_checkpoint": "C:\\unity_train\\output\\unity-coder-adapter\\checkpoint-1038",
+   "epoch": 1.0,
+   "eval_steps": 500,
+   "global_step": 1038,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [ ... ]
+ }

The `log_history` entries visible in this diff (one per 10 steps; the excerpt cuts off mid-entry after step 480) are tabulated below:

| step | epoch | learning_rate | loss | entropy | grad_norm | mean_token_accuracy | num_tokens |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 10 | 0.0096 | 1.1538e-05 | 2.4208 | 1.0812 | 0.3574 | 0.5672 | 59463 |
| 20 | 0.0193 | 2.4359e-05 | 2.3048 | 1.0597 | 0.2625 | 0.5808 | 122441 |
| 30 | 0.0289 | 3.7179e-05 | 2.1501 | 1.1489 | 0.2173 | 0.5763 | 178300 |
| 40 | 0.0385 | 5.0000e-05 | 1.9430 | 1.3246 | 0.0924 | 0.5952 | 240279 |
| 50 | 0.0482 | 6.2821e-05 | 1.8167 | 1.5482 | 0.0766 | 0.6045 | 302914 |
| 60 | 0.0578 | 7.5641e-05 | 1.7056 | 1.6716 | 0.0951 | 0.6348 | 351888 |
| 70 | 0.0675 | 8.8462e-05 | 1.5318 | 1.5196 | 0.0705 | 0.6589 | 414923 |
| 80 | 0.0771 | 1.0128e-04 | 1.4824 | 1.5119 | 0.0717 | 0.6631 | 468593 |
| 90 | 0.0867 | 1.1410e-04 | 1.4135 | 1.3805 | 0.1078 | 0.6830 | 525914 |
| 100 | 0.0964 | 1.2692e-04 | 1.5127 | 1.4891 | 0.0563 | 0.6583 | 589835 |
| 110 | 0.1060 | 1.3974e-04 | 1.4460 | 1.4281 | 0.0555 | 0.6683 | 648138 |
| 120 | 0.1156 | 1.5256e-04 | 1.4434 | 1.3978 | 0.0547 | 0.6743 | 702219 |
| 130 | 0.1253 | 1.6538e-04 | 1.4284 | 1.4055 | 0.0613 | 0.6733 | 759042 |
| 140 | 0.1349 | 1.7821e-04 | 1.4834 | 1.4527 | 0.0566 | 0.6670 | 822515 |
| 150 | 0.1445 | 1.9103e-04 | 1.2549 | 1.2200 | 0.0600 | 0.7205 | 876506 |
| 160 | 0.1542 | 1.9980e-04 | 1.3840 | 1.3374 | 0.0777 | 0.6906 | 934413 |
| 170 | 0.1638 | 1.9912e-04 | 1.2861 | 1.2752 | 0.0637 | 0.7052 | 992594 |
| 180 | 0.1735 | 1.9844e-04 | 1.3741 | 1.3504 | 0.0728 | 0.6862 | 1048524 |
| 190 | 0.1831 | 1.9777e-04 | 1.2996 | 1.2644 | 0.0714 | 0.7060 | 1109157 |
| 200 | 0.1927 | 1.9709e-04 | 1.3253 | 1.3248 | 0.0821 | 0.7001 | 1170070 |
| 210 | 0.2024 | 1.9642e-04 | 1.3353 | 1.2683 | 0.0552 | 0.7003 | 1229436 |
| 220 | 0.2120 | 1.9574e-04 | 1.3542 | 1.3478 | 0.0649 | 0.6923 | 1287532 |
| 230 | 0.2216 | 1.9506e-04 | 1.2282 | 1.2182 | 0.0641 | 0.7190 | 1344357 |
| 240 | 0.2313 | 1.9439e-04 | 1.3229 | 1.3015 | 0.0572 | 0.6986 | 1405209 |
| 250 | 0.2409 | 1.9371e-04 | 1.2808 | 1.2271 | 0.0576 | 0.7116 | 1457023 |
| 260 | 0.2505 | 1.9304e-04 | 1.3038 | 1.2857 | 0.0627 | 0.7055 | 1513975 |
| 270 | 0.2602 | 1.9236e-04 | 1.2614 | 1.2287 | 0.0609 | 0.7095 | 1579252 |
| 280 | 0.2698 | 1.9168e-04 | 1.2714 | 1.2551 | 0.0687 | 0.7122 | 1637995 |
| 290 | 0.2795 | 1.9101e-04 | 1.3028 | 1.2518 | 0.0706 | 0.7063 | 1703008 |
| 300 | 0.2891 | 1.9033e-04 | 1.3015 | 1.2447 | 0.0518 | 0.7073 | 1766072 |
| 310 | 0.2987 | 1.8966e-04 | 1.3117 | 1.3016 | 0.0565 | 0.7029 | 1820923 |
| 320 | 0.3084 | 1.8898e-04 | 1.2251 | 1.1577 | 0.0547 | 0.7274 | 1877116 |
| 330 | 0.3180 | 1.8830e-04 | 1.2446 | 1.2059 | 0.0575 | 0.7220 | 1935606 |
| 340 | 0.3276 | 1.8763e-04 | 1.3071 | 1.2587 | 0.0580 | 0.7035 | 1997199 |
| 350 | 0.3373 | 1.8695e-04 | 1.2296 | 1.2236 | 0.0999 | 0.7147 | 2048172 |
| 360 | 0.3469 | 1.8627e-04 | 1.2939 | 1.3036 | 0.0732 | 0.6969 | 2109948 |
| 370 | 0.3565 | 1.8560e-04 | 1.2521 | 1.2124 | 0.0632 | 0.7174 | 2171174 |
| 380 | 0.3662 | 1.8492e-04 | 1.2615 | 1.2408 | 0.0709 | 0.7065 | 2228430 |
| 390 | 0.3758 | 1.8425e-04 | 1.2185 | 1.2121 | 0.0513 | 0.7187 | 2285248 |
| 400 | 0.3854 | 1.8357e-04 | 1.3041 | 1.2625 | 0.0569 | 0.7015 | 2345584 |
| 410 | 0.3951 | 1.8289e-04 | 1.2783 | 1.2300 | 0.0673 | 0.7130 | 2402073 |
| 420 | 0.4047 | 1.8222e-04 | 1.2828 | 1.2533 | 0.1018 | 0.7031 | 2461269 |
| 430 | 0.4144 | 1.8154e-04 | 1.1980 | 1.1999 | 0.0675 | 0.7193 | 2520272 |
| 440 | 0.4240 | 1.8087e-04 | 1.1968 | 1.1967 | 0.0478 | 0.7202 | 2574916 |
| 450 | 0.4336 | 1.8019e-04 | 1.2781 | 1.2822 | 0.0556 | 0.6992 | 2630803 |
| 460 | 0.4433 | 1.7951e-04 | 1.1543 | 1.1061 | 0.0857 | 0.7370 | 2686122 |
| 470 | 0.4529 | 1.7884e-04 | 1.2778 | 1.3041 | 0.0528 | 0.7003 | 2751119 |
| 480 | 0.4625 | 1.7816e-04 | 1.2784 | 1.2362 | 0.0593 | 0.7007 | 2809393 |
| (truncated) | 0.4722 | 1.7748e-04 | 1.2346 | 1.2329 | 0.0571 | 0.7168 | 2870246 |
500
+ "step": 490
501
+ },
502
+ {
503
+ "entropy": 1.197921334207058,
504
+ "epoch": 0.481811611659841,
505
+ "grad_norm": 0.051337700337171555,
506
+ "learning_rate": 0.0001768086544962813,
507
+ "loss": 1.2222982406616212,
508
+ "mean_token_accuracy": 0.7162339583039283,
509
+ "num_tokens": 2934671.0,
510
+ "step": 500
511
+ },
512
+ {
513
+ "entropy": 1.2013787925243378,
514
+ "epoch": 0.4914478438930378,
515
+ "grad_norm": 0.06907735019922256,
516
+ "learning_rate": 0.00017613252197430697,
517
+ "loss": 1.2295709609985352,
518
+ "mean_token_accuracy": 0.7151955872774124,
519
+ "num_tokens": 2989014.0,
520
+ "step": 510
521
+ },
522
+ {
523
+ "entropy": 1.1978842347860337,
524
+ "epoch": 0.5010840761262346,
525
+ "grad_norm": 0.06152976304292679,
526
+ "learning_rate": 0.00017545638945233268,
527
+ "loss": 1.2030101776123048,
528
+ "mean_token_accuracy": 0.7203953012824058,
529
+ "num_tokens": 3045999.0,
530
+ "step": 520
531
+ },
532
+ {
533
+ "entropy": 1.3108120203018188,
534
+ "epoch": 0.5107203083594315,
535
+ "grad_norm": 0.05314662307500839,
536
+ "learning_rate": 0.00017478025693035837,
537
+ "loss": 1.3376919746398925,
538
+ "mean_token_accuracy": 0.6901254534721375,
539
+ "num_tokens": 3106021.0,
540
+ "step": 530
541
+ },
542
+ {
543
+ "entropy": 1.269498337060213,
544
+ "epoch": 0.5203565405926283,
545
+ "grad_norm": 0.052769869565963745,
546
+ "learning_rate": 0.00017410412440838405,
547
+ "loss": 1.3347968101501464,
548
+ "mean_token_accuracy": 0.7055287733674049,
549
+ "num_tokens": 3167001.0,
550
+ "step": 540
551
+ },
552
+ {
553
+ "entropy": 1.174779912084341,
554
+ "epoch": 0.5299927728258251,
555
+ "grad_norm": 0.057087887078523636,
556
+ "learning_rate": 0.00017342799188640974,
557
+ "loss": 1.204134464263916,
558
+ "mean_token_accuracy": 0.7242644309997559,
559
+ "num_tokens": 3222794.0,
560
+ "step": 550
561
+ },
562
+ {
563
+ "entropy": 1.1798104658722877,
564
+ "epoch": 0.5396290050590219,
565
+ "grad_norm": 0.06867323815822601,
566
+ "learning_rate": 0.00017275185936443542,
567
+ "loss": 1.218485164642334,
568
+ "mean_token_accuracy": 0.7220648050308227,
569
+ "num_tokens": 3282856.0,
570
+ "step": 560
571
+ },
572
+ {
573
+ "entropy": 1.2600811369717122,
574
+ "epoch": 0.5492652372922188,
575
+ "grad_norm": 0.06002269685268402,
576
+ "learning_rate": 0.00017207572684246113,
577
+ "loss": 1.2951341629028321,
578
+ "mean_token_accuracy": 0.7052147269248963,
579
+ "num_tokens": 3343572.0,
580
+ "step": 570
581
+ },
582
+ {
583
+ "entropy": 1.2493580430746078,
584
+ "epoch": 0.5589014695254155,
585
+ "grad_norm": 0.06532762199640274,
586
+ "learning_rate": 0.00017139959432048682,
587
+ "loss": 1.2813149452209474,
588
+ "mean_token_accuracy": 0.7067271783947945,
589
+ "num_tokens": 3402401.0,
590
+ "step": 580
591
+ },
592
+ {
593
+ "entropy": 1.174679161608219,
594
+ "epoch": 0.5685377017586124,
595
+ "grad_norm": 0.0596516914665699,
596
+ "learning_rate": 0.0001707234617985125,
597
+ "loss": 1.2225926399230957,
598
+ "mean_token_accuracy": 0.7233842894434929,
599
+ "num_tokens": 3459042.0,
600
+ "step": 590
601
+ },
602
+ {
603
+ "entropy": 1.2576898023486138,
604
+ "epoch": 0.5781739339918092,
605
+ "grad_norm": 0.05735331028699875,
606
+ "learning_rate": 0.0001700473292765382,
607
+ "loss": 1.2835253715515136,
608
+ "mean_token_accuracy": 0.7045948460698128,
609
+ "num_tokens": 3527637.0,
610
+ "step": 600
611
+ },
612
+ {
613
+ "entropy": 1.2473611667752267,
614
+ "epoch": 0.587810166225006,
615
+ "grad_norm": 0.06076724827289581,
616
+ "learning_rate": 0.0001693711967545639,
617
+ "loss": 1.2636917114257813,
618
+ "mean_token_accuracy": 0.7045381426811218,
619
+ "num_tokens": 3589404.0,
620
+ "step": 610
621
+ },
622
+ {
623
+ "entropy": 1.2829708829522133,
624
+ "epoch": 0.5974463984582028,
625
+ "grad_norm": 0.05943462252616882,
626
+ "learning_rate": 0.0001686950642325896,
627
+ "loss": 1.2889452934265138,
628
+ "mean_token_accuracy": 0.7001010566949845,
629
+ "num_tokens": 3651124.0,
630
+ "step": 620
631
+ },
632
+ {
633
+ "entropy": 1.2208770334720611,
634
+ "epoch": 0.6070826306913997,
635
+ "grad_norm": 0.04695171117782593,
636
+ "learning_rate": 0.0001680189317106153,
637
+ "loss": 1.259203052520752,
638
+ "mean_token_accuracy": 0.7110297352075576,
639
+ "num_tokens": 3705293.0,
640
+ "step": 630
641
+ },
642
+ {
643
+ "entropy": 1.301590697467327,
644
+ "epoch": 0.6167188629245964,
645
+ "grad_norm": 0.047615595161914825,
646
+ "learning_rate": 0.00016734279918864098,
647
+ "loss": 1.322781467437744,
648
+ "mean_token_accuracy": 0.6952215626835823,
649
+ "num_tokens": 3765429.0,
650
+ "step": 640
651
+ },
652
+ {
653
+ "entropy": 1.2000589437782765,
654
+ "epoch": 0.6263550951577933,
655
+ "grad_norm": 0.057772569358348846,
656
+ "learning_rate": 0.0001666666666666667,
657
+ "loss": 1.2448016166687013,
658
+ "mean_token_accuracy": 0.7192464172840118,
659
+ "num_tokens": 3826736.0,
660
+ "step": 650
661
+ },
662
+ {
663
+ "entropy": 1.187142127752304,
664
+ "epoch": 0.6359913273909901,
665
+ "grad_norm": 0.06310118734836578,
666
+ "learning_rate": 0.00016599053414469237,
667
+ "loss": 1.1989784240722656,
668
+ "mean_token_accuracy": 0.7233487412333488,
669
+ "num_tokens": 3886196.0,
670
+ "step": 660
671
+ },
672
+ {
673
+ "entropy": 1.2190307170152663,
674
+ "epoch": 0.6456275596241869,
675
+ "grad_norm": 0.049476709216833115,
676
+ "learning_rate": 0.00016531440162271806,
677
+ "loss": 1.2834860801696777,
678
+ "mean_token_accuracy": 0.713593752682209,
679
+ "num_tokens": 3941611.0,
680
+ "step": 670
681
+ },
682
+ {
683
+ "entropy": 1.1955443389713765,
684
+ "epoch": 0.6552637918573838,
685
+ "grad_norm": 0.06877182424068451,
686
+ "learning_rate": 0.00016463826910074377,
687
+ "loss": 1.2530131340026855,
688
+ "mean_token_accuracy": 0.7211957752704621,
689
+ "num_tokens": 3998745.0,
690
+ "step": 680
691
+ },
692
+ {
693
+ "entropy": 1.255328443646431,
694
+ "epoch": 0.6649000240905806,
695
+ "grad_norm": 0.06499036401510239,
696
+ "learning_rate": 0.00016396213657876945,
697
+ "loss": 1.2712255477905274,
698
+ "mean_token_accuracy": 0.7075859263539315,
699
+ "num_tokens": 4057693.0,
700
+ "step": 690
701
+ },
702
+ {
703
+ "entropy": 1.269506113231182,
704
+ "epoch": 0.6745362563237775,
705
+ "grad_norm": 0.07255974411964417,
706
+ "learning_rate": 0.00016328600405679514,
707
+ "loss": 1.294202709197998,
708
+ "mean_token_accuracy": 0.7018555745482444,
709
+ "num_tokens": 4113943.0,
710
+ "step": 700
711
+ },
712
+ {
713
+ "entropy": 1.24193774163723,
714
+ "epoch": 0.6841724885569742,
715
+ "grad_norm": 0.07056763768196106,
716
+ "learning_rate": 0.00016260987153482082,
717
+ "loss": 1.2850048065185546,
718
+ "mean_token_accuracy": 0.714261619746685,
719
+ "num_tokens": 4171377.0,
720
+ "step": 710
721
+ },
722
+ {
723
+ "entropy": 1.2012858629226684,
724
+ "epoch": 0.6938087207901711,
725
+ "grad_norm": 0.06263507157564163,
726
+ "learning_rate": 0.0001619337390128465,
727
+ "loss": 1.2244320869445802,
728
+ "mean_token_accuracy": 0.7127483233809471,
729
+ "num_tokens": 4224788.0,
730
+ "step": 720
731
+ },
732
+ {
733
+ "entropy": 1.118230439722538,
734
+ "epoch": 0.7034449530233678,
735
+ "grad_norm": 0.0841294601559639,
736
+ "learning_rate": 0.00016125760649087222,
737
+ "loss": 1.1727646827697753,
738
+ "mean_token_accuracy": 0.7389020159840584,
739
+ "num_tokens": 4273260.0,
740
+ "step": 730
741
+ },
742
+ {
743
+ "entropy": 1.1662761889398099,
744
+ "epoch": 0.7130811852565647,
745
+ "grad_norm": 0.05674638971686363,
746
+ "learning_rate": 0.0001605814739688979,
747
+ "loss": 1.2040279388427735,
748
+ "mean_token_accuracy": 0.7247484371066093,
749
+ "num_tokens": 4330262.0,
750
+ "step": 740
751
+ },
752
+ {
753
+ "entropy": 1.1847384825348855,
754
+ "epoch": 0.7227174174897615,
755
+ "grad_norm": 0.0654911920428276,
756
+ "learning_rate": 0.00015990534144692359,
757
+ "loss": 1.245723056793213,
758
+ "mean_token_accuracy": 0.7201158210635186,
759
+ "num_tokens": 4387940.0,
760
+ "step": 750
761
+ },
762
+ {
763
+ "entropy": 1.2574817538261414,
764
+ "epoch": 0.7323536497229584,
765
+ "grad_norm": 0.05644860863685608,
766
+ "learning_rate": 0.0001592292089249493,
767
+ "loss": 1.2513961791992188,
768
+ "mean_token_accuracy": 0.7070513799786567,
769
+ "num_tokens": 4451167.0,
770
+ "step": 760
771
+ },
772
+ {
773
+ "entropy": 1.1798933163285255,
774
+ "epoch": 0.7419898819561551,
775
+ "grad_norm": 0.04677167534828186,
776
+ "learning_rate": 0.00015855307640297498,
777
+ "loss": 1.2556601524353028,
778
+ "mean_token_accuracy": 0.7209865570068359,
779
+ "num_tokens": 4514662.0,
780
+ "step": 770
781
+ },
782
+ {
783
+ "entropy": 1.2833519145846366,
784
+ "epoch": 0.751626114189352,
785
+ "grad_norm": 0.05460638552904129,
786
+ "learning_rate": 0.0001578769438810007,
787
+ "loss": 1.3391889572143554,
788
+ "mean_token_accuracy": 0.699195285141468,
789
+ "num_tokens": 4577464.0,
790
+ "step": 780
791
+ },
792
+ {
793
+ "entropy": 1.301037323474884,
794
+ "epoch": 0.7612623464225488,
795
+ "grad_norm": 0.08067753911018372,
796
+ "learning_rate": 0.00015720081135902638,
797
+ "loss": 1.3280585289001465,
798
+ "mean_token_accuracy": 0.6965255320072175,
799
+ "num_tokens": 4640394.0,
800
+ "step": 790
801
+ },
802
+ {
803
+ "entropy": 1.1654298216104508,
804
+ "epoch": 0.7708985786557456,
805
+ "grad_norm": 0.06938653439283371,
806
+ "learning_rate": 0.00015652467883705206,
807
+ "loss": 1.2016778945922852,
808
+ "mean_token_accuracy": 0.7259130507707596,
809
+ "num_tokens": 4694747.0,
810
+ "step": 800
811
+ },
812
+ {
813
+ "entropy": 1.2972704201936722,
814
+ "epoch": 0.7805348108889424,
815
+ "grad_norm": 0.05404666066169739,
816
+ "learning_rate": 0.00015584854631507777,
817
+ "loss": 1.2715141296386718,
818
+ "mean_token_accuracy": 0.7007692798972129,
819
+ "num_tokens": 4754920.0,
820
+ "step": 810
821
+ },
822
+ {
823
+ "entropy": 1.191229759156704,
824
+ "epoch": 0.7901710431221393,
825
+ "grad_norm": 0.05804457888007164,
826
+ "learning_rate": 0.00015517241379310346,
827
+ "loss": 1.2225379943847656,
828
+ "mean_token_accuracy": 0.7155083760619163,
829
+ "num_tokens": 4813911.0,
830
+ "step": 820
831
+ },
832
+ {
833
+ "entropy": 1.1537679880857468,
834
+ "epoch": 0.799807275355336,
835
+ "grad_norm": 0.06406034529209137,
836
+ "learning_rate": 0.00015449628127112914,
837
+ "loss": 1.2089889526367188,
838
+ "mean_token_accuracy": 0.7305796332657337,
839
+ "num_tokens": 4867754.0,
840
+ "step": 830
841
+ },
842
+ {
843
+ "entropy": 1.3313012823462487,
844
+ "epoch": 0.8094435075885329,
845
+ "grad_norm": 0.05331671983003616,
846
+ "learning_rate": 0.00015382014874915485,
847
+ "loss": 1.3202786445617676,
848
+ "mean_token_accuracy": 0.6896555438637734,
849
+ "num_tokens": 4932220.0,
850
+ "step": 840
851
+ },
852
+ {
853
+ "entropy": 1.1428007125854491,
854
+ "epoch": 0.8190797398217297,
855
+ "grad_norm": 0.0543409064412117,
856
+ "learning_rate": 0.00015314401622718054,
857
+ "loss": 1.1924286842346192,
858
+ "mean_token_accuracy": 0.7276274234056472,
859
+ "num_tokens": 4991379.0,
860
+ "step": 850
861
+ },
862
+ {
863
+ "entropy": 1.1923618368804454,
864
+ "epoch": 0.8287159720549265,
865
+ "grad_norm": 0.04643765091896057,
866
+ "learning_rate": 0.00015246788370520625,
867
+ "loss": 1.2740966796875,
868
+ "mean_token_accuracy": 0.7193146347999573,
869
+ "num_tokens": 5050844.0,
870
+ "step": 860
871
+ },
872
+ {
873
+ "entropy": 1.2764008730649947,
874
+ "epoch": 0.8383522042881233,
875
+ "grad_norm": 0.06100849807262421,
876
+ "learning_rate": 0.00015179175118323193,
877
+ "loss": 1.2969582557678223,
878
+ "mean_token_accuracy": 0.7002251073718071,
879
+ "num_tokens": 5111937.0,
880
+ "step": 870
881
+ },
882
+ {
883
+ "entropy": 1.1852023541927337,
884
+ "epoch": 0.8479884365213202,
885
+ "grad_norm": 0.08129674941301346,
886
+ "learning_rate": 0.00015111561866125762,
887
+ "loss": 1.215758991241455,
888
+ "mean_token_accuracy": 0.7221846386790276,
889
+ "num_tokens": 5173122.0,
890
+ "step": 880
891
+ },
892
+ {
893
+ "entropy": 1.2100131824612617,
894
+ "epoch": 0.857624668754517,
895
+ "grad_norm": 0.05021384358406067,
896
+ "learning_rate": 0.0001504394861392833,
897
+ "loss": 1.245238971710205,
898
+ "mean_token_accuracy": 0.7174671500921249,
899
+ "num_tokens": 5237440.0,
900
+ "step": 890
901
+ },
902
+ {
903
+ "entropy": 1.2959269508719444,
904
+ "epoch": 0.8672609009877138,
905
+ "grad_norm": 0.06317298859357834,
906
+ "learning_rate": 0.000149763353617309,
907
+ "loss": 1.298589324951172,
908
+ "mean_token_accuracy": 0.696705661714077,
909
+ "num_tokens": 5306311.0,
910
+ "step": 900
911
+ },
912
+ {
913
+ "entropy": 1.178437228500843,
914
+ "epoch": 0.8768971332209107,
915
+ "grad_norm": 0.0590016208589077,
916
+ "learning_rate": 0.00014908722109533467,
917
+ "loss": 1.2140485763549804,
918
+ "mean_token_accuracy": 0.7231360018253327,
919
+ "num_tokens": 5365204.0,
920
+ "step": 910
921
+ },
922
+ {
923
+ "entropy": 1.1817002773284913,
924
+ "epoch": 0.8865333654541074,
925
+ "grad_norm": 0.06499218195676804,
926
+ "learning_rate": 0.00014841108857336038,
927
+ "loss": 1.2365409851074218,
928
+ "mean_token_accuracy": 0.7207681879401207,
929
+ "num_tokens": 5425136.0,
930
+ "step": 920
931
+ },
932
+ {
933
+ "entropy": 1.1871344536542892,
934
+ "epoch": 0.8961695976873043,
935
+ "grad_norm": 0.05251992866396904,
936
+ "learning_rate": 0.00014773495605138607,
937
+ "loss": 1.216776180267334,
938
+ "mean_token_accuracy": 0.7216181293129921,
939
+ "num_tokens": 5485557.0,
940
+ "step": 930
941
+ },
942
+ {
943
+ "entropy": 1.2011874541640282,
944
+ "epoch": 0.9058058299205011,
945
+ "grad_norm": 0.05413221940398216,
946
+ "learning_rate": 0.00014705882352941178,
947
+ "loss": 1.2213294982910157,
948
+ "mean_token_accuracy": 0.7150228247046471,
949
+ "num_tokens": 5545105.0,
950
+ "step": 940
951
+ },
952
+ {
953
+ "entropy": 1.2181889459490776,
954
+ "epoch": 0.9154420621536979,
955
+ "grad_norm": 0.056055545806884766,
956
+ "learning_rate": 0.00014638269100743746,
957
+ "loss": 1.25078125,
958
+ "mean_token_accuracy": 0.7119832545518875,
959
+ "num_tokens": 5605134.0,
960
+ "step": 950
961
+ },
962
+ {
963
+ "entropy": 1.1678502529859542,
964
+ "epoch": 0.9250782943868947,
965
+ "grad_norm": 0.05342623591423035,
966
+ "learning_rate": 0.00014570655848546315,
967
+ "loss": 1.2045835494995116,
968
+ "mean_token_accuracy": 0.7228824377059937,
969
+ "num_tokens": 5663877.0,
970
+ "step": 960
971
+ },
972
+ {
973
+ "entropy": 1.1557280227541924,
974
+ "epoch": 0.9347145266200916,
975
+ "grad_norm": 0.056248344480991364,
976
+ "learning_rate": 0.00014503042596348886,
977
+ "loss": 1.175156021118164,
978
+ "mean_token_accuracy": 0.7239007547497749,
979
+ "num_tokens": 5726645.0,
980
+ "step": 970
981
+ },
982
+ {
983
+ "entropy": 1.2256763622164726,
984
+ "epoch": 0.9443507588532883,
985
+ "grad_norm": 0.06137476861476898,
986
+ "learning_rate": 0.00014435429344151454,
987
+ "loss": 1.234114933013916,
988
+ "mean_token_accuracy": 0.7094191774725914,
989
+ "num_tokens": 5783468.0,
990
+ "step": 980
991
+ },
992
+ {
993
+ "entropy": 1.1099035568535327,
994
+ "epoch": 0.9539869910864852,
995
+ "grad_norm": 0.052310869097709656,
996
+ "learning_rate": 0.00014367816091954023,
997
+ "loss": 1.158639907836914,
998
+ "mean_token_accuracy": 0.7366219267249108,
999
+ "num_tokens": 5840101.0,
1000
+ "step": 990
1001
+ },
1002
+ {
1003
+ "entropy": 1.1962413311004638,
1004
+ "epoch": 0.963623223319682,
1005
+ "grad_norm": 0.055140018463134766,
1006
+ "learning_rate": 0.00014300202839756594,
1007
+ "loss": 1.210787010192871,
1008
+ "mean_token_accuracy": 0.7173429980874062,
1009
+ "num_tokens": 5903466.0,
1010
+ "step": 1000
1011
+ },
1012
+ {
1013
+ "entropy": 1.0866067253053189,
1014
+ "epoch": 0.9732594555528788,
1015
+ "grad_norm": 0.07032942771911621,
1016
+ "learning_rate": 0.00014232589587559162,
1017
+ "loss": 1.1129253387451172,
1018
+ "mean_token_accuracy": 0.7413682281970978,
1019
+ "num_tokens": 5958697.0,
1020
+ "step": 1010
1021
+ },
1022
+ {
1023
+ "entropy": 1.1695112690329552,
1024
+ "epoch": 0.9828956877860756,
1025
+ "grad_norm": 0.06965778768062592,
1026
+ "learning_rate": 0.00014164976335361734,
1027
+ "loss": 1.2002062797546387,
1028
+ "mean_token_accuracy": 0.7275112867355347,
1029
+ "num_tokens": 6018820.0,
1030
+ "step": 1020
1031
+ },
1032
+ {
1033
+ "entropy": 1.129349136352539,
1034
+ "epoch": 0.9925319200192725,
1035
+ "grad_norm": 0.0639411062002182,
1036
+ "learning_rate": 0.00014097363083164302,
1037
+ "loss": 1.1613442420959472,
1038
+ "mean_token_accuracy": 0.7342579141259193,
1039
+ "num_tokens": 6074337.0,
1040
+ "step": 1030
1041
+ },
1042
+ {
1043
+ "epoch": 1.0,
1044
+ "eval_entropy": 1.0300818726266783,
1045
+ "eval_loss": 1.032158374786377,
1046
+ "eval_mean_token_accuracy": 0.7592671683222431,
1047
+ "eval_num_tokens": 6115944.0,
1048
+ "eval_runtime": 513.4443,
1049
+ "eval_samples_per_second": 1.7,
1050
+ "eval_steps_per_second": 1.7,
1051
+ "step": 1038
1052
+ }
1053
+ ],
1054
+ "logging_steps": 10,
1055
+ "max_steps": 3114,
1056
+ "num_input_tokens_seen": 0,
1057
+ "num_train_epochs": 3,
1058
+ "save_steps": 500,
1059
+ "stateful_callbacks": {
1060
+ "TrainerControl": {
1061
+ "args": {
1062
+ "should_epoch_stop": false,
1063
+ "should_evaluate": false,
1064
+ "should_log": false,
1065
+ "should_save": true,
1066
+ "should_training_stop": false
1067
+ },
1068
+ "attributes": {}
1069
+ }
1070
+ },
1071
+ "total_flos": 2.1498282716531098e+18,
1072
+ "train_batch_size": 4,
1073
+ "trial_name": null,
1074
+ "trial_params": null
1075
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16c55af3229f91defd719bd313afa179de5164961715a6136bf7587ace17f152
+ size 5240