Aananda-giri committed on
Commit 8b35fe8 · verified · 1 Parent(s): bee823c

Upload Thera dialogue fine-tuned model

.gitattributes CHANGED
@@ -37,3 +37,5 @@ tokenizer.json filter=lfs diff=lfs merge=lfs -text
 Thera-adapters/loss_plots.png filter=lfs diff=lfs merge=lfs -text
 checkpoint-405/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 checkpoint-810/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-919/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+crypto-thera-adapters/loss_plots.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 base_model: Qwen/Qwen3-4B
 library_name: peft
-model_name: CryptoStatuette-qwen-finetuned
+model_name: Thera-qwen-finetuned
 tags:
 - base_model:adapter:Qwen/Qwen3-4B
 - lora
@@ -12,7 +12,7 @@ licence: license
 pipeline_tag: text-generation
 ---
 
-# Model Card for CryptoStatuette-qwen-finetuned
+# Model Card for Thera-qwen-finetuned
 
 This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B).
 It has been trained using [TRL](https://github.com/huggingface/trl).
@@ -38,8 +38,8 @@ This model was trained with SFT.
 ### Framework versions
 
 - PEFT 0.18.0
-- TRL: 0.25.1
-- Transformers: 4.57.2
+- TRL: 0.26.1
+- Transformers: 4.57.3
 - Pytorch: 2.9.0+cu126
 - Datasets: 4.0.0
 - Tokenizers: 0.22.1
checkpoint-919/README.md ADDED
@@ -0,0 +1,209 @@
+---
+base_model: Qwen/Qwen3-4B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3-4B
+- lora
+- sft
+- transformers
+- trl
+---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+### Framework versions
+
+- PEFT 0.18.0
checkpoint-919/adapter_config.json ADDED
@@ -0,0 +1,46 @@
+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3-4B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "down_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj",
+    "v_proj",
+    "gate_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
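The adapter config above pins the LoRA shape: rank 64, alpha 128, dropout 0.05, applied to all seven attention and MLP projection matrices. A quick standalone sanity check on those numbers (the relevant JSON subset is inlined rather than read from the repo; with `use_rslora` false, PEFT scales the adapter update by `lora_alpha / r`):

```python
import json

# Subset of checkpoint-919/adapter_config.json, inlined for a standalone check.
ADAPTER_CONFIG = json.loads("""
{
  "base_model_name_or_path": "Qwen/Qwen3-4B",
  "peft_type": "LORA",
  "r": 64,
  "lora_alpha": 128,
  "lora_dropout": 0.05,
  "use_rslora": false,
  "target_modules": ["q_proj", "down_proj", "o_proj", "k_proj",
                     "up_proj", "v_proj", "gate_proj"]
}
""")

# Classic LoRA scaling (use_rslora is false): alpha / r.
scaling = ADAPTER_CONFIG["lora_alpha"] / ADAPTER_CONFIG["r"]
print(f"rank={ADAPTER_CONFIG['r']} scaling={scaling} "
      f"modules={len(ADAPTER_CONFIG['target_modules'])}")
# prints: rank=64 scaling=2.0 modules=7
```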
checkpoint-919/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c56f62f45a8ab6ac856e3e9f2349d60cc62c28c8c7e8a9d8135adaa254dad027
+size 528550256
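The three lines committed here are not the adapter weights themselves but a Git LFS pointer file: key/value lines giving the spec version, the content hash (`oid`), and the blob size in bytes. A minimal parser for that pointer format (a sketch, not the git-lfs client; the pointer text is inlined from the diff above):

```python
# Git LFS pointer files are small key/value stand-ins for the real blob.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c56f62f45a8ab6ac856e3e9f2349d60cc62c28c8c7e8a9d8135adaa254dad027
size 528550256
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each line is "<key> <value>"; the oid value carries its hash
    # algorithm as a prefix, e.g. "sha256:<hex digest>".
    fields = dict(line.split(" ", 1) for line in text.splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])  # prints: sha256 528550256
```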
checkpoint-919/added_tokens.json ADDED
@@ -0,0 +1,28 @@
+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
checkpoint-919/chat_template.jinja ADDED
@@ -0,0 +1,89 @@
+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
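On its simplest path (no tools, no `reasoning_content`), the template above wraps each message as `<|im_start|>{role}\n{content}<|im_end|>\n` and appends `<|im_start|>assistant\n` when `add_generation_prompt` is set. A pure-Python sketch of just that path, with hypothetical messages; it deliberately omits the tool-call and `<think>` branches:

```python
def render_simple(messages, add_generation_prompt=True):
    # Mirrors the template's plain-chat branch only: each message becomes an
    # <|im_start|>...<|im_end|> block, then an open assistant turn is appended.
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_simple([
    {"role": "system", "content": "You are Thera."},  # hypothetical system prompt
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

In practice `tokenizer.apply_chat_template(...)` executes the real Jinja file, including the branches this sketch skips.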
checkpoint-919/merges.txt ADDED
The diff for this file is too large to render.
 
checkpoint-919/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c4027bb9ddd9c7da1a321d305718e185b0225c664c4c005e7ef619fb25289fbc
+size 1057397963
checkpoint-919/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e1b9a669990eaa12fac0cc394f2dc8ae57aa35afb582a495f244c1744ad72a8a
+size 14645
checkpoint-919/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:625b5b26600a0c8a228296cfb69e3a316e7cbd299ada136d4cc35aa98a57707f
+size 1465
checkpoint-919/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
checkpoint-919/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654
checkpoint-919/tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
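A detail worth noticing in `added_tokens_decoder` above: the chat/vision control tokens (IDs 151643-151656) carry `"special": true`, while the tool-call, FIM, and `<think>` markers (IDs 151657-151668) carry `"special": false`, so decoding with `skip_special_tokens=True` strips the former but leaves `<tool_call>`/`<think>` text visible. A standalone recount of the flags as listed in the config, not a call into the tokenizer:

```python
# Partition the added-token IDs by the "special" flag shown in the config:
# 151643-151656 are special=True, 151657-151668 are special=False.
special = {i: i <= 151656 for i in range(151643, 151669)}
n_special = sum(special.values())
n_plain = len(special) - n_special
print(n_special, n_plain)  # prints: 14 12
```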
checkpoint-919/trainer_state.json ADDED
@@ -0,0 +1,955 @@
+{
+  "best_global_step": 919,
+  "best_metric": 0.5503931641578674,
+  "best_model_checkpoint": "Thera-qwen-finetuned/checkpoint-919",
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 919,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "entropy": 1.4115246817469598,
+      "epoch": 0.010887316276537834,
+      "grad_norm": 2.907423257827759,
+      "learning_rate": 4.891304347826087e-05,
+      "loss": 3.3033,
+      "mean_token_accuracy": 0.5003452189266682,
+      "num_tokens": 7489.0,
+      "step": 10
+    },
+    {
+      "entropy": 1.1823123581707478,
+      "epoch": 0.021774632553075667,
+      "grad_norm": 1.0035202503204346,
+      "learning_rate": 0.0001032608695652174,
+      "loss": 1.0891,
+      "mean_token_accuracy": 0.807713833451271,
+      "num_tokens": 14981.0,
+      "step": 20
+    },
+    {
+      "entropy": 0.7195636250078679,
+      "epoch": 0.0326619488296135,
+      "grad_norm": 0.7734199166297913,
+      "learning_rate": 0.0001576086956521739,
+      "loss": 0.7185,
+      "mean_token_accuracy": 0.8713902071118355,
+      "num_tokens": 22399.0,
+      "step": 30
+    },
+    {
+      "entropy": 0.598898995667696,
+      "epoch": 0.043549265106151334,
+      "grad_norm": 0.737375020980835,
+      "learning_rate": 0.00021195652173913043,
+      "loss": 0.6231,
+      "mean_token_accuracy": 0.8840597704052925,
+      "num_tokens": 29741.0,
+      "step": 40
+    },
+    {
+      "entropy": 0.6763634454458952,
+      "epoch": 0.05443658138268917,
+      "grad_norm": 0.77037113904953,
+      "learning_rate": 0.000266304347826087,
+      "loss": 0.7017,
+      "mean_token_accuracy": 0.8767582342028618,
+      "num_tokens": 37112.0,
+      "step": 50
+    },
+    {
+      "entropy": 0.6615333516150713,
+      "epoch": 0.065323897659227,
+      "grad_norm": 0.9717927575111389,
+      "learning_rate": 0.00032065217391304346,
+      "loss": 0.7214,
+      "mean_token_accuracy": 0.8725218087434768,
+      "num_tokens": 44590.0,
+      "step": 60
+    },
+    {
+      "entropy": 0.665872010961175,
+      "epoch": 0.07621121393576484,
+      "grad_norm": 0.6530428528785706,
+      "learning_rate": 0.000375,
+      "loss": 0.6938,
+      "mean_token_accuracy": 0.8677900999784469,
+      "num_tokens": 52209.0,
+      "step": 70
+    },
+    {
+      "entropy": 0.6730144165456295,
+      "epoch": 0.08709853021230267,
+      "grad_norm": 0.8795536756515503,
+      "learning_rate": 0.00042934782608695655,
+      "loss": 0.6937,
+      "mean_token_accuracy": 0.8736036345362663,
+      "num_tokens": 59627.0,
+      "step": 80
+    },
+    {
+      "entropy": 0.5524990629404783,
+      "epoch": 0.0979858464888405,
+      "grad_norm": 2.4545137882232666,
+      "learning_rate": 0.00048369565217391304,
+      "loss": 0.6051,
+      "mean_token_accuracy": 0.8858248308300972,
+      "num_tokens": 66878.0,
+      "step": 90
+    },
+    {
+      "entropy": 0.7578814126551151,
+      "epoch": 0.10887316276537834,
+      "grad_norm": 1.7287325859069824,
+      "learning_rate": 0.0004999116169004186,
+      "loss": 0.7764,
+      "mean_token_accuracy": 0.8569089323282242,
+      "num_tokens": 74378.0,
+      "step": 100
+    },
+    {
+      "entropy": 0.6519227813929319,
+      "epoch": 0.11976047904191617,
+      "grad_norm": 0.6661782264709473,
+      "learning_rate": 0.0004994788705196884,
+      "loss": 0.7017,
+      "mean_token_accuracy": 0.8656885161995888,
+      "num_tokens": 81971.0,
+      "step": 110
+    },
+    {
+      "entropy": 0.6764750100672245,
+      "epoch": 0.130647795318454,
+      "grad_norm": 0.5872151255607605,
+      "learning_rate": 0.0004986861508565064,
+      "loss": 0.6964,
+      "mean_token_accuracy": 0.867901885509491,
+      "num_tokens": 89448.0,
+      "step": 120
+    },
+    {
+      "entropy": 0.6099378645420075,
+      "epoch": 0.14153511159499182,
+      "grad_norm": 0.569556474685669,
+      "learning_rate": 0.0004975346017267744,
+      "loss": 0.6559,
+      "mean_token_accuracy": 0.877436301112175,
+      "num_tokens": 96736.0,
+      "step": 130
+    },
+    {
+      "entropy": 0.7155832894146442,
+      "epoch": 0.15242242787152968,
+      "grad_norm": 1.407532811164856,
+      "learning_rate": 0.000496025884701748,
+      "loss": 0.7803,
+      "mean_token_accuracy": 0.861001954972744,
+      "num_tokens": 104199.0,
+      "step": 140
+    },
+    {
+      "entropy": 0.6367012947797775,
+      "epoch": 0.1633097441480675,
+      "grad_norm": 0.7547012567520142,
+      "learning_rate": 0.0004941621767105542,
+      "loss": 0.66,
+      "mean_token_accuracy": 0.8748047932982445,
+      "num_tokens": 111648.0,
+      "step": 150
+    },
+    {
+      "entropy": 0.6732052929699421,
+      "epoch": 0.17419706042460534,
+      "grad_norm": 6.324690341949463,
+      "learning_rate": 0.0004919461668990982,
+      "loss": 0.7214,
+      "mean_token_accuracy": 0.869445288181305,
+      "num_tokens": 119010.0,
+      "step": 160
+    },
+    {
+      "entropy": 0.7065097156912088,
+      "epoch": 0.18508437670114317,
+      "grad_norm": 0.6715346574783325,
+      "learning_rate": 0.0004893810527498928,
+      "loss": 0.7486,
+      "mean_token_accuracy": 0.867114982008934,
+      "num_tokens": 126525.0,
+      "step": 170
+    },
+    {
+      "entropy": 0.6536071257665753,
+      "epoch": 0.195971692977681,
+      "grad_norm": 0.8438016176223755,
+      "learning_rate": 0.0004864705354684076,
+      "loss": 0.6745,
+      "mean_token_accuracy": 0.8712003126740455,
+      "num_tokens": 133957.0,
+      "step": 180
+    },
+    {
+      "entropy": 0.7786130860447884,
+      "epoch": 0.20685900925421882,
+      "grad_norm": 11.108195304870605,
+      "learning_rate": 0.00048321881464259676,
+      "loss": 0.8311,
+      "mean_token_accuracy": 0.8524066478013992,
+      "num_tokens": 141635.0,
+      "step": 190
+    },
+    {
+      "entropy": 0.966428604722023,
+      "epoch": 0.21774632553075668,
+      "grad_norm": 0.9511222243309021,
+      "learning_rate": 0.0004796305821833098,
+      "loss": 1.1536,
+      "mean_token_accuracy": 0.8078579772263765,
+      "num_tokens": 148904.0,
+      "step": 200
+    },
+    {
+      "entropy": 0.7515945039689541,
+      "epoch": 0.2286336418072945,
+      "grad_norm": 1.1783875226974487,
+      "learning_rate": 0.00047571101555432896,
+      "loss": 0.7879,
+      "mean_token_accuracy": 0.8544724628329277,
+      "num_tokens": 156485.0,
+      "step": 210
+    },
+    {
+      "entropy": 0.7837981097400188,
+      "epoch": 0.23952095808383234,
+      "grad_norm": 6.818761348724365,
+      "learning_rate": 0.0004714657703018024,
+      "loss": 0.8526,
+      "mean_token_accuracy": 0.8500533938407898,
+      "num_tokens": 164052.0,
+      "step": 220
+    },
+    {
+      "entropy": 2.9163815125823023,
+      "epoch": 0.25040827436037016,
+      "grad_norm": 30.283111572265625,
+      "learning_rate": 0.0004669009718938517,
+      "loss": 4.5265,
+      "mean_token_accuracy": 0.5338523894548416,
+      "num_tokens": 171458.0,
+      "step": 230
+    },
+    {
+      "entropy": 2.0303648129105567,
+      "epoch": 0.261295590636908,
+      "grad_norm": 115.25823211669922,
+      "learning_rate": 0.00046202320688212834,
+      "loss": 2.5513,
+      "mean_token_accuracy": 0.6328982267528772,
+      "num_tokens": 179120.0,
+      "step": 240
+    },
+    {
+      "entropy": 0.7287540748715401,
+      "epoch": 0.2721829069134458,
+      "grad_norm": 8.124970436096191,
+      "learning_rate": 0.00045683951339807265,
+      "loss": 0.7502,
+      "mean_token_accuracy": 0.8621942937374115,
+      "num_tokens": 186544.0,
+      "step": 250
+    },
+    {
+      "entropy": 0.6197790112346411,
+      "epoch": 0.28307022318998365,
+      "grad_norm": 0.8240286111831665,
+      "learning_rate": 0.0004513573709975877,
+      "loss": 0.6533,
+      "mean_token_accuracy": 0.8799639001488686,
+      "num_tokens": 193880.0,
+      "step": 260
+    },
+    {
+      "entropy": 0.5914675913751125,
+      "epoch": 0.2939575394665215,
+      "grad_norm": 0.7374333143234253,
+      "learning_rate": 0.0004455846898687814,
+      "loss": 0.6144,
+      "mean_token_accuracy": 0.8865744516253471,
+      "num_tokens": 201144.0,
+      "step": 270
+    },
+    {
+      "entropy": 0.6894511558115483,
+      "epoch": 0.30484485574305936,
+      "grad_norm": 0.5616167783737183,
+      "learning_rate": 0.00043952979941834925,
+      "loss": 0.7368,
+      "mean_token_accuracy": 0.869646517932415,
+      "num_tokens": 208571.0,
+      "step": 280
+    },
+    {
+      "entropy": 0.7011830236762762,
+      "epoch": 0.3157321720195972,
+      "grad_norm": 0.9096492528915405,
+      "learning_rate": 0.0004332014362530659,
+      "loss": 0.7601,
+      "mean_token_accuracy": 0.8600849106907844,
+      "num_tokens": 216262.0,
+      "step": 290
+    },
+    {
+      "entropy": 0.6535980202257633,
+      "epoch": 0.326619488296135,
+      "grad_norm": 0.798920214176178,
+      "learning_rate": 0.00042660873157372763,
+      "loss": 0.6919,
+      "mean_token_accuracy": 0.8731523841619492,
309
+ "num_tokens": 223604.0,
310
+ "step": 300
311
+ },
312
+ {
313
+ "entropy": 0.7168508902192116,
314
+ "epoch": 0.33750680457267285,
315
+ "grad_norm": 0.7000783085823059,
316
+ "learning_rate": 0.00041976119799973477,
317
+ "loss": 0.7312,
318
+ "mean_token_accuracy": 0.862776193022728,
319
+ "num_tokens": 231179.0,
320
+ "step": 310
321
+ },
322
+ {
323
+ "entropy": 0.542249171063304,
324
+ "epoch": 0.3483941208492107,
325
+ "grad_norm": 0.6315221190452576,
326
+ "learning_rate": 0.00041266871584332454,
327
+ "loss": 0.5784,
328
+ "mean_token_accuracy": 0.8910727813839913,
329
+ "num_tokens": 238379.0,
330
+ "step": 320
331
+ },
332
+ {
333
+ "entropy": 0.7523471737280488,
334
+ "epoch": 0.3592814371257485,
335
+ "grad_norm": 0.7601750493049622,
336
+ "learning_rate": 0.0004053415188532599,
337
+ "loss": 0.7976,
338
+ "mean_token_accuracy": 0.8552964687347412,
339
+ "num_tokens": 246037.0,
340
+ "step": 330
341
+ },
342
+ {
343
+ "entropy": 0.6234626328572631,
344
+ "epoch": 0.37016875340228633,
345
+ "grad_norm": 0.7975150346755981,
346
+ "learning_rate": 0.0003977901794485446,
347
+ "loss": 0.6718,
348
+ "mean_token_accuracy": 0.8762789338827133,
349
+ "num_tokens": 253413.0,
350
+ "step": 340
351
+ },
352
+ {
353
+ "entropy": 0.6043397862464189,
354
+ "epoch": 0.38105606967882416,
355
+ "grad_norm": 0.6521063446998596,
356
+ "learning_rate": 0.0003900255934634699,
357
+ "loss": 0.6134,
358
+ "mean_token_accuracy": 0.8838247537612915,
359
+ "num_tokens": 260806.0,
360
+ "step": 350
361
+ },
362
+ {
363
+ "entropy": 0.6836169838905335,
364
+ "epoch": 0.391943385955362,
365
+ "grad_norm": 0.9141950607299805,
366
+ "learning_rate": 0.0003820589644260065,
367
+ "loss": 0.7149,
368
+ "mean_token_accuracy": 0.8627639308571815,
369
+ "num_tokens": 268395.0,
370
+ "step": 360
371
+ },
372
+ {
373
+ "entropy": 0.6906204871833325,
374
+ "epoch": 0.4028307022318998,
375
+ "grad_norm": 0.6039382815361023,
376
+ "learning_rate": 0.00037390178739222363,
377
+ "loss": 0.749,
378
+ "mean_token_accuracy": 0.8611163705587387,
379
+ "num_tokens": 276103.0,
380
+ "step": 370
381
+ },
382
+ {
383
+ "entropy": 0.7154507145285607,
384
+ "epoch": 0.41371801850843765,
385
+ "grad_norm": 0.7205569744110107,
386
+ "learning_rate": 0.00036556583236006237,
387
+ "loss": 0.7435,
388
+ "mean_token_accuracy": 0.8620465710759163,
389
+ "num_tokens": 283734.0,
390
+ "step": 380
391
+ },
392
+ {
393
+ "entropy": 0.6251850917935371,
394
+ "epoch": 0.42460533478497553,
395
+ "grad_norm": 0.7158589959144592,
396
+ "learning_rate": 0.0003570631272863956,
397
+ "loss": 0.638,
398
+ "mean_token_accuracy": 0.8804232597351074,
399
+ "num_tokens": 291104.0,
400
+ "step": 390
401
+ },
402
+ {
403
+ "entropy": 0.655794395133853,
404
+ "epoch": 0.43549265106151336,
405
+ "grad_norm": 0.6197627186775208,
406
+ "learning_rate": 0.0003484059407318781,
407
+ "loss": 0.7033,
408
+ "mean_token_accuracy": 0.8641186743974686,
409
+ "num_tokens": 298667.0,
410
+ "step": 400
411
+ },
412
+ {
413
+ "entropy": 0.6755085363984108,
414
+ "epoch": 0.4463799673380512,
415
+ "grad_norm": 0.6667740345001221,
416
+ "learning_rate": 0.00033960676415863015,
417
+ "loss": 0.724,
418
+ "mean_token_accuracy": 0.8660084888339042,
419
+ "num_tokens": 306246.0,
420
+ "step": 410
421
+ },
422
+ {
423
+ "entropy": 0.6676596872508526,
424
+ "epoch": 0.457267283614589,
425
+ "grad_norm": 0.7817707061767578,
426
+ "learning_rate": 0.00033067829390629453,
427
+ "loss": 0.6793,
428
+ "mean_token_accuracy": 0.8715122014284133,
429
+ "num_tokens": 313691.0,
430
+ "step": 420
431
+ },
432
+ {
433
+ "entropy": 0.7171908929944039,
434
+ "epoch": 0.46815459989112684,
435
+ "grad_norm": 1.0613240003585815,
436
+ "learning_rate": 0.00032163341287247876,
437
+ "loss": 0.7552,
438
+ "mean_token_accuracy": 0.8618355020880699,
439
+ "num_tokens": 321394.0,
440
+ "step": 430
441
+ },
442
+ {
443
+ "entropy": 0.5122752044349909,
444
+ "epoch": 0.47904191616766467,
445
+ "grad_norm": 0.649142324924469,
446
+ "learning_rate": 0.00031248517192400876,
447
+ "loss": 0.5586,
448
+ "mean_token_accuracy": 0.8886241987347603,
449
+ "num_tokens": 328807.0,
450
+ "step": 440
451
+ },
452
+ {
453
+ "entropy": 0.5436007279902697,
454
+ "epoch": 0.4899292324442025,
455
+ "grad_norm": 0.5830965042114258,
456
+ "learning_rate": 0.0003032467710658231,
457
+ "loss": 0.5397,
458
+ "mean_token_accuracy": 0.8886899709701538,
459
+ "num_tokens": 336117.0,
460
+ "step": 450
461
+ },
462
+ {
463
+ "entropy": 0.6272627621889114,
464
+ "epoch": 0.5008165487207403,
465
+ "grad_norm": 0.6612179279327393,
466
+ "learning_rate": 0.0002939315403946733,
467
+ "loss": 0.6844,
468
+ "mean_token_accuracy": 0.8671970278024673,
469
+ "num_tokens": 343690.0,
470
+ "step": 460
471
+ },
472
+ {
473
+ "entropy": 0.6421366423368454,
474
+ "epoch": 0.5117038649972782,
475
+ "grad_norm": 0.6228363513946533,
476
+ "learning_rate": 0.0002845529208651161,
477
+ "loss": 0.6428,
478
+ "mean_token_accuracy": 0.8806748360395431,
479
+ "num_tokens": 351025.0,
480
+ "step": 470
481
+ },
482
+ {
483
+ "entropy": 0.5570888858288526,
484
+ "epoch": 0.522591181273816,
485
+ "grad_norm": 0.6702244877815247,
486
+ "learning_rate": 0.00027512444489554767,
487
+ "loss": 0.6107,
488
+ "mean_token_accuracy": 0.8815102204680443,
489
+ "num_tokens": 358280.0,
490
+ "step": 480
491
+ },
492
+ {
493
+ "entropy": 0.5678568474948407,
494
+ "epoch": 0.5334784975503538,
495
+ "grad_norm": 0.6265514492988586,
496
+ "learning_rate": 0.00026565971684226573,
497
+ "loss": 0.5648,
498
+ "mean_token_accuracy": 0.8863577455282211,
499
+ "num_tokens": 365676.0,
500
+ "step": 490
501
+ },
502
+ {
503
+ "entropy": 0.5953685358166695,
504
+ "epoch": 0.5443658138268916,
505
+ "grad_norm": 0.8042871356010437,
506
+ "learning_rate": 0.0002561723933697317,
507
+ "loss": 0.6452,
508
+ "mean_token_accuracy": 0.8779568284749985,
509
+ "num_tokens": 373045.0,
510
+ "step": 500
511
+ },
512
+ {
513
+ "entropy": 0.7206135954707861,
514
+ "epoch": 0.5552531301034295,
515
+ "grad_norm": 0.65378737449646,
516
+ "learning_rate": 0.0002466761637453568,
517
+ "loss": 0.7605,
518
+ "mean_token_accuracy": 0.8624323174357414,
519
+ "num_tokens": 380560.0,
520
+ "step": 510
521
+ },
522
+ {
523
+ "entropy": 0.6116405792534352,
524
+ "epoch": 0.5661404463799673,
525
+ "grad_norm": 0.5593022108078003,
526
+ "learning_rate": 0.00023718473008724742,
527
+ "loss": 0.6431,
528
+ "mean_token_accuracy": 0.8734307438135147,
529
+ "num_tokens": 388027.0,
530
+ "step": 520
531
+ },
532
+ {
533
+ "entropy": 0.6847221277654171,
534
+ "epoch": 0.5770277626565051,
535
+ "grad_norm": 0.6043410301208496,
536
+ "learning_rate": 0.00022771178759340514,
537
+ "loss": 0.6989,
538
+ "mean_token_accuracy": 0.8697591915726661,
539
+ "num_tokens": 395471.0,
540
+ "step": 530
541
+ },
542
+ {
543
+ "entropy": 0.6371712744235992,
544
+ "epoch": 0.587915078933043,
545
+ "grad_norm": 0.5856935381889343,
546
+ "learning_rate": 0.00021827100478091506,
547
+ "loss": 0.6722,
548
+ "mean_token_accuracy": 0.8688437402248382,
549
+ "num_tokens": 403108.0,
550
+ "step": 540
551
+ },
552
+ {
553
+ "entropy": 0.5488773200660944,
554
+ "epoch": 0.5988023952095808,
555
+ "grad_norm": 0.5797805190086365,
556
+ "learning_rate": 0.00020887600376362904,
557
+ "loss": 0.572,
558
+ "mean_token_accuracy": 0.8839038833975792,
559
+ "num_tokens": 410452.0,
560
+ "step": 550
561
+ },
562
+ {
563
+ "entropy": 0.5808871898800134,
564
+ "epoch": 0.6096897114861187,
565
+ "grad_norm": 0.521022379398346,
566
+ "learning_rate": 0.00019954034059680668,
567
+ "loss": 0.5912,
568
+ "mean_token_accuracy": 0.888222835958004,
569
+ "num_tokens": 417828.0,
570
+ "step": 560
571
+ },
572
+ {
573
+ "entropy": 0.5592609565705061,
574
+ "epoch": 0.6205770277626566,
575
+ "grad_norm": 0.5325660705566406,
576
+ "learning_rate": 0.00019027748571707066,
577
+ "loss": 0.569,
578
+ "mean_token_accuracy": 0.8819878950715065,
579
+ "num_tokens": 425205.0,
580
+ "step": 570
581
+ },
582
+ {
583
+ "entropy": 0.6147103808820248,
584
+ "epoch": 0.6314643440391944,
585
+ "grad_norm": 0.5417932271957397,
586
+ "learning_rate": 0.00018110080450590182,
587
+ "loss": 0.6577,
588
+ "mean_token_accuracy": 0.8764070853590965,
589
+ "num_tokens": 432668.0,
590
+ "step": 580
591
+ },
592
+ {
593
+ "entropy": 0.6328225396573544,
594
+ "epoch": 0.6423516603157322,
595
+ "grad_norm": 0.6512264013290405,
596
+ "learning_rate": 0.0001720235380047188,
597
+ "loss": 0.6491,
598
+ "mean_token_accuracy": 0.8737886667251586,
599
+ "num_tokens": 440161.0,
600
+ "step": 590
601
+ },
602
+ {
603
+ "entropy": 0.6266105823218823,
604
+ "epoch": 0.65323897659227,
605
+ "grad_norm": 0.6788283586502075,
606
+ "learning_rate": 0.00016305878380936723,
607
+ "loss": 0.6251,
608
+ "mean_token_accuracy": 0.8808962866663933,
609
+ "num_tokens": 447470.0,
610
+ "step": 600
611
+ },
612
+ {
613
+ "entropy": 0.5399745622649789,
614
+ "epoch": 0.6641262928688079,
615
+ "grad_norm": 0.6711626052856445,
616
+ "learning_rate": 0.00015421947717158752,
617
+ "loss": 0.5736,
618
+ "mean_token_accuracy": 0.8923148453235626,
619
+ "num_tokens": 454744.0,
620
+ "step": 610
621
+ },
622
+ {
623
+ "entropy": 0.695448774844408,
624
+ "epoch": 0.6750136091453457,
625
+ "grad_norm": 0.755664587020874,
626
+ "learning_rate": 0.00014551837233472853,
627
+ "loss": 0.7435,
628
+ "mean_token_accuracy": 0.8619298219680787,
629
+ "num_tokens": 462432.0,
630
+ "step": 620
631
+ },
632
+ {
633
+ "entropy": 0.5408242929726839,
634
+ "epoch": 0.6859009254218835,
635
+ "grad_norm": 0.6359855532646179,
636
+ "learning_rate": 0.0001369680241306384,
637
+ "loss": 0.5643,
638
+ "mean_token_accuracy": 0.8882203832268715,
639
+ "num_tokens": 469763.0,
640
+ "step": 630
641
+ },
642
+ {
643
+ "entropy": 0.5984043713659049,
644
+ "epoch": 0.6967882416984214,
645
+ "grad_norm": 0.6010624170303345,
646
+ "learning_rate": 0.00012858076986428722,
647
+ "loss": 0.5929,
648
+ "mean_token_accuracy": 0.8816081374883652,
649
+ "num_tokens": 477149.0,
650
+ "step": 640
651
+ },
652
+ {
653
+ "entropy": 0.5971009206026793,
654
+ "epoch": 0.7076755579749592,
655
+ "grad_norm": 0.6069923043251038,
656
+ "learning_rate": 0.00012036871151225798,
657
+ "loss": 0.6258,
658
+ "mean_token_accuracy": 0.8779259011149406,
659
+ "num_tokens": 484640.0,
660
+ "step": 650
661
+ },
662
+ {
663
+ "entropy": 0.6068212665617466,
664
+ "epoch": 0.718562874251497,
665
+ "grad_norm": 0.5725326538085938,
666
+ "learning_rate": 0.00011234369826079432,
667
+ "loss": 0.613,
668
+ "mean_token_accuracy": 0.8772263854742051,
669
+ "num_tokens": 492176.0,
670
+ "step": 660
671
+ },
672
+ {
673
+ "entropy": 0.6018822252750397,
674
+ "epoch": 0.7294501905280348,
675
+ "grad_norm": 0.7555631995201111,
676
+ "learning_rate": 0.00010451730940859949,
677
+ "loss": 0.6395,
678
+ "mean_token_accuracy": 0.8763651743531227,
679
+ "num_tokens": 499845.0,
680
+ "step": 670
681
+ },
682
+ {
683
+ "entropy": 0.5979844883084298,
684
+ "epoch": 0.7403375068045727,
685
+ "grad_norm": 0.5494154691696167,
686
+ "learning_rate": 9.690083765905544e-05,
687
+ "loss": 0.6319,
688
+ "mean_token_accuracy": 0.8793143942952156,
689
+ "num_tokens": 507418.0,
690
+ "step": 680
691
+ },
692
+ {
693
+ "entropy": 0.5443667802959681,
694
+ "epoch": 0.7512248230811105,
695
+ "grad_norm": 0.7554852366447449,
696
+ "learning_rate": 8.950527282597156e-05,
697
+ "loss": 0.5712,
698
+ "mean_token_accuracy": 0.8878106832504272,
699
+ "num_tokens": 514785.0,
700
+ "step": 690
701
+ },
702
+ {
703
+ "entropy": 0.5350296102464199,
704
+ "epoch": 0.7621121393576483,
705
+ "grad_norm": 0.5492348670959473,
706
+ "learning_rate": 8.234128597637239e-05,
707
+ "loss": 0.5349,
708
+ "mean_token_accuracy": 0.8928583487868309,
709
+ "num_tokens": 522076.0,
710
+ "step": 700
711
+ },
712
+ {
713
+ "entropy": 0.5867601800709963,
714
+ "epoch": 0.7729994556341862,
715
+ "grad_norm": 0.6712722778320312,
716
+ "learning_rate": 7.541921403320593e-05,
717
+ "loss": 0.6173,
718
+ "mean_token_accuracy": 0.8806560948491097,
719
+ "num_tokens": 529578.0,
720
+ "step": 710
721
+ },
722
+ {
723
+ "entropy": 0.6274564698338508,
724
+ "epoch": 0.783886771910724,
725
+ "grad_norm": 0.5191476345062256,
726
+ "learning_rate": 6.874904486018821e-05,
727
+ "loss": 0.6322,
728
+ "mean_token_accuracy": 0.8802066639065742,
729
+ "num_tokens": 537024.0,
730
+ "step": 720
731
+ },
732
+ {
733
+ "entropy": 0.5485220493748784,
734
+ "epoch": 0.7947740881872618,
735
+ "grad_norm": 0.6473787426948547,
736
+ "learning_rate": 6.234040285030551e-05,
737
+ "loss": 0.5631,
738
+ "mean_token_accuracy": 0.884115393459797,
739
+ "num_tokens": 544467.0,
740
+ "step": 730
741
+ },
742
+ {
743
+ "entropy": 0.5133912313729525,
744
+ "epoch": 0.8056614044637996,
745
+ "grad_norm": 0.6537692546844482,
746
+ "learning_rate": 5.6202535038770045e-05,
747
+ "loss": 0.544,
748
+ "mean_token_accuracy": 0.8923841789364815,
749
+ "num_tokens": 551834.0,
750
+ "step": 740
751
+ },
752
+ {
753
+ "entropy": 0.5843172324821353,
754
+ "epoch": 0.8165487207403375,
755
+ "grad_norm": 0.6818355321884155,
756
+ "learning_rate": 5.0344297760463954e-05,
757
+ "loss": 0.5861,
758
+ "mean_token_accuracy": 0.8821831315755844,
759
+ "num_tokens": 559289.0,
760
+ "step": 750
761
+ },
762
+ {
763
+ "entropy": 0.5654132578521966,
764
+ "epoch": 0.8274360370168753,
765
+ "grad_norm": 0.7315685153007507,
766
+ "learning_rate": 4.477414387112652e-05,
767
+ "loss": 0.6023,
768
+ "mean_token_accuracy": 0.8810448855161667,
769
+ "num_tokens": 566780.0,
770
+ "step": 760
771
+ },
772
+ {
773
+ "entropy": 0.5563108414411545,
774
+ "epoch": 0.8383233532934131,
775
+ "grad_norm": 0.6280286312103271,
776
+ "learning_rate": 3.950011055072039e-05,
777
+ "loss": 0.5637,
778
+ "mean_token_accuracy": 0.8888336777687073,
779
+ "num_tokens": 574206.0,
780
+ "step": 770
781
+ },
782
+ {
783
+ "entropy": 0.5699046881869435,
784
+ "epoch": 0.8492106695699511,
785
+ "grad_norm": 0.4488978683948517,
786
+ "learning_rate": 3.4529807706578346e-05,
787
+ "loss": 0.6045,
788
+ "mean_token_accuracy": 0.8825341418385506,
789
+ "num_tokens": 581608.0,
790
+ "step": 780
791
+ },
792
+ {
793
+ "entropy": 0.5627471528947353,
794
+ "epoch": 0.8600979858464889,
795
+ "grad_norm": 0.7140914797782898,
796
+ "learning_rate": 2.987040699306076e-05,
797
+ "loss": 0.6071,
798
+ "mean_token_accuracy": 0.8823000833392143,
799
+ "num_tokens": 589176.0,
800
+ "step": 790
801
+ },
802
+ {
803
+ "entropy": 0.5856846395879984,
804
+ "epoch": 0.8709853021230267,
805
+ "grad_norm": 0.7391604781150818,
806
+ "learning_rate": 2.5528631463569348e-05,
807
+ "loss": 0.6048,
808
+ "mean_token_accuracy": 0.8830034390091897,
809
+ "num_tokens": 596679.0,
810
+ "step": 800
811
+ },
812
+ {
813
+ "entropy": 0.47006473541259763,
814
+ "epoch": 0.8818726183995645,
815
+ "grad_norm": 0.44131314754486084,
816
+ "learning_rate": 2.151074586984744e-05,
817
+ "loss": 0.4758,
818
+ "mean_token_accuracy": 0.8987425476312637,
819
+ "num_tokens": 603956.0,
820
+ "step": 810
821
+ },
822
+ {
823
+ "entropy": 0.500870693475008,
824
+ "epoch": 0.8927599346761024,
825
+ "grad_norm": 0.9443516135215759,
826
+ "learning_rate": 1.7822547622564188e-05,
827
+ "loss": 0.5234,
828
+ "mean_token_accuracy": 0.8955065041780472,
829
+ "num_tokens": 611308.0,
830
+ "step": 820
831
+ },
832
+ {
833
+ "entropy": 0.5766935784369707,
834
+ "epoch": 0.9036472509526402,
835
+ "grad_norm": 0.6435447931289673,
836
+ "learning_rate": 1.4469358426225682e-05,
837
+ "loss": 0.5874,
838
+ "mean_token_accuracy": 0.8817667260766029,
839
+ "num_tokens": 618906.0,
840
+ "step": 830
841
+ },
842
+ {
843
+ "entropy": 0.5494085047394037,
844
+ "epoch": 0.914534567229178,
845
+ "grad_norm": 0.7876263856887817,
846
+ "learning_rate": 1.1456016600482706e-05,
847
+ "loss": 0.5684,
848
+ "mean_token_accuracy": 0.8897502169013023,
849
+ "num_tokens": 626224.0,
850
+ "step": 840
851
+ },
852
+ {
853
+ "entropy": 0.5102550655603408,
854
+ "epoch": 0.9254218835057159,
855
+ "grad_norm": 0.577179491519928,
856
+ "learning_rate": 8.78687009891499e-06,
857
+ "loss": 0.5115,
858
+ "mean_token_accuracy": 0.8966878160834313,
859
+ "num_tokens": 633424.0,
860
+ "step": 850
861
+ },
862
+ {
863
+ "entropy": 0.5591466184705496,
864
+ "epoch": 0.9363091997822537,
865
+ "grad_norm": 0.5634007453918457,
866
+ "learning_rate": 6.465770235365404e-06,
867
+ "loss": 0.5585,
868
+ "mean_token_accuracy": 0.88973039239645,
869
+ "num_tokens": 640829.0,
870
+ "step": 860
871
+ },
872
+ {
873
+ "entropy": 0.5340237215161323,
874
+ "epoch": 0.9471965160587915,
875
+ "grad_norm": 0.47885215282440186,
876
+ "learning_rate": 4.496066126875531e-06,
877
+ "loss": 0.522,
878
+ "mean_token_accuracy": 0.8937937587499618,
879
+ "num_tokens": 648132.0,
880
+ "step": 870
881
+ },
882
+ {
883
+ "entropy": 0.5373509109020234,
884
+ "epoch": 0.9580838323353293,
885
+ "grad_norm": 0.45497098565101624,
886
+ "learning_rate": 2.8805998612418396e-06,
887
+ "loss": 0.5222,
888
+ "mean_token_accuracy": 0.8893009826540947,
889
+ "num_tokens": 655528.0,
890
+ "step": 880
891
+ },
892
+ {
893
+ "entropy": 0.6342795874923468,
894
+ "epoch": 0.9689711486118672,
895
+ "grad_norm": 0.5892038345336914,
896
+ "learning_rate": 1.6217023961647982e-06,
897
+ "loss": 0.6542,
898
+ "mean_token_accuracy": 0.8745187908411026,
899
+ "num_tokens": 663081.0,
900
+ "step": 890
901
+ },
902
+ {
903
+ "entropy": 0.594165425002575,
904
+ "epoch": 0.979858464888405,
905
+ "grad_norm": 0.4947717785835266,
906
+ "learning_rate": 7.211901959078004e-07,
907
+ "loss": 0.6075,
908
+ "mean_token_accuracy": 0.8763557627797127,
909
+ "num_tokens": 670555.0,
910
+ "step": 900
911
+ },
912
+ {
913
+ "entropy": 0.5458611365407705,
914
+ "epoch": 0.9907457811649428,
915
+ "grad_norm": 0.5046107769012451,
916
+ "learning_rate": 1.8036261031936784e-07,
917
+ "loss": 0.5612,
918
+ "mean_token_accuracy": 0.8900749757885933,
919
+ "num_tokens": 677920.0,
920
+ "step": 910
921
+ },
922
+ {
923
+ "epoch": 1.0,
924
+ "eval_entropy": 0.5564239460492192,
925
+ "eval_loss": 0.5503931641578674,
926
+ "eval_mean_token_accuracy": 0.8885072296289477,
927
+ "eval_num_tokens": 684102.0,
928
+ "eval_runtime": 353.9795,
929
+ "eval_samples_per_second": 1.155,
930
+ "eval_steps_per_second": 1.155,
931
+ "step": 919
932
+ }
933
+ ],
934
+ "logging_steps": 10,
935
+ "max_steps": 919,
936
+ "num_input_tokens_seen": 0,
937
+ "num_train_epochs": 1,
938
+ "save_steps": 500,
939
+ "stateful_callbacks": {
940
+ "TrainerControl": {
941
+ "args": {
942
+ "should_epoch_stop": false,
943
+ "should_evaluate": false,
944
+ "should_log": false,
945
+ "should_save": true,
946
+ "should_training_stop": true
947
+ },
948
+ "attributes": {}
949
+ }
950
+ },
951
+ "total_flos": 1.5456460396345344e+16,
952
+ "train_batch_size": 1,
953
+ "trial_name": null,
954
+ "trial_params": null
955
+ }
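The `log_history` entries above (one record every `logging_steps: 10`, plus a final eval-only record at step 919) can be reduced to a loss curve like the one saved as `loss_plots.png`. A minimal sketch, assuming the `trainer_state.json` layout shown here; the inline `state` dict is an illustrative excerpt, not the full file:

```python
def loss_curve(trainer_state: dict):
    """Extract (step, loss) pairs from a HF Trainer state's log_history,
    skipping eval-only records that carry no training "loss" key."""
    return [(e["step"], e["loss"]) for e in trainer_state["log_history"] if "loss" in e]

# Illustrative excerpt of the state shown above (not the full file).
state = {
    "log_history": [
        {"step": 900, "loss": 0.6075, "mean_token_accuracy": 0.8763557627797127},
        {"step": 910, "loss": 0.5612, "mean_token_accuracy": 0.8900749757885933},
        {"step": 919, "eval_loss": 0.5503931641578674},  # final eval record
    ]
}
print(loss_curve(state))  # [(900, 0.6075), (910, 0.5612)]
```

In the real checkpoint, `state` would come from `json.load(open("checkpoint-919/trainer_state.json"))`.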
checkpoint-919/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7320aac5597ff43ceb7a64adab1b8e32ac8f821a4d32f503b6ad887aa2b70fc5
3
+ size 6225
checkpoint-919/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
crypto-thera-adapters/adapter_config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "Qwen/Qwen3-4B",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 128,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": null,
25
+ "peft_type": "LORA",
26
+ "peft_version": "0.18.0",
27
+ "qalora_group_size": 16,
28
+ "r": 64,
29
+ "rank_pattern": {},
30
+ "revision": null,
31
+ "target_modules": [
32
+ "q_proj",
33
+ "down_proj",
34
+ "o_proj",
35
+ "k_proj",
36
+ "up_proj",
37
+ "v_proj",
38
+ "gate_proj"
39
+ ],
40
+ "target_parameters": null,
41
+ "task_type": "CAUSAL_LM",
42
+ "trainable_token_indices": null,
43
+ "use_dora": false,
44
+ "use_qalora": false,
45
+ "use_rslora": false
46
+ }
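For reference, the LoRA hyperparameters in this adapter config imply a scaling factor of `lora_alpha / r = 128 / 64 = 2.0`, and each adapted weight matrix of shape `d_out × d_in` gains `r · (d_in + d_out)` trainable parameters from its A/B pair. A minimal sketch of that arithmetic (the `2048 × 2048` layer shape below is purely illustrative, not Qwen3-4B's actual dimensions):

```python
def lora_stats(r: int, lora_alpha: int, d_in: int, d_out: int):
    """Scaling applied to the low-rank update B @ A, and the number of
    trainable parameters added to one target matrix (A: r x d_in, B: d_out x r)."""
    scaling = lora_alpha / r
    trainable = r * (d_in + d_out)
    return scaling, trainable

# r and lora_alpha taken from adapter_config.json; the shape is a stand-in.
scaling, trainable = lora_stats(r=64, lora_alpha=128, d_in=2048, d_out=2048)
print(scaling, trainable)  # 2.0 262144
```

With seven target modules per layer (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`), this per-matrix count would be summed over every adapted matrix in the model.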
crypto-thera-adapters/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c56f62f45a8ab6ac856e3e9f2349d60cc62c28c8c7e8a9d8135adaa254dad027
3
+ size 528550256
crypto-thera-adapters/loss_plots.png ADDED

Git LFS Details

  • SHA256: 1773ae86bb8cd35af9e3a34d017d0cb6ed36219f8a160636bf173d4a4cda7420
  • Pointer size: 131 Bytes
  • Size of remote file: 198 kB
crypto-thera-adapters/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7320aac5597ff43ceb7a64adab1b8e32ac8f821a4d32f503b6ad887aa2b70fc5
3
+ size 6225
crypto-thera-adapters/training_history.json ADDED
@@ -0,0 +1,932 @@
1
+ [
2
+ {
3
+ "loss": 3.3033,
4
+ "grad_norm": 2.907423257827759,
5
+ "learning_rate": 4.891304347826087e-05,
6
+ "entropy": 1.4115246817469598,
7
+ "num_tokens": 7489.0,
8
+ "mean_token_accuracy": 0.5003452189266682,
9
+ "epoch": 0.010887316276537834,
10
+ "step": 10
11
+ },
12
+ {
13
+ "loss": 1.0891,
14
+ "grad_norm": 1.0035202503204346,
15
+ "learning_rate": 0.0001032608695652174,
16
+ "entropy": 1.1823123581707478,
17
+ "num_tokens": 14981.0,
18
+ "mean_token_accuracy": 0.807713833451271,
19
+ "epoch": 0.021774632553075667,
20
+ "step": 20
21
+ },
22
+ {
23
+ "loss": 0.7185,
24
+ "grad_norm": 0.7734199166297913,
25
+ "learning_rate": 0.0001576086956521739,
26
+ "entropy": 0.7195636250078679,
27
+ "num_tokens": 22399.0,
28
+ "mean_token_accuracy": 0.8713902071118355,
29
+ "epoch": 0.0326619488296135,
30
+ "step": 30
31
+ },
32
+ {
33
+ "loss": 0.6231,
34
+ "grad_norm": 0.737375020980835,
35
+ "learning_rate": 0.00021195652173913043,
36
+ "entropy": 0.598898995667696,
37
+ "num_tokens": 29741.0,
38
+ "mean_token_accuracy": 0.8840597704052925,
39
+ "epoch": 0.043549265106151334,
40
+ "step": 40
41
+ },
42
+ {
43
+ "loss": 0.7017,
44
+ "grad_norm": 0.77037113904953,
45
+ "learning_rate": 0.000266304347826087,
46
+ "entropy": 0.6763634454458952,
47
+ "num_tokens": 37112.0,
48
+ "mean_token_accuracy": 0.8767582342028618,
49
+ "epoch": 0.05443658138268917,
50
+ "step": 50
51
+ },
52
+ {
53
+ "loss": 0.7214,
54
+ "grad_norm": 0.9717927575111389,
55
+ "learning_rate": 0.00032065217391304346,
56
+ "entropy": 0.6615333516150713,
57
+ "num_tokens": 44590.0,
58
+ "mean_token_accuracy": 0.8725218087434768,
59
+ "epoch": 0.065323897659227,
60
+ "step": 60
61
+ },
62
+ {
63
+ "loss": 0.6938,
64
+ "grad_norm": 0.6530428528785706,
65
+ "learning_rate": 0.000375,
66
+ "entropy": 0.665872010961175,
67
+ "num_tokens": 52209.0,
68
+ "mean_token_accuracy": 0.8677900999784469,
69
+ "epoch": 0.07621121393576484,
70
+ "step": 70
71
+ },
72
+ {
73
+ "loss": 0.6937,
74
+ "grad_norm": 0.8795536756515503,
75
+ "learning_rate": 0.00042934782608695655,
76
+ "entropy": 0.6730144165456295,
77
+ "num_tokens": 59627.0,
78
+ "mean_token_accuracy": 0.8736036345362663,
79
+ "epoch": 0.08709853021230267,
80
+ "step": 80
81
+ },
82
+ {
83
+ "loss": 0.6051,
84
+ "grad_norm": 2.4545137882232666,
85
+ "learning_rate": 0.00048369565217391304,
86
+ "entropy": 0.5524990629404783,
87
+ "num_tokens": 66878.0,
88
+ "mean_token_accuracy": 0.8858248308300972,
89
+ "epoch": 0.0979858464888405,
90
+ "step": 90
91
+ },
92
+ {
93
+ "loss": 0.7764,
94
+ "grad_norm": 1.7287325859069824,
95
+ "learning_rate": 0.0004999116169004186,
96
+ "entropy": 0.7578814126551151,
97
+ "num_tokens": 74378.0,
98
+ "mean_token_accuracy": 0.8569089323282242,
99
+ "epoch": 0.10887316276537834,
100
+ "step": 100
101
+ },
102
+ {
103
+ "loss": 0.7017,
104
+ "grad_norm": 0.6661782264709473,
105
+ "learning_rate": 0.0004994788705196884,
106
+ "entropy": 0.6519227813929319,
107
+ "num_tokens": 81971.0,
108
+ "mean_token_accuracy": 0.8656885161995888,
109
+ "epoch": 0.11976047904191617,
110
+ "step": 110
111
+ },
112
+ {
113
+ "loss": 0.6964,
114
+ "grad_norm": 0.5872151255607605,
115
+ "learning_rate": 0.0004986861508565064,
116
+ "entropy": 0.6764750100672245,
117
+ "num_tokens": 89448.0,
118
+ "mean_token_accuracy": 0.867901885509491,
119
+ "epoch": 0.130647795318454,
120
+ "step": 120
121
+ },
122
+ {
123
+ "loss": 0.6559,
124
+ "grad_norm": 0.569556474685669,
125
+ "learning_rate": 0.0004975346017267744,
126
+ "entropy": 0.6099378645420075,
127
+ "num_tokens": 96736.0,
128
+ "mean_token_accuracy": 0.877436301112175,
129
+ "epoch": 0.14153511159499182,
130
+ "step": 130
131
+ },
132
+ {
133
+ "loss": 0.7803,
134
+ "grad_norm": 1.407532811164856,
135
+ "learning_rate": 0.000496025884701748,
136
+ "entropy": 0.7155832894146442,
137
+ "num_tokens": 104199.0,
138
+ "mean_token_accuracy": 0.861001954972744,
139
+ "epoch": 0.15242242787152968,
140
+ "step": 140
141
+ },
142
+ {
143
+ "loss": 0.66,
144
+ "grad_norm": 0.7547012567520142,
145
+ "learning_rate": 0.0004941621767105542,
146
+ "entropy": 0.6367012947797775,
147
+ "num_tokens": 111648.0,
148
+ "mean_token_accuracy": 0.8748047932982445,
149
+ "epoch": 0.1633097441480675,
150
+ "step": 150
151
+ },
152
+ {
153
+ "loss": 0.7214,
154
+ "grad_norm": 6.324690341949463,
155
+ "learning_rate": 0.0004919461668990982,
156
+ "entropy": 0.6732052929699421,
157
+ "num_tokens": 119010.0,
158
+ "mean_token_accuracy": 0.869445288181305,
159
+ "epoch": 0.17419706042460534,
160
+ "step": 160
161
+ },
162
+ {
163
+ "loss": 0.7486,
164
+ "grad_norm": 0.6715346574783325,
165
+ "learning_rate": 0.0004893810527498928,
166
+ "entropy": 0.7065097156912088,
167
+ "num_tokens": 126525.0,
168
+ "mean_token_accuracy": 0.867114982008934,
169
+ "epoch": 0.18508437670114317,
170
+ "step": 170
171
+ },
172
+ {
173
+ "loss": 0.6745,
174
+ "grad_norm": 0.8438016176223755,
175
+ "learning_rate": 0.0004864705354684076,
176
+ "entropy": 0.6536071257665753,
+ "num_tokens": 133957.0,
+ "mean_token_accuracy": 0.8712003126740455,
+ "epoch": 0.195971692977681,
+ "step": 180
+ },
+ {
+ "loss": 0.8311,
+ "grad_norm": 11.108195304870605,
+ "learning_rate": 0.00048321881464259676,
+ "entropy": 0.7786130860447884,
+ "num_tokens": 141635.0,
+ "mean_token_accuracy": 0.8524066478013992,
+ "epoch": 0.20685900925421882,
+ "step": 190
+ },
+ {
+ "loss": 1.1536,
+ "grad_norm": 0.9511222243309021,
+ "learning_rate": 0.0004796305821833098,
+ "entropy": 0.966428604722023,
+ "num_tokens": 148904.0,
+ "mean_token_accuracy": 0.8078579772263765,
+ "epoch": 0.21774632553075668,
+ "step": 200
+ },
+ {
+ "loss": 0.7879,
+ "grad_norm": 1.1783875226974487,
+ "learning_rate": 0.00047571101555432896,
+ "entropy": 0.7515945039689541,
+ "num_tokens": 156485.0,
+ "mean_token_accuracy": 0.8544724628329277,
+ "epoch": 0.2286336418072945,
+ "step": 210
+ },
+ {
+ "loss": 0.8526,
+ "grad_norm": 6.818761348724365,
+ "learning_rate": 0.0004714657703018024,
+ "entropy": 0.7837981097400188,
+ "num_tokens": 164052.0,
+ "mean_token_accuracy": 0.8500533938407898,
+ "epoch": 0.23952095808383234,
+ "step": 220
+ },
+ {
+ "loss": 4.5265,
+ "grad_norm": 30.283111572265625,
+ "learning_rate": 0.0004669009718938517,
+ "entropy": 2.9163815125823023,
+ "num_tokens": 171458.0,
+ "mean_token_accuracy": 0.5338523894548416,
+ "epoch": 0.25040827436037016,
+ "step": 230
+ },
+ {
+ "loss": 2.5513,
+ "grad_norm": 115.25823211669922,
+ "learning_rate": 0.00046202320688212834,
+ "entropy": 2.0303648129105567,
+ "num_tokens": 179120.0,
+ "mean_token_accuracy": 0.6328982267528772,
+ "epoch": 0.261295590636908,
+ "step": 240
+ },
+ {
+ "loss": 0.7502,
+ "grad_norm": 8.124970436096191,
+ "learning_rate": 0.00045683951339807265,
+ "entropy": 0.7287540748715401,
+ "num_tokens": 186544.0,
+ "mean_token_accuracy": 0.8621942937374115,
+ "epoch": 0.2721829069134458,
+ "step": 250
+ },
+ {
+ "loss": 0.6533,
+ "grad_norm": 0.8240286111831665,
+ "learning_rate": 0.0004513573709975877,
+ "entropy": 0.6197790112346411,
+ "num_tokens": 193880.0,
+ "mean_token_accuracy": 0.8799639001488686,
+ "epoch": 0.28307022318998365,
+ "step": 260
+ },
+ {
+ "loss": 0.6144,
+ "grad_norm": 0.7374333143234253,
+ "learning_rate": 0.0004455846898687814,
+ "entropy": 0.5914675913751125,
+ "num_tokens": 201144.0,
+ "mean_token_accuracy": 0.8865744516253471,
+ "epoch": 0.2939575394665215,
+ "step": 270
+ },
+ {
+ "loss": 0.7368,
+ "grad_norm": 0.5616167783737183,
+ "learning_rate": 0.00043952979941834925,
+ "entropy": 0.6894511558115483,
+ "num_tokens": 208571.0,
+ "mean_token_accuracy": 0.869646517932415,
+ "epoch": 0.30484485574305936,
+ "step": 280
+ },
+ {
+ "loss": 0.7601,
+ "grad_norm": 0.9096492528915405,
+ "learning_rate": 0.0004332014362530659,
+ "entropy": 0.7011830236762762,
+ "num_tokens": 216262.0,
+ "mean_token_accuracy": 0.8600849106907844,
+ "epoch": 0.3157321720195972,
+ "step": 290
+ },
+ {
+ "loss": 0.6919,
+ "grad_norm": 0.798920214176178,
+ "learning_rate": 0.00042660873157372763,
+ "entropy": 0.6535980202257633,
+ "num_tokens": 223604.0,
+ "mean_token_accuracy": 0.8731523841619492,
+ "epoch": 0.326619488296135,
+ "step": 300
+ },
+ {
+ "loss": 0.7312,
+ "grad_norm": 0.7000783085823059,
+ "learning_rate": 0.00041976119799973477,
+ "entropy": 0.7168508902192116,
+ "num_tokens": 231179.0,
+ "mean_token_accuracy": 0.862776193022728,
+ "epoch": 0.33750680457267285,
+ "step": 310
+ },
+ {
+ "loss": 0.5784,
+ "grad_norm": 0.6315221190452576,
+ "learning_rate": 0.00041266871584332454,
+ "entropy": 0.542249171063304,
+ "num_tokens": 238379.0,
+ "mean_token_accuracy": 0.8910727813839913,
+ "epoch": 0.3483941208492107,
+ "step": 320
+ },
+ {
+ "loss": 0.7976,
+ "grad_norm": 0.7601750493049622,
+ "learning_rate": 0.0004053415188532599,
+ "entropy": 0.7523471737280488,
+ "num_tokens": 246037.0,
+ "mean_token_accuracy": 0.8552964687347412,
+ "epoch": 0.3592814371257485,
+ "step": 330
+ },
+ {
+ "loss": 0.6718,
+ "grad_norm": 0.7975150346755981,
+ "learning_rate": 0.0003977901794485446,
+ "entropy": 0.6234626328572631,
+ "num_tokens": 253413.0,
+ "mean_token_accuracy": 0.8762789338827133,
+ "epoch": 0.37016875340228633,
+ "step": 340
+ },
+ {
+ "loss": 0.6134,
+ "grad_norm": 0.6521063446998596,
+ "learning_rate": 0.0003900255934634699,
+ "entropy": 0.6043397862464189,
+ "num_tokens": 260806.0,
+ "mean_token_accuracy": 0.8838247537612915,
+ "epoch": 0.38105606967882416,
+ "step": 350
+ },
+ {
+ "loss": 0.7149,
+ "grad_norm": 0.9141950607299805,
+ "learning_rate": 0.0003820589644260065,
+ "entropy": 0.6836169838905335,
+ "num_tokens": 268395.0,
+ "mean_token_accuracy": 0.8627639308571815,
+ "epoch": 0.391943385955362,
+ "step": 360
+ },
+ {
+ "loss": 0.749,
+ "grad_norm": 0.6039382815361023,
+ "learning_rate": 0.00037390178739222363,
+ "entropy": 0.6906204871833325,
+ "num_tokens": 276103.0,
+ "mean_token_accuracy": 0.8611163705587387,
+ "epoch": 0.4028307022318998,
+ "step": 370
+ },
+ {
+ "loss": 0.7435,
+ "grad_norm": 0.7205569744110107,
+ "learning_rate": 0.00036556583236006237,
+ "entropy": 0.7154507145285607,
+ "num_tokens": 283734.0,
+ "mean_token_accuracy": 0.8620465710759163,
+ "epoch": 0.41371801850843765,
+ "step": 380
+ },
+ {
+ "loss": 0.638,
+ "grad_norm": 0.7158589959144592,
+ "learning_rate": 0.0003570631272863956,
+ "entropy": 0.6251850917935371,
+ "num_tokens": 291104.0,
+ "mean_token_accuracy": 0.8804232597351074,
+ "epoch": 0.42460533478497553,
+ "step": 390
+ },
+ {
+ "loss": 0.7033,
+ "grad_norm": 0.6197627186775208,
+ "learning_rate": 0.0003484059407318781,
+ "entropy": 0.655794395133853,
+ "num_tokens": 298667.0,
+ "mean_token_accuracy": 0.8641186743974686,
+ "epoch": 0.43549265106151336,
+ "step": 400
+ },
+ {
+ "loss": 0.724,
+ "grad_norm": 0.6667740345001221,
+ "learning_rate": 0.00033960676415863015,
+ "entropy": 0.6755085363984108,
+ "num_tokens": 306246.0,
+ "mean_token_accuracy": 0.8660084888339042,
+ "epoch": 0.4463799673380512,
+ "step": 410
+ },
+ {
+ "loss": 0.6793,
+ "grad_norm": 0.7817707061767578,
+ "learning_rate": 0.00033067829390629453,
+ "entropy": 0.6676596872508526,
+ "num_tokens": 313691.0,
+ "mean_token_accuracy": 0.8715122014284133,
+ "epoch": 0.457267283614589,
+ "step": 420
+ },
+ {
+ "loss": 0.7552,
+ "grad_norm": 1.0613240003585815,
+ "learning_rate": 0.00032163341287247876,
+ "entropy": 0.7171908929944039,
+ "num_tokens": 321394.0,
+ "mean_token_accuracy": 0.8618355020880699,
+ "epoch": 0.46815459989112684,
+ "step": 430
+ },
+ {
+ "loss": 0.5586,
+ "grad_norm": 0.649142324924469,
+ "learning_rate": 0.00031248517192400876,
+ "entropy": 0.5122752044349909,
+ "num_tokens": 328807.0,
+ "mean_token_accuracy": 0.8886241987347603,
+ "epoch": 0.47904191616766467,
+ "step": 440
+ },
+ {
+ "loss": 0.5397,
+ "grad_norm": 0.5830965042114258,
+ "learning_rate": 0.0003032467710658231,
+ "entropy": 0.5436007279902697,
+ "num_tokens": 336117.0,
+ "mean_token_accuracy": 0.8886899709701538,
+ "epoch": 0.4899292324442025,
+ "step": 450
+ },
+ {
+ "loss": 0.6844,
+ "grad_norm": 0.6612179279327393,
+ "learning_rate": 0.0002939315403946733,
+ "entropy": 0.6272627621889114,
+ "num_tokens": 343690.0,
+ "mean_token_accuracy": 0.8671970278024673,
+ "epoch": 0.5008165487207403,
+ "step": 460
+ },
+ {
+ "loss": 0.6428,
+ "grad_norm": 0.6228363513946533,
+ "learning_rate": 0.0002845529208651161,
+ "entropy": 0.6421366423368454,
+ "num_tokens": 351025.0,
+ "mean_token_accuracy": 0.8806748360395431,
+ "epoch": 0.5117038649972782,
+ "step": 470
+ },
+ {
+ "loss": 0.6107,
+ "grad_norm": 0.6702244877815247,
+ "learning_rate": 0.00027512444489554767,
+ "entropy": 0.5570888858288526,
+ "num_tokens": 358280.0,
+ "mean_token_accuracy": 0.8815102204680443,
+ "epoch": 0.522591181273816,
+ "step": 480
+ },
+ {
+ "loss": 0.5648,
+ "grad_norm": 0.6265514492988586,
+ "learning_rate": 0.00026565971684226573,
+ "entropy": 0.5678568474948407,
+ "num_tokens": 365676.0,
+ "mean_token_accuracy": 0.8863577455282211,
+ "epoch": 0.5334784975503538,
+ "step": 490
+ },
+ {
+ "loss": 0.6452,
+ "grad_norm": 0.8042871356010437,
+ "learning_rate": 0.0002561723933697317,
+ "entropy": 0.5953685358166695,
+ "num_tokens": 373045.0,
+ "mean_token_accuracy": 0.8779568284749985,
+ "epoch": 0.5443658138268916,
+ "step": 500
+ },
+ {
+ "loss": 0.7605,
+ "grad_norm": 0.65378737449646,
+ "learning_rate": 0.0002466761637453568,
+ "entropy": 0.7206135954707861,
+ "num_tokens": 380560.0,
+ "mean_token_accuracy": 0.8624323174357414,
+ "epoch": 0.5552531301034295,
+ "step": 510
+ },
+ {
+ "loss": 0.6431,
+ "grad_norm": 0.5593022108078003,
+ "learning_rate": 0.00023718473008724742,
+ "entropy": 0.6116405792534352,
+ "num_tokens": 388027.0,
+ "mean_token_accuracy": 0.8734307438135147,
+ "epoch": 0.5661404463799673,
+ "step": 520
+ },
+ {
+ "loss": 0.6989,
+ "grad_norm": 0.6043410301208496,
+ "learning_rate": 0.00022771178759340514,
+ "entropy": 0.6847221277654171,
+ "num_tokens": 395471.0,
+ "mean_token_accuracy": 0.8697591915726661,
+ "epoch": 0.5770277626565051,
+ "step": 530
+ },
+ {
+ "loss": 0.6722,
+ "grad_norm": 0.5856935381889343,
+ "learning_rate": 0.00021827100478091506,
+ "entropy": 0.6371712744235992,
+ "num_tokens": 403108.0,
+ "mean_token_accuracy": 0.8688437402248382,
+ "epoch": 0.587915078933043,
+ "step": 540
+ },
+ {
+ "loss": 0.572,
+ "grad_norm": 0.5797805190086365,
+ "learning_rate": 0.00020887600376362904,
+ "entropy": 0.5488773200660944,
+ "num_tokens": 410452.0,
+ "mean_token_accuracy": 0.8839038833975792,
+ "epoch": 0.5988023952095808,
+ "step": 550
+ },
+ {
+ "loss": 0.5912,
+ "grad_norm": 0.521022379398346,
+ "learning_rate": 0.00019954034059680668,
+ "entropy": 0.5808871898800134,
+ "num_tokens": 417828.0,
+ "mean_token_accuracy": 0.888222835958004,
+ "epoch": 0.6096897114861187,
+ "step": 560
+ },
+ {
+ "loss": 0.569,
+ "grad_norm": 0.5325660705566406,
+ "learning_rate": 0.00019027748571707066,
+ "entropy": 0.5592609565705061,
+ "num_tokens": 425205.0,
+ "mean_token_accuracy": 0.8819878950715065,
+ "epoch": 0.6205770277626566,
+ "step": 570
+ },
+ {
+ "loss": 0.6577,
+ "grad_norm": 0.5417932271957397,
+ "learning_rate": 0.00018110080450590182,
+ "entropy": 0.6147103808820248,
+ "num_tokens": 432668.0,
+ "mean_token_accuracy": 0.8764070853590965,
+ "epoch": 0.6314643440391944,
+ "step": 580
+ },
+ {
+ "loss": 0.6491,
+ "grad_norm": 0.6512264013290405,
+ "learning_rate": 0.0001720235380047188,
+ "entropy": 0.6328225396573544,
+ "num_tokens": 440161.0,
+ "mean_token_accuracy": 0.8737886667251586,
+ "epoch": 0.6423516603157322,
+ "step": 590
+ },
+ {
+ "loss": 0.6251,
+ "grad_norm": 0.6788283586502075,
+ "learning_rate": 0.00016305878380936723,
+ "entropy": 0.6266105823218823,
+ "num_tokens": 447470.0,
+ "mean_token_accuracy": 0.8808962866663933,
+ "epoch": 0.65323897659227,
+ "step": 600
+ },
+ {
+ "loss": 0.5736,
+ "grad_norm": 0.6711626052856445,
+ "learning_rate": 0.00015421947717158752,
+ "entropy": 0.5399745622649789,
+ "num_tokens": 454744.0,
+ "mean_token_accuracy": 0.8923148453235626,
+ "epoch": 0.6641262928688079,
+ "step": 610
+ },
+ {
+ "loss": 0.7435,
+ "grad_norm": 0.755664587020874,
+ "learning_rate": 0.00014551837233472853,
+ "entropy": 0.695448774844408,
+ "num_tokens": 462432.0,
+ "mean_token_accuracy": 0.8619298219680787,
+ "epoch": 0.6750136091453457,
+ "step": 620
+ },
+ {
+ "loss": 0.5643,
+ "grad_norm": 0.6359855532646179,
+ "learning_rate": 0.0001369680241306384,
+ "entropy": 0.5408242929726839,
+ "num_tokens": 469763.0,
+ "mean_token_accuracy": 0.8882203832268715,
+ "epoch": 0.6859009254218835,
+ "step": 630
+ },
+ {
+ "loss": 0.5929,
+ "grad_norm": 0.6010624170303345,
+ "learning_rate": 0.00012858076986428722,
+ "entropy": 0.5984043713659049,
+ "num_tokens": 477149.0,
+ "mean_token_accuracy": 0.8816081374883652,
+ "epoch": 0.6967882416984214,
+ "step": 640
+ },
+ {
+ "loss": 0.6258,
+ "grad_norm": 0.6069923043251038,
+ "learning_rate": 0.00012036871151225798,
+ "entropy": 0.5971009206026793,
+ "num_tokens": 484640.0,
+ "mean_token_accuracy": 0.8779259011149406,
+ "epoch": 0.7076755579749592,
+ "step": 650
+ },
+ {
+ "loss": 0.613,
+ "grad_norm": 0.5725326538085938,
+ "learning_rate": 0.00011234369826079432,
+ "entropy": 0.6068212665617466,
+ "num_tokens": 492176.0,
+ "mean_token_accuracy": 0.8772263854742051,
+ "epoch": 0.718562874251497,
+ "step": 660
+ },
+ {
+ "loss": 0.6395,
+ "grad_norm": 0.7555631995201111,
+ "learning_rate": 0.00010451730940859949,
+ "entropy": 0.6018822252750397,
+ "num_tokens": 499845.0,
+ "mean_token_accuracy": 0.8763651743531227,
+ "epoch": 0.7294501905280348,
+ "step": 670
+ },
+ {
+ "loss": 0.6319,
+ "grad_norm": 0.5494154691696167,
+ "learning_rate": 9.690083765905544e-05,
+ "entropy": 0.5979844883084298,
+ "num_tokens": 507418.0,
+ "mean_token_accuracy": 0.8793143942952156,
+ "epoch": 0.7403375068045727,
+ "step": 680
+ },
+ {
+ "loss": 0.5712,
+ "grad_norm": 0.7554852366447449,
+ "learning_rate": 8.950527282597156e-05,
+ "entropy": 0.5443667802959681,
+ "num_tokens": 514785.0,
+ "mean_token_accuracy": 0.8878106832504272,
+ "epoch": 0.7512248230811105,
+ "step": 690
+ },
+ {
+ "loss": 0.5349,
+ "grad_norm": 0.5492348670959473,
+ "learning_rate": 8.234128597637239e-05,
+ "entropy": 0.5350296102464199,
+ "num_tokens": 522076.0,
+ "mean_token_accuracy": 0.8928583487868309,
+ "epoch": 0.7621121393576483,
+ "step": 700
+ },
+ {
+ "loss": 0.6173,
+ "grad_norm": 0.6712722778320312,
+ "learning_rate": 7.541921403320593e-05,
+ "entropy": 0.5867601800709963,
+ "num_tokens": 529578.0,
+ "mean_token_accuracy": 0.8806560948491097,
+ "epoch": 0.7729994556341862,
+ "step": 710
+ },
+ {
+ "loss": 0.6322,
+ "grad_norm": 0.5191476345062256,
+ "learning_rate": 6.874904486018821e-05,
+ "entropy": 0.6274564698338508,
+ "num_tokens": 537024.0,
+ "mean_token_accuracy": 0.8802066639065742,
+ "epoch": 0.783886771910724,
+ "step": 720
+ },
+ {
+ "loss": 0.5631,
+ "grad_norm": 0.6473787426948547,
+ "learning_rate": 6.234040285030551e-05,
+ "entropy": 0.5485220493748784,
+ "num_tokens": 544467.0,
+ "mean_token_accuracy": 0.884115393459797,
+ "epoch": 0.7947740881872618,
+ "step": 730
+ },
+ {
+ "loss": 0.544,
+ "grad_norm": 0.6537692546844482,
+ "learning_rate": 5.6202535038770045e-05,
+ "entropy": 0.5133912313729525,
+ "num_tokens": 551834.0,
+ "mean_token_accuracy": 0.8923841789364815,
+ "epoch": 0.8056614044637996,
+ "step": 740
+ },
+ {
+ "loss": 0.5861,
+ "grad_norm": 0.6818355321884155,
+ "learning_rate": 5.0344297760463954e-05,
+ "entropy": 0.5843172324821353,
+ "num_tokens": 559289.0,
+ "mean_token_accuracy": 0.8821831315755844,
+ "epoch": 0.8165487207403375,
+ "step": 750
+ },
+ {
+ "loss": 0.6023,
+ "grad_norm": 0.7315685153007507,
+ "learning_rate": 4.477414387112652e-05,
+ "entropy": 0.5654132578521966,
+ "num_tokens": 566780.0,
+ "mean_token_accuracy": 0.8810448855161667,
+ "epoch": 0.8274360370168753,
+ "step": 760
+ },
+ {
+ "loss": 0.5637,
+ "grad_norm": 0.6280286312103271,
+ "learning_rate": 3.950011055072039e-05,
+ "entropy": 0.5563108414411545,
+ "num_tokens": 574206.0,
+ "mean_token_accuracy": 0.8888336777687073,
+ "epoch": 0.8383233532934131,
+ "step": 770
+ },
+ {
+ "loss": 0.6045,
+ "grad_norm": 0.4488978683948517,
+ "learning_rate": 3.4529807706578346e-05,
+ "entropy": 0.5699046881869435,
+ "num_tokens": 581608.0,
+ "mean_token_accuracy": 0.8825341418385506,
+ "epoch": 0.8492106695699511,
+ "step": 780
+ },
+ {
+ "loss": 0.6071,
+ "grad_norm": 0.7140914797782898,
+ "learning_rate": 2.987040699306076e-05,
+ "entropy": 0.5627471528947353,
+ "num_tokens": 589176.0,
+ "mean_token_accuracy": 0.8823000833392143,
+ "epoch": 0.8600979858464889,
+ "step": 790
+ },
+ {
+ "loss": 0.6048,
+ "grad_norm": 0.7391604781150818,
+ "learning_rate": 2.5528631463569348e-05,
+ "entropy": 0.5856846395879984,
+ "num_tokens": 596679.0,
+ "mean_token_accuracy": 0.8830034390091897,
+ "epoch": 0.8709853021230267,
+ "step": 800
+ },
+ {
+ "loss": 0.4758,
+ "grad_norm": 0.44131314754486084,
+ "learning_rate": 2.151074586984744e-05,
+ "entropy": 0.47006473541259763,
+ "num_tokens": 603956.0,
+ "mean_token_accuracy": 0.8987425476312637,
+ "epoch": 0.8818726183995645,
+ "step": 810
+ },
+ {
+ "loss": 0.5234,
+ "grad_norm": 0.9443516135215759,
+ "learning_rate": 1.7822547622564188e-05,
+ "entropy": 0.500870693475008,
+ "num_tokens": 611308.0,
+ "mean_token_accuracy": 0.8955065041780472,
+ "epoch": 0.8927599346761024,
+ "step": 820
+ },
+ {
+ "loss": 0.5874,
+ "grad_norm": 0.6435447931289673,
+ "learning_rate": 1.4469358426225682e-05,
+ "entropy": 0.5766935784369707,
+ "num_tokens": 618906.0,
+ "mean_token_accuracy": 0.8817667260766029,
+ "epoch": 0.9036472509526402,
+ "step": 830
+ },
+ {
+ "loss": 0.5684,
+ "grad_norm": 0.7876263856887817,
+ "learning_rate": 1.1456016600482706e-05,
+ "entropy": 0.5494085047394037,
+ "num_tokens": 626224.0,
+ "mean_token_accuracy": 0.8897502169013023,
+ "epoch": 0.914534567229178,
+ "step": 840
+ },
+ {
+ "loss": 0.5115,
+ "grad_norm": 0.577179491519928,
+ "learning_rate": 8.78687009891499e-06,
+ "entropy": 0.5102550655603408,
+ "num_tokens": 633424.0,
+ "mean_token_accuracy": 0.8966878160834313,
+ "epoch": 0.9254218835057159,
+ "step": 850
+ },
+ {
+ "loss": 0.5585,
+ "grad_norm": 0.5634007453918457,
+ "learning_rate": 6.465770235365404e-06,
+ "entropy": 0.5591466184705496,
+ "num_tokens": 640829.0,
+ "mean_token_accuracy": 0.88973039239645,
+ "epoch": 0.9363091997822537,
+ "step": 860
+ },
+ {
+ "loss": 0.522,
+ "grad_norm": 0.47885215282440186,
+ "learning_rate": 4.496066126875531e-06,
+ "entropy": 0.5340237215161323,
+ "num_tokens": 648132.0,
+ "mean_token_accuracy": 0.8937937587499618,
+ "epoch": 0.9471965160587915,
+ "step": 870
+ },
+ {
+ "loss": 0.5222,
+ "grad_norm": 0.45497098565101624,
+ "learning_rate": 2.8805998612418396e-06,
+ "entropy": 0.5373509109020234,
+ "num_tokens": 655528.0,
+ "mean_token_accuracy": 0.8893009826540947,
+ "epoch": 0.9580838323353293,
+ "step": 880
+ },
+ {
+ "loss": 0.6542,
+ "grad_norm": 0.5892038345336914,
+ "learning_rate": 1.6217023961647982e-06,
+ "entropy": 0.6342795874923468,
+ "num_tokens": 663081.0,
+ "mean_token_accuracy": 0.8745187908411026,
+ "epoch": 0.9689711486118672,
+ "step": 890
+ },
+ {
+ "loss": 0.6075,
+ "grad_norm": 0.4947717785835266,
+ "learning_rate": 7.211901959078004e-07,
+ "entropy": 0.594165425002575,
+ "num_tokens": 670555.0,
+ "mean_token_accuracy": 0.8763557627797127,
+ "epoch": 0.979858464888405,
+ "step": 900
+ },
+ {
+ "loss": 0.5612,
+ "grad_norm": 0.5046107769012451,
+ "learning_rate": 1.8036261031936784e-07,
+ "entropy": 0.5458611365407705,
+ "num_tokens": 677920.0,
+ "mean_token_accuracy": 0.8900749757885933,
+ "epoch": 0.9907457811649428,
+ "step": 910
+ },
+ {
+ "eval_loss": 0.5503931641578674,
+ "eval_runtime": 353.9795,
+ "eval_samples_per_second": 1.155,
+ "eval_steps_per_second": 1.155,
+ "eval_entropy": 0.5564239460492192,
+ "eval_num_tokens": 684102.0,
+ "eval_mean_token_accuracy": 0.8885072296289477,
+ "epoch": 1.0,
+ "step": 919
+ },
+ {
+ "train_runtime": 7276.3168,
+ "train_samples_per_second": 0.505,
+ "train_steps_per_second": 0.126,
+ "total_flos": 1.5456460396345344e+16,
+ "train_loss": 0.7488500804714332,
+ "epoch": 1.0,
+ "step": 919
+ }
+ ]