jeromeramos committed on
Commit 98e6ed7 · verified · 1 Parent(s): 104abb7

Model save
README.md CHANGED
@@ -1,66 +1,58 @@
  ---
- library_name: transformers
- license: llama3.1
  base_model: meta-llama/Llama-3.1-8B
+ library_name: transformers
+ model_name: inter-play-sim-assistant-sft
  tags:
+ - generated_from_trainer
  - trl
  - sft
- - generated_from_trainer
- model-index:
- - name: inter-play-sim-assistant-sft
-   results: []
+ licence: license
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # inter-play-sim-assistant-sft
-
- This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.7087
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
- - total_eval_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 0.6807        | 1.0   | 723  | 0.7087          |
-
- ### Framework versions
-
- - Transformers 4.45.2
- - Pytorch 2.4.1.post302
- - Datasets 3.0.1
- - Tokenizers 0.20.1
+ # Model Card for inter-play-sim-assistant-sft
+
+ This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="Sim4Rec/inter-play-sim-assistant-sft", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/jerome-ramos-20/huggingface/runs/rdaw49f9)
+
+ This model was trained with SFT.
+
+ ### Framework versions
+
+ - TRL: 0.14.0
+ - Transformers: 4.48.2
+ - Pytorch: 2.5.1
+ - Datasets: 3.0.1
+ - Tokenizers: 0.21.0
+
+ ## Citations
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
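The hyperparameters removed from the old card and the step count in its training-results table are mutually consistent; a quick sanity check of that arithmetic (an editorial sketch, not part of the commit):

```python
import math

# Hyperparameters from the removed README section.
train_batch_size = 4             # per-device batch size
num_devices = 4                  # multi-GPU
gradient_accumulation_steps = 4
train_samples = 46269            # from train_results.json

# Effective batch size: 4 * 4 * 4 = 64, as listed in the old card.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Steps for one epoch, counting the final partial batch: ceil(46269 / 64) = 723,
# matching the "Step" column of the removed training-results table.
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)

print(total_train_batch_size, steps_per_epoch)  # 64 723
```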
all_results.json CHANGED
@@ -1,14 +1,14 @@
  {
- "epoch": 1.0,
+ "epoch": 0.9986168741355463,
  "eval_loss": 0.7086742520332336,
  "eval_runtime": 98.2909,
  "eval_samples": 2071,
  "eval_samples_per_second": 46.902,
  "eval_steps_per_second": 2.94,
- "total_flos": 1.702143589888295e+18,
- "train_loss": 0.914938143180119,
- "train_runtime": 4445.7565,
+ "total_flos": 1.74045731487744e+18,
+ "train_loss": 0.8234024724801822,
+ "train_runtime": 2385.6161,
  "train_samples": 46269,
- "train_samples_per_second": 10.407,
- "train_steps_per_second": 0.163
+ "train_samples_per_second": 19.395,
+ "train_steps_per_second": 0.151
  }
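The new throughput figure can be re-derived from the other fields in `all_results.json`, assuming the Trainer's usual definition (samples processed divided by wall-clock runtime); a minimal check:

```python
# Re-derive train_samples_per_second from the values committed in all_results.json.
train_samples = 46269
train_runtime = 2385.6161  # seconds

samples_per_second = train_samples / train_runtime
print(round(samples_per_second, 3))  # 19.395, as reported
```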
config.json CHANGED
@@ -31,7 +31,7 @@
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.45.2",
+ "transformers_version": "4.48.2",
  "use_cache": false,
  "vocab_size": 128320
  }
generation_config.json CHANGED
@@ -5,5 +5,5 @@
  "eos_token_id": 128001,
  "temperature": 0.6,
  "top_p": 0.9,
- "transformers_version": "4.45.2"
+ "transformers_version": "4.48.2"
  }
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f0d1acf0ba6a3f1ba28ba59a3541d28b6951d70f2f46c044d4216ad79c4e6568
+ oid sha256:96a3f25cdf50508cedb9141a645a1b95248e26d20ef5fe3d2de30857075f9ee2
  size 4977222960
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:94e01be8ae8791d7dd12ad345f799cdbd8762e20c18bbbc34b3810dab3fe08de
+ oid sha256:675a20a1c8cb7ef8957fa7e3549f80d43b42e7bb023aa7c1a6c3b159e495bc67
  size 4999802720
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f2342c0554a4a6147d4b0cbb7d9102db5213f1a98ff1f94100f5392e9e4babca
+ oid sha256:bc06566be8c2f403026d162424f82153270a6a0d04b0b40e6e14ad4c2ea5332c
  size 4915916176
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e726e4714bc8147f1489557764a74b0681c9fc1f0cf2aa91bf3b9b8f7ae6756a
+ oid sha256:2986a57de29725fc6544bc073fd543bfb7c0fe517bc6ef006cfebc4c12bbb8e5
  size 1168663096
runs/Feb02_21-16-11_w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-54ksrw6/events.out.tfevents.1738531247.w-jerom-inter-play-sim-94c6890b9ccf44ea86f033a3db8a5dbd-54ksrw6.52190.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:658dbb9c74cd9a9ba070279f36a501ac7caf4ca720e92179e4c3996f6fb59659
+ size 21809
special_tokens_map.json CHANGED
@@ -1,41 +1,4 @@
  {
- "additional_special_tokens": [
- {
- "content": "<response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "</response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "<answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "</answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- },
- {
- "content": "<inquire>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false
- }
- ],
  "bos_token": {
  "content": "<|im_start|>",
  "lstrip": false,
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3919c1e7bfa558ff525a618a3d463929a238acaba668d7ef6da432fcd6cd7fad
- size 17211327
+ oid sha256:635e16753749bb3465bdf9e00f68e8b29c9e4884d9ee55eb27705bd8f1318cf4
+ size 17210395
tokenizer_config.json CHANGED
@@ -2063,59 +2063,13 @@
  "rstrip": false,
  "single_word": false,
  "special": true
- },
- "128258": {
- "content": "<response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128259": {
- "content": "</response>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128260": {
- "content": "<answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128261": {
- "content": "</answer>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "128262": {
- "content": "<inquire>",
- "lstrip": false,
- "normalized": false,
- "rstrip": false,
- "single_word": false,
- "special": true
  }
  },
- "additional_special_tokens": [
- "<response>",
- "</response>",
- "<answer>",
- "</answer>",
- "<inquire>"
- ],
  "bos_token": "<|im_start|>",
  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|im_end|>",
+ "extra_special_tokens": {},
  "model_input_names": [
  "input_ids",
  "attention_mask"
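The `chat_template` retained in `tokenizer_config.json` wraps every message in `<|im_start|>`/`<|im_end|>` markers. A minimal pure-Python sketch of what that Jinja template renders (an illustration of the format, not the tokenizer's own code):

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the Jinja chat_template: each message becomes
    <|im_start|>{role}\n{content}<|im_end|>\n, and an optional
    generation prompt opens an assistant turn."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text

prompt = render_chat([{"role": "user", "content": "Hello!"}], add_generation_prompt=True)
print(prompt)
```

In practice the same string is produced by the tokenizer's `apply_chat_template(..., tokenize=False)`; this sketch just makes the template's behavior visible without downloading the tokenizer.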
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 1.0,
- "total_flos": 1.702143589888295e+18,
- "train_loss": 0.914938143180119,
- "train_runtime": 4445.7565,
+ "epoch": 0.9986168741355463,
+ "total_flos": 1.74045731487744e+18,
+ "train_loss": 0.8234024724801822,
+ "train_runtime": 2385.6161,
  "train_samples": 46269,
- "train_samples_per_second": 10.407,
- "train_steps_per_second": 0.163
+ "train_samples_per_second": 19.395,
+ "train_steps_per_second": 0.151
  }
trainer_state.json CHANGED
@@ -1,1048 +1,544 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 1.0,
  "eval_steps": 500,
- "global_step": 723,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
- "epoch": 0.0013831258644536654,
- "grad_norm": 362.72589111328125,
- "learning_rate": 2.7397260273972604e-06,
- "loss": 4.592,
  "step": 1
  },
  {
- "epoch": 0.006915629322268326,
- "grad_norm": 43.32975769042969,
- "learning_rate": 1.3698630136986302e-05,
- "loss": 3.1459,
  "step": 5
  },
25
  {
26
- "epoch": 0.013831258644536652,
27
- "grad_norm": 4.805922985076904,
28
- "learning_rate": 2.7397260273972603e-05,
29
- "loss": 1.1293,
30
  "step": 10
31
  },
32
  {
33
- "epoch": 0.02074688796680498,
34
- "grad_norm": 12.635961532592773,
35
- "learning_rate": 4.1095890410958905e-05,
36
- "loss": 1.0413,
37
  "step": 15
38
  },
39
  {
40
- "epoch": 0.027662517289073305,
41
- "grad_norm": 4.329101085662842,
42
- "learning_rate": 5.479452054794521e-05,
43
- "loss": 0.9687,
44
  "step": 20
45
  },
46
  {
47
- "epoch": 0.034578146611341634,
48
- "grad_norm": 2.8100547790527344,
49
- "learning_rate": 6.84931506849315e-05,
50
- "loss": 0.9551,
51
  "step": 25
52
  },
53
  {
54
- "epoch": 0.04149377593360996,
55
- "grad_norm": 2.5719969272613525,
56
- "learning_rate": 8.219178082191781e-05,
57
- "loss": 0.9471,
58
  "step": 30
59
  },
60
  {
61
- "epoch": 0.048409405255878286,
62
- "grad_norm": 2.6822149753570557,
63
- "learning_rate": 9.58904109589041e-05,
64
- "loss": 0.9569,
65
  "step": 35
66
  },
67
  {
68
- "epoch": 0.05532503457814661,
69
- "grad_norm": 2.2626094818115234,
70
- "learning_rate": 0.00010958904109589041,
71
- "loss": 0.9722,
72
  "step": 40
73
  },
74
  {
75
- "epoch": 0.06224066390041494,
76
- "grad_norm": 2.315629482269287,
77
- "learning_rate": 0.0001232876712328767,
78
- "loss": 1.029,
79
  "step": 45
80
  },
81
  {
82
- "epoch": 0.06915629322268327,
83
- "grad_norm": 1.7225898504257202,
84
- "learning_rate": 0.000136986301369863,
85
- "loss": 1.0222,
86
  "step": 50
87
  },
88
  {
89
- "epoch": 0.07607192254495158,
90
- "grad_norm": 1.9998998641967773,
91
- "learning_rate": 0.00015068493150684933,
92
- "loss": 1.0334,
93
  "step": 55
94
  },
95
  {
96
- "epoch": 0.08298755186721991,
97
- "grad_norm": 2.4757347106933594,
98
- "learning_rate": 0.00016438356164383562,
99
- "loss": 1.0415,
100
  "step": 60
101
  },
102
  {
103
- "epoch": 0.08990318118948824,
104
- "grad_norm": 1.6731195449829102,
105
- "learning_rate": 0.00017808219178082192,
106
- "loss": 1.1052,
107
  "step": 65
108
  },
109
  {
110
- "epoch": 0.09681881051175657,
111
- "grad_norm": 2.4862051010131836,
112
- "learning_rate": 0.0001917808219178082,
113
- "loss": 1.0613,
114
  "step": 70
115
  },
116
  {
117
- "epoch": 0.1037344398340249,
118
- "grad_norm": 15.797516822814941,
119
- "learning_rate": 0.0001999953280342959,
120
- "loss": 1.2723,
121
  "step": 75
122
  },
123
  {
124
- "epoch": 0.11065006915629322,
125
- "grad_norm": 103.66837310791016,
126
- "learning_rate": 0.00019994277343344518,
127
- "loss": 2.2148,
128
  "step": 80
129
  },
130
  {
131
- "epoch": 0.11756569847856155,
132
- "grad_norm": 17.954513549804688,
133
- "learning_rate": 0.0001998318550673364,
134
- "loss": 2.1056,
135
  "step": 85
136
  },
137
  {
138
- "epoch": 0.12448132780082988,
139
- "grad_norm": 11.641583442687988,
140
- "learning_rate": 0.00019966263770917193,
141
- "loss": 3.8091,
142
  "step": 90
143
  },
144
  {
145
- "epoch": 0.1313969571230982,
146
- "grad_norm": 3.0865869522094727,
147
- "learning_rate": 0.00019943522017712358,
148
- "loss": 1.4319,
149
  "step": 95
150
  },
151
  {
152
- "epoch": 0.13831258644536654,
153
- "grad_norm": 2.2934648990631104,
154
- "learning_rate": 0.000199149735276626,
155
- "loss": 1.1962,
156
  "step": 100
157
  },
158
  {
159
- "epoch": 0.14522821576763487,
160
- "grad_norm": 2.6889796257019043,
161
- "learning_rate": 0.00019880634972282166,
162
- "loss": 1.1448,
163
  "step": 105
164
  },
165
  {
166
- "epoch": 0.15214384508990317,
167
- "grad_norm": 1.613039493560791,
168
- "learning_rate": 0.00019840526404320415,
169
- "loss": 1.1176,
170
  "step": 110
171
  },
172
  {
173
- "epoch": 0.1590594744121715,
174
- "grad_norm": 1.7039101123809814,
175
- "learning_rate": 0.0001979467124605156,
176
- "loss": 1.1048,
177
  "step": 115
178
  },
179
  {
180
- "epoch": 0.16597510373443983,
181
- "grad_norm": 1.4721240997314453,
182
- "learning_rate": 0.00019743096275596735,
183
- "loss": 1.0652,
184
  "step": 120
185
  },
186
  {
187
- "epoch": 0.17289073305670816,
188
- "grad_norm": 1.2751952409744263,
189
- "learning_rate": 0.0001968583161128631,
190
- "loss": 1.0753,
191
  "step": 125
192
  },
193
  {
194
- "epoch": 0.1798063623789765,
195
- "grad_norm": 1.4763237237930298,
196
- "learning_rate": 0.00019622910694071656,
197
- "loss": 1.0534,
198
  "step": 130
199
  },
200
  {
201
- "epoch": 0.18672199170124482,
202
- "grad_norm": 1.4396852254867554,
203
- "learning_rate": 0.00019554370267996538,
204
- "loss": 1.0363,
205
  "step": 135
206
  },
207
  {
208
- "epoch": 0.19363762102351315,
209
- "grad_norm": 1.4692487716674805,
210
- "learning_rate": 0.00019480250358739663,
211
- "loss": 1.0164,
212
  "step": 140
213
  },
214
  {
215
- "epoch": 0.20055325034578148,
216
- "grad_norm": 1.7275824546813965,
217
- "learning_rate": 0.00019400594250240798,
218
- "loss": 1.0782,
219
  "step": 145
220
  },
221
  {
222
- "epoch": 0.2074688796680498,
223
- "grad_norm": 1.3491764068603516,
224
- "learning_rate": 0.0001931544845942415,
225
- "loss": 1.017,
226
  "step": 150
227
  },
228
  {
229
- "epoch": 0.2143845089903181,
230
- "grad_norm": 1.069191336631775,
231
- "learning_rate": 0.00019224862709033824,
232
- "loss": 1.0312,
233
  "step": 155
234
  },
235
  {
236
- "epoch": 0.22130013831258644,
237
- "grad_norm": 1.2669651508331299,
238
- "learning_rate": 0.00019128889898597116,
239
- "loss": 1.06,
240
  "step": 160
241
  },
242
  {
243
- "epoch": 0.22821576763485477,
244
- "grad_norm": 1.5793681144714355,
245
- "learning_rate": 0.0001902758607353269,
246
- "loss": 0.9996,
247
  "step": 165
248
  },
249
  {
250
- "epoch": 0.2351313969571231,
251
- "grad_norm": 1.3170143365859985,
252
- "learning_rate": 0.00018921010392421628,
253
- "loss": 1.0259,
254
  "step": 170
255
  },
256
  {
257
- "epoch": 0.24204702627939143,
258
- "grad_norm": 1.088996171951294,
259
- "learning_rate": 0.00018809225092460488,
260
- "loss": 1.0145,
261
  "step": 175
262
  },
263
  {
264
- "epoch": 0.24896265560165975,
265
- "grad_norm": 1.0915374755859375,
266
- "learning_rate": 0.0001869229545311653,
267
- "loss": 1.004,
268
  "step": 180
269
  },
270
  {
271
- "epoch": 0.25587828492392806,
272
- "grad_norm": 0.9946417808532715,
273
- "learning_rate": 0.00018570289758006346,
274
- "loss": 0.9957,
275
  "step": 185
276
  },
277
  {
278
- "epoch": 0.2627939142461964,
279
- "grad_norm": 1.0990344285964966,
280
- "learning_rate": 0.00018443279255020152,
281
- "loss": 0.9922,
282
  "step": 190
283
  },
284
  {
285
- "epoch": 0.2697095435684647,
286
- "grad_norm": 1.0007325410842896,
287
- "learning_rate": 0.0001831133811471503,
288
- "loss": 0.9955,
289
  "step": 195
290
  },
291
  {
292
- "epoch": 0.2766251728907331,
293
- "grad_norm": 0.9499465823173523,
294
- "learning_rate": 0.000181745433870014,
295
- "loss": 0.9934,
296
  "step": 200
297
  },
298
  {
299
- "epoch": 0.2835408022130014,
300
- "grad_norm": 0.8894681930541992,
301
- "learning_rate": 0.00018032974956148063,
302
- "loss": 0.9824,
303
  "step": 205
304
  },
305
  {
306
- "epoch": 0.29045643153526973,
307
- "grad_norm": 0.9037023782730103,
308
- "learning_rate": 0.00017886715494132006,
309
- "loss": 0.9829,
310
  "step": 210
311
  },
312
  {
313
- "epoch": 0.29737206085753803,
314
- "grad_norm": 0.8227784633636475,
315
- "learning_rate": 0.00017735850412360331,
316
- "loss": 0.9855,
317
  "step": 215
318
  },
319
  {
320
- "epoch": 0.30428769017980634,
321
- "grad_norm": 0.7213776707649231,
322
- "learning_rate": 0.0001758046781179237,
323
- "loss": 0.9818,
324
  "step": 220
325
  },
326
  {
327
- "epoch": 0.3112033195020747,
328
- "grad_norm": 0.7871569991111755,
329
- "learning_rate": 0.00017420658431491223,
330
- "loss": 0.9537,
331
  "step": 225
332
  },
333
  {
334
- "epoch": 0.318118948824343,
335
- "grad_norm": 0.7829176783561707,
336
- "learning_rate": 0.0001725651559563469,
337
- "loss": 0.9476,
338
  "step": 230
339
  },
340
  {
341
- "epoch": 0.32503457814661135,
342
- "grad_norm": 0.7467004060745239,
343
- "learning_rate": 0.00017088135159016584,
344
- "loss": 0.9544,
345
  "step": 235
346
  },
347
  {
348
- "epoch": 0.33195020746887965,
349
- "grad_norm": 0.8612602353096008,
350
- "learning_rate": 0.00016915615451070233,
351
- "loss": 0.9651,
352
  "step": 240
353
  },
354
  {
355
- "epoch": 0.338865836791148,
356
- "grad_norm": 0.817294716835022,
357
- "learning_rate": 0.0001673905721844686,
358
- "loss": 0.9291,
359
  "step": 245
360
  },
361
  {
362
- "epoch": 0.3457814661134163,
363
- "grad_norm": 0.7505643963813782,
364
- "learning_rate": 0.00016558563566182363,
365
- "loss": 0.9078,
366
  "step": 250
367
  },
368
  {
369
- "epoch": 0.35269709543568467,
370
- "grad_norm": 0.8095275163650513,
371
- "learning_rate": 0.000163742398974869,
372
- "loss": 0.9343,
373
  "step": 255
374
  },
375
  {
376
- "epoch": 0.359612724757953,
377
- "grad_norm": 0.7329360842704773,
378
- "learning_rate": 0.00016186193852192355,
379
- "loss": 0.9177,
380
  "step": 260
381
  },
382
  {
383
- "epoch": 0.3665283540802213,
384
- "grad_norm": 0.7928723096847534,
385
- "learning_rate": 0.0001599453524389374,
386
- "loss": 0.9402,
387
  "step": 265
388
  },
389
  {
390
- "epoch": 0.37344398340248963,
391
- "grad_norm": 0.7580089569091797,
392
- "learning_rate": 0.00015799375995821118,
393
- "loss": 0.9128,
394
  "step": 270
395
  },
396
  {
397
- "epoch": 0.38035961272475793,
398
- "grad_norm": 0.6633639335632324,
399
- "learning_rate": 0.00015600830075479603,
400
- "loss": 0.9083,
401
  "step": 275
402
  },
403
  {
404
- "epoch": 0.3872752420470263,
405
- "grad_norm": 0.719149112701416,
406
- "learning_rate": 0.0001539901342809554,
407
- "loss": 0.9134,
408
  "step": 280
409
  },
410
  {
411
- "epoch": 0.3941908713692946,
412
- "grad_norm": 0.7551069259643555,
413
- "learning_rate": 0.00015194043908907775,
414
- "loss": 0.9187,
415
  "step": 285
416
  },
417
  {
418
- "epoch": 0.40110650069156295,
419
- "grad_norm": 0.7057808041572571,
420
- "learning_rate": 0.00014986041214343486,
421
- "loss": 0.9034,
422
  "step": 290
423
  },
424
  {
425
- "epoch": 0.40802213001383125,
426
- "grad_norm": 0.720129668712616,
427
- "learning_rate": 0.00014775126812118864,
428
- "loss": 0.8919,
429
  "step": 295
430
  },
431
  {
432
- "epoch": 0.4149377593360996,
433
- "grad_norm": 0.6026068925857544,
434
- "learning_rate": 0.00014561423870305382,
435
- "loss": 0.8808,
436
  "step": 300
437
  },
438
  {
439
- "epoch": 0.4218533886583679,
440
- "grad_norm": 0.6884504556655884,
441
- "learning_rate": 0.000143450571854031,
442
- "loss": 0.8889,
443
  "step": 305
444
  },
445
  {
446
- "epoch": 0.4287690179806362,
447
- "grad_norm": 0.7061929702758789,
448
- "learning_rate": 0.00014126153109463024,
449
- "loss": 0.8767,
450
  "step": 310
451
  },
452
  {
453
- "epoch": 0.43568464730290457,
454
- "grad_norm": 0.6553164124488831,
455
- "learning_rate": 0.0001390483947630109,
456
- "loss": 0.8512,
457
  "step": 315
458
  },
459
  {
460
- "epoch": 0.4426002766251729,
461
- "grad_norm": 0.6340067982673645,
462
- "learning_rate": 0.00013681245526846783,
463
- "loss": 0.8621,
464
  "step": 320
465
  },
466
  {
467
- "epoch": 0.44951590594744123,
468
- "grad_norm": 0.7011162042617798,
469
- "learning_rate": 0.00013455501833670088,
470
- "loss": 0.863,
471
  "step": 325
472
  },
473
  {
474
- "epoch": 0.45643153526970953,
475
- "grad_norm": 0.6404575109481812,
476
- "learning_rate": 0.00013227740224730798,
477
- "loss": 0.8574,
478
  "step": 330
479
  },
480
  {
481
- "epoch": 0.4633471645919779,
482
- "grad_norm": 0.5631649494171143,
483
- "learning_rate": 0.00012998093706394675,
484
- "loss": 0.8621,
485
  "step": 335
486
  },
487
  {
488
- "epoch": 0.4702627939142462,
489
- "grad_norm": 0.6324531435966492,
490
- "learning_rate": 0.00012766696385761494,
491
- "loss": 0.8459,
492
  "step": 340
493
  },
494
  {
495
- "epoch": 0.47717842323651455,
496
- "grad_norm": 0.6023428440093994,
497
- "learning_rate": 0.00012533683392350263,
498
- "loss": 0.8534,
499
  "step": 345
500
  },
501
  {
502
- "epoch": 0.48409405255878285,
503
- "grad_norm": 0.5649986863136292,
504
- "learning_rate": 0.00012299190799187405,
505
- "loss": 0.8396,
506
  "step": 350
507
  },
508
  {
509
- "epoch": 0.49100968188105115,
510
- "grad_norm": 0.6606884598731995,
511
- "learning_rate": 0.00012063355543343924,
512
- "loss": 0.8505,
513
  "step": 355
514
  },
515
  {
516
- "epoch": 0.4979253112033195,
517
- "grad_norm": 0.6517143845558167,
518
- "learning_rate": 0.00011826315345968013,
519
- "loss": 0.8152,
520
  "step": 360
521
  },
522
  {
523
- "epoch": 0.5048409405255878,
524
- "grad_norm": 0.5879771113395691,
525
- "learning_rate": 0.00011588208631859807,
526
- "loss": 0.8375,
527
- "step": 365
528
- },
529
- {
530
- "epoch": 0.5117565698478561,
531
- "grad_norm": 0.5463613867759705,
532
- "learning_rate": 0.00011349174448635158,
533
- "loss": 0.8307,
534
- "step": 370
535
- },
536
- {
537
- "epoch": 0.5186721991701245,
538
- "grad_norm": 0.6212437748908997,
539
- "learning_rate": 0.00011109352385525783,
540
- "loss": 0.8303,
541
- "step": 375
542
- },
543
- {
544
- "epoch": 0.5255878284923928,
545
- "grad_norm": 0.5765424370765686,
546
- "learning_rate": 0.00010868882491863049,
547
- "loss": 0.8311,
548
- "step": 380
549
- },
550
- {
551
- "epoch": 0.5325034578146611,
552
- "grad_norm": 0.5644758343696594,
553
- "learning_rate": 0.00010627905195293135,
554
- "loss": 0.8251,
555
- "step": 385
556
- },
557
- {
558
- "epoch": 0.5394190871369294,
559
- "grad_norm": 0.5946635603904724,
560
- "learning_rate": 0.00010386561219771222,
561
- "loss": 0.8157,
562
- "step": 390
563
- },
564
- {
565
- "epoch": 0.5463347164591977,
566
- "grad_norm": 0.5891283750534058,
567
- "learning_rate": 0.00010144991503382674,
568
- "loss": 0.811,
569
- "step": 395
570
- },
571
- {
572
- "epoch": 0.5532503457814661,
573
- "grad_norm": 0.5650286674499512,
574
- "learning_rate": 9.903337116039171e-05,
575
- "loss": 0.7895,
576
- "step": 400
577
- },
578
- {
579
- "epoch": 0.5601659751037344,
580
- "grad_norm": 0.5457018613815308,
581
- "learning_rate": 9.661739177097836e-05,
582
- "loss": 0.7975,
583
- "step": 405
584
- },
585
- {
586
- "epoch": 0.5670816044260027,
587
- "grad_norm": 0.54831463098526,
588
- "learning_rate": 9.420338772951521e-05,
589
- "loss": 0.806,
590
- "step": 410
591
- },
592
- {
593
- "epoch": 0.573997233748271,
594
- "grad_norm": 0.5567086935043335,
595
- "learning_rate": 9.179276874638315e-05,
596
- "loss": 0.8009,
597
- "step": 415
598
- },
599
- {
600
- "epoch": 0.5809128630705395,
601
- "grad_norm": 0.5590771436691284,
602
- "learning_rate": 8.938694255518444e-05,
603
- "loss": 0.7919,
604
- "step": 420
605
- },
606
- {
607
- "epoch": 0.5878284923928078,
608
- "grad_norm": 0.5756837725639343,
609
- "learning_rate": 8.698731409066568e-05,
610
- "loss": 0.7923,
611
- "step": 425
612
- },
613
- {
614
- "epoch": 0.5947441217150761,
615
- "grad_norm": 0.5342572331428528,
616
- "learning_rate": 8.459528466827575e-05,
617
- "loss": 0.8009,
618
- "step": 430
619
- },
620
- {
621
- "epoch": 0.6016597510373444,
622
- "grad_norm": 0.5078967213630676,
623
- "learning_rate": 8.221225116583678e-05,
624
- "loss": 0.7832,
625
- "step": 435
626
- },
627
- {
628
- "epoch": 0.6085753803596127,
629
- "grad_norm": 0.5687820911407471,
630
- "learning_rate": 7.98396052078071e-05,
631
- "loss": 0.7867,
632
- "step": 440
633
- },
634
- {
635
- "epoch": 0.6154910096818811,
636
- "grad_norm": 0.563842236995697,
637
- "learning_rate": 7.747873235261157e-05,
638
- "loss": 0.7876,
639
- "step": 445
640
- },
641
- {
642
- "epoch": 0.6224066390041494,
643
- "grad_norm": 0.5250119566917419,
644
- "learning_rate": 7.513101128351454e-05,
645
- "loss": 0.7883,
646
- "step": 450
647
- },
648
- {
649
- "epoch": 0.6293222683264177,
650
- "grad_norm": 0.4736279845237732,
651
- "learning_rate": 7.279781300350758e-05,
652
- "loss": 0.7733,
653
- "step": 455
654
- },
655
- {
656
- "epoch": 0.636237897648686,
657
- "grad_norm": 0.5302676558494568,
658
- "learning_rate": 7.048050003468251e-05,
659
- "loss": 0.7777,
660
- "step": 460
661
- },
662
- {
663
- "epoch": 0.6431535269709544,
664
- "grad_norm": 0.5099808573722839,
665
- "learning_rate": 6.81804256225567e-05,
666
- "loss": 0.7903,
667
- "step": 465
668
- },
669
- {
670
- "epoch": 0.6500691562932227,
671
- "grad_norm": 0.5259727239608765,
672
- "learning_rate": 6.58989329458158e-05,
673
- "loss": 0.7643,
674
- "step": 470
675
- },
676
- {
677
- "epoch": 0.656984785615491,
678
- "grad_norm": 0.48697277903556824,
679
- "learning_rate": 6.36373543319353e-05,
680
- "loss": 0.761,
681
- "step": 475
682
- },
683
- {
684
- "epoch": 0.6639004149377593,
685
- "grad_norm": 0.5124850869178772,
686
- "learning_rate": 6.139701047913885e-05,
687
- "loss": 0.7603,
688
- "step": 480
689
- },
690
- {
691
- "epoch": 0.6708160442600276,
692
- "grad_norm": 0.49750426411628723,
693
- "learning_rate": 5.917920968514752e-05,
694
- "loss": 0.7555,
695
- "step": 485
696
- },
697
- {
698
- "epoch": 0.677731673582296,
699
- "grad_norm": 0.5086675882339478,
700
- "learning_rate": 5.698524708317081e-05,
701
- "loss": 0.7618,
702
- "step": 490
703
- },
704
- {
705
- "epoch": 0.6846473029045643,
706
- "grad_norm": 0.5031773447990417,
707
- "learning_rate": 5.481640388558551e-05,
708
- "loss": 0.742,
709
- "step": 495
710
- },
711
- {
712
- "epoch": 0.6915629322268326,
713
- "grad_norm": 0.5332823991775513,
714
- "learning_rate": 5.267394663574351e-05,
715
- "loss": 0.7524,
716
- "step": 500
717
- },
718
- {
719
- "epoch": 0.6984785615491009,
720
- "grad_norm": 0.5489112138748169,
721
- "learning_rate": 5.055912646834635e-05,
- "loss": 0.73,
- "step": 505
  },
  {
- "epoch": 0.7053941908713693,
- "grad_norm": 0.5065959692001343,
- "learning_rate": 4.8473178378817564e-05,
- "loss": 0.749,
- "step": 510
- },
- {
- "epoch": 0.7123098201936376,
- "grad_norm": 0.49869057536125183,
- "learning_rate": 4.6417320502100316e-05,
- "loss": 0.7426,
- "step": 515
- },
- {
- "epoch": 0.719225449515906,
- "grad_norm": 0.5105318427085876,
- "learning_rate": 4.439275340130099e-05,
- "loss": 0.7551,
- "step": 520
- },
- {
- "epoch": 0.7261410788381742,
- "grad_norm": 0.4851531982421875,
- "learning_rate": 4.240065936659374e-05,
- "loss": 0.7333,
- "step": 525
- },
- {
- "epoch": 0.7330567081604425,
- "grad_norm": 0.5122374892234802,
- "learning_rate": 4.044220172479675e-05,
- "loss": 0.753,
- "step": 530
- },
- {
- "epoch": 0.739972337482711,
- "grad_norm": 0.4910212755203247,
- "learning_rate": 3.851852416002187e-05,
- "loss": 0.7309,
- "step": 535
- },
- {
- "epoch": 0.7468879668049793,
- "grad_norm": 0.47112396359443665,
- "learning_rate": 3.663075004579547e-05,
- "loss": 0.747,
- "step": 540
- },
- {
- "epoch": 0.7538035961272476,
- "grad_norm": 0.5120583176612854,
- "learning_rate": 3.477998178903982e-05,
- "loss": 0.7317,
- "step": 545
- },
- {
- "epoch": 0.7607192254495159,
- "grad_norm": 0.47287434339523315,
- "learning_rate": 3.296730018629846e-05,
- "loss": 0.7124,
- "step": 550
- },
- {
- "epoch": 0.7676348547717843,
- "grad_norm": 0.46755021810531616,
- "learning_rate": 3.11937637925816e-05,
- "loss": 0.7215,
- "step": 555
- },
- {
- "epoch": 0.7745504840940526,
- "grad_norm": 0.4925783574581146,
- "learning_rate": 2.9460408303199694e-05,
- "loss": 0.732,
- "step": 560
- },
- {
- "epoch": 0.7814661134163209,
- "grad_norm": 0.4654233455657959,
- "learning_rate": 2.7768245948946612e-05,
- "loss": 0.7157,
- "step": 565
- },
- {
- "epoch": 0.7883817427385892,
- "grad_norm": 0.471231609582901,
- "learning_rate": 2.61182649049853e-05,
- "loss": 0.7193,
- "step": 570
- },
- {
- "epoch": 0.7952973720608575,
- "grad_norm": 0.4691479802131653,
- "learning_rate": 2.4511428713781238e-05,
- "loss": 0.7268,
- "step": 575
- },
- {
- "epoch": 0.8022130013831259,
- "grad_norm": 0.46052515506744385,
- "learning_rate": 2.2948675722421086e-05,
- "loss": 0.707,
- "step": 580
- },
- {
- "epoch": 0.8091286307053942,
- "grad_norm": 0.4800238609313965,
- "learning_rate": 2.1430918534643996e-05,
- "loss": 0.7119,
- "step": 585
- },
- {
- "epoch": 0.8160442600276625,
- "grad_norm": 0.44981294870376587,
- "learning_rate": 1.9959043477907e-05,
- "loss": 0.7031,
- "step": 590
- },
- {
- "epoch": 0.8229598893499308,
- "grad_norm": 0.4723599851131439,
- "learning_rate": 1.8533910085794713e-05,
- "loss": 0.7131,
- "step": 595
- },
- {
- "epoch": 0.8298755186721992,
- "grad_norm": 0.4526391327381134,
- "learning_rate": 1.7156350596075744e-05,
- "loss": 0.7028,
- "step": 600
- },
- {
- "epoch": 0.8367911479944675,
- "grad_norm": 0.4318162500858307,
- "learning_rate": 1.5827169464699576e-05,
- "loss": 0.7129,
- "step": 605
- },
- {
- "epoch": 0.8437067773167358,
- "grad_norm": 0.45468002557754517,
- "learning_rate": 1.4547142896016608e-05,
- "loss": 0.7003,
- "step": 610
- },
- {
- "epoch": 0.8506224066390041,
- "grad_norm": 0.4626847803592682,
- "learning_rate": 1.3317018389496927e-05,
- "loss": 0.7148,
- "step": 615
- },
- {
- "epoch": 0.8575380359612724,
- "grad_norm": 0.4578290581703186,
- "learning_rate": 1.2137514303211561e-05,
- "loss": 0.7,
- "step": 620
- },
- {
- "epoch": 0.8644536652835408,
- "grad_norm": 0.4639127254486084,
- "learning_rate": 1.1009319434331622e-05,
- "loss": 0.7035,
- "step": 625
- },
- {
- "epoch": 0.8713692946058091,
- "grad_norm": 0.45225322246551514,
- "learning_rate": 9.93309261689015e-06,
- "loss": 0.6892,
- "step": 630
- },
- {
- "epoch": 0.8782849239280774,
- "grad_norm": 0.5041568279266357,
- "learning_rate": 8.909462337041507e-06,
- "loss": 0.6957,
- "step": 635
- },
- {
- "epoch": 0.8852005532503457,
- "grad_norm": 0.4375840127468109,
- "learning_rate": 7.939026366043322e-06,
- "loss": 0.698,
- "step": 640
- },
- {
- "epoch": 0.8921161825726142,
- "grad_norm": 0.44402968883514404,
- "learning_rate": 7.022351411174866e-06,
- "loss": 0.6981,
- "step": 645
- },
- {
- "epoch": 0.8990318118948825,
- "grad_norm": 0.4330901801586151,
- "learning_rate": 6.1599727847957975e-06,
- "loss": 0.6907,
- "step": 650
- },
- {
- "epoch": 0.9059474412171508,
- "grad_norm": 0.4889258146286011,
- "learning_rate": 5.3523940917390215e-06,
- "loss": 0.6822,
- "step": 655
- },
- {
- "epoch": 0.9128630705394191,
- "grad_norm": 0.45514872670173645,
- "learning_rate": 4.600086935219561e-06,
- "loss": 0.6885,
- "step": 660
- },
- {
- "epoch": 0.9197786998616874,
- "grad_norm": 0.4331250488758087,
- "learning_rate": 3.903490641431573e-06,
- "loss": 0.6758,
- "step": 665
- },
- {
- "epoch": 0.9266943291839558,
- "grad_norm": 0.4574739336967468,
- "learning_rate": 3.2630120029942037e-06,
- "loss": 0.6851,
- "step": 670
- },
- {
- "epoch": 0.9336099585062241,
- "grad_norm": 0.4640979468822479,
- "learning_rate": 2.679025041396155e-06,
- "loss": 0.7036,
- "step": 675
- },
- {
- "epoch": 0.9405255878284924,
- "grad_norm": 0.45547041296958923,
- "learning_rate": 2.1518707885777146e-06,
- "loss": 0.722,
- "step": 680
- },
- {
- "epoch": 0.9474412171507607,
- "grad_norm": 0.4363589286804199,
- "learning_rate": 1.6818570877776718e-06,
- "loss": 0.6818,
- "step": 685
- },
- {
- "epoch": 0.9543568464730291,
- "grad_norm": 0.45574650168418884,
- "learning_rate": 1.2692584137615204e-06,
- "loss": 0.704,
- "step": 690
- },
- {
- "epoch": 0.9612724757952974,
- "grad_norm": 0.4429336190223694,
- "learning_rate": 9.143157125359514e-07,
- "loss": 0.6805,
- "step": 695
- },
- {
- "epoch": 0.9681881051175657,
- "grad_norm": 0.435350239276886,
- "learning_rate": 6.172362606431281e-07,
- "loss": 0.6934,
- "step": 700
- },
- {
- "epoch": 0.975103734439834,
- "grad_norm": 0.47741129994392395,
- "learning_rate": 3.781935441171336e-07,
- "loss": 0.6917,
- "step": 705
- },
- {
- "epoch": 0.9820193637621023,
- "grad_norm": 0.44488659501075745,
- "learning_rate": 1.973271571728441e-07,
- "loss": 0.6832,
- "step": 710
- },
- {
- "epoch": 0.9889349930843707,
- "grad_norm": 0.46524661779403687,
- "learning_rate": 7.474272068698218e-08,
- "loss": 0.6978,
- "step": 715
- },
- {
- "epoch": 0.995850622406639,
- "grad_norm": 0.4371030032634735,
- "learning_rate": 1.0511820518432913e-08,
- "loss": 0.6807,
- "step": 720
- },
- {
- "epoch": 1.0,
- "eval_loss": 0.7086742520332336,
- "eval_runtime": 99.1201,
- "eval_samples_per_second": 46.509,
- "eval_steps_per_second": 2.916,
- "step": 723
- },
- {
- "epoch": 1.0,
- "step": 723,
- "total_flos": 1.702143589888295e+18,
- "train_loss": 0.914938143180119,
- "train_runtime": 4445.7565,
- "train_samples_per_second": 10.407,
- "train_steps_per_second": 0.163
  }
  ],
  "logging_steps": 5,
- "max_steps": 723,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 500,
@@ -1058,7 +554,7 @@
  "attributes": {}
  }
  },
- "total_flos": 1.702143589888295e+18,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
 
  {
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.9986168741355463,
  "eval_steps": 500,
+ "global_step": 361,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
+ "epoch": 0.0027662517289073307,
+ "grad_norm": 22.881450653076172,
+ "learning_rate": 5.405405405405406e-06,
+ "loss": 1.6158,
  "step": 1
  },
  {
+ "epoch": 0.013831258644536652,
+ "grad_norm": 2.178889274597168,
+ "learning_rate": 2.702702702702703e-05,
+ "loss": 1.3807,
  "step": 5
  },
  {
+ "epoch": 0.027662517289073305,
+ "grad_norm": 14.400589942932129,
+ "learning_rate": 5.405405405405406e-05,
+ "loss": 1.3352,
  "step": 10
  },
  {
+ "epoch": 0.04149377593360996,
+ "grad_norm": 2.756945848464966,
+ "learning_rate": 8.108108108108109e-05,
+ "loss": 1.2203,
  "step": 15
  },
  {
+ "epoch": 0.05532503457814661,
+ "grad_norm": 1.3922957181930542,
+ "learning_rate": 0.00010810810810810812,
+ "loss": 1.0964,
  "step": 20
  },
  {
+ "epoch": 0.06915629322268327,
+ "grad_norm": 1.0261996984481812,
+ "learning_rate": 0.00013513513513513514,
+ "loss": 1.2033,
  "step": 25
  },
  {
+ "epoch": 0.08298755186721991,
+ "grad_norm": 1.6099579334259033,
+ "learning_rate": 0.00016216216216216218,
+ "loss": 1.2005,
  "step": 30
  },
  {
+ "epoch": 0.09681881051175657,
+ "grad_norm": 1.77192223072052,
+ "learning_rate": 0.0001891891891891892,
+ "loss": 1.4161,
  "step": 35
  },
  {
+ "epoch": 0.11065006915629322,
+ "grad_norm": 1.0772837400436401,
+ "learning_rate": 0.0001999576950082201,
+ "loss": 1.4553,
  "step": 40
  },
  {
+ "epoch": 0.12448132780082988,
+ "grad_norm": 1.4605121612548828,
+ "learning_rate": 0.0001996992941167792,
+ "loss": 1.2175,
  "step": 45
  },
  {
+ "epoch": 0.13831258644536654,
+ "grad_norm": 1.0822768211364746,
+ "learning_rate": 0.00019920660160815422,
+ "loss": 1.0378,
  "step": 50
  },
  {
+ "epoch": 0.15214384508990317,
+ "grad_norm": 0.9796843528747559,
+ "learning_rate": 0.00019848077530122083,
+ "loss": 1.0451,
  "step": 55
  },
  {
+ "epoch": 0.16597510373443983,
+ "grad_norm": 1.1945514678955078,
+ "learning_rate": 0.00019752352087524933,
+ "loss": 1.4266,
  "step": 60
  },
  {
+ "epoch": 0.1798063623789765,
+ "grad_norm": 0.8683685064315796,
+ "learning_rate": 0.00019633708786158806,
+ "loss": 1.0347,
  "step": 65
  },
  {
+ "epoch": 0.19363762102351315,
+ "grad_norm": 0.25568732619285583,
+ "learning_rate": 0.0001949242643573034,
+ "loss": 0.9376,
  "step": 70
  },
  {
+ "epoch": 0.2074688796680498,
+ "grad_norm": 0.26001420617103577,
+ "learning_rate": 0.0001932883704732001,
+ "loss": 0.9132,
  "step": 75
  },
  {
+ "epoch": 0.22130013831258644,
+ "grad_norm": 0.2598419189453125,
+ "learning_rate": 0.00019143325053161796,
+ "loss": 0.8958,
  "step": 80
  },
  {
+ "epoch": 0.2351313969571231,
+ "grad_norm": 0.20231448113918304,
+ "learning_rate": 0.00018936326403234125,
+ "loss": 0.8734,
  "step": 85
  },
  {
+ "epoch": 0.24896265560165975,
+ "grad_norm": 0.17383822798728943,
+ "learning_rate": 0.00018708327540784922,
+ "loss": 0.8701,
  "step": 90
  },
  {
+ "epoch": 0.2627939142461964,
+ "grad_norm": 0.17745399475097656,
+ "learning_rate": 0.0001845986425919841,
+ "loss": 0.8499,
  "step": 95
  },
  {
+ "epoch": 0.2766251728907331,
+ "grad_norm": 0.17801660299301147,
+ "learning_rate": 0.0001819152044288992,
+ "loss": 0.8512,
  "step": 100
  },
  {
+ "epoch": 0.29045643153526973,
+ "grad_norm": 0.18566825985908508,
+ "learning_rate": 0.00017903926695187595,
+ "loss": 0.8361,
  "step": 105
  },
  {
+ "epoch": 0.30428769017980634,
+ "grad_norm": 0.18012060225009918,
+ "learning_rate": 0.00017597758856425494,
+ "loss": 0.834,
  "step": 110
  },
  {
+ "epoch": 0.318118948824343,
+ "grad_norm": 0.16151954233646393,
+ "learning_rate": 0.00017273736415730488,
+ "loss": 0.8114,
  "step": 115
  },
  {
+ "epoch": 0.33195020746887965,
+ "grad_norm": 0.16563855111598969,
+ "learning_rate": 0.00016932620820235244,
+ "loss": 0.8191,
  "step": 120
  },
  {
+ "epoch": 0.3457814661134163,
+ "grad_norm": 0.16186057031154633,
+ "learning_rate": 0.0001657521368569064,
+ "loss": 0.7887,
  "step": 125
  },
  {
+ "epoch": 0.359612724757953,
+ "grad_norm": 0.1734704077243805,
+ "learning_rate": 0.000162023549126826,
+ "loss": 0.7946,
  "step": 130
  },
  {
+ "epoch": 0.37344398340248963,
+ "grad_norm": 0.17336814105510712,
+ "learning_rate": 0.00015814920712880267,
+ "loss": 0.7974,
  "step": 135
  },
  {
+ "epoch": 0.3872752420470263,
+ "grad_norm": 0.15509486198425293,
+ "learning_rate": 0.00015413821549953698,
+ "loss": 0.7866,
  "step": 140
  },
  {
+ "epoch": 0.40110650069156295,
+ "grad_norm": 0.18101590871810913,
+ "learning_rate": 0.00015000000000000001,
+ "loss": 0.7927,
  "step": 145
  },
  {
+ "epoch": 0.4149377593360996,
+ "grad_norm": 0.14941518008708954,
+ "learning_rate": 0.0001457442853650581,
+ "loss": 0.7698,
  "step": 150
  },
  {
+ "epoch": 0.4287690179806362,
+ "grad_norm": 0.15677104890346527,
+ "learning_rate": 0.00014138107245051392,
+ "loss": 0.7721,
  "step": 155
  },
  {
+ "epoch": 0.4426002766251729,
+ "grad_norm": 0.14607611298561096,
+ "learning_rate": 0.00013692061473126845,
+ "loss": 0.7516,
  "step": 160
  },
  {
+ "epoch": 0.45643153526970953,
+ "grad_norm": 0.16472630202770233,
+ "learning_rate": 0.00013237339420583212,
+ "loss": 0.7554,
  "step": 165
  },
  {
+ "epoch": 0.4702627939142462,
+ "grad_norm": 0.13666489720344543,
+ "learning_rate": 0.00012775009676380957,
+ "loss": 0.7515,
  "step": 170
  },
  {
+ "epoch": 0.48409405255878285,
+ "grad_norm": 0.1362183392047882,
+ "learning_rate": 0.00012306158707424403,
+ "loss": 0.7513,
  "step": 175
  },
  {
+ "epoch": 0.4979253112033195,
+ "grad_norm": 0.12810291349887848,
+ "learning_rate": 0.00011831888305383268,
+ "loss": 0.7385,
  "step": 180
  },
  {
+ "epoch": 0.5117565698478561,
+ "grad_norm": 0.14311975240707397,
+ "learning_rate": 0.00011353312997501313,
+ "loss": 0.7469,
  "step": 185
  },
  {
+ "epoch": 0.5255878284923928,
+ "grad_norm": 0.129547581076622,
+ "learning_rate": 0.00010871557427476583,
+ "loss": 0.7423,
  "step": 190
  },
  {
+ "epoch": 0.5394190871369294,
+ "grad_norm": 0.1447523832321167,
+ "learning_rate": 0.0001038775371256817,
+ "loss": 0.7351,
  "step": 195
  },
  {
+ "epoch": 0.5532503457814661,
+ "grad_norm": 0.1369813233613968,
+ "learning_rate": 9.903038783140216e-05,
+ "loss": 0.7202,
  "step": 200
  },
  {
+ "epoch": 0.5670816044260027,
+ "grad_norm": 0.12533989548683167,
+ "learning_rate": 9.418551710895243e-05,
+ "loss": 0.722,
  "step": 205
  },
  {
+ "epoch": 0.5809128630705395,
+ "grad_norm": 0.12739399075508118,
+ "learning_rate": 8.935431032075318e-05,
+ "loss": 0.7173,
  "step": 210
  },
  {
+ "epoch": 0.5947441217150761,
+ "grad_norm": 0.13596710562705994,
+ "learning_rate": 8.454812071921596e-05,
+ "loss": 0.7194,
  "step": 215
  },
  {
+ "epoch": 0.6085753803596127,
+ "grad_norm": 0.12327581644058228,
+ "learning_rate": 7.977824276679623e-05,
+ "loss": 0.7095,
  "step": 220
  },
  {
+ "epoch": 0.6224066390041494,
+ "grad_norm": 0.1317676603794098,
+ "learning_rate": 7.505588559420189e-05,
+ "loss": 0.713,
  "step": 225
  },
  {
+ "epoch": 0.636237897648686,
+ "grad_norm": 0.13516183197498322,
+ "learning_rate": 7.039214665913003e-05,
+ "loss": 0.7048,
  "step": 230
  },
  {
+ "epoch": 0.6500691562932227,
+ "grad_norm": 0.12717784941196442,
+ "learning_rate": 6.579798566743314e-05,
+ "loss": 0.7088,
  "step": 235
  },
  {
+ "epoch": 0.6639004149377593,
+ "grad_norm": 0.12097220122814178,
+ "learning_rate": 6.128419881799996e-05,
+ "loss": 0.6939,
  "step": 240
  },
  {
+ "epoch": 0.677731673582296,
+ "grad_norm": 0.1216357946395874,
+ "learning_rate": 5.6861393431874675e-05,
+ "loss": 0.6943,
  "step": 245
  },
  {
+ "epoch": 0.6915629322268326,
+ "grad_norm": 0.12578962743282318,
+ "learning_rate": 5.253996302523596e-05,
+ "loss": 0.6832,
  "step": 250
  },
  {
+ "epoch": 0.7053941908713693,
+ "grad_norm": 0.1288958042860031,
+ "learning_rate": 4.833006288481371e-05,
+ "loss": 0.6786,
  "step": 255
  },
  {
+ "epoch": 0.719225449515906,
+ "grad_norm": 0.13444924354553223,
+ "learning_rate": 4.424158620314073e-05,
+ "loss": 0.6861,
  "step": 260
  },
  {
+ "epoch": 0.7330567081604425,
+ "grad_norm": 0.15658161044120789,
+ "learning_rate": 4.028414082972141e-05,
+ "loss": 0.6829,
  "step": 265
  },
  {
+ "epoch": 0.7468879668049793,
+ "grad_norm": 0.13638462126255035,
+ "learning_rate": 3.646702669275151e-05,
+ "loss": 0.6811,
  "step": 270
  },
  {
+ "epoch": 0.7607192254495159,
+ "grad_norm": 0.11960398405790329,
+ "learning_rate": 3.279921394444776e-05,
+ "loss": 0.6645,
  "step": 275
  },
  {
+ "epoch": 0.7745504840940526,
+ "grad_norm": 0.12005037814378738,
+ "learning_rate": 2.9289321881345254e-05,
+ "loss": 0.6709,
  "step": 280
  },
  {
+ "epoch": 0.7883817427385892,
+ "grad_norm": 0.12300828844308853,
+ "learning_rate": 2.594559868909956e-05,
+ "loss": 0.6629,
  "step": 285
  },
  {
+ "epoch": 0.8022130013831259,
+ "grad_norm": 0.11922738701105118,
+ "learning_rate": 2.2775902059393085e-05,
+ "loss": 0.6613,
  "step": 290
  },
  {
+ "epoch": 0.8160442600276625,
+ "grad_norm": 0.11143971979618073,
+ "learning_rate": 1.9787680724495617e-05,
+ "loss": 0.6546,
  "step": 295
  },
  {
+ "epoch": 0.8298755186721992,
+ "grad_norm": 0.11601640284061432,
+ "learning_rate": 1.698795695287212e-05,
+ "loss": 0.6567,
  "step": 300
  },
  {
+ "epoch": 0.8437067773167358,
+ "grad_norm": 0.11989685148000717,
+ "learning_rate": 1.4383310046973365e-05,
+ "loss": 0.657,
  "step": 305
  },
  {
+ "epoch": 0.8575380359612724,
+ "grad_norm": 0.11077902466058731,
+ "learning_rate": 1.1979860881988902e-05,
+ "loss": 0.6555,
  "step": 310
  },
  {
+ "epoch": 0.8713692946058091,
+ "grad_norm": 0.11324643343687057,
+ "learning_rate": 9.783257521896227e-06,
+ "loss": 0.6468,
  "step": 315
  },
  {
+ "epoch": 0.8852005532503457,
+ "grad_norm": 0.11370333284139633,
+ "learning_rate": 7.798661946608166e-06,
+ "loss": 0.648,
  "step": 320
  },
  {
+ "epoch": 0.8990318118948825,
+ "grad_norm": 0.10991474986076355,
+ "learning_rate": 6.030737921409169e-06,
+ "loss": 0.6446,
  "step": 325
  },
  {
+ "epoch": 0.9128630705394191,
+ "grad_norm": 0.11461606621742249,
+ "learning_rate": 4.4836400371876974e-06,
+ "loss": 0.6387,
  "step": 330
  },
  {
+ "epoch": 0.9266943291839558,
+ "grad_norm": 0.1137213185429573,
+ "learning_rate": 3.161003947219421e-06,
+ "loss": 0.6329,
  "step": 335
  },
  {
+ "epoch": 0.9405255878284924,
+ "grad_norm": 0.10857342928647995,
+ "learning_rate": 2.0659378234448525e-06,
+ "loss": 0.6627,
  "step": 340
  },
  {
+ "epoch": 0.9543568464730291,
+ "grad_norm": 0.10978103429079056,
+ "learning_rate": 1.201015052319099e-06,
+ "loss": 0.6435,
  "step": 345
  },
  {
+ "epoch": 0.9681881051175657,
+ "grad_norm": 0.1058996319770813,
+ "learning_rate": 5.682681873981577e-07,
+ "loss": 0.6388,
  "step": 350
  },
  {
+ "epoch": 0.9820193637621023,
+ "grad_norm": 0.10548459738492966,
+ "learning_rate": 1.6918417287318245e-07,
+ "loss": 0.6382,
  "step": 355
  },
  {
+ "epoch": 0.995850622406639,
+ "grad_norm": 0.11099706590175629,
+ "learning_rate": 4.700849277383679e-09,
+ "loss": 0.6424,
  "step": 360
  },
  {
+ "epoch": 0.9986168741355463,
+ "eval_loss": 0.658014178276062,
+ "eval_runtime": 53.9504,
+ "eval_samples_per_second": 85.449,
+ "eval_steps_per_second": 2.688,
+ "step": 361
  },
  {
+ "epoch": 0.9986168741355463,
+ "step": 361,
+ "total_flos": 1.74045731487744e+18,
+ "train_loss": 0.8234024724801822,
+ "train_runtime": 2385.6161,
+ "train_samples_per_second": 19.395,
+ "train_steps_per_second": 0.151
  }
  ],
  "logging_steps": 5,
+ "max_steps": 361,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 500,

  "attributes": {}
  }
  },
+ "total_flos": 1.74045731487744e+18,
  "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c7f901665b246c7f97e4ff363cef52fdcc6b1b8fb59deef2a745733be6a10b18
- size 6968

  version https://git-lfs.github.com/spec/v1
+ oid sha256:475f29db775c3fa3ebae0c3997a227d93f50e4e631d281907a24df8c23250da0
+ size 7096