lemms committed on
Commit 6146d01 · verified · 1 Parent(s): 2de7c50

Upload folder using huggingface_hub
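The commit message indicates the folder was pushed with `huggingface_hub`. As a rough illustration only (not the uploader's actual script; the repo id below is a placeholder), an upload like this is typically done with `upload_folder`:

```python
# Hypothetical sketch of how an export folder like this one is usually uploaded.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="exports/improved-10k-huggingface",  # local export directory from this commit
    repo_id="<user>/<repo>",                         # placeholder target repository
    repo_type="model",
    commit_message="Upload folder using huggingface_hub",
)
```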
exports/improved-10k-huggingface/README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ language:
+ - en
+ license:
+ - gpl-3.0
+ - other
+ tags:
+ - text-generation
+ - pytorch
+ - causal-lm
+ - openllm
+ - gpt
+ - language-model
+ datasets:
+ - squad
+ metrics:
+ - perplexity
+ - loss
+ pipeline_tag: text-generation
+ model-index:
+ - name: OpenLLM Small Extended 10k Improved
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: squad
+       name: SQUAD
+     metrics:
+     - type: loss
+       value: 5.1774
+     - type: perplexity
+       value: 177.23
+ ---
+
+ # OpenLLM Small Extended 10k Improved
+
+ This is an improved version of the OpenLLM Small model, trained for 10,000 steps with the enhanced training process that adds proper checkpoint saving and validation monitoring.
+
+ ## Model Details
+
+ - **Model Type**: GPT-style language model
+ - **Architecture**: Transformer decoder-only
+ - **Parameters**: 35.8M
+ - **Training Steps**: 10,000 (resumed from the 9k model)
+ - **Training Time**: 21.57 hours
+ - **Final Loss**: 5.1774
+ - **Final Perplexity**: 177.23
+ - **Best Validation Loss**: 5.4179
+
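+ The 35.8M parameter figure is consistent with the architecture in `config.json` (6 layers, 8 heads, 512-dimensional embeddings, 32k vocabulary, 1,024-token context). The estimate below is only a back-of-the-envelope sketch; it assumes a standard GPT-2-style block with a 4x MLP expansion and an output head tied to the token embedding, which this codebase may or may not use.
+
+ ```python
+ # Rough parameter-count estimate from the config.json values (sketch only).
+ # Assumes 4x MLP expansion and tied input/output embeddings.
+ vocab_size, n_layer, n_embd, block_size = 32000, 6, 512, 1024
+
+ token_emb = vocab_size * n_embd            # 16,384,000
+ pos_emb = block_size * n_embd              # 524,288
+ attn = 4 * n_embd * n_embd                 # QKV + output projection per block
+ mlp = 8 * n_embd * n_embd                  # two 4x-expansion linear layers per block
+ per_block = attn + mlp + 2 * n_embd        # plus two bias-free LayerNorms
+ total = token_emb + pos_emb + n_layer * per_block + n_embd  # final LayerNorm
+
+ print(f"{total / 1e6:.1f}M parameters")    # ~35.8M
+ ```
+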
+ ## Training Process
+
+ This model was trained using the improved training process, which includes (a minimal sketch of how these pieces can fit together follows the list):
+ - ✅ Proper checkpoint saving with full metadata
+ - ✅ Best checkpoint tracking
+ - ✅ Validation monitoring
+ - ✅ Early stopping mechanism
+ - ✅ Complete training logs
+
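+ The trainer itself is not part of this export, so the following is only a minimal sketch of what checkpointing with metadata, best-checkpoint tracking, and patience-based early stopping can look like; every name in it (`save_checkpoint`, `patience`, the stand-in model and losses) is illustrative rather than the framework's actual API.
+
+ ```python
+ # Illustrative sketch only -- not the actual OpenLLM training loop.
+ import torch
+ import torch.nn as nn
+
+ model = nn.Linear(8, 8)                             # stand-in for the real GPT model
+ validation_results = [(9500, 5.49), (10000, 5.42)]  # (step, val_loss) stand-ins
+
+ best_val_loss = float("inf")
+ patience, bad_evals = 5, 0
+
+ def save_checkpoint(path, step, val_loss):
+     # Save weights together with the metadata the model card describes.
+     torch.save({"model_state_dict": model.state_dict(),
+                 "step": step,
+                 "best_loss": val_loss}, path)
+
+ for step, val_loss in validation_results:
+     save_checkpoint("checkpoint_last.pt", step, val_loss)
+     if val_loss < best_val_loss:                     # best-checkpoint tracking
+         best_val_loss, bad_evals = val_loss, 0
+         save_checkpoint("checkpoint_best.pt", step, val_loss)
+     else:
+         bad_evals += 1
+         if bad_evals >= patience:                    # early stopping
+             break
+ ```
+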
+ ## Usage
+
+ ```python
+ # Load using the OpenLLM framework
+ from core.src.model import GPTModel
+ import json
+ import torch
+
+ # Load configuration
+ with open("config.json", "r") as f:
+     config = json.load(f)
+
+ # Create model instance
+ model = GPTModel(config["model_config"])
+
+ # Load trained weights
+ model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
+
+ # Load tokenizer
+ import sentencepiece as spm
+ tokenizer = spm.SentencePieceProcessor()
+ tokenizer.load("tokenizer.model")
+
+ # Generate text
+ prompt = "The future of artificial intelligence"
+ tokens = tokenizer.encode(prompt)
+ inputs = torch.tensor([tokens], dtype=torch.long)
+
+ with torch.no_grad():
+     outputs = model.generate(
+         inputs,
+         max_length=100,
+         temperature=0.7
+     )
+
+ generated_text = tokenizer.decode(outputs[0].tolist())
+ print(generated_text)
+ ```
+
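+ If you are not working from a local clone of this repository, the three files used above can be fetched from the Hub first. A minimal sketch, assuming `huggingface_hub` is installed; the repo id is a placeholder, not this model's actual id:
+
+ ```python
+ # Sketch: download the exported files before running the usage example above.
+ from huggingface_hub import hf_hub_download
+
+ repo_id = "<user>/<repo>"  # placeholder -- replace with this model's repository id
+ config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
+ weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
+ tokenizer_path = hf_hub_download(repo_id=repo_id, filename="tokenizer.model")
+ print(config_path, weights_path, tokenizer_path)
+ ```
+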
+ ## Training Configuration
+
+ - **Learning Rate**: 3e-4
+ - **Batch Size**: 4
+ - **Gradient Accumulation Steps**: 4
+ - **Max Steps**: 10,000
+ - **Warmup Steps**: 100
+ - **Weight Decay**: 0.01
+ - **Sequence Length**: 512
+
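+ One consequence of these settings worth spelling out (simple arithmetic, not an additional configuration value): with a batch size of 4 and 4 gradient-accumulation steps, each optimizer update sees 16 sequences of 512 tokens, i.e. 8,192 tokens per update.
+
+ ```python
+ # Derived from the values above; not extra configuration.
+ batch_size, grad_accum, seq_len = 4, 4, 512
+ effective_batch = batch_size * grad_accum      # 16 sequences per optimizer step
+ tokens_per_update = effective_batch * seq_len  # 8,192 tokens per optimizer step
+ print(effective_batch, tokens_per_update)
+ ```
+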
+ ## Model Performance
+
+ This improved 10k model matches the performance of the 9k model while adding a proper checkpoint format and complete training metadata.
+
+ ## License
+
+ This model is licensed under the GNU General Public License v3.0.
exports/improved-10k-huggingface/config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "model_config": {
+     "model_name": "OpenLLM-Small-10k-Improved",
+     "model_size": "small",
+     "vocab_size": 32000,
+     "n_layer": 6,
+     "n_head": 8,
+     "n_embd": 512,
+     "block_size": 1024,
+     "dropout": 0.1,
+     "bias": false,
+     "training_info": {
+       "step": 10000,
+       "best_loss": 5.177438259124756,
+       "model_type": "gpt-small-improved"
+     }
+   },
+   "tokenizer_config": {
+     "type": "sentencepiece",
+     "vocab_size": 32000,
+     "model_file": "tokenizer.model"
+   },
+   "training_config": {
+     "learning_rate": 0.0003,
+     "batch_size": 4,
+     "gradient_accumulation_steps": 4,
+     "max_steps": 10000,
+     "warmup_steps": 100,
+     "weight_decay": 0.01,
+     "training_time_hours": 21.57,
+     "final_perplexity": 177.23,
+     "best_validation_loss": 5.417883062362671
+   }
+ }
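The perplexities reported here are simply the exponentials of the corresponding losses (exp(5.1774) ≈ 177.23; the training log's validation perplexity of 225.40 is exp(5.4179)). A quick sanity check after loading the file:

```python
# Sanity check: reported perplexities are exp(loss) of the reported losses.
import json
import math

with open("config.json", "r") as f:
    cfg = json.load(f)

print(math.exp(cfg["model_config"]["training_info"]["best_loss"]))  # ~177.23
print(math.exp(cfg["training_config"]["best_validation_loss"]))     # ~225.40
```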
exports/improved-10k-huggingface/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8153ebd2d2e4e2f8f5a1f70783b63e0cc70b3bf5d757aabb7ae70edede618e39
+ size 168490621
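This is a Git LFS pointer, not the weights themselves; the ~168 MB `pytorch_model.bin` is fetched separately (for example via `git lfs pull` or `huggingface_hub`). Once downloaded, the file can be checked against the pointer's oid; a minimal sketch:

```python
# Verify a downloaded pytorch_model.bin against the LFS pointer's sha256 oid.
import hashlib

EXPECTED = "8153ebd2d2e4e2f8f5a1f70783b63e0cc70b3bf5d757aabb7ae70edede618e39"

sha = hashlib.sha256()
with open("pytorch_model.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
        sha.update(chunk)

assert sha.hexdigest() == EXPECTED, "checksum mismatch"
print("OK:", sha.hexdigest())
```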
exports/improved-10k-huggingface/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6efb1da9b0e667cee37b23f4240e0bd34fbfb20e1faebcb8d299a7598c0635f3
+ size 547695
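Likewise an LFS pointer, this time for the SentencePiece tokenizer. A small sanity check after downloading (the 32,000 vocabulary size comes from `config.json`):

```python
# Confirm the downloaded SentencePiece model matches the configured vocabulary size.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")
assert sp.get_piece_size() == 32000, sp.get_piece_size()
print("vocab size:", sp.get_piece_size())
```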
exports/improved-10k-huggingface/training_log.json ADDED
@@ -0,0 +1,906 @@
1
+ [
2
+ {
3
+ "step": 100,
4
+ "loss": 7.657252192497253,
5
+ "perplexity": 2115.935250373028,
6
+ "learning_rate": 0.00015150000000000019,
7
+ "step_time": 9.368009328842163,
8
+ "tokens_per_second": 109.30817466708918,
9
+ "memory_mb": 756.6015625
10
+ },
11
+ {
12
+ "step": 200,
13
+ "loss": 7.487581491470337,
14
+ "perplexity": 1785.7280665447065,
15
+ "learning_rate": 0.0002999999999999999,
16
+ "step_time": 10.94300365447998,
17
+ "tokens_per_second": 93.57577063229641,
18
+ "memory_mb": 730.99609375
19
+ },
20
+ {
21
+ "step": 300,
22
+ "loss": 7.163171410560608,
23
+ "perplexity": 1290.9987344513497,
24
+ "learning_rate": 0.00029794904665665113,
25
+ "step_time": 8.949450016021729,
26
+ "tokens_per_second": 114.42043904002891,
27
+ "memory_mb": 733.89453125
28
+ },
29
+ {
30
+ "step": 400,
31
+ "loss": 6.677380561828613,
32
+ "perplexity": 794.2359329078598,
33
+ "learning_rate": 0.00029185850380610337,
34
+ "step_time": 8.572781324386597,
35
+ "tokens_per_second": 119.44781527169884,
36
+ "memory_mb": 827.69140625
37
+ },
38
+ {
39
+ "step": 500,
40
+ "loss": 6.8126842975616455,
41
+ "perplexity": 909.3083881764071,
42
+ "learning_rate": 0.0002819134295109075,
43
+ "step_time": 8.801953315734863,
44
+ "tokens_per_second": 116.33781312716582,
45
+ "memory_mb": 849.21484375
46
+ },
47
+ {
48
+ "step": 600,
49
+ "loss": 6.901162624359131,
50
+ "perplexity": 993.4290292468936,
51
+ "learning_rate": 0.0002684159998210713,
52
+ "step_time": 8.660848379135132,
53
+ "tokens_per_second": 118.23322094714423,
54
+ "memory_mb": 734.16015625
55
+ },
56
+ {
57
+ "step": 700,
58
+ "loss": 6.6995872259140015,
59
+ "perplexity": 812.0705542959239,
60
+ "learning_rate": 0.0002517763273076916,
61
+ "step_time": 9.10423469543457,
62
+ "tokens_per_second": 112.47513209578146,
63
+ "memory_mb": 734.64453125
64
+ },
65
+ {
66
+ "step": 800,
67
+ "loss": 6.729204297065735,
68
+ "perplexity": 836.4814103649661,
69
+ "learning_rate": 0.00023250000000000793,
70
+ "step_time": 8.646829605102539,
71
+ "tokens_per_second": 118.42490794495734,
72
+ "memory_mb": 733.84765625
73
+ },
74
+ {
75
+ "step": 900,
76
+ "loss": 6.801577568054199,
77
+ "perplexity": 899.2648246887062,
78
+ "learning_rate": 0.00021117271934897237,
79
+ "step_time": 8.66753602027893,
80
+ "tokens_per_second": 118.14199532649263,
81
+ "memory_mb": 839.50390625
82
+ },
83
+ {
84
+ "step": 1000,
85
+ "loss": 6.5149312019348145,
86
+ "perplexity": 675.1475110828569,
87
+ "learning_rate": 0.00018844250398504186,
88
+ "step_time": 9.189276933670044,
89
+ "tokens_per_second": 111.43423006961567,
90
+ "memory_mb": 734.55078125
91
+ },
92
+ {
93
+ "step": 1100,
94
+ "loss": 6.551132082939148,
95
+ "perplexity": 700.0362244642821,
96
+ "learning_rate": 0.00016500000000000537,
97
+ "step_time": 8.517168760299683,
98
+ "tokens_per_second": 120.22774572379963,
99
+ "memory_mb": 913.80078125
100
+ },
101
+ {
102
+ "step": 1200,
103
+ "loss": 6.6684452295303345,
104
+ "perplexity": 787.170782663513,
105
+ "learning_rate": 0.00014155749601496882,
106
+ "step_time": 9.400139570236206,
107
+ "tokens_per_second": 108.93455276369572,
108
+ "memory_mb": 968.5859375
109
+ },
110
+ {
111
+ "step": 1300,
112
+ "loss": 6.406905889511108,
113
+ "perplexity": 606.0156976890383,
114
+ "learning_rate": 0.00011882728065103813,
115
+ "step_time": 8.490540981292725,
116
+ "tokens_per_second": 120.6048003603289,
117
+ "memory_mb": 979.359375
118
+ },
119
+ {
120
+ "step": 1400,
121
+ "loss": 6.3421419858932495,
122
+ "perplexity": 568.011682273887,
123
+ "learning_rate": 9.750000000000261e-05,
124
+ "step_time": 10.675879955291748,
125
+ "tokens_per_second": 95.91715196201983,
126
+ "memory_mb": 733.6328125
127
+ },
128
+ {
129
+ "step": 1500,
130
+ "loss": 6.335531115531921,
131
+ "perplexity": 564.2690154518974,
132
+ "learning_rate": 7.822367269231907e-05,
133
+ "step_time": 8.945564270019531,
134
+ "tokens_per_second": 114.47014062957085,
135
+ "memory_mb": 994.3515625
136
+ },
137
+ {
138
+ "step": 1600,
139
+ "loss": 6.629261136054993,
140
+ "perplexity": 756.9227010883292,
141
+ "learning_rate": 6.158400017893925e-05,
142
+ "step_time": 10.422486782073975,
143
+ "tokens_per_second": 98.2491051714468,
144
+ "memory_mb": 730.56640625
145
+ },
146
+ {
147
+ "step": 1700,
148
+ "loss": 6.302608251571655,
149
+ "perplexity": 545.9941446328027,
150
+ "learning_rate": 4.808657048910149e-05,
151
+ "step_time": 9.476832628250122,
152
+ "tokens_per_second": 108.05297932006208,
153
+ "memory_mb": 745.43359375
154
+ },
155
+ {
156
+ "step": 1800,
157
+ "loss": 6.266754984855652,
158
+ "perplexity": 526.765240241726,
159
+ "learning_rate": 3.8141496193902704e-05,
160
+ "step_time": 8.648972511291504,
161
+ "tokens_per_second": 118.3955664864394,
162
+ "memory_mb": 1009.6484375
163
+ },
164
+ {
165
+ "step": 1900,
166
+ "loss": 6.480456352233887,
167
+ "perplexity": 652.268542568135,
168
+ "learning_rate": 3.2050953343351995e-05,
169
+ "step_time": 8.593879699707031,
170
+ "tokens_per_second": 119.15456531639704,
171
+ "memory_mb": 735.796875
172
+ },
173
+ {
174
+ "step": 2000,
175
+ "loss": 5.975515604019165,
176
+ "perplexity": 393.6710271356782,
177
+ "learning_rate": 2.9999999999999997e-05,
178
+ "step_time": 8.94594669342041,
179
+ "tokens_per_second": 114.46524723349116,
180
+ "memory_mb": 729.84375
181
+ },
182
+ {
183
+ "step": 2100,
184
+ "loss": 6.554613709449768,
185
+ "perplexity": 702.4777368926921,
186
+ "learning_rate": 3.205095334333285e-05,
187
+ "step_time": 9.199656009674072,
188
+ "tokens_per_second": 111.3085096794047,
189
+ "memory_mb": 752.64453125
190
+ },
191
+ {
192
+ "step": 2200,
193
+ "loss": 6.471360206604004,
194
+ "perplexity": 646.3623155888199,
195
+ "learning_rate": 3.814149619382671e-05,
196
+ "step_time": 8.654725313186646,
197
+ "tokens_per_second": 118.3168688715975,
198
+ "memory_mb": 735.9375
199
+ },
200
+ {
201
+ "step": 2300,
202
+ "loss": 6.382450699806213,
203
+ "perplexity": 591.3752163574818,
204
+ "learning_rate": 4.808657048893273e-05,
205
+ "step_time": 8.75070834159851,
206
+ "tokens_per_second": 117.01909834340836,
207
+ "memory_mb": 776.53125
208
+ },
209
+ {
210
+ "step": 2400,
211
+ "loss": 5.957324385643005,
212
+ "perplexity": 386.5744152212925,
213
+ "learning_rate": 6.158400017864459e-05,
214
+ "step_time": 8.642782926559448,
215
+ "tokens_per_second": 118.4803562349376,
216
+ "memory_mb": 923.15234375
217
+ },
218
+ {
219
+ "step": 2500,
220
+ "loss": 6.218142509460449,
221
+ "perplexity": 501.7703322170689,
222
+ "learning_rate": 7.822367269186924e-05,
223
+ "step_time": 9.567925691604614,
224
+ "tokens_per_second": 107.0242425585004,
225
+ "memory_mb": 1050.1875
226
+ },
227
+ {
228
+ "step": 2600,
229
+ "loss": 6.314482092857361,
230
+ "perplexity": 552.515834581567,
231
+ "learning_rate": 9.749999999937299e-05,
232
+ "step_time": 8.212730169296265,
233
+ "tokens_per_second": 124.68448115199003,
234
+ "memory_mb": 738.69921875
235
+ },
236
+ {
237
+ "step": 2700,
238
+ "loss": 6.240906238555908,
239
+ "perplexity": 513.3234937600264,
240
+ "learning_rate": 0.00011882728065020969,
241
+ "step_time": 8.88506555557251,
242
+ "tokens_per_second": 115.24957172181702,
243
+ "memory_mb": 739.125
244
+ },
245
+ {
246
+ "step": 2800,
247
+ "loss": 6.308051347732544,
248
+ "perplexity": 548.9741461252174,
249
+ "learning_rate": 0.0001415574960139285,
250
+ "step_time": 9.43014669418335,
251
+ "tokens_per_second": 108.58791842884247,
252
+ "memory_mb": 732.40234375
253
+ },
254
+ {
255
+ "step": 2900,
256
+ "loss": 6.410398960113525,
257
+ "perplexity": 608.136254778883,
258
+ "learning_rate": 0.00016499999999874617,
259
+ "step_time": 8.674461603164673,
260
+ "tokens_per_second": 118.0476722182294,
261
+ "memory_mb": 733.71484375
262
+ },
263
+ {
264
+ "step": 3000,
265
+ "loss": 6.160744071006775,
266
+ "perplexity": 473.7804700263767,
267
+ "learning_rate": 0.0001884425039835638,
268
+ "step_time": 9.056138277053833,
269
+ "tokens_per_second": 113.07247843096432,
270
+ "memory_mb": 803.75390625
271
+ },
272
+ {
273
+ "step": 3100,
274
+ "loss": 6.270610332489014,
275
+ "perplexity": 528.8000232415734,
276
+ "learning_rate": 0.0002111727193472824,
277
+ "step_time": 9.650378227233887,
278
+ "tokens_per_second": 106.10983071214939,
279
+ "memory_mb": 790.328125
280
+ },
281
+ {
282
+ "step": 3200,
283
+ "loss": 6.4413875341415405,
284
+ "perplexity": 627.2765639068664,
285
+ "learning_rate": 0.00023249999999811922,
286
+ "step_time": 8.436410665512085,
287
+ "tokens_per_second": 121.37863371043517,
288
+ "memory_mb": 940.3671875
289
+ },
290
+ {
291
+ "step": 3300,
292
+ "loss": 6.163665413856506,
293
+ "perplexity": 475.1665688640195,
294
+ "learning_rate": 0.0002517763273056231,
295
+ "step_time": 10.011188507080078,
296
+ "tokens_per_second": 102.28555773131335,
297
+ "memory_mb": 747.43359375
298
+ },
299
+ {
300
+ "step": 3400,
301
+ "loss": 6.067382216453552,
302
+ "perplexity": 431.5494984476535,
303
+ "learning_rate": 0.00026841599981884787,
304
+ "step_time": 8.671327114105225,
305
+ "tokens_per_second": 118.0903437876665,
306
+ "memory_mb": 970.37109375
307
+ },
308
+ {
309
+ "step": 3500,
310
+ "loss": 6.14486300945282,
311
+ "perplexity": 466.3157638357051,
312
+ "learning_rate": 0.0002819134295085615,
313
+ "step_time": 8.194751024246216,
314
+ "tokens_per_second": 124.95803679333764,
315
+ "memory_mb": 736.890625
316
+ },
317
+ {
318
+ "step": 3600,
319
+ "loss": 6.448354721069336,
320
+ "perplexity": 631.6621769355031,
321
+ "learning_rate": 0.00029185850380367053,
322
+ "step_time": 8.76004934310913,
323
+ "tokens_per_second": 116.89431872955184,
324
+ "memory_mb": 781.22265625
325
+ },
326
+ {
327
+ "step": 3700,
328
+ "loss": 6.149544358253479,
329
+ "perplexity": 468.5038682213576,
330
+ "learning_rate": 0.00029794904665416755,
331
+ "step_time": 11.995165824890137,
332
+ "tokens_per_second": 85.36772354369505,
333
+ "memory_mb": 836.3046875
334
+ },
335
+ {
336
+ "step": 3800,
337
+ "loss": 6.083824634552002,
338
+ "perplexity": 438.70387214992155,
339
+ "learning_rate": 0.0002999999999975028,
340
+ "step_time": 8.194983720779419,
341
+ "tokens_per_second": 124.95448861033346,
342
+ "memory_mb": 766.9453125
343
+ },
344
+ {
345
+ "step": 3900,
346
+ "loss": 6.295181512832642,
347
+ "perplexity": 541.954209109435,
348
+ "learning_rate": 0.0002979490466541723,
349
+ "step_time": 9.617033004760742,
350
+ "tokens_per_second": 106.47774625428518,
351
+ "memory_mb": 837.29296875
352
+ },
353
+ {
354
+ "step": 4000,
355
+ "loss": 5.752694249153137,
356
+ "perplexity": 315.038309582537,
357
+ "learning_rate": 0.0002918585038036804,
358
+ "step_time": 10.153574228286743,
359
+ "tokens_per_second": 100.85118569845567,
360
+ "memory_mb": 729.46484375
361
+ },
362
+ {
363
+ "step": 4100,
364
+ "loss": 6.129071950912476,
365
+ "perplexity": 459.00997915951336,
366
+ "learning_rate": 0.00028191342950857645,
367
+ "step_time": 16.923099994659424,
368
+ "tokens_per_second": 121.01801683180422,
369
+ "memory_mb": 866.20703125
370
+ },
371
+ {
372
+ "step": 4200,
373
+ "loss": 5.725140452384949,
374
+ "perplexity": 306.4763075489217,
375
+ "learning_rate": 0.0002684159998188649,
376
+ "step_time": 16.08098530769348,
377
+ "tokens_per_second": 127.35538033357905,
378
+ "memory_mb": 856.5625
379
+ },
380
+ {
381
+ "step": 4300,
382
+ "loss": 5.917757749557495,
383
+ "perplexity": 371.57760903139615,
384
+ "learning_rate": 0.00025177632730563935,
385
+ "step_time": 16.5029239654541,
386
+ "tokens_per_second": 124.09922049493284,
387
+ "memory_mb": 856.58203125
388
+ },
389
+ {
390
+ "step": 4400,
391
+ "loss": 5.828692674636841,
392
+ "perplexity": 339.9140102647636,
393
+ "learning_rate": 0.00023249999999813424,
394
+ "step_time": 15.03031873703003,
395
+ "tokens_per_second": 136.25792212605347,
396
+ "memory_mb": 856.58203125
397
+ },
398
+ {
399
+ "step": 4500,
400
+ "loss": 5.768530249595642,
401
+ "perplexity": 320.0669682230316,
402
+ "learning_rate": 0.0002111727193472959,
403
+ "step_time": 15.095525026321411,
404
+ "tokens_per_second": 135.66934548013344,
405
+ "memory_mb": 856.58203125
406
+ },
407
+ {
408
+ "step": 4600,
409
+ "loss": 5.824233174324036,
410
+ "perplexity": 338.40153857021943,
411
+ "learning_rate": 0.00018844250398357548,
412
+ "step_time": 15.770112752914429,
413
+ "tokens_per_second": 129.86590724416445,
414
+ "memory_mb": 856.6796875
415
+ },
416
+ {
417
+ "step": 4700,
418
+ "loss": 5.592358708381653,
419
+ "perplexity": 268.3678753241069,
420
+ "learning_rate": 0.00016499999999875585,
421
+ "step_time": 15.393234729766846,
422
+ "tokens_per_second": 133.04546029169921,
423
+ "memory_mb": 856.6015625
424
+ },
425
+ {
426
+ "step": 4800,
427
+ "loss": 5.626398682594299,
428
+ "perplexity": 277.6603717835579,
429
+ "learning_rate": 0.0001415574960139364,
430
+ "step_time": 15.054797887802124,
431
+ "tokens_per_second": 136.03636629750804,
432
+ "memory_mb": 856.6796875
433
+ },
434
+ {
435
+ "step": 4900,
436
+ "loss": 5.718711733818054,
437
+ "perplexity": 304.5123771619807,
438
+ "learning_rate": 0.00011882728065021638,
439
+ "step_time": 16.42769432067871,
440
+ "tokens_per_second": 124.66752546168553,
441
+ "memory_mb": 856.62890625
442
+ },
443
+ {
444
+ "step": 5000,
445
+ "loss": 5.441234588623047,
446
+ "perplexity": 230.7268604519411,
447
+ "learning_rate": 9.749999999937817e-05,
448
+ "step_time": 15.495960474014282,
449
+ "tokens_per_second": 132.16347598681364,
450
+ "memory_mb": 856.71875
451
+ },
452
+ {
453
+ "step": 5100,
454
+ "loss": 5.574064016342163,
455
+ "perplexity": 263.50280585799544,
456
+ "learning_rate": 7.822367269187295e-05,
457
+ "step_time": 15.050865411758423,
458
+ "tokens_per_second": 136.07190975211358,
459
+ "memory_mb": 856.67578125
460
+ },
461
+ {
462
+ "step": 5200,
463
+ "loss": 5.486729264259338,
464
+ "perplexity": 241.4661419404157,
465
+ "learning_rate": 6.158400017864684e-05,
466
+ "step_time": 14.95394515991211,
467
+ "tokens_per_second": 136.95382576968316,
468
+ "memory_mb": 856.74609375
469
+ },
470
+ {
471
+ "step": 5300,
472
+ "loss": 5.643607139587402,
473
+ "perplexity": 282.47982711304775,
474
+ "learning_rate": 4.808657048893407e-05,
475
+ "step_time": 14.74146294593811,
476
+ "tokens_per_second": 138.927866759948,
477
+ "memory_mb": 856.828125
478
+ },
479
+ {
480
+ "step": 5400,
481
+ "loss": 5.773634672164917,
482
+ "perplexity": 321.7049020761921,
483
+ "learning_rate": 3.8141496193827305e-05,
484
+ "step_time": 14.770179033279419,
485
+ "tokens_per_second": 138.6577640924697,
486
+ "memory_mb": 856.73046875
487
+ },
488
+ {
489
+ "step": 5500,
490
+ "loss": 5.430219531059265,
491
+ "perplexity": 228.19933676762415,
492
+ "learning_rate": 3.2050953343333e-05,
493
+ "step_time": 17.394481897354126,
494
+ "tokens_per_second": 117.7384881070543,
495
+ "memory_mb": 856.4453125
496
+ },
497
+ {
498
+ "step": 5600,
499
+ "loss": 5.905840158462524,
500
+ "perplexity": 367.17558190769864,
501
+ "learning_rate": 2.9999999999999997e-05,
502
+ "step_time": 14.9873046875,
503
+ "tokens_per_second": 136.64898677265916,
504
+ "memory_mb": 856.71875
505
+ },
506
+ {
507
+ "step": 5700,
508
+ "loss": 5.61777663230896,
509
+ "perplexity": 275.27666109950576,
510
+ "learning_rate": 3.205095334333272e-05,
511
+ "step_time": 16.493826627731323,
512
+ "tokens_per_second": 124.16766868136386,
513
+ "memory_mb": 856.71875
514
+ },
515
+ {
516
+ "step": 5800,
517
+ "loss": 5.823711514472961,
518
+ "perplexity": 338.2250541104363,
519
+ "learning_rate": 3.814149619382621e-05,
520
+ "step_time": 15.767404556274414,
521
+ "tokens_per_second": 129.88821290724272,
522
+ "memory_mb": 856.7265625
523
+ },
524
+ {
525
+ "step": 5900,
526
+ "loss": 5.767146706581116,
527
+ "perplexity": 319.6244479984378,
528
+ "learning_rate": 4.808657048893161e-05,
529
+ "step_time": 14.995726346969604,
530
+ "tokens_per_second": 136.57224415900788,
531
+ "memory_mb": 857.28125
532
+ },
533
+ {
534
+ "step": 6000,
535
+ "loss": 5.632080435752869,
536
+ "perplexity": 279.242459738446,
537
+ "learning_rate": 6.158400017864261e-05,
538
+ "step_time": 15.532944679260254,
539
+ "tokens_per_second": 131.84879250452173,
540
+ "memory_mb": 857.25
541
+ },
542
+ {
543
+ "step": 6100,
544
+ "loss": 5.641765594482422,
545
+ "perplexity": 281.9601064615597,
546
+ "learning_rate": 7.822367269186639e-05,
547
+ "step_time": 13.042147397994995,
548
+ "tokens_per_second": 157.02935548135613,
549
+ "memory_mb": 3038.3984375
550
+ },
551
+ {
552
+ "step": 6200,
553
+ "loss": 5.216195583343506,
554
+ "perplexity": 184.23195401529154,
555
+ "learning_rate": 9.749999999936928e-05,
556
+ "step_time": 13.716249227523804,
557
+ "tokens_per_second": 149.3119559165174,
558
+ "memory_mb": 3027.03515625
559
+ },
560
+ {
561
+ "step": 6300,
562
+ "loss": 5.537067532539368,
563
+ "perplexity": 253.93225847719265,
564
+ "learning_rate": 0.00011882728065020474,
565
+ "step_time": 13.56373405456543,
566
+ "tokens_per_second": 150.99086960575298,
567
+ "memory_mb": 3029.5546875
568
+ },
569
+ {
570
+ "step": 6400,
571
+ "loss": 5.462532162666321,
572
+ "perplexity": 235.69348362718267,
573
+ "learning_rate": 0.0001415574960139221,
574
+ "step_time": 15.235360145568848,
575
+ "tokens_per_second": 134.4241278468008,
576
+ "memory_mb": 3029.3984375
577
+ },
578
+ {
579
+ "step": 6500,
580
+ "loss": 5.461255669593811,
581
+ "perplexity": 235.39281446997174,
582
+ "learning_rate": 0.00016499999999873858,
583
+ "step_time": 14.625237941741943,
584
+ "tokens_per_second": 140.03190978211683,
585
+ "memory_mb": 3029.7421875
586
+ },
587
+ {
588
+ "step": 6600,
589
+ "loss": 5.543500542640686,
590
+ "perplexity": 255.571072864111,
591
+ "learning_rate": 0.00018844250398355512,
592
+ "step_time": 13.293677568435669,
593
+ "tokens_per_second": 154.0581971735755,
594
+ "memory_mb": 3025.90234375
595
+ },
596
+ {
597
+ "step": 6700,
598
+ "loss": 5.326942205429077,
599
+ "perplexity": 205.80769336539043,
600
+ "learning_rate": 0.0002111727193472727,
601
+ "step_time": 13.689483880996704,
602
+ "tokens_per_second": 149.60388702768896,
603
+ "memory_mb": 3032.14453125
604
+ },
605
+ {
606
+ "step": 6800,
607
+ "loss": 5.4400506019592285,
608
+ "perplexity": 230.4538445816494,
609
+ "learning_rate": 0.00023249999999810797,
610
+ "step_time": 13.920767545700073,
611
+ "tokens_per_second": 147.11832470994733,
612
+ "memory_mb": 3033.22265625
613
+ },
614
+ {
615
+ "step": 6900,
616
+ "loss": 5.562488794326782,
617
+ "perplexity": 260.47028727589134,
618
+ "learning_rate": 0.0002517763273056109,
619
+ "step_time": 13.11566162109375,
620
+ "tokens_per_second": 156.1491946930247,
621
+ "memory_mb": 3030.31640625
622
+ },
623
+ {
624
+ "step": 7000,
625
+ "loss": 5.337615728378296,
626
+ "perplexity": 208.01615155187662,
627
+ "learning_rate": 0.00026841599981883453,
628
+ "step_time": 13.812754154205322,
629
+ "tokens_per_second": 148.2687650222517,
630
+ "memory_mb": 3030.01171875
631
+ },
632
+ {
633
+ "step": 7100,
634
+ "loss": 5.591769337654114,
635
+ "perplexity": 268.20975375486825,
636
+ "learning_rate": 4.0863750705554176e-05,
637
+ "step_time": 22.385777473449707,
638
+ "tokens_per_second": 91.48665943941405,
639
+ "memory_mb": 778.03515625
640
+ },
641
+ {
642
+ "step": 7200,
643
+ "loss": 5.491156101226807,
644
+ "perplexity": 242.5374426712762,
645
+ "learning_rate": 3.860829246363548e-05,
646
+ "step_time": 23.257835626602173,
647
+ "tokens_per_second": 88.0563450907491,
648
+ "memory_mb": 766.17578125
649
+ },
650
+ {
651
+ "step": 7300,
652
+ "loss": 5.590375304222107,
653
+ "perplexity": 267.83612088021056,
654
+ "learning_rate": 3.660737030015427e-05,
655
+ "step_time": 22.15459370613098,
656
+ "tokens_per_second": 92.4413251339944,
657
+ "memory_mb": 765.515625
658
+ },
659
+ {
660
+ "step": 7400,
661
+ "loss": 5.408857345581055,
662
+ "perplexity": 223.37619999638744,
663
+ "learning_rate": 3.486501380605981e-05,
664
+ "step_time": 22.82973575592041,
665
+ "tokens_per_second": 89.70756481353028,
666
+ "memory_mb": 762.5078125
667
+ },
668
+ {
669
+ "step": 7500,
670
+ "loss": 5.681139707565308,
671
+ "perplexity": 293.2834969372769,
672
+ "learning_rate": 3.338473185545381e-05,
673
+ "step_time": 23.581167221069336,
674
+ "tokens_per_second": 86.848966414612,
675
+ "memory_mb": 762.328125
676
+ },
677
+ {
678
+ "step": 7600,
679
+ "loss": 5.239741921424866,
680
+ "perplexity": 188.6214169770699,
681
+ "learning_rate": 3.2169505539184994e-05,
682
+ "step_time": 24.760926008224487,
683
+ "tokens_per_second": 82.71096159003685,
684
+ "memory_mb": 765.65625
685
+ },
686
+ {
687
+ "step": 7700,
688
+ "loss": 5.552804112434387,
689
+ "perplexity": 257.95989121628156,
690
+ "learning_rate": 3.122178216132881e-05,
691
+ "step_time": 21.644299030303955,
692
+ "tokens_per_second": 94.62075889510751,
693
+ "memory_mb": 766.9921875
694
+ },
695
+ {
696
+ "step": 7800,
697
+ "loss": 5.3391993045806885,
698
+ "perplexity": 208.34582193938436,
699
+ "learning_rate": 3.054347031064272e-05,
700
+ "step_time": 21.552024602890015,
701
+ "tokens_per_second": 95.02587518971994,
702
+ "memory_mb": 768.30078125
703
+ },
704
+ {
705
+ "step": 7900,
706
+ "loss": 5.07136607170105,
707
+ "perplexity": 159.39191947680405,
708
+ "learning_rate": 3.0135936016922528e-05,
709
+ "step_time": 25.795836448669434,
710
+ "tokens_per_second": 79.3926571861808,
711
+ "memory_mb": 766.03125
712
+ },
713
+ {
714
+ "step": 8000,
715
+ "loss": 5.289915561676025,
716
+ "perplexity": 198.32667833002583,
717
+ "learning_rate": 2.9999999999999997e-05,
718
+ "step_time": 20.64373207092285,
719
+ "tokens_per_second": 99.20686787466366,
720
+ "memory_mb": 770.73046875
721
+ },
722
+ {
723
+ "step": 8100,
724
+ "loss": 5.472690582275391,
725
+ "perplexity": 238.09995923313426,
726
+ "learning_rate": 2.9999999999999997e-05,
727
+ "step_time": 20.021532773971558,
728
+ "tokens_per_second": 102.28987076666009,
729
+ "memory_mb": 782.06640625
730
+ },
731
+ {
732
+ "step": 8200,
733
+ "loss": 5.381044268608093,
734
+ "perplexity": 217.24902334669338,
735
+ "learning_rate": 2.9999999999999997e-05,
736
+ "step_time": 19.268752336502075,
737
+ "tokens_per_second": 106.28607209405757,
738
+ "memory_mb": 772.1875
739
+ },
740
+ {
741
+ "step": 8300,
742
+ "loss": 5.509163498878479,
743
+ "perplexity": 246.94447131857976,
744
+ "learning_rate": 2.9999999999999997e-05,
745
+ "step_time": 21.33850598335266,
746
+ "tokens_per_second": 95.97672871745365,
747
+ "memory_mb": 773.4140625
748
+ },
749
+ {
750
+ "step": 8400,
751
+ "loss": 5.361395001411438,
752
+ "perplexity": 213.0219051308242,
753
+ "learning_rate": 2.9999999999999997e-05,
754
+ "step_time": 19.24502682685852,
755
+ "tokens_per_second": 106.41710289235836,
756
+ "memory_mb": 771.12890625
757
+ },
758
+ {
759
+ "step": 8500,
760
+ "loss": 5.601587295532227,
761
+ "perplexity": 270.8559949054038,
762
+ "learning_rate": 2.9999999999999997e-05,
763
+ "step_time": 19.78052020072937,
764
+ "tokens_per_second": 103.53620527757828,
765
+ "memory_mb": 788.19140625
766
+ },
767
+ {
768
+ "step": 8600,
769
+ "loss": 5.177438259124756,
770
+ "perplexity": 177.22821619925446,
771
+ "learning_rate": 2.9999999999999997e-05,
772
+ "step_time": 19.43180513381958,
773
+ "tokens_per_second": 105.39422281646966,
774
+ "memory_mb": 768.90625
775
+ },
776
+ {
777
+ "step": 8700,
778
+ "loss": 5.507565855979919,
779
+ "perplexity": 246.5502572281615,
780
+ "learning_rate": 2.9999999999999997e-05,
781
+ "step_time": 20.27060604095459,
782
+ "tokens_per_second": 101.03299308675011,
783
+ "memory_mb": 770.671875
784
+ },
785
+ {
786
+ "step": 8800,
787
+ "loss": 5.28560483455658,
788
+ "perplexity": 197.47358618400258,
789
+ "learning_rate": 2.9999999999999997e-05,
790
+ "step_time": 18.91090226173401,
791
+ "tokens_per_second": 108.29731821649273,
792
+ "memory_mb": 780.921875
793
+ },
794
+ {
795
+ "step": 8900,
796
+ "loss": 5.001006722450256,
797
+ "perplexity": 148.56264519463585,
798
+ "learning_rate": 2.9999999999999997e-05,
799
+ "step_time": 18.08107566833496,
800
+ "tokens_per_second": 113.26759743540164,
801
+ "memory_mb": 782.58203125
802
+ },
803
+ {
804
+ "step": 9000,
805
+ "loss": 5.228554844856262,
806
+ "perplexity": 186.5230539013194,
807
+ "learning_rate": 2.9999999999999997e-05,
808
+ "step_time": 19.203582286834717,
809
+ "tokens_per_second": 106.64676878563616,
810
+ "memory_mb": 775.125
811
+ },
812
+ {
813
+ "step": 9100,
814
+ "loss": 5.65421450138092,
815
+ "perplexity": 285.4921409454976,
816
+ "learning_rate": 0.000249042537549566,
817
+ "step_time": 21.455636262893677,
818
+ "tokens_per_second": 95.45277403597214,
819
+ "memory_mb": 865.41015625
820
+ },
821
+ {
822
+ "step": 9200,
823
+ "loss": 5.5710365772247314,
824
+ "perplexity": 262.7062734909883,
825
+ "learning_rate": 0.00020331309202311048,
826
+ "step_time": 19.316173791885376,
827
+ "tokens_per_second": 106.02513841847676,
828
+ "memory_mb": 856.3984375
829
+ },
830
+ {
831
+ "step": 9300,
832
+ "loss": 5.682259440422058,
833
+ "perplexity": 293.61208003345325,
834
+ "learning_rate": 0.0001628567928623838,
835
+ "step_time": 19.75247859954834,
836
+ "tokens_per_second": 103.68319042486291,
837
+ "memory_mb": 856.68359375
838
+ },
839
+ {
840
+ "step": 9400,
841
+ "loss": 5.443678855895996,
842
+ "perplexity": 231.29150836001398,
843
+ "learning_rate": 0.00012771356555031085,
844
+ "step_time": 19.370458602905273,
845
+ "tokens_per_second": 105.7280078899542,
846
+ "memory_mb": 856.72265625
847
+ },
848
+ {
849
+ "step": 9500,
850
+ "loss": 5.662865877151489,
851
+ "perplexity": 287.9727556291962,
852
+ "learning_rate": 9.791809220935524e-05,
853
+ "step_time": 19.50398349761963,
854
+ "tokens_per_second": 105.00419056701668,
855
+ "memory_mb": 881.109375,
856
+ "validation_loss": 5.492509889602661,
857
+ "validation_perplexity": 242.8660093965992
858
+ },
859
+ {
860
+ "step": 9600,
861
+ "loss": 5.278370499610901,
862
+ "perplexity": 196.05015112962369,
863
+ "learning_rate": 7.34997773744534e-05,
864
+ "step_time": 20.28781247138977,
865
+ "tokens_per_second": 100.94730532866102,
866
+ "memory_mb": 856.40625
867
+ },
868
+ {
869
+ "step": 9700,
870
+ "loss": 5.564499616622925,
871
+ "perplexity": 260.99457368376903,
872
+ "learning_rate": 5.448271897428809e-05,
873
+ "step_time": 19.123426914215088,
874
+ "tokens_per_second": 107.09377608872249,
875
+ "memory_mb": 856.16796875
876
+ },
877
+ {
878
+ "step": 9800,
879
+ "loss": 5.31648576259613,
880
+ "perplexity": 203.66688908965676,
881
+ "learning_rate": 4.0885684549543514e-05,
882
+ "step_time": 24.667370319366455,
883
+ "tokens_per_second": 83.02465862735707,
884
+ "memory_mb": 847.92578125
885
+ },
886
+ {
887
+ "step": 9900,
888
+ "loss": 5.028266668319702,
889
+ "perplexity": 152.6681586659976,
890
+ "learning_rate": 3.27220927316066e-05,
891
+ "step_time": 20.24069046974182,
892
+ "tokens_per_second": 101.18231900545057,
893
+ "memory_mb": 856.6015625
894
+ },
895
+ {
896
+ "step": 10000,
897
+ "loss": 5.290503025054932,
898
+ "perplexity": 198.44322221988676,
899
+ "learning_rate": 2.9999999999999997e-05,
900
+ "step_time": 20.630642414093018,
901
+ "tokens_per_second": 99.26981229634366,
902
+ "memory_mb": 884.50390625,
903
+ "validation_loss": 5.417883062362671,
904
+ "validation_perplexity": 225.401456259331
905
+ }
906
+ ]
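The log above is a plain JSON list with one entry per 100 training steps (loss, perplexity, learning rate, step time, throughput, memory, plus validation metrics at steps 9,500 and 10,000). A short sketch for summarizing it offline:

```python
# Summarize training_log.json: lowest logged training loss and the validation points.
import json

with open("training_log.json", "r") as f:
    log = json.load(f)

best = min(log, key=lambda e: e["loss"])
print(f"lowest logged training loss {best['loss']:.4f} at step {best['step']}")

for entry in log:
    if "validation_loss" in entry:
        print(f"step {entry['step']}: val loss {entry['validation_loss']:.4f}, "
              f"val ppl {entry['validation_perplexity']:.2f}")
```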