log
train.log
CHANGED
|
@@ -15,4 +15,9 @@
|
|
| 15 |
[p1] starting training: max_steps=500
|
| 16 |
[transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248044, 'bos_token_id': None, 'pad_token_id': 248044}.
|
| 17 |
|
| 18 |
0%| | 0/500 [00:00<?, ?it/s][transformers] `use_return_dict` is deprecated! Use `return_dict` instead!
|
| 19 |
-
|
| 20 |
0%| | 1/500 [07:55<65:50:46, 475.04s/it]
|
| 21 |
0%| | 2/500 [07:58<27:19:18, 197.51s/it]
|
|
|
|
| 22 |
0%| | 1/500 [07:55<65:50:46, 475.04s/it]
|
| 23 |
0%| | 2/500 [07:58<27:19:18, 197.51s/it]
|
| 24 |
1%| | 3/500 [08:01<15:00:18, 108.69s/it]
|
| 25 |
1%| | 4/500 [08:23<10:17:27, 74.69s/it]
|
| 26 |
1%| | 5/500 [08:26<6:42:49, 48.83s/it]
|
| 27 |
1%| | 6/500 [08:31<4:37:59, 33.76s/it]
|
| 28 |
1%|▏ | 7/500 [08:34<3:15:24, 23.78s/it]
|
| 29 |
2%|▏ | 8/500 [08:37<2:20:41, 17.16s/it]
|
| 30 |
2%|▏ | 9/500 [08:40<1:44:15, 12.74s/it]
|
| 31 |
2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it][p1][NaN-ABORT] grad_norm=nan at step 10
|
|
|
|
| 32 |
|
|
|
|
| 33 |
2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
|
| 34 |
|
|
|
|
| 35 |
2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
|
| 36 |
2%|▏ | 10/500 [08:43<7:07:48, 52.38s/it]
|
|
|
|
|
|
|
|
|
| 15 |
[p1] starting training: max_steps=500
|
| 16 |
[transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248044, 'bos_token_id': None, 'pad_token_id': 248044}.
|
| 17 |
|
| 18 |
0%| | 0/500 [00:00<?, ?it/s][transformers] `use_return_dict` is deprecated! Use `return_dict` instead!
|
|
|
|
| 19 |
0%| | 1/500 [07:55<65:50:46, 475.04s/it]
|
| 20 |
0%| | 2/500 [07:58<27:19:18, 197.51s/it]
|
| 21 |
+
|
| 22 |
0%| | 1/500 [07:55<65:50:46, 475.04s/it]
|
| 23 |
0%| | 2/500 [07:58<27:19:18, 197.51s/it]
|
| 24 |
1%| | 3/500 [08:01<15:00:18, 108.69s/it]
|
| 25 |
1%| | 4/500 [08:23<10:17:27, 74.69s/it]
|
| 26 |
1%| | 5/500 [08:26<6:42:49, 48.83s/it]
|
| 27 |
1%| | 6/500 [08:31<4:37:59, 33.76s/it]
|
| 28 |
1%|▏ | 7/500 [08:34<3:15:24, 23.78s/it]
|
| 29 |
2%|▏ | 8/500 [08:37<2:20:41, 17.16s/it]
|
| 30 |
2%|▏ | 9/500 [08:40<1:44:15, 12.74s/it]
|
| 31 |
2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it][p1][NaN-ABORT] grad_norm=nan at step 10
|
| 32 |
+
|
| 33 |
|
| 34 |
+
|
| 35 |
2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
|
| 36 |
|
| 37 |
+
|
| 38 |
2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
|
| 39 |
2%|▏ | 10/500 [08:43<7:07:48, 52.38s/it]
|
| 40 |
+
[p1] training finished in 8.7m
|
| 41 |
+
[p1] NaN-aborted. Reason: grad_norm=nan step=10. NOT pushing model.
|
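
The last lines of the log show the run stopping at step 10 once grad_norm became NaN, and the model push being skipped. The training script itself is not part of this diff, so the following is only a minimal sketch of how such a guard might work. The names `NaNGuard` and `run` are hypothetical and are not taken from the script that produced the log.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class NaNGuard:
    """Hypothetical helper: records whether training hit a non-finite grad norm."""
    aborted: bool = False
    reason: Optional[str] = None

    def check(self, step: int, grad_norm: float) -> bool:
        """Return True (and store the reason) if grad_norm is NaN or infinite."""
        if not math.isfinite(grad_norm):
            self.aborted = True
            self.reason = f"grad_norm={grad_norm} step={step}"
        return self.aborted


def run(grad_norms: Iterable[float], guard: NaNGuard) -> bool:
    """Hypothetical loop: stop at the first non-finite gradient norm.

    Returns True only when every step was finite, i.e. when pushing
    the resulting model would be safe.
    """
    for step, grad_norm in enumerate(grad_norms, start=1):
        if guard.check(step, grad_norm):
            break
    return not guard.aborted


guard = NaNGuard()
safe_to_push = run([1.2] * 9 + [float("nan")], guard)
if not safe_to_push:
    print(f"NaN-aborted. Reason: {guard.reason}. NOT pushing model.")
```

Using `math.isfinite` rather than `math.isnan` also catches an infinite gradient norm, which usually precedes or accompanies NaN values.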