Tobias1726 committed on
Commit
05ab68a
·
verified ·
1 Parent(s): 1789325
Files changed (1)
  1. train.log +6 -1
train.log CHANGED
@@ -15,4 +15,9 @@
15
  [p1] starting training: max_steps=500
16
  [transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248044, 'bos_token_id': None, 'pad_token_id': 248044}.
17
 
18
  0%| | 0/500 [00:00<?, ?it/s][transformers] `use_return_dict` is deprecated! Use `return_dict` instead!
19
-
20
  0%| | 1/500 [07:55<65:50:46, 475.04s/it]
21
  0%| | 2/500 [07:58<27:19:18, 197.51s/it]
 
22
  0%| | 1/500 [07:55<65:50:46, 475.04s/it]
23
  0%| | 2/500 [07:58<27:19:18, 197.51s/it]
24
  1%| | 3/500 [08:01<15:00:18, 108.69s/it]
25
  1%| | 4/500 [08:23<10:17:27, 74.69s/it]
26
  1%| | 5/500 [08:26<6:42:49, 48.83s/it]
27
  1%| | 6/500 [08:31<4:37:59, 33.76s/it]
28
  1%|▏ | 7/500 [08:34<3:15:24, 23.78s/it]
29
  2%|▏ | 8/500 [08:37<2:20:41, 17.16s/it]
30
  2%|▏ | 9/500 [08:40<1:44:15, 12.74s/it]
31
  2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it][p1][NaN-ABORT] grad_norm=nan at step 10
 
32
 
 
33
  2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
34
 
 
35
  2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
36
  2%|▏ | 10/500 [08:43<7:07:48, 52.38s/it]
 
 
 
15
  [p1] starting training: max_steps=500
16
  [transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248044, 'bos_token_id': None, 'pad_token_id': 248044}.
17
 
18
  0%| | 0/500 [00:00<?, ?it/s][transformers] `use_return_dict` is deprecated! Use `return_dict` instead!
 
19
  0%| | 1/500 [07:55<65:50:46, 475.04s/it]
20
  0%| | 2/500 [07:58<27:19:18, 197.51s/it]
21
+
22
  0%| | 1/500 [07:55<65:50:46, 475.04s/it]
23
  0%| | 2/500 [07:58<27:19:18, 197.51s/it]
24
  1%| | 3/500 [08:01<15:00:18, 108.69s/it]
25
  1%| | 4/500 [08:23<10:17:27, 74.69s/it]
26
  1%| | 5/500 [08:26<6:42:49, 48.83s/it]
27
  1%| | 6/500 [08:31<4:37:59, 33.76s/it]
28
  1%|▏ | 7/500 [08:34<3:15:24, 23.78s/it]
29
  2%|▏ | 8/500 [08:37<2:20:41, 17.16s/it]
30
  2%|▏ | 9/500 [08:40<1:44:15, 12.74s/it]
31
  2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it][p1][NaN-ABORT] grad_norm=nan at step 10
32
+
33
 
34
+
35
  2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
36
 
37
+
38
  2%|▏ | 10/500 [08:43<1:19:59, 9.80s/it]
39
  2%|▏ | 10/500 [08:43<7:07:48, 52.38s/it]
40
+ [p1] training finished in 8.7m
41
+ [p1] NaN-aborted. Reason: grad_norm=nan step=10. NOT pushing model.
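
For readers unfamiliar with this kind of guard, the abort recorded in the log above can be sketched roughly as follows. This is a minimal illustration and not the script that produced train.log: the function names `nan_abort_reason` and `run_steps` are hypothetical, the gradient norms are supplied as a plain list instead of coming from a real optimizer step, and only the message formats are copied from the log lines.

```python
import math


def nan_abort_reason(grad_norm: float, step: int):
    """Return an abort reason if grad_norm is NaN or infinite, else None.

    Hypothetical helper; the reason string mirrors the format seen in the log.
    """
    if math.isfinite(grad_norm):
        return None
    return f"grad_norm={grad_norm} step={step}"


def run_steps(grad_norms, max_steps):
    """Walk over per-step gradient norms and stop at the first non-finite one.

    Returns (last_step_reached, reason), where reason is None on a clean run.
    In a real trainer the norm would come from the backward pass; here it is
    passed in directly so the sketch stays self-contained.
    """
    last_step = 0
    for step, grad_norm in enumerate(grad_norms[:max_steps], start=1):
        last_step = step
        reason = nan_abort_reason(grad_norm, step)
        if reason is not None:
            print(f"[p1][NaN-ABORT] grad_norm={grad_norm} at step {step}")
            return last_step, reason
    return last_step, None


# Nine finite norms followed by a NaN, as an assumed stand-in for the run above.
norms = [1.0] * 9 + [float("nan")]
step, reason = run_steps(norms, max_steps=500)
if reason is not None:
    print(f"[p1] NaN-aborted. Reason: {reason}. NOT pushing model.")
```

Checking with `math.isfinite` catches both NaN and infinite norms. Returning the reason instead of raising lets the caller decide whether to skip the model upload, which is what the final log line reports.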