Tobias1726/shadowfox-floor

Tobias1726 committed 24 days ago

Commit 05ab68a · verified · 1 parent: 1789325

log

Files changed (1)

train.log +6 -1

@@ -15,4 +15,9 @@
 [p1] starting training: max_steps=500
 [transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248044, 'bos_token_id': None, 'pad_token_id': 248044}.
  0%|          | 0/500 [00:00<?, ?it/s][transformers] `use_return_dict` is deprecated! Use `return_dict` instead!
  0%|          | 1/500 [07:55<65:50:46, 475.04s/it]
  0%|          | 2/500 [07:58<27:19:18, 197.51s/it]
  0%|          | 1/500 [07:55<65:50:46, 475.04s/it]
  0%|          | 2/500 [07:58<27:19:18, 197.51s/it]
  1%|          | 3/500 [08:01<15:00:18, 108.69s/it]
  1%|          | 4/500 [08:23<10:17:27, 74.69s/it]
  1%|          | 5/500 [08:26<6:42:49, 48.83s/it]
  1%|          | 6/500 [08:31<4:37:59, 33.76s/it]
  1%|▏         | 7/500 [08:34<3:15:24, 23.78s/it]
  2%|▏         | 8/500 [08:37<2:20:41, 17.16s/it]
  2%|▏         | 9/500 [08:40<1:44:15, 12.74s/it]
  2%|▏         | 10/500 [08:43<1:19:59,  9.80s/it][p1][NaN-ABORT] grad_norm=nan at step 10
  2%|▏         | 10/500 [08:43<1:19:59,  9.80s/it]
  2%|▏         | 10/500 [08:43<1:19:59,  9.80s/it]
  2%|▏         | 10/500 [08:43<7:07:48, 52.38s/it]

 [p1] starting training: max_steps=500
 [transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248044, 'bos_token_id': None, 'pad_token_id': 248044}.
  0%|          | 0/500 [00:00<?, ?it/s][transformers] `use_return_dict` is deprecated! Use `return_dict` instead!
  0%|          | 1/500 [07:55<65:50:46, 475.04s/it]
  0%|          | 2/500 [07:58<27:19:18, 197.51s/it]
  0%|          | 1/500 [07:55<65:50:46, 475.04s/it]
  0%|          | 2/500 [07:58<27:19:18, 197.51s/it]
  1%|          | 3/500 [08:01<15:00:18, 108.69s/it]
  1%|          | 4/500 [08:23<10:17:27, 74.69s/it]
  1%|          | 5/500 [08:26<6:42:49, 48.83s/it]
  1%|          | 6/500 [08:31<4:37:59, 33.76s/it]
  1%|▏         | 7/500 [08:34<3:15:24, 23.78s/it]
  2%|▏         | 8/500 [08:37<2:20:41, 17.16s/it]
  2%|▏         | 9/500 [08:40<1:44:15, 12.74s/it]
  2%|▏         | 10/500 [08:43<1:19:59,  9.80s/it][p1][NaN-ABORT] grad_norm=nan at step 10
  2%|▏         | 10/500 [08:43<1:19:59,  9.80s/it]
  2%|▏         | 10/500 [08:43<1:19:59,  9.80s/it]
  2%|▏         | 10/500 [08:43<7:07:48, 52.38s/it]
+[p1] training finished in 8.7m
+[p1] NaN-aborted. Reason: grad_norm=nan step=10. NOT pushing model.
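
The last two log lines show a run that stops at the first non-finite gradient norm and then skips the model push. The training script itself is not part of this commit, so the following is only a minimal sketch of that kind of guard, not the author's code. The names `train` and `should_push` and the list of per-step gradient norms are hypothetical stand-ins for a real training loop.

```python
import math


def train(grad_norms, max_steps=500):
    """Walk over per-step gradient norms and stop at the first non-finite one.

    `grad_norms` is a hypothetical stand-in for the values a real training
    loop would compute after each backward pass.
    """
    completed = 0
    for step, grad_norm in zip(range(1, max_steps + 1), grad_norms):
        if not math.isfinite(grad_norm):
            # Mirrors the "[NaN-ABORT] grad_norm=nan at step 10" log line.
            return {
                "aborted": True,
                "step": step,
                "reason": f"grad_norm={grad_norm} step={step}",
            }
        completed = step
    return {"aborted": False, "step": completed, "reason": None}


def should_push(result):
    """Only push a model from a run that was not aborted."""
    return not result["aborted"]


if __name__ == "__main__":
    # Nine finite steps, then NaN at step 10, as in the log above.
    result = train([1.0] * 9 + [float("nan")])
    if should_push(result):
        print("[p1] pushing model")
    else:
        print(f"[p1] NaN-aborted. Reason: {result['reason']}. NOT pushing model.")
```

Checking with `math.isfinite` catches `inf` as well as `nan`, which is usually what a guard like this wants, since either value would corrupt the optimizer state on the next update.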