model training desc:  QuALITY, trained using randomly selected key sentences
2023-12-07 17:47:54.813 | INFO     | __main__:init_components:108 - Initializing components...

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:16<00:16, 16.49s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:21<00:00,  9.65s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:21<00:00, 10.68s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-07 17:48:16.828 | INFO     | __main__:init_components:155 - 

2023-12-07 17:48:16.828 | INFO     | __main__:init_components:156 - ********************
2023-12-07 17:48:16.828 | INFO     | __main__:init_components:157 - using llama2 model
2023-12-07 17:48:16.828 | INFO     | __main__:init_components:158 - ********************
2023-12-07 17:48:16.829 | INFO     | __main__:init_components:159 - 

memory footprint of model: 4.024436950683594 GB
trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305
2023-12-07 17:49:11.724 | INFO     | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/random_select/quality_random_2048_instruct/train.jsonl
2023-12-07 17:49:11.842 | INFO     | component.dataset:__init__:19 - there are 2523 data in dataset
2023-12-07 17:49:11.887 | INFO     | __main__:main:231 - *** starting training ***

  0%|          | 0/420 [00:00<?, ?it/s]
  0%|          | 1/420 [00:33<3:51:41, 33.18s/it]
  0%|          | 2/420 [01:02<3:36:16, 31.04s/it]
  1%|          | 3/420 [01:32<3:32:24, 30.56s/it]
  1%|          | 4/420 [02:02<3:29:22, 30.20s/it]
  1%|          | 5/420 [02:32<3:27:37, 30.02s/it]
  1%|▏         | 6/420 [03:02<3:27:08, 30.02s/it]
  2%|▏         | 7/420 [03:31<3:25:56, 29.92s/it]
  2%|▏         | 8/420 [04:01<3:25:01, 29.86s/it]
  2%|▏         | 9/420 [04:31<3:25:00, 29.93s/it]
  2%|▏         | 10/420 [05:01<3:24:48, 29.97s/it]
  3%|▎         | 11/420 [05:31<3:23:44, 29.89s/it]
  3%|▎         | 12/420 [06:01<3:23:37, 29.94s/it]
  3%|▎         | 13/420 [06:31<3:22:42, 29.88s/it]
  3%|▎         | 14/420 [07:00<3:21:06, 29.72s/it]
  4%|▎         | 15/420 [07:29<3:19:50, 29.61s/it]
  4%|▍         | 16/420 [07:59<3:18:49, 29.53s/it]
  4%|▍         | 17/420 [08:29<3:19:25, 29.69s/it]
  4%|▍         | 18/420 [08:58<3:18:55, 29.69s/it]
  5%|▍         | 19/420 [09:29<3:19:12, 29.81s/it]
  5%|▍         | 20/420 [09:58<3:18:26, 29.77s/it]
  5%|▌         | 21/420 [10:28<3:17:44, 29.73s/it]
  5%|▌         | 22/420 [10:58<3:17:09, 29.72s/it]
  5%|▌         | 23/420 [11:27<3:15:56, 29.61s/it]
  6%|▌         | 24/420 [11:57<3:16:21, 29.75s/it]
  6%|▌         | 25/420 [12:27<3:15:47, 29.74s/it]
  6%|▌         | 26/420 [12:57<3:15:56, 29.84s/it]
  6%|▋         | 27/420 [13:27<3:15:55, 29.91s/it]
  7%|▋         | 28/420 [13:57<3:14:59, 29.85s/it]
  7%|▋         | 29/420 [14:26<3:13:32, 29.70s/it]
  7%|▋         | 30/420 [14:56<3:13:44, 29.81s/it]
  7%|▋         | 31/420 [15:26<3:13:05, 29.78s/it]
  8%|▊         | 32/420 [15:55<3:12:29, 29.77s/it]
  8%|▊         | 33/420 [16:26<3:12:33, 29.85s/it]
  8%|▊         | 34/420 [16:55<3:11:44, 29.80s/it]
  8%|▊         | 35/420 [17:25<3:11:44, 29.88s/it]
  9%|▊         | 36/420 [17:55<3:11:34, 29.93s/it]
  9%|▉         | 37/420 [18:25<3:10:35, 29.86s/it]
  9%|▉         | 38/420 [18:55<3:10:31, 29.92s/it]
  9%|▉         | 39/420 [19:25<3:09:36, 29.86s/it]
 10%|▉         | 40/420 [19:55<3:08:51, 29.82s/it]
 10%|▉         | 41/420 [20:24<3:08:03, 29.77s/it]
 10%|█         | 42/420 [20:54<3:08:06, 29.86s/it]
 10%|█         | 43/420 [21:24<3:08:03, 29.93s/it]
 10%|█         | 44/420 [21:54<3:06:28, 29.76s/it]
 11%|█         | 45/420 [22:23<3:05:51, 29.74s/it]
 11%|█         | 46/420 [22:53<3:05:15, 29.72s/it]
 11%|█         | 47/420 [23:23<3:05:22, 29.82s/it]
 11%|█▏        | 48/420 [23:53<3:05:19, 29.89s/it]
 12%|█▏        | 49/420 [24:23<3:04:24, 29.82s/it]
 12%|█▏        | 50/420 [24:52<3:03:02, 29.68s/it]
{'loss': 0.5905, 'learning_rate': 0.0001, 'global_step': 50, 'epoch': 0.24}
 12%|█▏        | 51/420 [25:22<3:02:28, 29.67s/it]
 12%|█▏        | 52/420 [25:52<3:02:37, 29.78s/it]
 13%|█▎        | 53/420 [26:22<3:02:35, 29.85s/it]
 13%|█▎        | 54/420 [26:52<3:01:49, 29.81s/it]
 13%|█▎        | 55/420 [27:21<3:01:06, 29.77s/it]
 13%|█▎        | 56/420 [27:51<3:01:04, 29.85s/it]
 14%|█▎        | 57/420 [28:21<2:59:42, 29.70s/it]
 14%|█▍        | 58/420 [28:51<2:59:46, 29.80s/it]
 14%|█▍        | 59/420 [29:21<2:59:41, 29.87s/it]
 14%|█▍        | 60/420 [29:50<2:58:52, 29.81s/it]
 15%|█▍        | 61/420 [30:20<2:57:33, 29.67s/it]
 15%|█▍        | 62/420 [30:50<2:57:39, 29.78s/it]
 15%|█▌        | 63/420 [31:20<2:57:38, 29.86s/it]
 15%|█▌        | 64/420 [31:49<2:56:51, 29.81s/it]
 15%|█▌        | 65/420 [32:19<2:56:08, 29.77s/it]
 16%|█▌        | 66/420 [32:49<2:56:05, 29.85s/it]
 16%|█▌        | 67/420 [33:19<2:55:07, 29.76s/it]
 16%|█▌        | 68/420 [33:49<2:55:04, 29.84s/it]
 16%|█▋        | 69/420 [34:18<2:53:42, 29.69s/it]
 17%|█▋        | 70/420 [34:47<2:52:36, 29.59s/it]
 17%|█▋        | 71/420 [35:17<2:52:19, 29.63s/it]
 17%|█▋        | 72/420 [35:47<2:51:21, 29.55s/it]
 17%|█▋        | 73/420 [36:16<2:51:06, 29.59s/it]
 18%|█▊        | 74/420 [36:46<2:50:47, 29.62s/it]
 18%|█▊        | 75/420 [37:16<2:50:15, 29.61s/it]
 18%|█▊        | 76/420 [37:45<2:49:21, 29.54s/it]
 18%|█▊        | 77/420 [38:15<2:49:06, 29.58s/it]
 19%|█▊        | 78/420 [38:45<2:49:22, 29.71s/it]
 19%|█▉        | 79/420 [39:14<2:48:18, 29.61s/it]
 19%|█▉        | 80/420 [39:44<2:48:29, 29.73s/it]
 19%|█▉        | 81/420 [40:13<2:47:21, 29.62s/it]
 20%|█▉        | 82/420 [40:43<2:46:24, 29.54s/it]
 20%|█▉        | 83/420 [41:12<2:46:11, 29.59s/it]
 20%|██        | 84/420 [41:42<2:45:55, 29.63s/it]
 20%|██        | 85/420 [42:12<2:46:04, 29.74s/it]
 20%|██        | 86/420 [42:42<2:45:30, 29.73s/it]
 21%|██        | 87/420 [43:12<2:45:29, 29.82s/it]
 21%|██        | 88/420 [43:42<2:44:48, 29.78s/it]
 21%|██        | 89/420 [44:12<2:44:43, 29.86s/it]
 21%|██▏       | 90/420 [44:42<2:44:29, 29.91s/it]
 22%|██▏       | 91/420 [45:12<2:44:12, 29.95s/it]
 22%|██▏       | 92/420 [45:42<2:43:50, 29.97s/it]
 22%|██▏       | 93/420 [46:11<2:42:53, 29.89s/it]
 22%|██▏       | 94/420 [46:41<2:42:05, 29.83s/it]
 23%|██▎       | 95/420 [47:11<2:41:22, 29.79s/it]
 23%|██▎       | 96/420 [47:41<2:41:16, 29.87s/it]
 23%|██▎       | 97/420 [48:11<2:41:02, 29.92s/it]
 23%|██▎       | 98/420 [48:40<2:39:39, 29.75s/it]
 24%|██▎       | 99/420 [49:10<2:39:37, 29.84s/it]
 24%|██▍       | 100/420 [49:40<2:38:43, 29.76s/it]
{'loss': 0.5458, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48}
 24%|██▍       | 101/420 [50:10<2:38:05, 29.74s/it]
 24%|██▍       | 102/420 [50:39<2:37:00, 29.62s/it]
 25%|██▍       | 103/420 [51:08<2:36:06, 29.55s/it]
 25%|██▍       | 104/420 [51:38<2:36:21, 29.69s/it]
 25%|██▌       | 105/420 [52:08<2:35:53, 29.70s/it]
 25%|██▌       | 106/420 [52:38<2:35:23, 29.69s/it]
 25%|██▌       | 107/420 [53:07<2:34:54, 29.69s/it]
 26%|██▌       | 108/420 [53:37<2:33:53, 29.59s/it]
 26%|██▌       | 109/420 [54:07<2:34:02, 29.72s/it]
 26%|██▌       | 110/420 [54:37<2:34:01, 29.81s/it]
 26%|██▋       | 111/420 [55:06<2:32:49, 29.67s/it]
 27%|██▋       | 112/420 [55:35<2:31:51, 29.58s/it]
 27%|██▋       | 113/420 [56:06<2:32:03, 29.72s/it]
 27%|██▋       | 114/420 [56:36<2:32:01, 29.81s/it]
 27%|██▋       | 115/420 [57:05<2:31:21, 29.78s/it]
 28%|██▊       | 116/420 [57:35<2:31:15, 29.85s/it]
 28%|██▊       | 117/420 [58:05<2:31:03, 29.91s/it]
 28%|██▊       | 118/420 [58:35<2:29:44, 29.75s/it]
 28%|██▊       | 119/420 [59:04<2:29:11, 29.74s/it]
 29%|██▊       | 120/420 [59:34<2:28:38, 29.73s/it]
 29%|██▉       | 121/420 [1:00:03<2:27:36, 29.62s/it]
 29%|██▉       | 122/420 [1:00:33<2:26:34, 29.51s/it]
 29%|██▉       | 123/420 [1:01:03<2:26:49, 29.66s/it]
 30%|██▉       | 124/420 [1:01:32<2:26:24, 29.68s/it]
 30%|██▉       | 125/420 [1:02:02<2:26:24, 29.78s/it]
 30%|███       | 126/420 [1:02:33<2:26:16, 29.85s/it]
 30%|███       | 127/420 [1:03:02<2:25:24, 29.78s/it]
 30%|███       | 128/420 [1:03:32<2:24:48, 29.75s/it]
 31%|███       | 129/420 [1:04:01<2:24:12, 29.73s/it]
 31%|███       | 130/420 [1:04:31<2:23:38, 29.72s/it]
 31%|███       | 131/420 [1:05:01<2:23:35, 29.81s/it]
 31%|███▏      | 132/420 [1:05:31<2:23:23, 29.87s/it]
 32%|███▏      | 133/420 [1:06:01<2:22:39, 29.83s/it]
 32%|███▏      | 134/420 [1:06:31<2:22:28, 29.89s/it]
 32%|███▏      | 135/420 [1:07:01<2:21:41, 29.83s/it]
 32%|███▏      | 136/420 [1:07:30<2:20:33, 29.69s/it]
 33%|███▎      | 137/420 [1:08:00<2:20:04, 29.70s/it]
 33%|███▎      | 138/420 [1:08:30<2:20:04, 29.80s/it]
 33%|███▎      | 139/420 [1:09:00<2:19:54, 29.87s/it]
 33%|███▎      | 140/420 [1:09:30<2:19:11, 29.83s/it]
 34%|███▎      | 141/420 [1:09:59<2:18:31, 29.79s/it]
 34%|███▍      | 142/420 [1:10:29<2:17:26, 29.66s/it]
 34%|███▍      | 143/420 [1:10:58<2:16:59, 29.67s/it]
 34%|███▍      | 144/420 [1:11:28<2:16:05, 29.58s/it]
 35%|███▍      | 145/420 [1:11:57<2:15:44, 29.62s/it]
 35%|███▍      | 146/420 [1:12:27<2:15:21, 29.64s/it]
 35%|███▌      | 147/420 [1:12:57<2:14:57, 29.66s/it]
 35%|███▌      | 148/420 [1:13:27<2:14:57, 29.77s/it]
 35%|███▌      | 149/420 [1:13:56<2:13:46, 29.62s/it]
 36%|███▌      | 150/420 [1:14:25<2:12:56, 29.54s/it]
{'loss': 0.532, 'learning_rate': 0.0001, 'global_step': 150, 'epoch': 0.71}
 36%|███▌      | 151/420 [1:14:55<2:12:40, 29.59s/it]
 36%|███▌      | 152/420 [1:15:25<2:12:17, 29.62s/it]
 36%|███▋      | 153/420 [1:15:55<2:11:54, 29.64s/it]
 37%|███▋      | 154/420 [1:16:25<2:11:55, 29.76s/it]
 37%|███▋      | 155/420 [1:16:54<2:11:21, 29.74s/it]
 37%|███▋      | 156/420 [1:17:24<2:10:22, 29.63s/it]
 37%|███▋      | 157/420 [1:17:53<2:09:56, 29.65s/it]
 38%|███▊      | 158/420 [1:18:23<2:09:05, 29.56s/it]
 38%|███▊      | 159/420 [1:18:53<2:09:12, 29.70s/it]
 38%|███▊      | 160/420 [1:19:22<2:08:42, 29.70s/it]
 38%|███▊      | 161/420 [1:19:52<2:08:37, 29.80s/it]
 39%|███▊      | 162/420 [1:20:22<2:08:25, 29.87s/it]
 39%|███▉      | 163/420 [1:20:52<2:07:41, 29.81s/it]
 39%|███▉      | 164/420 [1:21:22<2:07:27, 29.87s/it]
 39%|███▉      | 165/420 [1:21:52<2:06:18, 29.72s/it]
 40%|███▉      | 166/420 [1:22:21<2:05:47, 29.72s/it]
 40%|███▉      | 167/420 [1:22:51<2:05:16, 29.71s/it]
 40%|████      | 168/420 [1:23:21<2:04:45, 29.71s/it]
 40%|████      | 169/420 [1:23:50<2:04:15, 29.70s/it]
 40%|████      | 170/420 [1:24:20<2:03:44, 29.70s/it]
 41%|████      | 171/420 [1:24:50<2:03:13, 29.69s/it]
 41%|████      | 172/420 [1:25:20<2:03:08, 29.79s/it]
 41%|████      | 173/420 [1:25:49<2:02:30, 29.76s/it]
 41%|████▏     | 174/420 [1:26:19<2:01:57, 29.75s/it]
 42%|████▏     | 175/420 [1:26:49<2:01:22, 29.73s/it]
 42%|████▏     | 176/420 [1:27:18<2:00:25, 29.61s/it]
 42%|████▏     | 177/420 [1:27:48<2:00:00, 29.63s/it]
 42%|████▏     | 178/420 [1:28:18<1:59:35, 29.65s/it]
 43%|████▎     | 179/420 [1:28:47<1:59:09, 29.67s/it]
 43%|████▎     | 180/420 [1:29:17<1:58:42, 29.68s/it]
 43%|████▎     | 181/420 [1:29:47<1:58:38, 29.78s/it]
 43%|████▎     | 182/420 [1:30:17<1:58:25, 29.85s/it]
 44%|████▎     | 183/420 [1:30:47<1:57:44, 29.81s/it]
 44%|████▍     | 184/420 [1:31:16<1:57:07, 29.78s/it]
 44%|████▍     | 185/420 [1:31:46<1:56:33, 29.76s/it]
 44%|████▍     | 186/420 [1:32:16<1:56:00, 29.75s/it]
 45%|████▍     | 187/420 [1:32:46<1:55:29, 29.74s/it]
 45%|████▍     | 188/420 [1:33:15<1:54:59, 29.74s/it]
 45%|████▌     | 189/420 [1:33:45<1:54:50, 29.83s/it]
 45%|████▌     | 190/420 [1:34:15<1:54:11, 29.79s/it]
 45%|████▌     | 191/420 [1:34:45<1:53:36, 29.77s/it]
 46%|████▌     | 192/420 [1:35:14<1:53:01, 29.75s/it]
 46%|████▌     | 193/420 [1:35:44<1:52:29, 29.74s/it]
 46%|████▌     | 194/420 [1:36:14<1:51:35, 29.63s/it]
 46%|████▋     | 195/420 [1:36:43<1:50:48, 29.55s/it]
 47%|████▋     | 196/420 [1:37:13<1:50:28, 29.59s/it]
 47%|████▋     | 197/420 [1:37:43<1:50:26, 29.72s/it]
 47%|████▋     | 198/420 [1:38:12<1:49:55, 29.71s/it]
 47%|████▋     | 199/420 [1:38:42<1:49:19, 29.68s/it]
 48%|████▊     | 200/420 [1:39:12<1:49:13, 29.79s/it]
{'loss': 0.5351, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95}
 48%|████▊     | 201/420 [1:39:46<1:53:23, 31.07s/it]
 48%|████▊     | 202/420 [1:40:16<1:51:23, 30.66s/it]
 48%|████▊     | 203/420 [1:40:46<1:50:13, 30.47s/it]
 49%|████▊     | 204/420 [1:41:15<1:48:46, 30.22s/it]
 49%|████▉     | 205/420 [1:41:45<1:47:43, 30.06s/it]
 49%|████▉     | 206/420 [1:42:15<1:46:51, 29.96s/it]
 49%|████▉     | 207/420 [1:42:44<1:45:43, 29.78s/it]
 50%|████▉     | 208/420 [1:43:14<1:45:30, 29.86s/it]
 50%|████▉     | 209/420 [1:43:44<1:44:51, 29.82s/it]
 50%|█████     | 210/420 [1:44:14<1:44:34, 29.88s/it]
 50%|█████     | 211/420 [1:44:39<1:38:39, 28.32s/it]
 50%|█████     | 212/420 [1:45:08<1:39:37, 28.74s/it]
 51%|█████     | 213/420 [1:45:38<1:40:30, 29.13s/it]
 51%|█████     | 214/420 [1:46:08<1:40:35, 29.30s/it]
 51%|█████     | 215/420 [1:46:38<1:40:50, 29.51s/it]
 51%|█████▏    | 216/420 [1:47:07<1:40:06, 29.44s/it]
 52%|█████▏    | 217/420 [1:47:37<1:39:52, 29.52s/it]
 52%|█████▏    | 218/420 [1:48:07<1:39:33, 29.57s/it]
 52%|█████▏    | 219/420 [1:48:36<1:39:12, 29.61s/it]
 52%|█████▏    | 220/420 [1:49:06<1:38:28, 29.54s/it]
 53%|█████▎    | 221/420 [1:49:36<1:38:07, 29.58s/it]
 53%|█████▎    | 222/420 [1:50:06<1:38:02, 29.71s/it]
 53%|█████▎    | 223/420 [1:50:36<1:37:51, 29.81s/it]
 53%|█████▎    | 224/420 [1:51:05<1:36:55, 29.67s/it]
 54%|█████▎    | 225/420 [1:51:35<1:36:28, 29.68s/it]
 54%|█████▍    | 226/420 [1:52:04<1:35:59, 29.69s/it]
 54%|█████▍    | 227/420 [1:52:34<1:35:28, 29.68s/it]
 54%|█████▍    | 228/420 [1:53:04<1:34:58, 29.68s/it]
 55%|█████▍    | 229/420 [1:53:34<1:34:48, 29.78s/it]
 55%|█████▍    | 230/420 [1:54:04<1:34:32, 29.86s/it]
 55%|█████▌    | 231/420 [1:54:33<1:33:53, 29.80s/it]
 55%|█████▌    | 232/420 [1:55:03<1:32:58, 29.67s/it]
 55%|█████▌    | 233/420 [1:55:32<1:32:29, 29.68s/it]
 56%|█████▌    | 234/420 [1:56:02<1:32:17, 29.77s/it]
 56%|█████▌    | 235/420 [1:56:32<1:32:01, 29.85s/it]
 56%|█████▌    | 236/420 [1:57:02<1:31:24, 29.81s/it]
 56%|█████▋    | 237/420 [1:57:32<1:31:06, 29.87s/it]
 57%|█████▋    | 238/420 [1:58:02<1:30:09, 29.72s/it]
 57%|█████▋    | 239/420 [1:58:32<1:29:56, 29.82s/it]
 57%|█████▋    | 240/420 [1:59:01<1:29:01, 29.68s/it]
 57%|█████▋    | 241/420 [1:59:31<1:28:32, 29.68s/it]
 58%|█████▊    | 242/420 [2:00:00<1:27:46, 29.59s/it]
 58%|█████▊    | 243/420 [2:00:29<1:27:04, 29.52s/it]
 58%|█████▊    | 244/420 [2:00:59<1:27:01, 29.66s/it]
 58%|█████▊    | 245/420 [2:01:29<1:26:15, 29.58s/it]
 59%|█████▊    | 246/420 [2:01:58<1:25:51, 29.60s/it]
 59%|█████▉    | 247/420 [2:02:28<1:25:07, 29.52s/it]
 59%|█████▉    | 248/420 [2:02:57<1:24:28, 29.47s/it]
 59%|█████▉    | 249/420 [2:03:27<1:24:27, 29.63s/it]
 60%|█████▉    | 250/420 [2:03:56<1:23:44, 29.56s/it]
{'loss': 0.2802, 'learning_rate': 0.0001, 'global_step': 250, 'epoch': 1.19}
 60%|█████▉    | 251/420 [2:04:26<1:23:36, 29.69s/it]
 60%|██████    | 252/420 [2:04:56<1:22:50, 29.58s/it]
 60%|██████    | 253/420 [2:05:25<1:22:25, 29.62s/it]
 60%|██████    | 254/420 [2:05:55<1:21:42, 29.53s/it]
 61%|██████    | 255/420 [2:06:25<1:21:36, 29.68s/it]
 61%|██████    | 256/420 [2:06:55<1:21:07, 29.68s/it]
 61%|██████    | 257/420 [2:07:24<1:20:33, 29.65s/it]
 61%|██████▏   | 258/420 [2:07:54<1:20:20, 29.76s/it]
 62%|██████▏   | 259/420 [2:08:23<1:19:30, 29.63s/it]
 62%|██████▏   | 260/420 [2:08:53<1:19:03, 29.65s/it]
 62%|██████▏   | 261/420 [2:09:23<1:18:49, 29.74s/it]
 62%|██████▏   | 262/420 [2:09:53<1:18:16, 29.72s/it]
 63%|██████▎   | 263/420 [2:10:22<1:17:44, 29.71s/it]
 63%|██████▎   | 264/420 [2:10:52<1:17:13, 29.70s/it]
 63%|██████▎   | 265/420 [2:11:21<1:16:26, 29.59s/it]
 63%|██████▎   | 266/420 [2:11:51<1:16:00, 29.61s/it]
 64%|██████▎   | 267/420 [2:12:21<1:15:18, 29.53s/it]
 64%|██████▍   | 268/420 [2:12:50<1:14:40, 29.47s/it]
 64%|██████▍   | 269/420 [2:13:20<1:14:35, 29.64s/it]
 64%|██████▍   | 270/420 [2:13:50<1:14:08, 29.65s/it]
 65%|██████▍   | 271/420 [2:14:20<1:13:54, 29.76s/it]
 65%|██████▍   | 272/420 [2:14:49<1:13:16, 29.71s/it]
 65%|██████▌   | 273/420 [2:15:18<1:12:31, 29.60s/it]
 65%|██████▌   | 274/420 [2:15:48<1:12:05, 29.62s/it]
 65%|██████▌   | 275/420 [2:16:18<1:11:37, 29.64s/it]
 66%|██████▌   | 276/420 [2:16:47<1:10:55, 29.55s/it]
 66%|██████▌   | 277/420 [2:17:17<1:10:17, 29.50s/it]
 66%|██████▌   | 278/420 [2:17:47<1:10:08, 29.64s/it]
 66%|██████▋   | 279/420 [2:18:16<1:09:40, 29.65s/it]
 67%|██████▋   | 280/420 [2:18:46<1:09:11, 29.65s/it]
 67%|██████▋   | 281/420 [2:19:15<1:08:28, 29.56s/it]
 67%|██████▋   | 282/420 [2:19:45<1:07:59, 29.56s/it]
 67%|██████▋   | 283/420 [2:20:14<1:07:21, 29.50s/it]
 68%|██████▊   | 284/420 [2:20:44<1:07:12, 29.65s/it]
 68%|██████▊   | 285/420 [2:21:14<1:06:57, 29.76s/it]
 68%|██████▊   | 286/420 [2:21:44<1:06:37, 29.84s/it]
 68%|██████▊   | 287/420 [2:22:14<1:06:14, 29.88s/it]
 69%|██████▊   | 288/420 [2:22:44<1:05:35, 29.82s/it]
 69%|██████▉   | 289/420 [2:23:13<1:05:00, 29.77s/it]
 69%|██████▉   | 290/420 [2:23:43<1:04:25, 29.74s/it]
 69%|██████▉   | 291/420 [2:24:12<1:03:40, 29.62s/it]
 70%|██████▉   | 292/420 [2:24:42<1:03:00, 29.54s/it]
 70%|██████▉   | 293/420 [2:25:11<1:02:36, 29.58s/it]
 70%|███████   | 294/420 [2:25:41<1:02:10, 29.61s/it]
 70%|███████   | 295/420 [2:26:11<1:01:55, 29.72s/it]
 70%|███████   | 296/420 [2:26:41<1:01:22, 29.70s/it]
 71%|███████   | 297/420 [2:27:11<1:01:04, 29.79s/it]
 71%|███████   | 298/420 [2:27:41<1:00:30, 29.76s/it]
 71%|███████   | 299/420 [2:28:10<59:58, 29.74s/it]  
 71%|███████▏  | 300/420 [2:28:40<59:15, 29.63s/it]
{'loss': 0.2154, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.43}
 72%|███████▏  | 301/420 [2:29:10<58:58, 29.74s/it]
 72%|███████▏  | 302/420 [2:29:40<58:37, 29.81s/it]
 72%|███████▏  | 303/420 [2:30:09<58:02, 29.76s/it]
 72%|███████▏  | 304/420 [2:30:39<57:17, 29.63s/it]
 73%|███████▎  | 305/420 [2:31:08<56:48, 29.64s/it]
 73%|███████▎  | 306/420 [2:31:38<56:31, 29.75s/it]
 73%|███████▎  | 307/420 [2:32:08<55:58, 29.72s/it]
 73%|███████▎  | 308/420 [2:32:37<55:26, 29.70s/it]
 74%|███████▎  | 309/420 [2:33:07<54:55, 29.69s/it]
 74%|███████▍  | 310/420 [2:33:37<54:24, 29.68s/it]
 74%|███████▍  | 311/420 [2:34:06<53:44, 29.58s/it]
 74%|███████▍  | 312/420 [2:34:36<53:17, 29.61s/it]
 75%|███████▍  | 313/420 [2:35:06<53:00, 29.72s/it]
 75%|███████▍  | 314/420 [2:35:35<52:28, 29.70s/it]
 75%|███████▌  | 315/420 [2:36:05<51:58, 29.70s/it]
 75%|███████▌  | 316/420 [2:36:35<51:37, 29.78s/it]
 75%|███████▌  | 317/420 [2:37:05<51:14, 29.85s/it]
 76%|███████▌  | 318/420 [2:37:35<50:38, 29.79s/it]
 76%|███████▌  | 319/420 [2:38:04<50:04, 29.75s/it]
 76%|███████▌  | 320/420 [2:38:34<49:42, 29.82s/it]
 76%|███████▋  | 321/420 [2:39:04<49:08, 29.78s/it]
 77%|███████▋  | 322/420 [2:39:33<48:25, 29.65s/it]
 77%|███████▋  | 323/420 [2:40:03<48:05, 29.75s/it]
 77%|███████▋  | 324/420 [2:40:33<47:43, 29.83s/it]
 77%|███████▋  | 325/420 [2:41:03<47:18, 29.88s/it]
 78%|███████▊  | 326/420 [2:41:33<46:42, 29.82s/it]
 78%|███████▊  | 327/420 [2:42:03<46:08, 29.77s/it]
 78%|███████▊  | 328/420 [2:42:32<45:27, 29.65s/it]
 78%|███████▊  | 329/420 [2:43:02<45:08, 29.76s/it]
 79%|███████▊  | 330/420 [2:43:32<44:36, 29.73s/it]
 79%|███████▉  | 331/420 [2:44:02<44:04, 29.72s/it]
 79%|███████▉  | 332/420 [2:44:32<43:42, 29.80s/it]
 79%|███████▉  | 333/420 [2:45:02<43:18, 29.86s/it]
 80%|███████▉  | 334/420 [2:45:31<42:43, 29.81s/it]
 80%|███████▉  | 335/420 [2:46:01<42:10, 29.77s/it]
 80%|████████  | 336/420 [2:46:30<41:30, 29.64s/it]
 80%|████████  | 337/420 [2:47:00<41:09, 29.75s/it]
 80%|████████  | 338/420 [2:47:30<40:37, 29.73s/it]
 81%|████████  | 339/420 [2:48:00<40:06, 29.71s/it]
 81%|████████  | 340/420 [2:48:29<39:35, 29.69s/it]
 81%|████████  | 341/420 [2:48:59<39:12, 29.78s/it]
 81%|████████▏ | 342/420 [2:49:29<38:48, 29.85s/it]
 82%|████████▏ | 343/420 [2:49:59<38:13, 29.79s/it]
 82%|████████▏ | 344/420 [2:50:29<37:41, 29.76s/it]
 82%|████████▏ | 345/420 [2:50:59<37:17, 29.83s/it]
 82%|████████▏ | 346/420 [2:51:28<36:43, 29.78s/it]
 83%|████████▎ | 347/420 [2:51:58<36:18, 29.84s/it]
 83%|████████▎ | 348/420 [2:52:28<35:37, 29.69s/it]
 83%|████████▎ | 349/420 [2:52:58<35:14, 29.78s/it]
 83%|████████▎ | 350/420 [2:53:27<34:42, 29.75s/it]
{'loss': 0.2142, 'learning_rate': 0.0001, 'global_step': 350, 'epoch': 1.66}
 84%|████████▎ | 351/420 [2:53:57<34:10, 29.71s/it]
 84%|████████▍ | 352/420 [2:54:26<33:39, 29.69s/it]
 84%|████████▍ | 353/420 [2:54:56<33:14, 29.78s/it]
 84%|████████▍ | 354/420 [2:55:26<32:42, 29.73s/it]
 85%|████████▍ | 355/420 [2:55:56<32:10, 29.71s/it]
 85%|████████▍ | 356/420 [2:56:25<31:40, 29.70s/it]
 85%|████████▌ | 357/420 [2:56:55<31:03, 29.58s/it]
 85%|████████▌ | 358/420 [2:57:24<30:33, 29.57s/it]
 85%|████████▌ | 359/420 [2:57:54<30:05, 29.59s/it]
 86%|████████▌ | 360/420 [2:58:24<29:36, 29.61s/it]
 86%|████████▌ | 361/420 [2:58:54<29:13, 29.72s/it]
 86%|████████▌ | 362/420 [2:59:24<28:48, 29.80s/it]
 86%|████████▋ | 363/420 [2:59:53<28:14, 29.72s/it]
 87%|████████▋ | 364/420 [3:00:23<27:43, 29.70s/it]
 87%|████████▋ | 365/420 [3:00:53<27:18, 29.79s/it]
 87%|████████▋ | 366/420 [3:01:22<26:40, 29.64s/it]
 87%|████████▋ | 367/420 [3:01:52<26:11, 29.64s/it]
 88%|████████▊ | 368/420 [3:02:21<25:40, 29.62s/it]
 88%|████████▊ | 369/420 [3:02:51<25:10, 29.63s/it]
 88%|████████▊ | 370/420 [3:03:21<24:46, 29.73s/it]
 88%|████████▊ | 371/420 [3:03:51<24:16, 29.72s/it]
 89%|████████▊ | 372/420 [3:04:20<23:40, 29.60s/it]
 89%|████████▉ | 373/420 [3:04:49<23:07, 29.52s/it]
 89%|████████▉ | 374/420 [3:05:19<22:39, 29.56s/it]
 89%|████████▉ | 375/420 [3:05:49<22:16, 29.69s/it]
 90%|████████▉ | 376/420 [3:06:19<21:50, 29.78s/it]
 90%|████████▉ | 377/420 [3:06:48<21:18, 29.74s/it]
 90%|█████████ | 378/420 [3:07:18<20:48, 29.72s/it]
 90%|█████████ | 379/420 [3:07:47<20:13, 29.60s/it]
 90%|█████████ | 380/420 [3:08:17<19:40, 29.52s/it]
 91%|█████████ | 381/420 [3:08:46<19:09, 29.46s/it]
 91%|█████████ | 382/420 [3:09:16<18:45, 29.61s/it]
 91%|█████████ | 383/420 [3:09:45<18:12, 29.53s/it]
 91%|█████████▏| 384/420 [3:10:15<17:44, 29.56s/it]
 92%|█████████▏| 385/420 [3:10:44<17:12, 29.49s/it]
 92%|█████████▏| 386/420 [3:11:14<16:47, 29.63s/it]
 92%|█████████▏| 387/420 [3:11:44<16:21, 29.73s/it]
 92%|█████████▏| 388/420 [3:12:14<15:50, 29.71s/it]
 93%|█████████▎| 389/420 [3:12:44<15:20, 29.69s/it]
 93%|█████████▎| 390/420 [3:13:14<14:53, 29.78s/it]
 93%|█████████▎| 391/420 [3:13:43<14:19, 29.64s/it]
 93%|█████████▎| 392/420 [3:14:13<13:52, 29.74s/it]
 94%|█████████▎| 393/420 [3:14:43<13:24, 29.81s/it]
 94%|█████████▍| 394/420 [3:15:13<12:53, 29.76s/it]
 94%|█████████▍| 395/420 [3:15:42<12:23, 29.74s/it]
 94%|█████████▍| 396/420 [3:16:12<11:53, 29.71s/it]
 95%|█████████▍| 397/420 [3:16:42<11:25, 29.80s/it]
 95%|█████████▍| 398/420 [3:17:11<10:54, 29.75s/it]
 95%|█████████▌| 399/420 [3:17:41<10:26, 29.83s/it]
 95%|█████████▌| 400/420 [3:18:11<09:57, 29.87s/it]
{'loss': 0.2111, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 1.9}
 95%|█████████▌| 401/420 [3:18:45<09:47, 30.94s/it]
 96%|█████████▌| 402/420 [3:19:15<09:11, 30.66s/it]
 96%|█████████▌| 403/420 [3:19:44<08:34, 30.26s/it]
 96%|█████████▌| 404/420 [3:20:14<07:59, 29.98s/it]
 96%|█████████▋| 405/420 [3:20:43<07:28, 29.89s/it]
 97%|█████████▋| 406/420 [3:21:13<06:56, 29.73s/it]
 97%|█████████▋| 407/420 [3:21:42<06:25, 29.67s/it]
 97%|█████████▋| 408/420 [3:22:11<05:54, 29.57s/it]
 97%|█████████▋| 409/420 [3:22:41<05:24, 29.50s/it]
 98%|█████████▊| 410/420 [3:23:10<04:55, 29.55s/it]
 98%|█████████▊| 411/420 [3:23:40<04:27, 29.68s/it]
 98%|█████████▊| 412/420 [3:24:10<03:58, 29.77s/it]
 98%|█████████▊| 413/420 [3:24:40<03:28, 29.74s/it]
 99%|█████████▊| 414/420 [3:25:10<02:58, 29.82s/it]
 99%|█████████▉| 415/420 [3:25:40<02:28, 29.77s/it]
 99%|█████████▉| 416/420 [3:26:09<01:58, 29.64s/it]
 99%|█████████▉| 417/420 [3:26:39<01:28, 29.64s/it]
100%|█████████▉| 418/420 [3:27:08<00:59, 29.65s/it]
100%|█████████▉| 419/420 [3:27:38<00:29, 29.75s/it]
100%|██████████| 420/420 [3:28:08<00:00, 29.82s/it]
                                                   
{'train_runtime': 12489.0215, 'train_samples_per_second': 0.404, 'train_steps_per_second': 0.034, 'train_loss': 0.38128303913843065, 'epoch': 2.0}

100%|██████████| 420/420 [3:28:09<00:00, 29.82s/it]
100%|██████████| 420/420 [3:28:09<00:00, 29.74s/it]
***** train metrics *****
  epoch                    =        2.0
  train_loss               =     0.3813
  train_runtime            = 3:28:09.02
  train_samples_per_second =      0.404
  train_steps_per_second   =      0.034