model training desc: knowledge selection on the QuALITY dataset, trained on randomly selected knowledge and key sentences
2023-12-16 23:07:37.734 | INFO     | __main__:init_components:108 - Initializing components...

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:48<00:48, 48.58s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:02<00:00, 28.35s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:02<00:00, 31.38s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-16 23:08:41.148 | INFO     | __main__:init_components:155 - 

2023-12-16 23:08:41.148 | INFO     | __main__:init_components:156 - ********************
2023-12-16 23:08:41.148 | INFO     | __main__:init_components:157 - using llama2 model
2023-12-16 23:08:41.148 | INFO     | __main__:init_components:158 - ********************
2023-12-16 23:08:41.148 | INFO     | __main__:init_components:159 - 

memory footprint of model: 4.024436950683594 GB
trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305
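The log does not say how the 319,815,680 trainable parameters are distributed. They are consistent with LoRA adapters of rank 128 on all seven linear projections (q, k, v, o, gate, up, down) of a Llama-2-7B base, which would add exactly this many parameters on top of the 6,738,415,616 base weights. The sketch below checks that arithmetic. The rank, the target modules and the Llama-2-7B shapes (hidden size 4096, MLP size 11008, 32 layers) are assumptions, not values read from this run's config.

```python
# Assumed Llama-2-7B shapes; none of these are printed in the log.
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32
BASE_PARAMS = 6_738_415_616  # commonly reported Llama-2-7B parameter count
RANK = 128                   # hypothetical LoRA rank that reproduces the logged count


def lora_params(rank, shapes):
    """A LoRA adapter on a (d_in x d_out) linear layer adds A (d_in x r) and B (r x d_out)."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)


# Per decoder layer: q/k/v/o are 4096x4096, gate/up are 4096x11008, down is 11008x4096.
shapes_per_layer = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, INTERMEDIATE)] * 2 + [(INTERMEDIATE, HIDDEN)]

trainable = LAYERS * lora_params(RANK, shapes_per_layer)
total = BASE_PARAMS + trainable
pct = 100 * trainable / total

print(f"trainable params: {trainable:,} || all params: {total:,} || trainable%: {pct}")
```

If the real run used a different rank or module list, only `RANK` and `shapes_per_layer` need to change.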
2023-12-16 23:08:44.549 | INFO     | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/random_select/with_knowledge_without_select_instruction/train.jsonl
2023-12-16 23:08:44.647 | INFO     | component.dataset:__init__:19 - there are 2523 data in dataset
2023-12-16 23:08:44.773 | INFO     | __main__:main:231 - *** starting training ***

  0%|          | 0/630 [00:00<?, ?it/s]
  0%|          | 1/630 [01:28<15:29:52, 88.70s/it]
  0%|          | 2/630 [02:48<14:34:48, 83.58s/it]
  0%|          | 3/630 [04:10<14:23:46, 82.66s/it]
  1%|          | 4/630 [05:32<14:18:41, 82.30s/it]
  1%|          | 5/630 [06:53<14:15:21, 82.11s/it]
  1%|          | 6/630 [08:16<14:15:39, 82.28s/it]
  1%|          | 7/630 [09:38<14:12:59, 82.15s/it]
  1%|▏         | 8/630 [10:59<14:09:57, 81.99s/it]
  1%|▏         | 9/630 [12:21<14:08:12, 81.95s/it]
  2%|▏         | 10/630 [13:16<12:38:25, 73.40s/it]
  2%|▏         | 11/630 [13:52<10:41:19, 62.16s/it]
  2%|▏         | 12/630 [14:29<9:20:14, 54.39s/it] 
  2%|▏         | 13/630 [15:05<8:23:57, 49.01s/it]
  2%|▏         | 14/630 [15:42<7:43:09, 45.11s/it]
  2%|▏         | 15/630 [16:18<7:16:08, 42.55s/it]
  3%|▎         | 16/630 [16:55<6:57:10, 40.77s/it]
  3%|▎         | 17/630 [17:32<6:44:01, 39.55s/it]
  3%|▎         | 18/630 [18:08<6:34:23, 38.67s/it]
  3%|▎         | 19/630 [18:45<6:28:19, 38.13s/it]
  3%|▎         | 20/630 [19:20<6:17:18, 37.11s/it]
                                                  
{'loss': 0.5195, 'learning_rate': 3.1746031746031745e-05, 'global_step': 20, 'epoch': 0.1}
  3%|▎         | 21/630 [19:57<6:15:54, 37.03s/it]
  3%|▎         | 22/630 [20:33<6:14:20, 36.94s/it]
  4%|▎         | 23/630 [21:10<6:12:23, 36.81s/it]
  4%|▍         | 24/630 [21:47<6:11:31, 36.78s/it]
  4%|▍         | 25/630 [22:23<6:10:36, 36.75s/it]
  4%|▍         | 26/630 [23:00<6:09:53, 36.74s/it]
  4%|▍         | 27/630 [23:37<6:09:29, 36.76s/it]
  4%|▍         | 28/630 [24:14<6:08:44, 36.75s/it]
  5%|▍         | 29/630 [24:50<6:06:48, 36.62s/it]
  5%|▍         | 30/630 [25:27<6:06:30, 36.65s/it]
  5%|▍         | 31/630 [26:03<6:06:41, 36.73s/it]
  5%|▌         | 32/630 [26:40<6:05:45, 36.70s/it]
  5%|▌         | 33/630 [27:17<6:05:11, 36.70s/it]
  5%|▌         | 34/630 [27:54<6:04:39, 36.71s/it]
  6%|▌         | 35/630 [28:30<6:03:47, 36.68s/it]
  6%|▌         | 36/630 [29:07<6:03:48, 36.75s/it]
  6%|▌         | 37/630 [29:44<6:02:54, 36.72s/it]
  6%|▌         | 38/630 [30:20<6:02:17, 36.72s/it]
  6%|▌         | 39/630 [30:57<6:01:33, 36.71s/it]
  6%|▋         | 40/630 [31:34<6:00:55, 36.70s/it]
                                                  
{'loss': 0.5055, 'learning_rate': 6.349206349206349e-05, 'global_step': 40, 'epoch': 0.19}
  7%|▋         | 41/630 [32:10<6:00:05, 36.68s/it]
  7%|▋         | 42/630 [32:47<5:59:28, 36.68s/it]
  7%|▋         | 43/630 [33:24<5:58:37, 36.66s/it]
  7%|▋         | 44/630 [34:00<5:57:55, 36.65s/it]
  7%|▋         | 45/630 [34:37<5:57:30, 36.67s/it]
  7%|▋         | 46/630 [35:14<5:57:03, 36.68s/it]
  7%|▋         | 47/630 [35:50<5:56:33, 36.70s/it]
  8%|▊         | 48/630 [36:27<5:56:19, 36.74s/it]
  8%|▊         | 49/630 [37:04<5:55:23, 36.70s/it]
  8%|▊         | 50/630 [37:41<5:55:16, 36.75s/it]
  8%|▊         | 51/630 [38:17<5:54:16, 36.71s/it]
  8%|▊         | 52/630 [38:54<5:53:36, 36.71s/it]
  8%|▊         | 53/630 [39:31<5:52:58, 36.70s/it]
  9%|▊         | 54/630 [40:08<5:52:24, 36.71s/it]
  9%|▊         | 55/630 [40:44<5:51:43, 36.70s/it]
  9%|▉         | 56/630 [41:21<5:51:02, 36.69s/it]
  9%|▉         | 57/630 [41:58<5:50:25, 36.69s/it]
  9%|▉         | 58/630 [42:34<5:49:38, 36.67s/it]
  9%|▉         | 59/630 [43:11<5:49:06, 36.68s/it]
 10%|▉         | 60/630 [43:48<5:48:36, 36.70s/it]
                                                  
{'loss': 0.5483, 'learning_rate': 9.523809523809524e-05, 'global_step': 60, 'epoch': 0.29}
 10%|▉         | 61/630 [44:24<5:48:05, 36.71s/it]
 10%|▉         | 62/630 [45:01<5:47:12, 36.68s/it]
 10%|█         | 63/630 [45:38<5:46:13, 36.64s/it]
 10%|█         | 64/630 [46:14<5:45:51, 36.66s/it]
 10%|█         | 65/630 [46:51<5:45:21, 36.68s/it]
 10%|█         | 66/630 [47:28<5:45:26, 36.75s/it]
 11%|█         | 67/630 [48:05<5:44:57, 36.76s/it]
 11%|█         | 68/630 [48:41<5:44:27, 36.78s/it]
 11%|█         | 69/630 [49:18<5:43:38, 36.75s/it]
 11%|█         | 70/630 [49:54<5:41:28, 36.59s/it]
 11%|█▏        | 71/630 [50:31<5:41:02, 36.61s/it]
 11%|█▏        | 72/630 [51:08<5:40:28, 36.61s/it]
 12%|█▏        | 73/630 [51:44<5:39:51, 36.61s/it]
 12%|█▏        | 74/630 [52:21<5:39:47, 36.67s/it]
 12%|█▏        | 75/630 [52:58<5:39:09, 36.67s/it]
 12%|█▏        | 76/630 [53:34<5:38:41, 36.68s/it]
 12%|█▏        | 77/630 [54:11<5:38:25, 36.72s/it]
 12%|█▏        | 78/630 [54:48<5:38:03, 36.75s/it]
 13%|█▎        | 79/630 [55:25<5:37:16, 36.73s/it]
 13%|█▎        | 80/630 [56:02<5:36:51, 36.75s/it]
                                                  
{'loss': 0.4659, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.38}
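The logged learning rates (3.17e-05 at step 20, 6.35e-05 at step 40, 9.52e-05 at step 60, then a flat 1e-4) fit a linear warmup over the first 63 steps, which is 10% of the 630 total, followed by a constant rate. That shape appears to match what `transformers.get_constant_schedule_with_warmup` produces, but the scheduler name and the 63-step warmup are inferred from the numbers, not stated in the log. A minimal sketch that reproduces the logged values:

```python
BASE_LR = 1e-4      # peak learning rate seen in the log
WARMUP_STEPS = 63   # inferred: (20 / 63) * 1e-4 matches the rate logged at step 20


def lr_at(step: int) -> float:
    """Linear warmup from 0 to BASE_LR over WARMUP_STEPS, then hold BASE_LR constant."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR


for step in (20, 40, 60, 80, 100):
    print(step, lr_at(step))
```

Every later metric line in this log reports 0.0001, which is consistent with no decay phase after warmup.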
 13%|█▎        | 81/630 [56:38<5:35:51, 36.71s/it]
 13%|█▎        | 82/630 [57:15<5:35:14, 36.71s/it]
 13%|█▎        | 83/630 [57:52<5:34:37, 36.71s/it]
 13%|█▎        | 84/630 [58:28<5:34:32, 36.76s/it]
 13%|█▎        | 85/630 [59:05<5:33:41, 36.74s/it]
 14%|█▎        | 86/630 [59:42<5:33:31, 36.79s/it]
 14%|█▍        | 87/630 [1:00:19<5:32:41, 36.76s/it]
 14%|█▍        | 88/630 [1:00:55<5:31:40, 36.72s/it]
 14%|█▍        | 89/630 [1:01:32<5:31:14, 36.74s/it]
 14%|█▍        | 90/630 [1:02:09<5:30:50, 36.76s/it]
 14%|█▍        | 91/630 [1:02:46<5:30:03, 36.74s/it]
 15%|█▍        | 92/630 [1:03:22<5:29:23, 36.73s/it]
 15%|█▍        | 93/630 [1:03:59<5:28:42, 36.73s/it]
 15%|█▍        | 94/630 [1:04:36<5:28:16, 36.75s/it]
 15%|█▌        | 95/630 [1:05:12<5:27:19, 36.71s/it]
 15%|█▌        | 96/630 [1:05:49<5:26:29, 36.69s/it]
 15%|█▌        | 97/630 [1:06:26<5:26:04, 36.71s/it]
 16%|█▌        | 98/630 [1:07:32<6:43:26, 45.50s/it]
 16%|█▌        | 99/630 [1:08:52<8:15:43, 56.01s/it]
 16%|█▌        | 100/630 [1:10:14<9:23:01, 63.74s/it]
                                                     
{'loss': 0.5625, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48}
 16%|█▌        | 101/630 [1:11:36<10:09:46, 69.16s/it]
 16%|█▌        | 102/630 [1:12:58<10:42:36, 73.02s/it]
 16%|█▋        | 103/630 [1:14:20<11:04:01, 75.60s/it]
 17%|█▋        | 104/630 [1:15:42<11:19:25, 77.50s/it]
 17%|█▋        | 105/630 [1:17:03<11:28:59, 78.74s/it]
 17%|█▋        | 106/630 [1:18:25<11:35:56, 79.69s/it]
 17%|█▋        | 107/630 [1:19:47<11:39:41, 80.27s/it]
 17%|█▋        | 108/630 [1:21:08<11:41:52, 80.68s/it]
 17%|█▋        | 109/630 [1:22:30<11:43:33, 81.02s/it]
 17%|█▋        | 110/630 [1:23:52<11:43:00, 81.12s/it]
 18%|█▊        | 111/630 [1:25:14<11:44:41, 81.47s/it]
 18%|█▊        | 112/630 [1:26:36<11:44:32, 81.61s/it]
 18%|█▊        | 113/630 [1:27:58<11:43:51, 81.69s/it]
 18%|█▊        | 114/630 [1:29:19<11:42:07, 81.64s/it]
 18%|█▊        | 115/630 [1:30:41<11:40:43, 81.64s/it]
 18%|█▊        | 116/630 [1:32:03<11:40:05, 81.72s/it]
 19%|█▊        | 117/630 [1:33:25<11:39:02, 81.76s/it]
 19%|█▊        | 118/630 [1:34:47<11:38:07, 81.81s/it]
 19%|█▉        | 119/630 [1:36:09<11:37:18, 81.88s/it]
 19%|█▉        | 120/630 [1:37:28<11:30:08, 81.19s/it]
                                                      
{'loss': 0.545, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.57}
 19%|█▉        | 121/630 [1:38:50<11:30:27, 81.39s/it]
 19%|█▉        | 122/630 [1:40:11<11:29:23, 81.42s/it]
 20%|█▉        | 123/630 [1:41:34<11:29:43, 81.62s/it]
 20%|█▉        | 124/630 [1:42:55<11:29:06, 81.71s/it]
 20%|█▉        | 125/630 [1:44:17<11:27:27, 81.68s/it]
 20%|██        | 126/630 [1:45:39<11:25:25, 81.60s/it]
 20%|██        | 127/630 [1:47:00<11:24:39, 81.67s/it]
 20%|██        | 128/630 [1:48:20<11:17:02, 80.92s/it]
 20%|██        | 129/630 [1:49:41<11:17:34, 81.15s/it]
 21%|██        | 130/630 [1:51:03<11:18:30, 81.42s/it]
 21%|██        | 131/630 [1:52:25<11:18:42, 81.61s/it]
 21%|██        | 132/630 [1:53:47<11:18:20, 81.73s/it]
 21%|██        | 133/630 [1:55:09<11:16:46, 81.70s/it]
 21%|██▏       | 134/630 [1:56:31<11:17:03, 81.90s/it]
 21%|██▏       | 135/630 [1:57:50<11:08:23, 81.02s/it]
 22%|██▏       | 136/630 [1:59:07<10:56:04, 79.68s/it]
 22%|██▏       | 137/630 [2:00:28<10:59:34, 80.27s/it]
 22%|██▏       | 138/630 [2:01:50<11:02:16, 80.76s/it]
 22%|██▏       | 139/630 [2:03:12<11:02:53, 81.01s/it]
 22%|██▏       | 140/630 [2:04:34<11:03:02, 81.19s/it]
                                                      
{'loss': 0.4962, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.67}
 22%|██▏       | 141/630 [2:05:56<11:03:47, 81.45s/it]
 23%|██▎       | 142/630 [2:07:18<11:03:36, 81.59s/it]
 23%|██▎       | 143/630 [2:08:39<11:02:59, 81.68s/it]
 23%|██▎       | 144/630 [2:10:01<11:02:11, 81.75s/it]
 23%|██▎       | 145/630 [2:11:23<11:01:10, 81.80s/it]
 23%|██▎       | 146/630 [2:12:45<10:59:37, 81.77s/it]
 23%|██▎       | 147/630 [2:14:07<10:58:24, 81.79s/it]
 23%|██▎       | 148/630 [2:15:29<10:57:18, 81.82s/it]
 24%|██▎       | 149/630 [2:16:51<10:56:10, 81.85s/it]
 24%|██▍       | 150/630 [2:18:13<10:54:54, 81.86s/it]
 24%|██▍       | 151/630 [2:19:34<10:53:13, 81.82s/it]
 24%|██▍       | 152/630 [2:20:56<10:52:38, 81.92s/it]
 24%|██▍       | 153/630 [2:22:18<10:50:37, 81.84s/it]
 24%|██▍       | 154/630 [2:23:40<10:49:02, 81.81s/it]
 25%|██▍       | 155/630 [2:25:03<10:51:47, 82.33s/it]
 25%|██▍       | 156/630 [2:26:27<10:53:18, 82.70s/it]
 25%|██▍       | 157/630 [2:27:49<10:49:56, 82.45s/it]
 25%|██▌       | 158/630 [2:29:11<10:47:46, 82.34s/it]
 25%|██▌       | 159/630 [2:30:33<10:45:18, 82.20s/it]
 25%|██▌       | 160/630 [2:31:54<10:42:42, 82.05s/it]
                                                      
{'loss': 0.5508, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.76}
 26%|██▌       | 161/630 [2:33:16<10:40:25, 81.93s/it]
 26%|██▌       | 162/630 [2:34:38<10:38:02, 81.80s/it]
 26%|██▌       | 163/630 [2:36:00<10:37:19, 81.88s/it]
 26%|██▌       | 164/630 [2:37:21<10:33:44, 81.60s/it]
 26%|██▌       | 165/630 [2:38:42<10:32:23, 81.60s/it]
 26%|██▋       | 166/630 [2:40:04<10:31:41, 81.68s/it]
 27%|██▋       | 167/630 [2:41:26<10:30:15, 81.68s/it]
 27%|██▋       | 168/630 [2:42:47<10:28:49, 81.67s/it]
 27%|██▋       | 169/630 [2:44:09<10:27:50, 81.71s/it]
 27%|██▋       | 170/630 [2:45:31<10:27:21, 81.83s/it]
 27%|██▋       | 171/630 [2:46:53<10:26:38, 81.91s/it]
 27%|██▋       | 172/630 [2:48:15<10:25:16, 81.91s/it]
 27%|██▋       | 173/630 [2:49:38<10:24:43, 82.02s/it]
 28%|██▊       | 174/630 [2:50:59<10:22:55, 81.96s/it]
 28%|██▊       | 175/630 [2:52:21<10:21:21, 81.94s/it]
 28%|██▊       | 176/630 [2:53:44<10:20:46, 82.04s/it]
 28%|██▊       | 177/630 [2:55:06<10:19:28, 82.05s/it]
 28%|██▊       | 178/630 [2:56:27<10:17:36, 81.98s/it]
 28%|██▊       | 179/630 [2:57:49<10:15:59, 81.95s/it]
 29%|██▊       | 180/630 [2:59:11<10:13:51, 81.85s/it]
                                                      
{'loss': 0.5314, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.86}
 29%|██▊       | 181/630 [3:00:33<10:12:41, 81.87s/it]
 29%|██▉       | 182/630 [3:01:54<10:10:10, 81.72s/it]
 29%|██▉       | 183/630 [3:03:16<10:08:40, 81.70s/it]
 29%|██▉       | 184/630 [3:04:38<10:08:04, 81.80s/it]
 29%|██▉       | 185/630 [3:05:58<10:03:33, 81.38s/it]
 30%|██▉       | 186/630 [3:07:20<10:03:04, 81.50s/it]
 30%|██▉       | 187/630 [3:08:42<10:02:12, 81.56s/it]
 30%|██▉       | 188/630 [3:10:03<10:00:57, 81.58s/it]
 30%|███       | 189/630 [3:11:25<10:00:18, 81.68s/it]
 30%|███       | 190/630 [3:12:47<9:58:54, 81.67s/it] 
 30%|███       | 191/630 [3:14:09<9:59:05, 81.88s/it]
 30%|███       | 192/630 [3:15:31<9:57:08, 81.80s/it]
 31%|███       | 193/630 [3:16:53<9:56:32, 81.91s/it]
 31%|███       | 194/630 [3:18:15<9:55:06, 81.90s/it]
 31%|███       | 195/630 [3:19:37<9:53:02, 81.80s/it]
 31%|███       | 196/630 [3:20:59<9:52:13, 81.88s/it]
 31%|███▏      | 197/630 [3:22:21<9:50:57, 81.89s/it]
 31%|███▏      | 198/630 [3:23:42<9:49:23, 81.86s/it]
 32%|███▏      | 199/630 [3:25:04<9:47:31, 81.79s/it]
 32%|███▏      | 200/630 [3:26:26<9:45:53, 81.75s/it]
                                                     
{'loss': 0.5251, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95}
 32%|███▏      | 201/630 [3:27:47<9:44:42, 81.78s/it]
 32%|███▏      | 202/630 [3:29:10<9:44:02, 81.88s/it]
 32%|███▏      | 203/630 [3:30:31<9:42:42, 81.88s/it]
 32%|███▏      | 204/630 [3:31:53<9:40:47, 81.80s/it]
 33%|███▎      | 205/630 [3:33:15<9:40:30, 81.96s/it]
 33%|███▎      | 206/630 [3:34:37<9:38:53, 81.92s/it]
 33%|███▎      | 207/630 [3:35:59<9:37:21, 81.90s/it]
 33%|███▎      | 208/630 [3:37:21<9:35:55, 81.89s/it]
 33%|███▎      | 209/630 [3:38:43<9:34:01, 81.81s/it]
 33%|███▎      | 210/630 [3:40:04<9:32:44, 81.82s/it]
 33%|███▎      | 211/630 [3:41:21<9:21:02, 80.34s/it]
 34%|███▎      | 212/630 [3:42:43<9:22:53, 80.80s/it]
 34%|███▍      | 213/630 [3:44:05<9:23:41, 81.11s/it]
 34%|███▍      | 214/630 [3:45:27<9:23:30, 81.28s/it]
 34%|███▍      | 215/630 [3:46:49<9:23:21, 81.45s/it]
 34%|███▍      | 216/630 [3:48:09<9:19:44, 81.12s/it]
 34%|███▍      | 217/630 [3:49:31<9:19:51, 81.34s/it]
 35%|███▍      | 218/630 [3:50:53<9:20:33, 81.63s/it]
 35%|███▍      | 219/630 [3:52:15<9:19:09, 81.63s/it]
 35%|███▍      | 220/630 [3:53:37<9:18:21, 81.71s/it]
                                                     
{'loss': 0.347, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 1.05}
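The loss falls from 0.5251 at step 200 to 0.347 at step 220, the first metric line of the second epoch. To track this across the whole run, the metric dicts can be pulled out of the log and checked against the step count. A small sketch, assuming each dict sits on its own line as it does here; `STEPS_PER_EPOCH = 210` is inferred from 630 steps over 3 epochs (step 420 is logged as epoch 2.0), not read from any config, and `parse_metrics` is a hypothetical helper.

```python
import ast

STEPS_PER_EPOCH = 210  # inferred: 630 optimizer steps over 3 epochs


def parse_metrics(lines):
    """Return the trainer's metric dicts, skipping progress-bar and other lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith("{") and "'loss'" in line:
            records.append(ast.literal_eval(line))
    return records


log = [
    " 32%|███▏      | 200/630 [3:26:26<9:45:53, 81.75s/it]",
    "{'loss': 0.5251, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95}",
    " 35%|███▍      | 220/630 [3:53:37<9:18:21, 81.71s/it]",
    "{'loss': 0.347, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 1.05}",
]

for rec in parse_metrics(log):
    # The logged epoch should equal global_step / STEPS_PER_EPOCH rounded to 2 decimals.
    assert round(rec["global_step"] / STEPS_PER_EPOCH, 2) == rec["epoch"]
    print(rec["global_step"], rec["epoch"], rec["loss"])
```

Feeding the full log through `parse_metrics` gives one record per 20 steps, which is enough to plot the per-epoch loss drops visible in this run.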
 35%|███▌      | 221/630 [3:54:58<9:17:11, 81.74s/it]
 35%|███▌      | 222/630 [3:56:20<9:16:33, 81.85s/it]
 35%|███▌      | 223/630 [3:57:42<9:14:36, 81.76s/it]
 36%|███▌      | 224/630 [3:59:04<9:13:20, 81.77s/it]
 36%|███▌      | 225/630 [4:00:26<9:11:49, 81.75s/it]
 36%|███▌      | 226/630 [4:01:48<9:11:14, 81.87s/it]
 36%|███▌      | 227/630 [4:03:10<9:10:51, 82.01s/it]
 36%|███▌      | 228/630 [4:04:32<9:09:14, 81.98s/it]
 36%|███▋      | 229/630 [4:05:54<9:07:39, 81.95s/it]
 37%|███▋      | 230/630 [4:07:16<9:06:14, 81.94s/it]
 37%|███▋      | 231/630 [4:08:38<9:04:45, 81.92s/it]
 37%|███▋      | 232/630 [4:09:59<9:03:23, 81.92s/it]
 37%|███▋      | 233/630 [4:11:21<9:01:36, 81.85s/it]
 37%|███▋      | 234/630 [4:12:43<9:00:02, 81.82s/it]
 37%|███▋      | 235/630 [4:14:05<8:58:49, 81.85s/it]
 37%|███▋      | 236/630 [4:15:27<8:57:08, 81.80s/it]
 38%|███▊      | 237/630 [4:16:49<8:56:24, 81.90s/it]
 38%|███▊      | 238/630 [4:18:11<8:54:58, 81.89s/it]
 38%|███▊      | 239/630 [4:19:33<8:54:27, 82.01s/it]
 38%|███▊      | 240/630 [4:20:55<8:52:26, 81.92s/it]
                                                     
{'loss': 0.2006, 'learning_rate': 0.0001, 'global_step': 240, 'epoch': 1.14}
 38%|███▊      | 241/630 [4:22:17<8:51:17, 81.95s/it]
 38%|███▊      | 242/630 [4:23:39<8:50:16, 82.00s/it]
 39%|███▊      | 243/630 [4:25:01<8:48:40, 81.97s/it]
 39%|███▊      | 244/630 [4:26:22<8:46:45, 81.88s/it]
 39%|███▉      | 245/630 [4:27:44<8:44:58, 81.81s/it]
 39%|███▉      | 246/630 [4:29:06<8:43:15, 81.76s/it]
 39%|███▉      | 247/630 [4:30:23<8:34:15, 80.56s/it]
 39%|███▉      | 248/630 [4:31:45<8:35:40, 81.00s/it]
 40%|███▉      | 249/630 [4:33:05<8:31:06, 80.49s/it]
 40%|███▉      | 250/630 [4:34:26<8:32:13, 80.88s/it]
 40%|███▉      | 251/630 [4:35:48<8:32:46, 81.18s/it]
 40%|████      | 252/630 [4:37:10<8:32:37, 81.37s/it]
 40%|████      | 253/630 [4:38:32<8:32:10, 81.51s/it]
 40%|████      | 254/630 [4:39:54<8:31:53, 81.68s/it]
 40%|████      | 255/630 [4:41:16<8:30:48, 81.73s/it]
 41%|████      | 256/630 [4:42:38<8:29:35, 81.75s/it]
 41%|████      | 257/630 [4:43:59<8:28:23, 81.78s/it]
 41%|████      | 258/630 [4:45:22<8:27:27, 81.85s/it]
 41%|████      | 259/630 [4:46:39<8:18:52, 80.68s/it]
 41%|████▏     | 260/630 [4:48:01<8:19:41, 81.03s/it]
                                                     
{'loss': 0.165, 'learning_rate': 0.0001, 'global_step': 260, 'epoch': 1.24}
 41%|████▏     | 261/630 [4:49:23<8:19:20, 81.19s/it]
 42%|████▏     | 262/630 [4:50:45<8:19:17, 81.41s/it]
 42%|████▏     | 263/630 [4:52:06<8:18:12, 81.45s/it]
 42%|████▏     | 264/630 [4:53:28<8:17:31, 81.56s/it]
 42%|████▏     | 265/630 [4:54:50<8:16:38, 81.64s/it]
 42%|████▏     | 266/630 [4:56:13<8:18:35, 82.19s/it]
 42%|████▏     | 267/630 [4:57:35<8:16:34, 82.08s/it]
 43%|████▎     | 268/630 [4:58:57<8:14:08, 81.90s/it]
 43%|████▎     | 269/630 [5:00:18<8:12:13, 81.81s/it]
 43%|████▎     | 270/630 [5:01:39<8:09:16, 81.54s/it]
 43%|████▎     | 271/630 [5:03:01<8:08:22, 81.62s/it]
 43%|████▎     | 272/630 [5:04:23<8:07:03, 81.63s/it]
 43%|████▎     | 273/630 [5:05:45<8:06:07, 81.70s/it]
 43%|████▎     | 274/630 [5:07:06<8:04:41, 81.69s/it]
 44%|████▎     | 275/630 [5:08:27<8:01:07, 81.32s/it]
 44%|████▍     | 276/630 [5:09:46<7:56:14, 80.72s/it]
 44%|████▍     | 277/630 [5:11:08<7:56:51, 81.05s/it]
 44%|████▍     | 278/630 [5:12:30<7:57:01, 81.31s/it]
 44%|████▍     | 279/630 [5:13:52<7:57:01, 81.54s/it]
 44%|████▍     | 280/630 [5:15:14<7:56:17, 81.65s/it]
                                                     
{'loss': 0.2124, 'learning_rate': 0.0001, 'global_step': 280, 'epoch': 1.33}
 45%|████▍     | 281/630 [5:16:36<7:55:59, 81.83s/it]
 45%|████▍     | 282/630 [5:17:58<7:55:22, 81.96s/it]
 45%|████▍     | 283/630 [5:19:20<7:53:35, 81.89s/it]
 45%|████▌     | 284/630 [5:20:42<7:51:51, 81.83s/it]
 45%|████▌     | 285/630 [5:22:02<7:47:33, 81.32s/it]
 45%|████▌     | 286/630 [5:23:23<7:46:44, 81.41s/it]
 46%|████▌     | 287/630 [5:24:45<7:44:49, 81.31s/it]
 46%|████▌     | 288/630 [5:26:06<7:44:19, 81.46s/it]
 46%|████▌     | 289/630 [5:27:28<7:43:58, 81.64s/it]
 46%|████▌     | 290/630 [5:28:51<7:43:42, 81.83s/it]
 46%|████▌     | 291/630 [5:30:12<7:41:24, 81.67s/it]
 46%|████▋     | 292/630 [5:31:31<7:36:26, 81.03s/it]
 47%|████▋     | 293/630 [5:32:53<7:36:25, 81.26s/it]
 47%|████▋     | 294/630 [5:34:15<7:35:59, 81.43s/it]
 47%|████▋     | 295/630 [5:35:37<7:34:57, 81.49s/it]
 47%|████▋     | 296/630 [5:36:58<7:33:30, 81.47s/it]
 47%|████▋     | 297/630 [5:38:17<7:27:50, 80.69s/it]
 47%|████▋     | 298/630 [5:39:39<7:28:35, 81.07s/it]
 47%|████▋     | 299/630 [5:41:01<7:28:52, 81.37s/it]
 48%|████▊     | 300/630 [5:42:23<7:28:02, 81.46s/it]
                                                     
{'loss': 0.2007, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.43}
 48%|████▊     | 301/630 [5:43:45<7:27:55, 81.69s/it]
 48%|████▊     | 302/630 [5:45:07<7:26:49, 81.74s/it]
 48%|████▊     | 303/630 [5:46:28<7:25:17, 81.70s/it]
 48%|████▊     | 304/630 [5:47:50<7:24:14, 81.76s/it]
 48%|████▊     | 305/630 [5:49:12<7:22:50, 81.76s/it]
 49%|████▊     | 306/630 [5:50:34<7:21:27, 81.75s/it]
 49%|████▊     | 307/630 [5:51:56<7:20:17, 81.79s/it]
 49%|████▉     | 308/630 [5:53:17<7:18:57, 81.79s/it]
 49%|████▉     | 309/630 [5:54:38<7:16:09, 81.52s/it]
 49%|████▉     | 310/630 [5:56:00<7:14:37, 81.49s/it]
 49%|████▉     | 311/630 [5:57:21<7:13:29, 81.53s/it]
 50%|████▉     | 312/630 [5:58:43<7:12:19, 81.57s/it]
 50%|████▉     | 313/630 [6:00:05<7:11:55, 81.75s/it]
 50%|████▉     | 314/630 [6:01:27<7:11:16, 81.89s/it]
 50%|█████     | 315/630 [6:02:49<7:09:32, 81.82s/it]
 50%|█████     | 316/630 [6:04:11<7:08:31, 81.88s/it]
 50%|█████     | 317/630 [6:05:33<7:06:45, 81.81s/it]
 50%|█████     | 318/630 [6:06:54<7:05:08, 81.76s/it]
 51%|█████     | 319/630 [6:08:16<7:03:36, 81.73s/it]
 51%|█████     | 320/630 [6:09:38<7:01:58, 81.67s/it]
                                                     
{'loss': 0.1981, 'learning_rate': 0.0001, 'global_step': 320, 'epoch': 1.52}
 51%|█████     | 321/630 [6:10:59<7:00:51, 81.72s/it]
 51%|█████     | 322/630 [6:12:21<6:59:21, 81.69s/it]
 51%|█████▏    | 323/630 [6:13:43<6:58:36, 81.81s/it]
 51%|█████▏    | 324/630 [6:15:05<6:57:26, 81.85s/it]
 52%|█████▏    | 325/630 [6:16:27<6:55:46, 81.79s/it]
 52%|█████▏    | 326/630 [6:17:49<6:54:38, 81.84s/it]
 52%|█████▏    | 327/630 [6:19:11<6:53:37, 81.90s/it]
 52%|█████▏    | 328/630 [6:20:33<6:52:13, 81.90s/it]
 52%|█████▏    | 329/630 [6:21:54<6:50:23, 81.81s/it]
 52%|█████▏    | 330/630 [6:23:16<6:48:46, 81.76s/it]
 53%|█████▎    | 331/630 [6:24:38<6:47:23, 81.75s/it]
 53%|█████▎    | 332/630 [6:25:59<6:45:55, 81.73s/it]
 53%|█████▎    | 333/630 [6:27:21<6:44:47, 81.78s/it]
 53%|█████▎    | 334/630 [6:28:43<6:43:11, 81.73s/it]
 53%|█████▎    | 335/630 [6:30:05<6:42:05, 81.78s/it]
 53%|█████▎    | 336/630 [6:31:27<6:41:31, 81.94s/it]
 53%|█████▎    | 337/630 [6:32:49<6:39:46, 81.87s/it]
 54%|█████▎    | 338/630 [6:34:11<6:38:28, 81.88s/it]
 54%|█████▍    | 339/630 [6:35:32<6:36:52, 81.83s/it]
 54%|█████▍    | 340/630 [6:36:54<6:34:47, 81.68s/it]
                                                     
{'loss': 0.2031, 'learning_rate': 0.0001, 'global_step': 340, 'epoch': 1.62}
 54%|█████▍    | 341/630 [6:38:15<6:33:27, 81.69s/it]
 54%|█████▍    | 342/630 [6:39:37<6:32:42, 81.81s/it]
 54%|█████▍    | 343/630 [6:41:00<6:32:04, 81.97s/it]
 55%|█████▍    | 344/630 [6:42:22<6:30:47, 81.98s/it]
 55%|█████▍    | 345/630 [6:43:42<6:26:53, 81.45s/it]
 55%|█████▍    | 346/630 [6:45:04<6:26:26, 81.64s/it]
 55%|█████▌    | 347/630 [6:46:26<6:25:24, 81.71s/it]
 55%|█████▌    | 348/630 [6:47:48<6:24:39, 81.84s/it]
 55%|█████▌    | 349/630 [6:49:10<6:23:17, 81.84s/it]
 56%|█████▌    | 350/630 [6:50:32<6:21:58, 81.85s/it]
 56%|█████▌    | 351/630 [6:51:54<6:20:38, 81.86s/it]
 56%|█████▌    | 352/630 [6:53:16<6:19:57, 82.01s/it]
 56%|█████▌    | 353/630 [6:54:38<6:18:02, 81.89s/it]
 56%|█████▌    | 354/630 [6:55:59<6:16:28, 81.84s/it]
 56%|█████▋    | 355/630 [6:57:21<6:15:09, 81.85s/it]
 57%|█████▋    | 356/630 [6:58:43<6:13:49, 81.86s/it]
 57%|█████▋    | 357/630 [7:00:05<6:12:12, 81.81s/it]
 57%|█████▋    | 358/630 [7:01:27<6:11:10, 81.88s/it]
 57%|█████▋    | 359/630 [7:02:49<6:10:06, 81.94s/it]
 57%|█████▋    | 360/630 [7:04:11<6:08:23, 81.87s/it]
                                                     
{'loss': 0.1392, 'learning_rate': 0.0001, 'global_step': 360, 'epoch': 1.71}
 57%|█████▋    | 361/630 [7:05:32<6:06:45, 81.80s/it]
 57%|█████▋    | 362/630 [7:06:54<6:05:23, 81.80s/it]
 58%|█████▊    | 363/630 [7:08:16<6:04:07, 81.82s/it]
 58%|█████▊    | 364/630 [7:09:38<6:02:54, 81.86s/it]
 58%|█████▊    | 365/630 [7:11:00<6:01:48, 81.92s/it]
 58%|█████▊    | 366/630 [7:12:22<6:00:41, 81.98s/it]
 58%|█████▊    | 367/630 [7:13:44<5:59:12, 81.95s/it]
 58%|█████▊    | 368/630 [7:15:06<5:57:41, 81.92s/it]
 59%|█████▊    | 369/630 [7:16:28<5:56:16, 81.90s/it]
 59%|█████▊    | 370/630 [7:17:49<5:54:34, 81.83s/it]
 59%|█████▉    | 371/630 [7:19:11<5:53:00, 81.78s/it]
 59%|█████▉    | 372/630 [7:20:33<5:51:44, 81.80s/it]
 59%|█████▉    | 373/630 [7:21:55<5:50:29, 81.83s/it]
 59%|█████▉    | 374/630 [7:23:17<5:49:10, 81.84s/it]
 60%|█████▉    | 375/630 [7:24:31<5:38:05, 79.55s/it]
 60%|█████▉    | 376/630 [7:25:53<5:39:45, 80.26s/it]
 60%|█████▉    | 377/630 [7:27:12<5:37:05, 79.94s/it]
 60%|██████    | 378/630 [7:28:34<5:38:08, 80.51s/it]
 60%|██████    | 379/630 [7:29:56<5:38:56, 81.02s/it]
 60%|██████    | 380/630 [7:31:18<5:38:18, 81.19s/it]
                                                     
{'loss': 0.204, 'learning_rate': 0.0001, 'global_step': 380, 'epoch': 1.81}
 60%|██████    | 381/630 [7:32:40<5:38:01, 81.45s/it]
 61%|██████    | 382/630 [7:34:02<5:37:44, 81.71s/it]
 61%|██████    | 383/630 [7:35:24<5:36:36, 81.77s/it]
 61%|██████    | 384/630 [7:36:46<5:35:27, 81.82s/it]
 61%|██████    | 385/630 [7:38:08<5:34:45, 81.98s/it]
 61%|██████▏   | 386/630 [7:39:30<5:33:19, 81.96s/it]
 61%|██████▏   | 387/630 [7:40:52<5:31:52, 81.95s/it]
 62%|██████▏   | 388/630 [7:42:14<5:30:28, 81.94s/it]
 62%|██████▏   | 389/630 [7:43:36<5:29:04, 81.93s/it]
 62%|██████▏   | 390/630 [7:44:54<5:23:01, 80.76s/it]
 62%|██████▏   | 391/630 [7:46:16<5:23:12, 81.14s/it]
 62%|██████▏   | 392/630 [7:47:37<5:21:51, 81.14s/it]
 62%|██████▏   | 393/630 [7:48:59<5:21:04, 81.29s/it]
 63%|██████▎   | 394/630 [7:50:19<5:18:31, 80.98s/it]
 63%|██████▎   | 395/630 [7:51:41<5:18:13, 81.25s/it]
 63%|██████▎   | 396/630 [7:53:02<5:17:20, 81.37s/it]
 63%|██████▎   | 397/630 [7:54:24<5:16:31, 81.51s/it]
 63%|██████▎   | 398/630 [7:55:46<5:15:19, 81.55s/it]
 63%|██████▎   | 399/630 [7:57:08<5:14:20, 81.65s/it]
 63%|██████▎   | 400/630 [7:58:30<5:13:11, 81.70s/it]
                                                     
{'loss': 0.1626, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 1.9}
 64%|██████▎   | 401/630 [7:59:51<5:11:58, 81.74s/it]
 64%|██████▍   | 402/630 [8:01:14<5:10:56, 81.83s/it]
 64%|██████▍   | 403/630 [8:02:35<5:09:31, 81.81s/it]
 64%|██████▍   | 404/630 [8:03:57<5:07:46, 81.71s/it]
 64%|██████▍   | 405/630 [8:05:18<5:06:18, 81.68s/it]
 64%|██████▍   | 406/630 [8:06:40<5:05:04, 81.71s/it]
 65%|██████▍   | 407/630 [8:07:57<4:58:42, 80.37s/it]
 65%|██████▍   | 408/630 [8:09:19<4:59:03, 80.83s/it]
 65%|██████▍   | 409/630 [8:10:41<4:58:51, 81.14s/it]
 65%|██████▌   | 410/630 [8:12:03<4:58:13, 81.33s/it]
 65%|██████▌   | 411/630 [8:13:24<4:56:51, 81.33s/it]
 65%|██████▌   | 412/630 [8:14:46<4:56:14, 81.54s/it]
 66%|██████▌   | 413/630 [8:16:09<4:55:43, 81.77s/it]
 66%|██████▌   | 414/630 [8:17:30<4:54:21, 81.77s/it]
 66%|██████▌   | 415/630 [8:18:52<4:53:06, 81.80s/it]
 66%|██████▌   | 416/630 [8:20:14<4:52:10, 81.92s/it]
 66%|██████▌   | 417/630 [8:21:37<4:51:12, 82.03s/it]
 66%|██████▋   | 418/630 [8:22:58<4:49:25, 81.91s/it]
 67%|██████▋   | 419/630 [8:24:20<4:48:00, 81.90s/it]
 67%|██████▋   | 420/630 [8:25:42<4:46:26, 81.84s/it]
                                                     
{'loss': 0.146, 'learning_rate': 0.0001, 'global_step': 420, 'epoch': 2.0}
 67%|██████▋   | 421/630 [8:26:59<4:40:28, 80.52s/it]
 67%|██████▋   | 422/630 [8:28:21<4:40:41, 80.97s/it]
 67%|██████▋   | 423/630 [8:29:43<4:40:24, 81.28s/it]
 67%|██████▋   | 424/630 [8:31:05<4:39:34, 81.43s/it]
 67%|██████▋   | 425/630 [8:32:27<4:38:49, 81.61s/it]
 68%|██████▊   | 426/630 [8:33:49<4:37:52, 81.73s/it]
 68%|██████▊   | 427/630 [8:35:11<4:36:34, 81.75s/it]
 68%|██████▊   | 428/630 [8:36:33<4:35:29, 81.83s/it]
 68%|██████▊   | 429/630 [8:37:55<4:34:05, 81.82s/it]
 68%|██████▊   | 430/630 [8:39:17<4:32:42, 81.81s/it]
 68%|██████▊   | 431/630 [8:40:38<4:31:22, 81.82s/it]
 69%|██████▊   | 432/630 [8:42:00<4:30:00, 81.82s/it]
 69%|██████▊   | 433/630 [8:43:22<4:28:39, 81.82s/it]
 69%|██████▉   | 434/630 [8:44:44<4:27:40, 81.94s/it]
 69%|██████▉   | 435/630 [8:46:07<4:26:34, 82.02s/it]
 69%|██████▉   | 436/630 [8:47:28<4:25:00, 81.96s/it]
 69%|██████▉   | 437/630 [8:48:50<4:23:14, 81.84s/it]
 70%|██████▉   | 438/630 [8:50:11<4:21:36, 81.75s/it]
 70%|██████▉   | 439/630 [8:51:33<4:20:04, 81.70s/it]
 70%|██████▉   | 440/630 [8:52:55<4:18:33, 81.65s/it]
                                                     
{'loss': 0.0404, 'learning_rate': 0.0001, 'global_step': 440, 'epoch': 2.09}
 70%|███████   | 441/630 [8:54:15<4:15:47, 81.20s/it]
 70%|███████   | 442/630 [8:55:36<4:14:57, 81.37s/it]
 70%|███████   | 443/630 [8:56:58<4:13:57, 81.48s/it]
 70%|███████   | 444/630 [8:58:20<4:12:36, 81.49s/it]
 71%|███████   | 445/630 [8:59:41<4:10:46, 81.33s/it]
 71%|███████   | 446/630 [9:01:02<4:09:37, 81.40s/it]
 71%|███████   | 447/630 [9:02:24<4:08:20, 81.42s/it]
 71%|███████   | 448/630 [9:03:45<4:07:05, 81.46s/it]
 71%|███████▏  | 449/630 [9:05:07<4:05:57, 81.54s/it]
 71%|███████▏  | 450/630 [9:06:28<4:04:32, 81.51s/it]
 72%|███████▏  | 451/630 [9:07:49<4:02:25, 81.26s/it]
 72%|███████▏  | 452/630 [9:09:11<4:01:28, 81.40s/it]
 72%|███████▏  | 453/630 [9:10:33<4:00:24, 81.50s/it]
 72%|███████▏  | 454/630 [9:11:54<3:59:00, 81.48s/it]
 72%|███████▏  | 455/630 [9:13:16<3:57:44, 81.51s/it]
 72%|███████▏  | 456/630 [9:14:37<3:56:00, 81.38s/it]
 73%|███████▎  | 457/630 [9:15:58<3:55:00, 81.50s/it]
 73%|███████▎  | 458/630 [9:17:21<3:54:14, 81.71s/it]
 73%|███████▎  | 459/630 [9:18:42<3:52:54, 81.72s/it]
 73%|███████▎  | 460/630 [9:20:04<3:51:25, 81.68s/it]
                                                     
{'loss': 0.0304, 'learning_rate': 0.0001, 'global_step': 460, 'epoch': 2.19}
 73%|███████▎  | 461/630 [9:21:26<3:50:00, 81.66s/it]
 73%|███████▎  | 462/630 [9:22:47<3:48:33, 81.63s/it]
 73%|███████▎  | 463/630 [9:24:09<3:47:08, 81.61s/it]
 74%|███████▎  | 464/630 [9:25:30<3:45:46, 81.61s/it]
 74%|███████▍  | 465/630 [9:26:52<3:44:42, 81.71s/it]
 74%|███████▍  | 466/630 [9:28:14<3:43:21, 81.72s/it]
 74%|███████▍  | 467/630 [9:29:36<3:42:12, 81.79s/it]
 74%|███████▍  | 468/630 [9:30:57<3:40:21, 81.62s/it]
 74%|███████▍  | 469/630 [9:32:19<3:39:05, 81.65s/it]
 75%|███████▍  | 470/630 [9:33:41<3:37:47, 81.67s/it]
 75%|███████▍  | 471/630 [9:35:02<3:36:35, 81.73s/it]
 75%|███████▍  | 472/630 [9:36:24<3:34:59, 81.64s/it]
 75%|███████▌  | 473/630 [9:37:45<3:33:10, 81.47s/it]
 75%|███████▌  | 474/630 [9:39:07<3:32:01, 81.55s/it]
 75%|███████▌  | 475/630 [9:40:28<3:30:50, 81.62s/it]
 76%|███████▌  | 476/630 [9:41:50<3:29:33, 81.65s/it]
 76%|███████▌  | 477/630 [9:43:12<3:28:33, 81.79s/it]
 76%|███████▌  | 478/630 [9:44:34<3:27:07, 81.76s/it]
 76%|███████▌  | 479/630 [9:45:56<3:25:39, 81.72s/it]
 76%|███████▌  | 480/630 [9:47:15<3:22:28, 80.99s/it]
{'loss': 0.0166, 'learning_rate': 0.0001, 'global_step': 480, 'epoch': 2.28}
 76%|███████▋  | 481/630 [9:48:36<3:21:27, 81.13s/it]
 77%|███████▋  | 482/630 [9:49:58<3:20:18, 81.21s/it]
 77%|███████▋  | 483/630 [9:51:19<3:19:19, 81.36s/it]
 77%|███████▋  | 484/630 [9:52:41<3:18:21, 81.52s/it]
 77%|███████▋  | 485/630 [9:54:03<3:17:05, 81.56s/it]
 77%|███████▋  | 486/630 [9:55:24<3:15:38, 81.52s/it]
 77%|███████▋  | 487/630 [9:56:46<3:14:14, 81.50s/it]
 77%|███████▋  | 488/630 [9:58:07<3:12:50, 81.48s/it]
 78%|███████▊  | 489/630 [9:59:29<3:11:25, 81.46s/it]
 78%|███████▊  | 490/630 [10:00:50<3:09:45, 81.33s/it]
 78%|███████▊  | 491/630 [10:02:11<3:08:37, 81.42s/it]
 78%|███████▊  | 492/630 [10:03:31<3:06:00, 80.87s/it]
 78%|███████▊  | 493/630 [10:04:53<3:05:21, 81.18s/it]
 78%|███████▊  | 494/630 [10:06:15<3:04:28, 81.38s/it]
 79%|███████▊  | 495/630 [10:07:37<3:03:27, 81.54s/it]
 79%|███████▊  | 496/630 [10:08:59<3:02:22, 81.66s/it]
 79%|███████▉  | 497/630 [10:10:20<3:01:11, 81.74s/it]
 79%|███████▉  | 498/630 [10:11:42<2:59:46, 81.72s/it]
 79%|███████▉  | 499/630 [10:13:04<2:58:25, 81.72s/it]
 79%|███████▉  | 500/630 [10:14:25<2:56:56, 81.66s/it]
{'loss': 0.0366, 'learning_rate': 0.0001, 'global_step': 500, 'epoch': 2.38}
 80%|███████▉  | 501/630 [10:15:47<2:55:31, 81.64s/it]
 80%|███████▉  | 502/630 [10:17:09<2:54:05, 81.60s/it]
 80%|███████▉  | 503/630 [10:18:30<2:52:40, 81.58s/it]
 80%|████████  | 504/630 [10:19:52<2:51:20, 81.59s/it]
 80%|████████  | 505/630 [10:21:13<2:50:06, 81.65s/it]
 80%|████████  | 506/630 [10:22:35<2:48:50, 81.70s/it]
 80%|████████  | 507/630 [10:23:57<2:47:38, 81.77s/it]
 81%|████████  | 508/630 [10:25:19<2:46:06, 81.69s/it]
 81%|████████  | 509/630 [10:26:41<2:45:01, 81.83s/it]
 81%|████████  | 510/630 [10:28:03<2:43:37, 81.81s/it]
 81%|████████  | 511/630 [10:29:24<2:42:05, 81.72s/it]
 81%|████████▏ | 512/630 [10:30:46<2:40:38, 81.68s/it]
 81%|████████▏ | 513/630 [10:32:07<2:39:11, 81.64s/it]
 82%|████████▏ | 514/630 [10:33:29<2:37:49, 81.63s/it]
 82%|████████▏ | 515/630 [10:34:50<2:36:23, 81.60s/it]
 82%|████████▏ | 516/630 [10:36:12<2:35:00, 81.59s/it]
 82%|████████▏ | 517/630 [10:37:34<2:33:43, 81.63s/it]
 82%|████████▏ | 518/630 [10:38:55<2:32:26, 81.66s/it]
 82%|████████▏ | 519/630 [10:40:17<2:30:59, 81.62s/it]
 83%|████████▎ | 520/630 [10:41:38<2:29:08, 81.35s/it]
{'loss': 0.018, 'learning_rate': 0.0001, 'global_step': 520, 'epoch': 2.47}
 83%|████████▎ | 521/630 [10:42:59<2:28:00, 81.47s/it]
 83%|████████▎ | 522/630 [10:44:21<2:26:54, 81.61s/it]
 83%|████████▎ | 523/630 [10:45:43<2:25:34, 81.63s/it]
 83%|████████▎ | 524/630 [10:47:05<2:24:08, 81.59s/it]
 83%|████████▎ | 525/630 [10:48:26<2:22:46, 81.58s/it]
 83%|████████▎ | 526/630 [10:49:48<2:21:22, 81.56s/it]
 84%|████████▎ | 527/630 [10:51:09<2:19:59, 81.55s/it]
 84%|████████▍ | 528/630 [10:52:30<2:18:23, 81.40s/it]
 84%|████████▍ | 529/630 [10:53:52<2:17:16, 81.55s/it]
 84%|████████▍ | 530/630 [10:55:14<2:15:53, 81.53s/it]
 84%|████████▍ | 531/630 [10:56:35<2:14:33, 81.55s/it]
 84%|████████▍ | 532/630 [10:57:57<2:13:09, 81.52s/it]
 85%|████████▍ | 533/630 [10:59:18<2:11:45, 81.50s/it]
 85%|████████▍ | 534/630 [11:00:40<2:10:41, 81.68s/it]
 85%|████████▍ | 535/630 [11:02:02<2:09:35, 81.85s/it]
 85%|████████▌ | 536/630 [11:03:18<2:05:08, 79.88s/it]
 85%|████████▌ | 537/630 [11:04:40<2:04:51, 80.56s/it]
 85%|████████▌ | 538/630 [11:06:01<2:04:01, 80.88s/it]
 86%|████████▌ | 539/630 [11:07:24<2:03:12, 81.24s/it]
 86%|████████▌ | 540/630 [11:08:46<2:02:17, 81.53s/it]
{'loss': 0.0186, 'learning_rate': 0.0001, 'global_step': 540, 'epoch': 2.57}
 86%|████████▌ | 541/630 [11:10:08<2:01:09, 81.68s/it]
 86%|████████▌ | 542/630 [11:11:30<1:59:54, 81.76s/it]
 86%|████████▌ | 543/630 [11:12:52<1:58:38, 81.83s/it]
 86%|████████▋ | 544/630 [11:14:14<1:57:25, 81.93s/it]
 87%|████████▋ | 545/630 [11:15:36<1:56:07, 81.98s/it]
 87%|████████▋ | 546/630 [11:16:58<1:54:41, 81.92s/it]
 87%|████████▋ | 547/630 [11:18:19<1:53:13, 81.85s/it]
 87%|████████▋ | 548/630 [11:19:41<1:51:49, 81.82s/it]
 87%|████████▋ | 549/630 [11:21:02<1:50:07, 81.57s/it]
 87%|████████▋ | 550/630 [11:22:23<1:48:36, 81.46s/it]
 87%|████████▋ | 551/630 [11:23:45<1:47:22, 81.55s/it]
 88%|████████▊ | 552/630 [11:25:07<1:46:09, 81.65s/it]
 88%|████████▊ | 553/630 [11:26:23<1:42:39, 79.99s/it]
 88%|████████▊ | 554/630 [11:27:45<1:42:02, 80.56s/it]
 88%|████████▊ | 555/630 [11:29:07<1:41:06, 80.89s/it]
 88%|████████▊ | 556/630 [11:30:25<1:38:54, 80.20s/it]
 88%|████████▊ | 557/630 [11:31:47<1:38:06, 80.64s/it]
 89%|████████▊ | 558/630 [11:33:09<1:37:14, 81.03s/it]
 89%|████████▊ | 559/630 [11:34:31<1:36:06, 81.23s/it]
 89%|████████▉ | 560/630 [11:35:52<1:34:59, 81.42s/it]
{'loss': 0.0308, 'learning_rate': 0.0001, 'global_step': 560, 'epoch': 2.66}
 89%|████████▉ | 561/630 [11:37:14<1:33:44, 81.51s/it]
 89%|████████▉ | 562/630 [11:38:36<1:32:26, 81.56s/it]
 89%|████████▉ | 563/630 [11:39:58<1:31:18, 81.77s/it]
 90%|████████▉ | 564/630 [11:41:19<1:29:41, 81.54s/it]
 90%|████████▉ | 565/630 [11:42:41<1:28:19, 81.53s/it]
 90%|████████▉ | 566/630 [11:44:02<1:26:57, 81.52s/it]
 90%|█████████ | 567/630 [11:45:24<1:25:48, 81.72s/it]
 90%|█████████ | 568/630 [11:46:46<1:24:22, 81.66s/it]
 90%|█████████ | 569/630 [11:48:08<1:23:10, 81.81s/it]
 90%|█████████ | 570/630 [11:49:30<1:21:47, 81.79s/it]
 91%|█████████ | 571/630 [11:50:51<1:20:24, 81.76s/it]
 91%|█████████ | 572/630 [11:52:13<1:19:06, 81.84s/it]
 91%|█████████ | 573/630 [11:53:35<1:17:45, 81.84s/it]
 91%|█████████ | 574/630 [11:54:57<1:16:18, 81.76s/it]
 91%|█████████▏| 575/630 [11:56:18<1:14:51, 81.66s/it]
 91%|█████████▏| 576/630 [11:57:40<1:13:31, 81.69s/it]
 92%|█████████▏| 577/630 [11:59:02<1:12:15, 81.79s/it]
 92%|█████████▏| 578/630 [12:00:22<1:10:25, 81.26s/it]
 92%|█████████▏| 579/630 [12:01:44<1:09:11, 81.40s/it]
 92%|█████████▏| 580/630 [12:03:05<1:07:51, 81.43s/it]
{'loss': 0.0536, 'learning_rate': 0.0001, 'global_step': 580, 'epoch': 2.76}
 92%|█████████▏| 581/630 [12:04:27<1:06:33, 81.50s/it]
 92%|█████████▏| 582/630 [12:05:49<1:05:18, 81.64s/it]
 93%|█████████▎| 583/630 [12:07:11<1:03:58, 81.66s/it]
 93%|█████████▎| 584/630 [12:08:33<1:02:40, 81.74s/it]
 93%|█████████▎| 585/630 [12:09:55<1:01:21, 81.82s/it]
 93%|█████████▎| 586/630 [12:11:16<59:58, 81.79s/it]
 93%|█████████▎| 587/630 [12:12:38<58:36, 81.77s/it]
 93%|█████████▎| 588/630 [12:14:00<57:11, 81.71s/it]
 93%|█████████▎| 589/630 [12:15:21<55:50, 81.73s/it]
 94%|█████████▎| 590/630 [12:16:43<54:31, 81.80s/it]
 94%|█████████▍| 591/630 [12:18:05<53:08, 81.76s/it]
 94%|█████████▍| 592/630 [12:19:27<51:46, 81.76s/it]
 94%|█████████▍| 593/630 [12:20:49<50:26, 81.79s/it]
 94%|█████████▍| 594/630 [12:22:11<49:05, 81.83s/it]
 94%|█████████▍| 595/630 [12:23:33<47:46, 81.91s/it]
 95%|█████████▍| 596/630 [12:24:55<46:25, 81.92s/it]
 95%|█████████▍| 597/630 [12:26:16<45:01, 81.87s/it]
 95%|█████████▍| 598/630 [12:27:38<43:40, 81.88s/it]
 95%|█████████▌| 599/630 [12:29:00<42:21, 81.97s/it]
 95%|█████████▌| 600/630 [12:30:22<40:58, 81.97s/it]
{'loss': 0.0563, 'learning_rate': 0.0001, 'global_step': 600, 'epoch': 2.85}
 95%|█████████▌| 601/630 [12:31:44<39:33, 81.84s/it]
 96%|█████████▌| 602/630 [12:33:06<38:11, 81.83s/it]
 96%|█████████▌| 603/630 [12:34:26<36:32, 81.21s/it]
 96%|█████████▌| 604/630 [12:35:47<35:16, 81.39s/it]
 96%|█████████▌| 605/630 [12:37:09<33:57, 81.50s/it]
 96%|█████████▌| 606/630 [12:38:31<32:38, 81.59s/it]
 96%|█████████▋| 607/630 [12:39:52<31:16, 81.57s/it]
 97%|█████████▋| 608/630 [12:41:14<29:56, 81.64s/it]
 97%|█████████▋| 609/630 [12:42:36<28:35, 81.68s/it]
 97%|█████████▋| 610/630 [12:43:58<27:14, 81.75s/it]
 97%|█████████▋| 611/630 [12:45:20<25:53, 81.75s/it]
 97%|█████████▋| 612/630 [12:46:41<24:31, 81.76s/it]
 97%|█████████▋| 613/630 [12:48:03<23:10, 81.77s/it]
 97%|█████████▋| 614/630 [12:49:25<21:48, 81.78s/it]
 98%|█████████▊| 615/630 [12:50:47<20:25, 81.72s/it]
 98%|█████████▊| 616/630 [12:52:08<19:04, 81.74s/it]
 98%|█████████▊| 617/630 [12:53:30<17:42, 81.71s/it]
 98%|█████████▊| 618/630 [12:54:52<16:21, 81.81s/it]
 98%|█████████▊| 619/630 [12:56:14<14:59, 81.77s/it]
 98%|█████████▊| 620/630 [12:57:35<13:36, 81.68s/it]
{'loss': 0.0242, 'learning_rate': 0.0001, 'global_step': 620, 'epoch': 2.95}
 99%|█████████▊| 621/630 [12:58:57<12:16, 81.80s/it]
 99%|█████████▊| 622/630 [13:00:19<10:53, 81.71s/it]
 99%|█████████▉| 623/630 [13:01:41<09:31, 81.71s/it]
 99%|█████████▉| 624/630 [13:03:02<08:09, 81.66s/it]
 99%|█████████▉| 625/630 [13:04:21<06:44, 80.87s/it]
 99%|█████████▉| 626/630 [13:05:43<05:24, 81.09s/it]
100%|█████████▉| 627/630 [13:07:04<04:03, 81.23s/it]
100%|█████████▉| 628/630 [13:08:26<02:42, 81.33s/it]
100%|█████████▉| 629/630 [13:09:48<01:21, 81.48s/it]
100%|██████████| 630/630 [13:11:09<00:00, 81.50s/it]
{'train_runtime': 47469.8541, 'train_samples_per_second': 0.159, 'train_steps_per_second': 0.013, 'train_loss': 0.24650317042592973, 'epoch': 3.0}
100%|██████████| 630/630 [13:11:09<00:00, 75.35s/it]
***** train metrics *****
  epoch                    =         3.0
  train_loss               =      0.2465
  train_runtime            = 13:11:09.85
  train_samples_per_second =       0.159
  train_steps_per_second   =       0.013
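The summary metrics above are plain ratios of the run totals. Below is a minimal Python sketch that recomputes them. It assumes that the number of training samples is the dataset size (2523, from the data-loading line at the top of this log) multiplied by the 3 epochs. The helper names `summary_metrics` and `format_runtime` are hypothetical and are not part of the training script.

```python
import datetime

# Values taken from this log; NUM_EXAMPLES comes from "there are 2523 data in dataset".
NUM_EXAMPLES = 2523
NUM_EPOCHS = 3
MAX_STEPS = 630            # 630/630 in the progress bar
RUNTIME_S = 47469.8541     # 'train_runtime' in seconds


def summary_metrics(num_examples, num_epochs, max_steps, runtime_s):
    """Hypothetical helper: derive throughput figures from run totals.

    Assumption: samples processed = dataset size * epochs.
    """
    samples = num_examples * num_epochs
    return {
        "train_samples_per_second": round(samples / runtime_s, 3),
        "train_steps_per_second": round(max_steps / runtime_s, 3),
        # Overall average shown on the final progress-bar line (s/it).
        "avg_seconds_per_step": round(runtime_s / max_steps, 2),
        # Optimizer steps per epoch implied by 630 steps over 3 epochs.
        "steps_per_epoch": max_steps // num_epochs,
    }


def format_runtime(seconds):
    """Hypothetical helper: render seconds as H:MM:SS.cc (hundredths truncated)."""
    whole = int(seconds)
    hundredths = int((seconds - whole) * 100)
    return f"{datetime.timedelta(seconds=whole)}.{hundredths:02d}"


print(summary_metrics(NUM_EXAMPLES, NUM_EPOCHS, MAX_STEPS, RUNTIME_S))
print(format_runtime(RUNTIME_S))
```

With these inputs, the sketch reproduces the logged 0.159 samples/s, 0.013 steps/s, the 75.35 s/it overall average, and the 13:11:09.85 runtime. The overall average is lower than the roughly 81.5 s/it rate shown near the end of the run, which suggests that earlier steps ran faster.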