model training desc: CCLUE-MRC dataset, trained using randomly selected key sentences
2023-12-06 10:40:28.159 | INFO | __main__:init_components:108 - Initializing components...
Loading checkpoint shards: 100%|██████████| 2/2 [00:53<00:00, 26.53s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-06 10:41:21.733 | INFO | __main__:init_components:155 -
2023-12-06 10:41:21.733 | INFO | __main__:init_components:156 - ********************
2023-12-06 10:41:21.733 | INFO | __main__:init_components:157 - using llama2 model
2023-12-06 10:41:21.733 | INFO | __main__:init_components:158 - ********************
2023-12-06 10:41:21.733 | INFO | __main__:init_components:159 -
memory footprint of model: 5.472740173339844 GB
trainable params: 319,815,680 || all params: 7,447,007,232 || trainable%: 4.294553100818044
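Note: the reported `trainable%` can be reproduced from the two parameter counts on the line above. A minimal sketch, assuming the percentage is simply trainable parameters divided by all parameters (the variable names are illustrative, not taken from the training script):

```python
# Counts copied from the "trainable params" log line above.
trainable_params = 319_815_680
all_params = 7_447_007_232

# Share of parameters that receive gradient updates, in percent.
trainable_pct = 100 * trainable_params / all_params
print(f"trainable%: {trainable_pct:.4f}")
```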
2023-12-06 10:42:04.751 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/CCLUE/random_select/cclue_chunk_1400_instruct/train.jsonl
2023-12-06 10:42:04.804 | INFO | component.dataset:__init__:19 - there are 3473 data in dataset
2023-12-06 10:42:04.946 | INFO | __main__:main:231 - *** starting training ***
4%|▍ | 20/522 [07:57<3:28:07, 24.88s/it] {'loss': 0.3277, 'learning_rate': 3.7735849056603776e-05, 'global_step': 20, 'epoch': 0.11}
8%|▊ | 40/522 [15:51<2:59:28, 22.34s/it] {'loss': 0.2854, 'learning_rate': 7.358490566037736e-05, 'global_step': 40, 'epoch': 0.23}
11%|█▏ | 60/522 [23:36<3:06:10, 24.18s/it] {'loss': 0.2546, 'learning_rate': 0.0001, 'global_step': 60, 'epoch': 0.34}
15%|█▌ | 80/522 [31:16<3:00:12, 24.46s/it] {'loss': 0.2746, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.46}
19%|█▉ | 100/522 [38:59<2:50:46, 24.28s/it] {'loss': 0.2706, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.57}
23%|██▎ | 120/522 [46:37<2:47:25, 24.99s/it] {'loss': 0.2405, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.69}
27%|██▋ | 140/522 [54:08<2:32:13, 23.91s/it] {'loss': 0.31, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.8}
31%|███ | 160/522 [1:01:56<2:29:50, 24.83s/it] {'loss': 0.3137, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.92}
34%|███▍ | 180/522 [1:09:27<2:13:28, 23.42s/it] {'loss': 0.2409, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 1.03}
38%|███▊ | 200/522 [1:17:35<2:10:47, 24.37s/it] {'loss': 0.1111, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 1.15}
42%|████▏ | 220/522 [1:24:59<1:58:56, 23.63s/it] {'loss': 0.1792, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 1.26}
46%|████▌ | 240/522 [1:32:20<1:36:29, 20.53s/it] {'loss': 0.1353, 'learning_rate': 0.0001, 'global_step': 240, 'epoch': 1.38}
50%|████▉ | 260/522 [1:40:15<1:44:07, 23.85s/it] {'loss': 0.1418, 'learning_rate': 0.0001, 'global_step': 260, 'epoch': 1.49}
54%|█████▎ | 280/522 [1:47:55<1:37:08, 24.09s/it] {'loss': 0.1249, 'learning_rate': 0.0001, 'global_step': 280, 'epoch': 1.61}
57%|█████▋ | 300/522 [1:55:38<1:28:32, 23.93s/it] {'loss': 0.1365, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.72}
61%|██████▏ | 320/522 [2:03:32<1:22:12, 24.42s/it] {'loss': 0.1525, 'learning_rate': 0.0001, 'global_step': 320, 'epoch': 1.84}
65%|██████▌ | 340/522 [2:11:15<1:14:51, 24.68s/it] {'loss': 0.1233, 'learning_rate': 0.0001, 'global_step': 340, 'epoch': 1.95}
69%|██████▉ | 360/522 [2:19:01<1:06:16, 24.55s/it] {'loss': 0.0656, 'learning_rate': 0.0001, 'global_step': 360, 'epoch': 2.07}
73%|███████▎ | 380/522 [2:26:45<53:36, 22.65s/it] {'loss': 0.0608, 'learning_rate': 0.0001, 'global_step': 380, 'epoch': 2.18}
77%|███████▋ | 400/522 [2:33:52<43:27, 21.37s/it] {'loss': 0.0805, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 2.3}
80%|████████ | 420/522 [2:41:48<38:15, 22.50s/it] {'loss': 0.0562, 'learning_rate': 0.0001, 'global_step': 420, 'epoch': 2.41}
84%|████████▍ | 440/522 [2:49:32<30:25, 22.27s/it] {'loss': 0.0932, 'learning_rate': 0.0001, 'global_step': 440, 'epoch': 2.53}
88%|████████▊ | 460/522 [2:56:57<24:12, 23.43s/it] {'loss': 0.0551, 'learning_rate': 0.0001, 'global_step': 460, 'epoch': 2.64}
92%|█████████▏| 480/522 [3:05:10<17:11, 24.56s/it] {'loss': 0.0659, 'learning_rate': 0.0001, 'global_step': 480, 'epoch': 2.76}
96%|█████████▌| 500/522 [3:12:58<08:50, 24.12s/it] {'loss': 0.0959, 'learning_rate': 0.0001, 'global_step': 500, 'epoch': 2.87}
100%|█████████▉| 520/522 [3:20:39<00:44, 22.49s/it] {'loss': 0.0682, 'learning_rate': 0.0001, 'global_step': 520, 'epoch': 2.99}
100%|██████████| 522/522 [3:21:19<00:00, 21.10s/it] {'train_runtime': 12079.5406, 'train_samples_per_second': 0.863, 'train_steps_per_second': 0.043, 'train_loss': 0.16342519847783207, 'epoch': 3.0}
100%|██████████| 522/522 [3:21:19<00:00, 23.14s/it]
***** train metrics *****
epoch = 3.0
train_loss = 0.1634
train_runtime = 3:21:19.54
train_samples_per_second = 0.863
train_steps_per_second = 0.043
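
Note: the throughput figures above follow from numbers already in this log (3473 training examples, 522 optimizer steps, 3.0 epochs, a runtime of 12079.5406 s). The sketch below recomputes them as a consistency check. The variable names are illustrative. The log does not state the batch size, so the samples-per-step figure is only an inferred approximation.

```python
from datetime import timedelta

# Values copied from the log above.
num_samples = 3473            # "there are 3473 data in dataset"
total_steps = 522             # progress bar total
num_epochs = 3.0              # final 'epoch' value
train_runtime_s = 12079.5406  # 'train_runtime' in seconds

# Throughput as reported in the train metrics.
samples_per_second = num_samples * num_epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

# Steps per epoch, and the average number of samples consumed per step.
# The second value is an inference, not something the log reports.
steps_per_epoch = total_steps / num_epochs
approx_samples_per_step = num_samples / steps_per_epoch

runtime_hms = str(timedelta(seconds=train_runtime_s))

print(f"samples/s: {samples_per_second:.3f}")
print(f"steps/s:   {steps_per_second:.3f}")
print(f"steps/epoch: {steps_per_epoch:.0f}")
print(f"approx. samples/step: {approx_samples_per_step:.2f}")
print(f"runtime: {runtime_hms}")
```

The roughly 20 samples per step would be consistent with an effective batch size of about 20 (per-device batch size times gradient accumulation steps times number of devices), but the actual settings are not recorded here.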