model training desc: CCLUE-MRC dataset, trained using randomly selected key sentences
2023-12-04 18:36:01.480 | INFO | __main__:init_components:108 - Initializing components...
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 12.37s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-04 18:36:26.640 | INFO | __main__:init_components:155 -
2023-12-04 18:36:26.640 | INFO | __main__:init_components:156 - ********************
2023-12-04 18:36:26.640 | INFO | __main__:init_components:157 - using llama2 model
2023-12-04 18:36:26.640 | INFO | __main__:init_components:158 - ********************
2023-12-04 18:36:26.640 | INFO | __main__:init_components:159 -
memory footprint of model: 5.472740173339844 GB
trainable params: 319,815,680 || all params: 7,447,007,232 || trainable%: 4.294553100818044
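The `trainable%` figure on the line above can be re-derived from the two parameter counts. A minimal sketch, assuming the percentage is simply trainable parameters divided by all parameters (the helper name `trainable_percent` is hypothetical, not taken from the training script):

```python
def trainable_percent(trainable: int, total: int) -> float:
    """Share of parameters that receive gradients, in percent."""
    return 100.0 * trainable / total


# Counts copied from the log line above.
pct = trainable_percent(319_815_680, 7_447_007_232)
print(f"trainable%: {pct:.4f}")
```

Roughly 4.3% of the weights are updated, which is what one would expect from an adapter-style fine-tune rather than full fine-tuning; the log itself does not name the method.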
2023-12-04 18:37:07.959 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/CCLUE/random_select/cclue_random_1400_instruct/train.jsonl
2023-12-04 18:37:07.983 | INFO | component.dataset:__init__:19 - there are 3473 data in dataset
2023-12-04 18:37:08.021 | INFO | __main__:main:231 - *** starting training ***
4%|▍ | 20/522 [07:37<3:01:57, 21.75s/it] {'loss': 0.2804, 'learning_rate': 3.7735849056603776e-05, 'global_step': 20, 'epoch': 0.11}
8%|▊ | 40/522 [15:32<3:14:21, 24.19s/it] {'loss': 0.2879, 'learning_rate': 7.547169811320755e-05, 'global_step': 40, 'epoch': 0.23}
11%|█▏ | 60/522 [23:11<2:49:30, 22.01s/it] {'loss': 0.2889, 'learning_rate': 0.0001, 'global_step': 60, 'epoch': 0.34}
15%|█▌ | 80/522 [30:43<2:40:38, 21.81s/it] {'loss': 0.2898, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.46}
19%|█▉ | 100/522 [38:43<2:44:28, 23.39s/it] {'loss': 0.2622, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.57}
23%|██▎ | 120/522 [46:34<2:42:22, 24.24s/it] {'loss': 0.2734, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.69}
27%|██▋ | 140/522 [54:37<2:32:03, 23.88s/it] {'loss': 0.2923, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.8}
31%|███ | 160/522 [1:01:43<2:05:12, 20.75s/it] {'loss': 0.2827, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.92}
34%|███▍ | 180/522 [1:09:10<2:06:21, 22.17s/it] {'loss': 0.2054, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 1.03}
38%|███▊ | 200/522 [1:17:10<2:04:12, 23.14s/it] {'loss': 0.1518, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 1.15}
42%|████▏ | 220/522 [1:24:58<1:51:31, 22.16s/it] {'loss': 0.1561, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 1.26}
46%|████▌ | 240/522 [1:32:37<1:46:46, 22.72s/it] {'loss': 0.1274, 'learning_rate': 0.0001, 'global_step': 240, 'epoch': 1.38}
50%|████▉ | 260/522 [1:40:11<1:50:27, 25.29s/it] {'loss': 0.1318, 'learning_rate': 0.0001, 'global_step': 260, 'epoch': 1.49}
54%|█████▎ | 280/522 [1:47:57<1:36:29, 23.92s/it] {'loss': 0.1201, 'learning_rate': 0.0001, 'global_step': 280, 'epoch': 1.61}
57%|█████▋ | 300/522 [1:55:46<1:23:28, 22.56s/it] {'loss': 0.1661, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.72}
61%|██████▏ | 320/522 [2:03:45<1:21:05, 24.09s/it] {'loss': 0.1922, 'learning_rate': 0.0001, 'global_step': 320, 'epoch': 1.84}
65%|██████▌ | 340/522 [2:11:33<1:12:14, 23.82s/it] {'loss': 0.1123, 'learning_rate': 0.0001, 'global_step': 340, 'epoch': 1.95}
69%|██████▉ | 360/522 [2:19:33<1:02:01, 22.97s/it] {'loss': 0.0836, 'learning_rate': 0.0001, 'global_step': 360, 'epoch': 2.07}
73%|███████▎ | 380/522 [2:27:13<51:50, 21.91s/it] {'loss': 0.0435, 'learning_rate': 0.0001, 'global_step': 380, 'epoch': 2.18}
77%|███████▋ | 400/522 [2:35:14<46:41, 22.96s/it] {'loss': 0.0541, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 2.3}
80%|████████ | 420/522 [2:42:42<35:23, 20.82s/it] {'loss': 0.0895, 'learning_rate': 0.0001, 'global_step': 420, 'epoch': 2.41}
84%|████████▍ | 440/522 [2:50:06<31:18, 22.91s/it] {'loss': 0.0636, 'learning_rate': 0.0001, 'global_step': 440, 'epoch': 2.53}
88%|████████▊ | 460/522 [2:57:57<25:04, 24.26s/it] {'loss': 0.0681, 'learning_rate': 0.0001, 'global_step': 460, 'epoch': 2.64}
92%|█████████▏| 480/522 [3:05:51<17:10, 24.53s/it] {'loss': 0.0669, 'learning_rate': 0.0001, 'global_step': 480, 'epoch': 2.76}
96%|█████████▌| 500/522 [3:13:34<08:26, 23.03s/it] {'loss': 0.0711, 'learning_rate': 0.0001, 'global_step': 500, 'epoch': 2.87}
100%|█████████▉| 520/522 [3:21:46<00:43, 21.99s/it] {'loss': 0.0772, 'learning_rate': 0.0001, 'global_step': 520, 'epoch': 2.99}
100%|██████████| 522/522 [3:22:24<00:00, 20.34s/it] {'train_runtime': 12144.157, 'train_samples_per_second': 0.858, 'train_steps_per_second': 0.043, 'train_loss': 0.16264161262018928, 'epoch': 3.0}
100%|██████████| 522/522 [3:22:24<00:00, 23.26s/it]
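The logged `learning_rate` values (3.77e-05 at step 20, 7.55e-05 at step 40, then 0.0001 from step 60 onward) are consistent with a linear warmup followed by a constant rate. A minimal sketch of such a schedule, assuming a peak of 1e-4 and 53 warmup steps; the value 53 is inferred from the logged numbers and is not stated anywhere in the log, and `lr_at` is a hypothetical helper rather than the trainer's own scheduler:

```python
def lr_at(step: int, peak_lr: float = 1e-4, warmup_steps: int = 53) -> float:
    """Linear warmup to peak_lr, then constant.

    warmup_steps=53 is an assumption inferred from the logged values.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


# Compare against the steps where the log reports a learning rate.
for step in (20, 40, 60):
    print(step, lr_at(step))
```

Under these assumptions the sketch matches the logged rates at steps 20, 40 and 60; other warmup lengths would not reproduce the step-20 value.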
***** train metrics *****
epoch = 3.0
train_loss = 0.1626
train_runtime = 3:22:24.15
train_samples_per_second = 0.858
train_steps_per_second = 0.043
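The throughput metrics above can be cross-checked against the dataset size (3473 examples), the step count (522) and the runtime reported earlier in the log. A minimal sketch; the effective batch size of 20 is an assumption inferred from 174 steps per epoch and does not appear in the log:

```python
import math

# Values copied from the log.
num_samples = 3473
epochs = 3
total_steps = 522
runtime_s = 12144.157

steps_per_epoch = total_steps // epochs
samples_per_second = num_samples * epochs / runtime_s
steps_per_second = total_steps / runtime_s

# Assumption: per-device batch size times gradient accumulation equals 20.
effective_batch = 20
implied_steps_per_epoch = math.ceil(num_samples / effective_batch)

print(steps_per_epoch, implied_steps_per_epoch)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

Both rounded rates agree with the reported `train_samples_per_second` and `train_steps_per_second`, and an effective batch of 20 reproduces the 174 optimizer steps per epoch.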