| WARNING:torch.distributed.run: | |
| ***************************************** | |
| Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| ***************************************** | |
| model training desc: training with randomly selected key sentences | |
| 2023-12-06 10:34:27.251 | INFO | __main__:init_components:108 - Initializing components... | |
| model training desc: training with randomly selected key sentences | |
| 2023-12-06 10:34:27.257 | INFO | __main__:init_components:108 - Initializing components... | |
| You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 | |
| 2023-12-06 10:34:47.365 | INFO | __main__:init_components:143 - | |
| 2023-12-06 10:34:47.365 | INFO | __main__:init_components:144 - ******************** | |
| 2023-12-06 10:34:47.365 | INFO | __main__:init_components:145 - using TechGPT-7B | |
| 2023-12-06 10:34:47.365 | INFO | __main__:init_components:146 - ******************** | |
| 2023-12-06 10:34:47.365 | INFO | __main__:init_components:147 - | |
| memory footprint of model: 5.472740173339844 GB | |
| You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 | |
| 2023-12-06 10:34:48.089 | INFO | __main__:init_components:143 - | |
| 2023-12-06 10:34:48.090 | INFO | __main__:init_components:144 - ******************** | |
| 2023-12-06 10:34:48.090 | INFO | __main__:init_components:145 - using TechGPT-7B | |
| 2023-12-06 10:34:48.090 | INFO | __main__:init_components:146 - ******************** | |
| 2023-12-06 10:34:48.090 | INFO | __main__:init_components:147 - | |
| memory footprint of model: 5.472740173339844 GB | |
| trainable params: 319,815,680 || all params: 7,447,007,232 || trainable%: 4.294553100818044 | |
| 2023-12-06 10:34:50.010 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/NCR/random_select/ncr_chunk_1400_instruct/train.jsonl | |
| 2023-12-06 10:34:50.112 | INFO | component.dataset:__init__:19 - there are 15319 data in dataset | |
| 2023-12-06 10:34:50.170 | INFO | __main__:main:231 - *** starting training *** | |
| trainable params: 319,815,680 || all params: 7,447,007,232 || trainable%: 4.294553100818044 | |
| 2023-12-06 10:34:50.758 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/NCR/random_select/ncr_chunk_1400_instruct/train.jsonl | |
| 2023-12-06 10:34:50.857 | INFO | component.dataset:__init__:19 - there are 15319 data in dataset | |
| 2023-12-06 10:34:50.875 | INFO | __main__:main:231 - *** starting training *** | |
| 0%| | 0/1149 [00:00<?, ?it/s] 0%| | 1/1149 [00:37<12:00:27, 37.65s/it] 0%| | 2/1149 [01:09<10:56:27, 34.34s/it] 0%| | 3/1149 [01:38<10:10:31, 31.97s/it] 0%| | 4/1149 [02:04<9:25:09, 29.62s/it] 0%| | 5/1149 [02:40<10:03:48, 31.67s/it] 1%| | 6/1149 [03:10<9:57:35, 31.37s/it] 1%| | 7/1149 [03:45<10:17:03, 32.42s/it] 1%| | 8/1149 [04:17<10:12:20, 32.20s/it] 1%| | 9/1149 [04:50<10:17:59, 32.53s/it] 1%| | 10/1149 [05:23<10:22:37, 32.80s/it] 1%| | 11/1149 [05:53<10:00:57, 31.68s/it] 1%| | 12/1149 [06:25<10:05:32, 31.95s/it] 1%| | 13/1149 [07:00<10:19:48, 32.74s/it] 1%| | 14/1149 [07:34<10:29:43, 33.29s/it] 1%|▏ | 15/1149 [08:05<10:12:08, 32.39s/it] 1%|▏ | 16/1149 [08:39<10:25:23, 33.12s/it] 1%|▏ | 17/1149 [09:15<10:36:33, 33.74s/it] 2%|▏ | 18/1149 [09:44<10:13:52, 32.57s/it] 2%|▏ | 19/1149 [10:17<10:16:15, 32.72s/it] 2%|▏ | 20/1149 [10:51<10:21:47, 33.04s/it] {'loss': 5.7171, 'learning_rate': 1.652173913043478e-05, 'global_step': 20, 'epoch': 0.05} | |
| 2%|▏ | 20/1149 [10:51<10:21:47, 33.04s/it] 2%|▏ | 21/1149 [11:22<10:07:27, 32.31s/it] 2%|▏ | 22/1149 [11:50<9:45:24, 31.17s/it] 2%|▏ | 23/1149 [12:20<9:38:10, 30.81s/it] 2%|▏ | 24/1149 [12:50<9:29:30, 30.37s/it] 2%|▏ | 25/1149 [13:22<9:42:02, 31.07s/it] 2%|▏ | 26/1149 [13:52<9:35:01, 30.72s/it] 2%|▏ | 27/1149 [14:23<9:36:42, 30.84s/it] 2%|▏ | 28/1149 [14:58<9:56:12, 31.91s/it] 3%|▎ | 29/1149 [15:33<10:14:06, 32.90s/it] 3%|▎ | 30/1149 [16:00<9:43:05, 31.26s/it] 3%|▎ | 31/1149 [16:32<9:43:16, 31.30s/it] 3%|▎ | 32/1149 [17:04<9:47:05, 31.54s/it] 3%|▎ | 33/1149 [17:37<9:52:36, 31.86s/it] 3%|▎ | 34/1149 [18:07<9:45:19, 31.50s/it] 3%|▎ | 35/1149 [18:42<10:04:46, 32.57s/it] 3%|▎ | 36/1149 [19:12<9:51:04, 31.86s/it] 3%|▎ | 37/1149 [19:48<10:10:04, 32.92s/it] 3%|▎ | 38/1149 [20:22<10:17:03, 33.32s/it] 3%|▎ | 39/1149 [20:57<10:25:24, 33.81s/it] 3%|▎ | 40/1149 [21:27<10:05:50, 32.78s/it] {'loss': 3.3801, 'learning_rate': 3.304347826086956e-05, 'global_step': 40, 'epoch': 0.1} | |
| 3%|▎ | 40/1149 [21:27<10:05:50, 32.78s/it] 4%|▎ | 41/1149 [22:02<10:13:08, 33.20s/it] 4%|▎ | 42/1149 [22:37<10:24:41, 33.86s/it] 4%|▎ | 43/1149 [23:12<10:31:20, 34.25s/it] 4%|▍ | 44/1149 [23:48<10:36:58, 34.59s/it] 4%|▍ | 45/1149 [24:20<10:24:28, 33.94s/it] 4%|▍ | 46/1149 [24:51<10:10:02, 33.18s/it] 4%|▍ | 47/1149 [25:21<9:50:23, 32.15s/it] 4%|▍ | 48/1149 [25:52<9:45:01, 31.88s/it] 4%|▍ | 49/1149 [26:26<9:53:11, 32.36s/it] 4%|▍ | 50/1149 [26:58<9:51:46, 32.31s/it] 4%|▍ | 51/1149 [27:33<10:04:46, 33.05s/it] 5%|▍ | 52/1149 [28:03<9:50:37, 32.30s/it] 5%|▍ | 53/1149 [28:35<9:44:55, 32.02s/it] 5%|▍ | 54/1149 [29:06<9:42:11, 31.90s/it] 5%|▍ | 55/1149 [29:42<9:59:56, 32.90s/it] 5%|▍ | 56/1149 [30:11<9:39:12, 31.80s/it] 5%|▍ | 57/1149 [30:42<9:34:52, 31.59s/it] 5%|▌ | 58/1149 [31:16<9:48:40, 32.37s/it] 5%|▌ | 59/1149 [31:51<10:04:07, 33.25s/it] 5%|▌ | 60/1149 [32:27<10:14:39, 33.87s/it] {'loss': 0.7521, 'learning_rate': 4.956521739130435e-05, 'global_step': 60, 'epoch': 0.16} | |
| 5%|▌ | 60/1149 [32:27<10:14:39, 33.87s/it] 5%|▌ | 61/1149 [33:02<10:22:28, 34.33s/it] 5%|▌ | 62/1149 [33:35<10:13:51, 33.88s/it] 5%|▌ | 63/1149 [34:10<10:19:55, 34.25s/it] 6%|▌ | 64/1149 [34:45<10:25:17, 34.58s/it] 6%|▌ | 65/1149 [35:18<10:11:07, 33.83s/it] 6%|▌ | 66/1149 [35:52<10:13:43, 34.00s/it] 6%|▌ | 67/1149 [36:22<9:53:55, 32.94s/it] 6%|▌ | 68/1149 [36:51<9:31:10, 31.70s/it] 6%|▌ | 69/1149 [37:27<9:50:02, 32.78s/it] 6%|▌ | 70/1149 [37:58<9:43:14, 32.43s/it] 6%|▌ | 71/1149 [38:33<9:57:52, 33.28s/it] 6%|▋ | 72/1149 [39:08<10:02:40, 33.57s/it] 6%|▋ | 73/1149 [39:39<9:48:17, 32.80s/it] 6%|▋ | 74/1149 [40:14<9:59:05, 33.44s/it] 7%|▋ | 75/1149 [40:47<9:58:03, 33.41s/it] 7%|▋ | 76/1149 [41:19<9:50:41, 33.03s/it] 7%|▋ | 77/1149 [41:51<9:46:06, 32.80s/it] 7%|▋ | 78/1149 [42:27<9:59:09, 33.57s/it] 7%|▋ | 79/1149 [42:55<9:31:12, 32.03s/it] 7%|▋ | 80/1149 [43:31<9:49:11, 33.07s/it] {'loss': 0.7016, 'learning_rate': 6.695652173913044e-05, 'global_step': 80, 'epoch': 0.21} | |
| 7%|▋ | 80/1149 [43:31<9:49:11, 33.07s/it] 7%|▋ | 81/1149 [44:05<9:55:09, 33.44s/it] 7%|▋ | 82/1149 [44:35<9:36:46, 32.43s/it] 7%|▋ | 83/1149 [45:01<9:03:34, 30.59s/it] 7%|▋ | 84/1149 [45:31<9:00:54, 30.47s/it] 7%|▋ | 85/1149 [46:05<9:15:42, 31.34s/it] 7%|▋ | 86/1149 [46:39<9:29:28, 32.14s/it] 8%|▊ | 87/1149 [47:13<9:38:52, 32.70s/it] 8%|▊ | 88/1149 [47:48<9:52:17, 33.49s/it] 8%|▊ | 89/1149 [48:20<9:42:12, 32.95s/it] 8%|▊ | 90/1149 [48:55<9:54:38, 33.69s/it] 8%|▊ | 91/1149 [49:23<9:23:20, 31.95s/it] 8%|▊ | 92/1149 [49:58<9:40:20, 32.94s/it] 8%|▊ | 93/1149 [50:27<9:18:33, 31.74s/it] 8%|▊ | 94/1149 [50:58<9:09:50, 31.27s/it] 8%|▊ | 95/1149 [51:31<9:19:22, 31.84s/it] 8%|▊ | 96/1149 [52:00<9:07:35, 31.20s/it] 8%|▊ | 97/1149 [52:36<9:29:00, 32.45s/it] 9%|▊ | 98/1149 [53:11<9:43:32, 33.31s/it] 9%|▊ | 99/1149 [53:41<9:25:23, 32.31s/it] 9%|▊ | 100/1149 [54:14<9:27:40, 32.47s/it] {'loss': 0.7053, 'learning_rate': 8.434782608695653e-05, 'global_step': 100, 'epoch': 0.26} | |
| 9%|▊ | 100/1149 [54:14<9:27:40, 32.47s/it] 9%|▉ | 101/1149 [54:49<9:41:19, 33.28s/it] 9%|▉ | 102/1149 [55:22<9:38:25, 33.15s/it] 9%|▉ | 103/1149 [55:52<9:23:13, 32.31s/it] 9%|▉ | 104/1149 [56:25<9:26:12, 32.51s/it] 9%|▉ | 105/1149 [56:54<9:04:05, 31.27s/it] 9%|▉ | 106/1149 [57:26<9:07:18, 31.48s/it] 9%|▉ | 107/1149 [57:53<8:44:12, 30.18s/it] 9%|▉ | 108/1149 [58:26<8:57:23, 30.97s/it] 9%|▉ | 109/1149 [58:56<8:54:34, 30.84s/it] 10%|▉ | 110/1149 [59:31<9:12:14, 31.89s/it] 10%|▉ | 111/1149 [1:00:05<9:22:39, 32.52s/it] 10%|▉ | 112/1149 [1:00:40<9:37:05, 33.39s/it] 10%|▉ | 113/1149 [1:01:13<9:37:15, 33.43s/it] 10%|▉ | 114/1149 [1:01:45<9:25:16, 32.77s/it] 10%|█ | 115/1149 [1:02:18<9:26:38, 32.88s/it] 10%|█ | 116/1149 [1:02:50<9:24:13, 32.77s/it] 10%|█ | 117/1149 [1:03:22<9:20:23, 32.58s/it] 10%|█ | 118/1149 [1:03:53<9:11:08, 32.07s/it] 10%|█ | 119/1149 [1:04:26<9:14:33, 32.30s/it] 10%|█ | 120/1149 [1:05:00<9:24:10, 32.90s/it] {'loss': 0.7202, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.31} | |
| 10%|█ | 120/1149 [1:05:00<9:24:10, 32.90s/it] 11%|█ | 121/1149 [1:05:30<9:04:48, 31.80s/it] 11%|█ | 122/1149 [1:06:01<9:00:54, 31.60s/it] 11%|█ | 123/1149 [1:06:34<9:10:49, 32.21s/it] 11%|█ | 124/1149 [1:07:06<9:07:54, 32.07s/it] 11%|█ | 125/1149 [1:07:35<8:51:30, 31.14s/it] 11%|█ | 126/1149 [1:08:06<8:49:55, 31.08s/it] 11%|█ | 127/1149 [1:08:37<8:49:05, 31.06s/it] 11%|█ | 128/1149 [1:09:10<8:55:25, 31.46s/it] 11%|█ | 129/1149 [1:09:44<9:08:27, 32.26s/it] 11%|█▏ | 130/1149 [1:10:16<9:08:18, 32.29s/it] 11%|█▏ | 131/1149 [1:10:46<8:58:13, 31.72s/it] 11%|█▏ | 132/1149 [1:11:20<9:07:03, 32.27s/it] 12%|█▏ | 133/1149 [1:11:52<9:04:22, 32.15s/it] 12%|█▏ | 134/1149 [1:12:27<9:20:18, 33.12s/it] 12%|█▏ | 135/1149 [1:13:01<9:24:20, 33.39s/it] 12%|█▏ | 136/1149 [1:13:30<8:59:22, 31.95s/it] 12%|█▏ | 137/1149 [1:14:03<9:05:35, 32.35s/it] 12%|█▏ | 138/1149 [1:14:38<9:19:41, 33.22s/it] 12%|█▏ | 139/1149 [1:15:08<9:03:29, 32.29s/it] 12%|█▏ | 140/1149 [1:15:43<9:16:04, 33.07s/it] {'loss': 0.7062, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.37} | |
| 12%|█▏ | 140/1149 [1:15:43<9:16:04, 33.07s/it] 12%|█▏ | 141/1149 [1:16:12<8:55:12, 31.86s/it] 12%|█▏ | 142/1149 [1:16:45<8:58:42, 32.10s/it] 12%|█▏ | 143/1149 [1:17:18<9:00:48, 32.26s/it] 13%|█▎ | 144/1149 [1:17:52<9:08:49, 32.77s/it] 13%|█▎ | 145/1149 [1:18:23<8:59:48, 32.26s/it] 13%|█▎ | 146/1149 [1:18:58<9:14:45, 33.19s/it] 13%|█▎ | 147/1149 [1:19:32<9:15:51, 33.29s/it] 13%|█▎ | 148/1149 [1:20:04<9:08:44, 32.89s/it] 13%|█▎ | 149/1149 [1:20:32<8:48:05, 31.69s/it] 13%|█▎ | 150/1149 [1:21:02<8:38:55, 31.17s/it] 13%|█▎ | 151/1149 [1:21:38<8:59:28, 32.43s/it] 13%|█▎ | 152/1149 [1:22:05<8:34:03, 30.94s/it] 13%|█▎ | 153/1149 [1:22:41<8:55:30, 32.26s/it] 13%|█▎ | 154/1149 [1:23:11<8:47:23, 31.80s/it] 13%|█▎ | 155/1149 [1:23:47<9:04:56, 32.89s/it] 14%|█▎ | 156/1149 [1:24:22<9:16:30, 33.63s/it] 14%|█▎ | 157/1149 [1:24:52<8:59:28, 32.63s/it] 14%|█▍ | 158/1149 [1:25:28<9:11:17, 33.38s/it] 14%|█▍ | 159/1149 [1:26:03<9:20:05, 33.94s/it] 14%|█▍ | 160/1149 [1:26:35<9:10:47, 33.41s/it] {'loss': 0.709, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.42} | |
| 14%|█▍ | 160/1149 [1:26:35<9:10:47, 33.41s/it] 14%|█▍ | 161/1149 [1:27:08<9:08:18, 33.30s/it] 14%|█▍ | 162/1149 [1:27:41<9:04:31, 33.10s/it] 14%|█▍ | 163/1149 [1:28:16<9:13:13, 33.66s/it] 14%|█▍ | 164/1149 [1:28:47<9:03:20, 33.10s/it] 14%|█▍ | 165/1149 [1:29:23<9:13:36, 33.76s/it] 14%|█▍ | 166/1149 [1:29:53<8:57:49, 32.83s/it] 15%|█▍ | 167/1149 [1:30:28<9:03:55, 33.23s/it] 15%|█▍ | 168/1149 [1:31:02<9:07:38, 33.50s/it] 15%|█▍ | 169/1149 [1:31:37<9:14:43, 33.96s/it] 15%|█▍ | 170/1149 [1:32:12<9:20:46, 34.37s/it] 15%|█▍ | 171/1149 [1:32:47<9:24:57, 34.66s/it] 15%|█▍ | 172/1149 [1:33:19<9:08:25, 33.68s/it] 15%|█▌ | 173/1149 [1:33:49<8:52:01, 32.71s/it] 15%|█▌ | 174/1149 [1:34:20<8:43:18, 32.20s/it] 15%|█▌ | 175/1149 [1:34:55<8:53:24, 32.86s/it] 15%|█▌ | 176/1149 [1:35:29<8:59:22, 33.26s/it] 15%|█▌ | 177/1149 [1:36:04<9:09:23, 33.91s/it] 15%|█▌ | 178/1149 [1:36:39<9:14:59, 34.29s/it] 16%|█▌ | 179/1149 [1:37:10<8:57:35, 33.25s/it] 16%|█▌ | 180/1149 [1:37:43<8:55:55, 33.18s/it] {'loss': 0.6773, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.47} | |
| 16%|█▌ | 180/1149 [1:37:43<8:55:55, 33.18s/it] 16%|█▌ | 181/1149 [1:38:19<9:06:23, 33.87s/it] 16%|█▌ | 182/1149 [1:38:53<9:06:56, 33.94s/it] 16%|█▌ | 183/1149 [1:39:28<9:12:24, 34.31s/it] 16%|█▌ | 184/1149 [1:40:00<9:02:47, 33.75s/it] 16%|█▌ | 185/1149 [1:40:34<9:01:03, 33.68s/it] 16%|█▌ | 186/1149 [1:41:07<8:56:08, 33.40s/it] 16%|█▋ | 187/1149 [1:41:41<9:01:39, 33.78s/it] 16%|█▋ | 188/1149 [1:42:14<8:55:04, 33.41s/it] 16%|█▋ | 189/1149 [1:42:44<8:39:29, 32.47s/it] 17%|█▋ | 190/1149 [1:43:19<8:51:53, 33.28s/it] 17%|█▋ | 191/1149 [1:43:55<9:01:43, 33.93s/it] 17%|█▋ | 192/1149 [1:44:30<9:08:48, 34.41s/it] 17%|█▋ | 193/1149 [1:45:05<9:09:14, 34.47s/it] 17%|█▋ | 194/1149 [1:45:38<9:00:07, 33.93s/it] 17%|█▋ | 195/1149 [1:46:08<8:42:43, 32.88s/it] 17%|█▋ | 196/1149 [1:46:41<8:43:58, 32.99s/it] 17%|█▋ | 197/1149 [1:47:17<8:54:18, 33.67s/it] 17%|█▋ | 198/1149 [1:47:52<9:01:41, 34.18s/it] 17%|█▋ | 199/1149 [1:48:23<8:46:10, 33.23s/it] 17%|█▋ | 200/1149 [1:48:58<8:55:00, 33.83s/it] {'loss': 0.6675, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.52} | |
| 17%|█▋ | 200/1149 [1:48:58<8:55:00, 33.83s/it] 17%|█▋ | 201/1149 [1:49:27<8:31:59, 32.40s/it] 18%|█▊ | 202/1149 [1:49:57<8:17:52, 31.54s/it] 18%|█▊ | 203/1149 [1:50:25<8:02:20, 30.59s/it] 18%|█▊ | 204/1149 [1:50:59<8:19:09, 31.69s/it] 18%|█▊ | 205/1149 [1:51:35<8:35:53, 32.79s/it] 18%|█▊ | 206/1149 [1:52:10<8:47:49, 33.58s/it] 18%|█▊ | 207/1149 [1:52:38<8:20:24, 31.87s/it] 18%|█▊ | 208/1149 [1:53:13<8:36:35, 32.94s/it] 18%|█▊ | 209/1149 [1:53:46<8:34:46, 32.86s/it] 18%|█▊ | 210/1149 [1:54:19<8:36:16, 32.99s/it] 18%|█▊ | 211/1149 [1:54:50<8:26:00, 32.37s/it] 18%|█▊ | 212/1149 [1:55:22<8:19:57, 32.01s/it] 19%|█▊ | 213/1149 [1:55:57<8:35:43, 33.06s/it] 19%|█▊ | 214/1149 [1:56:27<8:20:33, 32.12s/it] 19%|█▊ | 215/1149 [1:57:02<8:32:34, 32.93s/it] 19%|█▉ | 216/1149 [1:57:37<8:43:12, 33.65s/it] 19%|█▉ | 217/1149 [1:58:10<8:40:19, 33.50s/it] 19%|█▉ | 218/1149 [1:58:42<8:31:00, 32.93s/it] 19%|█▉ | 219/1149 [1:59:12<8:18:29, 32.16s/it] 19%|█▉ | 220/1149 [1:59:41<8:01:32, 31.10s/it] {'loss': 0.6582, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 0.57} | |
| 19%|█▉ | 220/1149 [1:59:41<8:01:32, 31.10s/it] 19%|█▉ | 221/1149 [2:00:16<8:20:44, 32.38s/it] 19%|█▉ | 222/1149 [2:00:51<8:29:54, 33.00s/it] 19%|█▉ | 223/1149 [2:01:26<8:38:06, 33.57s/it] 19%|█▉ | 224/1149 [2:01:59<8:36:30, 33.50s/it] 20%|█▉ | 225/1149 [2:02:34<8:44:12, 34.04s/it] 20%|█▉ | 226/1149 [2:03:06<8:33:20, 33.37s/it] 20%|█▉ | 227/1149 [2:03:35<8:12:25, 32.04s/it] 20%|█▉ | 228/1149 [2:04:05<8:04:27, 31.56s/it] 20%|█▉ | 229/1149 [2:04:37<8:03:02, 31.50s/it] 20%|██ | 230/1149 [2:05:09<8:06:07, 31.74s/it] 20%|██ | 231/1149 [2:05:44<8:21:23, 32.77s/it] 20%|██ | 232/1149 [2:06:12<7:56:47, 31.20s/it] 20%|██ | 233/1149 [2:06:45<8:06:46, 31.88s/it] 20%|██ | 234/1149 [2:07:18<8:08:32, 32.04s/it] 20%|██ | 235/1149 [2:07:51<8:14:26, 32.46s/it] 21%|██ | 236/1149 [2:08:23<8:09:56, 32.20s/it] 21%|██ | 237/1149 [2:08:55<8:09:14, 32.19s/it] 21%|██ | 238/1149 [2:09:30<8:23:03, 33.13s/it] 21%|██ | 239/1149 [2:10:06<8:32:36, 33.80s/it] 21%|██ | 240/1149 [2:10:41<8:38:05, 34.20s/it] {'loss': 0.7473, 'learning_rate': 0.0001, 'global_step': 240, 'epoch': 0.63} | |
| 21%|██ | 240/1149 [2:10:41<8:38:05, 34.20s/it] 21%|██ | 241/1149 [2:11:11<8:22:10, 33.18s/it] 21%|██ | 242/1149 [2:11:43<8:13:23, 32.64s/it] 21%|██ | 243/1149 [2:12:15<8:09:13, 32.40s/it] 21%|██ | 244/1149 [2:12:46<8:05:49, 32.21s/it] 21%|██▏ | 245/1149 [2:13:20<8:12:22, 32.68s/it] 21%|██▏ | 246/1149 [2:13:56<8:23:47, 33.47s/it] 21%|██▏ | 247/1149 [2:14:27<8:15:43, 32.97s/it] 22%|██▏ | 248/1149 [2:14:56<7:55:50, 31.69s/it] 22%|██▏ | 249/1149 [2:15:30<8:05:42, 32.38s/it] 22%|██▏ | 250/1149 [2:16:05<8:16:51, 33.16s/it] 22%|██▏ | 251/1149 [2:16:35<8:03:27, 32.30s/it] 22%|██▏ | 252/1149 [2:17:05<7:50:07, 31.45s/it] 22%|██▏ | 253/1149 [2:17:38<7:58:29, 32.04s/it] 22%|██▏ | 254/1149 [2:18:10<7:56:58, 31.98s/it] 22%|██▏ | 255/1149 [2:18:44<8:05:04, 32.56s/it] 22%|██▏ | 256/1149 [2:19:15<7:58:45, 32.17s/it] 22%|██▏ | 257/1149 [2:19:50<8:11:30, 33.06s/it] 22%|██▏ | 258/1149 [2:20:25<8:15:55, 33.40s/it] 23%|██▎ | 259/1149 [2:20:59<8:19:43, 33.69s/it] 23%|██▎ | 260/1149 [2:21:34<8:27:22, 34.24s/it] {'loss': 0.6401, 'learning_rate': 0.0001, 'global_step': 260, 'epoch': 0.68} | |
| 23%|██▎ | 260/1149 [2:21:35<8:27:22, 34.24s/it] 23%|██▎ | 261/1149 [2:22:05<8:12:14, 33.26s/it] 23%|██▎ | 262/1149 [2:22:36<8:00:10, 32.48s/it] 23%|██▎ | 263/1149 [2:23:08<7:55:56, 32.23s/it] 23%|██▎ | 264/1149 [2:23:35<7:33:58, 30.78s/it] 23%|██▎ | 265/1149 [2:24:11<7:54:03, 32.18s/it] 23%|██▎ | 266/1149 [2:24:44<8:00:52, 32.68s/it] 23%|██▎ | 267/1149 [2:25:16<7:54:32, 32.28s/it] 23%|██▎ | 268/1149 [2:25:51<8:07:13, 33.18s/it] 23%|██▎ | 269/1149 [2:26:24<8:07:00, 33.21s/it] 23%|██▎ | 270/1149 [2:26:59<8:11:51, 33.57s/it] 24%|██▎ | 271/1149 [2:27:31<8:03:59, 33.07s/it] 24%|██▎ | 272/1149 [2:28:02<7:56:32, 32.60s/it] 24%|██▍ | 273/1149 [2:28:32<7:44:53, 31.84s/it] 24%|██▍ | 274/1149 [2:29:06<7:52:03, 32.37s/it] 24%|██▍ | 275/1149 [2:29:41<8:02:40, 33.14s/it] 24%|██▍ | 276/1149 [2:30:11<7:50:23, 32.33s/it] 24%|██▍ | 277/1149 [2:30:42<7:43:05, 31.86s/it] 24%|██▍ | 278/1149 [2:31:14<7:44:01, 31.96s/it] 24%|██▍ | 279/1149 [2:31:49<7:56:36, 32.87s/it] 24%|██▍ | 280/1149 [2:32:24<8:06:05, 33.56s/it] {'loss': 0.6312, 'learning_rate': 0.0001, 'global_step': 280, 'epoch': 0.73} | |
| 24%|██▍ | 280/1149 [2:32:24<8:06:05, 33.56s/it] 24%|██▍ | 281/1149 [2:32:56<7:58:38, 33.09s/it] 25%|██▍ | 282/1149 [2:33:26<7:44:06, 32.12s/it] 25%|██▍ | 283/1149 [2:33:58<7:43:30, 32.11s/it] 25%|██▍ | 284/1149 [2:34:34<7:57:02, 33.09s/it] 25%|██▍ | 285/1149 [2:35:06<7:54:57, 32.98s/it] 25%|██▍ | 286/1149 [2:35:42<8:04:50, 33.71s/it] 25%|██▍ | 287/1149 [2:36:14<7:56:40, 33.18s/it] 25%|██▌ | 288/1149 [2:36:44<7:45:28, 32.44s/it] 25%|██▌ | 289/1149 [2:37:20<7:57:42, 33.33s/it] 25%|██▌ | 290/1149 [2:37:55<8:06:01, 33.95s/it] 25%|██▌ | 291/1149 [2:38:31<8:11:14, 34.35s/it] 25%|██▌ | 292/1149 [2:39:03<8:04:13, 33.90s/it] 26%|██▌ | 293/1149 [2:39:35<7:54:29, 33.26s/it] 26%|██▌ | 294/1149 [2:40:09<7:55:46, 33.39s/it] 26%|██▌ | 295/1149 [2:40:44<8:00:55, 33.79s/it] 26%|██▌ | 296/1149 [2:41:18<8:01:31, 33.87s/it] 26%|██▌ | 297/1149 [2:41:53<8:06:51, 34.29s/it] 26%|██▌ | 298/1149 [2:42:25<7:57:53, 33.69s/it] 26%|██▌ | 299/1149 [2:42:51<7:24:48, 31.40s/it] 26%|██▌ | 300/1149 [2:43:20<7:14:34, 30.71s/it] {'loss': 0.6399, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 0.78} | |
| 26%|██▌ | 300/1149 [2:43:20<7:14:34, 30.71s/it] 26%|██▌ | 301/1149 [2:43:59<7:48:55, 33.18s/it] 26%|██▋ | 302/1149 [2:44:32<7:47:59, 33.15s/it] 26%|██▋ | 303/1149 [2:45:03<7:35:33, 32.31s/it] 26%|██▋ | 304/1149 [2:45:37<7:42:56, 32.87s/it] 27%|██▋ | 305/1149 [2:46:12<7:53:09, 33.64s/it] 27%|██▋ | 306/1149 [2:46:43<7:40:29, 32.78s/it] 27%|██▋ | 307/1149 [2:47:15<7:36:15, 32.51s/it] 27%|██▋ | 308/1149 [2:47:49<7:43:12, 33.05s/it] 27%|██▋ | 309/1149 [2:48:25<7:52:03, 33.72s/it] 27%|██▋ | 310/1149 [2:48:59<7:54:26, 33.93s/it] 27%|██▋ | 311/1149 [2:49:29<7:36:37, 32.69s/it] 27%|██▋ | 312/1149 [2:49:58<7:21:55, 31.68s/it] 27%|██▋ | 313/1149 [2:50:33<7:36:01, 32.73s/it] 27%|██▋ | 314/1149 [2:51:04<7:25:56, 32.04s/it] 27%|██▋ | 315/1149 [2:51:39<7:37:20, 32.90s/it] 28%|██▊ | 316/1149 [2:52:09<7:27:30, 32.23s/it] 28%|██▊ | 317/1149 [2:52:39<7:17:30, 31.55s/it] 28%|██▊ | 318/1149 [2:53:10<7:15:36, 31.45s/it] 28%|██▊ | 319/1149 [2:53:40<7:07:24, 30.90s/it] 28%|██▊ | 320/1149 [2:54:15<7:25:10, 32.22s/it] {'loss': 0.6484, 'learning_rate': 0.0001, 'global_step': 320, 'epoch': 0.84} | |
| 28%|██▊ | 320/1149 [2:54:15<7:25:10, 32.22s/it] 28%|██▊ | 321/1149 [2:54:46<7:16:42, 31.65s/it] 28%|██▊ | 322/1149 [2:55:17<7:13:47, 31.47s/it] 28%|██▊ | 323/1149 [2:55:52<7:28:00, 32.54s/it] 28%|██▊ | 324/1149 [2:56:22<7:17:56, 31.85s/it] 28%|██▊ | 325/1149 [2:56:57<7:31:28, 32.87s/it] 28%|██▊ | 326/1149 [2:57:33<7:41:11, 33.62s/it] 28%|██▊ | 327/1149 [2:58:05<7:34:28, 33.17s/it] 29%|██▊ | 328/1149 [2:58:39<7:40:19, 33.64s/it] 29%|██▊ | 329/1149 [2:59:13<7:40:05, 33.66s/it] 29%|██▊ | 330/1149 [2:59:45<7:30:49, 33.03s/it] 29%|██▉ | 331/1149 [3:00:20<7:40:20, 33.77s/it] 29%|██▉ | 332/1149 [3:00:56<7:46:00, 34.22s/it] 29%|██▉ | 333/1149 [3:01:31<7:49:38, 34.53s/it] 29%|██▉ | 334/1149 [3:02:02<7:37:00, 33.64s/it] 29%|██▉ | 335/1149 [3:02:37<7:38:37, 33.80s/it] 29%|██▉ | 336/1149 [3:03:12<7:44:51, 34.31s/it] 29%|██▉ | 337/1149 [3:03:45<7:39:39, 33.96s/it] 29%|██▉ | 338/1149 [3:04:18<7:35:07, 33.67s/it] 30%|██▉ | 339/1149 [3:04:49<7:22:43, 32.79s/it] 30%|██▉ | 340/1149 [3:05:23<7:27:04, 33.16s/it] {'loss': 0.6193, 'learning_rate': 0.0001, 'global_step': 340, 'epoch': 0.89} | |
| 30%|██▉ | 340/1149 [3:05:23<7:27:04, 33.16s/it] 30%|██▉ | 341/1149 [3:05:53<7:14:39, 32.28s/it] 30%|██▉ | 342/1149 [3:06:29<7:27:20, 33.26s/it] 30%|██▉ | 343/1149 [3:07:04<7:35:49, 33.93s/it] 30%|██▉ | 344/1149 [3:07:37<7:32:07, 33.70s/it] 30%|███ | 345/1149 [3:08:11<7:31:15, 33.68s/it] 30%|███ | 346/1149 [3:08:42<7:21:44, 33.01s/it] 30%|███ | 347/1149 [3:09:15<7:21:15, 33.01s/it] 30%|███ | 348/1149 [3:09:49<7:24:51, 33.32s/it] 30%|███ | 349/1149 [3:10:21<7:18:13, 32.87s/it] 30%|███ | 350/1149 [3:10:57<7:27:18, 33.59s/it] 31%|███ | 351/1149 [3:11:32<7:33:05, 34.07s/it] 31%|███ | 352/1149 [3:12:06<7:32:43, 34.08s/it] 31%|███ | 353/1149 [3:12:41<7:36:56, 34.44s/it] 31%|███ | 354/1149 [3:13:12<7:23:24, 33.47s/it] 31%|███ | 355/1149 [3:13:45<7:18:24, 33.13s/it] 31%|███ | 356/1149 [3:14:17<7:16:22, 33.02s/it] 31%|███ | 357/1149 [3:14:48<7:04:24, 32.15s/it] 31%|███ | 358/1149 [3:15:22<7:11:09, 32.70s/it] 31%|███ | 359/1149 [3:15:57<7:20:18, 33.44s/it] 31%|███▏ | 360/1149 [3:16:31<7:22:52, 33.68s/it] {'loss': 0.5908, 'learning_rate': 0.0001, 'global_step': 360, 'epoch': 0.94} | |
| 31%|███▏ | 360/1149 [3:16:31<7:22:52, 33.68s/it] 31%|███▏ | 361/1149 [3:17:01<7:09:35, 32.71s/it] 32%|███▏ | 362/1149 [3:17:36<7:18:10, 33.41s/it] 32%|███▏ | 363/1149 [3:18:07<7:07:25, 32.63s/it] 32%|███▏ | 364/1149 [3:18:37<6:55:59, 31.80s/it] 32%|███▏ | 365/1149 [3:19:13<7:09:37, 32.88s/it] 32%|███▏ | 366/1149 [3:19:42<6:55:15, 31.82s/it] 32%|███▏ | 367/1149 [3:20:17<7:08:22, 32.87s/it] 32%|███▏ | 368/1149 [3:20:51<7:13:27, 33.30s/it] 32%|███▏ | 369/1149 [3:21:27<7:20:36, 33.89s/it] 32%|███▏ | 370/1149 [3:21:57<7:07:03, 32.89s/it] 32%|███▏ | 371/1149 [3:22:33<7:16:14, 33.64s/it] 32%|███▏ | 372/1149 [3:23:05<7:09:41, 33.18s/it] 32%|███▏ | 373/1149 [3:23:39<7:11:49, 33.39s/it] 33%|███▎ | 374/1149 [3:24:14<7:19:08, 34.00s/it] 33%|███▎ | 375/1149 [3:24:50<7:24:10, 34.43s/it] 33%|███▎ | 376/1149 [3:25:22<7:17:18, 33.94s/it] 33%|███▎ | 377/1149 [3:25:55<7:10:04, 33.43s/it] 33%|███▎ | 378/1149 [3:26:28<7:08:41, 33.36s/it] 33%|███▎ | 379/1149 [3:27:03<7:15:33, 33.94s/it] 33%|███▎ | 380/1149 [3:27:33<7:01:17, 32.87s/it] {'loss': 0.6047, 'learning_rate': 0.0001, 'global_step': 380, 'epoch': 0.99} | |
| 33%|███▎ | 380/1149 [3:27:33<7:01:17, 32.87s/it] 33%|███▎ | 381/1149 [3:28:05<6:54:01, 32.35s/it] 33%|███▎ | 382/1149 [3:28:37<6:54:02, 32.39s/it] 33%|███▎ | 383/1149 [3:29:04<6:34:12, 30.88s/it] 33%|███▎ | 384/1149 [3:29:41<6:54:35, 32.52s/it] 34%|███▎ | 385/1149 [3:30:13<6:51:48, 32.34s/it] 34%|███▎ | 386/1149 [3:30:42<6:40:47, 31.52s/it] 34%|███▎ | 387/1149 [3:31:09<6:20:40, 29.97s/it] 34%|███▍ | 388/1149 [3:31:44<6:40:25, 31.57s/it] 34%|███▍ | 389/1149 [3:32:15<6:38:04, 31.43s/it] 34%|███▍ | 390/1149 [3:32:50<6:50:51, 32.48s/it] 34%|███▍ | 391/1149 [3:33:22<6:48:36, 32.34s/it] 34%|███▍ | 392/1149 [3:33:56<6:52:40, 32.71s/it] 34%|███▍ | 393/1149 [3:34:29<6:56:02, 33.02s/it] 34%|███▍ | 394/1149 [3:34:59<6:42:05, 31.95s/it] 34%|███▍ | 395/1149 [3:35:32<6:44:58, 32.23s/it] 34%|███▍ | 396/1149 [3:36:06<6:54:18, 33.01s/it] 35%|███▍ | 397/1149 [3:36:41<7:00:45, 33.57s/it] 35%|███▍ | 398/1149 [3:37:12<6:48:50, 32.66s/it] 35%|███▍ | 399/1149 [3:37:47<6:57:28, 33.40s/it] 35%|███▍ | 400/1149 [3:38:22<7:04:14, 33.98s/it] {'loss': 0.5931, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 1.04} | |
| 35%|███▍ | 400/1149 [3:38:22<7:04:14, 33.98s/it] 35%|███▍ | 401/1149 [3:38:52<6:49:11, 32.82s/it] 35%|███▍ | 402/1149 [3:39:26<6:50:46, 32.99s/it] 35%|███▌ | 403/1149 [3:40:00<6:54:25, 33.33s/it] 35%|███▌ | 404/1149 [3:40:31<6:45:06, 32.63s/it] 35%|███▌ | 405/1149 [3:41:00<6:30:19, 31.48s/it] 35%|███▌ | 406/1149 [3:41:30<6:25:19, 31.12s/it] 35%|███▌ | 407/1149 [3:42:00<6:19:23, 30.68s/it] 36%|███▌ | 408/1149 [3:42:33<6:27:37, 31.39s/it] 36%|███▌ | 409/1149 [3:43:03<6:23:03, 31.06s/it] 36%|███▌ | 410/1149 [3:43:34<6:23:57, 31.17s/it] 36%|███▌ | 411/1149 [3:44:09<6:36:45, 32.26s/it] 36%|███▌ | 412/1149 [3:44:45<6:48:14, 33.24s/it] 36%|███▌ | 413/1149 [3:45:12<6:26:42, 31.52s/it] 36%|███▌ | 414/1149 [3:45:44<6:26:02, 31.51s/it] 36%|███▌ | 415/1149 [3:46:16<6:27:49, 31.70s/it] 36%|███▌ | 416/1149 [3:46:49<6:30:46, 31.99s/it] 36%|███▋ | 417/1149 [3:47:19<6:25:34, 31.60s/it] 36%|███▋ | 418/1149 [3:47:54<6:37:51, 32.66s/it] 36%|███▋ | 419/1149 [3:48:25<6:28:24, 31.92s/it] 37%|███▋ | 420/1149 [3:49:00<6:40:53, 33.00s/it] {'loss': 0.6038, 'learning_rate': 0.0001, 'global_step': 420, 'epoch': 1.1} | |
| 37%|███▋ | 420/1149 [3:49:00<6:40:53, 33.00s/it] 37%|███▋ | 421/1149 [3:49:34<6:43:39, 33.27s/it] 37%|███▋ | 422/1149 [3:50:09<6:50:17, 33.86s/it] 37%|███▋ | 423/1149 [3:50:40<6:36:52, 32.80s/it] 37%|███▋ | 424/1149 [3:51:14<6:41:17, 33.21s/it] 37%|███▋ | 425/1149 [3:51:49<6:48:34, 33.86s/it] 37%|███▋ | 426/1149 [3:52:24<6:52:38, 34.24s/it] 37%|███▋ | 427/1149 [3:53:00<6:56:01, 34.57s/it] 37%|███▋ | 428/1149 [3:53:32<6:47:34, 33.92s/it] 37%|███▋ | 429/1149 [3:54:03<6:37:48, 33.15s/it] 37%|███▋ | 430/1149 [3:54:33<6:24:41, 32.10s/it] 38%|███▊ | 431/1149 [3:55:04<6:21:04, 31.84s/it] 38%|███▊ | 432/1149 [3:55:38<6:26:24, 32.34s/it] 38%|███▊ | 433/1149 [3:56:10<6:25:22, 32.29s/it] 38%|███▊ | 434/1149 [3:56:45<6:33:37, 33.03s/it] 38%|███▊ | 435/1149 [3:57:15<6:24:13, 32.29s/it] 38%|███▊ | 436/1149 [3:57:47<6:20:18, 32.00s/it] 38%|███▊ | 437/1149 [3:58:18<6:18:12, 31.87s/it] 38%|███▊ | 438/1149 [3:58:53<6:29:24, 32.86s/it] 38%|███▊ | 439/1149 [3:59:23<6:17:05, 31.87s/it] 38%|███▊ | 440/1149 [3:59:54<6:13:48, 31.63s/it] {'loss': 0.5726, 'learning_rate': 0.0001, 'global_step': 440, 'epoch': 1.15} | |
| 38%|███▊ | 440/1149 [3:59:54<6:13:48, 31.63s/it] 38%|███▊ | 441/1149 [4:00:28<6:22:26, 32.41s/it] 38%|███▊ | 442/1149 [4:01:03<6:31:55, 33.26s/it] 39%|███▊ | 443/1149 [4:01:39<6:38:28, 33.86s/it] 39%|███▊ | 444/1149 [4:02:14<6:43:10, 34.31s/it] 39%|███▊ | 445/1149 [4:02:47<6:37:15, 33.86s/it] 39%|███▉ | 446/1149 [4:03:22<6:41:00, 34.23s/it] 39%|███▉ | 447/1149 [4:03:57<6:44:06, 34.54s/it] 39%|███▉ | 448/1149 [4:04:29<6:34:46, 33.79s/it] 39%|███▉ | 449/1149 [4:05:04<6:36:14, 33.96s/it] 39%|███▉ | 450/1149 [4:05:34<6:23:16, 32.90s/it] 39%|███▉ | 451/1149 [4:06:03<6:08:23, 31.67s/it] 39%|███▉ | 452/1149 [4:06:38<6:20:30, 32.76s/it] 39%|███▉ | 453/1149 [4:07:10<6:16:01, 32.42s/it] 40%|███▉ | 454/1149 [4:07:45<6:25:16, 33.26s/it] 40%|███▉ | 455/1149 [4:08:19<6:28:13, 33.56s/it] 40%|███▉ | 456/1149 [4:08:50<6:18:39, 32.78s/it] 40%|███▉ | 457/1149 [4:09:25<6:25:20, 33.41s/it] 40%|███▉ | 458/1149 [4:09:58<6:24:30, 33.39s/it] 40%|███▉ | 459/1149 [4:10:31<6:19:29, 33.00s/it] 40%|████ | 460/1149 [4:11:03<6:16:27, 32.78s/it] {'loss': 0.5738, 'learning_rate': 0.0001, 'global_step': 460, 'epoch': 1.2} | |
| 40%|████ | 460/1149 [4:11:03<6:16:27, 32.78s/it] 40%|████ | 461/1149 [4:11:38<6:24:30, 33.53s/it] 40%|████ | 462/1149 [4:12:06<6:06:14, 31.99s/it] 40%|████ | 463/1149 [4:12:42<6:17:29, 33.02s/it] 40%|████ | 464/1149 [4:13:16<6:21:10, 33.39s/it] 40%|████ | 465/1149 [4:13:46<6:09:18, 32.40s/it] 41%|████ | 466/1149 [4:14:13<5:47:55, 30.56s/it] 41%|████ | 467/1149 [4:14:43<5:46:09, 30.45s/it] 41%|████ | 468/1149 [4:15:16<5:55:26, 31.32s/it] 41%|████ | 469/1149 [4:15:50<6:04:04, 32.12s/it] 41%|████ | 470/1149 [4:16:24<6:09:53, 32.68s/it] 41%|████ | 471/1149 [4:16:59<6:18:23, 33.49s/it] 41%|████ | 472/1149 [4:17:31<6:11:51, 32.96s/it] 41%|████ | 473/1149 [4:18:07<6:19:37, 33.69s/it] 41%|████▏ | 474/1149 [4:18:34<5:59:23, 31.95s/it] 41%|████▏ | 475/1149 [4:19:10<6:10:06, 32.95s/it] 41%|████▏ | 476/1149 [4:19:39<5:55:58, 31.74s/it] 42%|████▏ | 477/1149 [4:20:09<5:50:18, 31.28s/it] 42%|████▏ | 478/1149 [4:20:42<5:56:12, 31.85s/it] 42%|████▏ | 479/1149 [4:21:12<5:48:28, 31.21s/it] 42%|████▏ | 480/1149 [4:21:47<6:01:56, 32.46s/it] {'loss': 0.5536, 'learning_rate': 0.0001, 'global_step': 480, 'epoch': 1.25} | |
| 42%|████▏ | 480/1149 [4:21:47<6:01:56, 32.46s/it] 42%|████▏ | 481/1149 [4:22:22<6:10:57, 33.32s/it] 42%|████▏ | 482/1149 [4:22:52<5:59:12, 32.31s/it] 42%|████▏ | 483/1149 [4:23:25<6:00:13, 32.45s/it] 42%|████▏ | 484/1149 [4:24:00<6:08:37, 33.26s/it] 42%|████▏ | 485/1149 [4:24:33<6:06:38, 33.13s/it] 42%|████▏ | 486/1149 [4:25:03<5:56:57, 32.30s/it] 42%|████▏ | 487/1149 [4:25:36<5:58:38, 32.51s/it] 42%|████▏ | 488/1149 [4:26:05<5:44:19, 31.25s/it] 43%|████▎ | 489/1149 [4:26:37<5:46:13, 31.47s/it] 43%|████▎ | 490/1149 [4:27:04<5:31:03, 30.14s/it] 43%|████▎ | 491/1149 [4:27:37<5:39:22, 30.95s/it] 43%|████▎ | 492/1149 [4:28:07<5:37:27, 30.82s/it] 43%|████▎ | 493/1149 [4:28:41<5:48:27, 31.87s/it] 43%|████▎ | 494/1149 [4:29:15<5:54:49, 32.50s/it] 43%|████▎ | 495/1149 [4:29:51<6:03:40, 33.36s/it] 43%|████▎ | 496/1149 [4:30:24<6:03:43, 33.42s/it] 43%|████▎ | 497/1149 [4:30:56<5:55:42, 32.73s/it] 43%|████▎ | 498/1149 [4:31:29<5:56:29, 32.86s/it] 43%|████▎ | 499/1149 [4:32:01<5:54:32, 32.73s/it] 44%|████▎ | 500/1149 [4:32:33<5:52:07, 32.55s/it] {'loss': 0.576, 'learning_rate': 0.0001, 'global_step': 500, 'epoch': 1.31} | |
| 44%|████▎ | 500/1149 [4:32:33<5:52:07, 32.55s/it] 44%|████▎ | 501/1149 [4:33:04<5:46:06, 32.05s/it] 44%|████▎ | 502/1149 [4:33:37<5:48:01, 32.27s/it] 44%|████▍ | 503/1149 [4:34:11<5:53:56, 32.87s/it] 44%|████▍ | 504/1149 [4:34:40<5:41:43, 31.79s/it] 44%|████▍ | 505/1149 [4:35:12<5:39:16, 31.61s/it] 44%|████▍ | 506/1149 [4:35:45<5:45:20, 32.23s/it] 44%|████▍ | 507/1149 [4:36:17<5:43:19, 32.09s/it] 44%|████▍ | 508/1149 [4:36:46<5:32:49, 31.15s/it] 44%|████▍ | 509/1149 [4:37:17<5:31:53, 31.11s/it] 44%|████▍ | 510/1149 [4:37:48<5:31:09, 31.09s/it] 44%|████▍ | 511/1149 [4:38:21<5:34:52, 31.49s/it] 45%|████▍ | 512/1149 [4:38:55<5:42:52, 32.30s/it] 45%|████▍ | 513/1149 [4:39:27<5:42:31, 32.31s/it] 45%|████▍ | 514/1149 [4:39:57<5:35:55, 31.74s/it] 45%|████▍ | 515/1149 [4:40:31<5:41:19, 32.30s/it] 45%|████▍ | 516/1149 [4:41:03<5:39:16, 32.16s/it] 45%|████▍ | 517/1149 [4:41:38<5:48:53, 33.12s/it] 45%|████▌ | 518/1149 [4:42:12<5:51:13, 33.40s/it] 45%|████▌ | 519/1149 [4:42:41<5:35:29, 31.95s/it] 45%|████▌ | 520/1149 [4:43:14<5:39:04, 32.34s/it] {'loss': 0.5598, 'learning_rate': 0.0001, 'global_step': 520, 'epoch': 1.36} | |
| 45%|████▌ | 520/1149 [4:43:14<5:39:04, 32.34s/it] 45%|████▌ | 521/1149 [4:43:49<5:47:43, 33.22s/it] 45%|████▌ | 522/1149 [4:44:20<5:37:23, 32.29s/it] 46%|████▌ | 523/1149 [4:44:54<5:44:55, 33.06s/it] 46%|████▌ | 524/1149 [4:45:23<5:31:46, 31.85s/it] 46%|████▌ | 525/1149 [4:45:56<5:33:45, 32.09s/it] 46%|████▌ | 526/1149 [4:46:29<5:34:54, 32.25s/it] 46%|████▌ | 527/1149 [4:47:03<5:39:38, 32.76s/it] 46%|████▌ | 528/1149 [4:47:34<5:33:53, 32.26s/it] 46%|████▌ | 529/1149 [4:48:09<5:42:52, 33.18s/it] 46%|████▌ | 530/1149 [4:48:43<5:43:17, 33.28s/it] 46%|████▌ | 531/1149 [4:49:15<5:38:38, 32.88s/it] 46%|████▋ | 532/1149 [4:49:43<5:25:41, 31.67s/it] 46%|████▋ | 533/1149 [4:50:13<5:19:48, 31.15s/it] 46%|████▋ | 534/1149 [4:50:49<5:32:22, 32.43s/it] 47%|████▋ | 535/1149 [4:51:16<5:16:32, 30.93s/it] 47%|████▋ | 536/1149 [4:51:51<5:29:29, 32.25s/it] 47%|████▋ | 537/1149 [4:52:22<5:24:04, 31.77s/it] 47%|████▋ | 538/1149 [4:52:58<5:34:34, 32.86s/it] 47%|████▋ | 539/1149 [4:53:33<5:41:30, 33.59s/it] 47%|████▋ | 540/1149 [4:54:03<5:30:41, 32.58s/it] {'loss': 0.547, 'learning_rate': 0.0001, 'global_step': 540, 'epoch': 1.41} | |
| 49%|████▊ | 560/1149 [5:05:14<5:32:45, 33.90s/it] {'loss': 0.5269, 'learning_rate': 0.0001, 'global_step': 560, 'epoch': 1.46} | |
| 50%|█████ | 580/1149 [5:16:26<5:19:28, 33.69s/it] {'loss': 0.5244, 'learning_rate': 0.0001, 'global_step': 580, 'epoch': 1.51} | |
| 52%|█████▏ | 600/1149 [5:27:19<5:06:18, 33.48s/it] {'loss': 0.5017, 'learning_rate': 0.0001, 'global_step': 600, 'epoch': 1.57} | |
| 54%|█████▍ | 620/1149 [5:38:08<4:43:42, 32.18s/it] {'loss': 0.557, 'learning_rate': 0.0001, 'global_step': 620, 'epoch': 1.62} | |
| 56%|█████▌ | 640/1149 [5:49:04<4:40:34, 33.07s/it] {'loss': 0.4926, 'learning_rate': 0.0001, 'global_step': 640, 'epoch': 1.67} | |
| 57%|█████▋ | 660/1149 [5:59:55<4:19:50, 31.88s/it] {'loss': 0.4458, 'learning_rate': 0.0001, 'global_step': 660, 'epoch': 1.72} | |
| 59%|█████▉ | 680/1149 [6:11:06<4:27:54, 34.27s/it] {'loss': 0.4503, 'learning_rate': 0.0001, 'global_step': 680, 'epoch': 1.78} | |
| 61%|██████ | 700/1149 [6:21:49<3:55:58, 31.53s/it] {'loss': 0.4413, 'learning_rate': 0.0001, 'global_step': 700, 'epoch': 1.83} | |
| 63%|██████▎ | 720/1149 [6:32:55<4:02:39, 33.94s/it] {'loss': 0.4396, 'learning_rate': 0.0001, 'global_step': 720, 'epoch': 1.88} | |
| 64%|██████▍ | 740/1149 [6:43:57<3:39:12, 32.16s/it] {'loss': 0.3788, 'learning_rate': 0.0001, 'global_step': 740, 'epoch': 1.93} | |
| 66%|██████▌ | 760/1149 [6:55:04<3:36:36, 33.41s/it] {'loss': 0.3731, 'learning_rate': 0.0001, 'global_step': 760, 'epoch': 1.98} | |
| 68%|██████▊ | 780/1149 [7:05:51<3:26:29, 33.58s/it] {'loss': 0.3797, 'learning_rate': 0.0001, 'global_step': 780, 'epoch': 2.04} | |
| 70%|██████▉ | 800/1149 [7:16:28<3:03:39, 31.57s/it] {'loss': 0.4084, 'learning_rate': 0.0001, 'global_step': 800, 'epoch': 2.09} | |
| 71%|███████▏ | 820/1149 [7:27:27<2:54:42, 31.86s/it] {'loss': 0.3583, 'learning_rate': 0.0001, 'global_step': 820, 'epoch': 2.14} | |
| 73%|███████▎ | 840/1149 [7:38:34<2:52:10, 33.43s/it] {'loss': 0.3262, 'learning_rate': 0.0001, 'global_step': 840, 'epoch': 2.19} | |
| 75%|███████▍ | 860/1149 [7:49:18<2:30:39, 31.28s/it] {'loss': 0.3407, 'learning_rate': 0.0001, 'global_step': 860, 'epoch': 2.25} | |
| 77%|███████▋ | 880/1149 [8:00:04<2:26:42, 32.72s/it] {'loss': 0.3476, 'learning_rate': 0.0001, 'global_step': 880, 'epoch': 2.3} | |
| 78%|███████▊ | 900/1149 [8:10:47<2:17:30, 33.13s/it] {'loss': 0.3041, 'learning_rate': 0.0001, 'global_step': 900, 'epoch': 2.35} | |
| 80%|████████ | 920/1149 [8:21:34<2:01:10, 31.75s/it] {'loss': 0.3037, 'learning_rate': 0.0001, 'global_step': 920, 'epoch': 2.4} | |
| 82%|████████▏ | 940/1149 [8:32:42<1:52:07, 32.19s/it] {'loss': 0.2546, 'learning_rate': 0.0001, 'global_step': 940, 'epoch': 2.45} | |
| 84%|████████▎ | 960/1149 [8:43:59<1:46:51, 33.92s/it] {'loss': 0.2807, 'learning_rate': 0.0001, 'global_step': 960, 'epoch': 2.51} | |
| 85%|████████▌ | 980/1149 [8:54:47<1:30:22, 32.08s/it] {'loss': 0.2338, 'learning_rate': 0.0001, 'global_step': 980, 'epoch': 2.56} | |
| 87%|████████▋ | 1000/1149 [9:05:39<1:19:29, 32.01s/it] {'loss': 0.3036, 'learning_rate': 0.0001, 'global_step': 1000, 'epoch': 2.61} | |
| 89%|████████▉ | 1020/1149 [9:16:31<1:08:45, 31.98s/it] {'loss': 0.2492, 'learning_rate': 0.0001, 'global_step': 1020, 'epoch': 2.66} | |
| 91%|█████████ | 1040/1149 [9:27:26<58:49, 32.38s/it] {'loss': 0.2118, 'learning_rate': 0.0001, 'global_step': 1040, 'epoch': 2.72} | |
| 92%|█████████▏| 1060/1149 [9:38:29<49:28, 33.36s/it] {'loss': 0.2155, 'learning_rate': 0.0001, 'global_step': 1060, 'epoch': 2.77} | |
| 94%|█████████▍| 1080/1149 [9:49:20<36:49, 32.02s/it] {'loss': 0.2032, 'learning_rate': 0.0001, 'global_step': 1080, 'epoch': 2.82} | |
| 96%|█████████▌| 1100/1149 [10:00:18<27:27, 33.62s/it] {'loss': 0.1983, 'learning_rate': 0.0001, 'global_step': 1100, 'epoch': 2.87} | |
| 96%|█████████▌| 1100/1149 [10:00:18<27:27, 33.62s/it] 96%|█████████▌| 1101/1149 [10:00:52<27:01, 33.78s/it] 96%|█████████▌| 1102/1149 [10:01:28<26:50, 34.27s/it] 96%|█████████▌| 1103/1149 [10:02:01<25:59, 33.90s/it] 96%|█████████▌| 1104/1149 [10:02:34<25:12, 33.61s/it] 96%|█████████▌| 1105/1149 [10:03:04<24:00, 32.74s/it] 96%|█████████▋| 1106/1149 [10:03:38<23:43, 33.11s/it] 96%|█████████▋| 1107/1149 [10:04:09<22:33, 32.23s/it] 96%|█████████▋| 1108/1149 [10:04:44<22:41, 33.21s/it] 97%|█████████▋| 1109/1149 [10:05:20<22:35, 33.88s/it] 97%|█████████▋| 1110/1149 [10:05:53<21:52, 33.66s/it] 97%|█████████▋| 1111/1149 [10:06:26<21:18, 33.64s/it] 97%|█████████▋| 1112/1149 [10:06:58<20:20, 32.98s/it] 97%|█████████▋| 1113/1149 [10:07:31<19:47, 32.98s/it] 97%|█████████▋| 1114/1149 [10:08:05<19:25, 33.30s/it] 97%|█████████▋| 1115/1149 [10:08:37<18:36, 32.84s/it] 97%|█████████▋| 1116/1149 [10:09:12<18:27, 33.57s/it] 97%|█████████▋| 1117/1149 [10:09:47<18:09, 34.06s/it] 97%|█████████▋| 1118/1149 [10:10:21<17:36, 34.07s/it] 97%|█████████▋| 1119/1149 [10:10:56<17:12, 34.42s/it] 97%|█████████▋| 1120/1149 [10:11:28<16:09, 33.44s/it] {'loss': 0.1879, 'learning_rate': 0.0001, 'global_step': 1120, 'epoch': 2.92} | |
| 97%|█████████▋| 1120/1149 [10:11:28<16:09, 33.44s/it] 98%|█████████▊| 1121/1149 [10:12:00<15:26, 33.10s/it] 98%|█████████▊| 1122/1149 [10:12:33<14:50, 32.99s/it] 98%|█████████▊| 1123/1149 [10:13:03<13:55, 32.13s/it] 98%|█████████▊| 1124/1149 [10:13:37<13:37, 32.70s/it] 98%|█████████▊| 1125/1149 [10:14:12<13:22, 33.44s/it] 98%|█████████▊| 1126/1149 [10:14:46<12:54, 33.67s/it] 98%|█████████▊| 1127/1149 [10:15:16<11:59, 32.70s/it] 98%|█████████▊| 1128/1149 [10:15:52<11:41, 33.39s/it] 98%|█████████▊| 1129/1149 [10:16:22<10:52, 32.61s/it] 98%|█████████▊| 1130/1149 [10:16:52<10:03, 31.78s/it] 98%|█████████▊| 1131/1149 [10:17:27<09:51, 32.85s/it] 99%|█████████▊| 1132/1149 [10:17:57<09:00, 31.80s/it] 99%|█████████▊| 1133/1149 [10:18:32<08:45, 32.86s/it] 99%|█████████▊| 1134/1149 [10:19:07<08:19, 33.32s/it] 99%|█████████▉| 1135/1149 [10:19:42<07:54, 33.93s/it] 99%|█████████▉| 1136/1149 [10:20:12<07:07, 32.92s/it] 99%|█████████▉| 1137/1149 [10:20:48<06:44, 33.67s/it] 99%|█████████▉| 1138/1149 [10:21:20<06:05, 33.22s/it] 99%|█████████▉| 1139/1149 [10:21:54<05:34, 33.41s/it] 99%|█████████▉| 1140/1149 [10:22:29<05:06, 34.01s/it] {'loss': 0.1938, 'learning_rate': 0.0001, 'global_step': 1140, 'epoch': 2.98} | |
| 99%|█████████▉| 1140/1149 [10:22:29<05:06, 34.01s/it] 99%|█████████▉| 1141/1149 [10:23:05<04:35, 34.44s/it] 99%|█████████▉| 1142/1149 [10:23:38<03:57, 33.95s/it] 99%|█████████▉| 1143/1149 [10:24:10<03:20, 33.43s/it] 100%|█████████▉| 1144/1149 [10:24:43<02:46, 33.35s/it] 100%|█████████▉| 1145/1149 [10:25:18<02:15, 33.93s/it] 100%|█████████▉| 1146/1149 [10:25:49<01:38, 32.86s/it] 100%|█████████▉| 1147/1149 [10:26:20<01:04, 32.35s/it] 100%|█████████▉| 1148/1149 [10:26:52<00:32, 32.40s/it] 100%|██████████| 1149/1149 [10:27:20<00:00, 30.89s/it] {'train_runtime': 37640.6404, 'train_samples_per_second': 1.221, 'train_steps_per_second': 0.031, 'train_loss': 0.6198576812229538, 'epoch': 3.0} | |
| 100%|██████████| 1149/1149 [10:27:20<00:00, 30.89s/it] 100%|██████████| 1149/1149 [10:27:20<00:00, 32.76s/it] | |
| ***** train metrics ***** | |
| epoch = 3.0 | |
| train_loss = 0.6199 | |
| train_runtime = 10:27:20.64 | |
| train_samples_per_second = 1.221 | |
| train_steps_per_second = 0.031 | |