| WARNING:torch.distributed.run: | |
| ***************************************** | |
| Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| ***************************************** | |
| model training desc: training with randomly selected key sentences | |
| 2023-12-07 13:36:09.354 | INFO | __main__:init_components:108 - Initializing components... | |
| model training desc: training with randomly selected key sentences | |
| 2023-12-07 13:36:09.360 | INFO | __main__:init_components:108 - Initializing components... | |
| You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 | |
| 2023-12-07 13:36:37.046 | INFO | __main__:init_components:143 - | |
| 2023-12-07 13:36:37.046 | INFO | __main__:init_components:144 - ******************** | |
| 2023-12-07 13:36:37.046 | INFO | __main__:init_components:145 - using TechGPT-7B | |
| 2023-12-07 13:36:37.046 | INFO | __main__:init_components:146 - ******************** | |
| 2023-12-07 13:36:37.046 | INFO | __main__:init_components:147 - | |
| memory footprint of model: 5.472740173339844 GB | |
| You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 | |
| 2023-12-07 13:36:37.818 | INFO | __main__:init_components:143 - | |
| 2023-12-07 13:36:37.819 | INFO | __main__:init_components:144 - ******************** | |
| 2023-12-07 13:36:37.819 | INFO | __main__:init_components:145 - using TechGPT-7B | |
| 2023-12-07 13:36:37.819 | INFO | __main__:init_components:146 - ******************** | |
| 2023-12-07 13:36:37.819 | INFO | __main__:init_components:147 - | |
| memory footprint of model: 5.472740173339844 GB | |
| trainable params: 319,815,680 || all params: 7,447,007,232 || trainable%: 4.294553100818044 | |
| 2023-12-07 13:36:39.748 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/NCR/random_select/ncr_random_1400_instruct/train.jsonl | |
| 2023-12-07 13:36:39.846 | INFO | component.dataset:__init__:19 - there are 15319 data in dataset | |
| 2023-12-07 13:36:39.938 | INFO | __main__:main:231 - *** starting training *** | |
| trainable params: 319,815,680 || all params: 7,447,007,232 || trainable%: 4.294553100818044 | |
| 2023-12-07 13:36:40.517 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/NCR/random_select/ncr_random_1400_instruct/train.jsonl | |
| 2023-12-07 13:36:40.618 | INFO | component.dataset:__init__:19 - there are 15319 data in dataset | |
| 2023-12-07 13:36:40.712 | INFO | __main__:main:231 - *** starting training *** | |
| 2%|▏ | 20/1149 [10:21<9:38:39, 30.75s/it] {'loss': 5.7095, 'learning_rate': 1.652173913043478e-05, 'global_step': 20, 'epoch': 0.05} | |
| 3%|▎ | 40/1149 [20:50<9:45:19, 31.67s/it] {'loss': 3.3215, 'learning_rate': 3.3913043478260867e-05, 'global_step': 40, 'epoch': 0.1} | |
| 5%|▌ | 60/1149 [31:09<9:14:19, 30.54s/it] {'loss': 0.7177, 'learning_rate': 4.782608695652174e-05, 'global_step': 60, 'epoch': 0.16} | |
| 7%|▋ | 80/1149 [41:38<9:01:06, 30.37s/it] {'loss': 0.7374, 'learning_rate': 6.434782608695652e-05, 'global_step': 80, 'epoch': 0.21} | |
| 9%|▊ | 100/1149 [52:23<9:20:36, 32.07s/it] {'loss': 0.7097, 'learning_rate': 8.173913043478262e-05, 'global_step': 100, 'epoch': 0.26} | |
| 10%|█ | 120/1149 [1:02:39<8:30:44, 29.78s/it] {'loss': 0.7031, 'learning_rate': 9.91304347826087e-05, 'global_step': 120, 'epoch': 0.31} | |
| 12%|█▏ | 140/1149 [1:13:01<8:59:20, 32.07s/it] {'loss': 0.7144, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.37} | |
| 14%|█▍ | 160/1149 [1:23:27<8:29:48, 30.93s/it] {'loss': 0.7025, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.42} | |
| 16%|█▌ | 180/1149 [1:34:10<8:34:16, 31.84s/it] {'loss': 0.7132, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.47} | |
| 17%|█▋ | 200/1149 [1:44:37<8:29:05, 32.19s/it] {'loss': 0.6826, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.52} | |
| 19%|█▉ | 220/1149 [1:55:09<8:21:34, 32.39s/it] {'loss': 0.6852, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 0.57} | |
| 21%|██ | 240/1149 [2:05:47<7:59:14, 31.63s/it] {'loss': 0.6686, 'learning_rate': 0.0001, 'global_step': 240, 'epoch': 0.63} | |
| 23%|██▎ | 260/1149 [2:15:52<7:31:49, 30.49s/it] {'loss': 0.6625, 'learning_rate': 0.0001, 'global_step': 260, 'epoch': 0.68} | |
| 24%|██▍ | 280/1149 [2:26:47<8:13:58, 34.11s/it] {'loss': 0.6539, 'learning_rate': 0.0001, 'global_step': 280, 'epoch': 0.73} | |
| 26%|██▌ | 300/1149 [2:38:05<7:52:29, 33.39s/it] {'loss': 0.6656, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 0.78} | |
| 28%|██▊ | 320/1149 [2:49:08<7:21:51, 31.98s/it] {'loss': 0.6486, 'learning_rate': 0.0001, 'global_step': 320, 'epoch': 0.84} | |
| 30%|██▉ | 340/1149 [3:00:02<7:32:38, 33.57s/it] {'loss': 0.6383, 'learning_rate': 0.0001, 'global_step': 340, 'epoch': 0.89} | |
| 31%|███▏ | 360/1149 [3:11:05<7:20:41, 33.51s/it] {'loss': 0.6106, 'learning_rate': 0.0001, 'global_step': 360, 'epoch': 0.94} | |
| 33%|███▎ | 380/1149 [3:22:15<7:06:12, 33.25s/it] {'loss': 0.6263, 'learning_rate': 0.0001, 'global_step': 380, 'epoch': 0.99} | |
| 35%|███▍ | 400/1149 [3:33:21<7:10:55, 34.52s/it] {'loss': 0.6074, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 1.04} | |
| 37%|███▋ | 420/1149 [3:44:25<6:37:51, 32.75s/it] {'loss': 0.6066, 'learning_rate': 0.0001, 'global_step': 420, 'epoch': 1.1} | |
| 38%|███▊ | 440/1149 [3:55:24<6:42:09, 34.03s/it] {'loss': 0.5916, 'learning_rate': 0.0001, 'global_step': 440, 'epoch': 1.15} | |
| 40%|████ | 460/1149 [4:06:31<6:07:50, 32.03s/it] {'loss': 0.5747, 'learning_rate': 0.0001, 'global_step': 460, 'epoch': 1.2} | |
| 42%|████▏ | 480/1149 [4:17:53<6:22:47, 34.33s/it] {'loss': 0.5778, 'learning_rate': 0.0001, 'global_step': 480, 'epoch': 1.25} | |
| 44%|████▎ | 500/1149 [4:29:04<6:01:07, 33.39s/it] {'loss': 0.5881, 'learning_rate': 0.0001, 'global_step': 500, 'epoch': 1.31} | |
| 45%|████▌ | 520/1149 [4:39:57<5:50:16, 33.41s/it] {'loss': 0.5883, 'learning_rate': 0.0001, 'global_step': 520, 'epoch': 1.36} | |
| 47%|████▋ | 540/1149 [4:51:10<5:38:29, 33.35s/it] {'loss': 0.5526, 'learning_rate': 0.0001, 'global_step': 540, 'epoch': 1.41} | |
| 49%|████▊ | 560/1149 [5:02:23<5:34:02, 34.03s/it] {'loss': 0.5331, 'learning_rate': 0.0001, 'global_step': 560, 'epoch': 1.46} | |
| 50%|█████ | 580/1149 [5:13:32<5:31:15, 34.93s/it] {'loss': 0.5498, 'learning_rate': 0.0001, 'global_step': 580, 'epoch': 1.51} | |
| 50%|█████ | 580/1149 [5:13:32<5:31:15, 34.93s/it] 51%|█████ | 581/1149 [5:14:00<5:12:25, 33.00s/it] 51%|█████ | 582/1149 [5:14:34<5:13:42, 33.20s/it] 51%|█████ | 583/1149 [5:15:12<5:27:47, 34.75s/it] 51%|█████ | 584/1149 [5:15:48<5:28:33, 34.89s/it] 51%|█████ | 585/1149 [5:16:21<5:23:22, 34.40s/it] 51%|█████ | 586/1149 [5:16:56<5:24:53, 34.62s/it] 51%|█████ | 587/1149 [5:17:33<5:32:17, 35.48s/it] 51%|█████ | 588/1149 [5:18:03<5:15:17, 33.72s/it] 51%|█████▏ | 589/1149 [5:18:35<5:10:30, 33.27s/it] 51%|█████▏ | 590/1149 [5:19:07<5:06:22, 32.89s/it] 51%|█████▏ | 591/1149 [5:19:39<5:03:37, 32.65s/it] 52%|█████▏ | 592/1149 [5:20:14<5:08:13, 33.20s/it] 52%|█████▏ | 593/1149 [5:20:45<5:01:54, 32.58s/it] 52%|█████▏ | 594/1149 [5:21:21<5:10:15, 33.54s/it] 52%|█████▏ | 595/1149 [5:21:50<4:58:01, 32.28s/it] 52%|█████▏ | 596/1149 [5:22:26<5:08:40, 33.49s/it] 52%|█████▏ | 597/1149 [5:22:57<5:01:00, 32.72s/it] 52%|█████▏ | 598/1149 [5:23:31<5:03:00, 33.00s/it] 52%|█████▏ | 599/1149 [5:24:04<5:01:15, 32.87s/it] 52%|█████▏ | 600/1149 [5:24:34<4:54:42, 32.21s/it] {'loss': 0.5424, 'learning_rate': 0.0001, 'global_step': 600, 'epoch': 1.57} | |
| 52%|█████▏ | 600/1149 [5:24:35<4:54:42, 32.21s/it] 52%|█████▏ | 601/1149 [5:25:18<5:25:56, 35.69s/it] 52%|█████▏ | 602/1149 [5:25:49<5:13:37, 34.40s/it] 52%|█████▏ | 603/1149 [5:26:26<5:18:33, 35.01s/it] 53%|█████▎ | 604/1149 [5:27:04<5:25:23, 35.82s/it] 53%|█████▎ | 605/1149 [5:27:40<5:25:14, 35.87s/it] 53%|█████▎ | 606/1149 [5:28:14<5:20:28, 35.41s/it] 53%|█████▎ | 607/1149 [5:28:50<5:22:53, 35.74s/it] 53%|█████▎ | 608/1149 [5:29:22<5:11:21, 34.53s/it] 53%|█████▎ | 609/1149 [5:29:55<5:06:01, 34.00s/it] 53%|█████▎ | 610/1149 [5:30:29<5:05:55, 34.05s/it] 53%|█████▎ | 611/1149 [5:31:04<5:08:55, 34.45s/it] 53%|█████▎ | 612/1149 [5:31:39<5:09:58, 34.63s/it] 53%|█████▎ | 613/1149 [5:32:11<5:01:16, 33.72s/it] 53%|█████▎ | 614/1149 [5:32:44<4:58:23, 33.46s/it] 54%|█████▎ | 615/1149 [5:33:18<5:00:18, 33.74s/it] 54%|█████▎ | 616/1149 [5:33:53<5:01:42, 33.96s/it] 54%|█████▎ | 617/1149 [5:34:27<5:01:24, 33.99s/it] 54%|█████▍ | 618/1149 [5:34:58<4:53:05, 33.12s/it] 54%|█████▍ | 619/1149 [5:35:33<4:58:51, 33.83s/it] 54%|█████▍ | 620/1149 [5:36:11<5:08:45, 35.02s/it] {'loss': 0.5117, 'learning_rate': 0.0001, 'global_step': 620, 'epoch': 1.62} | |
| 54%|█████▍ | 620/1149 [5:36:11<5:08:45, 35.02s/it] 54%|█████▍ | 621/1149 [5:36:45<5:03:36, 34.50s/it] 54%|█████▍ | 622/1149 [5:37:20<5:05:02, 34.73s/it] 54%|█████▍ | 623/1149 [5:37:54<5:03:51, 34.66s/it] 54%|█████▍ | 624/1149 [5:38:29<5:03:19, 34.67s/it] 54%|█████▍ | 625/1149 [5:38:59<4:50:22, 33.25s/it] 54%|█████▍ | 626/1149 [5:39:33<4:52:10, 33.52s/it] 55%|█████▍ | 627/1149 [5:40:04<4:43:38, 32.60s/it] 55%|█████▍ | 628/1149 [5:40:40<4:53:05, 33.75s/it] 55%|█████▍ | 629/1149 [5:41:12<4:49:04, 33.36s/it] 55%|█████▍ | 630/1149 [5:41:46<4:49:57, 33.52s/it] 55%|█████▍ | 631/1149 [5:42:21<4:51:35, 33.78s/it] 55%|█████▌ | 632/1149 [5:42:54<4:49:57, 33.65s/it] 55%|█████▌ | 633/1149 [5:43:27<4:46:27, 33.31s/it] 55%|█████▌ | 634/1149 [5:44:02<4:51:22, 33.95s/it] 55%|█████▌ | 635/1149 [5:44:27<4:28:15, 31.32s/it] 55%|█████▌ | 636/1149 [5:44:59<4:29:31, 31.52s/it] 55%|█████▌ | 637/1149 [5:45:33<4:35:27, 32.28s/it] 56%|█████▌ | 638/1149 [5:46:04<4:30:55, 31.81s/it] 56%|█████▌ | 639/1149 [5:46:36<4:31:01, 31.89s/it] 56%|█████▌ | 640/1149 [5:47:11<4:37:32, 32.72s/it] {'loss': 0.4956, 'learning_rate': 0.0001, 'global_step': 640, 'epoch': 1.67} | |
| 56%|█████▌ | 640/1149 [5:47:11<4:37:32, 32.72s/it] 56%|█████▌ | 641/1149 [5:47:39<4:25:47, 31.39s/it] 56%|█████▌ | 642/1149 [5:48:10<4:23:15, 31.15s/it] 56%|█████▌ | 643/1149 [5:48:42<4:27:00, 31.66s/it] 56%|█████▌ | 644/1149 [5:49:19<4:38:46, 33.12s/it] 56%|█████▌ | 645/1149 [5:49:55<4:45:29, 33.99s/it] 56%|█████▌ | 646/1149 [5:50:29<4:45:04, 34.01s/it] 56%|█████▋ | 647/1149 [5:51:01<4:39:27, 33.40s/it] 56%|█████▋ | 648/1149 [5:51:34<4:37:01, 33.18s/it] 56%|█████▋ | 649/1149 [5:52:11<4:47:53, 34.55s/it] 57%|█████▋ | 650/1149 [5:52:47<4:49:02, 34.76s/it] 57%|█████▋ | 651/1149 [5:53:18<4:40:23, 33.78s/it] 57%|█████▋ | 652/1149 [5:53:49<4:33:07, 32.97s/it] 57%|█████▋ | 653/1149 [5:54:23<4:35:33, 33.33s/it] 57%|█████▋ | 654/1149 [5:54:57<4:35:06, 33.35s/it] 57%|█████▋ | 655/1149 [5:55:31<4:35:50, 33.50s/it] 57%|█████▋ | 656/1149 [5:56:03<4:33:13, 33.25s/it] 57%|█████▋ | 657/1149 [5:56:38<4:35:07, 33.55s/it] 57%|█████▋ | 658/1149 [5:57:14<4:42:05, 34.47s/it] 57%|█████▋ | 659/1149 [5:57:44<4:30:44, 33.15s/it] 57%|█████▋ | 660/1149 [5:58:17<4:29:04, 33.01s/it] {'loss': 0.4815, 'learning_rate': 0.0001, 'global_step': 660, 'epoch': 1.72} | |
| 57%|█████▋ | 660/1149 [5:58:17<4:29:04, 33.01s/it] 58%|█████▊ | 661/1149 [5:58:50<4:29:22, 33.12s/it] 58%|█████▊ | 662/1149 [5:59:27<4:37:08, 34.14s/it] 58%|█████▊ | 663/1149 [6:00:02<4:38:39, 34.40s/it] 58%|█████▊ | 664/1149 [6:00:36<4:38:00, 34.39s/it] 58%|█████▊ | 665/1149 [6:01:08<4:30:14, 33.50s/it] 58%|█████▊ | 666/1149 [6:01:37<4:19:12, 32.20s/it] 58%|█████▊ | 667/1149 [6:02:08<4:16:03, 31.87s/it] 58%|█████▊ | 668/1149 [6:02:46<4:31:11, 33.83s/it] 58%|█████▊ | 669/1149 [6:03:23<4:36:57, 34.62s/it] 58%|█████▊ | 670/1149 [6:03:59<4:40:33, 35.14s/it] 58%|█████▊ | 671/1149 [6:04:36<4:43:28, 35.58s/it] 58%|█████▊ | 672/1149 [6:05:08<4:35:48, 34.69s/it] 59%|█████▊ | 673/1149 [6:05:43<4:35:39, 34.75s/it] 59%|█████▊ | 674/1149 [6:06:22<4:45:32, 36.07s/it] 59%|█████▊ | 675/1149 [6:06:56<4:39:03, 35.32s/it] 59%|█████▉ | 676/1149 [6:07:28<4:29:44, 34.22s/it] 59%|█████▉ | 677/1149 [6:08:01<4:27:47, 34.04s/it] 59%|█████▉ | 678/1149 [6:08:36<4:29:54, 34.38s/it] 59%|█████▉ | 679/1149 [6:09:12<4:31:29, 34.66s/it] 59%|█████▉ | 680/1149 [6:09:42<4:21:42, 33.48s/it] {'loss': 0.4809, 'learning_rate': 0.0001, 'global_step': 680, 'epoch': 1.78} | |
| 59%|█████▉ | 680/1149 [6:09:43<4:21:42, 33.48s/it] 59%|█████▉ | 681/1149 [6:10:14<4:16:18, 32.86s/it] 59%|█████▉ | 682/1149 [6:10:51<4:25:00, 34.05s/it] 59%|█████▉ | 683/1149 [6:11:23<4:21:04, 33.61s/it] 60%|█████▉ | 684/1149 [6:11:57<4:21:47, 33.78s/it] 60%|█████▉ | 685/1149 [6:12:32<4:23:49, 34.11s/it] 60%|█████▉ | 686/1149 [6:13:10<4:30:55, 35.11s/it] 60%|█████▉ | 687/1149 [6:13:46<4:31:51, 35.31s/it] 60%|█████▉ | 688/1149 [6:14:18<4:23:41, 34.32s/it] 60%|█████▉ | 689/1149 [6:14:53<4:24:57, 34.56s/it] 60%|██████ | 690/1149 [6:15:20<4:07:34, 32.36s/it] 60%|██████ | 691/1149 [6:15:55<4:12:51, 33.13s/it] 60%|██████ | 692/1149 [6:16:29<4:14:07, 33.36s/it] 60%|██████ | 693/1149 [6:17:04<4:16:53, 33.80s/it] 60%|██████ | 694/1149 [6:17:33<4:06:22, 32.49s/it] 60%|██████ | 695/1149 [6:18:04<4:03:31, 32.18s/it] 61%|██████ | 696/1149 [6:18:38<4:05:03, 32.46s/it] 61%|██████ | 697/1149 [6:19:11<4:07:01, 32.79s/it] 61%|██████ | 698/1149 [6:19:46<4:10:19, 33.30s/it] 61%|██████ | 699/1149 [6:20:22<4:16:38, 34.22s/it] 61%|██████ | 700/1149 [6:20:47<3:56:05, 31.55s/it] {'loss': 0.4641, 'learning_rate': 0.0001, 'global_step': 700, 'epoch': 1.83} | |
| 61%|██████ | 700/1149 [6:20:48<3:56:05, 31.55s/it] 61%|██████ | 701/1149 [6:21:25<4:09:25, 33.40s/it] 61%|██████ | 702/1149 [6:21:58<4:08:00, 33.29s/it] 61%|██████ | 703/1149 [6:22:30<4:03:38, 32.78s/it] 61%|██████▏ | 704/1149 [6:23:03<4:04:21, 32.95s/it] 61%|██████▏ | 705/1149 [6:23:34<4:00:14, 32.47s/it] 61%|██████▏ | 706/1149 [6:24:11<4:08:32, 33.66s/it] 62%|██████▏ | 707/1149 [6:24:44<4:06:56, 33.52s/it] 62%|██████▏ | 708/1149 [6:25:16<4:02:52, 33.04s/it] 62%|██████▏ | 709/1149 [6:25:48<3:59:41, 32.69s/it] 62%|██████▏ | 710/1149 [6:26:17<3:52:38, 31.80s/it] 62%|██████▏ | 711/1149 [6:26:49<3:51:08, 31.66s/it] 62%|██████▏ | 712/1149 [6:27:23<3:56:19, 32.45s/it] 62%|██████▏ | 713/1149 [6:27:55<3:54:23, 32.25s/it] 62%|██████▏ | 714/1149 [6:28:29<3:57:43, 32.79s/it] 62%|██████▏ | 715/1149 [6:28:58<3:49:27, 31.72s/it] 62%|██████▏ | 716/1149 [6:29:32<3:53:20, 32.33s/it] 62%|██████▏ | 717/1149 [6:30:03<3:49:51, 31.93s/it] 62%|██████▏ | 718/1149 [6:30:34<3:48:26, 31.80s/it] 63%|██████▎ | 719/1149 [6:31:08<3:51:46, 32.34s/it] 63%|██████▎ | 720/1149 [6:31:41<3:52:41, 32.54s/it] {'loss': 0.4715, 'learning_rate': 0.0001, 'global_step': 720, 'epoch': 1.88} | |
| 63%|██████▎ | 720/1149 [6:31:41<3:52:41, 32.54s/it] 63%|██████▎ | 721/1149 [6:32:16<3:57:55, 33.35s/it] 63%|██████▎ | 722/1149 [6:32:50<3:57:07, 33.32s/it] 63%|██████▎ | 723/1149 [6:33:24<3:58:26, 33.58s/it] 63%|██████▎ | 724/1149 [6:33:56<3:55:24, 33.23s/it] 63%|██████▎ | 725/1149 [6:34:27<3:50:40, 32.64s/it] 63%|██████▎ | 726/1149 [6:34:58<3:46:26, 32.12s/it] 63%|██████▎ | 727/1149 [6:35:35<3:55:16, 33.45s/it] 63%|██████▎ | 728/1149 [6:36:05<3:48:01, 32.50s/it] 63%|██████▎ | 729/1149 [6:36:39<3:50:39, 32.95s/it] 64%|██████▎ | 730/1149 [6:37:12<3:49:55, 32.93s/it] 64%|██████▎ | 731/1149 [6:37:44<3:47:43, 32.69s/it] 64%|██████▎ | 732/1149 [6:38:17<3:48:06, 32.82s/it] 64%|██████▍ | 733/1149 [6:38:52<3:51:25, 33.38s/it] 64%|██████▍ | 734/1149 [6:39:25<3:51:07, 33.42s/it] 64%|██████▍ | 735/1149 [6:39:56<3:45:11, 32.64s/it] 64%|██████▍ | 736/1149 [6:40:32<3:50:16, 33.46s/it] 64%|██████▍ | 737/1149 [6:41:05<3:50:00, 33.50s/it] 64%|██████▍ | 738/1149 [6:41:39<3:51:02, 33.73s/it] 64%|██████▍ | 739/1149 [6:42:11<3:46:28, 33.14s/it] 64%|██████▍ | 740/1149 [6:42:45<3:46:36, 33.24s/it] {'loss': 0.4011, 'learning_rate': 0.0001, 'global_step': 740, 'epoch': 1.93} | |
| 64%|██████▍ | 740/1149 [6:42:45<3:46:36, 33.24s/it] 64%|██████▍ | 741/1149 [6:43:19<3:48:45, 33.64s/it] 65%|██████▍ | 742/1149 [6:43:52<3:46:59, 33.46s/it] 65%|██████▍ | 743/1149 [6:44:26<3:47:08, 33.57s/it] 65%|██████▍ | 744/1149 [6:44:59<3:45:21, 33.39s/it] 65%|██████▍ | 745/1149 [6:45:33<3:46:20, 33.61s/it] 65%|██████▍ | 746/1149 [6:46:07<3:46:41, 33.75s/it] 65%|██████▌ | 747/1149 [6:46:40<3:44:08, 33.45s/it] 65%|██████▌ | 748/1149 [6:47:10<3:36:21, 32.37s/it] 65%|██████▌ | 749/1149 [6:47:43<3:37:22, 32.61s/it] 65%|██████▌ | 750/1149 [6:48:16<3:37:04, 32.64s/it] 65%|██████▌ | 751/1149 [6:48:48<3:35:05, 32.43s/it] 65%|██████▌ | 752/1149 [6:49:23<3:39:47, 33.22s/it] 66%|██████▌ | 753/1149 [6:49:59<3:44:52, 34.07s/it] 66%|██████▌ | 754/1149 [6:50:34<3:45:56, 34.32s/it] 66%|██████▌ | 755/1149 [6:51:05<3:39:23, 33.41s/it] 66%|██████▌ | 756/1149 [6:51:35<3:31:07, 32.23s/it] 66%|██████▌ | 757/1149 [6:52:12<3:40:40, 33.78s/it] 66%|██████▌ | 758/1149 [6:52:46<3:41:39, 34.01s/it] 66%|██████▌ | 759/1149 [6:53:21<3:41:26, 34.07s/it] 66%|██████▌ | 760/1149 [6:53:57<3:44:36, 34.64s/it] {'loss': 0.4476, 'learning_rate': 0.0001, 'global_step': 760, 'epoch': 1.98} | |
| 66%|██████▌ | 760/1149 [6:53:57<3:44:36, 34.64s/it] 66%|██████▌ | 761/1149 [6:54:34<3:48:22, 35.32s/it] 66%|██████▋ | 762/1149 [6:55:05<3:39:33, 34.04s/it] 66%|██████▋ | 763/1149 [6:55:36<3:34:21, 33.32s/it] 66%|██████▋ | 764/1149 [6:56:06<3:26:51, 32.24s/it] 67%|██████▋ | 765/1149 [6:56:38<3:25:41, 32.14s/it] 67%|██████▋ | 766/1149 [6:57:13<3:31:27, 33.13s/it] 67%|██████▋ | 767/1149 [6:57:47<3:31:31, 33.22s/it] 67%|██████▋ | 768/1149 [6:58:23<3:37:05, 34.19s/it] 67%|██████▋ | 769/1149 [6:58:57<3:35:42, 34.06s/it] 67%|██████▋ | 770/1149 [6:59:32<3:36:52, 34.33s/it] 67%|██████▋ | 771/1149 [7:00:04<3:31:21, 33.55s/it] 67%|██████▋ | 772/1149 [7:00:34<3:25:08, 32.65s/it] 67%|██████▋ | 773/1149 [7:01:07<3:25:28, 32.79s/it] 67%|██████▋ | 774/1149 [7:01:40<3:24:58, 32.80s/it] 67%|██████▋ | 775/1149 [7:02:12<3:23:09, 32.59s/it] 68%|██████▊ | 776/1149 [7:02:48<3:29:04, 33.63s/it] 68%|██████▊ | 777/1149 [7:03:20<3:25:43, 33.18s/it] 68%|██████▊ | 778/1149 [7:03:51<3:20:22, 32.41s/it] 68%|██████▊ | 779/1149 [7:04:21<3:15:40, 31.73s/it] 68%|██████▊ | 780/1149 [7:04:55<3:18:22, 32.26s/it] {'loss': 0.3754, 'learning_rate': 0.0001, 'global_step': 780, 'epoch': 2.04} | |
| 68%|██████▊ | 780/1149 [7:04:55<3:18:22, 32.26s/it] 68%|██████▊ | 781/1149 [7:05:30<3:23:51, 33.24s/it] 68%|██████▊ | 782/1149 [7:06:07<3:29:08, 34.19s/it] 68%|██████▊ | 783/1149 [7:06:42<3:31:15, 34.63s/it] 68%|██████▊ | 784/1149 [7:07:16<3:28:18, 34.24s/it] 68%|██████▊ | 785/1149 [7:07:48<3:25:12, 33.83s/it] 68%|██████▊ | 786/1149 [7:08:23<3:26:04, 34.06s/it] 68%|██████▊ | 787/1149 [7:08:57<3:24:34, 33.91s/it] 69%|██████▊ | 788/1149 [7:09:32<3:26:48, 34.37s/it] 69%|██████▊ | 789/1149 [7:10:05<3:23:54, 33.99s/it] 69%|██████▉ | 790/1149 [7:10:37<3:19:05, 33.27s/it] 69%|██████▉ | 791/1149 [7:11:13<3:24:23, 34.26s/it] 69%|██████▉ | 792/1149 [7:11:47<3:23:02, 34.12s/it] 69%|██████▉ | 793/1149 [7:12:23<3:24:58, 34.55s/it] 69%|██████▉ | 794/1149 [7:12:52<3:15:35, 33.06s/it] 69%|██████▉ | 795/1149 [7:13:27<3:18:28, 33.64s/it] 69%|██████▉ | 796/1149 [7:14:01<3:18:15, 33.70s/it] 69%|██████▉ | 797/1149 [7:14:33<3:15:24, 33.31s/it] 69%|██████▉ | 798/1149 [7:15:05<3:11:11, 32.68s/it] 70%|██████▉ | 799/1149 [7:15:39<3:13:14, 33.13s/it] 70%|██████▉ | 800/1149 [7:16:10<3:09:53, 32.65s/it] {'loss': 0.4075, 'learning_rate': 0.0001, 'global_step': 800, 'epoch': 2.09} | |
| 70%|██████▉ | 800/1149 [7:16:11<3:09:53, 32.65s/it] 70%|██████▉ | 801/1149 [7:16:39<3:02:46, 31.51s/it] 70%|██████▉ | 802/1149 [7:17:11<3:02:38, 31.58s/it] 70%|██████▉ | 803/1149 [7:17:47<3:08:56, 32.77s/it] 70%|██████▉ | 804/1149 [7:18:20<3:10:13, 33.08s/it] 70%|███████ | 805/1149 [7:18:51<3:05:15, 32.31s/it] 70%|███████ | 806/1149 [7:19:27<3:11:49, 33.55s/it] 70%|███████ | 807/1149 [7:20:02<3:13:09, 33.89s/it] 70%|███████ | 808/1149 [7:20:34<3:09:39, 33.37s/it] 70%|███████ | 809/1149 [7:21:06<3:05:50, 32.80s/it] 70%|███████ | 810/1149 [7:21:42<3:11:24, 33.88s/it] 71%|███████ | 811/1149 [7:22:14<3:07:15, 33.24s/it] 71%|███████ | 812/1149 [7:22:46<3:04:29, 32.85s/it] 71%|███████ | 813/1149 [7:23:16<2:59:21, 32.03s/it] 71%|███████ | 814/1149 [7:23:48<2:58:45, 32.02s/it] 71%|███████ | 815/1149 [7:24:21<3:00:22, 32.40s/it] 71%|███████ | 816/1149 [7:24:50<2:54:29, 31.44s/it] 71%|███████ | 817/1149 [7:25:24<2:57:04, 32.00s/it] 71%|███████ | 818/1149 [7:25:55<2:54:55, 31.71s/it] 71%|███████▏ | 819/1149 [7:26:27<2:55:34, 31.92s/it] 71%|███████▏ | 820/1149 [7:27:02<2:59:29, 32.73s/it] {'loss': 0.3348, 'learning_rate': 0.0001, 'global_step': 820, 'epoch': 2.14} | |
| 71%|███████▏ | 820/1149 [7:27:02<2:59:29, 32.73s/it] 71%|███████▏ | 821/1149 [7:27:38<3:05:10, 33.87s/it] 72%|███████▏ | 822/1149 [7:28:11<3:02:56, 33.57s/it] 72%|███████▏ | 823/1149 [7:28:46<3:05:00, 34.05s/it] 72%|███████▏ | 824/1149 [7:29:23<3:08:43, 34.84s/it] 72%|███████▏ | 825/1149 [7:29:54<3:02:10, 33.74s/it] 72%|███████▏ | 826/1149 [7:30:25<2:57:24, 32.95s/it] 72%|███████▏ | 827/1149 [7:31:00<2:59:39, 33.48s/it] 72%|███████▏ | 828/1149 [7:31:36<3:03:19, 34.27s/it] 72%|███████▏ | 829/1149 [7:32:10<3:02:46, 34.27s/it] 72%|███████▏ | 830/1149 [7:32:44<3:00:47, 34.00s/it] 72%|███████▏ | 831/1149 [7:33:20<3:04:28, 34.81s/it] 72%|███████▏ | 832/1149 [7:33:52<2:59:12, 33.92s/it] 72%|███████▏ | 833/1149 [7:34:24<2:55:22, 33.30s/it] 73%|███████▎ | 834/1149 [7:34:54<2:50:03, 32.39s/it] 73%|███████▎ | 835/1149 [7:35:29<2:53:19, 33.12s/it] 73%|███████▎ | 836/1149 [7:36:05<2:57:29, 34.02s/it] 73%|███████▎ | 837/1149 [7:36:39<2:55:49, 33.81s/it] 73%|███████▎ | 838/1149 [7:37:13<2:56:26, 34.04s/it] 73%|███████▎ | 839/1149 [7:37:46<2:54:35, 33.79s/it] 73%|███████▎ | 840/1149 [7:38:17<2:49:52, 32.98s/it] {'loss': 0.3219, 'learning_rate': 0.0001, 'global_step': 840, 'epoch': 2.19} | |
| 73%|███████▎ | 840/1149 [7:38:18<2:49:52, 32.98s/it] 73%|███████▎ | 841/1149 [7:38:51<2:49:43, 33.06s/it] 73%|███████▎ | 842/1149 [7:39:21<2:45:30, 32.35s/it] 73%|███████▎ | 843/1149 [7:39:53<2:43:19, 32.03s/it] 73%|███████▎ | 844/1149 [7:40:30<2:51:30, 33.74s/it] 74%|███████▎ | 845/1149 [7:41:00<2:45:18, 32.63s/it] 74%|███████▎ | 846/1149 [7:41:33<2:44:00, 32.48s/it] 74%|███████▎ | 847/1149 [7:42:06<2:44:13, 32.63s/it] 74%|███████▍ | 848/1149 [7:42:42<2:48:51, 33.66s/it] 74%|███████▍ | 849/1149 [7:43:17<2:51:19, 34.26s/it] 74%|███████▍ | 850/1149 [7:43:54<2:54:54, 35.10s/it] 74%|███████▍ | 851/1149 [7:44:29<2:54:15, 35.09s/it] 74%|███████▍ | 852/1149 [7:45:02<2:50:38, 34.47s/it] 74%|███████▍ | 853/1149 [7:45:33<2:44:43, 33.39s/it] 74%|███████▍ | 854/1149 [7:46:10<2:49:39, 34.51s/it] 74%|███████▍ | 855/1149 [7:46:44<2:47:24, 34.17s/it] 74%|███████▍ | 856/1149 [7:47:20<2:49:15, 34.66s/it] 75%|███████▍ | 857/1149 [7:47:51<2:44:30, 33.80s/it] 75%|███████▍ | 858/1149 [7:48:26<2:45:16, 34.08s/it] 75%|███████▍ | 859/1149 [7:49:00<2:44:55, 34.12s/it] 75%|███████▍ | 860/1149 [7:49:37<2:47:35, 34.79s/it] {'loss': 0.3434, 'learning_rate': 0.0001, 'global_step': 860, 'epoch': 2.25} | |
| 75%|███████▍ | 860/1149 [7:49:37<2:47:35, 34.79s/it] 75%|███████▍ | 861/1149 [7:50:09<2:44:04, 34.18s/it] 75%|███████▌ | 862/1149 [7:50:39<2:36:32, 32.73s/it] 75%|███████▌ | 863/1149 [7:51:17<2:43:41, 34.34s/it] 75%|███████▌ | 864/1149 [7:51:53<2:46:04, 34.96s/it] 75%|███████▌ | 865/1149 [7:52:29<2:45:53, 35.05s/it] 75%|███████▌ | 866/1149 [7:53:03<2:44:32, 34.89s/it] 75%|███████▌ | 867/1149 [7:53:32<2:36:13, 33.24s/it] 76%|███████▌ | 868/1149 [7:54:08<2:38:16, 33.79s/it] 76%|███████▌ | 869/1149 [7:54:40<2:36:28, 33.53s/it] 76%|███████▌ | 870/1149 [7:55:15<2:36:46, 33.71s/it] 76%|███████▌ | 871/1149 [7:55:49<2:37:49, 34.06s/it] 76%|███████▌ | 872/1149 [7:56:24<2:38:25, 34.32s/it] 76%|███████▌ | 873/1149 [7:57:01<2:40:49, 34.96s/it] 76%|███████▌ | 874/1149 [7:57:30<2:32:37, 33.30s/it] 76%|███████▌ | 875/1149 [7:58:02<2:29:56, 32.83s/it] 76%|███████▌ | 876/1149 [7:58:38<2:33:24, 33.72s/it] 76%|███████▋ | 877/1149 [7:59:10<2:31:04, 33.33s/it] 76%|███████▋ | 878/1149 [7:59:43<2:30:13, 33.26s/it] 77%|███████▋ | 879/1149 [8:00:14<2:26:48, 32.62s/it] 77%|███████▋ | 880/1149 [8:00:42<2:19:35, 31.13s/it] {'loss': 0.3409, 'learning_rate': 0.0001, 'global_step': 880, 'epoch': 2.3} | |
| 77%|███████▋ | 880/1149 [8:00:42<2:19:35, 31.13s/it] 77%|███████▋ | 881/1149 [8:01:19<2:27:09, 32.94s/it] 77%|███████▋ | 882/1149 [8:01:58<2:34:04, 34.62s/it] 77%|███████▋ | 883/1149 [8:02:29<2:28:22, 33.47s/it] 77%|███████▋ | 884/1149 [8:03:01<2:26:05, 33.08s/it] 77%|███████▋ | 885/1149 [8:03:30<2:20:44, 31.99s/it] 77%|███████▋ | 886/1149 [8:04:02<2:19:32, 31.84s/it] 77%|███████▋ | 887/1149 [8:04:33<2:18:32, 31.73s/it] 77%|███████▋ | 888/1149 [8:05:02<2:13:52, 30.78s/it] 77%|███████▋ | 889/1149 [8:05:33<2:14:32, 31.05s/it] 77%|███████▋ | 890/1149 [8:06:06<2:16:19, 31.58s/it] 78%|███████▊ | 891/1149 [8:06:42<2:20:36, 32.70s/it] 78%|███████▊ | 892/1149 [8:07:12<2:17:36, 32.13s/it] 78%|███████▊ | 893/1149 [8:07:45<2:17:52, 32.32s/it] 78%|███████▊ | 894/1149 [8:08:17<2:17:16, 32.30s/it] 78%|███████▊ | 895/1149 [8:08:52<2:19:25, 32.93s/it] 78%|███████▊ | 896/1149 [8:09:26<2:20:50, 33.40s/it] 78%|███████▊ | 897/1149 [8:10:02<2:22:40, 33.97s/it] 78%|███████▊ | 898/1149 [8:10:34<2:20:35, 33.61s/it] 78%|███████▊ | 899/1149 [8:11:07<2:19:07, 33.39s/it] 78%|███████▊ | 900/1149 [8:11:43<2:20:58, 33.97s/it] {'loss': 0.3337, 'learning_rate': 0.0001, 'global_step': 900, 'epoch': 2.35} | |
| 78%|███████▊ | 900/1149 [8:11:43<2:20:58, 33.97s/it] 78%|███████▊ | 901/1149 [8:12:18<2:21:47, 34.31s/it] 79%|███████▊ | 902/1149 [8:12:54<2:23:53, 34.95s/it] 79%|███████▊ | 903/1149 [8:13:26<2:19:49, 34.10s/it] 79%|███████▊ | 904/1149 [8:14:00<2:18:55, 34.02s/it] 79%|███████▉ | 905/1149 [8:14:29<2:11:58, 32.45s/it] 79%|███████▉ | 906/1149 [8:15:07<2:18:32, 34.21s/it] 79%|███████▉ | 907/1149 [8:15:40<2:16:49, 33.92s/it] 79%|███████▉ | 908/1149 [8:16:12<2:13:17, 33.18s/it] 79%|███████▉ | 909/1149 [8:16:45<2:12:06, 33.03s/it] 79%|███████▉ | 910/1149 [8:17:17<2:11:17, 32.96s/it] 79%|███████▉ | 911/1149 [8:17:53<2:14:26, 33.89s/it] 79%|███████▉ | 912/1149 [8:18:19<2:04:29, 31.52s/it] 79%|███████▉ | 913/1149 [8:18:53<2:06:30, 32.16s/it] 80%|███████▉ | 914/1149 [8:19:29<2:10:54, 33.42s/it] 80%|███████▉ | 915/1149 [8:20:06<2:13:33, 34.25s/it] 80%|███████▉ | 916/1149 [8:20:41<2:14:27, 34.62s/it] 80%|███████▉ | 917/1149 [8:21:18<2:16:26, 35.29s/it] 80%|███████▉ | 918/1149 [8:21:53<2:15:32, 35.21s/it] 80%|███████▉ | 919/1149 [8:22:27<2:14:07, 34.99s/it] 80%|████████ | 920/1149 [8:23:00<2:11:18, 34.40s/it] {'loss': 0.3103, 'learning_rate': 0.0001, 'global_step': 920, 'epoch': 2.4} | |
| 80%|████████ | 920/1149 [8:23:01<2:11:18, 34.40s/it] 80%|████████ | 921/1149 [8:23:32<2:07:28, 33.55s/it] 80%|████████ | 922/1149 [8:24:08<2:10:01, 34.37s/it] 80%|████████ | 923/1149 [8:24:39<2:05:35, 33.34s/it] 80%|████████ | 924/1149 [8:25:10<2:01:59, 32.53s/it] 81%|████████ | 925/1149 [8:25:43<2:01:59, 32.67s/it] 81%|████████ | 926/1149 [8:26:18<2:04:02, 33.37s/it] 81%|████████ | 927/1149 [8:26:52<2:04:03, 33.53s/it] 81%|████████ | 928/1149 [8:27:29<2:08:03, 34.77s/it] 81%|████████ | 929/1149 [8:28:00<2:03:06, 33.57s/it] 81%|████████ | 930/1149 [8:28:37<2:05:37, 34.42s/it] 81%|████████ | 931/1149 [8:29:06<1:59:33, 32.91s/it] 81%|████████ | 932/1149 [8:29:43<2:03:05, 34.03s/it] 81%|████████ | 933/1149 [8:30:19<2:05:09, 34.77s/it] 81%|████████▏ | 934/1149 [8:30:52<2:02:18, 34.13s/it] 81%|████████▏ | 935/1149 [8:31:21<1:56:30, 32.67s/it] 81%|████████▏ | 936/1149 [8:31:56<1:58:52, 33.49s/it] 82%|████████▏ | 937/1149 [8:32:31<1:59:13, 33.74s/it] 82%|████████▏ | 938/1149 [8:33:06<2:00:13, 34.19s/it] 82%|████████▏ | 939/1149 [8:33:38<1:57:51, 33.67s/it] 82%|████████▏ | 940/1149 [8:34:12<1:57:36, 33.77s/it] {'loss': 0.3431, 'learning_rate': 0.0001, 'global_step': 940, 'epoch': 2.45} | |
| 82%|████████▏ | 940/1149 [8:34:13<1:57:36, 33.77s/it] 82%|████████▏ | 941/1149 [8:34:46<1:56:19, 33.56s/it] 82%|████████▏ | 942/1149 [8:35:20<1:56:58, 33.90s/it] 82%|████████▏ | 943/1149 [8:35:55<1:57:02, 34.09s/it] 82%|████████▏ | 944/1149 [8:36:34<2:01:27, 35.55s/it] 82%|████████▏ | 945/1149 [8:37:06<1:57:17, 34.50s/it] 82%|████████▏ | 946/1149 [8:37:41<1:57:25, 34.71s/it] 82%|████████▏ | 947/1149 [8:38:15<1:56:20, 34.56s/it] 83%|████████▎ | 948/1149 [8:38:50<1:55:39, 34.53s/it] 83%|████████▎ | 949/1149 [8:39:20<1:50:35, 33.18s/it] 83%|████████▎ | 950/1149 [8:39:49<1:46:00, 31.96s/it] 83%|████████▎ | 951/1149 [8:40:23<1:47:33, 32.59s/it] 83%|████████▎ | 952/1149 [8:40:56<1:47:58, 32.88s/it] 83%|████████▎ | 953/1149 [8:41:29<1:46:59, 32.75s/it] 83%|████████▎ | 954/1149 [8:42:02<1:46:27, 32.76s/it] 83%|████████▎ | 955/1149 [8:42:32<1:43:53, 32.13s/it] 83%|████████▎ | 956/1149 [8:43:07<1:45:25, 32.77s/it] 83%|████████▎ | 957/1149 [8:43:42<1:47:05, 33.47s/it] 83%|████████▎ | 958/1149 [8:44:10<1:41:14, 31.80s/it] 83%|████████▎ | 959/1149 [8:44:42<1:41:26, 32.03s/it] 84%|████████▎ | 960/1149 [8:45:16<1:42:18, 32.48s/it] {'loss': 0.3256, 'learning_rate': 0.0001, 'global_step': 960, 'epoch': 2.51} | |
| 84%|████████▎ | 960/1149 [8:45:16<1:42:18, 32.48s/it] 84%|████████▎ | 961/1149 [8:45:54<1:47:23, 34.27s/it] 84%|████████▎ | 962/1149 [8:46:27<1:45:06, 33.73s/it] 84%|████████▍ | 963/1149 [8:47:04<1:48:08, 34.88s/it] 84%|████████▍ | 964/1149 [8:47:33<1:41:40, 32.98s/it] 84%|████████▍ | 965/1149 [8:48:06<1:41:36, 33.13s/it] 84%|████████▍ | 966/1149 [8:48:44<1:45:47, 34.68s/it] 84%|████████▍ | 967/1149 [8:49:20<1:45:37, 34.82s/it] 84%|████████▍ | 968/1149 [8:49:53<1:43:27, 34.30s/it] 84%|████████▍ | 969/1149 [8:50:28<1:43:31, 34.51s/it] 84%|████████▍ | 970/1149 [8:51:05<1:45:37, 35.40s/it] 85%|████████▍ | 971/1149 [8:51:35<1:39:49, 33.65s/it] 85%|████████▍ | 972/1149 [8:52:07<1:38:02, 33.23s/it] 85%|████████▍ | 973/1149 [8:52:39<1:36:16, 32.82s/it] 85%|████████▍ | 974/1149 [8:53:11<1:35:03, 32.59s/it] 85%|████████▍ | 975/1149 [8:53:45<1:36:03, 33.12s/it] 85%|████████▍ | 976/1149 [8:54:16<1:33:41, 32.50s/it] 85%|████████▌ | 977/1149 [8:54:52<1:36:03, 33.51s/it] 85%|████████▌ | 978/1149 [8:55:21<1:31:54, 32.25s/it] 85%|████████▌ | 979/1149 [8:55:58<1:34:57, 33.51s/it] 85%|████████▌ | 980/1149 [8:56:29<1:32:13, 32.74s/it] {'loss': 0.2869, 'learning_rate': 0.0001, 'global_step': 980, 'epoch': 2.56} | |
| 85%|████████▌ | 980/1149 [8:56:29<1:32:13, 32.74s/it] 85%|████████▌ | 981/1149 [8:57:03<1:32:38, 33.09s/it] 85%|████████▌ | 982/1149 [8:57:35<1:31:42, 32.95s/it] 86%|████████▌ | 983/1149 [8:58:06<1:29:16, 32.27s/it] 86%|████████▌ | 984/1149 [8:58:45<1:34:09, 34.24s/it] 86%|████████▌ | 985/1149 [8:59:16<1:31:21, 33.42s/it] 86%|████████▌ | 986/1149 [8:59:53<1:33:27, 34.40s/it] 86%|████████▌ | 987/1149 [9:00:31<1:35:25, 35.34s/it] 86%|████████▌ | 988/1149 [9:01:07<1:35:26, 35.57s/it] 86%|████████▌ | 989/1149 [9:01:41<1:33:45, 35.16s/it] 86%|████████▌ | 990/1149 [9:02:17<1:34:11, 35.55s/it] 86%|████████▌ | 991/1149 [9:02:49<1:30:34, 34.39s/it] 86%|████████▋ | 992/1149 [9:03:22<1:28:43, 33.90s/it] 86%|████████▋ | 993/1149 [9:03:56<1:28:25, 34.01s/it] 87%|████████▋ | 994/1149 [9:04:32<1:28:56, 34.43s/it] 87%|████████▋ | 995/1149 [9:05:07<1:28:50, 34.61s/it] 87%|████████▋ | 996/1149 [9:05:38<1:25:52, 33.68s/it] 87%|████████▋ | 997/1149 [9:06:11<1:24:34, 33.39s/it] 87%|████████▋ | 998/1149 [9:06:45<1:24:48, 33.70s/it] 87%|████████▋ | 999/1149 [9:07:20<1:25:00, 34.00s/it] 87%|████████▋ | 1000/1149 [9:07:54<1:24:31, 34.04s/it] {'loss': 0.2756, 'learning_rate': 0.0001, 'global_step': 1000, 'epoch': 2.61} | |
| 87%|████████▋ | 1000/1149 [9:07:54<1:24:31, 34.04s/it] 87%|████████▋ | 1001/1149 [9:08:25<1:21:45, 33.14s/it] 87%|████████▋ | 1002/1149 [9:09:01<1:23:06, 33.92s/it] 87%|████████▋ | 1003/1149 [9:09:38<1:25:13, 35.03s/it] 87%|████████▋ | 1004/1149 [9:10:12<1:23:24, 34.51s/it] 87%|████████▋ | 1005/1149 [9:10:47<1:23:21, 34.73s/it] 88%|████████▊ | 1006/1149 [9:11:22<1:22:38, 34.67s/it] 88%|████████▊ | 1007/1149 [9:11:56<1:22:09, 34.71s/it] 88%|████████▊ | 1008/1149 [9:12:26<1:18:08, 33.25s/it] 88%|████████▊ | 1009/1149 [9:13:00<1:18:14, 33.53s/it] 88%|████████▊ | 1010/1149 [9:13:31<1:15:33, 32.62s/it] 88%|████████▊ | 1011/1149 [9:14:07<1:17:41, 33.78s/it] 88%|████████▊ | 1012/1149 [9:14:40<1:16:13, 33.38s/it] 88%|████████▊ | 1013/1149 [9:15:14<1:15:55, 33.50s/it] 88%|████████▊ | 1014/1149 [9:15:48<1:15:57, 33.76s/it] 88%|████████▊ | 1015/1149 [9:16:21<1:14:59, 33.58s/it] 88%|████████▊ | 1016/1149 [9:16:54<1:13:45, 33.28s/it] 89%|████████▊ | 1017/1149 [9:17:29<1:14:32, 33.88s/it] 89%|████████▊ | 1018/1149 [9:17:54<1:08:16, 31.27s/it] 89%|████████▊ | 1019/1149 [9:18:26<1:08:12, 31.48s/it] 89%|████████▉ | 1020/1149 [9:19:00<1:09:22, 32.27s/it] {'loss': 0.2265, 'learning_rate': 0.0001, 'global_step': 1020, 'epoch': 2.66} | |
| 89%|████████▉ | 1020/1149 [9:19:00<1:09:22, 32.27s/it] 89%|████████▉ | 1021/1149 [9:19:31<1:07:47, 31.77s/it] 89%|████████▉ | 1022/1149 [9:20:03<1:07:24, 31.85s/it] 89%|████████▉ | 1023/1149 [9:20:37<1:08:35, 32.66s/it] 89%|████████▉ | 1024/1149 [9:21:06<1:05:22, 31.38s/it] 89%|████████▉ | 1025/1149 [9:21:36<1:04:22, 31.15s/it] 89%|████████▉ | 1026/1149 [9:22:09<1:04:53, 31.66s/it] 89%|████████▉ | 1027/1149 [9:22:46<1:07:19, 33.11s/it] 89%|████████▉ | 1028/1149 [9:23:22<1:08:31, 33.98s/it] 90%|████████▉ | 1029/1149 [9:23:56<1:07:57, 33.98s/it] 90%|████████▉ | 1030/1149 [9:24:28<1:06:14, 33.40s/it] 90%|████████▉ | 1031/1149 [9:25:00<1:05:14, 33.18s/it] 90%|████████▉ | 1032/1149 [9:25:38<1:07:21, 34.54s/it] 90%|████████▉ | 1033/1149 [9:26:13<1:07:10, 34.75s/it] 90%|████████▉ | 1034/1149 [9:26:45<1:04:43, 33.77s/it] 90%|█████████ | 1035/1149 [9:27:16<1:02:45, 33.03s/it] 90%|█████████ | 1036/1149 [9:27:50<1:02:51, 33.38s/it] 90%|█████████ | 1037/1149 [9:28:24<1:02:19, 33.39s/it] 90%|█████████ | 1038/1149 [9:28:58<1:02:02, 33.54s/it] 90%|█████████ | 1039/1149 [9:29:30<1:01:02, 33.29s/it] 91%|█████████ | 1040/1149 [9:30:05<1:00:59, 33.58s/it] {'loss': 0.2489, 'learning_rate': 0.0001, 'global_step': 1040, 'epoch': 2.72} | |
| 91%|█████████ | 1040/1149 [9:30:05<1:00:59, 33.58s/it] 91%|█████████ | 1041/1149 [9:30:41<1:01:56, 34.41s/it] 91%|█████████ | 1042/1149 [9:31:11<59:06, 33.15s/it] 91%|█████████ | 1043/1149 [9:31:44<58:20, 33.03s/it] 91%|█████████ | 1044/1149 [9:32:17<57:58, 33.13s/it] 91%|█████████ | 1045/1149 [9:32:54<59:09, 34.13s/it] 91%|█████████ | 1046/1149 [9:33:29<58:59, 34.37s/it] 91%|█████████ | 1047/1149 [9:34:03<58:25, 34.37s/it] 91%|█████████ | 1048/1149 [9:34:34<56:15, 33.42s/it] 91%|█████████▏| 1049/1149 [9:35:04<53:37, 32.17s/it] 91%|█████████▏| 1050/1149 [9:35:35<52:34, 31.86s/it] 91%|█████████▏| 1051/1149 [9:36:13<55:24, 33.92s/it] 92%|█████████▏| 1052/1149 [9:36:50<56:03, 34.67s/it] 92%|█████████▏| 1053/1149 [9:37:26<56:14, 35.15s/it] 92%|█████████▏| 1054/1149 [9:38:02<56:13, 35.51s/it] 92%|█████████▏| 1055/1149 [9:38:35<54:16, 34.64s/it] 92%|█████████▏| 1056/1149 [9:39:10<53:48, 34.72s/it] 92%|█████████▏| 1057/1149 [9:39:49<55:14, 36.02s/it] 92%|█████████▏| 1058/1149 [9:40:23<53:33, 35.31s/it] 92%|█████████▏| 1059/1149 [9:40:54<51:11, 34.13s/it] 92%|█████████▏| 1060/1149 [9:41:28<50:25, 33.99s/it] {'loss': 0.2473, 'learning_rate': 0.0001, 'global_step': 1060, 'epoch': 2.77} | |
| 94%|█████████▍| 1080/1149 [9:52:37<37:43, 32.80s/it] {'loss': 0.2297, 'learning_rate': 0.0001, 'global_step': 1080, 'epoch': 2.82} | |
| 96%|█████████▌| 1100/1149 [10:03:29<26:02, 31.89s/it] {'loss': 0.232, 'learning_rate': 0.0001, 'global_step': 1100, 'epoch': 2.87} | |
| 97%|█████████▋| 1120/1149 [10:14:31<16:12, 33.53s/it] {'loss': 0.182, 'learning_rate': 0.0001, 'global_step': 1120, 'epoch': 2.92} | |
| 99%|█████████▉| 1140/1149 [10:25:39<05:04, 33.82s/it] {'loss': 0.2187, 'learning_rate': 0.0001, 'global_step': 1140, 'epoch': 2.98} | |
| 100%|██████████| 1149/1149 [10:30:39<00:00, 33.09s/it] {'train_runtime': 37840.3688, 'train_samples_per_second': 1.214, 'train_steps_per_second': 0.03, 'train_loss': 0.6321681946438225, 'epoch': 3.0} | |
| 100%|██████████| 1149/1149 [10:30:40<00:00, 32.93s/it] | |
| ***** train metrics ***** | |
| epoch = 3.0 | |
| train_loss = 0.6322 | |
| train_runtime = 10:30:40.36 | |
| train_samples_per_second = 1.214 | |
| train_steps_per_second = 0.03 | |