model training desc: knowledge selection on the QuALITY dataset, trained with randomly selected knowledge and key sentences
2023-12-16 23:07:37.734 | INFO | __main__:init_components:108 - Initializing components...
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:48<00:48, 48.58s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:02<00:00, 28.35s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:02<00:00, 31.38s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-16 23:08:41.148 | INFO | __main__:init_components:155 -
2023-12-16 23:08:41.148 | INFO | __main__:init_components:156 - ********************
2023-12-16 23:08:41.148 | INFO | __main__:init_components:157 - using llama2 model
2023-12-16 23:08:41.148 | INFO | __main__:init_components:158 - ********************
2023-12-16 23:08:41.148 | INFO | __main__:init_components:159 -
memory footprint of model: 4.024436950683594 GB
trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305
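The trainable-parameter percentage on the line above can be reproduced from the two counts it prints. This is a minimal sketch that uses only those logged numbers; the function name `trainable_percent` is hypothetical and is not taken from the training script.

```python
def trainable_percent(trainable: int, total: int) -> float:
    """Return the share of trainable parameters as a percentage."""
    return 100.0 * trainable / total


# Counts copied from the log line above.
pct = trainable_percent(319_815_680, 7_058_231_296)
print(f"trainable%: {pct:.4f}")
```

Roughly 4.5% of the weights receive gradient updates, which fits an adapter-style fine-tune rather than full fine-tuning.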
2023-12-16 23:08:44.549 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/random_select/with_knowledge_without_select_instruction/train.jsonl
2023-12-16 23:08:44.647 | INFO | component.dataset:__init__:19 - there are 2523 data in dataset
2023-12-16 23:08:44.773 | INFO | __main__:main:231 - *** starting training ***
0%| | 0/630 [00:00<?, ?it/s]
0%| | 1/630 [01:28<15:29:52, 88.70s/it]
0%| | 2/630 [02:48<14:34:48, 83.58s/it]
0%| | 3/630 [04:10<14:23:46, 82.66s/it]
1%| | 4/630 [05:32<14:18:41, 82.30s/it]
1%| | 5/630 [06:53<14:15:21, 82.11s/it]
1%| | 6/630 [08:16<14:15:39, 82.28s/it]
1%| | 7/630 [09:38<14:12:59, 82.15s/it]
1%|▏ | 8/630 [10:59<14:09:57, 81.99s/it]
1%|▏ | 9/630 [12:21<14:08:12, 81.95s/it]
2%|▏ | 10/630 [13:16<12:38:25, 73.40s/it]
2%|▏ | 11/630 [13:52<10:41:19, 62.16s/it]
2%|▏ | 12/630 [14:29<9:20:14, 54.39s/it]
2%|▏ | 13/630 [15:05<8:23:57, 49.01s/it]
2%|▏ | 14/630 [15:42<7:43:09, 45.11s/it]
2%|▏ | 15/630 [16:18<7:16:08, 42.55s/it]
3%|▎ | 16/630 [16:55<6:57:10, 40.77s/it]
3%|▎ | 17/630 [17:32<6:44:01, 39.55s/it]
3%|▎ | 18/630 [18:08<6:34:23, 38.67s/it]
3%|▎ | 19/630 [18:45<6:28:19, 38.13s/it]
3%|▎ | 20/630 [19:20<6:17:18, 37.11s/it]
{'loss': 0.5195, 'learning_rate': 3.1746031746031745e-05, 'global_step': 20, 'epoch': 0.1}
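Metric lines such as the one above are printed as Python dict literals, so they can be separated from the progress-bar lines and parsed with the standard library. The following is a minimal sketch; the helper name `parse_metrics` and the two-line `sample` are illustrative assumptions, not part of the training code.

```python
import ast


def parse_metrics(lines):
    """Collect the dict-literal metric lines and skip everything else."""
    records = []
    for line in lines:
        line = line.strip()
        # Metric lines start with "{" and end with "}"; tqdm bars do not.
        if line.startswith("{") and line.endswith("}"):
            records.append(ast.literal_eval(line))
    return records


sample = [
    "3%| | 20/630 [19:20<6:17:18, 37.11s/it]",
    "{'loss': 0.5195, 'learning_rate': 3.1746031746031745e-05, 'global_step': 20, 'epoch': 0.1}",
]
records = parse_metrics(sample)
print(records[0]["global_step"], records[0]["loss"])
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for log text.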
3%|▎ | 21/630 [19:57<6:15:54, 37.03s/it]
3%|▎ | 22/630 [20:33<6:14:20, 36.94s/it]
4%|▎ | 23/630 [21:10<6:12:23, 36.81s/it]
4%|▍ | 24/630 [21:47<6:11:31, 36.78s/it]
4%|▍ | 25/630 [22:23<6:10:36, 36.75s/it]
4%|▍ | 26/630 [23:00<6:09:53, 36.74s/it]
4%|▍ | 27/630 [23:37<6:09:29, 36.76s/it]
4%|▍ | 28/630 [24:14<6:08:44, 36.75s/it]
5%|▍ | 29/630 [24:50<6:06:48, 36.62s/it]
5%|▍ | 30/630 [25:27<6:06:30, 36.65s/it]
5%|▍ | 31/630 [26:03<6:06:41, 36.73s/it]
5%|▌ | 32/630 [26:40<6:05:45, 36.70s/it]
5%|▌ | 33/630 [27:17<6:05:11, 36.70s/it]
5%|▌ | 34/630 [27:54<6:04:39, 36.71s/it]
6%|▌ | 35/630 [28:30<6:03:47, 36.68s/it]
6%|▌ | 36/630 [29:07<6:03:48, 36.75s/it]
6%|▌ | 37/630 [29:44<6:02:54, 36.72s/it]
6%|▌ | 38/630 [30:20<6:02:17, 36.72s/it]
6%|▌ | 39/630 [30:57<6:01:33, 36.71s/it]
6%|▋ | 40/630 [31:34<6:00:55, 36.70s/it]
{'loss': 0.5055, 'learning_rate': 6.349206349206349e-05, 'global_step': 40, 'epoch': 0.19}
7%|▋ | 41/630 [32:10<6:00:05, 36.68s/it]
7%|▋ | 42/630 [32:47<5:59:28, 36.68s/it]
7%|▋ | 43/630 [33:24<5:58:37, 36.66s/it]
7%|▋ | 44/630 [34:00<5:57:55, 36.65s/it]
7%|▋ | 45/630 [34:37<5:57:30, 36.67s/it]
7%|▋ | 46/630 [35:14<5:57:03, 36.68s/it]
7%|▋ | 47/630 [35:50<5:56:33, 36.70s/it]
8%|▊ | 48/630 [36:27<5:56:19, 36.74s/it]
8%|▊ | 49/630 [37:04<5:55:23, 36.70s/it]
8%|▊ | 50/630 [37:41<5:55:16, 36.75s/it]
8%|▊ | 51/630 [38:17<5:54:16, 36.71s/it]
8%|▊ | 52/630 [38:54<5:53:36, 36.71s/it]
8%|▊ | 53/630 [39:31<5:52:58, 36.70s/it]
9%|▊ | 54/630 [40:08<5:52:24, 36.71s/it]
9%|▊ | 55/630 [40:44<5:51:43, 36.70s/it]
9%|▉ | 56/630 [41:21<5:51:02, 36.69s/it]
9%|▉ | 57/630 [41:58<5:50:25, 36.69s/it]
9%|▉ | 58/630 [42:34<5:49:38, 36.67s/it]
9%|▉ | 59/630 [43:11<5:49:06, 36.68s/it]
10%|▉ | 60/630 [43:48<5:48:36, 36.70s/it]
{'loss': 0.5483, 'learning_rate': 9.523809523809524e-05, 'global_step': 60, 'epoch': 0.29}
10%|▉ | 61/630 [44:24<5:48:05, 36.71s/it]
10%|▉ | 62/630 [45:01<5:47:12, 36.68s/it]
10%|█ | 63/630 [45:38<5:46:13, 36.64s/it]
10%|█ | 64/630 [46:14<5:45:51, 36.66s/it]
10%|█ | 65/630 [46:51<5:45:21, 36.68s/it]
10%|█ | 66/630 [47:28<5:45:26, 36.75s/it]
11%|█ | 67/630 [48:05<5:44:57, 36.76s/it]
11%|█ | 68/630 [48:41<5:44:27, 36.78s/it]
11%|█ | 69/630 [49:18<5:43:38, 36.75s/it]
11%|█ | 70/630 [49:54<5:41:28, 36.59s/it]
11%|█▏ | 71/630 [50:31<5:41:02, 36.61s/it]
11%|█▏ | 72/630 [51:08<5:40:28, 36.61s/it]
12%|█▏ | 73/630 [51:44<5:39:51, 36.61s/it]
12%|█▏ | 74/630 [52:21<5:39:47, 36.67s/it]
12%|█▏ | 75/630 [52:58<5:39:09, 36.67s/it]
12%|█▏ | 76/630 [53:34<5:38:41, 36.68s/it]
12%|█▏ | 77/630 [54:11<5:38:25, 36.72s/it]
12%|█▏ | 78/630 [54:48<5:38:03, 36.75s/it]
13%|█▎ | 79/630 [55:25<5:37:16, 36.73s/it]
13%|█▎ | 80/630 [56:02<5:36:51, 36.75s/it]
{'loss': 0.4659, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.38}
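The learning rates logged at steps 20, 40, 60 and 80 are consistent with a linear warmup to 1e-4 followed by a constant rate. The sketch below reproduces them under the assumption of 63 warmup steps; that value is inferred from 3.1746e-05 = 1e-4 × 20/63 and is not stated in the log. The function name `lr_at` is hypothetical.

```python
def lr_at(step: int, peak_lr: float = 1e-4, warmup_steps: int = 63) -> float:
    """Linear warmup to peak_lr, then constant.

    warmup_steps=63 is an inferred assumption, not a logged setting.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr


for step in (20, 40, 60, 80):
    print(step, lr_at(step))
```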
13%|█▎ | 81/630 [56:38<5:35:51, 36.71s/it]
13%|█▎ | 82/630 [57:15<5:35:14, 36.71s/it]
13%|█▎ | 83/630 [57:52<5:34:37, 36.71s/it]
13%|█▎ | 84/630 [58:28<5:34:32, 36.76s/it]
13%|█▎ | 85/630 [59:05<5:33:41, 36.74s/it]
14%|█▎ | 86/630 [59:42<5:33:31, 36.79s/it]
14%|█▍ | 87/630 [1:00:19<5:32:41, 36.76s/it]
14%|█▍ | 88/630 [1:00:55<5:31:40, 36.72s/it]
14%|█▍ | 89/630 [1:01:32<5:31:14, 36.74s/it]
14%|█▍ | 90/630 [1:02:09<5:30:50, 36.76s/it]
14%|█▍ | 91/630 [1:02:46<5:30:03, 36.74s/it]
15%|█▍ | 92/630 [1:03:22<5:29:23, 36.73s/it]
15%|█▍ | 93/630 [1:03:59<5:28:42, 36.73s/it]
15%|█▍ | 94/630 [1:04:36<5:28:16, 36.75s/it]
15%|█▌ | 95/630 [1:05:12<5:27:19, 36.71s/it]
15%|█▌ | 96/630 [1:05:49<5:26:29, 36.69s/it]
15%|█▌ | 97/630 [1:06:26<5:26:04, 36.71s/it]
16%|█▌ | 98/630 [1:07:32<6:43:26, 45.50s/it]
16%|█▌ | 99/630 [1:08:52<8:15:43, 56.01s/it]
16%|█▌ | 100/630 [1:10:14<9:23:01, 63.74s/it]
{'loss': 0.5625, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48}
16%|█▌ | 101/630 [1:11:36<10:09:46, 69.16s/it]
16%|█▌ | 102/630 [1:12:58<10:42:36, 73.02s/it]
16%|█▋ | 103/630 [1:14:20<11:04:01, 75.60s/it]
17%|█▋ | 104/630 [1:15:42<11:19:25, 77.50s/it]
17%|█▋ | 105/630 [1:17:03<11:28:59, 78.74s/it]
17%|█▋ | 106/630 [1:18:25<11:35:56, 79.69s/it]
17%|█▋ | 107/630 [1:19:47<11:39:41, 80.27s/it]
17%|█▋ | 108/630 [1:21:08<11:41:52, 80.68s/it]
17%|█▋ | 109/630 [1:22:30<11:43:33, 81.02s/it]
17%|█▋ | 110/630 [1:23:52<11:43:00, 81.12s/it]
18%|█▊ | 111/630 [1:25:14<11:44:41, 81.47s/it]
18%|█▊ | 112/630 [1:26:36<11:44:32, 81.61s/it]
18%|█▊ | 113/630 [1:27:58<11:43:51, 81.69s/it]
18%|█▊ | 114/630 [1:29:19<11:42:07, 81.64s/it]
18%|█▊ | 115/630 [1:30:41<11:40:43, 81.64s/it]
18%|█▊ | 116/630 [1:32:03<11:40:05, 81.72s/it]
19%|█▊ | 117/630 [1:33:25<11:39:02, 81.76s/it]
19%|█▊ | 118/630 [1:34:47<11:38:07, 81.81s/it]
19%|█▉ | 119/630 [1:36:09<11:37:18, 81.88s/it]
19%|█▉ | 120/630 [1:37:28<11:30:08, 81.19s/it]
{'loss': 0.545, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.57}
19%|█▉ | 121/630 [1:38:50<11:30:27, 81.39s/it]
19%|█▉ | 122/630 [1:40:11<11:29:23, 81.42s/it]
20%|█▉ | 123/630 [1:41:34<11:29:43, 81.62s/it]
20%|█▉ | 124/630 [1:42:55<11:29:06, 81.71s/it]
20%|█▉ | 125/630 [1:44:17<11:27:27, 81.68s/it]
20%|██ | 126/630 [1:45:39<11:25:25, 81.60s/it]
20%|██ | 127/630 [1:47:00<11:24:39, 81.67s/it]
20%|██ | 128/630 [1:48:20<11:17:02, 80.92s/it]
20%|██ | 129/630 [1:49:41<11:17:34, 81.15s/it]
21%|██ | 130/630 [1:51:03<11:18:30, 81.42s/it]
21%|██ | 131/630 [1:52:25<11:18:42, 81.61s/it]
21%|██ | 132/630 [1:53:47<11:18:20, 81.73s/it]
21%|██ | 133/630 [1:55:09<11:16:46, 81.70s/it]
21%|██▏ | 134/630 [1:56:31<11:17:03, 81.90s/it]
21%|██▏ | 135/630 [1:57:50<11:08:23, 81.02s/it]
22%|██▏ | 136/630 [1:59:07<10:56:04, 79.68s/it]
22%|██▏ | 137/630 [2:00:28<10:59:34, 80.27s/it]
22%|██▏ | 138/630 [2:01:50<11:02:16, 80.76s/it]
22%|██▏ | 139/630 [2:03:12<11:02:53, 81.01s/it]
22%|██▏ | 140/630 [2:04:34<11:03:02, 81.19s/it]
{'loss': 0.4962, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.67}
22%|██▏ | 141/630 [2:05:56<11:03:47, 81.45s/it]
23%|██▎ | 142/630 [2:07:18<11:03:36, 81.59s/it]
23%|██▎ | 143/630 [2:08:39<11:02:59, 81.68s/it]
23%|██▎ | 144/630 [2:10:01<11:02:11, 81.75s/it]
23%|██▎ | 145/630 [2:11:23<11:01:10, 81.80s/it]
23%|██▎ | 146/630 [2:12:45<10:59:37, 81.77s/it]
23%|██▎ | 147/630 [2:14:07<10:58:24, 81.79s/it]
23%|██▎ | 148/630 [2:15:29<10:57:18, 81.82s/it]
24%|██▎ | 149/630 [2:16:51<10:56:10, 81.85s/it]
24%|██▍ | 150/630 [2:18:13<10:54:54, 81.86s/it]
24%|██▍ | 151/630 [2:19:34<10:53:13, 81.82s/it]
24%|██▍ | 152/630 [2:20:56<10:52:38, 81.92s/it]
24%|██▍ | 153/630 [2:22:18<10:50:37, 81.84s/it]
24%|██▍ | 154/630 [2:23:40<10:49:02, 81.81s/it]
25%|██▍ | 155/630 [2:25:03<10:51:47, 82.33s/it]
25%|██▍ | 156/630 [2:26:27<10:53:18, 82.70s/it]
25%|██▍ | 157/630 [2:27:49<10:49:56, 82.45s/it]
25%|██▌ | 158/630 [2:29:11<10:47:46, 82.34s/it]
25%|██▌ | 159/630 [2:30:33<10:45:18, 82.20s/it]
25%|██▌ | 160/630 [2:31:54<10:42:42, 82.05s/it]
{'loss': 0.5508, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.76}
26%|██▌ | 161/630 [2:33:16<10:40:25, 81.93s/it]
26%|██▌ | 162/630 [2:34:38<10:38:02, 81.80s/it]
26%|██▌ | 163/630 [2:36:00<10:37:19, 81.88s/it]
26%|██▌ | 164/630 [2:37:21<10:33:44, 81.60s/it]
26%|██▌ | 165/630 [2:38:42<10:32:23, 81.60s/it]
26%|██▋ | 166/630 [2:40:04<10:31:41, 81.68s/it]
27%|██▋ | 167/630 [2:41:26<10:30:15, 81.68s/it]
27%|██▋ | 168/630 [2:42:47<10:28:49, 81.67s/it]
27%|██▋ | 169/630 [2:44:09<10:27:50, 81.71s/it]
27%|██▋ | 170/630 [2:45:31<10:27:21, 81.83s/it]
27%|██▋ | 171/630 [2:46:53<10:26:38, 81.91s/it]
27%|██▋ | 172/630 [2:48:15<10:25:16, 81.91s/it]
27%|██▋ | 173/630 [2:49:38<10:24:43, 82.02s/it]
28%|██▊ | 174/630 [2:50:59<10:22:55, 81.96s/it]
28%|██▊ | 175/630 [2:52:21<10:21:21, 81.94s/it]
28%|██▊ | 176/630 [2:53:44<10:20:46, 82.04s/it]
28%|██▊ | 177/630 [2:55:06<10:19:28, 82.05s/it]
28%|██▊ | 178/630 [2:56:27<10:17:36, 81.98s/it]
28%|██▊ | 179/630 [2:57:49<10:15:59, 81.95s/it]
29%|██▊ | 180/630 [2:59:11<10:13:51, 81.85s/it]
{'loss': 0.5314, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.86}
29%|██▊ | 181/630 [3:00:33<10:12:41, 81.87s/it]
29%|██▉ | 182/630 [3:01:54<10:10:10, 81.72s/it]
29%|██▉ | 183/630 [3:03:16<10:08:40, 81.70s/it]
29%|██▉ | 184/630 [3:04:38<10:08:04, 81.80s/it]
29%|██▉ | 185/630 [3:05:58<10:03:33, 81.38s/it]
30%|██▉ | 186/630 [3:07:20<10:03:04, 81.50s/it]
30%|██▉ | 187/630 [3:08:42<10:02:12, 81.56s/it]
30%|██▉ | 188/630 [3:10:03<10:00:57, 81.58s/it]
30%|███ | 189/630 [3:11:25<10:00:18, 81.68s/it]
30%|███ | 190/630 [3:12:47<9:58:54, 81.67s/it]
30%|███ | 191/630 [3:14:09<9:59:05, 81.88s/it]
30%|███ | 192/630 [3:15:31<9:57:08, 81.80s/it]
31%|███ | 193/630 [3:16:53<9:56:32, 81.91s/it]
31%|███ | 194/630 [3:18:15<9:55:06, 81.90s/it]
31%|███ | 195/630 [3:19:37<9:53:02, 81.80s/it]
31%|███ | 196/630 [3:20:59<9:52:13, 81.88s/it]
31%|███▏ | 197/630 [3:22:21<9:50:57, 81.89s/it]
31%|███▏ | 198/630 [3:23:42<9:49:23, 81.86s/it]
32%|███▏ | 199/630 [3:25:04<9:47:31, 81.79s/it]
32%|███▏ | 200/630 [3:26:26<9:45:53, 81.75s/it]
{'loss': 0.5251, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95}
32%|███▏ | 201/630 [3:27:47<9:44:42, 81.78s/it]
32%|███▏ | 202/630 [3:29:10<9:44:02, 81.88s/it]
32%|███▏ | 203/630 [3:30:31<9:42:42, 81.88s/it]
32%|███▏ | 204/630 [3:31:53<9:40:47, 81.80s/it]
33%|███▎ | 205/630 [3:33:15<9:40:30, 81.96s/it]
33%|███▎ | 206/630 [3:34:37<9:38:53, 81.92s/it]
33%|███▎ | 207/630 [3:35:59<9:37:21, 81.90s/it]
33%|███▎ | 208/630 [3:37:21<9:35:55, 81.89s/it]
33%|███▎ | 209/630 [3:38:43<9:34:01, 81.81s/it]
33%|███▎ | 210/630 [3:40:04<9:32:44, 81.82s/it]
33%|███▎ | 211/630 [3:41:21<9:21:02, 80.34s/it]
34%|███▎ | 212/630 [3:42:43<9:22:53, 80.80s/it]
34%|███▍ | 213/630 [3:44:05<9:23:41, 81.11s/it]
34%|███▍ | 214/630 [3:45:27<9:23:30, 81.28s/it]
34%|███▍ | 215/630 [3:46:49<9:23:21, 81.45s/it]
34%|███▍ | 216/630 [3:48:09<9:19:44, 81.12s/it]
34%|███▍ | 217/630 [3:49:31<9:19:51, 81.34s/it]
35%|███▍ | 218/630 [3:50:53<9:20:33, 81.63s/it]
35%|███▍ | 219/630 [3:52:15<9:19:09, 81.63s/it]
35%|███▍ | 220/630 [3:53:37<9:18:21, 81.71s/it]
{'loss': 0.347, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 1.05}
35%|███▌ | 221/630 [3:54:58<9:17:11, 81.74s/it]
35%|███▌ | 222/630 [3:56:20<9:16:33, 81.85s/it]
35%|███▌ | 223/630 [3:57:42<9:14:36, 81.76s/it]
36%|███▌ | 224/630 [3:59:04<9:13:20, 81.77s/it]
36%|███▌ | 225/630 [4:00:26<9:11:49, 81.75s/it]
36%|███▌ | 226/630 [4:01:48<9:11:14, 81.87s/it]
36%|███▌ | 227/630 [4:03:10<9:10:51, 82.01s/it]
36%|███▌ | 228/630 [4:04:32<9:09:14, 81.98s/it]
36%|███▋ | 229/630 [4:05:54<9:07:39, 81.95s/it]
37%|███▋ | 230/630 [4:07:16<9:06:14, 81.94s/it]
37%|███▋ | 231/630 [4:08:38<9:04:45, 81.92s/it]
37%|███▋ | 232/630 [4:09:59<9:03:23, 81.92s/it]
37%|███▋ | 233/630 [4:11:21<9:01:36, 81.85s/it]
37%|███▋ | 234/630 [4:12:43<9:00:02, 81.82s/it]
37%|███▋ | 235/630 [4:14:05<8:58:49, 81.85s/it]
37%|███▋ | 236/630 [4:15:27<8:57:08, 81.80s/it]
38%|███▊ | 237/630 [4:16:49<8:56:24, 81.90s/it]
38%|███▊ | 238/630 [4:18:11<8:54:58, 81.89s/it]
38%|███▊ | 239/630 [4:19:33<8:54:27, 82.01s/it]
38%|███▊ | 240/630 [4:20:55<8:52:26, 81.92s/it]
{'loss': 0.2006, 'learning_rate': 0.0001, 'global_step': 240, 'epoch': 1.14}
38%|███▊ | 241/630 [4:22:17<8:51:17, 81.95s/it]
38%|███▊ | 242/630 [4:23:39<8:50:16, 82.00s/it]
39%|███▊ | 243/630 [4:25:01<8:48:40, 81.97s/it]
39%|███▊ | 244/630 [4:26:22<8:46:45, 81.88s/it]
39%|███▉ | 245/630 [4:27:44<8:44:58, 81.81s/it]
39%|███▉ | 246/630 [4:29:06<8:43:15, 81.76s/it]
39%|███▉ | 247/630 [4:30:23<8:34:15, 80.56s/it]
39%|███▉ | 248/630 [4:31:45<8:35:40, 81.00s/it]
40%|███▉ | 249/630 [4:33:05<8:31:06, 80.49s/it]
40%|███▉ | 250/630 [4:34:26<8:32:13, 80.88s/it]
40%|███▉ | 251/630 [4:35:48<8:32:46, 81.18s/it]
40%|████ | 252/630 [4:37:10<8:32:37, 81.37s/it]
40%|████ | 253/630 [4:38:32<8:32:10, 81.51s/it]
40%|████ | 254/630 [4:39:54<8:31:53, 81.68s/it]
40%|████ | 255/630 [4:41:16<8:30:48, 81.73s/it]
41%|████ | 256/630 [4:42:38<8:29:35, 81.75s/it]
41%|████ | 257/630 [4:43:59<8:28:23, 81.78s/it]
41%|████ | 258/630 [4:45:22<8:27:27, 81.85s/it]
41%|████ | 259/630 [4:46:39<8:18:52, 80.68s/it]
41%|████▏ | 260/630 [4:48:01<8:19:41, 81.03s/it]
{'loss': 0.165, 'learning_rate': 0.0001, 'global_step': 260, 'epoch': 1.24}
41%|████▏ | 261/630 [4:49:23<8:19:20, 81.19s/it]
42%|████▏ | 262/630 [4:50:45<8:19:17, 81.41s/it]
42%|████▏ | 263/630 [4:52:06<8:18:12, 81.45s/it]
42%|████▏ | 264/630 [4:53:28<8:17:31, 81.56s/it]
42%|████▏ | 265/630 [4:54:50<8:16:38, 81.64s/it]
42%|████▏ | 266/630 [4:56:13<8:18:35, 82.19s/it]
42%|████▏ | 267/630 [4:57:35<8:16:34, 82.08s/it]
43%|████▎ | 268/630 [4:58:57<8:14:08, 81.90s/it]
43%|████▎ | 269/630 [5:00:18<8:12:13, 81.81s/it]
43%|████▎ | 270/630 [5:01:39<8:09:16, 81.54s/it]
43%|████▎ | 271/630 [5:03:01<8:08:22, 81.62s/it]
43%|████▎ | 272/630 [5:04:23<8:07:03, 81.63s/it]
43%|████▎ | 273/630 [5:05:45<8:06:07, 81.70s/it]
43%|████▎ | 274/630 [5:07:06<8:04:41, 81.69s/it]
44%|████▎ | 275/630 [5:08:27<8:01:07, 81.32s/it]
44%|████▍ | 276/630 [5:09:46<7:56:14, 80.72s/it]
44%|████▍ | 277/630 [5:11:08<7:56:51, 81.05s/it]
44%|████▍ | 278/630 [5:12:30<7:57:01, 81.31s/it]
44%|████▍ | 279/630 [5:13:52<7:57:01, 81.54s/it]
44%|████▍ | 280/630 [5:15:14<7:56:17, 81.65s/it]
{'loss': 0.2124, 'learning_rate': 0.0001, 'global_step': 280, 'epoch': 1.33}
45%|████▍ | 281/630 [5:16:36<7:55:59, 81.83s/it]
45%|████▍ | 282/630 [5:17:58<7:55:22, 81.96s/it]
45%|████▍ | 283/630 [5:19:20<7:53:35, 81.89s/it]
45%|████▌ | 284/630 [5:20:42<7:51:51, 81.83s/it]
45%|████▌ | 285/630 [5:22:02<7:47:33, 81.32s/it]
45%|████▌ | 286/630 [5:23:23<7:46:44, 81.41s/it]
46%|████▌ | 287/630 [5:24:45<7:44:49, 81.31s/it]
46%|████▌ | 288/630 [5:26:06<7:44:19, 81.46s/it]
46%|████▌ | 289/630 [5:27:28<7:43:58, 81.64s/it]
46%|████▌ | 290/630 [5:28:51<7:43:42, 81.83s/it]
46%|████▌ | 291/630 [5:30:12<7:41:24, 81.67s/it]
46%|████▋ | 292/630 [5:31:31<7:36:26, 81.03s/it]
47%|████▋ | 293/630 [5:32:53<7:36:25, 81.26s/it]
47%|████▋ | 294/630 [5:34:15<7:35:59, 81.43s/it]
47%|████▋ | 295/630 [5:35:37<7:34:57, 81.49s/it]
47%|████▋ | 296/630 [5:36:58<7:33:30, 81.47s/it]
47%|████▋ | 297/630 [5:38:17<7:27:50, 80.69s/it]
47%|████▋ | 298/630 [5:39:39<7:28:35, 81.07s/it]
47%|████▋ | 299/630 [5:41:01<7:28:52, 81.37s/it]
48%|████▊ | 300/630 [5:42:23<7:28:02, 81.46s/it]
{'loss': 0.2007, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.43}
48%|████▊ | 301/630 [5:43:45<7:27:55, 81.69s/it]
48%|████▊ | 302/630 [5:45:07<7:26:49, 81.74s/it]
48%|████▊ | 303/630 [5:46:28<7:25:17, 81.70s/it]
48%|████▊ | 304/630 [5:47:50<7:24:14, 81.76s/it]
48%|████▊ | 305/630 [5:49:12<7:22:50, 81.76s/it]
49%|████▊ | 306/630 [5:50:34<7:21:27, 81.75s/it]
49%|████▊ | 307/630 [5:51:56<7:20:17, 81.79s/it]
49%|████▉ | 308/630 [5:53:17<7:18:57, 81.79s/it]
49%|████▉ | 309/630 [5:54:38<7:16:09, 81.52s/it]
49%|████▉ | 310/630 [5:56:00<7:14:37, 81.49s/it]
49%|████▉ | 311/630 [5:57:21<7:13:29, 81.53s/it]
50%|████▉ | 312/630 [5:58:43<7:12:19, 81.57s/it]
50%|████▉ | 313/630 [6:00:05<7:11:55, 81.75s/it]
50%|████▉ | 314/630 [6:01:27<7:11:16, 81.89s/it]
50%|█████ | 315/630 [6:02:49<7:09:32, 81.82s/it]
50%|█████ | 316/630 [6:04:11<7:08:31, 81.88s/it]
50%|█████ | 317/630 [6:05:33<7:06:45, 81.81s/it]
50%|█████ | 318/630 [6:06:54<7:05:08, 81.76s/it]
51%|█████ | 319/630 [6:08:16<7:03:36, 81.73s/it]
51%|█████ | 320/630 [6:09:38<7:01:58, 81.67s/it]
{'loss': 0.1981, 'learning_rate': 0.0001, 'global_step': 320, 'epoch': 1.52}
51%|█████ | 321/630 [6:10:59<7:00:51, 81.72s/it]
51%|█████ | 322/630 [6:12:21<6:59:21, 81.69s/it]
51%|█████▏ | 323/630 [6:13:43<6:58:36, 81.81s/it]
51%|█████▏ | 324/630 [6:15:05<6:57:26, 81.85s/it]
52%|█████▏ | 325/630 [6:16:27<6:55:46, 81.79s/it]
52%|█████▏ | 326/630 [6:17:49<6:54:38, 81.84s/it]
52%|█████▏ | 327/630 [6:19:11<6:53:37, 81.90s/it]
52%|█████▏ | 328/630 [6:20:33<6:52:13, 81.90s/it]
52%|█████▏ | 329/630 [6:21:54<6:50:23, 81.81s/it]
52%|█████▏ | 330/630 [6:23:16<6:48:46, 81.76s/it]
53%|█████▎ | 331/630 [6:24:38<6:47:23, 81.75s/it]
53%|█████▎ | 332/630 [6:25:59<6:45:55, 81.73s/it]
53%|█████▎ | 333/630 [6:27:21<6:44:47, 81.78s/it]
53%|█████▎ | 334/630 [6:28:43<6:43:11, 81.73s/it]
53%|█████▎ | 335/630 [6:30:05<6:42:05, 81.78s/it]
53%|█████▎ | 336/630 [6:31:27<6:41:31, 81.94s/it]
53%|█████▎ | 337/630 [6:32:49<6:39:46, 81.87s/it]
54%|█████▎ | 338/630 [6:34:11<6:38:28, 81.88s/it]
54%|█████▍ | 339/630 [6:35:32<6:36:52, 81.83s/it]
54%|█████▍ | 340/630 [6:36:54<6:34:47, 81.68s/it]
{'loss': 0.2031, 'learning_rate': 0.0001, 'global_step': 340, 'epoch': 1.62}
54%|█████▍ | 341/630 [6:38:15<6:33:27, 81.69s/it]
54%|█████▍ | 342/630 [6:39:37<6:32:42, 81.81s/it]
54%|█████▍ | 343/630 [6:41:00<6:32:04, 81.97s/it]
55%|█████▍ | 344/630 [6:42:22<6:30:47, 81.98s/it]
55%|█████▍ | 345/630 [6:43:42<6:26:53, 81.45s/it]
55%|█████▍ | 346/630 [6:45:04<6:26:26, 81.64s/it]
55%|█████▌ | 347/630 [6:46:26<6:25:24, 81.71s/it]
55%|█████▌ | 348/630 [6:47:48<6:24:39, 81.84s/it]
55%|█████▌ | 349/630 [6:49:10<6:23:17, 81.84s/it]
56%|█████▌ | 350/630 [6:50:32<6:21:58, 81.85s/it]
56%|█████▌ | 351/630 [6:51:54<6:20:38, 81.86s/it]
56%|█████▌ | 352/630 [6:53:16<6:19:57, 82.01s/it]
56%|█████▌ | 353/630 [6:54:38<6:18:02, 81.89s/it]
56%|█████▌ | 354/630 [6:55:59<6:16:28, 81.84s/it]
56%|█████▋ | 355/630 [6:57:21<6:15:09, 81.85s/it]
57%|█████▋ | 356/630 [6:58:43<6:13:49, 81.86s/it]
57%|█████▋ | 357/630 [7:00:05<6:12:12, 81.81s/it]
57%|█████▋ | 358/630 [7:01:27<6:11:10, 81.88s/it]
57%|█████▋ | 359/630 [7:02:49<6:10:06, 81.94s/it]
57%|█████▋ | 360/630 [7:04:11<6:08:23, 81.87s/it]
{'loss': 0.1392, 'learning_rate': 0.0001, 'global_step': 360, 'epoch': 1.71}
57%|█████▋ | 361/630 [7:05:32<6:06:45, 81.80s/it]
57%|█████▋ | 362/630 [7:06:54<6:05:23, 81.80s/it]
58%|█████▊ | 363/630 [7:08:16<6:04:07, 81.82s/it]
58%|█████▊ | 364/630 [7:09:38<6:02:54, 81.86s/it]
58%|█████▊ | 365/630 [7:11:00<6:01:48, 81.92s/it]
58%|█████▊ | 366/630 [7:12:22<6:00:41, 81.98s/it]
58%|█████▊ | 367/630 [7:13:44<5:59:12, 81.95s/it]
58%|█████▊ | 368/630 [7:15:06<5:57:41, 81.92s/it]
59%|█████▊ | 369/630 [7:16:28<5:56:16, 81.90s/it]
59%|█████▊ | 370/630 [7:17:49<5:54:34, 81.83s/it]
59%|█████▉ | 371/630 [7:19:11<5:53:00, 81.78s/it]
59%|█████▉ | 372/630 [7:20:33<5:51:44, 81.80s/it]
59%|█████▉ | 373/630 [7:21:55<5:50:29, 81.83s/it]
59%|█████▉ | 374/630 [7:23:17<5:49:10, 81.84s/it]
60%|█████▉ | 375/630 [7:24:31<5:38:05, 79.55s/it]
60%|█████▉ | 376/630 [7:25:53<5:39:45, 80.26s/it]
60%|█████▉ | 377/630 [7:27:12<5:37:05, 79.94s/it]
60%|██████ | 378/630 [7:28:34<5:38:08, 80.51s/it]
60%|██████ | 379/630 [7:29:56<5:38:56, 81.02s/it]
60%|██████ | 380/630 [7:31:18<5:38:18, 81.19s/it]
{'loss': 0.204, 'learning_rate': 0.0001, 'global_step': 380, 'epoch': 1.81}
60%|██████ | 381/630 [7:32:40<5:38:01, 81.45s/it]
61%|██████ | 382/630 [7:34:02<5:37:44, 81.71s/it]
61%|██████ | 383/630 [7:35:24<5:36:36, 81.77s/it]
61%|██████ | 384/630 [7:36:46<5:35:27, 81.82s/it]
61%|██████ | 385/630 [7:38:08<5:34:45, 81.98s/it]
61%|██████▏ | 386/630 [7:39:30<5:33:19, 81.96s/it]
61%|██████▏ | 387/630 [7:40:52<5:31:52, 81.95s/it]
62%|██████▏ | 388/630 [7:42:14<5:30:28, 81.94s/it]
62%|██████▏ | 389/630 [7:43:36<5:29:04, 81.93s/it]
62%|██████▏ | 390/630 [7:44:54<5:23:01, 80.76s/it]
62%|██████▏ | 391/630 [7:46:16<5:23:12, 81.14s/it]
62%|██████▏ | 392/630 [7:47:37<5:21:51, 81.14s/it]
62%|██████▏ | 393/630 [7:48:59<5:21:04, 81.29s/it]
63%|██████▎ | 394/630 [7:50:19<5:18:31, 80.98s/it]
63%|██████▎ | 395/630 [7:51:41<5:18:13, 81.25s/it]
63%|██████▎ | 396/630 [7:53:02<5:17:20, 81.37s/it]
63%|██████▎ | 397/630 [7:54:24<5:16:31, 81.51s/it]
63%|██████▎ | 398/630 [7:55:46<5:15:19, 81.55s/it]
63%|██████▎ | 399/630 [7:57:08<5:14:20, 81.65s/it]
63%|██████▎ | 400/630 [7:58:30<5:13:11, 81.70s/it]
{'loss': 0.1626, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 1.9}
64%|██████▎ | 401/630 [7:59:51<5:11:58, 81.74s/it]
64%|██████▍ | 402/630 [8:01:14<5:10:56, 81.83s/it]
64%|██████▍ | 403/630 [8:02:35<5:09:31, 81.81s/it]
64%|██████▍ | 404/630 [8:03:57<5:07:46, 81.71s/it]
64%|██████▍ | 405/630 [8:05:18<5:06:18, 81.68s/it]
64%|██████▍ | 406/630 [8:06:40<5:05:04, 81.71s/it]
65%|██████▍ | 407/630 [8:07:57<4:58:42, 80.37s/it]
65%|██████▍ | 408/630 [8:09:19<4:59:03, 80.83s/it]
65%|██████▍ | 409/630 [8:10:41<4:58:51, 81.14s/it]
65%|██████▌ | 410/630 [8:12:03<4:58:13, 81.33s/it]
65%|██████▌ | 411/630 [8:13:24<4:56:51, 81.33s/it]
65%|██████▌ | 412/630 [8:14:46<4:56:14, 81.54s/it]
66%|██████▌ | 413/630 [8:16:09<4:55:43, 81.77s/it]
66%|██████▌ | 414/630 [8:17:30<4:54:21, 81.77s/it]
66%|██████▌ | 415/630 [8:18:52<4:53:06, 81.80s/it]
66%|██████▌ | 416/630 [8:20:14<4:52:10, 81.92s/it]
66%|██████▌ | 417/630 [8:21:37<4:51:12, 82.03s/it]
66%|██████▋ | 418/630 [8:22:58<4:49:25, 81.91s/it]
67%|██████▋ | 419/630 [8:24:20<4:48:00, 81.90s/it]
67%|██████▋ | 420/630 [8:25:42<4:46:26, 81.84s/it]
{'loss': 0.146, 'learning_rate': 0.0001, 'global_step': 420, 'epoch': 2.0}
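The `epoch` field reaches 2.0 at `global_step` 420, which suggests 210 optimizer steps per epoch and, with 630 total steps, three epochs overall. Both figures are inferences from the logged values, not settings printed by the script. A minimal sketch of the mapping, with the hypothetical helper `epoch_at`:

```python
STEPS_PER_EPOCH = 210  # assumption: inferred from epoch 2.0 at global_step 420


def epoch_at(step: int) -> float:
    """Fractional epoch for a global step, rounded to two decimals as in the log."""
    return round(step / STEPS_PER_EPOCH, 2)


for step in (20, 220, 420):
    print(step, epoch_at(step))
```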
67%|██████▋ | 421/630 [8:26:59<4:40:28, 80.52s/it]
67%|██████▋ | 422/630 [8:28:21<4:40:41, 80.97s/it]
67%|██████▋ | 423/630 [8:29:43<4:40:24, 81.28s/it]
67%|██████▋ | 424/630 [8:31:05<4:39:34, 81.43s/it]
67%|██████▋ | 425/630 [8:32:27<4:38:49, 81.61s/it]
68%|██████▊ | 426/630 [8:33:49<4:37:52, 81.73s/it]
68%|██████▊ | 427/630 [8:35:11<4:36:34, 81.75s/it]
68%|██████▊ | 428/630 [8:36:33<4:35:29, 81.83s/it]
68%|██████▊ | 429/630 [8:37:55<4:34:05, 81.82s/it]
68%|██████▊ | 430/630 [8:39:17<4:32:42, 81.81s/it]
68%|██████▊ | 431/630 [8:40:38<4:31:22, 81.82s/it]
69%|██████▊ | 432/630 [8:42:00<4:30:00, 81.82s/it]
69%|██████▊ | 433/630 [8:43:22<4:28:39, 81.82s/it]
69%|██████▉ | 434/630 [8:44:44<4:27:40, 81.94s/it]
69%|██████▉ | 435/630 [8:46:07<4:26:34, 82.02s/it]
69%|██████▉ | 436/630 [8:47:28<4:25:00, 81.96s/it]
69%|██████▉ | 437/630 [8:48:50<4:23:14, 81.84s/it]
70%|██████▉ | 438/630 [8:50:11<4:21:36, 81.75s/it]
70%|██████▉ | 439/630 [8:51:33<4:20:04, 81.70s/it]
70%|██████▉ | 440/630 [8:52:55<4:18:33, 81.65s/it]
{'loss': 0.0404, 'learning_rate': 0.0001, 'global_step': 440, 'epoch': 2.09}
70%|███████ | 441/630 [8:54:15<4:15:47, 81.20s/it]
70%|███████ | 442/630 [8:55:36<4:14:57, 81.37s/it]
70%|███████ | 443/630 [8:56:58<4:13:57, 81.48s/it]
70%|███████ | 444/630 [8:58:20<4:12:36, 81.49s/it]
71%|███████ | 445/630 [8:59:41<4:10:46, 81.33s/it]
71%|███████ | 446/630 [9:01:02<4:09:37, 81.40s/it]
71%|███████ | 447/630 [9:02:24<4:08:20, 81.42s/it]
71%|███████ | 448/630 [9:03:45<4:07:05, 81.46s/it]
71%|███████▏ | 449/630 [9:05:07<4:05:57, 81.54s/it]
71%|███████▏ | 450/630 [9:06:28<4:04:32, 81.51s/it]
72%|███████▏ | 451/630 [9:07:49<4:02:25, 81.26s/it]
72%|███████▏ | 452/630 [9:09:11<4:01:28, 81.40s/it]
72%|███████▏ | 453/630 [9:10:33<4:00:24, 81.50s/it]
72%|███████▏ | 454/630 [9:11:54<3:59:00, 81.48s/it]
72%|███████▏ | 455/630 [9:13:16<3:57:44, 81.51s/it]
72%|███████▏ | 456/630 [9:14:37<3:56:00, 81.38s/it]
73%|███████▎ | 457/630 [9:15:58<3:55:00, 81.50s/it]
73%|███████▎ | 458/630 [9:17:21<3:54:14, 81.71s/it]
73%|███████▎ | 459/630 [9:18:42<3:52:54, 81.72s/it]
73%|███████▎ | 460/630 [9:20:04<3:51:25, 81.68s/it]
{'loss': 0.0304, 'learning_rate': 0.0001, 'global_step': 460, 'epoch': 2.19}
73%|███████▎ | 461/630 [9:21:26<3:50:00, 81.66s/it]
73%|███████▎ | 462/630 [9:22:47<3:48:33, 81.63s/it]
73%|███████▎ | 463/630 [9:24:09<3:47:08, 81.61s/it]
74%|███████▎ | 464/630 [9:25:30<3:45:46, 81.61s/it]
74%|███████▍ | 465/630 [9:26:52<3:44:42, 81.71s/it]
74%|███████▍ | 466/630 [9:28:14<3:43:21, 81.72s/it]
74%|███████▍ | 467/630 [9:29:36<3:42:12, 81.79s/it]
74%|███████▍ | 468/630 [9:30:57<3:40:21, 81.62s/it]
74%|███████▍ | 469/630 [9:32:19<3:39:05, 81.65s/it]
75%|███████▍ | 470/630 [9:33:41<3:37:47, 81.67s/it]
75%|███████▍ | 471/630 [9:35:02<3:36:35, 81.73s/it]
75%|███████▍ | 472/630 [9:36:24<3:34:59, 81.64s/it]
75%|███████▌ | 473/630 [9:37:45<3:33:10, 81.47s/it]
75%|███████▌ | 474/630 [9:39:07<3:32:01, 81.55s/it]
75%|███████▌ | 475/630 [9:40:28<3:30:50, 81.62s/it]
76%|███████▌ | 476/630 [9:41:50<3:29:33, 81.65s/it]
76%|███████▌ | 477/630 [9:43:12<3:28:33, 81.79s/it]
76%|███████▌ | 478/630 [9:44:34<3:27:07, 81.76s/it]
76%|███████▌ | 479/630 [9:45:56<3:25:39, 81.72s/it]
76%|███████▌ | 480/630 [9:47:15<3:22:28, 80.99s/it]
{'loss': 0.0166, 'learning_rate': 0.0001, 'global_step': 480, 'epoch': 2.28}
76%|███████▋ | 481/630 [9:48:36<3:21:27, 81.13s/it]
77%|███████▋ | 482/630 [9:49:58<3:20:18, 81.21s/it]
77%|███████▋ | 483/630 [9:51:19<3:19:19, 81.36s/it]
77%|███████▋ | 484/630 [9:52:41<3:18:21, 81.52s/it]
77%|███████▋ | 485/630 [9:54:03<3:17:05, 81.56s/it]
77%|███████▋ | 486/630 [9:55:24<3:15:38, 81.52s/it]
77%|███████▋ | 487/630 [9:56:46<3:14:14, 81.50s/it]
77%|███████▋ | 488/630 [9:58:07<3:12:50, 81.48s/it]
78%|███████▊ | 489/630 [9:59:29<3:11:25, 81.46s/it]
78%|███████▊ | 490/630 [10:00:50<3:09:45, 81.33s/it]
78%|███████▊ | 491/630 [10:02:11<3:08:37, 81.42s/it]
78%|███████▊ | 492/630 [10:03:31<3:06:00, 80.87s/it]
78%|███████▊ | 493/630 [10:04:53<3:05:21, 81.18s/it]
78%|███████▊ | 494/630 [10:06:15<3:04:28, 81.38s/it]
79%|███████▊ | 495/630 [10:07:37<3:03:27, 81.54s/it]
79%|███████▊ | 496/630 [10:08:59<3:02:22, 81.66s/it]
79%|███████▉ | 497/630 [10:10:20<3:01:11, 81.74s/it]
79%|███████▉ | 498/630 [10:11:42<2:59:46, 81.72s/it]
79%|███████▉ | 499/630 [10:13:04<2:58:25, 81.72s/it]
79%|███████▉ | 500/630 [10:14:25<2:56:56, 81.66s/it]
{'loss': 0.0366, 'learning_rate': 0.0001, 'global_step': 500, 'epoch': 2.38}
80%|███████▉ | 501/630 [10:15:47<2:55:31, 81.64s/it]
80%|███████▉ | 502/630 [10:17:09<2:54:05, 81.60s/it]
80%|███████▉ | 503/630 [10:18:30<2:52:40, 81.58s/it]
80%|████████ | 504/630 [10:19:52<2:51:20, 81.59s/it]
80%|████████ | 505/630 [10:21:13<2:50:06, 81.65s/it]
80%|████████ | 506/630 [10:22:35<2:48:50, 81.70s/it]
80%|████████ | 507/630 [10:23:57<2:47:38, 81.77s/it]
81%|████████ | 508/630 [10:25:19<2:46:06, 81.69s/it]
81%|████████ | 509/630 [10:26:41<2:45:01, 81.83s/it]
81%|████████ | 510/630 [10:28:03<2:43:37, 81.81s/it]
81%|████████ | 511/630 [10:29:24<2:42:05, 81.72s/it]
81%|████████▏ | 512/630 [10:30:46<2:40:38, 81.68s/it]
81%|████████▏ | 513/630 [10:32:07<2:39:11, 81.64s/it]
82%|████████▏ | 514/630 [10:33:29<2:37:49, 81.63s/it]
82%|████████▏ | 515/630 [10:34:50<2:36:23, 81.60s/it]
82%|████████▏ | 516/630 [10:36:12<2:35:00, 81.59s/it]
82%|████████▏ | 517/630 [10:37:34<2:33:43, 81.63s/it]
82%|████████▏ | 518/630 [10:38:55<2:32:26, 81.66s/it]
82%|████████▏ | 519/630 [10:40:17<2:30:59, 81.62s/it]
83%|████████▎ | 520/630 [10:41:38<2:29:08, 81.35s/it]
{'loss': 0.018, 'learning_rate': 0.0001, 'global_step': 520, 'epoch': 2.47}
83%|████████▎ | 521/630 [10:42:59<2:28:00, 81.47s/it]
83%|████████▎ | 522/630 [10:44:21<2:26:54, 81.61s/it]
83%|████████▎ | 523/630 [10:45:43<2:25:34, 81.63s/it]
83%|████████▎ | 524/630 [10:47:05<2:24:08, 81.59s/it]
83%|████████▎ | 525/630 [10:48:26<2:22:46, 81.58s/it]
83%|████████▎ | 526/630 [10:49:48<2:21:22, 81.56s/it]
84%|████████▎ | 527/630 [10:51:09<2:19:59, 81.55s/it]
84%|████████▍ | 528/630 [10:52:30<2:18:23, 81.40s/it]
84%|████████▍ | 529/630 [10:53:52<2:17:16, 81.55s/it]
84%|████████▍ | 530/630 [10:55:14<2:15:53, 81.53s/it]
84%|████████▍ | 531/630 [10:56:35<2:14:33, 81.55s/it]
84%|████████▍ | 532/630 [10:57:57<2:13:09, 81.52s/it]
85%|████████▍ | 533/630 [10:59:18<2:11:45, 81.50s/it]
85%|████████▍ | 534/630 [11:00:40<2:10:41, 81.68s/it]
85%|████████▍ | 535/630 [11:02:02<2:09:35, 81.85s/it]
85%|████████▌ | 536/630 [11:03:18<2:05:08, 79.88s/it]
85%|████████▌ | 537/630 [11:04:40<2:04:51, 80.56s/it]
85%|████████▌ | 538/630 [11:06:01<2:04:01, 80.88s/it]
86%|████████▌ | 539/630 [11:07:24<2:03:12, 81.24s/it]
86%|████████▌ | 540/630 [11:08:46<2:02:17, 81.53s/it]
{'loss': 0.0186, 'learning_rate': 0.0001, 'global_step': 540, 'epoch': 2.57}
86%|████████▌ | 541/630 [11:10:08<2:01:09, 81.68s/it]
86%|████████▌ | 542/630 [11:11:30<1:59:54, 81.76s/it]
86%|████████▌ | 543/630 [11:12:52<1:58:38, 81.83s/it]
86%|████████▋ | 544/630 [11:14:14<1:57:25, 81.93s/it]
87%|████████▋ | 545/630 [11:15:36<1:56:07, 81.98s/it]
87%|████████▋ | 546/630 [11:16:58<1:54:41, 81.92s/it]
87%|████████▋ | 547/630 [11:18:19<1:53:13, 81.85s/it]
87%|████████▋ | 548/630 [11:19:41<1:51:49, 81.82s/it]
87%|████████▋ | 549/630 [11:21:02<1:50:07, 81.57s/it]
87%|████████▋ | 550/630 [11:22:23<1:48:36, 81.46s/it]
87%|████████▋ | 551/630 [11:23:45<1:47:22, 81.55s/it]
88%|████████▊ | 552/630 [11:25:07<1:46:09, 81.65s/it]
88%|████████▊ | 553/630 [11:26:23<1:42:39, 79.99s/it]
88%|████████▊ | 554/630 [11:27:45<1:42:02, 80.56s/it]
88%|████████▊ | 555/630 [11:29:07<1:41:06, 80.89s/it]
88%|████████▊ | 556/630 [11:30:25<1:38:54, 80.20s/it]
88%|████████▊ | 557/630 [11:31:47<1:38:06, 80.64s/it]
89%|████████▊ | 558/630 [11:33:09<1:37:14, 81.03s/it]
89%|████████▊ | 559/630 [11:34:31<1:36:06, 81.23s/it]
89%|████████▉ | 560/630 [11:35:52<1:34:59, 81.42s/it]
{'loss': 0.0308, 'learning_rate': 0.0001, 'global_step': 560, 'epoch': 2.66}
89%|████████▉ | 561/630 [11:37:14<1:33:44, 81.51s/it]
89%|████████▉ | 562/630 [11:38:36<1:32:26, 81.56s/it]
89%|████████▉ | 563/630 [11:39:58<1:31:18, 81.77s/it]
90%|████████▉ | 564/630 [11:41:19<1:29:41, 81.54s/it]
90%|████████▉ | 565/630 [11:42:41<1:28:19, 81.53s/it]
90%|████████▉ | 566/630 [11:44:02<1:26:57, 81.52s/it]
90%|█████████ | 567/630 [11:45:24<1:25:48, 81.72s/it]
90%|█████████ | 568/630 [11:46:46<1:24:22, 81.66s/it]
90%|█████████ | 569/630 [11:48:08<1:23:10, 81.81s/it]
90%|█████████ | 570/630 [11:49:30<1:21:47, 81.79s/it]
91%|█████████ | 571/630 [11:50:51<1:20:24, 81.76s/it]
91%|█████████ | 572/630 [11:52:13<1:19:06, 81.84s/it]
91%|█████████ | 573/630 [11:53:35<1:17:45, 81.84s/it]
91%|█████████ | 574/630 [11:54:57<1:16:18, 81.76s/it]
91%|█████████▏| 575/630 [11:56:18<1:14:51, 81.66s/it]
91%|█████████▏| 576/630 [11:57:40<1:13:31, 81.69s/it]
92%|█████████▏| 577/630 [11:59:02<1:12:15, 81.79s/it]
92%|█████████▏| 578/630 [12:00:22<1:10:25, 81.26s/it]
92%|█████████▏| 579/630 [12:01:44<1:09:11, 81.40s/it]
92%|█████████▏| 580/630 [12:03:05<1:07:51, 81.43s/it]
{'loss': 0.0536, 'learning_rate': 0.0001, 'global_step': 580, 'epoch': 2.76}
92%|█████████▏| 581/630 [12:04:27<1:06:33, 81.50s/it]
92%|█████████▏| 582/630 [12:05:49<1:05:18, 81.64s/it]
93%|█████████▎| 583/630 [12:07:11<1:03:58, 81.66s/it]
93%|█████████▎| 584/630 [12:08:33<1:02:40, 81.74s/it]
93%|█████████▎| 585/630 [12:09:55<1:01:21, 81.82s/it]
93%|█████████▎| 586/630 [12:11:16<59:58, 81.79s/it]
93%|█████████▎| 587/630 [12:12:38<58:36, 81.77s/it]
93%|█████████▎| 588/630 [12:14:00<57:11, 81.71s/it]
93%|█████████▎| 589/630 [12:15:21<55:50, 81.73s/it]
94%|█████████▎| 590/630 [12:16:43<54:31, 81.80s/it]
94%|█████████▍| 591/630 [12:18:05<53:08, 81.76s/it]
94%|█████████▍| 592/630 [12:19:27<51:46, 81.76s/it]
94%|█████████▍| 593/630 [12:20:49<50:26, 81.79s/it]
94%|█████████▍| 594/630 [12:22:11<49:05, 81.83s/it]
94%|█████████▍| 595/630 [12:23:33<47:46, 81.91s/it]
95%|█████████▍| 596/630 [12:24:55<46:25, 81.92s/it]
95%|█████████▍| 597/630 [12:26:16<45:01, 81.87s/it]
95%|█████████▍| 598/630 [12:27:38<43:40, 81.88s/it]
95%|█████████▌| 599/630 [12:29:00<42:21, 81.97s/it]
95%|█████████▌| 600/630 [12:30:22<40:58, 81.97s/it]
{'loss': 0.0563, 'learning_rate': 0.0001, 'global_step': 600, 'epoch': 2.85}
95%|█████████▌| 601/630 [12:31:44<39:33, 81.84s/it]
96%|█████████▌| 602/630 [12:33:06<38:11, 81.83s/it]
96%|█████████▌| 603/630 [12:34:26<36:32, 81.21s/it]
96%|█████████▌| 604/630 [12:35:47<35:16, 81.39s/it]
96%|█████████▌| 605/630 [12:37:09<33:57, 81.50s/it]
96%|█████████▌| 606/630 [12:38:31<32:38, 81.59s/it]
96%|█████████▋| 607/630 [12:39:52<31:16, 81.57s/it]
97%|█████████▋| 608/630 [12:41:14<29:56, 81.64s/it]
97%|█████████▋| 609/630 [12:42:36<28:35, 81.68s/it]
97%|█████████▋| 610/630 [12:43:58<27:14, 81.75s/it]
97%|█████████▋| 611/630 [12:45:20<25:53, 81.75s/it]
97%|█████████▋| 612/630 [12:46:41<24:31, 81.76s/it]
97%|█████████▋| 613/630 [12:48:03<23:10, 81.77s/it]
97%|█████████▋| 614/630 [12:49:25<21:48, 81.78s/it]
98%|█████████▊| 615/630 [12:50:47<20:25, 81.72s/it]
98%|█████████▊| 616/630 [12:52:08<19:04, 81.74s/it]
98%|█████████▊| 617/630 [12:53:30<17:42, 81.71s/it]
98%|█████████▊| 618/630 [12:54:52<16:21, 81.81s/it]
98%|█████████▊| 619/630 [12:56:14<14:59, 81.77s/it]
98%|█████████▊| 620/630 [12:57:35<13:36, 81.68s/it]
{'loss': 0.0242, 'learning_rate': 0.0001, 'global_step': 620, 'epoch': 2.95}
99%|█████████▊| 621/630 [12:58:57<12:16, 81.80s/it]
99%|█████████▊| 622/630 [13:00:19<10:53, 81.71s/it]
99%|█████████▉| 623/630 [13:01:41<09:31, 81.71s/it]
99%|█████████▉| 624/630 [13:03:02<08:09, 81.66s/it]
99%|█████████▉| 625/630 [13:04:21<06:44, 80.87s/it]
99%|█████████▉| 626/630 [13:05:43<05:24, 81.09s/it]
100%|█████████▉| 627/630 [13:07:04<04:03, 81.23s/it]
100%|█████████▉| 628/630 [13:08:26<02:42, 81.33s/it]
100%|█████████▉| 629/630 [13:09:48<01:21, 81.48s/it]
100%|██████████| 630/630 [13:11:09<00:00, 81.50s/it]
{'train_runtime': 47469.8541, 'train_samples_per_second': 0.159, 'train_steps_per_second': 0.013, 'train_loss': 0.24650317042592973, 'epoch': 3.0}
100%|██████████| 630/630 [13:11:09<00:00, 75.35s/it]
***** train metrics *****
epoch = 3.0
train_loss = 0.2465
train_runtime = 13:11:09.85
train_samples_per_second = 0.159
train_steps_per_second = 0.013