model training desc: CCLUE-MRC trained only on TechGPT-7B
2023-12-04 14:37:55.661 | INFO | __main__:init_components:108 - Initializing components...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-04 14:38:18.229 | INFO | __main__:init_components:143 - 
2023-12-04 14:38:18.229 | INFO | __main__:init_components:144 - ********************
2023-12-04 14:38:18.229 | INFO | __main__:init_components:145 - using TechGPT-7B
2023-12-04 14:38:18.229 | INFO | __main__:init_components:146 - ********************
2023-12-04 14:38:18.229 | INFO | __main__:init_components:147 - 
memory footprint of model: 5.472740173339844 GB
trainable params: 319,815,680 || all params: 7,447,007,232 || trainable%: 4.294553100818044
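The trainable-parameter line is consistent with low-rank (LoRA) adapters, although the log does not name the method or its settings. A minimal sketch of the arithmetic, assuming a LLaMA-7B-shaped backbone (32 decoder layers, hidden size 4096, MLP size 11008) with rank-128 adapters on all seven linear projections of every layer. This is one configuration that reproduces the logged count exactly, not a setting confirmed by the log.

```python
# Hypothetical reconstruction: the layer count, hidden/MLP sizes, LoRA rank
# and target modules below are assumptions, not values printed in the log.
NUM_LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 11008
RANK = 128


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA adapter on a d_in -> d_out linear layer adds A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


def lora_total(num_layers: int, hidden: int, intermediate: int, r: int) -> int:
    attn = 4 * lora_params(hidden, hidden, r)        # q, k, v, o projections
    mlp = (2 * lora_params(hidden, intermediate, r)  # gate and up projections
           + lora_params(intermediate, hidden, r))   # down projection
    return num_layers * (attn + mlp)


trainable = lora_total(NUM_LAYERS, HIDDEN, INTERMEDIATE, RANK)
all_params = 7_447_007_232  # "all params" as reported in the log

print(f"{trainable:,}")                        # 319,815,680
print(f"{100 * trainable / all_params:.4f}%")  # 4.2946%
```

Other combinations (for example a different rank over a different set of modules) could give the same total, so the rank and target modules should be read from the training script rather than from this count.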
2023-12-04 14:39:00.475 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/CCLUE/cclue_instruct/train.jsonl
2023-12-04 14:39:00.498 | INFO | component.dataset:__init__:19 - there are 3473 data in dataset
2023-12-04 14:39:00.535 | INFO | __main__:main:231 - *** starting training ***
0%| | 0/522 [00:00<?, ?it/s]
0%| | 1/522 [00:24<3:34:55, 24.75s/it]
0%| | 2/522 [00:50<3:38:56, 25.26s/it]
1%| | 3/522 [01:13<3:31:10, 24.41s/it]
1%| | 4/522 [01:42<3:45:26, 26.11s/it]
1%| | 5/522 [02:09<3:47:32, 26.41s/it]
1%| | 6/522 [02:37<3:51:28, 26.92s/it]
1%|▏ | 7/522 [03:03<3:48:31, 26.62s/it]
2%|▏ | 8/522 [03:30<3:48:53, 26.72s/it]
2%|▏ | 9/522 [03:57<3:50:43, 26.98s/it]
2%|▏ | 10/522 [04:20<3:38:15, 25.58s/it]
2%|▏ | 11/522 [04:46<3:39:12, 25.74s/it]
2%|▏ | 12/522 [05:04<3:20:10, 23.55s/it]
2%|▏ | 13/522 [05:33<3:31:54, 24.98s/it]
3%|▎ | 14/522 [05:51<3:14:44, 23.00s/it]
3%|▎ | 15/522 [06:18<3:23:20, 24.06s/it]
3%|▎ | 16/522 [06:41<3:21:27, 23.89s/it]
3%|▎ | 17/522 [07:07<3:24:51, 24.34s/it]
3%|▎ | 18/522 [07:31<3:24:56, 24.40s/it]
4%|▎ | 19/522 [07:56<3:25:54, 24.56s/it]
4%|▍ | 20/522 [08:20<3:25:05, 24.51s/it]
{'loss': 5.6443, 'learning_rate': 3.39622641509434e-05, 'global_step': 20, 'epoch': 0.11}
4%|▍ | 21/522 [08:47<3:29:46, 25.12s/it]
4%|▍ | 22/522 [09:08<3:19:00, 23.88s/it]
4%|▍ | 23/522 [09:39<3:35:20, 25.89s/it]
5%|▍ | 24/522 [10:01<3:27:38, 25.02s/it]
5%|▍ | 25/522 [10:28<3:30:10, 25.37s/it]
5%|▍ | 26/522 [10:53<3:30:29, 25.46s/it]
5%|▌ | 27/522 [11:19<3:30:50, 25.56s/it]
5%|▌ | 28/522 [11:42<3:23:28, 24.71s/it]
6%|▌ | 29/522 [12:02<3:11:09, 23.26s/it]
6%|▌ | 30/522 [12:21<3:00:56, 22.07s/it]
6%|▌ | 31/522 [12:47<3:11:01, 23.34s/it]
6%|▌ | 32/522 [13:09<3:05:17, 22.69s/it]
6%|▋ | 33/522 [13:32<3:07:37, 23.02s/it]
7%|▋ | 34/522 [13:53<3:01:04, 22.26s/it]
7%|▋ | 35/522 [14:19<3:10:33, 23.48s/it]
7%|▋ | 36/522 [14:44<3:14:20, 23.99s/it]
7%|▋ | 37/522 [15:11<3:21:33, 24.94s/it]
7%|▋ | 38/522 [15:36<3:19:47, 24.77s/it]
7%|▋ | 39/522 [16:01<3:20:47, 24.94s/it]
8%|▊ | 40/522 [16:29<3:28:12, 25.92s/it]
{'loss': 1.5982, 'learning_rate': 7.169811320754717e-05, 'global_step': 40, 'epoch': 0.23}
8%|▊ | 41/522 [16:51<3:17:33, 24.64s/it]
8%|▊ | 42/522 [17:16<3:17:28, 24.68s/it]
8%|▊ | 43/522 [17:42<3:20:12, 25.08s/it]
8%|▊ | 44/522 [18:09<3:23:42, 25.57s/it]
9%|▊ | 45/522 [18:34<3:22:32, 25.48s/it]
9%|▉ | 46/522 [18:58<3:20:09, 25.23s/it]
9%|▉ | 47/522 [19:24<3:19:59, 25.26s/it]
9%|▉ | 48/522 [19:49<3:20:26, 25.37s/it]
9%|▉ | 49/522 [20:11<3:10:43, 24.19s/it]
10%|▉ | 50/522 [20:39<3:20:13, 25.45s/it]
10%|▉ | 51/522 [21:02<3:13:58, 24.71s/it]
10%|▉ | 52/522 [21:27<3:12:49, 24.62s/it]
10%|█ | 53/522 [21:50<3:09:26, 24.23s/it]
10%|█ | 54/522 [22:09<2:57:16, 22.73s/it]
11%|█ | 55/522 [22:34<3:00:44, 23.22s/it]
11%|█ | 56/522 [22:57<3:00:14, 23.21s/it]
11%|█ | 57/522 [23:19<2:56:51, 22.82s/it]
11%|█ | 58/522 [23:40<2:52:06, 22.26s/it]
11%|█▏ | 59/522 [24:01<2:49:34, 21.97s/it]
11%|█▏ | 60/522 [24:28<3:01:15, 23.54s/it]
{'loss': 0.7384, 'learning_rate': 0.0001, 'global_step': 60, 'epoch': 0.34}
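The learning-rate entries at steps 20, 40 and 60 fit a linear warmup followed by a constant rate. A minimal sketch, assuming a peak rate of 1e-4 (taken from the log), a warmup ratio of 0.1 over the 522 total steps (an assumption; ceil(52.2) = 53 warmup steps), and a scheduler position two steps behind `global_step`. The two-step lag is inferred from the logged values, which equal 18/53 and 38/53 of the peak; optimizer steps skipped by mixed-precision loss scaling would produce such a lag, but the log does not confirm the cause.

```python
import math

BASE_LR = 1e-4       # peak rate, as logged from step 60 onward
TOTAL_STEPS = 522    # total optimizer steps, from the progress bar
WARMUP_RATIO = 0.1   # assumption: not printed in the log

warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 53


def constant_with_warmup(step: int) -> float:
    """Linear ramp from 0 to BASE_LR over warmup_steps, then flat."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    return BASE_LR


# Assumption: the scheduler has advanced two fewer steps than global_step.
for global_step in (20, 40, 60):
    lr = constant_with_warmup(global_step - 2)
    print(f"{global_step} {lr:.4e}")
# 20 3.3962e-05
# 40 7.1698e-05
# 60 1.0000e-04
```

A flat rate after warmup, as seen for the remaining 460 steps, matches what the transformers `constant_with_warmup` scheduler type does; a linear or cosine decay would instead lower the rate toward the end of training.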
12%|█▏ | 61/522 [24:53<3:03:51, 23.93s/it]
12%|█▏ | 62/522 [25:14<2:57:30, 23.15s/it]
12%|█▏ | 63/522 [25:38<2:58:08, 23.29s/it]
12%|█▏ | 64/522 [26:04<3:03:57, 24.10s/it]
12%|█▏ | 65/522 [26:22<2:49:52, 22.30s/it]
13%|█▎ | 66/522 [26:51<3:05:48, 24.45s/it]
13%|█▎ | 67/522 [27:12<2:56:23, 23.26s/it]
13%|█▎ | 68/522 [27:37<3:00:37, 23.87s/it]
13%|█▎ | 69/522 [28:06<3:11:12, 25.32s/it]
13%|█▎ | 70/522 [28:32<3:12:53, 25.61s/it]
14%|█▎ | 71/522 [28:59<3:15:50, 26.05s/it]
14%|█▍ | 72/522 [29:23<3:09:10, 25.22s/it]
14%|█▍ | 73/522 [29:49<3:10:24, 25.44s/it]
14%|█▍ | 74/522 [30:15<3:12:14, 25.75s/it]
14%|█▍ | 75/522 [30:35<2:58:55, 24.02s/it]
15%|█▍ | 76/522 [30:56<2:52:01, 23.14s/it]
15%|█▍ | 77/522 [31:22<2:56:50, 23.84s/it]
15%|█▍ | 78/522 [31:46<2:57:50, 24.03s/it]
15%|█▌ | 79/522 [32:11<2:58:50, 24.22s/it]
15%|█▌ | 80/522 [32:32<2:51:56, 23.34s/it]
{'loss': 0.7772, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.46}
16%|█▌ | 81/522 [32:53<2:45:32, 22.52s/it]
16%|█▌ | 82/522 [33:17<2:50:18, 23.22s/it]
16%|█▌ | 83/522 [33:45<2:58:49, 24.44s/it]
16%|█▌ | 84/522 [34:11<3:01:57, 24.93s/it]
16%|█▋ | 85/522 [34:38<3:05:27, 25.46s/it]
16%|█▋ | 86/522 [35:07<3:14:36, 26.78s/it]
17%|█▋ | 87/522 [35:30<3:06:12, 25.68s/it]
17%|█▋ | 88/522 [35:56<3:06:05, 25.73s/it]
17%|█▋ | 89/522 [36:18<2:56:37, 24.48s/it]
17%|█▋ | 90/522 [36:46<3:03:31, 25.49s/it]
17%|█▋ | 91/522 [37:08<2:56:55, 24.63s/it]
18%|█▊ | 92/522 [37:27<2:44:01, 22.89s/it]
18%|█▊ | 93/522 [37:53<2:50:29, 23.84s/it]
18%|█▊ | 94/522 [38:23<3:03:45, 25.76s/it]
18%|█▊ | 95/522 [38:49<3:02:30, 25.65s/it]
18%|█▊ | 96/522 [39:13<2:59:25, 25.27s/it]
19%|█▊ | 97/522 [39:38<2:57:29, 25.06s/it]
19%|█▉ | 98/522 [40:02<2:55:16, 24.80s/it]
19%|█▉ | 99/522 [40:25<2:50:50, 24.23s/it]
19%|█▉ | 100/522 [40:49<2:51:00, 24.31s/it]
{'loss': 0.7537, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.57}
19%|█▉ | 101/522 [41:11<2:44:35, 23.46s/it]
20%|█▉ | 102/522 [41:35<2:46:01, 23.72s/it]
20%|█▉ | 103/522 [41:59<2:44:48, 23.60s/it]
20%|█▉ | 104/522 [42:28<2:55:41, 25.22s/it]
20%|██ | 105/522 [42:51<2:51:57, 24.74s/it]
20%|██ | 106/522 [43:17<2:54:41, 25.20s/it]
20%|██ | 107/522 [43:40<2:48:14, 24.32s/it]
21%|██ | 108/522 [44:00<2:39:14, 23.08s/it]
21%|██ | 109/522 [44:26<2:44:30, 23.90s/it]
21%|██ | 110/522 [44:47<2:37:58, 23.01s/it]
21%|██▏ | 111/522 [45:09<2:36:18, 22.82s/it]
21%|██▏ | 112/522 [45:28<2:28:14, 21.69s/it]
22%|██▏ | 113/522 [45:52<2:31:50, 22.28s/it]
22%|██▏ | 114/522 [46:16<2:34:38, 22.74s/it]
22%|██▏ | 115/522 [46:38<2:33:14, 22.59s/it]
22%|██▏ | 116/522 [47:03<2:39:01, 23.50s/it]
22%|██▏ | 117/522 [47:32<2:49:13, 25.07s/it]
23%|██▎ | 118/522 [47:56<2:45:57, 24.65s/it]
23%|██▎ | 119/522 [48:24<2:52:14, 25.64s/it]
23%|██▎ | 120/522 [48:46<2:45:57, 24.77s/it]
{'loss': 0.7405, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.69}
23%|██▎ | 121/522 [49:11<2:45:45, 24.80s/it]
23%|██▎ | 122/522 [49:40<2:52:32, 25.88s/it]
24%|██▎ | 123/522 [50:03<2:46:24, 25.02s/it]
24%|██▍ | 124/522 [50:29<2:48:30, 25.40s/it]
24%|██▍ | 125/522 [50:52<2:42:25, 24.55s/it]
24%|██▍ | 126/522 [51:16<2:40:59, 24.39s/it]
24%|██▍ | 127/522 [51:40<2:39:35, 24.24s/it]
25%|██▍ | 128/522 [52:04<2:39:20, 24.26s/it]
25%|██▍ | 129/522 [52:26<2:34:31, 23.59s/it]
25%|██▍ | 130/522 [52:45<2:25:16, 22.24s/it]
25%|██▌ | 131/522 [53:11<2:33:09, 23.50s/it]
25%|██▌ | 132/522 [53:36<2:34:15, 23.73s/it]
25%|██▌ | 133/522 [53:57<2:29:42, 23.09s/it]
26%|██▌ | 134/522 [54:25<2:38:58, 24.58s/it]
26%|██▌ | 135/522 [54:47<2:33:36, 23.81s/it]
26%|██▌ | 136/522 [55:13<2:36:34, 24.34s/it]
26%|██▌ | 137/522 [55:34<2:30:29, 23.45s/it]
26%|██▋ | 138/522 [56:00<2:33:44, 24.02s/it]
27%|██▋ | 139/522 [56:24<2:33:06, 23.99s/it]
27%|██▋ | 140/522 [56:51<2:39:08, 25.00s/it]
{'loss': 0.716, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.8}
27%|██▋ | 141/522 [57:17<2:41:07, 25.37s/it]
27%|██▋ | 142/522 [57:41<2:38:13, 24.98s/it]
27%|██▋ | 143/522 [58:06<2:36:32, 24.78s/it]
28%|██▊ | 144/522 [58:31<2:37:54, 25.07s/it]
28%|██▊ | 145/522 [58:51<2:28:16, 23.60s/it]
28%|██▊ | 146/522 [59:14<2:25:07, 23.16s/it]
28%|██▊ | 147/522 [59:35<2:22:22, 22.78s/it]
28%|██▊ | 148/522 [1:00:05<2:35:27, 24.94s/it]
29%|██▊ | 149/522 [1:00:30<2:35:08, 24.96s/it]
29%|██▊ | 150/522 [1:01:01<2:44:49, 26.59s/it]
29%|██▉ | 151/522 [1:01:25<2:40:38, 25.98s/it]
29%|██▉ | 152/522 [1:01:48<2:34:37, 25.07s/it]
29%|██▉ | 153/522 [1:02:12<2:30:57, 24.55s/it]
30%|██▉ | 154/522 [1:02:39<2:34:53, 25.25s/it]
30%|██▉ | 155/522 [1:03:04<2:35:19, 25.39s/it]
30%|██▉ | 156/522 [1:03:28<2:32:25, 24.99s/it]
30%|███ | 157/522 [1:03:52<2:29:15, 24.54s/it]
30%|███ | 158/522 [1:04:12<2:21:32, 23.33s/it]
30%|███ | 159/522 [1:04:40<2:29:03, 24.64s/it]
31%|███ | 160/522 [1:05:03<2:25:33, 24.12s/it]
{'loss': 0.7134, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.92}
31%|███ | 161/522 [1:05:28<2:26:38, 24.37s/it]
31%|███ | 162/522 [1:05:50<2:21:48, 23.63s/it]
31%|███ | 163/522 [1:06:15<2:24:29, 24.15s/it]
31%|███▏ | 164/522 [1:06:35<2:16:51, 22.94s/it]
32%|███▏ | 165/522 [1:06:57<2:14:05, 22.54s/it]
32%|███▏ | 166/522 [1:07:19<2:12:29, 22.33s/it]
32%|███▏ | 167/522 [1:07:38<2:06:59, 21.46s/it]
32%|███▏ | 168/522 [1:08:03<2:12:22, 22.44s/it]
32%|███▏ | 169/522 [1:08:26<2:12:58, 22.60s/it]
33%|███▎ | 170/522 [1:08:50<2:15:22, 23.07s/it]
33%|███▎ | 171/522 [1:09:21<2:28:17, 25.35s/it]
33%|███▎ | 172/522 [1:09:47<2:28:47, 25.51s/it]
33%|███▎ | 173/522 [1:10:15<2:34:00, 26.48s/it]
33%|███▎ | 174/522 [1:10:29<2:10:36, 22.52s/it]
34%|███▎ | 175/522 [1:10:56<2:18:01, 23.87s/it]
34%|███▎ | 176/522 [1:11:23<2:22:53, 24.78s/it]
34%|███▍ | 177/522 [1:11:48<2:23:18, 24.92s/it]
34%|███▍ | 178/522 [1:12:12<2:21:06, 24.61s/it]
34%|███▍ | 179/522 [1:12:35<2:18:26, 24.22s/it]
34%|███▍ | 180/522 [1:12:59<2:17:17, 24.09s/it]
{'loss': 0.7104, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 1.03}
35%|███▍ | 181/522 [1:13:27<2:24:27, 25.42s/it]
35%|███▍ | 182/522 [1:13:53<2:24:35, 25.52s/it]
35%|███▌ | 183/522 [1:14:17<2:21:57, 25.13s/it]
35%|███▌ | 184/522 [1:14:41<2:19:10, 24.71s/it]
35%|███▌ | 185/522 [1:15:02<2:12:20, 23.56s/it]
36%|███▌ | 186/522 [1:15:26<2:12:45, 23.71s/it]
36%|███▌ | 187/522 [1:15:54<2:20:09, 25.10s/it]
36%|███▌ | 188/522 [1:16:20<2:21:03, 25.34s/it]
36%|███▌ | 189/522 [1:16:47<2:22:18, 25.64s/it]
36%|███▋ | 190/522 [1:17:08<2:14:34, 24.32s/it]
37%|███▋ | 191/522 [1:17:28<2:07:48, 23.17s/it]
37%|███▋ | 192/522 [1:17:52<2:08:05, 23.29s/it]
37%|███▋ | 193/522 [1:18:19<2:13:52, 24.41s/it]
37%|███▋ | 194/522 [1:18:45<2:16:52, 25.04s/it]
37%|███▋ | 195/522 [1:19:06<2:10:03, 23.86s/it]
38%|███▊ | 196/522 [1:19:28<2:06:08, 23.22s/it]
38%|███▊ | 197/522 [1:19:54<2:09:49, 23.97s/it]
38%|███▊ | 198/522 [1:20:12<2:00:37, 22.34s/it]
38%|███▊ | 199/522 [1:20:39<2:06:36, 23.52s/it]
38%|███▊ | 200/522 [1:21:02<2:06:12, 23.52s/it]
{'loss': 0.721, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 1.15}
39%|███▊ | 201/522 [1:21:31<2:13:59, 25.05s/it]
39%|███▊ | 202/522 [1:21:56<2:13:03, 24.95s/it]
39%|███▉ | 203/522 [1:22:19<2:10:13, 24.49s/it]
39%|███▉ | 204/522 [1:22:46<2:13:58, 25.28s/it]
39%|███▉ | 205/522 [1:23:10<2:11:47, 24.94s/it]
39%|███▉ | 206/522 [1:23:33<2:07:41, 24.24s/it]
40%|███▉ | 207/522 [1:23:58<2:08:43, 24.52s/it]
40%|███▉ | 208/522 [1:24:21<2:05:13, 23.93s/it]
40%|████ | 209/522 [1:24:43<2:03:16, 23.63s/it]
40%|████ | 210/522 [1:25:04<1:57:19, 22.56s/it]
40%|████ | 211/522 [1:25:30<2:03:26, 23.82s/it]
41%|████ | 212/522 [1:25:55<2:03:49, 23.96s/it]
41%|████ | 213/522 [1:26:15<1:57:09, 22.75s/it]
41%|████ | 214/522 [1:26:36<1:55:07, 22.43s/it]
41%|████ | 215/522 [1:26:56<1:51:05, 21.71s/it]
41%|████▏ | 216/522 [1:27:24<2:00:09, 23.56s/it]
42%|████▏ | 217/522 [1:27:48<2:00:04, 23.62s/it]
42%|████▏ | 218/522 [1:28:10<1:57:59, 23.29s/it]
42%|████▏ | 219/522 [1:28:35<2:00:03, 23.77s/it]
42%|████▏ | 220/522 [1:29:00<2:01:05, 24.06s/it]
{'loss': 0.7116, 'learning_rate': 0.0001, 'global_step': 220, 'epoch': 1.26}
42%|████▏ | 221/522 [1:29:24<2:00:07, 23.94s/it]
43%|████▎ | 222/522 [1:29:46<1:57:25, 23.48s/it]
43%|████▎ | 223/522 [1:30:08<1:54:06, 22.90s/it]
43%|████▎ | 224/522 [1:30:28<1:49:30, 22.05s/it]
43%|████▎ | 225/522 [1:30:51<1:51:10, 22.46s/it]
43%|████▎ | 226/522 [1:31:11<1:46:37, 21.61s/it]
43%|████▎ | 227/522 [1:31:39<1:55:40, 23.53s/it]
44%|████▎ | 228/522 [1:32:04<1:57:31, 23.98s/it]
44%|████▍ | 229/522 [1:32:23<1:50:11, 22.56s/it]
44%|████▍ | 230/522 [1:32:45<1:48:55, 22.38s/it]
44%|████▍ | 231/522 [1:33:06<1:45:58, 21.85s/it]
44%|████▍ | 232/522 [1:33:28<1:46:28, 22.03s/it]
45%|████▍ | 233/522 [1:33:59<1:58:23, 24.58s/it]
45%|████▍ | 234/522 [1:34:20<1:52:52, 23.52s/it]
45%|████▌ | 235/522 [1:34:50<2:02:21, 25.58s/it]
45%|████▌ | 236/522 [1:35:15<2:01:13, 25.43s/it]
45%|████▌ | 237/522 [1:35:43<2:04:56, 26.30s/it]
46%|████▌ | 238/522 [1:36:05<1:57:04, 24.73s/it]
46%|████▌ | 239/522 [1:36:29<1:56:18, 24.66s/it]
46%|████▌ | 240/522 [1:36:54<1:56:52, 24.87s/it]
{'loss': 0.7153, 'learning_rate': 0.0001, 'global_step': 240, 'epoch': 1.38}
46%|████▌ | 241/522 [1:37:19<1:56:11, 24.81s/it]
46%|████▋ | 242/522 [1:37:49<2:02:40, 26.29s/it]
47%|████▋ | 243/522 [1:38:13<1:59:32, 25.71s/it]
47%|████▋ | 244/522 [1:38:35<1:53:33, 24.51s/it]
47%|████▋ | 245/522 [1:38:57<1:49:48, 23.79s/it]
47%|████▋ | 246/522 [1:39:20<1:47:58, 23.47s/it]
47%|████▋ | 247/522 [1:39:47<1:53:31, 24.77s/it]
48%|████▊ | 248/522 [1:40:10<1:50:33, 24.21s/it]
48%|████▊ | 249/522 [1:40:33<1:47:51, 23.70s/it]
48%|████▊ | 250/522 [1:41:00<1:51:27, 24.59s/it]
48%|████▊ | 251/522 [1:41:23<1:49:47, 24.31s/it]
48%|████▊ | 252/522 [1:41:46<1:47:53, 23.98s/it]
48%|████▊ | 253/522 [1:42:13<1:50:21, 24.62s/it]
49%|████▊ | 254/522 [1:42:40<1:53:17, 25.36s/it]
49%|████▉ | 255/522 [1:43:07<1:55:47, 26.02s/it]
49%|████▉ | 256/522 [1:43:28<1:48:30, 24.48s/it]
49%|████▉ | 257/522 [1:43:54<1:50:41, 25.06s/it]
49%|████▉ | 258/522 [1:44:22<1:52:58, 25.68s/it]
50%|████▉ | 259/522 [1:44:43<1:46:55, 24.39s/it]
50%|████▉ | 260/522 [1:45:04<1:42:14, 23.42s/it]
{'loss': 0.7116, 'learning_rate': 0.0001, 'global_step': 260, 'epoch': 1.49}
50%|█████ | 261/522 [1:45:25<1:38:31, 22.65s/it]
50%|█████ | 262/522 [1:45:45<1:35:19, 22.00s/it]
50%|█████ | 263/522 [1:46:07<1:34:14, 21.83s/it]
51%|█████ | 264/522 [1:46:29<1:33:46, 21.81s/it]
51%|█████ | 265/522 [1:46:53<1:36:13, 22.47s/it]
51%|█████ | 266/522 [1:47:12<1:32:22, 21.65s/it]
51%|█████ | 267/522 [1:47:31<1:28:28, 20.82s/it]
51%|█████▏ | 268/522 [1:47:58<1:35:01, 22.45s/it]
52%|█████▏ | 269/522 [1:48:17<1:31:27, 21.69s/it]
52%|█████▏ | 270/522 [1:48:42<1:34:55, 22.60s/it]
52%|█████▏ | 271/522 [1:49:02<1:30:32, 21.64s/it]
52%|█████▏ | 272/522 [1:49:27<1:34:39, 22.72s/it]
52%|█████▏ | 273/522 [1:49:49<1:33:59, 22.65s/it]
52%|█████▏ | 274/522 [1:50:09<1:29:49, 21.73s/it]
53%|█████▎ | 275/522 [1:50:35<1:34:38, 22.99s/it]
53%|█████▎ | 276/522 [1:51:00<1:37:10, 23.70s/it]
53%|█████▎ | 277/522 [1:51:26<1:38:59, 24.24s/it]
53%|█████▎ | 278/522 [1:51:46<1:33:25, 22.98s/it]
53%|█████▎ | 279/522 [1:52:12<1:36:44, 23.88s/it]
54%|█████▎ | 280/522 [1:52:37<1:38:02, 24.31s/it]
{'loss': 0.6967, 'learning_rate': 0.0001, 'global_step': 280, 'epoch': 1.61}
54%|█████▍ | 281/522 [1:53:04<1:40:28, 25.02s/it]
54%|█████▍ | 282/522 [1:53:29<1:40:14, 25.06s/it]
54%|█████▍ | 283/522 [1:53:51<1:35:54, 24.08s/it]
54%|█████▍ | 284/522 [1:54:15<1:36:16, 24.27s/it]
55%|█████▍ | 285/522 [1:54:41<1:37:17, 24.63s/it]
55%|█████▍ | 286/522 [1:55:04<1:35:09, 24.19s/it]
55%|█████▍ | 287/522 [1:55:26<1:31:52, 23.46s/it]
55%|█████▌ | 288/522 [1:55:48<1:30:04, 23.10s/it]
55%|█████▌ | 289/522 [1:56:13<1:31:42, 23.61s/it]
56%|█████▌ | 290/522 [1:56:34<1:28:27, 22.88s/it]
56%|█████▌ | 291/522 [1:56:59<1:30:06, 23.40s/it]
56%|█████▌ | 292/522 [1:57:21<1:28:27, 23.08s/it]
56%|█████▌ | 293/522 [1:57:43<1:26:26, 22.65s/it]
56%|█████▋ | 294/522 [1:58:09<1:30:43, 23.88s/it]
57%|█████▋ | 295/522 [1:58:31<1:27:56, 23.25s/it]
57%|█████▋ | 296/522 [1:59:01<1:34:50, 25.18s/it]
57%|█████▋ | 297/522 [1:59:29<1:37:37, 26.03s/it]
57%|█████▋ | 298/522 [1:59:57<1:39:32, 26.66s/it]
57%|█████▋ | 299/522 [2:00:27<1:43:06, 27.74s/it]
57%|█████▋ | 300/522 [2:00:56<1:43:41, 28.02s/it]
{'loss': 0.7469, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.72}
58%|█████▊ | 301/522 [2:01:18<1:36:45, 26.27s/it]
58%|█████▊ | 302/522 [2:01:47<1:39:10, 27.05s/it]
58%|█████▊ | 303/522 [2:02:13<1:37:13, 26.64s/it]
58%|█████▊ | 304/522 [2:02:39<1:36:34, 26.58s/it]
58%|█████▊ | 305/522 [2:03:04<1:34:17, 26.07s/it]
59%|█████▊ | 306/522 [2:03:32<1:35:46, 26.60s/it]
59%|█████▉ | 307/522 [2:03:52<1:28:21, 24.66s/it]
59%|█████▉ | 308/522 [2:04:19<1:30:50, 25.47s/it]
59%|█████▉ | 309/522 [2:04:44<1:29:50, 25.31s/it]
59%|█████▉ | 310/522 [2:05:06<1:25:15, 24.13s/it]
60%|█████▉ | 311/522 [2:05:29<1:23:57, 23.87s/it]
60%|█████▉ | 312/522 [2:05:54<1:25:04, 24.31s/it]
60%|█████▉ | 313/522 [2:06:25<1:31:00, 26.13s/it]
60%|██████ | 314/522 [2:06:47<1:26:41, 25.00s/it]
60%|██████ | 315/522 [2:07:09<1:23:23, 24.17s/it]
61%|██████ | 316/522 [2:07:40<1:29:33, 26.08s/it]
61%|██████ | 317/522 [2:08:06<1:28:54, 26.02s/it]
61%|██████ | 318/522 [2:08:30<1:26:46, 25.52s/it]
61%|██████ | 319/522 [2:08:58<1:28:42, 26.22s/it]
61%|██████▏ | 320/522 [2:09:24<1:28:28, 26.28s/it]
{'loss': 0.6993, 'learning_rate': 0.0001, 'global_step': 320, 'epoch': 1.84}
61%|██████▏ | 321/522 [2:09:44<1:21:03, 24.19s/it]
62%|██████▏ | 322/522 [2:10:09<1:21:57, 24.59s/it]
62%|██████▏ | 323/522 [2:10:34<1:22:11, 24.78s/it]
62%|██████▏ | 324/522 [2:10:54<1:16:56, 23.31s/it]
62%|██████▏ | 325/522 [2:11:22<1:21:10, 24.72s/it]
62%|██████▏ | 326/522 [2:11:48<1:22:13, 25.17s/it]
63%|██████▎ | 327/522 [2:12:18<1:26:12, 26.52s/it]
63%|██████▎ | 328/522 [2:12:43<1:23:46, 25.91s/it]
63%|██████▎ | 329/522 [2:13:09<1:23:43, 26.03s/it]
63%|██████▎ | 330/522 [2:13:34<1:22:09, 25.68s/it]
63%|██████▎ | 331/522 [2:13:54<1:16:22, 23.99s/it]
64%|██████▎ | 332/522 [2:14:23<1:20:30, 25.42s/it]
64%|██████▍ | 333/522 [2:14:46<1:18:16, 24.85s/it]
64%|██████▍ | 334/522 [2:15:17<1:23:15, 26.57s/it]
64%|██████▍ | 335/522 [2:15:39<1:18:51, 25.30s/it]
64%|██████▍ | 336/522 [2:16:02<1:15:54, 24.49s/it]
65%|██████▍ | 337/522 [2:16:30<1:19:24, 25.76s/it]
65%|██████▍ | 338/522 [2:16:53<1:16:31, 24.95s/it]
65%|██████▍ | 339/522 [2:17:19<1:16:53, 25.21s/it]
65%|██████▌ | 340/522 [2:17:44<1:16:25, 25.20s/it]
{'loss': 0.7135, 'learning_rate': 0.0001, 'global_step': 340, 'epoch': 1.95}
65%|██████▌ | 341/522 [2:18:12<1:18:15, 25.94s/it]
66%|██████▌ | 342/522 [2:18:37<1:16:47, 25.60s/it]
66%|██████▌ | 343/522 [2:19:03<1:16:49, 25.75s/it]
66%|██████▌ | 344/522 [2:19:24<1:12:40, 24.50s/it]
66%|██████▌ | 345/522 [2:19:45<1:08:54, 23.36s/it]
66%|██████▋ | 346/522 [2:20:11<1:11:06, 24.24s/it]
66%|██████▋ | 347/522 [2:20:32<1:07:16, 23.07s/it]
67%|██████▋ | 348/522 [2:20:46<58:45, 20.26s/it]
67%|██████▋ | 349/522 [2:21:12<1:04:03, 22.21s/it]
67%|██████▋ | 350/522 [2:21:42<1:09:44, 24.33s/it]
67%|██████▋ | 351/522 [2:22:08<1:11:11, 24.98s/it]
67%|██████▋ | 352/522 [2:22:34<1:11:46, 25.33s/it]
68%|██████▊ | 353/522 [2:23:01<1:12:27, 25.72s/it]
68%|██████▊ | 354/522 [2:23:23<1:08:52, 24.60s/it]
68%|██████▊ | 355/522 [2:23:48<1:08:32, 24.63s/it]
68%|██████▊ | 356/522 [2:24:11<1:07:30, 24.40s/it]
68%|██████▊ | 357/522 [2:24:30<1:02:15, 22.64s/it]
69%|██████▊ | 358/522 [2:24:50<59:27, 21.75s/it]
69%|██████▉ | 359/522 [2:25:17<1:03:36, 23.42s/it]
69%|██████▉ | 360/522 [2:25:39<1:02:29, 23.14s/it]
{'loss': 0.682, 'learning_rate': 0.0001, 'global_step': 360, 'epoch': 2.07}
69%|██████▉ | 361/522 [2:26:01<1:00:30, 22.55s/it]
69%|██████▉ | 362/522 [2:26:24<1:00:30, 22.69s/it]
70%|██████▉ | 363/522 [2:26:54<1:06:09, 24.96s/it]
70%|██████▉ | 364/522 [2:27:15<1:02:23, 23.70s/it]
70%|██████▉ | 365/522 [2:27:39<1:02:19, 23.82s/it]
70%|███████ | 366/522 [2:28:00<59:49, 23.01s/it]
70%|███████ | 367/522 [2:28:26<1:02:09, 24.06s/it]
70%|███████ | 368/522 [2:28:49<1:00:38, 23.63s/it]
71%|███████ | 369/522 [2:29:16<1:02:34, 24.54s/it]
71%|███████ | 370/522 [2:29:40<1:02:23, 24.63s/it]
71%|███████ | 371/522 [2:30:07<1:03:15, 25.14s/it]
71%|███████▏ | 372/522 [2:30:33<1:03:43, 25.49s/it]
71%|███████▏ | 373/522 [2:30:57<1:02:09, 25.03s/it]
72%|███████▏ | 374/522 [2:31:23<1:02:25, 25.31s/it]
72%|███████▏ | 375/522 [2:31:46<1:00:15, 24.60s/it]
72%|███████▏ | 376/522 [2:32:12<1:00:57, 25.05s/it]
72%|███████▏ | 377/522 [2:32:33<57:32, 23.81s/it]
72%|███████▏ | 378/522 [2:33:00<59:10, 24.66s/it]
73%|███████▎ | 379/522 [2:33:20<55:51, 23.44s/it]
73%|███████▎ | 380/522 [2:33:46<57:29, 24.29s/it]
{'loss': 0.6944, 'learning_rate': 0.0001, 'global_step': 380, 'epoch': 2.18}
73%|███████▎ | 381/522 [2:34:11<56:56, 24.23s/it]
73%|███████▎ | 382/522 [2:34:41<1:00:42, 26.02s/it]
73%|███████▎ | 383/522 [2:35:10<1:02:42, 27.07s/it]
74%|███████▎ | 384/522 [2:35:38<1:02:39, 27.24s/it]
74%|███████▍ | 385/522 [2:36:02<1:00:05, 26.32s/it]
74%|███████▍ | 386/522 [2:36:29<59:47, 26.38s/it]
74%|███████▍ | 387/522 [2:36:54<58:49, 26.14s/it]
74%|███████▍ | 388/522 [2:37:21<58:47, 26.32s/it]
75%|███████▍ | 389/522 [2:37:42<54:47, 24.72s/it]
75%|███████▍ | 390/522 [2:38:03<51:44, 23.52s/it]
75%|███████▍ | 391/522 [2:38:25<50:31, 23.14s/it]
75%|███████▌ | 392/522 [2:38:50<51:43, 23.87s/it]
75%|███████▌ | 393/522 [2:39:20<55:05, 25.63s/it]
75%|███████▌ | 394/522 [2:39:41<51:40, 24.22s/it]
76%|███████▌ | 395/522 [2:40:03<49:58, 23.61s/it]
76%|███████▌ | 396/522 [2:40:29<50:37, 24.11s/it]
76%|███████▌ | 397/522 [2:40:49<48:10, 23.13s/it]
76%|███████▌ | 398/522 [2:41:14<48:26, 23.44s/it]
76%|███████▋ | 399/522 [2:41:41<50:35, 24.67s/it]
77%|███████▋ | 400/522 [2:42:06<50:13, 24.70s/it]
{'loss': 0.6939, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 2.3}
77%|███████▋ | 401/522 [2:42:29<49:04, 24.34s/it]
77%|███████▋ | 402/522 [2:42:59<52:07, 26.06s/it]
77%|███████▋ | 403/522 [2:43:27<52:20, 26.39s/it]
77%|███████▋ | 404/522 [2:43:52<51:31, 26.20s/it]
78%|███████▊ | 405/522 [2:44:14<48:27, 24.85s/it]
78%|███████▊ | 406/522 [2:44:40<48:41, 25.18s/it]
78%|███████▊ | 407/522 [2:45:03<46:50, 24.44s/it]
78%|███████▊ | 408/522 [2:45:24<44:23, 23.37s/it]
78%|███████▊ | 409/522 [2:45:47<44:09, 23.45s/it]
79%|███████▊ | 410/522 [2:46:08<42:09, 22.58s/it]
79%|███████▊ | 411/522 [2:46:30<41:29, 22.43s/it]
79%|███████▉ | 412/522 [2:46:55<42:38, 23.26s/it]
79%|███████▉ | 413/522 [2:47:18<41:56, 23.09s/it]
79%|███████▉ | 414/522 [2:47:45<43:59, 24.44s/it]
80%|███████▉ | 415/522 [2:48:05<40:55, 22.95s/it]
80%|███████▉ | 416/522 [2:48:29<40:55, 23.17s/it]
80%|███████▉ | 417/522 [2:48:53<41:10, 23.53s/it]
80%|████████ | 418/522 [2:49:15<39:53, 23.02s/it]
80%|████████ | 419/522 [2:49:42<41:56, 24.43s/it]
80%|████████ | 420/522 [2:50:08<42:07, 24.78s/it]
{'loss': 0.6787, 'learning_rate': 0.0001, 'global_step': 420, 'epoch': 2.41}
81%|████████ | 421/522 [2:50:30<40:06, 23.83s/it]
81%|████████ | 422/522 [2:50:54<39:57, 23.98s/it]
81%|████████ | 423/522 [2:51:15<38:13, 23.16s/it]
81%|████████ | 424/522 [2:51:43<39:55, 24.44s/it]
81%|████████▏ | 425/522 [2:52:06<38:55, 24.08s/it]
82%|████████▏ | 426/522 [2:52:28<37:38, 23.53s/it]
82%|████████▏ | 427/522 [2:52:51<37:05, 23.43s/it]
82%|████████▏ | 428/522 [2:53:13<35:55, 22.93s/it]
82%|████████▏ | 429/522 [2:53:41<37:50, 24.42s/it]
82%|████████▏ | 430/522 [2:54:03<36:31, 23.83s/it]
83%|████████▎ | 431/522 [2:54:29<36:55, 24.34s/it]
83%|████████▎ | 432/522 [2:54:50<34:54, 23.28s/it]
83%|████████▎ | 433/522 [2:55:15<35:30, 23.94s/it]
83%|████████▎ | 434/522 [2:55:41<35:50, 24.44s/it]
83%|████████▎ | 435/522 [2:56:10<37:18, 25.73s/it]
84%|████████▎ | 436/522 [2:56:33<35:58, 25.10s/it]
84%|████████▎ | 437/522 [2:56:58<35:30, 25.06s/it]
84%|████████▍ | 438/522 [2:57:21<34:08, 24.39s/it]
84%|████████▍ | 439/522 [2:57:42<32:14, 23.30s/it]
84%|████████▍ | 440/522 [2:58:04<31:31, 23.07s/it]
{'loss': 0.6928, 'learning_rate': 0.0001, 'global_step': 440, 'epoch': 2.53}
84%|████████▍ | 441/522 [2:58:29<31:44, 23.51s/it]
85%|████████▍ | 442/522 [2:58:55<32:25, 24.32s/it]
85%|████████▍ | 443/522 [2:59:21<32:49, 24.93s/it]
85%|████████▌ | 444/522 [2:59:49<33:27, 25.74s/it]
85%|████████▌ | 445/522 [3:00:15<32:58, 25.69s/it]
85%|████████▌ | 446/522 [3:00:36<30:49, 24.34s/it]
86%|████████▌ | 447/522 [3:00:59<30:00, 24.01s/it]
86%|████████▌ | 448/522 [3:01:28<31:16, 25.36s/it]
86%|████████▌ | 449/522 [3:01:49<29:15, 24.05s/it]
86%|████████▌ | 450/522 [3:02:08<27:04, 22.57s/it]
86%|████████▋ | 451/522 [3:02:37<29:09, 24.64s/it]
87%|████████▋ | 452/522 [3:02:59<27:44, 23.78s/it]
87%|████████▋ | 453/522 [3:03:18<25:49, 22.46s/it]
87%|████████▋ | 454/522 [3:03:44<26:24, 23.30s/it]
87%|████████▋ | 455/522 [3:04:12<27:43, 24.82s/it]
87%|████████▋ | 456/522 [3:04:33<26:01, 23.66s/it]
88%|████████▊ | 457/522 [3:05:01<27:06, 25.02s/it]
88%|████████▊ | 458/522 [3:05:28<27:26, 25.73s/it]
88%|████████▊ | 459/522 [3:05:55<27:17, 25.99s/it]
88%|████████▊ | 460/522 [3:06:20<26:36, 25.75s/it]
{'loss': 0.7083, 'learning_rate': 0.0001, 'global_step': 460, 'epoch': 2.64}
88%|████████▊ | 461/522 [3:06:46<26:02, 25.62s/it]
89%|████████▊ | 462/522 [3:07:10<25:18, 25.30s/it]
89%|████████▊ | 463/522 [3:07:31<23:27, 23.85s/it]
89%|████████▉ | 464/522 [3:07:49<21:30, 22.24s/it]
89%|████████▉ | 465/522 [3:08:16<22:28, 23.65s/it]
89%|████████▉ | 466/522 [3:08:37<21:20, 22.86s/it]
89%|████████▉ | 467/522 [3:09:04<22:00, 24.01s/it]
90%|████████▉ | 468/522 [3:09:31<22:36, 25.12s/it]
90%|████████▉ | 469/522 [3:09:58<22:30, 25.49s/it]
90%|█████████ | 470/522 [3:10:19<20:59, 24.21s/it]
90%|█████████ | 471/522 [3:10:46<21:16, 25.04s/it]
90%|█████████ | 472/522 [3:11:09<20:20, 24.41s/it]
91%|█████████ | 473/522 [3:11:32<19:31, 23.90s/it]
91%|█████████ | 474/522 [3:11:52<18:18, 22.88s/it]
91%|█████████ | 475/522 [3:12:20<19:06, 24.39s/it]
91%|█████████ | 476/522 [3:12:51<20:06, 26.23s/it]
91%|█████████▏| 477/522 [3:13:13<18:55, 25.23s/it]
92%|█████████▏| 478/522 [3:13:40<18:41, 25.49s/it]
92%|█████████▏| 479/522 [3:14:03<17:49, 24.88s/it]
92%|█████████▏| 480/522 [3:14:28<17:23, 24.85s/it]
{'loss': 0.694, 'learning_rate': 0.0001, 'global_step': 480, 'epoch': 2.76}
92%|█████████▏| 481/522 [3:14:47<15:47, 23.11s/it]
92%|█████████▏| 482/522 [3:15:13<15:57, 23.94s/it]
93%|█████████▎| 483/522 [3:15:37<15:36, 24.02s/it]
93%|█████████▎| 484/522 [3:16:00<15:02, 23.75s/it]
93%|█████████▎| 485/522 [3:16:21<14:10, 23.00s/it]
93%|█████████▎| 486/522 [3:16:47<14:12, 23.69s/it]
93%|█████████▎| 487/522 [3:17:08<13:27, 23.06s/it]
93%|█████████▎| 488/522 [3:17:29<12:43, 22.46s/it]
94%|█████████▎| 489/522 [3:17:52<12:27, 22.65s/it]
94%|█████████▍| 490/522 [3:18:13<11:45, 22.06s/it]
94%|█████████▍| 491/522 [3:18:37<11:38, 22.54s/it]
94%|█████████▍| 492/522 [3:18:58<11:06, 22.21s/it]
94%|█████████▍| 493/522 [3:19:21<10:49, 22.41s/it]
95%|█████████▍| 494/522 [3:19:46<10:51, 23.26s/it]
95%|█████████▍| 495/522 [3:20:12<10:47, 23.99s/it]
95%|█████████▌| 496/522 [3:20:41<11:05, 25.59s/it]
95%|█████████▌| 497/522 [3:21:04<10:21, 24.87s/it]
95%|█████████▌| 498/522 [3:21:30<09:59, 24.99s/it]
96%|█████████▌| 499/522 [3:21:55<09:33, 24.93s/it]
96%|█████████▌| 500/522 [3:22:18<08:58, 24.47s/it]
{'loss': 0.688, 'learning_rate': 0.0001, 'global_step': 500, 'epoch': 2.87}
96%|█████████▌| 501/522 [3:22:47<09:01, 25.80s/it]
96%|█████████▌| 502/522 [3:23:17<09:01, 27.09s/it]
96%|█████████▋| 503/522 [3:23:38<08:01, 25.35s/it]
97%|█████████▋| 504/522 [3:24:07<07:55, 26.41s/it]
97%|█████████▋| 505/522 [3:24:32<07:19, 25.84s/it]
97%|█████████▋| 506/522 [3:24:52<06:24, 24.05s/it]
97%|█████████▋| 507/522 [3:25:18<06:10, 24.68s/it]
97%|█████████▋| 508/522 [3:25:41<05:40, 24.33s/it]
98%|█████████▊| 509/522 [3:26:01<05:00, 23.08s/it]
98%|█████████▊| 510/522 [3:26:31<05:00, 25.08s/it]
98%|█████████▊| 511/522 [3:26:58<04:42, 25.71s/it]
98%|█████████▊| 512/522 [3:27:21<04:08, 24.84s/it]
98%|█████████▊| 513/522 [3:27:51<03:58, 26.51s/it]
98%|█████████▊| 514/522 [3:28:14<03:21, 25.23s/it]
99%|█████████▊| 515/522 [3:28:36<02:50, 24.41s/it]
99%|█████████▉| 516/522 [3:28:59<02:24, 24.04s/it]
99%|█████████▉| 517/522 [3:29:22<01:58, 23.72s/it]
99%|█████████▉| 518/522 [3:29:42<01:29, 22.37s/it]
99%|█████████▉| 519/522 [3:30:02<01:05, 21.92s/it]
100%|█████████▉| 520/522 [3:30:28<00:45, 22.91s/it]
{'loss': 0.6821, 'learning_rate': 0.0001, 'global_step': 520, 'epoch': 2.99}
100%|█████████▉| 521/522 [3:30:53<00:23, 23.50s/it]
100%|██████████| 522/522 [3:31:10<00:00, 21.74s/it]
{'train_runtime': 12670.7287, 'train_samples_per_second': 0.822, 'train_steps_per_second': 0.041, 'train_loss': 0.9344269825124193, 'epoch': 3.0}
100%|██████████| 522/522 [3:31:10<00:00, 24.27s/it]
***** train metrics *****
epoch = 3.0
train_loss = 0.9344
train_runtime = 3:31:10.72
train_samples_per_second = 0.822
train_steps_per_second = 0.041
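The final metrics are internally consistent with the dataset size reported at startup (3473 examples) and the 522-step progress bar. A short arithmetic check, assuming every optimizer step consumes the same number of samples; the log does not state the batch size, gradient-accumulation steps, or device count.

```python
import math

NUM_SAMPLES = 3473      # "there are 3473 data in dataset"
NUM_EPOCHS = 3          # epoch = 3.0
TOTAL_STEPS = 522       # from the progress bar
RUNTIME_S = 12670.7287  # train_runtime in seconds

steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS  # 174

# Assumption: ceil(NUM_SAMPLES / samples_per_step) == steps_per_epoch.
candidates = [b for b in range(1, 65)
              if math.ceil(NUM_SAMPLES / b) == steps_per_epoch]
print(steps_per_epoch, candidates)                      # 174 [20]

print(round(NUM_SAMPLES * NUM_EPOCHS / RUNTIME_S, 3))  # 0.822 samples/s
print(round(TOTAL_STEPS / RUNTIME_S, 3))               # 0.041 steps/s
print(round(180 / steps_per_epoch, 2))                 # 1.03, the epoch logged at step 180
```

Under that assumption, 20 is the only per-step sample count in the searched range that yields 174 steps per epoch, which suggests an effective batch size of about 20; how it is split across devices or accumulation steps cannot be recovered from the log.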