|
|
model training desc: initialize model training... |
|
|
2023-12-29 11:30:02.222 | INFO | __main__:init_components:108 - Initializing components... |
|
|
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:08<00:08, 8.69s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00, 5.05s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00, 5.60s/it] |
|
|
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 |
|
|
2023-12-29 11:30:14.063 | INFO | __main__:init_components:155 - |
|
|
|
|
|
2023-12-29 11:30:14.063 | INFO | __main__:init_components:156 - ******************** |
|
|
2023-12-29 11:30:14.063 | INFO | __main__:init_components:157 - using llama2 model |
|
|
2023-12-29 11:30:14.063 | INFO | __main__:init_components:158 - ******************** |
|
|
2023-12-29 11:30:14.063 | INFO | __main__:init_components:159 - |
|
|
|
|
|
memory footprint of model: 4.024436950683594 GB |
|
|
trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305 |
|
|
2023-12-29 11:30:17.486 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/quality_rocketqa_2048_instruct/train.jsonl |
|
|
2023-12-29 11:30:17.514 | INFO | component.dataset:__init__:19 - there are 2523 data in dataset |
|
|
2023-12-29 11:30:17.533 | INFO | __main__:main:231 - *** starting training *** |
|
|
0%| | 0/420 [00:00<?, ?it/s]
0%| | 1/420 [01:27<10:09:46, 87.32s/it]
0%| | 2/420 [02:49<9:47:41, 84.36s/it]
1%| | 3/420 [04:12<9:40:35, 83.54s/it]
1%| | 4/420 [05:35<9:38:48, 83.48s/it]
1%| | 5/420 [06:58<9:35:44, 83.24s/it]
1%|▏ | 6/420 [08:21<9:33:24, 83.10s/it]
2%|▏ | 7/420 [09:44<9:32:39, 83.20s/it]
2%|▏ | 8/420 [11:05<9:26:55, 82.56s/it]
2%|▏ | 9/420 [12:28<9:26:11, 82.65s/it]
2%|▏ | 10/420 [13:52<9:26:25, 82.89s/it]
3%|▎ | 11/420 [15:15<9:25:07, 82.90s/it]
3%|▎ | 12/420 [16:37<9:23:51, 82.92s/it]
3%|▎ | 13/420 [18:01<9:23:32, 83.08s/it]
3%|▎ | 14/420 [19:24<9:22:22, 83.11s/it]
4%|▎ | 15/420 [20:47<9:20:46, 83.08s/it]
4%|▍ | 16/420 [22:10<9:18:57, 83.01s/it]
4%|▍ | 17/420 [23:33<9:16:42, 82.88s/it]
4%|▍ | 18/420 [24:56<9:16:35, 83.07s/it]
5%|▍ | 19/420 [26:19<9:15:03, 83.05s/it]
5%|▍ | 20/420 [27:42<9:13:48, 83.07s/it]
5%|▌ | 21/420 [29:05<9:11:50, 82.98s/it]
5%|▌ | 22/420 [30:28<9:09:35, 82.85s/it]
5%|▌ | 23/420 [31:50<9:08:27, 82.89s/it]
6%|▌ | 24/420 [33:13<9:06:53, 82.86s/it]
6%|▌ | 25/420 [34:36<9:05:25, 82.85s/it]
6%|▌ | 26/420 [35:59<9:03:24, 82.75s/it]
6%|▋ | 27/420 [37:22<9:02:15, 82.79s/it]
7%|▋ | 28/420 [38:44<9:00:44, 82.77s/it]
7%|▋ | 29/420 [40:08<9:00:28, 82.94s/it]
7%|▋ | 30/420 [41:30<8:58:22, 82.83s/it]
7%|▋ | 31/420 [42:53<8:57:12, 82.86s/it]
8%|▊ | 32/420 [44:17<8:57:35, 83.13s/it]
8%|▊ | 33/420 [45:40<8:55:49, 83.07s/it]
8%|▊ | 34/420 [47:02<8:53:36, 82.95s/it]
8%|▊ | 35/420 [48:24<8:50:01, 82.60s/it]
9%|▊ | 36/420 [49:47<8:49:05, 82.67s/it]
9%|▉ | 37/420 [51:10<8:48:20, 82.77s/it]
9%|▉ | 38/420 [52:33<8:47:44, 82.89s/it]
9%|▉ | 39/420 [53:57<8:47:06, 83.01s/it]
10%|▉ | 40/420 [55:20<8:45:50, 83.03s/it]
10%|▉ | 41/420 [56:42<8:43:45, 82.92s/it]
10%|█ | 42/420 [58:06<8:43:48, 83.14s/it]
10%|█ | 43/420 [59:29<8:42:42, 83.19s/it]
10%|█ | 44/420 [1:00:52<8:40:17, 83.03s/it]
11%|█ | 45/420 [1:02:15<8:38:15, 82.92s/it]
11%|█ | 46/420 [1:03:37<8:36:56, 82.93s/it]
11%|█ | 47/420 [1:05:00<8:34:08, 82.70s/it]
11%|█▏ | 48/420 [1:06:22<8:32:49, 82.71s/it]
12%|█▏ | 49/420 [1:07:46<8:32:23, 82.87s/it]
12%|█▏ | 50/420 [1:09:09<8:31:11, 82.90s/it]
{'loss': 0.4779, 'learning_rate': 0.0001, 'global_step': 50, 'epoch': 0.24} |
|
|
12%|█▏ | 51/420 [1:10:32<8:29:54, 82.91s/it]
12%|█▏ | 52/420 [1:11:54<8:28:09, 82.85s/it]
13%|█▎ | 53/420 [1:13:16<8:25:28, 82.64s/it]
13%|█▎ | 54/420 [1:14:40<8:25:06, 82.80s/it]
13%|█▎ | 55/420 [1:16:00<8:19:47, 82.16s/it]
13%|█▎ | 56/420 [1:17:23<8:20:21, 82.48s/it]
14%|█▎ | 57/420 [1:18:46<8:18:39, 82.42s/it]
14%|█▍ | 58/420 [1:20:08<8:17:47, 82.51s/it]
14%|█▍ | 59/420 [1:21:32<8:17:48, 82.74s/it]
14%|█▍ | 60/420 [1:22:54<8:16:28, 82.74s/it]
15%|█▍ | 61/420 [1:24:18<8:15:46, 82.86s/it]
15%|█▍ | 62/420 [1:25:41<8:14:43, 82.91s/it]
15%|█▌ | 63/420 [1:27:04<8:14:38, 83.13s/it]
15%|█▌ | 64/420 [1:28:27<8:12:20, 82.98s/it]
15%|█▌ | 65/420 [1:29:49<8:10:02, 82.82s/it]
16%|█▌ | 66/420 [1:31:13<8:09:18, 82.93s/it]
16%|█▌ | 67/420 [1:32:35<8:07:43, 82.90s/it]
16%|█▌ | 68/420 [1:33:57<8:04:17, 82.55s/it]
16%|█▋ | 69/420 [1:35:20<8:03:08, 82.59s/it]
17%|█▋ | 70/420 [1:36:42<8:01:27, 82.54s/it]
17%|█▋ | 71/420 [1:38:05<8:00:50, 82.67s/it]
17%|█▋ | 72/420 [1:39:28<8:00:00, 82.76s/it]
17%|█▋ | 73/420 [1:40:51<7:59:31, 82.91s/it]
18%|█▊ | 74/420 [1:42:15<7:59:17, 83.11s/it]
18%|█▊ | 75/420 [1:43:38<7:57:25, 83.03s/it]
18%|█▊ | 76/420 [1:45:01<7:56:29, 83.11s/it]
18%|█▊ | 77/420 [1:46:24<7:55:03, 83.10s/it]
19%|█▊ | 78/420 [1:47:47<7:53:20, 83.04s/it]
19%|█▉ | 79/420 [1:49:10<7:51:58, 83.05s/it]
19%|█▉ | 80/420 [1:50:33<7:49:22, 82.83s/it]
19%|█▉ | 81/420 [1:51:56<7:48:23, 82.90s/it]
20%|█▉ | 82/420 [1:53:17<7:45:16, 82.59s/it]
20%|█▉ | 83/420 [1:54:40<7:43:31, 82.53s/it]
20%|██ | 84/420 [1:56:03<7:42:44, 82.63s/it]
20%|██ | 85/420 [1:57:26<7:42:42, 82.87s/it]
20%|██ | 86/420 [1:58:49<7:40:55, 82.80s/it]
21%|██ | 87/420 [2:00:12<7:39:45, 82.84s/it]
21%|██ | 88/420 [2:01:34<7:38:14, 82.81s/it]
21%|██ | 89/420 [2:02:58<7:37:23, 82.91s/it]
21%|██▏ | 90/420 [2:04:20<7:35:47, 82.87s/it]
22%|██▏ | 91/420 [2:05:43<7:34:03, 82.81s/it]
22%|██▏ | 92/420 [2:07:06<7:32:54, 82.85s/it]
22%|██▏ | 93/420 [2:08:29<7:32:03, 82.95s/it]
22%|██▏ | 94/420 [2:09:52<7:30:14, 82.87s/it]
23%|██▎ | 95/420 [2:11:15<7:28:40, 82.83s/it]
23%|██▎ | 96/420 [2:12:38<7:27:26, 82.86s/it]
23%|██▎ | 97/420 [2:14:00<7:25:57, 82.84s/it]
23%|██▎ | 98/420 [2:15:24<7:25:45, 83.06s/it]
24%|██▎ | 99/420 [2:16:47<7:24:00, 82.99s/it]
24%|██▍ | 100/420 [2:18:09<7:21:56, 82.86s/it]
{'loss': 0.4432, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48} |
|
|
24%|██▍ | 101/420 [2:19:32<7:21:04, 82.96s/it]
24%|██▍ | 102/420 [2:20:55<7:19:18, 82.89s/it]
25%|██▍ | 103/420 [2:22:19<7:18:37, 83.02s/it]
25%|██▍ | 104/420 [2:23:42<7:17:28, 83.07s/it]
25%|██▌ | 105/420 [2:25:05<7:16:11, 83.09s/it]
25%|██▌ | 106/420 [2:26:28<7:15:23, 83.20s/it]
25%|██▌ | 107/420 [2:27:51<7:14:02, 83.20s/it]
26%|██▌ | 108/420 [2:29:14<7:11:45, 83.03s/it]
26%|██▌ | 109/420 [2:30:37<7:09:49, 82.92s/it]
26%|██▌ | 110/420 [2:32:00<7:08:40, 82.97s/it]
26%|██▋ | 111/420 [2:33:23<7:07:22, 82.98s/it]
27%|██▋ | 112/420 [2:34:46<7:05:55, 82.97s/it]
27%|██▋ | 113/420 [2:36:09<7:04:40, 83.00s/it]
27%|██▋ | 114/420 [2:37:32<7:03:35, 83.06s/it]
27%|██▋ | 115/420 [2:38:55<7:02:29, 83.11s/it]
28%|██▊ | 116/420 [2:40:18<7:00:06, 82.92s/it]
28%|██▊ | 117/420 [2:41:40<6:58:20, 82.84s/it]
28%|██▊ | 118/420 [2:43:03<6:57:00, 82.85s/it]
28%|██▊ | 119/420 [2:44:26<6:55:14, 82.77s/it]
29%|██▊ | 120/420 [2:45:49<6:54:33, 82.91s/it]
29%|██▉ | 121/420 [2:47:12<6:53:43, 83.02s/it]
29%|██▉ | 122/420 [2:48:36<6:52:30, 83.06s/it]
29%|██▉ | 123/420 [2:49:59<6:51:04, 83.05s/it]
30%|██▉ | 124/420 [2:51:21<6:49:06, 82.93s/it]
30%|██▉ | 125/420 [2:52:44<6:47:17, 82.84s/it]
30%|███ | 126/420 [2:54:07<6:46:26, 82.95s/it]
30%|███ | 127/420 [2:55:31<6:45:56, 83.13s/it]
30%|███ | 128/420 [2:56:53<6:43:58, 83.01s/it]
31%|███ | 129/420 [2:58:16<6:42:14, 82.93s/it]
31%|███ | 130/420 [2:59:39<6:41:19, 83.03s/it]
31%|███ | 131/420 [3:01:03<6:40:25, 83.13s/it]
31%|███▏ | 132/420 [3:02:25<6:37:29, 82.81s/it]
32%|███▏ | 133/420 [3:03:48<6:36:25, 82.88s/it]
32%|███▏ | 134/420 [3:05:11<6:35:34, 82.99s/it]
32%|███▏ | 135/420 [3:06:34<6:34:21, 83.02s/it]
32%|███▏ | 136/420 [3:07:57<6:32:55, 83.01s/it]
33%|███▎ | 137/420 [3:09:20<6:31:11, 82.94s/it]
33%|███▎ | 138/420 [3:10:42<6:29:01, 82.77s/it]
33%|███▎ | 139/420 [3:12:05<6:27:48, 82.80s/it]
33%|███▎ | 140/420 [3:13:28<6:26:20, 82.79s/it]
34%|███▎ | 141/420 [3:14:51<6:25:56, 83.00s/it]
34%|███▍ | 142/420 [3:16:14<6:24:30, 82.99s/it]
34%|███▍ | 143/420 [3:17:38<6:23:31, 83.07s/it]
34%|███▍ | 144/420 [3:19:01<6:21:58, 83.04s/it]
35%|███▍ | 145/420 [3:20:24<6:20:30, 83.02s/it]
35%|███▍ | 146/420 [3:21:46<6:18:45, 82.94s/it]
35%|███▌ | 147/420 [3:23:09<6:16:49, 82.82s/it]
35%|███▌ | 148/420 [3:24:32<6:15:15, 82.78s/it]
35%|███▌ | 149/420 [3:25:54<6:13:55, 82.79s/it]
36%|███▌ | 150/420 [3:27:17<6:11:59, 82.67s/it]
{'loss': 0.4268, 'learning_rate': 0.0001, 'global_step': 150, 'epoch': 0.71} |
|
|
36%|███▌ | 151/420 [3:28:39<6:10:42, 82.68s/it]
36%|███▌ | 152/420 [3:30:02<6:09:28, 82.72s/it]
36%|███▋ | 153/420 [3:31:25<6:08:15, 82.76s/it]
37%|███▋ | 154/420 [3:32:48<6:07:13, 82.83s/it]
37%|███▋ | 155/420 [3:34:07<6:00:09, 81.55s/it]
37%|███▋ | 156/420 [3:35:29<6:00:20, 81.90s/it]
37%|███▋ | 157/420 [3:36:52<6:00:28, 82.24s/it]
38%|███▊ | 158/420 [3:38:15<6:00:02, 82.45s/it]
38%|███▊ | 159/420 [3:39:38<5:59:14, 82.58s/it]
38%|███▊ | 160/420 [3:41:01<5:57:33, 82.51s/it]
38%|███▊ | 161/420 [3:42:23<5:55:39, 82.39s/it]
39%|███▊ | 162/420 [3:43:45<5:54:41, 82.49s/it]
39%|███▉ | 163/420 [3:45:08<5:53:41, 82.57s/it]
39%|███▉ | 164/420 [3:46:29<5:50:29, 82.15s/it]
39%|███▉ | 165/420 [3:47:51<5:48:47, 82.07s/it]
40%|███▉ | 166/420 [3:49:14<5:48:35, 82.35s/it]
40%|███▉ | 167/420 [3:50:37<5:48:05, 82.55s/it]
40%|████ | 168/420 [3:52:00<5:47:27, 82.73s/it]
40%|████ | 169/420 [3:53:23<5:46:27, 82.82s/it]
40%|████ | 170/420 [3:54:46<5:45:20, 82.88s/it]
41%|████ | 171/420 [3:56:09<5:44:05, 82.91s/it]
41%|████ | 172/420 [3:57:32<5:42:29, 82.86s/it]
41%|████ | 173/420 [3:58:55<5:40:32, 82.72s/it]
41%|████▏ | 174/420 [4:00:18<5:39:51, 82.89s/it]
42%|████▏ | 175/420 [4:01:41<5:38:53, 82.99s/it]
42%|████▏ | 176/420 [4:03:04<5:37:19, 82.95s/it]
42%|████▏ | 177/420 [4:04:28<5:36:41, 83.13s/it]
42%|████▏ | 178/420 [4:05:51<5:35:36, 83.21s/it]
43%|████▎ | 179/420 [4:07:14<5:33:39, 83.07s/it]
43%|████▎ | 180/420 [4:08:37<5:32:36, 83.15s/it]
43%|████▎ | 181/420 [4:10:00<5:30:53, 83.07s/it]
43%|████▎ | 182/420 [4:11:23<5:29:40, 83.11s/it]
44%|████▎ | 183/420 [4:12:46<5:28:13, 83.10s/it]
44%|████▍ | 184/420 [4:14:10<5:27:26, 83.25s/it]
44%|████▍ | 185/420 [4:15:32<5:25:26, 83.09s/it]
44%|████▍ | 186/420 [4:16:55<5:23:19, 82.90s/it]
45%|████▍ | 187/420 [4:18:18<5:22:19, 83.00s/it]
45%|████▍ | 188/420 [4:19:41<5:21:11, 83.07s/it]
45%|████▌ | 189/420 [4:21:04<5:19:49, 83.07s/it]
45%|████▌ | 190/420 [4:22:28<5:18:35, 83.11s/it]
45%|████▌ | 191/420 [4:23:50<5:16:47, 83.00s/it]
46%|████▌ | 192/420 [4:25:14<5:15:34, 83.05s/it]
46%|████▌ | 193/420 [4:26:36<5:13:47, 82.94s/it]
46%|████▌ | 194/420 [4:27:59<5:12:06, 82.86s/it]
46%|████▋ | 195/420 [4:29:22<5:11:21, 83.03s/it]
47%|████▋ | 196/420 [4:30:46<5:10:13, 83.10s/it]
47%|████▋ | 197/420 [4:32:08<5:08:15, 82.94s/it]
47%|████▋ | 198/420 [4:33:32<5:07:23, 83.08s/it]
47%|████▋ | 199/420 [4:34:55<5:06:15, 83.15s/it]
48%|████▊ | 200/420 [4:36:18<5:04:29, 83.04s/it]
{'loss': 0.4376, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95} |
|
|
48%|████▊ | 201/420 [4:37:45<5:07:21, 84.21s/it]
48%|████▊ | 202/420 [4:39:08<5:05:06, 83.97s/it]
48%|████▊ | 203/420 [4:40:31<5:02:56, 83.76s/it]
49%|████▊ | 204/420 [4:41:54<5:00:27, 83.46s/it]
49%|████▉ | 205/420 [4:43:17<4:58:22, 83.27s/it]
49%|████▉ | 206/420 [4:44:40<4:56:41, 83.18s/it]
49%|████▉ | 207/420 [4:46:03<4:54:43, 83.02s/it]
50%|████▉ | 208/420 [4:47:26<4:53:47, 83.15s/it]
50%|████▉ | 209/420 [4:48:49<4:52:07, 83.07s/it]
50%|█████ | 210/420 [4:50:12<4:50:47, 83.08s/it]
50%|█████ | 211/420 [4:51:29<4:42:36, 81.13s/it]
50%|█████ | 212/420 [4:52:51<4:43:04, 81.65s/it]
51%|█████ | 213/420 [4:54:15<4:43:29, 82.17s/it]
51%|█████ | 214/420 [4:55:38<4:43:02, 82.44s/it]
51%|█████ | 215/420 [4:57:00<4:41:47, 82.47s/it]
51%|█████▏ | 216/420 [4:58:23<4:40:00, 82.35s/it]
52%|█████▏ | 217/420 [4:59:46<4:39:45, 82.69s/it]
52%|█████▏ | 218/420 [5:01:09<4:38:40, 82.78s/it]
52%|█████▏ | 219/420 [5:02:31<4:37:01, 82.69s/it]
52%|█████▏ | 220/420 [5:03:55<4:36:01, 82.81s/it]
53%|█████▎ | 221/420 [5:05:16<4:33:20, 82.42s/it]
53%|█████▎ | 222/420 [5:06:39<4:32:31, 82.58s/it]
53%|█████▎ | 223/420 [5:08:02<4:31:22, 82.65s/it]
53%|█████▎ | 224/420 [5:09:25<4:30:11, 82.71s/it]
54%|█████▎ | 225/420 [5:10:47<4:28:47, 82.70s/it]
54%|█████▍ | 226/420 [5:12:10<4:27:14, 82.65s/it]
54%|█████▍ | 227/420 [5:13:33<4:26:03, 82.71s/it]
54%|█████▍ | 228/420 [5:14:56<4:25:23, 82.93s/it]
55%|█████▍ | 229/420 [5:16:19<4:23:25, 82.75s/it]
55%|█████▍ | 230/420 [5:17:41<4:22:15, 82.82s/it]
55%|█████▌ | 231/420 [5:19:05<4:21:35, 83.04s/it]
55%|█████▌ | 232/420 [5:20:28<4:20:01, 82.99s/it]
55%|█████▌ | 233/420 [5:21:50<4:18:04, 82.81s/it]
56%|█████▌ | 234/420 [5:23:13<4:16:32, 82.75s/it]
56%|█████▌ | 235/420 [5:24:36<4:15:06, 82.74s/it]
56%|█████▌ | 236/420 [5:25:58<4:13:38, 82.71s/it]
56%|█████▋ | 237/420 [5:27:21<4:12:27, 82.78s/it]
57%|█████▋ | 238/420 [5:28:45<4:11:35, 82.94s/it]
57%|█████▋ | 239/420 [5:30:07<4:10:05, 82.90s/it]
57%|█████▋ | 240/420 [5:31:31<4:08:59, 83.00s/it]
57%|█████▋ | 241/420 [5:32:53<4:07:26, 82.94s/it]
58%|█████▊ | 242/420 [5:34:15<4:05:04, 82.61s/it]
58%|█████▊ | 243/420 [5:35:38<4:03:47, 82.64s/it]
58%|█████▊ | 244/420 [5:37:00<4:02:14, 82.58s/it]
58%|█████▊ | 245/420 [5:38:23<4:00:51, 82.58s/it]
59%|█████▊ | 246/420 [5:39:46<3:59:29, 82.58s/it]
59%|█████▉ | 247/420 [5:41:09<3:58:34, 82.74s/it]
59%|█████▉ | 248/420 [5:42:32<3:57:19, 82.79s/it]
59%|█████▉ | 249/420 [5:43:55<3:56:16, 82.90s/it]
60%|█████▉ | 250/420 [5:45:18<3:54:57, 82.93s/it]
{'loss': 0.2366, 'learning_rate': 0.0001, 'global_step': 250, 'epoch': 1.19} |
|
|
60%|█████▉ | 251/420 [5:46:41<3:53:52, 83.03s/it]
60%|██████ | 252/420 [5:48:04<3:52:13, 82.94s/it]
60%|██████ | 253/420 [5:49:27<3:51:15, 83.09s/it]
60%|██████ | 254/420 [5:50:50<3:49:44, 83.04s/it]
61%|██████ | 255/420 [5:52:13<3:47:59, 82.91s/it]
61%|██████ | 256/420 [5:53:35<3:46:33, 82.89s/it]
61%|██████ | 257/420 [5:54:58<3:45:11, 82.89s/it]
61%|██████▏ | 258/420 [5:56:21<3:43:58, 82.95s/it]
62%|██████▏ | 259/420 [5:57:44<3:42:36, 82.96s/it]
62%|██████▏ | 260/420 [5:59:05<3:39:35, 82.35s/it]
62%|██████▏ | 261/420 [6:00:29<3:39:01, 82.65s/it]
62%|██████▏ | 262/420 [6:01:51<3:37:35, 82.63s/it]
63%|██████▎ | 263/420 [6:03:14<3:36:36, 82.78s/it]
63%|██████▎ | 264/420 [6:04:38<3:35:32, 82.90s/it]
63%|██████▎ | 265/420 [6:06:00<3:33:54, 82.81s/it]
63%|██████▎ | 266/420 [6:07:23<3:32:41, 82.87s/it]
64%|██████▎ | 267/420 [6:08:46<3:31:06, 82.79s/it]
64%|██████▍ | 268/420 [6:10:09<3:29:48, 82.82s/it]
64%|██████▍ | 269/420 [6:11:32<3:28:35, 82.89s/it]
64%|██████▍ | 270/420 [6:12:55<3:27:35, 83.04s/it]
65%|██████▍ | 271/420 [6:14:18<3:26:06, 83.00s/it]
65%|██████▍ | 272/420 [6:15:41<3:24:30, 82.91s/it]
65%|██████▌ | 273/420 [6:17:04<3:23:27, 83.05s/it]
65%|██████▌ | 274/420 [6:18:27<3:21:55, 82.98s/it]
65%|██████▌ | 275/420 [6:19:50<3:20:27, 82.95s/it]
66%|██████▌ | 276/420 [6:21:13<3:19:21, 83.06s/it]
66%|██████▌ | 277/420 [6:22:36<3:17:57, 83.06s/it]
66%|██████▌ | 278/420 [6:23:59<3:16:35, 83.07s/it]
66%|██████▋ | 279/420 [6:25:21<3:14:26, 82.74s/it]
67%|██████▋ | 280/420 [6:26:44<3:13:19, 82.86s/it]
67%|██████▋ | 281/420 [6:28:07<3:11:47, 82.79s/it]
67%|██████▋ | 282/420 [6:29:30<3:10:12, 82.70s/it]
67%|██████▋ | 283/420 [6:30:53<3:09:14, 82.88s/it]
68%|██████▊ | 284/420 [6:32:16<3:07:52, 82.88s/it]
68%|██████▊ | 285/420 [6:33:39<3:06:35, 82.93s/it]
68%|██████▊ | 286/420 [6:35:01<3:04:57, 82.82s/it]
68%|██████▊ | 287/420 [6:36:25<3:03:53, 82.96s/it]
69%|██████▊ | 288/420 [6:37:47<3:02:20, 82.88s/it]
69%|██████▉ | 289/420 [6:39:10<3:01:03, 82.93s/it]
69%|██████▉ | 290/420 [6:40:33<2:59:42, 82.94s/it]
69%|██████▉ | 291/420 [6:41:56<2:58:24, 82.98s/it]
70%|██████▉ | 292/420 [6:43:20<2:57:09, 83.04s/it]
70%|██████▉ | 293/420 [6:44:42<2:55:25, 82.88s/it]
70%|███████ | 294/420 [6:46:05<2:54:09, 82.93s/it]
70%|███████ | 295/420 [6:47:28<2:52:37, 82.86s/it]
70%|███████ | 296/420 [6:48:50<2:50:33, 82.53s/it]
71%|███████ | 297/420 [6:50:13<2:49:49, 82.84s/it]
71%|███████ | 298/420 [6:51:36<2:48:44, 82.98s/it]
71%|███████ | 299/420 [6:53:00<2:47:38, 83.13s/it]
71%|███████▏ | 300/420 [6:54:22<2:45:50, 82.92s/it]
{'loss': 0.1605, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.43} |
|
|
72%|███████▏ | 301/420 [6:55:46<2:44:45, 83.07s/it]
72%|███████▏ | 302/420 [6:57:09<2:43:16, 83.02s/it]
72%|███████▏ | 303/420 [6:58:31<2:41:25, 82.78s/it]
72%|███████▏ | 304/420 [6:59:54<2:40:02, 82.78s/it]
73%|███████▎ | 305/420 [7:01:17<2:38:43, 82.81s/it]
73%|███████▎ | 306/420 [7:02:40<2:37:36, 82.95s/it]
73%|███████▎ | 307/420 [7:04:03<2:36:23, 83.04s/it]
73%|███████▎ | 308/420 [7:05:26<2:34:50, 82.95s/it]
74%|███████▎ | 309/420 [7:06:49<2:33:47, 83.13s/it]
74%|███████▍ | 310/420 [7:08:12<2:32:10, 83.00s/it]
74%|███████▍ | 311/420 [7:09:34<2:30:23, 82.79s/it]
74%|███████▍ | 312/420 [7:10:57<2:28:54, 82.73s/it]
75%|███████▍ | 313/420 [7:12:20<2:27:55, 82.95s/it]
75%|███████▍ | 314/420 [7:13:43<2:26:17, 82.81s/it]
75%|███████▌ | 315/420 [7:15:06<2:24:50, 82.77s/it]
75%|███████▌ | 316/420 [7:16:29<2:23:38, 82.87s/it]
75%|███████▌ | 317/420 [7:17:52<2:22:27, 82.99s/it]
76%|███████▌ | 318/420 [7:19:15<2:21:14, 83.08s/it]
76%|███████▌ | 319/420 [7:20:38<2:19:48, 83.06s/it]
76%|███████▌ | 320/420 [7:22:01<2:18:13, 82.94s/it]
76%|███████▋ | 321/420 [7:23:24<2:16:47, 82.91s/it]
77%|███████▋ | 322/420 [7:24:47<2:15:24, 82.90s/it]
77%|███████▋ | 323/420 [7:26:10<2:14:01, 82.90s/it]
77%|███████▋ | 324/420 [7:27:33<2:12:40, 82.92s/it]
77%|███████▋ | 325/420 [7:28:56<2:11:32, 83.08s/it]
78%|███████▊ | 326/420 [7:30:19<2:09:58, 82.97s/it]
78%|███████▊ | 327/420 [7:31:41<2:08:31, 82.92s/it]
78%|███████▊ | 328/420 [7:33:04<2:07:00, 82.83s/it]
78%|███████▊ | 329/420 [7:34:27<2:05:38, 82.84s/it]
79%|███████▊ | 330/420 [7:35:49<2:04:05, 82.72s/it]
79%|███████▉ | 331/420 [7:37:12<2:02:37, 82.67s/it]
79%|███████▉ | 332/420 [7:38:35<2:01:33, 82.88s/it]
79%|███████▉ | 333/420 [7:39:58<2:00:11, 82.89s/it]
80%|███████▉ | 334/420 [7:41:21<1:58:44, 82.85s/it]
80%|███████▉ | 335/420 [7:42:44<1:57:25, 82.89s/it]
80%|████████ | 336/420 [7:44:07<1:56:02, 82.88s/it]
80%|████████ | 337/420 [7:45:30<1:54:36, 82.85s/it]
80%|████████ | 338/420 [7:46:53<1:53:14, 82.86s/it]
81%|████████ | 339/420 [7:48:15<1:51:52, 82.87s/it]
81%|████████ | 340/420 [7:49:38<1:50:25, 82.81s/it]
81%|████████ | 341/420 [7:50:41<1:41:16, 76.91s/it]
81%|████████▏ | 342/420 [7:51:18<1:24:13, 64.79s/it]
82%|████████▏ | 343/420 [7:51:54<1:12:10, 56.24s/it]
82%|████████▏ | 344/420 [7:52:31<1:03:44, 50.32s/it]
82%|████████▏ | 345/420 [7:53:07<57:41, 46.16s/it]
82%|████████▏ | 346/420 [7:53:43<53:14, 43.16s/it]
83%|████████▎ | 347/420 [7:54:20<50:08, 41.21s/it]
83%|████████▎ | 348/420 [7:54:56<47:48, 39.84s/it]
83%|████████▎ | 349/420 [7:55:33<45:56, 38.82s/it]
83%|████████▎ | 350/420 [7:56:10<44:32, 38.17s/it]
{'loss': 0.1625, 'learning_rate': 0.0001, 'global_step': 350, 'epoch': 1.66} |
|
|
84%|████████▎ | 351/420 [7:56:46<43:16, 37.63s/it]
84%|████████▍ | 352/420 [7:57:22<42:06, 37.15s/it]
84%|████████▍ | 353/420 [7:57:58<41:14, 36.94s/it]
84%|████████▍ | 354/420 [7:58:35<40:28, 36.79s/it]
85%|████████▍ | 355/420 [7:59:11<39:42, 36.65s/it]
85%|████████▍ | 356/420 [7:59:48<39:04, 36.63s/it]
85%|████████▌ | 357/420 [8:00:24<38:24, 36.58s/it]
85%|████████▌ | 358/420 [8:01:01<37:49, 36.60s/it]
85%|████████▌ | 359/420 [8:01:37<37:12, 36.60s/it]
86%|████████▌ | 360/420 [8:02:14<36:33, 36.56s/it]
86%|████████▌ | 361/420 [8:02:50<35:52, 36.48s/it]
86%|████████▌ | 362/420 [8:03:27<35:13, 36.43s/it]
86%|████████▋ | 363/420 [8:04:03<34:38, 36.46s/it]
87%|████████▋ | 364/420 [8:04:39<33:57, 36.39s/it]
87%|████████▋ | 365/420 [8:05:15<33:18, 36.33s/it]
87%|████████▋ | 366/420 [8:05:52<32:45, 36.40s/it]
87%|████████▋ | 367/420 [8:06:29<32:11, 36.45s/it]
88%|████████▊ | 368/420 [8:07:05<31:34, 36.44s/it]
88%|████████▊ | 369/420 [8:07:41<30:58, 36.44s/it]
88%|████████▊ | 370/420 [8:08:18<30:21, 36.43s/it]
88%|████████▊ | 371/420 [8:08:54<29:43, 36.40s/it]
89%|████████▊ | 372/420 [8:09:31<29:08, 36.42s/it]
89%|████████▉ | 373/420 [8:10:07<28:31, 36.41s/it]
89%|████████▉ | 374/420 [8:10:43<27:53, 36.39s/it]
89%|████████▉ | 375/420 [8:11:20<27:14, 36.33s/it]
90%|████████▉ | 376/420 [8:11:56<26:40, 36.37s/it]
90%|████████▉ | 377/420 [8:12:32<26:02, 36.34s/it]
90%|█████████ | 378/420 [8:13:09<25:28, 36.39s/it]
90%|█████████ | 379/420 [8:13:45<24:52, 36.40s/it]
90%|█████████ | 380/420 [8:14:22<24:16, 36.42s/it]
91%|█████████ | 381/420 [8:14:58<23:40, 36.43s/it]
91%|█████████ | 382/420 [8:15:34<23:02, 36.38s/it]
91%|█████████ | 383/420 [8:16:11<22:25, 36.36s/it]
91%|█████████▏| 384/420 [8:16:47<21:49, 36.38s/it]
92%|█████████▏| 385/420 [8:17:24<21:13, 36.37s/it]
92%|█████████▏| 386/420 [8:18:00<20:37, 36.40s/it]
92%|█████████▏| 387/420 [8:18:36<20:01, 36.41s/it]
92%|█████████▏| 388/420 [8:19:13<19:26, 36.47s/it]
93%|█████████▎| 389/420 [8:19:49<18:49, 36.44s/it]
93%|█████████▎| 390/420 [8:20:26<18:10, 36.33s/it]
93%|█████████▎| 391/420 [8:21:02<17:35, 36.40s/it]
93%|█████████▎| 392/420 [8:21:39<17:00, 36.44s/it]
94%|█████████▎| 393/420 [8:22:15<16:23, 36.41s/it]
94%|█████████▍| 394/420 [8:22:52<15:48, 36.47s/it]
94%|█████████▍| 395/420 [8:23:28<15:10, 36.41s/it]
94%|█████████▍| 396/420 [8:24:04<14:30, 36.25s/it]
95%|█████████▍| 397/420 [8:24:40<13:52, 36.21s/it]
95%|█████████▍| 398/420 [8:25:16<13:18, 36.29s/it]
95%|█████████▌| 399/420 [8:25:53<12:43, 36.36s/it]
95%|█████████▌| 400/420 [8:26:29<12:08, 36.44s/it]
{'loss': 0.1883, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 1.9} |
|
|
95%|█████████▌| 401/420 [8:27:10<11:53, 37.56s/it]
96%|█████████▌| 402/420 [8:27:46<11:09, 37.19s/it]
96%|█████████▌| 403/420 [8:28:23<10:29, 37.02s/it]
96%|█████████▌| 404/420 [8:28:59<09:50, 36.91s/it]
96%|█████████▋| 405/420 [8:29:36<09:15, 37.02s/it]
97%|█████████▋| 406/420 [8:30:13<08:37, 36.96s/it]
97%|█████████▋| 407/420 [8:30:50<08:00, 36.96s/it]
97%|█████████▋| 408/420 [8:31:27<07:22, 36.87s/it]
97%|█████████▋| 409/420 [8:32:03<06:43, 36.73s/it]
98%|█████████▊| 410/420 [8:32:41<06:09, 36.93s/it]
98%|█████████▊| 411/420 [8:33:18<05:32, 36.97s/it]
98%|█████████▊| 412/420 [8:33:54<04:54, 36.85s/it]
98%|█████████▊| 413/420 [8:34:31<04:17, 36.79s/it]
99%|█████████▊| 414/420 [8:35:08<03:40, 36.80s/it]
99%|█████████▉| 415/420 [8:35:44<03:03, 36.66s/it]
99%|█████████▉| 416/420 [8:36:21<02:26, 36.60s/it]
99%|█████████▉| 417/420 [8:37:02<01:54, 38.09s/it]
100%|█████████▉| 418/420 [8:37:39<01:15, 37.61s/it]
100%|█████████▉| 419/420 [8:38:15<00:37, 37.26s/it]
100%|██████████| 420/420 [8:38:52<00:00, 37.02s/it]
{'train_runtime': 31132.2559, 'train_samples_per_second': 0.162, 'train_steps_per_second': 0.013, 'train_loss': 0.3099689483642578, 'epoch': 2.0} |
|
|
100%|██████████| 420/420 [8:38:52<00:00, 74.12s/it] |
|
|
***** train metrics ***** |
|
|
epoch = 2.0 |
|
|
train_loss = 0.31 |
|
|
train_runtime = 8:38:52.25 |
|
|
train_samples_per_second = 0.162 |
|
|
train_steps_per_second = 0.013 |
|
|
|
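Note on the reported metrics: the summary figures above can be cross-checked against the raw numbers printed earlier in this log. The sketch below is only a sanity check, not part of the training script. It assumes that throughput is simply samples seen (dataset size times epochs) divided by wall-clock runtime, and that `trainable%` is trainable parameters over all parameters. The variable names are illustrative.

```python
from datetime import timedelta

# Values copied from the log output above.
num_samples = 2523             # "there are 2523 data in dataset"
num_epochs = 2.0               # final 'epoch' value
total_steps = 420              # total shown by the progress bar
train_runtime_s = 31132.2559   # 'train_runtime' in seconds
trainable_params = 319_815_680
all_params = 7_058_231_296

# Assumption: throughput = samples seen / wall-clock runtime.
samples_per_second = num_samples * num_epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

# Share of parameters that are trainable, as a percentage.
trainable_pct = 100 * trainable_params / all_params

# Runtime as H:MM:SS (fractional seconds dropped).
runtime_hms = str(timedelta(seconds=int(train_runtime_s)))

print(round(samples_per_second, 3))
print(round(steps_per_second, 3))
print(round(trainable_pct, 3))
print(runtime_hms)
```

Under these assumptions the recomputed values agree with the logged `train_samples_per_second`, `train_steps_per_second`, `trainable%` and `train_runtime` to the precision shown.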