|
|
model training desc: QuALITY, trained using randomly selected key sentences |
|
|
2023-12-05 18:56:31.460 | INFO | __main__:init_components:108 - Initializing components... |
|
|
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:26<00:26, 26.40s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:31<00:00, 13.63s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:31<00:00, 15.54s/it] |
|
|
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 |
|
|
2023-12-05 18:57:02.985 | INFO | __main__:init_components:155 - |
|
|
|
|
|
2023-12-05 18:57:02.985 | INFO | __main__:init_components:156 - ******************** |
|
|
2023-12-05 18:57:02.985 | INFO | __main__:init_components:157 - using llama2 model |
|
|
2023-12-05 18:57:02.985 | INFO | __main__:init_components:158 - ******************** |
|
|
2023-12-05 18:57:02.985 | INFO | __main__:init_components:159 - |
|
|
|
|
|
memory footprint of model: 4.024436950683594 GB |
|
|
trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305 |
|
|
2023-12-05 18:57:44.936 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/random_select/quality_chunk_2048_instruct/train.jsonl |
|
|
2023-12-05 18:57:45.028 | INFO | component.dataset:__init__:19 - there are 2523 data in dataset |
|
|
2023-12-05 18:57:45.086 | INFO | __main__:main:231 - *** starting training *** |
|
|
0%| | 0/420 [00:00<?, ?it/s]
0%| | 1/420 [00:33<3:55:39, 33.74s/it]
0%| | 2/420 [01:03<3:37:45, 31.26s/it]
1%| | 3/420 [01:32<3:32:06, 30.52s/it]
1%| | 4/420 [02:02<3:28:14, 30.03s/it]
1%| | 5/420 [02:32<3:27:39, 30.02s/it]
1%|▏ | 6/420 [03:02<3:27:09, 30.02s/it]
2%|▏ | 7/420 [03:31<3:25:11, 29.81s/it]
2%|▏ | 8/420 [04:00<3:23:43, 29.67s/it]
2%|▏ | 9/420 [04:30<3:23:20, 29.68s/it]
2%|▏ | 10/420 [05:00<3:22:54, 29.69s/it]
3%|▎ | 11/420 [05:30<3:22:29, 29.71s/it]
3%|▎ | 12/420 [05:59<3:21:19, 29.61s/it]
3%|▎ | 13/420 [06:29<3:21:03, 29.64s/it]
3%|▎ | 14/420 [06:58<3:20:44, 29.67s/it]
4%|▎ | 15/420 [07:28<3:20:20, 29.68s/it]
4%|▍ | 16/420 [07:58<3:19:14, 29.59s/it]
4%|▍ | 17/420 [08:27<3:19:00, 29.63s/it]
4%|▍ | 18/420 [08:57<3:19:20, 29.75s/it]
5%|▍ | 19/420 [09:27<3:18:46, 29.74s/it]
5%|▍ | 20/420 [09:57<3:18:54, 29.84s/it]
5%|▌ | 21/420 [10:27<3:18:11, 29.80s/it]
5%|▌ | 22/420 [10:56<3:16:51, 29.68s/it]
5%|▌ | 23/420 [11:26<3:16:26, 29.69s/it]
6%|▌ | 24/420 [11:56<3:15:57, 29.69s/it]
6%|▌ | 25/420 [12:25<3:15:30, 29.70s/it]
6%|▌ | 26/420 [12:55<3:15:01, 29.70s/it]
6%|▋ | 27/420 [13:25<3:15:13, 29.81s/it]
7%|▋ | 28/420 [13:55<3:15:14, 29.89s/it]
7%|▋ | 29/420 [14:25<3:14:20, 29.82s/it]
7%|▋ | 30/420 [14:55<3:13:35, 29.78s/it]
7%|▋ | 31/420 [15:24<3:12:16, 29.66s/it]
8%|▊ | 32/420 [15:54<3:12:29, 29.77s/it]
8%|▊ | 33/420 [16:23<3:11:12, 29.65s/it]
8%|▊ | 34/420 [16:53<3:11:30, 29.77s/it]
8%|▊ | 35/420 [17:23<3:10:53, 29.75s/it]
9%|▊ | 36/420 [17:53<3:10:57, 29.84s/it]
9%|▉ | 37/420 [18:23<3:10:53, 29.91s/it]
9%|▉ | 38/420 [18:53<3:09:59, 29.84s/it]
9%|▉ | 39/420 [19:23<3:09:53, 29.90s/it]
10%|▉ | 40/420 [19:52<3:08:25, 29.75s/it]
10%|▉ | 41/420 [20:22<3:07:13, 29.64s/it]
10%|█ | 42/420 [20:51<3:06:51, 29.66s/it]
10%|█ | 43/420 [21:21<3:06:26, 29.67s/it]
10%|█ | 44/420 [21:51<3:06:40, 29.79s/it]
11%|█ | 45/420 [22:21<3:06:02, 29.77s/it]
11%|█ | 46/420 [22:51<3:05:26, 29.75s/it]
11%|█ | 47/420 [23:20<3:04:15, 29.64s/it]
11%|█▏ | 48/420 [23:50<3:04:31, 29.76s/it]
12%|█▏ | 49/420 [24:20<3:03:54, 29.74s/it]
12%|█▏ | 50/420 [24:49<3:03:20, 29.73s/it]
{'loss': 0.5683, 'learning_rate': 0.0001, 'global_step': 50, 'epoch': 0.24} |
|
|
12%|█▏ | 51/420 [25:19<3:03:31, 29.84s/it]
12%|█▏ | 52/420 [25:49<3:02:09, 29.70s/it]
13%|█▎ | 53/420 [26:18<3:01:03, 29.60s/it]
13%|█▎ | 54/420 [26:48<3:00:46, 29.64s/it]
13%|█▎ | 55/420 [27:18<3:00:25, 29.66s/it]
13%|█▎ | 56/420 [27:47<3:00:02, 29.68s/it]
14%|█▎ | 57/420 [28:17<3:00:13, 29.79s/it]
14%|█▍ | 58/420 [28:47<2:59:36, 29.77s/it]
14%|█▍ | 59/420 [29:17<2:59:34, 29.85s/it]
14%|█▍ | 60/420 [29:47<2:58:50, 29.81s/it]
15%|█▍ | 61/420 [30:16<2:57:34, 29.68s/it]
15%|█▍ | 62/420 [30:46<2:57:41, 29.78s/it]
15%|█▌ | 63/420 [31:16<2:56:19, 29.64s/it]
15%|█▌ | 64/420 [31:45<2:55:23, 29.56s/it]
15%|█▌ | 65/420 [32:15<2:55:12, 29.61s/it]
16%|█▌ | 66/420 [32:44<2:54:17, 29.54s/it]
16%|█▌ | 67/420 [33:14<2:54:04, 29.59s/it]
16%|█▌ | 68/420 [33:44<2:54:20, 29.72s/it]
16%|█▋ | 69/420 [34:14<2:54:23, 29.81s/it]
17%|█▋ | 70/420 [34:44<2:54:17, 29.88s/it]
17%|█▋ | 71/420 [35:14<2:53:30, 29.83s/it]
17%|█▋ | 72/420 [35:43<2:52:09, 29.68s/it]
17%|█▋ | 73/420 [36:13<2:52:14, 29.78s/it]
18%|█▊ | 74/420 [36:43<2:52:10, 29.86s/it]
18%|█▊ | 75/420 [37:13<2:51:25, 29.81s/it]
18%|█▊ | 76/420 [37:42<2:50:09, 29.68s/it]
18%|█▊ | 77/420 [38:11<2:49:08, 29.59s/it]
19%|█▊ | 78/420 [38:41<2:49:21, 29.71s/it]
19%|█▉ | 79/420 [39:11<2:49:24, 29.81s/it]
19%|█▉ | 80/420 [39:41<2:49:15, 29.87s/it]
19%|█▉ | 81/420 [40:11<2:49:01, 29.92s/it]
20%|█▉ | 82/420 [40:41<2:47:59, 29.82s/it]
20%|█▉ | 83/420 [41:10<2:46:43, 29.68s/it]
20%|██ | 84/420 [41:40<2:46:16, 29.69s/it]
20%|██ | 85/420 [42:10<2:45:47, 29.70s/it]
20%|██ | 86/420 [42:40<2:45:18, 29.70s/it]
21%|██ | 87/420 [43:09<2:44:49, 29.70s/it]
21%|██ | 88/420 [43:39<2:44:19, 29.70s/it]
21%|██ | 89/420 [44:09<2:44:21, 29.79s/it]
21%|██▏ | 90/420 [44:38<2:43:07, 29.66s/it]
22%|██▏ | 91/420 [45:08<2:42:08, 29.57s/it]
22%|██▏ | 92/420 [45:37<2:41:52, 29.61s/it]
22%|██▏ | 93/420 [46:07<2:42:03, 29.73s/it]
22%|██▏ | 94/420 [46:37<2:42:02, 29.82s/it]
23%|██▎ | 95/420 [47:07<2:40:40, 29.66s/it]
23%|██▎ | 96/420 [47:37<2:40:46, 29.77s/it]
23%|██▎ | 97/420 [48:06<2:40:12, 29.76s/it]
23%|██▎ | 98/420 [48:36<2:40:07, 29.84s/it]
24%|██▎ | 99/420 [49:06<2:38:42, 29.67s/it]
24%|██▍ | 100/420 [49:35<2:38:18, 29.68s/it]
{'loss': 0.5393, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48} |
|
|
24%|██▍ | 101/420 [50:06<2:38:24, 29.80s/it]
24%|██▍ | 102/420 [50:35<2:37:36, 29.74s/it]
25%|██▍ | 103/420 [51:05<2:37:06, 29.74s/it]
25%|██▍ | 104/420 [51:35<2:36:34, 29.73s/it]
25%|██▌ | 105/420 [52:05<2:36:32, 29.82s/it]
25%|██▌ | 106/420 [52:35<2:36:24, 29.89s/it]
25%|██▌ | 107/420 [53:04<2:35:37, 29.83s/it]
26%|██▌ | 108/420 [53:34<2:35:27, 29.90s/it]
26%|██▌ | 109/420 [54:04<2:34:28, 29.80s/it]
26%|██▌ | 110/420 [54:33<2:33:16, 29.66s/it]
26%|██▋ | 111/420 [55:03<2:32:50, 29.68s/it]
27%|██▋ | 112/420 [55:33<2:32:55, 29.79s/it]
27%|██▋ | 113/420 [56:02<2:31:47, 29.67s/it]
27%|██▋ | 114/420 [56:32<2:30:51, 29.58s/it]
27%|██▋ | 115/420 [57:02<2:30:33, 29.62s/it]
28%|██▊ | 116/420 [57:31<2:29:40, 29.54s/it]
28%|██▊ | 117/420 [58:01<2:29:27, 29.60s/it]
28%|██▊ | 118/420 [58:31<2:29:35, 29.72s/it]
28%|██▊ | 119/420 [59:00<2:29:02, 29.71s/it]
29%|██▊ | 120/420 [59:30<2:28:01, 29.61s/it]
29%|██▉ | 121/420 [1:00:00<2:28:08, 29.73s/it]
29%|██▉ | 122/420 [1:00:29<2:27:37, 29.72s/it]
29%|██▉ | 123/420 [1:00:59<2:27:06, 29.72s/it]
30%|██▉ | 124/420 [1:01:29<2:26:35, 29.71s/it]
30%|██▉ | 125/420 [1:01:59<2:26:06, 29.72s/it]
30%|███ | 126/420 [1:02:28<2:25:38, 29.72s/it]
30%|███ | 127/420 [1:02:58<2:24:38, 29.62s/it]
30%|███ | 128/420 [1:03:27<2:23:46, 29.54s/it]
31%|███ | 129/420 [1:03:57<2:23:58, 29.69s/it]
31%|███ | 130/420 [1:04:26<2:23:00, 29.59s/it]
31%|███ | 131/420 [1:04:56<2:22:43, 29.63s/it]
31%|███▏ | 132/420 [1:05:26<2:22:19, 29.65s/it]
32%|███▏ | 133/420 [1:05:56<2:21:54, 29.67s/it]
32%|███▏ | 134/420 [1:06:26<2:21:54, 29.77s/it]
32%|███▏ | 135/420 [1:06:56<2:21:47, 29.85s/it]
32%|███▏ | 136/420 [1:07:26<2:21:32, 29.90s/it]
33%|███▎ | 137/420 [1:07:55<2:20:45, 29.84s/it]
33%|███▎ | 138/420 [1:08:25<2:20:03, 29.80s/it]
33%|███▎ | 139/420 [1:08:55<2:19:22, 29.76s/it]
33%|███▎ | 140/420 [1:09:24<2:18:47, 29.74s/it]
34%|███▎ | 141/420 [1:09:54<2:18:14, 29.73s/it]
34%|███▍ | 142/420 [1:10:24<2:18:07, 29.81s/it]
34%|███▍ | 143/420 [1:10:54<2:17:54, 29.87s/it]
34%|███▍ | 144/420 [1:11:24<2:17:11, 29.82s/it]
35%|███▍ | 145/420 [1:11:54<2:16:58, 29.88s/it]
35%|███▍ | 146/420 [1:12:24<2:16:15, 29.84s/it]
35%|███▌ | 147/420 [1:12:53<2:15:33, 29.79s/it]
35%|███▌ | 148/420 [1:13:23<2:14:54, 29.76s/it]
35%|███▌ | 149/420 [1:13:53<2:14:19, 29.74s/it]
36%|███▌ | 150/420 [1:14:22<2:13:18, 29.63s/it]
{'loss': 0.5376, 'learning_rate': 0.0001, 'global_step': 150, 'epoch': 0.71} |
|
|
36%|███▌ | 151/420 [1:14:52<2:13:20, 29.74s/it]
36%|███▌ | 152/420 [1:15:21<2:12:19, 29.62s/it]
36%|███▋ | 153/420 [1:15:51<2:11:55, 29.65s/it]
37%|███▋ | 154/420 [1:16:21<2:11:56, 29.76s/it]
37%|███▋ | 155/420 [1:16:50<2:10:54, 29.64s/it]
37%|███▋ | 156/420 [1:17:20<2:10:31, 29.66s/it]
37%|███▋ | 157/420 [1:17:50<2:10:31, 29.78s/it]
38%|███▊ | 158/420 [1:18:20<2:09:56, 29.76s/it]
38%|███▊ | 159/420 [1:18:50<2:09:46, 29.84s/it]
38%|███▊ | 160/420 [1:19:20<2:09:07, 29.80s/it]
38%|███▊ | 161/420 [1:19:49<2:08:32, 29.78s/it]
39%|███▊ | 162/420 [1:20:19<2:08:21, 29.85s/it]
39%|███▉ | 163/420 [1:20:49<2:08:03, 29.90s/it]
39%|███▉ | 164/420 [1:21:19<2:07:19, 29.84s/it]
39%|███▉ | 165/420 [1:21:49<2:07:04, 29.90s/it]
40%|███▉ | 166/420 [1:22:19<2:06:21, 29.85s/it]
40%|███▉ | 167/420 [1:22:49<2:05:43, 29.82s/it]
40%|████ | 168/420 [1:23:18<2:05:04, 29.78s/it]
40%|████ | 169/420 [1:23:48<2:04:30, 29.76s/it]
40%|████ | 170/420 [1:24:18<2:03:57, 29.75s/it]
41%|████ | 171/420 [1:24:47<2:03:23, 29.73s/it]
41%|████ | 172/420 [1:25:17<2:02:27, 29.63s/it]
41%|████ | 173/420 [1:25:47<2:02:29, 29.76s/it]
41%|████▏ | 174/420 [1:26:16<2:01:48, 29.71s/it]
42%|████▏ | 175/420 [1:26:46<2:01:19, 29.71s/it]
42%|████▏ | 176/420 [1:27:16<2:00:25, 29.61s/it]
42%|████▏ | 177/420 [1:27:45<2:00:02, 29.64s/it]
42%|████▏ | 178/420 [1:28:15<1:59:37, 29.66s/it]
43%|████▎ | 179/420 [1:28:45<1:59:34, 29.77s/it]
43%|████▎ | 180/420 [1:29:15<1:59:00, 29.75s/it]
43%|████▎ | 181/420 [1:29:44<1:58:27, 29.74s/it]
43%|████▎ | 182/420 [1:30:14<1:57:31, 29.63s/it]
44%|████▎ | 183/420 [1:30:44<1:57:30, 29.75s/it]
44%|████▍ | 184/420 [1:31:14<1:56:57, 29.74s/it]
44%|████▍ | 185/420 [1:31:43<1:56:25, 29.73s/it]
44%|████▍ | 186/420 [1:32:13<1:55:29, 29.61s/it]
45%|████▍ | 187/420 [1:32:42<1:55:06, 29.64s/it]
45%|████▍ | 188/420 [1:33:12<1:55:06, 29.77s/it]
45%|████▌ | 189/420 [1:33:42<1:54:54, 29.85s/it]
45%|████▌ | 190/420 [1:34:12<1:54:15, 29.81s/it]
45%|████▌ | 191/420 [1:34:42<1:54:00, 29.87s/it]
46%|████▌ | 192/420 [1:35:12<1:52:56, 29.72s/it]
46%|████▌ | 193/420 [1:35:41<1:52:26, 29.72s/it]
46%|████▌ | 194/420 [1:36:11<1:51:32, 29.61s/it]
46%|████▋ | 195/420 [1:36:40<1:50:45, 29.54s/it]
47%|████▋ | 196/420 [1:37:10<1:50:28, 29.59s/it]
47%|████▋ | 197/420 [1:37:40<1:50:28, 29.72s/it]
47%|████▋ | 198/420 [1:38:10<1:50:16, 29.80s/it]
47%|████▋ | 199/420 [1:38:40<1:49:59, 29.86s/it]
48%|████▊ | 200/420 [1:39:09<1:48:50, 29.69s/it]
{'loss': 0.5331, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95} |
|
|
48%|████▊ | 201/420 [1:39:42<1:52:28, 30.81s/it]
48%|████▊ | 202/420 [1:40:12<1:50:45, 30.49s/it]
48%|████▊ | 203/420 [1:40:42<1:49:23, 30.25s/it]
49%|████▊ | 204/420 [1:41:12<1:48:37, 30.17s/it]
49%|████▉ | 205/420 [1:41:42<1:47:37, 30.03s/it]
49%|████▉ | 206/420 [1:42:12<1:47:06, 30.03s/it]
49%|████▉ | 207/420 [1:42:41<1:46:15, 29.93s/it]
50%|████▉ | 208/420 [1:43:11<1:45:32, 29.87s/it]
50%|████▉ | 209/420 [1:43:41<1:44:52, 29.82s/it]
50%|█████ | 210/420 [1:44:10<1:43:52, 29.68s/it]
50%|█████ | 211/420 [1:44:35<1:38:22, 28.24s/it]
50%|█████ | 212/420 [1:45:05<1:39:23, 28.67s/it]
51%|█████ | 213/420 [1:45:35<1:40:18, 29.07s/it]
51%|█████ | 214/420 [1:46:04<1:40:06, 29.16s/it]
51%|█████ | 215/420 [1:46:34<1:40:29, 29.41s/it]
51%|█████▏ | 216/420 [1:47:04<1:40:17, 29.50s/it]
52%|█████▏ | 217/420 [1:47:34<1:40:21, 29.66s/it]
52%|█████▏ | 218/420 [1:48:03<1:39:28, 29.55s/it]
52%|█████▏ | 219/420 [1:48:32<1:38:45, 29.48s/it]
52%|█████▏ | 220/420 [1:49:02<1:38:30, 29.55s/it]
53%|█████▎ | 221/420 [1:49:32<1:38:09, 29.59s/it]
53%|█████▎ | 222/420 [1:50:01<1:37:45, 29.63s/it]
53%|█████▎ | 223/420 [1:50:31<1:37:20, 29.65s/it]
53%|█████▎ | 224/420 [1:51:01<1:36:51, 29.65s/it]
54%|█████▎ | 225/420 [1:51:31<1:36:41, 29.75s/it]
54%|█████▍ | 226/420 [1:52:01<1:36:26, 29.83s/it]
54%|█████▍ | 227/420 [1:52:30<1:35:47, 29.78s/it]
54%|█████▍ | 228/420 [1:53:00<1:35:31, 29.85s/it]
55%|█████▍ | 229/420 [1:53:30<1:34:52, 29.80s/it]
55%|█████▍ | 230/420 [1:54:00<1:34:15, 29.77s/it]
55%|█████▌ | 231/420 [1:54:30<1:33:59, 29.84s/it]
55%|█████▌ | 232/420 [1:55:00<1:33:20, 29.79s/it]
55%|█████▌ | 233/420 [1:55:29<1:32:44, 29.76s/it]
56%|█████▌ | 234/420 [1:55:58<1:31:45, 29.60s/it]
56%|█████▌ | 235/420 [1:56:28<1:31:20, 29.63s/it]
56%|█████▌ | 236/420 [1:56:57<1:30:34, 29.54s/it]
56%|█████▋ | 237/420 [1:57:27<1:30:12, 29.58s/it]
57%|█████▋ | 238/420 [1:57:57<1:29:50, 29.62s/it]
57%|█████▋ | 239/420 [1:58:27<1:29:24, 29.64s/it]
57%|█████▋ | 240/420 [1:58:56<1:28:58, 29.66s/it]
57%|█████▋ | 241/420 [1:59:26<1:28:30, 29.67s/it]
58%|█████▊ | 242/420 [1:59:56<1:28:19, 29.77s/it]
58%|█████▊ | 243/420 [2:00:26<1:27:47, 29.76s/it]
58%|█████▊ | 244/420 [2:00:56<1:27:31, 29.84s/it]
58%|█████▊ | 245/420 [2:01:26<1:27:10, 29.89s/it]
59%|█████▊ | 246/420 [2:01:56<1:26:48, 29.93s/it]
59%|█████▉ | 247/420 [2:02:26<1:26:21, 29.95s/it]
59%|█████▉ | 248/420 [2:02:55<1:25:38, 29.87s/it]
59%|█████▉ | 249/420 [2:03:25<1:24:41, 29.71s/it]
60%|█████▉ | 250/420 [2:03:54<1:24:10, 29.71s/it]
{'loss': 0.3011, 'learning_rate': 0.0001, 'global_step': 250, 'epoch': 1.19} |
|
|
60%|█████▉ | 251/420 [2:04:24<1:23:41, 29.71s/it]
60%|██████ | 252/420 [2:04:54<1:23:10, 29.70s/it]
60%|██████ | 253/420 [2:05:24<1:22:40, 29.70s/it]
60%|██████ | 254/420 [2:05:53<1:22:10, 29.70s/it]
61%|██████ | 255/420 [2:06:23<1:21:40, 29.70s/it]
61%|██████ | 256/420 [2:06:53<1:21:09, 29.69s/it]
61%|██████ | 257/420 [2:07:22<1:20:39, 29.69s/it]
61%|██████▏ | 258/420 [2:07:52<1:19:53, 29.59s/it]
62%|██████▏ | 259/420 [2:08:21<1:19:29, 29.63s/it]
62%|██████▏ | 260/420 [2:08:51<1:19:18, 29.74s/it]
62%|██████▏ | 261/420 [2:09:21<1:18:45, 29.72s/it]
62%|██████▏ | 262/420 [2:09:50<1:17:58, 29.61s/it]
63%|██████▎ | 263/420 [2:10:20<1:17:32, 29.63s/it]
63%|██████▎ | 264/420 [2:10:49<1:16:49, 29.55s/it]
63%|██████▎ | 265/420 [2:11:19<1:16:39, 29.68s/it]
63%|██████▎ | 266/420 [2:11:49<1:16:24, 29.77s/it]
64%|██████▎ | 267/420 [2:12:19<1:15:51, 29.75s/it]
64%|██████▍ | 268/420 [2:12:49<1:15:19, 29.73s/it]
64%|██████▍ | 269/420 [2:13:18<1:14:47, 29.72s/it]
64%|██████▍ | 270/420 [2:13:48<1:14:29, 29.79s/it]
65%|██████▍ | 271/420 [2:14:18<1:13:53, 29.75s/it]
65%|██████▍ | 272/420 [2:14:48<1:13:21, 29.74s/it]
65%|██████▌ | 273/420 [2:15:18<1:13:03, 29.82s/it]
65%|██████▌ | 274/420 [2:15:48<1:12:42, 29.88s/it]
65%|██████▌ | 275/420 [2:16:17<1:11:49, 29.72s/it]
66%|██████▌ | 276/420 [2:16:47<1:11:29, 29.79s/it]
66%|██████▌ | 277/420 [2:17:17<1:10:54, 29.75s/it]
66%|██████▌ | 278/420 [2:17:46<1:10:07, 29.63s/it]
66%|██████▋ | 279/420 [2:18:16<1:09:38, 29.64s/it]
67%|██████▋ | 280/420 [2:18:46<1:09:11, 29.66s/it]
67%|██████▋ | 281/420 [2:19:16<1:08:56, 29.76s/it]
67%|██████▋ | 282/420 [2:19:45<1:08:22, 29.73s/it]
67%|██████▋ | 283/420 [2:20:15<1:07:50, 29.71s/it]
68%|██████▊ | 284/420 [2:20:45<1:07:32, 29.80s/it]
68%|██████▊ | 285/420 [2:21:14<1:06:44, 29.66s/it]
68%|██████▊ | 286/420 [2:21:44<1:06:29, 29.77s/it]
68%|██████▊ | 287/420 [2:22:14<1:05:55, 29.74s/it]
69%|██████▊ | 288/420 [2:22:44<1:05:24, 29.73s/it]
69%|██████▉ | 289/420 [2:23:14<1:05:05, 29.81s/it]
69%|██████▉ | 290/420 [2:23:43<1:04:17, 29.67s/it]
69%|██████▉ | 291/420 [2:24:12<1:03:33, 29.57s/it]
70%|██████▉ | 292/420 [2:24:42<1:03:10, 29.61s/it]
70%|██████▉ | 293/420 [2:25:12<1:02:38, 29.60s/it]
70%|███████ | 294/420 [2:25:41<1:02:13, 29.63s/it]
70%|███████ | 295/420 [2:26:11<1:01:57, 29.74s/it]
70%|███████ | 296/420 [2:26:41<1:01:13, 29.63s/it]
71%|███████ | 297/420 [2:27:10<1:00:34, 29.55s/it]
71%|███████ | 298/420 [2:27:40<1:00:09, 29.59s/it]
71%|███████ | 299/420 [2:28:09<59:42, 29.61s/it]
71%|███████▏ | 300/420 [2:28:39<59:16, 29.64s/it]
{'loss': 0.1997, 'learning_rate': 0.0001, 'global_step': 300, 'epoch': 1.43} |
|
|
72%|███████▏ | 301/420 [2:29:09<59:00, 29.75s/it]
72%|███████▏ | 302/420 [2:29:39<58:28, 29.73s/it]
72%|███████▏ | 303/420 [2:30:08<57:45, 29.62s/it]
72%|███████▏ | 304/420 [2:30:38<57:18, 29.64s/it]
73%|███████▎ | 305/420 [2:31:07<56:39, 29.56s/it]
73%|███████▎ | 306/420 [2:31:36<56:02, 29.50s/it]
73%|███████▎ | 307/420 [2:32:06<55:49, 29.64s/it]
73%|███████▎ | 308/420 [2:32:36<55:21, 29.65s/it]
74%|███████▎ | 309/420 [2:33:06<54:52, 29.66s/it]
74%|███████▍ | 310/420 [2:33:35<54:22, 29.66s/it]
74%|███████▍ | 311/420 [2:34:05<54:04, 29.76s/it]
74%|███████▍ | 312/420 [2:34:35<53:41, 29.83s/it]
75%|███████▍ | 313/420 [2:35:05<52:56, 29.68s/it]
75%|███████▍ | 314/420 [2:35:35<52:26, 29.69s/it]
75%|███████▌ | 315/420 [2:36:04<51:57, 29.69s/it]
75%|███████▌ | 316/420 [2:36:34<51:26, 29.68s/it]
75%|███████▌ | 317/420 [2:37:03<50:45, 29.57s/it]
76%|███████▌ | 318/420 [2:37:33<50:19, 29.60s/it]
76%|███████▌ | 319/420 [2:38:03<49:51, 29.62s/it]
76%|███████▌ | 320/420 [2:38:32<49:32, 29.72s/it]
76%|███████▋ | 321/420 [2:39:02<49:09, 29.80s/it]
77%|███████▋ | 322/420 [2:39:32<48:36, 29.76s/it]
77%|███████▋ | 323/420 [2:40:01<47:54, 29.63s/it]
77%|███████▋ | 324/420 [2:40:31<47:15, 29.54s/it]
77%|███████▋ | 325/420 [2:41:00<46:40, 29.48s/it]
78%|███████▊ | 326/420 [2:41:30<46:16, 29.54s/it]
78%|███████▊ | 327/420 [2:42:00<45:58, 29.67s/it]
78%|███████▊ | 328/420 [2:42:30<45:39, 29.77s/it]
78%|███████▊ | 329/420 [2:42:59<44:57, 29.65s/it]
79%|███████▊ | 330/420 [2:43:29<44:29, 29.66s/it]
79%|███████▉ | 331/420 [2:43:58<43:56, 29.62s/it]
79%|███████▉ | 332/420 [2:44:28<43:28, 29.64s/it]
79%|███████▉ | 333/420 [2:44:58<42:59, 29.65s/it]
80%|███████▉ | 334/420 [2:45:27<42:31, 29.66s/it]
80%|███████▉ | 335/420 [2:45:57<41:53, 29.57s/it]
80%|████████ | 336/420 [2:46:26<41:26, 29.60s/it]
80%|████████ | 337/420 [2:46:56<40:56, 29.60s/it]
80%|████████ | 338/420 [2:47:25<40:20, 29.52s/it]
81%|████████ | 339/420 [2:47:55<39:55, 29.57s/it]
81%|████████ | 340/420 [2:48:25<39:28, 29.61s/it]
81%|████████ | 341/420 [2:48:54<39:01, 29.64s/it]
81%|████████▏ | 342/420 [2:49:24<38:32, 29.65s/it]
82%|████████▏ | 343/420 [2:49:53<37:55, 29.56s/it]
82%|████████▏ | 344/420 [2:50:23<37:29, 29.60s/it]
82%|████████▏ | 345/420 [2:50:53<37:09, 29.72s/it]
82%|████████▏ | 346/420 [2:51:23<36:38, 29.71s/it]
83%|████████▎ | 347/420 [2:51:52<36:01, 29.61s/it]
83%|████████▎ | 348/420 [2:52:22<35:33, 29.63s/it]
83%|████████▎ | 349/420 [2:52:51<34:56, 29.52s/it]
83%|████████▎ | 350/420 [2:53:21<34:27, 29.54s/it]
{'loss': 0.1919, 'learning_rate': 0.0001, 'global_step': 350, 'epoch': 1.66} |
|
|
84%|████████▎ | 351/420 [2:53:51<34:07, 29.67s/it]
84%|████████▍ | 352/420 [2:54:20<33:38, 29.69s/it]
84%|████████▍ | 353/420 [2:54:50<33:15, 29.78s/it]
84%|████████▍ | 354/420 [2:55:20<32:35, 29.63s/it]
85%|████████▍ | 355/420 [2:55:49<32:07, 29.65s/it]
85%|████████▍ | 356/420 [2:56:19<31:38, 29.66s/it]
85%|████████▌ | 357/420 [2:56:49<31:15, 29.77s/it]
85%|████████▌ | 358/420 [2:57:19<30:38, 29.65s/it]
85%|████████▌ | 359/420 [2:57:49<30:15, 29.76s/it]
86%|████████▌ | 360/420 [2:58:19<29:50, 29.84s/it]
86%|████████▌ | 361/420 [2:58:48<29:17, 29.79s/it]
86%|████████▌ | 362/420 [2:59:18<28:51, 29.86s/it]
86%|████████▋ | 363/420 [2:59:48<28:18, 29.81s/it]
87%|████████▋ | 364/420 [3:00:18<27:47, 29.77s/it]
87%|████████▋ | 365/420 [3:00:48<27:20, 29.83s/it]
87%|████████▋ | 366/420 [3:01:17<26:48, 29.79s/it]
87%|████████▋ | 367/420 [3:01:47<26:16, 29.75s/it]
88%|████████▊ | 368/420 [3:02:17<25:51, 29.83s/it]
88%|████████▊ | 369/420 [3:02:47<25:18, 29.78s/it]
88%|████████▊ | 370/420 [3:03:16<24:42, 29.65s/it]
88%|████████▊ | 371/420 [3:03:46<24:13, 29.66s/it]
89%|████████▊ | 372/420 [3:04:15<23:44, 29.67s/it]
89%|████████▉ | 373/420 [3:04:45<23:09, 29.56s/it]
89%|████████▉ | 374/420 [3:05:14<22:41, 29.60s/it]
89%|████████▉ | 375/420 [3:05:44<22:08, 29.52s/it]
90%|████████▉ | 376/420 [3:06:13<21:41, 29.57s/it]
90%|████████▉ | 377/420 [3:06:43<21:17, 29.70s/it]
90%|█████████ | 378/420 [3:07:13<20:51, 29.79s/it]
90%|█████████ | 379/420 [3:07:43<20:16, 29.66s/it]
90%|█████████ | 380/420 [3:08:12<19:46, 29.67s/it]
91%|█████████ | 381/420 [3:08:42<19:21, 29.77s/it]
91%|█████████ | 382/420 [3:09:12<18:50, 29.75s/it]
91%|█████████ | 383/420 [3:09:41<18:16, 29.63s/it]
91%|█████████▏| 384/420 [3:10:11<17:47, 29.64s/it]
92%|█████████▏| 385/420 [3:10:41<17:21, 29.76s/it]
92%|█████████▏| 386/420 [3:11:10<16:47, 29.63s/it]
92%|█████████▏| 387/420 [3:11:40<16:21, 29.73s/it]
92%|█████████▏| 388/420 [3:12:10<15:53, 29.81s/it]
93%|█████████▎| 389/420 [3:12:40<15:23, 29.78s/it]
93%|█████████▎| 390/420 [3:13:10<14:52, 29.75s/it]
93%|█████████▎| 391/420 [3:13:40<14:24, 29.83s/it]
93%|█████████▎| 392/420 [3:14:10<13:53, 29.78s/it]
94%|█████████▎| 393/420 [3:14:39<13:23, 29.75s/it]
94%|█████████▍| 394/420 [3:15:09<12:53, 29.74s/it]
94%|█████████▍| 395/420 [3:15:39<12:22, 29.72s/it]
94%|█████████▍| 396/420 [3:16:08<11:50, 29.61s/it]
95%|█████████▍| 397/420 [3:16:37<11:18, 29.49s/it]
95%|█████████▍| 398/420 [3:17:07<10:52, 29.64s/it]
95%|█████████▌| 399/420 [3:17:37<10:22, 29.66s/it]
95%|█████████▌| 400/420 [3:18:07<09:53, 29.67s/it]
{'loss': 0.1817, 'learning_rate': 0.0001, 'global_step': 400, 'epoch': 1.9} |
|
|
95%|█████████▌| 401/420 [3:18:39<09:42, 30.66s/it]
96%|█████████▌| 402/420 [3:19:09<09:06, 30.36s/it]
96%|█████████▌| 403/420 [3:19:39<08:30, 30.06s/it]
96%|█████████▌| 404/420 [3:20:08<07:59, 29.95s/it]
96%|█████████▋| 405/420 [3:20:38<07:29, 29.97s/it]
97%|█████████▋| 406/420 [3:21:08<06:59, 29.97s/it]
97%|█████████▋| 407/420 [3:21:38<06:28, 29.89s/it]
97%|█████████▋| 408/420 [3:22:08<05:59, 29.92s/it]
97%|█████████▋| 409/420 [3:22:38<05:28, 29.84s/it]
98%|█████████▊| 410/420 [3:23:07<04:57, 29.78s/it]
98%|█████████▊| 411/420 [3:23:37<04:27, 29.75s/it]
98%|█████████▊| 412/420 [3:24:07<03:57, 29.73s/it]
98%|█████████▊| 413/420 [3:24:36<03:27, 29.71s/it]
99%|█████████▊| 414/420 [3:25:06<02:57, 29.59s/it]
99%|█████████▉| 415/420 [3:25:35<02:28, 29.61s/it]
99%|█████████▉| 416/420 [3:26:05<01:58, 29.63s/it]
99%|█████████▉| 417/420 [3:26:34<01:28, 29.53s/it]
100%|█████████▉| 418/420 [3:27:04<00:59, 29.58s/it]
100%|█████████▉| 419/420 [3:27:34<00:29, 29.69s/it]
100%|██████████| 420/420 [3:28:03<00:00, 29.69s/it]
{'train_runtime': 12484.1415, 'train_samples_per_second': 0.404, 'train_steps_per_second': 0.034, 'train_loss': 0.37379822503952753, 'epoch': 2.0} |
|
|
100%|██████████| 420/420 [3:28:04<00:00, 29.72s/it] |
|
|
***** train metrics ***** |
|
|
epoch = 2.0 |
|
|
train_loss = 0.3738 |
|
|
train_runtime = 3:28:04.14 |
|
|
train_samples_per_second = 0.404 |
|
|
train_steps_per_second = 0.034 |
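The summary figures above can be cross-checked against the raw numbers printed earlier in the log. The sketch below is only a sanity check written for this log, not part of the training script: the constants are copied from the log lines, `summarize_run` is a hypothetical helper name, and the "samples per optimizer step" value is an inference from 2523 samples and 210 steps per epoch, since the log never states the batch size or gradient accumulation setting.

```python
from datetime import timedelta

# Values copied from the log above.
TRAINABLE_PARAMS = 319_815_680
ALL_PARAMS = 7_058_231_296
NUM_SAMPLES = 2523            # "there are 2523 data in dataset"
TOTAL_STEPS = 420             # total shown by the training progress bar
EPOCHS = 2.0                  # 'epoch': 2.0 in the final metrics
TRAIN_RUNTIME_S = 12484.1415  # 'train_runtime' in seconds


def summarize_run():
    """Recompute the reported metrics from the raw log values (hypothetical helper)."""
    steps_per_epoch = TOTAL_STEPS / EPOCHS
    return {
        "trainable_pct": 100 * TRAINABLE_PARAMS / ALL_PARAMS,
        "samples_per_second": NUM_SAMPLES * EPOCHS / TRAIN_RUNTIME_S,
        "steps_per_second": TOTAL_STEPS / TRAIN_RUNTIME_S,
        "steps_per_epoch": steps_per_epoch,
        # Inferred, not logged: roughly how many samples each optimizer step consumed.
        "samples_per_step": NUM_SAMPLES / steps_per_epoch,
        "runtime": str(timedelta(seconds=TRAIN_RUNTIME_S)),
    }


if __name__ == "__main__":
    for name, value in summarize_run().items():
        print(f"{name}: {value}")
```

Rounded to three decimals, the recomputed throughput values agree with the reported 0.404 samples/s and 0.034 steps/s, and the samples-per-step figure comes out close to 12, which suggests (but does not confirm) an effective batch size of 12.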
|
|
|