| | model training desc: initialize model training... |
| | 2023-12-30 17:59:51.341 | INFO | __main__:init_components:108 - Initializing components... |
| |
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:09<00:09, 9.01s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00, 5.18s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00, 5.75s/it] |
| | You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 |
| | 2023-12-30 18:00:03.415 | INFO | __main__:init_components:155 - |
| |
|
| | 2023-12-30 18:00:03.415 | INFO | __main__:init_components:156 - ******************** |
| | 2023-12-30 18:00:03.415 | INFO | __main__:init_components:157 - using llama2 model |
| | 2023-12-30 18:00:03.415 | INFO | __main__:init_components:158 - ******************** |
| | 2023-12-30 18:00:03.415 | INFO | __main__:init_components:159 - |
| |
|
| | memory footprint of model: 4.024436950683594 GB |
| | trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305 |
| | 2023-12-30 18:00:48.703 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/Caption/quality_caption_and_rel_instruct/train.jsonl |
| | 2023-12-30 18:00:48.807 | INFO | component.dataset:__init__:19 - there are 2523 data in dataset |
| | 2023-12-30 18:00:49.225 | INFO | __main__:main:231 - *** starting training *** |
| |
0%| | 0/210 [00:00<?, ?it/s]
0%| | 1/210 [00:33<1:58:19, 33.97s/it]
1%| | 2/210 [01:03<1:48:04, 31.18s/it]
1%|▏ | 3/210 [01:32<1:44:41, 30.34s/it]
2%|▏ | 4/210 [02:01<1:42:54, 29.97s/it]
2%|▏ | 5/210 [02:31<1:41:41, 29.76s/it]
3%|▎ | 6/210 [03:01<1:41:11, 29.76s/it]
3%|▎ | 7/210 [03:30<1:40:18, 29.65s/it]
4%|▍ | 8/210 [04:00<1:39:55, 29.68s/it]
4%|▍ | 9/210 [04:29<1:39:06, 29.58s/it]
5%|▍ | 10/210 [04:58<1:38:23, 29.52s/it]
{'loss': 0.5879, 'learning_rate': 4.761904761904762e-05, 'global_step': 10, 'epoch': 0.05} |
| |
5%|▌ | 11/210 [05:28<1:38:06, 29.58s/it]
6%|▌ | 12/210 [05:58<1:37:45, 29.62s/it]
6%|▌ | 13/210 [06:28<1:37:21, 29.65s/it]
7%|▋ | 14/210 [06:57<1:36:36, 29.57s/it]
7%|▋ | 15/210 [07:27<1:36:17, 29.63s/it]
8%|▊ | 16/210 [07:57<1:35:56, 29.67s/it]
8%|▊ | 17/210 [08:26<1:35:31, 29.70s/it]
9%|▊ | 18/210 [08:56<1:35:05, 29.71s/it]
9%|▉ | 19/210 [09:27<1:35:15, 29.92s/it]
10%|▉ | 20/210 [09:57<1:34:55, 29.97s/it]
{'loss': 0.506, 'learning_rate': 9.523809523809524e-05, 'global_step': 20, 'epoch': 0.1} |
| |
10%|█ | 21/210 [10:26<1:33:54, 29.81s/it]
10%|█ | 22/210 [10:56<1:33:39, 29.89s/it]
11%|█ | 23/210 [11:26<1:32:43, 29.75s/it]
11%|█▏ | 24/210 [11:55<1:31:56, 29.66s/it]
12%|█▏ | 25/210 [12:24<1:31:14, 29.59s/it]
12%|█▏ | 26/210 [12:54<1:30:36, 29.55s/it]
13%|█▎ | 27/210 [13:24<1:30:16, 29.60s/it]
13%|█▎ | 28/210 [13:53<1:29:29, 29.50s/it]
14%|█▍ | 29/210 [14:23<1:29:11, 29.57s/it]
14%|█▍ | 30/210 [14:53<1:29:08, 29.72s/it]
{'loss': 0.5252, 'learning_rate': 0.0001, 'global_step': 30, 'epoch': 0.14} |
| |
15%|█▍ | 31/210 [15:22<1:28:25, 29.64s/it]
15%|█▌ | 32/210 [15:52<1:28:16, 29.76s/it]
16%|█▌ | 33/210 [16:22<1:27:45, 29.75s/it]
16%|█▌ | 34/210 [16:51<1:26:58, 29.65s/it]
17%|█▋ | 35/210 [17:21<1:26:17, 29.59s/it]
17%|█▋ | 36/210 [17:50<1:25:39, 29.54s/it]
18%|█▊ | 37/210 [18:20<1:25:38, 29.70s/it]
18%|█▊ | 38/210 [18:50<1:24:54, 29.62s/it]
19%|█▊ | 39/210 [19:19<1:24:15, 29.56s/it]
19%|█▉ | 40/210 [19:49<1:24:10, 29.71s/it]
{'loss': 0.5572, 'learning_rate': 0.0001, 'global_step': 40, 'epoch': 0.19} |
| |
20%|█▉ | 41/210 [20:19<1:23:43, 29.73s/it]
20%|██ | 42/210 [20:48<1:23:00, 29.64s/it]
20%|██ | 43/210 [21:18<1:22:36, 29.68s/it]
21%|██ | 44/210 [21:48<1:22:11, 29.71s/it]
21%|██▏ | 45/210 [22:18<1:22:00, 29.82s/it]
22%|██▏ | 46/210 [22:48<1:21:27, 29.80s/it]
22%|██▏ | 47/210 [23:18<1:20:56, 29.80s/it]
23%|██▎ | 48/210 [23:47<1:20:09, 29.69s/it]
23%|██▎ | 49/210 [24:16<1:19:28, 29.62s/it]
24%|██▍ | 50/210 [24:46<1:19:05, 29.66s/it]
{'loss': 0.4937, 'learning_rate': 0.0001, 'global_step': 50, 'epoch': 0.24} |
| |
24%|██▍ | 51/210 [25:16<1:18:25, 29.60s/it]
25%|██▍ | 52/210 [25:45<1:17:48, 29.55s/it]
25%|██▌ | 53/210 [26:15<1:17:28, 29.61s/it]
26%|██▌ | 54/210 [26:44<1:16:50, 29.55s/it]
26%|██▌ | 55/210 [27:14<1:16:45, 29.71s/it]
27%|██▋ | 56/210 [27:44<1:16:18, 29.73s/it]
27%|██▋ | 57/210 [28:14<1:15:50, 29.74s/it]
28%|██▊ | 58/210 [28:44<1:15:22, 29.75s/it]
28%|██▊ | 59/210 [29:13<1:14:53, 29.76s/it]
29%|██▊ | 60/210 [29:43<1:14:09, 29.66s/it]
{'loss': 0.4925, 'learning_rate': 0.0001, 'global_step': 60, 'epoch': 0.29} |
| |
29%|██▉ | 61/210 [30:12<1:13:31, 29.61s/it]
30%|██▉ | 62/210 [30:42<1:12:55, 29.56s/it]
30%|███ | 63/210 [31:12<1:12:35, 29.63s/it]
30%|███ | 64/210 [31:41<1:11:56, 29.56s/it]
31%|███ | 65/210 [32:10<1:11:23, 29.54s/it]
31%|███▏ | 66/210 [32:40<1:11:03, 29.61s/it]
32%|███▏ | 67/210 [33:10<1:10:27, 29.56s/it]
32%|███▏ | 68/210 [33:39<1:09:52, 29.53s/it]
33%|███▎ | 69/210 [34:09<1:09:19, 29.50s/it]
33%|███▎ | 70/210 [34:38<1:08:48, 29.49s/it]
{'loss': 0.4309, 'learning_rate': 0.0001, 'global_step': 70, 'epoch': 0.33} |
| |
34%|███▍ | 71/210 [35:07<1:08:18, 29.48s/it]
34%|███▍ | 72/210 [35:37<1:07:47, 29.48s/it]
35%|███▍ | 73/210 [36:06<1:07:16, 29.46s/it]
35%|███▌ | 74/210 [36:36<1:07:00, 29.56s/it]
36%|███▌ | 75/210 [37:06<1:06:26, 29.53s/it]
36%|███▌ | 76/210 [37:35<1:06:05, 29.59s/it]
37%|███▋ | 77/210 [38:05<1:05:41, 29.64s/it]
37%|███▋ | 78/210 [38:35<1:05:03, 29.57s/it]
38%|███▊ | 79/210 [39:04<1:04:29, 29.54s/it]
38%|███▊ | 80/210 [39:33<1:03:55, 29.51s/it]
{'loss': 0.4831, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.38} |
| |
39%|███▊ | 81/210 [40:03<1:03:34, 29.57s/it]
39%|███▉ | 82/210 [40:33<1:03:12, 29.63s/it]
40%|███▉ | 83/210 [41:03<1:02:47, 29.67s/it]
40%|████ | 84/210 [41:32<1:02:21, 29.70s/it]
40%|████ | 85/210 [42:02<1:01:42, 29.62s/it]
41%|████ | 86/210 [42:31<1:01:05, 29.56s/it]
41%|████▏ | 87/210 [43:01<1:00:32, 29.53s/it]
42%|████▏ | 88/210 [43:30<59:59, 29.51s/it]
42%|████▏ | 89/210 [44:00<59:38, 29.58s/it]
43%|████▎ | 90/210 [44:29<59:04, 29.53s/it]
{'loss': 0.4896, 'learning_rate': 0.0001, 'global_step': 90, 'epoch': 0.43} |
| |
43%|████▎ | 91/210 [44:59<58:43, 29.61s/it]
44%|████▍ | 92/210 [45:29<58:18, 29.65s/it]
44%|████▍ | 93/210 [45:58<57:41, 29.59s/it]
45%|████▍ | 94/210 [46:28<57:29, 29.74s/it]
45%|████▌ | 95/210 [46:58<57:00, 29.74s/it]
46%|████▌ | 96/210 [47:28<56:19, 29.64s/it]
46%|████▌ | 97/210 [47:57<55:41, 29.57s/it]
47%|████▋ | 98/210 [48:27<55:18, 29.63s/it]
47%|████▋ | 99/210 [48:56<54:42, 29.57s/it]
48%|████▊ | 100/210 [49:26<54:18, 29.62s/it]
{'loss': 0.4257, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48} |
| |
48%|████▊ | 101/210 [49:59<55:56, 30.80s/it]
49%|████▊ | 102/210 [50:29<54:42, 30.39s/it]
49%|████▉ | 103/210 [50:59<53:50, 30.20s/it]
50%|████▉ | 104/210 [51:28<52:57, 29.98s/it]
50%|█████ | 105/210 [51:58<52:20, 29.91s/it]
50%|█████ | 106/210 [52:27<51:35, 29.77s/it]
51%|█████ | 107/210 [52:57<51:05, 29.77s/it]
51%|█████▏ | 108/210 [53:27<50:35, 29.76s/it]
52%|█████▏ | 109/210 [53:56<49:54, 29.65s/it]
52%|█████▏ | 110/210 [54:26<49:19, 29.59s/it]
{'loss': 0.5, 'learning_rate': 0.0001, 'global_step': 110, 'epoch': 0.52} |
| |
53%|█████▎ | 111/210 [54:55<48:54, 29.64s/it]
53%|█████▎ | 112/210 [55:25<48:16, 29.56s/it]
54%|█████▍ | 113/210 [55:54<47:44, 29.53s/it]
54%|█████▍ | 114/210 [56:24<47:21, 29.59s/it]
55%|█████▍ | 115/210 [56:53<46:46, 29.54s/it]
55%|█████▌ | 116/210 [57:24<46:31, 29.70s/it]
56%|█████▌ | 117/210 [57:53<46:03, 29.71s/it]
56%|█████▌ | 118/210 [58:23<45:25, 29.63s/it]
57%|█████▋ | 119/210 [58:52<44:59, 29.66s/it]
57%|█████▋ | 120/210 [59:22<44:23, 29.59s/it]
{'loss': 0.4954, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.57} |
| |
58%|█████▊ | 121/210 [59:52<43:55, 29.61s/it]
58%|█████▊ | 122/210 [1:00:21<43:29, 29.65s/it]
59%|█████▊ | 123/210 [1:00:51<42:53, 29.58s/it]
59%|█████▉ | 124/210 [1:01:20<42:19, 29.53s/it]
60%|█████▉ | 125/210 [1:01:50<41:56, 29.60s/it]
60%|██████ | 126/210 [1:02:19<41:21, 29.54s/it]
60%|██████ | 127/210 [1:02:49<40:48, 29.50s/it]
61%|██████ | 128/210 [1:03:18<40:19, 29.50s/it]
61%|██████▏ | 129/210 [1:03:48<39:55, 29.58s/it]
62%|██████▏ | 130/210 [1:04:17<39:22, 29.53s/it]
{'loss': 0.4691, 'learning_rate': 0.0001, 'global_step': 130, 'epoch': 0.62} |
| |
62%|██████▏ | 131/210 [1:04:47<38:49, 29.49s/it]
63%|██████▎ | 132/210 [1:05:16<38:18, 29.47s/it]
63%|██████▎ | 133/210 [1:05:46<37:47, 29.44s/it]
64%|██████▍ | 134/210 [1:06:15<37:16, 29.43s/it]
64%|██████▍ | 135/210 [1:06:45<36:53, 29.52s/it]
65%|██████▍ | 136/210 [1:07:14<36:21, 29.49s/it]
65%|██████▌ | 137/210 [1:07:44<35:51, 29.47s/it]
66%|██████▌ | 138/210 [1:08:13<35:27, 29.55s/it]
66%|██████▌ | 139/210 [1:08:43<35:02, 29.61s/it]
67%|██████▋ | 140/210 [1:09:13<34:35, 29.65s/it]
{'loss': 0.4373, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.67} |
| |
67%|██████▋ | 141/210 [1:09:42<34:07, 29.67s/it]
68%|██████▊ | 142/210 [1:10:12<33:39, 29.69s/it]
68%|██████▊ | 143/210 [1:10:42<33:16, 29.80s/it]
69%|██████▊ | 144/210 [1:11:12<32:39, 29.69s/it]
69%|██████▉ | 145/210 [1:11:41<32:10, 29.70s/it]
70%|██████▉ | 146/210 [1:12:11<31:35, 29.62s/it]
70%|███████ | 147/210 [1:12:40<31:02, 29.56s/it]
70%|███████ | 148/210 [1:13:10<30:30, 29.52s/it]
71%|███████ | 149/210 [1:13:39<29:58, 29.49s/it]
71%|███████▏ | 150/210 [1:14:09<29:28, 29.47s/it]
{'loss': 0.526, 'learning_rate': 0.0001, 'global_step': 150, 'epoch': 0.71} |
| |
72%|███████▏ | 151/210 [1:14:39<29:08, 29.64s/it]
72%|███████▏ | 152/210 [1:15:08<28:35, 29.58s/it]
73%|███████▎ | 153/210 [1:15:38<28:14, 29.73s/it]
73%|███████▎ | 154/210 [1:16:07<27:39, 29.63s/it]
74%|███████▍ | 155/210 [1:16:37<27:10, 29.65s/it]
74%|███████▍ | 156/210 [1:17:07<26:37, 29.58s/it]
75%|███████▍ | 157/210 [1:17:36<26:04, 29.52s/it]
75%|███████▌ | 158/210 [1:18:06<25:36, 29.55s/it]
76%|███████▌ | 159/210 [1:18:35<25:09, 29.60s/it]
76%|███████▌ | 160/210 [1:19:05<24:42, 29.65s/it]
{'loss': 0.4297, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.76} |
| |
77%|███████▋ | 161/210 [1:19:35<24:14, 29.68s/it]
77%|███████▋ | 162/210 [1:20:04<23:41, 29.60s/it]
78%|███████▊ | 163/210 [1:20:34<23:12, 29.64s/it]
78%|███████▊ | 164/210 [1:21:04<22:44, 29.66s/it]
79%|███████▊ | 165/210 [1:21:33<22:15, 29.67s/it]
79%|███████▉ | 166/210 [1:22:03<21:40, 29.56s/it]
80%|███████▉ | 167/210 [1:22:32<21:08, 29.51s/it]
80%|████████ | 168/210 [1:23:01<20:38, 29.48s/it]
80%|████████ | 169/210 [1:23:31<20:11, 29.56s/it]
81%|████████ | 170/210 [1:24:01<19:40, 29.51s/it]
{'loss': 0.4708, 'learning_rate': 0.0001, 'global_step': 170, 'epoch': 0.81} |
| |
81%|████████▏ | 171/210 [1:24:30<19:13, 29.57s/it]
82%|████████▏ | 172/210 [1:25:00<18:42, 29.53s/it]
82%|████████▏ | 173/210 [1:25:30<18:18, 29.68s/it]
83%|████████▎ | 174/210 [1:25:59<17:45, 29.61s/it]
83%|████████▎ | 175/210 [1:26:29<17:14, 29.54s/it]
84%|████████▍ | 176/210 [1:26:58<16:43, 29.51s/it]
84%|████████▍ | 177/210 [1:27:28<16:15, 29.57s/it]
85%|████████▍ | 178/210 [1:27:57<15:44, 29.52s/it]
85%|████████▌ | 179/210 [1:28:27<15:13, 29.48s/it]
86%|████████▌ | 180/210 [1:28:56<14:43, 29.44s/it]
{'loss': 0.4872, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.86} |
| |
86%|████████▌ | 181/210 [1:29:26<14:16, 29.52s/it]
87%|████████▋ | 182/210 [1:29:55<13:45, 29.48s/it]
87%|████████▋ | 183/210 [1:30:25<13:17, 29.55s/it]
88%|████████▊ | 184/210 [1:30:54<12:49, 29.60s/it]
88%|████████▊ | 185/210 [1:31:24<12:18, 29.54s/it]
89%|████████▊ | 186/210 [1:31:53<11:47, 29.50s/it]
89%|████████▉ | 187/210 [1:32:23<11:17, 29.47s/it]
90%|████████▉ | 188/210 [1:32:52<10:49, 29.54s/it]
90%|█████████ | 189/210 [1:33:22<10:21, 29.60s/it]
90%|█████████ | 190/210 [1:33:52<09:52, 29.64s/it]
{'loss': 0.4888, 'learning_rate': 0.0001, 'global_step': 190, 'epoch': 0.9} |
| |
91%|█████████ | 191/210 [1:34:22<09:25, 29.76s/it]
91%|█████████▏| 192/210 [1:34:52<08:55, 29.75s/it]
92%|█████████▏| 193/210 [1:35:21<08:25, 29.75s/it]
92%|█████████▏| 194/210 [1:35:51<07:55, 29.74s/it]
93%|█████████▎| 195/210 [1:36:20<07:24, 29.65s/it]
93%|█████████▎| 196/210 [1:36:50<06:55, 29.67s/it]
94%|█████████▍| 197/210 [1:37:20<06:27, 29.79s/it]
94%|█████████▍| 198/210 [1:37:50<05:56, 29.71s/it]
95%|█████████▍| 199/210 [1:38:19<05:25, 29.61s/it]
95%|█████████▌| 200/210 [1:38:49<04:55, 29.55s/it]
{'loss': 0.4754, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95} |
| |
96%|█████████▌| 201/210 [1:39:22<04:36, 30.70s/it]
96%|█████████▌| 202/210 [1:39:51<04:02, 30.31s/it]
97%|█████████▋| 203/210 [1:40:21<03:30, 30.05s/it]
97%|█████████▋| 204/210 [1:40:50<02:59, 29.86s/it]
98%|█████████▊| 205/210 [1:41:20<02:28, 29.73s/it]
98%|█████████▊| 206/210 [1:41:49<01:58, 29.64s/it]
99%|█████████▊| 207/210 [1:42:19<01:28, 29.58s/it]
99%|█████████▉| 208/210 [1:42:48<00:59, 29.53s/it]
100%|█████████▉| 209/210 [1:43:24<00:31, 31.55s/it]
100%|██████████| 210/210 [1:43:54<00:00, 30.92s/it]
{'loss': 0.4733, 'learning_rate': 0.0001, 'global_step': 210, 'epoch': 1.0} |
| |
{'train_runtime': 6234.3462, 'train_samples_per_second': 0.405, 'train_steps_per_second': 0.034, 'train_loss': 0.4878490357171921, 'epoch': 1.0} |
| |
100%|██████████| 210/210 [1:43:54<00:00, 29.69s/it] |
| | ***** train metrics ***** |
| | epoch = 1.0 |
| | train_loss = 0.4878 |
| | train_runtime = 1:43:54.34 |
| | train_samples_per_second = 0.405 |
| | train_steps_per_second = 0.034 |
| |
|