model training desc: initialize model training...
2023-12-30 17:59:51.341 | INFO | __main__:init_components:108 - Initializing components...
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00, 5.75s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-30 18:00:03.415 | INFO | __main__:init_components:155 -
2023-12-30 18:00:03.415 | INFO | __main__:init_components:156 - ********************
2023-12-30 18:00:03.415 | INFO | __main__:init_components:157 - using llama2 model
2023-12-30 18:00:03.415 | INFO | __main__:init_components:158 - ********************
2023-12-30 18:00:03.415 | INFO | __main__:init_components:159 -
memory footprint of model: 4.024436950683594 GB
trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305
2023-12-30 18:00:48.703 | INFO | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/Caption/quality_caption_and_rel_instruct/train.jsonl
2023-12-30 18:00:48.807 | INFO | component.dataset:__init__:19 - there are 2523 data in dataset
2023-12-30 18:00:49.225 | INFO | __main__:main:231 - *** starting training ***
5%|▍ | 10/210 [04:58<1:38:23, 29.52s/it] {'loss': 0.5879, 'learning_rate': 4.761904761904762e-05, 'global_step': 10, 'epoch': 0.05}
10%|▉ | 20/210 [09:57<1:34:55, 29.97s/it] {'loss': 0.506, 'learning_rate': 9.523809523809524e-05, 'global_step': 20, 'epoch': 0.1}
14%|█▍ | 30/210 [14:53<1:29:08, 29.72s/it] {'loss': 0.5252, 'learning_rate': 0.0001, 'global_step': 30, 'epoch': 0.14}
19%|█▉ | 40/210 [19:49<1:24:10, 29.71s/it] {'loss': 0.5572, 'learning_rate': 0.0001, 'global_step': 40, 'epoch': 0.19}
24%|██▍ | 50/210 [24:46<1:19:05, 29.66s/it] {'loss': 0.4937, 'learning_rate': 0.0001, 'global_step': 50, 'epoch': 0.24}
29%|██▊ | 60/210 [29:43<1:14:09, 29.66s/it] {'loss': 0.4925, 'learning_rate': 0.0001, 'global_step': 60, 'epoch': 0.29}
33%|███▎ | 70/210 [34:38<1:08:48, 29.49s/it] {'loss': 0.4309, 'learning_rate': 0.0001, 'global_step': 70, 'epoch': 0.33}
38%|███▊ | 80/210 [39:33<1:03:55, 29.51s/it] {'loss': 0.4831, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.38}
43%|████▎ | 90/210 [44:29<59:04, 29.53s/it] {'loss': 0.4896, 'learning_rate': 0.0001, 'global_step': 90, 'epoch': 0.43}
48%|████▊ | 100/210 [49:26<54:18, 29.62s/it] {'loss': 0.4257, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48}
52%|█████▏ | 110/210 [54:26<49:19, 29.59s/it] {'loss': 0.5, 'learning_rate': 0.0001, 'global_step': 110, 'epoch': 0.52}
57%|█████▋ | 120/210 [59:22<44:23, 29.59s/it] {'loss': 0.4954, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.57}
62%|██████▏ | 130/210 [1:04:17<39:22, 29.53s/it] {'loss': 0.4691, 'learning_rate': 0.0001, 'global_step': 130, 'epoch': 0.62}
67%|██████▋ | 140/210 [1:09:13<34:35, 29.65s/it] {'loss': 0.4373, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.67}
71%|███████▏ | 150/210 [1:14:09<29:28, 29.47s/it] {'loss': 0.526, 'learning_rate': 0.0001, 'global_step': 150, 'epoch': 0.71}
76%|███████▌ | 160/210 [1:19:05<24:42, 29.65s/it] {'loss': 0.4297, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.76}
81%|████████ | 170/210 [1:24:01<19:40, 29.51s/it] {'loss': 0.4708, 'learning_rate': 0.0001, 'global_step': 170, 'epoch': 0.81}
86%|████████▌ | 180/210 [1:28:56<14:43, 29.44s/it] {'loss': 0.4872, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.86}
90%|█████████ | 190/210 [1:33:52<09:52, 29.64s/it] {'loss': 0.4888, 'learning_rate': 0.0001, 'global_step': 190, 'epoch': 0.9}
95%|█████████▌| 200/210 [1:38:49<04:55, 29.55s/it] {'loss': 0.4754, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95}
100%|██████████| 210/210 [1:43:54<00:00, 30.92s/it] {'loss': 0.4733, 'learning_rate': 0.0001, 'global_step': 210, 'epoch': 1.0}
{'train_runtime': 6234.3462, 'train_samples_per_second': 0.405, 'train_steps_per_second': 0.034, 'train_loss': 0.4878490357171921, 'epoch': 1.0}
100%|██████████| 210/210 [1:43:54<00:00, 29.69s/it]
***** train metrics *****
epoch = 1.0
train_loss = 0.4878
train_runtime = 1:43:54.34
train_samples_per_second = 0.405
train_steps_per_second = 0.034
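
The summary figures above can be re-derived from numbers printed earlier in this log. The sketch below is only a sanity check under stated assumptions: it assumes throughput is the dataset size (2523) and the step count (210) divided by train_runtime, and that the runtime string truncates fractional seconds to two digits. The variable names are illustrative and are not taken from the training script.

```python
import datetime

# Values copied from the log above.
trainable_params = 319_815_680
all_params = 7_058_231_296
num_samples = 2523          # "there are 2523 data in dataset"
total_steps = 210           # progress bar total
train_runtime = 6234.3462   # seconds, from 'train_runtime'

# Share of parameters that are trainable, as a percentage.
trainable_pct = 100 * trainable_params / all_params

# Throughput, assuming one pass over the dataset (epoch = 1.0).
samples_per_second = round(num_samples / train_runtime, 3)
steps_per_second = round(total_steps / train_runtime, 3)

# Runtime as H:MM:SS.cc, truncating (not rounding) the fractional part.
whole_seconds = int(train_runtime)
centiseconds = int((train_runtime - whole_seconds) * 100)
runtime_text = f"{datetime.timedelta(seconds=whole_seconds)}.{centiseconds:02d}"

print(f"trainable% = {trainable_pct}")
print(f"train_samples_per_second = {samples_per_second}")
print(f"train_steps_per_second = {steps_per_second}")
print(f"train_runtime = {runtime_text}")
```

Under these assumptions the computed values agree with the reported trainable%, train_samples_per_second, train_steps_per_second, and train_runtime lines.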