model training desc:  initialize model training...
2023-12-30 17:59:51.341 | INFO     | __main__:init_components:108 - Initializing components...

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:09<00:09,  9.01s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00,  5.18s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00,  5.75s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2023-12-30 18:00:03.415 | INFO     | __main__:init_components:155 - 

2023-12-30 18:00:03.415 | INFO     | __main__:init_components:156 - ********************
2023-12-30 18:00:03.415 | INFO     | __main__:init_components:157 - using llama2 model
2023-12-30 18:00:03.415 | INFO     | __main__:init_components:158 - ********************
2023-12-30 18:00:03.415 | INFO     | __main__:init_components:159 - 

memory footprint of model: 4.024436950683594 GB
trainable params: 319,815,680 || all params: 7,058,231,296 || trainable%: 4.531102291607305
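The parameter summary above can be reproduced by arithmetic. The sketch below rests on two assumptions. First, the base model is Llama-2-7B, with hidden size 4096, MLP size 11008, 32 layers, a 32000-token vocabulary and an untied LM head. Second, LoRA adapters of rank 128 sit on all seven linear projections of every decoder layer. The log prints neither the rank nor the target modules, so both are inferred from the count and should be treated as hypothetical. Under those assumptions the adapter adds exactly 319,815,680 parameters. The full-precision base count plus the adapter then equals the reported total of 7,058,231,296.

```python
# Hypothetical reconstruction of the "trainable params" line; the LoRA rank and
# target modules are inferred from the numbers, not read from the log.
VOCAB, HIDDEN, INTERMEDIATE, LAYERS = 32000, 4096, 11008, 32  # Llama-2-7B shape


def base_params():
    embed_and_head = 2 * VOCAB * HIDDEN   # embed_tokens + untied lm_head
    attn = 4 * HIDDEN * HIDDEN            # q, k, v, o projections (no bias)
    mlp = 3 * HIDDEN * INTERMEDIATE       # gate, up, down projections (no bias)
    norms = 2 * HIDDEN                    # two RMSNorm weight vectors per layer
    return embed_and_head + LAYERS * (attn + mlp + norms) + HIDDEN  # + final norm


def lora_params(rank):
    # LoRA on a d_in -> d_out linear layer adds A (rank x d_in) and B (d_out x rank).
    attn = 4 * rank * (HIDDEN + HIDDEN)
    mlp = 3 * rank * (HIDDEN + INTERMEDIATE)
    return LAYERS * (attn + mlp)


trainable = lora_params(rank=128)  # hypothetical rank
total = base_params() + trainable
trainable_pct = 100 * trainable / total
print(f"trainable params: {trainable:,} || all params: {total:,} || trainable%: {trainable_pct}")
```

If the true configuration differs, for example a different rank or fewer target modules, `lora_params` will no longer reproduce the logged count. That mismatch makes the function a quick way to test a guess.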
2023-12-30 18:00:48.703 | INFO     | component.dataset:__init__:14 - Loading data: /data0/maqi/KGLQA-data/datasets/QuALITY/Caption/quality_caption_and_rel_instruct/train.jsonl
2023-12-30 18:00:48.807 | INFO     | component.dataset:__init__:19 - there are 2523 data in dataset
2023-12-30 18:00:49.225 | INFO     | __main__:main:231 - *** starting training ***
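The progress bar that follows reports 210 optimizer steps for the 2523 loaded examples in one epoch, about 12 examples per step. The sketch below makes three assumptions: a single process, a DataLoader that keeps the last partial batch, and update steps per epoch counted as the number of batches floor-divided by the gradient-accumulation factor, as recent versions of the Hugging Face Trainer do. A search over small batch sizes and accumulation factors under those assumptions shows that every combination producing 210 has a product of 12. The helper name `update_steps_per_epoch` is hypothetical.

```python
import math

NUM_SAMPLES = 2523   # "there are 2523 data in dataset"
LOGGED_STEPS = 210   # total steps shown by the progress bar


def update_steps_per_epoch(num_samples, batch_size, grad_accum):
    # The last partial batch is kept (drop_last=False), hence the ceiling.
    batches = math.ceil(num_samples / batch_size)
    return max(batches // grad_accum, 1)


matches = [
    (bs, ga)
    for bs in range(1, 17)
    for ga in range(1, 33)
    if update_steps_per_epoch(NUM_SAMPLES, bs, ga) == LOGGED_STEPS
]
print(matches)
```

A batch size of 12 with no accumulation would give 211 steps rather than 210, because the partial final batch still counts as a step. This suggests, without proving it, that gradient accumulation was in use.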

  0%|          | 0/210 [00:00<?, ?it/s]
  0%|          | 1/210 [00:33<1:58:19, 33.97s/it]
  1%|          | 2/210 [01:03<1:48:04, 31.18s/it]
  1%|▏         | 3/210 [01:32<1:44:41, 30.34s/it]
  2%|▏         | 4/210 [02:01<1:42:54, 29.97s/it]
  2%|▏         | 5/210 [02:31<1:41:41, 29.76s/it]
  3%|▎         | 6/210 [03:01<1:41:11, 29.76s/it]
  3%|▎         | 7/210 [03:30<1:40:18, 29.65s/it]
  4%|▍         | 8/210 [04:00<1:39:55, 29.68s/it]
  4%|▍         | 9/210 [04:29<1:39:06, 29.58s/it]
  5%|▍         | 10/210 [04:58<1:38:23, 29.52s/it]
                                                  
{'loss': 0.5879, 'learning_rate': 4.761904761904762e-05, 'global_step': 10, 'epoch': 0.05}

  5%|▍         | 10/210 [04:59<1:38:23, 29.52s/it]
  5%|▌         | 11/210 [05:28<1:38:06, 29.58s/it]
  6%|▌         | 12/210 [05:58<1:37:45, 29.62s/it]
  6%|▌         | 13/210 [06:28<1:37:21, 29.65s/it]
  7%|▋         | 14/210 [06:57<1:36:36, 29.57s/it]
  7%|▋         | 15/210 [07:27<1:36:17, 29.63s/it]
  8%|▊         | 16/210 [07:57<1:35:56, 29.67s/it]
  8%|▊         | 17/210 [08:26<1:35:31, 29.70s/it]
  9%|▊         | 18/210 [08:56<1:35:05, 29.71s/it]
  9%|▉         | 19/210 [09:27<1:35:15, 29.92s/it]
 10%|▉         | 20/210 [09:57<1:34:55, 29.97s/it]
                                                  
{'loss': 0.506, 'learning_rate': 9.523809523809524e-05, 'global_step': 20, 'epoch': 0.1}

 10%|▉         | 20/210 [09:57<1:34:55, 29.97s/it]
 10%|█         | 21/210 [10:26<1:33:54, 29.81s/it]
 10%|█         | 22/210 [10:56<1:33:39, 29.89s/it]
 11%|█         | 23/210 [11:26<1:32:43, 29.75s/it]
 11%|█▏        | 24/210 [11:55<1:31:56, 29.66s/it]
 12%|█▏        | 25/210 [12:24<1:31:14, 29.59s/it]
 12%|█▏        | 26/210 [12:54<1:30:36, 29.55s/it]
 13%|█▎        | 27/210 [13:24<1:30:16, 29.60s/it]
 13%|█▎        | 28/210 [13:53<1:29:29, 29.50s/it]
 14%|█▍        | 29/210 [14:23<1:29:11, 29.57s/it]
 14%|█▍        | 30/210 [14:53<1:29:08, 29.72s/it]
                                                  
{'loss': 0.5252, 'learning_rate': 0.0001, 'global_step': 30, 'epoch': 0.14}
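The logged learning rates are 4.76e-05 at step 10, 9.52e-05 at step 20, and then 1e-4 from step 30 through step 210. They fit a linear warmup over 21 steps followed by a constant rate: 10/21 and 20/21 of 1e-4 reproduce the first two values, and the rate never decays afterwards. The log does not name the scheduler or the warmup length, so `WARMUP_STEPS = 21` below is inferred and hypothetical. The function mirrors the shape of `get_constant_schedule_with_warmup` in transformers rather than calling it.

```python
import math

BASE_LR = 1e-4      # plateau value logged from step 30 onward
WARMUP_STEPS = 21   # hypothetical: inferred from the logged values, not printed in the log


def lr_at(step):
    # Linear warmup from 0 to BASE_LR over WARMUP_STEPS steps, then constant.
    if step < WARMUP_STEPS:
        return BASE_LR * (step / max(1, WARMUP_STEPS))
    return BASE_LR


# Learning rates reported in the log, keyed by global_step.
logged = {10: 4.761904761904762e-05, 20: 9.523809523809524e-05, 30: 0.0001, 210: 0.0001}
mismatches = [
    step for step, value in logged.items()
    if not math.isclose(lr_at(step), value, rel_tol=1e-12)
]
print("mismatched steps:", mismatches)
```

A decaying schedule such as linear or cosine would report values below 1e-4 after warmup. The flat 1e-4 up to step 210 is what points to a constant-after-warmup shape.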

 14%|█▍        | 30/210 [14:53<1:29:08, 29.72s/it]
 15%|█▍        | 31/210 [15:22<1:28:25, 29.64s/it]
 15%|█▌        | 32/210 [15:52<1:28:16, 29.76s/it]
 16%|█▌        | 33/210 [16:22<1:27:45, 29.75s/it]
 16%|█▌        | 34/210 [16:51<1:26:58, 29.65s/it]
 17%|█▋        | 35/210 [17:21<1:26:17, 29.59s/it]
 17%|█▋        | 36/210 [17:50<1:25:39, 29.54s/it]
 18%|█▊        | 37/210 [18:20<1:25:38, 29.70s/it]
 18%|█▊        | 38/210 [18:50<1:24:54, 29.62s/it]
 19%|█▊        | 39/210 [19:19<1:24:15, 29.56s/it]
 19%|█▉        | 40/210 [19:49<1:24:10, 29.71s/it]
                                                  
{'loss': 0.5572, 'learning_rate': 0.0001, 'global_step': 40, 'epoch': 0.19}

 19%|█▉        | 40/210 [19:49<1:24:10, 29.71s/it]
 20%|█▉        | 41/210 [20:19<1:23:43, 29.73s/it]
 20%|██        | 42/210 [20:48<1:23:00, 29.64s/it]
 20%|██        | 43/210 [21:18<1:22:36, 29.68s/it]
 21%|██        | 44/210 [21:48<1:22:11, 29.71s/it]
 21%|██▏       | 45/210 [22:18<1:22:00, 29.82s/it]
 22%|██▏       | 46/210 [22:48<1:21:27, 29.80s/it]
 22%|██▏       | 47/210 [23:18<1:20:56, 29.80s/it]
 23%|██▎       | 48/210 [23:47<1:20:09, 29.69s/it]
 23%|██▎       | 49/210 [24:16<1:19:28, 29.62s/it]
 24%|██▍       | 50/210 [24:46<1:19:05, 29.66s/it]
                                                  
{'loss': 0.4937, 'learning_rate': 0.0001, 'global_step': 50, 'epoch': 0.24}

 24%|██▍       | 50/210 [24:46<1:19:05, 29.66s/it]
 24%|██▍       | 51/210 [25:16<1:18:25, 29.60s/it]
 25%|██▍       | 52/210 [25:45<1:17:48, 29.55s/it]
 25%|██▌       | 53/210 [26:15<1:17:28, 29.61s/it]
 26%|██▌       | 54/210 [26:44<1:16:50, 29.55s/it]
 26%|██▌       | 55/210 [27:14<1:16:45, 29.71s/it]
 27%|██▋       | 56/210 [27:44<1:16:18, 29.73s/it]
 27%|██▋       | 57/210 [28:14<1:15:50, 29.74s/it]
 28%|██▊       | 58/210 [28:44<1:15:22, 29.75s/it]
 28%|██▊       | 59/210 [29:13<1:14:53, 29.76s/it]
 29%|██▊       | 60/210 [29:43<1:14:09, 29.66s/it]
                                                  
{'loss': 0.4925, 'learning_rate': 0.0001, 'global_step': 60, 'epoch': 0.29}

 29%|██▊       | 60/210 [29:43<1:14:09, 29.66s/it]
 29%|██▉       | 61/210 [30:12<1:13:31, 29.61s/it]
 30%|██▉       | 62/210 [30:42<1:12:55, 29.56s/it]
 30%|███       | 63/210 [31:12<1:12:35, 29.63s/it]
 30%|███       | 64/210 [31:41<1:11:56, 29.56s/it]
 31%|███       | 65/210 [32:10<1:11:23, 29.54s/it]
 31%|███▏      | 66/210 [32:40<1:11:03, 29.61s/it]
 32%|███▏      | 67/210 [33:10<1:10:27, 29.56s/it]
 32%|███▏      | 68/210 [33:39<1:09:52, 29.53s/it]
 33%|███▎      | 69/210 [34:09<1:09:19, 29.50s/it]
 33%|███▎      | 70/210 [34:38<1:08:48, 29.49s/it]
                                                  
{'loss': 0.4309, 'learning_rate': 0.0001, 'global_step': 70, 'epoch': 0.33}

 33%|███▎      | 70/210 [34:38<1:08:48, 29.49s/it]
 34%|███▍      | 71/210 [35:07<1:08:18, 29.48s/it]
 34%|███▍      | 72/210 [35:37<1:07:47, 29.48s/it]
 35%|███▍      | 73/210 [36:06<1:07:16, 29.46s/it]
 35%|███▌      | 74/210 [36:36<1:07:00, 29.56s/it]
 36%|███▌      | 75/210 [37:06<1:06:26, 29.53s/it]
 36%|███▌      | 76/210 [37:35<1:06:05, 29.59s/it]
 37%|███▋      | 77/210 [38:05<1:05:41, 29.64s/it]
 37%|███▋      | 78/210 [38:35<1:05:03, 29.57s/it]
 38%|███▊      | 79/210 [39:04<1:04:29, 29.54s/it]
 38%|███▊      | 80/210 [39:33<1:03:55, 29.51s/it]
                                                  
{'loss': 0.4831, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.38}

 38%|███▊      | 80/210 [39:33<1:03:55, 29.51s/it]
 39%|███▊      | 81/210 [40:03<1:03:34, 29.57s/it]
 39%|███▉      | 82/210 [40:33<1:03:12, 29.63s/it]
 40%|███▉      | 83/210 [41:03<1:02:47, 29.67s/it]
 40%|████      | 84/210 [41:32<1:02:21, 29.70s/it]
 40%|████      | 85/210 [42:02<1:01:42, 29.62s/it]
 41%|████      | 86/210 [42:31<1:01:05, 29.56s/it]
 41%|████▏     | 87/210 [43:01<1:00:32, 29.53s/it]
 42%|████▏     | 88/210 [43:30<59:59, 29.51s/it]  
 42%|████▏     | 89/210 [44:00<59:38, 29.58s/it]
 43%|████▎     | 90/210 [44:29<59:04, 29.53s/it]
                                                
{'loss': 0.4896, 'learning_rate': 0.0001, 'global_step': 90, 'epoch': 0.43}

 43%|████▎     | 90/210 [44:29<59:04, 29.53s/it]
 43%|████▎     | 91/210 [44:59<58:43, 29.61s/it]
 44%|████▍     | 92/210 [45:29<58:18, 29.65s/it]
 44%|████▍     | 93/210 [45:58<57:41, 29.59s/it]
 45%|████▍     | 94/210 [46:28<57:29, 29.74s/it]
 45%|████▌     | 95/210 [46:58<57:00, 29.74s/it]
 46%|████▌     | 96/210 [47:28<56:19, 29.64s/it]
 46%|████▌     | 97/210 [47:57<55:41, 29.57s/it]
 47%|████▋     | 98/210 [48:27<55:18, 29.63s/it]
 47%|████▋     | 99/210 [48:56<54:42, 29.57s/it]
 48%|████▊     | 100/210 [49:26<54:18, 29.62s/it]
                                                 
{'loss': 0.4257, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48}

 48%|████▊     | 100/210 [49:26<54:18, 29.62s/it]
 48%|████▊     | 101/210 [49:59<55:56, 30.80s/it]
 49%|████▊     | 102/210 [50:29<54:42, 30.39s/it]
 49%|████▉     | 103/210 [50:59<53:50, 30.20s/it]
 50%|████▉     | 104/210 [51:28<52:57, 29.98s/it]
 50%|█████     | 105/210 [51:58<52:20, 29.91s/it]
 50%|█████     | 106/210 [52:27<51:35, 29.77s/it]
 51%|█████     | 107/210 [52:57<51:05, 29.77s/it]
 51%|█████▏    | 108/210 [53:27<50:35, 29.76s/it]
 52%|█████▏    | 109/210 [53:56<49:54, 29.65s/it]
 52%|█████▏    | 110/210 [54:26<49:19, 29.59s/it]
                                                 
{'loss': 0.5, 'learning_rate': 0.0001, 'global_step': 110, 'epoch': 0.52}

 52%|█████▏    | 110/210 [54:26<49:19, 29.59s/it]
 53%|█████▎    | 111/210 [54:55<48:54, 29.64s/it]
 53%|█████▎    | 112/210 [55:25<48:16, 29.56s/it]
 54%|█████▍    | 113/210 [55:54<47:44, 29.53s/it]
 54%|█████▍    | 114/210 [56:24<47:21, 29.59s/it]
 55%|█████▍    | 115/210 [56:53<46:46, 29.54s/it]
 55%|█████▌    | 116/210 [57:24<46:31, 29.70s/it]
 56%|█████▌    | 117/210 [57:53<46:03, 29.71s/it]
 56%|█████▌    | 118/210 [58:23<45:25, 29.63s/it]
 57%|█████▋    | 119/210 [58:52<44:59, 29.66s/it]
 57%|█████▋    | 120/210 [59:22<44:23, 29.59s/it]
                                                 
{'loss': 0.4954, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.57}

 57%|█████▋    | 120/210 [59:22<44:23, 29.59s/it]
 58%|█████▊    | 121/210 [59:52<43:55, 29.61s/it]
 58%|█████▊    | 122/210 [1:00:21<43:29, 29.65s/it]
 59%|█████▊    | 123/210 [1:00:51<42:53, 29.58s/it]
 59%|█████▉    | 124/210 [1:01:20<42:19, 29.53s/it]
 60%|█████▉    | 125/210 [1:01:50<41:56, 29.60s/it]
 60%|██████    | 126/210 [1:02:19<41:21, 29.54s/it]
 60%|██████    | 127/210 [1:02:49<40:48, 29.50s/it]
 61%|██████    | 128/210 [1:03:18<40:19, 29.50s/it]
 61%|██████▏   | 129/210 [1:03:48<39:55, 29.58s/it]
 62%|██████▏   | 130/210 [1:04:17<39:22, 29.53s/it]
                                                   
{'loss': 0.4691, 'learning_rate': 0.0001, 'global_step': 130, 'epoch': 0.62}

 62%|██████▏   | 130/210 [1:04:17<39:22, 29.53s/it]
 62%|██████▏   | 131/210 [1:04:47<38:49, 29.49s/it]
 63%|██████▎   | 132/210 [1:05:16<38:18, 29.47s/it]
 63%|██████▎   | 133/210 [1:05:46<37:47, 29.44s/it]
 64%|██████▍   | 134/210 [1:06:15<37:16, 29.43s/it]
 64%|██████▍   | 135/210 [1:06:45<36:53, 29.52s/it]
 65%|██████▍   | 136/210 [1:07:14<36:21, 29.49s/it]
 65%|██████▌   | 137/210 [1:07:44<35:51, 29.47s/it]
 66%|██████▌   | 138/210 [1:08:13<35:27, 29.55s/it]
 66%|██████▌   | 139/210 [1:08:43<35:02, 29.61s/it]
 67%|██████▋   | 140/210 [1:09:13<34:35, 29.65s/it]
                                                   
{'loss': 0.4373, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.67}

 67%|██████▋   | 140/210 [1:09:13<34:35, 29.65s/it]
 67%|██████▋   | 141/210 [1:09:42<34:07, 29.67s/it]
 68%|██████▊   | 142/210 [1:10:12<33:39, 29.69s/it]
 68%|██████▊   | 143/210 [1:10:42<33:16, 29.80s/it]
 69%|██████▊   | 144/210 [1:11:12<32:39, 29.69s/it]
 69%|██████▉   | 145/210 [1:11:41<32:10, 29.70s/it]
 70%|██████▉   | 146/210 [1:12:11<31:35, 29.62s/it]
 70%|███████   | 147/210 [1:12:40<31:02, 29.56s/it]
 70%|███████   | 148/210 [1:13:10<30:30, 29.52s/it]
 71%|███████   | 149/210 [1:13:39<29:58, 29.49s/it]
 71%|███████▏  | 150/210 [1:14:09<29:28, 29.47s/it]
                                                   
{'loss': 0.526, 'learning_rate': 0.0001, 'global_step': 150, 'epoch': 0.71}

 71%|███████▏  | 150/210 [1:14:09<29:28, 29.47s/it]
 72%|███████▏  | 151/210 [1:14:39<29:08, 29.64s/it]
 72%|███████▏  | 152/210 [1:15:08<28:35, 29.58s/it]
 73%|███████▎  | 153/210 [1:15:38<28:14, 29.73s/it]
 73%|███████▎  | 154/210 [1:16:07<27:39, 29.63s/it]
 74%|███████▍  | 155/210 [1:16:37<27:10, 29.65s/it]
 74%|███████▍  | 156/210 [1:17:07<26:37, 29.58s/it]
 75%|███████▍  | 157/210 [1:17:36<26:04, 29.52s/it]
 75%|███████▌  | 158/210 [1:18:06<25:36, 29.55s/it]
 76%|███████▌  | 159/210 [1:18:35<25:09, 29.60s/it]
 76%|███████▌  | 160/210 [1:19:05<24:42, 29.65s/it]
                                                   
{'loss': 0.4297, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.76}

 76%|███████▌  | 160/210 [1:19:05<24:42, 29.65s/it]
 77%|███████▋  | 161/210 [1:19:35<24:14, 29.68s/it]
 77%|███████▋  | 162/210 [1:20:04<23:41, 29.60s/it]
 78%|███████▊  | 163/210 [1:20:34<23:12, 29.64s/it]
 78%|███████▊  | 164/210 [1:21:04<22:44, 29.66s/it]
 79%|███████▊  | 165/210 [1:21:33<22:15, 29.67s/it]
 79%|███████▉  | 166/210 [1:22:03<21:40, 29.56s/it]
 80%|███████▉  | 167/210 [1:22:32<21:08, 29.51s/it]
 80%|████████  | 168/210 [1:23:01<20:38, 29.48s/it]
 80%|████████  | 169/210 [1:23:31<20:11, 29.56s/it]
 81%|████████  | 170/210 [1:24:01<19:40, 29.51s/it]
                                                   
{'loss': 0.4708, 'learning_rate': 0.0001, 'global_step': 170, 'epoch': 0.81}

 81%|████████  | 170/210 [1:24:01<19:40, 29.51s/it]
 81%|████████▏ | 171/210 [1:24:30<19:13, 29.57s/it]
 82%|████████▏ | 172/210 [1:25:00<18:42, 29.53s/it]
 82%|████████▏ | 173/210 [1:25:30<18:18, 29.68s/it]
 83%|████████▎ | 174/210 [1:25:59<17:45, 29.61s/it]
 83%|████████▎ | 175/210 [1:26:29<17:14, 29.54s/it]
 84%|████████▍ | 176/210 [1:26:58<16:43, 29.51s/it]
 84%|████████▍ | 177/210 [1:27:28<16:15, 29.57s/it]
 85%|████████▍ | 178/210 [1:27:57<15:44, 29.52s/it]
 85%|████████▌ | 179/210 [1:28:27<15:13, 29.48s/it]
 86%|████████▌ | 180/210 [1:28:56<14:43, 29.44s/it]
                                                   
{'loss': 0.4872, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.86}

 86%|████████▌ | 180/210 [1:28:56<14:43, 29.44s/it]
 86%|████████▌ | 181/210 [1:29:26<14:16, 29.52s/it]
 87%|████████▋ | 182/210 [1:29:55<13:45, 29.48s/it]
 87%|████████▋ | 183/210 [1:30:25<13:17, 29.55s/it]
 88%|████████▊ | 184/210 [1:30:54<12:49, 29.60s/it]
 88%|████████▊ | 185/210 [1:31:24<12:18, 29.54s/it]
 89%|████████▊ | 186/210 [1:31:53<11:47, 29.50s/it]
 89%|████████▉ | 187/210 [1:32:23<11:17, 29.47s/it]
 90%|████████▉ | 188/210 [1:32:52<10:49, 29.54s/it]
 90%|█████████ | 189/210 [1:33:22<10:21, 29.60s/it]
 90%|█████████ | 190/210 [1:33:52<09:52, 29.64s/it]
                                                   
{'loss': 0.4888, 'learning_rate': 0.0001, 'global_step': 190, 'epoch': 0.9}

 90%|█████████ | 190/210 [1:33:52<09:52, 29.64s/it]
 91%|█████████ | 191/210 [1:34:22<09:25, 29.76s/it]
 91%|█████████▏| 192/210 [1:34:52<08:55, 29.75s/it]
 92%|█████████▏| 193/210 [1:35:21<08:25, 29.75s/it]
 92%|█████████▏| 194/210 [1:35:51<07:55, 29.74s/it]
 93%|█████████▎| 195/210 [1:36:20<07:24, 29.65s/it]
 93%|█████████▎| 196/210 [1:36:50<06:55, 29.67s/it]
 94%|█████████▍| 197/210 [1:37:20<06:27, 29.79s/it]
 94%|█████████▍| 198/210 [1:37:50<05:56, 29.71s/it]
 95%|█████████▍| 199/210 [1:38:19<05:25, 29.61s/it]
 95%|█████████▌| 200/210 [1:38:49<04:55, 29.55s/it]
                                                   
{'loss': 0.4754, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95}

 95%|█████████▌| 200/210 [1:38:49<04:55, 29.55s/it]
 96%|█████████▌| 201/210 [1:39:22<04:36, 30.70s/it]
 96%|█████████▌| 202/210 [1:39:51<04:02, 30.31s/it]
 97%|█████████▋| 203/210 [1:40:21<03:30, 30.05s/it]
 97%|█████████▋| 204/210 [1:40:50<02:59, 29.86s/it]
 98%|█████████▊| 205/210 [1:41:20<02:28, 29.73s/it]
 98%|█████████▊| 206/210 [1:41:49<01:58, 29.64s/it]
 99%|█████████▊| 207/210 [1:42:19<01:28, 29.58s/it]
 99%|█████████▉| 208/210 [1:42:48<00:59, 29.53s/it]
100%|█████████▉| 209/210 [1:43:24<00:31, 31.55s/it]
100%|██████████| 210/210 [1:43:54<00:00, 30.92s/it]
                                                   
{'loss': 0.4733, 'learning_rate': 0.0001, 'global_step': 210, 'epoch': 1.0}

100%|██████████| 210/210 [1:43:54<00:00, 30.92s/it]
                                                   
{'train_runtime': 6234.3462, 'train_samples_per_second': 0.405, 'train_steps_per_second': 0.034, 'train_loss': 0.4878490357171921, 'epoch': 1.0}
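The final `train_loss` can be cross-checked against the periodic log entries. In the Hugging Face Trainer, each logged `loss` is the mean over the steps since the previous log, and the 210 steps here split into 21 equal windows of 10. The mean of the 21 logged values should therefore match the run-wide average, up to the 4-decimal rounding of each entry. A minimal sketch that parses the dict lines with `ast.literal_eval`:

```python
import ast

# The 21 periodic log entries, copied from the run above.
LOG_EXCERPT = """\
{'loss': 0.5879, 'learning_rate': 4.761904761904762e-05, 'global_step': 10, 'epoch': 0.05}
{'loss': 0.506, 'learning_rate': 9.523809523809524e-05, 'global_step': 20, 'epoch': 0.1}
{'loss': 0.5252, 'learning_rate': 0.0001, 'global_step': 30, 'epoch': 0.14}
{'loss': 0.5572, 'learning_rate': 0.0001, 'global_step': 40, 'epoch': 0.19}
{'loss': 0.4937, 'learning_rate': 0.0001, 'global_step': 50, 'epoch': 0.24}
{'loss': 0.4925, 'learning_rate': 0.0001, 'global_step': 60, 'epoch': 0.29}
{'loss': 0.4309, 'learning_rate': 0.0001, 'global_step': 70, 'epoch': 0.33}
{'loss': 0.4831, 'learning_rate': 0.0001, 'global_step': 80, 'epoch': 0.38}
{'loss': 0.4896, 'learning_rate': 0.0001, 'global_step': 90, 'epoch': 0.43}
{'loss': 0.4257, 'learning_rate': 0.0001, 'global_step': 100, 'epoch': 0.48}
{'loss': 0.5, 'learning_rate': 0.0001, 'global_step': 110, 'epoch': 0.52}
{'loss': 0.4954, 'learning_rate': 0.0001, 'global_step': 120, 'epoch': 0.57}
{'loss': 0.4691, 'learning_rate': 0.0001, 'global_step': 130, 'epoch': 0.62}
{'loss': 0.4373, 'learning_rate': 0.0001, 'global_step': 140, 'epoch': 0.67}
{'loss': 0.526, 'learning_rate': 0.0001, 'global_step': 150, 'epoch': 0.71}
{'loss': 0.4297, 'learning_rate': 0.0001, 'global_step': 160, 'epoch': 0.76}
{'loss': 0.4708, 'learning_rate': 0.0001, 'global_step': 170, 'epoch': 0.81}
{'loss': 0.4872, 'learning_rate': 0.0001, 'global_step': 180, 'epoch': 0.86}
{'loss': 0.4888, 'learning_rate': 0.0001, 'global_step': 190, 'epoch': 0.9}
{'loss': 0.4754, 'learning_rate': 0.0001, 'global_step': 200, 'epoch': 0.95}
{'loss': 0.4733, 'learning_rate': 0.0001, 'global_step': 210, 'epoch': 1.0}
"""


def parse_losses(text):
    # Each metrics line is a Python dict literal, so literal_eval parses it safely.
    entries = [ast.literal_eval(line) for line in text.splitlines() if line.startswith("{'loss'")]
    return [entry["loss"] for entry in entries]


losses = parse_losses(LOG_EXCERPT)
mean_loss = sum(losses) / len(losses)
print(f"{len(losses)} entries, mean loss {mean_loss:.4f} vs reported train_loss 0.4878")
```

The equality only holds because every window has the same length. If `logging_steps` did not divide the total step count, the last window would be shorter and a simple mean of the logged values would be slightly biased.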

100%|██████████| 210/210 [1:43:54<00:00, 30.92s/it]
100%|██████████| 210/210 [1:43:54<00:00, 29.69s/it]
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     0.4878
  train_runtime            = 1:43:54.34
  train_samples_per_second =      0.405
  train_steps_per_second   =      0.034
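The summary figures are simple ratios of numbers printed earlier in the log: 2523 examples and 210 steps over 6234.3462 seconds. The sketch below assumes the throughput metrics are rounded to three decimals, which matches the printed values. `datetime.timedelta` is used only to render the runtime in the same h:mm:ss form.

```python
import datetime

NUM_SAMPLES = 2523          # "there are 2523 data in dataset"
NUM_STEPS = 210             # total optimizer steps from the progress bar
TRAIN_RUNTIME = 6234.3462   # seconds, from the final metrics dict

samples_per_second = round(NUM_SAMPLES / TRAIN_RUNTIME, 3)
steps_per_second = round(NUM_STEPS / TRAIN_RUNTIME, 3)
seconds_per_step = TRAIN_RUNTIME / NUM_STEPS   # average over the whole run
runtime_hms = datetime.timedelta(seconds=TRAIN_RUNTIME)

print(samples_per_second, steps_per_second, round(seconds_per_step, 2), runtime_hms)
```

The average of about 29.69 seconds per step agrees with the rate on the closing progress-bar line. The runtime therefore consists almost entirely of the 210 training steps.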