## Setup Notes
For this model, a VM with two NVIDIA T4 GPUs was used.
To train on both GPUs simultaneously, the following command was used to launch the job:
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16
Note 1. The micro batch size was increased from the default of 4 to 16. Based on other training runs, it could likely be raised further; this was a first attempt. The effective-batch arithmetic behind this flag is sketched below.
Note 2. The output directory was initially lora-alpaca; its contents were moved to a new folder when the git repository was initialized.
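For reference, alpaca-lora derives its gradient accumulation steps from the batch size, the per-GPU micro batch size, and the world size, so raising `--micro_batch_size` reduces accumulation rather than changing the effective batch. A minimal sketch of that arithmetic (illustrative variable names, not the script's exact code):

```python
# Effective-batch arithmetic behind the command above (sketch; alpaca-lora's
# finetune.py computes gradient accumulation along these lines).
batch_size = 128        # finetune.py default, visible in the params dump in the log
micro_batch_size = 16   # per-GPU batch passed on the command line
world_size = 2          # WORLD_SIZE=2, one process per T4

# Each optimizer step still sees 128 examples in total:
gradient_accumulation_steps = batch_size // micro_batch_size // world_size
assert gradient_accumulation_steps == 4
assert micro_batch_size * world_size * gradient_accumulation_steps == batch_size
```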
## Log
(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
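The bitsandbytes banner and CUDA setup block appear twice because each of the two torchrun processes loads the library to quantize the frozen base model to 8-bit. A rough sketch of the kind of load this corresponds to (an assumption based on alpaca-lora's load_in_8bit path; not the script verbatim):

```python
# Sketch of an 8-bit base-model load (assumption: mirrors the load_in_8bit
# path in alpaca-lora's finetune.py; not copied from the script).
import os
import torch
from transformers import LlamaForCausalLM

local_rank = int(os.environ.get("LOCAL_RANK", 0))
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,             # bitsandbytes int8 weights for the frozen base
    torch_dtype=torch.float16,
    device_map={"": local_rank},   # pin each DDP process to its own T4
)
```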
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: b-mc2/sql-create-context
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 16
num_epochs: 1
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca
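The LoRA hyperparameters in this dump map directly onto a peft `LoraConfig`. A minimal sketch of that mapping (this mirrors what finetune.py does; not a verbatim excerpt):

```python
# How the printed LoRA hyperparameters map onto peft (sketch, not a verbatim
# excerpt of finetune.py).
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # lora_r
    lora_alpha=16,                        # lora_alpha
    target_modules=["q_proj", "v_proj"],  # lora_target_modules
    lora_dropout=0.05,                    # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
)
# model = get_peft_model(model, lora_config)  # wraps the frozen 8-bit base model
```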
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00, 2.57s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00, 2.57s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
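These tokenizer warnings are expected with this checkpoint: decapoda-research/llama-7b-hf was published with the pre-rename class name `LLaMATokenizer` in its tokenizer_config.json, while current transformers uses `LlamaTokenizer`. The mismatch is benign here and loading still succeeds. For illustration, the explicit load looks roughly like this (a sketch, not taken from the script):

```python
# The checkpoint's tokenizer_config.json names the old class "LLaMATokenizer",
# so transformers prints the mismatch warning above; loading still succeeds.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
tokenizer.pad_token_id = 0  # alpaca-lora pads with token id 0 (assumption based on the repo)
```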
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
  0%|                                                                                        | 0/1 [00:00<?, ?it/s]
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9.30it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7.83it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
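The trainable-parameter count checks out against the config above: LLaMA-7B has 32 decoder layers with hidden size 4096, and each targeted projection (q_proj, v_proj) gets a pair of rank-8 LoRA matrices. A quick verification (assumes those LLaMA-7B dimensions):

```python
# Verifying "trainable params: 4194304" from the LoRA config above
# (assumes LLaMA-7B dimensions: 32 layers, hidden size 4096).
hidden_size, num_layers, r = 4096, 32, 8
params_per_module = hidden_size * r + r * hidden_size   # LoRA A and B matrices
trainable = params_per_module * 2 * num_layers          # q_proj and v_proj per layer
assert trainable == 4_194_304

total = 6_742_609_920                                   # "all params" from the log
print(f"trainable%: {100 * trainable / total:.6f}")     # ~0.0622, matching the log
```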
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
{'loss': 2.7003, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.02}
{'loss': 2.566, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.03}
{'loss': 2.2648, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.05}
{'loss': 1.657, 'learning_rate': 0.00011099999999999999, 'epoch': 0.07}
{'loss': 1.1599, 'learning_rate': 0.00014099999999999998, 'epoch': 0.08}
{'loss': 0.9037, 'learning_rate': 0.00017099999999999998, 'epoch': 0.1}
{'loss': 0.8137, 'learning_rate': 0.000201, 'epoch': 0.12}
{'loss': 0.7827, 'learning_rate': 0.00023099999999999998, 'epoch': 0.13}
{'loss': 0.7554, 'learning_rate': 0.000261, 'epoch': 0.15}
{'loss': 0.7357, 'learning_rate': 0.00029099999999999997, 'epoch': 0.17}
{'loss': 0.6893, 'learning_rate': 0.0002957831325301205, 'epoch': 0.18}
{'loss': 0.6606, 'learning_rate': 0.00028975903614457827, 'epoch': 0.2}
{'loss': 0.6506, 'learning_rate': 0.0002837349397590361, 'epoch': 0.22}
{'loss': 0.6462, 'learning_rate': 0.00027771084337349395, 'epoch': 0.23}
{'loss': 0.6315, 'learning_rate': 0.0002716867469879518, 'epoch': 0.25}
{'loss': 0.6337, 'learning_rate': 0.0002656626506024096, 'epoch': 0.27}
{'loss': 0.6223, 'learning_rate': 0.00025963855421686746, 'epoch': 0.28}
{'loss': 0.6136, 'learning_rate': 0.00025361445783132525, 'epoch': 0.3}
{'loss': 0.6198, 'learning_rate': 0.00024759036144578314, 'epoch': 0.32}
{'loss': 0.6084, 'learning_rate': 0.00024156626506024095, 'epoch': 0.33}
{'eval_loss': 0.608456552028656, 'eval_runtime': 123.856, 'eval_samples_per_second': 16.148, 'eval_steps_per_second': 1.009, 'epoch': 0.33}
{'loss': 0.6021, 'learning_rate': 0.00023554216867469876, 'epoch': 0.35}
{'loss': 0.5949, 'learning_rate': 0.0002295180722891566, 'epoch': 0.37}
{'loss': 0.5972, 'learning_rate': 0.00022349397590361444, 'epoch': 0.38}
{'loss': 0.5922, 'learning_rate': 0.00021746987951807228, 'epoch': 0.4}
{'loss': 0.5876, 'learning_rate': 0.0002114457831325301, 'epoch': 0.42}
{'loss': 0.5788, 'learning_rate': 0.00020542168674698793, 'epoch': 0.43}
{'loss': 0.5894, 'learning_rate': 0.0001993975903614458, 'epoch': 0.45}
{'loss': 0.5877, 'learning_rate': 0.0001933734939759036, 'epoch': 0.47}
{'loss': 0.5835, 'learning_rate': 0.00018734939759036142, 'epoch': 0.48}
{'loss': 0.5791, 'learning_rate': 0.00018132530120481925, 'epoch': 0.5}
{'loss': 0.5841, 'learning_rate': 0.00017530120481927712, 'epoch': 0.52}
{'loss': 0.5728, 'learning_rate': 0.00016927710843373493, 'epoch': 0.53}
{'loss': 0.569, 'learning_rate': 0.00016325301204819274, 'epoch': 0.55}
{'loss': 0.5709, 'learning_rate': 0.00015722891566265058, 'epoch': 0.57}
{'loss': 0.5762, 'learning_rate': 0.00015120481927710845, 'epoch': 0.58}
{'loss': 0.5704, 'learning_rate': 0.00014518072289156626, 'epoch': 0.6}
{'loss': 0.5661, 'learning_rate': 0.0001391566265060241, 'epoch': 0.62}
{'loss': 0.5662, 'learning_rate': 0.00013313253012048193, 'epoch': 0.63}
{'loss': 0.5674, 'learning_rate': 0.00012710843373493975, 'epoch': 0.65}
{'loss': 0.5635, 'learning_rate': 0.00012108433734939758, 'epoch': 0.67}
{'eval_loss': 0.568750262260437, 'eval_runtime': 122.9061, 'eval_samples_per_second': 16.273, 'eval_steps_per_second': 1.017, 'epoch': 0.67}
{'loss': 0.5609, 'learning_rate': 0.00011506024096385541, 'epoch': 0.69}
{'loss': 0.5724, 'learning_rate': 0.00010903614457831325, 'epoch': 0.7}
{'loss': 0.5603, 'learning_rate': 0.00010301204819277107, 'epoch': 0.72}
{'loss': 0.5599, 'learning_rate': 9.698795180722891e-05, 'epoch': 0.74}
{'loss': 0.5655, 'learning_rate': 9.096385542168674e-05, 'epoch': 0.75}
{'loss': 0.5578, 'learning_rate': 8.493975903614457e-05, 'epoch': 0.77}
{'loss': 0.5577, 'learning_rate': 7.89156626506024e-05, 'epoch': 0.79}
{'loss': 0.5606, 'learning_rate': 7.289156626506024e-05, 'epoch': 0.8}
{'loss': 0.5496, 'learning_rate': 6.686746987951806e-05, 'epoch': 0.82}
{'loss': 0.5635, 'learning_rate': 6.08433734939759e-05, 'epoch': 0.84}
{'loss': 0.5522, 'learning_rate': 5.481927710843373e-05, 'epoch': 0.85}
{'loss': 0.5572, 'learning_rate': 4.879518072289156e-05, 'epoch': 0.87}
{'loss': 0.5454, 'learning_rate': 4.2771084337349395e-05, 'epoch': 0.89}
{'loss': 0.5485, 'learning_rate': 3.6746987951807227e-05, 'epoch': 0.9}
{'loss': 0.5592, 'learning_rate': 3.072289156626506e-05, 'epoch': 0.92}
{'loss': 0.5499, 'learning_rate': 2.469879518072289e-05, 'epoch': 0.94}
{'loss': 0.55, 'learning_rate': 1.867469879518072e-05, 'epoch': 0.95}
{'loss': 0.5511, 'learning_rate': 1.2650602409638553e-05, 'epoch': 0.97}
{'loss': 0.5531, 'learning_rate': 6.626506024096385e-06, 'epoch': 0.99}
100%|██████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 27.59s/it]
{'train_runtime': 17131.1027, 'train_samples_per_second': 4.47, 'train_steps_per_second': 0.035, 'train_loss': 0.7246327424129116, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 28.65s/it]
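After the run, the LoRA adapter weights land in `./lora-alpaca`. A minimal sketch of loading them for inference (assumes the same base checkpoint and the output_dir produced above):

```python
# Loading the trained adapter for inference (sketch; assumes the same base
# checkpoint and the output_dir produced by the run above).
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "./lora-alpaca")
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model.eval()
```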