## Setup Notes
For this model, a VM with two NVIDIA T4 GPUs was used.
To train on both GPUs simultaneously, the following command was used to launch the job:
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16
Note 1. The micro batch size was increased from the default of 4 to 16. Based on other training runs, it could likely be raised further; this was a first attempt. The effective-batch arithmetic behind this flag is sketched below.
Note 2. The output directory was initially lora-alpaca; its contents were moved to a new folder when the git repository was initialized.
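For reference, alpaca-lora derives its gradient accumulation steps from the batch size, the per-GPU micro batch size, and the world size, so raising `--micro_batch_size` reduces accumulation rather than changing the effective batch. A minimal sketch of that arithmetic (illustrative variable names, not the script's exact code):

```python
# Effective-batch arithmetic behind the command above (sketch; alpaca-lora's
# finetune.py computes gradient accumulation along these lines).
batch_size = 128        # finetune.py default, visible in the params dump in the log
micro_batch_size = 16   # per-GPU batch passed on the command line
world_size = 2          # WORLD_SIZE=2, one process per T4

# Each optimizer step still sees 128 examples in total:
gradient_accumulation_steps = batch_size // micro_batch_size // world_size
assert gradient_accumulation_steps == 4
assert micro_batch_size * world_size * gradient_accumulation_steps == batch_size
```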
## Log
(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
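The bitsandbytes banner and CUDA setup block appear twice because each of the two torchrun processes loads the library to quantize the frozen base model to 8-bit. A rough sketch of the kind of load this corresponds to (an assumption based on alpaca-lora's load_in_8bit path; not the script verbatim):

```python
# Sketch of an 8-bit base-model load (assumption: mirrors the load_in_8bit
# path in alpaca-lora's finetune.py; not copied from the script).
import os
import torch
from transformers import LlamaForCausalLM

local_rank = int(os.environ.get("LOCAL_RANK", 0))
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,             # bitsandbytes int8 weights for the frozen base
    torch_dtype=torch.float16,
    device_map={"": local_rank},   # pin each DDP process to its own T4
)
```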
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: b-mc2/sql-create-context
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 16
num_epochs: 1
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca
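The LoRA hyperparameters in this dump map directly onto a peft `LoraConfig`. A minimal sketch of that mapping (this mirrors what finetune.py does; not a verbatim excerpt):

```python
# How the printed LoRA hyperparameters map onto peft (sketch, not a verbatim
# excerpt of finetune.py).
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # lora_r
    lora_alpha=16,                        # lora_alpha
    target_modules=["q_proj", "v_proj"],  # lora_target_modules
    lora_dropout=0.05,                    # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
)
# model = get_peft_model(model, lora_config)  # wraps the frozen 8-bit base model
```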
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00, 2.57s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00, 2.57s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
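These tokenizer warnings are expected with this checkpoint: decapoda-research/llama-7b-hf was published with the pre-rename class name `LLaMATokenizer` in its tokenizer_config.json, while current transformers uses `LlamaTokenizer`. The mismatch is benign here and loading still succeeds. For illustration, the explicit load looks roughly like this (a sketch, not taken from the script):

```python
# The checkpoint's tokenizer_config.json names the old class "LLaMATokenizer",
# so transformers prints the mismatch warning above; loading still succeeds.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
tokenizer.pad_token_id = 0  # alpaca-lora pads with token id 0 (assumption based on the repo)
```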
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
  0%|                                                                                        | 0/1 [00:00<?, ?it/s]
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9.30it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7.83it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
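The trainable-parameter count checks out against the config above: LLaMA-7B has 32 decoder layers with hidden size 4096, and each targeted projection (q_proj, v_proj) gets a pair of rank-8 LoRA matrices. A quick verification (assumes those LLaMA-7B dimensions):

```python
# Verifying "trainable params: 4194304" from the LoRA config above
# (assumes LLaMA-7B dimensions: 32 layers, hidden size 4096).
hidden_size, num_layers, r = 4096, 32, 8
params_per_module = hidden_size * r + r * hidden_size   # LoRA A and B matrices
trainable = params_per_module * 2 * num_layers          # q_proj and v_proj per layer
assert trainable == 4_194_304

total = 6_742_609_920                                   # "all params" from the log
print(f"trainable%: {100 * trainable / total:.6f}")     # ~0.0622, matching the log
```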
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
{'loss': 2.7003, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.02}
{'loss': 2.566, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.03}
{'loss': 2.2648, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.05}
{'loss': 1.657, 'learning_rate': 0.00011099999999999999, 'epoch': 0.07}
{'loss': 1.1599, 'learning_rate': 0.00014099999999999998, 'epoch': 0.08}
{'loss': 0.9037, 'learning_rate': 0.00017099999999999998, 'epoch': 0.1}
{'loss': 0.8137, 'learning_rate': 0.000201, 'epoch': 0.12}
{'loss': 0.7827, 'learning_rate': 0.00023099999999999998, 'epoch': 0.13}
{'loss': 0.7554, 'learning_rate': 0.000261, 'epoch': 0.15}
{'loss': 0.7357, 'learning_rate': 0.00029099999999999997, 'epoch': 0.17}
{'loss': 0.6893, 'learning_rate': 0.0002957831325301205, 'epoch': 0.18}
{'loss': 0.6606, 'learning_rate': 0.00028975903614457827, 'epoch': 0.2}
{'loss': 0.6506, 'learning_rate': 0.0002837349397590361, 'epoch': 0.22}
{'loss': 0.6462, 'learning_rate': 0.00027771084337349395, 'epoch': 0.23}
{'loss': 0.6315, 'learning_rate': 0.0002716867469879518, 'epoch': 0.25}
{'loss': 0.6337, 'learning_rate': 0.0002656626506024096, 'epoch': 0.27}
{'loss': 0.6223, 'learning_rate': 0.00025963855421686746, 'epoch': 0.28}
{'loss': 0.6136, 'learning_rate': 0.00025361445783132525, 'epoch': 0.3}
{'loss': 0.6198, 'learning_rate': 0.00024759036144578314, 'epoch': 0.32}
{'loss': 0.6084, 'learning_rate': 0.00024156626506024095, 'epoch': 0.33}
{'eval_loss': 0.608456552028656, 'eval_runtime': 123.856, 'eval_samples_per_second': 16.148, 'eval_steps_per_second': 1.009, 'epoch': 0.33}
{'loss': 0.6021, 'learning_rate': 0.00023554216867469876, 'epoch': 0.35}
{'loss': 0.5949, 'learning_rate': 0.0002295180722891566, 'epoch': 0.37}
{'loss': 0.5972, 'learning_rate': 0.00022349397590361444, 'epoch': 0.38}
{'loss': 0.5922, 'learning_rate': 0.00021746987951807228, 'epoch': 0.4}
{'loss': 0.5876, 'learning_rate': 0.0002114457831325301, 'epoch': 0.42}
{'loss': 0.5788, 'learning_rate': 0.00020542168674698793, 'epoch': 0.43}
{'loss': 0.5894, 'learning_rate': 0.0001993975903614458, 'epoch': 0.45}
{'loss': 0.5877, 'learning_rate': 0.0001933734939759036, 'epoch': 0.47}
{'loss': 0.5835, 'learning_rate': 0.00018734939759036142, 'epoch': 0.48}
{'loss': 0.5791, 'learning_rate': 0.00018132530120481925, 'epoch': 0.5}
{'loss': 0.5841, 'learning_rate': 0.00017530120481927712, 'epoch': 0.52}
{'loss': 0.5728, 'learning_rate': 0.00016927710843373493, 'epoch': 0.53}
{'loss': 0.569, 'learning_rate': 0.00016325301204819274, 'epoch': 0.55}
{'loss': 0.5709, 'learning_rate': 0.00015722891566265058, 'epoch': 0.57}
{'loss': 0.5762, 'learning_rate': 0.00015120481927710845, 'epoch': 0.58}
{'loss': 0.5704, 'learning_rate': 0.00014518072289156626, 'epoch': 0.6}
{'loss': 0.5661, 'learning_rate': 0.0001391566265060241, 'epoch': 0.62}
{'loss': 0.5662, 'learning_rate': 0.00013313253012048193, 'epoch': 0.63}
{'loss': 0.5674, 'learning_rate': 0.00012710843373493975, 'epoch': 0.65}
{'loss': 0.5635, 'learning_rate': 0.00012108433734939758, 'epoch': 0.67}
{'eval_loss': 0.568750262260437, 'eval_runtime': 122.9061, 'eval_samples_per_second': 16.273, 'eval_steps_per_second': 1.017, 'epoch': 0.67}
{'loss': 0.5609, 'learning_rate': 0.00011506024096385541, 'epoch': 0.69}
{'loss': 0.5724, 'learning_rate': 0.00010903614457831325, 'epoch': 0.7}
{'loss': 0.5603, 'learning_rate': 0.00010301204819277107, 'epoch': 0.72}
{'loss': 0.5599, 'learning_rate': 9.698795180722891e-05, 'epoch': 0.74}
{'loss': 0.5655, 'learning_rate': 9.096385542168674e-05, 'epoch': 0.75}
{'loss': 0.5578, 'learning_rate': 8.493975903614457e-05, 'epoch': 0.77}
{'loss': 0.5577, 'learning_rate': 7.89156626506024e-05, 'epoch': 0.79}
{'loss': 0.5606, 'learning_rate': 7.289156626506024e-05, 'epoch': 0.8}
{'loss': 0.5496, 'learning_rate': 6.686746987951806e-05, 'epoch': 0.82}
{'loss': 0.5635, 'learning_rate': 6.08433734939759e-05, 'epoch': 0.84}
{'loss': 0.5522, 'learning_rate': 5.481927710843373e-05, 'epoch': 0.85}
{'loss': 0.5572, 'learning_rate': 4.879518072289156e-05, 'epoch': 0.87}
{'loss': 0.5454, 'learning_rate': 4.2771084337349395e-05, 'epoch': 0.89}
{'loss': 0.5485, 'learning_rate': 3.6746987951807227e-05, 'epoch': 0.9}
{'loss': 0.5592, 'learning_rate': 3.072289156626506e-05, 'epoch': 0.92}
{'loss': 0.5499, 'learning_rate': 2.469879518072289e-05, 'epoch': 0.94}
{'loss': 0.55, 'learning_rate': 1.867469879518072e-05, 'epoch': 0.95}
{'loss': 0.5511, 'learning_rate': 1.2650602409638553e-05, 'epoch': 0.97}
{'loss': 0.5531, 'learning_rate': 6.626506024096385e-06, 'epoch': 0.99}
100%|██████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 27.59s/it]
{'train_runtime': 17131.1027, 'train_samples_per_second': 4.47, 'train_steps_per_second': 0.035, 'train_loss': 0.7246327424129116, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 28.65s/it]
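After the run, the LoRA adapter weights land in `./lora-alpaca`. A minimal sketch of loading them for inference (assumes the same base checkpoint and the output_dir produced above):

```python
# Loading the trained adapter for inference (sketch; assumes the same base
# checkpoint and the output_dir produced by the run above).
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "./lora-alpaca")
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model.eval()
```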