---
license: apache-2.0
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: lc_full
  results: []
---
|
|
|
|
|
|
|
|
|
|
# lc_full |
|
|
|
|
|
This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset. |
|
|
It achieves the following results on the evaluation set: |
|
|
- Loss: 1.8715 |
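
Since this is a PEFT adapter rather than a full checkpoint, the fine-tuned weights are loaded on top of the base model. A minimal usage sketch, assuming the adapter is published under a hypothetical repo id `your-username/lc_full` (replace with the actual adapter path):

```python
# Sketch: load the Mistral-7B base model and attach the PEFT adapter.
# NOTE: "your-username/lc_full" is a placeholder repo id, not the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# Apply the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, "your-username/lc_full")
model.eval()

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```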
|
|
|
|
|
## Model description |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Intended uses & limitations |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training and evaluation data |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training procedure |
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 50
|
|
|
|
|
### Training results |
|
|
|
|
|
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.7424 | 1.0 | 486 | 1.6914 |
| 1.301 | 2.0 | 972 | 1.6780 |
| 1.5718 | 3.0 | 1458 | 1.6743 |
| 1.6632 | 4.0 | 1944 | 1.6793 |
| 1.8588 | 5.0 | 2430 | 1.6794 |
| 1.5308 | 6.0 | 2916 | 1.6894 |
| 1.5776 | 7.0 | 3402 | 1.6985 |
| 1.6394 | 8.0 | 3888 | 1.7073 |
| 1.4696 | 9.0 | 4374 | 1.7187 |
| 1.4191 | 10.0 | 4860 | 1.7298 |
| 1.4776 | 11.0 | 5346 | 1.7414 |
| 1.4767 | 12.0 | 5832 | 1.7512 |
| 1.3546 | 13.0 | 6318 | 1.7731 |
| 1.542 | 14.0 | 6804 | 1.7610 |
| 1.3709 | 15.0 | 7290 | 1.7679 |
| 1.3167 | 16.0 | 7776 | 1.7936 |
| 1.3563 | 17.0 | 8262 | 1.8007 |
| 1.4615 | 18.0 | 8748 | 1.8008 |
| 1.511 | 19.0 | 9234 | 1.8068 |
| 1.3145 | 20.0 | 9720 | 1.8232 |
| 1.1285 | 21.0 | 10206 | 1.8204 |
| 1.5045 | 22.0 | 10692 | 1.8204 |
| 1.2697 | 23.0 | 11178 | 1.8453 |
| 1.302 | 24.0 | 11664 | 1.8386 |
| 1.4892 | 25.0 | 12150 | 1.8434 |
| 1.5042 | 26.0 | 12636 | 1.8471 |
| 1.1989 | 27.0 | 13122 | 1.8472 |
| 1.2353 | 28.0 | 13608 | 1.8545 |
| 1.145 | 29.0 | 14094 | 1.8560 |
| 1.4146 | 30.0 | 14580 | 1.8612 |
| 1.3598 | 31.0 | 15066 | 1.8611 |
| 1.2659 | 32.0 | 15552 | 1.8695 |
| 1.2085 | 33.0 | 16038 | 1.8631 |
| 1.0623 | 34.0 | 16524 | 1.8679 |
| 1.4594 | 35.0 | 17010 | 1.8694 |
| 1.3038 | 36.0 | 17496 | 1.8685 |
| 1.5902 | 37.0 | 17982 | 1.8695 |
| 1.2771 | 38.0 | 18468 | 1.8709 |
| 1.2738 | 39.0 | 18954 | 1.8698 |
| 1.3209 | 40.0 | 19440 | 1.8707 |
| 1.2578 | 41.0 | 19926 | 1.8709 |
| 1.1108 | 42.0 | 20412 | 1.8717 |
| 1.3264 | 43.0 | 20898 | 1.8711 |
| 1.3152 | 44.0 | 21384 | 1.8709 |
| 1.4287 | 45.0 | 21870 | 1.8709 |
| 1.299 | 46.0 | 22356 | 1.8709 |
| 1.2863 | 47.0 | 22842 | 1.8710 |
| 1.1795 | 48.0 | 23328 | 1.8716 |
| 1.27 | 49.0 | 23814 | 1.8719 |
| 1.3156 | 50.0 | 24300 | 1.8715 |
|
|
|
|
|
|
|
|
### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.1.0+cu118
- Datasets 2.19.2
- Tokenizers 0.19.1