	---
	license: apache-2.0
	library_name: peft
	tags:
	- trl
	- sft
	- generated_from_trainer
	base_model: mistralai/Mistral-7B-v0.1
	model-index:
	- name: lc_reddit
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# lc_reddit

	This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
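	It achieves the following results on the evaluation set at the end of training (epoch 50, step 15300):
	- Loss: 2.4421

	Validation loss is lowest at epoch 2 (1.8820) and rises in later epochs while training loss keeps falling, so the final checkpoint is likely overfit.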

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- num_epochs: 50
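The cosine schedule listed above can be illustrated with a small sketch. It assumes no warmup (the card lists no warmup setting) and a half-cosine decay to zero over the whole run. The steps-per-epoch value of 306 is taken from the training results table in this card. `cosine_lr` is a hypothetical helper written for illustration, not a function from the Trainer.

```python
import math

BASE_LR = 2e-05          # learning_rate from the card
STEPS_PER_EPOCH = 306    # step count at epoch 1.0 in the results table
NUM_EPOCHS = 50          # num_epochs from the card
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 15300, the final step in the table


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero.

    warmup_steps=0 is an assumption; the card does not report a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few epoch boundaries.
    for epoch in (0, 10, 25, 40, 50):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {cosine_lr(step):.3e}")
```

With a train batch size of 1, 306 steps per epoch would correspond to roughly 306 training examples, assuming a single device and no gradient accumulation; the card states neither.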

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 1.893         | 1.0   | 306   | 1.8913          |
	| 2.0254        | 2.0   | 612   | 1.8820          |
	| 1.7988        | 3.0   | 918   | 1.8821          |
	| 1.6315        | 4.0   | 1224  | 1.8907          |
	| 1.6124        | 5.0   | 1530  | 1.9035          |
	| 1.7146        | 6.0   | 1836  | 1.9222          |
	| 1.5807        | 7.0   | 2142  | 1.9383          |
	| 1.4356        | 8.0   | 2448  | 1.9796          |
	| 1.4935        | 9.0   | 2754  | 1.9838          |
	| 1.3923        | 10.0  | 3060  | 2.0112          |
	| 1.5712        | 11.0  | 3366  | 2.0217          |
	| 1.5216        | 12.0  | 3672  | 2.0493          |
	| 1.2513        | 13.0  | 3978  | 2.0731          |
	| 1.2956        | 14.0  | 4284  | 2.1151          |
	| 1.5635        | 15.0  | 4590  | 2.1298          |
	| 1.4956        | 16.0  | 4896  | 2.1525          |
	| 1.3592        | 17.0  | 5202  | 2.1751          |
	| 1.1305        | 18.0  | 5508  | 2.1893          |
	| 1.1396        | 19.0  | 5814  | 2.2454          |
	| 1.3858        | 20.0  | 6120  | 2.2797          |
	| 1.3174        | 21.0  | 6426  | 2.2689          |
	| 1.5609        | 22.0  | 6732  | 2.3098          |
	| 1.3431        | 23.0  | 7038  | 2.3238          |
	| 1.3111        | 24.0  | 7344  | 2.3742          |
	| 1.1365        | 25.0  | 7650  | 2.3727          |
	| 1.3318        | 26.0  | 7956  | 2.3978          |
	| 1.3297        | 27.0  | 8262  | 2.3647          |
	| 1.2178        | 28.0  | 8568  | 2.3971          |
	| 1.2757        | 29.0  | 8874  | 2.4292          |
	| 1.236         | 30.0  | 9180  | 2.4170          |
	| 1.1888        | 31.0  | 9486  | 2.4439          |
	| 1.0917        | 32.0  | 9792  | 2.4225          |
	| 1.1148        | 33.0  | 10098 | 2.4166          |
	| 1.1907        | 34.0  | 10404 | 2.4318          |
	| 1.1906        | 35.0  | 10710 | 2.4352          |
	| 1.2238        | 36.0  | 11016 | 2.4471          |
	| 1.1596        | 37.0  | 11322 | 2.4382          |
	| 1.2184        | 38.0  | 11628 | 2.4343          |
	| 1.2428        | 39.0  | 11934 | 2.4422          |
	| 1.3111        | 40.0  | 12240 | 2.4397          |
	| 1.2845        | 41.0  | 12546 | 2.4460          |
	| 1.3173        | 42.0  | 12852 | 2.4428          |
	| 1.193         | 43.0  | 13158 | 2.4430          |
	| 1.1774        | 44.0  | 13464 | 2.4425          |
	| 1.1868        | 45.0  | 13770 | 2.4396          |
	| 1.2042        | 46.0  | 14076 | 2.4430          |
	| 1.2833        | 47.0  | 14382 | 2.4398          |
	| 1.2766        | 48.0  | 14688 | 2.4410          |
	| 1.4958        | 49.0  | 14994 | 2.4412          |
	| 1.1868        | 50.0  | 15300 | 2.4421          |
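One way to read the table is to find the epoch with the lowest validation loss and to compare training and validation loss row by row. The sketch below does this on a subset of rows copied from the table. `best_epoch` and `generalization_gap` are hypothetical helper names for illustration. The logged training loss is likely a noisy per-logging-step value, so the gap is only a rough indicator.

```python
# (epoch, training_loss, validation_loss) rows copied from the table above.
# Only a subset is included to keep the sketch short.
ROWS = [
    (1, 1.893, 1.8913),
    (2, 2.0254, 1.8820),
    (3, 1.7988, 1.8821),
    (10, 1.3923, 2.0112),
    (25, 1.1365, 2.3727),
    (50, 1.1868, 2.4421),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for a single row."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


if __name__ == "__main__":
    epoch, _, val_loss = best_epoch(ROWS)
    print(f"lowest validation loss {val_loss:.4f} at epoch {epoch}")
    for row in ROWS:
        print(f"epoch {row[0]:2d}  gap {generalization_gap(row):+.4f}")
```

If the run were repeated, stopping near the validation minimum or keeping the best checkpoint would be a reasonable option. Whether any such mechanism was used here is not stated in the card.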


	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.2
	- PyTorch 2.1.0+cu118
	- Datasets 2.19.2
	- Tokenizers 0.19.1
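To check a local environment against the versions listed above, a minimal sketch using only the standard library could look like this. The distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the PyPI names of the listed frameworks, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; keys are the assumed PyPI package names.
PINNED = {
    "peft": "0.11.1",
    "transformers": "4.41.2",
    "torch": "2.1.0+cu118",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def check_versions(pinned=PINNED):
    """Map each package name to (expected_version, installed_version or None)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        status = "ok" if installed == expected else "differs"
        print(f"{name}: expected {expected}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the adapter will fail to load. The listed versions are simply the ones recorded at training time.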
