---
library_name: peft
license: mit
base_model: klyang/MentaLLaMA-chat-7B-hf
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: MentaLLaMA-chat-7B-PsyCourse-fold2
  results: []
---
|
|
|
|
|
|
|
|
|
|
# MentaLLaMA-chat-7B-PsyCourse-fold2
|
|
|
|
|
This model is a fine-tuned version of [klyang/MentaLLaMA-chat-7B-hf](https://huggingface.co/klyang/MentaLLaMA-chat-7B-hf) on the course-train-fold2 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0289
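
Because this repository holds only a PEFT LoRA adapter, it must be loaded on top of the base model. A minimal loading sketch, assuming the adapter is published under this model's name (substitute the actual Hub id):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "klyang/MentaLLaMA-chat-7B-hf"
adapter_id = "MentaLLaMA-chat-7B-PsyCourse-fold2"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",  # requires accelerate
)

# Attach the LoRA adapter weights from this repository to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```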
|
|
|
|
|
## Model description

This is a LoRA adapter (PEFT) for [klyang/MentaLLaMA-chat-7B-hf](https://huggingface.co/klyang/MentaLLaMA-chat-7B-hf), trained with LLaMA-Factory on fold 2 of the PsyCourse course data. The repository contains only the adapter weights; the base model must be loaded alongside it (see the loading sketch above).
|
|
|
|
|
## Intended uses & limitations

More information needed
|
|
|
|
|
## Training and evaluation data

The adapter was trained on the course-train-fold2 dataset (fold 2 of what appears to be a cross-validation split of the PsyCourse course data, per the model name); the validation losses below were computed on the corresponding evaluation set. More information about the dataset itself is needed.
|
|
|
|
|
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
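
For readers reproducing the run outside LLaMA-Factory, these values map onto `transformers.TrainingArguments` roughly as follows (a sketch of the equivalent configuration, not the exact training invocation; `output_dir` is illustrative):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="MentaLLaMA-chat-7B-PsyCourse-fold2",  # illustrative
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch size: 1 x 16 = 16
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",  # betas=(0.9, 0.999) and eps=1e-08 are the AdamW defaults
    seed=42,
)
```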
|
|
|
|
|
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7519 | 0.0775 | 50 | 0.6389 |
| 0.1485 | 0.1550 | 100 | 0.1124 |
| 0.074 | 0.2326 | 150 | 0.0650 |
| 0.0655 | 0.3101 | 200 | 0.0619 |
| 0.0598 | 0.3876 | 250 | 0.0512 |
| 0.0414 | 0.4651 | 300 | 0.0449 |
| 0.0427 | 0.5426 | 350 | 0.0414 |
| 0.0471 | 0.6202 | 400 | 0.0387 |
| 0.0433 | 0.6977 | 450 | 0.0362 |
| 0.0432 | 0.7752 | 500 | 0.0353 |
| 0.0445 | 0.8527 | 550 | 0.0353 |
| 0.0529 | 0.9302 | 600 | 0.0353 |
| 0.0313 | 1.0078 | 650 | 0.0318 |
| 0.0301 | 1.0853 | 700 | 0.0322 |
| 0.0289 | 1.1628 | 750 | 0.0338 |
| 0.0267 | 1.2403 | 800 | 0.0314 |
| 0.0314 | 1.3178 | 850 | 0.0317 |
| 0.0382 | 1.3953 | 900 | 0.0327 |
| 0.0354 | 1.4729 | 950 | 0.0320 |
| 0.0265 | 1.5504 | 1000 | 0.0321 |
| 0.0301 | 1.6279 | 1050 | 0.0333 |
| 0.0262 | 1.7054 | 1100 | 0.0312 |
| 0.0273 | 1.7829 | 1150 | 0.0306 |
| 0.0283 | 1.8605 | 1200 | 0.0297 |
| 0.0381 | 1.9380 | 1250 | 0.0299 |
| 0.0207 | 2.0155 | 1300 | 0.0294 |
| 0.0163 | 2.0930 | 1350 | 0.0329 |
| 0.0236 | 2.1705 | 1400 | 0.0311 |
| 0.0191 | 2.2481 | 1450 | 0.0310 |
| 0.0243 | 2.3256 | 1500 | 0.0308 |
| 0.0165 | 2.4031 | 1550 | 0.0327 |
| 0.0224 | 2.4806 | 1600 | 0.0329 |
| 0.0289 | 2.5581 | 1650 | 0.0319 |
| 0.014 | 2.6357 | 1700 | 0.0316 |
| 0.0182 | 2.7132 | 1750 | 0.0334 |
| 0.0175 | 2.7907 | 1800 | 0.0298 |
| 0.0218 | 2.8682 | 1850 | 0.0297 |
| 0.018 | 2.9457 | 1900 | 0.0289 |
| 0.01 | 3.0233 | 1950 | 0.0309 |
| 0.0109 | 3.1008 | 2000 | 0.0338 |
| 0.0076 | 3.1783 | 2050 | 0.0347 |
| 0.0087 | 3.2558 | 2100 | 0.0358 |
| 0.0092 | 3.3333 | 2150 | 0.0323 |
| 0.0078 | 3.4109 | 2200 | 0.0331 |
| 0.0109 | 3.4884 | 2250 | 0.0356 |
| 0.0137 | 3.5659 | 2300 | 0.0360 |
| 0.013 | 3.6434 | 2350 | 0.0350 |
| 0.0133 | 3.7209 | 2400 | 0.0353 |
| 0.0068 | 3.7984 | 2450 | 0.0357 |
| 0.012 | 3.8760 | 2500 | 0.0348 |
| 0.0088 | 3.9535 | 2550 | 0.0344 |
| 0.0066 | 4.0310 | 2600 | 0.0346 |
| 0.0052 | 4.1085 | 2650 | 0.0361 |
| 0.008 | 4.1860 | 2700 | 0.0374 |
| 0.0062 | 4.2636 | 2750 | 0.0383 |
| 0.005 | 4.3411 | 2800 | 0.0386 |
| 0.004 | 4.4186 | 2850 | 0.0395 |
| 0.0075 | 4.4961 | 2900 | 0.0400 |
| 0.003 | 4.5736 | 2950 | 0.0402 |
| 0.0066 | 4.6512 | 3000 | 0.0405 |
| 0.005 | 4.7287 | 3050 | 0.0406 |
| 0.0067 | 4.8062 | 3100 | 0.0407 |
| 0.0067 | 4.8837 | 3150 | 0.0407 |
| 0.006 | 4.9612 | 3200 | 0.0407 |
|
|
|
|
|
|
|
|
### Framework versions

- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
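
A quick sketch for checking that a local environment matches these pinned versions:

```python
# Minimal version check against the pinned framework versions above.
import datasets
import peft
import tokenizers
import torch
import transformers

expected = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"MISMATCH (expected {want})"
    print(f"{name} {have}: {status}")
```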