	---
	library_name: peft
	license: other
	base_model: deepseek-ai/deepseek-coder-6.7b-base
	tags:
	- base_model:adapter:deepseek-ai/deepseek-coder-6.7b-base
	- lora
	- transformers
	pipeline_tag: text-generation
	model-index:
	- name: lemexp-task1-v3-lemma_object_small_nodefs-deepseek-coder-6.7b-base
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# lemexp-task1-v3-lemma_object_small_nodefs-deepseek-coder-6.7b-base

	This model is a fine-tuned version of [deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base) on an unknown dataset.
	It achieves the following results on the evaluation set (the value matches the last logged evaluation, at step 42480, in the training results table):
	- Loss: 0.2294

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0004
	- train_batch_size: 4
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- total_train_batch_size: 16
	- total_eval_batch_size: 8
	- optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 12
	- mixed_precision_training: Native AMP
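The batch-size and schedule settings above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes no warmup (none is listed), estimates the steps per epoch from the logged point "step 3600 at epoch 1.0003", and uses a hypothetical helper `linear_lr` that mirrors the usual linear-decay rule rather than the Trainer's actual scheduler object.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 4e-4
TRAIN_BATCH_SIZE = 4          # per device
NUM_DEVICES = 4
TOTAL_TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 12

# Implied gradient accumulation: total = per_device * devices * accumulation.
grad_accum_steps = TOTAL_TRAIN_BATCH_SIZE // (TRAIN_BATCH_SIZE * NUM_DEVICES)

# Estimate from the training log: step 3600 was reached at epoch 1.0003.
steps_per_epoch = round(3600 / 1.0003)
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_lr(step, total, base_lr=LEARNING_RATE, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup_steps))


if __name__ == "__main__":
    print("gradient accumulation steps:", grad_accum_steps)
    print("estimated steps per epoch:", steps_per_epoch)
    print("estimated training examples:", approx_train_examples)
    print("estimated total steps:", total_steps)
    for step in (0, total_steps // 2, total_steps):
        print(step, linear_lr(step, total_steps))
```

Under these assumptions, the listed total batch size is exactly the per-device size times the device count, so no gradient accumulation appears to have been used. The example count is only an estimate, since the last batch of an epoch may be partial.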

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-------:|:-----:|:---------------:|
	| 0.4045 | 0.2001 | 720 | 0.3253 |
	| 0.3156 | 0.4001 | 1440 | 0.2905 |
	| 0.2636 | 0.6002 | 2160 | 0.2644 |
	| 0.2455 | 0.8002 | 2880 | 0.2401 |
	| 0.2270 | 1.0003 | 3600 | 0.2328 |
	| 0.1903 | 1.2003 | 4320 | 0.2267 |
	| 0.1872 | 1.4004 | 5040 | 0.2187 |
	| 0.1831 | 1.6004 | 5760 | 0.2105 |
	| 0.1785 | 1.8005 | 6480 | 0.2082 |
	| 0.1736 | 2.0006 | 7200 | 0.1973 |
	| 0.1512 | 2.2006 | 7920 | 0.1999 |
	| 0.1414 | 2.4007 | 8640 | 0.1973 |
	| 0.1413 | 2.6007 | 9360 | 0.1891 |
	| 0.1397 | 2.8008 | 10080 | 0.1857 |
	| 0.1401 | 3.0008 | 10800 | 0.1840 |
	| 0.1111 | 3.2009 | 11520 | 0.1837 |
	| 0.1135 | 3.4009 | 12240 | 0.1819 |
	| 0.1138 | 3.6010 | 12960 | 0.1821 |
	| 0.1133 | 3.8011 | 13680 | 0.1799 |
	| 0.1147 | 4.0011 | 14400 | 0.1769 |
	| 0.0871 | 4.2012 | 15120 | 0.1822 |
	| 0.0902 | 4.4012 | 15840 | 0.1860 |
	| 0.0921 | 4.6013 | 16560 | 0.1809 |
	| 0.0956 | 4.8013 | 17280 | 0.1696 |
	| 0.0932 | 5.0014 | 18000 | 0.1711 |
	| 0.0706 | 5.2014 | 18720 | 0.1792 |
	| 0.0715 | 5.4015 | 19440 | 0.1803 |
	| 0.0750 | 5.6016 | 20160 | 0.1746 |
	| 0.0775 | 5.8016 | 20880 | 0.1798 |
	| 0.0759 | 6.0017 | 21600 | 0.1766 |
	| 0.0604 | 6.2017 | 22320 | 0.1849 |
	| 0.0608 | 6.4018 | 23040 | 0.1875 |
	| 0.0626 | 6.6018 | 23760 | 0.1774 |
	| 0.0614 | 6.8019 | 24480 | 0.1776 |
	| 0.0622 | 7.0019 | 25200 | 0.1798 |
	| 0.0505 | 7.2020 | 25920 | 0.1918 |
	| 0.0475 | 7.4021 | 26640 | 0.1941 |
	| 0.0487 | 7.6021 | 27360 | 0.1886 |
	| 0.0507 | 7.8022 | 28080 | 0.1860 |
	| 0.0507 | 8.0022 | 28800 | 0.1883 |
	| 0.0372 | 8.2023 | 29520 | 0.2032 |
	| 0.0368 | 8.4023 | 30240 | 0.1974 |
	| 0.0383 | 8.6024 | 30960 | 0.1965 |
	| 0.0387 | 8.8024 | 31680 | 0.1971 |
	| 0.0395 | 9.0025 | 32400 | 0.1936 |
	| 0.0277 | 9.2026 | 33120 | 0.2076 |
	| 0.0287 | 9.4026 | 33840 | 0.1991 |
	| 0.0305 | 9.6027 | 34560 | 0.2040 |
	| 0.0315 | 9.8027 | 35280 | 0.2068 |
	| 0.0305 | 10.0028 | 36000 | 0.1999 |
	| 0.0228 | 10.2028 | 36720 | 0.2139 |
	| 0.0226 | 10.4029 | 37440 | 0.2154 |
	| 0.0237 | 10.6029 | 38160 | 0.2152 |
	| 0.0236 | 10.8030 | 38880 | 0.2104 |
	| 0.0236 | 11.0031 | 39600 | 0.2112 |
	| 0.0192 | 11.2031 | 40320 | 0.2329 |
	| 0.0186 | 11.4032 | 41040 | 0.2319 |
	| 0.0184 | 11.6032 | 41760 | 0.2299 |
	| 0.0189 | 11.8033 | 42480 | 0.2294 |
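The table shows training loss falling throughout while validation loss bottoms out partway through and then rises. A minimal sketch for locating that point is given below. It uses only a hand-picked subset of rows from the table (it includes the lowest validation loss in the full table), and the `EvalRow` type is a hypothetical name introduced for illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    train_loss: float
    val_loss: float


# Subset of rows copied from the training results table above.
rows = [
    EvalRow(720, 0.2001, 0.4045, 0.3253),
    EvalRow(3600, 1.0003, 0.2270, 0.2328),
    EvalRow(7200, 2.0006, 0.1736, 0.1973),
    EvalRow(14400, 4.0011, 0.1147, 0.1769),
    EvalRow(17280, 4.8013, 0.0956, 0.1696),
    EvalRow(18000, 5.0014, 0.0932, 0.1711),
    EvalRow(21600, 6.0017, 0.0759, 0.1766),
    EvalRow(28800, 8.0022, 0.0507, 0.1883),
    EvalRow(36000, 10.0028, 0.0305, 0.1999),
    EvalRow(42480, 11.8033, 0.0189, 0.2294),
]

# Logged evaluation with the lowest validation loss.
best = min(rows, key=lambda r: r.val_loss)
final = rows[-1]

# How much validation loss has risen between the best and the final evaluation.
val_increase = final.val_loss - best.val_loss

if __name__ == "__main__":
    print(f"best: step {best.step} (epoch {best.epoch}), val_loss {best.val_loss}")
    print(f"final: step {final.step} (epoch {final.epoch}), val_loss {final.val_loss}")
    print(f"increase from best to final: {val_increase:.4f}")
```

On these rows the minimum falls near the end of epoch 5, well before the final evaluation, which is consistent with overfitting in the later epochs. Whether an earlier checkpoint was saved or is preferable for downstream use is not stated in this card.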


	### Framework versions

	- PEFT 0.17.1
	- Transformers 4.55.4
	- PyTorch 2.8.0+cu128
	- Datasets 4.0.0
	- Tokenizers 0.21.4
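Since the card declares `library_name: peft` and a LoRA adapter on top of `deepseek-ai/deepseek-coder-6.7b-base`, loading would typically go through `PeftModel.from_pretrained`. The following is a sketch under stated assumptions, not the authors' documented usage: `load_adapter` is a hypothetical helper, the adapter repository id must be supplied by the reader (its namespace is not given here), and the `bfloat16` dtype is an assumption. The expected prompt format is not documented in this card.

```python
def load_adapter(adapter_id, base_id="deepseek-ai/deepseek-coder-6.7b-base"):
    """Load the base model and attach the LoRA adapter.

    Calling this downloads the model weights. The heavy imports are done
    lazily so that defining the helper does not require the libraries.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: bfloat16 weights; pick a dtype that suits your hardware.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16
    )
    # Wrap the base model with the adapter weights stored under adapter_id.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


# Example call (not executed here); replace the argument with the actual
# repository id or local path of this adapter:
# tokenizer, model = load_adapter("path/or/repo-id-of-this-adapter")
```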
