	---
	library_name: peft
	license: mit
	base_model: gpt2
	tags:
	- generated_from_trainer
	model-index:
	- name: Se124M10KInfPrompt
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Se124M10KInfPrompt

	This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7128
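
The metadata lists `library_name: peft` and `base_model: gpt2`, so the repository presumably holds a PEFT adapter rather than full model weights. The following is a minimal loading sketch under that assumption, not a documented usage recipe. `adapter_path` is a placeholder (the card does not give the full repository id), and the helper names `load_model` and `complete` are hypothetical.

```python
def load_model(adapter_path: str, base_model: str = "gpt2"):
    """Load the base model and attach the fine-tuned adapter (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model)
    # adapter_path is assumed to be a local directory or Hub id in standard PEFT layout.
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return model, tokenizer


def complete(model, tokenizer, prompt: str, max_new_tokens: int = 40) -> str:
    """Generate a continuation for `prompt` (hypothetical helper)."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            # GPT-2 has no pad token; reuse EOS to avoid a generation warning.
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the adapter available locally, `model, tokenizer = load_model("path/to/adapter")` followed by `complete(model, tokenizer, "some prompt")` would exercise it. The prompt format the model expects is not documented in this card.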

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 50
	- mixed_precision_training: Native AMP
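
To make the `linear` scheduler setting concrete, here is a small sketch of how the learning rate would decay over the run. It assumes no warmup (the card lists none), 267 optimizer steps per epoch (read off the training results table), and decay to zero at the end of the planned 50 epochs. The function name `linear_lr` is illustrative only.

```python
BASE_LR = 5e-5          # learning_rate from the list above
STEPS_PER_EPOCH = 267   # taken from the training results table
NUM_EPOCHS = 50         # num_epochs from the list above
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 13350 planned optimizer steps


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to BASE_LR during warmup (assumed to be 0 steps here).
        return BASE_LR * step / max(1, warmup_steps)
    # Decay linearly from BASE_LR to 0 over the remaining steps.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - warmup_steps)


if __name__ == "__main__":
    for step in (0, 6675, 12549, 13350):
        print(step, linear_lr(step))
```

Under these assumptions the rate at the last logged step (12549) would be about 3e-06, which is consistent with the small changes in loss over the final epochs.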

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 0.4014 | 1.0 | 267 | 1.0141 |
	| 0.2422 | 2.0 | 534 | 0.8523 |
	| 0.2202 | 3.0 | 801 | 0.8168 |
	| 0.2129 | 4.0 | 1068 | 0.7993 |
	| 0.2059 | 5.0 | 1335 | 0.7837 |
	| 0.2041 | 6.0 | 1602 | 0.7695 |
	| 0.2031 | 7.0 | 1869 | 0.7635 |
	| 0.1982 | 8.0 | 2136 | 0.7586 |
	| 0.1975 | 9.0 | 2403 | 0.7532 |
	| 0.1974 | 10.0 | 2670 | 0.7483 |
	| 0.1978 | 11.0 | 2937 | 0.7467 |
	| 0.1939 | 12.0 | 3204 | 0.7445 |
	| 0.1953 | 13.0 | 3471 | 0.7439 |
	| 0.1929 | 14.0 | 3738 | 0.7362 |
	| 0.1937 | 15.0 | 4005 | 0.7328 |
	| 0.1934 | 16.0 | 4272 | 0.7329 |
	| 0.1927 | 17.0 | 4539 | 0.7323 |
	| 0.1927 | 18.0 | 4806 | 0.7257 |
	| 0.1909 | 19.0 | 5073 | 0.7276 |
	| 0.1919 | 20.0 | 5340 | 0.7251 |
	| 0.1919 | 21.0 | 5607 | 0.7239 |
	| 0.1912 | 22.0 | 5874 | 0.7260 |
	| 0.1897 | 23.0 | 6141 | 0.7241 |
	| 0.1916 | 24.0 | 6408 | 0.7235 |
	| 0.1905 | 25.0 | 6675 | 0.7225 |
	| 0.1919 | 26.0 | 6942 | 0.7188 |
	| 0.1883 | 27.0 | 7209 | 0.7207 |
	| 0.1898 | 28.0 | 7476 | 0.7198 |
	| 0.1874 | 29.0 | 7743 | 0.7195 |
	| 0.188 | 30.0 | 8010 | 0.7194 |
	| 0.1873 | 31.0 | 8277 | 0.7182 |
	| 0.1878 | 32.0 | 8544 | 0.7212 |
	| 0.1866 | 33.0 | 8811 | 0.7171 |
	| 0.1883 | 34.0 | 9078 | 0.7151 |
	| 0.1881 | 35.0 | 9345 | 0.7176 |
	| 0.1868 | 36.0 | 9612 | 0.7149 |
	| 0.1871 | 37.0 | 9879 | 0.7157 |
	| 0.1876 | 38.0 | 10146 | 0.7162 |
	| 0.188 | 39.0 | 10413 | 0.7142 |
	| 0.1861 | 40.0 | 10680 | 0.7149 |
	| 0.1862 | 41.0 | 10947 | 0.7144 |
	| 0.1862 | 42.0 | 11214 | 0.7128 |
	| 0.186 | 43.0 | 11481 | 0.7136 |
	| 0.1868 | 44.0 | 11748 | 0.7137 |
	| 0.1837 | 45.0 | 12015 | 0.7138 |
	| 0.1868 | 46.0 | 12282 | 0.7141 |
	| 0.187 | 47.0 | 12549 | 0.7133 |
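
The lowest validation loss in the table, 0.7128 at epoch 42, matches the evaluation loss reported at the top of the card. The table stops at epoch 47 although `num_epochs` is 50; the card does not say why. The sketch below recomputes the best epoch from the logged values and converts the loss to a perplexity, assuming the loss is a mean per-token cross-entropy in nats (the usual convention for causal language modeling, but not stated in the card).

```python
import math

# Validation loss per epoch (epochs 1 to 47), copied from the table above.
VAL_LOSS = [
    1.0141, 0.8523, 0.8168, 0.7993, 0.7837, 0.7695, 0.7635, 0.7586, 0.7532, 0.7483,
    0.7467, 0.7445, 0.7439, 0.7362, 0.7328, 0.7329, 0.7323, 0.7257, 0.7276, 0.7251,
    0.7239, 0.7260, 0.7241, 0.7235, 0.7225, 0.7188, 0.7207, 0.7198, 0.7195, 0.7194,
    0.7182, 0.7212, 0.7171, 0.7151, 0.7176, 0.7149, 0.7157, 0.7162, 0.7142, 0.7149,
    0.7144, 0.7128, 0.7136, 0.7137, 0.7138, 0.7141, 0.7133,
]

# Epochs are 1-indexed, so shift the 0-based list index by one.
best_epoch = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__) + 1
best_loss = VAL_LOSS[best_epoch - 1]

# Perplexity = exp(mean cross-entropy), valid only under the assumption stated above.
perplexity = math.exp(best_loss)

print(best_epoch, best_loss, round(perplexity, 2))  # prints: 42 0.7128 2.04
```

The training loss (about 0.19) is much lower than the validation loss (about 0.71) throughout. That gap may reflect overfitting or a difference between the training and evaluation sets, but the card gives too little information to tell.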


	### Framework versions

	- PEFT 0.15.1
	- Transformers 4.51.3
	- Pytorch 2.6.0+cu118
	- Datasets 3.5.0
	- Tokenizers 0.21.1
