	---
	license: mit
	tags:
	- generated_from_keras_callback
	model-index:
	- name: ghdi/punic-model
	  results: []
	---

	<!-- This model card has been generated automatically according to the information Keras had access to. You should
	probably proofread and complete it, then remove this comment. -->

	# ghdi/punic-model

	This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
	After the final training epoch it reports the following losses on the training and evaluation sets:
	- Train Loss: 3.9858
	- Validation Loss: 7.6193
	- Epoch: 59
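
The losses above can be read as perplexities by exponentiating them. This is a minimal sketch that assumes the reported values are mean per-token cross-entropy in nats; the card does not state how the loss is reduced, so treat the numbers as indicative only. The helper name `perplexity` is introduced here for illustration.

```python
import math

# Final-epoch losses copied from the list above.
train_loss = 3.9858
val_loss = 7.6193


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity.

    Assumption: the loss is averaged per token.
    """
    return math.exp(loss)


train_ppl = perplexity(train_loss)
val_ppl = perplexity(val_loss)

print(f"train perplexity      ~ {train_ppl:.1f}")
print(f"validation perplexity ~ {val_ppl:.1f}")
```

Under that assumption, the gap between the two values gives a rough sense of how much better the model fits its training data than held-out data.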

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': -984, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
	- training_precision: mixed_float16
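
To make the nested learning-rate config easier to read, here is a plain-Python sketch of a linear warm-up followed by polynomial decay, using the values recorded above as defaults. It is an illustration based on the config keys, not the Transformers or Keras implementation; the function name `warmup_then_decay` and the clamping rule in the decay branch are assumptions.

```python
def warmup_then_decay(
    step: int,
    initial_lr: float = 5e-05,   # 'initial_learning_rate' in the config
    warmup_steps: int = 1000,    # 'warmup_steps'
    decay_steps: int = -984,     # 'decay_steps' exactly as recorded (negative)
    end_lr: float = 0.0,         # 'end_learning_rate'
    power: float = 1.0,          # 'power' (1.0 means linear)
) -> float:
    """Learning rate at a given optimizer step (illustrative sketch)."""
    if step < warmup_steps:
        # Warm-up: ramp from 0 up to initial_lr over warmup_steps.
        return initial_lr * (step / warmup_steps) ** power
    # Decay: count steps since the end of warm-up, clamped at decay_steps
    # (assumed non-cycling behaviour, matching 'cycle': False).
    decay_step = min(step - warmup_steps, decay_steps)
    return (initial_lr - end_lr) * (1 - decay_step / decay_steps) ** power + end_lr


# Learning rate at a few points during warm-up.
for s in (0, 250, 500, 999):
    print(s, warmup_then_decay(s))
```

In this sketch the first 1000 steps ramp linearly toward 5e-05. Because the recorded `decay_steps` is negative, the decay branch of the sketch collapses to `end_lr` immediately after warm-up; whether the real run ever left the warm-up phase depends on the total step count, which the card does not give.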

	### Training results

	| Train Loss | Validation Loss | Epoch |
	|:----------:|:---------------:|:-----:|
	| 10.9100 | 10.8188 | 0 |
	| 10.7129 | 10.4690 | 1 |
	| 10.3775 | 10.1048 | 2 |
	| 10.0587 | 9.8271 | 3 |
	| 9.8034 | 9.6395 | 4 |
	| 9.6209 | 9.5085 | 5 |
	| 9.5047 | 9.4043 | 6 |
	| 9.3724 | 9.3072 | 7 |
	| 9.2873 | 9.2090 | 8 |
	| 9.1690 | 9.1091 | 9 |
	| 8.9963 | 9.0013 | 10 |
	| 8.8724 | 8.8875 | 11 |
	| 8.7316 | 8.7701 | 12 |
	| 8.6070 | 8.6477 | 13 |
	| 8.4242 | 8.5243 | 14 |
	| 8.2700 | 8.4018 | 15 |
	| 8.1555 | 8.2834 | 16 |
	| 7.9978 | 8.1696 | 17 |
	| 7.8495 | 8.0607 | 18 |
	| 7.6980 | 7.9635 | 19 |
	| 7.5339 | 7.8726 | 20 |
	| 7.4741 | 7.7917 | 21 |
	| 7.3669 | 7.7233 | 22 |
	| 7.2598 | 7.6604 | 23 |
	| 7.1434 | 7.6088 | 24 |
	| 7.0434 | 7.5579 | 25 |
	| 6.9874 | 7.5171 | 26 |
	| 6.8629 | 7.4881 | 27 |
	| 6.8293 | 7.4694 | 28 |
	| 6.6349 | 7.4367 | 29 |
	| 6.7589 | 7.4071 | 30 |
	| 6.5890 | 7.4003 | 31 |
	| 6.5476 | 7.3576 | 32 |
	| 6.4606 | 7.3400 | 33 |
	| 6.3945 | 7.3327 | 34 |
	| 6.2495 | 7.3435 | 35 |
	| 6.0722 | 7.3375 | 36 |
	| 6.1324 | 7.3365 | 37 |
	| 6.0493 | 7.3458 | 38 |
	| 5.9514 | 7.4002 | 39 |
	| 5.8638 | 7.3356 | 40 |
	| 5.7390 | 7.3488 | 41 |
	| 5.6403 | 7.3687 | 42 |
	| 5.5442 | 7.3831 | 43 |
	| 5.4542 | 7.3888 | 44 |
	| 5.3243 | 7.4340 | 45 |
	| 5.2295 | 7.4170 | 46 |
	| 5.1436 | 7.4110 | 47 |
	| 5.0199 | 7.5223 | 48 |
	| 4.9058 | 7.5142 | 49 |
	| 4.8393 | 7.4926 | 50 |
	| 4.7104 | 7.5253 | 51 |
	| 4.6212 | 7.5420 | 52 |
	| 4.5298 | 7.5799 | 53 |
	| 4.4251 | 7.5940 | 54 |
	| 4.3130 | 7.5752 | 55 |
	| 4.2240 | 7.6315 | 56 |
	| 4.1587 | 7.6412 | 57 |
	| 4.0442 | 7.6748 | 58 |
	| 3.9858 | 7.6193 | 59 |
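
The table shows the train loss falling through epoch 59 while the validation loss bottoms out much earlier. The short script below locates the epoch with the lowest validation loss, using the validation column copied from the table. It is only a way to read the numbers; the card does not say which epoch's weights were kept.

```python
# Validation loss per epoch (index = epoch), copied from the table above.
val_losses = [
    10.8188, 10.4690, 10.1048, 9.8271, 9.6395, 9.5085, 9.4043, 9.3072,
    9.2090, 9.1091, 9.0013, 8.8875, 8.7701, 8.6477, 8.5243, 8.4018,
    8.2834, 8.1696, 8.0607, 7.9635, 7.8726, 7.7917, 7.7233, 7.6604,
    7.6088, 7.5579, 7.5171, 7.4881, 7.4694, 7.4367, 7.4071, 7.4003,
    7.3576, 7.3400, 7.3327, 7.3435, 7.3375, 7.3365, 7.3458, 7.4002,
    7.3356, 7.3488, 7.3687, 7.3831, 7.3888, 7.4340, 7.4170, 7.4110,
    7.5223, 7.5142, 7.4926, 7.5253, 7.5420, 7.5799, 7.5940, 7.5752,
    7.6315, 7.6412, 7.6748, 7.6193,
]

# Epoch with the lowest validation loss.
best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
best_val = val_losses[best_epoch]

# How far the validation loss has drifted up by the final epoch.
final_epoch = len(val_losses) - 1
rise_since_best = val_losses[final_epoch] - best_val

print(f"lowest validation loss {best_val} at epoch {best_epoch}")
print(f"rise by epoch {final_epoch}: {rise_since_best:.4f}")
```

A widening gap of this kind between train and validation loss is usually a sign of overfitting, so an earlier checkpoint, if one exists, may be worth comparing against the final one.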


	### Framework versions

	- Transformers 4.28.1
	- TensorFlow 2.12.0
	- Datasets 2.11.0
	- Tokenizers 0.13.3
