---
library_name: transformers
license: mit
base_model: FiveC/BartTay
tags:
- generated_from_trainer
metrics:
- sacrebleu
model-index:
- name: BartTayFinal-test
  results: []
---

# BartTayFinal-test

This model is a fine-tuned version of [FiveC/BartTay](https://huggingface.co/FiveC/BartTay) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1129
- SacreBLEU: 31.5508
- chrF++: 41.2951
- BERTScore F1: 0.8234

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP

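The hyperparameters above roughly correspond to a `Seq2SeqTrainingArguments` configuration like the following sketch. The `output_dir` value is an assumption (the card does not state it), and AdamW's betas/epsilon are the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the configuration above; output_dir is an assumed name.
training_args = Seq2SeqTrainingArguments(
    output_dir="BartTayFinal-test",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",   # fused AdamW; betas=(0.9, 0.999), eps=1e-8 are defaults
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,                   # "Native AMP" mixed-precision training
)
```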
### Training results

| Training Loss | Epoch | Step | Validation Loss | SacreBLEU | chrF++ | BERTScore F1 |
|:-------------:|:------:|:-----:|:---------------:|:---------:|:-------:|:------------:|
| 0.2708 | 0.0999 | 548 | 0.1762 | 6.1958 | 14.3337 | 0.7402 |
| 0.1967 | 0.1998 | 1096 | 0.1543 | 13.0595 | 22.6490 | 0.7681 |
| 0.1653 | 0.2997 | 1644 | 0.1433 | 16.4281 | 26.5054 | 0.7790 |
| 0.148 | 0.3996 | 2192 | 0.1372 | 18.6916 | 29.1575 | 0.7880 |
| 0.1334 | 0.4995 | 2740 | 0.1309 | 20.7037 | 30.9321 | 0.7929 |
| 0.1234 | 0.5995 | 3288 | 0.1291 | 21.8427 | 31.8394 | 0.7953 |
| 0.1153 | 0.6994 | 3836 | 0.1260 | 23.2862 | 33.1552 | 0.7983 |
| 0.1123 | 0.7993 | 4384 | 0.1231 | 24.3244 | 34.1894 | 0.8022 |
| 0.1043 | 0.8992 | 4932 | 0.1210 | 25.3951 | 35.1031 | 0.8037 |
| 0.0982 | 0.9991 | 5480 | 0.1201 | 25.6618 | 35.4972 | 0.8048 |
| 0.0869 | 1.0990 | 6028 | 0.1193 | 25.8156 | 35.9535 | 0.8083 |
| 0.0857 | 1.1989 | 6576 | 0.1179 | 26.9340 | 36.8392 | 0.8107 |
| 0.0815 | 1.2988 | 7124 | 0.1179 | 27.6491 | 37.4053 | 0.8114 |
| 0.08 | 1.3987 | 7672 | 0.1172 | 28.0729 | 37.7781 | 0.8126 |
| 0.0781 | 1.4986 | 8220 | 0.1158 | 28.3941 | 38.2541 | 0.8146 |
| 0.0751 | 1.5985 | 8768 | 0.1145 | 28.9190 | 38.6033 | 0.8150 |
| 0.0743 | 1.6985 | 9316 | 0.1133 | 29.5192 | 39.0347 | 0.8163 |
| 0.0712 | 1.7984 | 9864 | 0.1131 | 29.9176 | 39.4411 | 0.8181 |
| 0.0714 | 1.8983 | 10412 | 0.1122 | 30.1874 | 39.6889 | 0.8190 |
| 0.069 | 1.9982 | 10960 | 0.1115 | 30.7540 | 40.5206 | 0.8205 |
| 0.0591 | 2.0981 | 11508 | 0.1148 | 30.3703 | 40.1852 | 0.8208 |
| 0.059 | 2.1980 | 12056 | 0.1139 | 30.3753 | 40.3092 | 0.8220 |
| 0.0583 | 2.2979 | 12604 | 0.1140 | 30.8041 | 40.6839 | 0.8216 |
| 0.058 | 2.3978 | 13152 | 0.1129 | 31.5508 | 41.2951 | 0.8234 |
| 0.0577 | 2.4977 | 13700 | 0.1126 | 30.9483 | 40.6855 | 0.8231 |
| 0.0564 | 2.5976 | 14248 | 0.1123 | 30.8206 | 40.7765 | 0.8235 |
| 0.0571 | 2.6975 | 14796 | 0.1118 | 31.1163 | 41.0993 | 0.8230 |

### Framework versions

- Transformers 4.57.1
- PyTorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1