---
library_name: transformers
license: mit
base_model: FiveC/BartTay
tags:
- generated_from_trainer
metrics:
- sacrebleu
model-index:
- name: BartTayFinal-test
  results: []
---

# BartTayFinal-test

This model is a fine-tuned version of [FiveC/BartTay](https://huggingface.co/FiveC/BartTay) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1129
- SacreBLEU: 31.5508
- chrF++: 41.2951
- BERTScore F1: 0.8234

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP

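The hyperparameters above roughly correspond to a `Seq2SeqTrainingArguments` configuration like the following sketch. The `output_dir` value is an assumption (the card does not state it), and AdamW's betas/epsilon are the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the configuration above; output_dir is an assumed name.
training_args = Seq2SeqTrainingArguments(
    output_dir="BartTayFinal-test",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",   # fused AdamW; betas=(0.9, 0.999), eps=1e-8 are defaults
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,                   # "Native AMP" mixed-precision training
)
```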
### Training results

| Training Loss | Epoch | Step | Validation Loss | SacreBLEU | chrF++ | BERTScore F1 |
|:-------------:|:------:|:-----:|:---------------:|:---------:|:-------:|:------------:|
| 0.2708 | 0.0999 | 548 | 0.1762 | 6.1958 | 14.3337 | 0.7402 |
| 0.1967 | 0.1998 | 1096 | 0.1543 | 13.0595 | 22.6490 | 0.7681 |
| 0.1653 | 0.2997 | 1644 | 0.1433 | 16.4281 | 26.5054 | 0.7790 |
| 0.148 | 0.3996 | 2192 | 0.1372 | 18.6916 | 29.1575 | 0.7880 |
| 0.1334 | 0.4995 | 2740 | 0.1309 | 20.7037 | 30.9321 | 0.7929 |
| 0.1234 | 0.5995 | 3288 | 0.1291 | 21.8427 | 31.8394 | 0.7953 |
| 0.1153 | 0.6994 | 3836 | 0.1260 | 23.2862 | 33.1552 | 0.7983 |
| 0.1123 | 0.7993 | 4384 | 0.1231 | 24.3244 | 34.1894 | 0.8022 |
| 0.1043 | 0.8992 | 4932 | 0.1210 | 25.3951 | 35.1031 | 0.8037 |
| 0.0982 | 0.9991 | 5480 | 0.1201 | 25.6618 | 35.4972 | 0.8048 |
| 0.0869 | 1.0990 | 6028 | 0.1193 | 25.8156 | 35.9535 | 0.8083 |
| 0.0857 | 1.1989 | 6576 | 0.1179 | 26.9340 | 36.8392 | 0.8107 |
| 0.0815 | 1.2988 | 7124 | 0.1179 | 27.6491 | 37.4053 | 0.8114 |
| 0.08 | 1.3987 | 7672 | 0.1172 | 28.0729 | 37.7781 | 0.8126 |
| 0.0781 | 1.4986 | 8220 | 0.1158 | 28.3941 | 38.2541 | 0.8146 |
| 0.0751 | 1.5985 | 8768 | 0.1145 | 28.9190 | 38.6033 | 0.8150 |
| 0.0743 | 1.6985 | 9316 | 0.1133 | 29.5192 | 39.0347 | 0.8163 |
| 0.0712 | 1.7984 | 9864 | 0.1131 | 29.9176 | 39.4411 | 0.8181 |
| 0.0714 | 1.8983 | 10412 | 0.1122 | 30.1874 | 39.6889 | 0.8190 |
| 0.069 | 1.9982 | 10960 | 0.1115 | 30.7540 | 40.5206 | 0.8205 |
| 0.0591 | 2.0981 | 11508 | 0.1148 | 30.3703 | 40.1852 | 0.8208 |
| 0.059 | 2.1980 | 12056 | 0.1139 | 30.3753 | 40.3092 | 0.8220 |
| 0.0583 | 2.2979 | 12604 | 0.1140 | 30.8041 | 40.6839 | 0.8216 |
| 0.058 | 2.3978 | 13152 | 0.1129 | 31.5508 | 41.2951 | 0.8234 |
| 0.0577 | 2.4977 | 13700 | 0.1126 | 30.9483 | 40.6855 | 0.8231 |
| 0.0564 | 2.5976 | 14248 | 0.1123 | 30.8206 | 40.7765 | 0.8235 |
| 0.0571 | 2.6975 | 14796 | 0.1118 | 31.1163 | 41.0993 | 0.8230 |

### Framework versions

- Transformers 4.57.1
- PyTorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1