	---
	library_name: transformers
	license: apache-2.0
	base_model: allenai/led-base-16384
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	- bleu
	- precision
	- recall
	- f1
	model-index:
	- name: LED_sum_outcome
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# LED_sum_outcome

	This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.4730
	- Rouge1: 0.3601
	- Rouge2: 0.1515
	- Rougel: 0.3017
	- Rougelsum: 0.301
	- Gen Len: 20.36
	- Bleu: 0.0595
	- Precisions (BLEU): 0.1544
	- Brevity Penalty (BLEU): 0.6108
	- Length Ratio (BLEU): 0.6698
	- Translation Length (BLEU): 785.0
	- Reference Length (BLEU): 1172.0
	- Precision (BERTScore): 0.8997
	- Recall (BERTScore): 0.8766
	- F1 (BERTScore): 0.8879
	- Hashcode (BERTScore configuration): roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0)
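The BLEU length statistics above can be checked against each other by hand. The sketch below assumes the standard BLEU brevity penalty, `exp(1 - r/c)` when the candidate length `c` is shorter than the reference length `r` and 1 otherwise. The function name `brevity_penalty` is illustrative and not taken from any library.

```python
import math


def brevity_penalty(candidate_len: float, reference_len: float) -> float:
    """Standard BLEU brevity penalty for a candidate/reference length pair."""
    if candidate_len == 0:
        return 0.0
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


# Lengths reported in the evaluation results above.
translation_length = 785.0
reference_length = 1172.0

length_ratio = translation_length / reference_length
bp = brevity_penalty(translation_length, reference_length)

print(f"length ratio:    {length_ratio:.4f}")  # 0.6698
print(f"brevity penalty: {bp:.4f}")            # 0.6108
```

Both values match the reported ones. The generated summaries are about a third shorter than the references, so the brevity penalty alone removes roughly 39% of the BLEU score.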

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 10
	- mixed_precision_training: Native AMP
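For anyone trying to reproduce the run, the list above maps roughly onto `transformers` training arguments as sketched below. This is a reconstruction under assumptions, not the author's script: the use of `Seq2SeqTrainingArguments`, the `output_dir` value, `fp16=True` as the source of "Native AMP", and a single training device are all guesses that the card does not confirm.

```python
# Values copied from the hyperparameter list above. Key names follow
# transformers.Seq2SeqTrainingArguments. `fp16=True` is an assumption
# about what produced "mixed_precision_training: Native AMP".
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Per-device batch size x accumulation steps x devices (one device assumed)."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args(output_dir: str = "LED_sum_outcome"):
    """Build the arguments object; `output_dir` is a hypothetical path."""
    # Imported lazily so the arithmetic above runs without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


print(effective_batch_size(TRAINING_KWARGS))  # 16, matching total_train_batch_size
```

`build_training_args()` needs `transformers` installed and, because of `fp16=True`, a CUDA device. The effective batch size of 16 follows from 4 samples per step accumulated over 4 steps.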

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Precision | Recall | F1 | Hashcode |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:---------:|:------:|:------:|:---------------------------------------------------------:|
	| No log | 1.0 | 7 | 7.7628 | 0.2668 | 0.0617 | 0.2177 | 0.2179 | 21.0 | 0.0211 | 0.0856 | 0.6935 | 0.7321 | 858.0 | 1172.0 | 0.8742 | 0.86 | 0.8669 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 2.0 | 14 | 6.5648 | 0.3427 | 0.124 | 0.2806 | 0.2804 | 20.16 | 0.0514 | 0.1396 | 0.6085 | 0.6681 | 783.0 | 1172.0 | 0.8991 | 0.8717 | 0.8851 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 3.0 | 21 | 5.1851 | 0.3468 | 0.1383 | 0.282 | 0.2807 | 19.7 | 0.0722 | 0.1711 | 0.578 | 0.6459 | 757.0 | 1172.0 | 0.9029 | 0.8772 | 0.8898 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 4.0 | 28 | 4.4398 | 0.3475 | 0.1299 | 0.2825 | 0.2821 | 20.18 | 0.0455 | 0.1417 | 0.598 | 0.6604 | 774.0 | 1172.0 | 0.8979 | 0.8766 | 0.887 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 5.0 | 35 | 4.0655 | 0.3506 | 0.1412 | 0.2913 | 0.2901 | 19.94 | 0.054 | 0.1556 | 0.5685 | 0.6391 | 749.0 | 1172.0 | 0.8987 | 0.8772 | 0.8877 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 6.0 | 42 | 3.8299 | 0.356 | 0.148 | 0.295 | 0.294 | 20.38 | 0.0616 | 0.1566 | 0.6073 | 0.6672 | 782.0 | 1172.0 | 0.9002 | 0.8781 | 0.8889 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 7.0 | 49 | 3.6727 | 0.3645 | 0.144 | 0.2966 | 0.296 | 20.38 | 0.0637 | 0.1593 | 0.6108 | 0.6698 | 785.0 | 1172.0 | 0.8987 | 0.8774 | 0.8878 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 8.0 | 56 | 3.5737 | 0.3586 | 0.146 | 0.2948 | 0.2941 | 20.44 | 0.0632 | 0.1563 | 0.6201 | 0.6766 | 793.0 | 1172.0 | 0.8965 | 0.8762 | 0.8862 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 9.0 | 63 | 3.5072 | 0.3568 | 0.1486 | 0.2976 | 0.2963 | 20.36 | 0.0598 | 0.1536 | 0.6189 | 0.6758 | 792.0 | 1172.0 | 0.8986 | 0.8769 | 0.8875 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 10.0 | 70 | 3.4730 | 0.3601 | 0.1515 | 0.3017 | 0.301 | 20.36 | 0.0595 | 0.1544 | 0.6108 | 0.6698 | 785.0 | 1172.0 | 0.8997 | 0.8766 | 0.8879 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
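The per-epoch rows do not all peak at the same epoch, which matters when choosing a checkpoint. The sketch below copies three columns from the table above and reports the best epoch for each. The tuple layout and variable names are illustrative only.

```python
# (epoch, validation_loss, rouge1, bleu), copied from the table above.
ROWS = [
    (1, 7.7628, 0.2668, 0.0211),
    (2, 6.5648, 0.3427, 0.0514),
    (3, 5.1851, 0.3468, 0.0722),
    (4, 4.4398, 0.3475, 0.0455),
    (5, 4.0655, 0.3506, 0.0540),
    (6, 3.8299, 0.3560, 0.0616),
    (7, 3.6727, 0.3645, 0.0637),
    (8, 3.5737, 0.3586, 0.0632),
    (9, 3.5072, 0.3568, 0.0598),
    (10, 3.4730, 0.3601, 0.0595),
]

# Lower is better for loss; higher is better for ROUGE-1 and BLEU.
best_loss_epoch = min(ROWS, key=lambda row: row[1])[0]
best_rouge1_epoch = max(ROWS, key=lambda row: row[2])[0]
best_bleu_epoch = max(ROWS, key=lambda row: row[3])[0]

print(f"lowest validation loss: epoch {best_loss_epoch}")    # epoch 10
print(f"highest ROUGE-1:        epoch {best_rouge1_epoch}")  # epoch 7
print(f"highest BLEU:           epoch {best_bleu_epoch}")    # epoch 3
```

The reported results correspond to the final epoch, which has the lowest validation loss but not the highest ROUGE-1 or BLEU. The `Trainer` options `load_best_model_at_end` and `metric_for_best_model` can select a checkpoint by a chosen metric. The card does not say whether they were used here.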


	### Framework versions

	- Transformers 4.53.0
	- Pytorch 2.7.0+cu126
	- Datasets 3.6.0
	- Tokenizers 0.21.1