	---
	library_name: transformers
	license: apache-2.0
	base_model: allenai/led-base-16384
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	- bleu
	- precision
	- recall
	- f1
	model-index:
	- name: LED_sum_challenge
	results: []
	---


	# LED_sum_challenge

	This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.8042
	- Rouge1: 0.2495
	- Rouge2: 0.0724
	- Rougel: 0.1912
	- Rougelsum: 0.192
	- Gen Len: 20.5
	- Bleu: 0.0232
	- Precisions: 0.0926
	- Brevity Penalty: 0.6016
	- Length Ratio: 0.6631
	- Translation Length: 801.0
	- Reference Length: 1208.0
	- Precision: 0.8797
	- Recall: 0.8676
	- F1: 0.8736
	- Hashcode: roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0)
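
The length-related BLEU fields above are tied together by simple arithmetic. The following is a minimal sketch, assuming the standard BLEU brevity-penalty definition (the card does not say which BLEU implementation produced these numbers); the function name `brevity_penalty` is illustrative only.

```python
import math

# Values copied from the evaluation results listed above.
translation_length = 801.0   # total tokens in the generated summaries
reference_length = 1208.0    # total tokens in the reference summaries


def brevity_penalty(translation_len: float, reference_len: float) -> float:
    """Standard BLEU brevity penalty (assumed definition).

    Returns 1.0 when the output is longer than the reference,
    otherwise exp(1 - reference_len / translation_len).
    """
    if translation_len == 0:
        return 0.0
    if translation_len > reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / translation_len)


length_ratio = translation_length / reference_length
bp = brevity_penalty(translation_length, reference_length)

print(f"length ratio:    {length_ratio:.4f}")
print(f"brevity penalty: {bp:.4f}")
```

Under that assumption, the generated summaries are about two thirds as long as the references, and the brevity penalty scales the BLEU score down accordingly.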

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 10
	- mixed_precision_training: Native AMP
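
How these settings combine can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the training code: it assumes a single device and zero warmup steps (neither is stated in the card), takes the 7 optimizer steps per epoch from the Step column of the training results, and uses a hypothetical helper `linear_lr` to mimic a linear decay schedule.

```python
# Values copied from the hyperparameter list above.
learning_rate = 2e-05
train_batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 10

# Taken from the Step column of the training results (7, 14, ..., 70).
steps_per_epoch = 7

# Assumption: training ran on a single device.
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Total number of optimizer steps over the whole run.
total_steps = steps_per_epoch * num_epochs


def linear_lr(step: int, base_lr: float = learning_rate,
              total: int = total_steps, warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero (warmup_steps=0 is an assumption)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup_steps)


print("effective batch size:", total_train_batch_size)
print("total optimizer steps:", total_steps)
for step in (0, 35, 70):
    print(f"step {step:2d}: lr = {linear_lr(step):.2e}")
```

With these assumptions, the learning rate falls linearly from 2e-05 at step 0 to zero at step 70.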

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Precision | Recall | F1 | Hashcode |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:---------:|:------:|:------:|:---------------------------------------------------------:|
	| No log | 1.0 | 7 | 8.1739 | 0.2255 | 0.0527 | 0.1686 | 0.1688 | 21.0 | 0.0157 | 0.069 | 0.6607 | 0.707 | 854.0 | 1208.0 | 0.8668 | 0.8574 | 0.862 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 2.0 | 14 | 6.9457 | 0.2251 | 0.0588 | 0.1702 | 0.1685 | 20.7 | 0.0171 | 0.0737 | 0.6408 | 0.6921 | 836.0 | 1208.0 | 0.8737 | 0.8597 | 0.8666 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 3.0 | 21 | 5.4862 | 0.2391 | 0.0632 | 0.181 | 0.1805 | 20.52 | 0.021 | 0.0825 | 0.6431 | 0.6937 | 838.0 | 1208.0 | 0.8798 | 0.862 | 0.8708 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 4.0 | 28 | 4.7435 | 0.243 | 0.0758 | 0.1901 | 0.1892 | 20.72 | 0.0266 | 0.0886 | 0.6095 | 0.6689 | 808.0 | 1208.0 | 0.8775 | 0.8662 | 0.8717 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 5.0 | 35 | 4.3805 | 0.2557 | 0.0788 | 0.1924 | 0.1921 | 20.48 | 0.0248 | 0.1003 | 0.5857 | 0.6515 | 787.0 | 1208.0 | 0.8811 | 0.8686 | 0.8747 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 6.0 | 42 | 4.1441 | 0.2485 | 0.0701 | 0.1886 | 0.1894 | 20.52 | 0.0209 | 0.0929 | 0.5982 | 0.6606 | 798.0 | 1208.0 | 0.8816 | 0.868 | 0.8747 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 7.0 | 49 | 3.9952 | 0.2574 | 0.0713 | 0.1994 | 0.1997 | 20.54 | 0.0213 | 0.0954 | 0.6073 | 0.6672 | 806.0 | 1208.0 | 0.8811 | 0.8689 | 0.8749 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 8.0 | 56 | 3.8994 | 0.2524 | 0.067 | 0.192 | 0.192 | 20.58 | 0.0203 | 0.0908 | 0.614 | 0.6722 | 812.0 | 1208.0 | 0.8782 | 0.8675 | 0.8727 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 9.0 | 63 | 3.8355 | 0.2512 | 0.0676 | 0.1917 | 0.1925 | 20.54 | 0.0201 | 0.0901 | 0.6062 | 0.6664 | 805.0 | 1208.0 | 0.8793 | 0.8681 | 0.8736 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
	| No log | 10.0 | 70 | 3.8042 | 0.2495 | 0.0724 | 0.1912 | 0.192 | 20.5 | 0.0232 | 0.0926 | 0.6016 | 0.6631 | 801.0 | 1208.0 | 0.8797 | 0.8676 | 0.8736 | roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.53.0) |
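
The per-epoch rows above can be compared programmatically to see which epoch scores best on each metric. The sketch below copies a subset of columns from the table; the helper `best_epoch` and the dictionary layout are illustrative choices, not part of the original training setup.

```python
# (epoch, validation_loss, rouge1, rougel) copied from the table above.
rows = [
    (1, 8.1739, 0.2255, 0.1686),
    (2, 6.9457, 0.2251, 0.1702),
    (3, 5.4862, 0.2391, 0.1810),
    (4, 4.7435, 0.2430, 0.1901),
    (5, 4.3805, 0.2557, 0.1924),
    (6, 4.1441, 0.2485, 0.1886),
    (7, 3.9952, 0.2574, 0.1994),
    (8, 3.8994, 0.2524, 0.1920),
    (9, 3.8355, 0.2512, 0.1917),
    (10, 3.8042, 0.2495, 0.1912),
]

# Column index and whether a higher value is better.
metrics = {
    "validation_loss": (1, False),
    "rouge1": (2, True),
    "rougel": (3, True),
}


def best_epoch(column: int, higher_is_better: bool) -> int:
    """Return the epoch whose value in the given column is best."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


best_by = {name: best_epoch(col, hib) for name, (col, hib) in metrics.items()}

for name, epoch in best_by.items():
    print(f"best {name}: epoch {epoch}")
```

In this table the validation loss is lowest at the final epoch, while Rouge1 and Rougel peak at epoch 7. The gaps are small, so they may not be significant on an evaluation set of this size.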


	### Framework versions

	- Transformers 4.53.0
	- PyTorch 2.7.0+cu126
	- Datasets 3.6.0
	- Tokenizers 0.21.1