	---
	base_model: ybelkada/flan-t5-xl-sharded-bf16
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	model-index:
	- name: flan-xl-gen2
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# flan-xl-gen2

	This model is a fine-tuned version of [ybelkada/flan-t5-xl-sharded-bf16](https://huggingface.co/ybelkada/flan-t5-xl-sharded-bf16) on an unspecified dataset (the Trainer recorded no dataset name).
	It achieves the following results on the evaluation set after the final epoch (epoch 10):
	- Loss: 0.7134
	- Rouge1: 32.8362
	- Rouge2: 24.6174
	- Rougel: 29.4825
	- Rougelsum: 29.8057
	- Gen Len: 10.8602
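
The card does not include a usage snippet, so the following is only a minimal loading sketch. It assumes the repository id is `devvanshhh/flan-xl-gen2`, that the repository holds full sequence-to-sequence weights and tokenizer files (not just an adapter), and that bf16 inference is acceptable on your hardware. The helper name `generate_text` and the `max_new_tokens` default are illustrative, not part of the original card.

```python
# Assumed repository id; adjust if the model lives elsewhere.
MODEL_ID = "devvanshhh/flan-xl-gen2"


def generate_text(text, max_new_tokens=32):
    """Run one input string through the fine-tuned model (hypothetical helper)."""
    # Imported lazily so this file can be read or imported without the
    # heavy dependencies installed.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: the weights are loaded in bf16, like the sharded base model.
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The reported Gen Len of about 11 tokens suggests short outputs, so a small `max_new_tokens` is probably sufficient. The expected input format is not documented in this card.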

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 10
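
As a rough illustration of how the hyperparameters above fit together, the sketch below maps them onto `Seq2SeqTrainingArguments` keyword names and reproduces the shape of a linear schedule with 500 warmup steps. Several points are assumptions: the total of 3620 optimizer steps is taken from the last row of the training results table, `train_batch_size` is treated as a per-device value, `predict_with_generate=True` is inferred from the ROUGE and Gen Len metrics, and the names `HPARAMS`, `linear_lr`, and `make_training_args` are hypothetical.

```python
# Hyperparameters from the list above, keyed by TrainingArguments names.
HPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 8,  # assumption: single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "num_train_epochs": 10,
}

# Assumption: 362 steps per epoch x 10 epochs, as in the results table.
TOTAL_STEPS = 3620


def linear_lr(step, peak=HPARAMS["learning_rate"],
              warmup=HPARAMS["warmup_steps"], total=TOTAL_STEPS):
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))


def make_training_args(output_dir="flan-xl-gen2"):
    """Build training arguments from HPARAMS (requires transformers)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        predict_with_generate=True,  # assumption: needed for ROUGE / Gen Len
        **HPARAMS,
    )
```

Under these assumptions the learning rate peaks at 3e-05 at step 500, partway through epoch 2, and decays linearly to zero by step 3620.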

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
	|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
	| No log | 1.0 | 362 | 16.9256 | 19.0988 | 13.2345 | 16.7051 | 16.8227 | 14.5155 |
	| 21.9637 | 2.0 | 724 | 0.9058 | 25.6321 | 19.4333 | 22.9915 | 23.0319 | 12.2298 |
	| 1.1153 | 3.0 | 1086 | 0.8224 | 33.772 | 27.2536 | 30.9184 | 30.9024 | 9.3851 |
	| 1.1153 | 4.0 | 1448 | 0.7790 | 31.8945 | 24.0796 | 28.6922 | 28.9082 | 10.7081 |
	| 0.8196 | 5.0 | 1810 | 0.7526 | 32.0479 | 23.9638 | 28.7508 | 28.9928 | 10.9565 |
	| 0.768 | 6.0 | 2172 | 0.7372 | 32.4934 | 24.2711 | 29.1369 | 29.4352 | 10.9130 |
	| 0.7461 | 7.0 | 2534 | 0.7262 | 33.7013 | 25.5198 | 30.3086 | 30.6278 | 10.4938 |
	| 0.7461 | 8.0 | 2896 | 0.7187 | 33.2769 | 25.0711 | 29.8857 | 30.1898 | 10.6925 |
	| 0.7247 | 9.0 | 3258 | 0.7143 | 32.9304 | 24.7808 | 29.6111 | 29.9039 | 10.8075 |
	| 0.7282 | 10.0 | 3620 | 0.7134 | 32.8362 | 24.6174 | 29.4825 | 29.8057 | 10.8602 |
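
The headline metrics in this card match the epoch 10 row, which has the lowest validation loss but not the highest ROUGE scores. The sketch below compares epochs under different selection criteria, using values copied from the table above. The names `EpochResult` and `best_epoch` are hypothetical, and the card does not say which criterion, if any, was used to choose the saved checkpoint.

```python
from collections import namedtuple

EpochResult = namedtuple(
    "EpochResult", "epoch val_loss rouge1 rouge2 rougeL rougeLsum gen_len"
)

# Rows copied from the training results table.
RESULTS = [
    EpochResult(1, 16.9256, 19.0988, 13.2345, 16.7051, 16.8227, 14.5155),
    EpochResult(2, 0.9058, 25.6321, 19.4333, 22.9915, 23.0319, 12.2298),
    EpochResult(3, 0.8224, 33.772, 27.2536, 30.9184, 30.9024, 9.3851),
    EpochResult(4, 0.7790, 31.8945, 24.0796, 28.6922, 28.9082, 10.7081),
    EpochResult(5, 0.7526, 32.0479, 23.9638, 28.7508, 28.9928, 10.9565),
    EpochResult(6, 0.7372, 32.4934, 24.2711, 29.1369, 29.4352, 10.9130),
    EpochResult(7, 0.7262, 33.7013, 25.5198, 30.3086, 30.6278, 10.4938),
    EpochResult(8, 0.7187, 33.2769, 25.0711, 29.8857, 30.1898, 10.6925),
    EpochResult(9, 0.7143, 32.9304, 24.7808, 29.6111, 29.9039, 10.8075),
    EpochResult(10, 0.7134, 32.8362, 24.6174, 29.4825, 29.8057, 10.8602),
]


def best_epoch(rows, metric, higher_is_better=True):
    """Return the epoch number with the best value for the named metric."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: getattr(row, metric)).epoch


lowest_loss_epoch = best_epoch(RESULTS, "val_loss", higher_is_better=False)
best_rouge1_epoch = best_epoch(RESULTS, "rouge1")
best_rouge2_epoch = best_epoch(RESULTS, "rouge2")
```

With these rows, validation loss is lowest at epoch 10, while Rouge1 and Rouge2 both peak at epoch 3. If ROUGE were the preferred criterion, the Trainer options `load_best_model_at_end` and `metric_for_best_model` would be one way to keep that checkpoint.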


	### Framework versions

	- Transformers 4.36.0.dev0
	- Pytorch 2.1.0+cu118
	- Datasets 2.15.0
	- Tokenizers 0.15.0