flan-t5-large-train_r_aug_nq

This model is a fine-tuned version of google/flan-t5-large on an unknown dataset. Its evaluation-set results at each logged checkpoint are listed in the training results table.

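Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training: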

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 8
total_eval_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH, the PyTorch implementation) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 3.0
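The totals in the list follow from the per-device sizes: train total = per-device batch × devices × gradient accumulation steps (2 × 2 × 2 = 8), and eval total = per-device batch × devices (8 × 2 = 16), since evaluation does not accumulate gradients. The plain-Python sketch below makes that arithmetic explicit and adds a linear learning-rate schedule modeled on the lambda used by Transformers' `get_linear_schedule_with_warmup`. The `TrainingSetup` class is hypothetical, and both the zero warmup and the `total_steps` argument are assumptions, because the card states neither.

```python
from dataclasses import dataclass


@dataclass
class TrainingSetup:
    """Hypothetical container; field names mirror the hyperparameter list."""
    learning_rate: float = 5e-05
    train_batch_size: int = 2          # per device
    eval_batch_size: int = 8           # per device
    num_devices: int = 2
    gradient_accumulation_steps: int = 2
    warmup_steps: int = 0              # ASSUMPTION: the card lists no warmup

    @property
    def total_train_batch_size(self) -> int:
        # Examples consumed per optimizer step across all devices.
        return self.train_batch_size * self.num_devices * self.gradient_accumulation_steps

    @property
    def total_eval_batch_size(self) -> int:
        # Evaluation runs on every device but has no gradient accumulation.
        return self.eval_batch_size * self.num_devices

    def linear_lr(self, step: int, total_steps: int) -> float:
        """Learning rate at `step`: linear warmup, then linear decay to 0 at `total_steps`."""
        if step < self.warmup_steps:
            return self.learning_rate * step / max(1, self.warmup_steps)
        remaining = max(0, total_steps - step)
        return self.learning_rate * remaining / max(1, total_steps - self.warmup_steps)


setup = TrainingSetup()
print(setup.total_train_batch_size)  # 8
print(setup.total_eval_batch_size)   # 16
```

With zero warmup, `linear_lr(step, total_steps)` starts at 5e-05 and reaches 0 at the final step; the true `total_steps` for this run would have to come from the training logs.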

Training results

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time (s)	Gen Len
0.8443	0.2426	1000	0.7528	0.0327	43.0732
0.7599	0.4853	2000	0.7291	0.0327	44.8394
0.8173	0.7279	3000	0.7155	0.0327	44.2668
0.7728	0.9705	4000	0.7051	0.0327	46.6625
0.7575	1.2130	5000	0.7014	0.0327	45.4954
0.6728	1.4557	6000	0.6959	0.0327	44.1418
0.6547	1.6983	7000	0.6901	0.0327	44.7631
0.7072	1.9409	8000	0.6856	0.0327	48.0779
0.6204	2.1834	9000	0.6911	0.0327	46.9579
0.6185	2.4261	10000	0.6871	0.0327	45.6444
0.5904	2.6687	11000	0.6879	0.0327	46.1530
0.6852	2.9113	12000	0.6847	0.0327	46.8571
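The logged rows can be checked programmatically. The sketch below copies the Step, Epoch, and Validation Loss columns from the table, picks the row with the lowest validation loss, and estimates optimizer steps per epoch as the mean step/epoch ratio. That estimate is approximate because the Epoch column is rounded to four decimals; the helper names are illustrative.

```python
# (step, epoch, validation_loss) copied from the training results table.
ROWS = [
    (1000, 0.2426, 0.7528),
    (2000, 0.4853, 0.7291),
    (3000, 0.7279, 0.7155),
    (4000, 0.9705, 0.7051),
    (5000, 1.2130, 0.7014),
    (6000, 1.4557, 0.6959),
    (7000, 1.6983, 0.6901),
    (8000, 1.9409, 0.6856),
    (9000, 2.1834, 0.6911),
    (10000, 2.4261, 0.6871),
    (11000, 2.6687, 0.6879),
    (12000, 2.9113, 0.6847),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def estimated_steps_per_epoch(rows):
    """Mean step/epoch ratio; only an estimate, since epochs are rounded."""
    return sum(step / epoch for step, epoch, _ in rows) / len(rows)


print(best_checkpoint(ROWS))                     # (12000, 2.9113, 0.6847)
print(round(estimated_steps_per_epoch(ROWS)))    # 4122
```

On these values the lowest validation loss is at the last logged step, but it improves on step 8000 by less than 0.001, so validation loss is essentially flat over the third epoch.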

Model format: Safetensors
Model size: 0.8B params
Tensor type: F32
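The parameter count and tensor type determine the weight footprint: parameters × bytes per element. The sketch below uses the card's rounded 0.8B figure, so its results are approximate. The BF16 line is a hypothetical comparison, not a format this card provides. The training figure is a common rule of thumb and an assumption: it treats training as full-precision AdamW (weights, gradients, and two moment buffers, 16 bytes per parameter on each device replica) and ignores activations. The card does not state the training precision or memory use.

```python
# Bytes per element for F32 (the card's tensor type) and two common alternatives.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Storage for the weights alone: no gradients, optimizer state, or activations."""
    return num_params * BYTES_PER_ELEMENT[dtype]


def adamw_training_bytes(num_params: int) -> int:
    """Rough per-replica floor for full-F32 AdamW training (ASSUMPTION).

    Counts weights, gradients, and AdamW's two moment buffers (4 tensors x 4 bytes
    per parameter); activations and framework overhead are excluded.
    """
    return num_params * 4 * BYTES_PER_ELEMENT["F32"]


params = 800_000_000  # the card's rounded "0.8B params"
print("F32 weights:", weight_bytes(params, "F32") / 1e9, "GB")      # F32 weights: 3.2 GB
print("BF16 weights:", weight_bytes(params, "BF16") / 1e9, "GB")    # BF16 weights: 1.6 GB
print("AdamW training:", adamw_training_bytes(params) / 1e9, "GB")  # AdamW training: 12.8 GB
```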

