93dac85b8013dcf7e10d3d38ebb59e2a

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [en-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2872
  • Data Size: 1.0
  • Epoch Runtime: 349.0586
  • Bleu: 1.8235
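
Below is a minimal inference sketch in Python, assuming the checkpoint is published under the repo id shown on this page; it is not part of the original card. Whether a task prefix (e.g. "translate English to Italian: ") was used during preprocessing is not documented, so none is added here.

```python
# Minimal usage sketch (not from the card). If fine-tuning used a source
# prefix, prepend it to the input text before tokenizing.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/93dac85b8013dcf7e10d3d38ebb59e2a"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("The book is on the table.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```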

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
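
For reference, a sketch of how these hyperparameters might map onto transformers' Seq2SeqTrainingArguments. The actual training script is not part of this card, so anything beyond the values listed above (including the output directory name) is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the listed hyperparameters. With 4 GPUs, per-device
# batch sizes of 8 yield the listed total batch size of 32 (8 x 4, no
# gradient accumulation). The multi-GPU launch itself (e.g. torchrun or
# accelerate) is handled outside these arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-it",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```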

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 235.5034 | 0 | 25.3887 | 0.0103 |
| No log | 1 | 808 | 155.7481 | 0.0078 | 28.3509 | 0.0090 |
| No log | 2 | 1616 | 85.2883 | 0.0156 | 31.4181 | 0.0014 |
| No log | 3 | 2424 | 28.7324 | 0.0312 | 37.6448 | 0.0034 |
| 2.4881 | 4 | 3232 | 15.2552 | 0.0625 | 48.8077 | 0.1189 |
| 21.1557 | 5 | 4040 | 12.0810 | 0.125 | 69.8835 | 0.2116 |
| 14.5938 | 6 | 4848 | 8.4854 | 0.25 | 110.5674 | 0.0080 |
| 9.8749 | 7 | 5656 | 6.7510 | 0.5 | 188.5840 | 0.0297 |
| 6.8205 | 8 | 6464 | 4.6572 | 1.0 | 347.2873 | 0.0850 |
| 5.4199 | 9 | 7272 | 3.9670 | 1.0 | 348.2967 | 0.1064 |
| 4.6668 | 10 | 8080 | 3.4696 | 1.0 | 345.4658 | 0.1911 |
| 4.2211 | 11 | 8888 | 3.2600 | 1.0 | 349.3905 | 0.2933 |
| 3.8767 | 12 | 9696 | 3.1371 | 1.0 | 349.5232 | 0.3225 |
| 3.6482 | 13 | 10504 | 3.0063 | 1.0 | 348.4620 | 0.4492 |
| 3.4935 | 14 | 11312 | 2.9206 | 1.0 | 345.3365 | 0.4697 |
| 3.3373 | 15 | 12120 | 2.8382 | 1.0 | 348.1898 | 0.6622 |
| 3.2697 | 16 | 12928 | 2.7994 | 1.0 | 348.8646 | 0.4981 |
| 3.1322 | 17 | 13736 | 2.7335 | 1.0 | 345.8731 | 0.6365 |
| 3.0663 | 18 | 14544 | 2.6849 | 1.0 | 349.8127 | 0.7391 |
| 2.9875 | 19 | 15352 | 2.6387 | 1.0 | 349.2950 | 0.7679 |
| 2.9183 | 20 | 16160 | 2.6118 | 1.0 | 351.4093 | 0.8138 |
| 2.8317 | 21 | 16968 | 2.5693 | 1.0 | 347.5031 | 0.9289 |
| 2.812 | 22 | 17776 | 2.5282 | 1.0 | 346.9961 | 0.9154 |
| 2.7867 | 23 | 18584 | 2.5255 | 1.0 | 349.6256 | 0.9863 |
| 2.683 | 24 | 19392 | 2.4789 | 1.0 | 351.8222 | 1.0570 |
| 2.6524 | 25 | 20200 | 2.4598 | 1.0 | 346.3992 | 1.0322 |
| 2.5791 | 26 | 21008 | 2.4307 | 1.0 | 347.3062 | 1.1019 |
| 2.5693 | 27 | 21816 | 2.4167 | 1.0 | 348.5370 | 1.1795 |
| 2.4995 | 28 | 22624 | 2.4055 | 1.0 | 347.6225 | 1.1704 |
| 2.4903 | 29 | 23432 | 2.3919 | 1.0 | 349.8536 | 1.2286 |
| 2.4173 | 30 | 24240 | 2.3657 | 1.0 | 351.1305 | 1.2316 |
| 2.4013 | 31 | 25048 | 2.3593 | 1.0 | 350.8515 | 1.3380 |
| 2.3653 | 32 | 25856 | 2.3391 | 1.0 | 347.7500 | 1.3574 |
| 2.3307 | 33 | 26664 | 2.3380 | 1.0 | 347.4459 | 1.3520 |
| 2.275 | 34 | 27472 | 2.3258 | 1.0 | 348.2127 | 1.4165 |
| 2.2421 | 35 | 28280 | 2.3144 | 1.0 | 350.9273 | 1.4770 |
| 2.2205 | 36 | 29088 | 2.2968 | 1.0 | 348.9591 | 1.5379 |
| 2.1887 | 37 | 29896 | 2.2962 | 1.0 | 347.4848 | 1.5300 |
| 2.1615 | 38 | 30704 | 2.2939 | 1.0 | 347.7210 | 1.5632 |
| 2.0893 | 39 | 31512 | 2.2868 | 1.0 | 346.9154 | 1.5886 |
| 2.0711 | 40 | 32320 | 2.2774 | 1.0 | 345.7959 | 1.5778 |
| 2.0476 | 41 | 33128 | 2.2797 | 1.0 | 350.1582 | 1.6413 |
| 2.0284 | 42 | 33936 | 2.2766 | 1.0 | 347.6871 | 1.6602 |
| 1.9713 | 43 | 34744 | 2.2738 | 1.0 | 345.1102 | 1.7006 |
| 1.9567 | 44 | 35552 | 2.2731 | 1.0 | 349.2087 | 1.7527 |
| 1.9156 | 45 | 36360 | 2.2725 | 1.0 | 347.9890 | 1.7388 |
| 1.9013 | 46 | 37168 | 2.2802 | 1.0 | 347.8241 | 1.7679 |
| 1.8791 | 47 | 37976 | 2.2744 | 1.0 | 347.6741 | 1.8353 |
| 1.8446 | 48 | 38784 | 2.2814 | 1.0 | 349.0220 | 1.8127 |
| 1.801 | 49 | 39592 | 2.2872 | 1.0 | 349.0586 | 1.8235 |
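
The card does not state which BLEU implementation produced the scores above; the sketch below assumes sacrebleu (which reports on a 0-100 scale) via the `evaluate` library, as is common for transformers translation fine-tunes.

```python
# Sketch of a BLEU computation of the kind reported above; sacrebleu is an
# assumption, and the sentences here are placeholders, not evaluation data.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Il libro è sul tavolo."]        # model outputs
references = [["Il libro è sopra il tavolo."]]  # one list of references per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```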

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1