038d5148187e070b975e9031342ef73a

This model is a fine-tuned version of google/long-t5-local-large on the fr-pt (French-Portuguese) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5345
  • Data Size: 1.0 (fraction of the training set used)
  • Epoch Runtime: 20.1158
  • BLEU: 0.7389
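
For quick inspection, the checkpoint loads like any other seq2seq Transformers model. The snippet below is a minimal sketch: the repository id comes from this card, but the expected input format (plain French source text with no task prefix) is an assumption, since the card does not document it.

```python
# Minimal inference sketch. The repo id comes from this card; feeding the
# French source sentence directly (no task prefix) is an assumption.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/038d5148187e070b975e9031342ef73a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```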

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
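
The card names the dataset but not the preprocessing or the train/evaluation split. A plausible loading sketch follows; the 90/10 split is an assumption, since Helsinki-NLP/opus_books ships only a train split and the card does not say how the evaluation set was derived.

```python
# Loading sketch for the named dataset. The 90/10 split and seed are
# assumptions; opus_books ships only a "train" split.
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "fr-pt")
splits = ds["train"].train_test_split(test_size=0.1, seed=42)
print(splits["train"][0]["translation"])  # {'fr': '...', 'pt': '...'}
```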

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
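
These values map directly onto Hugging Face Seq2SeqTrainingArguments. The sketch below is a hedged reconstruction rather than the published training script; the output directory name and the predict_with_generate flag are assumptions (the latter is the usual way BLEU gets computed during evaluation).

```python
# Hedged reconstruction of the hyperparameters listed above; not the
# author's actual script. output_dir and predict_with_generate are guesses.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-fr-pt",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed, to compute BLEU at eval time
)
```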

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 236.7255 | 0 | 1.7678 | 0.0204 |
| No log | 1 | 31 | 219.1461 | 0.0078 | 2.4638 | 0.0188 |
| No log | 2 | 62 | 201.9752 | 0.0156 | 3.1682 | 0.0124 |
| No log | 3 | 93 | 187.6383 | 0.0312 | 4.4594 | 0.0122 |
| No log | 4 | 124 | 163.7096 | 0.0625 | 6.4885 | 0.0215 |
| No log | 5 | 155 | 130.6493 | 0.125 | 8.8677 | 0.0198 |
| No log | 6 | 186 | 79.7999 | 0.25 | 11.0911 | 0.0040 |
| 19.2232 | 7 | 217 | 37.5892 | 0.5 | 15.1334 | 0.0052 |
| 19.2232 | 8 | 248 | 17.3204 | 1.0 | 21.9644 | 0.0283 |
| 27.1063 | 9 | 279 | 14.9082 | 1.0 | 20.2933 | 0.2114 |
| 23.9019 | 10 | 310 | 12.7926 | 1.0 | 20.2024 | 0.2739 |
| 23.9019 | 11 | 341 | 11.3982 | 1.0 | 19.4193 | 0.2784 |
| 18.8143 | 12 | 372 | 10.3658 | 1.0 | 19.5572 | 0.0558 |
| 16.1903 | 13 | 403 | 9.2900 | 1.0 | 20.0680 | 0.1722 |
| 16.1903 | 14 | 434 | 9.6809 | 1.0 | 20.0169 | 0.0760 |
| 14.6433 | 15 | 465 | 8.9208 | 1.0 | 19.4845 | 0.0727 |
| 14.6433 | 16 | 496 | 8.0822 | 1.0 | 19.7651 | 0.0400 |
| 13.4211 | 17 | 527 | 8.3107 | 1.0 | 19.3867 | 0.0532 |
| 12.4357 | 18 | 558 | 8.0146 | 1.0 | 19.9804 | 0.0798 |
| 12.4357 | 19 | 589 | 7.4550 | 1.0 | 20.1282 | 0.2350 |
| 11.5437 | 20 | 620 | 7.0633 | 1.0 | 20.3499 | 0.3080 |
| 10.7571 | 21 | 651 | 6.6771 | 1.0 | 19.6223 | 0.2713 |
| 10.7571 | 22 | 682 | 6.4946 | 1.0 | 19.9972 | 0.3405 |
| 10.1704 | 23 | 713 | 6.5607 | 1.0 | 19.7047 | 0.3599 |
| 10.1704 | 24 | 744 | 6.3228 | 1.0 | 20.5115 | 0.4006 |
| 9.5748 | 25 | 775 | 6.2065 | 1.0 | 20.2679 | 0.4567 |
| 9.1474 | 26 | 806 | 6.1813 | 1.0 | 19.4314 | 0.2706 |
| 9.1474 | 27 | 837 | 6.0840 | 1.0 | 19.2637 | 0.3095 |
| 8.7856 | 28 | 868 | 5.8688 | 1.0 | 19.8345 | 0.4495 |
| 8.7856 | 29 | 899 | 5.6269 | 1.0 | 19.7177 | 0.5039 |
| 8.3737 | 30 | 930 | 5.6042 | 1.0 | 19.7356 | 0.5152 |
| 8.0599 | 31 | 961 | 5.6617 | 1.0 | 19.7495 | 0.4316 |
| 8.0599 | 32 | 992 | 5.7713 | 1.0 | 19.9040 | 0.5404 |
| 7.809 | 33 | 1023 | 5.4166 | 1.0 | 20.1333 | 0.5372 |
| 7.4953 | 34 | 1054 | 5.3529 | 1.0 | 19.5804 | 0.5465 |
| 7.4953 | 35 | 1085 | 5.4388 | 1.0 | 19.7150 | 0.5503 |
| 7.1911 | 36 | 1116 | 5.0335 | 1.0 | 19.7163 | 0.5790 |
| 7.1911 | 37 | 1147 | 5.1159 | 1.0 | 19.9428 | 0.6223 |
| 6.9716 | 38 | 1178 | 5.0591 | 1.0 | 20.3290 | 0.4831 |
| 6.7211 | 39 | 1209 | 4.8556 | 1.0 | 20.0206 | 0.6963 |
| 6.7211 | 40 | 1240 | 5.0075 | 1.0 | 20.1885 | 0.6027 |
| 6.5236 | 41 | 1271 | 4.7169 | 1.0 | 21.1392 | 0.7565 |
| 6.3467 | 42 | 1302 | 4.7961 | 1.0 | 20.7119 | 0.6435 |
| 6.3467 | 43 | 1333 | 4.6043 | 1.0 | 19.7915 | 0.6827 |
| 6.1671 | 44 | 1364 | 4.5811 | 1.0 | 19.8226 | 0.7857 |
| 6.1671 | 45 | 1395 | 4.7868 | 1.0 | 19.8787 | 0.6117 |
| 6.0261 | 46 | 1426 | 4.5726 | 1.0 | 20.7483 | 0.6840 |
| 5.8656 | 47 | 1457 | 4.4508 | 1.0 | 20.1728 | 0.7430 |
| 5.8656 | 48 | 1488 | 4.4236 | 1.0 | 19.9059 | 0.8194 |
| 5.6751 | 49 | 1519 | 4.4156 | 1.0 | 20.3834 | 0.7308 |
| 5.5485 | 50 | 1550 | 4.5345 | 1.0 | 20.1158 | 0.7389 |
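
The BLEU column appears to be on a 0-1 scale, and the card does not state which implementation produced it. A hypothetical check with the evaluate library's bleu metric, which reports on the same 0-1 scale:

```python
# Hypothetical BLEU computation; the card does not say which BLEU
# implementation produced the column above, and these sentences are made up.
import evaluate

bleu = evaluate.load("bleu")
predictions = ["o gato está no tapete"]        # hypothetical model output
references = [["o gato está sobre o tapete"]]  # hypothetical reference
print(bleu.compute(predictions=predictions, references=references)["bleu"])
```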

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Model size: 0.8B parameters (F32, Safetensors)