549aaa3ce14cc70304a30ca95b87074c

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [en-fi] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9341
  • Data Size: 1.0
  • Epoch Runtime (s): 46.6996
  • BLEU: 0.9318
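
The card ships without a usage example, so the following is a minimal inference sketch, assuming the hosted repository id contemmcm/549aaa3ce14cc70304a30ca95b87074c and an English→Finnish translation task inferred from the opus_books [en-fi] fine-tune; the example sentence is illustrative only:

```python
# Minimal inference sketch (not part of the original card).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/549aaa3ce14cc70304a30ca95b87074c"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# A T5-style task prefix may or may not be required, depending on how the
# (unpublished) training script preprocessed its inputs.
text = "The house stood on a hill overlooking the sea."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```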

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
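
The card leaves this section blank. For orientation, a sketch of loading the dataset named above; the split and preprocessing actually used for this run are undocumented:

```python
from datasets import load_dataset

# opus_books exposes a single "train" split per language pair; any
# train/validation partitioning used for this run is not documented.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-fi")
example = dataset["train"][0]["translation"]
print(example["en"], "->", example["fi"])
```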

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
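
The list above maps directly onto transformers' Seq2SeqTrainingArguments. A minimal sketch of that mapping follows; the actual training script is not published, and output_dir and predict_with_generate are assumptions:

```python
# Hypothetical reconstruction of the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-fi",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = 32 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumption: needed for BLEU evaluation
)
```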

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | BLEU   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-----------------:|:------:|
| No log        | 0     | 0    | 211.0632        | 0         | 3.5779            | 0.0016 |
| No log        | 1     | 91   | 188.7779        | 0.0078    | 4.1075            | 0.0019 |
| No log        | 2     | 182  | 169.3628        | 0.0156    | 5.8967            | 0.0022 |
| No log        | 3     | 273  | 149.8084        | 0.0312    | 7.5024            | 0.0022 |
| No log        | 4     | 364  | 117.3038        | 0.0625    | 9.7911            | 0.0021 |
| No log        | 5     | 455  | 67.4181         | 0.125     | 13.6960           | 0.0019 |
| No log        | 6     | 546  | 29.9486         | 0.25      | 18.0287           | 0.0028 |
| 9.5787        | 7     | 637  | 14.3971         | 0.5       | 28.2492           | 0.0080 |
| 19.5655       | 8     | 728  | 10.7885         | 1.0       | 47.3220           | 0.0082 |
| 14.8619       | 9     | 819  | 8.7578          | 1.0       | 46.3086           | 0.0080 |
| 12.7107       | 10    | 910  | 8.1946          | 1.0       | 46.3137           | 0.0144 |
| 11.1516       | 11    | 1001 | 7.0157          | 1.0       | 46.2185           | 0.0318 |
| 10.7591       | 12    | 1092 | 6.7510          | 1.0       | 45.4156           | 0.0190 |
| 9.8214        | 13    | 1183 | 6.1254          | 1.0       | 46.6275           | 0.0629 |
| 9.0841        | 14    | 1274 | 6.1219          | 1.0       | 45.7993           | 0.0495 |
| 8.5117        | 15    | 1365 | 5.4160          | 1.0       | 46.0273           | 0.0841 |
| 8.0009        | 16    | 1456 | 5.3270          | 1.0       | 46.2666           | 0.1435 |
| 7.7391        | 17    | 1547 | 5.1370          | 1.0       | 46.2303           | 0.1310 |
| 7.3406        | 18    | 1638 | 4.9944          | 1.0       | 46.1756           | 0.1327 |
| 6.9568        | 19    | 1729 | 4.6342          | 1.0       | 45.9808           | 0.2060 |
| 6.6468        | 20    | 1820 | 4.5277          | 1.0       | 46.7813           | 0.2704 |
| 6.3802        | 21    | 1911 | 4.3945          | 1.0       | 46.3165           | 0.2054 |
| 6.0991        | 22    | 2002 | 4.2401          | 1.0       | 46.1848           | 0.1644 |
| 5.9951        | 23    | 2093 | 4.0383          | 1.0       | 45.9308           | 0.3349 |
| 5.7585        | 24    | 2184 | 4.0444          | 1.0       | 46.1250           | 0.3045 |
| 5.5825        | 25    | 2275 | 4.0388          | 1.0       | 46.0497           | 0.2623 |
| 5.3925        | 26    | 2366 | 3.9272          | 1.0       | 45.7610           | 0.4223 |
| 5.2605        | 27    | 2457 | 3.7667          | 1.0       | 46.2910           | 0.3692 |
| 5.1225        | 28    | 2548 | 3.7613          | 1.0       | 45.6301           | 0.4512 |
| 4.9741        | 29    | 2639 | 3.6562          | 1.0       | 46.4286           | 0.3588 |
| 4.8417        | 30    | 2730 | 3.5531          | 1.0       | 46.1431           | 0.5768 |
| 4.6741        | 31    | 2821 | 3.6014          | 1.0       | 46.8742           | 0.4160 |
| 4.6209        | 32    | 2912 | 3.4647          | 1.0       | 46.7312           | 0.6189 |
| 4.4791        | 33    | 3003 | 3.4088          | 1.0       | 46.6921           | 0.7235 |
| 4.346         | 34    | 3094 | 3.3616          | 1.0       | 45.7728           | 0.5786 |
| 4.311         | 35    | 3185 | 3.3856          | 1.0       | 47.5342           | 0.5463 |
| 4.1986        | 36    | 3276 | 3.3260          | 1.0       | 46.0606           | 0.6678 |
| 4.1229        | 37    | 3367 | 3.2019          | 1.0       | 46.4823           | 0.7473 |
| 4.0017        | 38    | 3458 | 3.2359          | 1.0       | 46.6339           | 0.6610 |
| 3.9321        | 39    | 3549 | 3.1598          | 1.0       | 46.3126           | 0.7617 |
| 3.8785        | 40    | 3640 | 3.1266          | 1.0       | 45.4446           | 0.8805 |
| 3.7769        | 41    | 3731 | 3.0599          | 1.0       | 46.1482           | 0.7153 |
| 3.7148        | 42    | 3822 | 3.0918          | 1.0       | 45.4459           | 0.8746 |
| 3.6051        | 43    | 3913 | 3.0205          | 1.0       | 46.5096           | 0.9528 |
| 3.5761        | 44    | 4004 | 3.0466          | 1.0       | 46.2574           | 0.8161 |
| 3.5664        | 45    | 4095 | 2.9364          | 1.0       | 46.3092           | 1.0224 |
| 3.4499        | 46    | 4186 | 2.9338          | 1.0       | 45.6281           | 0.9838 |
| 3.3732        | 47    | 4277 | 2.9169          | 1.0       | 45.4441           | 0.9263 |
| 3.3295        | 48    | 4368 | 2.9877          | 1.0       | 46.7569           | 0.7916 |
| 3.2911        | 49    | 4459 | 2.9030          | 1.0       | 46.8190           | 0.7812 |
| 3.2407        | 50    | 4550 | 2.9341          | 1.0       | 46.6996           | 0.9318 |
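
The BLEU column implies that evaluation decodes generated predictions and scores them against reference translations; the magnitudes are consistent with sacreBLEU's 0–100 scale. The evaluation code is not published with the card, so the following compute_metrics hook (and the sacreBLEU choice itself) is an assumption:

```python
# Hypothetical metric hook; the card does not include the evaluation code.
import evaluate
import numpy as np

bleu = evaluate.load("sacrebleu")

def build_compute_metrics(tokenizer):
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        # Labels are padded with -100 by the data collator; restore the
        # pad token id before decoding.
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        result = bleu.compute(
            predictions=decoded_preds,
            references=[[label] for label in decoded_labels],
        )
        return {"bleu": result["score"]}
    return compute_metrics
```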

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1