862ec78bac8109fd96f6d604da3f9089

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [en-sv] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0722
  • Data Size: 1.0
  • Epoch Runtime: 39.7266
  • BLEU: 0.5534

Model description

More information needed

Intended uses & limitations

More information needed
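The card leaves usage undocumented, but since this is a LongT5 seq2seq checkpoint fine-tuned for English→Swedish translation, it should load with the standard Transformers classes. A minimal inference sketch, assuming the repository ID shown on this page and that no task prefix is required (the training preprocessing is not documented):

```python
# Minimal inference sketch; assumes standard LongT5 seq2seq conventions.
# Whether a task prefix was used during fine-tuning is not documented.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/862ec78bac8109fd96f6d604da3f9089"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate English -> Swedish.
inputs = tokenizer("The book lay open on the table.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```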

Training and evaluation data

More information needed
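The parallel corpus named above is public, so the data can be inspected even though the exact split used here is not documented. A sketch of loading it with the datasets library (the 10% validation holdout below is an illustrative assumption, not the card's actual split):

```python
# Sketch: load the en-sv book-translation pairs named on this card.
# opus_books ships a single "train" split; the 10% holdout below is an
# illustrative assumption, not the split actually used for this model.
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "en-sv")
split = ds["train"].train_test_split(test_size=0.1, seed=42)
print(split["train"][0]["translation"])  # {'en': '...', 'sv': '...'}
```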

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto the Trainer API follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
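A hedged sketch of how the listed values map onto Seq2SeqTrainingArguments; the per-device batch size of 8 across 4 GPUs yields the reported total of 32, and the output directory name is hypothetical:

```python
# Sketch mapping the listed hyperparameters onto the HF Trainer API.
# A 4-GPU distributed launch (e.g. torchrun) is assumed; that is what
# turns per_device_train_batch_size=8 into the total batch size of 32.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="longt5-opus-books-en-sv",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```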

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 223.9917 | 0 | 3.2371 | 0.0024 |
| No log | 1 | 77 | 202.5292 | 0.0078 | 3.8995 | 0.0043 |
| No log | 2 | 154 | 178.5526 | 0.0156 | 5.0033 | 0.0026 |
| No log | 3 | 231 | 154.7477 | 0.0312 | 6.6561 | 0.0027 |
| No log | 4 | 308 | 122.7218 | 0.0625 | 8.8027 | 0.0024 |
| No log | 5 | 385 | 71.8897 | 0.125 | 12.3707 | 0.0040 |
| 11.0887 | 6 | 462 | 29.8310 | 0.25 | 16.7527 | 0.0041 |
| 15.6486 | 7 | 539 | 15.4986 | 0.5 | 25.3639 | 0.0030 |
| 20.3585 | 8 | 616 | 11.0047 | 1.0 | 40.4686 | 0.0045 |
| 17.1318 | 9 | 693 | 9.3429 | 1.0 | 40.0204 | 0.0144 |
| 13.9398 | 10 | 770 | 8.7452 | 1.0 | 40.0600 | 0.0107 |
| 12.9449 | 11 | 847 | 7.9409 | 1.0 | 41.2014 | 0.0120 |
| 11.4343 | 12 | 924 | 7.5845 | 1.0 | 40.5638 | 0.0138 |
| 10.4966 | 13 | 1001 | 6.5332 | 1.0 | 40.3904 | 0.0194 |
| 10.1049 | 14 | 1078 | 6.4489 | 1.0 | 40.2994 | 0.0356 |
| 9.3497 | 15 | 1155 | 6.2815 | 1.0 | 39.4442 | 0.0545 |
| 9.0683 | 16 | 1232 | 5.8145 | 1.0 | 39.6385 | 0.0658 |
| 8.4766 | 17 | 1309 | 5.8225 | 1.0 | 39.6333 | 0.0533 |
| 8.2237 | 18 | 1386 | 5.3779 | 1.0 | 39.1000 | 0.1326 |
| 7.806 | 19 | 1463 | 5.2641 | 1.0 | 39.4449 | 0.1065 |
| 7.5591 | 20 | 1540 | 5.1220 | 1.0 | 39.8956 | 0.0928 |
| 7.1678 | 21 | 1617 | 5.0954 | 1.0 | 38.9583 | 0.0739 |
| 7.0647 | 22 | 1694 | 4.8270 | 1.0 | 39.4165 | 0.0900 |
| 6.6655 | 23 | 1771 | 4.7228 | 1.0 | 38.8910 | 0.1248 |
| 6.4936 | 24 | 1848 | 4.5843 | 1.0 | 39.0794 | 0.1653 |
| 6.2763 | 25 | 1925 | 4.6011 | 1.0 | 38.8424 | 0.2135 |
| 6.0459 | 26 | 2002 | 4.3207 | 1.0 | 39.4661 | 0.2475 |
| 5.8962 | 27 | 2079 | 4.1732 | 1.0 | 38.6549 | 0.2606 |
| 5.6867 | 28 | 2156 | 4.2663 | 1.0 | 39.7735 | 0.2386 |
| 5.5789 | 29 | 2233 | 4.2129 | 1.0 | 38.3106 | 0.1862 |
| 5.3694 | 30 | 2310 | 4.0500 | 1.0 | 39.1182 | 0.3258 |
| 5.2994 | 31 | 2387 | 3.8977 | 1.0 | 39.5137 | 0.2466 |
| 5.097 | 32 | 2464 | 3.7638 | 1.0 | 39.5681 | 0.3649 |
| 5.0502 | 33 | 2541 | 3.6888 | 1.0 | 38.6806 | 0.4253 |
| 4.9306 | 34 | 2618 | 3.8535 | 1.0 | 39.7200 | 0.2963 |
| 4.7802 | 35 | 2695 | 3.6710 | 1.0 | 38.4306 | 0.4275 |
| 4.6324 | 36 | 2772 | 3.6098 | 1.0 | 38.7097 | 0.3102 |
| 4.6086 | 37 | 2849 | 3.6183 | 1.0 | 38.4791 | 0.3835 |
| 4.4612 | 38 | 2926 | 3.4896 | 1.0 | 39.1358 | 0.4617 |
| 4.3899 | 39 | 3003 | 3.4306 | 1.0 | 39.7051 | 0.4606 |
| 4.3095 | 40 | 3080 | 3.4122 | 1.0 | 38.4065 | 0.4714 |
| 4.1906 | 41 | 3157 | 3.3982 | 1.0 | 39.5659 | 0.3783 |
| 4.1157 | 42 | 3234 | 3.3071 | 1.0 | 39.3906 | 0.5062 |
| 4.0427 | 43 | 3311 | 3.3112 | 1.0 | 39.0938 | 0.4346 |
| 3.9858 | 44 | 3388 | 3.3214 | 1.0 | 39.8291 | 0.4165 |
| 3.894 | 45 | 3465 | 3.3178 | 1.0 | 39.3775 | 0.4522 |
| 3.8483 | 46 | 3542 | 3.2357 | 1.0 | 40.2369 | 0.4297 |
| 3.7456 | 47 | 3619 | 3.1341 | 1.0 | 39.2569 | 0.5572 |
| 3.7164 | 48 | 3696 | 3.0997 | 1.0 | 39.4336 | 0.6396 |
| 3.6318 | 49 | 3773 | 3.0787 | 1.0 | 40.4189 | 0.5809 |
| 3.5901 | 50 | 3850 | 3.0722 | 1.0 | 39.7266 | 0.5534 |
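The card does not say which scorer produced the BLEU column or whether it is on a 0-1 or 0-100 scale. A common choice in Transformers translation examples is sacrebleu via the evaluate library; a sketch under that assumption:

```python
# Sketch: BLEU scoring with evaluate's sacrebleu wrapper. Whether this
# card's training loop used the same metric (or the same scale) is not
# documented, so treat this as illustrative only.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Boken låg öppen på bordet."]        # model outputs
references = [["Boken låg uppslagen på bordet."]]   # one reference list per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacrebleu reports on a 0-100 scale
```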

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model weights

  • Format: Safetensors
  • Model size: 0.8B params
  • Tensor type: F32