
library_name: transformers
license: other
base_model: facebook/opt-350m
tags:
  - generated_from_trainer
model-index:
  - name: 63a98eda63e5fbc2cdec5b0564ea1a23
    results: []

63a98eda63e5fbc2cdec5b0564ea1a23

This model is a fine-tuned version of facebook/opt-350m on the stsb subset of the nyu-mll/glue dataset. At the final logged epoch (13), it achieves the following results on the evaluation set:

Loss: 0.6123
Data Size: 1.0
Epoch Runtime: 29.0491
Mse: 0.6126
Mae: 0.5883
R2: 0.7260
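
For readers unfamiliar with the three regression metrics listed above, the following is a minimal sketch of how MSE, MAE and R2 are conventionally defined. It is not the evaluation code used for this model; the function name `regression_metrics` and the example scores are hypothetical and chosen only for illustration.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R2 for two equal-length sequences of floats."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-example prediction errors.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean label.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = mse * n
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "mae": mae, "r2": r2}


# Hypothetical similarity scores on the 0-5 STS-B scale (not model outputs).
gold = [0.0, 2.5, 5.0, 3.0]
pred = [0.5, 2.0, 4.5, 3.5]
metrics = regression_metrics(gold, pred)
```

Under these definitions R2 can be negative when predictions are worse than always predicting the mean label, which is consistent with the negative R2 values logged in the early epochs of the training results.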

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
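
The total batch sizes listed above follow from the per-device batch size and the number of devices. The short sketch below reproduces that arithmetic. The variable names are illustrative, and `gradient_accumulation_steps = 1` is an assumption, since the card does not list a gradient accumulation setting.

```python
# Values taken from the hyperparameter list in this card.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4

# Assumption: no gradient accumulation (the card does not list this setting).
gradient_accumulation_steps = 1

# Examples consumed per optimizer step across all devices.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)

# Examples scored per evaluation step across all devices.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both values come out to 32, matching total_train_batch_size and total_eval_batch_size as reported.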

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	25.4457	0	2.9664	25.4477	4.5155	-10.3837
No log	1	179	18.6834	0.0078	3.3274	18.6837	3.8334	-7.3579
No log	2	358	2.5488	0.0156	3.4913	2.5497	1.3295	-0.1406
No log	3	537	3.2142	0.0312	4.4894	3.2148	1.4542	-0.4381
No log	4	716	1.3107	0.0625	5.5536	1.3111	0.9318	0.4135
No log	5	895	1.0700	0.125	7.2865	1.0703	0.8595	0.5212
0.2005	6	1074	0.7364	0.25	10.4542	0.7369	0.6970	0.6704
0.8128	7	1253	0.6802	0.5	16.9874	0.6803	0.6656	0.6957
0.6184	8.0	1432	0.6413	1.0	29.6578	0.6416	0.6342	0.7130
0.372	9.0	1611	0.5367	1.0	28.9502	0.5370	0.5762	0.7598
0.2577	10.0	1790	0.5563	1.0	28.4188	0.5564	0.5878	0.7511
0.1893	11.0	1969	0.5885	1.0	29.1525	0.5887	0.6265	0.7366
0.1601	12.0	2148	0.7057	1.0	28.6711	0.7058	0.6645	0.6843
0.1365	13.0	2327	0.6123	1.0	29.0491	0.6126	0.5883	0.7260
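
The table can be scanned for the epoch with the lowest validation loss, which need not be the last one logged. The sketch below copies the Epoch, Validation Loss and R2 columns from the table and compares the best row with the final row. Whether the published weights correspond to the best or the final epoch is not stated in this card, so the code makes no assumption about that.

```python
# (epoch, validation_loss, r2) copied from the training results table.
rows = [
    (0, 25.4457, -10.3837),
    (1, 18.6834, -7.3579),
    (2, 2.5488, -0.1406),
    (3, 3.2142, -0.4381),
    (4, 1.3107, 0.4135),
    (5, 1.0700, 0.5212),
    (6, 0.7364, 0.6704),
    (7, 0.6802, 0.6957),
    (8, 0.6413, 0.7130),
    (9, 0.5367, 0.7598),
    (10, 0.5563, 0.7511),
    (11, 0.5885, 0.7366),
    (12, 0.7057, 0.6843),
    (13, 0.6123, 0.7260),
]

# Row with the lowest validation loss, and the last logged row.
best = min(rows, key=lambda row: row[1])
final = rows[-1]

print(f"best epoch:  {best[0]} (val loss {best[1]:.4f}, R2 {best[2]:.4f})")
print(f"final epoch: {final[0]} (val loss {final[1]:.4f}, R2 {final[2]:.4f})")
```

On these numbers the lowest validation loss is at epoch 9 (0.5367, R2 0.7598), while the final logged epoch 13 has a validation loss of 0.6123 and R2 of 0.7260. Training loss keeps falling over epochs 9 to 13 even as validation loss fluctuates above its minimum.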

Framework versions

Transformers 4.57.0
PyTorch 2.8.0+cu128
Datasets 4.3.0
Tokenizers 0.22.1