15ae4bc006637eef4fa384d44af115ec

This model is a fine-tuned version of facebook/opt-6.7b on the nyu-mll/glue [stsb] dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
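
The total batch sizes listed above follow from the per-device batch size and the number of devices. A minimal sketch of that arithmetic, assuming no gradient accumulation (the card does not list a `gradient_accumulation_steps` value, so the default of 1 here is an assumption, and `RunConfig` is a hypothetical helper, not part of any library):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the batch-related values in this card."""

    train_batch_size: int = 8   # per-device training batch size
    eval_batch_size: int = 8    # per-device evaluation batch size
    num_devices: int = 4        # GPUs used in the multi-GPU run
    gradient_accumulation_steps: int = 1  # assumption: not stated in the card

    @property
    def total_train_batch_size(self) -> int:
        # Examples consumed per optimizer step across all devices.
        return (
            self.train_batch_size
            * self.num_devices
            * self.gradient_accumulation_steps
        )

    @property
    def total_eval_batch_size(self) -> int:
        # Examples scored per evaluation step across all devices.
        return self.eval_batch_size * self.num_devices


config = RunConfig()
print(config.total_train_batch_size, config.total_eval_batch_size)
```

Under these assumptions both totals come out to 32, matching total_train_batch_size and total_eval_batch_size above.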

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	6.8833	0	4.1747	6.8847	2.2341	-2.0798
No log	1	179	72.1114	0.0078	5.1186	72.1107	8.2475	-31.2577
No log	2	358	2.5926	0.0156	17.1286	2.5933	1.3334	-0.1601
No log	3	537	2.7606	0.0312	34.0759	2.7613	1.3607	-0.2352
No log	4	716	10.4892	0.0625	40.6204	10.4896	2.8524	-3.6924
No log	5	895	2.2045	0.125	53.6411	2.2053	1.2639	0.0135
0.6145	6	1074	2.9288	0.25	70.8903	2.9295	1.4042	-0.3105
2.2497	7	1253	2.2179	0.5	74.5485	2.2187	1.2476	0.0075
1.8535	8.0	1432	2.1384	1.0	96.5732	2.1390	1.1953	0.0432
1.0909	9.0	1611	1.9532	1.0	97.3312	1.9538	1.1242	0.1260
0.7252	10.0	1790	2.2718	1.0	84.0102	2.2723	1.2055	-0.0165
0.4713	11.0	1969	1.8958	1.0	92.1167	1.8962	1.0977	0.1518
0.4459	12.0	2148	1.8245	1.0	94.9740	1.8251	1.0854	0.1836
0.2641	13.0	2327	2.0616	1.0	84.6851	2.0620	1.1344	0.0776
0.225	14.0	2506	1.8006	1.0	97.4633	1.8010	1.0588	0.1943
0.1828	15.0	2685	1.8810	1.0	84.5028	1.8814	1.0806	0.1584
0.1682	16.0	2864	1.8792	1.0	96.2382	1.8795	1.0832	0.1592
0.1424	17.0	3043	2.0620	1.0	97.8154	2.0622	1.1171	0.0775
0.1453	18.0	3222	2.2519	1.0	89.2239	2.2523	1.1908	-0.0075
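
The Mse, Mae, and R2 columns are standard regression metrics. The sketch below shows their usual definitions in plain Python; it is not the evaluation code used for this run, and the function names are illustrative. It also includes a small check based on the identity R2 = 1 - MSE / Var(y): dividing a row's Mse by (1 - R2) should give roughly the same label variance for every row.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def implied_label_variance(mse_value, r2_value):
    """Label variance implied by one table row, from R2 = 1 - MSE / Var(y)."""
    return mse_value / (1.0 - r2_value)


# Toy data (not from this model): always predicting the label mean gives R2 = 0.
labels = [1.0, 2.0, 3.0, 4.0]
mean_predictions = [2.5] * 4
print(mse(labels, mean_predictions), mae(labels, mean_predictions), r2(labels, mean_predictions))

# Consistency check on two rows of the table above (epoch 0 and epoch 18).
print(implied_label_variance(6.8847, -2.0798))
print(implied_label_variance(2.2523, -0.0075))
```

Both rows imply a label variance of about 2.235, so the reported Mse and R2 values are mutually consistent. A negative R2, as in the early epochs and at epoch 18, means the model's squared error on the evaluation set is larger than that of a constant prediction equal to the label mean.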
