f480c99337292f41ae4e3aeb29048494

This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the mrpc configuration of the nyu-mll/glue dataset. On the evaluation set it achieves the following results, which match the final row (epoch 13) of the training results table:

Loss: 18.6442
Data Size: 1.0
Epoch Runtime: 172.3950
Accuracy: 0.6008
F1 Macro: 0.5693
Rouge1: 0.6014
Rouge2: 0.0
Rougel: 0.6011
Rougelsum: 0.6008
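
The figures above come from the last logged epoch, which is not the strongest row in the training results table. The sketch below copies four columns from that table and looks up the epoch with the lowest validation loss and the highest accuracy and macro F1. The helper name `best_epoch` is hypothetical and is not part of any library or of the original training script.

```python
# Rows copied from the "Training results" table:
# (epoch, validation loss, accuracy, macro F1).
ROWS = [
    (0, 5.4493, 0.6509, 0.4760),
    (1, 282.0146, 0.3349, 0.2509),
    (2, 63.0560, 0.6639, 0.3990),
    (3, 31.5455, 0.3349, 0.2509),
    (4, 8.6098, 0.3349, 0.2509),
    (5, 3.5260, 0.6651, 0.3994),
    (6, 3.6857, 0.3349, 0.2509),
    (7, 3.0130, 0.6651, 0.3994),
    (8, 2.5740, 0.6651, 0.3994),
    (9, 2.4549, 0.6810, 0.4784),
    (10, 2.8548, 0.6680, 0.5804),
    (11, 4.0298, 0.6568, 0.5854),
    (12, 5.0009, 0.6545, 0.5727),
    (13, 18.6442, 0.6008, 0.5693),
]


def best_epoch(rows, column, higher_is_better):
    """Return the epoch whose value in `column` is best (hypothetical helper)."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


lowest_loss_epoch = best_epoch(ROWS, column=1, higher_is_better=False)
best_accuracy_epoch = best_epoch(ROWS, column=2, higher_is_better=True)
best_f1_epoch = best_epoch(ROWS, column=3, higher_is_better=True)
final_epoch = ROWS[-1][0]

print("lowest validation loss at epoch", lowest_loss_epoch)
print("highest accuracy at epoch", best_accuracy_epoch)
print("highest macro F1 at epoch", best_f1_epoch)
print("reported (final) epoch", final_epoch)
```

Which of these epochs is preferable depends on the metric that matters for the downstream use; the card itself does not say which checkpoint was saved.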

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
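
As a rough illustration of how the listed values relate, the sketch below writes them as a plain dictionary and recomputes the total batch sizes. The keys follow the keyword names of `transformers.TrainingArguments`, but the original training script is not part of this card, so the mapping is an assumption. `gradient_accumulation_steps = 1` is also an assumption: it is not listed above, though it is consistent with 8 x 4 = 32.

```python
# Hyperparameters from the list above, keyed by TrainingArguments-style names.
# The mapping to these keyword names is an assumption, not the author's script.
hparams = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

num_devices = 4                   # from the list above
gradient_accumulation_steps = 1   # assumption: not stated in the card

# Effective batch size per optimizer step across all devices.
total_train_batch_size = (
    hparams["per_device_train_batch_size"]
    * num_devices
    * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only the device count matters.
total_eval_batch_size = hparams["per_device_eval_batch_size"] * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Note that num_epochs is listed as 50, while the training results table ends at epoch 13; the card does not explain why.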

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	5.4493	0	6.5133	0.6509	0.4760	0.6515	0.6504	0.6504
No log	1	114	282.0146	0.0078	7.6330	0.3349	0.2509	0.3343	0.3355	0.3349
No log	2	228	63.0560	0.0156	17.0010	0.6639	0.3990	0.6645	0.6639	0.6633
No log	3	342	31.5455	0.0312	29.1911	0.3349	0.2509	0.3343	0.3355	0.3349
1.624	4	456	8.6098	0.0625	42.8630	0.3349	0.2509	0.3343	0.3355	0.3349
1.624	5	570	3.5260	0.125	59.9365	0.6651	0.3994	0.6657	0.6645	0.6651
1.624	6	684	3.6857	0.25	76.1459	0.3349	0.2509	0.3343	0.3355	0.3349
0.9312	7	798	3.0130	0.5	109.3280	0.6651	0.3994	0.6657	0.6645	0.6651
2.917	8.0	912	2.5740	1.0	173.1737	0.6651	0.3994	0.6657	0.6645	0.6651
2.7939	9.0	1026	2.4549	1.0	173.7266	0.6810	0.4784	0.6810	0.6810	0.6810
1.8994	10.0	1140	2.8548	1.0	180.6811	0.6680	0.5804	0.6686	0.6675	0.6680
0.9845	11.0	1254	4.0298	1.0	184.2375	0.6568	0.5854	0.6568	0.6565	0.6562
0.5983	12.0	1368	5.0009	1.0	163.0190	0.6545	0.5727	0.6557	0.6548	0.6551
0.49	13.0	1482	18.6442	1.0	172.3950	0.6008	0.5693	0.6014	0.6011	0.6008
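
Several rows in the table alternate between accuracy 0.6651 and 0.3349, two values that sum to 1. One possible reading is that the model predicted a single class for every example at those epochs. This is an inference from the numbers, not something the card states. The sketch below checks that the logged macro F1 values are what a constant two-class predictor would produce; the function name is hypothetical.

```python
def macro_f1_constant_predictor(predicted_class_share):
    """Macro F1 over two classes when every example gets the same label.

    `predicted_class_share` is the fraction of evaluation examples whose true
    label equals the predicted one, which is also the accuracy in that case.
    """
    p = predicted_class_share
    # Predicted class: precision = p, recall = 1, so F1 = 2p / (1 + p).
    f1_predicted = 2 * p / (1 + p)
    # The other class is never predicted, so its F1 is counted as 0.
    f1_other = 0.0
    return (f1_predicted + f1_other) / 2


majority_f1 = round(macro_f1_constant_predictor(0.6651), 4)
minority_f1 = round(macro_f1_constant_predictor(0.3349), 4)

print(majority_f1)  # compare with the F1 Macro column where Accuracy is 0.6651
print(minority_f1)  # compare with the F1 Macro column where Accuracy is 0.3349
```

Under this reading, 0.6651 acts as a majority-class baseline for accuracy on this evaluation set.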

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.3.0
Tokenizers 0.22.1

Safetensors

Model size: 2B params
Tensor type: F32
