bb0d8a4cd61335946f97831cf1467ac1

This model is a fine-tuned version of Qwen/Qwen2.5-7B on the dim/tldr_news dataset. Its per-epoch results on the evaluation set are listed in the table of training results below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
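
The total batch sizes listed above are consistent with the per-device batch size multiplied by the number of devices. The following is a minimal sketch of that arithmetic, assuming `train_batch_size` and `eval_batch_size` are per-device values and that no gradient accumulation was used (the card does not list an accumulation setting). The function name `total_batch_size` and its `grad_accum_steps` parameter are illustrative, not part of any library.

```python
# Values copied from the hyperparameter list in this card.
hyperparameters = {
    "train_batch_size": 8,  # assumed to be per device
    "eval_batch_size": 8,   # assumed to be per device
    "num_devices": 4,
}


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size across all devices.

    grad_accum_steps defaults to 1 because the card lists no
    gradient accumulation setting (an assumption, not a stated fact).
    """
    return per_device * num_devices * grad_accum_steps


total_train = total_batch_size(
    hyperparameters["train_batch_size"], hyperparameters["num_devices"]
)
total_eval = total_batch_size(
    hyperparameters["eval_batch_size"], hyperparameters["num_devices"]
)

print(f"total_train_batch_size: {total_train}")
print(f"total_eval_batch_size: {total_eval}")
```

Both values come out to 32, matching `total_train_batch_size` and `total_eval_batch_size` in the list.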

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	38.6071	0	7.8504	0.1406	0.1058	0.1413	0.1399	0.1406
No log	1	178	85.3054	0.0078	10.0640	0.3906	0.2135	0.3906	0.3913	0.3913
No log	2	356	62.3548	0.0156	24.0571	0.2777	0.0902	0.2773	0.2770	0.2777
No log	3	534	10.8972	0.0312	44.4962	0.4141	0.2534	0.4141	0.4141	0.4134
No log	4	712	5.7564	0.0625	71.3863	0.5788	0.4109	0.5788	0.5788	0.5795
No log	5	890	3.5718	0.125	92.3898	0.6811	0.5464	0.6811	0.6811	0.6804
0.8589	6	1068	3.7937	0.25	144.7069	0.5547	0.4181	0.5540	0.5554	0.5540
2.781	7	1246	3.6570	0.5	198.8308	0.6776	0.6542	0.6783	0.6790	0.6776
4.5974	8.0	1424	3.8735	1.0	323.9630	0.6101	0.4236	0.6108	0.6101	0.6101
3.0648	9.0	1602	3.1023	1.0	317.9807	0.7273	0.7068	0.7287	0.7273	0.7273
2.6659	10.0	1780	3.2161	1.0	323.8179	0.7024	0.7106	0.7024	0.7024	0.7024
1.1587	11.0	1958	5.3691	1.0	316.1235	0.6293	0.6338	0.6286	0.6300	0.6300
0.8052	12.0	2136	6.6775	1.0	333.2781	0.6804	0.7018	0.6804	0.6811	0.6804
0.53	13.0	2314	6.1217	1.0	317.3129	0.6982	0.7250	0.6982	0.6989	0.6982
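
Because validation loss and the classification metrics do not peak at the same epoch, it can help to rank the logged epochs by each metric separately. The sketch below does this in plain Python. The tuples are copied from the Epoch, Validation Loss, Accuracy, and F1 Macro columns of the table above; the names `EpochResult` and `best_by` are illustrative helpers, not part of any library.

```python
from typing import Callable, NamedTuple


class EpochResult(NamedTuple):
    epoch: int
    val_loss: float
    accuracy: float
    f1_macro: float


# (epoch, validation loss, accuracy, F1 macro), copied from the table above.
results = [
    EpochResult(0, 38.6071, 0.1406, 0.1058),
    EpochResult(1, 85.3054, 0.3906, 0.2135),
    EpochResult(2, 62.3548, 0.2777, 0.0902),
    EpochResult(3, 10.8972, 0.4141, 0.2534),
    EpochResult(4, 5.7564, 0.5788, 0.4109),
    EpochResult(5, 3.5718, 0.6811, 0.5464),
    EpochResult(6, 3.7937, 0.5547, 0.4181),
    EpochResult(7, 3.6570, 0.6776, 0.6542),
    EpochResult(8, 3.8735, 0.6101, 0.4236),
    EpochResult(9, 3.1023, 0.7273, 0.7068),
    EpochResult(10, 3.2161, 0.7024, 0.7106),
    EpochResult(11, 5.3691, 0.6293, 0.6338),
    EpochResult(12, 6.6775, 0.6804, 0.7018),
    EpochResult(13, 6.1217, 0.6982, 0.7250),
]


def best_by(key: Callable[[EpochResult], float], higher_is_better: bool) -> EpochResult:
    """Return the logged epoch with the best value of the given metric."""
    pick = max if higher_is_better else min
    return pick(results, key=key)


lowest_loss = best_by(lambda r: r.val_loss, higher_is_better=False)
best_accuracy = best_by(lambda r: r.accuracy, higher_is_better=True)
best_f1 = best_by(lambda r: r.f1_macro, higher_is_better=True)

print(f"lowest validation loss: epoch {lowest_loss.epoch} ({lowest_loss.val_loss})")
print(f"best accuracy: epoch {best_accuracy.epoch} ({best_accuracy.accuracy})")
print(f"best F1 macro: epoch {best_f1.epoch} ({best_f1.f1_macro})")
```

On the logged rows, epoch 9 has both the lowest validation loss (3.1023) and the highest accuracy (0.7273), while F1 macro is highest at epoch 13 (0.7250). Which checkpoint the published weights correspond to is not stated in this card.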

Safetensors

Model size: 2B params
Tensor type: F32
