rada-nlp

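This model was trained from scratch; the training dataset is not specified. At the last logged evaluation (epoch 16, step 64), it achieves the following results on the evaluation set:

Loss: 2.6418
Rouge1: 32.2628
Rouge2: 17.6188
RougeL: 28.3685
RougeLsum: 28.3035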

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
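
The relationship between these values can be sketched in a few lines of plain Python. This is a minimal illustration, not the training code: it assumes a single device and 4 optimizer steps per epoch (read off the Step column of the results table), and it models the cosine schedule as linear warmup followed by a half-cosine decay to zero. The names `num_devices`, `steps_per_epoch`, and `cosine_lr` are hypothetical.

```python
import math

# Values taken from the hyperparameter list above.
learning_rate = 2e-05
train_batch_size = 2
gradient_accumulation_steps = 8
warmup_ratio = 0.1
num_epochs = 20

# Assumptions (not stated in the card): one device, and 4 optimizer
# steps per epoch as read off the Step column of the results table.
num_devices = 1
steps_per_epoch = 4

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int) -> float:
    """Linear warmup to the peak rate, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_steps:", total_steps, "warmup_steps:", warmup_steps)
    for step in (0, 4, 8, 44, 80):
        print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 2e-05 after 8 warmup steps (the end of epoch 2) and decays over the remaining 72 steps.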

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	RougeL	RougeLsum
1.8877	1.0	4	2.6676	32.1274	17.427	27.543	28.0416
1.8556	2.0	8	2.6705	31.2511	16.3095	26.6854	26.8166
1.8127	3.0	12	2.6705	31.037	16.0077	26.813	26.6464
1.784	4.0	16	2.6686	31.5008	16.2333	26.9957	26.7969
1.7672	5.0	20	2.6711	31.2118	15.9968	26.9476	26.9864
1.7407	6.0	24	2.6716	31.4189	15.9951	26.8681	26.7424
1.742	7.0	28	2.6701	30.9705	16.0005	26.5473	26.8081
1.7356	8.0	32	2.6687	31.906	17.254	27.7267	27.6687
1.7271	9.0	36	2.6654	31.8302	17.1851	27.4294	27.4945
1.7224	10.0	40	2.6606	31.5091	17.1353	27.8425	27.5751
1.7207	11.0	44	2.6575	31.6189	17.3582	27.5163	27.519
1.7404	12.0	48	2.6539	32.0071	17.1878	27.6051	27.7916
1.7213	13.0	52	2.6504	32.6314	17.5002	28.0328	28.0245
1.7606	14.0	56	2.6472	32.5161	17.4726	28.16	28.4421
1.7839	15.0	60	2.6444	32.3599	17.9836	27.9445	28.0023
1.812	16.0	64	2.6418	32.2628	17.6188	28.3685	28.3035
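
To compare epochs, the rows above can be parsed and ranked by any column. The following is a small sketch in plain Python; the column keys and the helper `parse_results` are hypothetical names chosen for illustration, and the numbers are copied from the table as-is.

```python
# Rows copied from the training results table above. Columns, in order:
# training loss, epoch, step, validation loss, Rouge1, Rouge2, RougeL, RougeLsum.
RESULTS = """
1.8877 1.0 4 2.6676 32.1274 17.427 27.543 28.0416
1.8556 2.0 8 2.6705 31.2511 16.3095 26.6854 26.8166
1.8127 3.0 12 2.6705 31.037 16.0077 26.813 26.6464
1.784 4.0 16 2.6686 31.5008 16.2333 26.9957 26.7969
1.7672 5.0 20 2.6711 31.2118 15.9968 26.9476 26.9864
1.7407 6.0 24 2.6716 31.4189 15.9951 26.8681 26.7424
1.742 7.0 28 2.6701 30.9705 16.0005 26.5473 26.8081
1.7356 8.0 32 2.6687 31.906 17.254 27.7267 27.6687
1.7271 9.0 36 2.6654 31.8302 17.1851 27.4294 27.4945
1.7224 10.0 40 2.6606 31.5091 17.1353 27.8425 27.5751
1.7207 11.0 44 2.6575 31.6189 17.3582 27.5163 27.519
1.7404 12.0 48 2.6539 32.0071 17.1878 27.6051 27.7916
1.7213 13.0 52 2.6504 32.6314 17.5002 28.0328 28.0245
1.7606 14.0 56 2.6472 32.5161 17.4726 28.16 28.4421
1.7839 15.0 60 2.6444 32.3599 17.9836 27.9445 28.0023
1.812 16.0 64 2.6418 32.2628 17.6188 28.3685 28.3035
"""

COLUMNS = [
    "training_loss", "epoch", "step", "validation_loss",
    "rouge1", "rouge2", "rougeL", "rougeLsum",
]


def parse_results(text: str) -> list[dict[str, float]]:
    """Turn whitespace-separated rows into one dict per epoch."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


rows = parse_results(RESULTS)
best_loss = min(rows, key=lambda r: r["validation_loss"])
best_rouge1 = max(rows, key=lambda r: r["rouge1"])

if __name__ == "__main__":
    print("lowest validation loss:", best_loss["validation_loss"],
          "at epoch", int(best_loss["epoch"]))
    print("highest Rouge1:", best_rouge1["rouge1"],
          "at epoch", int(best_rouge1["epoch"]))
```

On the logged rows, validation loss is lowest at epoch 16 (2.6418), while Rouge1 is highest at epoch 13 (32.6314), so the preferred checkpoint depends on which metric is used for selection.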

Model size: 0.2B params (Safetensors)
Tensor type: F32
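
As a rough guide to the download and memory footprint, the weight size follows from the parameter count and the tensor type. The sketch below assumes the rounded figure of 0.2B parameters and counts only the raw weights (no activations, optimizer state, or file overhead); `weight_size_gb` is a hypothetical helper.

```python
# Figures from the card: about 0.2B parameters (rounded), stored as F32.
approx_params = 0.2e9

# Bytes per parameter for common tensor types.
bytes_per_param = {"F32": 4, "F16": 2, "BF16": 2}


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("F32", "F16"):
        print(dtype, f"{weight_size_gb(approx_params, dtype):.1f} GB")
```

With these inputs the F32 weights come to roughly 0.8 GB; loading in a 16-bit type would halve that.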
