# xlm-roberta-large-bm
This model is a fine-tuned version of oza75/xlm-roberta-large-bm-cpt on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.6527
- Accuracy: 0.7539
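The card does not state the task head, but the base checkpoint's `-cpt` suffix (continued pretraining) and the token-level accuracy metric suggest a masked-language-modeling objective. A minimal usage sketch under that assumption (the fill-mask head and the example sentence are assumptions, not confirmed by the card):

```python
from transformers import pipeline

# Hypothetical usage: assumes this checkpoint exposes a fill-mask head
# and is available on the Hugging Face Hub under this repo id.
fill_mask = pipeline("fill-mask", model="oza75/xlm-roberta-large-bm")

# XLM-RoBERTa tokenizers use "<mask>" as the mask token; the input
# sentence here is an arbitrary placeholder.
predictions = fill_mask("I ni <mask>.", top_k=3)
for p in predictions:
    print(p["token_str"], p["score"])
```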
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.75e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 4
- total_train_batch_size: 384
- total_eval_batch_size: 96
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.06
- num_epochs: 50.0
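The reported effective batch sizes follow directly from the per-device settings above, and the cosine schedule with warmup can be sketched in a few lines. Note the card lists `lr_scheduler_warmup_steps: 0.06`, which reads like a warmup *ratio*; the 273 warmup steps and 4550 total steps below are illustrative assumptions (6% of a hypothetical total), not values from the card.

```python
import math

# Effective batch sizes implied by the per-device hyperparameters above.
train_batch_size = 32              # per device
eval_batch_size = 32               # per device
num_devices = 3
gradient_accumulation_steps = 4

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 384 96

# Sketch of a cosine schedule with linear warmup, in the style of
# transformers' get_cosine_schedule_with_warmup (re-implemented here for
# illustration; warmup_steps and total_steps are hypothetical).
def cosine_lr(step, peak_lr=1.75e-5, warmup_steps=273, total_steps=4550):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate rises linearly to its peak over the warmup phase, then decays along a half-cosine to zero at the final step.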
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 23.4593 | 2.1994 | 200 | 5.6475 | 0.6440 |
| 20.8680 | 4.3989 | 400 | 5.0077 | 0.6808 |
| 19.5976 | 6.5983 | 600 | 4.6963 | 0.6985 |
| 18.6174 | 8.7978 | 800 | 4.4655 | 0.7104 |
| 17.8897 | 10.9972 | 1000 | 4.3012 | 0.7211 |
| 17.1157 | 13.1884 | 1200 | 4.2094 | 0.7253 |
| 16.6388 | 15.3878 | 1400 | 4.0842 | 0.7310 |
| 16.5434 | 17.5873 | 1600 | 4.0012 | 0.7376 |
| 16.2096 | 19.7867 | 1800 | 3.9556 | 0.7376 |
| 15.9932 | 21.9861 | 2000 | 3.8802 | 0.7426 |
| 15.3912 | 24.1773 | 2200 | 3.8404 | 0.7442 |
| 15.2444 | 26.3767 | 2400 | 3.7937 | 0.7475 |
| 15.3315 | 28.5762 | 2600 | 3.7470 | 0.7488 |
| 15.2022 | 30.7756 | 2800 | 3.7129 | 0.7513 |
| 15.1072 | 32.9751 | 3000 | 3.7143 | 0.7516 |
| 14.8385 | 35.1662 | 3200 | 3.7064 | 0.7505 |
| 14.7511 | 37.3657 | 3400 | 3.6804 | 0.7535 |
| 14.9010 | 39.5651 | 3600 | 3.6705 | 0.7533 |
| 14.8393 | 41.7645 | 3800 | 3.6890 | 0.7521 |
| 14.8144 | 43.9640 | 4000 | 3.6512 | 0.7547 |
| 14.5626 | 46.1551 | 4200 | 3.6255 | 0.7557 |
| 14.5697 | 48.3546 | 4400 | 3.6409 | 0.7549 |
### Framework versions
- Transformers 5.3.0.dev0
- Pytorch 2.4.1+cu124
- Datasets 4.6.1
- Tokenizers 0.22.2
### Base model

- oza75/xlm-roberta-large-bm-cpt