This model is xlm-roberta-base with continued pre-training on a cleaned mix of Azerbaijani corpora shared by the community. It achieves the following result on the evaluation set:

Loss: 1.1697

We thank the Microsoft Accelerating Foundation Models Research Program for supporting our research.

Authors: Mammad Hajili, Duygu Ataman

Model description

The model was trained with the masked language modeling (MLM) objective on a single V100 GPU for 68 hours. For downstream tasks, it must be fine-tuned on the objective of the target task.
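As a rough illustration of what the MLM objective involves, the sketch below applies BERT/RoBERTa-style masking to one token sequence: about 15% of non-special tokens become prediction targets, and of those, 80% are replaced by the mask token, 10% by a random token, and 10% are left unchanged. The card does not state the masking rate or scheme; the 15% and 80/10/10 values are assumed here because they are the defaults of Hugging Face's `DataCollatorForLanguageModeling`. The function `mask_tokens` is hypothetical and written in plain Python rather than on tensors. The ids in the usage lines (`<s>`=0, `<pad>`=1, `</s>`=2, `<unk>`=3, `<mask>`=250001, vocabulary size 250002) are assumed to match the xlm-roberta-base tokenizer.

```python
import random
from typing import List, Optional, Set, Tuple

IGNORE_INDEX = -100  # label value that the cross-entropy loss skips (Hugging Face convention)


def mask_tokens(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_probability: float = 0.15,  # assumption: collator default, not stated in the card
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one sequence under 80/10/10 MLM masking."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Special tokens are never selected; other tokens are selected with mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must recover the original token at this position
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id              # 80%: replace with <mask>
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


# Usage with assumed xlm-roberta-base ids: <s> ... </s>
demo_rng = random.Random(42)
ids = [0] + [demo_rng.randrange(4, 250_000) for _ in range(20)] + [2]
masked_inputs, mlm_labels = mask_tokens(
    ids, mask_token_id=250_001, vocab_size=250_002, special_ids={0, 1, 2, 3}, rng=demo_rng
)
```

Only positions whose label is not -100 contribute to the loss, which is why the reported loss measures how well the model reconstructs masked tokens.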

Training and evaluation data

The training data is a cleaned mix of various Azerbaijani corpora shared by the community.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3.0
mixed_precision_training: Native AMP
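With lr_scheduler_type linear, the learning rate decays linearly from 2e-05 to zero over training. The sketch below reproduces that shape (it mirrors the formula of `transformers.get_linear_schedule_with_warmup`), assuming zero warmup steps because none are listed. The step counts are inferred from the training results table, where epoch 1.0 is reached at step 403640; the sequences-per-epoch figure further assumes one device and no gradient accumulation. All constant and function names are hypothetical.

```python
# Values from the card; WARMUP_STEPS and the derived counts are assumptions.
BASE_LR = 2e-05
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 3
STEPS_PER_EPOCH = 403_640   # results table: epoch 1.0 is reached at step 403640
WARMUP_STEPS = 0            # assumption: the card lists no warmup

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS                # 1,210,920 optimizer steps
SEQUENCES_PER_EPOCH = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE  # assumes 1 device, no grad accumulation


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each epoch.
for s in (0, STEPS_PER_EPOCH, 2 * STEPS_PER_EPOCH, TOTAL_STEPS):
    print(s, linear_lr(s))
```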

Training results

Training Loss	Epoch	Step	Validation Loss
1.6126	0.2500	100910	1.4818
1.4961	0.5000	201820	1.4163
1.4324	0.7500	302730	1.3371
1.3870	1.0000	403640	1.3070
1.3488	1.2500	504550	1.2649
1.3230	1.5000	605460	1.2581
1.3006	1.7500	706370	1.2066
1.2866	2.0000	807280	1.2095
1.2646	2.2500	908190	1.2019
1.2492	2.5000	1009100	1.1779
1.2425	2.7500	1110010	1.1742

Validation loss at epoch 3: 1.1697
Perplexity at epoch 3: 3.22
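The reported perplexity is consistent with the convention used in Hugging Face's `run_mlm.py` example script, where perplexity is the exponential of the mean cross-entropy loss (in nats) over the masked tokens. Assuming the card follows that convention, the check below reproduces 3.22 from the final validation loss. The helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity = exp(mean cross-entropy over masked tokens, measured in nats)."""
    return math.exp(mean_ce_loss)


final_val_loss = 1.1697  # validation loss at epoch 3, from the card
print(f"{perplexity(final_val_loss):.2f}")  # 3.22
```

Because the loss covers only masked positions, this figure is a masked-token perplexity and is not directly comparable with the perplexity of a left-to-right language model.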

Framework versions

Transformers 4.40.1
Pytorch 2.3.0+cu121
Datasets 2.19.0
Tokenizers 0.19.1


Safetensors

Model size: 0.3B params
Tensor type: F32
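As a rough guide to what these figures imply, F32 stores each parameter in 4 bytes, so about 0.3B parameters take roughly 1.2 GB for the weights alone. Because 0.3B is a rounded figure, treat this as an estimate; activations, optimizer state, and framework overhead are not included. The helper `weight_bytes` and its dtype table are hypothetical.

```python
# Bytes per parameter for common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Bytes needed to store `num_params` weights of the given tensor type."""
    return num_params * BYTES_PER_PARAM[dtype]


approx_params = 300_000_000  # the card's rounded "0.3B params"
print(weight_bytes(approx_params) / 1e9, "GB in F32")  # 1.2 GB in F32
```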

Model tree for hajili/roberta-base-azerbaijani

Base model: FacebookAI/xlm-roberta-base