bert-mini-amharic-16k

This model has the same architecture as bert-mini and was pretrained from scratch on the Amharic subsets of the oscar and mc4 datasets, 165 million tokens in total. It achieves the following results on the evaluation set:

Loss: 2.59
Perplexity: 13.33
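
The two figures above are consistent with each other if the loss is read as a mean cross-entropy in nats per predicted token, in which case perplexity is simply exp(loss). That reading is an assumption, not something the card states; the short sketch below only checks the arithmetic, and the function name `perplexity` is illustrative.

```python
import math


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# Reported evaluation loss from this model card.
reported_loss = 2.59

# exp(2.59) is about 13.33, matching the reported perplexity.
print(f"{perplexity(reported_loss):.2f}")
```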

Although this model has only 7.5 million parameters, its perplexity on the same Amharic evaluation set is comparable to that of xlm-roberta-base, which has 279 million parameters (roughly 36x more).
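
As a rough sketch of how the model might be tried out, the snippet below wraps the `fill-mask` pipeline from the Hugging Face `transformers` library. It assumes the checkpoint is published on the Hub as `rasyosef/bert-mini-amharic-16k` and that `transformers` and a backend such as PyTorch are installed. The helper name `top_predictions`, the `{mask}` placeholder convention, and the example sentence are illustrative, not part of the model card.

```python
MODEL_ID = "rasyosef/bert-mini-amharic-16k"  # assumed Hub repository id


def top_predictions(template: str, top_k: int = 5):
    """Fill the single `{mask}` placeholder in `template` and return
    (token, score) pairs for the `top_k` most likely replacements."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Use the tokenizer's own mask token instead of hard-coding "[MASK]".
    text = template.format(mask=fill_mask.tokenizer.mask_token)
    results = fill_mask(text, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]


# Example call (downloads the model on first use):
#   for token, score in top_predictions("አዲስ አበባ የኢትዮጵያ ዋና {mask} ናት።"):
#       print(token, round(score, 3))
```

The helper rebuilds the pipeline on every call to keep the sketch short; for repeated use it would be cheaper to construct the pipeline once and reuse it.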
