# bert-tiny-amharic
This model has the same architecture as [bert-tiny](https://huggingface.co/prajjwal1/bert-tiny) and was pretrained from scratch using the Amharic subsets of the [oscar](https://huggingface.co/datasets/oscar), [mc4](https://huggingface.co/datasets/mc4), and [amharic-sentences-corpus](https://huggingface.co/datasets/rasyosef/amharic-sentences-corpus) datasets, on a total of **290 million tokens**. The tokenizer was trained from scratch on the same text corpus and has a vocabulary size of 28k.
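Since this is a BERT-style masked language model, it can be loaded through the standard `fill-mask` pipeline. The sketch below is a hypothetical usage example: the model id `rasyosef/bert-tiny-amharic` is an assumption inferred from the dataset namespace above and may differ from the actual repository name, and the example sentence is illustrative.

```python
# Hypothetical usage sketch for a BERT-style masked language model.
# The model id "rasyosef/bert-tiny-amharic" is assumed from the dataset
# namespace above and may not match the actual repository name.

def load_unmasker(model_id: str = "rasyosef/bert-tiny-amharic"):
    """Build a fill-mask pipeline for the given model id (downloads weights)."""
    from transformers import pipeline  # deferred import: only needed at call time
    return pipeline("fill-mask", model=model_id)

# Example (requires network access to the Hugging Face Hub):
# unmask = load_unmasker()
# unmask("አዲስ አበባ የ[MASK] ዋና ከተማ ናት።")  # predict the masked Amharic token
```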
It achieves the following results on the evaluation set:
- `Loss: 4.27`
- `Perplexity: 71.52`
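The two numbers above are consistent: perplexity is the exponential of the cross-entropy loss, assuming the loss is measured in nats (natural log), as is standard in PyTorch.

```python
import math

# Perplexity of a masked language model is exp(cross-entropy loss),
# assuming the loss is in nats (natural log), as in PyTorch.
eval_loss = 4.27
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 71.52, matching the reported value
```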