example_title: Example 2
---
# bert-tiny-amharic
This model has the same architecture as [bert-tiny](https://huggingface.co/prajjwal1/bert-tiny) and was pretrained from scratch using the Amharic subsets of the [oscar](https://huggingface.co/datasets/oscar), [mc4](https://huggingface.co/datasets/mc4), and [amharic-sentences-corpus](https://huggingface.co/datasets/rasyosef/amharic-sentences-corpus) datasets, on a total of `290 Million` tokens. The tokenizer was trained from scratch on the same text corpus and has a vocabulary size of 28k.
It achieves the following results on the evaluation set:
- `Perplexity: 71.52`
This model has just `4.18M` parameters.
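As a reminder of how the reported perplexity relates to training metrics: perplexity is the exponential of the per-token cross-entropy loss, so `71.52` corresponds to an evaluation loss of roughly 4.27 nats per token. A quick sanity check:

```python
import math

# Perplexity = exp(cross-entropy loss), so the reported eval perplexity
# of 71.52 implies a per-token evaluation loss of about 4.27 nats.
eval_loss = math.log(71.52)
print(round(eval_loss, 2))  # → 4.27
```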
# How to use
You can use this model directly with a pipeline for masked language modeling:
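A minimal sketch with the `transformers` fill-mask pipeline (the model id `rasyosef/bert-tiny-amharic` and the example sentence are assumptions; substitute this repository's actual id and your own Amharic text):

```python
from transformers import pipeline

# Assumed model id for illustration; replace with this repository's id.
fill_mask = pipeline("fill-mask", model="rasyosef/bert-tiny-amharic")

# An Amharic sentence with one [MASK] token to be filled in.
results = fill_mask("አዲስ አበባ የኢትዮጵያ [MASK] ከተማ ናት።")
for r in results:
    print(r["token_str"], round(r["score"], 4))
```

Each result is a dict containing the predicted token (`token_str`), its probability (`score`), and the completed sentence (`sequence`).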