Update README.md
README.md
CHANGED
@@ -1,3 +1,5 @@
+ ## Nairabert-tokenizer
+
Standard BERT tokenizers (like bert-base-uncased) often struggle with Nigerian vocabulary. Because local words are rare or absent in their vocabularies, they tend to split them into sub-tokens that carry no meaning on their own (e.g., "Owanbe" might become "Ow", "##an", "##be").

NairaBERT Tokenizer was trained to recognize such words as single, high-frequency tokens, which helps the model preserve the semantic meaning of Nigerian-centric text.
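
The splitting behavior described above can be illustrated with a minimal sketch of WordPiece-style greedy longest-match tokenization. This is not the NairaBERT implementation: the function `wordpiece_tokenize` and both toy vocabularies are hypothetical, chosen only to reproduce the "Owanbe" example, and the real bert-base-uncased output may differ.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word (WordPiece-style sketch)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is found in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # prefix for continuation pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabularies (assumptions for illustration, not real model vocabularies).
generic_vocab = {"ow", "##an", "##be"}
nigerian_vocab = generic_vocab | {"owanbe"}

word = "Owanbe".lower()  # uncased tokenizers lowercase the input first
generic_pieces = wordpiece_tokenize(word, generic_vocab)
nigerian_pieces = wordpiece_tokenize(word, nigerian_vocab)

print(generic_pieces)   # ['ow', '##an', '##be']
print(nigerian_pieces)  # ['owanbe']
```

With the whole word in the vocabulary, the longest match succeeds on the first attempt, so the word stays intact as one token instead of three fragments.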