sijirama/nairabert-tokenizer

sijirama committed on Jan 19

Commit 9da0c9e · verified · 1 Parent(s): 6bc4258

Update README.md

Files changed (1)

README.md +2 -0

@@ -1,3 +1,5 @@
+## Nairabert-tokenizer
 Standard BERT tokenizers (like bert-base-uncased) often struggle with Nigerian linguistic nuances. They tend to break down local words into meaningless sub-tokens (e.g., "Owanbe" might become "Ow", "##an", "##be").
 NairaBERT Tokenizer was trained to recognize these as high-frequency units, ensuring that the model preserves the semantic meaning of Nigerian-centric text.
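
The splitting behaviour described in the README can be illustrated with a minimal sketch of greedy longest-match WordPiece tokenization, the scheme BERT tokenizers use. The two vocabularies below are hypothetical toy sets made up for illustration; they are not the real `bert-base-uncased` vocabulary or the actual NairaBERT vocabulary, and the function is a simplified stand-in rather than the library implementation. The input is lowercased first, as an uncased tokenizer would do.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabularies (assumptions for illustration only).
generic_vocab = {"ow", "##an", "##be"}
nigerian_vocab = generic_vocab | {"owanbe"}

word = "Owanbe".lower()  # uncased tokenizers lowercase the input first
generic_pieces = wordpiece_tokenize(word, generic_vocab)
nigerian_pieces = wordpiece_tokenize(word, nigerian_vocab)

print(generic_pieces)   # ['ow', '##an', '##be']
print(nigerian_pieces)  # ['owanbe']
```

With the toy generic vocabulary the word falls apart into three fragments, while a vocabulary that contains the whole word keeps it as a single token, which is the effect the README attributes to training the tokenizer on Nigerian-centric text.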