sijirama committed on
Commit 9da0c9e · verified · 1 Parent(s): 6bc4258

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -1,3 +1,5 @@
 
 
+ ## Nairabert-tokenizer
+
  Standard BERT tokenizers (like bert-base-uncased) often struggle with Nigerian linguistic nuances. They tend to break down local words into meaningless sub-tokens (e.g., "Owanbe" might become "Ow", "##an", "##be").
 
  NairaBERT Tokenizer was trained to recognize these as high-frequency units, ensuring that the model preserves the semantic meaning of Nigerian-centric text.
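
The splitting behaviour the README describes can be illustrated with a toy version of WordPiece's greedy longest-match-first lookup. This is only a sketch, not the NairaBERT tokenizer itself: the `wordpiece` helper and both vocabularies are hypothetical, and the word is lowercased as an uncased tokenizer would do. It shows why a word missing from the vocabulary fragments into sub-tokens, while the same word stays whole once it is a vocabulary entry.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word (toy WordPiece)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry covers this position: whole word is unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical vocabularies, chosen only to mirror the README's example.
generic_vocab = {"ow", "##an", "##be"}
extended_vocab = generic_vocab | {"owanbe"}

generic_split = wordpiece("owanbe", generic_vocab)
extended_split = wordpiece("owanbe", extended_vocab)

print(generic_split)   # ['ow', '##an', '##be']
print(extended_split)  # ['owanbe']
```

With the real tokenizers, the same comparison would be made by loading each one and calling its `tokenize` method on the same word; the actual splits depend on the trained vocabularies and may differ from this toy output.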