Tested on 10 Sindhi sentences after 5 epochs of training:
Overall: 50% top-1 accuracy after 5 epochs on 500K sentences. Results improve significantly with more training.
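The top-1 number above is just exact-match scoring over the masked positions. A minimal sketch of that metric (the token strings below are illustrative placeholders, not the actual test sentences):

```python
def top1_accuracy(predictions, gold):
    """Fraction of masked positions where the model's top-ranked
    prediction exactly matches the reference token."""
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predictions, gold))
    return hits / len(gold)

# 5 correct fills out of 10 masked sentences -> 0.5 (50% top-1)
preds = ["water", "city", "book", "man", "house", "sky", "day", "sun", "tea", "road"]
gold  = ["water", "city", "book", "man", "house", "rain", "night", "moon", "milk", "path"]
print(top1_accuracy(preds, gold))  # 0.5
```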
## Comparison With Other Models

| Model | Type | Perplexity | Fill-mask Quality |
|---|---|---|---|
| mBERT fine-tuned | Multilingual | 4.19 | Poor — predicts punctuation |
| XLM-R fine-tuned | Multilingual | 5.88 | Good — 80% correct |
| Sindhi-BERT scratch | Sindhi only | 78.10 | 50% — still improving |

Note: Perplexity is not directly comparable between from-scratch and fine-tuned models. Sindhi-BERT starts from zero knowledge, while mBERT/XLM-R start from pre-trained multilingual weights. Sindhi-BERT predictions are always real Sindhi words — never punctuation.
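One way to read the perplexity column: perplexity is the exponential of the mean per-token cross-entropy loss, so a from-scratch model naturally starts much higher. A quick sketch of the conversion (the loss values are back-derived from the table for illustration, not logged from training):

```python
import math

def perplexity(avg_cross_entropy_loss):
    """Perplexity is e raised to the mean per-token cross-entropy (in nats)."""
    return math.exp(avg_cross_entropy_loss)

# Losses implied by the table's perplexities:
print(round(math.log(4.19), 3))    # mBERT fine-tuned: loss ~ 1.433
print(round(math.log(78.10), 3))   # Sindhi-BERT from scratch: loss ~ 4.358
print(round(perplexity(4.358), 1)) # back to ~78.1
```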
## Roadmap

- [x] Train custom Sindhi BPE tokenizer (32K vocab)
- [x] Session 1 — 500K lines, 5 epochs, A100
- [ ] Session 2 — full corpus (2.1M lines)
- [ ] Session 3 — more epochs, lower learning rate
- [ ] Fine-tune for spell checking
- [ ] Fine-tune for next-word prediction
- [ ] Fine-tune for named entity recognition
- [ ] Sindhi chatbot
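The first roadmap item refers to byte-pair encoding. As a rough illustration of the merge step at BPE's core (a toy sketch on an English corpus, not the actual 32K-vocab training code, which would typically use the Hugging Face `tokenizers` library):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a corpus of symbol sequences."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: word (as a tuple of characters) -> frequency
corpus = {tuple("lower"): 5, tuple("low"): 7, tuple("newest"): 3}
for _ in range(3):  # three merge rounds; real training runs until vocab_size
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)  # frequent substrings like "low" become single tokens
```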
## Citation

If you use this model, please cite:

```bibtex
@misc{sindhibert2026,
  title = {Sindhi-BERT: A Sindhi Language Model Trained From Scratch},
  year = {2026},
  url = {https://huggingface.co/hellosindh/sindhi-bert-base}
}
```
## About

This model is part of a larger effort to build complete NLP tools for the Sindhi language — one of the oldest languages in the world, with over 30 million speakers across Pakistan and India.