Polygl0t/LilTii-v0.1
Text Generation • 0.7B
A 0.6B Bengali Language Model that Outperforms Qwen.
Note 🧱 Base model pretrained only with Bengali text.
Note 🧱 Base model pretrained with a Bengali + English mixture.
Note 📚 Pretraining dataset.
Note 📚 Annotations to train classifiers/filters (Educational).
Note 📚 Annotations to train classifiers/filters (Toxicity).
Note 🎯 Quality Filter (Educational).
Note 🎯 Quality Filter (Toxicity).
Note 📚 Data used to train the LilTii tokenizer.