LilTii - a Polygl0t Collection

LilTii

updated 22 days ago

A 0.6B Bengali Language Model that Outperforms Qwen.

Polygl0t/LilTii-v0.1

Text Generation • 0.7B • Updated Mar 5 • 6

Note 🧱 Base model pretrained only on Bengali text.

Polygl0t/LilTii-v0.2

Text Generation • 0.7B • Updated Mar 5 • 38

Note 🧱 Base model pretrained on a Bengali + English mixture.

Polygl0t/gigakriya-v1

Viewer • Updated 22 days ago • 41.6M • 507

Note 📚 Pretraining dataset.

Polygl0t/GigaKriya-ablation-EDU-1.5B

Text Generation • 2B • Updated 24 days ago • 13

Note 🔬 Ablation Experiment (Edu)

Polygl0t/GigaKriya-ablation-NonEDU-1.5B

Text Generation • 2B • Updated 24 days ago • 10

Note 🔬 Ablation Experiment (NonEdu)

Polygl0t/bengali-edu-qwen-annotations

Viewer • Updated Mar 5 • 320k • 60

Note 📚 Annotations to train classifiers/filters (Educational).

Polygl0t/bengali-banglabert-edu-classifier

Text Classification • 34.7M • Updated Mar 5 • 5

Note 🎯 Quality Filter (Educational)

Polygl0t/bengali-toxicity-qwen-annotations

Viewer • Updated Mar 5 • 320k • 16

Note 📚 Annotations to train classifiers/filters (Toxicity).

Polygl0t/bengali-banglabert-toxicity-classifier

Text Classification • 34.7M • Updated Mar 5 • 6

Note 🎯 Quality Filter (Toxicity)

Polygl0t/tokenizers

Viewer • Updated Mar 5 • 8.98M • 538

Note 📚 Data used to train the LilTii tokenizer.