Koshur Pixel: A Large-Scale Synthetic OCR Dataset for Kashmiri Paper • 2606.23144 • Published 3 days ago
Koshur Diacritizer: A Byte-Level Sequence-to-Sequence Model for Kashmiri Diacritic Restoration Paper • 2606.15883 • Published 11 days ago
ks-pret-5m: A 5 Million Word, 12 Million Token Kashmiri Pretraining Dataset Paper • 2604.11066 • Published Apr 13