SindhiLM-Tokenizer-v2 / tokenizer.json

Commit History

SindhiLM-Tokenizer-v2: morpheme-aware BPE with SindhiNLTK pre-segmentation, fixed noise filter, byte-ghost reduction
f75c2c6
verified

aakashMeghwar01 committed on