Refactor to standalone v2.0: zero dependencies, internal engine, removed zemberek/hf wrapper edec8b7 Ethosoft committed on 21 days ago
Add 64K vocab + transformers-compatible tokenizer (encode/decode/batch) c1929bd verified nmstech committed on 21 days ago
Migrate to zemberek-python, remove JVM dependency and 31MB JAR, apply O(N^2) init fix 8c72d18 nmstech committed on Mar 19
Rename project from TurkTokenizer to NedoTurkishTokenizer cfffd93 nmstech Claude Opus 4.6 committed on Mar 18
Add smart ACRONYM detection: TDK-based disambiguation for uppercase tokens fcd513a verified nmstech committed on Mar 16
Fix broken placeholder mechanism: replace with segment-based tokenization 58e6961 verified nmstech committed on Mar 16
Upload turk_tokenizer/data/turkish_proper_nouns.txt with huggingface_hub 986e073 verified nmstech committed on Mar 16
Upload turk_tokenizer/_preprocessor.py with huggingface_hub 8f794ec verified nmstech committed on Mar 16
Replace hardcoded base lists with TDK vocab lookup in apostrophe split 183e656 verified nmstech committed on Mar 16
Fix İ lowercase bug + apostrophe merge for BPE-split foreign words 63dbb3f verified nmstech committed on Mar 16
Fix İ lowercase bug + apostrophe merge for BPE-split foreign words 6330193 verified nmstech committed on Mar 16
Fix build backend: setuptools.backends.legacy → setuptools.build_meta 091c896 verified nmstech committed on Mar 16
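
Note on the "İ lowercase bug" commits above: the log does not show the actual fix, so the following is only a minimal sketch of the general problem, assuming it relates to Python's default, locale-independent casing. In Python, "İ".lower() yields "i" followed by a combining dot (U+0307), and "I".lower() yields "i" rather than the Turkish dotless "ı". The helper name turkish_lower is hypothetical and is not taken from the repository.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for the dotted/dotless I pair.

    Hypothetical helper for illustration; not the repository's code.
    """
    # Map the two Turkish-specific capitals first. Plain str.lower()
    # would turn "İ" into "i" + U+0307 and "I" into "i".
    return text.replace("İ", "i").replace("I", "ı").lower()


# Default behaviour: one character becomes two code points.
default_lowered = "İ".lower()

# Turkish-aware behaviour.
city = turkish_lower("İSTANBUL")
light = turkish_lower("IŞIK")
```

A helper like this is only appropriate for text known to be Turkish; applied to a foreign word or acronym it would also map "I" to "ı", which may be why the commits mention foreign words separately.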