Ethosoft
/
NedoTurkishTokenizer
NedoTurkishTokenizer
/
turk_tokenizer
32.4 MB
6 contributors
History:
10 commits
nmstech
Add smart ACRONYM detection: TDK-based disambiguation for uppercase tokens
fcd513a
verified
about 1 month ago
data
Upload turk_tokenizer/data/turkish_proper_nouns.txt with huggingface_hub
about 1 month ago
__init__.py
Safe
657 Bytes
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_acronym_dict.py
Safe
4.83 kB
Add smart ACRONYM detection: TDK-based disambiguation for uppercase tokens
about 1 month ago
_allomorph.py
Safe
2.23 kB
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_compound.py
Safe
2.95 kB
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_context_aware.py
Safe
2.09 kB
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_java_check.py
Safe
2.48 kB
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_medical_vocab.py
Safe
7.06 kB
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_normalizer.py
Safe
10.7 kB
Add smart ACRONYM detection: TDK-based disambiguation for uppercase tokens
about 1 month ago
_preprocessor.py
Safe
8.86 kB
Upload turk_tokenizer/_preprocessor.py with huggingface_hub
about 1 month ago
_root_validator.py
Safe
6.66 kB
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_suffix_expander.py
Safe
8.39 kB
Initial release: TurkTokenizer v1.0.0 - TR-MMLU 92%
about 1 month ago
_tdk_vocab.py
Safe
3.76 kB
Load TDK words from HF repo, fallback to TDK API
about 1 month ago
tokenizer.py
12.6 kB
Add smart ACRONYM detection: TDK-based disambiguation for uppercase tokens
about 1 month ago