hafeez007/balochi-tokenizers
Task: Text Generation
Dataset: balochiml/balochi-dedup-corpus
Languages: Baluchi, English
Tags: sentencepiece, tokenizer, wordpiece, bpe, balochi, southern-balochi, low-resource-nlp, perso-arabic, nlp, gemma, bert, roberta
License: apache-2.0
Branch: main
Total size: 118 kB
1 contributor
History: 1 commit
Latest commit: e899795 (verified) by hafeez007, 27 days ago: Update tokenizer models and README
File                                Scan   Size      Last commit                          Updated
Analyze_Vocab_Pruning.py            Safe   4.81 kB   Update tokenizer models and README   27 days ago
Balochi_Data_Cleaning_Pipeline.py   Safe   18.8 kB   Update tokenizer models and README   27 days ago
Renyi_Entropy_Analysis.py           -      9.15 kB   Update tokenizer models and README   27 days ago
Tokenizers_Comparison.py            -      52.8 kB   Update tokenizer models and README   27 days ago
Train_Tokenizers.py                 -      9.12 kB   Update tokenizer models and README   27 days ago
Vocab_Size_Ablation.py              -      12.9 kB   Update tokenizer models and README   27 days ago
update_readmes.py                   -      10.6 kB   Update tokenizer models and README   27 days ago