hafeez007/balochi-tokenizers

Tags: Text Generation, Baluchi, English, sentencepiece, tokenizer, wordpiece, bpe, balochi, southern-balochi, low-resource-nlp, perso-arabic, nlp, gemma, bert, roberta
balochi-tokenizers / Code (118 kB)
  • 1 contributor
History: 1 commit
Latest commit: hafeez007, "Update tokenizer models and README" (e899795, verified, 27 days ago)
  • Analyze_Vocab_Pruning.py
    4.81 kB
    Update tokenizer models and README 27 days ago
  • Balochi_Data_Cleaning_Pipeline.py
    18.8 kB
    Update tokenizer models and README 27 days ago
  • Renyi_Entropy_Analysis.py
    9.15 kB
    Update tokenizer models and README 27 days ago
  • Tokenizers_Comparison.py
    52.8 kB
    Update tokenizer models and README 27 days ago
  • Train_Tokenizers.py
    9.12 kB
    Update tokenizer models and README 27 days ago
  • Vocab_Size_Ablation.py
    12.9 kB
    Update tokenizer models and README 27 days ago
  • update_readmes.py
    10.6 kB
    Update tokenizer models and README 27 days ago