rtferraz/domainTokenizer

domainTokenizer/src/domain_tokenizer/tokenizers
23.2 kB
  • 1 contributor
History: 4 commits
rtferraz
CRITICAL FIX: Switch from ByteLevel to Whitespace pre-tokenizer - fixes 42% UNK rate on domain token sequences
a9c4a62 verified 1 day ago
  • __init__.py
    269 Bytes
    Add tokenizers package init 8 days ago
  • domain_tokenizer.py
    10.4 kB
    CRITICAL FIX: Switch from ByteLevel to Whitespace pre-tokenizer - fixes 42% UNK rate on domain token sequences 1 day ago
  • field_tokenizers.py
    12.5 kB
    Add field_tokenizers.py - Sign, MagnitudeBucket, Calendar, Categorical, DiscreteNumerical tokenizers 8 days ago