cmeister/apertus_v2_tokenizer

  • 1 contributor
History: 31 commits
cmeister
preliminary_mul_200k: replace with v200064 (vocab 200064, 128-aligned; IDs 0-199999 identical to prior 200000 build) + same special tokens + post-processor
6d07642 verified about 4 hours ago
  • preliminary_enh
    preliminary_enh: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii> 5 days ago
  • preliminary_euh
    preliminary_euh: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii> 5 days ago
  • preliminary_mul
    preliminary_mul: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii> 5 days ago
  • preliminary_mul_200k
    preliminary_mul_200k: replace with v200064 (vocab 200064, 128-aligned; IDs 0-199999 identical to prior 200000 build) + same special tokens + post-processor about 4 hours ago
  • .gitattributes
    1.59 kB
    add preliminary_enh (engfull_eu3, 131k English-preserving) + preliminary_mul_200k (eusino_v2c, 200k), both with BOS/EOS post-processor 11 days ago
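
The preliminary_mul_200k commit message describes the vocabulary as "200064, 128-aligned", with IDs 0-199999 unchanged from the earlier 200000 build. A minimal sketch of that arithmetic follows, assuming the new size is simply the old size rounded up to the next multiple of 128. The helper name `pad_to_multiple` is hypothetical and not part of this repository, and the commit does not say what the 64 extra IDs hold.

```python
def pad_to_multiple(n: int, multiple: int = 128) -> int:
    """Round n up to the nearest multiple (hypothetical helper, for illustration)."""
    # Ceiling division via negated floor division, then scale back up.
    return -(-n // multiple) * multiple


old_vocab = 200_000                      # size of the prior build, per the commit message
new_vocab = pad_to_multiple(old_vocab)   # assumed rule: round up to a multiple of 128

print(new_vocab)               # 200064
print(new_vocab % 128)         # 0, i.e. 128-aligned
print(new_vocab - old_vocab)   # 64 IDs added after 0-199999
```

Under this assumption, the existing IDs 0-199999 keep their meaning and only IDs 200000-200063 are new, which matches the commit message's claim that the earlier IDs are identical.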